docs: initialize project

admin
2026-02-24 10:49:52 -05:00
parent e6e1b1fa6f
commit 972760b957

.planning/PROJECT.md Normal file

@@ -0,0 +1,76 @@
# CIM Summary — Analytics & Monitoring
## What This Is
An analytics dashboard and service health monitoring system for the existing CIM Summary application. Provides document processing metrics, user activity tracking, real-time service health detection, scheduled health probes, and email + in-app alerting when APIs or credentials need attention.
## Core Value
When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
## Requirements
### Validated
- ✓ Document upload and processing pipeline — existing
- ✓ Multi-provider LLM integration (Anthropic, OpenAI, OpenRouter) — existing
- ✓ Google Document AI text extraction — existing
- ✓ Supabase PostgreSQL with pgvector for storage and search — existing
- ✓ Firebase Authentication — existing
- ✓ Google Cloud Storage for file management — existing
- ✓ Background job queue with retry logic — existing
- ✓ Structured logging with Winston and correlation IDs — existing
- ✓ Basic health endpoints (`/health`, `/health/config`, `/monitoring/dashboard`) — existing
- ✓ PDF generation and export — existing
### Active
- [ ] In-app admin analytics dashboard (processing metrics + user activity)
- [ ] Service health monitoring for Google Document AI, Claude/OpenAI, Supabase, Firebase Auth
- [ ] Real-time auth failure detection with actionable alerts
- [ ] Scheduled health probes for all four services
- [ ] Email alerting for critical service issues
- [ ] In-app alert notifications for admin
- [ ] 30-day rolling data retention for analytics
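
The scheduled-probe item above could be approached roughly as follows. This is a minimal sketch, not the project's implementation: the `ServiceName` values, the `ProbeResult` shape, the `Check` type, and the 5-second default timeout are all assumptions made for illustration. Each real check (for example, a cheap authenticated call to the service) would be supplied by the backend.

```typescript
// Hypothetical names throughout; nothing here comes from the existing codebase.
type ServiceName = "document_ai" | "llm" | "supabase" | "firebase_auth";

interface ProbeResult {
  service: ServiceName;
  ok: boolean;
  latencyMs: number;
  error?: string;
}

// A check resolves when the service responds correctly and rejects otherwise.
type Check = () => Promise<void>;

// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run one check and convert success, failure, or timeout into a ProbeResult.
async function runProbe(service: ServiceName, check: Check, timeoutMs: number): Promise<ProbeResult> {
  const started = Date.now();
  try {
    await withTimeout(check(), timeoutMs);
    return { service, ok: true, latencyMs: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { service, ok: false, latencyMs: Date.now() - started, error: message };
  }
}

// Probe all services concurrently so one slow service does not delay the others.
function runAllProbes(checks: Record<ServiceName, Check>, timeoutMs = 5000): Promise<ProbeResult[]> {
  const services = Object.keys(checks) as ServiceName[];
  return Promise.all(services.map((s) => runProbe(s, checks[s], timeoutMs)));
}
```

Because `runProbe` never rejects, a scheduler can call `runAllProbes` on an interval and store or alert on every result without additional error handling.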
### Out of Scope
- External monitoring tools (Grafana, Datadog) — keeping it in-app for simplicity
- Non-admin user analytics views — admin-only for now
- Mobile push notifications — email + in-app sufficient
- Historical analytics beyond 30 days — lean storage, can extend later
- Real-time WebSocket updates — polling is sufficient for the admin dashboard
## Context
The CIM Summary application already has basic health endpoints and structured logging with correlation IDs. The existing `/monitoring/dashboard` endpoint provides some system metrics, and a `performance_metrics` table already exists in Supabase for storing system performance data. Winston logging captures errors with context, but there is no alerting mechanism: errors are logged, but nobody is notified.
The admin user is jpressnell@bluepointcapital.com. This is a single-admin system for now.
Four external services need monitoring:
1. **Google Document AI** — uses service account credentials, can expire or lose permissions
2. **Claude/OpenAI** — API keys can be revoked, rate limited, or run out of credits
3. **Supabase** — connection pool issues, service key rotation, pgvector availability
4. **Firebase Auth** — project config changes, token verification failures
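
To make "knows exactly what to fix" concrete, the failure modes listed above could be mapped to a suggested action. The sketch below assumes failures surface as HTTP status codes (401/403 for credential problems, 429 for rate limiting, 5xx for outages), which follows standard HTTP semantics but may not match how every SDK reports errors. The `Diagnosis` shape and the action strings are hypothetical.

```typescript
// Hypothetical types; the service labels mirror the list above.
type MonitoredService = "Google Document AI" | "Claude/OpenAI" | "Supabase" | "Firebase Auth";
type FailureKind = "credentials" | "rate_limit" | "outage" | "unknown";

interface Diagnosis {
  service: MonitoredService;
  kind: FailureKind;
  action: string; // what the admin should do next
}

// Assumption: the failing call exposes an HTTP status code.
function classifyFailure(service: MonitoredService, httpStatus: number): Diagnosis {
  if (httpStatus === 401 || httpStatus === 403) {
    return { service, kind: "credentials", action: `Check or rotate the ${service} credentials` };
  }
  if (httpStatus === 429) {
    return { service, kind: "rate_limit", action: `Review ${service} rate limits, quota, or remaining credits` };
  }
  if (httpStatus >= 500 && httpStatus <= 599) {
    return { service, kind: "outage", action: `Check the ${service} status page and retry later` };
  }
  return { service, kind: "unknown", action: "Inspect the logs using the request's correlation ID" };
}
```

A real implementation would likely also need to inspect provider-specific error bodies, since an exhausted credit balance or a revoked permission is not always distinguishable by status code alone.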
## Constraints
- **Tech stack**: Must integrate with existing Express.js backend and React frontend
- **Auth**: Admin-only access, use existing Firebase Auth with role check for jpressnell@bluepointcapital.com
- **Storage**: Use existing Supabase PostgreSQL — no new database infrastructure
- **Email**: Requires an email-sending service (such as SendGrid or Resend) for alerts
- **Deployment**: Must work within the Firebase Cloud Functions 14-minute timeout
- **Data retention**: 30-day rolling window to keep storage costs low
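
The 30-day rolling window in the data retention constraint reduces to a cutoff timestamp. The sketch below shows that arithmetic, under the assumption that each stored row carries a creation timestamp; the `MetricRow` shape and the function names are hypothetical. In practice the same cutoff would more likely be passed to a delete query against Supabase than applied in memory.

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Rows created before this instant fall outside the retention window.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// Hypothetical row shape, for illustration only.
interface MetricRow {
  id: number;
  createdAt: Date;
}

// Split rows into those still inside the window and those due for deletion.
// A row created exactly at the cutoff is kept.
function splitByRetention(rows: MetricRow[], now: Date): { keep: MetricRow[]; expired: MetricRow[] } {
  const cutoff = retentionCutoff(now).getTime();
  const keep: MetricRow[] = [];
  const expired: MetricRow[] = [];
  for (const row of rows) {
    (row.createdAt.getTime() >= cutoff ? keep : expired).push(row);
  }
  return { keep, expired };
}
```

Computing the cutoff in milliseconds from a UTC instant sidesteps daylight-saving shifts that calendar-day arithmetic in local time can introduce.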
## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| In-app dashboard over external tools | Simpler setup, no additional infrastructure, admin can see everything in one place | — Pending |
| Email + in-app dual alerting | Redundancy for critical issues — in-app for when you're already looking, email for when you're not | — Pending |
| 30-day retention | Balances useful trend data with storage efficiency | — Pending |
| Single admin (jpressnell@bluepointcapital.com) | Simple RBAC for now, can extend later | — Pending |
| Real-time detection + scheduled probes | Catches failures as they happen AND proactively tests services before users hit them | — Pending |
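
The dual-alerting decision in the table above can be read as a routing rule: every alert appears in-app, and critical alerts are also emailed. The following is a sketch under that reading. The three severity levels, the `Alert` shape, and the injected `Sender` functions are assumptions, not part of the existing application.

```typescript
// Hypothetical severity levels and channel names.
type Severity = "info" | "warning" | "critical";
type Channel = "in_app" | "email";

interface Alert {
  service: string;
  severity: Severity;
  message: string;
}

// In-app for everything; email only when the issue is critical.
function channelsFor(severity: Severity): Channel[] {
  return severity === "critical" ? ["in_app", "email"] : ["in_app"];
}

// Senders are injected so the routing rule can be exercised without real email or UI code.
type Sender = (alert: Alert) => void;

function dispatchAlert(alert: Alert, senders: Record<Channel, Sender>): Channel[] {
  const channels = channelsFor(alert.severity);
  for (const channel of channels) {
    senders[channel](alert);
  }
  return channels;
}
```

Keeping the routing rule separate from the senders means the email provider can be chosen later without touching the rule itself.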
---
*Last updated: 2026-02-24 after initialization*