docs: initialize project

2026-02-24 10:49:52 -05:00
parent e6e1b1fa6f
commit 972760b957
1 changed files with 76 additions and 0 deletions
--- a/.planning/PROJECT.md
+++ b/.planning/PROJECT.md
@@ -0,0 +1,76 @@
+# CIM Summary — Analytics & Monitoring
+
+## What This Is
+
+An analytics dashboard and service health monitoring system for the existing CIM Summary application. It provides document processing metrics, user activity tracking, real-time service health detection, scheduled health probes, and email and in-app alerts when APIs or credentials need attention.
+
+## Core Value
+
+When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
+
+## Requirements
+
+### Validated
+
+- ✓ Document upload and processing pipeline — existing
+- ✓ Multi-provider LLM integration (Anthropic, OpenAI, OpenRouter) — existing
+- ✓ Google Document AI text extraction — existing
+- ✓ Supabase PostgreSQL with pgvector for storage and search — existing
+- ✓ Firebase Authentication — existing
+- ✓ Google Cloud Storage for file management — existing
+- ✓ Background job queue with retry logic — existing
+- ✓ Structured logging with Winston and correlation IDs — existing
+- ✓ Basic health endpoints (`/health`, `/health/config`, `/monitoring/dashboard`) — existing
+- ✓ PDF generation and export — existing
+
+### Active
+
+- [ ] In-app admin analytics dashboard (processing metrics + user activity)
+- [ ] Service health monitoring for Google Document AI, Claude/OpenAI, Supabase, Firebase Auth
+- [ ] Real-time auth failure detection with actionable alerts
+- [ ] Scheduled periodic health probes for all 4 services
+- [ ] Email alerting for critical service issues
+- [ ] In-app alert notifications for admin
+- [ ] 30-day rolling data retention for analytics
+
+### Out of Scope
+
+- External monitoring tools (Grafana, Datadog) — keeping it in-app for simplicity
+- Non-admin user analytics views — admin-only for now
+- Mobile push notifications — email + in-app sufficient
+- Historical analytics beyond 30 days — lean storage, can extend later
+- Real-time WebSocket updates — polling is sufficient for admin dashboard
+
+## Context
+
+The CIM Summary application already has basic health endpoints and structured logging with correlation IDs. The existing `/monitoring/dashboard` endpoint provides some system metrics, and the `performance_metrics` table in Supabase already exists for storing system performance data. Winston logging captures errors with context, but there is no alerting mechanism: errors are logged and nobody is notified.
+
+The admin user is jpressnell@bluepointcapital.com. This is a single-admin system for now.
+
+Four external services need monitoring:
+1. **Google Document AI**: uses service account credentials, which can expire or lose permissions
+2. **Claude/OpenAI**: API keys can be revoked or rate limited, and accounts can run out of credits
+3. **Supabase**: connection pool issues, service key rotation, pgvector availability
+4. **Firebase Auth**: project config changes, token verification failures
+
+## Constraints
+
+- **Tech stack**: Must integrate with existing Express.js backend and React frontend
+- **Auth**: Admin-only access, use existing Firebase Auth with role check for jpressnell@bluepointcapital.com
+- **Storage**: Use existing Supabase PostgreSQL — no new database infrastructure
+- **Email**: Requires an email-sending service (SendGrid, Resend, or similar) for alerts
+- **Deployment**: Must work within Firebase Cloud Functions 14-minute timeout
+- **Data retention**: 30-day rolling window to keep storage costs low
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| In-app dashboard over external tools | Simpler setup, no additional infrastructure, admin can see everything in one place | — Pending |
+| Email + in-app dual alerting | Redundancy for critical issues — in-app for when you're already looking, email for when you're not | — Pending |
+| 30-day retention | Balances useful trend data with storage efficiency | — Pending |
+| Single admin (jpressnell@bluepointcapital.com) | Simple RBAC for now, can extend later | — Pending |
+| Real-time detection + scheduled probes | Catches failures as they happen AND proactively tests services before users hit them | — Pending |
+
+---
+*Last updated: 2026-02-24 after initialization*
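
The "Real-time detection + scheduled probes" decision and the list of four monitored services could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: every name here (`ServiceName`, `classifyFailure`, `runProbe`, `needsEmail`, the `httpStatus` field on errors, the 5-second default timeout) is hypothetical, and the individual probe functions that would call Document AI, the LLM provider, Supabase, and Firebase Auth are left to the caller.

```typescript
// Hypothetical sketch: none of these names come from the CIM Summary codebase.
type ServiceName = "document_ai" | "llm" | "supabase" | "firebase_auth";
type HealthStatus = "healthy" | "degraded" | "down";

interface ProbeResult {
  service: ServiceName;
  status: HealthStatus;
  latencyMs: number;
  action?: string; // what the admin should fix, if anything
}

// A probe resolves on success and rejects on failure. By assumption, a
// rejection may carry a numeric `httpStatus` field describing the failure.
type Probe = () => Promise<void>;

// Extract an HTTP-like status code from an unknown error, if present.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "httpStatus" in err) {
    const value = (err as { httpStatus: unknown }).httpStatus;
    return typeof value === "number" ? value : undefined;
  }
  return undefined;
}

// Map a failure to a status plus a concrete remediation hint.
function classifyFailure(
  service: ServiceName,
  httpStatus?: number,
): Pick<ProbeResult, "status" | "action"> {
  if (httpStatus === 401 || httpStatus === 403) {
    return { status: "down", action: `Rotate or reauthorize credentials for ${service}` };
  }
  if (httpStatus === 429) {
    return { status: "degraded", action: `Rate limit or quota reached for ${service}` };
  }
  return { status: "down", action: `Check ${service} availability and network access` };
}

// Reject if `work` does not settle within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("probe timed out")), timeoutMs);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run one probe and convert its outcome into a ProbeResult.
async function runProbe(service: ServiceName, probe: Probe, timeoutMs = 5000): Promise<ProbeResult> {
  const started = Date.now();
  try {
    await withTimeout(probe(), timeoutMs);
    return { service, status: "healthy", latencyMs: Date.now() - started };
  } catch (err) {
    return { service, latencyMs: Date.now() - started, ...classifyFailure(service, statusOf(err)) };
  }
}

// Probe all services concurrently; intended to be called from a scheduled job.
async function runAllProbes(probes: Record<ServiceName, Probe>): Promise<ProbeResult[]> {
  const entries = Object.entries(probes) as [ServiceName, Probe][];
  return Promise.all(entries.map(([service, probe]) => runProbe(service, probe)));
}

// Assumed policy: email only for outages; degraded states stay in-app.
function needsEmail(result: ProbeResult): boolean {
  return result.status === "down";
}
```

Keeping the failure classification separate from the probe runner means the same `classifyFailure` mapping could, in principle, be reused by the real-time path (errors observed during normal request handling) and by the scheduled path, so both produce the same actionable message for the admin.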