# Requirements Archive: v1.0 Analytics & Monitoring

**Archived:** 2026-02-25
**Status:** SHIPPED

For current requirements, see `.planning/REQUIREMENTS.md`.

---
# Requirements: CIM Summary — Analytics & Monitoring

**Defined:** 2026-02-24
**Core Value:** When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
## v1 Requirements

Requirements for initial release. Each maps to roadmap phases.
### Service Health

- [x] **HLTH-01**: Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth
- [x] **HLTH-02**: Each health probe makes a real authenticated API call, not just config checks
- [x] **HLTH-03**: Health probes run on a scheduled interval, separate from document processing
- [x] **HLTH-04**: Health probe results persist to Supabase (survive cold starts)
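Together, HLTH-01 and HLTH-02 imply that each healthy/degraded/down status is derived from the outcome of a real, timed API call. The following is a minimal sketch under several assumptions. The 2000 ms "degraded" threshold is hypothetical. Any non-2xx response or thrown error is treated as "down". The names `ProbeResult`, `classifyProbe`, and `runProbe` are illustrative, not taken from the shipped code.

```typescript
// Hypothetical sketch: names and thresholds are assumptions, not the shipped code.
type HealthStatus = "healthy" | "degraded" | "down";

interface ProbeResult {
  service: string;
  status: HealthStatus;
  latencyMs: number;
  checkedAt: string; // ISO timestamp, suitable for a created_at-style column
  error?: string;
}

// Assumed threshold: a successful call slower than this counts as degraded.
const DEGRADED_LATENCY_MS = 2000;

// Classify one probe from its HTTP status (null = no response) and its latency.
function classifyProbe(httpStatus: number | null, latencyMs: number): HealthStatus {
  if (httpStatus === null || httpStatus < 200 || httpStatus >= 300) {
    return "down"; // includes 401/403 from an expired or revoked credential
  }
  return latencyMs > DEGRADED_LATENCY_MS ? "degraded" : "healthy";
}

// Time a real authenticated call supplied by the caller and turn it into a result.
async function runProbe(
  service: string,
  call: () => Promise<number>, // resolves to the HTTP status of the authenticated request
): Promise<ProbeResult> {
  const start = Date.now();
  try {
    const httpStatus = await call();
    const latencyMs = Date.now() - start;
    return {
      service,
      status: classifyProbe(httpStatus, latencyMs),
      latencyMs,
      checkedAt: new Date().toISOString(),
    };
  } catch (err) {
    // A thrown error (DNS failure, timeout, SDK exception) is treated as down.
    return {
      service,
      status: "down",
      latencyMs: Date.now() - start,
      checkedAt: new Date().toISOString(),
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

In a scheduled function (HLTH-03), each `ProbeResult` would then be written to the `service_health_checks` table so that results survive cold starts (HLTH-04).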
### Alerting

- [x] **ALRT-01**: Admin receives email alert when a service goes down or degrades
- [x] **ALRT-02**: Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period)
- [x] **ALRT-03**: Admin sees in-app alert banner for active critical issues
- [x] **ALRT-04**: Alert recipient stored as configuration, not hardcoded
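The ALRT-02 cooldown reduces to comparing the current time against the last time an alert was sent for the same issue. This is a minimal sketch under stated assumptions. The 60-minute cooldown is hypothetical. The issue key is taken to be service plus status. The last-sent times live in an in-memory map, which would not survive a Cloud Functions cold start, so a deployed version would likely read the most recent matching row from `alert_events` instead. `AlertDeduplicator`, `alertRecipient`, and the `ALERT_RECIPIENT` variable name are all hypothetical.

```typescript
// Hypothetical sketch of ALRT-02 deduplication; names and cooldown are assumptions.
type AlertStatus = "degraded" | "down";

// Assumed cooldown window: 60 minutes.
const COOLDOWN_MS = 60 * 60 * 1000;

class AlertDeduplicator {
  private readonly cooldownMs: number;
  // Last send time per "service:status" key. In-memory only: lost on cold start.
  private readonly lastSent = new Map<string, number>();

  constructor(cooldownMs: number = COOLDOWN_MS) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true if an email should be sent now, and records the send time if so.
  shouldSend(service: string, status: AlertStatus, now: number = Date.now()): boolean {
    const key = `${service}:${status}`;
    const previous = this.lastSent.get(key);
    if (previous !== undefined && now - previous < this.cooldownMs) {
      return false; // same ongoing issue, still inside the cooldown window
    }
    this.lastSent.set(key, now);
    return true;
  }
}

// ALRT-04: the recipient comes from configuration (e.g. pass process.env in).
function alertRecipient(env: Record<string, string | undefined>): string {
  const recipient = env["ALERT_RECIPIENT"]; // hypothetical variable name
  if (!recipient) {
    throw new Error("ALERT_RECIPIENT is not configured");
  }
  return recipient;
}
```

Including the status in the key is a design choice. It lets a `degraded` to `down` escalation send a new email even while the earlier `degraded` alert is still cooling down.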
### Processing Analytics

- [x] **ANLY-01**: Document processing events persist to Supabase at write time (not in-memory only)
- [x] **ANLY-02**: Admin can view processing summary: upload counts, success/failure rates, avg processing time
- [x] **ANLY-03**: Analytics instrumentation is non-blocking (fire-and-forget, never delays processing pipeline)
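ANLY-02 and ANLY-03 can be illustrated together. The sketch below assumes a hypothetical `ProcessingEvent` shape and a caller-supplied `insert` function that stands in for the Supabase write. `recordEventNonBlocking` starts the write without awaiting it and routes any failure to an error callback, so the pipeline neither waits on analytics nor fails because of them. `summarize` computes the counts and rates for the admin summary. All names are illustrative.

```typescript
// Hypothetical event shape; field names are assumptions.
interface ProcessingEvent {
  documentId: string;
  outcome: "success" | "failure";
  durationMs: number;
  createdAt: string;
}

// ANLY-03: start the write but never await it, and never let it throw into the caller.
function recordEventNonBlocking(
  insert: (event: ProcessingEvent) => Promise<void>, // stand-in for the Supabase insert
  event: ProcessingEvent,
  onError: (err: unknown) => void = () => {}, // default drops the error; real code would log it
): void {
  // Wrapping in Promise.resolve().then(...) also catches a synchronous throw from insert().
  void Promise.resolve()
    .then(() => insert(event))
    .catch(onError);
}

interface ProcessingSummary {
  total: number;
  succeeded: number;
  failed: number;
  successRate: number; // 0..1
  avgDurationMs: number;
}

// ANLY-02: aggregate persisted events into the admin summary.
function summarize(events: ProcessingEvent[]): ProcessingSummary {
  const total = events.length;
  const succeeded = events.filter((e) => e.outcome === "success").length;
  const totalDuration = events.reduce((sum, e) => sum + e.durationMs, 0);
  return {
    total,
    succeeded,
    failed: total - succeeded,
    successRate: total === 0 ? 0 : succeeded / total,
    avgDurationMs: total === 0 ? 0 : totalDuration / total,
  };
}
```

Averaging over all events, including failures, is an assumption. ANLY-02 does not say whether failed runs should count toward the average processing time.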
### Infrastructure

- [x] **INFR-01**: Database migrations create `service_health_checks` and `alert_events` tables with indexes on `created_at`
- [x] **INFR-02**: Admin API routes protected by Firebase Auth with admin email check
- [x] **INFR-03**: 30-day rolling data retention cleanup runs on schedule
- [x] **INFR-04**: Analytics writes use existing Supabase connection, no new database infrastructure
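INFR-02 and INFR-03 each reduce to a small pure function. The sketch below assumes the route handler has already verified the Firebase ID token and passes in its `email` and `email_verified` claims, and that the admin list comes from configuration. `isAdminEmail` performs the INFR-02 check. `retentionCutoff` computes the timestamp before which rows become eligible for the 30-day cleanup in INFR-03. Both function names are hypothetical.

```typescript
// Subset of the claims on a verified Firebase ID token that this check needs.
interface TokenClaims {
  email?: string;
  email_verified?: boolean;
}

// INFR-02: allow only verified emails on a configured admin list (case-insensitive).
function isAdminEmail(claims: TokenClaims, adminEmails: readonly string[]): boolean {
  if (!claims.email || claims.email_verified !== true) {
    return false;
  }
  const email = claims.email.toLowerCase();
  return adminEmails.some((admin) => admin.trim().toLowerCase() === email);
}

const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// INFR-03: rows whose created_at is earlier than this ISO timestamp can be deleted.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): string {
  return new Date(now.getTime() - days * MS_PER_DAY).toISOString();
}
```

With supabase-js, the scheduled cleanup could then issue a `delete()` filtered with `.lt("created_at", cutoff)` against each monitoring table. The `created_at` indexes from INFR-01 keep that filter cheap.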
## v2 Requirements

Deferred to a future release. Tracked but not in the current roadmap.
### Service Health

- **HLTH-05**: Admin can view 7-day service health history with uptime percentages
- **HLTH-06**: Real-time auth failure detection classifies auth errors (401/403) vs transient errors (429/503) and alerts immediately on credential issues
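HLTH-06 is deferred, but the classification it describes is easy to state precisely. The following minimal sketch covers only the status codes the requirement names. Bucketing every other code as `"other"` and retrying transient errors before alerting are assumptions. The function names are hypothetical.

```typescript
// Hypothetical classification for the deferred HLTH-06 requirement.
type ErrorClass = "auth" | "transient" | "other";

function classifyHttpError(status: number): ErrorClass {
  if (status === 401 || status === 403) {
    return "auth"; // credential problem: the requirement calls for an immediate alert
  }
  if (status === 429 || status === 503) {
    return "transient"; // rate limit or temporary outage: assumed to be retried first
  }
  return "other"; // not named by HLTH-06; left to the normal probe logic (assumption)
}

// Only credential failures trigger an immediate alert in this sketch.
function shouldAlertImmediately(status: number): boolean {
  return classifyHttpError(status) === "auth";
}
```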
### Alerting

- **ALRT-05**: Admin can acknowledge or snooze alerts from the UI
- **ALRT-06**: Admin receives recovery email when a downed service returns healthy
### Processing Analytics

- **ANLY-04**: Admin can view processing time trend charts over time
- **ANLY-05**: Admin can view LLM token usage and estimated cost per document and per month
### Infrastructure

- **INFR-05**: Dashboard shows staleness warning when monitoring data stops arriving
## Out of Scope

| Feature | Reason |
|---------|--------|
| External monitoring tools (Grafana, Datadog) | Operational overhead unjustified for single-admin app |
| Multi-user analytics views | One admin user, RBAC complexity for zero benefit |
| WebSocket/SSE real-time updates | Polling at 60s intervals sufficient; WebSockets complex in Cloud Functions |
| Mobile push notifications | Email + in-app covers notification needs |
| Historical analytics beyond 30 days | Storage costs; can extend later |
| ML-based anomaly detection | Threshold-based alerting sufficient for this scale |
| Log aggregation / log search UI | Firebase Cloud Logging handles this |
## Traceability

Which phases cover which requirements. Updated during roadmap creation.

| Requirement | Phase | Status |
|-------------|-------|--------|
| INFR-01 | Phase 1 | Complete |
| INFR-04 | Phase 1 | Complete |
| HLTH-02 | Phase 2 | Complete |
| HLTH-03 | Phase 2 | Complete |
| HLTH-04 | Phase 2 | Complete |
| ALRT-01 | Phase 2 | Complete |
| ALRT-02 | Phase 2 | Complete |
| ALRT-04 | Phase 2 | Complete |
| ANLY-01 | Phase 2 | Complete |
| ANLY-03 | Phase 2 | Complete |
| INFR-03 | Phase 2 | Complete |
| INFR-02 | Phase 3 | Complete |
| HLTH-01 | Phase 3 | Complete |
| ANLY-02 | Phase 3 | Complete |
| ALRT-03 | Phase 4 | Complete |

**Coverage:**
- v1 requirements: 15 total
- Mapped to phases: 15
- Unmapped: 0

---
*Requirements defined: 2026-02-24*
*Last updated: 2026-02-24 — traceability mapped after roadmap creation*