chore: complete v1.0 Analytics & Monitoring milestone

Archive milestone artifacts (roadmap, requirements, audit, phase directories)
to .planning/milestones/. Evolve PROJECT.md with validated requirements and
decision outcomes. Create MILESTONES.md and RETROSPECTIVE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
admin
2026-02-25 10:34:18 -05:00
parent 8bad951d63
commit 38a0f0619d
39 changed files with 299 additions and 186 deletions

@@ -0,0 +1,110 @@
# Requirements Archive: v1.0 Analytics & Monitoring
**Archived:** 2026-02-25
**Status:** SHIPPED
For current requirements, see `.planning/REQUIREMENTS.md`.
---
# Requirements: CIM Summary — Analytics & Monitoring
**Defined:** 2026-02-24
**Core Value:** When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
## v1 Requirements
Requirements for initial release. Each maps to roadmap phases.
### Service Health
- [x] **HLTH-01**: Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth
- [x] **HLTH-02**: Each health probe makes a real authenticated API call, not just config checks
- [x] **HLTH-03**: Health probes run on a scheduled interval, separate from document processing
- [x] **HLTH-04**: Health probe results persist to Supabase (survive cold starts)
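The three statuses in HLTH-01 could be derived from a probe outcome roughly as follows. This is a minimal sketch, not the project's implementation: the `ProbeOutcome` shape, the `classifyProbe` name, and the 2000 ms latency threshold are all assumptions made for illustration.

```typescript
// Hypothetical types and threshold; none of these names come from the project.
type HealthStatus = "healthy" | "degraded" | "down";

interface ProbeOutcome {
  service: string;    // e.g. "supabase"
  succeeded: boolean; // did the authenticated API call return successfully?
  latencyMs: number;  // wall-clock duration of the call
}

// Assumed rule: a failed call is "down", a slow success is "degraded".
const DEGRADED_LATENCY_MS = 2000; // illustrative value only

function classifyProbe(outcome: ProbeOutcome): HealthStatus {
  if (!outcome.succeeded) return "down";
  if (outcome.latencyMs > DEGRADED_LATENCY_MS) return "degraded";
  return "healthy";
}
```

Keeping the classification as a pure function means the scheduled probe runner only has to make the real call, time it, and persist the returned status.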
### Alerting
- [x] **ALRT-01**: Admin receives email alert when a service goes down or degrades
- [x] **ALRT-02**: Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period)
- [x] **ALRT-03**: Admin sees in-app alert banner for active critical issues
- [x] **ALRT-04**: Alert recipient stored as configuration, not hardcoded
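The cooldown-based deduplication in ALRT-02 reduces to one comparison against the time the last alert for the same issue was sent. The sketch below assumes a one-hour cooldown and a hypothetical `shouldSendAlert` helper; the archive does not state the actual cooldown length or how the last-sent timestamp is looked up.

```typescript
// Assumed cooldown; the real value is not given in the requirements.
const COOLDOWN_MS = 60 * 60 * 1000;

// lastSentAtMs: epoch milliseconds of the most recent alert for the same
// service and status, or null if no alert has been sent for it yet.
function shouldSendAlert(
  lastSentAtMs: number | null,
  nowMs: number,
  cooldownMs: number = COOLDOWN_MS,
): boolean {
  if (lastSentAtMs === null) return true;    // first alert for this issue
  return nowMs - lastSentAtMs >= cooldownMs; // suppress repeats inside the window
}
```

Because health results must survive cold starts (HLTH-04), the last-sent timestamp would presumably be read from persisted alert records rather than from process memory.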
### Processing Analytics
- [x] **ANLY-01**: Document processing events persist to Supabase at write time (not in-memory only)
- [x] **ANLY-02**: Admin can view processing summary: upload counts, success/failure rates, avg processing time
- [x] **ANLY-03**: Analytics instrumentation is non-blocking (fire-and-forget, never delays processing pipeline)
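A fire-and-forget write as described in ANLY-03 starts the persistence call without awaiting it and routes any failure to a handler instead of the caller. The following is a sketch under assumptions: `ProcessingEvent`, `EventWriter`, and `recordEvent` are hypothetical names, and the writer stands in for whatever function actually inserts the row.

```typescript
// Hypothetical event shape; field names are illustrative.
interface ProcessingEvent {
  documentId: string;
  eventType: "upload" | "success" | "failure";
  durationMs?: number;
}

// Stand-in for the function that persists an event.
type EventWriter = (event: ProcessingEvent) => Promise<void>;

function recordEvent(
  write: EventWriter,
  event: ProcessingEvent,
  onError: (err: unknown) => void = (err) =>
    console.error("analytics write failed", err),
): void {
  try {
    // Start the write but do not await it; rejections go to onError.
    void write(event).catch(onError);
  } catch (err) {
    // A writer that throws synchronously must not break the pipeline either.
    onError(err);
  }
}
```

The caller gets `void` back immediately, so a slow or failing analytics write cannot delay or fail document processing.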
### Infrastructure
- [x] **INFR-01**: Database migrations create service_health_checks and alert_events tables with indexes on created_at
- [x] **INFR-02**: Admin API routes protected by Firebase Auth with admin email check
- [x] **INFR-03**: 30-day rolling data retention cleanup runs on schedule
- [x] **INFR-04**: Analytics writes use existing Supabase connection, no new database infrastructure
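The 30-day rolling retention in INFR-03 amounts to computing a cutoff timestamp and deleting rows whose `created_at` is older. A minimal sketch of the cutoff arithmetic, with hypothetical helper names `retentionCutoff` and `isExpired`:

```typescript
const RETENTION_DAYS = 30; // from INFR-03
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the oldest timestamp that should still be kept.
function retentionCutoff(now: Date, retentionDays: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// True when a row created at `createdAt` falls outside the retention window.
function isExpired(createdAt: Date, now: Date): boolean {
  return createdAt.getTime() < retentionCutoff(now).getTime();
}
```

The indexes on `created_at` required by INFR-01 are what would keep a scheduled delete of rows older than the cutoff inexpensive.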
## v2 Requirements
Deferred to future release. Tracked but not in current roadmap.
### Service Health
- **HLTH-05**: Admin can view 7-day service health history with uptime percentages
- **HLTH-06**: Real-time auth failure detection classifies auth errors (401/403) vs transient errors (429/503) and alerts immediately on credential issues
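The distinction HLTH-06 draws between credential failures and transient failures could be expressed as a small status-code classifier. This is an illustrative sketch for a deferred requirement; the `FailureKind` type and both function names are assumptions, and only the four status codes come from the requirement text.

```typescript
type FailureKind = "auth" | "transient" | "other";

// 401/403 indicate a credential problem; 429/503 are treated as transient.
function classifyHttpFailure(status: number): FailureKind {
  if (status === 401 || status === 403) return "auth";
  if (status === 429 || status === 503) return "transient";
  return "other";
}

// Per HLTH-06, only credential issues should trigger an immediate alert.
function shouldAlertImmediately(status: number): boolean {
  return classifyHttpFailure(status) === "auth";
}
```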
### Alerting
- **ALRT-05**: Admin can acknowledge or snooze alerts from the UI
- **ALRT-06**: Admin receives recovery email when a downed service returns healthy
### Processing Analytics
- **ANLY-04**: Admin can view processing time trend charts over time
- **ANLY-05**: Admin can view LLM token usage and estimated cost per document and per month
### Infrastructure
- **INFR-05**: Dashboard shows staleness warning when monitoring data stops arriving
## Out of Scope
| Feature | Reason |
|---------|--------|
| External monitoring tools (Grafana, Datadog) | Operational overhead unjustified for single-admin app |
| Multi-user analytics views | Only one admin user; RBAC adds complexity for zero benefit |
| WebSocket/SSE real-time updates | Polling at 60s intervals sufficient; WebSockets complex in Cloud Functions |
| Mobile push notifications | Email + in-app covers notification needs |
| Historical analytics beyond 30 days | Storage costs; can extend later |
| ML-based anomaly detection | Threshold-based alerting sufficient for this scale |
| Log aggregation / log search UI | Firebase Cloud Logging handles this |
## Traceability
Maps each requirement to the phase that covers it. Updated during roadmap creation.
| Requirement | Phase | Status |
|-------------|-------|--------|
| INFR-01 | Phase 1 | Complete |
| INFR-04 | Phase 1 | Complete |
| HLTH-02 | Phase 2 | Complete |
| HLTH-03 | Phase 2 | Complete |
| HLTH-04 | Phase 2 | Complete |
| ALRT-01 | Phase 2 | Complete |
| ALRT-02 | Phase 2 | Complete |
| ALRT-04 | Phase 2 | Complete |
| ANLY-01 | Phase 2 | Complete |
| ANLY-03 | Phase 2 | Complete |
| INFR-03 | Phase 2 | Complete |
| INFR-02 | Phase 3 | Complete |
| HLTH-01 | Phase 3 | Complete |
| ANLY-02 | Phase 3 | Complete |
| ALRT-03 | Phase 4 | Complete |
**Coverage:**
- v1 requirements: 15 total
- Mapped to phases: 15
- Unmapped: 0
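The coverage figures above can be recomputed from the traceability rows rather than maintained by hand. The sketch below uses a hypothetical `coverage` helper and a three-row excerpt of the table; it is not part of the project's tooling.

```typescript
interface TraceRow {
  requirement: string; // e.g. "INFR-01"
  phase: number;       // e.g. 1
}

// Counts how many required IDs appear in the traceability rows.
function coverage(required: string[], rows: TraceRow[]) {
  const mappedIds = new Set(rows.map((row) => row.requirement));
  const unmapped = required.filter((id) => !mappedIds.has(id));
  return {
    total: required.length,
    mapped: required.length - unmapped.length,
    unmapped,
  };
}

// Excerpt of the table above, for illustration only.
const sampleRows: TraceRow[] = [
  { requirement: "INFR-01", phase: 1 },
  { requirement: "HLTH-01", phase: 3 },
  { requirement: "ALRT-03", phase: 4 },
];
const sample = coverage(["INFR-01", "HLTH-01", "ALRT-03"], sampleRows);
```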
---
*Requirements defined: 2026-02-24*
*Last updated: 2026-02-24 — traceability mapped after roadmap creation*