cim_summary/.planning/REQUIREMENTS.md
admin e4a7699938 docs(02-04): complete runHealthProbes + runRetentionCleanup plan
- Phase 2 plan 4 complete — two scheduled Cloud Function exports added
- SUMMARY.md created with decisions, deviations, and phase readiness notes
- STATE.md updated: phase 2 complete, plan counter at 4/4
- ROADMAP.md updated: phase 2 all 4 plans complete
- Requirements HLTH-03 and INFR-03 marked complete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 14:37:00 -05:00


Requirements: CIM Summary — Analytics & Monitoring

Defined: 2026-02-24
Core Value: When something breaks (an API key expires, a service goes down, a credential needs reauthorization), the admin knows immediately and knows exactly what to fix.

v1 Requirements

Requirements for initial release. Each maps to a roadmap phase.

Service Health

  • HLTH-01: Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth
  • HLTH-02: Each health probe makes a real authenticated API call, not just config checks
  • HLTH-03: Health probes run on a scheduled interval, separate from document processing
  • HLTH-04: Health probe results persist to Supabase (survive cold starts)
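HLTH-01, HLTH-02 and HLTH-04 together imply that each scheduled probe turns one real API call into a status and a persisted row. The sketch below shows one way to do that mapping. It assumes a single latency threshold for "degraded" (2000 ms, a placeholder) and a hypothetical row shape; only the table name service_health_checks and its created_at column come from INFR-01, and the service identifiers are illustrative.

```typescript
// Hypothetical identifiers for the four probed services (HLTH-01).
type ServiceName = "document_ai" | "llm" | "supabase" | "firebase_auth";
type HealthStatus = "healthy" | "degraded" | "down";

// Result of one real, authenticated API call (HLTH-02).
interface ProbeAttempt {
  ok: boolean;       // did the call succeed?
  latencyMs: number; // wall-clock duration of the call
  error?: string;    // error message when ok === false
}

// Hypothetical row shape for service_health_checks (HLTH-04, INFR-01).
interface HealthCheckRow {
  service: ServiceName;
  status: HealthStatus;
  latency_ms: number;
  error_message: string | null;
  created_at: string; // ISO timestamp; indexed per INFR-01
}

// Assumption: a successful call slower than this counts as degraded.
const DEGRADED_LATENCY_MS = 2000;

// Failed call -> down; slow success -> degraded; otherwise healthy.
function classifyProbe(attempt: ProbeAttempt): HealthStatus {
  if (!attempt.ok) return "down";
  return attempt.latencyMs > DEGRADED_LATENCY_MS ? "degraded" : "healthy";
}

// Build the row that the scheduled probe would persist.
function toHealthCheckRow(
  service: ServiceName,
  attempt: ProbeAttempt,
  now: Date,
): HealthCheckRow {
  return {
    service,
    status: classifyProbe(attempt),
    latency_ms: attempt.latencyMs,
    error_message: attempt.ok ? null : (attempt.error ?? "unknown error"),
    created_at: now.toISOString(),
  };
}
```

In a scheduled function (HLTH-03) the row could then be written through the existing Supabase client (INFR-04), for example with `supabase.from("service_health_checks").insert(row)`; keeping classification pure makes it testable without network access.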

Alerting

  • ALRT-01: Admin receives email alert when a service goes down or degrades
  • ALRT-02: Alert deduplication suppresses repeat emails for the same ongoing issue during a cooldown period
  • ALRT-03: Admin sees in-app alert banner for active critical issues
  • ALRT-04: Alert recipient stored as configuration, not hardcoded
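ALRT-02 and ALRT-04 can be sketched as two small pure functions: a cooldown check over previously sent alerts and a recipient lookup from configuration. The 60-minute cooldown, the `service:status` key format and the `ALERT_RECIPIENT_EMAIL` variable name are all assumptions; in practice the send history would presumably be read from the alert_events table.

```typescript
// One previously sent alert (would come from alert_events in practice).
interface AlertRecord {
  key: string;  // hypothetical format, e.g. "document_ai:down"
  sentAt: Date;
}

// Assumption: one hour between repeat emails for the same issue.
const DEFAULT_COOLDOWN_MS = 60 * 60 * 1000;

function alertKey(service: string, status: string): string {
  return `${service}:${status}`;
}

// ALRT-02: send only if no alert with this key went out within the cooldown.
function shouldSendAlert(
  key: string,
  history: AlertRecord[],
  now: Date,
  cooldownMs: number = DEFAULT_COOLDOWN_MS,
): boolean {
  // Find the most recent send for this key (history may be unordered).
  let lastSent: number | undefined;
  for (const record of history) {
    if (record.key !== key) continue;
    const t = record.sentAt.getTime();
    if (lastSent === undefined || t > lastSent) lastSent = t;
  }
  return lastSent === undefined || now.getTime() - lastSent >= cooldownMs;
}

// ALRT-04: recipient comes from configuration, never a hardcoded literal.
// ALERT_RECIPIENT_EMAIL is a hypothetical variable name.
function alertRecipient(env: Record<string, string | undefined>): string {
  const recipient = env["ALERT_RECIPIENT_EMAIL"];
  if (!recipient) throw new Error("ALERT_RECIPIENT_EMAIL is not configured");
  return recipient;
}
```

Passing the environment in (for example `alertRecipient(process.env)` in a Cloud Function) keeps the lookup testable and fails loudly when the setting is missing.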

Processing Analytics

  • ANLY-01: Document processing events persist to Supabase at write time (not in-memory only)
  • ANLY-02: Admin can view processing summary: upload counts, success/failure rates, avg processing time
  • ANLY-03: Analytics instrumentation is non-blocking (fire-and-forget, never delays processing pipeline)
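The summary in ANLY-02 and the non-blocking write in ANLY-03 can be sketched as follows. The event shape is hypothetical (one event per uploaded document is assumed), and the writer is injected so the sketch does not depend on a specific Supabase call.

```typescript
// Hypothetical shape of one persisted processing event.
interface ProcessingEvent {
  documentId: string;
  status: "success" | "failure";
  durationMs: number;
}

interface ProcessingSummary {
  uploads: number;         // assumption: one event per uploaded document
  successRate: number;     // 0..1
  failureRate: number;     // 0..1
  avgProcessingMs: number;
}

// ANLY-02: aggregate persisted events into the dashboard summary.
function summarize(events: ProcessingEvent[]): ProcessingSummary {
  const uploads = events.length;
  if (uploads === 0) {
    return { uploads: 0, successRate: 0, failureRate: 0, avgProcessingMs: 0 };
  }
  const successes = events.filter((e) => e.status === "success").length;
  const totalMs = events.reduce((sum, e) => sum + e.durationMs, 0);
  return {
    uploads,
    successRate: successes / uploads,
    failureRate: (uploads - successes) / uploads,
    avgProcessingMs: totalMs / uploads,
  };
}

// ANLY-03: fire-and-forget. The promise is never awaited, and both
// synchronous throws and async rejections are caught and logged, so
// instrumentation can neither delay nor break the processing pipeline.
function trackEvent(
  write: (event: ProcessingEvent) => Promise<unknown>,
  event: ProcessingEvent,
): void {
  try {
    void write(event).catch((err: unknown) => {
      console.error("analytics write failed", err);
    });
  } catch (err) {
    console.error("analytics write threw", err);
  }
}
```

Because `trackEvent` returns `void` rather than a promise, a caller cannot accidentally `await` it inside the pipeline.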

Infrastructure

  • INFR-01: Database migrations create service_health_checks and alert_events tables with indexes on created_at
  • INFR-02: Admin API routes protected by Firebase Auth with admin email check
  • INFR-03: 30-day rolling data retention cleanup runs on a schedule
  • INFR-04: Analytics writes use existing Supabase connection, no new database infrastructure
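INFR-02 and INFR-03 each reduce to a small pure decision: is this caller an admin, and which rows are old enough to delete. The sketch below assumes the caller's email has already been extracted from a verified Firebase ID token and that the admin list comes from configuration; the function names are hypothetical.

```typescript
const RETENTION_DAYS = 30; // INFR-03: 30-day rolling window
const DAY_MS = 24 * 60 * 60 * 1000;

// INFR-02: allow only tokens whose email is on the configured admin list.
// Comparison is case-insensitive, since email domains are case-insensitive.
function isAdminEmail(email: string | undefined, adminEmails: string[]): boolean {
  if (!email) return false;
  const normalized = email.trim().toLowerCase();
  return adminEmails.some((a) => a.trim().toLowerCase() === normalized);
}

// INFR-03: rows with created_at strictly before this timestamp are deleted.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): string {
  return new Date(now.getTime() - days * DAY_MS).toISOString();
}
```

One possible wiring: the route middleware reads `email` from the result of `getAuth().verifyIdToken(idToken)` in the Firebase Admin SDK, and the scheduled cleanup runs `supabase.from("service_health_checks").delete().lt("created_at", cutoff)` (and the same for alert_events), relying on the created_at indexes from INFR-01.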

v2 Requirements

Deferred to future release. Tracked but not in current roadmap.

Service Health

  • HLTH-05: Admin can view 7-day service health history with uptime percentages
  • HLTH-06: Real-time auth failure detection classifies auth errors (401/403) vs transient errors (429/503) and alerts immediately on credential issues
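The two deferred health requirements can be sketched as pure functions. The status codes in the classifier come straight from HLTH-06; treating every other failure as "other", and counting "degraded" checks as up when computing uptime for HLTH-05, are assumptions.

```typescript
type FailureKind = "auth" | "transient" | "other";
type HealthStatus = "healthy" | "degraded" | "down";

// HLTH-06: separate credential failures from transient ones.
function classifyHttpFailure(status: number): FailureKind {
  if (status === 401 || status === 403) return "auth";      // credential issue
  if (status === 429 || status === 503) return "transient"; // rate limit / outage
  return "other"; // assumption: anything else is neither
}

// Credential issues alert immediately; transient errors can wait for retries.
function shouldAlertImmediately(status: number): boolean {
  return classifyHttpFailure(status) === "auth";
}

// HLTH-05: uptime over a window (e.g. 7 days of checks) as a percentage.
// Assumption: only "down" counts against uptime. Returns null with no data.
function uptimePercent(statuses: HealthStatus[]): number | null {
  if (statuses.length === 0) return null;
  const up = statuses.filter((s) => s !== "down").length;
  return (up / statuses.length) * 100;
}
```

Returning `null` for an empty window lets the UI show "no data" instead of a misleading 0% or 100%.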

Alerting

  • ALRT-05: Admin can acknowledge or snooze alerts from the UI
  • ALRT-06: Admin receives recovery email when a downed service returns healthy
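A minimal sketch of the decisions behind ALRT-05 and ALRT-06, under two assumptions: a recovery is detected by comparing only consecutive probe results (so a down, degraded, healthy sequence would not trigger it), and an acknowledged alert stays silent for the rest of the incident. The field names are hypothetical.

```typescript
type HealthStatus = "healthy" | "degraded" | "down";

// ALRT-06: a recovery is a healthy probe immediately following a down one.
function isRecovery(previous: HealthStatus | undefined, current: HealthStatus): boolean {
  return previous === "down" && current === "healthy";
}

// Hypothetical per-alert UI state for ALRT-05.
interface AlertState {
  acknowledged: boolean;
  snoozedUntil: Date | null;
}

// ALRT-05: acknowledged alerts stay quiet; snoozed alerts are quiet
// until snoozedUntil passes.
function isSilenced(state: AlertState, now: Date): boolean {
  if (state.acknowledged) return true;
  return state.snoozedUntil !== null && now.getTime() < state.snoozedUntil.getTime();
}
```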

Processing Analytics

  • ANLY-04: Admin can view processing time trend charts over time
  • ANLY-05: Admin can view LLM token usage and estimated cost per document and per month
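ANLY-05's cost estimate is tokens times a per-token rate, summed per document and per month. The sketch below assumes per-million-token input and output prices supplied by configuration; no real Claude or OpenAI prices are encoded here, and the usage record shape is hypothetical.

```typescript
// Hypothetical token usage record, one per processed document.
interface TokenUsage {
  documentId: string;
  month: string; // "YYYY-MM"
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per million tokens; supplied by configuration (assumption).
interface PriceTable {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Per-document estimate: tokens / 1e6 * price, for input and output.
function estimateCostUsd(u: TokenUsage, price: PriceTable): number {
  return (
    (u.inputTokens / 1_000_000) * price.inputPerMillionUsd +
    (u.outputTokens / 1_000_000) * price.outputPerMillionUsd
  );
}

// Per-month totals keyed by "YYYY-MM".
function costByMonth(usages: TokenUsage[], price: PriceTable): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of usages) {
    totals.set(u.month, (totals.get(u.month) ?? 0) + estimateCostUsd(u, price));
  }
  return totals;
}
```

Keeping prices out of the code matters because provider pricing changes; storing token counts rather than dollar amounts also lets past months be re-estimated.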

Infrastructure

  • INFR-05: Dashboard shows staleness warning when monitoring data stops arriving
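INFR-05 amounts to comparing the newest monitoring timestamp with the current time. The sketch assumes the dashboard already has the latest created_at value from service_health_checks; the threshold is a caller-supplied assumption (for example, a few missed probe intervals), since the requirements do not fix a probe interval.

```typescript
// INFR-05: data is stale when the newest row is older than maxAgeMs.
// No data at all, or an unparseable timestamp, is also treated as stale.
function isStale(latestCreatedAt: string | null, now: Date, maxAgeMs: number): boolean {
  if (latestCreatedAt === null) return true;
  const age = now.getTime() - new Date(latestCreatedAt).getTime();
  return Number.isNaN(age) || age > maxAgeMs;
}
```

Failing toward "stale" is deliberate: a broken probe pipeline should surface as a warning rather than as silently frozen "healthy" data.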

Out of Scope

| Feature | Reason |
|---|---|
| External monitoring tools (Grafana, Datadog) | Operational overhead not justified for a single-admin app |
| Multi-user analytics views | Only one admin user; RBAC adds complexity for no benefit |
| WebSocket/SSE real-time updates | Polling at 60 s intervals is sufficient; WebSockets are complex to run in Cloud Functions |
| Mobile push notifications | Email and in-app alerts cover notification needs |
| Historical analytics beyond 30 days | Storage costs; retention can be extended later |
| ML-based anomaly detection | Threshold-based alerting is sufficient at this scale |
| Log aggregation / log search UI | Google Cloud Logging (used by Cloud Functions for Firebase) already handles this |

Traceability

Which phases cover which requirements. Updated during roadmap creation.

| Requirement | Phase | Status |
|---|---|---|
| INFR-01 | Phase 1 | Complete |
| INFR-04 | Phase 1 | Complete |
| HLTH-02 | Phase 2 | Complete |
| HLTH-03 | Phase 2 | Complete |
| HLTH-04 | Phase 2 | Complete |
| ALRT-01 | Phase 2 | Complete |
| ALRT-02 | Phase 2 | Complete |
| ALRT-04 | Phase 2 | Complete |
| ANLY-01 | Phase 2 | Complete |
| ANLY-03 | Phase 2 | Complete |
| INFR-03 | Phase 2 | Complete |
| INFR-02 | Phase 3 | Pending |
| HLTH-01 | Phase 3 | Pending |
| ANLY-02 | Phase 3 | Pending |
| ALRT-03 | Phase 4 | Pending |

Coverage:

  • v1 requirements: 15 total
  • Mapped to phases: 15
  • Unmapped: 0

Requirements defined: 2026-02-24
Last updated: 2026-02-24 (traceability mapped after roadmap creation)