Roadmap: CIM Summary — Analytics & Monitoring
Overview
This milestone adds persistent analytics and service health monitoring to the existing CIM Summary application. The work proceeds in four phases that respect hard dependency constraints: database schema must exist before services can write to it, services must exist before routes can expose them, and routes must be stable before the frontend can be wired up. Each phase delivers a complete, independently testable layer.
Phases
Phase Numbering:
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
Decimal phases appear between their surrounding integers in numeric order.
- Phase 1: Data Foundation - Create schema, DB models, and verify existing Supabase connection wiring (completed 2026-02-24)
- Phase 2: Backend Services - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
- Phase 3: API Layer - Admin-gated routes exposing all services, instrumentation hooks in existing processors (completed 2026-02-24)
- Phase 4: Frontend - Admin dashboard page, health panel, processing metrics, alert notification banner
Phase Details
Phase 1: Data Foundation
Goal: The database schema for monitoring exists and the existing Supabase connection is the only data infrastructure used
Depends on: Nothing (first phase)
Requirements: INFR-01, INFR-04
Success Criteria (what must be TRUE):
- `service_health_checks` and `alert_events` tables exist in Supabase with indexes on `created_at`
- All new tables use the existing Supabase client from `config/supabase.ts` — no new database connections added
- `AlertEventModel.ts` exists and its CRUD methods can be called in isolation without errors
- Migration SQL can be run against the live Supabase instance and produces the expected schema
Plans: 2/2 plans complete
Plans:
- 01-01-PLAN.md — Migration SQL + HealthCheckModel + AlertEventModel
- 01-02-PLAN.md — Unit tests for both monitoring models
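The "CRUD methods can be called in isolation" criterion can be sketched as a model class that takes its data client by injection, so unit tests (01-02-PLAN) can pass a stub instead of the live Supabase connection. The `MonitoringClient` interface and the row shape below are hypothetical illustrations, not the real supabase-js API:

```typescript
// Sketch: an AlertEventModel whose CRUD methods run in isolation because the
// Supabase-like client is injected. MonitoringClient is a hypothetical
// interface, not the actual supabase-js client from config/supabase.ts.
interface AlertEvent {
  id: number;
  service: string;
  status: "degraded" | "down";
  acknowledged: boolean;
  created_at: string;
}

interface MonitoringClient {
  insert(table: string, row: Omit<AlertEvent, "id">): AlertEvent;
  select(table: string): AlertEvent[];
}

class AlertEventModel {
  constructor(private client: MonitoringClient) {}

  create(service: string, status: AlertEvent["status"]): AlertEvent {
    return this.client.insert("alert_events", {
      service,
      status,
      acknowledged: false,
      created_at: new Date().toISOString(),
    });
  }

  listUnacknowledged(): AlertEvent[] {
    return this.client.select("alert_events").filter((e) => !e.acknowledged);
  }
}

// In-memory stub standing in for the shared client during unit tests.
function inMemoryClient(): MonitoringClient {
  const rows: AlertEvent[] = [];
  return {
    insert: (_table, row) => {
      const saved = { id: rows.length + 1, ...row };
      rows.push(saved);
      return saved;
    },
    select: () => [...rows],
  };
}
```

With this shape, swapping `inMemoryClient()` for the real client in `config/supabase.ts` keeps the "no new database connections" constraint intact.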
Phase 2: Backend Services
Goal: All monitoring logic runs correctly — health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule
Depends on: Phase 1
Requirements: HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03
Success Criteria (what must be TRUE):
- Each health probe makes a real authenticated API call to its target service and returns a structured result (status, latency_ms, error_message)
- Health probe results are written to Supabase and survive a simulated cold start (data present after function restart)
- An alert email is sent when a service probe returns degraded or down, and a second probe failure within the cooldown period does not send a duplicate email
- Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
- Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
- A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports
Plans: 4/4 plans complete
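The non-blocking analytics criterion above amounts to a fire-and-forget wrapper: start the write, never await it, and swallow failures so the pipeline is unaffected. `fireAndForget` and `slowInsert` are illustrative names, not code from the project:

```typescript
// Fire-and-forget sketch: kick off the analytics write without awaiting it,
// so a slow or failing Supabase insert cannot stretch pipeline duration.
// fireAndForget is a hypothetical helper, not the project's actual API.
function fireAndForget(task: () => Promise<void>): void {
  // Start the task; log-and-swallow errors so no rejection goes unhandled.
  void task().catch((err) => console.error("analytics write failed:", err));
}

// Stand-in for a deliberately slow Supabase insert (the 500 ms delay from
// the success criterion above).
function slowInsert(delayMs: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}

// The caller returns immediately even though the insert takes 500 ms.
const start = Date.now();
fireAndForget(() => slowInsert(500));
const elapsed = Date.now() - start;
console.log(`pipeline continued after ${elapsed} ms`);
```

The key design point is that the caller never holds a reference to the promise, so pipeline latency is decoupled from analytics latency by construction.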
Plans:
- 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
- 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
- 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
- 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)
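The deduplication behavior in 02-03-PLAN reduces to one pure decision: alert only if no alert for that service was sent within the cooldown window. The function name and the 15-minute window below are assumptions for illustration:

```typescript
// Cooldown-based deduplication sketch: a second probe failure inside the
// cooldown window must not send a duplicate email (per the criterion above).
// shouldSendAlert and the 15-minute default are hypothetical, not project code.
const COOLDOWN_MS = 15 * 60 * 1000; // assumed 15-minute cooldown window

function shouldSendAlert(
  lastSentAtMs: number | null, // when this service last alerted, or null
  nowMs: number,
  cooldownMs: number = COOLDOWN_MS,
): boolean {
  if (lastSentAtMs === null) return true; // first failure always alerts
  return nowMs - lastSentAtMs >= cooldownMs; // re-alert only after cooldown
}
```

Keeping the decision pure makes the dedup rule trivially unit-testable, independent of the nodemailer sending path.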
Phase 3: API Layer
Goal: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics; existing service processors emit analytics instrumentation
Depends on: Phase 2
Requirements: INFR-02, HLTH-01, ANLY-02
Success Criteria (what must be TRUE):
- `GET /admin/health` returns current health status for all four services; a request with a non-admin Firebase token receives 403
- `GET /admin/analytics` returns a processing summary (upload counts, success/failure rates, average processing time) sourced from Supabase, not in-memory state
- `GET /admin/alerts` and `POST /admin/alerts/:id/acknowledge` function correctly and are blocked for non-admin users
- Document processing in `jobProcessorService.ts` and `llmService.ts` emits analytics events at stage transitions without any change to existing processing behavior
Plans: 2/2 plans complete
Plans:
- 03-01-PLAN.md — Admin auth middleware + admin routes (health, analytics, alerts endpoints)
- 03-02-PLAN.md — Analytics instrumentation in jobProcessorService
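The admin-gating rule behind 03-01-PLAN can be sketched framework-free: map a decoded Firebase token to an HTTP status before any route logic runs. The `admin` custom-claim name and the `DecodedToken` shape are assumptions, not the project's actual middleware:

```typescript
// Admin-gate sketch: decide from a decoded Firebase token whether a request
// may reach the monitoring endpoints. The "admin" custom claim and token
// shape are assumptions for illustration, not project code.
interface DecodedToken {
  uid: string;
  admin?: boolean; // assumed custom claim set via the Firebase Admin SDK
}

function adminGate(token: DecodedToken | null): { status: 200 | 401 | 403 } {
  if (token === null) return { status: 401 }; // missing or unverifiable token
  if (!token.admin) return { status: 403 };   // authenticated, but not admin
  return { status: 200 };                      // admin: let the route handle it
}
```

In a real Express middleware this decision would sit after token verification and before the health/analytics/alerts handlers, so all three routes share one gate.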
Phase 4: Frontend
Goal: The admin can see live service health, processing metrics, and active alerts directly in the application UI
Depends on: Phase 3
Requirements: ALRT-03, ANLY-02 (UI delivery), HLTH-01 (UI delivery)
Success Criteria (what must be TRUE):
- An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it
- The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible
- The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend
- A non-admin user visiting the admin route is redirected or shown an access-denied state
Plans: 0/2 plans complete
Plans:
- 04-01-PLAN.md — AdminService monitoring methods + AlertBanner + AdminMonitoringDashboard components
- 04-02-PLAN.md — Wire components into Dashboard + visual verification checkpoint
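The banner criterion above is a single predicate: show the banner iff at least one unacknowledged critical alert exists, and hide it once the admin acknowledges. The `AlertRow` field names below mirror the roadmap's `alert_events` table but the exact shape is an assumption:

```typescript
// Banner-visibility sketch for the AlertBanner component: visible only while
// an unacknowledged critical alert exists. The row shape is assumed.
interface AlertRow {
  severity: "info" | "warning" | "critical";
  acknowledged: boolean;
}

function shouldShowBanner(alerts: AlertRow[]): boolean {
  return alerts.some((a) => a.severity === "critical" && !a.acknowledged);
}
```

Because acknowledging an alert flips `acknowledged` in the backing data, re-running this predicate after `POST /admin/alerts/:id/acknowledge` makes the banner disappear with no extra UI state.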
Progress
Execution Order: Phases execute in numeric order: 1 → 2 → 3 → 4
| Phase | Plans Complete | Status | Completed |
|---|---|---|---|
| 1. Data Foundation | 2/2 | Complete | 2026-02-24 |
| 2. Backend Services | 4/4 | Complete | 2026-02-24 |
| 3. API Layer | 2/2 | Complete | 2026-02-24 |
| 4. Frontend | 0/2 | Not started | - |