Roadmap: CIM Summary — Analytics & Monitoring

Overview

This milestone adds persistent analytics and service health monitoring to the existing CIM Summary application. The work proceeds in four phases that respect hard dependency constraints: database schema must exist before services can write to it, services must exist before routes can expose them, and routes must be stable before the frontend can be wired up. Each phase delivers a complete, independently testable layer.

Phases

Phase Numbering:

  • Integer phases (1, 2, 3): Planned milestone work
  • Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

  • Phase 1: Data Foundation - Create schema, DB models, and verify existing Supabase connection wiring
  • Phase 2: Backend Services - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
  • Phase 3: API Layer - Admin-gated routes exposing all services, instrumentation hooks in existing processors (completed 2026-02-24)
  • Phase 4: Frontend - Admin dashboard page, health panel, processing metrics, alert notification banner

Phase Details

Phase 1: Data Foundation

Goal: The database schema for monitoring exists, and the existing Supabase connection is the only data infrastructure used.
Depends on: Nothing (first phase)
Requirements: INFR-01, INFR-04
Success Criteria (what must be TRUE):

  1. service_health_checks and alert_events tables exist in Supabase with indexes on created_at
  2. All new tables use the existing Supabase client from config/supabase.ts — no new database connections added
  3. AlertEventModel.ts exists and its CRUD methods can be called in isolation without errors
  4. Migration SQL can be run against the live Supabase instance and produces the expected schema

Plans: 2/2 plans executed

Plans:

  • 01-01-PLAN.md — Migration SQL + HealthCheckModel + AlertEventModel
  • 01-02-PLAN.md — Unit tests for both monitoring models
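Criterion 3 (model CRUD callable in isolation) is easiest to satisfy when the shared Supabase client is injected rather than imported directly. A minimal sketch under that assumption; the table and column names come from the criteria, but `DbClient` is a deliberately simplified slice of the supabase-js surface (the real `insert` returns a chainable builder), and `AlertEventModel`'s method names are illustrative:

```typescript
// Hypothetical sketch: AlertEventModel over an injected client, so the model
// never opens its own connection (criterion 2). Names are assumptions.
interface AlertEventRow {
  id?: number;
  service: string;
  status: string;
  created_at?: string;
}

// Simplified slice of the supabase-js client surface this model touches.
interface DbClient {
  from(table: string): {
    insert(rows: AlertEventRow[]): Promise<{ error: Error | null }>;
  };
}

class AlertEventModel {
  constructor(private db: DbClient) {}

  // Insert one alert event row into the assumed alert_events table.
  async create(row: AlertEventRow): Promise<void> {
    const { error } = await this.db.from("alert_events").insert([row]);
    if (error) throw error;
  }
}
```

Because the client is a constructor argument, the unit tests in 01-02 can pass a stub and exercise the CRUD paths without any live database.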

Phase 2: Backend Services

Goal: All monitoring logic runs correctly: health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule.
Depends on: Phase 1
Requirements: HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03
Success Criteria (what must be TRUE):

  1. Each health probe makes a real authenticated API call to its target service and returns a structured result (status, latency_ms, error_message)
  2. Health probe results are written to Supabase and survive a simulated cold start (data present after function restart)
  3. An alert email is sent when a service probe returns degraded or down, and a second probe failure within the cooldown period does not send a duplicate email
  4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
  5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
  6. A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports

Plans: 4/4 plans complete
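The deduplication behavior in criterion 3 reduces to a small pure decision over the last-sent timestamp. A minimal sketch; `shouldSendAlert`, the 30-minute window, and the status names are assumptions for illustration, not the actual alertService API:

```typescript
// Hypothetical cooldown-based alert deduplication (names and window assumed).
type ProbeStatus = "healthy" | "degraded" | "down";

const COOLDOWN_MS = 30 * 60 * 1000; // assumed 30-minute cooldown window

// lastSentAt: epoch ms of the last alert email for this service, or null.
function shouldSendAlert(
  status: ProbeStatus,
  lastSentAt: number | null,
  now: number
): boolean {
  if (status === "healthy") return false; // nothing to report
  if (lastSentAt !== null && now - lastSentAt < COOLDOWN_MS) {
    return false; // second failure inside the cooldown: no duplicate email
  }
  return true; // first failure, or cooldown has elapsed
}
```

Keeping the decision pure makes criterion 3 directly testable: first failure sends, a repeat inside the window is suppressed, and a failure after the window sends again.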

Plans:

  • 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
  • 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
  • 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
  • 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)
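Criterion 5's non-blocking guarantee comes from never awaiting the Supabase write. A minimal sketch of the fire-and-forget pattern, assuming an injected writer; `trackEvent` is an illustrative name, not the actual analyticsService export:

```typescript
// Hypothetical fire-and-forget analytics emit (names assumed). The write
// promise is deliberately not awaited, so a slow Supabase round-trip cannot
// lengthen the processing pipeline; failures are logged and swallowed.
type AnalyticsWriter = (event: { name: string; payload: unknown }) => Promise<void>;

function trackEvent(write: AnalyticsWriter, name: string, payload: unknown): void {
  void write({ name, payload }).catch((err) => {
    // Swallow errors: analytics must never break or delay the pipeline.
    console.error("analytics write failed (ignored):", err);
  });
}
```

The caller returns immediately after kicking off the write, which is exactly the property the deliberate 500ms delay in criterion 5 is meant to verify.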

Phase 3: API Layer

Goal: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics, and existing service processors emit analytics instrumentation.
Depends on: Phase 2
Requirements: INFR-02, HLTH-01, ANLY-02
Success Criteria (what must be TRUE):

  1. GET /admin/health returns current health status for all four services; a request with a non-admin Firebase token receives 403
  2. GET /admin/analytics returns processing summary (upload counts, success/failure rates, avg processing time) sourced from Supabase, not in-memory state
  3. GET /admin/alerts and POST /admin/alerts/:id/acknowledge function correctly and are blocked to non-admin users
  4. Document processing in jobProcessorService.ts and llmService.ts emits analytics events at stage transitions without any change to existing processing behavior

Plans: 2/2 plans complete
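The 403 behavior in criteria 1 and 3 reduces to one decision over the decoded token. A sketch assuming an `admin` custom claim on the Firebase ID token; the claim name and `adminGateStatus` are assumptions, and the real middleware would wrap this decision in an Express handler:

```typescript
// Hypothetical admin-gate decision (claim and function names assumed).
// The real middleware decodes the Firebase ID token first, then applies this.
interface DecodedToken {
  uid: string;
  admin?: boolean;
}

// Status the gate responds with: 401 missing/invalid token, 403 authenticated
// but not admin, 200 means the request may proceed to the route handler.
function adminGateStatus(token: DecodedToken | null): 200 | 401 | 403 {
  if (token === null) return 401;
  if (token.admin !== true) return 403;
  return 200;
}
```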

Plans:

  • 03-01-PLAN.md — Admin auth middleware + admin routes (health, analytics, alerts endpoints)
  • 03-02-PLAN.md — Analytics instrumentation in jobProcessorService

Phase 4: Frontend

Goal: The admin can see live service health, processing metrics, and active alerts directly in the application UI.
Depends on: Phase 3
Requirements: ALRT-03, ANLY-02 (UI delivery), HLTH-01 (UI delivery)
Success Criteria (what must be TRUE):

  1. An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it
  2. The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible
  3. The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend
  4. A non-admin user visiting the admin route is redirected or shown an access-denied state

Plans: TBD
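Criterion 1's banner visibility is a pure predicate over the fetched alert list. A minimal sketch; the types and `showAlertBanner` name are illustrative, not the actual frontend code:

```typescript
// Hypothetical banner predicate: show while any critical alert is
// unacknowledged; acknowledging the last one hides the banner (criterion 1).
interface AlertEvent {
  id: string;
  severity: "info" | "critical";
  acknowledged: boolean;
}

function showAlertBanner(alerts: AlertEvent[]): boolean {
  return alerts.some((a) => a.severity === "critical" && !a.acknowledged);
}
```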

Progress

Execution Order: Phases execute in numeric order: 1 → 2 → 3 → 4

Phase                Plans Complete   Status        Completed
1. Data Foundation   2/2              Complete      2026-02-24
2. Backend Services  4/4              Complete      2026-02-24
3. API Layer         2/2              Complete      2026-02-24
4. Frontend          0/TBD            Not started   -