# Roadmap: CIM Summary — Analytics & Monitoring

## Overview

This milestone adds persistent analytics and service health monitoring to the existing CIM Summary application. The work proceeds in four phases that respect hard dependency constraints: the database schema must exist before services can write to it, services must exist before routes can expose them, and routes must be stable before the frontend can be wired up. Each phase delivers a complete, independently testable layer. A fifth tech-debt phase, added after the v1.0 milestone audit, closes out remaining cleanup items.

## Phases

**Phase Numbering:**

- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

- [x] **Phase 1: Data Foundation** - Create schema, DB models, and verify existing Supabase connection wiring (completed 2026-02-24)
- [x] **Phase 2: Backend Services** - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
- [x] **Phase 3: API Layer** - Admin-gated routes exposing all services, instrumentation hooks in existing processors (completed 2026-02-24)
- [x] **Phase 4: Frontend** - Admin dashboard page, health panel, processing metrics, alert notification banner (completed 2026-02-25)
- [ ] **Phase 5: Tech Debt Cleanup** - Config-driven admin email, consolidate retention cleanup, remove hardcoded defaults

## Phase Details

### Phase 1: Data Foundation

**Goal**: The database schema for monitoring exists and the existing Supabase connection is the only data infrastructure used

**Depends on**: Nothing (first phase)

**Requirements**: INFR-01, INFR-04

**Success Criteria** (what must be TRUE):

1. `service_health_checks` and `alert_events` tables exist in Supabase with indexes on `created_at`
2. All new tables use the existing Supabase client from `config/supabase.ts` — no new database connections added
3. `AlertEventModel.ts` exists and its CRUD methods can be called in isolation without errors
4. 
Migration SQL can be run against the live Supabase instance and produces the expected schema

**Plans:** 2/2 plans complete

Plans:
- [x] 01-01-PLAN.md — Migration SQL + HealthCheckModel + AlertEventModel
- [x] 01-02-PLAN.md — Unit tests for both monitoring models

### Phase 2: Backend Services

**Goal**: All monitoring logic runs correctly — health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule

**Depends on**: Phase 1

**Requirements**: HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03

**Success Criteria** (what must be TRUE):

1. Each health probe makes a real authenticated API call to its target service and returns a structured result (status, latency_ms, error_message)
2. Health probe results are written to Supabase and survive a simulated cold start (data present after function restart)
3. An alert email is sent when a service probe returns degraded or down, and a second probe failure within the cooldown period does not send a duplicate email
4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
6. 
A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports

**Plans:** 4/4 plans complete

Plans:
- [x] 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
- [x] 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
- [x] 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
- [x] 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)

### Phase 3: API Layer

**Goal**: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics; existing service processors emit analytics instrumentation

**Depends on**: Phase 2

**Requirements**: INFR-02, HLTH-01, ANLY-02

**Success Criteria** (what must be TRUE):

1. `GET /admin/health` returns current health status for all four services; a request with a non-admin Firebase token receives 403
2. `GET /admin/analytics` returns processing summary (upload counts, success/failure rates, avg processing time) sourced from Supabase, not in-memory state
3. `GET /admin/alerts` and `POST /admin/alerts/:id/acknowledge` function correctly and are blocked to non-admin users
4. Document processing in `jobProcessorService.ts` and `llmService.ts` emits analytics events at stage transitions without any change to existing processing behavior

**Plans:** 2/2 plans complete

Plans:
- [x] 03-01-PLAN.md — Admin auth middleware + admin routes (health, analytics, alerts endpoints)
- [x] 03-02-PLAN.md — Analytics instrumentation in jobProcessorService

### Phase 4: Frontend

**Goal**: The admin can see live service health, processing metrics, and active alerts directly in the application UI

**Depends on**: Phase 3

**Requirements**: ALRT-03, ANLY-02 (UI delivery), HLTH-01 (UI delivery)

**Success Criteria** (what must be TRUE):

1. An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it
2. 
The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible
3. The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend
4. A non-admin user visiting the admin route is redirected or shown an access-denied state

**Plans:** 2/2 plans complete

Plans:
- [x] 04-01-PLAN.md — AdminService monitoring methods + AlertBanner + AdminMonitoringDashboard components
- [x] 04-02-PLAN.md — Wire components into Dashboard + visual verification checkpoint

## Progress

**Execution Order:** Phases execute in numeric order: 1 → 2 → 3 → 4 → 5

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Data Foundation | 2/2 | Complete | 2026-02-24 |
| 2. Backend Services | 4/4 | Complete | 2026-02-24 |
| 3. API Layer | 2/2 | Complete | 2026-02-24 |
| 4. Frontend | 2/2 | Complete | 2026-02-25 |
| 5. Tech Debt Cleanup | 0/0 | Not Planned | — |

### Phase 5: Tech Debt Cleanup

**Goal**: All configuration values are env-driven (no hardcoded emails), retention cleanup is consolidated into a single function, and deployment defaults use placeholders

**Depends on**: Phase 4

**Requirements**: None (tech debt from v1.0 audit)

**Gap Closure**: Closes tech debt items from v1.0-MILESTONE-AUDIT.md

**Success Criteria** (what must be TRUE):

1. Frontend `adminService.ts` reads admin email from `import.meta.env.VITE_ADMIN_EMAIL` instead of a hardcoded literal
2. Only one retention cleanup function exists in `index.ts` (the model-layer `runRetentionCleanup`), with the pre-existing raw SQL `cleanupOldData` consolidated or removed
3. `defineString('EMAIL_WEEKLY_RECIPIENT')` default in `index.ts` uses a placeholder (not a personal email address)

**Plans:** 0 plans

Plans:
- [ ] TBD (run /gsd:plan-phase 5 to break down)
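The first criterion above (env-driven admin email) could be sketched roughly as follows. This is a hypothetical helper, not existing project code: the function name `resolveAdminEmail` and its fail-fast behavior on a missing variable are assumptions about how the cleanup might land.

```typescript
// Hypothetical sketch for Phase 5, criterion 1: resolve the admin alert
// recipient from environment configuration instead of a hardcoded literal.
// The helper name and fail-fast behavior are assumptions, not project code.

type EnvLike = Record<string, string | undefined>;

function resolveAdminEmail(env: EnvLike): string {
  const email = env.VITE_ADMIN_EMAIL?.trim();
  if (!email) {
    // Fail fast rather than silently falling back to a personal address
    // baked into source — the exact tech-debt item this phase removes.
    throw new Error("VITE_ADMIN_EMAIL is not configured");
  }
  return email;
}
```

In the Vite frontend, `adminService.ts` would call `resolveAdminEmail(import.meta.env)`; the same pattern applies on the backend to `process.env` or to Firebase `defineString` params, where the default should be a placeholder rather than a real address.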