Roadmap: CIM Summary — Analytics & Monitoring

Overview

This milestone adds persistent analytics and service health monitoring to the existing CIM Summary application. The work proceeds in four phases that respect hard dependency constraints: database schema must exist before services can write to it, services must exist before routes can expose them, and routes must be stable before the frontend can be wired up. Each phase delivers a complete, independently testable layer.

Phases

Phase Numbering:

  • Integer phases (1, 2, 3): Planned milestone work
  • Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

  • Phase 1: Data Foundation - Create schema, DB models, and verify existing Supabase connection wiring
  • Phase 2: Backend Services - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
  • Phase 3: API Layer - Admin-gated routes exposing all services, instrumentation hooks in existing processors (completed 2026-02-24)
  • Phase 4: Frontend - Admin dashboard page, health panel, processing metrics, alert notification banner

Phase Details

Phase 1: Data Foundation

Goal: The database schema for monitoring exists, and the existing Supabase connection is the only data infrastructure used.
Depends on: Nothing (first phase)
Requirements: INFR-01, INFR-04
Success Criteria (what must be TRUE):

  1. service_health_checks and alert_events tables exist in Supabase with indexes on created_at
  2. All new tables use the existing Supabase client from config/supabase.ts — no new database connections added
  3. AlertEventModel.ts exists and its CRUD methods can be called in isolation without errors
  4. Migration SQL can be run against the live Supabase instance and produces the expected schema

Plans: 2/2 plans executed

Plans:

  • 01-01-PLAN.md — Migration SQL + HealthCheckModel + AlertEventModel
  • 01-02-PLAN.md — Unit tests for both monitoring models
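Criterion 3 (model CRUD callable in isolation) is easiest to satisfy when the shared Supabase client is injected rather than imported directly. A minimal sketch under that assumption; the table and column names come from the criteria, but `DbClient` is a deliberately simplified slice of the supabase-js surface (the real `insert` returns a chainable builder), and `AlertEventModel`'s method names are illustrative:

```typescript
// Hypothetical sketch: AlertEventModel over an injected client, so the model
// never opens its own connection (criterion 2). Names are assumptions.
interface AlertEventRow {
  id?: number;
  service: string;
  status: string;
  created_at?: string;
}

// Simplified slice of the supabase-js client surface this model touches.
interface DbClient {
  from(table: string): {
    insert(rows: AlertEventRow[]): Promise<{ error: Error | null }>;
  };
}

class AlertEventModel {
  constructor(private db: DbClient) {}

  // Insert one alert event row into the assumed alert_events table.
  async create(row: AlertEventRow): Promise<void> {
    const { error } = await this.db.from("alert_events").insert([row]);
    if (error) throw error;
  }
}
```

Because the client is a constructor argument, the unit tests in 01-02 can pass a stub and exercise the CRUD paths without any live database.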

Phase 2: Backend Services

Goal: All monitoring logic runs correctly: health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule.
Depends on: Phase 1
Requirements: HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03
Success Criteria (what must be TRUE):

  1. Each health probe makes a real authenticated API call to its target service and returns a structured result (status, latency_ms, error_message)
  2. Health probe results are written to Supabase and survive a simulated cold start (data present after function restart)
  3. An alert email is sent when a service probe returns degraded or down, and a second probe failure within the cooldown period does not send a duplicate email
  4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
  5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
  6. A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports

Plans: 4/4 plans complete
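The deduplication behavior in criterion 3 reduces to a small pure decision over the last-sent timestamp. A minimal sketch; `shouldSendAlert`, the 30-minute window, and the status names are assumptions for illustration, not the actual alertService API:

```typescript
// Hypothetical cooldown-based alert deduplication (names and window assumed).
type ProbeStatus = "healthy" | "degraded" | "down";

const COOLDOWN_MS = 30 * 60 * 1000; // assumed 30-minute cooldown window

// lastSentAt: epoch ms of the last alert email for this service, or null.
function shouldSendAlert(
  status: ProbeStatus,
  lastSentAt: number | null,
  now: number
): boolean {
  if (status === "healthy") return false; // nothing to report
  if (lastSentAt !== null && now - lastSentAt < COOLDOWN_MS) {
    return false; // second failure inside the cooldown: no duplicate email
  }
  return true; // first failure, or cooldown has elapsed
}
```

Keeping the decision pure makes criterion 3 directly testable: first failure sends, a repeat inside the window is suppressed, and a failure after the window sends again.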

Plans:

  • 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
  • 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
  • 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
  • 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)
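Criterion 5's non-blocking guarantee comes from never awaiting the Supabase write. A minimal sketch of the fire-and-forget pattern, assuming an injected writer; `trackEvent` is an illustrative name, not the actual analyticsService export:

```typescript
// Hypothetical fire-and-forget analytics emit (names assumed). The write
// promise is deliberately not awaited, so a slow Supabase round-trip cannot
// lengthen the processing pipeline; failures are logged and swallowed.
type AnalyticsWriter = (event: { name: string; payload: unknown }) => Promise<void>;

function trackEvent(write: AnalyticsWriter, name: string, payload: unknown): void {
  void write({ name, payload }).catch((err) => {
    // Swallow errors: analytics must never break or delay the pipeline.
    console.error("analytics write failed (ignored):", err);
  });
}
```

The caller returns immediately after kicking off the write, which is exactly the property the deliberate 500ms delay in criterion 5 is meant to verify.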

Phase 3: API Layer

Goal: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics, and existing service processors emit analytics instrumentation.
Depends on: Phase 2
Requirements: INFR-02, HLTH-01, ANLY-02
Success Criteria (what must be TRUE):

  1. GET /admin/health returns current health status for all four services; a request with a non-admin Firebase token receives 403
  2. GET /admin/analytics returns processing summary (upload counts, success/failure rates, avg processing time) sourced from Supabase, not in-memory state
  3. GET /admin/alerts and POST /admin/alerts/:id/acknowledge function correctly and are blocked to non-admin users
  4. Document processing in jobProcessorService.ts and llmService.ts emits analytics events at stage transitions without any change to existing processing behavior

Plans: 2/2 plans complete
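The 403 behavior in criteria 1 and 3 reduces to one decision over the decoded token. A sketch assuming an `admin` custom claim on the Firebase ID token; the claim name and `adminGateStatus` are assumptions, and the real middleware would wrap this decision in an Express handler:

```typescript
// Hypothetical admin-gate decision (claim and function names assumed).
// The real middleware decodes the Firebase ID token first, then applies this.
interface DecodedToken {
  uid: string;
  admin?: boolean;
}

// Status the gate responds with: 401 missing/invalid token, 403 authenticated
// but not admin, 200 means the request may proceed to the route handler.
function adminGateStatus(token: DecodedToken | null): 200 | 401 | 403 {
  if (token === null) return 401;
  if (token.admin !== true) return 403;
  return 200;
}
```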

Plans:

  • 03-01-PLAN.md — Admin auth middleware + admin routes (health, analytics, alerts endpoints)
  • 03-02-PLAN.md — Analytics instrumentation in jobProcessorService

Phase 4: Frontend

Goal: The admin can see live service health, processing metrics, and active alerts directly in the application UI.
Depends on: Phase 3
Requirements: ALRT-03, ANLY-02 (UI delivery), HLTH-01 (UI delivery)
Success Criteria (what must be TRUE):

  1. An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it
  2. The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible
  3. The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend
  4. A non-admin user visiting the admin route is redirected or shown an access-denied state

Plans: TBD
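Criterion 1's banner visibility is a pure predicate over the fetched alert list. A minimal sketch; the types and `showAlertBanner` name are illustrative, not the actual frontend code:

```typescript
// Hypothetical banner predicate: show while any critical alert is
// unacknowledged; acknowledging the last one hides the banner (criterion 1).
interface AlertEvent {
  id: string;
  severity: "info" | "critical";
  acknowledged: boolean;
}

function showAlertBanner(alerts: AlertEvent[]): boolean {
  return alerts.some((a) => a.severity === "critical" && !a.acknowledged);
}
```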

Progress

Execution Order: Phases execute in numeric order: 1 → 2 → 3 → 4

Phase                Plans Complete   Status        Completed
1. Data Foundation   2/2              Complete      2026-02-24
2. Backend Services  4/4              Complete      2026-02-24
3. API Layer         2/2              Complete      2026-02-24
4. Frontend          0/TBD            Not started   -