
admin 38a0f0619d chore: complete v1.0 Analytics & Monitoring milestone

Archive milestone artifacts (roadmap, requirements, audit, phase directories)
to .planning/milestones/. Evolve PROJECT.md with validated requirements and
decision outcomes. Create MILESTONES.md and RETROSPECTIVE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-25 10:34:18 -05:00


CIM Summary — Analytics & Monitoring

What This Is

An analytics dashboard and service health monitoring system for the CIM Summary application. It provides persistent document processing metrics, scheduled health probes for all 4 external services, email and in-app alerting when APIs or credentials need attention, and an admin-only monitoring dashboard.

Core Value

When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.

Requirements

Validated

✓ Document upload and processing pipeline — existing
✓ Multi-provider LLM integration (Anthropic, OpenAI, OpenRouter) — existing
✓ Google Document AI text extraction — existing
✓ Supabase PostgreSQL with pgvector for storage and search — existing
✓ Firebase Authentication — existing
✓ Google Cloud Storage for file management — existing
✓ Background job queue with retry logic — existing
✓ Structured logging with Winston and correlation IDs — existing
✓ Basic health endpoints (/health, /health/config, /monitoring/dashboard) — existing
✓ PDF generation and export — existing
✓ Admin can view live health status for all 4 services (HLTH-01) — v1.0
✓ Health probes make real authenticated API calls (HLTH-02) — v1.0
✓ Scheduled periodic health probes (HLTH-03) — v1.0
✓ Health probe results persist to Supabase (HLTH-04) — v1.0
✓ Email alert on service down/degraded (ALRT-01) — v1.0
✓ Alert deduplication within cooldown (ALRT-02) — v1.0
✓ In-app alert banner for critical issues (ALRT-03) — v1.0
✓ Alert recipient from config, not hardcoded (ALRT-04) — v1.0
✓ Processing events persist at write time (ANLY-01) — v1.0
✓ Admin can view processing summary (ANLY-02) — v1.0
✓ Analytics instrumentation non-blocking (ANLY-03) — v1.0
✓ DB migrations with indexes on created_at (INFR-01) — v1.0
✓ Admin API routes protected by Firebase Auth (INFR-02) — v1.0
✓ 30-day rolling data retention cleanup (INFR-03) — v1.0
✓ Analytics use existing Supabase connection (INFR-04) — v1.0
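
As an illustration of ALRT-02 (alert deduplication within a cooldown), the check could look roughly like the sketch below. Every name here (`AlertDeduplicator`, `shouldSend`, the `"service:status"` key) is hypothetical and not taken from the codebase. The in-memory `Map` is also an assumption made to keep the example self-contained; on Cloud Functions, where instances are short-lived, the last-sent timestamp would more plausibly be read from the persisted probe or alert records.

```typescript
// Hypothetical sketch: the source only states that alerts are deduplicated
// within a cooldown window, not how.
type ServiceStatus = "healthy" | "degraded" | "down";

class AlertDeduplicator {
  // Time (ms since epoch) of the last alert sent, keyed by "service:status".
  private lastSentAt = new Map<string, number>();
  private cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true when an alert should go out, and records the send time.
  shouldSend(service: string, status: ServiceStatus, nowMs: number): boolean {
    if (status === "healthy") return false; // only degraded/down trigger alerts
    const key = `${service}:${status}`;
    const last = this.lastSentAt.get(key);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return false; // still inside the cooldown window, suppress the duplicate
    }
    this.lastSentAt.set(key, nowMs);
    return true;
  }
}
```

Keying on both service and status means a transition from degraded to down still produces a fresh alert, while repeated probes reporting the same state stay quiet until the cooldown elapses.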

Active

(None — next milestone not yet defined. Run /gsd:new-milestone to plan.)

Out of Scope

External monitoring tools (Grafana, Datadog) — keeping it in-app for simplicity
Non-admin user analytics views — admin-only for now
Mobile push notifications — email + in-app sufficient
Historical analytics beyond 30 days — lean storage, can extend later
Real-time WebSocket updates — polling is sufficient for admin dashboard
ML-based anomaly detection — threshold-based alerting sufficient at this scale

Context

v1.0 shipped with 31,184 lines of TypeScript across the Express.js backend and React frontend. Tech stack: Express.js, React, Supabase (PostgreSQL + pgvector), Firebase Auth, Firebase Cloud Functions, Google Document AI, Anthropic/OpenAI LLMs, nodemailer, Tailwind CSS.

Four external services monitored with real authenticated probes:

Google Document AI — service account credential validation
Claude/OpenAI — API key validation via cheapest model (claude-haiku-4-5, max_tokens 5)
Supabase — direct PostgreSQL pool query (SELECT 1)
Firebase Auth — SDK liveness via verifyIdToken error classification
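
The four probes above are orchestrated with Promise.allSettled (see Key Decisions), so one failing probe does not hide the others. A minimal sketch of that pattern follows; the `Probe` and `ProbeResult` shapes and the `runAllProbes` name are assumptions for illustration, and the real probes would make the authenticated calls described in the list rather than the stub checks a caller passes in here.

```typescript
// Hypothetical types; the actual result schema is not shown in this document.
interface ProbeResult {
  service: string;
  status: "healthy" | "down";
  error?: string;
}

// A probe resolves when the service responds and rejects (throws) otherwise.
interface Probe {
  service: string;
  check: () => Promise<void>;
}

async function runAllProbes(probes: Probe[]): Promise<ProbeResult[]> {
  // The async wrapper turns a synchronous throw inside check() into a
  // rejection, so allSettled always sees one settled outcome per probe.
  const settled = await Promise.allSettled(probes.map(async (p) => p.check()));

  return settled.map((outcome, i): ProbeResult => {
    const service = probes[i].service;
    if (outcome.status === "fulfilled") {
      return { service, status: "healthy" };
    }
    const reason: unknown = outcome.reason;
    return {
      service,
      status: "down",
      error: reason instanceof Error ? reason.message : String(reason),
    };
  });
}
```

Unlike Promise.all, which rejects as soon as any input rejects, Promise.allSettled waits for every probe and reports each outcome separately, which is what makes partial results possible.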

Admin user: jpressnell@bluepointcapital.com (config-driven, not hardcoded).

Constraints

Tech stack: Express.js backend + React frontend
Auth: Admin-only access via Firebase Auth with config-driven email check
Storage: Supabase PostgreSQL — no new database infrastructure
Email: nodemailer for alert delivery
Deployment: Firebase Cloud Functions (14-minute timeout)
Data retention: 30-day rolling window
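
To make the auth constraint concrete, here is a rough sketch of a config-driven admin gate that answers 404 to everyone else. It is not the project's middleware: `requireAdmin`, the `req.user` shape, and the response body are assumptions, and the small `Req`/`Res` interfaces stand in for the Express types so the example is self-contained. It also assumes an earlier step has already verified the Firebase ID token and attached the user's email to the request.

```typescript
// Minimal structural stand-ins for the Express request/response objects.
interface Req {
  user?: { email?: string }; // assumed to be set after Firebase token verification
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
type Next = () => void;

// adminEmail is expected to come from configuration (for example an env var).
function requireAdmin(adminEmail: string) {
  const expected = adminEmail.trim().toLowerCase();

  return (req: Req, res: Res, next: Next): void => {
    const email = req.user?.email?.trim().toLowerCase();
    // An empty configured address must never match, so misconfiguration
    // fails closed instead of opening the admin routes.
    if (expected !== "" && email === expected) {
      next();
      return;
    }
    // 404 rather than 403: the response does not confirm the route exists.
    res.status(404).json({ error: "Not found" });
  };
}
```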

Key Decisions

Decision	Rationale	Outcome
In-app dashboard over external tools	Simpler setup, no additional infrastructure	✓ Good — admin sees everything in one place
Email + in-app dual alerting	Redundancy for critical issues	✓ Good — covers both active and passive monitoring
30-day retention	Balances useful trend data with storage efficiency	✓ Good — consolidated into single cleanup function
Single admin (config-driven)	Simple RBAC, can extend later	✓ Good — email now env-driven after tech debt cleanup
Scheduled probes + fire-and-forget analytics	Decouples monitoring from processing	✓ Good — zero impact on processing pipeline latency
404 (not 403) for non-admin requests to admin routes	Does not reveal that admin routes exist	✓ Good — security through obscurity at API level
void return type for analytics writes	Prevents accidental await on critical path	✓ Good — type system enforces fire-and-forget pattern
Promise.allSettled for probe orchestration	All 4 probes run even if one throws	✓ Good — partial results better than total failure
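
The fire-and-forget decision for analytics writes can be illustrated with a small sketch. The names (`makeRecordEvent`, `ProcessingEvent`, `EventWriter`) and the event fields are hypothetical; in the real system the writer would presumably insert into Supabase. The point is only the shape: the recording function returns `void`, so the caller receives no promise that could be awaited on the processing path, and write failures are routed to an error handler instead of propagating.

```typescript
// Hypothetical event shape; the real analytics schema is not shown here.
interface ProcessingEvent {
  documentId: string;
  stage: string;
  durationMs: number;
}

// Persists one event. Stubbed by the caller in this sketch.
type EventWriter = (event: ProcessingEvent) => Promise<void>;

function makeRecordEvent(write: EventWriter, onError: (err: unknown) => void) {
  // Return type is void, not Promise<void>: there is nothing to await.
  return function recordEvent(event: ProcessingEvent): void {
    // Start the write on a later microtask and swallow any failure, so the
    // caller neither waits for the write nor sees its errors.
    void Promise.resolve()
      .then(() => write(event))
      .catch(onError);
  };
}
```

A caller on the processing path would invoke `recordEvent(...)` and continue immediately; even a stray `await` in front of the call would only await `undefined` and would not wait for the database write.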

Last updated: 2026-02-25 after v1.0 milestone
