cim_summary/.planning/milestones/v1.0-phases/01-data-foundation/01-01-PLAN.md
admin 38a0f0619d chore: complete v1.0 Analytics & Monitoring milestone
Archive milestone artifacts (roadmap, requirements, audit, phase directories)
to .planning/milestones/. Evolve PROJECT.md with validated requirements and
decision outcomes. Create MILESTONES.md and RETROSPECTIVE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 10:34:18 -05:00

---
phase: 01-data-foundation
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- backend/src/models/migrations/012_create_monitoring_tables.sql
- backend/src/models/HealthCheckModel.ts
- backend/src/models/AlertEventModel.ts
- backend/src/models/index.ts
autonomous: true
requirements:
- INFR-01
- INFR-04
must_haves:
  truths:
    - "Migration SQL creates service_health_checks and alert_events tables with all required columns and CHECK constraints"
    - "Both tables have indexes on created_at (INFR-01 requirement)"
    - "RLS is enabled on both new tables"
    - "HealthCheckModel and AlertEventModel use getSupabaseServiceClient() for all database operations (INFR-04: no new DB infrastructure)"
    - "Model static methods validate input before writing"
  artifacts:
    - path: "backend/src/models/migrations/012_create_monitoring_tables.sql"
      provides: "DDL for service_health_checks and alert_events tables"
      contains: "CREATE TABLE IF NOT EXISTS service_health_checks"
    - path: "backend/src/models/HealthCheckModel.ts"
      provides: "CRUD operations for service_health_checks table"
      exports: ["HealthCheckModel", "ServiceHealthCheck", "CreateHealthCheckData"]
    - path: "backend/src/models/AlertEventModel.ts"
      provides: "CRUD operations for alert_events table"
      exports: ["AlertEventModel", "AlertEvent", "CreateAlertEventData"]
    - path: "backend/src/models/index.ts"
      provides: "Barrel exports for new models"
      contains: "HealthCheckModel"
  key_links:
    - from: "backend/src/models/HealthCheckModel.ts"
      to: "backend/src/config/supabase.ts"
      via: "getSupabaseServiceClient() import"
      pattern: "import.*getSupabaseServiceClient.*from.*config/supabase"
    - from: "backend/src/models/AlertEventModel.ts"
      to: "backend/src/config/supabase.ts"
      via: "getSupabaseServiceClient() import"
      pattern: "import.*getSupabaseServiceClient.*from.*config/supabase"
    - from: "backend/src/models/HealthCheckModel.ts"
      to: "backend/src/utils/logger.ts"
      via: "Winston logger import"
      pattern: "import.*logger.*from.*utils/logger"
---
<objective>
Create the database migration and TypeScript model layer for the monitoring system.
Purpose: Establish the data foundation that all subsequent phases (health probes, alerts, analytics) depend on. Tables must exist and model CRUD must work before any service can write monitoring data.
Output: One SQL migration file, two TypeScript model classes, updated barrel exports.
</objective>
<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-data-foundation/01-RESEARCH.md
@.planning/phases/01-data-foundation/01-CONTEXT.md
# Existing patterns to follow
@backend/src/models/DocumentModel.ts
@backend/src/models/ProcessingJobModel.ts
@backend/src/models/index.ts
@backend/src/models/migrations/005_create_processing_jobs_table.sql
@backend/src/config/supabase.ts
@backend/src/utils/logger.ts
</context>
<tasks>
<task type="auto">
<name>Task 1: Create monitoring tables migration</name>
<files>backend/src/models/migrations/012_create_monitoring_tables.sql</files>
<action>
Create migration file `012_create_monitoring_tables.sql` following the pattern from `005_create_processing_jobs_table.sql`.
**service_health_checks table:**
- `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`
- `service_name VARCHAR(100) NOT NULL`
- `status TEXT NOT NULL CHECK (status IN ('healthy', 'degraded', 'down'))`
- `latency_ms INTEGER` (nullable — INTEGER is correct, max ~2.1B ms which is impossible for latency)
- `checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP` (when the probe actually ran — distinct from created_at per Research Pitfall 5)
- `error_message TEXT` (nullable — for storing probe failure details)
- `probe_details JSONB` (nullable — flexible metadata per service: response codes, error specifics)
- `created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP`
**Indexes for service_health_checks:**
- `idx_service_health_checks_created_at ON service_health_checks(created_at)` — required by INFR-01, used for 30-day retention queries
- `idx_service_health_checks_service_created ON service_health_checks(service_name, created_at)` — composite for dashboard "latest check per service" queries
**alert_events table:**
- `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`
- `service_name VARCHAR(100) NOT NULL`
- `alert_type TEXT NOT NULL CHECK (alert_type IN ('service_down', 'service_degraded', 'recovery'))`
- `status TEXT NOT NULL CHECK (status IN ('active', 'acknowledged', 'resolved'))`
- `message TEXT` (nullable — human-readable alert description)
- `details JSONB` (nullable — structured metadata about the alert)
- `created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP`
- `acknowledged_at TIMESTAMP WITH TIME ZONE` (nullable)
- `resolved_at TIMESTAMP WITH TIME ZONE` (nullable)
**Indexes for alert_events:**
- `idx_alert_events_created_at ON alert_events(created_at)` — required by INFR-01
- `idx_alert_events_status ON alert_events(status)` — for "active alerts" queries
- `idx_alert_events_service_status ON alert_events(service_name, status)` — for "active alerts per service"
**RLS:**
- `ALTER TABLE service_health_checks ENABLE ROW LEVEL SECURITY;`
- `ALTER TABLE alert_events ENABLE ROW LEVEL SECURITY;`
- No explicit policies needed — service role key bypasses RLS automatically in Supabase (Research Pitfall 2). Policies for authenticated users will be added in Phase 3.
**Important patterns (per CONTEXT.md):**
- ALL DDL uses `IF NOT EXISTS`: `CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`
- Forward-only migration — no rollback/down scripts
- File must be numbered `012_` (current highest is `011_create_vector_database_tables.sql`)
- Include header comment with migration purpose and date
**Do NOT:**
- Use PostgreSQL ENUM types — use TEXT + CHECK per user decision
- Create rollback/down scripts — forward-only per user decision
- Add any DML (INSERT/UPDATE/DELETE) — migration is DDL only
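The pattern and "Do NOT" rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `lintMigrationSql` that is not one of this plan's deliverables. It uses simple regular expressions rather than a SQL parser, so it can be fooled by unusual formatting or by semicolons inside string literals.

```typescript
// Hypothetical helper (not in the plan): returns a list of rule violations
// found in a migration's SQL text. An empty array means no violation was detected.
function lintMigrationSql(sql: string): string[] {
  const problems: string[] = [];

  // Drop "-- ..." line comments so header text cannot trigger false positives.
  const body = sql.replace(/--.*$/gm, "");

  // Rule: every CREATE TABLE / CREATE INDEX is guarded by IF NOT EXISTS.
  const creates = body.match(/CREATE\s+(?:UNIQUE\s+)?(?:TABLE|INDEX)\b[^;]*/gi) ?? [];
  for (const stmt of creates) {
    const guarded = /^CREATE\s+(?:UNIQUE\s+)?(?:TABLE|INDEX)\s+IF\s+NOT\s+EXISTS\b/i.test(stmt);
    if (!guarded) {
      problems.push(`missing IF NOT EXISTS: ${stmt.trim().slice(0, 60)}`);
    }
  }

  // Rule: TEXT + CHECK instead of PostgreSQL ENUM types.
  if (/CREATE\s+TYPE\b[^;]*\bAS\s+ENUM\b/i.test(body)) {
    problems.push("ENUM type found; use TEXT + CHECK instead");
  }

  // Rule: DDL only, so no statement may start with INSERT / UPDATE / DELETE.
  if (/(^|;)\s*(INSERT\s+INTO|UPDATE\s+\w+\s+SET|DELETE\s+FROM)\b/i.test(body)) {
    problems.push("DML statement found; migration must be DDL only");
  }

  return problems;
}
```

Running it against the finished migration text should return an empty array. Any returned entry names the rule that was broken.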
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary && ls -la backend/src/models/migrations/012_create_monitoring_tables.sql && grep -c "CREATE TABLE IF NOT EXISTS" backend/src/models/migrations/012_create_monitoring_tables.sql | grep -qx "2" && echo "PASS: 2 tables found" || echo "FAIL: expected 2 CREATE TABLE statements"</automated>
<manual>Verify SQL syntax is valid and matches existing migration patterns</manual>
</verify>
<done>Migration file exists with both tables, CHECK constraints on status fields, JSONB columns for flexible metadata, indexes on created_at for both tables, composite indexes for common query patterns, and RLS enabled on both tables.</done>
</task>
<task type="auto">
<name>Task 2: Create HealthCheckModel and AlertEventModel with barrel exports</name>
<files>
backend/src/models/HealthCheckModel.ts
backend/src/models/AlertEventModel.ts
backend/src/models/index.ts
</files>
<action>
**HealthCheckModel.ts** — Follow DocumentModel.ts static class pattern exactly:
Interfaces:
- `ServiceHealthCheck` — full row type matching all columns from migration (id, service_name, status, latency_ms, checked_at, error_message, probe_details, created_at). Use `'healthy' | 'degraded' | 'down'` union for status. Use `Record<string, unknown>` for probe_details (not `any` — strict TypeScript per CONVENTIONS.md).
- `CreateHealthCheckData` — input type for create method (service_name required, status required, latency_ms optional, error_message optional, probe_details optional).
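As an illustration of the two interfaces above, here is a rough sketch of how the types and the pre-write validation might look. The `HealthStatus` alias, the `HEALTH_STATUSES` constant, and the `validateCreateHealthCheckData` helper are hypothetical names introduced for this example. Representing timestamps as ISO strings is also an assumption about how rows come back from the client.

```typescript
// Hypothetical alias for the status union named in the plan.
type HealthStatus = "healthy" | "degraded" | "down";

// Full row shape, one field per migration column.
interface ServiceHealthCheck {
  id: string;
  service_name: string;
  status: HealthStatus;
  latency_ms: number | null;
  checked_at: string; // assumed to be an ISO timestamp string
  error_message: string | null;
  probe_details: Record<string, unknown> | null; // JSONB, never `any`
  created_at: string;
}

// Input shape for create(): only service_name and status are required.
interface CreateHealthCheckData {
  service_name: string;
  status: HealthStatus;
  latency_ms?: number;
  error_message?: string;
  probe_details?: Record<string, unknown>;
}

// Mirrors the CHECK constraint on service_health_checks.status.
const HEALTH_STATUSES: readonly HealthStatus[] = ["healthy", "degraded", "down"];

// Hypothetical helper: throws before any database call is attempted.
function validateCreateHealthCheckData(data: CreateHealthCheckData): void {
  if (!data.service_name || data.service_name.trim().length === 0) {
    throw new Error("HealthCheckModel.create: service_name must be a non-empty string");
  }
  if (HEALTH_STATUSES.indexOf(data.status) === -1) {
    throw new Error(`HealthCheckModel.create: invalid status "${String(data.status)}"`);
  }
}
```

The runtime status check looks redundant next to the union type, but it still catches values that arrive untyped, for example from parsed JSON.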
Static methods:
- `create(data: CreateHealthCheckData): Promise<ServiceHealthCheck>` — Validate service_name is non-empty, validate status is one of the three allowed values. Call `getSupabaseServiceClient()` inside the method (not cached at module level — per Research finding). Use `.from('service_health_checks').insert({...}).select().single()`. Log with Winston logger on success and error. Throw on Supabase error with descriptive message.
- `findLatestByService(serviceName: string): Promise<ServiceHealthCheck | null>` — Get most recent health check for a given service. Order by `checked_at` desc, limit 1. Return null if not found (handle PGRST116 like ProcessingJobModel).
- `findAll(options?: { limit?: number; serviceName?: string }): Promise<ServiceHealthCheck[]>` — List health checks with optional filtering. Default limit 100. Order by created_at desc.
- `deleteOlderThan(days: number): Promise<number>` — For 30-day retention cleanup (used by Phase 2 scheduler). Delete rows where `created_at < NOW() - interval`. Return count of deleted rows.
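The Supabase query builder filters on concrete values rather than SQL expressions, so one way to implement the retention condition in `deleteOlderThan` is to compute the cutoff timestamp in application code. The sketch below assumes a hypothetical helper named `retentionCutoff`. The comment about how the result would feed a `.delete()` call is an assumption about the eventual implementation, not something this plan prescribes.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the ISO timestamp that is `days` days before `now`.
// Rows with created_at earlier than this value are past the retention window.
function retentionCutoff(days: number, now: Date = new Date()): string {
  if (!isFinite(days) || days <= 0) {
    throw new Error(`deleteOlderThan: days must be a positive number, got ${days}`);
  }
  return new Date(now.getTime() - days * MS_PER_DAY).toISOString();
}

// Assumed usage inside the model method (not executed here):
//   const cutoff = retentionCutoff(days);
//   supabase.from('service_health_checks')
//     .delete({ count: 'exact' })
//     .lt('created_at', cutoff);
```

Accepting `now` as a parameter keeps the calculation deterministic in tests.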
**AlertEventModel.ts** — Same pattern:
Interfaces:
- `AlertEvent` — full row type (id, service_name, alert_type, status, message, details, created_at, acknowledged_at, resolved_at). Use union types for alert_type and status. Use `Record<string, unknown>` for details.
- `CreateAlertEventData` — input type (service_name, alert_type, status default 'active', message optional, details optional).
Static methods:
- `create(data: CreateAlertEventData): Promise<AlertEvent>` — Validate service_name non-empty, validate alert_type and status values. Insert with default status 'active' if not provided. Same Supabase pattern as HealthCheckModel.
- `findActive(serviceName?: string): Promise<AlertEvent[]>` — Get active (unresolved, unacknowledged) alerts. Filter `status = 'active'`. Optional service_name filter. Order by created_at desc.
- `acknowledge(id: string): Promise<AlertEvent>` — Set status to 'acknowledged' and acknowledged_at to current timestamp. Return updated row.
- `resolve(id: string): Promise<AlertEvent>` — Set status to 'resolved' and resolved_at to current timestamp. Return updated row.
- `findRecentByService(serviceName: string, alertType: string, withinMinutes: number): Promise<AlertEvent | null>` — For deduplication in Phase 2. Find most recent alert of given type for service within time window.
- `deleteOlderThan(days: number): Promise<number>` — Same retention pattern as HealthCheckModel.
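To make the deduplication lookup concrete, the following sketch reproduces the intended semantics of `findRecentByService` over an in-memory array instead of a database query. The function name `findRecentInMemory` and the `AlertType` and `AlertStatus` aliases are hypothetical. The real method would express the same filter, ordering, and limit through the Supabase client.

```typescript
type AlertType = "service_down" | "service_degraded" | "recovery";
type AlertStatus = "active" | "acknowledged" | "resolved";

// Full row shape, one field per alert_events column.
interface AlertEvent {
  id: string;
  service_name: string;
  alert_type: AlertType;
  status: AlertStatus;
  message: string | null;
  details: Record<string, unknown> | null;
  created_at: string; // assumed to be an ISO timestamp string
  acknowledged_at: string | null;
  resolved_at: string | null;
}

// Hypothetical in-memory stand-in for the query: the most recent alert of the
// given type for the given service created within the last `withinMinutes`.
function findRecentInMemory(
  rows: AlertEvent[],
  serviceName: string,
  alertType: AlertType,
  withinMinutes: number,
  now: Date = new Date()
): AlertEvent | null {
  const cutoff = now.getTime() - withinMinutes * 60 * 1000;
  const matches = rows
    .filter(
      (row) =>
        row.service_name === serviceName &&
        row.alert_type === alertType &&
        Date.parse(row.created_at) >= cutoff
    )
    .sort((a, b) => Date.parse(b.created_at) - Date.parse(a.created_at));
  return matches[0] ?? null;
}
```

A non-null result means a matching alert already exists inside the window, so a Phase 2 caller could skip creating a duplicate.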
**Common patterns for BOTH models:**
- Import `getSupabaseServiceClient` from `'../config/supabase'`
- Import `logger` from `'../utils/logger'`
- Call `getSupabaseServiceClient()` per-method (not at module level)
- Error handling: check `if (error)` after every Supabase call, log with `logger.error()`, throw with descriptive message
- Handle PGRST116 (not found) by returning null instead of throwing (ProcessingJobModel pattern)
- Type guard on catch: `error instanceof Error ? error.message : String(error)`
- All methods are `static async`
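The error-handling rules above can be sketched in isolation. The `QueryError` and `QueryResult` shapes below are simplified stand-ins for what the Supabase client returns, and `rowOrNull` and `toErrorMessage` are hypothetical names. The logic is shown as standalone functions only for illustration. Because the plan keeps the models independent, each method would presumably inline the same checks and call `logger.error()` before throwing.

```typescript
// Simplified stand-ins for the client's result shape (assumption).
interface QueryError {
  code?: string;
  message: string;
}
interface QueryResult<T> {
  data: T | null;
  error: QueryError | null;
}

// PostgREST code reported when a single-row request matches no rows.
const NOT_FOUND_CODE = "PGRST116";

// Returns the row, or null on "not found"; throws on any other error.
function rowOrNull<T>(result: QueryResult<T>, context: string): T | null {
  if (result.error) {
    if (result.error.code === NOT_FOUND_CODE) {
      return null;
    }
    // In the real models, logger.error(...) would be called here before throwing.
    throw new Error(`${context}: ${result.error.message}`);
  }
  return result.data;
}

// Type guard for catch blocks, as described in the list above.
function toErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}
```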
**index.ts update:**
- Add export lines for both new models: `export { HealthCheckModel } from './HealthCheckModel';` and `export { AlertEventModel } from './AlertEventModel';`
- Also export the interfaces: `export type { ServiceHealthCheck, CreateHealthCheckData } from './HealthCheckModel';` and `export type { AlertEvent, CreateAlertEventData } from './AlertEventModel';`
- Keep all existing exports intact
**Do NOT:**
- Use `any` type anywhere — use `Record<string, unknown>` for JSONB fields
- Use `console.log` — use Winston logger only
- Cache `getSupabaseServiceClient()` at module level
- Create a shared base model class (per Research recommendation — keep models independent)
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | tail -20</automated>
<manual>Verify both models export from index.ts and follow DocumentModel.ts patterns</manual>
</verify>
<done>HealthCheckModel.ts and AlertEventModel.ts exist with typed interfaces, static CRUD methods, input validation, getSupabaseServiceClient() per-method, Winston logging. Both models exported from index.ts. TypeScript compiles without errors.</done>
</task>
</tasks>
<verification>
1. `ls backend/src/models/migrations/012_create_monitoring_tables.sql` — migration file exists
2. `grep "CREATE TABLE IF NOT EXISTS service_health_checks" backend/src/models/migrations/012_create_monitoring_tables.sql` — table DDL present
3. `grep "CREATE TABLE IF NOT EXISTS alert_events" backend/src/models/migrations/012_create_monitoring_tables.sql` — table DDL present
4. `grep "idx_.*_created_at" backend/src/models/migrations/012_create_monitoring_tables.sql` — INFR-01 indexes present
5. `grep "ENABLE ROW LEVEL SECURITY" backend/src/models/migrations/012_create_monitoring_tables.sql` — RLS enabled
6. `grep "getSupabaseServiceClient" backend/src/models/HealthCheckModel.ts` — INFR-04 uses existing Supabase connection
7. `grep "getSupabaseServiceClient" backend/src/models/AlertEventModel.ts` — INFR-04 uses existing Supabase connection
8. `cd backend && npx tsc --noEmit` — TypeScript compiles cleanly
</verification>
<success_criteria>
- Migration file 012 creates both tables with CHECK constraints, JSONB columns, all indexes, and RLS
- Both model classes compile, export typed interfaces, use getSupabaseServiceClient() per-method
- Both models are re-exported from index.ts
- No new database connections or infrastructure introduced (INFR-04)
- TypeScript strict compilation passes
</success_criteria>
<output>
After completion, create `.planning/phases/01-data-foundation/01-01-SUMMARY.md`
</output>