Phase 1: Data Foundation - Research

Researched: 2026-02-24
Domain: PostgreSQL schema design, Supabase PostgREST model layer, TypeScript static class pattern
Confidence: HIGH

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Migration approach

  • Use the existing DatabaseMigrator class in backend/src/models/migrate.ts
  • New .sql files go in src/models/migrations/, run with npm run db:migrate
  • The migrator tracks applied migrations in a migrations table — handles idempotency
  • Forward-only migrations (no rollback/down scripts). If something needs fixing, write a new migration.
  • Migrations execute via supabase.rpc('exec_sql', { sql }) — works with cloud Supabase from any environment including Firebase

Schema details

  • Status fields use TEXT with CHECK constraints (e.g., CHECK (status IN ('healthy','degraded','down'))) — easy to extend, no enum type management
  • Table names are descriptive, matching existing style: service_health_checks, alert_events (like processing_jobs, document_chunks)
  • Include JSONB probe_details / details columns for flexible metadata per service (response codes, error specifics) without future schema changes
  • All tables get indexes on created_at (required for 30-day retention queries and dashboard time-range filters)
  • Enable Row Level Security on new tables — admin-only access, matching existing security patterns

Model layer pattern

  • One model file per table: HealthCheckModel.ts, AlertEventModel.ts
  • Static methods on model classes (e.g., AlertEventModel.create(), AlertEventModel.findActive()) — matches DocumentModel.ts pattern
  • Use getSupabaseServiceClient() (PostgREST) for all monitoring reads/writes — monitoring is not on the critical processing path, so no need for direct PostgreSQL pool
  • Input validation in the model layer before writing (defense in depth alongside DB CHECK constraints)

Claude's Discretion

  • Exact column types for non-status fields (INTEGER vs BIGINT for latency_ms, etc.)
  • Whether to create a shared base model or keep models independent
  • Index strategy beyond created_at (e.g., composite indexes on service_name + created_at)
  • Winston logging patterns within model methods

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope

</user_constraints>


<phase_requirements>

Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| INFR-01 | Database migrations create service_health_checks and alert_events tables with indexes on created_at | Migration file naming convention (012_); CREATE TABLE IF NOT EXISTS + CREATE INDEX IF NOT EXISTS patterns from migrations 005/010; TEXT + CHECK for status; JSONB for probe_details; TIMESTAMP WITH TIME ZONE for created_at |
| INFR-04 | Analytics writes use the existing Supabase connection, no new database infrastructure | getSupabaseServiceClient() already exported from config/supabase.ts; PostgREST .from().insert().select().single() pattern confirmed in DocumentModel.ts; monitoring path is not critical, so no direct pg pool is needed |
</phase_requirements>

Summary

Phase 1 is a pure database + model layer task. No services, routes, or frontend changes. The existing codebase has a well-established pattern: SQL migration files in backend/src/models/migrations/ (sequentially numbered), a DatabaseMigrator class that tracks and runs them via supabase.rpc('exec_sql'), and TypeScript model classes with static methods using getSupabaseServiceClient(). All of this exists and works — the task is to follow it precisely.

The most important finding is that getSupabaseServiceClient() creates a new client on every call (no singleton caching, unlike getSupabaseClient()). This is intentional for the service-key client, but it means model methods must call it per operation rather than storing it at module level. ProcessingJobModel.ts and DocumentModel.ts both already use this inline-per-method approach, so the new models should do the same.

The codebase has no RLS SQL in any existing migration — existing tables pre-date or omit RLS. The CONTEXT.md requires RLS on the new tables, so this is new territory within this project. The pattern is standard Supabase RLS (ALTER TABLE ... ENABLE ROW LEVEL SECURITY + CREATE POLICY) and well-documented, but it is new to these migrations and worth verifying against the actual Supabase RLS policy syntax for service-role key bypass.

Primary recommendation: Create migration 012_create_monitoring_tables.sql following the pattern of 005_create_processing_jobs_table.sql, then create HealthCheckModel.ts and AlertEventModel.ts following the DocumentModel.ts static-class pattern, using getSupabaseServiceClient() per method.


Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| @supabase/supabase-js | Already installed | PostgREST client for model layer reads/writes | Locked: project uses Supabase exclusively; getSupabaseServiceClient() already in config/supabase.ts |
| PostgreSQL (via Supabase) | Cloud-managed | Table storage, indexes, CHECK constraints, RLS | Already the only database; no new infrastructure |
| TypeScript | Already installed | Model type definitions | Project-wide strict TypeScript |
| Winston logger | Already installed | Logging within model methods | backend/src/utils/logger.ts — NEVER console.log per .cursorrules |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| pg (Pool) | Already installed | Direct PostgreSQL for critical-path writes | NOT needed here — monitoring is not critical path; use PostgREST only |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| getSupabaseServiceClient() | getPostgresPool() | Direct pg bypasses the PostgREST cache (only relevant for critical-path inserts); monitoring writes can tolerate PostgREST; the service client is simpler and sufficient |
| TEXT + CHECK constraint | PostgreSQL ENUM | ENUMs require CREATE TYPE and are harder to extend; TEXT + CHECK is the confirmed pattern in the processing_jobs, agent_executions, and users tables |
| Separate model files | Shared BaseModel class | A shared base would add indirection with minimal benefit for two small models; keep them independent, consistent with existing models |

Installation: No new packages needed — all dependencies already installed.


Architecture Patterns

New files slot into existing structure:

backend/src/
├── models/
│   ├── migrations/
│   │   └── 012_create_monitoring_tables.sql   # NEW
│   ├── HealthCheckModel.ts                    # NEW
│   ├── AlertEventModel.ts                     # NEW
│   └── index.ts                               # UPDATE: add exports

Migration numbering: Current highest is 011_create_vector_database_tables.sql. Next must be 012_.

Pattern 1: SQL Migration File

What: CREATE TABLE IF NOT EXISTS with CHECK constraints, followed by CREATE INDEX IF NOT EXISTS for every planned query pattern. When to use: All schema changes — always forward-only.

-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
-- Migration: Create monitoring tables
-- Created: 2026-02-24

CREATE TABLE IF NOT EXISTS service_health_checks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    service_name VARCHAR(100) NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('healthy', 'degraded', 'down')),
    latency_ms INTEGER,
    checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    probe_details JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_service_health_checks_created_at ON service_health_checks(created_at);
CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_name ON service_health_checks(service_name);
-- Note: the composite index below also serves service_name-only lookups via its
-- leftmost prefix, so the single-column service_name index above is optional.
CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_created ON service_health_checks(service_name, created_at);

CREATE TABLE IF NOT EXISTS alert_events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    service_name VARCHAR(100) NOT NULL,
    alert_type TEXT NOT NULL CHECK (alert_type IN ('service_down', 'service_degraded', 'recovery')),
    status TEXT NOT NULL CHECK (status IN ('active', 'resolved')),
    details JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    resolved_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX IF NOT EXISTS idx_alert_events_created_at ON alert_events(created_at);
CREATE INDEX IF NOT EXISTS idx_alert_events_status ON alert_events(status);
CREATE INDEX IF NOT EXISTS idx_alert_events_service_name ON alert_events(service_name);

-- RLS
ALTER TABLE service_health_checks ENABLE ROW LEVEL SECURITY;
ALTER TABLE alert_events ENABLE ROW LEVEL SECURITY;

-- Service role bypasses RLS automatically in Supabase;
-- anon/authenticated roles get no access by default when RLS is enabled with no policies
-- Add explicit deny-all or admin-only policies if needed

Pattern 2: TypeScript Model Class (Static Methods)

What: Exported class with static async methods. Each method calls getSupabaseServiceClient() inline (not cached at module level for service client). Uses logger from utils/logger. Validates input before writing. When to use: All model methods — matches DocumentModel.ts exactly.

// Source: backend/src/models/DocumentModel.ts (verified pattern)
import { getSupabaseServiceClient } from '../config/supabase';
import { logger } from '../utils/logger';

export interface ServiceHealthCheck {
  id: string;
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms?: number;
  checked_at: string;
  probe_details?: Record<string, unknown>;
  created_at: string;
}

export interface CreateHealthCheckData {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms?: number;
  probe_details?: Record<string, unknown>;
}

export class HealthCheckModel {
  static async create(data: CreateHealthCheckData): Promise<ServiceHealthCheck> {
    // Input validation
    if (!data.service_name) throw new Error('service_name is required');
    if (!['healthy', 'degraded', 'down'].includes(data.status)) {
      throw new Error(`Invalid status: ${data.status}`);
    }

    try {
      const supabase = getSupabaseServiceClient();
      const { data: record, error } = await supabase
        .from('service_health_checks')
        .insert({
          service_name: data.service_name,
          status: data.status,
          latency_ms: data.latency_ms,
          probe_details: data.probe_details,
        })
        .select()
        .single();

      if (error) {
        logger.error('Error creating health check', { error: error.message, data });
        throw new Error(`Failed to create health check: ${error.message}`);
      }
      if (!record) throw new Error('Failed to create health check: No data returned');

      logger.info('Health check recorded', { service: data.service_name, status: data.status });
      return record;
    } catch (error) {
      logger.error('Error in HealthCheckModel.create', {
        error: error instanceof Error ? error.message : String(error),
        data,
      });
      throw error;
    }
  }
}

Pattern 3: Running the Migration

What: npm run db:migrate calls ts-node src/scripts/setup-database.ts, which invokes DatabaseMigrator.migrate(). The migrator reads all .sql files from migrations/ sorted alphabetically, checks the migrations table for each, and executes new ones via supabase.rpc('exec_sql', { sql }).

Important: The migrator skips already-executed migrations by ID (filename without .sql). This is the idempotency mechanism — re-running npm run db:migrate is safe.
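The skip-if-executed selection can be sketched as a pure function. This is a hypothetical illustration of the idea, not the actual DatabaseMigrator code; pendingMigrations is an invented name.

```typescript
// Sketch of the idempotency mechanism: a migration's ID is its filename
// without ".sql", and IDs already recorded in the migrations table are
// skipped, which is why re-running npm run db:migrate is safe.
function pendingMigrations(files: string[], appliedIds: Set<string>): string[] {
  return files
    .filter((f) => f.endsWith('.sql'))
    .sort() // alphabetical sort is execution order, hence zero-padded prefixes
    .filter((f) => !appliedIds.has(f.replace(/\.sql$/, '')));
}
```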

Anti-Patterns to Avoid

  • Using console.log in model files: Always use logger from ../utils/logger. The project enforces this in .cursorrules.
  • Using getPostgresPool() for monitoring writes: Only needed for critical-path operations that hit PostgREST cache issues (ProcessingJobModel is the one exception). Monitoring writes are fire-and-forget; PostgREST is fine.
  • Storing getSupabaseServiceClient() at module level: The service client function creates a new client each call (no caching). Call it inside each method. (The anon client getSupabaseClient() does cache, but monitoring models use the service client.)
  • Using any type in TypeScript interfaces: Strict TypeScript — use Record<string, unknown> for JSONB columns, or specific typed interfaces.
  • Skipping CREATE TABLE IF NOT EXISTS / CREATE INDEX IF NOT EXISTS: All migration DDL in this codebase uses IF NOT EXISTS. Never omit it.
  • Writing a rollback/down script: Forward-only migrations only. If schema needs fixing, write 013_fix_...sql.
  • Numbering the migration 12_ or 12: the prefix must be zero-padded to three digits: 012_.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Migration tracking / idempotency | Custom migration table logic | Existing DatabaseMigrator in migrate.ts | Already handles the migrations table, skip-if-executed logic, and error logging |
| Supabase client instantiation | New client setup | getSupabaseServiceClient() from config/supabase.ts | Handles auth, timeout, headers; INFR-04 requires no new DB connections |
| Input validation before write | Runtime type guards | Manual validation in the model (project pattern) | DocumentModel and ProcessingJobModel both validate before writing; adds defense in depth |
| Logging | Direct console.log or custom logger | logger from utils/logger | Winston-backed, structured JSON, correlation ID support |

Key insight: The migration infrastructure is already production-ready. Adding two SQL files and two TypeScript model classes is additive work, not infrastructure work.


Common Pitfalls

Pitfall 1: Migration Numbering Gap or Conflict

What goes wrong: A new migration reuses the 011_ prefix and collides with 011_create_vector_database_tables.sql, or uses an unpadded prefix like 12_ that sorts out of alphabetical order once 013_ exists.
Why it happens: Not checking the current highest number before creating the new file.
How to avoid: Verify the current highest (011_create_vector_database_tables.sql); the new file must be 012_create_monitoring_tables.sql.
Warning signs: Migration runs but skips one of the new tables; alphabetical sort puts the new file before existing ones.

Pitfall 2: RLS Blocks Service-Role Reads

What goes wrong: After enabling RLS, getSupabaseServiceClient() (which uses the service role key) appears unable to read or write rows.
Why it happens: Misunderstanding how Supabase RLS interacts with the service role.
Fact (HIGH confidence, Supabase docs): The service role key bypasses RLS by default. Enabling RLS only restricts the anon key and authenticated-user JWTs, so getSupabaseServiceClient() works fine with RLS enabled and no policies defined.
How to avoid: No special policies are needed for service-role access. If an explicit policy is desired for documentation clarity, CREATE POLICY "service_role_all" ON service_health_checks FOR ALL TO service_role USING (true) works, but it is not required.
Warning signs: Model methods return empty results or permission errors after the migration runs.

Pitfall 3: JSONB Column Typing

What goes wrong: TypeScript probe_details is typed as any, and strict lint rules fail.
Why it happens: JSONB has no enforced schema, so any is the path of least resistance.
How to avoid: Type it as Record<string, unknown> | null, or define a specific interface for common probe shapes. Accept that the TypeScript type is a superset of what the DB stores.
Warning signs: eslint errors on the no-explicit-any rule (the project uses strict TypeScript).
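One way to satisfy the no-explicit-any rule over JSONB is a narrow type guard. HttpProbeDetails and its field names are hypothetical examples of one probe shape, not anything the database enforces:

```typescript
// Hypothetical probe-detail shape for illustration only. JSONB enforces no
// schema, so the model layer receives Record<string, unknown> and narrows it.
interface HttpProbeDetails {
  response_code: number;
  error_message?: string;
}

// Returns the typed shape when the payload matches, null otherwise.
function asHttpProbeDetails(raw: Record<string, unknown> | null): HttpProbeDetails | null {
  if (raw === null) return null;
  const code = raw['response_code'];
  if (typeof code !== 'number') return null;
  const err = raw['error_message'];
  return { response_code: code, error_message: typeof err === 'string' ? err : undefined };
}
```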

Pitfall 4: latency_ms Integer Overflow

What goes wrong: A metric column typed INTEGER silently caps at 2,147,483,647 (about 2.1 billion).
Why it happens: Defaulting to INTEGER without considering the value range.
How to avoid: INTEGER is correct for latency_ms: 2.1 billion milliseconds is roughly 24 days, so no realistic probe latency can overflow it. Reserve BIGINT for columns that could hold genuinely large values such as epoch timestamps or byte counts.
Warning signs: N/A for latency; only relevant if storing epoch timestamps or byte counts in integer columns.
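The range claim is easy to sanity-check with arithmetic:

```typescript
// PostgreSQL INTEGER (int4) tops out at 2^31 - 1. Expressed as milliseconds
// of latency, that is just under 25 days, far beyond any real probe latency.
const INT4_MAX = 2 ** 31 - 1; // 2147483647
const maxLatencyDays = INT4_MAX / (1000 * 60 * 60 * 24); // about 24.9 days
```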

Pitfall 5: Missing checked_at vs created_at Distinction

What goes wrong: Using only created_at for health checks loses the distinction between "when the probe ran" and "when the row was inserted". These are usually the same, but can differ if inserts are batched or retried.
Why it happens: Copying the created_at DEFAULT CURRENT_TIMESTAMP pattern without thinking about the probe time.
How to avoid: Include an explicit checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP column on service_health_checks and let created_at be the insert time. When recording a health check, set checked_at explicitly to the moment the probe was made. The created_at index still covers retention queries; checked_at is the semantically accurate probe time.
Warning signs: Dashboard shows "time checked" as several seconds after the actual API call.
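A minimal sketch of capturing the probe time explicitly; buildHealthCheckRow is a hypothetical helper, not existing project code:

```typescript
// Capture the probe timestamp when the probe runs, not when the row lands.
// created_at is omitted from the payload so the DB default records insert time.
interface HealthCheckRow {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  checked_at: string; // ISO timestamp of the probe itself
}

function buildHealthCheckRow(
  serviceName: string,
  status: HealthCheckRow['status'],
  latencyMs: number,
  probedAt: Date,
): HealthCheckRow {
  return {
    service_name: serviceName,
    status,
    latency_ms: latencyMs,
    checked_at: probedAt.toISOString(),
  };
}
```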


Code Examples

Verified patterns from codebase:

Migration: Full SQL File Pattern

-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
-- Confirmed patterns: CREATE TABLE IF NOT EXISTS, UUID PK, TEXT CHECK constraint,
--   TIMESTAMP WITH TIME ZONE, CREATE INDEX IF NOT EXISTS on created_at

CREATE TABLE IF NOT EXISTS processing_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_processing_jobs_created_at ON processing_jobs(created_at);

DatabaseMigrator: How It Executes SQL

// Source: backend/src/models/migrate.ts (verified)
// Migration executes via:
const { error } = await supabase.rpc('exec_sql', { sql: migration.sql });
// Idempotency: checks `migrations` table by migration ID (filename without .sql)
// Run via: npm run db:migrate → ts-node src/scripts/setup-database.ts

Supabase Service Client: Per-Method Call Pattern

// Source: backend/src/config/supabase.ts (verified)
// getSupabaseServiceClient() creates a new client each call — no singleton
export const getSupabaseServiceClient = (): SupabaseClient => {
  // Creates new createClient(...) each invocation
};

// Correct usage in model methods:
static async create(data: CreateData): Promise<Row> {
  const supabase = getSupabaseServiceClient(); // Called inside the method, not at module level
  const { data: record, error } = await supabase.from('table').insert(data).select().single();
  if (error) throw new Error(`Failed to insert: ${error.message}`);
  return record;
}

Model: Error Handling Pattern

// Source: backend/src/models/ProcessingJobModel.ts (verified)
// Error check pattern used throughout:
if (error) {
  if (error.code === 'PGRST116') {
    return null; // Not found — not an error
  }
  logger.error('Error doing X', { error, id });
  throw new Error(`Failed to do X: ${error.message}`);
}

Model Index Export

// Source: backend/src/models/index.ts (verified)
// New models must be added here:
export { HealthCheckModel } from './HealthCheckModel';
export { AlertEventModel } from './AlertEventModel';

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| In-memory uploadMonitoringService (UploadMonitoringService class with EventEmitter) | Persistent Supabase tables | Phase 1 introduces this | Data survives cold starts; enables 30-day retention and dashboard queries |
| any type in model interfaces | Record<string, unknown> or a typed interface | Project baseline | Strict TypeScript requirement |

Deprecated/outdated in this project:

  • uploadMonitoringService.ts in-memory storage: Still used by existing routes but being superseded by persistent tables. Phase 1 does NOT modify uploadMonitoringService.ts — that is Phase 2+ work. This phase only creates the tables and model classes.

Open Questions

  1. RLS Policy Detail: Should we create explicit service-role policies or rely on implicit bypass?

    • What we know: Supabase service role key bypasses RLS by default. No policy needed for service-role access to work.
    • What's unclear: The CONTEXT.md says "admin-only access, matching existing security patterns" — but no existing migration uses RLS, so there is no project pattern to match exactly.
    • Recommendation: Enable RLS (ALTER TABLE ... ENABLE ROW LEVEL SECURITY) without creating any policies initially. The service-role key bypass is sufficient for all model-layer reads/writes. Add explicit policies in Phase 3 when admin API routes are added and authenticated user access may be needed.
  2. performance_metrics table: Use or ignore?

    • What we know: 010_add_performance_metrics_and_events.sql created a performance_metrics table but CONTEXT.md notes nothing writes to it. The new service_health_checks table is a different concept (external API health vs. internal processing metrics).
    • What's unclear: Whether Phase 1 should verify the performance_metrics schema to avoid future confusion.
    • Recommendation: No action needed in Phase 1. The CONTEXT.md note "verify its schema before building on it" is a Phase 2+ concern when writing to it. Phase 1 creates new tables only.
  3. checked_at column: Explicit or use created_at?

    • What we know: created_at has the index required by INFR-01. Adding checked_at as a separate column is semantically better (Pitfall 5 above).
    • What's unclear: Whether the planner wants both columns or a single created_at.
    • Recommendation: Include both — checked_at (explicitly set when probe runs) and created_at (DB default). Index only created_at as required by INFR-01. This is Claude's discretion and adds minimal complexity.

Sources

Primary (HIGH confidence)

  • backend/src/models/migrate.ts — Verified: migration execution mechanism, idempotency via migrations table, supabase.rpc('exec_sql') call
  • backend/src/models/migrations/005_create_processing_jobs_table.sql — Verified: CREATE TABLE IF NOT EXISTS, TEXT CHECK, UUID PK, CREATE INDEX IF NOT EXISTS, TIMESTAMP WITH TIME ZONE
  • backend/src/models/migrations/010_add_performance_metrics_and_events.sql — Verified: JSONB column pattern, index naming convention
  • backend/src/config/supabase.ts — Verified: getSupabaseServiceClient() creates new client per call (no caching); getPostgresPool() exists but for critical-path only
  • backend/src/models/DocumentModel.ts — Verified: static class pattern, getSupabaseServiceClient() inside methods, logger.error() with structured object, retry pattern
  • backend/src/models/ProcessingJobModel.ts — Verified: PGRST116 not-found handling, static methods, logger usage
  • backend/src/models/index.ts — Verified: export pattern for new models
  • backend/package.json — Verified: npm run db:migrate runs ts-node src/scripts/setup-database.ts; npm test runs vitest run
  • backend/vitest.config.ts — Verified: Vitest framework, src/__tests__/**/*.{test,spec}.{ts,js} glob, 30s timeout
  • .planning/config.json — Verified: workflow.nyquist_validation not present → Validation Architecture section omitted

Secondary (MEDIUM confidence)

  • Supabase RLS service-role bypass behavior: Service role key bypasses RLS; this is standard Supabase behavior documented at supabase.com/docs. Confidence: HIGH from training data, not directly verified via web fetch in this session.

Tertiary (LOW confidence)

  • None — all critical claims verified against codebase directly.

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all libraries already in codebase, verified in package.json and import statements
  • Architecture: HIGH — migration file structure, model class pattern, and export mechanism all verified from actual source files
  • Pitfalls: HIGH for migration numbering (files counted directly); HIGH for RLS service-role bypass (standard Supabase behavior); MEDIUM for checked_at recommendation (judgement call, not a verified bug)

Research date: 2026-02-24
Valid until: 2026-03-25 (30 days — Supabase and TypeScript patterns are stable)