Phase 1: Data Foundation - Research
Researched: 2026-02-24
Domain: PostgreSQL schema design, Supabase PostgREST model layer, TypeScript static class pattern
Confidence: HIGH
<user_constraints>
User Constraints (from CONTEXT.md)
Locked Decisions
Migration approach
- Use the existing `DatabaseMigrator` class in `backend/src/models/migrate.ts`
- New `.sql` files go in `src/models/migrations/`, run with `npm run db:migrate`
- The migrator tracks applied migrations in a `migrations` table — handles idempotency
- Forward-only migrations (no rollback/down scripts). If something needs fixing, write a new migration.
- Migrations execute via `supabase.rpc('exec_sql', { sql })` — works with cloud Supabase from any environment including Firebase
Schema details
- Status fields use TEXT with CHECK constraints (e.g., `CHECK (status IN ('healthy','degraded','down'))`) — easy to extend, no enum type management
- Table names are descriptive, matching existing style: `service_health_checks`, `alert_events` (like `processing_jobs`, `document_chunks`)
- Include JSONB `probe_details`/`details` columns for flexible metadata per service (response codes, error specifics) without future schema changes
- All tables get indexes on `created_at` (required for 30-day retention queries and dashboard time-range filters)
- Enable Row Level Security on new tables — admin-only access, matching existing security patterns
Model layer pattern
- One model file per table: `HealthCheckModel.ts`, `AlertEventModel.ts`
- Static methods on model classes (e.g., `AlertEventModel.create()`, `AlertEventModel.findActive()`) — matches the `DocumentModel.ts` pattern
- Use `getSupabaseServiceClient()` (PostgREST) for all monitoring reads/writes — monitoring is not on the critical processing path, so no need for direct PostgreSQL pool
- Input validation in the model layer before writing (defense in depth alongside DB CHECK constraints)
Claude's Discretion
- Exact column types for non-status fields (INTEGER vs BIGINT for latency_ms, etc.)
- Whether to create a shared base model or keep models independent
- Index strategy beyond created_at (e.g., composite indexes on service_name + created_at)
- Winston logging patterns within model methods
Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope </user_constraints>
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| INFR-01 | Database migrations create service_health_checks and alert_events tables with indexes on created_at | Migration file naming convention (012_), CREATE TABLE IF NOT EXISTS + CREATE INDEX IF NOT EXISTS patterns from migrations 005/010; TEXT+CHECK for status; JSONB for probe_details; TIMESTAMP WITH TIME ZONE for created_at |
| INFR-04 | Analytics writes use existing Supabase connection, no new database infrastructure | getSupabaseServiceClient() already exported from config/supabase.ts; PostgREST .from().insert().select().single() pattern confirmed in DocumentModel.ts; monitoring path is not critical so no need for direct pg pool |
</phase_requirements>
Summary
Phase 1 is a pure database + model layer task. No services, routes, or frontend changes. The existing codebase has a well-established pattern: SQL migration files in backend/src/models/migrations/ (sequentially numbered), a DatabaseMigrator class that tracks and runs them via supabase.rpc('exec_sql'), and TypeScript model classes with static methods using getSupabaseServiceClient(). All of this exists and works — the task is to follow it precisely.
The most important finding is that getSupabaseServiceClient() creates a new client on every call (no singleton caching, unlike getSupabaseClient()). This is intentional for the service-key client, but it means model methods must call it per operation, not store it at module level. Existing models are consistent on this point: ProcessingJobModel.ts and DocumentModel.ts both call getSupabaseServiceClient() inline where needed, so inline-per-method is the pattern to follow.
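The caching difference between the two getters can be sketched with a toy factory (hypothetical stand-ins for the real implementations in config/supabase.ts):

```typescript
// Toy illustration of the two client-caching strategies. `createClient`,
// `getCachedClient`, and `getFreshClient` are hypothetical stand-ins for
// the real Supabase client getters.
type Client = { id: number };

let creations = 0;
const createClient = (): Client => ({ id: ++creations });

// Anon-style getter: caches a singleton at module level (like getSupabaseClient).
let cached: Client | null = null;
const getCachedClient = (): Client => (cached ??= createClient());

// Service-style getter: returns a fresh client on every call (like
// getSupabaseServiceClient), which is why model methods call it per-operation.
const getFreshClient = (): Client => createClient();

console.log(getCachedClient() === getCachedClient()); // true: same instance
console.log(getFreshClient() === getFreshClient());   // false: new instance each call
```

Storing the fresh-per-call getter's result at module level would silently pin one client for the process lifetime, which is exactly the anti-pattern called out later in this document.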
The codebase has no RLS SQL in any existing migration — existing tables pre-date or omit RLS. The CONTEXT.md requires RLS on the new tables, so this is new territory within this project. The pattern is standard Supabase RLS (ALTER TABLE ... ENABLE ROW LEVEL SECURITY + CREATE POLICY) and well-documented, but it is new to these migrations and worth verifying against the actual Supabase RLS policy syntax for service-role key bypass.
Primary recommendation: Create migration 012_create_monitoring_tables.sql following the pattern of 005_create_processing_jobs_table.sql, then create HealthCheckModel.ts and AlertEventModel.ts following the DocumentModel.ts static-class pattern, using getSupabaseServiceClient() per method.
Standard Stack
Core
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| @supabase/supabase-js | Already installed | PostgREST client for model layer reads/writes | Locked: project uses Supabase exclusively; getSupabaseServiceClient() already in config/supabase.ts |
| PostgreSQL (via Supabase) | Cloud-managed | Table storage, indexes, CHECK constraints, RLS | Already the only database; no new infrastructure |
| TypeScript | Already installed | Model type definitions | Project-wide strict TypeScript |
| Winston logger | Already installed | Logging within model methods | backend/src/utils/logger.ts — NEVER console.log per .cursorrules |
Supporting
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| pg (Pool) | Already installed | Direct PostgreSQL for critical-path writes | NOT needed here — monitoring is not critical path; use PostgREST only |
Alternatives Considered
| Instead of | Could Use | Tradeoff |
|---|---|---|
| getSupabaseServiceClient() | getPostgresPool() | Direct pg bypasses PostgREST cache (only relevant for critical-path inserts); monitoring writes can tolerate PostgREST; service client is simpler and sufficient |
| TEXT + CHECK constraint | PostgreSQL ENUM | ENUMs require CREATE TYPE and are harder to extend; TEXT+CHECK confirmed pattern in processing_jobs, agent_executions, users tables |
| Separate model files | Shared BaseModel class | A shared base would add indirection with minimal benefit for two small models; keep independent, consistent with existing models |
Installation: No new packages needed — all dependencies already installed.
Architecture Patterns
Recommended Project Structure
New files slot into existing structure:
```
backend/src/
├── models/
│   ├── migrations/
│   │   └── 012_create_monitoring_tables.sql   # NEW
│   ├── HealthCheckModel.ts                    # NEW
│   ├── AlertEventModel.ts                     # NEW
│   └── index.ts                               # UPDATE: add exports
```
Migration numbering: Current highest is 011_create_vector_database_tables.sql. Next must be 012_.
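The numbering rule can be checked mechanically. A small helper (hypothetical, not part of the codebase; the real migrator just sorts filenames alphabetically) that derives the next zero-padded prefix:

```typescript
// Hypothetical helper: derive the next three-digit migration prefix
// from the filenames already in migrations/.
function nextMigrationPrefix(filenames: string[]): string {
  const highest = filenames
    .map((f) => /^(\d{3})_/.exec(f)?.[1])
    .filter((n): n is string => n !== undefined)
    .reduce((max, n) => Math.max(max, Number(n)), 0);
  // Zero-pad so alphabetical sort equals execution order.
  return String(highest + 1).padStart(3, '0');
}

const existing = [
  '005_create_processing_jobs_table.sql',
  '010_add_performance_metrics_and_events.sql',
  '011_create_vector_database_tables.sql',
];
console.log(nextMigrationPrefix(existing)); // "012"
```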
Pattern 1: SQL Migration File
What: CREATE TABLE IF NOT EXISTS with CHECK constraints, followed by CREATE INDEX IF NOT EXISTS for every planned query pattern.
When to use: All schema changes — always forward-only.
```sql
-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
-- Migration: Create monitoring tables
-- Created: 2026-02-24

CREATE TABLE IF NOT EXISTS service_health_checks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  service_name VARCHAR(100) NOT NULL,
  status TEXT NOT NULL CHECK (status IN ('healthy', 'degraded', 'down')),
  latency_ms INTEGER,
  checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
  probe_details JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_service_health_checks_created_at ON service_health_checks(created_at);
CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_name ON service_health_checks(service_name);
CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_created ON service_health_checks(service_name, created_at);

CREATE TABLE IF NOT EXISTS alert_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  service_name VARCHAR(100) NOT NULL,
  alert_type TEXT NOT NULL CHECK (alert_type IN ('service_down', 'service_degraded', 'recovery')),
  status TEXT NOT NULL CHECK (status IN ('active', 'resolved')),
  details JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
  resolved_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX IF NOT EXISTS idx_alert_events_created_at ON alert_events(created_at);
CREATE INDEX IF NOT EXISTS idx_alert_events_status ON alert_events(status);
CREATE INDEX IF NOT EXISTS idx_alert_events_service_name ON alert_events(service_name);

-- RLS
ALTER TABLE service_health_checks ENABLE ROW LEVEL SECURITY;
ALTER TABLE alert_events ENABLE ROW LEVEL SECURITY;
-- Service role bypasses RLS automatically in Supabase;
-- anon/authenticated roles get no access by default when RLS is enabled with no policies.
-- Add explicit deny-all or admin-only policies if needed.
```
Pattern 2: TypeScript Model Class (Static Methods)
What: Exported class with static async methods. Each method calls getSupabaseServiceClient() inline (not cached at module level for service client). Uses logger from utils/logger. Validates input before writing.
When to use: All model methods — matches DocumentModel.ts exactly.
```typescript
// Source: backend/src/models/DocumentModel.ts (verified pattern)
import { getSupabaseServiceClient } from '../config/supabase';
import { logger } from '../utils/logger';

export interface ServiceHealthCheck {
  id: string;
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms?: number;
  checked_at: string;
  probe_details?: Record<string, unknown>;
  created_at: string;
}

export interface CreateHealthCheckData {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms?: number;
  probe_details?: Record<string, unknown>;
}

export class HealthCheckModel {
  static async create(data: CreateHealthCheckData): Promise<ServiceHealthCheck> {
    // Input validation
    if (!data.service_name) throw new Error('service_name is required');
    if (!['healthy', 'degraded', 'down'].includes(data.status)) {
      throw new Error(`Invalid status: ${data.status}`);
    }

    try {
      const supabase = getSupabaseServiceClient();
      const { data: record, error } = await supabase
        .from('service_health_checks')
        .insert({
          service_name: data.service_name,
          status: data.status,
          latency_ms: data.latency_ms,
          probe_details: data.probe_details,
        })
        .select()
        .single();

      if (error) {
        logger.error('Error creating health check', { error: error.message, data });
        throw new Error(`Failed to create health check: ${error.message}`);
      }
      if (!record) throw new Error('Failed to create health check: No data returned');

      logger.info('Health check recorded', { service: data.service_name, status: data.status });
      return record;
    } catch (error) {
      logger.error('Error in HealthCheckModel.create', {
        error: error instanceof Error ? error.message : String(error),
        data,
      });
      throw error;
    }
  }
}
```
Pattern 3: Running the Migration
What: npm run db:migrate calls ts-node src/scripts/setup-database.ts, which invokes DatabaseMigrator.migrate(). The migrator reads all .sql files from migrations/ sorted alphabetically, checks the migrations table for each, and executes new ones via supabase.rpc('exec_sql', { sql }).
Important: The migrator skips already-executed migrations by ID (filename without .sql). This is the idempotency mechanism — re-running npm run db:migrate is safe.
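The skip-if-executed behavior can be sketched as follows (a simplified, hypothetical shape; the real DatabaseMigrator reads the `migrations` table and runs SQL via `supabase.rpc('exec_sql', ...)`, both stubbed here):

```typescript
// Simplified sketch of the migrator's idempotency loop. `runPending` is a
// hypothetical function; the applied-set and exec callback stand in for the
// `migrations` table and the exec_sql RPC.
interface Migration { id: string; sql: string }

async function runPending(
  all: Migration[],
  applied: Set<string>,
  exec: (sql: string) => Promise<void>,
): Promise<string[]> {
  const ran: string[] = [];
  // Alphabetical order is the execution order, hence zero-padded prefixes.
  for (const m of [...all].sort((a, b) => a.id.localeCompare(b.id))) {
    if (applied.has(m.id)) continue; // already recorded: skip (idempotency)
    await exec(m.sql);
    applied.add(m.id); // record so a re-run skips it
    ran.push(m.id);
  }
  return ran;
}
```

Running `runPending` twice with the same applied set executes nothing the second time, which mirrors why re-running `npm run db:migrate` is safe.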
Anti-Patterns to Avoid
- Using `console.log` in model files: Always use `logger` from `../utils/logger`. The project enforces this in `.cursorrules`.
- Using `getPostgresPool()` for monitoring writes: Only needed for critical-path operations that hit PostgREST cache issues (`ProcessingJobModel` is the one exception). Monitoring writes are fire-and-forget; PostgREST is fine.
- Storing `getSupabaseServiceClient()` at module level: The service client function creates a new client each call (no caching). Call it inside each method. (The anon client `getSupabaseClient()` does cache, but monitoring models use the service client.)
- Using `any` type in TypeScript interfaces: Strict TypeScript — use `Record<string, unknown>` for JSONB columns, or specific typed interfaces.
- Skipping `CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`: All migration DDL in this codebase uses `IF NOT EXISTS`. Never omit it.
- Writing a rollback/down script: Forward-only migrations only. If the schema needs fixing, write `013_fix_...sql`.
- Numbering the migration `11_` or `11`: Must be zero-padded to three digits: `012_`.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Migration tracking / idempotency | Custom migration table logic | Existing DatabaseMigrator in migrate.ts | Already handles migrations table, skip-if-executed logic, error logging |
| Supabase client instantiation | New client setup | getSupabaseServiceClient() from config/supabase.ts | Handles auth, timeout, headers; INFR-04 requires no new DB connections |
| Input validation before write | Runtime type guards | Manual validation in model (project pattern) | DocumentModel and ProcessingJobModel both validate before writing; adds defense in depth |
| Logging | Direct console.log or custom logger | logger from utils/logger | Winston-backed, structured JSON, correlation ID support |
Key insight: The migration infrastructure is already production-ready. Adding two SQL files and two TypeScript model classes is additive work, not infrastructure work.
Common Pitfalls
Pitfall 1: Migration Numbering Gap or Conflict
What goes wrong: A migration numbered 011_ or 012_ conflicts with an existing file, or the migration runs out of alphabetical order because numbering is inconsistent.
Why it happens: Not checking what the current highest number is before creating a new file.
How to avoid: Verify current highest (011_create_vector_database_tables.sql) — new file must be 012_create_monitoring_tables.sql.
Warning signs: Migration runs but skips one of the new tables; alphabetical sort puts new file before existing ones.
Pitfall 2: RLS Blocks Service-Role Reads
What goes wrong: After enabling RLS, getSupabaseServiceClient() (which uses the service role key) cannot read or write rows.
Why it happens: Misunderstanding of how Supabase RLS interacts with the service role. Fact (HIGH confidence, Supabase docs): The service role key bypasses RLS by default. Enabling RLS only restricts the anon key and authenticated-user JWTs. So getSupabaseServiceClient() will work fine with RLS enabled and no policies defined.
How to avoid: No special policies needed for service-role access. If explicit policies are desired for documentation clarity, CREATE POLICY "service_role_all" ON table USING (true) with TO service_role works, but it is not required.
Warning signs: Model methods return empty results or permission errors after migration runs.
Pitfall 3: JSONB Column Typing
What goes wrong: TypeScript probe_details typed as any, then strict lint rules fail.
Why it happens: JSONB has no enforced schema — the path of least resistance is any.
How to avoid: Type as Record<string, unknown> | null or define a specific interface for common probe shapes. Accept that the TypeScript type is a superset of what the DB stores.
Warning signs: eslint errors on no-explicit-any rule (project has strict TypeScript).
Pitfall 4: latency_ms Integer Overflow
What goes wrong: An integer column overflows because its width was chosen without checking the value range. PostgreSQL INTEGER maxes out at ~2.1 billion; for latency in milliseconds that ceiling corresponds to roughly 24 days of latency, so overflow is effectively impossible here, but columns holding large values (byte counts, epoch timestamps) would need BIGINT.
Why it happens: Defaulting to INTEGER without considering the value range.
How to avoid: INTEGER is correct for latency_ms (milliseconds always fit). Reach for BIGINT only for columns whose values can plausibly exceed 2^31 - 1.
Warning signs: N/A for latency; only relevant if storing epoch timestamps or byte counts in integer columns.
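The bound is one line of arithmetic:

```typescript
// PostgreSQL INTEGER is a signed 32-bit value: max 2^31 - 1.
const INT4_MAX = 2 ** 31 - 1;         // 2147483647
const msPerDay = 24 * 60 * 60 * 1000; // 86,400,000 ms per day

// A latency_ms column can only overflow past roughly 24.8 days of latency,
// so INTEGER is safe for this use.
console.log(INT4_MAX / msPerDay); // ~24.855 days
```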
Pitfall 5: Missing checked_at vs created_at Distinction
What goes wrong: Using only created_at for health checks loses the distinction between "when the probe ran" and "when the row was inserted". These are usually the same, but could differ if inserts are batched or retried.
Why it happens: Copying the created_at = DEFAULT CURRENT_TIMESTAMP pattern without thinking about the probe time.
How to avoid: Include an explicit checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP column on service_health_checks. Let created_at be the insert time. When recording a health check, set checked_at explicitly to the moment the probe was made. The created_at index still covers retention queries; checked_at is the semantically accurate probe time.
Warning signs: Dashboard shows "time checked" as several seconds after the actual API call.
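One way to keep the two timestamps honest is to snapshot checked_at when the probe completes and leave created_at to the DB default at insert time. A sketch (buildHealthCheckRow is a hypothetical helper, not an existing function):

```typescript
// Hypothetical helper: capture the probe time explicitly so checked_at
// reflects when the check ran, not when the row was inserted.
interface HealthCheckRow {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms?: number;
  checked_at: string; // set explicitly; created_at is left to the DB default
}

function buildHealthCheckRow(
  service_name: string,
  status: HealthCheckRow['status'],
  latency_ms?: number,
  probeTime: Date = new Date(),
): HealthCheckRow {
  return { service_name, status, latency_ms, checked_at: probeTime.toISOString() };
}

const row = buildHealthCheckRow('openai', 'healthy', 120, new Date('2026-02-24T00:00:00Z'));
console.log(row.checked_at); // "2026-02-24T00:00:00.000Z"
```

Even if the insert is retried or batched later, checked_at keeps the original probe moment.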
Code Examples
Verified patterns from codebase:
Migration: Full SQL File Pattern
```sql
-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
-- Confirmed patterns: CREATE TABLE IF NOT EXISTS, UUID PK, TEXT CHECK constraint,
-- TIMESTAMP WITH TIME ZONE, CREATE INDEX IF NOT EXISTS on created_at
CREATE TABLE IF NOT EXISTS processing_jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_processing_jobs_created_at ON processing_jobs(created_at);
```
DatabaseMigrator: How It Executes SQL
```typescript
// Source: backend/src/models/migrate.ts (verified)
// Migration executes via:
const { error } = await supabase.rpc('exec_sql', { sql: migration.sql });
// Idempotency: checks `migrations` table by migration ID (filename without .sql)
// Run via: npm run db:migrate → ts-node src/scripts/setup-database.ts
```
Supabase Service Client: Per-Method Call Pattern
```typescript
// Source: backend/src/config/supabase.ts (verified)
// getSupabaseServiceClient() creates a new client each call — no singleton
export const getSupabaseServiceClient = (): SupabaseClient => {
  // Creates new createClient(...) each invocation
};

// Correct usage in model methods:
static async create(data: CreateData): Promise<Row> {
  const supabase = getSupabaseServiceClient(); // Called inside method, not at module level
  const { data: record, error } = await supabase.from('table').insert(data).select().single();
}
```
Model: Error Handling Pattern
```typescript
// Source: backend/src/models/ProcessingJobModel.ts (verified)
// Error check pattern used throughout:
if (error) {
  if (error.code === 'PGRST116') {
    return null; // Not found — not an error
  }
  logger.error('Error doing X', { error, id });
  throw new Error(`Failed to do X: ${error.message}`);
}
```
Model Index Export
```typescript
// Source: backend/src/models/index.ts (verified)
// New models must be added here:
export { HealthCheckModel } from './HealthCheckModel';
export { AlertEventModel } from './AlertEventModel';
```
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| In-memory uploadMonitoringService (UploadMonitoringService class with EventEmitter) | Persistent Supabase tables | Phase 1 introduces this | Data survives cold starts; enables 30-day retention; enables dashboard queries |
| any type in model interfaces | Record<string, unknown> or typed interface | Project baseline | Strict TypeScript requirement |
Deprecated/outdated in this project:
- `uploadMonitoringService.ts` in-memory storage: Still used by existing routes but being superseded by persistent tables. Phase 1 does NOT modify `uploadMonitoringService.ts` — that is Phase 2+ work. This phase only creates the tables and model classes.
Open Questions
- RLS Policy Detail: Should we create explicit service-role policies or rely on implicit bypass?
  - What we know: The Supabase service role key bypasses RLS by default. No policy is needed for service-role access to work.
  - What's unclear: The CONTEXT.md says "admin-only access, matching existing security patterns" — but no existing migration uses RLS, so there is no project pattern to match exactly.
  - Recommendation: Enable RLS (`ALTER TABLE ... ENABLE ROW LEVEL SECURITY`) without creating any policies initially. The service-role key bypass is sufficient for all model-layer reads/writes. Add explicit policies in Phase 3 when admin API routes are added and authenticated user access may be needed.
- `performance_metrics` table: Use or ignore?
  - What we know: `010_add_performance_metrics_and_events.sql` created a `performance_metrics` table, but CONTEXT.md notes nothing writes to it. The new `service_health_checks` table is a different concept (external API health vs. internal processing metrics).
  - What's unclear: Whether Phase 1 should verify the `performance_metrics` schema to avoid future confusion.
  - Recommendation: No action needed in Phase 1. The CONTEXT.md note "verify its schema before building on it" is a Phase 2+ concern when writing to it. Phase 1 creates new tables only.
- `checked_at` column: Explicit or use `created_at`?
  - What we know: `created_at` has the index required by INFR-01. Adding `checked_at` as a separate column is semantically better (Pitfall 5 above).
  - What's unclear: Whether the planner wants both columns or a single `created_at`.
  - Recommendation: Include both — `checked_at` (explicitly set when the probe runs) and `created_at` (DB default). Index only `created_at` as required by INFR-01. This is Claude's discretion and adds minimal complexity.
Sources
Primary (HIGH confidence)
- `backend/src/models/migrate.ts` — Verified: migration execution mechanism, idempotency via `migrations` table, `supabase.rpc('exec_sql')` call
- `backend/src/models/migrations/005_create_processing_jobs_table.sql` — Verified: `CREATE TABLE IF NOT EXISTS`, TEXT CHECK, UUID PK, `CREATE INDEX IF NOT EXISTS`, `TIMESTAMP WITH TIME ZONE`
- `backend/src/models/migrations/010_add_performance_metrics_and_events.sql` — Verified: JSONB column pattern, index naming convention
- `backend/src/config/supabase.ts` — Verified: `getSupabaseServiceClient()` creates a new client per call (no caching); `getPostgresPool()` exists but is for critical-path use only
- `backend/src/models/DocumentModel.ts` — Verified: static class pattern, `getSupabaseServiceClient()` inside methods, `logger.error()` with structured object, retry pattern
- `backend/src/models/ProcessingJobModel.ts` — Verified: `PGRST116` not-found handling, static methods, logger usage
- `backend/src/models/index.ts` — Verified: export pattern for new models
- `backend/package.json` — Verified: `npm run db:migrate` runs `ts-node src/scripts/setup-database.ts`; `npm test` runs `vitest run`
- `backend/vitest.config.ts` — Verified: Vitest framework, `src/__tests__/**/*.{test,spec}.{ts,js}` glob, 30s timeout
- `.planning/config.json` — Verified: `workflow.nyquist_validation` not present → Validation Architecture section omitted
Secondary (MEDIUM confidence)
- Supabase RLS service-role bypass behavior: Service role key bypasses RLS; this is standard Supabase behavior documented at supabase.com/docs. Confidence: HIGH from training data, not directly verified via web fetch in this session.
Tertiary (LOW confidence)
- None — all critical claims verified against codebase directly.
Metadata
Confidence breakdown:
- Standard stack: HIGH — all libraries already in codebase, verified in package.json and import statements
- Architecture: HIGH — migration file structure, model class pattern, and export mechanism all verified from actual source files
- Pitfalls: HIGH for migration numbering (files counted directly); HIGH for RLS service-role bypass (standard Supabase behavior); MEDIUM for the `checked_at` recommendation (judgement call, not a verified bug)
Research date: 2026-02-24 Valid until: 2026-03-25 (30 days — Supabase and TypeScript patterns are stable)