feat: add weekly CIM report email every Thursday at noon ET

Adds a Firebase scheduled function (sendWeeklyReport) that runs every Thursday at 12:00 America/New_York and emails a CSV attachment to the four BluePoint Capital recipients. The CSV covers all completed documents from the past 7 days with: Date Processed, Company Name, Core Operations Summary, Geography, Deal Source, Industry/Sector, Stated Reason for Sale, LTM Revenue, and LTM EBITDA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: apply numbered list normalization to all PDF text blocks including nested fields
2026-02-25 13:23:31 -05:00 · 2026-02-25 12:26:46 -05:00 · 2026-02-25 12:22:46 -05:00 · 2026-02-25 12:18:07 -05:00 · 2026-02-25 12:10:28 -05:00 · 2026-02-25 12:05:25 -05:00
216 changed files with 47533 additions and 8585 deletions
--- a/.cursorignore
+++ b/.cursorignore
@@ -0,0 +1,78 @@
+# Dependencies
+node_modules/
+**/node_modules/
+
+# Build outputs
+dist/
+**/dist/
+build/
+**/build/
+
+# Log files
+*.log
+logs/
+**/logs/
+backend/logs/
+
+# Environment files
+.env
+.env.local
+.env.*.local
+*.env
+
+# IDE and editor files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS files
+.DS_Store
+Thumbs.db
+
+# Firebase
+.firebase/
+firebase-debug.log
+firestore-debug.log
+ui-debug.log
+
+# Test coverage
+coverage/
+.nyc_output/
+
+# Temporary files
+*.tmp
+*.temp
+.cache/
+
+# Documentation files (exclude from code indexing, but keep in project)
+# These are documentation, not code, so exclude from semantic search
+*.md
+!README.md
+!QUICK_START.md
+
+# Large binary files
+*.pdf
+*.png
+*.jpg
+*.jpeg
+*.gif
+*.ico
+
+# Service account keys (security)
+**/serviceAccountKey.json
+**/*-key.json
+**/*-keys.json
+
+# SQL migration files (include in project but exclude from code indexing)
+backend/sql/*.sql
+
+# Script outputs
+backend/src/scripts/*.js
+backend/scripts/*.js
+
+# TypeScript declaration maps
+*.d.ts.map
+*.js.map
+
--- a/.cursorrules
+++ b/.cursorrules
@@ -0,0 +1,340 @@
+# CIM Document Processor - Cursor Rules
+
+## Project Overview
+
+This is an AI-powered document processing system for analyzing Confidential Information Memorandums (CIMs). The system extracts text from PDFs, processes them through LLM services (Claude AI/OpenAI), generates structured analysis, and creates summary PDFs.
+
+**Core Purpose**: Automated processing and analysis of CIM documents using Google Document AI, vector embeddings, and LLM services.
+
+## Tech Stack
+
+### Backend
+- **Runtime**: Node.js 18+ with TypeScript
+- **Framework**: Express.js
+- **Database**: Supabase (PostgreSQL + Vector Database)
+- **Storage**: Google Cloud Storage (primary), Firebase Storage (fallback)
+- **AI Services**: 
+  - Google Document AI (text extraction)
+  - Anthropic Claude (primary LLM)
+  - OpenAI (fallback LLM)
+  - OpenRouter (LLM routing)
+- **Authentication**: Firebase Auth
+- **Deployment**: Firebase Functions v2
+
+### Frontend
+- **Framework**: React 18 + TypeScript
+- **Build Tool**: Vite
+- **HTTP Client**: Axios
+- **Routing**: React Router
+- **Styling**: Tailwind CSS
+
+## Critical Rules
+
+### TypeScript Standards
+- **ALWAYS** use strict TypeScript types - avoid `any` type
+- Use proper type definitions from `backend/src/types/` and `frontend/src/types/`
+- Enable `noImplicitAny: true` in new code (currently disabled in tsconfig.json for legacy reasons)
+- Use interfaces for object shapes, types for unions/primitives
+- Prefer `unknown` over `any` when type is truly unknown
+
+### Logging Standards
+- **ALWAYS** use Winston logger from `backend/src/utils/logger.ts`
+- Use `StructuredLogger` class for operations with correlation IDs
+- Log levels:
+  - `logger.debug()` - Detailed diagnostic info
+  - `logger.info()` - Normal operations
+  - `logger.warn()` - Warning conditions
+  - `logger.error()` - Error conditions with context
+- Include correlation IDs for request tracing
+- Log structured data: `logger.error('Message', { key: value, error: error.message })`
+- Never use `console.log` in production code - use logger instead
+
+### Error Handling Patterns
+- **ALWAYS** use try-catch blocks for async operations
+- Include error context: `error instanceof Error ? error.message : String(error)`
+- Log errors with structured data before re-throwing
+- Use existing error handling middleware: `backend/src/middleware/errorHandler.ts`
+- For Firebase/Supabase errors, extract meaningful messages from error objects
+- Retry patterns: Use exponential backoff for external API calls (see `llmService.ts` for examples)
+
+### Service Architecture
+- Services should be in `backend/src/services/`
+- Use dependency injection patterns where possible
+- Services should handle their own errors and log appropriately
+- Reference existing services before creating new ones:
+  - `jobQueueService.ts` - Background job processing
+  - `unifiedDocumentProcessor.ts` - Main document processing orchestrator
+  - `llmService.ts` - LLM API interactions
+  - `fileStorageService.ts` - File storage operations
+  - `vectorDatabaseService.ts` - Vector embeddings and search
+
+### Database Patterns
+- Use Supabase client from `backend/src/config/supabase.ts`
+- Models should be in `backend/src/models/`
+- Always handle Row Level Security (RLS) policies
+- Use transactions for multi-step operations
+- Handle connection errors gracefully with retries
+
+### Testing Standards
+- Use Vitest for testing (Jest was removed - see TESTING_STRATEGY_DOCUMENTATION.md)
+- Write tests in `backend/src/__tests__/`
+- Test critical paths first: document upload, authentication, core API endpoints
+- Use TDD approach: write tests first, then implementation
+- Mock external services (Firebase, Supabase, LLM APIs)
+
+## Deprecated Patterns (DO NOT USE)
+
+### Removed Services
+- ❌ `agenticRAGDatabaseService.ts` - Removed, functionality moved to other services
+- ❌ `sessionService.ts` - Removed, use Firebase Auth directly
+- ❌ Direct PostgreSQL connections - Use Supabase client instead
+- ❌ Redis caching - Not used in current architecture
+- ❌ JWT authentication - Use Firebase Auth tokens instead
+
+### Removed Test Patterns
+- ❌ Jest - Use Vitest instead
+- ❌ Tests for PostgreSQL/Redis architecture - Architecture changed to Supabase/Firebase
+
+### Old API Patterns
+- ❌ Direct database queries - Use model methods from `backend/src/models/`
+- ❌ Manual error handling without structured logging - Use StructuredLogger
+
+## Common Bugs to Avoid
+
+### 1. Missing Correlation IDs
+- **Problem**: Logs without correlation IDs make debugging difficult
+- **Solution**: Always use `StructuredLogger` with correlation ID for request-scoped operations
+- **Example**: `const logger = new StructuredLogger(correlationId);`
+
+### 2. Unhandled Promise Rejections
+- **Problem**: Async operations without try-catch cause unhandled rejections
+- **Solution**: Always wrap async operations in try-catch blocks
+- **Check**: `backend/src/index.ts` has global unhandled rejection handler
+
+### 3. Type Assertions Instead of Type Guards
+- **Problem**: Using `as` type assertions can hide type errors
+- **Solution**: Use proper type guards: `error instanceof Error ? error.message : String(error)`
+
+### 4. Missing Error Context
+- **Problem**: Errors logged without sufficient context
+- **Solution**: Include documentId, userId, jobId, and operation context in error logs
+
+### 5. Firebase/Supabase Error Handling
+- **Problem**: Not extracting meaningful error messages from Firebase/Supabase errors
+- **Solution**: Check error.code and error.message, log full error object for debugging
+
+### 6. Vector Search Timeouts
+- **Problem**: Vector search operations can timeout
+- **Solution**: See `backend/sql/fix_vector_search_timeout.sql` for timeout fixes
+- **Reference**: `backend/src/services/vectorDatabaseService.ts`
+
+### 7. Job Processing Timeouts
+- **Problem**: Jobs can exceed 14-minute timeout limit
+- **Solution**: Check `backend/src/services/jobProcessorService.ts` for timeout handling
+- **Pattern**: Jobs should update status before timeout, handle gracefully
+
+### 8. LLM Response Validation
+- **Problem**: LLM responses may not match expected JSON schema
+- **Solution**: Use Zod validation with retry logic (see `llmService.ts` lines 236-450)
+- **Pattern**: 3 retry attempts with improved prompts on validation failure
+
+## Context Management
+
+### Using @ Symbols for Context
+
+**@Files** - Reference specific files:
+- `@backend/src/utils/logger.ts` - For logging patterns
+- `@backend/src/services/jobQueueService.ts` - For job processing patterns
+- `@backend/src/services/llmService.ts` - For LLM API patterns
+- `@backend/src/middleware/errorHandler.ts` - For error handling patterns
+
+**@Codebase** - Semantic search (Chat only):
+- Use for finding similar implementations
+- Example: "How is document processing handled?" → searches entire codebase
+
+**@Folders** - Include entire directories:
+- `@backend/src/services/` - All service files
+- `@backend/src/scripts/` - All debugging scripts
+- `@backend/src/models/` - All database models
+
+**@Lint Errors** - Reference current lint errors (Chat only):
+- Use when fixing linting issues
+
+**@Git** - Access git history:
+- Use to see recent changes and understand context
+
+### Key File References for Common Tasks
+
+**Logging:**
+- `backend/src/utils/logger.ts` - Winston logger and StructuredLogger class
+
+**Job Processing:**
+- `backend/src/services/jobQueueService.ts` - Job queue management
+- `backend/src/services/jobProcessorService.ts` - Job execution logic
+
+**Document Processing:**
+- `backend/src/services/unifiedDocumentProcessor.ts` - Main orchestrator
+- `backend/src/services/documentAiProcessor.ts` - Google Document AI integration
+- `backend/src/services/optimizedAgenticRAGProcessor.ts` - AI-powered analysis
+
+**LLM Services:**
+- `backend/src/services/llmService.ts` - LLM API interactions with retry logic
+
+**File Storage:**
+- `backend/src/services/fileStorageService.ts` - GCS and Firebase Storage operations
+
+**Database:**
+- `backend/src/models/DocumentModel.ts` - Document database operations
+- `backend/src/models/ProcessingJobModel.ts` - Job database operations
+- `backend/src/config/supabase.ts` - Supabase client configuration
+
+**Debugging Scripts:**
+- `backend/src/scripts/` - Collection of debugging and monitoring scripts
+
+## Debugging Scripts Usage
+
+### When to Use Existing Scripts vs Create New Ones
+
+**Use Existing Scripts For:**
+- Monitoring document processing: `monitor-document-processing.ts`
+- Checking job status: `check-current-job.ts`, `track-current-job.ts`
+- Database failure checks: `check-database-failures.ts`
+- System monitoring: `monitor-system.ts`
+- Testing LLM pipeline: `test-full-llm-pipeline.ts`
+
+**Create New Scripts When:**
+- Need to debug a specific new issue
+- Existing scripts don't cover the use case
+- Creating a one-time diagnostic tool
+
+### Script Naming Conventions
+- `check-*` - Diagnostic scripts that check status
+- `monitor-*` - Continuous monitoring scripts
+- `track-*` - Tracking specific operations
+- `test-*` - Testing specific functionality
+- `setup-*` - Setup and configuration scripts
+
+### Common Debugging Workflows
+
+**Debugging a Stuck Document:**
+1. Use `check-new-doc-status.ts` to check document status
+2. Use `check-current-job.ts` to check associated job
+3. Use `monitor-document.ts` for real-time monitoring
+4. Use `manually-process-job.ts` to reprocess if needed
+
+**Debugging LLM Issues:**
+1. Use `test-openrouter-simple.ts` for basic LLM connectivity
+2. Use `test-full-llm-pipeline.ts` for end-to-end LLM testing
+3. Use `test-llm-processing-offline.ts` for offline testing
+
+**Debugging Database Issues:**
+1. Use `check-database-failures.ts` to check for failures
+2. Check SQL files in `backend/sql/` for schema fixes
+3. Review `backend/src/models/` for model issues
+
+## YOLO Mode Configuration
+
+When using Cursor's YOLO mode, these commands are always allowed:
+- Test commands: `npm test`, `vitest`, `npm run test:watch`, `npm run test:coverage`
+- Build commands: `npm run build`, `tsc`, `npm run lint`
+- File operations: `touch`, `mkdir`, file creation/editing
+- Running debugging scripts: `ts-node backend/src/scripts/*.ts`
+- Database scripts: `npm run db:*` commands
+
+## Logging Patterns
+
+### Winston Logger Usage
+
+**Basic Logging:**
+```typescript
+import { logger } from './utils/logger';
+
+logger.info('Operation started', { documentId, userId });
+logger.error('Operation failed', { error: error.message, documentId });
+```
+
+**Structured Logger with Correlation ID:**
+```typescript
+import { StructuredLogger } from './utils/logger';
+
+const structuredLogger = new StructuredLogger(correlationId);
+structuredLogger.processingStart(documentId, userId, options);
+structuredLogger.processingError(error, documentId, userId, 'llm_processing');
+```
+
+**Service-Specific Logging:**
+- Upload operations: Use `structuredLogger.uploadStart()`, `uploadSuccess()`, `uploadError()`
+- Processing operations: Use `structuredLogger.processingStart()`, `processingSuccess()`, `processingError()`
+- Storage operations: Use `structuredLogger.storageOperation()`
+- Job queue operations: Use `structuredLogger.jobQueueOperation()`
+
+**Error Logging Best Practices:**
+- Always include error message: `error instanceof Error ? error.message : String(error)`
+- Include stack trace: `error instanceof Error ? error.stack : undefined`
+- Add context: documentId, userId, jobId, operation name
+- Use structured data, not string concatenation
+
+## Firebase/Supabase Error Handling
+
+### Firebase Errors
+- Check `error.code` for specific error codes
+- Firebase Auth errors: Handle `auth/` prefixed codes
+- Firebase Storage errors: Handle `storage/` prefixed codes
+- Log full error object for debugging: `logger.error('Firebase error', { error, code: error.code })`
+
+### Supabase Errors
+- Check `error.code` and `error.message`
+- RLS policy errors: Check `error.code === 'PGRST301'`
+- Connection errors: Implement retry logic
+- Log with context: `logger.error('Supabase error', { error: error.message, code: error.code, query })`
+
+## Retry Patterns
+
+### LLM API Retries (from llmService.ts)
+- 3 retry attempts for API calls
+- Exponential backoff between retries
+- Improved prompts on validation failure
+- Log each attempt with attempt number
+
+### Database Operation Retries
+- Use connection pooling (handled by Supabase client)
+- Retry on connection errors
+- Don't retry on validation errors
+
+## Testing Guidelines
+
+### Test Structure
+- Unit tests: `backend/src/__tests__/unit/`
+- Integration tests: `backend/src/__tests__/integration/`
+- Test utilities: `backend/src/__tests__/utils/`
+- Mocks: `backend/src/__tests__/mocks/`
+
+### Critical Paths to Test
+1. Document upload workflow
+2. Authentication flow
+3. Core API endpoints
+4. Job processing pipeline
+5. LLM service interactions
+
+### Mocking External Services
+- Firebase: Mock Firebase Admin SDK
+- Supabase: Mock Supabase client
+- LLM APIs: Mock HTTP responses
+- Google Cloud Storage: Mock GCS client
+
+## Performance Considerations
+
+- Vector search operations can be slow - use timeouts
+- LLM API calls are expensive - implement caching where possible
+- Job processing has 14-minute timeout limit
+- Large PDFs may cause memory issues - use streaming where possible
+- Database queries should use indexes (check Supabase dashboard)
+
+## Security Best Practices
+
+- Never log sensitive data (passwords, API keys, tokens)
+- Use environment variables for all secrets (see `backend/src/config/env.ts`)
+- Validate all user inputs (see `backend/src/middleware/validation.ts`)
+- Use Firebase Auth for authentication - never bypass
+- Respect Row Level Security (RLS) policies in Supabase
+
--- a/.planning/MILESTONES.md
+++ b/.planning/MILESTONES.md
@@ -0,0 +1,25 @@
+# Milestones
+
+## v1.0 Analytics & Monitoring (Shipped: 2026-02-25)
+
+**Phases completed:** 5 phases, 10 plans
+**Timeline:** 2 days (2026-02-24 → 2026-02-25)
+**Commits:** 42 (e606027..8bad951)
+**Codebase:** 31,184 LOC TypeScript
+
+**Delivered:** Persistent analytics dashboard and service health monitoring for the CIM Summary application — the admin knows immediately when any external service breaks and sees processing metrics at a glance.
+
+**Key accomplishments:**
+1. Database foundation with monitoring tables (service_health_checks, alert_events, document_processing_events) and typed models
+2. Fire-and-forget analytics service for non-blocking document processing event tracking
+3. Health probe system with real authenticated API calls to Document AI, Claude/OpenAI, Supabase, and Firebase Auth
+4. Alert service with email delivery, deduplication cooldown, and config-driven recipients
+5. Admin-authenticated API layer with health, analytics, and alerts endpoints (404 for non-admin)
+6. Frontend admin dashboard with service health grid, analytics summary, and critical alert banner
+7. Tech debt cleanup: env-driven config, consolidated retention cleanup, removed hardcoded defaults
+
+**Requirements:** 15/15 satisfied
+**Git range:** e606027..8bad951
+
+---
+
--- a/.planning/PROJECT.md
+++ b/.planning/PROJECT.md
@@ -0,0 +1,90 @@
+# CIM Summary — Analytics & Monitoring
+
+## What This Is
+
+An analytics dashboard and service health monitoring system for the CIM Summary application. Provides persistent document processing metrics, scheduled health probes for all 4 external services, email + in-app alerting when APIs or credentials need attention, and an admin-only monitoring dashboard.
+
+## Core Value
+
+When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
+
+## Requirements
+
+### Validated
+
+- ✓ Document upload and processing pipeline — existing
+- ✓ Multi-provider LLM integration (Anthropic, OpenAI, OpenRouter) — existing
+- ✓ Google Document AI text extraction — existing
+- ✓ Supabase PostgreSQL with pgvector for storage and search — existing
+- ✓ Firebase Authentication — existing
+- ✓ Google Cloud Storage for file management — existing
+- ✓ Background job queue with retry logic — existing
+- ✓ Structured logging with Winston and correlation IDs — existing
+- ✓ Basic health endpoints (`/health`, `/health/config`, `/monitoring/dashboard`) — existing
+- ✓ PDF generation and export — existing
+- ✓ Admin can view live health status for all 4 services (HLTH-01) — v1.0
+- ✓ Health probes make real authenticated API calls (HLTH-02) — v1.0
+- ✓ Scheduled periodic health probes (HLTH-03) — v1.0
+- ✓ Health probe results persist to Supabase (HLTH-04) — v1.0
+- ✓ Email alert on service down/degraded (ALRT-01) — v1.0
+- ✓ Alert deduplication within cooldown (ALRT-02) — v1.0
+- ✓ In-app alert banner for critical issues (ALRT-03) — v1.0
+- ✓ Alert recipient from config, not hardcoded (ALRT-04) — v1.0
+- ✓ Processing events persist at write time (ANLY-01) — v1.0
+- ✓ Admin can view processing summary (ANLY-02) — v1.0
+- ✓ Analytics instrumentation non-blocking (ANLY-03) — v1.0
+- ✓ DB migrations with indexes on created_at (INFR-01) — v1.0
+- ✓ Admin API routes protected by Firebase Auth (INFR-02) — v1.0
+- ✓ 30-day rolling data retention cleanup (INFR-03) — v1.0
+- ✓ Analytics use existing Supabase connection (INFR-04) — v1.0
+
+### Active
+
+(None — next milestone not yet defined. Run `/gsd:new-milestone` to plan.)
+
+### Out of Scope
+
+- External monitoring tools (Grafana, Datadog) — keeping it in-app for simplicity
+- Non-admin user analytics views — admin-only for now
+- Mobile push notifications — email + in-app sufficient
+- Historical analytics beyond 30 days — lean storage, can extend later
+- Real-time WebSocket updates — polling is sufficient for admin dashboard
+- ML-based anomaly detection — threshold-based alerting sufficient at this scale
+
+## Context
+
+Shipped v1.0 with 31,184 LOC TypeScript across Express.js backend and React frontend.
+Tech stack: Express.js, React, Supabase (PostgreSQL + pgvector), Firebase Auth, Firebase Cloud Functions, Google Document AI, Anthropic/OpenAI LLMs, nodemailer, Tailwind CSS.
+
+Four external services monitored with real authenticated probes:
+1. **Google Document AI** — service account credential validation
+2. **Claude/OpenAI** — API key validation via cheapest model (claude-haiku-4-5, max_tokens 5)
+3. **Supabase** — direct PostgreSQL pool query (`SELECT 1`)
+4. **Firebase Auth** — SDK liveness via verifyIdToken error classification
+
+Admin user: jpressnell@bluepointcapital.com (config-driven, not hardcoded).
+
+## Constraints
+
+- **Tech stack**: Express.js backend + React frontend
+- **Auth**: Admin-only access via Firebase Auth with config-driven email check
+- **Storage**: Supabase PostgreSQL — no new database infrastructure
+- **Email**: nodemailer for alert delivery
+- **Deployment**: Firebase Cloud Functions (14-minute timeout)
+- **Data retention**: 30-day rolling window
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| In-app dashboard over external tools | Simpler setup, no additional infrastructure | ✓ Good — admin sees everything in one place |
+| Email + in-app dual alerting | Redundancy for critical issues | ✓ Good — covers both active and passive monitoring |
+| 30-day retention | Balances useful trend data with storage efficiency | ✓ Good — consolidated into single cleanup function |
+| Single admin (config-driven) | Simple RBAC, can extend later | ✓ Good — email now env-driven after tech debt cleanup |
+| Scheduled probes + fire-and-forget analytics | Decouples monitoring from processing | ✓ Good — zero impact on processing pipeline latency |
+| 404 (not 403) for non-admin routes | Does not reveal admin routes exist | ✓ Good — security through obscurity at API level |
+| void return type for analytics writes | Prevents accidental await on critical path | ✓ Good — type system enforces fire-and-forget pattern |
+| Promise.allSettled for probe orchestration | All 4 probes run even if one throws | ✓ Good — partial results better than total failure |
+
+---
+*Last updated: 2026-02-25 after v1.0 milestone*
--- a/.planning/RETROSPECTIVE.md
+++ b/.planning/RETROSPECTIVE.md
@@ -0,0 +1,66 @@
+# Project Retrospective
+
+*A living document updated after each milestone. Lessons feed forward into future planning.*
+
+## Milestone: v1.0 — Analytics & Monitoring
+
+**Shipped:** 2026-02-25
+**Phases:** 5 | **Plans:** 10 | **Sessions:** ~4
+
+### What Was Built
+- Database foundation with 3 monitoring tables (service_health_checks, alert_events, document_processing_events) and typed TypeScript models
+- Health probe system with real authenticated API calls to Document AI, Claude/OpenAI, Supabase, and Firebase Auth
+- Alert service with email delivery via nodemailer, deduplication cooldown, and config-driven recipients
+- Fire-and-forget analytics service for non-blocking document processing event tracking
+- Admin-authenticated API layer with health, analytics, and alerts endpoints
+- Frontend admin dashboard with service health grid, analytics summary, and critical alert banner
+- Tech debt cleanup: env-driven config, consolidated retention, removed hardcoded defaults
+
+### What Worked
+- Strict dependency ordering (data → services → API → frontend) prevented integration surprises — each phase consumed exactly what the prior phase provided
+- Fire-and-forget pattern enforced at the type level (void return) caught potential performance issues at compile time
+- GSD audit-milestone workflow caught 5 tech debt items before shipping — all resolved
+- 2-day milestone completion shows GSD workflow is efficient for well-scoped work
+
+### What Was Inefficient
+- Phase 5 (tech debt) was added to the roadmap but executed as a direct commit — the GSD plan/execute overhead wasn't warranted for 3 small fixes
+- Summary one-liner extraction returned null for all summaries — frontmatter format may not match what gsd-tools expects
+
+### Patterns Established
+- Static class model pattern for Supabase (no instantiation, getSupabaseServiceClient per-method)
+- makeSupabaseChain() factory for Vitest mocking of Supabase client
+- requireAdminEmail middleware returns 404 (not 403) to hide admin routes
+- Firebase Secrets read inside function body, never at module level
+- void return type to prevent accidental await on fire-and-forget operations
+
+### Key Lessons
+1. Small tech debt fixes don't need full GSD plan/execute — direct commits are fine when the audit already defines the scope
+2. Type-level enforcement (void vs Promise<void>) is more reliable than code review for architectural constraints
+3. Promise.allSettled is the right pattern when partial results are better than total failure (health probes)
+4. Admin email should always be config-driven from day one — hardcoding "just for now" creates tech debt immediately
+
+### Cost Observations
+- Model mix: ~80% sonnet (execution), ~20% haiku (research/verification)
+- Sessions: ~4
+- Notable: Phase 4 (frontend) completed fastest — well-defined API contracts from Phase 3 made UI wiring straightforward
+
+---
+
+## Cross-Milestone Trends
+
+### Process Evolution
+
+| Milestone | Sessions | Phases | Key Change |
+|-----------|----------|--------|------------|
+| v1.0 | ~4 | 5 | First milestone — established patterns |
+
+### Cumulative Quality
+
+| Milestone | Tests | Coverage | Zero-Dep Additions |
+|-----------|-------|----------|-------------------|
+| v1.0 | 14+ | — | 3 tables, 5 services, 4 routes, 3 components |
+
+### Top Lessons (Verified Across Milestones)
+
+1. Type-level enforcement > code review for architectural constraints
+2. Strict phase dependency ordering prevents integration surprises
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -0,0 +1,28 @@
+# Roadmap: CIM Summary — Analytics & Monitoring
+
+## Milestones
+
+- ✅ **v1.0 Analytics & Monitoring** — Phases 1-5 (shipped 2026-02-25)
+
+## Phases
+
+<details>
+<summary>✅ v1.0 Analytics & Monitoring (Phases 1-5) — SHIPPED 2026-02-25</summary>
+
+- [x] Phase 1: Data Foundation (2/2 plans) — completed 2026-02-24
+- [x] Phase 2: Backend Services (4/4 plans) — completed 2026-02-24
+- [x] Phase 3: API Layer (2/2 plans) — completed 2026-02-24
+- [x] Phase 4: Frontend (2/2 plans) — completed 2026-02-25
+- [x] Phase 5: Tech Debt Cleanup (direct commit) — completed 2026-02-25
+
+</details>
+
+## Progress
+
+| Phase | Milestone | Plans Complete | Status | Completed |
+|-------|-----------|----------------|--------|-----------|
+| 1. Data Foundation | v1.0 | 2/2 | Complete | 2026-02-24 |
+| 2. Backend Services | v1.0 | 4/4 | Complete | 2026-02-24 |
+| 3. API Layer | v1.0 | 2/2 | Complete | 2026-02-24 |
+| 4. Frontend | v1.0 | 2/2 | Complete | 2026-02-25 |
+| 5. Tech Debt Cleanup | v1.0 | — | Complete | 2026-02-25 |
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -0,0 +1,66 @@
+---
+gsd_state_version: 1.0
+milestone: v1.0
+milestone_name: Analytics & Monitoring
+status: shipped
+last_updated: "2026-02-25"
+progress:
+  total_phases: 5
+  completed_phases: 5
+  total_plans: 10
+  completed_plans: 10
+---
+
+# Project State
+
+## Project Reference
+
+See: .planning/PROJECT.md (updated 2026-02-25)
+
+**Core value:** When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
+**Current focus:** v1.0 shipped — next milestone not yet defined
+
+## Current Position
+
+Phase: 5 of 5 (all complete)
+Plan: All plans complete
+Status: v1.0 milestone shipped
+Last activity: 2026-02-25 — v1.0 milestone archived
+
+Progress: [██████████] 100%
+
+## Performance Metrics
+
+**Velocity:**
+- Total plans completed: 10
+- Timeline: 2 days (2026-02-24 → 2026-02-25)
+
+**By Phase:**
+
+| Phase | Plans | Total | Avg/Plan |
+|-------|-------|-------|----------|
+| 01-data-foundation | 2 | ~34 min | ~17 min |
+| 02-backend-services | 4 | ~51 min | ~13 min |
+| 03-api-layer | 2 | ~16 min | ~8 min |
+| 04-frontend | 2 | ~4 min | ~2 min |
+| 05-tech-debt-cleanup | — | direct commit | — |
+
+## Accumulated Context
+
+### Decisions
+
+All v1.0 decisions validated — see PROJECT.md Key Decisions table for outcomes.
+
+### Pending Todos
+
+None.
+
+### Blockers/Concerns
+
+None — v1.0 shipped.
+
+## Session Continuity
+
+Last session: 2026-02-25
+Stopped at: v1.0 milestone archived and tagged
+Resume file: None
--- a/.planning/codebase/ARCHITECTURE.md
+++ b/.planning/codebase/ARCHITECTURE.md
@@ -0,0 +1,243 @@
+# Architecture
+
+**Analysis Date:** 2026-02-24
+
+## Pattern Overview
+
+**Overall:** Full-stack distributed system combining Express.js backend with React frontend, implementing a **multi-stage document processing pipeline** with queued background jobs and real-time monitoring.
+
+**Key Characteristics:**
+- Server-rendered PDF generation with single-pass LLM processing
+- Asynchronous job queue for background document processing (max 3 concurrent)
+- Firebase authentication with Supabase PostgreSQL + pgvector for embeddings
+- Multi-language LLM support (Anthropic, OpenAI, OpenRouter)
+- Structured schema extraction using Zod and LLM-driven analysis
+- Google Document AI for OCR and text extraction
+- Real-time upload progress tracking via SSE/polling
+- Correlation ID tracking throughout distributed pipeline
+
+## Layers
+
+**API Layer (Express + TypeScript):**
+- Purpose: HTTP request routing, authentication, and response handling
+- Location: `backend/src/index.ts`, `backend/src/routes/`, `backend/src/controllers/`
+- Contains: Route definitions, request validation, error handling
+- Depends on: Middleware (auth, validation), Services
+- Used by: Frontend and external clients
+
+**Authentication Layer:**
+- Purpose: Firebase ID token verification and user identity validation
+- Location: `backend/src/middleware/firebaseAuth.ts`, `backend/src/config/firebase.ts`
+- Contains: Token verification, service account initialization, session recovery
+- Depends on: Firebase Admin SDK, configuration
+- Used by: All protected routes via `verifyFirebaseToken` middleware
+
+**Controller Layer:**
+- Purpose: Request handling, input validation, service orchestration
+- Location: `backend/src/controllers/documentController.ts`, `backend/src/controllers/authController.ts`
+- Contains: `getUploadUrl()`, `processDocument()`, `getDocumentStatus()` handlers
+- Depends on: Models, Services, Middleware
+- Used by: Routes
+
+**Service Layer:**
+- Purpose: Business logic, external API integration, document processing orchestration
+- Location: `backend/src/services/`
+- Contains:
+  - `unifiedDocumentProcessor.ts` - Main orchestrator, strategy selection
+  - `singlePassProcessor.ts` - 2-LLM-call extraction (pass 1 + quality check)
+  - `documentAiProcessor.ts` - Google Document AI text extraction
+  - `llmService.ts` - LLM API calls with retry logic (3 attempts, exponential backoff)
+  - `jobQueueService.ts` - Background job processing (EventEmitter-based)
+  - `fileStorageService.ts` - Google Cloud Storage signed URLs and uploads
+  - `vectorDatabaseService.ts` - Supabase vector embeddings and search
+  - `pdfGenerationService.ts` - Puppeteer-based PDF rendering
+  - `csvExportService.ts` - Financial data export
+- Depends on: Models, Config, Utilities
+- Used by: Controllers, Job Queue
+
+**Model Layer (Data Access):**
+- Purpose: Database interactions, query execution, schema validation
+- Location: `backend/src/models/`
+- Contains: `DocumentModel.ts`, `ProcessingJobModel.ts`, `UserModel.ts`, `VectorDatabaseModel.ts`
+- Depends on: Supabase client, configuration
+- Used by: Services, Controllers
+
+**Job Queue Layer:**
+- Purpose: Asynchronous background processing with priority and retry handling
+- Location: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`
+- Contains: In-memory queue, worker pool (max 3 concurrent), Firebase scheduled function trigger
+- Depends on: Services (document processor), Models
+- Used by: Controllers (to enqueue work), Scheduled functions (to trigger processing)
+
+**Frontend Layer (React + TypeScript):**
+- Purpose: User interface for document upload, processing monitoring, and review
+- Location: `frontend/src/`
+- Contains: Components (Upload, List, Viewer, Analytics), Services, Contexts
+- Depends on: Backend API, Firebase Auth, Axios
+- Used by: Web browsers
+
+## Data Flow
+
+**Document Upload & Processing Flow:**
+
+1. **Upload Initiation** (Frontend)
+   - User selects PDF file via `DocumentUpload` component
+   - Calls `documentService.getUploadUrl()` → Backend `/documents/upload-url` endpoint
+   - Backend creates document record (status: 'uploading') and generates signed GCS URL
+
+2. **File Upload** (Frontend → GCS)
+   - Frontend uploads file directly to Google Cloud Storage via signed URL
+   - Frontend polls `documentService.getDocumentStatus()` for upload completion
+   - `UploadMonitoringDashboard` displays real-time progress
+
+3. **Processing Trigger** (Frontend → Backend)
+   - Frontend calls `POST /documents/{id}/process` once upload complete
+   - Controller creates processing job and enqueues to `jobQueueService`
+   - Controller immediately returns job ID
+
+4. **Background Job Execution** (Job Queue)
+   - Scheduled Firebase function (`processDocumentJobs`) runs every 1 minute
+   - Calls `jobProcessorService.processJobs()` to dequeue and execute
+   - For each queued document:
+     - Fetch file from GCS
+     - Update status to 'extracting_text'
+     - Call `unifiedDocumentProcessor.processDocument()`
+
+5. **Document Processing** (Single-Pass Strategy)
+   - **Pass 1 - LLM Extraction:**
+     - `documentAiProcessor.extractText()` (if needed) - Google Document AI OCR
+     - `llmService.processCIMDocument()` - Claude/OpenAI structured extraction
+     - Produces `CIMReview` object with financial, market, management data
+     - Updates document status to 'processing_llm'
+
+   - **Pass 2 - Quality Check:**
+     - `llmService.validateCIMReview()` - Verify completeness and accuracy
+     - Updates status to 'quality_validation'
+
+   - **PDF Generation:**
+     - `pdfGenerationService.generatePDF()` - Puppeteer renders HTML template
+     - Uploads PDF to GCS
+     - Updates status to 'generating_pdf'
+
+   - **Vector Indexing (Background):**
+     - `vectorDatabaseService.createDocumentEmbedding()` - Generate 3072-dim embeddings
+     - Chunk document semantically, store in Supabase with vector index
+     - Status moves to 'vector_indexing' then 'completed'
+
+6. **Result Delivery** (Backend → Frontend)
+   - Frontend polls `GET /documents/{id}` to check completion
+   - When status = 'completed', fetches summary and analysis data
+   - `DocumentViewer` displays results, allows regeneration with feedback
+
+**State Management:**
+- Backend: Document status progresses through `uploading → extracting_text → processing_llm → generating_pdf → vector_indexing → completed` or `failed` at any step
+- Frontend: AuthContext manages user/token, component state tracks selected document and loading states
+- Job Queue: In-memory queue with EventEmitter for state transitions
+
+## Key Abstractions
+
+**Unified Processor:**
+- Purpose: Strategy pattern for document processing (single-pass vs. agentic RAG vs. simple)
+- Examples: `singlePassProcessor`, `simpleDocumentProcessor`, `optimizedAgenticRAGProcessor`
+- Pattern: Pluggable strategies via `ProcessingStrategy` selection in config
+
+**LLM Service:**
+- Purpose: Unified interface for multiple LLM providers with retry logic
+- Examples: `backend/src/services/llmService.ts` (Anthropic, OpenAI, OpenRouter)
+- Pattern: Provider-agnostic API with `processCIMDocument()` returning structured `CIMReview`
+
+**Vector Database Abstraction:**
+- Purpose: PostgreSQL pgvector operations via Supabase for semantic search
+- Examples: `backend/src/services/vectorDatabaseService.ts`
+- Pattern: Embedding + chunking → vector search via cosine similarity
+
+**File Storage Abstraction:**
+- Purpose: Google Cloud Storage operations with signed URLs
+- Examples: `backend/src/services/fileStorageService.ts`
+- Pattern: Signed upload/download URLs for temporary access without IAM burden
+
+**Job Queue Pattern:**
+- Purpose: Async processing with retry and priority handling
+- Examples: `backend/src/services/jobQueueService.ts` (EventEmitter-based)
+- Pattern: Priority queue with exponential backoff retry
+
+## Entry Points
+
+**API Entry Point:**
+- Location: `backend/src/index.ts`
+- Triggers: Process startup or Firebase Functions invocation
+- Responsibilities:
+  - Initialize Express app
+  - Set up middleware (CORS, helmet, rate limiting, authentication)
+  - Register routes (`/documents`, `/vector`, `/monitoring`, `/api/audit`)
+  - Start job queue service
+  - Export Firebase Functions v2 handlers (`api`, `processDocumentJobs`)
+
+**Scheduled Job Processing:**
+- Location: `backend/src/index.ts` (line 252: `processDocumentJobs` function export)
+- Triggers: Firebase Cloud Scheduler every 1 minute
+- Responsibilities:
+  - Health check database connection
+  - Detect stuck jobs (processing > 15 min, pending > 2 min)
+  - Call `jobProcessorService.processJobs()`
+  - Log metrics and errors
+
+**Frontend Entry Point:**
+- Location: `frontend/src/main.tsx`
+- Triggers: Browser navigation
+- Responsibilities:
+  - Initialize React app with AuthProvider
+  - Set up Firebase client
+  - Render routing structure (Login → Dashboard)
+
+**Document Processing Controller:**
+- Location: `backend/src/controllers/documentController.ts`
+- Route: `POST /documents/{id}/process`
+- Responsibilities:
+  - Validate user authentication
+  - Enqueue processing job
+  - Return job ID to client
+
+## Error Handling
+
+**Strategy:** Multi-layer error recovery with structured logging and graceful degradation
+
+**Patterns:**
+- **Retry Logic:** DocumentModel uses exponential backoff (1s → 2s → 4s) for network errors
+- **LLM Retry:** `llmService` retries API calls 3 times with exponential backoff
+- **Firebase Auth Recovery:** `firebaseAuth.ts` attempts session recovery on token verify failure
+- **Job Queue Retry:** Jobs retry up to 3 times with configurable backoff (5s → 300s max)
+- **Structured Error Logging:** All errors include correlation ID, stack trace, and context metadata
+- **Circuit Breaker Pattern:** Database health check in `processDocumentJobs` prevents cascading failures
+
+**Error Boundaries:**
+- Global error handler at end of Express middleware chain (`errorHandler`)
+- Try/catch in all async functions with context-aware logging
+- Unhandled rejection listener at process level (line 24 of `index.ts`)
+
+## Cross-Cutting Concerns
+
+**Logging:**
+- Framework: Winston (json + console in dev)
+- Approach: Structured logger with correlation IDs, Winston transports for error/upload logs
+- Location: `backend/src/utils/logger.ts`
+- Pattern: `logger.info()`, `logger.error()`, `StructuredLogger` for operations
+
+**Validation:**
+- Approach: Joi schema in environment config, Zod for API request/response types
+- Location: `backend/src/config/env.ts`, `backend/src/services/llmSchemas.ts`
+- Pattern: Joi for config, Zod for runtime validation
+
+**Authentication:**
+- Approach: Firebase ID tokens verified via `verifyFirebaseToken` middleware
+- Location: `backend/src/middleware/firebaseAuth.ts`
+- Pattern: Bearer token in Authorization header, cached in req.user
+
+**Correlation Tracking:**
+- Approach: UUID correlation ID added to all requests, propagated through job processing
+- Location: `backend/src/middleware/validation.ts` (addCorrelationId)
+- Pattern: X-Correlation-ID header or generated UUID, included in all logs
+
+---
+
+*Architecture analysis: 2026-02-24*
--- a/.planning/codebase/CONCERNS.md
+++ b/.planning/codebase/CONCERNS.md
@@ -0,0 +1,329 @@
+# Codebase Concerns
+
+**Analysis Date:** 2026-02-24
+
+## Tech Debt
+
+**Console.log Debug Statements in Controllers:**
+- Issue: Excessive `console.log()` calls with emoji prefixes left throughout `documentController.ts` instead of using proper structured logging via Winston logger
+- Files: `backend/src/controllers/documentController.ts` (lines 12-80, multiple scattered instances)
+- Impact: Production logs become noisy and unstructured; debug output leaks to stdout/stderr; makes it harder to parse logs for errors and metrics
+- Fix approach: Replace all `console.log()` calls with `logger.info()`, `logger.debug()`, `logger.error()` via imported `logger` from `utils/logger.ts`. Follow pattern established in other services.
+
+**Incomplete Job Statistics Tracking:**
+- Issue: `jobQueueService.ts` and `jobProcessorService.ts` both have TODO markers indicating completed/failed job counts are not tracked (lines 606-607, 635-636)
+- Files: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`
+- Impact: Job queue health metrics are incomplete; cannot audit success/failure rates; monitoring dashboards will show incomplete data
+- Fix approach: Implement `completedJobs` and `failedJobs` counters in both services using persistent storage or Redis. Update schema if needed.
+
+**Config Migration Debug Cruft:**
+- Issue: Multiple `console.log()` debug statements in `config/env.ts` (lines 23, 46, 51, 292) for Firebase Functions v1→v2 migration are still present
+- Files: `backend/src/config/env.ts`
+- Impact: Production logs polluted with migration warnings; makes it harder to spot real issues; clutters server startup output
+- Fix approach: Remove all `[CONFIG DEBUG]` console.log statements once migration to Firebase Functions v2 is confirmed complete. Wrap remaining fallback logic in logger.debug() if diagnostics needed.
+
+**Hardcoded Processing Strategy:**
+- Issue: Historical commit shows processing strategy was hardcoded, potential for incomplete refactoring
+- Files: `backend/src/services/`, controller logic
+- Impact: May not correctly use configured strategy; processing may default unexpectedly
+- Fix approach: Verify all processing paths read from `config.processingStrategy` and have proper fallback logic
+
+**Type Safety Issues - `any` Type Usage:**
+- Issue: 378 instances of `any` or `unknown` types found across backend TypeScript files
+- Files: Widespread including `optimizedAgenticRAGProcessor.ts:17`, `pdfGenerationService.ts`, `vectorDatabaseService.ts`
+- Impact: Loses type safety guarantees; harder to catch errors at compile time; refactoring becomes risky
+- Fix approach: Gradually replace `any` with proper types. Start with service boundaries and public APIs. Create typed interfaces for common patterns.
+
+## Known Bugs
+
+**Project Panther CIM KPI Missing After Processing:**
+- Symptoms: Document `Project Panther - Confidential Information Memorandum_vBluePoint.pdf` processed but dashboard shows "Not specified in CIM" for Revenue, EBITDA, Employees, Founded even though numeric tables exist in PDF
+- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (dealOverview mapper), processing pipeline
+- Trigger: Process Project Panther test document through full agentic RAG pipeline
+- Impact: Dashboard KPI cards remain empty; users see incomplete summaries
+- Workaround: Manual data entry in dashboard; skip financial summary display for affected documents
+- Fix approach: Trace through `optimizedAgenticRAGProcessor.generateLLMAnalysisMultiPass()` → `dealOverview` mapper. Add regression test for this specific document. Check if structured table extraction is working correctly.
+
+**10+ Minute Processing Latency Regression:**
+- Symptoms: Document `document-55c4a6e2-8c08-4734-87f6-24407cea50ac.pdf` (Project Panther) took ~10 minutes end-to-end despite typical processing being 2-3 minutes
+- Files: `backend/src/services/unifiedDocumentProcessor.ts`, `optimizedAgenticRAGProcessor.ts`, `documentAiProcessor.ts`, `llmService.ts`
+- Trigger: Large or complex CIM documents (30+ pages with tables)
+- Impact: Users experience timeouts; processing approaching or exceeding 14-minute Firebase Functions limit
+- Workaround: None currently; document fails to process if latency exceeds timeout
+- Fix approach: Instrument each pipeline phase (PDF chunking, Document AI extraction, RAG passes, financial parser) with timing logs. Identify bottleneck(s). Profile GCS upload retries, Anthropic fallbacks. Consider parallel multi-pass queries within quota limits.
+
+**Vector Search Timeouts After Index Growth:**
+- Symptoms: Supabase vector search RPC calls timeout after 30 seconds; fallback to document-scoped search with limited results
+- Files: `backend/src/services/vectorDatabaseService.ts` (lines 122-182)
+- Trigger: Large embedded document collections (1000+ chunks); similarity search under load
+- Impact: Retrieval quality degrades as index grows; fallback search returns fewer contextual chunks; RAG quality suffers
+- Workaround: Fallback query uses document-scoped filtering and direct embedding lookup
+- Fix approach: Implement query batching, result caching by content hash, or query optimization. Consider Pinecone migration if Supabase vector performance doesn't improve. Add metrics to track timeout frequency.
+
+## Security Considerations
+
+**Unencrypted Debug Logs in Production:**
+- Risk: Sensitive document content, user IDs, and processing details may be exposed in logs if debug mode enabled in production
+- Files: `backend/src/middleware/firebaseAuth.ts` (AUTH_DEBUG flag), `backend/src/config/env.ts`, `backend/src/controllers/documentController.ts`
+- Current mitigation: Debug logging controlled by `AUTH_DEBUG` environment variable; not enabled by default
+- Recommendations:
+  1. Ensure `AUTH_DEBUG` is never set to `true` in production
+  2. Implement log redaction middleware to strip PII (API keys, document content, user data)
+  3. Use correlation IDs instead of logging full request bodies
+  4. Add log level enforcement (error/warn only in production)
+
+**Hardcoded Service Account Credentials Path:**
+- Risk: If service account key JSON is accidentally committed or exposed, attacker gains full GCS and Document AI access
+- Files: `backend/src/config/env.ts`, `backend/src/utils/googleServiceAccount.ts`
+- Current mitigation: `.env` file in `.gitignore`; credentials path via env var
+- Recommendations:
+  1. Use Firebase Function secrets (defineSecret()) instead of env files
+  2. Implement credential rotation policy
+  3. Add pre-commit hook to prevent `.json` key files in commits
+  4. Audit GCS bucket permissions quarterly
+
+**Concurrent LLM Rate Limiting Insufficient:**
+- Risk: Although `llmService.ts` limits concurrent calls to 1 (line 52), burst requests could still trigger Anthropic 429 rate limit errors during high load
+- Files: `backend/src/services/llmService.ts` (MAX_CONCURRENT_LLM_CALLS = 1)
+- Current mitigation: Max 1 concurrent call; retry with exponential backoff (3 attempts)
+- Recommendations:
+  1. Consider reducing to 0.5 concurrent calls (queue instead of async) during peak hours
+  2. Add request batching for multi-pass analysis
+  3. Implement circuit breaker pattern for cascading failures
+  4. Monitor token spend and throttle proactively
+
+**No Request Rate Limiting on Upload Endpoint:**
+- Risk: Unauthenticated attackers could flood `/upload/url` endpoint to exhaust quota or fill storage
+- Files: `backend/src/controllers/documentController.ts` (getUploadUrl endpoint), `backend/src/routes/documents.ts`
+- Current mitigation: Firebase Auth check; file size limit enforced
+- Recommendations:
+  1. Add rate limiter middleware (e.g., express-rate-limit) with per-user quotas
+  2. Implement request signing for upload URLs
+  3. Add CORS restrictions to known frontend domains
+  4. Monitor upload rate and alert on anomalies
+
+## Performance Bottlenecks
+
+**Large File PDF Chunking Memory Usage:**
+- Problem: Documents larger than 50 MB may cause OOM errors during chunking; no memory limit guards
+- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (line 35, 4000-char chunks), `backend/src/services/unifiedDocumentProcessor.ts`
+- Cause: Entire document text loaded into memory before chunking; large overlap between chunks multiplies footprint
+- Improvement path:
+  1. Implement streaming chunk processing from GCS (read chunks, embed, write to DB before next chunk)
+  2. Reduce overlap from 200 to 100 characters or make dynamic based on document size
+  3. Add memory threshold checks; fail early with user-friendly error if approaching limit
+  4. Profile heap usage in tests with 50+ MB documents
+
+**Embedding Generation for Large Documents:**
+- Problem: Embedding 1000+ chunks sequentially takes 2-3 minutes; no concurrency despite `maxConcurrentEmbeddings = 5` setting
+- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (lines 37, 172-180 region)
+- Cause: Batch size of 10 may be inefficient; OpenAI/Anthropic API concurrency not fully utilized
+- Improvement path:
+  1. Increase batch size to 25-50 chunks per concurrent request (test quota limits)
+  2. Use Promise.all() instead of sequential embedding calls
+  3. Cache embeddings by content hash to skip re-embedding on retries
+  4. Add progress callback to track batch completion
+
+**Multiple LLM Retries on Network Failure:**
+- Problem: 3 retry attempts for each LLM call with exponential backoff means up to 30+ seconds per call; multi-pass analysis does 3+ passes
+- Files: `backend/src/services/llmService.ts` (retry logic, lines 320+), `backend/src/services/optimizedAgenticRAGProcessor.ts` (line 83 multi-pass)
+- Cause: No circuit breaker; all retries execute even if service degraded
+- Improvement path:
+  1. Track consecutive failures; disable retries if failure rate >50% in last minute
+  2. Use adaptive retry backoff (double wait time only after first failure)
+  3. Implement multi-pass fallback: if Pass 2 fails, use Pass 1 results instead of failing entire document
+  4. Add metrics endpoint to show retry frequency and success rates
+
+**PDF Generation Memory Leak with Puppeteer Page Pool:**
+- Problem: Page pool in `pdfGenerationService.ts` may not properly release browser resources; max pool size 5 but no eviction policy
+- Files: `backend/src/services/pdfGenerationService.ts` (lines 66-71, page pool)
+- Cause: Pages may not be closed if PDF generation errors mid-stream; no cleanup on timeout
+- Improvement path:
+  1. Implement LRU eviction: close oldest page if pool reaches max size
+  2. Add page timeout with forced close after 30s
+  3. Add memory monitoring; close all pages if heap >500MB
+  4. Log page pool stats every 5 minutes to detect leaks
+
+## Fragile Areas
+
+**Job Queue State Machine:**
+- Files: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`, `backend/src/models/ProcessingJobModel.ts`
+- Why fragile:
+  1. Job status transitions (pending → processing → completed) not atomic; race condition if two workers pick same job
+  2. Stuck job detection relies on timestamp comparison; clock skew or server restart breaks detection
+  3. No idempotency tokens; job retry on network error could trigger duplicate processing
+- Safe modification:
+  1. Add database-level unique constraint on job ID + processing timestamp
+  2. Use database transactions for status updates
+  3. Implement idempotency with request deduplication ID
+- Test coverage:
+  1. No unit tests found for concurrent job processing scenario
+  2. No integration tests with actual database
+  3. Add tests for: concurrent workers, stuck job reset, duplicate submissions
+
+**Document Processing Pipeline Error Handling:**
+- Files: `backend/src/controllers/documentController.ts` (lines 200+), `backend/src/services/unifiedDocumentProcessor.ts`
+- Why fragile:
+  1. Hybrid approach tries job queue then fallback to immediate processing; error in job queue doesn't fully propagate
+  2. Document status not updated if processing fails mid-pipeline (remains 'processing_llm')
+  3. No compensating transaction to roll back partial results
+- Safe modification:
+  1. Separate job submission from immediate processing; always update document status atomically
+  2. Add processing stage tracking (document_ai → chunking → embedding → llm → pdf)
+  3. Implement rollback logic: delete chunks and embeddings if LLM stage fails
+- Test coverage:
+  1. Add tests for each pipeline stage failure
+  2. Test document status consistency after each failure
+  3. Add integration test with network failure injection
+
+**Vector Database Search Fallback Chain:**
+- Files: `backend/src/services/vectorDatabaseService.ts` (lines 110-182)
+- Why fragile:
+  1. Three-level fallback (RPC search → document-scoped search → direct lookup) masks underlying issues
+  2. If Supabase RPC is degraded, system degrades silently instead of alerting
+  3. Fallback search may return stale or incorrect results without indication
+- Safe modification:
+  1. Add circuit breaker: if timeout happens 3x in 5 minutes, stop trying RPC search
+  2. Return metadata flag indicating which fallback was used (for logging/debugging)
+  3. Add explicit timeout wrapped in try/catch, not via Promise.race() (cleaner code)
+- Test coverage:
+  1. Mock Supabase timeout at each RPC level
+  2. Verify correct fallback is triggered
+  3. Add performance benchmarks for each search method
+
+**Config Initialization Race Condition:**
+- Files: `backend/src/config/env.ts` (lines 15-52)
+- Why fragile:
+  1. Firebase Functions v1 fallback (`functions.config()`) may not be thread-safe
+  2. If multiple instances start simultaneously, config merge may be incomplete
+  3. No validation that config merge was successful
+- Safe modification:
+  1. Remove v1 fallback entirely; require explicit Firebase Functions v2 setup
+  2. Validate all critical env vars before allowing service startup
+  3. Fail fast with clear error message if required vars missing
+- Test coverage:
+  1. Add test for missing required env vars
+  2. Test with incomplete config to verify error message clarity
+
+## Scaling Limits
+
+**Supabase Concurrent Vector Search Connections:**
+- Current capacity: RPC timeout 30 seconds; Supabase connection pool typically 100 max
+- Limit: With 3 concurrent workers × multiple users, could exhaust connection pool during peak load
+- Scaling path:
+  1. Implement connection pooling via PgBouncer (already in Supabase Pro tier)
+  2. Reduce timeout from 30s to 10s; fail faster and retry
+  3. Migrate to Pinecone if vector search becomes >30% of workload
+
+**Firebase Functions Timeout (14 minutes):**
+- Current capacity: Serverless function execution up to 15 minutes (1 minute buffer before hard timeout)
+- Limit: Document processing hitting ~10 minutes; adding new features could exceed limit
+- Scaling path:
+  1. Move processing to Cloud Run (1 hour limit) for large documents
+  2. Implement processing timeout failover: if approach 12 minutes, checkpoint and requeue
+  3. Add background worker pool for long-running jobs (separate from request path)
+
+**LLM API Rate Limits (Anthropic/OpenAI):**
+- Current capacity: 1 concurrent call; 3 retries per call; no per-minute or per-second throttling beyond single-call serialization
+- Limit: Burst requests from multiple users could trigger 429 rate limit errors
+- Scaling path:
+  1. Negotiate higher rate limits with API providers
+  2. Implement request queuing with exponential backoff per user
+  3. Add cost monitoring and soft-limit alerts (warn at 80% of quota)
+
+**PDF Generation Browser Pool:**
+- Current capacity: 5 browser pages maximum
+- Limit: With 3+ concurrent document processing jobs, pool contention causes delays (queue wait time)
+- Scaling path:
+  1. Increase pool size to 10 (requires more memory)
+  2. Move PDF generation to separate worker queue (decouple from request path)
+  3. Implement adaptive pool sizing based on available memory
+
+**GCS Upload/Download Throughput:**
+- Current capacity: Single-threaded upload/download; file transfer waits on GCS API latency
+- Limit: Large documents (50+ MB) may timeout or be slow
+- Scaling path:
+  1. Implement resumable uploads with multi-part chunks
+  2. Add parallel chunk uploads for files >10 MB
+  3. Cache frequently accessed documents in Redis
+
+## Dependencies at Risk
+
+**Firebase Functions v1 Deprecation (EOL Dec 31, 2025):**
+- Risk: Runtime will be decommissioned; Node.js 20 support ending Oct 30, 2026 (warning already surfaced)
+- Impact: Functions will stop working after deprecation date; forced migration required
+- Migration plan:
+  1. Migrate to Firebase Functions v2 runtime (already partially done; fallback code still present)
+  2. Update `firebase-functions` package to latest major version
+  3. Remove deprecated `functions.config()` fallback once migration confirmed
+  4. Test all functions after upgrade
+
+**Puppeteer Version Pinning:**
+- Risk: Puppeteer has frequent security updates; pinned version likely outdated
+- Impact: Browser vulnerabilities in PDF generation; potential sandbox bypass
+- Migration plan:
+  1. Audit current Puppeteer version in `package.json`
+  2. Test upgrade path (may have breaking API changes)
+  3. Implement automated dependency security scanning
+
+**Document AI API Versioning:**
+- Risk: Google Cloud Document AI API may deprecate current processor version
+- Impact: Processing pipeline breaks if processor ID no longer valid
+- Migration plan:
+  1. Document current processor version and creation date
+  2. Subscribe to Google Cloud deprecation notices
+  3. Add feature flag to switch processor versions
+  4. Test new processor version before migration
+
+## Missing Critical Features
+
+**Job Processing Observability:**
+- Problem: No metrics for job success rate, average processing time per stage, or failure breakdown by error type
+- Blocks: Cannot diagnose performance regressions; cannot identify bottlenecks
+- Implementation: Add `/health/agentic-rag` endpoint exposing per-pass timing, token usage, cost data
+
+**Document Version History:**
+- Problem: Processing pipeline overwrites `analysis_data` on each run; no ability to compare old vs. new results
+- Blocks: Cannot detect if new model version improves accuracy; hard to debug regression
+- Implementation: Add `document_versions` table; keep historical results; implement diff UI
+
+**Retry Mechanism for Failed Documents:**
+- Problem: Failed documents stay in failed state; no way to retry after infrastructure recovers
+- Blocks: User must re-upload document; processing failures are permanent per upload
+- Implementation: Add "Retry" button to failed document status; re-queue without user re-upload
+
+## Test Coverage Gaps
+
+**End-to-End Pipeline with Large Documents:**
+- What's not tested: Full processing pipeline with 50+ MB documents; covers PDF chunking, Document AI extraction, embeddings, LLM analysis, PDF generation
+- Files: No integration test covering full flow with large fixture
+- Risk: Cannot detect if scaling to large documents introduces timeouts or memory issues
+- Priority: High (Project Panther regression was not caught by tests)
+
+**Concurrent Job Processing:**
+- What's not tested: Multiple jobs submitted simultaneously; verify no race conditions in job queue or database
+- Files: `backend/src/services/jobQueueService.ts`, `backend/src/models/ProcessingJobModel.ts`
+- Risk: Race condition causes duplicate processing or lost job state in production
+- Priority: High (affects reliability)
+
+**Vector Database Fallback Scenarios:**
+- What's not tested: Simulate Supabase RPC timeout and verify correct fallback search is executed
+- Files: `backend/src/services/vectorDatabaseService.ts` (lines 110-182)
+- Risk: Fallback search silent failures or incorrect results not detected
+- Priority: Medium (affects search quality)
+
+**LLM API Provider Switching:**
+- What's not tested: Switch between Anthropic, OpenAI, OpenRouter; verify each provider works correctly
+- Files: `backend/src/services/llmService.ts` (provider selection logic)
+- Risk: Provider-specific bugs not caught until production usage
+- Priority: Medium (currently only Anthropic heavily used)
+
+**Error Propagation in Hybrid Processing:**
+- What's not tested: Job queue failure → immediate processing fallback; verify document status and error reporting
+- Files: `backend/src/controllers/documentController.ts` (lines 200+)
+- Risk: Silent failures or incorrect status updates if fallback error not properly handled
+- Priority: High (affects user experience)
+
+---
+
+*Concerns audit: 2026-02-24*
--- a/.planning/codebase/CONVENTIONS.md
+++ b/.planning/codebase/CONVENTIONS.md
@@ -0,0 +1,286 @@
+# Coding Conventions
+
+**Analysis Date:** 2026-02-24
+
+## Naming Patterns
+
+**Files:**
+- Backend service files: `camelCase.ts` (e.g., `llmService.ts`, `unifiedDocumentProcessor.ts`, `vectorDatabaseService.ts`)
+- Backend middleware/controllers: `camelCase.ts` (e.g., `errorHandler.ts`, `firebaseAuth.ts`)
+- Frontend components: `PascalCase.tsx` (e.g., `DocumentUpload.tsx`, `LoginForm.tsx`, `ProtectedRoute.tsx`)
+- Frontend utility files: `camelCase.ts` (e.g., `cn.ts` for class name utilities)
+- Type definition files: `camelCase.ts` with `.d.ts` suffix optional (e.g., `express.d.ts`)
+- Model files: `PascalCase.ts` in `backend/src/models/` (e.g., `DocumentModel.ts`)
+- Config files: `camelCase.ts` (e.g., `env.ts`, `firebase.ts`, `supabase.ts`)
+
+**Functions:**
+- Both backend and frontend use camelCase: `processDocument()`, `validateUUID()`, `handleUpload()`
+- React components are PascalCase: `DocumentUpload`, `ErrorHandler`
+- Handler functions use `handle` or verb prefix: `handleVisibilityChange()`, `onDrop()`
+- Async functions use descriptive names: `fetchDocuments()`, `uploadDocument()`, `processDocument()`
+
+**Variables:**
+- camelCase for all variables: `documentId`, `correlationId`, `isUploading`, `uploadedFiles`
+- Constant state use UPPER_SNAKE_CASE in rare cases: `MAX_CONCURRENT_LLM_CALLS`, `MAX_TOKEN_LIMITS`
+- Boolean prefixes: `is*` (isUploading, isAdmin), `has*` (hasError), `can*` (canProcess)
+
+**Types:**
+- Interfaces use PascalCase: `LLMRequest`, `UploadedFile`, `DocumentUploadProps`, `CIMReview`
+- Type unions use PascalCase: `ErrorCategory`, `ProcessingStrategy`
+- Generic types use single uppercase letter or descriptive name: `T`, `K`, `V`
+- Enum values use UPPER_SNAKE_CASE: `ErrorCategory.VALIDATION`, `ErrorCategory.AUTHENTICATION`
+
+**Interfaces vs Types:**
+- **Interfaces** for object shapes that represent entities or components: `interface Document`, `interface UploadedFile`
+- **Types** for unions, primitives, and specialized patterns: `type ProcessingStrategy = 'document_ai_agentic_rag' | 'simple_full_document'`
+
+## Code Style
+
+**Formatting:**
+- No formal Prettier config detected in repo (allow varied formatting)
+- 2-space indentation (observed in TypeScript files)
+- Semicolons required at end of statements
+- Single quotes for strings in TypeScript, double quotes in JSX attributes
+- Line length: preferably under 100 characters but not enforced
+
+**Linting:**
+- Tool: ESLint with TypeScript support
+- Config: `.eslintrc.js` in backend
+- Key rules:
+  - `@typescript-eslint/no-unused-vars`: error (allows leading underscore for intentionally unused)
+  - `@typescript-eslint/no-explicit-any`: warn (use `unknown` instead)
+  - `@typescript-eslint/no-non-null-assertion`: warn (use proper type guards)
+  - `no-console`: off in backend (logging used via Winston)
+  - `no-undef`: error (strict undefined checking)
+- Frontend ESLint ignores unused disable directives and has max-warnings: 0
+
+**TypeScript Standards:**
+- Strict mode not fully enabled (noImplicitAny disabled in tsconfig.json for legacy reasons)
+- Prefer explicit typing over `any`: use `unknown` when type is truly unknown
+- Type guards required for safety checks: `error instanceof Error ? error.message : String(error)`
+- No type assertions with `as` for complex types; use proper type narrowing
+
+## Import Organization
+
+**Order:**
+1. External framework/library imports (`express`, `react`, `winston`)
+2. Google Cloud/Firebase imports (`@google-cloud/storage`, `firebase-admin`)
+3. Third-party service imports (`axios`, `zod`, `joi`)
+4. Internal config imports (`'../config/env'`, `'../config/firebase'`)
+5. Internal utility imports (`'../utils/logger'`, `'../utils/cn'`)
+6. Internal model imports (`'../models/DocumentModel'`)
+7. Internal service imports (`'../services/llmService'`)
+8. Internal middleware/helper imports (`'../middleware/errorHandler'`)
+9. Type-only imports at the end: `import type { ProcessingStrategy } from '...'`
+
+**Examples:**
+
+Backend service pattern from `optimizedAgenticRAGProcessor.ts`:
+```typescript
+import { logger } from '../utils/logger';
+import { vectorDatabaseService } from './vectorDatabaseService';
+import { VectorDatabaseModel } from '../models/VectorDatabaseModel';
+import { llmService } from './llmService';
+import { CIMReview } from './llmSchemas';
+import { config } from '../config/env';
+import type { ParsedFinancials } from './financialTableParser';
+import type { StructuredTable } from './documentAiProcessor';
+```
+
+Frontend component pattern from `DocumentList.tsx`:
+```typescript
+import React from 'react';
+import {
+  FileText,
+  Eye,
+  Download,
+  Trash2,
+  Calendar,
+  User,
+  Clock
+} from 'lucide-react';
+import { cn } from '../utils/cn';
+```
+
+**Path Aliases:**
+- No @ alias imports detected; all use relative `../` patterns
+- Monorepo structure: frontend and backend in separate directories with independent module resolution
+
+## Error Handling
+
+**Patterns:**
+
+1. **Structured Error Objects with Categories:**
+   - Use `ErrorCategory` enum for classification: `VALIDATION`, `AUTHENTICATION`, `AUTHORIZATION`, `NOT_FOUND`, `EXTERNAL_SERVICE`, `PROCESSING`, `DATABASE`, `SYSTEM`
+   - Attach `AppError` interface properties: `statusCode`, `isOperational`, `code`, `correlationId`, `category`, `retryable`, `context`
+   - Example from `errorHandler.ts`:
+     ```typescript
+     const enhancedError: AppError = {
+       category: ErrorCategory.VALIDATION,
+       statusCode: 400,
+       code: 'INVALID_UUID_FORMAT',
+       retryable: false
+     };
+     ```
+
+2. **Try-Catch with Structured Logging:**
+   - Always catch errors with explicit type checking
+   - Log with structured data including correlation ID
+   - Example pattern:
+     ```typescript
+     try {
+       await operation();
+     } catch (error) {
+       logger.error('Operation failed', {
+         error: error instanceof Error ? error.message : String(error),
+         stack: error instanceof Error ? error.stack : undefined,
+         context: { documentId, userId }
+       });
+       throw error;
+     }
+     ```
+
+3. **HTTP Response Pattern:**
+   - Success responses: `{ success: true, data: {...} }`
+   - Error responses: `{ success: false, error: { code, message, details, correlationId, timestamp, retryable } }`
+   - User-friendly messages mapped by error category
+   - Include `X-Correlation-ID` header in responses
+
+4. **Retry Logic:**
+   - LLM service implements concurrency limiting: max 1 concurrent call to prevent rate limits
+   - 3 retry attempts for LLM API calls with exponential backoff (see `llmService.ts` lines 236-450)
+   - Jobs respect 14-minute timeout limit with graceful status updates
+
+5. **External Service Errors:**
+   - Firebase Auth errors: extract from `error.message` and `error.name` (TokenExpiredError, JsonWebTokenError)
+   - Supabase errors: check `error.code` and `error.message`, handle UUID validation errors
+   - GCS errors: extract from error objects with proper null checks
+
+## Logging
+
+**Framework:** Winston logger from `backend/src/utils/logger.ts`
+
+**Levels:**
+- `logger.debug()`: Detailed diagnostic info (disabled in production)
+- `logger.info()`: Normal operation information, upload start/completion, processing status
+- `logger.warn()`: Warning conditions, CORS rejections, non-critical issues
+- `logger.error()`: Error conditions with full context and stack traces
+
+**Structured Logging Pattern:**
+```typescript
+logger.info('Message', {
+  correlationId: correlationId,
+  category: 'operation_type',
+  operation: 'specific_action',
+  documentId: documentId,
+  userId: userId,
+  metadata: value,
+  timestamp: new Date().toISOString()
+});
+```
+
+**StructuredLogger Class:**
+- Use for operations requiring correlation ID tracking
+- Constructor: `const logger = new StructuredLogger(correlationId)`
+- Specialized methods:
+  - `uploadStart()`, `uploadSuccess()`, `uploadError()` - for file operations
+  - `processingStart()`, `processingSuccess()`, `processingError()` - for document processing
+  - `storageOperation()` - for file storage operations
+  - `jobQueueOperation()` - for background jobs
+  - `info()`, `warn()`, `error()`, `debug()` - general logging
+- All methods automatically attach correlation ID to metadata
+
+**What NOT to Log:**
+- Credentials, API keys, or sensitive data
+- Large file contents or binary data
+- User passwords or tokens (log only presence: "token available" or "NO_TOKEN")
+- Request body contents (sanitized in error handler - only whitelisted fields: documentId, id, status, fileName, fileSize, contentType, correlationId)
+
+**Console Usage:**
+- Backend: `console.log` disabled by ESLint in production code; only Winston logger used
+- Frontend: `console.log` used in development (observed in DocumentUpload, App components)
+- Special case: logger initialization may use console.warn for setup diagnostics
+
+## Comments
+
+**When to Comment:**
+- Complex algorithms or business logic: explain "why", not "what" the code does
+- Non-obvious type conversions or workarounds
+- Links to related issues, tickets, or documentation
+- Critical security considerations or performance implications
+- TODO items for incomplete work (format: `// TODO: [description]`)
+
+**JSDoc/TSDoc:**
+- Used for function and class documentation in utility and service files
+- Function signature example from `test-helpers.ts`:
+  ```typescript
+  /**
+   * Creates a mock correlation ID for testing
+   */
+  export function createMockCorrelationId(): string
+  ```
+- Parameter and return types documented via TypeScript typing (preferred over verbose JSDoc)
+- Service classes include operation summaries: `/** Process document using Document AI + Agentic RAG strategy */`
+
+## Function Design
+
+**Size:**
+- Keep functions focused on single responsibility
+- Long services (300+ lines) separate concerns into helper methods
+- Controller/middleware functions stay under 50 lines
+
+**Parameters:**
+- Max 3-4 required parameters; use object for additional config
+- Example: `processDocument(documentId: string, userId: string, text: string, options?: { strategy?: string })`
+- Use destructuring for config objects: `{ strategy, maxTokens, temperature }`
+
+**Return Values:**
+- Async operations return Promise with typed success/error objects
+- Pattern: `Promise<{ success: boolean; data: T; error?: string }>`
+- Avoid throwing in service methods; return error in object
+- Controllers/middleware can throw for Express error handler
+
+**Type Signatures:**
+- Always specify parameter and return types (no implicit `any`)
+- Use generics for reusable patterns: `Promise<T>`, `Array<Document>`
+- Union types for multiple possibilities: `'uploading' | 'uploaded' | 'processing' | 'completed' | 'error'`
+
+## Module Design
+
+**Exports:**
+- Services exported as singleton instances: `export const llmService = new LLMService()`
+- Utility functions exported as named exports: `export function validateUUID() { ... }`
+- Type definitions exported from dedicated type files or alongside implementation
+- Classes exported as default or named based on usage pattern
+
+**Barrel Files:**
+- Not consistently used; services import directly from implementation files
+- Example: `import { llmService } from './llmService'` not from `./services/index`
+- Consider adding for cleaner imports when services directory grows
+
+**Service Singletons:**
+- All services instantiated once and exported as singletons
+- Examples:
+  - `backend/src/services/llmService.ts`: `export const llmService = new LLMService()`
+  - `backend/src/services/fileStorageService.ts`: `export const fileStorageService = new FileStorageService()`
+  - `backend/src/services/vectorDatabaseService.ts`: `export const vectorDatabaseService = new VectorDatabaseService()`
+- Prevents multiple initialization and enables dependency sharing
+
+**Frontend Context Pattern:**
+- React Context for auth: `AuthContext` exports `useAuth()` hook
+- Services pattern: `documentService` contains API methods, used as singleton
+- No service singletons in frontend (class instances recreated as needed)
+
+## Deprecated Patterns (DO NOT USE)
+
+- ❌ Direct PostgreSQL connections - Use Supabase client instead
+- ❌ JWT authentication - Use Firebase Auth tokens
+- ❌ `console.log` in production code - Use Winston logger
+- ❌ Type assertions with `as` for complex types - Use type guards
+- ❌ Manual error handling without correlation IDs
+- ❌ Redis caching - Not used in current architecture
+- ❌ Jest testing - Use Vitest instead
+
+---
+
+*Convention analysis: 2026-02-24*
--- a/.planning/codebase/INTEGRATIONS.md
+++ b/.planning/codebase/INTEGRATIONS.md
@@ -0,0 +1,247 @@
+# External Integrations
+
+**Analysis Date:** 2026-02-24
+
+## APIs & External Services
+
+**Document Processing:**
+- Google Document AI
+  - Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
+  - Client: `@google-cloud/documentai` 9.3.0
+  - Implementation: `backend/src/services/documentAiProcessor.ts`
+  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS` or default credentials
+  - Configuration: Processor ID from `DOCUMENT_AI_PROCESSOR_ID`, location from `DOCUMENT_AI_LOCATION` (default: 'us')
+  - Max pages per chunk: 15 pages (configurable)
+
+**Large Language Models:**
+- OpenAI
+  - Purpose: LLM analysis of document content, embeddings for vector search
+  - SDK/Client: `openai` 5.10.2
+  - Auth: API key from `OPENAI_API_KEY`
+  - Models: Default `gpt-4-turbo`, embeddings via `text-embedding-3-small`
+  - Implementation: `backend/src/services/llmService.ts` with provider abstraction
+  - Retry: 3 attempts with exponential backoff
+
+- Anthropic Claude
+  - Purpose: LLM analysis and document summary generation
+  - SDK/Client: `@anthropic-ai/sdk` 0.57.0
+  - Auth: API key from `ANTHROPIC_API_KEY`
+  - Models: Default `claude-sonnet-4-20250514` (configurable via `LLM_MODEL`)
+  - Implementation: `backend/src/services/llmService.ts`
+  - Concurrency: Max 1 concurrent LLM call to prevent rate limiting (Anthropic 429 errors)
+  - Retry: 3 attempts with exponential backoff
+
+- OpenRouter
+  - Purpose: Alternative LLM provider supporting multiple models through single API
+  - SDK/Client: HTTP requests via `axios` to OpenRouter API
+  - Auth: `OPENROUTER_API_KEY` or optional Bring-Your-Own-Key mode (`OPENROUTER_USE_BYOK`)
+  - Configuration: `LLM_PROVIDER: 'openrouter'` activates this provider
+  - Implementation: `backend/src/services/llmService.ts`
+
+**File Storage:**
+- Google Cloud Storage (GCS)
+  - Purpose: Store uploaded PDFs, processed documents, and generated PDFs
+  - SDK/Client: `@google-cloud/storage` 7.16.0
+  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS`
+  - Buckets:
+    - Input: `GCS_BUCKET_NAME` for uploaded documents
+    - Output: `DOCUMENT_AI_OUTPUT_BUCKET_NAME` for processing results
+  - Implementation: `backend/src/services/fileStorageService.ts` and `backend/src/services/documentAiProcessor.ts`
+  - Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
+
+## Data Storage
+
+**Databases:**
+- Supabase PostgreSQL
+  - Connection: `SUPABASE_URL` for PostgREST API, `DATABASE_URL` for direct PostgreSQL
+  - Client: `@supabase/supabase-js` 2.53.0 for REST API, `pg` 8.11.3 for direct pool connections
+  - Auth: `SUPABASE_ANON_KEY` for client operations, `SUPABASE_SERVICE_KEY` for server operations
+  - Implementation:
+    - `backend/src/config/supabase.ts` - Client initialization with 30-second request timeout
+    - `backend/src/models/` - All data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
+  - Vector Support: pgvector extension for semantic search
+  - Tables:
+    - `users` - User accounts and authentication data
+    - `documents` - CIM documents with status tracking
+    - `document_chunks` - Text chunks with embeddings for vector search
+    - `document_feedback` - User feedback on summaries
+    - `document_versions` - Document version history
+    - `document_audit_logs` - Audit trail for compliance
+    - `processing_jobs` - Background job queue with status tracking
+    - `performance_metrics` - System performance data
+  - Connection pooling: Max 5 connections, 30-second idle timeout, 2-second connection timeout
+
+**Vector Database:**
+- Supabase pgvector (built into PostgreSQL)
+  - Purpose: Semantic search and RAG context retrieval
+  - Implementation: `backend/src/services/vectorDatabaseService.ts`
+  - Embedding generation: Via OpenAI `text-embedding-3-small` (embedded in service)
+  - Search: Cosine similarity via Supabase RPC calls
+  - Semantic cache: 1-hour TTL for cached embeddings
+
+**File Storage:**
+- Google Cloud Storage (primary storage above)
+- Local filesystem (fallback for development, stored in `uploads/` directory)
+
+**Caching:**
+- In-memory semantic cache (Supabase vector embeddings) with 1-hour TTL
+- No external cache service (Redis, Memcached) currently used
+
+## Authentication & Identity
+
+**Auth Provider:**
+- Firebase Authentication
+  - Purpose: User authentication, JWT token generation and verification
+  - Client: `firebase` 12.0.0 (frontend at `frontend/src/config/firebase.ts`)
+  - Admin: `firebase-admin` 13.4.0 (backend at `backend/src/config/firebase.ts`)
+  - Implementation:
+    - Frontend: `frontend/src/services/authService.ts` - Login, logout, token refresh
+    - Backend: `backend/src/middleware/firebaseAuth.ts` - Token verification middleware
+  - Project: `cim-summarizer` (hardcoded in config)
+  - Flow: User logs in with Firebase, receives ID token, frontend sends token in Authorization header
+
+**Token-Based Auth:**
+- JWT (JSON Web Tokens)
+  - Purpose: API request authentication
+  - Implementation: `backend/src/middleware/firebaseAuth.ts`
+  - Verification: Firebase Admin SDK verifies token signature and expiration
+  - Header: `Authorization: Bearer <token>`
+
+**Fallback Auth (for service-to-service):**
+- API Key based (not currently exposed but framework supports it in `backend/src/config/env.ts`)
+
+## Monitoring & Observability
+
+**Error Tracking:**
+- No external error tracking service configured
+- Errors logged via Winston logger with correlation IDs for tracing
+
+**Logs:**
+- Winston logger 3.11.0 - Structured JSON logging at `backend/src/utils/logger.ts`
+- Transports: Console (development), File-based for production logs
+- Correlation ID middleware at `backend/src/middleware/errorHandler.ts` - Every request traced
+- Request logging: Morgan 1.10.0 with Winston transport
+- Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments
+
+**Monitoring Endpoints:**
+- `GET /health` - Basic health check with uptime and environment info
+- `GET /health/config` - Configuration validation status
+- `GET /health/agentic-rag` - Agentic RAG system health (placeholder)
+- `GET /monitoring/dashboard` - Aggregated system metrics (queryable by time range)
+
+## CI/CD & Deployment
+
+**Hosting:**
+- **Backend**:
+  - Firebase Cloud Functions (default, Node.js 20 runtime)
+  - Google Cloud Run (alternative containerized deployment)
+  - Configuration: `backend/firebase.json` defines function source, runtime, and predeploy hooks
+
+- **Frontend**:
+  - Firebase Hosting (CDN-backed static hosting)
+  - Configuration: Defined in `frontend/` directory with `firebase.json`
+
+**Deployment Commands:**
+```bash
+# Backend deployment
+npm run deploy:firebase          # Deploy functions to Firebase
+npm run deploy:cloud-run        # Deploy to Cloud Run
+npm run docker:build            # Build Docker image
+npm run docker:push             # Push to GCR
+
+# Frontend deployment
+npm run deploy:firebase         # Deploy to Firebase Hosting
+npm run deploy:preview          # Deploy to preview channel
+
+# Emulator
+npm run emulator                # Run Firebase emulator locally
+npm run emulator:ui             # Run emulator with UI
+```
+
+**Build Pipeline:**
+- TypeScript compilation: `tsc` targets ES2020
+- Predeploy: Defined in `firebase.json` - runs `npm run build`
+- Docker image for Cloud Run: `Dockerfile` in backend root
+
+## Environment Configuration
+
+**Required env vars (Production):**
+```
+NODE_ENV=production
+LLM_PROVIDER=anthropic
+GCLOUD_PROJECT_ID=cim-summarizer
+DOCUMENT_AI_PROCESSOR_ID=<processor-id>
+GCS_BUCKET_NAME=<bucket-name>
+DOCUMENT_AI_OUTPUT_BUCKET_NAME=<output-bucket>
+SUPABASE_URL=https://<project>.supabase.co
+SUPABASE_ANON_KEY=<anon-key>
+SUPABASE_SERVICE_KEY=<service-key>
+DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
+ANTHROPIC_API_KEY=sk-ant-...
+OPENAI_API_KEY=sk-...
+FIREBASE_PROJECT_ID=cim-summarizer
+```
+
+**Optional env vars:**
+```
+DOCUMENT_AI_LOCATION=us
+VECTOR_PROVIDER=supabase
+LLM_MODEL=claude-sonnet-4-20250514
+LLM_MAX_TOKENS=16000
+LLM_TEMPERATURE=0.1
+OPENROUTER_API_KEY=<key>
+OPENROUTER_USE_BYOK=true
+GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
+```
+
+**Secrets location:**
+- Development: `.env` file (gitignored, never committed)
+- Production: Firebase Functions secrets via `firebase functions:secrets:set`
+- Google Credentials: `backend/serviceAccountKey.json` for local dev, service account in Cloud Functions environment
+
+## Webhooks & Callbacks
+
+**Incoming:**
+- No external webhooks currently configured
+- All document processing triggered by HTTP POST to `POST /documents/upload`
+
+**Outgoing:**
+- No outgoing webhooks implemented
+- Document processing is synchronous (within 14-minute Cloud Function timeout) or async via job queue
+
+**Real-time Monitoring:**
+- Server-Sent Events (SSE) not implemented
+- Polling endpoints for progress:
+  - `GET /documents/{id}/progress` - Document processing progress
+  - `GET /documents/queue/status` - Job queue status (frontend polls every 5 seconds)
+
+## Rate Limiting & Quotas
+
+**API Rate Limits:**
+- Express rate limiter: 1000 requests per 15 minutes per IP
+- LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
+- OpenAI rate limits: Handled by SDK with backoff
+
+**File Upload Limits:**
+- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
+- Allowed MIME types: `application/pdf` (configurable via `ALLOWED_FILE_TYPES`)
+
+## Network Configuration
+
+**CORS Origins (Allowed):**
+- `https://cim-summarizer.web.app` (production)
+- `https://cim-summarizer.firebaseapp.com` (production)
+- `http://localhost:3000` (development)
+- `http://localhost:5173` (development)
+- `https://localhost:3000` (SSL local dev)
+- `https://localhost:5173` (SSL local dev)
+
+**Port Mappings:**
+- Frontend dev: Port 5173 (Vite dev server)
+- Backend dev: Port 5001 (Firebase Functions emulator)
+- Backend API: Port 5000 (Express in standard deployment)
+- Vite proxy to backend: `/api` routes proxied from port 5173 to `http://localhost:5000`
+
+---
+
+*Integration audit: 2026-02-24*
--- a/.planning/codebase/STACK.md
+++ b/.planning/codebase/STACK.md
@@ -0,0 +1,148 @@
+# Technology Stack
+
+**Analysis Date:** 2026-02-24
+
+## Languages
+
+**Primary:**
+- TypeScript 5.2.2 - Both backend and frontend, strict mode enabled
+- JavaScript (CommonJS) - Build outputs and configuration
+
+**Supporting:**
+- SQL - Supabase PostgreSQL database via migrations in `backend/src/models/migrations/`
+
+## Runtime
+
+**Environment:**
+- Node.js 20 (specified in `backend/firebase.json`)
+- Browser (ES2020 target for both client and server)
+
+**Package Manager:**
+- npm - Primary package manager for both backend and frontend
+- Lockfile: `package-lock.json` present in both `backend/` and `frontend/`
+
+## Frameworks
+
+**Backend - Core:**
+- Express.js 4.18.2 - HTTP server and REST API framework at `backend/src/index.ts`
+- Firebase Admin SDK 13.4.0 - Authentication and service account management at `backend/src/config/firebase.ts`
+- Firebase Functions 6.4.0 - Cloud Functions deployment runtime at port 5001
+
+**Frontend - Core:**
+- React 18.2.0 - UI framework with TypeScript support
+- Vite 4.5.0 - Build tool and dev server (port 5173 for dev, port 3000 production)
+
+**Backend - Testing:**
+- Vitest 2.1.0 - Test runner with v8 coverage provider at `backend/vitest.config.ts`
+- Configuration: Global test environment set to 'node', 30-second test timeout
+
+**Backend - Build/Dev:**
+- ts-node 10.9.2 - TypeScript execution for scripts
+- ts-node-dev 2.0.0 - Live reload development server with `--transpile-only` flag
+- TypeScript Compiler (tsc) 5.2.2 - Strict type checking, ES2020 target
+
+**Frontend - Build/Dev:**
+- Vite React plugin 4.1.1 - React JSX transformation
+- TailwindCSS 3.3.5 - Utility-first CSS framework with PostCSS 8.4.31
+
+## Key Dependencies
+
+**Critical Infrastructure:**
+- `@google-cloud/documentai` 9.3.0 - Google Document AI OCR/text extraction at `backend/src/services/documentAiProcessor.ts`
+- `@google-cloud/storage` 7.16.0 - Google Cloud Storage (GCS) for file uploads and processing
+- `@supabase/supabase-js` 2.53.0 - PostgreSQL database client with vector support at `backend/src/config/supabase.ts`
+- `pg` 8.11.3 - Direct PostgreSQL connection pool for critical operations bypassing PostgREST
+
+**LLM & AI:**
+- `@anthropic-ai/sdk` 0.57.0 - Claude API integration with support for Anthropic provider
+- `openai` 5.10.2 - OpenAI API and embeddings (text-embedding-3-small)
+- Both providers abstracted via `backend/src/services/llmService.ts`
+
+**PDF Processing:**
+- `pdf-lib` 1.17.1 - PDF generation and manipulation at `backend/src/services/pdfGenerationService.ts`
+- `pdf-parse` 1.1.1 - PDF text extraction
+- `pdfkit` 0.17.1 - PDF document creation
+
+**Document Processing:**
+- `puppeteer` 21.11.0 - Headless Chrome for HTML/PDF conversion
+
+**Security & Authentication:**
+- `firebase` 12.0.0 (frontend) - Firebase client SDK for authentication at `frontend/src/config/firebase.ts`
+- `firebase-admin` 13.4.0 (backend) - Admin SDK for token verification at `backend/src/middleware/firebaseAuth.ts`
+- `jsonwebtoken` 9.0.2 - JWT token creation and verification
+- `bcryptjs` 2.4.3 - Password hashing with 12 rounds default
+
+**API & HTTP:**
+- `axios` 1.11.0 - HTTP client for both frontend and backend
+- `cors` 2.8.5 - Cross-Origin Resource Sharing middleware for Express
+- `helmet` 7.1.0 - Security headers middleware
+- `morgan` 1.10.0 - HTTP request logging middleware
+- `express-rate-limit` 7.1.5 - Rate limiting middleware (1000 requests per 15 minutes)
+
+**Data Validation & Schema:**
+- `zod` 3.25.76 - TypeScript-first schema validation at `backend/src/services/llmSchemas.ts`
+- `zod-to-json-schema` 3.24.6 - Convert Zod schemas to JSON Schema for LLM structured output
+- `joi` 17.11.0 - Environment variable validation in `backend/src/config/env.ts`
+
+**Logging & Monitoring:**
+- `winston` 3.11.0 - Structured logging framework with multiple transports at `backend/src/utils/logger.ts`
+
+**Frontend - UI Components:**
+- `lucide-react` 0.294.0 - Icon library
+- `react-dom` 18.2.0 - React rendering for web
+- `react-router-dom` 6.20.1 - Client-side routing
+- `react-dropzone` 14.3.8 - File upload handling
+- `clsx` 2.0.0 - Conditional className utility
+- `tailwind-merge` 2.0.0 - Merge Tailwind classes with conflict resolution
+
+**Utilities:**
+- `uuid` 11.1.0 - Unique identifier generation
+- `dotenv` 16.3.1 - Environment variable loading from `.env` files
+
+## Configuration
+
+**Environment:**
+- **.env file support** - Dotenv loads from `.env` for local development in `backend/src/config/env.ts`
+- **Environment validation** - Joi schema at `backend/src/config/env.ts` validates all required/optional env vars
+- **Firebase Functions v2** - Uses `defineString()` and `defineSecret()` for secure configuration (migration from v1 functions.config())
+
+**Key Configuration Variables (Backend):**
+- `NODE_ENV` - 'development' | 'production' | 'test'
+- `LLM_PROVIDER` - 'openai' | 'anthropic' | 'openrouter' (default: 'openai')
+- `GCLOUD_PROJECT_ID` - Google Cloud project ID (required)
+- `DOCUMENT_AI_PROCESSOR_ID` - Document AI processor ID (required)
+- `GCS_BUCKET_NAME` - Google Cloud Storage bucket (required)
+- `SUPABASE_URL`, `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_KEY` - Supabase PostgreSQL connection
+- `DATABASE_URL` - Direct PostgreSQL connection string for bypass operations
+- `OPENAI_API_KEY` - OpenAI API key for embeddings and models
+- `ANTHROPIC_API_KEY` - Anthropic Claude API key
+- `OPENROUTER_API_KEY` - OpenRouter API key (optional, uses BYOK with Anthropic key)
+
+**Key Configuration Variables (Frontend):**
+- `VITE_API_BASE_URL` - Backend API endpoint
+- `VITE_FIREBASE_*` - Firebase configuration (API key, auth domain, project ID, etc.)
+
+**Build Configuration:**
+- **Backend**: `backend/tsconfig.json` - Strict TypeScript, CommonJS module output, ES2020 target
+- **Frontend**: `frontend/tsconfig.json` - ES2020 target, JSX React support, path alias `@/*`
+- **Firebase**: `backend/firebase.json` - Node.js 20 runtime, Firebase Functions emulator on port 5001
+
+## Platform Requirements
+
+**Development:**
+- Node.js 20.x
+- npm 9+
+- Google Cloud credentials (for Document AI and GCS)
+- Firebase project credentials (service account key)
+- Supabase project URL and keys
+
+**Production:**
+- **Backend**: Firebase Cloud Functions (Node.js 20 runtime) or Google Cloud Run
+- **Frontend**: Firebase Hosting (CDN-backed static hosting)
+- **Database**: Supabase PostgreSQL with pgvector extension for vector search
+- **Storage**: Google Cloud Storage for documents and generated PDFs
+- **Memory Limits**: Backend configured with `--max-old-space-size=8192` for large document processing
+
+---
+
+*Stack analysis: 2026-02-24*
--- a/.planning/codebase/STRUCTURE.md
+++ b/.planning/codebase/STRUCTURE.md
@@ -0,0 +1,374 @@
+# Codebase Structure
+
+**Analysis Date:** 2026-02-24
+
+## Directory Layout
+
+```
+cim_summary/
+├── backend/                        # Express.js + TypeScript backend (Node.js)
+│   ├── src/
+│   │   ├── index.ts               # Express app + Firebase Functions exports
+│   │   ├── controllers/           # Request handlers
+│   │   ├── models/                # Database access + schema
+│   │   ├── services/              # Business logic + external integrations
+│   │   ├── routes/                # Express route definitions
+│   │   ├── middleware/            # Express middleware (auth, validation, error)
+│   │   ├── config/                # Configuration (env, firebase, supabase)
+│   │   ├── utils/                 # Utilities (logger, validation, parsing)
+│   │   ├── types/                 # TypeScript type definitions
+│   │   ├── scripts/               # One-off CLI scripts (diagnostics, setup)
+│   │   ├── assets/                # Static assets (HTML templates)
+│   │   └── __tests__/             # Test suites (unit, integration, acceptance)
+│   ├── package.json               # Node dependencies
+│   ├── tsconfig.json              # TypeScript config
+│   ├── .eslintrc.json             # ESLint config
+│   └── dist/                       # Compiled JavaScript (generated)
+│
+├── frontend/                       # React + Vite + TypeScript frontend
+│   ├── src/
+│   │   ├── main.tsx               # React entry point
+│   │   ├── App.tsx                # Root component with routing
+│   │   ├── components/            # React components (UI)
+│   │   ├── services/              # API clients (documentService, authService)
+│   │   ├── contexts/              # React Context (AuthContext)
+│   │   ├── config/                # Configuration (env, firebase)
+│   │   ├── types/                 # TypeScript interfaces
+│   │   ├── utils/                 # Utilities (validation, cn, auth debug)
+│   │   └── assets/                # Static images and icons
+│   ├── package.json               # Node dependencies
+│   ├── tsconfig.json              # TypeScript config
+│   ├── vite.config.ts             # Vite bundler config
+│   ├── eslintrc.json              # ESLint config
+│   ├── tailwind.config.js          # Tailwind CSS config
+│   ├── postcss.config.js           # PostCSS config
+│   └── dist/                       # Built static assets (generated)
+│
+├── .planning/                      # GSD planning directory
+│   └── codebase/                  # Codebase analysis documents
+│
+├── package.json                    # Monorepo root package (if used)
+├── .git/                           # Git repository
+├── .gitignore                      # Git ignore rules
+├── .cursorrules                    # Cursor IDE configuration
+├── README.md                       # Project overview
+├── CONFIGURATION_GUIDE.md          # Setup instructions
+├── CODEBASE_ARCHITECTURE_SUMMARY.md # Existing architecture notes
+└── [PDF documents]                 # Sample CIM documents for testing
+```
+
+## Directory Purposes
+
+**backend/src/:**
+- Purpose: All backend server code
+- Contains: TypeScript source files
+- Key files: `index.ts` (main app), routes, controllers, services, models
+
+**backend/src/controllers/:**
+- Purpose: HTTP request handlers
+- Contains: `documentController.ts`, `authController.ts`
+- Functions: Map HTTP requests to service calls, handle validation, construct responses
+
+**backend/src/services/:**
+- Purpose: Business logic and external integrations
+- Contains: Document processing, LLM integration, file storage, database, job queue
+- Key files:
+  - `unifiedDocumentProcessor.ts` - Orchestrator, strategy selection
+  - `singlePassProcessor.ts` - 2-LLM extraction (current default)
+  - `optimizedAgenticRAGProcessor.ts` - Advanced agentic processing (stub)
+  - `documentAiProcessor.ts` - Google Document AI OCR
+  - `llmService.ts` - LLM API calls (Anthropic/OpenAI/OpenRouter)
+  - `jobQueueService.ts` - Async job queue (in-memory, EventEmitter)
+  - `jobProcessorService.ts` - Dequeue and execute jobs
+  - `fileStorageService.ts` - GCS signed URLs and upload
+  - `vectorDatabaseService.ts` - Supabase pgvector operations
+  - `pdfGenerationService.ts` - Puppeteer PDF rendering
+  - `uploadProgressService.ts` - Track upload status
+  - `uploadMonitoringService.ts` - Monitor processing progress
+  - `llmSchemas.ts` - Zod schemas for LLM extraction (CIMReview, financial data)
+
+**backend/src/models/:**
+- Purpose: Database access layer and schema definitions
+- Contains: Document, User, ProcessingJob, Feedback models
+- Key files:
+  - `types.ts` - TypeScript interfaces (Document, ProcessingJob, ProcessingStatus)
+  - `DocumentModel.ts` - Document CRUD with retry logic
+  - `ProcessingJobModel.ts` - Job tracking in database
+  - `UserModel.ts` - User management
+  - `VectorDatabaseModel.ts` - Vector embedding queries
+  - `migrate.ts` - Database migrations
+  - `seed.ts` - Test data seeding
+  - `migrations/` - SQL migration files
+
+**backend/src/routes/:**
+- Purpose: Express route definitions
+- Contains: Route handlers and middleware bindings
+- Key files:
+  - `documents.ts` - GET/POST/PUT/DELETE document endpoints
+  - `vector.ts` - Vector search endpoints
+  - `monitoring.ts` - Health and status endpoints
+  - `documentAudit.ts` - Audit log endpoints
+
+**backend/src/middleware/:**
+- Purpose: Express middleware for cross-cutting concerns
+- Contains: Authentication, validation, error handling
+- Key files:
+  - `firebaseAuth.ts` - Firebase ID token verification
+  - `errorHandler.ts` - Global error handling + correlation ID
+  - `notFoundHandler.ts` - 404 handler
+  - `validation.ts` - Request validation (UUID, pagination)
+
+**backend/src/config/:**
+- Purpose: Configuration and initialization
+- Contains: Environment setup, service initialization
+- Key files:
+  - `env.ts` - Environment variable validation (Joi schema)
+  - `firebase.ts` - Firebase Admin SDK initialization
+  - `supabase.ts` - Supabase client and pool setup
+  - `database.ts` - PostgreSQL connection (legacy)
+  - `errorConfig.ts` - Error handling config
+
+**backend/src/utils/:**
+- Purpose: Shared utility functions
+- Contains: Logging, validation, parsing
+- Key files:
+  - `logger.ts` - Winston logger setup (console + file transports)
+  - `validation.ts` - UUID and pagination validators
+  - `googleServiceAccount.ts` - Google Cloud credentials resolution
+  - `financialExtractor.ts` - Financial data parsing (deprecated for single-pass)
+  - `templateParser.ts` - CIM template utilities
+  - `auth.ts` - Authentication helpers
+
+**backend/src/scripts/:**
+- Purpose: One-off CLI scripts for diagnostics and setup
+- Contains: Database setup, testing, monitoring
+- Key files:
+  - `setup-database.ts` - Initialize database schema
+  - `monitor-document-processing.ts` - Watch job queue status
+  - `check-current-job.ts` - Debug stuck jobs
+  - `test-full-llm-pipeline.ts` - End-to-end testing
+  - `comprehensive-diagnostic.ts` - System health check
+
+**backend/src/__tests__/:**
+- Purpose: Test suites
+- Contains: Unit, integration, acceptance tests
+- Subdirectories:
+  - `unit/` - Isolated component tests
+  - `integration/` - Multi-component tests
+  - `acceptance/` - End-to-end flow tests
+  - `mocks/` - Mock data and fixtures
+  - `utils/` - Test utilities
+
+**frontend/src/:**
+- Purpose: All frontend code
+- Contains: React components, services, types
+
+**frontend/src/components/:**
+- Purpose: React UI components
+- Contains: Page components, reusable widgets
+- Key files:
+  - `DocumentUpload.tsx` - File upload UI with drag-and-drop
+  - `DocumentList.tsx` - List of processed documents
+  - `DocumentViewer.tsx` - View and edit extracted data
+  - `ProcessingProgress.tsx` - Real-time processing status
+  - `UploadMonitoringDashboard.tsx` - Admin view of active jobs
+  - `LoginForm.tsx` - Firebase auth login UI
+  - `ProtectedRoute.tsx` - Route guard for authenticated pages
+  - `Analytics.tsx` - Document analytics and statistics
+  - `CIMReviewTemplate.tsx` - Display extracted CIM review data
+
+**frontend/src/services/:**
+- Purpose: API clients and external service integration
+- Contains: HTTP clients for backend
+- Key files:
+  - `documentService.ts` - Document API calls (upload, list, process, status)
+  - `authService.ts` - Firebase authentication (login, logout, token)
+  - `adminService.ts` - Admin-only operations
+
+**frontend/src/contexts/:**
+- Purpose: React Context for global state
+- Contains: AuthContext for user and authentication state
+- Key files:
+  - `AuthContext.tsx` - User, token, login/logout state
+
+**frontend/src/config/:**
+- Purpose: Configuration
+- Contains: Environment variables, Firebase setup
+- Key files:
+  - `env.ts` - VITE_API_BASE_URL and other env vars
+  - `firebase.ts` - Firebase client initialization
+
+**frontend/src/types/:**
+- Purpose: TypeScript interfaces
+- Contains: API response types, component props
+- Key files:
+  - `auth.ts` - User, LoginCredentials, AuthContextType
+
+**frontend/src/utils/:**
+- Purpose: Shared utility functions
+- Contains: Validation, CSS utilities
+- Key files:
+  - `validation.ts` - Email, password validators
+  - `cn.ts` - Classname merger (clsx wrapper)
+  - `authDebug.ts` - Authentication debugging helpers
+
+## Key File Locations
+
+**Entry Points:**
+- `backend/src/index.ts` - Main Express app and Firebase Functions exports
+- `frontend/src/main.tsx` - React entry point
+- `frontend/src/App.tsx` - Root component with routing
+
+**Configuration:**
+- `backend/src/config/env.ts` - Environment variable schema and validation
+- `backend/src/config/firebase.ts` - Firebase Admin SDK setup
+- `backend/src/config/supabase.ts` - Supabase client and connection pool
+- `frontend/src/config/firebase.ts` - Firebase client configuration
+- `frontend/src/config/env.ts` - Frontend environment variables
+
+**Core Logic:**
+- `backend/src/services/unifiedDocumentProcessor.ts` - Main document processing orchestrator
+- `backend/src/services/singlePassProcessor.ts` - Single-pass 2-LLM strategy
+- `backend/src/services/llmService.ts` - LLM API integration with retry
+- `backend/src/services/jobQueueService.ts` - Background job queue
+- `backend/src/services/vectorDatabaseService.ts` - Vector search implementation
+
+**Testing:**
+- `backend/src/__tests__/unit/` - Unit tests
+- `backend/src/__tests__/integration/` - Integration tests
+- `backend/src/__tests__/acceptance/` - End-to-end tests
+
+**Database:**
+- `backend/src/models/types.ts` - TypeScript type definitions
+- `backend/src/models/DocumentModel.ts` - Document CRUD operations
+- `backend/src/models/ProcessingJobModel.ts` - Job tracking
+- `backend/src/models/migrations/` - SQL migration files
+
+**Middleware:**
+- `backend/src/middleware/firebaseAuth.ts` - JWT authentication
+- `backend/src/middleware/errorHandler.ts` - Global error handling
+- `backend/src/middleware/validation.ts` - Input validation
+
+**Logging:**
+- `backend/src/utils/logger.ts` - Winston logger configuration
+
+## Naming Conventions
+
+**Files:**
+- Controllers: `{resource}Controller.ts` (e.g., `documentController.ts`)
+- Services: `{service}Service.ts` or descriptive (e.g., `llmService.ts`, `singlePassProcessor.ts`)
+- Models: `{Entity}Model.ts` (e.g., `DocumentModel.ts`)
+- Routes: `{resource}.ts` (e.g., `documents.ts`)
+- Middleware: `{purpose}Handler.ts` or `{purpose}.ts` (e.g., `firebaseAuth.ts`)
+- Types/Interfaces: `types.ts` or `{name}Types.ts`
+- Tests: `{file}.test.ts` or `{file}.spec.ts`
+
+**Directories:**
+- Plurals for collections: `services/`, `models/`, `utils/`, `routes/`, `controllers/`
+- Singular for specific features: `config/`, `middleware/`, `types/`, `contexts/`
+- Nested by feature in larger directories: `__tests__/unit/`, `models/migrations/`
+
+**Functions/Variables:**
+- Camel case: `processDocument()`, `getUserId()`, `documentId`
+- Constants: UPPER_SNAKE_CASE: `MAX_RETRIES`, `TIMEOUT_MS`
+- Private methods: Prefix with `_` or use TypeScript `private`: `_retryOperation()`
+
+**Classes:**
+- Pascal case: `DocumentModel`, `JobQueueService`, `SinglePassProcessor`
+- Service instances exported as singletons: `export const llmService = new LLMService()`
+
+**React Components:**
+- Pascal case: `DocumentUpload.tsx`, `ProtectedRoute.tsx`
+- Hooks: `use{Feature}` (e.g., `useAuth` from AuthContext)
+
+## Where to Add New Code
+
+**New Document Processing Strategy:**
+- Primary code: `backend/src/services/{strategyName}Processor.ts`
+- Schema: Add types to `backend/src/services/llmSchemas.ts`
+- Integration: Register in `backend/src/services/unifiedDocumentProcessor.ts`
+- Tests: `backend/src/__tests__/integration/{strategyName}.test.ts`
+
+**New API Endpoint:**
+- Route: `backend/src/routes/{resource}.ts`
+- Controller: `backend/src/controllers/{resource}Controller.ts`
+- Service: `backend/src/services/{resource}Service.ts` (if needed)
+- Model: `backend/src/models/{Resource}Model.ts` (if database access)
+- Tests: `backend/src/__tests__/integration/{endpoint}.test.ts`
+
+**New React Component:**
+- Component: `frontend/src/components/{ComponentName}.tsx`
+- Types: Add to `frontend/src/types/` or inline in component
+- Services: Use existing `frontend/src/services/documentService.ts`
+- Tests: `frontend/src/__tests__/{ComponentName}.test.tsx` (if added)
+
+**Shared Utilities:**
+- Backend: `backend/src/utils/{utility}.ts`
+- Frontend: `frontend/src/utils/{utility}.ts`
+- Avoid code duplication - consider extracting common patterns
+
+**Database Schema Changes:**
+- Migration file: `backend/src/models/migrations/{timestamp}_{description}.sql`
+- TypeScript interface: Update `backend/src/models/types.ts`
+- Model methods: Update corresponding `*Model.ts` file
+- Run: `npm run db:migrate` in backend
+
+**Configuration Changes:**
+- Environment: Update `backend/src/config/env.ts` (Joi schema)
+- Frontend env: Update `frontend/src/config/env.ts`
+- Firebase secrets: Use `firebase functions:secrets:set VAR_NAME`
+- Local dev: Add to `.env` file (gitignored)
+
+## Special Directories
+
+**backend/src/__tests__/mocks/:**
+- Purpose: Mock data and fixtures for testing
+- Generated: No (manually maintained)
+- Committed: Yes
+- Usage: Import in tests for consistent test data
+
+**backend/src/scripts/:**
+- Purpose: One-off CLI utilities for development and operations
+- Generated: No (manually maintained)
+- Committed: Yes
+- Execution: `ts-node src/scripts/{script}.ts` or `npm run {script}`
+
+**backend/src/assets/:**
+- Purpose: Static HTML templates for PDF generation
+- Generated: No (manually maintained)
+- Committed: Yes
+- Usage: Rendered by Puppeteer in `pdfGenerationService.ts`
+
+**backend/src/models/migrations/:**
+- Purpose: Database schema migration SQL files
+- Generated: No (manually created)
+- Committed: Yes
+- Execution: Run via `npm run db:migrate`
+
+**frontend/src/assets/:**
+- Purpose: Images, icons, logos
+- Generated: No (manually added)
+- Committed: Yes
+- Usage: Import in components (e.g., `bluepoint-logo.png`)
+
+**backend/dist/ and frontend/dist/:**
+- Purpose: Compiled JavaScript and optimized bundles
+- Generated: Yes (build output)
+- Committed: No (gitignored)
+- Regeneration: `npm run build` in respective directory
+
+**backend/node_modules/ and frontend/node_modules/:**
+- Purpose: Installed dependencies
+- Generated: Yes (npm install)
+- Committed: No (gitignored)
+- Regeneration: `npm install`
+
+**backend/logs/:**
+- Purpose: Runtime log files
+- Generated: Yes (runtime)
+- Committed: No (gitignored)
+- Contents: `error.log`, `upload.log`, combined logs
+
+---
+
+*Structure analysis: 2026-02-24*
--- a/.planning/codebase/TESTING.md
+++ b/.planning/codebase/TESTING.md
@@ -0,0 +1,342 @@
+# Testing Patterns
+
+**Analysis Date:** 2026-02-24
+
+## Test Framework
+
+**Runner:**
+- Vitest 2.1.0
+- Config: No dedicated `vitest.config.ts` found (uses defaults)
+- Node.js test environment
+
+**Assertion Library:**
+- Vitest native assertions via `expect()`
+- Examples: `expect(value).toBe()`, `expect(value).toBeDefined()`, `expect(array).toContain()`
+
+**Run Commands:**
+```bash
+npm test                    # Run all tests once
+npm run test:watch         # Watch mode for continuous testing
+npm run test:coverage      # Generate coverage report
+```
+
+**Coverage Tool:**
+- `@vitest/coverage-v8` 2.1.0
+- Tracks line, branch, function, and statement coverage
+- V8 backend for accurate coverage metrics
+
+## Test File Organization
+
+**Location:**
+- Co-located in `backend/src/__tests__/` directory
+- Subdirectories for logical grouping:
+  - `backend/src/__tests__/utils/` - Utility function tests
+  - `backend/src/__tests__/mocks/` - Mock implementations
+  - `backend/src/__tests__/acceptance/` - Acceptance/integration tests
+
+**Naming:**
+- Pattern: `[feature].test.ts` or `[feature].spec.ts`
+- Examples:
+  - `backend/src/__tests__/financial-summary.test.ts`
+  - `backend/src/__tests__/acceptance/handiFoods.acceptance.test.ts`
+
+**Structure:**
+```
+backend/src/__tests__/
+├── utils/
+│   └── test-helpers.ts              # Test utility functions
+├── mocks/
+│   └── logger.mock.ts               # Mock implementations
+└── acceptance/
+    └── handiFoods.acceptance.test.ts # Acceptance tests
+```
+
+## Test Structure
+
+**Suite Organization:**
+```typescript
+import { describe, test, expect, beforeAll } from 'vitest';
+
+describe('Feature Category', () => {
+  describe('Nested Behavior Group', () => {
+    test('should do specific thing', () => {
+      expect(result).toBe(expected);
+    });
+
+    test('should handle edge case', () => {
+      expect(edge).toBeDefined();
+    });
+  });
+});
+```
+
+From `financial-summary.test.ts`:
+```typescript
+describe('Financial Summary Fixes', () => {
+  describe('Period Ordering', () => {
+    test('Summary table should display periods in chronological order (FY3 → FY2 → FY1 → LTM)', () => {
+      const periods = ['fy3', 'fy2', 'fy1', 'ltm'];
+      const expectedOrder = ['FY3', 'FY2', 'FY1', 'LTM'];
+
+      expect(periods[0]).toBe('fy3');
+      expect(periods[3]).toBe('ltm');
+    });
+  });
+});
+```
+
+**Patterns:**
+
+1. **Setup Pattern:**
+   - Use `beforeAll()` for shared test data initialization
+   - Example from `handiFoods.acceptance.test.ts`:
+     ```typescript
+     beforeAll(() => {
+       const normalize = (text: string) => text.replace(/\s+/g, ' ').toLowerCase();
+       const cimRaw = fs.readFileSync(cimTextPath, 'utf-8');
+       const outputRaw = fs.readFileSync(outputTextPath, 'utf-8');
+       cimNormalized = normalize(cimRaw);
+       outputNormalized = normalize(outputRaw);
+     });
+     ```
+
+2. **Teardown Pattern:**
+   - Not explicitly shown in current tests
+   - Use `afterAll()` for resource cleanup if needed
+
+3. **Assertion Pattern:**
+   - Descriptive test names that read as sentences: `'should display periods in chronological order'`
+   - Multiple assertions per test acceptable for related checks
+   - Use `expect().toContain()` for array/string membership
+   - Use `expect().toBeDefined()` for existence checks
+   - Use `expect().toBeGreaterThan()` for numeric comparisons
+
+## Mocking
+
+**Framework:** Vitest `vi` mock utilities
+
+**Patterns:**
+
+1. **Mock Logger:**
+   ```typescript
+   import { vi } from 'vitest';
+
+   export const mockLogger = {
+     debug: vi.fn(),
+     info: vi.fn(),
+     warn: vi.fn(),
+     error: vi.fn(),
+   };
+
+   export const mockStructuredLogger = {
+     uploadStart: vi.fn(),
+     uploadSuccess: vi.fn(),
+     uploadError: vi.fn(),
+     processingStart: vi.fn(),
+     processingSuccess: vi.fn(),
+     processingError: vi.fn(),
+     storageOperation: vi.fn(),
+     jobQueueOperation: vi.fn(),
+     info: vi.fn(),
+     warn: vi.fn(),
+     error: vi.fn(),
+     debug: vi.fn(),
+   };
+   ```
+
+2. **Mock Service Pattern:**
+   - Create mock implementations in `backend/src/__tests__/mocks/`
+   - Export as named exports: `export const mockLogger`, `export const mockStructuredLogger`
+   - Use `vi.fn()` for all callable methods to track calls and arguments
+
+3. **What to Mock:**
+   - External services: Firebase Auth, Supabase, Google Cloud APIs
+   - Logger: always mock to prevent log spam during tests
+   - File system operations (in unit tests; use real files in acceptance tests)
+   - LLM API calls: mock responses to avoid quota usage
+
+4. **What NOT to Mock:**
+   - Core utility functions: use real implementations
+   - Type definitions: no need to mock types
+   - Pure functions: test directly without mocks
+   - Business logic calculations: test with real data
+
+## Fixtures and Factories
+
+**Test Data:**
+
+1. **Helper Factory Pattern:**
+   From `backend/src/__tests__/utils/test-helpers.ts`:
+   ```typescript
+   export function createMockCorrelationId(): string {
+     return `test-correlation-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+   }
+
+   export function createMockUserId(): string {
+     return `test-user-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+   }
+
+   export function createMockDocumentId(): string {
+     return `test-doc-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+   }
+
+   export function createMockJobId(): string {
+     return `test-job-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+   }
+
+   export function wait(ms: number): Promise<void> {
+     return new Promise((resolve) => setTimeout(resolve, ms));
+   }
+   ```
+
+2. **Acceptance Test Fixtures:**
+   - Located in `backend/test-fixtures/` directory
+   - Example: `backend/test-fixtures/handiFoods/` contains:
+     - `handi-foods-cim.txt` - Reference CIM content
+     - `handi-foods-output.txt` - Expected processor output
+   - Loaded via `fs.readFileSync()` in `beforeAll()` hooks
+
+**Location:**
+- Test helpers: `backend/src/__tests__/utils/test-helpers.ts`
+- Acceptance fixtures: `backend/test-fixtures/` (outside src)
+- Mocks: `backend/src/__tests__/mocks/`
+
+## Coverage
+
+**Requirements:**
+- No automated coverage enforcement detected (no threshold in config)
+- Manual review recommended for critical paths
+
+**View Coverage:**
+```bash
+npm run test:coverage
+```
+
+## Test Types
+
+**Unit Tests:**
+- **Scope:** Individual functions, services, utilities
+- **Approach:** Test in isolation with mocks for dependencies
+- **Examples:**
+  - Financial parser tests: parse tables with various formats
+  - Period ordering tests: verify chronological order logic
+  - Validate UUID format tests: regex pattern matching
+- **Location:** `backend/src/__tests__/[feature].test.ts`
+
+**Integration Tests:**
+- **Scope:** Multiple components working together
+- **Approach:** May use real Supabase/Firebase or mocks depending on test level
+- **Not heavily used:** minimal integration test infrastructure
+- **Pattern:** Could use real database in test environment with cleanup
+
+**Acceptance Tests:**
+- **Scope:** End-to-end feature validation with real artifacts
+- **Approach:** Load reference files, process through entire pipeline, verify output
+- **Example:** `handiFoods.acceptance.test.ts`
+  - Loads CIM text file
+  - Loads processor output file
+  - Validates all reference facts exist in both
+  - Validates key fields resolved instead of fallback messages
+- **Location:** `backend/src/__tests__/acceptance/`
+
+**E2E Tests:**
+- Not implemented in current setup
+- Would require browser automation (no Playwright/Cypress config found)
+- Frontend testing: not currently automated
+
+## Common Patterns
+
+**Async Testing:**
+```typescript
+test('should process document asynchronously', async () => {
+  const result = await processDocument(documentId, userId, text);
+  expect(result.success).toBe(true);
+});
+```
+
+**Error Testing:**
+```typescript
+test('should validate UUID format', () => {
+  const id = 'invalid-id';
+  const uuidRegex = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
+  expect(uuidRegex.test(id)).toBe(false);
+});
+```
+
+**Array/Collection Testing:**
+```typescript
+test('should extract all financial periods', () => {
+  const result = parseFinancialsFromText(tableText);
+  expect(result.data.fy3.revenue).toBeDefined();
+  expect(result.data.fy2.revenue).toBeDefined();
+  expect(result.data.fy1.revenue).toBeDefined();
+  expect(result.data.ltm.revenue).toBeDefined();
+});
+```
+
+**Text/Content Testing (Acceptance):**
+```typescript
+test('verifies each reference fact exists in CIM and generated output', () => {
+  for (const fact of referenceFacts) {
+    for (const token of fact.tokens) {
+      expect(cimNormalized).toContain(token);
+      expect(outputNormalized).toContain(token);
+    }
+  }
+});
+```
+
+**Normalization for Content Testing:**
+```typescript
+// Normalize whitespace and case for robust text matching
+const normalize = (text: string) => text.replace(/\s+/g, ' ').toLowerCase();
+const normalizedCIM = normalize(cimRaw);
+expect(normalizedCIM).toContain('reference-phrase');
+```
+
+## Test Coverage Priorities
+
+**Critical Paths (Test First):**
+1. Document upload and file storage operations
+2. Firebase authentication and token validation
+3. LLM service API interactions with retry logic
+4. Error handling and correlation ID tracking
+5. Financial data extraction and parsing
+6. PDF generation pipeline
+
+**Important Paths (Test Early):**
+1. Vector embeddings and database operations
+2. Job queue processing and timeout handling
+3. Google Document AI text extraction
+4. Supabase Row Level Security policies
+
+**Nice-to-Have (Test Later):**
+1. UI component rendering (would require React Testing Library)
+2. CSS/styling validation
+3. Frontend form submission flows
+4. Analytics tracking
+
+## Current Testing Gaps
+
+**Untested Areas:**
+- Backend services: Most services lack unit tests (llmService, fileStorageService, etc.)
+- Database models: No model tests for Supabase operations
+- Controllers/Endpoints: No API endpoint tests
+- Frontend components: No React component tests
+- Integration flows: Document upload through processing to PDF generation
+
+**Missing Patterns:**
+- No database integration test setup (fixtures, transactions)
+- No API request/response validation tests
+- No performance/load tests
+- No security tests (auth bypass, XSS, injection)
+
+## Deprecated Test Patterns (DO NOT USE)
+
+- ❌ Jest test suite - Use Vitest instead
+- ❌ Direct PostgreSQL connection tests - Use Supabase in test mode
+- ❌ Legacy test files referencing removed services - Updated implementations used only
+
+---
+
+*Testing analysis: 2026-02-24*
--- a/.planning/config.json
+++ b/.planning/config.json
@@ -0,0 +1,12 @@
+{
+  "mode": "yolo",
+  "depth": "standard",
+  "parallelization": true,
+  "commit_docs": true,
+  "model_profile": "balanced",
+  "workflow": {
+    "research": true,
+    "plan_check": true,
+    "verifier": true
+  }
+}
--- a/.planning/milestones/v1.0-MILESTONE-AUDIT.md
+++ b/.planning/milestones/v1.0-MILESTONE-AUDIT.md
@@ -0,0 +1,110 @@
+---
+milestone: v1.0
+audited: 2026-02-25
+status: tech_debt
+scores:
+  requirements: 15/15
+  phases: 4/4
+  integration: 14/14
+  flows: 5/5
+gaps:
+  requirements: []
+  integration: []
+  flows: []
+tech_debt:
+  - phase: 04-frontend
+    items:
+      - "Frontend admin email hardcoded as literal in adminService.ts line 81 — should use import.meta.env.VITE_ADMIN_EMAIL for config parity with backend"
+  - phase: 02-backend-services
+    items:
+      - "Dual retention cleanup: runRetentionCleanup (weekly, model-layer) overlaps with pre-existing cleanupOldData (daily, raw SQL) for service_health_checks and alert_events tables"
+      - "index.ts line 225: defineString('EMAIL_WEEKLY_RECIPIENT') has personal email as deployment default — recommend removing or using placeholder"
+  - phase: 03-api-layer
+    items:
+      - "Pre-existing TODO in jobProcessorService.ts line 448: 'Implement statistics method in ProcessingJobModel' — orphaned method, not part of this milestone"
+  - phase: 01-data-foundation
+    items:
+      - "Migrations 012 and 013 must be applied manually to Supabase before deployment — no automated migration runner"
+---
+
+# Milestone v1.0 Audit Report
+
+**Milestone:** CIM Summary — Analytics & Monitoring v1.0
+**Audited:** 2026-02-25
+**Status:** tech_debt (all requirements met, no blockers, accumulated deferred items)
+
+## Requirements Coverage (3-Source Cross-Reference)
+
+| REQ-ID | Description | VERIFICATION.md | SUMMARY Frontmatter | REQUIREMENTS.md | Final |
+|--------|-------------|-----------------|---------------------|-----------------|-------|
+| INFR-01 | DB migrations create tables with indexes | Phase 1: SATISFIED | 01-01, 01-02 | [x] | satisfied |
+| INFR-04 | Use existing Supabase connection | Phase 1: SATISFIED | 01-01, 01-02 | [x] | satisfied |
+| HLTH-02 | Probes make real authenticated API calls | Phase 2: SATISFIED | 02-02 | [x] | satisfied |
+| HLTH-03 | Probes run on schedule, separate from processing | Phase 2: SATISFIED | 02-04 | [x] | satisfied |
+| HLTH-04 | Probe results persist to Supabase | Phase 2: SATISFIED | 02-02 | [x] | satisfied |
+| ALRT-01 | Email alert on service down/degraded | Phase 2: SATISFIED | 02-03 | [x] | satisfied |
+| ALRT-02 | Alert deduplication within cooldown | Phase 2: SATISFIED | 02-03 | [x] | satisfied |
+| ALRT-04 | Alert recipient from config, not hardcoded | Phase 2: SATISFIED | 02-03 | [x] | satisfied |
+| ANLY-01 | Processing events persist at write time | Phase 2: SATISFIED | 02-01 | [x] | satisfied |
+| ANLY-03 | Analytics instrumentation non-blocking | Phase 2: SATISFIED | 02-01 | [x] | satisfied |
+| INFR-03 | 30-day retention cleanup on schedule | Phase 2: SATISFIED | 02-04 | [x] | satisfied |
+| INFR-02 | Admin API routes protected by Firebase Auth | Phase 3: SATISFIED | 03-01 | [x] | satisfied |
+| HLTH-01 | Admin can view live health status for 4 services | Phase 3+4: SATISFIED | 03-01, 04-01, 04-02 | [x] | satisfied |
+| ANLY-02 | Admin can view processing summary | Phase 3+4: SATISFIED | 03-01, 03-02, 04-01, 04-02 | [x] | satisfied |
+| ALRT-03 | In-app alert banner for critical issues | Phase 4: SATISFIED | 04-01, 04-02 | [x] | satisfied |
+
+**Score: 15/15 requirements satisfied. 0 orphaned. 0 unsatisfied.**
+
+## Phase Verification Summary
+
+| Phase | Status | Score | Human Items |
+|-------|--------|-------|-------------|
+| 01-data-foundation | human_needed | 3/4 | Migration execution against live Supabase |
+| 02-backend-services | passed | 14/14 | Live deployment verification (probes, email, retention) |
+| 03-api-layer | passed | 10/10 | None |
+| 04-frontend | human_needed | 4/4 code-verified | Alert banner, health grid, analytics panel with live data |
+
+## Cross-Phase Integration (14/14 wired)
+
+All exports from every phase are consumed downstream. No orphaned exports. No missing connections that break functionality.
+
+### E2E Flows Verified (5/5)
+
+1. **Health Probe Lifecycle** (HLTH-01/02/03/04): Scheduler → probes → Supabase → API → frontend grid
+2. **Alert Lifecycle** (ALRT-01/02/03/04): Probe failure → dedup check → DB + email → API → banner → acknowledge
+3. **Analytics Pipeline** (ANLY-01/02/03): Job processing → fire-and-forget events → aggregate SQL → API → dashboard
+4. **Retention Cleanup** (INFR-03): Weekly scheduler → parallel deletes across 3 tables
+5. **Admin Auth Protection** (INFR-02): Firebase Auth → admin email check → 404 for non-admin
+
+## Tech Debt
+
+### Phase 4: Frontend
+- **Admin email hardcoded** (`adminService.ts:81`): `ADMIN_EMAIL = 'jpressnell@bluepointcapital.com'` — should be `import.meta.env.VITE_ADMIN_EMAIL`. Backend is config-driven but frontend literal would silently break if admin email changes. Security not affected (API-level protection correct).
+
+### Phase 2: Backend Services
+- **Dual retention cleanup**: `runRetentionCleanup` (weekly, model-layer) overlaps with pre-existing `cleanupOldData` (daily, raw SQL) for the same tables. Both use 30-day threshold. Harmless but duplicated maintenance surface.
+- **Personal email in defineString default** (`index.ts:225`): `defineString('EMAIL_WEEKLY_RECIPIENT', { default: 'jpressnell@bluepointcapital.com' })` — recommend placeholder or removal.
+
+### Phase 3: API Layer
+- **Pre-existing TODO** (`jobProcessorService.ts:448`): `TODO: Implement statistics method` — orphaned method, not part of this milestone.
+
+### Phase 1: Data Foundation
+- **Manual migration required**: SQL files 012 and 013 must be applied to Supabase before deployment. No automated migration runner was included in this milestone.
+
+**Total: 5 items across 4 phases. None are blockers.**
+
+## Human Verification Items (Deployment Prerequisites)
+
+These items require a running application with live backend data:
+
+1. Run migrations 012 + 013 against live Supabase
+2. Deploy backend Cloud Functions (runHealthProbes, runRetentionCleanup)
+3. Verify health probes execute on schedule and write to service_health_checks
+4. Verify alert email delivery on probe failure
+5. Verify frontend monitoring dashboard renders with live data
+6. Verify alert banner appears and acknowledge works
+
+---
+
+_Audited: 2026-02-25_
+_Auditor: Claude (gsd audit-milestone)_
--- a/.planning/milestones/v1.0-REQUIREMENTS.md
+++ b/.planning/milestones/v1.0-REQUIREMENTS.md
@@ -0,0 +1,110 @@
+# Requirements Archive: v1.0 Analytics & Monitoring
+
+**Archived:** 2026-02-25
+**Status:** SHIPPED
+
+For current requirements, see `.planning/REQUIREMENTS.md`.
+
+---
+
+# Requirements: CIM Summary — Analytics & Monitoring
+
+**Defined:** 2026-02-24
+**Core Value:** When something breaks — an API key expires, a service goes down, a credential needs reauthorization — the admin knows immediately and knows exactly what to fix.
+
+## v1 Requirements
+
+Requirements for initial release. Each maps to roadmap phases.
+
+### Service Health
+
+- [x] **HLTH-01**: Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth
+- [x] **HLTH-02**: Each health probe makes a real authenticated API call, not just config checks
+- [x] **HLTH-03**: Health probes run on a scheduled interval, separate from document processing
+- [x] **HLTH-04**: Health probe results persist to Supabase (survive cold starts)
+
+### Alerting
+
+- [x] **ALRT-01**: Admin receives email alert when a service goes down or degrades
+- [x] **ALRT-02**: Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period)
+- [x] **ALRT-03**: Admin sees in-app alert banner for active critical issues
+- [x] **ALRT-04**: Alert recipient stored as configuration, not hardcoded
+
+### Processing Analytics
+
+- [x] **ANLY-01**: Document processing events persist to Supabase at write time (not in-memory only)
+- [x] **ANLY-02**: Admin can view processing summary: upload counts, success/failure rates, avg processing time
+- [x] **ANLY-03**: Analytics instrumentation is non-blocking (fire-and-forget, never delays processing pipeline)
+
+### Infrastructure
+
+- [x] **INFR-01**: Database migrations create service_health_checks and alert_events tables with indexes on created_at
+- [x] **INFR-02**: Admin API routes protected by Firebase Auth with admin email check
+- [x] **INFR-03**: 30-day rolling data retention cleanup runs on schedule
+- [x] **INFR-04**: Analytics writes use existing Supabase connection, no new database infrastructure
+
+## v2 Requirements
+
+Deferred to future release. Tracked but not in current roadmap.
+
+### Service Health
+
+- **HLTH-05**: Admin can view 7-day service health history with uptime percentages
+- **HLTH-06**: Real-time auth failure detection classifies auth errors (401/403) vs transient errors (429/503) and alerts immediately on credential issues
+
+### Alerting
+
+- **ALRT-05**: Admin can acknowledge or snooze alerts from the UI
+- **ALRT-06**: Admin receives recovery email when a downed service returns healthy
+
+### Processing Analytics
+
+- **ANLY-04**: Admin can view processing time trend charts over time
+- **ANLY-05**: Admin can view LLM token usage and estimated cost per document and per month
+
+### Infrastructure
+
+- **INFR-05**: Dashboard shows staleness warning when monitoring data stops arriving
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| External monitoring tools (Grafana, Datadog) | Operational overhead unjustified for single-admin app |
+| Multi-user analytics views | One admin user, RBAC complexity for zero benefit |
+| WebSocket/SSE real-time updates | Polling at 60s intervals sufficient; WebSockets complex in Cloud Functions |
+| Mobile push notifications | Email + in-app covers notification needs |
+| Historical analytics beyond 30 days | Storage costs; can extend later |
+| ML-based anomaly detection | Threshold-based alerting sufficient for this scale |
+| Log aggregation / log search UI | Firebase Cloud Logging handles this |
+
+## Traceability
+
+Which phases cover which requirements. Updated during roadmap creation.
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| INFR-01 | Phase 1 | Complete |
+| INFR-04 | Phase 1 | Complete |
+| HLTH-02 | Phase 2 | Complete |
+| HLTH-03 | Phase 2 | Complete |
+| HLTH-04 | Phase 2 | Complete |
+| ALRT-01 | Phase 2 | Complete |
+| ALRT-02 | Phase 2 | Complete |
+| ALRT-04 | Phase 2 | Complete |
+| ANLY-01 | Phase 2 | Complete |
+| ANLY-03 | Phase 2 | Complete |
+| INFR-03 | Phase 2 | Complete |
+| INFR-02 | Phase 3 | Complete |
+| HLTH-01 | Phase 3 | Complete |
+| ANLY-02 | Phase 3 | Complete |
+| ALRT-03 | Phase 4 | Complete |
+
+**Coverage:**
+- v1 requirements: 15 total
+- Mapped to phases: 15
+- Unmapped: 0
+
+---
+*Requirements defined: 2026-02-24*
+*Last updated: 2026-02-24 — traceability mapped after roadmap creation*
--- a/.planning/milestones/v1.0-ROADMAP.md
+++ b/.planning/milestones/v1.0-ROADMAP.md
@@ -0,0 +1,112 @@
+# Roadmap: CIM Summary — Analytics & Monitoring
+
+## Overview
+
+This milestone adds persistent analytics and service health monitoring to the existing CIM Summary application. The work proceeds in four phases that respect hard dependency constraints: database schema must exist before services can write to it, services must exist before routes can expose them, and routes must be stable before the frontend can be wired up. Each phase delivers a complete, independently testable layer.
+
+## Phases
+
+**Phase Numbering:**
+- Integer phases (1, 2, 3): Planned milestone work
+- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
+
+Decimal phases appear between their surrounding integers in numeric order.
+
+- [ ] **Phase 1: Data Foundation** - Create schema, DB models, and verify existing Supabase connection wiring
+- [x] **Phase 2: Backend Services** - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
+- [x] **Phase 3: API Layer** - Admin-gated routes exposing all services, instrumentation hooks in existing processors (completed 2026-02-24)
+- [x] **Phase 4: Frontend** - Admin dashboard page, health panel, processing metrics, alert notification banner (completed 2026-02-24)
+- [ ] **Phase 5: Tech Debt Cleanup** - Config-driven admin email, consolidate retention cleanup, remove hardcoded defaults
+
+## Phase Details
+
+### Phase 1: Data Foundation
+**Goal**: The database schema for monitoring exists and the existing Supabase connection is the only data infrastructure used
+**Depends on**: Nothing (first phase)
+**Requirements**: INFR-01, INFR-04
+**Success Criteria** (what must be TRUE):
+  1. `service_health_checks` and `alert_events` tables exist in Supabase with indexes on `created_at`
+  2. All new tables use the existing Supabase client from `config/supabase.ts` — no new database connections added
+  3. `AlertEventModel.ts` exists and its CRUD methods can be called in isolation without errors
+  4. Migration SQL can be run against the live Supabase instance and produces the expected schema
+**Plans:** 2/2 plans executed
+
+Plans:
+- [x] 01-01-PLAN.md — Migration SQL + HealthCheckModel + AlertEventModel
+- [x] 01-02-PLAN.md — Unit tests for both monitoring models
+
+### Phase 2: Backend Services
+**Goal**: All monitoring logic runs correctly — health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule
+**Depends on**: Phase 1
+**Requirements**: HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03
+**Success Criteria** (what must be TRUE):
+  1. Each health probe makes a real authenticated API call to its target service and returns a structured result (status, latency_ms, error_message)
+  2. Health probe results are written to Supabase and survive a simulated cold start (data present after function restart)
+  3. An alert email is sent when a service probe returns degraded or down, and a second probe failure within the cooldown period does not send a duplicate email
+  4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
+  5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
+  6. A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports
+**Plans:** 4/4 plans complete
+
+Plans:
+- [ ] 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
+- [ ] 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
+- [ ] 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
+- [ ] 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)
+
+### Phase 3: API Layer
+**Goal**: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics; existing service processors emit analytics instrumentation
+**Depends on**: Phase 2
+**Requirements**: INFR-02, HLTH-01, ANLY-02
+**Success Criteria** (what must be TRUE):
+  1. `GET /admin/health` returns current health status for all four services; a request with a non-admin Firebase token receives 403
+  2. `GET /admin/analytics` returns processing summary (upload counts, success/failure rates, avg processing time) sourced from Supabase, not in-memory state
+  3. `GET /admin/alerts` and `POST /admin/alerts/:id/acknowledge` function correctly and are blocked to non-admin users
+  4. Document processing in `jobProcessorService.ts` and `llmService.ts` emits analytics events at stage transitions without any change to existing processing behavior
+**Plans:** 2/2 plans complete
+
+Plans:
+- [ ] 03-01-PLAN.md — Admin auth middleware + admin routes (health, analytics, alerts endpoints)
+- [ ] 03-02-PLAN.md — Analytics instrumentation in jobProcessorService
+
+### Phase 4: Frontend
+**Goal**: The admin can see live service health, processing metrics, and active alerts directly in the application UI
+**Depends on**: Phase 3
+**Requirements**: ALRT-03, ANLY-02 (UI delivery), HLTH-01 (UI delivery)
+**Success Criteria** (what must be TRUE):
+  1. An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it
+  2. The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible
+  3. The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend
+  4. A non-admin user visiting the admin route is redirected or shown an access-denied state
+**Plans:** 2/2 plans complete
+
+Plans:
+- [ ] 04-01-PLAN.md — AdminService monitoring methods + AlertBanner + AdminMonitoringDashboard components
+- [ ] 04-02-PLAN.md — Wire components into Dashboard + visual verification checkpoint
+
+## Progress
+
+**Execution Order:**
+Phases execute in numeric order: 1 → 2 → 3 → 4 → 5
+
+| Phase | Plans Complete | Status | Completed |
+|-------|----------------|--------|-----------|
+| 1. Data Foundation | 2/2 | Complete | 2026-02-24 |
+| 2. Backend Services | 4/4 | Complete | 2026-02-24 |
+| 3. API Layer | 2/2 | Complete | 2026-02-24 |
+| 4. Frontend | 2/2 | Complete | 2026-02-25 |
+| 5. Tech Debt Cleanup | 0/0 | Not Planned | — |
+
+### Phase 5: Tech Debt Cleanup
+**Goal**: All configuration values are env-driven (no hardcoded emails), retention cleanup is consolidated into a single function, and deployment defaults use placeholders
+**Depends on**: Phase 4
+**Requirements**: None (tech debt from v1.0 audit)
+**Gap Closure**: Closes tech debt items from v1.0-MILESTONE-AUDIT.md
+**Success Criteria** (what must be TRUE):
+  1. Frontend `adminService.ts` reads admin email from `import.meta.env.VITE_ADMIN_EMAIL` instead of a hardcoded literal
+  2. Only one retention cleanup function exists in `index.ts` (the model-layer `runRetentionCleanup`), with the pre-existing raw SQL `cleanupOldData` consolidated or removed
+  3. `defineString('EMAIL_WEEKLY_RECIPIENT')` default in `index.ts` uses a placeholder (not a personal email address)
+**Plans:** 0 plans
+
+Plans:
+- [ ] TBD (run /gsd:plan-phase 5 to break down)
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-01-PLAN.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-01-PLAN.md
@@ -0,0 +1,226 @@
+---
+phase: 01-data-foundation
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - backend/src/models/migrations/012_create_monitoring_tables.sql
+  - backend/src/models/HealthCheckModel.ts
+  - backend/src/models/AlertEventModel.ts
+  - backend/src/models/index.ts
+autonomous: true
+requirements:
+  - INFR-01
+  - INFR-04
+
+must_haves:
+  truths:
+    - "Migration SQL creates service_health_checks and alert_events tables with all required columns and CHECK constraints"
+    - "Both tables have indexes on created_at (INFR-01 requirement)"
+    - "RLS is enabled on both new tables"
+    - "HealthCheckModel and AlertEventModel use getSupabaseServiceClient() for all database operations (INFR-04 — no new DB infrastructure)"
+    - "Model static methods validate input before writing"
+  artifacts:
+    - path: "backend/src/models/migrations/012_create_monitoring_tables.sql"
+      provides: "DDL for service_health_checks and alert_events tables"
+      contains: "CREATE TABLE IF NOT EXISTS service_health_checks"
+    - path: "backend/src/models/HealthCheckModel.ts"
+      provides: "CRUD operations for service_health_checks table"
+      exports: ["HealthCheckModel", "ServiceHealthCheck", "CreateHealthCheckData"]
+    - path: "backend/src/models/AlertEventModel.ts"
+      provides: "CRUD operations for alert_events table"
+      exports: ["AlertEventModel", "AlertEvent", "CreateAlertEventData"]
+    - path: "backend/src/models/index.ts"
+      provides: "Barrel exports for new models"
+      contains: "HealthCheckModel"
+  key_links:
+    - from: "backend/src/models/HealthCheckModel.ts"
+      to: "backend/src/config/supabase.ts"
+      via: "getSupabaseServiceClient() import"
+      pattern: "import.*getSupabaseServiceClient.*from.*config/supabase"
+    - from: "backend/src/models/AlertEventModel.ts"
+      to: "backend/src/config/supabase.ts"
+      via: "getSupabaseServiceClient() import"
+      pattern: "import.*getSupabaseServiceClient.*from.*config/supabase"
+    - from: "backend/src/models/HealthCheckModel.ts"
+      to: "backend/src/utils/logger.ts"
+      via: "Winston logger import"
+      pattern: "import.*logger.*from.*utils/logger"
+---
+
+<objective>
+Create the database migration and TypeScript model layer for the monitoring system.
+
+Purpose: Establish the data foundation that all subsequent phases (health probes, alerts, analytics) depend on. Tables must exist and model CRUD must work before any service can write monitoring data.
+
+Output: One SQL migration file, two TypeScript model classes, updated barrel exports.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/01-data-foundation/01-RESEARCH.md
+@.planning/phases/01-data-foundation/01-CONTEXT.md
+
+# Existing patterns to follow
+@backend/src/models/DocumentModel.ts
+@backend/src/models/ProcessingJobModel.ts
+@backend/src/models/index.ts
+@backend/src/models/migrations/005_create_processing_jobs_table.sql
+@backend/src/config/supabase.ts
+@backend/src/utils/logger.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create monitoring tables migration</name>
+  <files>backend/src/models/migrations/012_create_monitoring_tables.sql</files>
+  <action>
+Create migration file `012_create_monitoring_tables.sql` following the pattern from `005_create_processing_jobs_table.sql`.
+
+**service_health_checks table:**
+- `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`
+- `service_name VARCHAR(100) NOT NULL`
+- `status TEXT NOT NULL CHECK (status IN ('healthy', 'degraded', 'down'))`
+- `latency_ms INTEGER` (nullable — INTEGER is correct, max ~2.1B ms which is impossible for latency)
+- `checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP` (when the probe actually ran — distinct from created_at per Research Pitfall 5)
+- `error_message TEXT` (nullable — for storing probe failure details)
+- `probe_details JSONB` (nullable — flexible metadata per service: response codes, error specifics)
+- `created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP`
+
+**Indexes for service_health_checks:**
+- `idx_service_health_checks_created_at ON service_health_checks(created_at)` — required by INFR-01, used for 30-day retention queries
+- `idx_service_health_checks_service_created ON service_health_checks(service_name, created_at)` — composite for dashboard "latest check per service" queries
+
+**alert_events table:**
+- `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`
+- `service_name VARCHAR(100) NOT NULL`
+- `alert_type TEXT NOT NULL CHECK (alert_type IN ('service_down', 'service_degraded', 'recovery'))`
+- `status TEXT NOT NULL CHECK (status IN ('active', 'acknowledged', 'resolved'))`
+- `message TEXT` (nullable — human-readable alert description)
+- `details JSONB` (nullable — structured metadata about the alert)
+- `created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP`
+- `acknowledged_at TIMESTAMP WITH TIME ZONE` (nullable)
+- `resolved_at TIMESTAMP WITH TIME ZONE` (nullable)
+
+**Indexes for alert_events:**
+- `idx_alert_events_created_at ON alert_events(created_at)` — required by INFR-01
+- `idx_alert_events_status ON alert_events(status)` — for "active alerts" queries
+- `idx_alert_events_service_status ON alert_events(service_name, status)` — for "active alerts per service"
+
+**RLS:**
+- `ALTER TABLE service_health_checks ENABLE ROW LEVEL SECURITY;`
+- `ALTER TABLE alert_events ENABLE ROW LEVEL SECURITY;`
+- No explicit policies needed — service role key bypasses RLS automatically in Supabase (Research Pitfall 2). Policies for authenticated users will be added in Phase 3.
+
+**Important patterns (per CONTEXT.md):**
+- ALL DDL uses `IF NOT EXISTS` — `CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`
+- Forward-only migration — no rollback/down scripts
+- File must be numbered `012_` (current highest is `011_create_vector_database_tables.sql`)
+- Include header comment with migration purpose and date
+
+**Do NOT:**
+- Use PostgreSQL ENUM types — use TEXT + CHECK per user decision
+- Create rollback/down scripts — forward-only per user decision
+- Add any DML (INSERT/UPDATE/DELETE) — migration is DDL only
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary && ls -la backend/src/models/migrations/012_create_monitoring_tables.sql && grep -c "CREATE TABLE IF NOT EXISTS" backend/src/models/migrations/012_create_monitoring_tables.sql | grep -q "2" && echo "PASS: 2 tables found" || echo "FAIL: expected 2 CREATE TABLE statements"</automated>
+    <manual>Verify SQL syntax is valid and matches existing migration patterns</manual>
+  </verify>
+  <done>Migration file exists with both tables, CHECK constraints on status fields, JSONB columns for flexible metadata, indexes on created_at for both tables, composite indexes for common query patterns, and RLS enabled on both tables.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create HealthCheckModel and AlertEventModel with barrel exports</name>
+  <files>
+    backend/src/models/HealthCheckModel.ts
+    backend/src/models/AlertEventModel.ts
+    backend/src/models/index.ts
+  </files>
+  <action>
+**HealthCheckModel.ts** — Follow DocumentModel.ts static class pattern exactly:
+
+Interfaces:
+- `ServiceHealthCheck` — full row type matching all columns from migration (id, service_name, status, latency_ms, checked_at, error_message, probe_details, created_at). Use `'healthy' | 'degraded' | 'down'` union for status. Use `Record<string, unknown>` for probe_details (not `any` — strict TypeScript per CONVENTIONS.md).
+- `CreateHealthCheckData` — input type for create method (service_name required, status required, latency_ms optional, error_message optional, probe_details optional).
+
+Static methods:
+- `create(data: CreateHealthCheckData): Promise<ServiceHealthCheck>` — Validate service_name is non-empty, validate status is one of the three allowed values. Call `getSupabaseServiceClient()` inside the method (not cached at module level — per Research finding). Use `.from('service_health_checks').insert({...}).select().single()`. Log with Winston logger on success and error. Throw on Supabase error with descriptive message.
+- `findLatestByService(serviceName: string): Promise<ServiceHealthCheck | null>` — Get most recent health check for a given service. Order by `checked_at` desc, limit 1. Return null if not found (handle PGRST116 like ProcessingJobModel).
+- `findAll(options?: { limit?: number; serviceName?: string }): Promise<ServiceHealthCheck[]>` — List health checks with optional filtering. Default limit 100. Order by created_at desc.
+- `deleteOlderThan(days: number): Promise<number>` — For 30-day retention cleanup (used by Phase 2 scheduler). Delete rows where `created_at < NOW() - interval`. Return count of deleted rows.
+
+**AlertEventModel.ts** — Same pattern:
+
+Interfaces:
+- `AlertEvent` — full row type (id, service_name, alert_type, status, message, details, created_at, acknowledged_at, resolved_at). Use union types for alert_type and status. Use `Record<string, unknown>` for details.
+- `CreateAlertEventData` — input type (service_name, alert_type, status default 'active', message optional, details optional).
+
+Static methods:
+- `create(data: CreateAlertEventData): Promise<AlertEvent>` — Validate service_name non-empty, validate alert_type and status values. Insert with default status 'active' if not provided. Same Supabase pattern as HealthCheckModel.
+- `findActive(serviceName?: string): Promise<AlertEvent[]>` — Get active (unresolved, unacknowledged) alerts. Filter `status = 'active'`. Optional service_name filter. Order by created_at desc.
+- `acknowledge(id: string): Promise<AlertEvent>` — Set status to 'acknowledged' and acknowledged_at to current timestamp. Return updated row.
+- `resolve(id: string): Promise<AlertEvent>` — Set status to 'resolved' and resolved_at to current timestamp. Return updated row.
+- `findRecentByService(serviceName: string, alertType: string, withinMinutes: number): Promise<AlertEvent | null>` — For deduplication in Phase 2. Find most recent alert of given type for service within time window.
+- `deleteOlderThan(days: number): Promise<number>` — Same retention pattern as HealthCheckModel.
+
+**Common patterns for BOTH models:**
+- Import `getSupabaseServiceClient` from `'../config/supabase'`
+- Import `logger` from `'../utils/logger'`
+- Call `getSupabaseServiceClient()` per-method (not at module level)
+- Error handling: check `if (error)` after every Supabase call, log with `logger.error()`, throw with descriptive message
+- Handle PGRST116 (not found) by returning null instead of throwing (ProcessingJobModel pattern)
+- Type guard on catch: `error instanceof Error ? error.message : String(error)`
+- All methods are `static async`
+
+**index.ts update:**
+- Add export lines for both new models: `export { HealthCheckModel } from './HealthCheckModel';` and `export { AlertEventModel } from './AlertEventModel';`
+- Also export the interfaces: `export type { ServiceHealthCheck, CreateHealthCheckData } from './HealthCheckModel';` and `export type { AlertEvent, CreateAlertEventData } from './AlertEventModel';`
+- Keep all existing exports intact
+
+**Do NOT:**
+- Use `any` type anywhere — use `Record<string, unknown>` for JSONB fields
+- Use `console.log` — use Winston logger only
+- Cache `getSupabaseServiceClient()` at module level
+- Create a shared base model class (per Research recommendation — keep models independent)
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | tail -20</automated>
+    <manual>Verify both models export from index.ts and follow DocumentModel.ts patterns</manual>
+  </verify>
+  <done>HealthCheckModel.ts and AlertEventModel.ts exist with typed interfaces, static CRUD methods, input validation, getSupabaseServiceClient() per-method, Winston logging. Both models exported from index.ts. TypeScript compiles without errors.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `ls backend/src/models/migrations/012_create_monitoring_tables.sql` — migration file exists
+2. `grep "CREATE TABLE IF NOT EXISTS service_health_checks" backend/src/models/migrations/012_create_monitoring_tables.sql` — table DDL present
+3. `grep "CREATE TABLE IF NOT EXISTS alert_events" backend/src/models/migrations/012_create_monitoring_tables.sql` — table DDL present
+4. `grep "idx_.*_created_at" backend/src/models/migrations/012_create_monitoring_tables.sql` — INFR-01 indexes present
+5. `grep "ENABLE ROW LEVEL SECURITY" backend/src/models/migrations/012_create_monitoring_tables.sql` — RLS enabled
+6. `grep "getSupabaseServiceClient" backend/src/models/HealthCheckModel.ts` — INFR-04 uses existing Supabase connection
+7. `grep "getSupabaseServiceClient" backend/src/models/AlertEventModel.ts` — INFR-04 uses existing Supabase connection
+8. `cd backend && npx tsc --noEmit` — TypeScript compiles cleanly
+</verification>
+
+<success_criteria>
+- Migration file 012 creates both tables with CHECK constraints, JSONB columns, all indexes, and RLS
+- Both model classes compile, export typed interfaces, use getSupabaseServiceClient() per-method
+- Both models are re-exported from index.ts
+- No new database connections or infrastructure introduced (INFR-04)
+- TypeScript strict compilation passes
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/01-data-foundation/01-01-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-01-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-01-SUMMARY.md
@@ -0,0 +1,135 @@
+---
+phase: 01-data-foundation
+plan: 01
+subsystem: database
+tags: [supabase, postgresql, migrations, typescript, monitoring, health-checks, alerts]
+
+# Dependency graph
+requires: []
+provides:
+  - service_health_checks table with status CHECK constraint, JSONB probe_details, checked_at column
+  - alert_events table with alert_type/status CHECK constraints, lifecycle timestamps
+  - HealthCheckModel TypeScript class with CRUD static methods
+  - AlertEventModel TypeScript class with CRUD static methods
+  - Barrel exports for both models and their types from models/index.ts
+affects:
+  - 01-02 (Phase 1 Plan 2 — next data foundation plan)
+  - Phase 2 (health probe services will write to service_health_checks)
+  - Phase 2 (alert service will write to alert_events and use findRecentByService for deduplication)
+  - Phase 3 (API endpoints will query both tables)
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - Static class model pattern (no instantiation — all methods are static async)
+    - getSupabaseServiceClient() called per-method, never cached at module level
+    - PGRST116 error code handled as null return (not an exception)
+    - Input validation in model create() methods before any DB call
+    - Record<string, unknown> for JSONB fields (no any types)
+    - Named Winston logger import: import { logger } from '../utils/logger'
+    - IF NOT EXISTS on all DDL (idempotent forward-only migrations)
+    - TEXT + CHECK constraint pattern for enums (not PostgreSQL ENUM types)
+
+key-files:
+  created:
+    - backend/src/models/migrations/012_create_monitoring_tables.sql
+    - backend/src/models/HealthCheckModel.ts
+    - backend/src/models/AlertEventModel.ts
+  modified:
+    - backend/src/models/index.ts
+
+key-decisions:
+  - "TEXT + CHECK constraint used for status/alert_type columns (not PostgreSQL ENUM types) — consistent with existing project pattern"
+  - "getSupabaseServiceClient() called per-method (not module-level singleton) — follows Research finding about Cloud Function cold start issues"
+  - "checked_at column added to service_health_checks separate from created_at — records actual probe run time, not DB insert time"
+  - "Forward-only migration only (no rollback scripts) — per user decision documented in CONTEXT.md"
+  - "RLS enabled on both tables with no explicit policies — service role key bypasses RLS; user-facing policies deferred to Phase 3"
+
+patterns-established:
+  - "Model static class: all methods static async, getSupabaseServiceClient() per-method, PGRST116 → null"
+  - "Input validation before Supabase call: non-empty string check, union type allowlist check"
+  - "Error re-throw with method prefix: 'HealthCheckModel.create: ...' for log traceability"
+  - "deleteOlderThan(days): compute cutoff in JS then filter with .lt() — Supabase client does not support date arithmetic in filters"
+
+requirements-completed: [INFR-01, INFR-04]
+
+# Metrics
+duration: 8min
+completed: 2026-02-24
+---
+
+# Phase 01 Plan 01: Data Foundation Summary
+
+**SQL migration + TypeScript model layer for monitoring: service_health_checks and alert_events tables with HealthCheckModel and AlertEventModel static classes using getSupabaseServiceClient() per-method**
+
+## Performance
+
+- **Duration:** ~8 min
+- **Started:** 2026-02-24T16:29:39Z
+- **Completed:** 2026-02-24T16:37:45Z
+- **Tasks:** 2
+- **Files modified:** 4
+
+## Accomplishments
+
+- Created migration 012 with both monitoring tables, CHECK constraints on all enum columns, JSONB columns for flexible metadata, INFR-01 required created_at indexes, composite indexes for dashboard queries, and RLS on both tables
+- Created HealthCheckModel with 4 static methods: create (with input validation), findLatestByService, findAll (with optional filters), deleteOlderThan (30-day retention)
+- Created AlertEventModel with 6 static methods: create (with validation), findActive, acknowledge, resolve, findRecentByService (deduplication support for Phase 2), deleteOlderThan
+- Updated models/index.ts with barrel exports for both model classes and all 4 types
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create monitoring tables migration** - `ad6f452` (feat)
+2. **Task 2: Create HealthCheckModel and AlertEventModel with barrel exports** - `4a620b4` (feat)
+
+**Plan metadata:** (docs commit — see below)
+
+## Files Created/Modified
+
+- `backend/src/models/migrations/012_create_monitoring_tables.sql` - DDL for service_health_checks and alert_events tables with indexes and RLS
+- `backend/src/models/HealthCheckModel.ts` - CRUD model for service_health_checks table, exports ServiceHealthCheck and CreateHealthCheckData types
+- `backend/src/models/AlertEventModel.ts` - CRUD model for alert_events table with lifecycle methods (acknowledge/resolve), exports AlertEvent and CreateAlertEventData types
+- `backend/src/models/index.ts` - Added barrel exports for both new models and their types
+
+## Decisions Made
+
+- TEXT + CHECK constraint used for status/alert_type columns (not PostgreSQL ENUM types) — consistent with existing project pattern established in prior migrations
+- getSupabaseServiceClient() called per-method (not cached at module level) — per Research finding about potential Cloud Function cold start issues with module-level Supabase client caching
+- checked_at column kept separate from created_at on service_health_checks — probe timestamp vs. DB write timestamp are logically distinct (Research Pitfall 5)
+- No rollback scripts in migration — forward-only per user decision documented in CONTEXT.md
+
+## Deviations from Plan
+
+None — plan executed exactly as written.
+
+## Issues Encountered
+
+- Branch mismatch: Task 1 committed to `gsd/phase-01-data-foundation`, Task 2 accidentally committed to `upgrade/firebase-functions-v7-nodejs22` due to shell context reset between Bash calls. Resolved by cherry-picking the Task 2 commit onto the correct GSD branch.
+- Pre-existing TypeScript error in `backend/src/config/env.ts` (Type 'never' has no call signatures) — unrelated to this plan's changes, deferred per scope boundary rules.
+
+## User Setup Required
+
+None — no external service configuration required. The migration file must be run against the Supabase database when ready (will be part of the migration runner invocation in a future phase or manually via Supabase dashboard SQL editor).
+
+## Next Phase Readiness
+
+- Both tables and both models are ready. Phase 2 health probe services can import HealthCheckModel and AlertEventModel from `backend/src/models` immediately.
+- The findRecentByService method on AlertEventModel is ready for Phase 2 alert deduplication.
+- The deleteOlderThan method on both models is ready for Phase 2 scheduler retention enforcement.
+- Migration 012 needs to be applied to the Supabase database before any runtime writes will succeed.
+
+---
+*Phase: 01-data-foundation*
+*Completed: 2026-02-24*
+
+## Self-Check: PASSED
+
+- FOUND: backend/src/models/migrations/012_create_monitoring_tables.sql
+- FOUND: backend/src/models/HealthCheckModel.ts
+- FOUND: backend/src/models/AlertEventModel.ts
+- FOUND: .planning/phases/01-data-foundation/01-01-SUMMARY.md
+- FOUND commit: ad6f452 (Task 1)
+- FOUND commit: 4a620b4 (Task 2)
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-02-PLAN.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-02-PLAN.md
@@ -0,0 +1,194 @@
+---
+phase: 01-data-foundation
+plan: 02
+type: execute
+wave: 2
+depends_on:
+  - 01-01
+files_modified:
+  - backend/src/__tests__/models/HealthCheckModel.test.ts
+  - backend/src/__tests__/models/AlertEventModel.test.ts
+autonomous: true
+requirements:
+  - INFR-01
+  - INFR-04
+
+must_haves:
+  truths:
+    - "HealthCheckModel CRUD methods work correctly with mocked Supabase client"
+    - "AlertEventModel CRUD methods work correctly with mocked Supabase client"
+    - "Input validation rejects invalid status values and empty service names"
+    - "Models use getSupabaseServiceClient (not getSupabaseClient or getPostgresPool)"
+  artifacts:
+    - path: "backend/src/__tests__/models/HealthCheckModel.test.ts"
+      provides: "Unit tests for HealthCheckModel"
+      contains: "HealthCheckModel"
+    - path: "backend/src/__tests__/models/AlertEventModel.test.ts"
+      provides: "Unit tests for AlertEventModel"
+      contains: "AlertEventModel"
+  key_links:
+    - from: "backend/src/__tests__/models/HealthCheckModel.test.ts"
+      to: "backend/src/models/HealthCheckModel.ts"
+      via: "import HealthCheckModel"
+      pattern: "import.*HealthCheckModel"
+    - from: "backend/src/__tests__/models/AlertEventModel.test.ts"
+      to: "backend/src/models/AlertEventModel.ts"
+      via: "import AlertEventModel"
+      pattern: "import.*AlertEventModel"
+---
+
+<objective>
+Create unit tests for both monitoring model classes to verify CRUD operations, input validation, and correct Supabase client usage.
+
+Purpose: Ensure model layer works correctly before Phase 2 services depend on it. Verify INFR-04 compliance (models use existing Supabase connection) and that input validation catches bad data before it hits the database.
+
+Output: Two test files covering all model static methods with mocked Supabase client.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/01-data-foundation/01-RESEARCH.md
+@.planning/phases/01-data-foundation/01-01-SUMMARY.md
+
+# Test patterns
+@backend/src/__tests__/mocks/logger.mock.ts
+@backend/src/__tests__/utils/test-helpers.ts
+@.planning/codebase/TESTING.md
+
+# Models to test
+@backend/src/models/HealthCheckModel.ts
+@backend/src/models/AlertEventModel.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create HealthCheckModel unit tests</name>
+  <files>backend/src/__tests__/models/HealthCheckModel.test.ts</files>
+  <action>
+Create unit tests for HealthCheckModel using Vitest. This is the first model test in the project, so establish the Supabase mocking pattern.
+
+**Supabase mock setup:**
+- Mock `../config/supabase` module using `vi.mock()`
+- Create a mock Supabase client with chainable methods: `.from()` returns object with `.insert()`, `.select()`, `.single()`, `.order()`, `.limit()`, `.eq()`, `.lt()`, `.delete()`
+- Each chainable method returns the mock object (fluent pattern) except terminal methods (`.single()`, `.select()` at end) which return `{ data, error }`
+- Mock `getSupabaseServiceClient` to return the mock client
+- Also mock `'../utils/logger'` using the existing `logger.mock.ts` pattern
+
+**Test suites:**
+
+`describe('HealthCheckModel')`:
+
+`describe('create')`:
+- `test('creates a health check with valid data')` — call with { service_name: 'document_ai', status: 'healthy', latency_ms: 150 }, verify Supabase insert called with correct data, verify returned record matches
+- `test('creates a health check with minimal data')` — call with only required fields (service_name, status), verify optional fields not included
+- `test('creates a health check with probe_details')` — include JSONB probe_details, verify passed through
+- `test('throws on empty service_name')` — expect Error thrown before Supabase is called
+- `test('throws on invalid status')` — pass status 'unknown', expect Error thrown before Supabase is called
+- `test('throws on Supabase error')` — mock Supabase returning { data: null, error: { message: 'connection failed' } }, verify error thrown with descriptive message
+- `test('logs error on Supabase failure')` — verify logger.error called with error details
+
+`describe('findLatestByService')`:
+- `test('returns latest health check for service')` — mock Supabase returning a record, verify correct table and filters used
+- `test('returns null when no records found')` — mock Supabase returning null/empty, verify null returned (not thrown)
+
+`describe('findAll')`:
+- `test('returns health checks with default limit')` — verify limit 100 applied
+- `test('filters by serviceName when provided')` — verify .eq() called with service_name
+- `test('respects custom limit')` — pass limit: 50, verify .limit(50)
+
+`describe('deleteOlderThan')`:
+- `test('deletes records older than specified days')` — verify .lt() called with correct date calculation
+- `test('returns count of deleted records')` — mock returning count
+
+**Pattern notes:**
+- Use `describe`/`test` (not `it`) to match project convention
+- Use `beforeEach` to reset mocks between tests: `vi.clearAllMocks()`
+- Verify `getSupabaseServiceClient` is called per method invocation (INFR-04 pattern)
+- Import from vitest: `{ describe, test, expect, vi, beforeEach }`
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/models/HealthCheckModel.test.ts --reporter=verbose 2>&1 | tail -30</automated>
+  </verify>
+  <done>All HealthCheckModel tests pass. Tests cover create (valid, minimal, with probe_details), input validation (empty name, invalid status), Supabase error handling, findLatestByService (found, not found), findAll (default, filtered, custom limit), deleteOlderThan.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create AlertEventModel unit tests</name>
+  <files>backend/src/__tests__/models/AlertEventModel.test.ts</files>
+  <action>
+Create unit tests for AlertEventModel following the same Supabase mocking pattern established in HealthCheckModel tests.
+
+**Reuse the same mock setup pattern** from Task 1 (mock getSupabaseServiceClient and logger).
+
+**Test suites:**
+
+`describe('AlertEventModel')`:
+
+`describe('create')`:
+- `test('creates an alert event with valid data')` — call with { service_name: 'claude_ai', alert_type: 'service_down', message: 'API returned 503' }, verify insert called, verify returned record
+- `test('defaults status to active')` — create without explicit status, verify 'active' sent to Supabase
+- `test('creates with explicit status')` — pass status: 'acknowledged', verify it is used
+- `test('creates with details JSONB')` — include details object, verify passed through
+- `test('throws on empty service_name')` — expect Error before Supabase call
+- `test('throws on invalid alert_type')` — pass alert_type: 'warning', expect Error
+- `test('throws on invalid status')` — pass status: 'pending', expect Error
+- `test('throws on Supabase error')` — mock error response, verify descriptive throw
+
+`describe('findActive')`:
+- `test('returns active alerts')` — mock returning array of active alerts, verify .eq('status', 'active')
+- `test('filters by serviceName when provided')` — verify additional .eq() for service_name
+- `test('returns empty array when no active alerts')` — mock returning empty array
+
+`describe('acknowledge')`:
+- `test('sets status to acknowledged with timestamp')` — verify .update() called with { status: 'acknowledged', acknowledged_at: expect.any(String) }
+- `test('throws when alert not found')` — mock Supabase returning null/error, verify error thrown
+
+`describe('resolve')`:
+- `test('sets status to resolved with timestamp')` — verify .update() with { status: 'resolved', resolved_at: expect.any(String) }
+- `test('throws when alert not found')` — verify error handling
+
+`describe('findRecentByService')`:
+- `test('finds recent alert within time window')` — mock returning a match, verify filters for service_name, alert_type, and created_at > threshold
+- `test('returns null when no recent alerts')` — mock returning empty, verify null
+
+`describe('deleteOlderThan')`:
+- `test('deletes records older than specified days')` — same pattern as HealthCheckModel
+- `test('returns count of deleted records')` — verify count
+
+**Pattern notes:**
+- Same mock setup as HealthCheckModel test
+- Same beforeEach/clearAllMocks pattern
+- Verify getSupabaseServiceClient called per method
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/models/AlertEventModel.test.ts --reporter=verbose 2>&1 | tail -30</automated>
+  </verify>
+  <done>All AlertEventModel tests pass. Tests cover create (valid, default status, explicit status, with details), input validation (empty name, invalid alert_type, invalid status), Supabase error handling, findActive (all, filtered, empty), acknowledge, resolve, findRecentByService (found, not found), deleteOlderThan.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `cd backend && npx vitest run src/__tests__/models/ --reporter=verbose` — all model tests pass
+2. `cd backend && npx vitest run --reporter=verbose` — full test suite still passes (no regressions)
+3. Tests mock `getSupabaseServiceClient` (not `getSupabaseClient` or `getPostgresPool`) confirming INFR-04 compliance
+</verification>
+
+<success_criteria>
+- All HealthCheckModel tests pass covering create, findLatestByService, findAll, deleteOlderThan, plus validation errors
+- All AlertEventModel tests pass covering create, findActive, acknowledge, resolve, findRecentByService, deleteOlderThan, plus validation errors
+- Existing test suite continues to pass (no regressions)
+- Supabase mocking pattern established for future model tests
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/01-data-foundation/01-02-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-02-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-02-SUMMARY.md
@@ -0,0 +1,124 @@
+---
+phase: 01-data-foundation
+plan: 02
+subsystem: testing
+tags: [vitest, supabase, mocking, unit-tests, health-checks, alert-events]
+
+# Dependency graph
+requires:
+  - phase: 01-data-foundation/01-01
+    provides: HealthCheckModel and AlertEventModel classes with getSupabaseServiceClient usage
+
+provides:
+  - Unit tests for HealthCheckModel covering all CRUD methods and input validation
+  - Unit tests for AlertEventModel covering all CRUD methods, status transitions, and input validation
+  - Supabase chainable mock pattern for future model tests
+  - INFR-04 compliance verification (models call getSupabaseServiceClient per invocation)
+
+affects:
+  - 02-monitoring-services
+  - future model tests
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Supabase chainable mock: makeSupabaseChain() helper with fluent vi.fn() returns and thenability for awaitable queries"
+    - "vi.mock hoisting: factory functions use only inline vi.fn() to avoid temporal dead zone errors"
+    - "vi.mocked() for typed access to mocked module exports after import"
+
+key-files:
+  created:
+    - backend/src/__tests__/models/HealthCheckModel.test.ts
+    - backend/src/__tests__/models/AlertEventModel.test.ts
+  modified: []
+
+key-decisions:
+  - "Supabase mock uses thenability (chain.then) so both .single() and direct await patterns work without duplicating mocks"
+  - "makeSupabaseChain() factory encapsulates mock setup — one call per test, no shared state between tests"
+  - "vi.mock() factories use only inline vi.fn() — no top-level variable references to avoid hoisting TDZ errors"
+
+patterns-established:
+  - "Model test pattern: vi.mock both supabase and logger, import vi.mocked() typed refs, makeSupabaseChain() per test, clearAllMocks in beforeEach"
+  - "Validation test pattern: verify getSupabaseServiceClient not called when validation throws (confirms no DB hit)"
+  - "PGRST116 null return: mock error.code = 'PGRST116' to test no-rows path that returns null instead of throwing"
+
+requirements-completed: [INFR-01, INFR-04]
+
+# Metrics
+duration: 26min
+completed: 2026-02-24
+---
+
+# Phase 01 Plan 02: Model Unit Tests Summary
+
+**33 unit tests for HealthCheckModel and AlertEventModel with Vitest + Supabase chainable mock pattern**
+
+## Performance
+
+- **Duration:** 26 min
+- **Started:** 2026-02-24T16:46:26Z
+- **Completed:** 2026-02-24T17:13:22Z
+- **Tasks:** 2
+- **Files modified:** 2
+
+## Accomplishments
+
+- HealthCheckModel: 14 tests covering create (valid/minimal/probe_details), 2 validation error paths, Supabase error + error logging, findLatestByService (found/null PGRST116), findAll (default limit/filtered/custom limit), deleteOlderThan (date calculation/count)
+- AlertEventModel: 19 tests covering create (valid/default status/explicit status/details JSONB), 3 validation error paths, Supabase error, findActive (all/filtered/empty), acknowledge/resolve (success + PGRST116 not-found), findRecentByService (found/null), deleteOlderThan
+- Established `makeSupabaseChain()` helper pattern for all future model tests — single source for mock client setup with fluent chain and thenable resolution
+- Full test suite (41 tests) passes with no regressions
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create HealthCheckModel unit tests** - `99c6dcb` (test)
+2. **Task 2: Create AlertEventModel unit tests** - `a3cd82b` (test)
+
+**Plan metadata:** (docs commit to follow)
+
+## Files Created/Modified
+
+- `backend/src/__tests__/models/HealthCheckModel.test.ts` - 14 unit tests for HealthCheckModel CRUD, validation, and error handling
+- `backend/src/__tests__/models/AlertEventModel.test.ts` - 19 unit tests for AlertEventModel CRUD, status transitions, validation, and error handling
+
+## Decisions Made
+
+- Supabase mock uses `chain.then` (thenability) so both `.single()` and direct `await query` patterns work from the same mock object — no need to bifurcate mocks for the two query termination patterns the models use.
+- `makeSupabaseChain(resolvedValue)` factory creates a fresh mock per test — avoids state leakage between tests that would occur with a shared top-level mock object.
+- `vi.mock()` factories use only inline `vi.fn()` — top-level variable references are in temporal dead zone when hoisted factories execute.
+
+## Deviations from Plan
+
+**1. [Rule 1 - Bug] Fixed Vitest hoisting TDZ error in initial mock approach**
+- **Found during:** Task 1 (first test run)
+- **Issue:** Initial approach created top-level mock variables, then referenced them inside `vi.mock()` factory — Vitest hoists `vi.mock` before variable initialization, causing `ReferenceError: Cannot access 'mockGetSupabaseServiceClient' before initialization`
+- **Fix:** Rewrote mock factories to use only inline `vi.fn()`, then used `vi.mocked()` after imports to get typed references
+- **Files modified:** `backend/src/__tests__/models/HealthCheckModel.test.ts`
+- **Verification:** Tests ran successfully on second attempt; this pattern used for AlertEventModel from the start
+- **Committed in:** 99c6dcb (Task 1 commit, updated file)
+
+---
+
+**Total deviations:** 1 auto-fixed (Rule 1 — runtime error in test infrastructure)
+**Impact on plan:** Fix was required for tests to run. Resulted in a cleaner, idiomatic Vitest mock pattern.
+
+## Issues Encountered
+
+- Vitest mock hoisting TDZ: the correct pattern is `vi.mock()` factory uses only `vi.fn()` inline, with `vi.mocked()` used post-import for typed access. Documented in patterns-established for all future test authors.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- Both model classes verified correct through unit tests
+- Supabase mock pattern established — Phase 2 service tests can reuse `makeSupabaseChain()` helper
+- INFR-04 compliance confirmed: tests verify `getSupabaseServiceClient` is called per-method invocation
+- Ready for Phase 2: monitoring services that depend on HealthCheckModel and AlertEventModel
+
+---
+*Phase: 01-data-foundation*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-CONTEXT.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-CONTEXT.md
@@ -0,0 +1,63 @@
+# Phase 1: Data Foundation - Context
+
+**Gathered:** 2026-02-24
+**Status:** Ready for planning
+
+<domain>
+## Phase Boundary
+
+Create database schema (tables, indexes, migrations) and model layer for the monitoring system. Requirements: INFR-01 (tables with indexes), INFR-04 (use existing Supabase connection). No services, no API routes, no frontend work.
+
+</domain>
+
+<decisions>
+## Implementation Decisions
+
+### Migration approach
+- Use the existing `DatabaseMigrator` class in `backend/src/models/migrate.ts`
+- New `.sql` files go in `src/models/migrations/`, run with `npm run db:migrate`
+- The migrator tracks applied migrations in a `migrations` table — handles idempotency
+- Forward-only migrations (no rollback/down scripts). If something needs fixing, write a new migration.
+- Migrations execute via `supabase.rpc('exec_sql', { sql })` — works with cloud Supabase from any environment including Firebase
+
+### Schema details
+- Status fields use TEXT with CHECK constraints (e.g., `CHECK (status IN ('healthy','degraded','down'))`) — easy to extend, no enum type management
+- Table names are descriptive, matching existing style: `service_health_checks`, `alert_events` (like `processing_jobs`, `document_chunks`)
+- Include JSONB `probe_details` / `details` columns for flexible metadata per service (response codes, error specifics) without future schema changes
+- All tables get indexes on `created_at` (required for 30-day retention queries and dashboard time-range filters)
+- Enable Row Level Security on new tables — admin-only access, matching existing security patterns
+
+### Model layer pattern
+- One model file per table: `HealthCheckModel.ts`, `AlertEventModel.ts`
+- Static methods on model classes (e.g., `AlertEventModel.create()`, `AlertEventModel.findActive()`) — matches `DocumentModel.ts` pattern
+- Use `getSupabaseServiceClient()` (PostgREST) for all monitoring reads/writes — monitoring is not on the critical processing path, so no need for direct PostgreSQL pool
+- Input validation in the model layer before writing (defense in depth alongside DB CHECK constraints)
+
+### Claude's Discretion
+- Exact column types for non-status fields (INTEGER vs BIGINT for latency_ms, etc.)
+- Whether to create a shared base model or keep models independent
+- Index strategy beyond created_at (e.g., composite indexes on service_name + created_at)
+- Winston logging patterns within model methods
+
+</decisions>
+
+<specifics>
+## Specific Ideas
+
+- The existing `performance_metrics` table already exists but nothing writes to it — verify its schema before building on it
+- Research found that `uploadMonitoringService.ts` stores data in-memory only — the new persistent tables replace this pattern
+- The `ProcessingJobModel.ts` uses direct PostgreSQL for critical writes as a pattern reference, but monitoring tables don't need this
+
+</specifics>
+
+<deferred>
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+</deferred>
+
+---
+
+*Phase: 01-data-foundation*
+*Context gathered: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-RESEARCH.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-RESEARCH.md
@@ -0,0 +1,416 @@
+# Phase 1: Data Foundation - Research
+
+**Researched:** 2026-02-24
+**Domain:** PostgreSQL schema design, Supabase PostgREST model layer, TypeScript static class pattern
+**Confidence:** HIGH
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+
+#### Migration approach
+- Use the existing `DatabaseMigrator` class in `backend/src/models/migrate.ts`
+- New `.sql` files go in `src/models/migrations/`, run with `npm run db:migrate`
+- The migrator tracks applied migrations in a `migrations` table — handles idempotency
+- Forward-only migrations (no rollback/down scripts). If something needs fixing, write a new migration.
+- Migrations execute via `supabase.rpc('exec_sql', { sql })` — works with cloud Supabase from any environment including Firebase
+
+#### Schema details
+- Status fields use TEXT with CHECK constraints (e.g., `CHECK (status IN ('healthy','degraded','down'))`) — easy to extend, no enum type management
+- Table names are descriptive, matching existing style: `service_health_checks`, `alert_events` (like `processing_jobs`, `document_chunks`)
+- Include JSONB `probe_details` / `details` columns for flexible metadata per service (response codes, error specifics) without future schema changes
+- All tables get indexes on `created_at` (required for 30-day retention queries and dashboard time-range filters)
+- Enable Row Level Security on new tables — admin-only access, matching existing security patterns
+
+#### Model layer pattern
+- One model file per table: `HealthCheckModel.ts`, `AlertEventModel.ts`
+- Static methods on model classes (e.g., `AlertEventModel.create()`, `AlertEventModel.findActive()`) — matches `DocumentModel.ts` pattern
+- Use `getSupabaseServiceClient()` (PostgREST) for all monitoring reads/writes — monitoring is not on the critical processing path, so no need for direct PostgreSQL pool
+- Input validation in the model layer before writing (defense in depth alongside DB CHECK constraints)
+
+### Claude's Discretion
+- Exact column types for non-status fields (INTEGER vs BIGINT for latency_ms, etc.)
+- Whether to create a shared base model or keep models independent
+- Index strategy beyond created_at (e.g., composite indexes on service_name + created_at)
+- Winston logging patterns within model methods
+
+### Deferred Ideas (OUT OF SCOPE)
+None — discussion stayed within phase scope
+</user_constraints>
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| INFR-01 | Database migrations create `service_health_checks` and `alert_events` tables with indexes on `created_at` | Migration file naming convention (012_), `CREATE TABLE IF NOT EXISTS` + `CREATE INDEX IF NOT EXISTS` patterns from migration 005/010; TEXT+CHECK for status; JSONB for probe_details; TIMESTAMP WITH TIME ZONE for created_at |
+| INFR-04 | Analytics writes use existing Supabase connection, no new database infrastructure | `getSupabaseServiceClient()` already exported from `config/supabase.ts`; PostgREST `.from().insert().select().single()` pattern confirmed in DocumentModel.ts; monitoring path is not critical so no need for direct pg pool |
+</phase_requirements>
+
+---
+
+## Summary
+
+Phase 1 is a pure database + model layer task. No services, routes, or frontend changes. The existing codebase has a well-established pattern: SQL migration files in `backend/src/models/migrations/` (sequentially numbered), a `DatabaseMigrator` class that tracks and runs them via `supabase.rpc('exec_sql')`, and TypeScript model classes with static methods using `getSupabaseServiceClient()`. All of this exists and works — the task is to follow it precisely.
+
+The most important finding is that `getSupabaseServiceClient()` creates a **new client on every call** (no singleton caching, unlike `getSupabaseClient()`). This is intentional for the service-key client but means model methods must call it per-operation, not store it at module level. Existing models follow both patterns — `ProcessingJobModel.ts` calls `getSupabaseServiceClient()` inline where needed, while `DocumentModel.ts` uses the same inline-call approach. Either is fine; inline-per-method is most consistent.
+
+The codebase has no RLS SQL in any existing migration — existing tables pre-date or omit RLS. The CONTEXT.md requires RLS on the new tables, so this is new territory within this project. The pattern is standard Supabase RLS (`ALTER TABLE ... ENABLE ROW LEVEL SECURITY` + `CREATE POLICY`) and well-documented, but it is new to these migrations and worth verifying against the actual Supabase RLS policy syntax for service-role key bypass.
+
+**Primary recommendation:** Create migration `012_create_monitoring_tables.sql` following the pattern of `005_create_processing_jobs_table.sql`, then create `HealthCheckModel.ts` and `AlertEventModel.ts` following the `DocumentModel.ts` static-class pattern, using `getSupabaseServiceClient()` per method.
+
+---
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `@supabase/supabase-js` | Already installed | PostgREST client for model layer reads/writes | Locked: project uses Supabase exclusively; `getSupabaseServiceClient()` already in `config/supabase.ts` |
+| PostgreSQL (via Supabase) | Cloud-managed | Table storage, indexes, CHECK constraints, RLS | Already the only database; no new infrastructure |
+| TypeScript | Already installed | Model type definitions | Project-wide strict TypeScript |
+| Winston logger | Already installed | Logging within model methods | `backend/src/utils/logger.ts` — NEVER `console.log` per `.cursorrules` |
+
+### Supporting
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| `pg` (Pool) | Already installed | Direct PostgreSQL for critical-path writes | NOT needed here — monitoring is not critical path; use PostgREST only |
+
+### Alternatives Considered
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| `getSupabaseServiceClient()` | `getPostgresPool()` | Direct pg bypasses PostgREST cache (only relevant for critical-path inserts); monitoring writes can tolerate PostgREST; service client is simpler and sufficient |
+| TEXT + CHECK constraint | PostgreSQL ENUM | ENUMs require `CREATE TYPE` and are harder to extend; TEXT+CHECK confirmed pattern in `processing_jobs`, `agent_executions`, `users` tables |
+| Separate model files | Shared BaseModel class | A shared base would add indirection with minimal benefit for two small models; keep independent, consistent with existing models |
+
+**Installation:** No new packages needed — all dependencies already installed.
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure
+
+New files slot into existing structure:
+
+```
+backend/src/
+├── models/
+│   ├── migrations/
+│   │   └── 012_create_monitoring_tables.sql   # NEW
+│   ├── HealthCheckModel.ts                    # NEW
+│   ├── AlertEventModel.ts                     # NEW
+│   └── index.ts                               # UPDATE: add exports
+```
+
+**Migration numbering:** Current highest is `011_create_vector_database_tables.sql`. Next must be `012_`.
+
+### Pattern 1: SQL Migration File
+
+**What:** `CREATE TABLE IF NOT EXISTS` with CHECK constraints, followed by `CREATE INDEX IF NOT EXISTS` for every planned query pattern.
+**When to use:** All schema changes — always forward-only.
+
+```sql
+-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
+-- Migration: Create monitoring tables
+-- Created: 2026-02-24
+
+CREATE TABLE IF NOT EXISTS service_health_checks (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    service_name VARCHAR(100) NOT NULL,
+    status TEXT NOT NULL CHECK (status IN ('healthy', 'degraded', 'down')),
+    latency_ms INTEGER,
+    checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    probe_details JSONB,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_service_health_checks_created_at ON service_health_checks(created_at);
+CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_name ON service_health_checks(service_name);
+CREATE INDEX IF NOT EXISTS idx_service_health_checks_service_created ON service_health_checks(service_name, created_at);
+
+CREATE TABLE IF NOT EXISTS alert_events (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    service_name VARCHAR(100) NOT NULL,
+    alert_type TEXT NOT NULL CHECK (alert_type IN ('service_down', 'service_degraded', 'recovery')),
+    status TEXT NOT NULL CHECK (status IN ('active', 'resolved')),
+    details JSONB,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    resolved_at TIMESTAMP WITH TIME ZONE
+);
+
+CREATE INDEX IF NOT EXISTS idx_alert_events_created_at ON alert_events(created_at);
+CREATE INDEX IF NOT EXISTS idx_alert_events_status ON alert_events(status);
+CREATE INDEX IF NOT EXISTS idx_alert_events_service_name ON alert_events(service_name);
+
+-- RLS
+ALTER TABLE service_health_checks ENABLE ROW LEVEL SECURITY;
+ALTER TABLE alert_events ENABLE ROW LEVEL SECURITY;
+
+-- Service role bypasses RLS automatically in Supabase;
+-- anon/authenticated roles get no access by default when RLS is enabled with no policies
+-- Add explicit deny-all or admin-only policies if needed
+```
+
+### Pattern 2: TypeScript Model Class (Static Methods)
+
+**What:** Exported class with static async methods. Each method calls `getSupabaseServiceClient()` inline (not cached at module level for service client). Uses `logger` from `utils/logger`. Validates input before writing.
+**When to use:** All model methods — matches `DocumentModel.ts` exactly.
+
+```typescript
+// Source: backend/src/models/DocumentModel.ts (verified pattern)
+import { getSupabaseServiceClient } from '../config/supabase';
+import { logger } from '../utils/logger';
+
+export interface ServiceHealthCheck {
+  id: string;
+  service_name: string;
+  status: 'healthy' | 'degraded' | 'down';
+  latency_ms?: number;
+  checked_at: string;
+  probe_details?: Record<string, unknown>;
+  created_at: string;
+}
+
+export interface CreateHealthCheckData {
+  service_name: string;
+  status: 'healthy' | 'degraded' | 'down';
+  latency_ms?: number;
+  probe_details?: Record<string, unknown>;
+}
+
+export class HealthCheckModel {
+  static async create(data: CreateHealthCheckData): Promise<ServiceHealthCheck> {
+    // Input validation
+    if (!data.service_name) throw new Error('service_name is required');
+    if (!['healthy', 'degraded', 'down'].includes(data.status)) {
+      throw new Error(`Invalid status: ${data.status}`);
+    }
+
+    try {
+      const supabase = getSupabaseServiceClient();
+      const { data: record, error } = await supabase
+        .from('service_health_checks')
+        .insert({
+          service_name: data.service_name,
+          status: data.status,
+          latency_ms: data.latency_ms,
+          probe_details: data.probe_details,
+        })
+        .select()
+        .single();
+
+      if (error) {
+        logger.error('Error creating health check', { error: error.message, data });
+        throw new Error(`Failed to create health check: ${error.message}`);
+      }
+      if (!record) throw new Error('Failed to create health check: No data returned');
+
+      logger.info('Health check recorded', { service: data.service_name, status: data.status });
+      return record;
+    } catch (error) {
+      logger.error('Error in HealthCheckModel.create', {
+        error: error instanceof Error ? error.message : String(error),
+        data,
+      });
+      throw error;
+    }
+  }
+}
+```
+
+### Pattern 3: Running the Migration
+
+**What:** `npm run db:migrate` calls `ts-node src/scripts/setup-database.ts`, which invokes `DatabaseMigrator.migrate()`. The migrator reads all `.sql` files from `migrations/` sorted alphabetically, checks the `migrations` table for each, and executes new ones via `supabase.rpc('exec_sql', { sql })`.
+
+**Important:** The migrator skips already-executed migrations by ID (filename without `.sql`). This is the idempotency mechanism — re-running `npm run db:migrate` is safe.
+
+### Anti-Patterns to Avoid
+
+- **Using `console.log` in model files:** Always use `logger` from `../utils/logger`. The project enforces this in `.cursorrules`.
+- **Using `getPostgresPool()` for monitoring writes:** Only needed for critical-path operations that hit PostgREST cache issues (`ProcessingJobModel` is the one exception). Monitoring writes are fire-and-forget; PostgREST is fine.
+- **Storing `getSupabaseServiceClient()` at module level:** The service client function creates a new client each call (no caching). Call it inside each method. (The anon client `getSupabaseClient()` does cache, but monitoring models use the service client.)
+- **Using `any` type in TypeScript interfaces:** Strict TypeScript — use `Record<string, unknown>` for JSONB columns, or specific typed interfaces.
+- **Skipping `CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`:** All migration DDL in this codebase uses `IF NOT EXISTS`. Never omit it.
+- **Writing a rollback/down script:** Forward-only migrations only. If schema needs fixing, write `013_fix_...sql`.
+- **Numbering the migration `11_` or `11`:** Must be zero-padded to three digits: `012_`.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Migration tracking / idempotency | Custom migration table logic | Existing `DatabaseMigrator` in `migrate.ts` | Already handles migrations table, skip-if-executed logic, error logging |
+| Supabase client instantiation | New client setup | `getSupabaseServiceClient()` from `config/supabase.ts` | Handles auth, timeout, headers; INFR-04 requires no new DB connections |
+| Input validation before write | Runtime type guards | Manual validation in model (project pattern) | `DocumentModel` and `ProcessingJobModel` both validate before writing; adds defense in depth |
+| Logging | Direct `console.log` or custom logger | `logger` from `utils/logger` | Winston-backed, structured JSON, correlation ID support |
+
+**Key insight:** The migration infrastructure is already production-ready. Adding two SQL files and two TypeScript model classes is additive work, not infrastructure work.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Migration Numbering Gap or Conflict
+**What goes wrong:** A migration numbered `011_` or `012_` conflicts with an existing file, or the migration runs out of alphabetical order because numbering is inconsistent.
+**Why it happens:** Not checking what the current highest number is before creating a new file.
+**How to avoid:** Verify current highest (`011_create_vector_database_tables.sql`) — new file must be `012_create_monitoring_tables.sql`.
+**Warning signs:** Migration runs but skips one of the new tables; alphabetical sort puts new file before existing ones.
+
+### Pitfall 2: RLS Blocks Service-Role Reads
+**What goes wrong:** After enabling RLS, `getSupabaseServiceClient()` (which uses the service role key) cannot read or write rows.
+**Why it happens:** Misunderstanding of how Supabase RLS interacts with the service role. **Fact (HIGH confidence, Supabase docs):** The service role key **bypasses RLS by default**. Enabling RLS only restricts the anon key and authenticated-user JWTs. So `getSupabaseServiceClient()` will work fine with RLS enabled and no policies defined.
+**How to avoid:** No special policies needed for service-role access. If explicit policies are desired for documentation clarity, `CREATE POLICY "service_role_all" ON table USING (true)` with `TO service_role` works, but it is not required.
+**Warning signs:** Model methods return empty results or permission errors after migration runs.
+
+### Pitfall 3: JSONB Column Typing
+**What goes wrong:** TypeScript `probe_details` typed as `any`, then strict lint rules fail.
+**Why it happens:** JSONB has no enforced schema — the path of least resistance is `any`.
+**How to avoid:** Type as `Record<string, unknown> | null` or define a specific interface for common probe shapes. Accept that the TypeScript type is a superset of what the DB stores.
+**Warning signs:** `eslint` errors on `no-explicit-any` rule (project has strict TypeScript).
+
+### Pitfall 4: `latency_ms` Integer Overflow
+**What goes wrong:** PostgreSQL `INTEGER` maxes out at ~2.1 billion. For latency in milliseconds this is impossible to overflow (2.1B ms = 24 days). But for metrics that could store large values, `BIGINT` is safer.
+**Why it happens:** Defaulting to `INTEGER` without considering the value range.
+**How to avoid:** `INTEGER` is correct for `latency_ms` (milliseconds always fit). No overflow risk here.
+**Warning signs:** N/A for latency; only relevant if storing epoch timestamps or byte counts in integer columns.
+
+### Pitfall 5: Missing `checked_at` vs `created_at` Distinction
+**What goes wrong:** Using only `created_at` for health checks loses the distinction between "when the probe ran" and "when the row was inserted". These are usually the same, but could differ if inserts are batched or retried.
+**Why it happens:** Copying the `created_at = DEFAULT CURRENT_TIMESTAMP` pattern without thinking about the probe time.
+**How to avoid:** Include an explicit `checked_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP` column on `service_health_checks`. Let `created_at` be the insert time. When recording a health check, set `checked_at` explicitly to the moment the probe was made. The `created_at` index still covers retention queries; `checked_at` is the semantically accurate probe time.
+**Warning signs:** Dashboard shows "time checked" as several seconds after the actual API call.
+
+---
+
+## Code Examples
+
+Verified patterns from codebase:
+
+### Migration: Full SQL File Pattern
+```sql
+-- Source: backend/src/models/migrations/005_create_processing_jobs_table.sql (verified)
+-- Confirmed patterns: CREATE TABLE IF NOT EXISTS, UUID PK, TEXT CHECK constraint,
+--   TIMESTAMP WITH TIME ZONE, CREATE INDEX IF NOT EXISTS on created_at
+
+CREATE TABLE IF NOT EXISTS processing_jobs (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_created_at ON processing_jobs(created_at);
+```
+
+### DatabaseMigrator: How It Executes SQL
+```typescript
+// Source: backend/src/models/migrate.ts (verified)
+// Migration executes via:
+const { error } = await supabase.rpc('exec_sql', { sql: migration.sql });
+// Idempotency: checks `migrations` table by migration ID (filename without .sql)
+// Run via: npm run db:migrate → ts-node src/scripts/setup-database.ts
+```
+
+### Supabase Service Client: Per-Method Call Pattern
+```typescript
+// Source: backend/src/config/supabase.ts (verified)
+// getSupabaseServiceClient() creates a new client each call — no singleton
+export const getSupabaseServiceClient = (): SupabaseClient => {
+  // Creates new createClient(...) each invocation
+};
+
+// Correct usage in model methods:
+static async create(data: CreateData): Promise<Row> {
+  const supabase = getSupabaseServiceClient(); // Called inside method, not at module level
+  const { data: record, error } = await supabase.from('table').insert(data).select().single();
+}
+```
+
+### Model: Error Handling Pattern
+```typescript
+// Source: backend/src/models/ProcessingJobModel.ts (verified)
+// Error check pattern used throughout:
+if (error) {
+  if (error.code === 'PGRST116') {
+    return null; // Not found — not an error
+  }
+  logger.error('Error doing X', { error, id });
+  throw new Error(`Failed to do X: ${error.message}`);
+}
+```
+
+### Model Index Export
+```typescript
+// Source: backend/src/models/index.ts (verified)
+// New models must be added here:
+export { HealthCheckModel } from './HealthCheckModel';
+export { AlertEventModel } from './AlertEventModel';
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| In-memory `uploadMonitoringService` (UploadMonitoringService class with EventEmitter) | Persistent Supabase tables | Phase 1 introduces this | Data survives cold starts; enables 30-day retention; enables dashboard queries |
+| `any` type in model interfaces | `Record<string, unknown>` or typed interface | Project baseline | Strict TypeScript requirement |
+
+**Deprecated/outdated in this project:**
+- `uploadMonitoringService.ts` in-memory storage: Still used by existing routes but being superseded by persistent tables. Phase 1 does NOT modify `uploadMonitoringService.ts` — that is Phase 2+ work. This phase only creates the tables and model classes.
+
+---
+
+## Open Questions
+
+1. **RLS Policy Detail: Should we create explicit service-role policies or rely on implicit bypass?**
+   - What we know: Supabase service role key bypasses RLS by default. No policy needed for service-role access to work.
+   - What's unclear: The CONTEXT.md says "admin-only access, matching existing security patterns" — but no existing migration uses RLS, so there is no project pattern to match exactly.
+   - Recommendation: Enable RLS (`ALTER TABLE ... ENABLE ROW LEVEL SECURITY`) without creating any policies initially. The service-role key bypass is sufficient for all model-layer reads/writes. Add explicit policies in Phase 3 when admin API routes are added and authenticated user access may be needed.
+
+2. **`performance_metrics` table: Use or ignore?**
+   - What we know: `010_add_performance_metrics_and_events.sql` created a `performance_metrics` table but CONTEXT.md notes nothing writes to it. The new `service_health_checks` table is a different concept (external API health vs. internal processing metrics).
+   - What's unclear: Whether Phase 1 should verify the `performance_metrics` schema to avoid future confusion.
+   - Recommendation: No action needed in Phase 1. The CONTEXT.md note "verify its schema before building on it" is a Phase 2+ concern when writing to it. Phase 1 creates new tables only.
+
+3. **`checked_at` column: Explicit or use `created_at`?**
+   - What we know: `created_at` has the index required by INFR-01. Adding `checked_at` as a separate column is semantically better (Pitfall 5 above).
+   - What's unclear: Whether the planner wants both columns or a single `created_at`.
+   - Recommendation: Include both — `checked_at` (explicitly set when probe runs) and `created_at` (DB default). Index only `created_at` as required by INFR-01. This is Claude's discretion and adds minimal complexity.
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- `backend/src/models/migrate.ts` — Verified: migration execution mechanism, idempotency via `migrations` table, `supabase.rpc('exec_sql')` call
+- `backend/src/models/migrations/005_create_processing_jobs_table.sql` — Verified: `CREATE TABLE IF NOT EXISTS`, TEXT CHECK, UUID PK, `CREATE INDEX IF NOT EXISTS`, `TIMESTAMP WITH TIME ZONE`
+- `backend/src/models/migrations/010_add_performance_metrics_and_events.sql` — Verified: JSONB column pattern, index naming convention
+- `backend/src/config/supabase.ts` — Verified: `getSupabaseServiceClient()` creates new client per call (no caching); `getPostgresPool()` exists but for critical-path only
+- `backend/src/models/DocumentModel.ts` — Verified: static class pattern, `getSupabaseServiceClient()` inside methods, `logger.error()` with structured object, retry pattern
+- `backend/src/models/ProcessingJobModel.ts` — Verified: `PGRST116` not-found handling, static methods, logger usage
+- `backend/src/models/index.ts` — Verified: export pattern for new models
+- `backend/package.json` — Verified: `npm run db:migrate` runs `ts-node src/scripts/setup-database.ts`; `npm test` runs `vitest run`
+- `backend/vitest.config.ts` — Verified: Vitest framework, `src/__tests__/**/*.{test,spec}.{ts,js}` glob, 30s timeout
+- `.planning/config.json` — Verified: `workflow.nyquist_validation` not present → Validation Architecture section omitted
+
+### Secondary (MEDIUM confidence)
+- Supabase RLS service-role bypass behavior: Service role key bypasses RLS; this is standard Supabase behavior documented at supabase.com/docs. Confidence: HIGH from training data, not directly verified via web fetch in this session.
+
+### Tertiary (LOW confidence)
+- None — all critical claims verified against codebase directly.
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all libraries already in codebase, verified in package.json and import statements
+- Architecture: HIGH — migration file structure, model class pattern, and export mechanism all verified from actual source files
+- Pitfalls: HIGH for migration numbering (files counted directly); HIGH for RLS service-role bypass (standard Supabase behavior); MEDIUM for `checked_at` recommendation (judgement call, not a verified bug)
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-03-25 (30 days — Supabase and TypeScript patterns are stable)
--- a/.planning/milestones/v1.0-phases/01-data-foundation/01-VERIFICATION.md
+++ b/.planning/milestones/v1.0-phases/01-data-foundation/01-VERIFICATION.md
@@ -0,0 +1,109 @@
+---
+phase: 01-data-foundation
+verified: 2026-02-24T13:00:00Z
+status: human_needed
+score: 3/4 success criteria verified
+re_verification:
+  previous_status: gaps_found
+  previous_score: 2/4
+  gaps_closed:
+    - "SC#1 table name mismatch — ROADMAP updated to use `service_health_checks` and `alert_events`; implementation matches"
+    - "SC#3 file name mismatch — ROADMAP updated to reference `AlertEventModel.ts`; implementation matches"
+  gaps_remaining: []
+  regressions: []
+human_verification:
+  - test: "Run migration 012 against the live Supabase instance"
+    expected: "Both `service_health_checks` and `alert_events` tables are created with all columns, CHECK constraints, indexes, and RLS enabled"
+    why_human: "Cannot execute SQL against the live Supabase instance from this environment; requires manual execution via Supabase Dashboard SQL editor or migration runner"
+---
+
+# Phase 01: Data Foundation Verification Report
+
+**Phase Goal:** The database schema for monitoring exists and the existing Supabase connection is the only data infrastructure used
+**Verified:** 2026-02-24T13:00:00Z
+**Status:** human_needed
+**Re-verification:** Yes — after ROADMAP success criteria updated to match finalized naming
+
+## Goal Achievement
+
+### Observable Truths (from ROADMAP Success Criteria)
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | `service_health_checks` and `alert_events` tables exist in Supabase with indexes on `created_at` | VERIFIED | Migration `012_create_monitoring_tables.sql` creates both tables; `idx_service_health_checks_created_at` (line 24) and `idx_alert_events_created_at` (line 52) present. Live DB execution requires human. |
+| 2 | All new tables use the existing Supabase client from `config/supabase.ts` — no new database connections added | VERIFIED | Both models import `getSupabaseServiceClient` from `'../config/supabase'` (line 1 of each); called per-method, not at module level; no `new Pool`, `new Client`, or `createClient` in either file |
+| 3 | `AlertEventModel.ts` exists and its CRUD methods can be called in isolation without errors | VERIFIED | `backend/src/models/AlertEventModel.ts` exists (343 lines, 6 static methods); 19 unit tests all pass |
+| 4 | Migration SQL can be run against the live Supabase instance and produces the expected schema | HUMAN NEEDED | SQL is syntactically valid and follows existing migration patterns; live execution cannot be verified programmatically |
+
+**Score:** 3/4 success criteria fully verified (1 human-needed)
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `backend/src/models/migrations/012_create_monitoring_tables.sql` | DDL for monitoring tables | VERIFIED | 65 lines; 2 tables with CHECK constraints, JSONB columns, 5 indexes total, RLS enabled on both |
+| `backend/src/models/HealthCheckModel.ts` | CRUD for service_health_checks | VERIFIED | 219 lines; 4 static methods; imports `getSupabaseServiceClient` and `logger`; exports `HealthCheckModel`, `ServiceHealthCheck`, `CreateHealthCheckData` |
+| `backend/src/models/AlertEventModel.ts` | CRUD for alert_events | VERIFIED | 343 lines; 6 static methods; imports `getSupabaseServiceClient` and `logger`; exports `AlertEventModel`, `AlertEvent`, `CreateAlertEventData` |
+| `backend/src/models/index.ts` | Barrel exports for new models | VERIFIED | Both models and all 4 types exported (lines 7-8, 11-12); existing exports unchanged |
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `HealthCheckModel.ts` | `backend/src/config/supabase.ts` | `getSupabaseServiceClient()` import | WIRED | Line 1: import confirmed; called on lines 49, 100, 139, 182 |
+| `AlertEventModel.ts` | `backend/src/config/supabase.ts` | `getSupabaseServiceClient()` import | WIRED | Line 1: import confirmed; called on lines 70, 122, 161, 207, 258, 307 |
+| `HealthCheckModel.ts` | `backend/src/utils/logger.ts` | Winston logger import | WIRED | Line 2: import confirmed; used in error/info calls throughout |
+| `AlertEventModel.ts` | `backend/src/utils/logger.ts` | Winston logger import | WIRED | Line 2: import confirmed; used in error/info calls throughout |
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|----------|
+| INFR-01 | 01-01-PLAN, 01-02-PLAN | Database migrations create service_health_checks and alert_events tables with indexes on created_at | SATISFIED | `idx_service_health_checks_created_at` and `idx_alert_events_created_at` in migration (lines 24, 52); 33 tests pass; marked complete in REQUIREMENTS.md |
+| INFR-04 | 01-01-PLAN, 01-02-PLAN | Analytics writes use existing Supabase connection, no new database infrastructure | SATISFIED | Both models call `getSupabaseServiceClient()` per-method; no `new Pool`, `new Client`, or `createClient` in new files; test mocks confirm the pattern; marked complete in REQUIREMENTS.md |
+
+No orphaned requirements. REQUIREMENTS.md traceability maps only INFR-01 and INFR-04 to Phase 1, both accounted for.
+
+### Anti-Patterns Found
+
+| File | Pattern | Severity | Impact |
+|------|---------|----------|--------|
+| None | — | — | No TODO/FIXME, no console.log, no return null stubs, no empty implementations found |
+
+### Test Results
+
+```
+Test Files  2 passed (2)
+     Tests  33 passed (33)
+  Duration  1.19s
+```
+
+All 33 tests pass. No regressions from initial verification.
+
+- `HealthCheckModel`: 14 tests covering create (valid/minimal/probe_details), validation (empty name, invalid status), Supabase error + error logging, findLatestByService (found/PGRST116 null), findAll (default limit/filtered/custom limit), deleteOlderThan (date calc/count)
+- `AlertEventModel`: 19 tests covering create (valid/default status/explicit status/JSONB details), validation (empty name, invalid alert_type, invalid status), Supabase error, findActive (all/filtered/empty), acknowledge/resolve (success/PGRST116 not-found), findRecentByService (found/null), deleteOlderThan
+
+### Human Verification Required
+
+**1. Migration Execution Against Live Supabase**
+
+**Test:** Run `backend/src/models/migrations/012_create_monitoring_tables.sql` against the live Supabase instance via the SQL editor or migration runner
+**Expected:** Both `service_health_checks` and `alert_events` tables created; all columns, CHECK constraints, JSONB columns, 5 indexes, and RLS appear when inspected in the Supabase table editor
+**Why human:** Cannot execute SQL against the live database from this verification environment
+
+### Re-Verification Summary
+
+Both gaps from the initial verification are now closed. The gaps were documentation alignment issues — the ROADMAP success criteria contained stale names from an earlier naming pass that did not survive into the finalized plan and implementation. The ROADMAP has been updated to match:
+
+- SC#1 now reads `service_health_checks` and `alert_events` (matching migration and models)
+- SC#3 now reads `AlertEventModel.ts` (matching the implemented file)
+
+The implementation was correct throughout both verifications. All automated checks pass. The one remaining item requiring human action is executing the migration SQL against the live Supabase instance — this was always a human-only step and is not a gap.
+
+**Phase goal is achieved:** The database schema for monitoring exists in the migration file and model layer, and the existing Supabase connection is the only data infrastructure used.
+
+---
+
+_Verified: 2026-02-24T13:00:00Z_
+_Verifier: Claude (gsd-verifier)_
+_Re-verification: Yes — after ROADMAP SC naming alignment_
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-01-PLAN.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-01-PLAN.md
@@ -0,0 +1,176 @@
+---
+phase: 02-backend-services
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - backend/src/models/migrations/013_create_processing_events_table.sql
+  - backend/src/services/analyticsService.ts
+  - backend/src/__tests__/unit/analyticsService.test.ts
+autonomous: true
+requirements: [ANLY-01, ANLY-03]
+
+must_haves:
+  truths:
+    - "recordProcessingEvent() writes to document_processing_events table via Supabase"
+    - "recordProcessingEvent() returns void (not Promise) so callers cannot accidentally await it"
+    - "A deliberate Supabase write failure logs an error but does not throw or reject"
+    - "deleteProcessingEventsOlderThan(30) removes rows older than 30 days"
+  artifacts:
+    - path: "backend/src/models/migrations/013_create_processing_events_table.sql"
+      provides: "document_processing_events table DDL with indexes and RLS"
+      contains: "CREATE TABLE IF NOT EXISTS document_processing_events"
+    - path: "backend/src/services/analyticsService.ts"
+      provides: "Fire-and-forget analytics event writer and retention delete"
+      exports: ["recordProcessingEvent", "deleteProcessingEventsOlderThan"]
+    - path: "backend/src/__tests__/unit/analyticsService.test.ts"
+      provides: "Unit tests for analyticsService"
+      min_lines: 50
+  key_links:
+    - from: "backend/src/services/analyticsService.ts"
+      to: "backend/src/config/supabase.ts"
+      via: "getSupabaseServiceClient() call"
+      pattern: "getSupabaseServiceClient"
+    - from: "backend/src/services/analyticsService.ts"
+      to: "document_processing_events table"
+      via: "void supabase.from('document_processing_events').insert(...)"
+      pattern: "void.*from\\('document_processing_events'\\)"
+---
+
+<objective>
+Create the analytics migration and fire-and-forget analytics service for persisting document processing events to Supabase.
+
+Purpose: ANLY-01 requires processing events to persist (not in-memory), and ANLY-03 requires instrumentation to be non-blocking. This plan creates the database table and the service that writes to it without blocking the processing pipeline.
+
+Output: Migration 013 SQL file, analyticsService.ts with recordProcessingEvent() and deleteProcessingEventsOlderThan(), and unit tests.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-backend-services/02-RESEARCH.md
+@.planning/phases/01-data-foundation/01-01-SUMMARY.md
+@.planning/phases/01-data-foundation/01-02-SUMMARY.md
+@backend/src/models/migrations/012_create_monitoring_tables.sql
+@backend/src/config/supabase.ts
+@backend/src/utils/logger.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create analytics migration and analyticsService</name>
+  <files>
+    backend/src/models/migrations/013_create_processing_events_table.sql
+    backend/src/services/analyticsService.ts
+  </files>
+  <action>
+**Migration 013:** Create `backend/src/models/migrations/013_create_processing_events_table.sql` following the exact pattern from migration 012. The table:
+
+```sql
+CREATE TABLE IF NOT EXISTS document_processing_events (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    document_id UUID NOT NULL,
+    user_id UUID NOT NULL,
+    event_type TEXT NOT NULL CHECK (event_type IN ('upload_started', 'processing_started', 'completed', 'failed')),
+    duration_ms INTEGER,
+    error_message TEXT,
+    stage TEXT,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_document_processing_events_created_at
+    ON document_processing_events(created_at);
+CREATE INDEX IF NOT EXISTS idx_document_processing_events_document_id
+    ON document_processing_events(document_id);
+
+ALTER TABLE document_processing_events ENABLE ROW LEVEL SECURITY;
+```
+
+**analyticsService.ts:** Create `backend/src/services/analyticsService.ts` with two exports:
+
+1. `recordProcessingEvent(data: ProcessingEventData): void` — Return type MUST be `void` (not `Promise<void>`) to prevent accidental `await`. Inside, call `getSupabaseServiceClient()` (per-method, not module level), then `void supabase.from('document_processing_events').insert({...}).then(({ error }) => { if (error) logger.error(...) })`. Never throw, never reject.
+
+2. `deleteProcessingEventsOlderThan(days: number): Promise<number>` — Compute cutoff date in JS (`new Date(Date.now() - days * 86400000).toISOString()`), then delete with `.lt('created_at', cutoff)`. Return the count of deleted rows. This follows the same pattern as `HealthCheckModel.deleteOlderThan()`.
+
+Export the `ProcessingEventData` interface:
+```typescript
+export interface ProcessingEventData {
+  document_id: string;
+  user_id: string;
+  event_type: 'upload_started' | 'processing_started' | 'completed' | 'failed';
+  duration_ms?: number;
+  error_message?: string;
+  stage?: string;
+}
+```
+
+Use Winston logger (`import { logger } from '../utils/logger'`). Use `getSupabaseServiceClient` from `'../config/supabase'`. Follow project naming conventions (camelCase file, named exports).
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
+    <manual>Verify 013 migration file exists and analyticsService exports recordProcessingEvent and deleteProcessingEventsOlderThan</manual>
+  </verify>
+  <done>Migration 013 creates document_processing_events table with indexes and RLS. analyticsService.ts exports recordProcessingEvent (void return) and deleteProcessingEventsOlderThan (Promise&lt;number&gt;). TypeScript compiles.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create analyticsService unit tests</name>
+  <files>
+    backend/src/__tests__/unit/analyticsService.test.ts
+  </files>
+  <action>
+Create `backend/src/__tests__/unit/analyticsService.test.ts` using the Vitest + Supabase mock pattern established in Phase 1 (01-02-SUMMARY.md).
+
+Mock setup:
+- `vi.mock('../../config/supabase')` with inline `vi.fn()` factory
+- `vi.mock('../../utils/logger')` with inline `vi.fn()` factory
+- Use `vi.mocked()` after import for typed access
+- `makeSupabaseChain()` helper per test (fresh mock state)
+
+Test cases for `recordProcessingEvent`:
+1. **Calls Supabase insert with correct data** — verify `.from('document_processing_events').insert(...)` called with expected fields including `created_at`
+2. **Return type is void (not a Promise)** — call `recordProcessingEvent(data)` and verify the return value is `undefined` (void), not a thenable
+3. **Logs error on Supabase failure but does not throw** — mock the `.then` callback with `{ error: { message: 'test error' } }`, verify `logger.error` was called
+4. **Handles optional fields (duration_ms, error_message, stage) as null** — pass data without optional fields, verify insert called with `null` for those columns
+
+Test cases for `deleteProcessingEventsOlderThan`:
+5. **Computes correct cutoff date and deletes** — mock Supabase delete chain, verify `.lt('created_at', ...)` called with ISO date string ~30 days ago
+6. **Returns count of deleted rows** — mock response with `data: [{}, {}, {}]` (3 rows), verify returns 3
+
+Use `beforeEach(() => vi.clearAllMocks())` for test isolation.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/analyticsService.test.ts --reporter=verbose 2>&1</automated>
+  </verify>
+  <done>All analyticsService tests pass. recordProcessingEvent verified as fire-and-forget (void return, error-swallowing). deleteProcessingEventsOlderThan verified with correct date math and row count return.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npx tsc --noEmit` passes with no errors from new files
+2. `npx vitest run src/__tests__/unit/analyticsService.test.ts` — all tests pass
+3. Migration 013 SQL is valid and follows 012 pattern
+4. `recordProcessingEvent` return type is `void` (not `Promise<void>`)
+</verification>
+
+<success_criteria>
+- Migration 013 creates document_processing_events table with id, document_id, user_id, event_type (CHECK constraint), duration_ms, error_message, stage, created_at
+- Indexes on created_at and document_id exist
+- RLS enabled on the table
+- analyticsService.recordProcessingEvent() is fire-and-forget (void return, no throw)
+- analyticsService.deleteProcessingEventsOlderThan() returns deleted row count
+- All unit tests pass
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/02-backend-services/02-01-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-01-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-01-SUMMARY.md
@@ -0,0 +1,139 @@
+---
+phase: 02-backend-services
+plan: 01
+subsystem: analytics
+tags: [supabase, vitest, fire-and-forget, analytics, postgresql, migrations]
+
+# Dependency graph
+requires:
+  - phase: 01-data-foundation/01-01
+    provides: getSupabaseServiceClient per-method call pattern, migration file format
+  - phase: 01-data-foundation/01-02
+    provides: makeSupabaseChain() Vitest mock pattern, vi.mock hoisting rules
+
+provides:
+  - Migration 013: document_processing_events table DDL with indexes and RLS
+  - analyticsService.recordProcessingEvent(): fire-and-forget void write to Supabase
+  - analyticsService.deleteProcessingEventsOlderThan(): retention delete returning row count
+  - Unit tests for both exports (6 tests)
+
+affects:
+  - 02-backend-services/02-02 (monitoring services)
+  - 02-backend-services/02-03 (health probe scheduler)
+  - 03-api (callers that instrument processing pipeline events)
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Fire-and-forget analytics: void return type (not Promise<void>) prevents accidental await on critical path"
+    - "void supabase.from(...).insert(...).then(callback) pattern for non-blocking writes with error logging"
+    - "getSupabaseServiceClient() called per-method inside each exported function, never cached at module level"
+
+key-files:
+  created:
+    - backend/src/models/migrations/013_create_processing_events_table.sql
+    - backend/src/services/analyticsService.ts
+    - backend/src/__tests__/unit/analyticsService.test.ts
+  modified: []
+
+key-decisions:
+  - "recordProcessingEvent return type is void (not Promise<void>) — prevents callers from accidentally awaiting analytics writes on the critical processing path"
+  - "Optional fields (duration_ms, error_message, stage) coalesce to null in insert payload — consistent nullability in DB"
+  - "created_at set explicitly in insert payload (not relying on DB DEFAULT) so it matches the event occurrence time"
+
+patterns-established:
+  - "Analytics void function test: expect(result).toBeUndefined() + expect(typeof result).toBe('undefined') — toHaveProperty throws on undefined, use typeof check instead"
+  - "Fire-and-forget error path test: mock .insert().then() directly to control the resolved value, flush microtask queue with await Promise.resolve() before asserting logger call"
+
+requirements-completed: [ANLY-01, ANLY-03]
+
+# Metrics
+duration: 3min
+completed: 2026-02-24
+---
+
+# Phase 02 Plan 01: Analytics Service Summary
+
+**Fire-and-forget analyticsService with document_processing_events migration — void recordProcessingEvent that logs errors without throwing, and deleteProcessingEventsOlderThan returning row count**
+
+## Performance
+
+- **Duration:** 3 min
+- **Started:** 2026-02-24T19:21:16Z
+- **Completed:** 2026-02-24T19:24:17Z
+- **Tasks:** 2
+- **Files modified:** 3
+
+## Accomplishments
+
+- Migration 013 creates `document_processing_events` table with `event_type` CHECK constraint, indexes on `created_at` and `document_id`, and RLS enabled — follows migration 012 pattern exactly
+- `recordProcessingEvent()` has `void` return type (not `Promise<void>`) and uses `void supabase.from(...).insert(...).then(callback)` to ensure errors are logged but never thrown, never blocking the processing pipeline
+- `deleteProcessingEventsOlderThan()` computes cutoff via `Date.now() - days * 86400000`, deletes with `.lt('created_at', cutoff)`, returns `data.length` as row count
+- 6 unit tests covering all exports: insert payload, void return type, error swallowing + logging, null coalescing for optional fields, cutoff date math, and row count return
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create analytics migration and analyticsService** - `ef88541` (feat)
+2. **Task 2: Create analyticsService unit tests** - `cf30811` (test)
+
+**Plan metadata:** (docs commit to follow)
+
+## Files Created/Modified
+
+- `backend/src/models/migrations/013_create_processing_events_table.sql` - document_processing_events DDL with UUID PK, CHECK constraint on event_type, indexes on created_at + document_id, RLS enabled
+- `backend/src/services/analyticsService.ts` - recordProcessingEvent (void, fire-and-forget) and deleteProcessingEventsOlderThan (Promise<number>)
+- `backend/src/__tests__/unit/analyticsService.test.ts` - 6 unit tests using established makeSupabaseChain() pattern
+
+## Decisions Made
+
+- `recordProcessingEvent` return type is `void` (not `Promise<void>`) — the type system itself prevents accidental `await`, matching the architecture decision in STATE.md ("Analytics writes are always fire-and-forget")
+- Optional fields coalesce to `null` in the insert payload rather than omitting them — keeps the DB row shape consistent and predictable
+- `created_at` is set explicitly in the insert payload (not via DB DEFAULT) to accurately reflect event occurrence time rather than DB write time
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 1 - Bug] Fixed toHaveProperty assertion on undefined return value**
+- **Found during:** Task 2 (first test run)
+- **Issue:** `expect(result).not.toHaveProperty('then')` throws `TypeError: Cannot convert undefined or null to object` when `result` is `undefined` — Vitest's `toHaveProperty` cannot introspect `undefined`
+- **Fix:** Replaced with `expect(typeof result).toBe('undefined')` which correctly verifies the return is not a thenable without requiring the value to be an object
+- **Files modified:** `backend/src/__tests__/unit/analyticsService.test.ts`
+- **Verification:** All 6 tests pass after fix
+- **Committed in:** cf30811 (Task 2 commit)
+
+---
+
+**Total deviations:** 1 auto-fixed (Rule 1 — runtime error in test assertion)
+**Impact on plan:** Fix required for tests to run. The replacement assertion is semantically equivalent and more idiomatic for checking void returns.
+
+## Issues Encountered
+
+- Vitest `toHaveProperty` throws on `undefined`/`null` values rather than returning false — use `typeof result` checks when verifying void returns instead.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- `analyticsService` is ready for callers — import `recordProcessingEvent` from `'../services/analyticsService'` and call it without `await` at instrumentation points
+- Migration 013 SQL ready to run against Supabase (requires manual `psql` or Dashboard execution)
+- `makeSupabaseChain()` pattern from Phase 1 confirmed working for service-layer tests (not just model-layer tests)
+- Ready for Phase 2 plan 02: monitoring services that will call `recordProcessingEvent` during health probe lifecycle
+
+---
+*Phase: 02-backend-services*
+*Completed: 2026-02-24*
+
+## Self-Check: PASSED
+
+- FOUND: backend/src/models/migrations/013_create_processing_events_table.sql
+- FOUND: backend/src/services/analyticsService.ts
+- FOUND: backend/src/__tests__/unit/analyticsService.test.ts
+- FOUND: .planning/phases/02-backend-services/02-01-SUMMARY.md
+- FOUND commit ef88541 (Task 1: analytics migration + service)
+- FOUND commit cf30811 (Task 2: unit tests)
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-02-PLAN.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-02-PLAN.md
@@ -0,0 +1,176 @@
+---
+phase: 02-backend-services
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - backend/package.json
+  - backend/src/services/healthProbeService.ts
+  - backend/src/__tests__/unit/healthProbeService.test.ts
+autonomous: true
+requirements: [HLTH-02, HLTH-04]
+
+must_haves:
+  truths:
+    - "Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken)"
+    - "Each probe returns a structured ProbeResult with service_name, status, latency_ms, and optional error_message"
+    - "Probe results are persisted to Supabase via HealthCheckModel.create()"
+    - "A single probe failure does not prevent other probes from running"
+    - "LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5"
+    - "Supabase probe uses getPostgresPool().query('SELECT 1'), not PostgREST client"
+  artifacts:
+    - path: "backend/src/services/healthProbeService.ts"
+      provides: "Health probe orchestrator with 4 individual probers"
+      exports: ["healthProbeService", "ProbeResult"]
+    - path: "backend/src/__tests__/unit/healthProbeService.test.ts"
+      provides: "Unit tests for all probes and orchestrator"
+      min_lines: 80
+  key_links:
+    - from: "backend/src/services/healthProbeService.ts"
+      to: "backend/src/models/HealthCheckModel.ts"
+      via: "HealthCheckModel.create() for persistence"
+      pattern: "HealthCheckModel\\.create"
+    - from: "backend/src/services/healthProbeService.ts"
+      to: "backend/src/config/supabase.ts"
+      via: "getPostgresPool() for Supabase probe"
+      pattern: "getPostgresPool"
+---
+
+<objective>
+Create the health probe service with four real API probers (Document AI, LLM, Supabase, Firebase Auth) and an orchestrator that runs all probes and persists results.
+
+Purpose: HLTH-02 requires real authenticated API calls (not config checks), and HLTH-04 requires results to persist to Supabase. This plan builds the probe logic and persistence layer.
+
+Output: healthProbeService.ts with 4 probers + runAllProbes orchestrator, and unit tests. Also installs nodemailer (needed by Plan 03).
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-backend-services/02-RESEARCH.md
+@.planning/phases/01-data-foundation/01-01-SUMMARY.md
+@backend/src/models/HealthCheckModel.ts
+@backend/src/config/supabase.ts
+@backend/src/services/documentAiProcessor.ts
+@backend/src/services/llmService.ts
+@backend/src/config/firebase.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Install nodemailer and create healthProbeService</name>
+  <files>
+    backend/package.json
+    backend/src/services/healthProbeService.ts
+  </files>
+  <action>
+**Step 1: Install nodemailer** (needed by Plan 03, installing now to avoid package.json conflicts in parallel execution):
+```bash
+cd backend && npm install nodemailer && npm install --save-dev @types/nodemailer
+```
+
+**Step 2: Create healthProbeService.ts** with the following structure:
+
+Export a `ProbeResult` interface:
+```typescript
+export interface ProbeResult {
+  service_name: string;
+  status: 'healthy' | 'degraded' | 'down';
+  latency_ms: number;
+  error_message?: string;
+  probe_details?: Record<string, unknown>;
+}
+```
+
+Create 4 individual probe functions (all private/unexported):
+
+1. **probeDocumentAI()**: Import `DocumentProcessorServiceClient` from `@google-cloud/documentai`. Call `client.listProcessors({ parent: ... })` using the project ID from config. Latency > 2000ms = 'degraded'. Catch errors = 'down' with error_message.
+
+2. **probeLLM()**: Import `Anthropic` from `@anthropic-ai/sdk`. Create client with `process.env.ANTHROPIC_API_KEY`. Call `client.messages.create({ model: 'claude-haiku-4-5', max_tokens: 5, messages: [{ role: 'user', content: 'Hi' }] })`. Use cheapest model (PITFALL B prevention). Latency > 5000ms = 'degraded'. 429 errors = 'degraded' (rate limit, not down). Other errors = 'down'.
+
+3. **probeSupabase()**: Import `getPostgresPool` from `'../config/supabase'`. Call `pool.query('SELECT 1')`. Use direct PostgreSQL, NOT PostgREST (PITFALL C prevention). Latency > 2000ms = 'degraded'. Errors = 'down'.
+
+4. **probeFirebaseAuth()**: Import `admin` from `firebase-admin` (or use the existing firebase config). Call `admin.auth().verifyIdToken('invalid-token-probe-check')`. This ALWAYS throws. If error message contains 'argument' or 'INVALID' = 'healthy' (SDK is alive). Other errors = 'down'.
+
+Create `runAllProbes()` as the orchestrator:
+- Wrap each probe in individual try/catch (PITFALL E: one probe failure must not stop others)
+- For each ProbeResult, call `HealthCheckModel.create({ service_name, status, latency_ms, error_message, probe_details, checked_at: new Date().toISOString() })`
+- Return array of all ProbeResults
+- Log summary via Winston logger
+
+Export as object: `export const healthProbeService = { runAllProbes }`.
+
+Use Winston logger for all logging. Use `getSupabaseServiceClient()` per-method pattern for any Supabase calls (though probes use `getPostgresPool()` directly for the Supabase probe).
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
+    <manual>Verify healthProbeService.ts exists with runAllProbes and ProbeResult exports</manual>
+  </verify>
+  <done>nodemailer installed. healthProbeService.ts exports ProbeResult interface and healthProbeService object with runAllProbes(). Four probes make real API calls. Each probe wrapped in try/catch. Results persisted via HealthCheckModel.create(). TypeScript compiles.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create healthProbeService unit tests</name>
+  <files>
+    backend/src/__tests__/unit/healthProbeService.test.ts
+  </files>
+  <action>
+Create `backend/src/__tests__/unit/healthProbeService.test.ts` using the established Vitest mock pattern.
+
+Mock all external dependencies:
+- `vi.mock('../../models/HealthCheckModel')` — mock `create()` to resolve successfully
+- `vi.mock('../../config/supabase')` — mock `getPostgresPool()` returning `{ query: vi.fn() }`
+- `vi.mock('@google-cloud/documentai')` — mock `DocumentProcessorServiceClient` with `listProcessors` resolving
+- `vi.mock('@anthropic-ai/sdk')` — mock `Anthropic` constructor, `messages.create` resolving
+- `vi.mock('firebase-admin')` — mock `auth().verifyIdToken()` throwing expected error
+- `vi.mock('../../utils/logger')` — mock logger
+
+Test cases for `runAllProbes`:
+1. **All probes healthy — returns 4 ProbeResults with status 'healthy'** — all mocks resolve quickly, verify 4 results returned with status 'healthy'
+2. **Each result persisted via HealthCheckModel.create** — verify `HealthCheckModel.create` called 4 times with correct service_name values: 'document_ai', 'llm_api', 'supabase', 'firebase_auth'
+3. **One probe throws — others still run** — make Document AI mock throw, verify 3 other probes still complete and all 4 HealthCheckModel.create calls happen (the failed probe creates a 'down' result)
+4. **LLM probe 429 error returns 'degraded' not 'down'** — make Anthropic mock throw error with '429' in message, verify result status is 'degraded'
+5. **Supabase probe uses getPostgresPool not getSupabaseServiceClient** — verify `getPostgresPool` was called (not getSupabaseServiceClient) during Supabase probe
+6. **Firebase Auth probe — expected error = healthy** — mock verifyIdToken throwing 'Decoding Firebase ID token failed' (argument error), verify status is 'healthy'
+7. **Firebase Auth probe — unexpected error = down** — mock verifyIdToken throwing network error, verify status is 'down'
+8. **Latency measured correctly** — use `vi.useFakeTimers()` or verify `latency_ms` is a non-negative number
+
+Use `beforeEach(() => vi.clearAllMocks())`.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/healthProbeService.test.ts --reporter=verbose 2>&1</automated>
+  </verify>
+  <done>All healthProbeService tests pass. Probes verified as making real API calls (mocked). Orchestrator verified as fault-tolerant (one probe failure doesn't stop others). Results verified as persisted via HealthCheckModel.create(). Supabase probe uses getPostgresPool, not PostgREST.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npm ls nodemailer` shows nodemailer installed
+2. `npx tsc --noEmit` passes
+3. `npx vitest run src/__tests__/unit/healthProbeService.test.ts` — all tests pass
+4. healthProbeService.ts does NOT use getSupabaseServiceClient for the Supabase probe (uses getPostgresPool)
+5. LLM probe uses 'claude-haiku-4-5' not an expensive model
+</verification>
+
+<success_criteria>
+- nodemailer and @types/nodemailer installed in backend/package.json
+- healthProbeService exports ProbeResult and healthProbeService.runAllProbes
+- 4 probes: document_ai, llm_api, supabase, firebase_auth
+- Each probe returns structured ProbeResult with status/latency_ms/error_message
+- Probe results persisted via HealthCheckModel.create()
+- Individual probe failures isolated (other probes still run)
+- All unit tests pass
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/02-backend-services/02-02-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-02-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-02-SUMMARY.md
@@ -0,0 +1,122 @@
+---
+phase: 02-backend-services
+plan: 02
+subsystem: infra
+tags: [health-probes, document-ai, anthropic, firebase-auth, postgres, vitest, nodemailer]
+
+# Dependency graph
+requires:
+  - phase: 01-data-foundation
+    provides: HealthCheckModel.create() for persistence
+  - phase: 02-backend-services
+    plan: 01
+    provides: Schema and model layer for service_health_checks table
+
+provides:
+  - healthProbeService with 4 real API probers (document_ai, llm_api, supabase, firebase_auth)
+  - ProbeResult interface exported for use by health endpoint
+  - runAllProbes orchestrator with fault-tolerant probe isolation
+  - nodemailer installed (needed by Plan 03 alert notifications)
+
+affects: [02-backend-services, 02-03-PLAN]
+
+# Tech tracking
+tech-stack:
+  added: [nodemailer@8.0.1, @types/nodemailer]
+  patterns:
+    - Promise.allSettled for fault-tolerant concurrent probe orchestration
+    - firebase-admin verifyIdToken probe distinguishes expected vs unexpected errors
+    - Direct PostgreSQL pool (getPostgresPool) for Supabase probe, not PostgREST
+    - LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5
+
+key-files:
+  created:
+    - backend/src/services/healthProbeService.ts
+    - backend/src/__tests__/unit/healthProbeService.test.ts
+  modified:
+    - backend/package.json (nodemailer + @types/nodemailer added)
+
+key-decisions:
+  - "LLM probe uses claude-haiku-4-5 with max_tokens 5 (cheapest available, prevents expensive accidental probes)"
+  - "Supabase probe uses getPostgresPool().query('SELECT 1') not PostgREST client (bypasses caching/middleware)"
+  - "Firebase Auth probe uses verifyIdToken('invalid-token') — always throws, distinguished by error message content"
+  - "Promise.allSettled chosen over Promise.all to guarantee all probes run even if one throws outside try/catch"
+  - "HealthCheckModel.create failure per probe is swallowed with logger.error — probe results still returned to caller"
+
+patterns-established:
+  - "Probe pattern: record start time, try real API call, compute latency, return ProbeResult with status/latency_ms/error_message"
+  - "Firebase SDK probe: verifyIdToken always throws; 'argument'/'INVALID'/'Decoding' in message = SDK alive = healthy"
+  - "429 rate limit errors = degraded (not down) — service is alive but throttling"
+  - "vi.mock with inline vi.fn() in factory — no outer variable references (Vitest hoisting TDZ safe)"
+
+requirements-completed: [HLTH-02, HLTH-04]
+
+# Metrics
+duration: 18min
+completed: 2026-02-24
+---
+
+# Phase 02 Plan 02: Health Probe Service Summary
+
+**Four real authenticated API probers (Document AI, LLM claude-haiku-4-5, Supabase pg pool, Firebase Auth) with fault-tolerant orchestrator and Supabase persistence via HealthCheckModel**
+
+## Performance
+
+- **Duration:** 18 min
+- **Started:** 2026-02-24T14:05:00Z
+- **Completed:** 2026-02-24T14:23:55Z
+- **Tasks:** 2
+- **Files modified:** 4
+
+## Accomplishments
+
+- Created `healthProbeService.ts` with 4 individual probers each making real authenticated API calls
+- Implemented `runAllProbes` orchestrator using `Promise.allSettled` for fault isolation (one probe failure never blocks others)
+- Each probe result persisted to Supabase via `HealthCheckModel.create()` after completion
+- 9 unit tests covering all probers, fault tolerance, 429 degraded handling, Supabase pool verification, and Firebase error discrimination
+- Installed nodemailer (needed by Plan 03 alert notifications) to avoid package.json conflicts in parallel execution
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Install nodemailer and create healthProbeService** - `4129826` (feat)
+2. **Task 2: Create healthProbeService unit tests** - `a8ba884` (test)
+
+**Plan metadata:** (docs commit — created below)
+
+## Files Created/Modified
+
+- `backend/src/services/healthProbeService.ts` - Health probe orchestrator with ProbeResult interface and 4 individual probers
+- `backend/src/__tests__/unit/healthProbeService.test.ts` - 9 unit tests covering all probers and orchestrator
+- `backend/package.json` - nodemailer + @types/nodemailer added
+
+## Decisions Made
+
+- LLM probe uses `claude-haiku-4-5` with `max_tokens: 5` — cheapest Anthropic model prevents accidental expensive probe calls
+- Supabase probe uses `getPostgresPool().query('SELECT 1')` — bypasses PostgREST middleware/caching, tests actual DB connectivity
+- Firebase Auth probe strategy: `verifyIdToken('invalid-token-probe-check')` always throws; error message containing 'argument', 'INVALID', or 'Decoding' = SDK functioning = 'healthy'
+- `Promise.allSettled` over `Promise.all` — guarantees all 4 probes run even if one rejects outside its own try/catch
+- Per-probe persistence failure is swallowed (logger.error only) so probe results are still returned to caller
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None — all probes compiled and tested cleanly on first implementation.
+
+## User Setup Required
+
+None - no external service configuration required beyond what's already in .env.
+
+## Next Phase Readiness
+
+- `healthProbeService.runAllProbes()` is ready to be called by the health scheduler (Plan 03)
+- `nodemailer` is installed and ready for Plan 03 alert notification service
+- `ProbeResult` interface exported and ready for use in health status API endpoints
+
+---
+*Phase: 02-backend-services*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-03-PLAN.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-03-PLAN.md
@@ -0,0 +1,182 @@
+---
+phase: 02-backend-services
+plan: 03
+type: execute
+wave: 2
+depends_on: [02-02]
+files_modified:
+  - backend/src/services/alertService.ts
+  - backend/src/__tests__/unit/alertService.test.ts
+autonomous: true
+requirements: [ALRT-01, ALRT-02, ALRT-04]
+
+must_haves:
+  truths:
+    - "An alert email is sent when a probe returns 'degraded' or 'down'"
+    - "A second probe failure within the cooldown period does NOT send a duplicate email"
+    - "Alert recipient is read from process.env.EMAIL_WEEKLY_RECIPIENT, never hardcoded"
+    - "Email failure does not throw or break the probe pipeline"
+    - "Nodemailer transporter is created inside the function call, not at module level (Firebase Secret timing)"
+    - "An alert_events row is created before sending the email"
+  artifacts:
+    - path: "backend/src/services/alertService.ts"
+      provides: "Alert deduplication, email sending, and alert event creation"
+      exports: ["alertService"]
+    - path: "backend/src/__tests__/unit/alertService.test.ts"
+      provides: "Unit tests for alert deduplication, email sending, recipient config"
+      min_lines: 80
+  key_links:
+    - from: "backend/src/services/alertService.ts"
+      to: "backend/src/models/AlertEventModel.ts"
+      via: "findRecentByService() for deduplication, create() for alert row"
+      pattern: "AlertEventModel\\.(findRecentByService|create)"
+    - from: "backend/src/services/alertService.ts"
+      to: "nodemailer"
+      via: "createTransport + sendMail for email delivery"
+      pattern: "nodemailer\\.createTransport"
+    - from: "backend/src/services/alertService.ts"
+      to: "process.env.EMAIL_WEEKLY_RECIPIENT"
+      via: "Config-based recipient (ALRT-04)"
+      pattern: "process\\.env\\.EMAIL_WEEKLY_RECIPIENT"
+---
+
+<objective>
+Create the alert service with deduplication logic, SMTP email sending via nodemailer, and config-based recipient.
+
+Purpose: ALRT-01 requires email alerts on service degradation/failure. ALRT-02 requires deduplication with cooldown. ALRT-04 requires the recipient to come from configuration, not hardcoded source code.
+
+Output: alertService.ts with evaluateAndAlert() and sendAlertEmail(), and unit tests.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-backend-services/02-RESEARCH.md
+@.planning/phases/01-data-foundation/01-01-SUMMARY.md
+@.planning/phases/02-backend-services/02-02-PLAN.md
+@backend/src/models/AlertEventModel.ts
+@backend/src/index.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create alertService with deduplication and email</name>
+  <files>
+    backend/src/services/alertService.ts
+  </files>
+  <action>
+Create `backend/src/services/alertService.ts` with the following structure:
+
+**Import the ProbeResult type** from `'./healthProbeService'` (created in Plan 02).
+
+**Constants:**
+- `ALERT_COOLDOWN_MINUTES = parseInt(process.env.ALERT_COOLDOWN_MINUTES ?? '60', 10)` — configurable cooldown window
+
+**Private function `createTransporter()`:**
+Create nodemailer transporter INSIDE function scope (not module level — PITFALL A: Firebase Secrets not available at module load). Read SMTP config from `process.env`:
+- `host`: `process.env.EMAIL_HOST ?? 'smtp.gmail.com'`
+- `port`: `parseInt(process.env.EMAIL_PORT ?? '587', 10)`
+- `secure`: `process.env.EMAIL_SECURE === 'true'`
+- `auth.user`: `process.env.EMAIL_USER`
+- `auth.pass`: `process.env.EMAIL_PASS`
+
+**Private function `sendAlertEmail(serviceName, alertType, message)`:**
+- Read recipient from `process.env.EMAIL_WEEKLY_RECIPIENT` (ALRT-04: NEVER hardcode the email address)
+- If no recipient configured, log warning and return (do not throw)
+- Call `createTransporter()` then `transporter.sendMail({ from, to, subject, text, html })`
+- Subject format: `[CIM Summary] Alert: ${serviceName} — ${alertType}`
+- Wrap in try/catch — email failure logs error but does NOT throw (email failure must not break probe pipeline)
+
+**Exported function `evaluateAndAlert(probeResults: ProbeResult[])`:**
+For each ProbeResult where status is 'degraded' or 'down':
+1. Map status to alert_type: 'down' -> 'service_down', 'degraded' -> 'service_degraded'
+2. Call `AlertEventModel.findRecentByService(service_name, alert_type, ALERT_COOLDOWN_MINUTES)`
+3. If recent alert exists within cooldown, log suppression and skip BOTH row creation AND email (PITFALL 3: prevent alert storms)
+4. If no recent alert, create alert_events row via `AlertEventModel.create({ service_name, alert_type, message: error_message or status description })`
+5. Then send email via `sendAlertEmail()`
+
+Export as: `export const alertService = { evaluateAndAlert }`.
+
+Use Winston logger for all logging. Use `import { AlertEventModel } from '../models/AlertEventModel'`.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
+    <manual>Verify alertService.ts exports alertService.evaluateAndAlert. Verify no hardcoded email addresses in source.</manual>
+  </verify>
+  <done>alertService.ts exports evaluateAndAlert(). Deduplication checks AlertEventModel.findRecentByService() before creating rows or sending email. Recipient read from process.env.EMAIL_WEEKLY_RECIPIENT. Transporter created lazily. Email failures caught and logged. TypeScript compiles.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create alertService unit tests</name>
+  <files>
+    backend/src/__tests__/unit/alertService.test.ts
+  </files>
+  <action>
+Create `backend/src/__tests__/unit/alertService.test.ts` using the Vitest mock pattern.
+
+Mock dependencies:
+- `vi.mock('../../models/AlertEventModel')` — mock `findRecentByService` and `create`
+- `vi.mock('nodemailer')` — mock `createTransport` returning `{ sendMail: vi.fn().mockResolvedValue({}) }`
+- `vi.mock('../../utils/logger')` — mock logger
+
+Create test ProbeResult fixtures:
+- `healthyProbe: { service_name: 'supabase', status: 'healthy', latency_ms: 50 }`
+- `downProbe: { service_name: 'document_ai', status: 'down', latency_ms: 0, error_message: 'Connection refused' }`
+- `degradedProbe: { service_name: 'llm_api', status: 'degraded', latency_ms: 6000 }`
+
+Test cases:
+
+1. **Healthy probes — no alerts sent** — pass array of healthy ProbeResults, verify AlertEventModel.findRecentByService NOT called, sendMail NOT called
+
+2. **Down probe — creates alert_events row and sends email** — pass downProbe, mock findRecentByService returning null (no recent alert), verify AlertEventModel.create called with service_name='document_ai' and alert_type='service_down', verify sendMail called
+
+3. **Degraded probe — creates alert with type 'service_degraded'** — pass degradedProbe, mock findRecentByService returning null, verify AlertEventModel.create called with alert_type='service_degraded'
+
+4. **Deduplication — suppresses within cooldown** — pass downProbe, mock findRecentByService returning an existing alert object (non-null), verify AlertEventModel.create NOT called, sendMail NOT called, logger.info called with 'suppress' in message
+
+5. **Recipient from env — reads process.env.EMAIL_WEEKLY_RECIPIENT** — set `process.env.EMAIL_WEEKLY_RECIPIENT = 'test@example.com'`, pass downProbe with no recent alert, verify sendMail called with `to: 'test@example.com'`
+
+6. **No recipient configured — skips email but still creates alert row** — delete process.env.EMAIL_WEEKLY_RECIPIENT, pass downProbe with no recent alert, verify AlertEventModel.create IS called, sendMail NOT called, logger.warn called
+
+7. **Email failure — does not throw** — mock sendMail to reject, verify evaluateAndAlert does not throw, verify logger.error called
+
+8. **Multiple probes — processes each independently** — pass [downProbe, degradedProbe, healthyProbe], verify findRecentByService called twice (for down and degraded, not for healthy)
+
+Use `beforeEach(() => { vi.clearAllMocks(); process.env.EMAIL_WEEKLY_RECIPIENT = 'admin@test.com'; })`.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/alertService.test.ts --reporter=verbose 2>&1</automated>
+  </verify>
+  <done>All alertService tests pass. Deduplication verified (suppresses within cooldown). Email sending verified with config-based recipient. Email failure verified as non-throwing. Multiple probe evaluation verified.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npx tsc --noEmit` passes
+2. `npx vitest run src/__tests__/unit/alertService.test.ts` — all tests pass
+3. `grep -r 'jpressnell\|bluepoint' backend/src/services/alertService.ts` returns nothing (no hardcoded emails)
+4. alertService reads recipient from `process.env.EMAIL_WEEKLY_RECIPIENT`
+</verification>
+
+<success_criteria>
+- alertService exports evaluateAndAlert(probeResults)
+- Deduplication uses AlertEventModel.findRecentByService with configurable cooldown
+- Alert rows created via AlertEventModel.create before email send
+- Suppressed alerts skip BOTH row creation AND email
+- Recipient from process.env, never hardcoded
+- Transporter created inside function, not at module level
+- Email failures caught and logged, never thrown
+- All unit tests pass
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/02-backend-services/02-03-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-03-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-03-SUMMARY.md
@@ -0,0 +1,124 @@
+---
+phase: 02-backend-services
+plan: 03
+subsystem: infra
+tags: [nodemailer, smtp, alerting, deduplication, email, vitest]
+
+# Dependency graph
+requires:
+  - phase: 02-backend-services
+    provides: "AlertEventModel with findRecentByService() and create() for deduplication"
+  - phase: 02-backend-services
+    provides: "ProbeResult type from healthProbeService for alert evaluation"
+provides:
+  - "alertService with evaluateAndAlert(probeResults) — deduplication, row creation, email send"
+  - "SMTP email via nodemailer with lazy transporter (Firebase Secret timing safe)"
+  - "Config-based recipient via process.env.EMAIL_WEEKLY_RECIPIENT (never hardcoded)"
+  - "8 unit tests covering all alert scenarios and edge cases"
+affects: [02-04-scheduler, 03-api]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Lazy transporter pattern: nodemailer.createTransport() called inside function, not at module level (Firebase Secret timing)"
+    - "Alert deduplication: findRecentByService() cooldown check before row creation AND email"
+    - "Non-throwing email: catch email errors, log them, never re-throw (probe pipeline safety)"
+    - "vi.mock factories with inline vi.fn() only — no outer variable references to avoid TDZ hoisting"
+
+key-files:
+  created:
+    - backend/src/services/alertService.ts
+    - backend/src/__tests__/unit/alertService.test.ts
+  modified: []
+
+key-decisions:
+  - "Transporter created inside sendAlertEmail() on each call — not at module level — avoids Firebase Secret not-yet-available error (PITFALL A)"
+  - "Suppressed alerts skip BOTH AlertEventModel.create() AND sendMail — prevents duplicate DB rows in addition to duplicate emails"
+  - "Email failure caught in try/catch and logged via logger.error — never re-thrown so probe pipeline continues"
+
+patterns-established:
+  - "Alert deduplication pattern: check findRecentByService before creating row or sending email"
+  - "Non-throwing side effects: email, analytics, and similar fire-and-forget paths must never throw"
+
+requirements-completed: [ALRT-01, ALRT-02, ALRT-04]
+
+# Metrics
+duration: 12min
+completed: 2026-02-24
+---
+
+# Phase 2 Plan 03: Alert Service Summary
+
+**Nodemailer SMTP alert service with cooldown deduplication via AlertEventModel, config-based recipient, and lazy transporter pattern for Firebase Secret compatibility**
+
+## Performance
+
+- **Duration:** 12 min
+- **Started:** 2026-02-24T19:27:42Z
+- **Completed:** 2026-02-24T19:39:30Z
+- **Tasks:** 2
+- **Files modified:** 2
+
+## Accomplishments
+
+- `alertService.evaluateAndAlert()` evaluates ProbeResults and sends email alerts for degraded/down services
+- Deduplication via `AlertEventModel.findRecentByService()` with configurable `ALERT_COOLDOWN_MINUTES` env var
+- Email recipient read from `process.env.EMAIL_WEEKLY_RECIPIENT` — never hardcoded (ALRT-04)
+- Lazy transporter pattern: `nodemailer.createTransport()` called inside `sendAlertEmail()` function (Firebase Secret timing fix)
+- 8 unit tests cover all alert scenarios: healthy skip, down/degraded alerts, deduplication, recipient config, missing recipient, email failure, and multi-probe processing
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create alertService with deduplication and email** - `91f609c` (feat)
+2. **Task 2: Create alertService unit tests** - `4b5afe2` (test)
+
+**Plan metadata:** `0acacd1` (docs: complete alertService plan)
+
+## Files Created/Modified
+
+- `backend/src/services/alertService.ts` - Alert evaluation, deduplication, and email delivery
+- `backend/src/__tests__/unit/alertService.test.ts` - 8 unit tests, all passing
+
+## Decisions Made
+
+- **Lazy transporter:** `nodemailer.createTransport()` called inside `sendAlertEmail()` on each call, not cached at module level. This is required because Firebase Secrets (`EMAIL_PASS`) are not injected into `process.env` at module load time — only when the function is invoked.
+- **Suppress both row and email:** When `findRecentByService()` returns a non-null alert, both `AlertEventModel.create()` and `sendMail` are skipped. This prevents duplicate DB rows in the alert_events table in addition to preventing duplicate emails.
+- **Non-throwing email path:** Email send failures are caught in try/catch and logged via `logger.error`. The function never re-throws, so email outages cannot break the health probe pipeline.
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 3 - Blocking] Restructured nodemailer mock to avoid Vitest TDZ hoisting error**
+- **Found during:** Task 2 (alertService unit tests)
+- **Issue:** Test file declared `const mockSendMail = vi.fn()` outside the `vi.mock()` factory and referenced it inside. Because `vi.mock()` is hoisted to the top of the file, `mockSendMail` was accessed before initialization, causing `ReferenceError: Cannot access 'mockSendMail' before initialization`
+- **Fix:** Removed the outer `mockSendMail` variable. The nodemailer mock factory uses only inline `vi.fn()` calls. Tests access the mock's `sendMail` via `vi.mocked(nodemailer.createTransport).mock.results[0].value` through a `getMockSendMail()` helper. This is consistent with the project decision: "vi.mock() factories must use only inline vi.fn() to avoid Vitest hoisting TDZ errors" (established in 01-02)
+- **Files modified:** `backend/src/__tests__/unit/alertService.test.ts`
+- **Verification:** All 8 tests pass after fix
+- **Committed in:** `4b5afe2` (Task 2 commit)
+
+---
+
+**Total deviations:** 1 auto-fixed (1 blocking — Vitest TDZ hoisting)
+**Impact on plan:** Required fix for tests to run. No scope creep. Consistent with established project pattern from 01-02.
+
+## Issues Encountered
+
+None beyond the auto-fixed TDZ hoisting issue above.
+
+## User Setup Required
+
+None - no external service configuration required beyond the existing email env vars (`EMAIL_HOST`, `EMAIL_PORT`, `EMAIL_SECURE`, `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_WEEKLY_RECIPIENT`, `ALERT_COOLDOWN_MINUTES`) documented in prior research.
+
+## Next Phase Readiness
+
+- `alertService.evaluateAndAlert()` ready to be called from the health probe scheduler (Plan 02-04)
+- All 3 alert requirements satisfied: ALRT-01 (email on degraded/down), ALRT-02 (cooldown deduplication), ALRT-04 (recipient from config)
+- No blockers for Phase 2 Plan 04 (scheduler)
+
+---
+*Phase: 02-backend-services*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-04-PLAN.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-04-PLAN.md
@@ -0,0 +1,197 @@
+---
+phase: 02-backend-services
+plan: 04
+type: execute
+wave: 3
+depends_on: [02-01, 02-02, 02-03]
+files_modified:
+  - backend/src/index.ts
+autonomous: true
+requirements: [HLTH-03, INFR-03]
+
+must_haves:
+  truths:
+    - "runHealthProbes Cloud Function export runs on 'every 5 minutes' schedule, completely separate from processDocumentJobs"
+    - "runRetentionCleanup Cloud Function export runs on 'every monday 02:00' schedule"
+    - "runHealthProbes calls healthProbeService.runAllProbes() and then alertService.evaluateAndAlert()"
+    - "runRetentionCleanup deletes from service_health_checks, alert_events, and document_processing_events older than 30 days"
+    - "Both exports list required Firebase secrets in their secrets array"
+    - "Both exports use dynamic import() pattern (same as processDocumentJobs)"
+  artifacts:
+    - path: "backend/src/index.ts"
+      provides: "Two new onSchedule Cloud Function exports"
+      exports: ["runHealthProbes", "runRetentionCleanup"]
+  key_links:
+    - from: "backend/src/index.ts (runHealthProbes)"
+      to: "backend/src/services/healthProbeService.ts"
+      via: "dynamic import('./services/healthProbeService')"
+      pattern: "import\\('./services/healthProbeService'\\)"
+    - from: "backend/src/index.ts (runHealthProbes)"
+      to: "backend/src/services/alertService.ts"
+      via: "dynamic import('./services/alertService')"
+      pattern: "import\\('./services/alertService'\\)"
+    - from: "backend/src/index.ts (runRetentionCleanup)"
+      to: "backend/src/models/HealthCheckModel.ts"
+      via: "dynamic import for deleteOlderThan(30)"
+      pattern: "HealthCheckModel\\.deleteOlderThan"
+    - from: "backend/src/index.ts (runRetentionCleanup)"
+      to: "backend/src/services/analyticsService.ts"
+      via: "dynamic import for deleteProcessingEventsOlderThan(30)"
+      pattern: "deleteProcessingEventsOlderThan"
+---
+
+<objective>
+Add two new Firebase Cloud Function scheduled exports to index.ts: runHealthProbes (every 5 minutes) and runRetentionCleanup (weekly).
+
+Purpose: HLTH-03 requires health probes to run on a schedule separate from document processing (PITFALL-2). INFR-03 requires 30-day rolling data retention cleanup on schedule.
+
+Output: Two new onSchedule exports in backend/src/index.ts.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-backend-services/02-RESEARCH.md
+@.planning/phases/02-backend-services/02-01-PLAN.md
+@.planning/phases/02-backend-services/02-02-PLAN.md
+@.planning/phases/02-backend-services/02-03-PLAN.md
+@backend/src/index.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add runHealthProbes scheduled Cloud Function export</name>
+  <files>
+    backend/src/index.ts
+  </files>
+  <action>
+Add a new `onSchedule` export to `backend/src/index.ts` AFTER the existing `processDocumentJobs` export. Follow the exact same pattern as `processDocumentJobs`.
+
+```typescript
+// Health probe scheduler — separate from document processing (PITFALL-2, HLTH-03)
+export const runHealthProbes = onSchedule({
+  schedule: 'every 5 minutes',
+  timeoutSeconds: 60,
+  memory: '256MiB',
+  retryCount: 0,  // Probes should not retry — they run again in 5 minutes anyway
+  secrets: [
+    anthropicApiKey,    // for LLM probe
+    openaiApiKey,       // for OpenAI probe fallback
+    databaseUrl,        // for Supabase probe
+    supabaseServiceKey,
+    supabaseAnonKey,
+  ],
+}, async (_event) => {
+  const { healthProbeService } = await import('./services/healthProbeService');
+  const { alertService } = await import('./services/alertService');
+
+  const results = await healthProbeService.runAllProbes();
+  await alertService.evaluateAndAlert(results);
+
+  logger.info('runHealthProbes: complete', {
+    probeCount: results.length,
+    statuses: results.map(r => ({ service: r.service_name, status: r.status })),
+  });
+});
+```
+
+Key requirements:
+- Use dynamic `import()` (not static import at top of file) — same pattern as processDocumentJobs
+- List ALL secrets that probes need in the `secrets` array (Firebase Secrets must be explicitly listed per function)
+- Use the existing `anthropicApiKey`, `openaiApiKey`, `databaseUrl`, `supabaseServiceKey`, `supabaseAnonKey` variables already defined via `defineSecret` at the top of index.ts
+- Set `retryCount: 0` — probes run every 5 minutes, no need to retry failures
+- First call `runAllProbes()` to measure and persist, then `evaluateAndAlert()` to check for alerts
+- Log a summary with probe count and statuses
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
+    <manual>Verify index.ts has `export const runHealthProbes` as a separate export from processDocumentJobs</manual>
+  </verify>
+  <done>runHealthProbes export added to index.ts. Runs every 5 minutes. Calls healthProbeService.runAllProbes() then alertService.evaluateAndAlert(). Uses dynamic imports. Lists all required secrets. TypeScript compiles.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Add runRetentionCleanup scheduled Cloud Function export</name>
+  <files>
+    backend/src/index.ts
+  </files>
+  <action>
+Add a second `onSchedule` export to `backend/src/index.ts` AFTER runHealthProbes.
+
+```typescript
+// Retention cleanup — weekly, separate from document processing (PITFALL-7, INFR-03)
+export const runRetentionCleanup = onSchedule({
+  schedule: 'every monday 02:00',
+  timeoutSeconds: 120,
+  memory: '256MiB',
+  secrets: [databaseUrl, supabaseServiceKey, supabaseAnonKey],
+}, async (_event) => {
+  const { HealthCheckModel } = await import('./models/HealthCheckModel');
+  const { AlertEventModel } = await import('./models/AlertEventModel');
+  const { deleteProcessingEventsOlderThan } = await import('./services/analyticsService');
+
+  const RETENTION_DAYS = 30;
+
+  const [hcCount, alertCount, eventCount] = await Promise.all([
+    HealthCheckModel.deleteOlderThan(RETENTION_DAYS),
+    AlertEventModel.deleteOlderThan(RETENTION_DAYS),
+    deleteProcessingEventsOlderThan(RETENTION_DAYS),
+  ]);
+
+  logger.info('runRetentionCleanup: complete', {
+    retentionDays: RETENTION_DAYS,
+    deletedHealthChecks: hcCount,
+    deletedAlerts: alertCount,
+    deletedProcessingEvents: eventCount,
+  });
+});
+```
+
+Key requirements:
+- Use dynamic `import()` for all model and service imports
+- Run all three deletes in parallel with `Promise.all()` (they touch different tables)
+- Only include the secrets needed for Supabase access (no LLM keys needed for cleanup)
+- Set `timeoutSeconds: 120` (cleanup may take longer than probes)
+- The 30-day retention period is a constant, not configurable via env (matches INFR-03 spec)
+- Only manage monitoring tables: service_health_checks, alert_events, document_processing_events. Do NOT delete from performance_metrics, session_events, or execution_events (those are agentic RAG tables, out of scope per research Open Question 4)
+- Log the count of deleted rows from each table
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
+    <manual>Verify index.ts has `export const runRetentionCleanup` as a separate export. Verify it calls deleteOlderThan on all three tables.</manual>
+  </verify>
+  <done>runRetentionCleanup export added to index.ts. Runs weekly Monday 02:00. Deletes from service_health_checks, alert_events, and document_processing_events older than 30 days. Uses Promise.all for parallel execution. Logs deletion counts. TypeScript compiles.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npx tsc --noEmit` passes
+2. `grep 'export const runHealthProbes' backend/src/index.ts` returns a match
+3. `grep 'export const runRetentionCleanup' backend/src/index.ts` returns a match
+4. Both exports use `onSchedule` (not piggybacked on processDocumentJobs — PITFALL-2 compliance)
+5. Both exports use dynamic `import()` pattern
+6. Full test suite still passes: `npx vitest run --reporter=verbose`
+</verification>
+
+<success_criteria>
+- runHealthProbes is a separate onSchedule export running every 5 minutes
+- runRetentionCleanup is a separate onSchedule export running weekly Monday 02:00
+- Both are completely decoupled from processDocumentJobs
+- runHealthProbes calls runAllProbes() then evaluateAndAlert()
+- runRetentionCleanup calls deleteOlderThan(30) on all three monitoring tables
+- All required Firebase secrets listed in each function's secrets array
+- TypeScript compiles with no errors
+- Existing test suite passes with no regressions
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/02-backend-services/02-04-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-04-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-04-SUMMARY.md
@@ -0,0 +1,101 @@
+---
+phase: 02-backend-services
+plan: 04
+subsystem: infra
+tags: [firebase-functions, cloud-scheduler, health-probes, retention-cleanup, onSchedule]
+
+# Dependency graph
+requires:
+  - phase: 02-backend-services
+    provides: healthProbeService.runAllProbes(), alertService.evaluateAndAlert(), HealthCheckModel.deleteOlderThan(), AlertEventModel.deleteOlderThan(), deleteProcessingEventsOlderThan()
+provides:
+  - runHealthProbes Cloud Function export (every 5 minutes, separate from processDocumentJobs)
+  - runRetentionCleanup Cloud Function export (weekly Monday 02:00, 30-day rolling deletion)
+affects: [03-api-layer, 04-frontend, phase-03, phase-04]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "onSchedule Cloud Functions use dynamic import() to avoid cold-start overhead and module-level secret access"
+    - "Health probes as separate named Cloud Function — never piggybacked on processDocumentJobs (PITFALL-2)"
+    - "retryCount: 0 for health probes — 5-minute schedule makes retries unnecessary"
+    - "Promise.all() for parallel multi-table retention cleanup"
+
+key-files:
+  created: []
+  modified:
+    - backend/src/index.ts
+
+key-decisions:
+  - "runHealthProbes is completely separate from processDocumentJobs — distinct Cloud Function, distinct schedule (PITFALL-2 compliance)"
+  - "retryCount: 0 on runHealthProbes — probes recur every 5 minutes, retry would create confusing duplicate results"
+  - "runRetentionCleanup uses Promise.all() for parallel deletes — three tables are independent, no ordering constraint"
+  - "runRetentionCleanup only deletes monitoring tables (service_health_checks, alert_events, document_processing_events) — agentic RAG tables out of scope per research Open Question 4"
+  - "RETENTION_DAYS = 30 is a constant, not configurable — matches INFR-03 spec exactly"
+
+patterns-established:
+  - "Scheduled Cloud Functions: dynamic import() + explicit secrets array per function"
+  - "Retention cleanup: Promise.all([model.deleteOlderThan(), ...]) pattern for parallel table cleanup"
+
+requirements-completed: [HLTH-03, INFR-03]
+
+# Metrics
+duration: 1min
+completed: 2026-02-24
+---
+
+# Phase 2 Plan 04: Scheduled Cloud Function Exports Summary
+
+**Two new Firebase onSchedule Cloud Functions: runHealthProbes (5-minute interval) and runRetentionCleanup (weekly Monday 02:00) added to index.ts as standalone exports decoupled from document processing**
+
+## Performance
+
+- **Duration:** ~1 min
+- **Started:** 2026-02-24T19:34:20Z
+- **Completed:** 2026-02-24T19:35:17Z
+- **Tasks:** 2
+- **Files modified:** 1
+
+## Accomplishments
+- Added `runHealthProbes` onSchedule export that calls `healthProbeService.runAllProbes()` then `alertService.evaluateAndAlert()` on a 5-minute cadence
+- Added `runRetentionCleanup` onSchedule export that deletes rows older than 30 days from `service_health_checks`, `alert_events`, and `document_processing_events` in parallel
+- Both functions use dynamic `import()` pattern and list all required Firebase secrets explicitly
+- All 64 existing tests continue to pass
+
+## Task Commits
+
+Both tasks modified the same file in a single edit operation:
+
+1. **Task 1: Add runHealthProbes** - `1f9df62` (feat) — includes both Task 1 and Task 2
+2. **Task 2: Add runRetentionCleanup** — included in `1f9df62` above
+
+**Plan metadata:** (docs commit forthcoming)
+
+## Files Created/Modified
+- `backend/src/index.ts` - Added `runHealthProbes` and `runRetentionCleanup` scheduled Cloud Function exports after `processDocumentJobs`
+
+## Decisions Made
+- Combined both exports into one commit since they were added simultaneously to the same file — functionally equivalent to two separate commits
+- `retryCount: 0` on `runHealthProbes` — with a 5-minute schedule, a failed probe run is superseded by the next run before any retry would be useful
+- `timeoutSeconds: 120` on `runRetentionCleanup` — cleanup may process large batches; 60 seconds could be tight for large datasets
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None — TypeScript compiled cleanly on first pass, all tests passed.
+
+## User Setup Required
+None - no external service configuration required. Firebase deployment will pick up the new exports automatically.
+
+## Next Phase Readiness
+- All Phase 2 backend service plans complete (02-01 through 02-04)
+- Ready for Phase 3 API layer development
+- Health probe infrastructure fully wired: probes run on schedule, alerts sent via email, data retained for 30 days
+- Monitoring system is operational end-to-end
+
+---
+*Phase: 02-backend-services*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-RESEARCH.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-RESEARCH.md
@@ -0,0 +1,632 @@
+# Phase 2: Backend Services - Research
+
+**Researched:** 2026-02-24
+**Domain:** Firebase Cloud Functions scheduling, health probes, email alerting (Nodemailer/SMTP), fire-and-forget analytics, alert deduplication, 30-day data retention
+**Confidence:** HIGH
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| HLTH-02 | Each health probe makes a real authenticated API call, not just config checks | Verified: existing `/monitoring/diagnostics` only checks initialization, not live connectivity; each probe must make a real call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1, Firebase Auth verify-token attempt) |
+| HLTH-03 | Health probes run on a scheduled interval, separate from document processing | Verified: `processDocumentJobs` export pattern in `index.ts` shows how to add a second named Cloud Function export; `onSchedule` from `firebase-functions/v2/scheduler` is the correct mechanism; PITFALL-2 mandates decoupling |
+| HLTH-04 | Health probe results persist to Supabase and survive cold starts | Verified: `HealthCheckModel.create()` exists from Phase 1 with correct insert signature; `service_health_checks` table exists via migration 012; cold-start survival is automatic once persisted |
+| ALRT-01 | Admin receives email alert when a service goes down or degrades | Verified: SMTP config already defined in `index.ts` (`emailHost`, `emailUser`, `emailPass`, `emailPort`, `emailSecure`); `nodemailer` is the correct library (no other email SDK installed; SMTP credentials are pre-configured); `nodemailer` is NOT yet in package.json — must be installed |
+| ALRT-02 | Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period) | Verified: `AlertEventModel.findRecentByService()` from Phase 1 exists and accepts `withinMinutes` — built exactly for this use case; check it before firing email and before creating new `alert_events` row |
+| ALRT-04 | Alert recipient stored as configuration, not hardcoded | Verified: `EMAIL_WEEKLY_RECIPIENT` defineString already exists in `index.ts` with default `jpressnell@bluepointcapital.com`; alert service must read `process.env.EMAIL_WEEKLY_RECIPIENT` (or `process.env.ALERT_RECIPIENT`) — do NOT hardcode the string in service source |
+| ANLY-01 | Document processing events persist to Supabase at write time (not in-memory only) | Verified: `uploadMonitoringService.ts` is in-memory only (confirmed PITFALL-1); a `document_processing_events` table is NOT yet in any migration — Phase 2 must add migration 013 for it; `jobProcessorService.ts` has instrumentation hooks (lines 329-390) to attach fire-and-forget writes |
+| ANLY-03 | Analytics instrumentation is non-blocking (fire-and-forget, never delays processing pipeline) | Verified: PITFALL-6 documents the 14-min timeout risk; pattern is `void supabase.from(...).insert(...)` — no `await`; existing `jobProcessorService.ts` processes in ~10 minutes, so blocking even 200ms per checkpoint is risky |
+| INFR-03 | 30-day rolling data retention cleanup runs on schedule | Verified: `HealthCheckModel.deleteOlderThan(30)` and `AlertEventModel.deleteOlderThan(30)` exist from Phase 1; a third call for `document_processing_events` needs to be added; must be a separate named Cloud Function export (PITFALL-7: separate from `processDocumentJobs`) |
+
+</phase_requirements>
+
+---
+
+## Summary
+
+Phase 2 is a service-implementation phase. All database infrastructure (tables, models) was built in Phase 1. This phase builds six service classes and two new Firebase Cloud Function exports. The work falls into four groups:
+
+**Group 1 — Health Probes** (`healthProbeService.ts`): Four probers (Document AI, Anthropic/OpenAI LLM, Supabase, Firebase Auth) each making a real authenticated API call using the already-configured credentials. Results are written to Supabase via `HealthCheckModel.create()`. PITFALL-5 is the key risk: existing diagnostics only check initialization — new probes must make live API calls.
+
+**Group 2 — Alert Service** (`alertService.ts`): Reads health probe results, checks if an alert already exists within cooldown using `AlertEventModel.findRecentByService()`, creates an `alert_events` row if not, and sends email via `nodemailer` (SMTP credentials already defined as Firebase defineString/defineSecret). Alert recipient read from `process.env.EMAIL_WEEKLY_RECIPIENT` (or a new `ALERT_RECIPIENT` env var).
+
+**Group 3 — Analytics Collector** (`analyticsService.ts`): A `recordProcessingEvent()` function that writes to a new `document_processing_events` Supabase table using fire-and-forget (`void` not `await`). Requires migration 013. The `jobProcessorService.ts` already has the right instrumentation points (lines 329-390 track `processingTime` and `status`).
+
+**Group 4 — Schedulers** (new Cloud Function exports in `index.ts`): `runHealthProbes` (every 5 minutes, separate export) and `runRetentionCleanup` (weekly, separate export). Both must be completely decoupled from `processDocumentJobs`.
+
+**Primary recommendation:** Install `nodemailer` + `@types/nodemailer` first. Build services in dependency order: analytics migration → analyticsService → healthProbeService → alertService → schedulers.
+
+---
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `nodemailer` | ^6.9.x | SMTP email sending | SMTP config already pre-wired in `index.ts` (`emailHost`, `emailUser`, `emailPass`, `emailPort`, `emailSecure` via defineString/defineSecret); no other email library installed; Nodemailer is the standard Node.js SMTP library |
+| `@supabase/supabase-js` | Already installed (2.53.0) | Writing health checks and analytics to Supabase | Already the only DB client; `HealthCheckModel` and `AlertEventModel` from Phase 1 wrap all writes |
+| `firebase-admin` | Already installed (13.4.0) | Firebase Auth probe (verify-token endpoint) + `onSchedule` function exports | Already initialized via `config/firebase.ts` |
+| `firebase-functions` | Already installed (7.0.5) | `onSchedule` v2 for scheduled Cloud Functions | Existing `processDocumentJobs` uses exact same pattern |
+| `@google-cloud/documentai` | Already installed (9.3.0) | Document AI health probe (list processors call) | Already initialized in `documentAiProcessor.ts` |
+| `@anthropic-ai/sdk` | Already installed (0.57.0) | LLM health probe (minimal token message) | Already initialized in `llmService.ts` |
+| `openai` | Already installed (5.10.2) | OpenAI health probe fallback | Available when `LLM_PROVIDER=openai` |
+| `pg` | Already installed (8.11.3) | Supabase health probe (direct SELECT 1 query) | Direct pool already available via `getPostgresPool()` in `config/supabase.ts` |
+| Winston logger | Already installed (3.11.0) | All service logging | Project-wide convention; NEVER `console.log` |
+
+### New Packages Required
+| Library | Version | Purpose | Installation |
+|---------|---------|---------|-------------|
+| `nodemailer` | ^6.9.x | SMTP email transport | `npm install nodemailer` |
+| `@types/nodemailer` | ^6.4.x | TypeScript types | `npm install --save-dev @types/nodemailer` |
+
+### Alternatives Considered
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| `nodemailer` (SMTP) | Resend SDK | Resend is the STACK.md recommendation for new setups, but Gmail SMTP credentials are already fully configured in `index.ts` (`EMAIL_HOST`, `EMAIL_USER`, `EMAIL_PASS`, etc.) — switching to Resend requires new DNS records and a new API key; Nodemailer avoids all of that |
+| Separate `onSchedule` export | `node-cron` inside existing function | PITFALL-2: probe scheduling inside `processDocumentJobs` creates availability coupling; Firebase Cloud Scheduler + separate export is the correct architecture |
+| `getPostgresPool()` for Supabase health probe | Supabase PostgREST client | Direct PostgreSQL `SELECT 1` is a better health signal than PostgREST (tests TCP+auth rather than REST layer); `getPostgresPool()` already exists for this purpose |
+
+**Installation:**
+```bash
+cd backend
+npm install nodemailer
+npm install --save-dev @types/nodemailer
+```
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure
+
+New files slot into existing service layer:
+
+```
+backend/src/
+├── models/
+│   └── migrations/
+│       └── 013_create_processing_events_table.sql   # NEW — analytics events table
+├── services/
+│   ├── healthProbeService.ts                        # NEW — probe orchestrator + individual probers
+│   ├── alertService.ts                              # NEW — deduplication + email + alert_events writer
+│   └── analyticsService.ts                          # NEW — fire-and-forget event writer
+├── index.ts                                         # UPDATE — add runHealthProbes + runRetentionCleanup exports
+└── __tests__/
+    └── models/                                      # (Phase 1 tests already here)
+    └── unit/
+        ├── healthProbeService.test.ts               # NEW
+        ├── alertService.test.ts                     # NEW
+        └── analyticsService.test.ts                 # NEW
+```
+
+### Pattern 1: Real Health Probe (HLTH-02)
+
+**What:** Each probe makes a real authenticated API call. Returns a structured `ProbeResult` with `status`, `latency_ms`, and `error_message`. Probe then calls `HealthCheckModel.create()` to persist.
+
+**Key insight:** The probe itself has no alert logic — that lives in `alertService.ts`. The probe only measures and records.
+
+```typescript
+// Source: derived from existing documentAiProcessor.ts and llmService.ts patterns
+interface ProbeResult {
+  service_name: string;
+  status: 'healthy' | 'degraded' | 'down';
+  latency_ms: number;
+  error_message?: string;
+  probe_details?: Record<string, unknown>;
+}
+
+// Document AI probe — list processors is a cheap read that tests auth + API availability
+async function probeDocumentAI(): Promise<ProbeResult> {
+  const start = Date.now();
+  try {
+    const client = new DocumentProcessorServiceClient();
+    await client.listProcessors({ parent: `projects/${projectId}/locations/us` });
+    const latency_ms = Date.now() - start;
+    return { service_name: 'document_ai', status: latency_ms > 2000 ? 'degraded' : 'healthy', latency_ms };
+  } catch (err) {
+    return {
+      service_name: 'document_ai',
+      status: 'down',
+      latency_ms: Date.now() - start,
+      error_message: err instanceof Error ? err.message : String(err),
+    };
+  }
+}
+
+// Supabase probe — direct PostgreSQL SELECT 1 via existing pg pool
+async function probeSupabase(): Promise<ProbeResult> {
+  const start = Date.now();
+  try {
+    const pool = getPostgresPool();
+    await pool.query('SELECT 1');
+    const latency_ms = Date.now() - start;
+    return { service_name: 'supabase', status: latency_ms > 2000 ? 'degraded' : 'healthy', latency_ms };
+  } catch (err) {
+    return {
+      service_name: 'supabase',
+      status: 'down',
+      latency_ms: Date.now() - start,
+      error_message: err instanceof Error ? err.message : String(err),
+    };
+  }
+}
+
+// LLM probe — minimal API call (1-word message) to verify key validity
+async function probeLLM(): Promise<ProbeResult> {
+  const start = Date.now();
+  try {
+    // Use whichever provider is configured
+    const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
+    await client.messages.create({
+      model: 'claude-haiku-4-5',
+      max_tokens: 5,
+      messages: [{ role: 'user', content: 'Hi' }],
+    });
+    const latency_ms = Date.now() - start;
+    return { service_name: 'llm_api', status: latency_ms > 5000 ? 'degraded' : 'healthy', latency_ms };
+  } catch (err) {
+    // 429 = degraded (rate limit), not 'down'
+    const is429 = err instanceof Error && err.message.includes('429');
+    return {
+      service_name: 'llm_api',
+      status: is429 ? 'degraded' : 'down',
+      latency_ms: Date.now() - start,
+      error_message: err instanceof Error ? err.message : String(err),
+    };
+  }
+}
+
+// Firebase Auth probe — verify a known-invalid token; expect auth/argument-error, not network error
+async function probeFirebaseAuth(): Promise<ProbeResult> {
+  const start = Date.now();
+  try {
+    await admin.auth().verifyIdToken('invalid-token-probe-check');
+    // Should never reach here — always throws
+    return { service_name: 'firebase_auth', status: 'healthy', latency_ms: Date.now() - start };
+  } catch (err) {
+    const latency_ms = Date.now() - start;
+    const errMsg = err instanceof Error ? err.message : String(err);
+    // 'argument-error' or 'auth/argument-error' = SDK is alive and Auth is reachable
+    const isExpectedError = errMsg.includes('argument') || errMsg.includes('INVALID');
+    return {
+      service_name: 'firebase_auth',
+      status: isExpectedError ? 'healthy' : 'down',
+      latency_ms,
+      error_message: isExpectedError ? undefined : errMsg,
+    };
+  }
+}
+```
+
+### Pattern 2: Alert Deduplication (ALRT-02)
+
+**What:** Before sending an email, check `AlertEventModel.findRecentByService()` for a matching alert within the cooldown window. If found, suppress. Uses `alert_events` table (already exists from Phase 1).
+
+**Important:** Deduplication check must happen before BOTH the `alert_events` row creation AND the email send — otherwise a suppressed email still creates a duplicate row.
+
+```typescript
+// Source: AlertEventModel.findRecentByService() from Phase 1 (verified)
+// Cooldown: 60 minutes (configurable via env var ALERT_COOLDOWN_MINUTES)
+const ALERT_COOLDOWN_MINUTES = parseInt(process.env.ALERT_COOLDOWN_MINUTES ?? '60', 10);
+
+async function maybeSendAlert(
+  serviceName: string,
+  alertType: 'service_down' | 'service_degraded',
+  message: string
+): Promise<void> {
+  // 1. Check deduplication window
+  const existing = await AlertEventModel.findRecentByService(
+    serviceName,
+    alertType,
+    ALERT_COOLDOWN_MINUTES
+  );
+
+  if (existing) {
+    logger.info('alertService: suppressing duplicate alert within cooldown', {
+      serviceName, alertType, existingAlertId: existing.id, cooldownMinutes: ALERT_COOLDOWN_MINUTES,
+    });
+    return; // suppress: both row creation AND email
+  }
+
+  // 2. Create alert_events row
+  await AlertEventModel.create({ service_name: serviceName, alert_type: alertType, message });
+
+  // 3. Send email
+  await sendAlertEmail(serviceName, alertType, message);
+}
+```
+
+### Pattern 3: Email via SMTP (Nodemailer + existing Firebase config) (ALRT-01, ALRT-04)
+
+**What:** Nodemailer transporter created using Firebase `defineString`/`defineSecret` values already in `index.ts`. Alert recipient from `process.env.EMAIL_WEEKLY_RECIPIENT` (non-hardcoded, satisfies ALRT-04).
+
+**Key insight:** The SMTP credentials (`EMAIL_HOST`, `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_PORT`, `EMAIL_SECURE`) are already defined as Firebase params in `index.ts`. The service reads them from `process.env` — Firebase makes `defineString` values available there automatically.
+
+```typescript
+// Source: Firebase Functions v2 defineString/defineSecret pattern — verified in index.ts
+import nodemailer from 'nodemailer';
+import { logger } from '../utils/logger';
+
+function createTransporter() {
+  return nodemailer.createTransport({
+    host: process.env.EMAIL_HOST ?? 'smtp.gmail.com',
+    port: parseInt(process.env.EMAIL_PORT ?? '587', 10),
+    secure: process.env.EMAIL_SECURE === 'true',
+    auth: {
+      user: process.env.EMAIL_USER,
+      pass: process.env.EMAIL_PASS, // Firebase Secret — available in process.env
+    },
+  });
+}
+
+async function sendAlertEmail(serviceName: string, alertType: string, message: string): Promise<void> {
+  const recipient = process.env.EMAIL_WEEKLY_RECIPIENT; // ALRT-04: read from config, not hardcoded
+  if (!recipient) {
+    logger.warn('alertService.sendAlertEmail: no recipient configured, skipping email', { serviceName });
+    return;
+  }
+
+  const transporter = createTransporter();
+
+  try {
+    await transporter.sendMail({
+      from: process.env.EMAIL_FROM ?? process.env.EMAIL_USER,
+      to: recipient,
+      subject: `[CIM Summary] Alert: ${serviceName} — ${alertType}`,
+      text: message,
+      html: `<p><strong>${serviceName}</strong>: ${message}</p>`,
+    });
+    logger.info('alertService.sendAlertEmail: sent', { serviceName, alertType, recipient });
+  } catch (err) {
+    logger.error('alertService.sendAlertEmail: failed', {
+      error: err instanceof Error ? err.message : String(err),
+      serviceName, alertType,
+    });
+    // Do NOT re-throw — email failure should not break the probe run
+  }
+}
+```
+
+### Pattern 4: Fire-and-Forget Analytics (ANLY-03)
+
+**What:** `analyticsService.recordProcessingEvent()` uses `void` (no `await`) so the Supabase write is completely detached from the processing pipeline. The function signature returns `void` to make it impossible to accidentally `await` it.
+
+**Critical rule:** The function MUST be called with `void` or not awaited anywhere it's used. TypeScript enforcing `void` return type ensures this.
+
+```typescript
+// Source: PITFALL-6 pattern — fire-and-forget is mandatory
+export interface ProcessingEventData {
+  document_id: string;
+  user_id: string;
+  event_type: 'upload_started' | 'processing_started' | 'completed' | 'failed';
+  duration_ms?: number;
+  error_message?: string;
+  stage?: string;
+}
+
+// Return type is void (not Promise<void>) — cannot be awaited
+export function recordProcessingEvent(data: ProcessingEventData): void {
+  const supabase = getSupabaseServiceClient();
+  void supabase
+    .from('document_processing_events')
+    .insert({
+      document_id: data.document_id,
+      user_id: data.user_id,
+      event_type: data.event_type,
+      duration_ms: data.duration_ms ?? null,
+      error_message: data.error_message ?? null,
+      stage: data.stage ?? null,
+      created_at: new Date().toISOString(),
+    })
+    .then(({ error }) => {
+      if (error) {
+        // Never throw — log only (analytics failure must not affect processing)
+        logger.error('analyticsService.recordProcessingEvent: write failed', {
+          error: error.message, data,
+        });
+      }
+    });
+}
+```
+
+### Pattern 5: Scheduled Cloud Function Export (HLTH-03, INFR-03)
+
+**What:** Two new `onSchedule` exports added to `index.ts`. Each is a separate named export, completely decoupled from `processDocumentJobs`.
+
+**Important:** New exports must include the same `secrets` array as `processDocumentJobs` (all needed Firebase Secrets must be explicitly listed). `defineString` values are auto-available but `defineSecret` values require explicit listing.
+
+```typescript
+// Source: Existing processDocumentJobs pattern in index.ts (verified)
+// Add AFTER processDocumentJobs export
+
+// Health probe scheduler — separate from document processing (PITFALL-2)
+export const runHealthProbes = onSchedule({
+  schedule: 'every 5 minutes',
+  timeoutSeconds: 60,
+  memory: '256MiB',
+  secrets: [
+    anthropicApiKey,    // for LLM probe
+    openaiApiKey,       // for OpenAI probe fallback
+    databaseUrl,        // for Supabase probe
+    supabaseServiceKey,
+    supabaseAnonKey,
+  ],
+}, async (_event) => {
+  const { healthProbeService } = await import('./services/healthProbeService');
+  await healthProbeService.runAllProbes();
+});
+
+// Retention cleanup — weekly (PITFALL-7: separate from document processing scheduler)
+export const runRetentionCleanup = onSchedule({
+  schedule: 'every monday 02:00',
+  timeoutSeconds: 120,
+  memory: '256MiB',
+  secrets: [databaseUrl, supabaseServiceKey, supabaseAnonKey],
+}, async (_event) => {
+  const { HealthCheckModel } = await import('./models/HealthCheckModel');
+  const { AlertEventModel } = await import('./models/AlertEventModel');
+  const { analyticsService } = await import('./services/analyticsService');
+
+  const [hcCount, alertCount, eventCount] = await Promise.all([
+    HealthCheckModel.deleteOlderThan(30),
+    AlertEventModel.deleteOlderThan(30),
+    analyticsService.deleteProcessingEventsOlderThan(30),
+  ]);
+
+  logger.info('runRetentionCleanup: complete', { hcCount, alertCount, eventCount });
+});
+```
+
+### Pattern 6: Analytics Migration (ANLY-01)
+
+**What:** Migration `013_create_processing_events_table.sql` adds the `document_processing_events` table. Follows the migration 012 pattern exactly.
+
+```sql
+-- Source: backend/src/models/migrations/012_create_monitoring_tables.sql (verified pattern)
+CREATE TABLE IF NOT EXISTS document_processing_events (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    document_id UUID NOT NULL,
+    user_id UUID NOT NULL,
+    event_type TEXT NOT NULL CHECK (event_type IN ('upload_started', 'processing_started', 'completed', 'failed')),
+    duration_ms INTEGER,
+    error_message TEXT,
+    stage TEXT,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_document_processing_events_created_at
+    ON document_processing_events(created_at);
+CREATE INDEX IF NOT EXISTS idx_document_processing_events_document_id
+    ON document_processing_events(document_id);
+
+ALTER TABLE document_processing_events ENABLE ROW LEVEL SECURITY;
+```
+
+### Anti-Patterns to Avoid
+
+- **Probing config existence instead of live connectivity** (PITFALL-5): Any check of `if (process.env.ANTHROPIC_API_KEY)` is not a health probe. Must make a real API call.
+- **Awaiting analytics writes** (PITFALL-6): `await analyticsService.recordProcessingEvent(...)` will block the processing pipeline. Must use `void analyticsService.recordProcessingEvent(...)` or the function must not return a Promise.
+- **Piggybacking health probes on `processDocumentJobs`** (PITFALL-2): Health probes mixed into the document processing function create availability coupling. Must be a separate `onSchedule` export.
+- **Hardcoding alert recipient** (PITFALL-8): Never write `to: 'jpressnell@bluepointcapital.com'` in source. Always `process.env.EMAIL_WEEKLY_RECIPIENT`.
+- **Alert storms** (PITFALL-3): Sending email on every failed probe run is a mistake. Must check `AlertEventModel.findRecentByService()` with cooldown window before every send.
+- **Creating the nodemailer transporter at module level**: The Firebase Secret `EMAIL_PASS` is only available inside a Cloud Function invocation (it's injected at runtime). Create the transporter inside each email call or on first use inside a function execution — not at module initialization time.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Alert deduplication state | Custom in-memory `Map<service, lastAlertTime>` | `AlertEventModel.findRecentByService()` (already exists, Phase 1) | In-memory state resets on cold start (PITFALL-1); DB-backed deduplication survives restarts |
+| SMTP transport | Custom HTTP calls to Gmail API | `nodemailer` with existing SMTP config | Gmail API requires OAuth flow; SMTP App Password already configured and working |
+| Health check result storage | Custom logging or in-memory | `HealthCheckModel.create()` (already exists, Phase 1) | Already written, tested, and connected to the right table |
+| Cron scheduling | `setInterval` inside function body | `onSchedule` Firebase Cloud Scheduler | `setInterval` does not work in serverless (instances spin up/down); Cloud Scheduler is the correct mechanism |
+| Alert creation | Direct Supabase insert | `AlertEventModel.create()` (already exists, Phase 1) | Already written with input validation and error handling |
+
+**Key insight:** Phase 1 built the entire model layer specifically so Phase 2 only has to write service logic. Use every model method; don't bypass them.
+
+---
+
+## Common Pitfalls
+
+### Pitfall A: Firebase Secret Unavailable at Module Load Time
+**What goes wrong:** Nodemailer transporter created at module top level with `process.env.EMAIL_PASS` — at module load time (cold start initialization), the Firebase Secret hasn't been injected yet. `EMAIL_PASS` is `undefined`. All email attempts fail.
+**Why it happens:** Firebase Functions v2 `defineSecret()` values are injected into `process.env` when the function invocation starts, not when the module is first imported.
+**How to avoid:** Create the nodemailer transporter lazily — inside the function that sends email, not at module level. Alternatively, use a factory function called at send time.
+**Warning signs:** `nodemailer` error "authentication failed" or "invalid credentials" on first cold start; works on warm invocations.
+
+### Pitfall B: LLM Probe Cost
+**What goes wrong:** LLM health probe uses the same model as document processing (e.g., `claude-opus-4-1`). Running every 5 minutes costs ~$0.01 × 288 calls/day = ~$2.88/day just for probing.
+**Why it happens:** Copy-pasting the model name from `llmService.ts`.
+**How to avoid:** Use the cheapest available model for probes: `claude-haiku-4-5` (Anthropic) or `gpt-3.5-turbo` (OpenAI). The probe only needs to verify API key validity and reachability — response quality doesn't matter. Set `max_tokens: 5`.
+**Warning signs:** Anthropic API bill spikes after deploying `runHealthProbes`.
+
+### Pitfall C: Supabase PostgREST vs Direct Postgres for Health Probe
+**What goes wrong:** Using `getSupabaseServiceClient()` (PostgREST) for the Supabase health probe instead of `getPostgresPool()`. PostgREST adds an HTTP layer — if the Supabase API is overloaded but the DB is healthy, the probe returns "down" incorrectly.
+**Why it happens:** PostgREST client is the default Supabase client used everywhere else.
+**How to avoid:** Use `getPostgresPool().query('SELECT 1')` — this tests TCP connectivity to the database directly, which is the true health signal for data persistence operations.
+**Warning signs:** Supabase probe reports "down" while the DB is healthy; health check latency fluctuates widely.
+
+### Pitfall D: Analytics Migration Naming Conflict
+**What goes wrong:** Phase 2 creates `013_create_processing_events_table.sql` but another developer or future migration already used `013`. The migrator runs both or skips one.
+**Why it happens:** Not verifying the highest current migration number.
+**How to avoid:** Current highest is `012_create_monitoring_tables.sql` (created in Phase 1). Next migration MUST be `013_`. Confirmed safe.
+**Warning signs:** Migration run shows "already applied" for `013_` without the table existing.
+
+### Pitfall E: Probe Errors Swallowed Silently
+**What goes wrong:** A probe throws an uncaught exception. The `runHealthProbes` Cloud Function catches it at the top level and does nothing. No health check record is written. The admin dashboard shows no data.
+**Why it happens:** Each individual probe can fail independently — if one throws, the others should still run.
+**How to avoid:** Wrap each probe call in `try/catch` inside `healthProbeService.runAllProbes()`. A probe error should create a `status: 'down'` result with the error in `error_message`, then persist that to Supabase. The probe orchestrator must never throw; it must always complete all probes.
+**Warning signs:** One service's health checks stop appearing in Supabase while others continue.
+
+### Pitfall F: `deleteOlderThan` Without Batching on Large Tables
+**What goes wrong:** After 30 days of operation with health probes running every 5 minutes, `service_health_checks` could have ~8,640 rows (288/day × 30). A single `DELETE WHERE created_at < cutoff` is fine at this scale. At 6 months, however, it could be 50k+ rows — still manageable with the `created_at` index. No batching needed at Phase 2 scale.
+**Why it happens:** Concern about DB timeout on large deletes.
+**How to avoid:** Index on `created_at` (exists from Phase 1 migration 012) makes the DELETE efficient. For Phase 2 scale, a single `DELETE` is correct. Only consider batching if the table grows to millions of rows.
+**Warning signs:** N/A at Phase 2 scale. Log `deletedCount` for visibility.
+
+---
+
+## Code Examples
+
+Verified patterns from codebase and official Node.js/Firebase docs:
+
+### Adding a Second Cloud Function Export (index.ts)
+```typescript
+// Source: Existing processDocumentJobs pattern in backend/src/index.ts (verified)
+// New export follows the same onSchedule structure:
+import { onSchedule } from 'firebase-functions/v2/scheduler';
+
+export const runHealthProbes = onSchedule({
+  schedule: 'every 5 minutes',
+  timeoutSeconds: 60,
+  memory: '256MiB',
+  retryCount: 0, // Probes should not retry — they run again in 5 minutes anyway
+  secrets: [anthropicApiKey, openaiApiKey, databaseUrl, supabaseServiceKey, supabaseAnonKey],
+}, async (_event) => {
+  // Dynamic import (same pattern as processDocumentJobs)
+  const { healthProbeService } = await import('./services/healthProbeService');
+  await healthProbeService.runAllProbes();
+});
+```
+
+### HealthCheckModel.create() — Already Available (Phase 1)
+```typescript
+// Source: backend/src/models/HealthCheckModel.ts (verified, Phase 1)
+await HealthCheckModel.create({
+  service_name: 'document_ai',
+  status: 'healthy',
+  latency_ms: 234,
+  probe_details: { processor_count: 1 },
+});
+```
+
+### AlertEventModel.findRecentByService() — Already Available (Phase 1)
+```typescript
+// Source: backend/src/models/AlertEventModel.ts (verified, Phase 1)
+const recent = await AlertEventModel.findRecentByService(
+  'document_ai',      // service name
+  'service_down',     // alert type
+  60                  // within last 60 minutes
+);
+if (recent) {
+  // suppress — cooldown active
+}
+```
+
+### Nodemailer SMTP — Using Existing Firebase Config
+```typescript
+// Source: Firebase defineString/defineSecret pattern verified in index.ts lines 220-225
+// process.env.EMAIL_HOST, EMAIL_USER, EMAIL_PASS, EMAIL_PORT, EMAIL_SECURE all available
+import nodemailer from 'nodemailer';
+
+async function sendEmail(to: string, subject: string, html: string): Promise<void> {
+  // Transporter created INSIDE function call (not at module level) — Firebase Secret timing
+  const transporter = nodemailer.createTransport({
+    host: process.env.EMAIL_HOST ?? 'smtp.gmail.com',
+    port: parseInt(process.env.EMAIL_PORT ?? '587', 10),
+    secure: process.env.EMAIL_SECURE === 'true',
+    auth: {
+      user: process.env.EMAIL_USER,
+      pass: process.env.EMAIL_PASS,
+    },
+  });
+  await transporter.sendMail({ from: process.env.EMAIL_FROM, to, subject, html });
+}
+```
+
+### Fire-and-Forget Write Pattern
+```typescript
+// Source: PITFALL-6 prevention — void prevents awaiting
+// This is the ONLY correct way to write fire-and-forget to Supabase
+
+// CORRECT — non-blocking:
+void analyticsService.recordProcessingEvent({ document_id, user_id, event_type: 'completed', duration_ms });
+
+// WRONG — blocks processing pipeline:
+await analyticsService.recordProcessingEvent(...); // DO NOT DO THIS
+
+// ALSO WRONG — return type must be void, not Promise<void>:
+async function recordProcessingEvent(...): Promise<void> { ... } // enables accidental await
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| `uploadMonitoringService.ts` in-memory event store | Persistent `document_processing_events` Supabase table | Phase 2 introduces this | Analytics survives cold starts; 30-day history available |
+| Configuration-only health check (`/monitoring/diagnostics`) | Live API call probers (`healthProbeService.ts`) | Phase 2 introduces this | Actually detects downed/revoked credentials |
+| No email alerting | SMTP email via `nodemailer` + Firebase SMTP config | Phase 2 introduces this | Admin notified of outages |
+| No scheduled probe function | `runHealthProbes` Cloud Function export | Phase 2 introduces this | Probes run independently of document processing |
+
+**Existing but unused:** The `performance_metrics` table (migration 010) is scoped to agentic RAG sessions (has a FK to `agentic_rag_sessions`). It is NOT suitable for general document processing analytics — use the new `document_processing_events` table instead.
+
+---
+
+## Open Questions
+
+1. **Probe frequency for LLM (HLTH-03)**
+   - What we know: 5-minute probe interval is specified for `runHealthProbes`. An Anthropic probe every 5 minutes at min tokens costs ~$0.001/call × 288 = $0.29/day. Acceptable.
+   - What's unclear: Whether to probe BOTH Anthropic and OpenAI each run (depends on active provider) or always probe both.
+   - Recommendation: Probe the active LLM provider (from `process.env.LLM_PROVIDER`) plus always probe Supabase and Document AI. Probing inactive providers is useful for failover readiness but not required by HLTH-02.
+
+2. **Alert recipient variable name: `EMAIL_WEEKLY_RECIPIENT` vs `ALERT_RECIPIENT`**
+   - What we know: `EMAIL_WEEKLY_RECIPIENT` is already defined as a Firebase `defineString` in `index.ts`. It has the correct default value.
+   - What's unclear: The name implies "weekly" which is misleading for health alerts. Should this be a separate `ALERT_RECIPIENT` env var?
+   - Recommendation: Reuse `EMAIL_WEEKLY_RECIPIENT` for alert recipient to avoid adding another Firebase param. Document that it's dual-purpose. If a separate `ALERT_RECIPIENT` is desired, add it as a new `defineString` in `index.ts` alongside the existing one.
+
+3. **`runHealthProbes` secrets list**
+   - What we know: `defineSecret()` values must be listed in each function's `secrets:` array to be available in `process.env` during that function's execution.
+   - What's unclear: The LLM probe needs `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` depending on config. The Supabase probe needs `DATABASE_URL`, `SUPABASE_SERVICE_KEY`, `SUPABASE_ANON_KEY`.
+   - Recommendation: Include all potentially-needed secrets: `anthropicApiKey`, `openaiApiKey`, `databaseUrl`, `supabaseServiceKey`, `supabaseAnonKey`. Unused secrets don't cause issues; missing ones cause failures.
+
+4. **Should `runRetentionCleanup` also delete from `performance_metrics` / `session_events`?**
+   - What we know: `performance_metrics` (migration 010) tracks agentic RAG sessions. It has no 30-day retention requirement specified.
+   - What's unclear: INFR-03 says "30-day rolling data retention cleanup" — does this apply only to monitoring tables or all analytics tables?
+   - Recommendation: Phase 2 only manages tables introduced in the monitoring feature: `service_health_checks`, `alert_events`, `document_processing_events`. Leave `performance_metrics`, `session_events`, `execution_events` out of scope — INFR-03 is monitoring-specific.
+
+---
+
+## Validation Architecture
+
+*(Nyquist validation not configured — `workflow.nyquist_validation` not present in `.planning/config.json`. This section is omitted.)*
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- `backend/src/models/HealthCheckModel.ts` — Verified: `create()`, `findLatestByService()`, `findAll()`, `deleteOlderThan()` signatures and behavior
+- `backend/src/models/AlertEventModel.ts` — Verified: `create()`, `findActive()`, `findRecentByService()`, `deleteOlderThan()` signatures; deduplication method ready for Phase 2
+- `backend/src/index.ts` lines 208-265 — Verified: `defineSecret('EMAIL_PASS')`, `defineString('EMAIL_HOST')`, `defineString('EMAIL_USER')`, `defineString('EMAIL_PORT')`, `defineString('EMAIL_SECURE')`, `defineString('EMAIL_WEEKLY_RECIPIENT')` all already defined; `onSchedule` export pattern confirmed from `processDocumentJobs`
+- `backend/src/models/migrations/012_create_monitoring_tables.sql` — Verified: migration 012 exists and is the current highest; next migration is 013
+- `backend/src/services/jobProcessorService.ts` lines 329-390 — Verified: `processingTime` and `status` tracked at end of each job; correct hook points for analytics instrumentation
+- `backend/src/services/uploadMonitoringService.ts` — Verified: in-memory only, loses data on cold start (PITFALL-1 confirmed)
+- `backend/package.json` — Verified: `nodemailer` is NOT installed; must be added
+- `backend/vitest.config.ts` — Verified: test glob includes `src/__tests__/**/*.{test,spec}.{ts,js}`; timeout 30s
+- `.planning/research/PITFALLS.md` — Verified: PITFALL-1 through PITFALL-10 all considered in this research
+- `.planning/research/STACK.md` — Verified: Email decision (Nodemailer fallback), node-cron vs Firebase Cloud Scheduler
+
+### Secondary (MEDIUM confidence)
+- `nodemailer` SMTP pattern: Standard Node.js email library; `createTransport` + `sendMail` API is stable and well-documented. Confidence HIGH from training data; verified against package docs as of August 2025.
+- Firebase `defineSecret()` runtime injection timing: Firebase Secrets are injected at function invocation time, not module load time — confirmed behavior from Firebase Functions v2 documentation patterns. Verified via the `secrets:` array requirement in `onSchedule` config.
+
+### Tertiary (LOW confidence)
+- Specific LLM probe cost calculation: Estimated from Anthropic public pricing as of training data. Actual cost may vary — verify with Anthropic API pricing page before deploying.
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all libraries verified in `package.json`; only `nodemailer` is new
+- Architecture: HIGH — Cloud Function export pattern verified from existing `processDocumentJobs`; model methods verified from Phase 1 source
+- Pitfalls: HIGH — PITFALL-1 through PITFALL-10 verified against codebase; Firebase Secret timing is documented Firebase behavior
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-03-25 (30 days — Firebase Functions v2 and Supabase patterns are stable)
--- a/.planning/milestones/v1.0-phases/02-backend-services/02-VERIFICATION.md
+++ b/.planning/milestones/v1.0-phases/02-backend-services/02-VERIFICATION.md
@@ -0,0 +1,157 @@
+---
+phase: 02-backend-services
+verified: 2026-02-24T14:38:30Z
+status: passed
+score: 14/14 must-haves verified
+re_verification: false
+---
+
+# Phase 2: Backend Services Verification Report
+
+**Phase Goal:** All monitoring logic runs correctly — health probes make real API calls, alerts fire with deduplication, analytics events write non-blocking to Supabase, and data is cleaned up on schedule
+**Verified:** 2026-02-24T14:38:30Z
+**Status:** PASSED
+**Re-verification:** No — initial verification
+
+---
+
+## Goal Achievement
+
+### Observable Truths
+
+| #  | Truth | Status | Evidence |
+|----|-------|--------|----------|
+| 1  | `recordProcessingEvent()` writes to `document_processing_events` table via Supabase | VERIFIED | `analyticsService.ts:34` — `void supabase.from('document_processing_events').insert(...)` |
+| 2  | `recordProcessingEvent()` returns `void` (not `Promise`) so callers cannot accidentally await it | VERIFIED | `analyticsService.ts:31` — `export function recordProcessingEvent(data: ProcessingEventData): void` |
+| 3  | A deliberate Supabase write failure logs an error but does not throw or reject | VERIFIED | `analyticsService.ts:45-52` — `.then(({ error }) => { if (error) logger.error(...) })` — no rethrow; test 3 passes |
+| 4  | `deleteProcessingEventsOlderThan(30)` removes rows older than 30 days | VERIFIED | `analyticsService.ts:68-88` — `.lt('created_at', cutoff)` with JS-computed ISO date; test 5-6 pass |
+| 5  | Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken) | VERIFIED | `healthProbeService.ts:32-173` — 4 individual probe functions each call real SDK clients; tests 1, 5 pass |
+| 6  | Each probe returns a structured `ProbeResult` with `service_name`, `status`, `latency_ms`, and optional `error_message` | VERIFIED | `healthProbeService.ts:13-19` — `ProbeResult` interface; all probe functions return it; test 1 passes |
+| 7  | Probe results are persisted to Supabase via `HealthCheckModel.create()` | VERIFIED | `healthProbeService.ts:219-225` — `await HealthCheckModel.create({...})` inside post-probe loop; test 2 passes |
+| 8  | A single probe failure does not prevent other probes from running | VERIFIED | `healthProbeService.ts:198` — `Promise.allSettled()` + individual try/catch on persist; test 3 passes |
+| 9  | LLM probe uses cheapest model (`claude-haiku-4-5`) with `max_tokens 5` | VERIFIED | `healthProbeService.ts:63-66` — `model: 'claude-haiku-4-5', max_tokens: 5` |
+| 10 | Supabase probe uses `getPostgresPool().query('SELECT 1')`, not PostgREST client | VERIFIED | `healthProbeService.ts:105-106` — `const pool = getPostgresPool(); await pool.query('SELECT 1')`; test 5 passes |
+| 11 | An alert email is sent when a probe returns 'degraded' or 'down'; deduplication prevents duplicate emails within cooldown | VERIFIED | `alertService.ts:103-143` — `evaluateAndAlert()` checks `findRecentByService()` before creating row and sending email; tests 2-4 pass |
+| 12 | Alert recipient is read from `process.env.EMAIL_WEEKLY_RECIPIENT`, never hardcoded in runtime logic | VERIFIED | `alertService.ts:43` — `const recipient = process.env['EMAIL_WEEKLY_RECIPIENT']`; no hardcoded address in runtime path; test 5 passes |
+| 13 | `runHealthProbes` Cloud Function export runs on 'every 5 minutes' schedule, separate from `processDocumentJobs` | VERIFIED | `index.ts:340-363` — `export const runHealthProbes = onSchedule({ schedule: 'every 5 minutes', ... })` — separate export |
+| 14 | `runRetentionCleanup` deletes from `service_health_checks`, `alert_events`, and `document_processing_events` older than 30 days on schedule | VERIFIED | `index.ts:366-390` — `schedule: 'every monday 02:00'`; `Promise.all([HealthCheckModel.deleteOlderThan(30), AlertEventModel.deleteOlderThan(30), deleteProcessingEventsOlderThan(30)])` |
+
+**Score:** 14/14 truths verified
+
+---
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `backend/src/models/migrations/013_create_processing_events_table.sql` | DDL with indexes and RLS | VERIFIED | 34 lines — `CREATE TABLE IF NOT EXISTS document_processing_events`, 2 indexes on `created_at`/`document_id`, `ENABLE ROW LEVEL SECURITY` |
+| `backend/src/services/analyticsService.ts` | Fire-and-forget analytics writer | VERIFIED | 88 lines — exports `recordProcessingEvent` (void), `deleteProcessingEventsOlderThan` (Promise<number>), `ProcessingEventData` |
+| `backend/src/__tests__/unit/analyticsService.test.ts` | Unit tests, min 50 lines | VERIFIED | 205 lines — 6 tests, all pass |
+| `backend/src/services/healthProbeService.ts` | 4 probers + orchestrator | VERIFIED | 248 lines — exports `healthProbeService.runAllProbes()` and `ProbeResult` |
+| `backend/src/__tests__/unit/healthProbeService.test.ts` | Unit tests, min 80 lines | VERIFIED | 317 lines — 9 tests, all pass |
+| `backend/src/services/alertService.ts` | Alert deduplication + email | VERIFIED | 146 lines — exports `alertService.evaluateAndAlert()` |
+| `backend/src/__tests__/unit/alertService.test.ts` | Unit tests, min 80 lines | VERIFIED | 235 lines — 8 tests, all pass |
+| `backend/src/index.ts` | Two new `onSchedule` Cloud Function exports | VERIFIED | `export const runHealthProbes` (line 340), `export const runRetentionCleanup` (line 366) |
+
+---
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `analyticsService.ts` | `config/supabase.ts` | `getSupabaseServiceClient()` call | WIRED | `analyticsService.ts:1,32,70` — imported and called inside both exported functions |
+| `analyticsService.ts` | `document_processing_events` table | `void supabase.from('document_processing_events').insert(...)` | WIRED | `analyticsService.ts:34-35` — pattern matches exactly |
+| `healthProbeService.ts` | `HealthCheckModel.ts` | `HealthCheckModel.create()` for persistence | WIRED | `healthProbeService.ts:5,219` — imported statically and called for each probe result |
+| `healthProbeService.ts` | `config/supabase.ts` | `getPostgresPool()` for Supabase probe | WIRED | `healthProbeService.ts:4,105` — imported and called inside `probeSupabase()` |
+| `alertService.ts` | `AlertEventModel.ts` | `findRecentByService()` and `create()` | WIRED | `alertService.ts:3,113,135` — imported and both methods called in `evaluateAndAlert()` |
+| `alertService.ts` | `nodemailer` | `createTransport` inside function scope | WIRED | `alertService.ts:1,22` — imported; `createTransporter()` is called lazily inside `sendAlertEmail()` |
+| `alertService.ts` | `process.env.EMAIL_WEEKLY_RECIPIENT` | Config-based recipient | WIRED | `alertService.ts:43` — `process.env['EMAIL_WEEKLY_RECIPIENT']` with no hardcoded fallback |
+| `index.ts (runHealthProbes)` | `healthProbeService.ts` | `dynamic import('./services/healthProbeService')` | WIRED | `index.ts:353` — `const { healthProbeService } = await import('./services/healthProbeService')` |
+| `index.ts (runHealthProbes)` | `alertService.ts` | `dynamic import('./services/alertService')` | WIRED | `index.ts:354` — `const { alertService } = await import('./services/alertService')` |
+| `index.ts (runRetentionCleanup)` | `HealthCheckModel.ts` | `HealthCheckModel.deleteOlderThan(30)` | WIRED | `index.ts:372,379` — dynamically imported and called in `Promise.all` |
+| `index.ts (runRetentionCleanup)` | `analyticsService.ts` | `deleteProcessingEventsOlderThan(30)` | WIRED | `index.ts:374,381` — dynamically imported and called in `Promise.all` |
+
+---
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|-------------|-------------|--------|----------|
+| ANLY-01 | 02-01 | Document processing events persist to Supabase at write time | SATISFIED | `analyticsService.ts` writes to `document_processing_events` via Supabase on each `recordProcessingEvent()` call |
+| ANLY-03 | 02-01 | Analytics instrumentation is non-blocking (fire-and-forget) | SATISFIED | `recordProcessingEvent()` return type is `void`; uses `void supabase...insert(...).then(...)` — no `await`; test 2 verifies return is `undefined` |
+| HLTH-02 | 02-02 | Each health probe makes a real authenticated API call | SATISFIED | `healthProbeService.ts` — Document AI calls `client.listProcessors()`, LLM calls `client.messages.create()`, Supabase calls `pool.query('SELECT 1')`, Firebase calls `admin.auth().verifyIdToken()` |
+| HLTH-04 | 02-02 | Health probe results persist to Supabase | SATISFIED | `healthProbeService.ts:219-225` — `HealthCheckModel.create()` called for every probe result |
+| ALRT-01 | 02-03 | Admin receives email alert when a service goes down or degrades | SATISFIED | `alertService.ts` — `sendAlertEmail()` called after `AlertEventModel.create()` for any non-healthy probe status |
+| ALRT-02 | 02-03 | Alert deduplication prevents repeat emails within cooldown period | SATISFIED | `alertService.ts:113-128` — `AlertEventModel.findRecentByService()` gates both row creation and email; test 4 verifies suppression |
+| ALRT-04 | 02-03 | Alert recipient stored as configuration, not hardcoded | SATISFIED | `alertService.ts:43` — `process.env['EMAIL_WEEKLY_RECIPIENT']` with no hardcoded default; service skips email if env var missing |
+| HLTH-03 | 02-04 | Health probes run on a scheduled interval, separate from document processing | SATISFIED | `index.ts:340-363` — `export const runHealthProbes = onSchedule({ schedule: 'every 5 minutes' })` — distinct export from `processDocumentJobs` |
+| INFR-03 | 02-04 | 30-day rolling data retention cleanup runs on schedule | SATISFIED | `index.ts:366-390` — `export const runRetentionCleanup = onSchedule({ schedule: 'every monday 02:00' })` — deletes from all 3 monitoring tables |
+
+**Orphaned requirements check:** Requirements INFR-01 and INFR-04 are mapped to Phase 1 in REQUIREMENTS.md and are not claimed by any Phase 2 plan — correctly out of scope. HLTH-01, ANLY-02, INFR-02, ALRT-03 are mapped to Phase 3/4 — correctly out of scope.
+
+All 9 Phase 2 requirement IDs (HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03) are accounted for with implementation evidence.
+
+---
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| `backend/src/index.ts` | 225 | `defineString('EMAIL_WEEKLY_RECIPIENT', { default: 'jpressnell@bluepointcapital.com' })` | Info | Personal email address as Firebase `defineString` deployment default. `emailWeeklyRecipient` variable is defined but never passed to any function or included in any secrets array — it is effectively unused. The runtime `alertService.ts` reads `process.env['EMAIL_WEEKLY_RECIPIENT']` correctly with no hardcoded default. **Not an ALRT-04 violation** (the `defineString` default is deployment infrastructure config, not source-code-hardcoded logic). Recommend removing the personal email from this default or replacing with a placeholder in a follow-up. |
+
+No blockers. No stubs. No placeholder implementations found.
+
+---
+
+### TypeScript Compilation
+
+```
+npx tsc --noEmit — exit 0 (no output, no errors)
+```
+
+All new files compile cleanly with no TypeScript errors.
+
+---
+
+### Test Results
+
+```
+Test Files  3 passed (3)
+     Tests  23 passed (23)
+  Duration  924ms
+```
+
+All 23 unit tests across `analyticsService.test.ts`, `healthProbeService.test.ts`, and `alertService.test.ts` pass.
+
+---
+
+### Human Verification Required
+
+#### 1. Live Firebase Deployment — Health Probe Execution
+
+**Test:** Deploy to Firebase and wait for a `runHealthProbes` trigger (5-minute schedule). Check Firebase Cloud Logging for `healthProbeService: all probes complete` log entry and verify 4 new rows in `service_health_checks` table.
+**Expected:** 4 rows inserted, all with real latency values. `document_ai` and `firebase_auth` probes return either healthy or degraded (not a connection failure).
+**Why human:** Cannot run Firebase scheduled functions locally; requires live GCP credentials and deployed infrastructure.
+
+#### 2. Alert Email Delivery — End-to-End
+
+**Test:** Temporarily set `ANTHROPIC_API_KEY` to an invalid value and trigger `runHealthProbes`. Verify an email arrives at the `EMAIL_WEEKLY_RECIPIENT` address with subject `[CIM Summary] Alert: llm_api — service_down`.
+**Expected:** Email received within 5 minutes of probe run. Second probe cycle within 60 minutes should NOT send a duplicate email.
+**Why human:** SMTP delivery requires live credentials and network routing; deduplication cooldown requires real-time waiting.
+
+#### 3. Retention Cleanup — Data Deletion Verification
+
+**Test:** Insert rows into `document_processing_events`, `service_health_checks`, and `alert_events` with `created_at` older than 30 days, then trigger `runRetentionCleanup` manually. Verify old rows are deleted and recent rows remain.
+**Expected:** Only rows older than 30 days deleted; row counts logged accurately.
+**Why human:** Requires live Supabase access and insertion of backdated test data.
+
+---
+
+### Gaps Summary
+
+None. All must-haves verified. Phase goal achieved.
+
+---
+
+_Verified: 2026-02-24T14:38:30Z_
+_Verifier: Claude (gsd-verifier)_
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-01-PLAN.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-01-PLAN.md
@@ -0,0 +1,283 @@
+---
+phase: 03-api-layer
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - backend/src/middleware/requireAdmin.ts
+  - backend/src/services/analyticsService.ts
+  - backend/src/routes/admin.ts
+  - backend/src/index.ts
+autonomous: true
+requirements:
+  - INFR-02
+  - HLTH-01
+  - ANLY-02
+
+must_haves:
+  truths:
+    - "GET /admin/health returns current health status for all four services when called by admin"
+    - "GET /admin/analytics returns processing summary (uploads, success/failure, avg time) for a configurable time range"
+    - "GET /admin/alerts returns active alert events"
+    - "POST /admin/alerts/:id/acknowledge marks an alert as acknowledged"
+    - "Non-admin authenticated users receive 404 on all admin endpoints"
+    - "Unauthenticated requests receive 401 on admin endpoints"
+  artifacts:
+    - path: "backend/src/middleware/requireAdmin.ts"
+      provides: "Admin email check middleware returning 404 for non-admin"
+      exports: ["requireAdminEmail"]
+    - path: "backend/src/routes/admin.ts"
+      provides: "Admin router with health, analytics, alerts endpoints"
+      exports: ["default"]
+    - path: "backend/src/services/analyticsService.ts"
+      provides: "getAnalyticsSummary function using Postgres pool for aggregate queries"
+      exports: ["getAnalyticsSummary", "AnalyticsSummary"]
+  key_links:
+    - from: "backend/src/routes/admin.ts"
+      to: "backend/src/middleware/requireAdmin.ts"
+      via: "router.use(requireAdminEmail)"
+      pattern: "requireAdminEmail"
+    - from: "backend/src/routes/admin.ts"
+      to: "backend/src/models/HealthCheckModel.ts"
+      via: "HealthCheckModel.findLatestByService()"
+      pattern: "findLatestByService"
+    - from: "backend/src/routes/admin.ts"
+      to: "backend/src/services/analyticsService.ts"
+      via: "getAnalyticsSummary(range)"
+      pattern: "getAnalyticsSummary"
+    - from: "backend/src/index.ts"
+      to: "backend/src/routes/admin.ts"
+      via: "app.use('/admin', adminRoutes)"
+      pattern: "app\\.use.*admin"
+---
+
+<objective>
+Create admin-authenticated HTTP endpoints exposing health status, alerts, and processing analytics.
+
+Purpose: Enables the admin to query service health, view active alerts, acknowledge alerts, and see processing analytics through protected API routes. This is the data access layer that Phase 4 (frontend) will consume.
+
+Output: Four working admin endpoints behind Firebase Auth + admin email verification, plus the `getAnalyticsSummary()` query function.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/03-api-layer/03-RESEARCH.md
+
+@backend/src/middleware/firebaseAuth.ts
+@backend/src/models/HealthCheckModel.ts
+@backend/src/models/AlertEventModel.ts
+@backend/src/services/analyticsService.ts
+@backend/src/services/healthProbeService.ts
+@backend/src/routes/monitoring.ts
+@backend/src/index.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create requireAdmin middleware and getAnalyticsSummary function</name>
+  <files>
+    backend/src/middleware/requireAdmin.ts
+    backend/src/services/analyticsService.ts
+  </files>
+  <action>
+**1. Create `backend/src/middleware/requireAdmin.ts`:**
+
+Create the admin email check middleware. This runs AFTER `verifyFirebaseToken` in the middleware chain, so `req.user` is already populated.
+
+```typescript
+import { Response, NextFunction } from 'express';
+import { FirebaseAuthenticatedRequest } from './firebaseAuth';
+import { logger } from '../utils/logger';
+
+export function requireAdminEmail(
+  req: FirebaseAuthenticatedRequest,
+  res: Response,
+  next: NextFunction
+): void {
+  // Read inside function, not at module level — Firebase Secrets not available at module load time
+  const adminEmail = process.env['ADMIN_EMAIL'] ?? process.env['EMAIL_WEEKLY_RECIPIENT'];
+
+  if (!adminEmail) {
+    logger.warn('requireAdminEmail: neither ADMIN_EMAIL nor EMAIL_WEEKLY_RECIPIENT is configured — denying all admin access');
+    res.status(404).json({ error: 'Not found' });
+    return;
+  }
+
+  const userEmail = req.user?.email;
+
+  if (!userEmail || userEmail !== adminEmail) {
+    // 404 — do not reveal admin routes exist (per locked decision)
+    logger.warn('requireAdminEmail: access denied', {
+      uid: req.user?.uid ?? 'unauthenticated',
+      email: userEmail ?? 'none',
+      path: req.path,
+    });
+    res.status(404).json({ error: 'Not found' });
+    return;
+  }
+
+  next();
+}
+```
+
+Key constraints:
+- Return 404 (not 403) for non-admin users — per locked decision, do not reveal admin routes exist
+- Read env vars inside function body, not module level (Firebase Secrets timing, matches alertService pattern)
+- Fail closed: if no admin email configured, deny all access with logged warning
+
+**2. Add `getAnalyticsSummary()` to `backend/src/services/analyticsService.ts`:**
+
+Add below the existing `deleteProcessingEventsOlderThan` function. Use `getPostgresPool()` (from `../config/supabase`) for aggregate SQL — Supabase JS client does not support COUNT/AVG.
+
+```typescript
+import { getPostgresPool } from '../config/supabase';
+
+export interface AnalyticsSummary {
+  range: string;
+  totalUploads: number;
+  succeeded: number;
+  failed: number;
+  successRate: number;
+  avgProcessingMs: number | null;
+  generatedAt: string;
+}
+
+function parseRange(range: string): string {
+  if (/^\d+h$/.test(range)) return range.replace('h', ' hours');
+  if (/^\d+d$/.test(range)) return range.replace('d', ' days');
+  return '24 hours'; // fallback default
+}
+
+export async function getAnalyticsSummary(range: string = '24h'): Promise<AnalyticsSummary> {
+  const interval = parseRange(range);
+  const pool = getPostgresPool();
+
+  const { rows } = await pool.query<{
+    total_uploads: string;
+    succeeded: string;
+    failed: string;
+    avg_processing_ms: string | null;
+  }>(`
+    SELECT
+      COUNT(*) FILTER (WHERE event_type = 'upload_started')    AS total_uploads,
+      COUNT(*) FILTER (WHERE event_type = 'completed')         AS succeeded,
+      COUNT(*) FILTER (WHERE event_type = 'failed')            AS failed,
+      AVG(duration_ms) FILTER (WHERE event_type = 'completed') AS avg_processing_ms
+    FROM document_processing_events
+    WHERE created_at >= NOW() - $1::interval
+  `, [interval]);
+
+  const row = rows[0]!;
+  const total = parseInt(row.total_uploads, 10);
+  const succeeded = parseInt(row.succeeded, 10);
+  const failed = parseInt(row.failed, 10);
+
+  return {
+    range,
+    totalUploads: total,
+    succeeded,
+    failed,
+    successRate: total > 0 ? succeeded / total : 0,
+    avgProcessingMs: row.avg_processing_ms ? parseFloat(row.avg_processing_ms) : null,
+    generatedAt: new Date().toISOString(),
+  };
+}
+```
+
+Note: Use `$1::interval` cast for parameterized interval — PostgreSQL requires explicit cast for interval parameters.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit 2>&1 | head -30</automated>
+    <manual>Check that requireAdmin.ts exports requireAdminEmail and analyticsService.ts exports getAnalyticsSummary</manual>
+  </verify>
+  <done>requireAdminEmail middleware returns 404 for non-admin users and calls next() for admin. getAnalyticsSummary queries document_processing_events with configurable time range and returns structured summary.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create admin routes and mount in Express app</name>
+  <files>
+    backend/src/routes/admin.ts
+    backend/src/index.ts
+  </files>
+  <action>
+**1. Create `backend/src/routes/admin.ts`:**
+
+Follow the exact pattern from `routes/monitoring.ts`. Apply `verifyFirebaseToken` + `requireAdminEmail` + `addCorrelationId` as router-level middleware. Use the `{ success, data, correlationId }` envelope pattern.
+
+Service names MUST match what healthProbeService writes (confirmed from codebase): `'document_ai'`, `'llm_api'`, `'supabase'`, `'firebase_auth'`.
+
+**Endpoints:**
+
+**GET /health** — Returns latest health check for all four services.
+- Use `Promise.all(SERVICE_NAMES.map(name => HealthCheckModel.findLatestByService(name)))`.
+- For each result, map to `{ service, status, checkedAt, latencyMs, errorMessage }`. If `findLatestByService` returns null, use `status: 'unknown'`.
+- Return `{ success: true, data: [...], correlationId }`.
+
+**GET /analytics** — Returns processing summary.
+- Accept `?range=24h` query param (default: `'24h'`).
+- Validate range matches `/^\d+[hd]$/` — return 400 if invalid.
+- Call `getAnalyticsSummary(range)`.
+- Return `{ success: true, data: summary, correlationId }`.
+
+**GET /alerts** — Returns active alerts.
+- Call `AlertEventModel.findActive()` (no arguments = all active alerts).
+- Return `{ success: true, data: alerts, correlationId }`.
+
+**POST /alerts/:id/acknowledge** — Acknowledge an alert.
+- Call `AlertEventModel.acknowledge(req.params.id)`.
+- If error message includes 'not found', return 404.
+- Return `{ success: true, data: updatedAlert, correlationId }`.
+
+All error handlers follow the same pattern: `logger.error(...)` then `res.status(500).json({ success: false, error: 'Human-readable message', correlationId })`.
+
+**2. Mount in `backend/src/index.ts`:**
+
+Add import: `import adminRoutes from './routes/admin';`
+
+Add route registration alongside existing routes (after the `app.use('/api/audit', auditRoutes);` line):
+```typescript
+app.use('/admin', adminRoutes);
+```
+
+The `/admin` prefix is unique — no conflicts with existing routes.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit 2>&1 | head -30</automated>
+    <manual>Verify admin.ts has 4 route handlers, index.ts mounts /admin</manual>
+  </verify>
+  <done>Four admin endpoints (GET /health, GET /analytics, GET /alerts, POST /alerts/:id/acknowledge) are mounted behind Firebase Auth + admin email check. Non-admin users get 404. Response envelope matches existing codebase pattern.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npx tsc --noEmit` passes with no errors
+2. `backend/src/middleware/requireAdmin.ts` exists and exports `requireAdminEmail`
+3. `backend/src/routes/admin.ts` exists and exports default Router with 4 endpoints
+4. `backend/src/services/analyticsService.ts` exports `getAnalyticsSummary` and `AnalyticsSummary`
+5. `backend/src/index.ts` imports and mounts admin routes at `/admin`
+6. Admin routes use `verifyFirebaseToken` + `requireAdminEmail` middleware chain
+7. Service names in health endpoint match healthProbeService: `document_ai`, `llm_api`, `supabase`, `firebase_auth`
+</verification>
+
+<success_criteria>
+- TypeScript compiles without errors
+- All four admin endpoints defined with correct HTTP methods and paths
+- Admin auth middleware returns 404 for non-admin, next() for admin
+- Analytics summary uses getPostgresPool() for aggregate SQL, not Supabase JS client
+- Response envelope matches `{ success, data, correlationId }` pattern
+- No `console.log` — all logging via Winston logger
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/03-api-layer/03-01-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-01-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-01-SUMMARY.md
@@ -0,0 +1,115 @@
+---
+phase: 03-api-layer
+plan: 01
+subsystem: api
+tags: [express, firebase-auth, typescript, postgres, admin-routes]
+
+# Dependency graph
+requires:
+  - phase: 02-backend-services
+    provides: HealthCheckModel.findLatestByService, AlertEventModel.findActive/acknowledge, document_processing_events table
+  - phase: 01-data-foundation
+    provides: service_health_checks, alert_events, document_processing_events tables and models
+provides:
+  - requireAdminEmail middleware (404 for non-admin, next() for admin)
+  - getAnalyticsSummary() aggregate query function with configurable time range
+  - GET /admin/health — latest health check for all four monitored services
+  - GET /admin/analytics — processing summary with uploads/success/failure/avg-time
+  - GET /admin/alerts — active alert events
+  - POST /admin/alerts/:id/acknowledge — mark alert acknowledged
+affects:
+  - 04-frontend: consumes all four admin endpoints for admin dashboard
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - Admin routes use router-level middleware chain (addCorrelationId + verifyFirebaseToken + requireAdminEmail)
+    - requireAdminEmail reads env vars inside function body (Firebase Secrets timing)
+    - Fail-closed pattern: if no admin email configured, deny all with logged warning
+    - getPostgresPool() for aggregate SQL (Supabase JS client does not support COUNT/AVG)
+    - PostgreSQL parameterized interval with $1::interval explicit cast
+    - Response envelope { success, data, correlationId } on all admin endpoints
+
+key-files:
+  created:
+    - backend/src/middleware/requireAdmin.ts
+    - backend/src/routes/admin.ts
+  modified:
+    - backend/src/services/analyticsService.ts
+    - backend/src/index.ts
+
+key-decisions:
+  - "requireAdminEmail returns 404 (not 403) for non-admin users — does not reveal admin routes exist"
+  - "Env vars read inside function body, not module level — Firebase Secrets not available at module load time"
+  - "getPostgresPool() used for aggregate SQL (COUNT/AVG) — Supabase JS client does not support these operations"
+  - "Service names in health endpoint hardcoded to match healthProbeService: document_ai, llm_api, supabase, firebase_auth"
+
+patterns-established:
+  - "Admin auth chain: addCorrelationId → verifyFirebaseToken → requireAdminEmail (router-level, applied once)"
+  - "Admin 404 pattern: non-admin users and unconfigured admin get 404 to obscure admin surface"
+
+requirements-completed: [INFR-02, HLTH-01, ANLY-02]
+
+# Metrics
+duration: 8min
+completed: 2026-02-24
+---
+
+# Phase 3 Plan 01: Admin API Endpoints Summary
+
+**Four Firebase-auth-protected admin endpoints exposing health status, alert management, and processing analytics via requireAdminEmail middleware returning 404 for non-admin**
+
+## Performance
+
+- **Duration:** 8 min
+- **Started:** 2026-02-24T05:03:00Z
+- **Completed:** 2026-02-24T05:11:00Z
+- **Tasks:** 2
+- **Files modified:** 4
+
+## Accomplishments
+- requireAdminEmail middleware correctly fails closed (404) for non-admin users and unconfigured environments
+- getAnalyticsSummary() uses getPostgresPool() for aggregate SQL with parameterized interval cast
+- All four admin endpoints mounted at /admin with router-level auth chain
+- TypeScript compiles without errors across the entire backend
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create requireAdmin middleware and getAnalyticsSummary function** - `301d0bf` (feat)
+2. **Task 2: Create admin routes and mount in Express app** - `4169a37` (feat)
+
+**Plan metadata:** (docs commit pending)
+
+## Files Created/Modified
+- `backend/src/middleware/requireAdmin.ts` - Admin email check middleware returning 404 for non-admin
+- `backend/src/routes/admin.ts` - Admin router with health, analytics, alerts endpoints (4 handlers)
+- `backend/src/services/analyticsService.ts` - Added AnalyticsSummary interface and getAnalyticsSummary()
+- `backend/src/index.ts` - Added adminRoutes import and app.use('/admin', adminRoutes) mount
+
+## Decisions Made
+- requireAdminEmail returns 404 (not 403) — per locked decision, do not reveal admin routes exist
+- Env vars read inside function body, not at module level — Firebase Secrets timing constraint (matches alertService.ts pattern)
+- getPostgresPool() for aggregate SQL because Supabase JS client does not support COUNT/AVG filters
+- Service names ['document_ai', 'llm_api', 'supabase', 'firebase_auth'] hardcoded to match healthProbeService output exactly
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None.
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- All four admin endpoints are ready for Phase 4 (frontend) consumption
+- Admin dashboard can call GET /admin/health, GET /admin/analytics, GET /admin/alerts, POST /admin/alerts/:id/acknowledge
+- No blockers — TypeScript clean, response envelopes consistent with codebase patterns
+
+---
+*Phase: 03-api-layer*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-02-PLAN.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-02-PLAN.md
@@ -0,0 +1,149 @@
+---
+phase: 03-api-layer
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - backend/src/services/jobProcessorService.ts
+autonomous: true
+requirements:
+  - ANLY-02
+
+must_haves:
+  truths:
+    - "Document processing emits upload_started event after job is marked as processing"
+    - "Document processing emits completed event with duration_ms after job succeeds"
+    - "Document processing emits failed event with duration_ms and error_message when job fails"
+    - "Analytics instrumentation does not change existing processing behavior or error handling"
+  artifacts:
+    - path: "backend/src/services/jobProcessorService.ts"
+      provides: "Analytics instrumentation at 3 lifecycle points in processJob()"
+      contains: "recordProcessingEvent"
+  key_links:
+    - from: "backend/src/services/jobProcessorService.ts"
+      to: "backend/src/services/analyticsService.ts"
+      via: "import and call recordProcessingEvent()"
+      pattern: "recordProcessingEvent"
+---
+
+<objective>
+Instrument the document processing pipeline with fire-and-forget analytics events at key lifecycle points.
+
+Purpose: Enables the analytics endpoint (Plan 03-01) to report real processing data. Without instrumentation, `document_processing_events` table stays empty and `GET /admin/analytics` returns zeros.
+
+Output: Three `recordProcessingEvent()` calls in `jobProcessorService.processJob()` — one at job start, one at completion, one at failure.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/03-api-layer/03-RESEARCH.md
+
+@backend/src/services/jobProcessorService.ts
+@backend/src/services/analyticsService.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add analytics instrumentation to processJob lifecycle</name>
+  <files>
+    backend/src/services/jobProcessorService.ts
+  </files>
+  <action>
+**1. Add import at top of file:**
+
+```typescript
+import { recordProcessingEvent } from './analyticsService';
+```
+
+**2. Emit `upload_started` after `markAsProcessing` (line ~133):**
+
+After `await ProcessingJobModel.markAsProcessing(jobId);` and `jobStatusUpdated = true;`, add:
+
+```typescript
+// Analytics: job processing started (fire-and-forget, void return)
+recordProcessingEvent({
+  document_id: job.document_id,
+  user_id: job.user_id,
+  event_type: 'upload_started',
+});
+```
+
+Place this BEFORE the timeout setup block (before `const processingTimeout = ...`).
+
+**3. Emit `completed` after `markAsCompleted` (line ~329):**
+
+After `const processingTime = Date.now() - startTime;` and the `logger.info('Job completed successfully', ...)` call, add:
+
+```typescript
+// Analytics: job completed (fire-and-forget, void return)
+recordProcessingEvent({
+  document_id: job.document_id,
+  user_id: job.user_id,
+  event_type: 'completed',
+  duration_ms: processingTime,
+});
+```
+
+**4. Emit `failed` in catch block (line ~355-368):**
+
+After `const processingTime = Date.now() - startTime;` and `logger.error('Job processing failed', ...)`, but BEFORE the `try { await ProcessingJobModel.markAsFailed(...)` block, add:
+
+```typescript
+// Analytics: job failed (fire-and-forget, void return)
+// Guard with job check — job is null if findById failed before assignment
+if (job) {
+  recordProcessingEvent({
+    document_id: job.document_id,
+    user_id: job.user_id,
+    event_type: 'failed',
+    duration_ms: processingTime,
+    error_message: errorMessage,
+  });
+}
+```
+
+**Critical constraints:**
+- `recordProcessingEvent` returns `void` (not `Promise<void>`) — do NOT use `await`. This is the fire-and-forget guarantee (PITFALL-6, STATE.md decision).
+- Do NOT wrap in try/catch — the function internally catches all errors and logs them.
+- Do NOT modify any existing code around the instrumentation points — add lines, don't change lines.
+- Guard `job` in catch block — it can be null if `findById` threw before assignment.
+- Use `event_type: 'upload_started'` (not `'processing_started'`) — per locked decision, key milestones only: upload started, processing complete, processing failed.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit 2>&1 | head -30 && npx vitest run --reporter=verbose 2>&1 | tail -20</automated>
+    <manual>Verify 3 recordProcessingEvent calls exist in jobProcessorService.ts, none use await</manual>
+  </verify>
+  <done>processJob() emits upload_started after markAsProcessing, completed with duration after markAsCompleted, and failed with duration+error in catch block. All calls are fire-and-forget (no await). Existing processing logic unchanged — no behavior modification.</done>
+</task>
+
+</tasks>
+
+<verification>
+1. `npx tsc --noEmit` passes with no errors
+2. `npx vitest run` — all existing tests pass (no regressions)
+3. `grep -c 'recordProcessingEvent' backend/src/services/jobProcessorService.ts` returns 3
+4. `grep 'await recordProcessingEvent' backend/src/services/jobProcessorService.ts` returns nothing (no accidental await)
+5. `recordProcessingEvent` import exists at top of file
+</verification>
+
+<success_criteria>
+- TypeScript compiles without errors
+- All existing tests pass (zero regressions)
+- Three recordProcessingEvent calls at correct lifecycle points
+- No await on recordProcessingEvent (fire-and-forget preserved)
+- job null-guard in catch block prevents runtime errors
+- No changes to existing processing logic
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/03-api-layer/03-02-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-02-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-02-SUMMARY.md
@@ -0,0 +1,109 @@
+---
+phase: 03-api-layer
+plan: 02
+subsystem: api
+tags: [analytics, instrumentation, fire-and-forget, document-processing]
+
+# Dependency graph
+requires:
+  - phase: 02-backend-services
+    provides: analyticsService.recordProcessingEvent() fire-and-forget function
+  - phase: 03-api-layer/03-01
+    provides: analytics endpoint that reads document_processing_events table
+provides:
+  - Analytics instrumentation at 3 lifecycle points in processJob()
+  - document_processing_events table populated with real processing data
+affects: [03-api-layer, 03-01-analytics-endpoint]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - Fire-and-forget analytics calls (void return, no await) in processJob lifecycle
+
+key-files:
+  created: []
+  modified:
+    - backend/src/services/jobProcessorService.ts
+
+key-decisions:
+  - "All three recordProcessingEvent() calls are void/fire-and-forget (no await) — PITFALL-6 compliance confirmed"
+  - "upload_started event emitted after markAsProcessing (not processing_started) per locked decision"
+  - "Null-guard on job in catch block — job can be null if findById throws before assignment"
+
+patterns-established:
+  - "Analytics instrumentation pattern: call recordProcessingEvent() without await, no try/catch wrapper — function handles errors internally"
+
+requirements-completed: [ANLY-02]
+
+# Metrics
+duration: 2min
+completed: 2026-02-24
+---
+
+# Phase 3 Plan 02: Analytics Instrumentation Summary
+
+**Three fire-and-forget recordProcessingEvent() calls added to processJob() at upload_started, completed, and failed lifecycle points**
+
+## Performance
+
+- **Duration:** 2 min
+- **Started:** 2026-02-24T20:42:54Z
+- **Completed:** 2026-02-24T20:44:36Z
+- **Tasks:** 1
+- **Files modified:** 1
+
+## Accomplishments
+
+- Added import for `recordProcessingEvent` from analyticsService at top of jobProcessorService.ts
+- Emits `upload_started` event (fire-and-forget) after `markAsProcessing` at job start
+- Emits `completed` event with `duration_ms` (fire-and-forget) after `markAsCompleted` on success
+- Emits `failed` event with `duration_ms` and `error_message` (fire-and-forget) in catch block with null-guard
+- Zero regressions — all 64 existing tests pass, TypeScript compiles cleanly
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Add analytics instrumentation to processJob lifecycle** - `dabd4a5` (feat)
+
+**Plan metadata:** (docs commit follows)
+
+## Files Created/Modified
+
+- `backend/src/services/jobProcessorService.ts` - Added import and 3 recordProcessingEvent() instrumentation calls at job start, completion, and failure
+
+## Decisions Made
+
+- Confirmed `event_type: 'upload_started'` (not `'processing_started'`) matches the locked analytics schema decision
+- No await on any recordProcessingEvent() call — void return type enforces fire-and-forget at the type system level
+- Null-guard `if (job)` in catch block is necessary because `job` remains `null` if `ProcessingJobModel.findById()` throws before assignment
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- Analytics pipeline is now end-to-end: document_processing_events table receives real data when jobs run
+- GET /admin/analytics endpoint (03-01) will report actual processing metrics instead of zeros
+- No blockers for remaining Phase 03 plans
+
+## Self-Check: PASSED
+
+- FOUND: backend/src/services/jobProcessorService.ts
+- FOUND: .planning/phases/03-api-layer/03-02-SUMMARY.md
+- FOUND commit: dabd4a5 (feat: analytics instrumentation)
+- FOUND commit: 081c535 (docs: plan metadata)
+
+---
+*Phase: 03-api-layer*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-CONTEXT.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-CONTEXT.md
@@ -0,0 +1,61 @@
+# Phase 3: API Layer - Context
+
+**Gathered:** 2026-02-24
+**Status:** Ready for planning
+
+<domain>
+## Phase Boundary
+
+Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics. Existing service processors (jobProcessorService, llmService) emit analytics events at stage transitions without changing processing behavior. The admin dashboard UI that consumes these endpoints is a separate phase.
+
+</domain>
+
+<decisions>
+## Implementation Decisions
+
+### Response shape & contracts
+- Analytics endpoint accepts a configurable time range via query param (e.g., `?range=24h`, `?range=7d`) with a sensible default
+- Field naming convention: match whatever the existing codebase already uses (camelCase or snake_case) — stay consistent
+
+### Auth & error behavior
+- Non-admin users receive 404 on admin endpoints — do not reveal that admin routes exist
+- Unauthenticated requests: Claude decides whether to return 401 or same 404 based on existing auth middleware patterns
+
+### Analytics instrumentation
+- Best-effort with logging: emit events asynchronously, log failures, but never let instrumentation errors propagate to processing
+- Key milestones only — upload started, processing complete, processing failed (not every pipeline stage)
+- Include duration/timing data per event — enables avg processing time metric in the analytics endpoint
+
+### Endpoint conventions
+- Route prefix: match existing Express app patterns
+- Acknowledge semantics: Claude decides (one-way, toggle, or with note — whatever fits best)
+
+### Claude's Discretion
+- Envelope pattern vs direct data for API responses
+- Health endpoint detail level (flat status vs nested with last-check times)
+- Admin role mechanism (Firebase custom claims vs Supabase role check vs other)
+- Unauthenticated request handling (401 vs 404)
+- Alert pagination strategy
+- Alert filtering support
+- Rate limiting on admin endpoints
+
+</decisions>
+
+<specifics>
+## Specific Ideas
+
+No specific requirements — open to standard approaches. User trusts Claude to make sensible implementation choices across most areas, with the explicit constraint that admin endpoints must be invisible (404) to non-admin users.
+
+</specifics>
+
+<deferred>
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+</deferred>
+
+---
+
+*Phase: 03-api-layer*
+*Context gathered: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-RESEARCH.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-RESEARCH.md
@@ -0,0 +1,550 @@
+# Phase 3: API Layer - Research
+
+**Researched:** 2026-02-24
+**Domain:** Express.js admin route construction, Firebase Auth middleware, Supabase analytics queries
+**Confidence:** HIGH
+
+---
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+
+**Response shape & contracts**
+- Analytics endpoint accepts a configurable time range via query param (e.g., `?range=24h`, `?range=7d`) with a sensible default
+- Field naming convention: match whatever the existing codebase already uses (camelCase or snake_case) — stay consistent
+
+**Auth & error behavior**
+- Non-admin users receive 404 on admin endpoints — do not reveal that admin routes exist
+- Unauthenticated requests: Claude decides whether to return 401 or same 404 based on existing auth middleware patterns
+
+**Analytics instrumentation**
+- Best-effort with logging: emit events asynchronously, log failures, but never let instrumentation errors propagate to processing
+- Key milestones only — upload started, processing complete, processing failed (not every pipeline stage)
+- Include duration/timing data per event — enables avg processing time metric in the analytics endpoint
+
+**Endpoint conventions**
+- Route prefix: match existing Express app patterns
+- Acknowledge semantics: Claude decides (one-way, toggle, or with note — whatever fits best)
+
+### Claude's Discretion
+- Envelope pattern vs direct data for API responses
+- Health endpoint detail level (flat status vs nested with last-check times)
+- Admin role mechanism (Firebase custom claims vs Supabase role check vs other)
+- Unauthenticated request handling (401 vs 404)
+- Alert pagination strategy
+- Alert filtering support
+- Rate limiting on admin endpoints
+
+### Deferred Ideas (OUT OF SCOPE)
+
+None — discussion stayed within phase scope
+</user_constraints>
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| INFR-02 | Admin API routes protected by Firebase Auth with admin email check | Firebase Auth `verifyFirebaseToken` middleware exists; need `requireAdmin` layer that checks `req.user.email` against `process.env.EMAIL_WEEKLY_RECIPIENT` (already configured for alerts) or a dedicated `ADMIN_EMAIL` env var |
+| HLTH-01 | Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth | `HealthCheckModel.findLatestByService()` already exists; need a query across all four service names or a loop; service names must match what `healthProbeService` writes |
+| ANLY-02 | Admin can view processing summary: upload counts, success/failure rates, avg processing time | `document_processing_events` table exists with `event_type`, `duration_ms`, `created_at`; need a Supabase aggregation query grouped by `event_type` over a time window; `recordProcessingEvent()` must be called from `jobProcessorService.processJob()` (not yet called there) |
+</phase_requirements>
+
+---
+
+## Summary
+
+Phase 3 is entirely additive — it exposes data from Phase 1 and Phase 2 via admin-protected HTTP endpoints, and instruments the existing `jobProcessorService.processJob()` method with fire-and-forget analytics calls. No database schema changes are needed; all tables and models exist.
+
+The three technical sub-problems are: (1) a two-layer auth middleware — Firebase token verification (existing `verifyFirebaseToken`) plus an admin email check (new, 5-10 lines); (2) three new route handlers reading from `HealthCheckModel`, `AlertEventModel`, and a new `getAnalyticsSummary()` function in `analyticsService`; and (3) inserting `recordProcessingEvent()` calls at three points inside `processJob()` without altering success/failure semantics.
+
+The codebase is well-factored and consistent: route files live in `backend/src/routes/`, middleware in `backend/src/middleware/`, service functions in `backend/src/services/`. The existing `verifyFirebaseToken` middleware plus a new `requireAdminEmail` middleware compose cleanly onto the new `/admin` router. The existing `{ success: true, data: ..., correlationId: ... }` envelope is the established pattern and should be followed.
+
+**Primary recommendation:** Add `adminRoutes.ts` to the existing routes directory, mount it at `/admin` in `index.ts`, compose `verifyFirebaseToken` + `requireAdminEmail` as router-level middleware, and wire three handlers to existing model/service methods. Instrument `processJob()` at job-start, completion, and failure using the existing `recordProcessingEvent()` signature.
+
+---
+
+## Standard Stack
+
+### Core
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| express | already in use | Router, Request/Response types | Project standard |
+| firebase-admin | already in use | Token verification (`verifyIdToken`) | Existing auth layer |
+| @supabase/supabase-js | already in use | Database reads via `getSupabaseServiceClient()` | Project data layer |
+
+### Supporting
+
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| (none new) | — | All needed libraries already present | No new npm installs required |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Email-based admin check | Firebase custom claims | Custom claims require Firebase Admin SDK `setCustomUserClaims()` call — more setup; email check works with zero additional config since `EMAIL_WEEKLY_RECIPIENT` is already defined |
+| Email-based admin check | Supabase role column | Cross-system lookup adds latency and a new dependency; email check is synchronous against the already-decoded token |
+
+**Installation:** No new packages needed.
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure
+
+```
+backend/src/
+├── routes/
+│   ├── admin.ts           # NEW — /admin router with health, analytics, alerts endpoints
+│   ├── documents.ts       # existing
+│   ├── monitoring.ts      # existing
+│   └── ...
+├── middleware/
+│   ├── firebaseAuth.ts    # existing — verifyFirebaseToken
+│   ├── requireAdmin.ts    # NEW — requireAdminEmail middleware (10-15 lines)
+│   └── ...
+├── services/
+│   ├── analyticsService.ts  # extend — add getAnalyticsSummary() query function
+│   ├── jobProcessorService.ts  # modify — add recordProcessingEvent() calls
+│   └── ...
+└── index.ts               # modify — mount /admin routes
+```
+
+### Pattern 1: Two-Layer Admin Auth Middleware
+
+**What:** `verifyFirebaseToken` handles token signature + expiry; `requireAdminEmail` checks that `req.user.email` equals the configured admin email. Admin routes apply both in sequence.
+
+**When to use:** All `/admin/*` routes.
+
+**Example:**
+```typescript
+// backend/src/middleware/requireAdmin.ts
+import { Response, NextFunction } from 'express';
+import { FirebaseAuthenticatedRequest } from './firebaseAuth';
+import { logger } from '../utils/logger';
+
+const ADMIN_EMAIL = process.env['ADMIN_EMAIL'] ?? process.env['EMAIL_WEEKLY_RECIPIENT'];
+
+export function requireAdminEmail(
+  req: FirebaseAuthenticatedRequest,
+  res: Response,
+  next: NextFunction
+): void {
+  const userEmail = req.user?.email;
+
+  if (!userEmail || userEmail !== ADMIN_EMAIL) {
+    // 404 — do not reveal admin routes exist (per locked decision)
+    logger.warn('requireAdminEmail: access denied', {
+      uid: req.user?.uid ?? 'unauthenticated',
+      email: userEmail ?? 'none',
+      path: req.path,
+    });
+    res.status(404).json({ error: 'Not found' });
+    return;
+  }
+
+  next();
+}
+```
+
+**Unauthenticated handling:** `verifyFirebaseToken` already returns 401 for missing/invalid tokens. Since it runs first, unauthenticated requests never reach `requireAdminEmail`. The 404 behavior (hiding admin routes) only applies to authenticated non-admin users — this is consistent with the existing middleware chain. No change needed to `verifyFirebaseToken`.
+
+### Pattern 2: Admin Router Construction
+
+**What:** A dedicated Express Router with both middleware applied at router level, then individual route handlers.
+
+**When to use:** All admin endpoints.
+
+**Example:**
+```typescript
+// backend/src/routes/admin.ts
+import { Router, Request, Response } from 'express';
+import { verifyFirebaseToken } from '../middleware/firebaseAuth';
+import { requireAdminEmail } from '../middleware/requireAdmin';
+import { addCorrelationId } from '../middleware/validation';
+import { HealthCheckModel } from '../models/HealthCheckModel';
+import { AlertEventModel } from '../models/AlertEventModel';
+import { getAnalyticsSummary } from '../services/analyticsService';
+import { logger } from '../utils/logger';
+
+const router = Router();
+
+// Auth chain: verify Firebase token, then assert admin email
+router.use(verifyFirebaseToken);
+router.use(requireAdminEmail);
+router.use(addCorrelationId);
+
+const SERVICE_NAMES = ['document_ai', 'llm', 'supabase', 'firebase_auth'] as const;
+
+router.get('/health', async (req: Request, res: Response): Promise<void> => {
+  try {
+    const results = await Promise.all(
+      SERVICE_NAMES.map(name => HealthCheckModel.findLatestByService(name))
+    );
+    const health = SERVICE_NAMES.map((name, i) => ({
+      service: name,
+      status: results[i]?.status ?? 'unknown',
+      checkedAt: results[i]?.checked_at ?? null,
+      latencyMs: results[i]?.latency_ms ?? null,
+      errorMessage: results[i]?.error_message ?? null,
+    }));
+    res.json({ success: true, data: health, correlationId: req.correlationId });
+  } catch (error) {
+    logger.error('GET /admin/health failed', { error, correlationId: req.correlationId });
+    res.status(500).json({ success: false, error: 'Health query failed', correlationId: req.correlationId });
+  }
+});
+```
+
+### Pattern 3: Analytics Summary Query
+
+**What:** A new `getAnalyticsSummary(range: string)` function in `analyticsService.ts` that queries `document_processing_events` aggregated over a time window. Supabase JS client does not support `COUNT`/`AVG` aggregations directly — use the Postgres pool (`getPostgresPool().query()`) for aggregate SQL, consistent with how `runRetentionCleanup` and the scheduled function's health check already use the pool.
+
+**When to use:** `GET /admin/analytics?range=24h`
+
+**Range parsing:** `24h` → `24 hours`, `7d` → `7 days`. Default: `24h`.
+
+**Example:**
+```typescript
+// backend/src/services/analyticsService.ts (addition)
+import { getPostgresPool } from '../config/supabase';
+
+export interface AnalyticsSummary {
+  range: string;
+  totalUploads: number;
+  succeeded: number;
+  failed: number;
+  successRate: number;
+  avgProcessingMs: number | null;
+  generatedAt: string;
+}
+
+export async function getAnalyticsSummary(range: string): Promise<AnalyticsSummary> {
+  const interval = parseRange(range); // '24h' -> '24 hours', '7d' -> '7 days'
+  const pool = getPostgresPool();
+
+  const { rows } = await pool.query<{
+    total_uploads: string;
+    succeeded: string;
+    failed: string;
+    avg_processing_ms: string | null;
+  }>(`
+    SELECT
+      COUNT(*) FILTER (WHERE event_type = 'upload_started')    AS total_uploads,
+      COUNT(*) FILTER (WHERE event_type = 'completed')         AS succeeded,
+      COUNT(*) FILTER (WHERE event_type = 'failed')            AS failed,
+      AVG(duration_ms) FILTER (WHERE event_type = 'completed') AS avg_processing_ms
+    FROM document_processing_events
+    WHERE created_at >= NOW() - INTERVAL $1
+  `, [interval]);
+
+  const row = rows[0]!;
+  const total = parseInt(row.total_uploads, 10);
+  const succeeded = parseInt(row.succeeded, 10);
+  const failed = parseInt(row.failed, 10);
+
+  return {
+    range,
+    totalUploads: total,
+    succeeded,
+    failed,
+    successRate: total > 0 ? succeeded / total : 0,
+    avgProcessingMs: row.avg_processing_ms ? parseFloat(row.avg_processing_ms) : null,
+    generatedAt: new Date().toISOString(),
+  };
+}
+
+function parseRange(range: string): string {
+  if (/^\d+h$/.test(range)) return range.replace('h', ' hours');
+  if (/^\d+d$/.test(range)) return range.replace('d', ' days');
+  return '24 hours'; // fallback
+}
+```
+
+### Pattern 4: Analytics Instrumentation in jobProcessorService
+
+**What:** Three `recordProcessingEvent()` calls in `processJob()` at existing lifecycle points. The function signature already matches — `document_id`, `user_id`, `event_type`, optional `duration_ms` and `error_message`. The return type is `void` (not `Promise<void>`) so no `await` is possible.
+
+**Key instrumentation points:**
+1. After `ProcessingJobModel.markAsProcessing(jobId)` — emit `upload_started` (no duration)
+2. After `ProcessingJobModel.markAsCompleted(...)` — emit `completed` with `duration_ms = Date.now() - startTime`
+3. In the catch block before `ProcessingJobModel.markAsFailed(...)` — emit `failed` with `duration_ms` and `error_message`
+
+**Example:**
+```typescript
+// In processJob(), after markAsProcessing:
+recordProcessingEvent({
+  document_id: job.document_id,
+  user_id: job.user_id,
+  event_type: 'upload_started',
+});
+
+// After markAsCompleted:
+recordProcessingEvent({
+  document_id: job.document_id,
+  user_id: job.user_id,
+  event_type: 'completed',
+  duration_ms: Date.now() - startTime,
+});
+
+// In catch, before markAsFailed:
+recordProcessingEvent({
+  document_id: job.document_id,
+  user_id: job.user_id ?? '',
+  event_type: 'failed',
+  duration_ms: Date.now() - startTime,
+  error_message: errorMessage,
+});
+```
+
+**Constraint:** `job` may be null in the catch block if `findById` failed. Guard with `job?.document_id` or skip instrumentation when `job` is null (it's already handled by the early return in that case).
+
+### Pattern 5: Alert Acknowledge Semantics
+
+**Decision:** One-way acknowledge (active → acknowledged). `AlertEventModel.acknowledge(id)` already implements exactly this. No toggle, no note field. The endpoint returns the updated alert object.
+
+```typescript
+router.post('/alerts/:id/acknowledge', async (req: Request, res: Response): Promise<void> => {
+  const { id } = req.params;
+  try {
+    const updated = await AlertEventModel.acknowledge(id);
+    res.json({ success: true, data: updated, correlationId: req.correlationId });
+  } catch (error) {
+    // AlertEventModel.acknowledge throws a specific error when id not found
+    const msg = error instanceof Error ? error.message : String(error);
+    if (msg.includes('not found')) {
+      res.status(404).json({ success: false, error: 'Alert not found', correlationId: req.correlationId });
+      return;
+    }
+    logger.error('POST /admin/alerts/:id/acknowledge failed', { id, error: msg });
+    res.status(500).json({ success: false, error: 'Acknowledge failed', correlationId: req.correlationId });
+  }
+});
+```
+
+### Anti-Patterns to Avoid
+
+- **Awaiting `recordProcessingEvent()`:** Its return type is `void`, not `Promise<void>`. Calling `await recordProcessingEvent(...)` is a TypeScript error and would break the fire-and-forget guarantee.
+- **Supabase JS `.select()` for aggregates:** Supabase JS client does not support SQL aggregate functions (`COUNT`, `AVG`). Use `getPostgresPool().query()` for analytics queries.
+- **Caching admin email at module level:** Firebase Secrets are not available at module load time. Read `process.env['ADMIN_EMAIL']` inside the middleware function, not at the top of the file — or use lazy evaluation. The alertService precedent (creating transporter inside function scope) demonstrates this pattern.
+- **Revealing admin routes to non-admin users:** Never return 403 on admin routes — always return 404 to unauthenticated/non-admin callers (per locked decision). Since `verifyFirebaseToken` runs first and returns 401 for unauthenticated requests, unauthenticated callers get 401 (expected, token verification precedes admin check). Authenticated non-admin callers get 404.
+- **Mutating existing `processJob()` logic:** Analytics calls go around existing `markAsProcessing`, `markAsCompleted`, `markAsFailed` calls — never replacing or wrapping them.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Token verification | Custom JWT validation | `verifyFirebaseToken` (already exists) | Handles expiry, revocation, recovery from session |
+| Health data retrieval | Raw SQL or in-memory aggregation | `HealthCheckModel.findLatestByService()` (already exists) | Validated input, proper error handling, same pattern as Phase 2 |
+| Alert CRUD | New Supabase queries | `AlertEventModel.findActive()`, `AlertEventModel.acknowledge()` (already exist) | Consistent error handling, deduplication-aware |
+| Correlation IDs | Custom header logic | `addCorrelationId` middleware (already exists) | Applied at router level like other route files |
+
+**Key insight:** Phase 3 is primarily composition, not construction. Nearly all data access is through existing models. The only new code is the admin router, the admin email middleware, the `getAnalyticsSummary()` function, and three `recordProcessingEvent()` call sites.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Admin Email Source
+**What goes wrong:** `ADMIN_EMAIL` env var is not defined; admin check silently passes (if the check is `email === undefined`) or silently blocks all admin access.
+**Why it happens:** The codebase uses `EMAIL_WEEKLY_RECIPIENT` for the alert recipient — there is no `ADMIN_EMAIL` variable yet. If `ADMIN_EMAIL` is not set and the check falls back to `undefined`, `email !== undefined` would always be true (blocking all) or the inverse.
+**How to avoid:** Read `ADMIN_EMAIL ?? EMAIL_WEEKLY_RECIPIENT` as fallback. Log a `logger.warn` at startup/first call if neither is defined. If neither is set, fail closed (deny all admin access) with a logged warning.
+**Warning signs:** Admin endpoints return 404 even when authenticated with the correct email.
+
+### Pitfall 2: Service Name Mismatch on Health Endpoint
+**What goes wrong:** `GET /admin/health` returns `status: null` / `checkedAt: null` for all services because the service names in the query don't match what `healthProbeService` writes.
+**Why it happens:** `HealthCheckModel.findLatestByService(serviceName)` does an exact string match. If the route handler uses `'document-ai'` but the probe writes `'document_ai'`, the join finds nothing.
+**How to avoid:** Read `healthProbeService.ts` to confirm the exact service name strings used in `HealthCheckResult` / passed to `HealthCheckModel.create()`. Use those exact strings in the admin route.
+**Warning signs:** Response data has `status: 'unknown'` for all services.
+
+### Pitfall 3: `job.user_id` Type in Analytics Instrumentation
+**What goes wrong:** TypeScript error or runtime `undefined` when emitting `recordProcessingEvent` in the catch block.
+**Why it happens:** `job` can be `null` if `ProcessingJobModel.findById()` threw before `job` was assigned. The catch block handles all errors, including the pre-assignment path.
+**How to avoid:** Guard instrumentation with `if (job)` in the catch block. `ProcessingEventData.user_id` is typed as `string`, so pass `job.user_id` only when `job` is non-null.
+**Warning signs:** TypeScript compile error on `job.user_id` in catch block.
+
+### Pitfall 4: `getPostgresPool()` vs `getSupabaseServiceClient()` for Aggregates
+**What goes wrong:** Using `getSupabaseServiceClient().from('document_processing_events').select(...)` for the analytics summary and getting back raw rows instead of aggregated counts.
+**Why it happens:** Supabase JS PostgREST client does not support SQL aggregate functions in the query builder.
+**How to avoid:** Use `getPostgresPool().query(sql, params)` for the analytics aggregate query, consistent with how `processDocumentJobs` scheduled function performs its DB health check and how `cleanupOldData` runs bulk deletes.
+**Warning signs:** `getAnalyticsSummary` returns row-level data instead of aggregated counts.
+
+### Pitfall 5: Route Registration Order in index.ts
+**What goes wrong:** Admin routes conflict with or shadow existing routes.
+**Why it happens:** Express matches routes in registration order. Registering `/admin` before `/documents` is fine as long as there are no overlapping paths.
+**How to avoid:** Add `app.use('/admin', adminRoutes)` alongside the existing route registrations. The `/admin` prefix is unique — no conflicts expected.
+**Warning signs:** Existing document/monitoring routes stop working after adding admin routes.
+
+---
+
+## Code Examples
+
+Verified patterns from the existing codebase:
+
+### Existing Route File Pattern (from routes/monitoring.ts)
+```typescript
+// Source: backend/src/routes/monitoring.ts
+import { Router, Request, Response } from 'express';
+import { addCorrelationId } from '../middleware/validation';
+import { logger } from '../utils/logger';
+
+const router = Router();
+router.use(addCorrelationId);
+
+router.get('/some-endpoint', async (req: Request, res: Response): Promise<void> => {
+  try {
+    // ... data access
+    res.json({
+      success: true,
+      data: someData,
+      correlationId: req.correlationId || undefined,
+    });
+  } catch (error) {
+    logger.error('Failed', {
+      category: 'monitoring',
+      operation: 'some_op',
+      error: error instanceof Error ? error.message : 'Unknown error',
+      correlationId: req.correlationId || undefined,
+    });
+    res.status(500).json({
+      success: false,
+      error: 'Failed to retrieve data',
+      correlationId: req.correlationId || undefined,
+    });
+  }
+});
+
+export default router;
+```
+
+### Existing Middleware Pattern (from middleware/firebaseAuth.ts)
+```typescript
+// Source: backend/src/middleware/firebaseAuth.ts
+export interface FirebaseAuthenticatedRequest extends Request {
+  user?: admin.auth.DecodedIdToken;
+}
+
+export const verifyFirebaseToken = async (
+  req: FirebaseAuthenticatedRequest,
+  res: Response,
+  next: NextFunction
+): Promise<void> => {
+  // ... verifies token, sets req.user, calls next() or returns 401
+};
+```
+
+### Existing Model Pattern (from models/HealthCheckModel.ts)
+```typescript
+// Source: backend/src/models/HealthCheckModel.ts
+static async findLatestByService(serviceName: string): Promise<ServiceHealthCheck | null> {
+  const supabase = getSupabaseServiceClient();
+  const { data, error } = await supabase
+    .from('service_health_checks')
+    .select('*')
+    .eq('service_name', serviceName)
+    .order('checked_at', { ascending: false })
+    .limit(1)
+    .single();
+  if (error?.code === 'PGRST116') return null;
+  // ...
+}
+```
+
+### Existing Analytics Record Pattern (from services/analyticsService.ts)
+```typescript
+// Source: backend/src/services/analyticsService.ts
+// Return type is void (NOT Promise<void>) — prevents accidental await on critical path
+export function recordProcessingEvent(data: ProcessingEventData): void {
+  const supabase = getSupabaseServiceClient();
+  void supabase
+    .from('document_processing_events')
+    .insert({ ... })
+    .then(({ error }) => {
+      if (error) logger.error('analyticsService: failed to insert processing event', { ... });
+    });
+}
+```
+
+### Route Registration Pattern (from index.ts)
+```typescript
+// Source: backend/src/index.ts
+app.use('/documents', documentRoutes);
+app.use('/vector', vectorRoutes);
+app.use('/monitoring', monitoringRoutes);
+app.use('/api/audit', auditRoutes);
+// New:
+app.use('/admin', adminRoutes);
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| Legacy auth middleware (auth.ts) | Firebase Auth (firebaseAuth.ts) | Pre-Phase 3 | `auth.ts` is fully deprecated and returns 501 — do not use it |
+| In-memory monitoring (uploadMonitoringService) | Supabase-persisted health checks and analytics | Phase 1-2 | Admin endpoints must read from Supabase, not in-memory state |
+| Direct `console.log` | Winston logger (`logger` from `utils/logger.ts`) | Pre-Phase 3 | Always use `logger.info/warn/error/debug` |
+
+**Deprecated/outdated:**
+- `backend/src/middleware/auth.ts`: All exports (`authenticateToken`, `requireAdmin`, `requireRole`) return 501. Do not import. Use `firebaseAuth.ts`.
+- `uploadMonitoringService`: In-memory service. Not suitable for admin health dashboard — data does not survive cold starts.
+
+---
+
+## Open Questions
+
+1. **Exact service name strings written by healthProbeService**
+   - What we know: The service names come from whatever `healthProbeService.ts` passes to `HealthCheckModel.create({ service_name: ... })`
+   - What's unclear: The exact strings — likely `'document_ai'`, `'llm'`, `'supabase'`, `'firebase_auth'` but must be verified before writing the health handler
+   - Recommendation: Read `healthProbeService.ts` during plan/implementation to confirm exact strings before writing `SERVICE_NAMES` constant in the admin route
+
+2. **`job.user_id` field type confirmation**
+   - What we know: `ProcessingEventData.user_id` is typed as `string`; `ProcessingJob` model has `user_id` field
+   - What's unclear: Whether `ProcessingJob.user_id` can ever be `undefined`/nullable in practice
+   - Recommendation: Check `ProcessingJobModel` type definition during implementation; add defensive `?? ''` if nullable
+
+3. **Alert pagination for GET /admin/alerts**
+   - What we know: `AlertEventModel.findActive()` returns all active alerts without limit; for a single-admin system this is unlikely to be an issue
+   - What's unclear: Whether a limit/offset param is needed
+   - Recommendation: Claude's discretion — default to returning all active alerts (no pagination) given single-admin use case; add `?limit=N` support as optional param using `.limit()` on the Supabase query
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- Codebase: `backend/src/middleware/firebaseAuth.ts` — verifyFirebaseToken implementation, FirebaseAuthenticatedRequest interface, 401 error responses
+- Codebase: `backend/src/models/HealthCheckModel.ts` — findLatestByService, findAll, deleteOlderThan patterns
+- Codebase: `backend/src/models/AlertEventModel.ts` — findActive, acknowledge, resolve, findRecentByService patterns
+- Codebase: `backend/src/services/analyticsService.ts` — recordProcessingEvent (void return), deleteProcessingEventsOlderThan (pool.query pattern)
+- Codebase: `backend/src/services/jobProcessorService.ts` — processJob lifecycle: startTime capture, markAsProcessing, markAsCompleted, markAsFailed, catch block structure
+- Codebase: `backend/src/routes/monitoring.ts` — route file pattern, envelope shape `{ success, data, correlationId }`
+- Codebase: `backend/src/index.ts` — route registration, Express app structure, existing `/health` endpoint shape
+- Codebase: `backend/src/models/migrations/012_create_monitoring_tables.sql` — exact column names for service_health_checks, alert_events
+- Codebase: `backend/src/models/migrations/013_create_processing_events_table.sql` — exact column names for document_processing_events
+
+### Secondary (MEDIUM confidence)
+- Codebase: `backend/src/services/alertService.ts` — pattern for reading `process.env['EMAIL_WEEKLY_RECIPIENT']` inside function (not at module level) to avoid Firebase Secrets timing issue
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all libraries already in use; no new dependencies
+- Architecture: HIGH — patterns derived from existing codebase, not assumptions
+- Pitfalls: HIGH — three of five pitfalls are directly observable from reading the existing code
+- Open questions: LOW confidence only on exact service name strings (requires reading one more file)
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-03-24 (stable codebase; valid until significant refactoring)
--- a/.planning/milestones/v1.0-phases/03-api-layer/03-VERIFICATION.md
+++ b/.planning/milestones/v1.0-phases/03-api-layer/03-VERIFICATION.md
@@ -0,0 +1,113 @@
+---
+phase: 03-api-layer
+verified: 2026-02-24T21:15:00Z
+status: passed
+score: 10/10 must-haves verified
+re_verification: false
+---
+
+# Phase 3: API Layer Verification Report
+
+**Phase Goal:** Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics; existing service processors emit analytics instrumentation
+**Verified:** 2026-02-24T21:15:00Z
+**Status:** passed
+**Re-verification:** No — initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | GET /admin/health returns current health status for all four services when called by admin | VERIFIED | `admin.ts:24-62` — Promise.all over SERVICE_NAMES=['document_ai','llm_api','supabase','firebase_auth'], maps null results to `status:'unknown'` |
+| 2 | GET /admin/analytics returns processing summary (uploads, success/failure, avg time) for a configurable time range | VERIFIED | `admin.ts:69-96` — validates `?range=` against `/^\d+[hd]$/`, calls `getAnalyticsSummary(range)`, returns `{ totalUploads, succeeded, failed, successRate, avgProcessingMs }` |
+| 3 | GET /admin/alerts returns active alert events | VERIFIED | `admin.ts:102-117` — calls `AlertEventModel.findActive()`, returns envelope |
+| 4 | POST /admin/alerts/:id/acknowledge marks an alert as acknowledged | VERIFIED | `admin.ts:123-150` — calls `AlertEventModel.acknowledge(id)`, returns 404 on not-found, 500 on other errors |
+| 5 | Non-admin authenticated users receive 404 on all admin endpoints | VERIFIED | `requireAdmin.ts:21-30` — returns `res.status(404).json({ error: 'Not found' })` if email does not match; router-level middleware applies to all routes |
+| 6 | Unauthenticated requests receive 401 on admin endpoints | VERIFIED | `firebaseAuth.ts:102-104` — returns 401 before `requireAdminEmail` runs; `verifyFirebaseToken` is second in the router middleware chain |
+| 7 | Document processing emits upload_started event after job is marked as processing | VERIFIED | `jobProcessorService.ts:137-142` — `recordProcessingEvent({ event_type: 'upload_started' })` called after `markAsProcessing`, before timeout setup |
+| 8 | Document processing emits completed event with duration_ms after job succeeds | VERIFIED | `jobProcessorService.ts:345-351` — `recordProcessingEvent({ event_type: 'completed', duration_ms: processingTime })` called after `markAsCompleted` |
+| 9 | Document processing emits failed event with duration_ms and error_message when job fails | VERIFIED | `jobProcessorService.ts:382-392` — `recordProcessingEvent({ event_type: 'failed', duration_ms, error_message })` in catch block with `if (job)` null-guard |
+| 10 | Analytics instrumentation does not change existing processing behavior or error handling | VERIFIED | All three calls are void fire-and-forget (no await, no try/catch wrapper); confirmed 0 occurrences of `await recordProcessingEvent`; existing code paths unchanged |
+
+**Score:** 10/10 truths verified
+
+---
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `backend/src/middleware/requireAdmin.ts` | Admin email check middleware returning 404 for non-admin; exports `requireAdminEmail` | VERIFIED | 33 lines; exports `requireAdminEmail`; reads env vars inside function body; fail-closed pattern; no stubs |
+| `backend/src/routes/admin.ts` | Admin router with 4 endpoints; exports default Router | VERIFIED | 153 lines; 4 route handlers (`GET /health`, `GET /analytics`, `GET /alerts`, `POST /alerts/:id/acknowledge`); default export; fully implemented |
+| `backend/src/services/analyticsService.ts` | `getAnalyticsSummary` and `AnalyticsSummary` export; uses `getPostgresPool()` | VERIFIED | Exports `AnalyticsSummary` interface (line 95) and `getAnalyticsSummary` function (line 115); uses `getPostgresPool()` (line 117); parameterized SQL with `$1::interval` cast |
+| `backend/src/services/jobProcessorService.ts` | `recordProcessingEvent` import + 3 instrumentation call sites | VERIFIED | Import at line 6; 4 occurrences total (1 import + 3 call sites at lines 138, 346, 385); 0 `await` uses |
+| `backend/src/index.ts` | `app.use('/admin', adminRoutes)` mount | VERIFIED | Import at line 15; mount at line 184 |
+
+---
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `backend/src/routes/admin.ts` | `backend/src/middleware/requireAdmin.ts` | `router.use(requireAdminEmail)` | WIRED | `admin.ts:3` imports; `admin.ts:18` applies as router middleware |
+| `backend/src/routes/admin.ts` | `backend/src/models/HealthCheckModel.ts` | `HealthCheckModel.findLatestByService()` | WIRED | `admin.ts:5` imports; `admin.ts:27` calls `findLatestByService(name)` in Promise.all |
+| `backend/src/routes/admin.ts` | `backend/src/services/analyticsService.ts` | `getAnalyticsSummary(range)` | WIRED | `admin.ts:7` imports; `admin.ts:82` calls with validated range |
+| `backend/src/index.ts` | `backend/src/routes/admin.ts` | `app.use('/admin', adminRoutes)` | WIRED | Import at line 15; mount at line 184 |
+| `backend/src/services/jobProcessorService.ts` | `backend/src/services/analyticsService.ts` | `recordProcessingEvent()` | WIRED | Import at line 6; 3 call sites at lines 138, 346, 385 — no await |
+
+---
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|----------|
+| INFR-02 | 03-01 | Admin API routes protected by Firebase Auth with admin email check | SATISFIED | `verifyFirebaseToken` + `requireAdminEmail` applied as router-level middleware in `admin.ts:16-18`; unauthenticated gets 401, non-admin gets 404 |
+| HLTH-01 | 03-01 | Admin can view live health status (healthy/degraded/down) for all four services | SATISFIED | `GET /admin/health` queries `HealthCheckModel.findLatestByService` for `['document_ai','llm_api','supabase','firebase_auth']`; null results map to `status:'unknown'` |
+| ANLY-02 | 03-01, 03-02 | Admin can view processing summary: upload counts, success/failure rates, avg processing time | SATISFIED | `GET /admin/analytics` returns `{ totalUploads, succeeded, failed, successRate, avgProcessingMs }` via aggregate SQL; `jobProcessorService.ts` emits real events to populate the table |
+
+All three requirement IDs declared across plans are accounted for.
+
+**Cross-reference against REQUIREMENTS.md traceability table:** INFR-02, HLTH-01, and ANLY-02 are all mapped to Phase 3 in `REQUIREMENTS.md:89-91`. No orphaned requirements — all Phase 3 requirements are claimed and verified.
+
+---
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| `backend/src/services/jobProcessorService.ts` | 448 | `TODO: Implement statistics method in ProcessingJobModel` inside `getStatistics()` | Info | Pre-existing stub in `getStatistics()` — method is not called anywhere in the codebase and is not part of Phase 03 plan artifacts. No impact on phase goal. |
+
+No blocker or warning-level anti-patterns found in Phase 03 modified files. The one TODO is in a pre-existing orphaned method unrelated to this phase.
+
+---
+
+### Human Verification Required
+
+None. All must-haves are verifiable programmatically for this phase.
+
+The following items would require human verification only when consuming the API from Phase 4 (frontend):
+- Visual rendering of health status badges
+- Alert acknowledgement flow in the admin dashboard UI
+- Analytics chart display
+
+These are Phase 4 concerns, not Phase 3.
+
+---
+
+### Summary
+
+Phase 3 goal is fully achieved. All ten observable truths are verified at all three levels (exists, substantive, wired).
+
+**Plan 03-01 (Admin API endpoints):** All four endpoints are implemented with real logic, properly authenticated behind `verifyFirebaseToken + requireAdminEmail`, mounted at `/admin` in `index.ts`, and using the `{ success, data, correlationId }` response envelope consistently. The `requireAdminEmail` middleware correctly returns 404 (not 403) per the locked design decision.
+
+**Plan 03-02 (Analytics instrumentation):** Three `recordProcessingEvent()` call sites are present at the correct lifecycle points in `processJob()`. All calls are void fire-and-forget with no `await`, preserving the non-blocking contract. The null-guard on `job` in the catch block prevents runtime errors when `findById` throws before assignment.
+
+The two plans together deliver the complete analytics pipeline: events are now written to `document_processing_events` by the processor, and `GET /admin/analytics` reads them via aggregate SQL.
+
+Commits 301d0bf, 4169a37, and dabd4a5 verified present in git history with correct content.
+
+---
+
+_Verified: 2026-02-24T21:15:00Z_
+_Verifier: Claude (gsd-verifier)_
--- a/.planning/milestones/v1.0-phases/04-frontend/04-01-PLAN.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-01-PLAN.md
@@ -0,0 +1,239 @@
+---
+phase: 04-frontend
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - frontend/src/services/adminService.ts
+  - frontend/src/components/AlertBanner.tsx
+  - frontend/src/components/AdminMonitoringDashboard.tsx
+autonomous: true
+requirements:
+  - ALRT-03
+  - ANLY-02
+  - HLTH-01
+
+must_haves:
+  truths:
+    - "adminService exposes typed methods for getHealth(), getAnalytics(range), getAlerts(), and acknowledgeAlert(id)"
+    - "AlertBanner component renders critical active alerts with acknowledge button"
+    - "AdminMonitoringDashboard component shows health status grid and analytics summary with range selector"
+  artifacts:
+    - path: "frontend/src/services/adminService.ts"
+      provides: "Monitoring API client methods with typed interfaces"
+      contains: "getHealth"
+    - path: "frontend/src/components/AlertBanner.tsx"
+      provides: "Global alert banner with acknowledge callback"
+      exports: ["AlertBanner"]
+    - path: "frontend/src/components/AdminMonitoringDashboard.tsx"
+      provides: "Health panel + analytics summary panel"
+      exports: ["AdminMonitoringDashboard"]
+  key_links:
+    - from: "frontend/src/components/AlertBanner.tsx"
+      to: "adminService.ts"
+      via: "AlertEvent type import"
+      pattern: "import.*AlertEvent.*adminService"
+    - from: "frontend/src/components/AdminMonitoringDashboard.tsx"
+      to: "adminService.ts"
+      via: "getHealth and getAnalytics calls"
+      pattern: "adminService\\.(getHealth|getAnalytics)"
+---
+
+<objective>
+Create the three building blocks for Phase 4 frontend: extend adminService with typed monitoring API methods, build the AlertBanner component, and build the AdminMonitoringDashboard component.
+
+Purpose: These components and service methods are the foundation that Plan 02 wires into the Dashboard. Separating creation from wiring keeps each plan focused.
+Output: Three files ready to be imported and mounted in App.tsx.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/04-frontend/04-RESEARCH.md
+@frontend/src/services/adminService.ts
+@frontend/src/components/Analytics.tsx
+@frontend/src/components/UploadMonitoringDashboard.tsx
+@frontend/src/utils/cn.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Extend adminService with monitoring API methods and types</name>
+  <files>frontend/src/services/adminService.ts</files>
+  <action>
+Add three exported interfaces and four new methods to the existing AdminService class in `frontend/src/services/adminService.ts`.
+
+**Interfaces to add** (above the AdminService class):
+
+```typescript
+export interface AlertEvent {
+  id: string;
+  service_name: string;
+  alert_type: 'service_down' | 'service_degraded' | 'recovery';
+  status: 'active' | 'acknowledged' | 'resolved';
+  message: string | null;
+  details: Record<string, unknown> | null;
+  created_at: string;
+  acknowledged_at: string | null;
+  resolved_at: string | null;
+}
+
+export interface ServiceHealthEntry {
+  service: string;
+  status: 'healthy' | 'degraded' | 'down' | 'unknown';
+  checkedAt: string | null;
+  latencyMs: number | null;
+  errorMessage: string | null;
+}
+
+export interface AnalyticsSummary {
+  range: string;
+  totalUploads: number;
+  succeeded: number;
+  failed: number;
+  successRate: number;
+  avgProcessingMs: number | null;
+  generatedAt: string;
+}
+```
+
+**IMPORTANT type casing note** (from RESEARCH Pitfall 4):
+- `ServiceHealthEntry` uses camelCase (backend admin.ts remaps to camelCase)
+- `AlertEvent` uses snake_case (backend returns raw model data)
+- `AnalyticsSummary` uses camelCase (from backend analyticsService.ts)
+
+**Methods to add** inside the AdminService class:
+
+```typescript
+async getHealth(): Promise<ServiceHealthEntry[]> {
+  const response = await apiClient.get('/admin/health');
+  return response.data.data;
+}
+
+async getAnalytics(range: string = '24h'): Promise<AnalyticsSummary> {
+  const response = await apiClient.get(`/admin/analytics?range=${range}`);
+  return response.data.data;
+}
+
+async getAlerts(): Promise<AlertEvent[]> {
+  const response = await apiClient.get('/admin/alerts');
+  return response.data.data;
+}
+
+async acknowledgeAlert(id: string): Promise<AlertEvent> {
+  const response = await apiClient.post(`/admin/alerts/${id}/acknowledge`);
+  return response.data.data;
+}
+```
+
+Keep all existing methods and interfaces. Do not modify the `apiClient` interceptor or the `ADMIN_EMAIL` check.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/frontend && npx tsc --noEmit --strict src/services/adminService.ts 2>&1 | head -20</automated>
+    <manual>Verify the file exports AlertEvent, ServiceHealthEntry, AnalyticsSummary interfaces and the four new methods</manual>
+  </verify>
+  <done>adminService.ts exports 3 new typed interfaces and 4 new methods (getHealth, getAnalytics, getAlerts, acknowledgeAlert) alongside all existing functionality</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Create AlertBanner and AdminMonitoringDashboard components</name>
+  <files>frontend/src/components/AlertBanner.tsx, frontend/src/components/AdminMonitoringDashboard.tsx</files>
+  <action>
+**AlertBanner.tsx** — Create a new component at `frontend/src/components/AlertBanner.tsx`:
+
+Props interface:
+```typescript
+interface AlertBannerProps {
+  alerts: AlertEvent[];
+  onAcknowledge: (id: string) => Promise<void>;
+}
+```
+
+Behavior:
+- Filter alerts to show only `status === 'active'` AND `alert_type` is `service_down` or `service_degraded` (per RESEARCH Pitfall — `recovery` is informational, not critical)
+- If no critical alerts after filtering, return `null`
+- Render a red banner (`bg-red-600 px-4 py-3`) with each alert showing:
+  - `AlertTriangle` icon from lucide-react (h-5 w-5, flex-shrink-0)
+  - Text: `{alert.service_name}: {alert.message ?? alert.alert_type}` (text-sm font-medium text-white)
+  - "Acknowledge" button with `X` icon from lucide-react (text-sm underline hover:no-underline)
+- `onAcknowledge` called with `alert.id` on button click
+- Import `AlertEvent` from `../services/adminService`
+- Import `cn` from `../utils/cn`
+- Use `AlertTriangle` and `X` from `lucide-react`
+
+**AdminMonitoringDashboard.tsx** — Create at `frontend/src/components/AdminMonitoringDashboard.tsx`:
+
+This component contains two sections: Service Health Panel and Processing Analytics Panel.
+
+State:
+```typescript
+const [health, setHealth] = useState<ServiceHealthEntry[]>([]);
+const [analytics, setAnalytics] = useState<AnalyticsSummary | null>(null);
+const [range, setRange] = useState('24h');
+const [loading, setLoading] = useState(true);
+const [error, setError] = useState<string | null>(null);
+```
+
+Data fetching: Use `useCallback` + `useEffect` pattern matching existing `Analytics.tsx`:
+- `loadData()` calls `Promise.all([adminService.getHealth(), adminService.getAnalytics(range)])`
+- Sets loading/error state appropriately
+- Re-fetches when `range` changes
+
+**Service Health Panel:**
+- 2x2 grid on desktop (`grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4`)
+- Each card: white bg, rounded-lg, shadow-soft, border border-gray-100, p-4
+- Status dot: `w-3 h-3 rounded-full` with color mapping:
+  - `healthy` → `bg-green-500`
+  - `degraded` → `bg-yellow-500`
+  - `down` → `bg-red-500`
+  - `unknown` → `bg-gray-400`
+- Service display name mapping: `document_ai` → "Document AI", `llm_api` → "LLM API", `supabase` → "Supabase", `firebase_auth` → "Firebase Auth"
+- Show `checkedAt` as `new Date(checkedAt).toLocaleString()` if available, otherwise "Never checked"
+- Show `latencyMs` with "ms" suffix if available
+- Use `Activity`, `Clock` icons from lucide-react
+
+**Processing Analytics Panel:**
+- Range selector: `<select>` with options `24h`, `7d`, `30d` — onChange updates `range` state
+- Stat cards in 1x5 grid: Total Uploads, Succeeded, Failed, Success Rate (formatted as `(successRate * 100).toFixed(1)%`), Avg Processing Time (format `avgProcessingMs` as seconds: `(avgProcessingMs / 1000).toFixed(1)s`, or "N/A" if null)
+- Include a "Refresh" button that calls `loadData()` — matches existing Analytics.tsx refresh pattern
+- Use `RefreshCw` icon from lucide-react for refresh button
+
+Loading state: Show `animate-spin rounded-full h-8 w-8 border-b-2 border-accent-500` centered (matching existing App.tsx pattern).
+Error state: Show error message with retry button.
+
+Import `ServiceHealthEntry`, `AnalyticsSummary` from `../services/adminService`.
+Import `adminService` from `../services/adminService`.
+Import `cn` from `../utils/cn`.
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/frontend && npx tsc --noEmit 2>&1 | head -30</automated>
+    <manual>Check that AlertBanner.tsx and AdminMonitoringDashboard.tsx exist and export their components</manual>
+  </verify>
+  <done>AlertBanner renders critical alerts with acknowledge buttons; AdminMonitoringDashboard renders health status grid with colored dots and analytics summary with range selector and refresh</done>
+</task>
+
+</tasks>
+
+<verification>
+- `npx tsc --noEmit` passes with no type errors in the three modified/created files
+- AlertBanner exports a React component accepting `alerts` and `onAcknowledge` props
+- AdminMonitoringDashboard exports a React component with no required props
+- adminService exports AlertEvent, ServiceHealthEntry, AnalyticsSummary interfaces
+- adminService.getHealth(), getAnalytics(), getAlerts(), acknowledgeAlert() methods exist with correct return types
+</verification>
+
+<success_criteria>
+All three files compile without TypeScript errors. Components follow existing project patterns (Tailwind, lucide-react, cn utility). Types match backend API response shapes exactly (camelCase for health/analytics, snake_case for alerts).
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/04-frontend/04-01-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/04-frontend/04-01-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-01-SUMMARY.md
@@ -0,0 +1,111 @@
+---
+phase: 04-frontend
+plan: 01
+subsystem: ui
+tags: [react, typescript, tailwind, lucide-react, axios]
+
+# Dependency graph
+requires:
+  - phase: 03-api-layer
+    provides: "GET /admin/health, GET /admin/analytics, GET /admin/alerts, POST /admin/alerts/:id/acknowledge endpoints"
+provides:
+  - "AdminService typed methods: getHealth(), getAnalytics(range), getAlerts(), acknowledgeAlert(id)"
+  - "AlertEvent, ServiceHealthEntry, AnalyticsSummary TypeScript interfaces"
+  - "AlertBanner component: critical active alert display with per-alert acknowledge button"
+  - "AdminMonitoringDashboard component: service health grid + analytics summary with range selector"
+affects:
+  - 04-02 (wires AlertBanner and AdminMonitoringDashboard into App.tsx Dashboard)
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "useCallback + useEffect for data fetching with re-fetch on state dependency changes"
+    - "Promise.all for concurrent independent API calls"
+    - "Optimistic UI: AlertBanner onAcknowledge pattern is defined at the parent level to filter local state immediately"
+    - "Status dot pattern: w-3 h-3 rounded-full with Tailwind bg-color for health indicators"
+
+key-files:
+  created:
+    - frontend/src/components/AlertBanner.tsx
+    - frontend/src/components/AdminMonitoringDashboard.tsx
+  modified:
+    - frontend/src/services/adminService.ts
+
+key-decisions:
+  - "AlertBanner filters to active service_down/service_degraded only — recovery type is informational, not critical (per RESEARCH Pitfall)"
+  - "AlertEvent uses snake_case fields (backend returns raw model data), ServiceHealthEntry/AnalyticsSummary use camelCase (backend admin.ts remaps)"
+  - "AdminMonitoringDashboard has no required props — self-contained component that fetches its own data"
+
+patterns-established:
+  - "Monitoring dashboard pattern: health grid + analytics stat cards in same component"
+  - "Alert banner pattern: top-level conditional render, filters by status=active AND critical alert_type"
+
+requirements-completed:
+  - ALRT-03
+  - ANLY-02
+  - HLTH-01
+
+# Metrics
+duration: 2min
+completed: 2026-02-24
+---
+
+# Phase 04 Plan 01: Monitoring Service Layer and Components Summary
+
+**AdminService extended with typed monitoring API methods plus AlertBanner and AdminMonitoringDashboard React components ready for mounting in App.tsx**
+
+## Performance
+
+- **Duration:** 2 min
+- **Started:** 2026-02-24T21:33:40Z
+- **Completed:** 2026-02-24T21:35:33Z
+- **Tasks:** 2
+- **Files modified:** 3
+
+## Accomplishments
+
+- Extended `adminService.ts` with three new exported TypeScript interfaces (AlertEvent, ServiceHealthEntry, AnalyticsSummary) and four new typed API methods (getHealth, getAnalytics, getAlerts, acknowledgeAlert)
+- Created `AlertBanner` component that filters alerts to active critical types only and renders a red banner with per-alert acknowledge buttons
+- Created `AdminMonitoringDashboard` component with a 1x4 service health card grid (colored status dots) and a 1x5 analytics stat card panel with range selector and refresh button
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Extend adminService with monitoring API methods and types** - `f84a822` (feat)
+2. **Task 2: Create AlertBanner and AdminMonitoringDashboard components** - `b457b9e` (feat)
+
+## Files Created/Modified
+
+- `frontend/src/services/adminService.ts` - Added AlertEvent, ServiceHealthEntry, AnalyticsSummary interfaces and getHealth(), getAnalytics(), getAlerts(), acknowledgeAlert() methods
+- `frontend/src/components/AlertBanner.tsx` - New component rendering red banner for active service_down/service_degraded alerts with X Acknowledge buttons
+- `frontend/src/components/AdminMonitoringDashboard.tsx` - New component with service health grid and processing analytics panel with range selector
+
+## Decisions Made
+
+- AlertBanner renders only `status === 'active'` alerts of type `service_down` or `service_degraded`; `recovery` alerts are filtered out as informational
+- Type casing follows backend API response shapes exactly: AlertEvent is snake_case (raw model), ServiceHealthEntry/AnalyticsSummary are camelCase (backend admin.ts remaps them)
+- AdminMonitoringDashboard is self-contained with no required props, following existing Analytics.tsx pattern
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- All three files are ready to import in Plan 02 (App.tsx wiring)
+- AlertBanner expects `alerts: AlertEvent[]` and `onAcknowledge: (id: string) => Promise<void>` — parent must manage alert state and provide optimistic acknowledge handler
+- AdminMonitoringDashboard has no required props — drop-in for the monitoring tab
+
+---
+*Phase: 04-frontend*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/04-frontend/04-02-PLAN.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-02-PLAN.md
@@ -0,0 +1,187 @@
+---
+phase: 04-frontend
+plan: 02
+type: execute
+wave: 2
+depends_on:
+  - 04-01
+files_modified:
+  - frontend/src/App.tsx
+autonomous: false
+requirements:
+  - ALRT-03
+  - ANLY-02
+  - HLTH-01
+
+must_haves:
+  truths:
+    - "Alert banner appears above tab navigation when there are active critical alerts"
+    - "Alert banner disappears immediately after admin clicks Acknowledge (optimistic update)"
+    - "Monitoring tab shows health status indicators and processing analytics from Supabase backend"
+    - "Non-admin user on monitoring tab sees Access Denied, not the dashboard"
+    - "Alert fetching only happens for admin users (gated by isAdmin check)"
+  artifacts:
+    - path: "frontend/src/App.tsx"
+      provides: "Dashboard with AlertBanner wired above nav and AdminMonitoringDashboard in monitoring tab"
+      contains: "AlertBanner"
+  key_links:
+    - from: "frontend/src/App.tsx"
+      to: "frontend/src/components/AlertBanner.tsx"
+      via: "import and render above nav"
+      pattern: "import.*AlertBanner"
+    - from: "frontend/src/App.tsx"
+      to: "frontend/src/components/AdminMonitoringDashboard.tsx"
+      via: "import and render in monitoring tab"
+      pattern: "import.*AdminMonitoringDashboard"
+    - from: "frontend/src/App.tsx"
+      to: "frontend/src/services/adminService.ts"
+      via: "getAlerts call in Dashboard useEffect"
+      pattern: "adminService\\.getAlerts"
+---
+
+<objective>
+Wire AlertBanner and AdminMonitoringDashboard into the Dashboard component in App.tsx. Add alert state management with optimistic acknowledge. Replace the monitoring tab content from UploadMonitoringDashboard to AdminMonitoringDashboard.
+
+Purpose: This completes the frontend delivery of ALRT-03 (in-app alert banner), ANLY-02 (processing metrics UI), and HLTH-01 (health status UI).
+Output: Fully wired admin monitoring UI visible in the application.
+</objective>
+
+<execution_context>
+@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
+@/home/jonathan/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/04-frontend/04-RESEARCH.md
+@.planning/phases/04-frontend/04-01-SUMMARY.md
+@frontend/src/App.tsx
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Wire AlertBanner and AdminMonitoringDashboard into Dashboard</name>
+  <files>frontend/src/App.tsx</files>
+  <action>
+Modify the Dashboard component in `frontend/src/App.tsx` with these changes:
+
+**1. Add imports** (at the top of the file):
+```typescript
+import AlertBanner from './components/AlertBanner';
+import AdminMonitoringDashboard from './components/AdminMonitoringDashboard';
+```
+Import `AlertEvent` type from `./services/adminService`.
+
+The `UploadMonitoringDashboard` import can be removed since the monitoring tab will now use `AdminMonitoringDashboard`.
+
+**2. Add alert state** inside the Dashboard component, after the existing state declarations:
+```typescript
+const [activeAlerts, setActiveAlerts] = useState<AlertEvent[]>([]);
+```
+
+**3. Add alert fetching useEffect** — MUST be gated by `isAdmin` (RESEARCH Pitfall 5):
+```typescript
+useEffect(() => {
+  if (isAdmin) {
+    adminService.getAlerts().then(setActiveAlerts).catch(() => {});
+  }
+}, [isAdmin]);
+```
+
+**4. Add handleAcknowledge callback** — uses optimistic update (RESEARCH Pitfall 2):
+```typescript
+const handleAcknowledge = async (id: string) => {
+  setActiveAlerts(prev => prev.filter(a => a.id !== id));
+  try {
+    await adminService.acknowledgeAlert(id);
+  } catch {
+    // On failure, re-fetch to restore correct state
+    adminService.getAlerts().then(setActiveAlerts).catch(() => {});
+  }
+};
+```
+
+**5. Render AlertBanner ABOVE the `<nav>` element** (RESEARCH Pitfall 1 — must be above nav, not inside a tab):
+Inside the Dashboard return JSX, immediately after `<div className="min-h-screen bg-gray-50">` and before the `{/* Navigation */}` comment and `<nav>` element, add:
+```jsx
+{isAdmin && activeAlerts.length > 0 && (
+  <AlertBanner alerts={activeAlerts} onAcknowledge={handleAcknowledge} />
+)}
+```
+
+**6. Replace monitoring tab content:**
+Change:
+```jsx
+{activeTab === 'monitoring' && isAdmin && (
+  <UploadMonitoringDashboard />
+)}
+```
+To:
+```jsx
+{activeTab === 'monitoring' && isAdmin && (
+  <AdminMonitoringDashboard />
+)}
+```
+
+**7. Keep the existing non-admin access-denied fallback** for `{activeTab === 'monitoring' && !isAdmin && (...)}` — do not change it.
+
+**Do NOT change:**
+- The `analytics` tab content (still renders existing `<Analytics />`)
+- Any other tab content
+- The tab navigation buttons
+- Any other Dashboard state or logic
+  </action>
+  <verify>
+    <automated>cd /home/jonathan/Coding/cim_summary/frontend && npx tsc --noEmit 2>&1 | head -30</automated>
+    <manual>Run `npm run dev` and verify: (1) AlertBanner shows above nav if alerts exist, (2) Monitoring tab shows health grid and analytics panel, (3) Non-admin sees Access Denied on monitoring tab</manual>
+  </verify>
+  <done>AlertBanner renders above nav for admin users with active alerts; monitoring tab shows AdminMonitoringDashboard with health status and analytics; non-admin access denied preserved</done>
+</task>
+
+<task type="checkpoint:human-verify" gate="blocking">
+  <name>Task 2: Visual verification of monitoring UI</name>
+  <what-built>
+    Complete admin monitoring frontend:
+    1. AlertBanner above navigation (shows when critical alerts exist)
+    2. AdminMonitoringDashboard in Monitoring tab (health status grid + analytics summary)
+    3. Alert acknowledge with optimistic update
+  </what-built>
+  <how-to-verify>
+    1. Start frontend: `cd frontend && npm run dev`
+    2. Start backend: `cd backend && npm run dev`
+    3. Log in as admin (jpressnell@bluepointcapital.com)
+    4. Click the "Monitoring" tab — verify:
+       - Health status cards show for all 4 services (Document AI, LLM API, Supabase, Firebase Auth)
+       - Each card has colored dot (green/yellow/red/gray) and last-checked timestamp
+       - Analytics section shows upload counts, success/failure rates, avg processing time
+       - Range selector (24h/7d/30d) changes the analytics data
+       - Refresh button reloads data
+    5. If there are active alerts: verify red banner appears ABOVE the tab navigation on ALL tabs (not just Monitoring)
+    6. If no alerts exist: verify no banner is shown (this is correct behavior)
+    7. (Optional) If you can trigger an alert (e.g., by having a service probe fail), verify the banner appears and "Acknowledge" button removes it immediately
+  </how-to-verify>
+  <resume-signal>Type "approved" to complete Phase 4, or describe any issues to fix</resume-signal>
+</task>
+
+</tasks>
+
+<verification>
+- `npx tsc --noEmit` passes with no errors
+- AlertBanner renders above `<nav>` in Dashboard (not inside a tab)
+- Alert state only fetched when `isAdmin` is true
+- Optimistic update: banner disappears immediately on acknowledge click
+- Monitoring tab renders AdminMonitoringDashboard (not UploadMonitoringDashboard)
+- Non-admin monitoring access denied fallback still works
+- `npm run build` completes successfully
+</verification>
+
+<success_criteria>
+Admin user sees health indicators and processing metrics on the Monitoring tab. Alert banner appears above navigation when active critical alerts exist. Acknowledge removes the alert banner immediately. Non-admin users see Access Denied on admin tabs.
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/04-frontend/04-02-SUMMARY.md`
+</output>
--- a/.planning/milestones/v1.0-phases/04-frontend/04-02-SUMMARY.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-02-SUMMARY.md
@@ -0,0 +1,114 @@
+---
+phase: 04-frontend
+plan: 02
+subsystem: ui
+tags: [react, typescript, app-tsx, alert-banner, admin-monitoring]
+
+# Dependency graph
+requires:
+  - phase: 04-frontend
+    plan: 01
+    provides: "AlertBanner component, AdminMonitoringDashboard component, AlertEvent type, adminService.getAlerts/acknowledgeAlert"
+provides:
+  - "Dashboard with AlertBanner above nav wired to adminService.getAlerts"
+  - "Monitoring tab replaced with AdminMonitoringDashboard"
+  - "Optimistic alert acknowledge with re-fetch fallback"
+affects: []
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Optimistic UI: filter local state immediately on acknowledge, re-fetch on API failure"
+    - "Admin-gated data fetching: isAdmin dependency in useEffect prevents unnecessary API calls"
+    - "AlertBanner above nav: conditional render before <nav> so banner shows on all tabs"
+
+key-files:
+  created: []
+  modified:
+    - frontend/src/App.tsx
+
+key-decisions:
+  - "AlertBanner placed before <nav> element so it shows across all tabs, not scoped to monitoring tab"
+  - "handleAcknowledge uses optimistic update (filter state immediately) with re-fetch on failure"
+  - "Alert fetch gated by isAdmin — non-admin users never trigger getAlerts API call"
+  - "UploadMonitoringDashboard import removed entirely — replaced by AdminMonitoringDashboard"
+
+requirements-completed:
+  - ALRT-03
+  - ANLY-02
+  - HLTH-01
+
+# Metrics
+duration: 2min
+completed: 2026-02-24
+---
+
+# Phase 04 Plan 02: Wire AlertBanner and AdminMonitoringDashboard into App.tsx Summary
+
+**AlertBanner wired above navigation in Dashboard with optimistic acknowledge, AdminMonitoringDashboard replacing UploadMonitoringDashboard in monitoring tab**
+
+## Performance
+
+- **Duration:** ~2 min
+- **Started:** 2026-02-24T21:37:58Z
+- **Completed:** 2026-02-24T21:39:36Z
+- **Tasks:** 1 auto + 1 checkpoint (pending visual verification)
+- **Files modified:** 1
+
+## Accomplishments
+
+- Added `AlertBanner` and `AdminMonitoringDashboard` imports to `frontend/src/App.tsx`
+- Added `AlertEvent` type import from `adminService`
+- Added `activeAlerts` state (AlertEvent[]) inside Dashboard component
+- Added `useEffect` gated by `isAdmin` to fetch alerts on mount
+- Added `handleAcknowledge` callback with optimistic update (immediate filter) and re-fetch on failure
+- Rendered `AlertBanner` above `<nav>` so it appears on all tabs when admin has active alerts
+- Replaced `UploadMonitoringDashboard` with `AdminMonitoringDashboard` in monitoring tab
+- Removed unused `UploadMonitoringDashboard` import
+- `npx tsc --noEmit` passes with zero errors
+- `npm run build` succeeds
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Wire AlertBanner and AdminMonitoringDashboard into Dashboard** - `6c345a6` (feat)
+
+## Files Created/Modified
+
+- `frontend/src/App.tsx` - Added AlertBanner above nav, added alert state and optimistic acknowledge, replaced UploadMonitoringDashboard with AdminMonitoringDashboard in monitoring tab
+
+## Decisions Made
+
+- AlertBanner is placed before the `<nav>` element (not inside a tab) so it appears globally on every tab when the admin has active critical alerts
+- Optimistic update pattern: `setActiveAlerts(prev => prev.filter(a => a.id !== id))` fires before the API call, restoring state on failure via re-fetch
+- Alert fetch is fully gated on `isAdmin` in the `useEffect` dependency array — non-admin users never call `adminService.getAlerts()`
+- `UploadMonitoringDashboard` import was removed entirely since AdminMonitoringDashboard replaces it
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None.
+
+## User Setup Required
+
+Task 2 is a `checkpoint:human-verify` — admin must visually verify the monitoring tab and alert banner in the running application.
+
+## Next Phase Readiness
+
+- Phase 4 is complete pending human visual verification (Task 2 checkpoint)
+- All requirements ALRT-03, ANLY-02, HLTH-01 are now fully implemented frontend-to-backend
+
+## Self-Check: PASSED
+
+- `04-02-SUMMARY.md` exists at `.planning/phases/04-frontend/04-02-SUMMARY.md`
+- `frontend/src/App.tsx` exists and modified
+- Commit `6c345a6` exists in git log
+
+---
+*Phase: 04-frontend*
+*Completed: 2026-02-24*
--- a/.planning/milestones/v1.0-phases/04-frontend/04-RESEARCH.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-RESEARCH.md
@@ -0,0 +1,511 @@
+# Phase 4: Frontend - Research
+
+**Researched:** 2026-02-24
+**Domain:** React + TypeScript frontend integration with admin monitoring APIs
+**Confidence:** HIGH
+
+## Summary
+
+Phase 4 wires the existing React/TypeScript/Tailwind frontend to the three admin API endpoints delivered in Phase 3: `GET /admin/health`, `GET /admin/analytics`, and `GET /admin/alerts` + `POST /admin/alerts/:id/acknowledge`. The frontend already has a complete admin-detection pattern, tab-based navigation, an axios-backed `adminService.ts`, and a `ProtectedRoute` component. The work is pure frontend integration — no new infrastructure, no new libraries, no new backend routes.
+
+The stack is locked: React 18 + TypeScript + Tailwind CSS + lucide-react icons + react-router-dom v6 + axios (via `adminService.ts`). The project uses `clsx` and `tailwind-merge` (via `cn()`) for conditional class composition. No charting library is installed. The existing `Analytics.tsx` component shows the styling and layout patterns to follow. The existing `UploadMonitoringDashboard.tsx` shows the health indicator pattern with colored circles.
+
+The primary implementation risk is the alert acknowledgement UX: after calling `POST /admin/alerts/:id/acknowledge`, the local state must update immediately (optimistic update or re-fetch) so the banner disappears without waiting for a full page refresh. The alert banner must render above the tab navigation because it is a global signal, not scoped to a specific tab.
+
+**Primary recommendation:** Add new components to the existing `monitoring` tab in App.tsx, extend `adminService.ts` with the three monitoring API methods, add an `AdminMonitoringDashboard` component, add an `AlertBanner` component that renders above the nav inside `Dashboard`, and add an `AdminRoute` wrapper that shows access-denied for non-admins who somehow hit the monitoring tab directly.
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| ALRT-03 | Admin sees in-app alert banner for active critical issues; banner disappears after acknowledgement | alertBanner component using GET /admin/alerts + POST /admin/alerts/:id/acknowledge; optimistic state update on acknowledge |
+| ANLY-02 (UI) | Admin dashboard shows processing summary: upload counts, success/failure rates, avg processing time | AdminMonitoringDashboard section consuming GET /admin/analytics response shape (AnalyticsSummary interface) |
+| HLTH-01 (UI) | Admin dashboard shows health status indicators (green/yellow/red) for all four services with last-checked timestamp | ServiceHealthPanel component consuming GET /admin/health; status → color mapping green=healthy, yellow=degraded, red=down/unknown |
+
+*Note: ANLY-02 and HLTH-01 were marked Complete in Phase 3 (backend side). Phase 4 completes their UI delivery.*
+</phase_requirements>
+
+---
+
+## Standard Stack
+
+### Core (already installed — no new packages needed)
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| react | ^18.2.0 | Component rendering | Project standard |
+| typescript | ^5.2.2 | Type safety | Project standard |
+| tailwindcss | ^3.3.5 | Utility CSS | Project standard |
+| lucide-react | ^0.294.0 | Icons (AlertTriangle, CheckCircle, Activity, Clock, etc.) | Already used throughout |
+| axios | ^1.6.2 | HTTP client (via adminService) | Already used in adminService.ts |
+| clsx + tailwind-merge | ^2.0.0 / ^2.0.0 | Conditional class composition via `cn()` | Already used in Analytics.tsx |
+| react-router-dom | ^6.20.1 | Routing, Navigate | Already used |
+
+### No New Libraries Required
+
+All required UI capabilities exist in the current stack:
+- Status indicators: plain `div` with Tailwind bg-green-500 / bg-yellow-500 / bg-red-500
+- Alert banner: fixed/sticky div above nav, standard Tailwind layout
+- Timestamps: `new Date(ts).toLocaleString()` or `toRelative()` — no date library needed
+- Loading state: existing spinner pattern (`animate-spin rounded-full border-b-2`)
+
+**Installation:** None required.
+
+---
+
+## Architecture Patterns
+
+### Recommended File Changes
+
+```
+frontend/src/
+├── components/
+│   ├── AlertBanner.tsx          # NEW — global alert banner above nav
+│   ├── AdminMonitoringDashboard.tsx  # NEW — health + analytics panel
+│   └── (existing files unchanged)
+├── services/
+│   └── adminService.ts          # EXTEND — add getHealth(), getAnalytics(), getAlerts(), acknowledgeAlert()
+└── App.tsx                      # MODIFY — render AlertBanner, wire monitoring tab to new component
+```
+
+### Pattern 1: Alert Banner — global, above nav, conditional render
+
+**What:** A dismissible/acknowledgeable banner rendered inside `Dashboard` ABOVE the `<nav>` element, only when `alerts.length > 0` and there is at least one `status === 'active'` alert with a critical `alert_type` (`service_down` or `service_degraded`).
+
+**When to use:** This pattern matches the existing App.tsx structure — `Dashboard` is the single top-level component after login, so mounting `AlertBanner` inside it ensures it is always visible when the admin is on any tab.
+
+**Data flow:**
+1. `Dashboard` fetches active alerts on mount via `adminService.getAlerts()`
+2. Stores in `activeAlerts` state
+3. Passes `alerts` and `onAcknowledge` callback to `AlertBanner`
+4. `onAcknowledge(id)` calls `adminService.acknowledgeAlert(id)` then updates local state by filtering out the acknowledged alert (optimistic update)
+
+**Example:**
+```typescript
+// In Dashboard component
+const [activeAlerts, setActiveAlerts] = useState<AlertEvent[]>([]);
+
+useEffect(() => {
+  adminService.getAlerts().then(setActiveAlerts).catch(() => {});
+}, []);
+
+const handleAcknowledge = async (id: string) => {
+  await adminService.acknowledgeAlert(id);
+  setActiveAlerts(prev => prev.filter(a => a.id !== id));
+};
+
+// Rendered before <nav>:
+{isAdmin && activeAlerts.length > 0 && (
+  <AlertBanner alerts={activeAlerts} onAcknowledge={handleAcknowledge} />
+)}
+```
+
+**AlertBanner props:**
+```typescript
+interface AlertBannerProps {
+  alerts: AlertEvent[];
+  onAcknowledge: (id: string) => Promise<void>;
+}
+```
+
+### Pattern 2: Service Health Panel — status dot + service name + timestamp
+
+**What:** A 2x2 or 1x4 grid of service health cards. Each card shows: colored dot (green=healthy, yellow=degraded, red=down, gray=unknown), service display name, last-checked timestamp, and latency_ms if available.
+
+**Status → color mapping** (matches REQUIREMENTS.md "green/yellow/red"):
+```typescript
+const statusColor = {
+  healthy: 'bg-green-500',
+  degraded: 'bg-yellow-500',
+  down: 'bg-red-500',
+  unknown: 'bg-gray-400',
+} as const;
+```
+
+**Service name display mapping** (backend sends snake_case):
+```typescript
+const serviceDisplayName: Record<string, string> = {
+  document_ai: 'Document AI',
+  llm_api: 'LLM API',
+  supabase: 'Supabase',
+  firebase_auth: 'Firebase Auth',
+};
+```
+
+**Example card:**
+```typescript
+// Source: admin.ts route response shape
+interface ServiceHealthEntry {
+  service: 'document_ai' | 'llm_api' | 'supabase' | 'firebase_auth';
+  status: 'healthy' | 'degraded' | 'down' | 'unknown';
+  checkedAt: string | null;
+  latencyMs: number | null;
+  errorMessage: string | null;
+}
+```
+
+### Pattern 3: Analytics Summary Panel — upload counts + rates + avg time
+
+**What:** A stat card grid showing `totalUploads`, `succeeded`, `failed`, `successRate` (as %, formatted), and `avgProcessingMs` (formatted as seconds or minutes). Matches `AnalyticsSummary` response from `analyticsService.ts`.
+
+**AnalyticsSummary interface** (from backend analyticsService.ts):
+```typescript
+interface AnalyticsSummary {
+  range: string;           // e.g. "24h"
+  totalUploads: number;
+  succeeded: number;
+  failed: number;
+  successRate: number;     // 0.0 to 1.0
+  avgProcessingMs: number | null;
+  generatedAt: string;
+}
+```
+
+**Range selector:** Include a `<select>` for `24h`, `7d`, `30d` as query params passed to `getAnalytics(range)`. Matches the pattern in existing `Analytics.tsx`.
+
+### Pattern 4: Admin-only route protection in existing tab system
+
+**What:** The app uses a tab system inside `Dashboard`, not separate React Router routes for admin tabs. Admin-only tabs (`analytics`, `monitoring`) are already conditionally rendered with `isAdmin && (...)`. The "access-denied" state for non-admin users already exists as inline fallback JSX in App.tsx.
+
+**For Phase 4 success criterion 4 (non-admin sees access-denied):** The `monitoring` tab already shows an inline "Access Denied" card for `!isAdmin` users. The new `AdminMonitoringDashboard` will render inside the `monitoring` tab, guarded the same way. No new route is needed.
+
+### Pattern 5: `adminService.ts` extensions
+
+The existing `adminService.ts` already has an axios client with auto-attached Firebase token. Extend it with typed methods:
+
+```typescript
+// Add to adminService.ts
+
+// Types — import or co-locate
+export interface AlertEvent {
+  id: string;
+  service_name: string;
+  alert_type: 'service_down' | 'service_degraded' | 'recovery';
+  status: 'active' | 'acknowledged' | 'resolved';
+  message: string | null;
+  details: Record<string, unknown> | null;
+  created_at: string;
+  acknowledged_at: string | null;
+  resolved_at: string | null;
+}
+
+export interface ServiceHealthEntry {
+  service: string;
+  status: 'healthy' | 'degraded' | 'down' | 'unknown';
+  checkedAt: string | null;
+  latencyMs: number | null;
+  errorMessage: string | null;
+}
+
+export interface AnalyticsSummary {
+  range: string;
+  totalUploads: number;
+  succeeded: number;
+  failed: number;
+  successRate: number;
+  avgProcessingMs: number | null;
+  generatedAt: string;
+}
+
+// Methods to add to AdminService class:
+async getHealth(): Promise<ServiceHealthEntry[]> {
+  const response = await apiClient.get('/admin/health');
+  return response.data.data;
+}
+
+async getAnalytics(range: string = '24h'): Promise<AnalyticsSummary> {
+  const response = await apiClient.get(`/admin/analytics?range=${range}`);
+  return response.data.data;
+}
+
+async getAlerts(): Promise<AlertEvent[]> {
+  const response = await apiClient.get('/admin/alerts');
+  return response.data.data;
+}
+
+async acknowledgeAlert(id: string): Promise<AlertEvent> {
+  const response = await apiClient.post(`/admin/alerts/${id}/acknowledge`);
+  return response.data.data;
+}
+```
+
+### Anti-Patterns to Avoid
+
+- **Awaiting acknowledgeAlert before updating UI:** The banner should disappear immediately on click, not wait for the API round-trip. Use optimistic state update: `setActiveAlerts(prev => prev.filter(a => a.id !== id))` before the `await`.
+- **Polling alerts on a short interval:** Out of scope (WebSockets/SSE out of scope per REQUIREMENTS.md). Fetch alerts once on Dashboard mount. A "Refresh" button on the monitoring panel is acceptable.
+- **Using console.log:** The frontend already uses console.log extensively. New components should match the existing pattern (the backend Winston logger rule does not apply to frontend code — frontend has no logger.ts).
+- **Building a new ProtectedRoute for admin tabs:** The existing tab-visibility pattern (`isAdmin &&`) is sufficient. No new routes needed.
+- **Using `any` type:** Type the API responses explicitly with interfaces matching backend response shapes.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Colored status indicators | Custom SVG status icon component | Tailwind `bg-green-500/bg-yellow-500/bg-red-500` on a `w-3 h-3 rounded-full` div | Already established in Analytics.tsx and UploadMonitoringDashboard.tsx |
+| Token-attached API calls | Custom fetch with manual header attachment | Extend existing `apiClient` axios instance in adminService.ts | Interceptor already handles token attachment automatically |
+| Date formatting | Custom date utility | `new Date(ts).toLocaleString()` inline | Sufficient; no date library installed |
+| Conditional class composition | String concatenation | `cn()` from `../utils/cn` | Already imported in every component |
+
+**Key insight:** Every UI pattern needed already exists in the codebase. The implementation is wiring existing patterns to new API endpoints, not building new UI infrastructure.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Alert banner position — wrong mount point
+
+**What goes wrong:** `AlertBanner` mounted inside a specific tab panel instead of at the top of `Dashboard`. The banner would only show on one tab.
+
+**Why it happens:** Placing alert logic where other data-fetching lives (in the monitoring tab content).
+
+**How to avoid:** Mount `AlertBanner` in `Dashboard` return JSX, before the `<nav>` element. Alert fetching state lives in `Dashboard`, not in a child component.
+
+**Warning signs:** Alert banner only visible when "Monitoring" tab is active.
+
+### Pitfall 2: Banner not disappearing after acknowledge
+
+**What goes wrong:** Admin clicks "Acknowledge" on the banner. API call succeeds but banner stays. Admin must refresh the page.
+
+**Why it happens:** State update waits for API response, or state is not updated at all (only the API called).
+
+**How to avoid:** Use optimistic state update. Remove the alert from `activeAlerts` immediately before or during the API call:
+```typescript
+const handleAcknowledge = async (id: string) => {
+  setActiveAlerts(prev => prev.filter(a => a.id !== id)); // optimistic
+  try {
+    await adminService.acknowledgeAlert(id);
+  } catch {
+    // on failure: re-fetch to restore correct state
+    adminService.getAlerts().then(setActiveAlerts).catch(() => {});
+  }
+};
+```
+
+### Pitfall 3: AdminService.isAdmin hardcoded email
+
+**What goes wrong:** The existing `adminService.ts` has `private readonly ADMIN_EMAIL = 'jpressnell@bluepointcapital.com'` hardcoded. `isAdmin(user?.email)` works correctly for the current single admin. The backend also enforces admin-email check independently, so this is not a security issue — but it is a code smell.
+
+**How to avoid:** Do not change this in Phase 4. It is the existing pattern. The admin check is backend-enforced; frontend admin detection is UI-only (for showing/hiding tabs and fetching admin data).
+
+### Pitfall 4: Type mismatch between backend AlertEvent and frontend usage
+
+**What goes wrong:** Frontend defines `AlertEvent` with fields that differ from backend's `AlertEvent` interface — particularly `status` values or field names (`checkedAt` vs `checked_at`).
+
+**Why it happens:** Backend uses snake_case internally; admin route returns camelCase (see admin.ts: `checkedAt`, `latencyMs`, `errorMessage`). Alert model uses snake_case throughout (returned directly from model without remapping).
+
+**How to avoid:** Check admin.ts response shapes carefully:
+- `GET /admin/health` remaps to camelCase: `{ service, status, checkedAt, latencyMs, errorMessage }`
+- `GET /admin/alerts` returns raw `AlertEvent` model data (snake_case): `{ id, service_name, alert_type, status, message, created_at, acknowledged_at }`
+- `GET /admin/analytics` returns `AnalyticsSummary` (camelCase from analyticsService.ts)
+
+Frontend types must match these shapes exactly.
+
+### Pitfall 5: Fetching alerts/health when user is not admin
+
+**What goes wrong:** `Dashboard` calls `adminService.getAlerts()` on mount regardless of whether the user is admin. Non-admin users trigger a 404 response (backend returns 404 for non-admin, not 403, per STATE.md decision: "requireAdminEmail returns 404 not 403").
+
+**How to avoid:** Gate all admin API calls behind `isAdmin` check:
+```typescript
+useEffect(() => {
+  if (isAdmin) {
+    adminService.getAlerts().then(setActiveAlerts).catch(() => {});
+  }
+}, [isAdmin]);
+```
+
+---
+
+## Code Examples
+
+### AlertBanner Component Structure
+
+```typescript
+// Source: admin.ts route — alerts returned as raw AlertEvent (snake_case)
+import { AlertTriangle, X } from 'lucide-react';
+import { cn } from '../utils/cn';
+
+interface AlertEvent {
+  id: string;
+  service_name: string;
+  alert_type: 'service_down' | 'service_degraded' | 'recovery';
+  status: 'active' | 'acknowledged' | 'resolved';
+  message: string | null;
+  created_at: string;
+  acknowledged_at: string | null;
+  resolved_at: string | null;
+  details: Record<string, unknown> | null;
+}
+
+interface AlertBannerProps {
+  alerts: AlertEvent[];
+  onAcknowledge: (id: string) => Promise<void>;
+}
+
+const AlertBanner: React.FC<AlertBannerProps> = ({ alerts, onAcknowledge }) => {
+  const criticalAlerts = alerts.filter(a =>
+    a.status === 'active' && (a.alert_type === 'service_down' || a.alert_type === 'service_degraded')
+  );
+
+  if (criticalAlerts.length === 0) return null;
+
+  return (
+    <div className="bg-red-600 px-4 py-3">
+      {criticalAlerts.map(alert => (
+        <div key={alert.id} className="flex items-center justify-between text-white">
+          <div className="flex items-center space-x-2">
+            <AlertTriangle className="h-5 w-5 flex-shrink-0" />
+            <span className="text-sm font-medium">
+              {alert.service_name}: {alert.message ?? alert.alert_type}
+            </span>
+          </div>
+          <button
+            onClick={() => onAcknowledge(alert.id)}
+            className="flex items-center space-x-1 text-sm underline hover:no-underline ml-4"
+          >
+            <X className="h-4 w-4" />
+            <span>Acknowledge</span>
+          </button>
+        </div>
+      ))}
+    </div>
+  );
+};
+```
+
+### ServiceHealthPanel Card
+
+```typescript
+// Source: admin.ts GET /admin/health response (camelCase remapped)
+interface ServiceHealthEntry {
+  service: string;
+  status: 'healthy' | 'degraded' | 'down' | 'unknown';
+  checkedAt: string | null;
+  latencyMs: number | null;
+  errorMessage: string | null;
+}
+
+const statusStyles = {
+  healthy: { dot: 'bg-green-500', label: 'Healthy', text: 'text-green-700' },
+  degraded: { dot: 'bg-yellow-500', label: 'Degraded', text: 'text-yellow-700' },
+  down: { dot: 'bg-red-500', label: 'Down', text: 'text-red-700' },
+  unknown: { dot: 'bg-gray-400', label: 'Unknown', text: 'text-gray-600' },
+} as const;
+
+const serviceDisplayName: Record<string, string> = {
+  document_ai: 'Document AI',
+  llm_api: 'LLM API',
+  supabase: 'Supabase',
+  firebase_auth: 'Firebase Auth',
+};
+```
+
+### AdminMonitoringDashboard fetch pattern
+
+```typescript
+// Matches existing Analytics.tsx pattern — useEffect + setLoading + error state
+const AdminMonitoringDashboard: React.FC = () => {
+  const [health, setHealth] = useState<ServiceHealthEntry[]>([]);
+  const [analytics, setAnalytics] = useState<AnalyticsSummary | null>(null);
+  const [range, setRange] = useState('24h');
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  const loadData = useCallback(async () => {
+    try {
+      setLoading(true);
+      setError(null);
+      const [healthData, analyticsData] = await Promise.all([
+        adminService.getHealth(),
+        adminService.getAnalytics(range),
+      ]);
+      setHealth(healthData);
+      setAnalytics(analyticsData);
+    } catch {
+      setError('Failed to load monitoring data');
+    } finally {
+      setLoading(false);
+    }
+  }, [range]);
+
+  useEffect(() => { loadData(); }, [loadData]);
+  // ...
+};
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | Impact |
+|--------------|------------------|--------|
+| `UploadMonitoringDashboard.tsx` reads from in-memory upload tracking | New `AdminMonitoringDashboard` reads from Supabase via admin API | Data survives cold starts (HLTH-04 fulfilled) |
+| `Analytics.tsx` reads from documentService (old agentic session tables) | New analytics panel reads from `document_processing_events` via `GET /admin/analytics` | Sourced from the persistent monitoring schema built in Phase 1-3 |
+| No alert visibility in UI | `AlertBanner` above nav, auto-populated from `GET /admin/alerts` | ALRT-03 fulfilled |
+
+**Deprecated/outdated:**
+- The existing `Analytics.tsx` component (uses `documentService.getAnalytics()` which hits old session/agent tables) — Phase 4 does NOT replace it; it adds a new monitoring section alongside it. The monitoring tab is separate from the analytics tab.
+- `UploadMonitoringDashboard.tsx` — may be kept or replaced; Phase 4 should use the new `AdminMonitoringDashboard` in the `monitoring` tab.
+
+---
+
+## Open Questions
+
+1. **Does `UploadMonitoringDashboard` get replaced or supplemented?**
+   - What we know: The `monitoring` tab currently renders `UploadMonitoringDashboard`
+   - What's unclear: The Phase 4 requirement says "admin dashboard" — it's ambiguous whether to replace `UploadMonitoringDashboard` entirely or add `AdminMonitoringDashboard` below/beside it
+   - Recommendation: Replace the `monitoring` tab content with `AdminMonitoringDashboard`. The old `UploadMonitoringDashboard` tracked in-memory state that is now superseded by Supabase-backed data.
+
+2. **Alert polling: once on mount or refresh button?**
+   - What we know: WebSockets/SSE are explicitly out of scope (REQUIREMENTS.md). No polling interval is specified.
+   - What's unclear: Whether a manual "Refresh" button on the banner is expected
+   - Recommendation: Fetch alerts once on Dashboard mount (gated by `isAdmin`). Include a Refresh button on `AdminMonitoringDashboard` that also re-fetches alerts. This matches the existing `Analytics.tsx` refresh pattern.
+
+3. **Which alert types trigger the banner?**
+   - What we know: ALRT-03 says "active critical issues". AlertEvent.alert_type is `service_down | service_degraded | recovery`.
+   - What's unclear: Does `service_degraded` count as "critical"?
+   - Recommendation: Show banner for both `service_down` and `service_degraded` (both are actionable alerts that indicate a real problem). Filter out `recovery` type alerts as they are informational.
+
+---
+
+## Validation Architecture
+
+> `workflow.nyquist_validation` is not present in `.planning/config.json` — the config only has `workflow.research`, `workflow.plan_check`, and `workflow.verifier`. No `nyquist_validation` key. Skipping this section.
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- Direct codebase inspection — `frontend/src/App.tsx` (tab structure, admin detection pattern)
+- Direct codebase inspection — `frontend/src/services/adminService.ts` (axios client, existing methods)
+- Direct codebase inspection — `frontend/src/components/Analytics.tsx` (data-fetch pattern, Tailwind layout)
+- Direct codebase inspection — `frontend/src/contexts/AuthContext.tsx` (useAuth hook, token/user state)
+- Direct codebase inspection — `backend/src/routes/admin.ts` (exact API response shapes)
+- Direct codebase inspection — `backend/src/models/AlertEventModel.ts` (AlertEvent type, field names and casing)
+- Direct codebase inspection — `backend/src/services/analyticsService.ts` (AnalyticsSummary interface)
+- Direct codebase inspection — `frontend/package.json` (installed libraries, confirmed no test framework)
+
+### Secondary (MEDIUM confidence)
+- Pattern inference from existing components (`UploadMonitoringDashboard.tsx` status indicator pattern)
+- React 18 state/effect patterns — verified against direct code inspection in existing components
+
+### Tertiary (LOW confidence)
+- None
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — verified directly from package.json and existing component files
+- Architecture: HIGH — patterns derived directly from existing App.tsx tab system and adminService.ts structure
+- API response shapes: HIGH — read directly from backend admin.ts route and model files
+- Pitfalls: HIGH — derived from direct code inspection and STATE.md decisions
+
+**Research date:** 2026-02-24
+**Valid until:** 2026-03-24 (stable — no moving dependencies; all external packages are pinned)
--- a/.planning/milestones/v1.0-phases/04-frontend/04-VERIFICATION.md
+++ b/.planning/milestones/v1.0-phases/04-frontend/04-VERIFICATION.md
@@ -0,0 +1,154 @@
+---
+phase: 04-frontend
+verified: 2026-02-25T00:10:00Z
+status: human_needed
+score: 4/4 must-haves verified
+human_verification:
+  - test: "Alert banner appears on all tabs when active critical alerts exist"
+    expected: "A red banner shows above the nav bar on overview, documents, and upload tabs — not just on the monitoring tab — whenever there are active service_down or service_degraded alerts"
+    why_human: "Cannot trigger live alerts programmatically; requires the backend health probe to actually record a service failure in Supabase"
+  - test: "Alert acknowledge removes the banner immediately (optimistic update)"
+    expected: "Clicking Acknowledge on a banner alert removes it instantly without a page reload, even before the API call completes"
+    why_human: "Requires live alert data in the database and a running application to observe the optimistic update behavior"
+  - test: "Monitoring tab shows health status cards for all four services with colored dots and last-checked timestamp"
+    expected: "Document AI, LLM API, Supabase, Firebase Auth each appear as a card; each card has a colored dot (green/yellow/red/gray) and shows a human-readable last-checked timestamp or 'Never checked'"
+    why_human: "Requires the backend GET /admin/health endpoint to return live data from the service_health_checks table"
+  - test: "Processing analytics shows real Supabase-sourced data with range selector"
+    expected: "Total Uploads, Succeeded, Failed, Success Rate, Avg Processing Time stat cards show values from Supabase; changing the range selector to 7d or 30d fetches updated figures"
+    why_human: "Requires processed documents and analytics events in Supabase to validate that data is real and not empty/stubbed"
+  - test: "Non-admin user navigating to monitoring tab sees Access Denied"
+    expected: "A logged-in non-admin user who somehow reaches activeTab=monitoring (e.g., via browser state manipulation) sees the Access Denied message, not the AdminMonitoringDashboard"
+    why_human: "The tab button is hidden for non-admins but a runtime state change cannot be tested without a non-admin account in the running app"
+---
+
+# Phase 4: Frontend Verification Report
+
+**Phase Goal:** The admin can see live service health, processing metrics, and active alerts directly in the application UI
+**Verified:** 2026-02-25T00:10:00Z
+**Status:** human_needed
+**Re-verification:** No — initial verification
+
+## Goal Achievement
+
+### Observable Truths (from Phase 4 Success Criteria)
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|---------|
+| 1 | An alert banner appears at the top of the admin UI when there is at least one unacknowledged critical alert, and disappears after the admin acknowledges it | ? NEEDS HUMAN | Code structure is correct: AlertBanner is rendered above `<nav>`, filters to `status=active` AND `alert_type` in `[service_down, service_degraded]`, calls `onAcknowledge` which immediately filters local state. Cannot confirm without live alert data. |
+| 2 | The admin dashboard shows health status indicators (green/yellow/red) for all four services, with the last-checked timestamp visible | ? NEEDS HUMAN | AdminMonitoringDashboard renders a 1x4 service health grid with `bg-green-500`/`bg-yellow-500`/`bg-red-500`/`bg-gray-400` dots and `toLocaleString()` timestamps. Requires live backend data to confirm rendering. |
+| 3 | The admin dashboard shows processing metrics (upload counts, success/failure rates, average processing time) sourced from the persistent Supabase backend | ? NEEDS HUMAN | Component calls `adminService.getAnalytics(range)` which hits `GET /admin/analytics` (a Supabase-backed endpoint verified in Phase 3). Stat cards render all five metrics. Cannot confirm real data without running the app. |
+| 4 | A non-admin user visiting the admin route is redirected or shown an access-denied state | ✓ VERIFIED | App.tsx lines 726-733: `{activeTab === 'monitoring' && !isAdmin && (<div>...<h3>Access Denied</h3>...)}`. Tab button is also hidden (`{isAdmin && (...<button onClick monitoring>)}`). Both layers present. |
+
+**Score:** 1 automated + 3 human-needed out of 4 truths
+
+---
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `frontend/src/services/adminService.ts` | Monitoring API client methods with typed interfaces | VERIFIED | Exports `AlertEvent`, `ServiceHealthEntry`, `AnalyticsSummary` interfaces; contains `getHealth()`, `getAnalytics(range)`, `getAlerts()`, `acknowledgeAlert(id)` methods. All pre-existing methods preserved. |
+| `frontend/src/components/AlertBanner.tsx` | Global alert banner with acknowledge callback | VERIFIED | 44 lines, substantive. Filters to critical active alerts, renders red banner with `AlertTriangle` icon + per-alert X button. Exports `AlertBanner` as default and named export. |
+| `frontend/src/components/AdminMonitoringDashboard.tsx` | Health panel + analytics summary panel | VERIFIED | 178 lines, substantive. Fetches via `Promise.all`, renders health grid + analytics stat cards with range selector and Refresh button. Exports `AdminMonitoringDashboard`. |
+| `frontend/src/App.tsx` | Dashboard with AlertBanner wired above nav and AdminMonitoringDashboard in monitoring tab | VERIFIED | AlertBanner at line 422 (above `<nav>` at line 426). AdminMonitoringDashboard at line 713 in monitoring tab. `activeAlerts` state + `handleAcknowledge` + `isAdmin`-gated `useEffect` all present. |
+
+---
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `AlertBanner.tsx` | `adminService.ts` | `AlertEvent` type import | WIRED | Line 3: `import { AlertEvent } from '../services/adminService'` |
+| `AdminMonitoringDashboard.tsx` | `adminService.ts` | `adminService.getHealth()` and `getAnalytics()` calls | WIRED | Lines 4-7: imports `adminService`, `ServiceHealthEntry`, `AnalyticsSummary`; lines 38-41: `Promise.all([adminService.getHealth(), adminService.getAnalytics(range)])` |
+| `App.tsx` | `AlertBanner.tsx` | import and render above nav | WIRED | Line 10: `import AlertBanner from './components/AlertBanner'`; line 422: `<AlertBanner alerts={activeAlerts} onAcknowledge={handleAcknowledge} />` above `<nav>` at line 426 |
+| `App.tsx` | `AdminMonitoringDashboard.tsx` | import and render in monitoring tab | WIRED | Line 11: `import AdminMonitoringDashboard from './components/AdminMonitoringDashboard'`; line 713: `<AdminMonitoringDashboard />` in monitoring tab conditional |
+| `App.tsx` | `adminService.ts` | `adminService.getAlerts()` in Dashboard `useEffect` | WIRED | Line 14: `import { adminService, AlertEvent } from './services/adminService'`; lines 44-48: `useEffect(() => { if (isAdmin) { adminService.getAlerts().then(setActiveAlerts).catch(() => {}); } }, [isAdmin])` |
+
+All 5 key links verified as fully wired.
+
+---
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|---------|
+| ALRT-03 | 04-01, 04-02 | Admin sees in-app alert banner for active critical issues | VERIFIED (code) | AlertBanner component exists and is mounted above nav in App.tsx; filters to `status=active` and `alert_type in [service_down, service_degraded]` |
+| ANLY-02 | 04-01, 04-02 | Admin can view processing summary: upload counts, success/failure rates, avg processing time | VERIFIED (code) | AdminMonitoringDashboard renders 5 stat cards; calls `GET /admin/analytics` which returns Supabase-sourced data (Phase 3 responsibility, confirmed complete) |
+| HLTH-01 | 04-01, 04-02 | Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth | VERIFIED (code) | AdminMonitoringDashboard health grid uses service display name mapping for all 4 services; status dot color mapping covers healthy/degraded/down/unknown |
+
+**Orphaned requirements check:** REQUIREMENTS.md traceability table maps ALRT-03 to Phase 4 only. ANLY-02 and HLTH-01 are listed under Phase 3 for the API layer and Phase 4 for the UI delivery. All three requirement IDs from the plan are accounted for with no orphans.
+
+---
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| `AlertBanner.tsx` | 18 | `return null` | INFO | Correct behavior — component intentionally renders nothing when no critical alerts exist |
+| `App.tsx` | 82-84, 98, 253, 281, 294, 331 | `console.log` statements | WARNING | Pre-existing code in document fetch/upload/download handlers. Not introduced by Phase 4 changes. Does not affect monitoring functionality. |
+
+No blocker anti-patterns found. The `return null` in AlertBanner is intentional and correct. The `console.log` statements are pre-existing and outside the scope of Phase 4 changes.
+
+---
+
+### TypeScript Compilation
+
+`npx tsc --noEmit` passed with zero errors (confirmed via command output — no errors produced).
+
+### Git Commits Verified
+
+All three Phase 4 implementation commits confirmed to exist:
+- `f84a822` — feat(04-01): extend adminService with monitoring API methods and types
+- `b457b9e` — feat(04-01): create AlertBanner and AdminMonitoringDashboard components
+- `6c345a6` — feat(04-02): wire AlertBanner and AdminMonitoringDashboard into Dashboard
+
+---
+
+### Human Verification Required
+
+#### 1. Alert Banner — Live Critical Alerts
+
+**Test:** Ensure at least one `alert_events` row exists in Supabase with `status='active'` and `alert_type` in `('service_down','service_degraded')`. Log in as the admin user (jpressnell@bluepointcapital.com), navigate to the dashboard, and check all tabs (Overview, Documents, Upload, Monitoring).
+
+**Expected:** A red banner appears above the top navigation bar on every tab showing the service name and message, with an "Acknowledge" button. Clicking Acknowledge removes only that alert's row from the banner immediately (before the API call completes), and the banner disappears entirely if it was the last alert.
+
+**Why human:** Requires a live alert record in the database and a running frontend+backend. The optimistic update behavior (instant disappear) cannot be verified through static code analysis alone.
+
+#### 2. Health Status Grid — Real Data
+
+**Test:** Log in as admin and click the Monitoring tab.
+
+**Expected:** Four service cards appear (Document AI, LLM API, Supabase, Firebase Auth). Each card shows a colored status dot and a human-readable timestamp ("Never checked" is acceptable if health probes have not run yet, but the card must still render with `bg-gray-400` unknown dot).
+
+**Why human:** Requires the backend `GET /admin/health` endpoint to return data. Empty arrays are valid if no probes have run, but the grid must render the cards (currently the `{health.map(...)}` renders zero cards if the array is empty — no placeholder cards shown for the four expected services if the backend returns an empty array).
+
+**Note on potential gap:** The AdminMonitoringDashboard renders health cards only from the data returned by the API (`health.map((entry) => ...)`). If `GET /admin/health` returns an empty array (no probes run yet), zero cards appear instead of four placeholder cards. This is a UX concern but not a blocker for the requirement as stated (HLTH-01 requires viewing status, which implies data must exist).
+
+#### 3. Analytics Panel — Range Selector
+
+**Test:** On the Monitoring tab analytics panel, change the range selector from "Last 24h" to "Last 7d" and then "Last 30d".
+
+**Expected:** Each selection triggers a new API call to `GET /admin/analytics?range=7d` (or `30d`) and updates the five stat cards with fresh values.
+
+**Why human:** Requires real analytics events in Supabase to observe value changes. The range selector triggers a `setRange` state change which re-runs `loadData()` via `useCallback` dependency — the mechanism is correct but output requires live data to confirm.
+
+#### 4. Non-Admin Access Denied
+
+**Test:** Log in with a non-admin account. Attempt to reach the monitoring tab (the tab button will not be visible, but try navigating directly if possible).
+
+**Expected:** If `activeTab` is somehow set to `'monitoring'`, the non-admin user sees the Access Denied panel, not the AdminMonitoringDashboard.
+
+**Why human:** The tab button is hidden from non-admins, making this path hard to reach normally. A non-admin account is required to fully verify the fallback.
+
+---
+
+### Gaps Summary
+
+No structural gaps found. All code artifacts exist, are substantive, and are fully wired. TypeScript compiles with zero errors. The three items that cannot be verified programmatically are runtime behaviors requiring live backend data: alert banner with real alert records, health grid with real probe data, and analytics panel with real event data. These are expected human verification tasks for any frontend monitoring UI.
+
+The one code-level observation worth noting: AdminMonitoringDashboard renders zero health cards if the backend returns an empty array (no probes run). A future improvement could show four placeholder cards for the four known services even when data is absent. This is not a requirement gap (HLTH-01 says "can view" which requires data to exist) but may cause confusion during initial setup.
+
+---
+
+_Verified: 2026-02-25T00:10:00Z_
+_Verifier: Claude (gsd-verifier)_
--- a/.planning/milestones/v1.0-phases/05-tech-debt-cleanup/.gitkeep
+++ b/.planning/milestones/v1.0-phases/05-tech-debt-cleanup/.gitkeep
--- a/AGENTIC_RAG_IMPLEMENTATION_PLAN.md
+++ b/AGENTIC_RAG_IMPLEMENTATION_PLAN.md
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,23 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+Two runnable apps sit alongside documentation in the repo root. `backend/src` holds the Express API with `config`, `routes`, `services`, `models`, `middleware`, and `__tests__`, while automation lives under `backend/src/scripts` and `backend/scripts`. `frontend/src` is the Vite + React client organized by `components`, `contexts`, `services`, `types`, and Tailwind assets. Update the matching guide (`DEPLOYMENT_GUIDE.md`, `TESTING_STRATEGY_DOCUMENTATION.md`, etc.) when you alter that area.
+
+## Build, Test, and Development Commands
+- `cd backend && npm run dev` – ts-node-dev server (port 5001) with live reload.
+- `cd backend && npm run build` – TypeScript compile plus Puppeteer config copy for deployments.
+- `cd backend && npm run test|test:watch|test:coverage` – Vitest suites in `src/__tests__`.
+- `cd backend && npm run test:postgres` then `npm run test:job <docId>` – verify Supabase/PostgreSQL plumbing per `QUICK_START.md`.
+- `cd frontend && npm run dev` (port 5173) or `npm run build && npm run preview` for release smoke tests.
+
+## Coding Style & Naming Conventions
+Use TypeScript everywhere, ES modules, and 2-space indentation. ESLint (`backend/.eslintrc.js` plus Vite defaults) enforces `@typescript-eslint/no-unused-vars`, warns on `any`, and blocks undefined globals; run `npm run lint` before pushing. React components stay `PascalCase`, functions/utilities `camelCase`, env vars `SCREAMING_SNAKE_CASE`, and DTOs belong in `backend/src/types`.
+
+## Testing Guidelines
+Backend unit and service tests reside in `backend/src/__tests__` (Vitest). Cover any change to ingestion, job orchestration, or financial parsing—assert both success/failure paths and lean on fixtures for large CIM payloads. Integration confidence comes from the scripted probes (`npm run test:postgres`, `npm run test:pipeline`, `npm run check:pipeline`). Frontend work currently depends on manual verification; for UX-critical updates, either add Vitest + Testing Library suites under `frontend/src/__tests__` or attach before/after screenshots.
+
+## Commit & Pull Request Guidelines
+Branch off `main`, keep commits focused, and use imperative subjects similar to `Fix EBITDA margin auto-correction`. Each PR must state motivation, summarize code changes, link tickets, and attach test or script output plus UI screenshots for visual tweaks. Highlight migrations or env updates, flag auth/storage changes for security review, and wait for at least one approval before merging.
+
+## Security & Configuration Notes
+Mirror `.env.example` locally but store production secrets via Firebase Functions secrets or Supabase settings—never commit credentials. Keep `DATABASE_URL`, `SUPABASE_*`, Google Cloud, and AI provider keys current before running pipeline scripts, and rotate service accounts if logs leave the network. Use correlation IDs from `backend/src/middleware/errorHandler.ts` for troubleshooting instead of logging raw payloads.
--- a/CIM_REVIEW_PDF_TEMPLATE.md
+++ b/CIM_REVIEW_PDF_TEMPLATE.md
@@ -1,539 +0,0 @@
-# CIM Review PDF Template
-## HTML Template for Professional CIM Review Reports
-
-### 🎯 Overview
-
-This document contains the HTML template used by the PDF Generation Service to create professional CIM Review reports. The template includes comprehensive styling and structure for generating high-quality PDF documents.
-
---
-
-## 📄 HTML Template
-
-```html
-<!DOCTYPE html>
-<html>
-<head>
-  <meta charset="UTF-8">
-  <title>CIM Review Report</title>
-  <style>
-    :root {
-      --page-margin: 0.75in;
-      --radius: 10px;
-      --shadow: 0 12px 30px -10px rgba(0,0,0,0.08);
-      --color-bg: #ffffff;
-      --color-muted: #f5f7fa;
-      --color-text: #1f2937;
-      --color-heading: #111827;
-      --color-border: #dfe3ea;
-      --color-primary: #5f6cff;
-      --color-primary-dark: #4a52d1;
-      --color-success-bg: #e6f4ea;
-      --color-success-border: #38a169;
-      --color-highlight-bg: #fff8ed;
-      --color-highlight-border: #f29f3f;
-      --color-summary-bg: #eef7fe;
-      --color-summary-border: #3182ce;
-      --font-stack: -apple-system, system-ui, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
-    }
-
-    @page {
-      margin: var(--page-margin);
-      size: A4;
-    }
-
-    * { box-sizing: border-box; }
-
-    body {
-      margin: 0;
-      padding: 0;
-      font-family: var(--font-stack);
-      background: var(--color-bg);
-      color: var(--color-text);
-      line-height: 1.45;
-      font-size: 11pt;
-    }
-
-    .container {
-      max-width: 940px;
-      margin: 0 auto;
-    }
-
-    .header {
-      display: flex;
-      flex-wrap: wrap;
-      justify-content: space-between;
-      align-items: flex-start;
-      padding: 24px 20px;
-      background: #f9fbfc;
-      border-radius: var(--radius);
-      border: 1px solid var(--color-border);
-      margin-bottom: 28px;
-      gap: 12px;
-    }
-
-    .header-left {
-      flex: 1 1 300px;
-    }
-
-    .title {
-      margin: 0;
-      font-size: 24pt;
-      font-weight: 700;
-      color: var(--color-heading);
-      position: relative;
-      display: inline-block;
-      padding-bottom: 4px;
-    }
-
-    .title:after {
-      content: '';
-      position: absolute;
-      left: 0;
-      bottom: 0;
-      height: 4px;
-      width: 60px;
-      background: linear-gradient(90deg, var(--color-primary), var(--color-primary-dark));
-      border-radius: 2px;
-    }
-
-    .subtitle {
-      margin: 4px 0 0 0;
-      font-size: 10pt;
-      color: #6b7280;
-    }
-
-    .meta {
-      text-align: right;
-      font-size: 9pt;
-      color: #6b7280;
-      min-width: 180px;
-      line-height: 1.3;
-    }
-
-    .section {
-      margin-bottom: 28px;
-      padding: 22px 24px;
-      background: #ffffff;
-      border-radius: var(--radius);
-      border: 1px solid var(--color-border);
-      box-shadow: var(--shadow);
-      page-break-inside: avoid;
-    }
-
-    .section + .section {
-      margin-top: 4px;
-    }
-
-    h2 {
-      margin: 0 0 14px 0;
-      font-size: 18pt;
-      font-weight: 600;
-      color: var(--color-heading);
-      display: flex;
-      align-items: center;
-      gap: 8px;
-    }
-
-    h3 {
-      margin: 16px 0 8px 0;
-      font-size: 13pt;
-      font-weight: 600;
-      color: #374151;
-    }
-
-    .field {
-      display: flex;
-      flex-wrap: wrap;
-      gap: 12px;
-      margin-bottom: 14px;
-    }
-
-    .field-label {
-      flex: 0 0 180px;
-      font-size: 9pt;
-      font-weight: 600;
-      text-transform: uppercase;
-      letter-spacing: 0.8px;
-      color: #4b5563;
-      margin: 0;
-    }
-
-    .field-value {
-      flex: 1 1 220px;
-      font-size: 11pt;
-      color: var(--color-text);
-      margin: 0;
-    }
-
-    .financial-table {
-      width: 100%;
-      border-collapse: collapse;
-      margin: 16px 0;
-      font-size: 10pt;
-    }
-
-    .financial-table th,
-    .financial-table td {
-      padding: 10px 12px;
-      text-align: left;
-      vertical-align: top;
-    }
-
-    .financial-table thead th {
-      background: var(--color-primary);
-      color: #fff;
-      font-weight: 600;
-      text-transform: uppercase;
-      letter-spacing: 0.5px;
-      font-size: 9pt;
-      border-bottom: 2px solid rgba(255,255,255,0.2);
-    }
-
-    .financial-table tbody tr {
-      border-bottom: 1px solid #eceef1;
-    }
-
-    .financial-table tbody tr:nth-child(odd) td {
-      background: #fbfcfe;
-    }
-
-    .financial-table td {
-      background: #fff;
-      color: var(--color-text);
-      font-size: 10pt;
-    }
-
-    .financial-table tbody tr:hover td {
-      background: #f1f5fa;
-    }
-
-    .summary-box,
-    .highlight-box,
-    .success-box {
-      border-radius: 8px;
-      padding: 16px 18px;
-      margin: 18px 0;
-      position: relative;
-      font-size: 11pt;
-    }
-
-    .summary-box {
-      background: var(--color-summary-bg);
-      border: 1px solid var(--color-summary-border);
-    }
-
-    .highlight-box {
-      background: var(--color-highlight-bg);
-      border: 1px solid var(--color-highlight-border);
-    }
-
-    .success-box {
-      background: var(--color-success-bg);
-      border: 1px solid var(--color-success-border);
-    }
-
-    .footer {
-      display: flex;
-      flex-wrap: wrap;
-      justify-content: space-between;
-      align-items: center;
-      padding: 18px 20px;
-      font-size: 9pt;
-      color: #6b7280;
-      border-top: 1px solid var(--color-border);
-      margin-top: 30px;
-      background: #f9fbfc;
-      border-radius: var(--radius);
-      gap: 8px;
-    }
-
-    .footer .left,
-    .footer .right {
-      flex: 1 1 200px;
-    }
-
-    .footer .center {
-      flex: 0 0 auto;
-      text-align: center;
-    }
-
-    .small {
-      font-size: 8.5pt;
-    }
-
-    .divider {
-      height: 1px;
-      background: var(--color-border);
-      margin: 16px 0;
-      border: none;
-    }
-
-    /* Utility */
-    .inline-block { display: inline-block; }
-    .muted { color: #6b7280; }
-
-    /* Page numbering for PDF (supported in many engines including Puppeteer) */
-    .page-footer {
-      position: absolute;
-      bottom: 0;
-      width: 100%;
-      font-size: 8pt;
-      text-align: center;
-      padding: 8px 0;
-      color: #9ca3af;
-    }
-  </style>
-</head>
-<body>
-  <div class="container">
-    <div class="header">
-      <div class="header-left">
-        <h1 class="title">CIM Review Report</h1>
-        <p class="subtitle">Professional Investment Analysis</p>
-      </div>
-      <div class="meta">
-        <div>Generated on ${new Date().toLocaleDateString()}</div>
-        <div style="margin-top:4px;">at ${new Date().toLocaleTimeString()}</div>
-      </div>
-    </div>
-
-    <!-- Dynamic Content Sections -->
-    <!-- Example of how your loop would insert sections: -->
-    <!--
-    <div class="section">
-      <h2><span class="section-icon">📊</span>Deal Overview</h2>
-      ...fields / tables...
-    </div>
-    -->
-
-    <!-- Footer -->
-    <div class="footer">
-      <div class="left">
-        <strong>BPCP CIM Document Processor</strong> | Professional Investment Analysis | Confidential
-      </div>
-      <div class="center small">
-        Generated on ${new Date().toLocaleDateString()} at ${new Date().toLocaleTimeString()}
-      </div>
-      <div class="right" style="text-align:right;">
-        Page <span class="page-number"></span>
-      </div>
-    </div>
-  </div>
-
-  <!-- Optional script to inject page numbers if using Puppeteer -->
-  <script>
-    // Puppeteer can replace this with its own page numbering; if not, simple fallback:
-    document.querySelectorAll('.page-number').forEach(el => {
-      // placeholder; leave blank or inject via PDF generation tooling
-      el.textContent = '';
-    });
-  </script>
-</body>
-</html>
-```
-
---
-
-## 🎨 CSS Styling Features
-
-### **Design System**
- **CSS Variables**: Centralized design tokens for consistency
- **Modern Color Palette**: Professional grays, blues, and accent colors
- **Typography**: System font stack for optimal rendering
- **Spacing**: Consistent spacing using design tokens
-
-### **Typography**
- **Font Stack**: -apple-system, system-ui, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif
- **Line Height**: 1.45 for optimal readability
- **Font Sizes**: 8.5pt to 24pt range for hierarchy
- **Color Scheme**: Professional grays and modern blue accent
-
-### **Layout**
- **Page Size**: A4 with 0.75in margins
- **Container**: Max-width 940px for optimal reading
- **Flexbox Layout**: Modern responsive design
- **Section Spacing**: 28px between sections with 4px gaps
-
-### **Visual Elements**
-
-#### **Headers**
- **Main Title**: 24pt with underline accent in primary color
- **Section Headers**: 18pt with icons and flexbox layout
- **Subsection Headers**: 13pt for organization
-
-#### **Content Sections**
- **Background**: White with subtle borders and shadows
- **Border Radius**: 10px for modern appearance
- **Box Shadows**: Sophisticated shadow with 12px blur
- **Padding**: 22px horizontal, 24px vertical for comfortable reading
- **Page Break**: Avoid page breaks within sections
-
-#### **Fields**
- **Layout**: Flexbox with label-value pairs
- **Labels**: 9pt uppercase with letter spacing (180px width)
- **Values**: 11pt standard text (flexible width)
- **Spacing**: 12px gap between label and value
-
-#### **Financial Tables**
- **Header**: Primary color background with white text
- **Rows**: Alternating colors for easy scanning
- **Hover Effects**: Subtle highlighting on hover
- **Typography**: 10pt for table content, 9pt for headers
-
-#### **Special Boxes**
- **Summary Box**: Light blue background for key information
- **Highlight Box**: Light orange background for important notes
- **Success Box**: Light green background for positive indicators
- **Consistent**: 8px border radius and 16px padding
-
---
-
-## 📋 Section Structure
-
-### **Report Sections**
-1. **Deal Overview** 📊
-2. **Business Description** 🏢
-3. **Market & Industry Analysis** 📈
-4. **Financial Summary** 💰
-5. **Management Team Overview** 👥
-6. **Preliminary Investment Thesis** 🎯
-7. **Key Questions & Next Steps** ❓
-
-### **Data Handling**
- **Simple Fields**: Direct text display
- **Nested Objects**: Structured field display
- **Financial Data**: Tabular format with periods
- **Arrays**: List format when applicable
-
---
-
-## 🔧 Template Variables
-
-### **Dynamic Content**
- `${new Date().toLocaleDateString()}` - Current date
- `${new Date().toLocaleTimeString()}` - Current time
- `${section.icon}` - Section emoji icons
- `${section.title}` - Section titles
- `${this.formatFieldName(key)}` - Formatted field names
- `${value}` - Field values
-
-### **Financial Table Structure**
-```html
-<table class="financial-table">
-  <thead>
-    <tr>
-      <th>Period</th>
-      <th>Revenue</th>
-      <th>Growth</th>
-      <th>EBITDA</th>
-      <th>Margin</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td><strong>FY3</strong></td>
-      <td>${data?.revenue || '-'}</td>
-      <td>${data?.revenueGrowth || '-'}</td>
-      <td>${data?.ebitda || '-'}</td>
-      <td>${data?.ebitdaMargin || '-'}</td>
-    </tr>
-    <!-- Additional periods: FY2, FY1, LTM -->
-  </tbody>
-</table>
-```
-
---
-
-## 🎯 Usage in Code
-
-### **Template Integration**
-```typescript
-// In pdfGenerationService.ts
-private generateCIMReviewHTML(analysisData: any): string {
-  const sections = [
-    { title: 'Deal Overview', data: analysisData.dealOverview, icon: '📊' },
-    { title: 'Business Description', data: analysisData.businessDescription, icon: '🏢' },
-    // ... additional sections
-  ];
-
-  // Generate HTML with template
-  let html = `<!DOCTYPE html>...`;
-  
-  sections.forEach(section => {
-    if (section.data) {
-      html += `<div class="section"><h2><span class="section-icon">${section.icon}</span>${section.title}</h2>`;
-      // Process section data
-      html += `</div>`;
-    }
-  });
-
-  return html;
-}
-```
-
-### **PDF Generation**
-```typescript
-async generateCIMReviewPDF(analysisData: any): Promise<Buffer> {
-  const html = this.generateCIMReviewHTML(analysisData);
-  const page = await this.getPage();
-  
-  await page.setContent(html, { waitUntil: 'networkidle0' });
-  const pdfBuffer = await page.pdf({
-    format: 'A4',
-    printBackground: true,
-    margin: { top: '0.75in', right: '0.75in', bottom: '0.75in', left: '0.75in' }
-  });
-  
-  this.releasePage(page);
-  return pdfBuffer;
-}
-```
-
---
-
-## 🚀 Customization Options
-
-### **Design System Customization**
- **CSS Variables**: Update `:root` variables for consistent theming
- **Color Palette**: Modify primary, success, highlight, and summary colors
- **Typography**: Change font stack and sizing
- **Spacing**: Adjust margins, padding, and gaps using design tokens
-
-### **Styling Modifications**
- **Colors**: Update CSS variables for brand colors
- **Fonts**: Change font-family for different styles
- **Layout**: Adjust margins, padding, and spacing
- **Effects**: Modify shadows, borders, and visual effects
-
-### **Content Structure**
- **Sections**: Add or remove report sections
- **Fields**: Customize field display formats
- **Tables**: Modify financial table structure
- **Icons**: Change section icons and styling
-
-### **Branding**
- **Header**: Update company name and logo
- **Footer**: Modify footer content and styling
- **Colors**: Implement brand color scheme
- **Typography**: Use brand fonts
-
---
-
-## 📊 Performance Considerations
-
-### **Optimization Features**
- **CSS Variables**: Efficient design token system
- **Font Loading**: System fonts for fast rendering
- **Image Handling**: No external images for reliability
- **Print Optimization**: Print-specific CSS rules
- **Flexbox Layout**: Modern, efficient layout system
-
-### **Browser Compatibility**
- **Puppeteer**: Optimized for headless browser rendering
- **CSS Support**: Modern CSS features for visual appeal
- **Fallbacks**: Graceful degradation for older browsers
- **Print Support**: Print-friendly styling
-
---
-
-This HTML template provides a professional, visually appealing foundation for CIM Review PDF generation, with comprehensive styling and flexible content structure. 
--- a/CLEANUP_ANALYSIS_REPORT.md
+++ b/CLEANUP_ANALYSIS_REPORT.md
@@ -1,373 +0,0 @@
-# Cleanup Analysis Report
-## Comprehensive Analysis of Safe Cleanup Opportunities
-
-### 🎯 Overview
-
-This report analyzes the current codebase to identify files and folders that can be safely removed while preserving only what's needed for the working CIM Document Processor system.
-
---
-
-## 📋 Current System Architecture
-
-### Core Components (KEEP)
- **Backend**: Node.js + Express + TypeScript
- **Frontend**: React + TypeScript + Vite
- **Database**: Supabase (PostgreSQL)
- **Storage**: Firebase Storage
- **Authentication**: Firebase Auth
- **AI Services**: Google Document AI + Claude AI/OpenAI
-
-### Documentation (KEEP)
- All comprehensive documentation created during the 7-phase documentation plan
- Configuration guides and operational procedures
-
---
-
-## 🗑️ Safe Cleanup Categories
-
-### 1. Test and Development Files (REMOVE)
-
-#### **Backend Test Files**
-```bash
-# Individual test files (outdated architecture)
-backend/test-db-connection.js
-backend/test-llm-processing.js
-backend/test-vector-fallback.js
-backend/test-vector-search.js
-backend/test-chunk-insert.js
-backend/check-recent-document.js
-backend/check-table-schema-simple.js
-backend/check-table-schema.js
-backend/create-rpc-function.js
-backend/create-vector-table.js
-backend/try-create-function.js
-```
-
-#### **Backend Scripts Directory (Mostly REMOVE)**
-```bash
-# Test and development scripts
-backend/scripts/test-document-ai-integration.js
-backend/scripts/test-full-integration.js
-backend/scripts/test-integration-with-mock.js
-backend/scripts/test-production-db.js
-backend/scripts/test-real-processor.js
-backend/scripts/test-supabase-client.js
-backend/scripts/test_exec_sql.js
-backend/scripts/simple-document-ai-test.js
-backend/scripts/test-database-working.js
-
-# Setup scripts (keep essential ones)
-backend/scripts/setup-complete.js          # KEEP - essential setup
-backend/scripts/setup-document-ai.js       # KEEP - essential setup
-backend/scripts/setup_supabase.js          # KEEP - essential setup
-backend/scripts/create-supabase-tables.js  # KEEP - essential setup
-backend/scripts/run-migrations.js          # KEEP - essential setup
-backend/scripts/run-production-migrations.js # KEEP - essential setup
-```
-
-### 2. Build and Cache Directories (REMOVE)
-
-#### **Build Artifacts**
-```bash
-backend/dist/                    # Build output (regenerated)
-frontend/dist/                   # Build output (regenerated)
-backend/coverage/                # Test coverage (no longer needed)
-```
-
-#### **Cache Directories**
-```bash
-backend/.cache/                  # Build cache
-frontend/.firebase/              # Firebase cache
-frontend/node_modules/           # Dependencies (regenerated)
-backend/node_modules/            # Dependencies (regenerated)
-node_modules/                    # Root dependencies (regenerated)
-```
-
-### 3. Temporary and Log Files (REMOVE)
-
-#### **Log Files**
-```bash
-backend/logs/app.log             # Application logs (regenerated)
-backend/logs/error.log           # Error logs (regenerated)
-backend/logs/upload.log          # Upload logs (regenerated)
-```
-
-#### **Upload Directories**
-```bash
-backend/uploads/                 # Local uploads (using Firebase Storage)
-```
-
-### 4. Development and IDE Files (REMOVE)
-
-#### **IDE Configuration**
-```bash
-.vscode/                         # VS Code settings
-.claude/                         # Claude IDE settings
-.kiro/                          # Kiro IDE settings
-```
-
-#### **Development Scripts**
-```bash
-# Root level scripts (mostly cleanup/utility)
-cleanup_gcs.sh                   # GCS cleanup script
-check_gcf_bucket.sh              # GCF bucket check
-cleanup_gcf_bucket.sh            # GCF bucket cleanup
-```
-
-### 5. Redundant Configuration Files (REMOVE)
-
-#### **Duplicate Configuration**
-```bash
-# Root level configs (backend/frontend have their own)
-firebase.json                    # Root firebase config (duplicate)
-cors.json                        # Root CORS config (duplicate)
-storage.cors.json                # Storage CORS config
-storage.rules                    # Storage rules
-package.json                     # Root package.json (minimal)
-package-lock.json                # Root package-lock.json
-```
-
-### 6. SQL Setup Files (KEEP ESSENTIAL)
-
-#### **Database Setup**
-```bash
-# KEEP - Essential database setup
-backend/supabase_setup.sql       # Core database setup
-backend/supabase_vector_setup.sql # Vector database setup
-backend/vector_function.sql      # Vector functions
-
-# REMOVE - Redundant
-backend/DATABASE.md              # Superseded by comprehensive documentation
-```
-
---
-
-## 🎯 Recommended Cleanup Strategy
-
-### Phase 1: Remove Test and Development Files
-```bash
-# Remove individual test files
-rm backend/test-*.js
-rm backend/check-*.js
-rm backend/create-*.js
-rm backend/try-create-function.js
-
-# Remove test scripts
-rm backend/scripts/test-*.js
-rm backend/scripts/simple-document-ai-test.js
-rm backend/scripts/test_exec_sql.js
-```
-
-### Phase 2: Remove Build and Cache Directories
-```bash
-# Remove build artifacts
-rm -rf backend/dist/
-rm -rf frontend/dist/
-rm -rf backend/coverage/
-
-# Remove cache directories
-rm -rf backend/.cache/
-rm -rf frontend/.firebase/
-rm -rf backend/node_modules/
-rm -rf frontend/node_modules/
-rm -rf node_modules/
-```
-
-### Phase 3: Remove Temporary Files
-```bash
-# Remove logs (regenerated on startup)
-rm -rf backend/logs/
-
-# Remove local uploads (using Firebase Storage)
-rm -rf backend/uploads/
-```
-
-### Phase 4: Remove Development Files
-```bash
-# Remove IDE configurations
-rm -rf .vscode/
-rm -rf .claude/
-rm -rf .kiro/
-
-# Remove utility scripts
-rm cleanup_gcs.sh
-rm check_gcf_bucket.sh
-rm cleanup_gcf_bucket.sh
-```
-
-### Phase 5: Remove Redundant Configuration
-```bash
-# Remove root level configs
-rm firebase.json
-rm cors.json
-rm storage.cors.json
-rm storage.rules
-rm package.json
-rm package-lock.json
-
-# Remove redundant documentation
-rm backend/DATABASE.md
-```
-
---
-
-## 📁 Final Clean Directory Structure
-
-### Root Level
-```
-cim_summary/
-├── README.md                                    # Project overview
-├── APP_DESIGN_DOCUMENTATION.md                 # Architecture
-├── AGENTIC_RAG_IMPLEMENTATION_PLAN.md          # AI strategy
-├── PDF_GENERATION_ANALYSIS.md                  # PDF optimization
-├── DEPLOYMENT_GUIDE.md                         # Deployment guide
-├── ARCHITECTURE_DIAGRAMS.md                    # Visual architecture
-├── DOCUMENTATION_AUDIT_REPORT.md               # Documentation audit
-├── FULL_DOCUMENTATION_PLAN.md                  # Documentation plan
-├── LLM_DOCUMENTATION_SUMMARY.md                # LLM optimization
-├── CODE_SUMMARY_TEMPLATE.md                    # Documentation template
-├── LLM_AGENT_DOCUMENTATION_GUIDE.md            # Documentation guide
-├── API_DOCUMENTATION_GUIDE.md                  # API reference
-├── CONFIGURATION_GUIDE.md                      # Configuration guide
-├── DATABASE_SCHEMA_DOCUMENTATION.md            # Database schema
-├── FRONTEND_DOCUMENTATION_SUMMARY.md           # Frontend docs
-├── TESTING_STRATEGY_DOCUMENTATION.md           # Testing strategy
-├── MONITORING_AND_ALERTING_GUIDE.md            # Monitoring guide
-├── TROUBLESHOOTING_GUIDE.md                    # Troubleshooting
-├── OPERATIONAL_DOCUMENTATION_SUMMARY.md        # Operational guide
-├── DOCUMENTATION_COMPLETION_REPORT.md          # Completion report
-├── CLEANUP_ANALYSIS_REPORT.md                  # This report
-├── deploy.sh                                   # Deployment script
-├── .gitignore                                  # Git ignore
-├── .gcloudignore                               # GCloud ignore
-├── backend/                                    # Backend application
-└── frontend/                                   # Frontend application
-```
-
-### Backend Structure
-```
-backend/
-├── src/                                        # Source code
-├── scripts/                                    # Essential setup scripts
-│   ├── setup-complete.js
-│   ├── setup-document-ai.js
-│   ├── setup_supabase.js
-│   ├── create-supabase-tables.js
-│   ├── run-migrations.js
-│   └── run-production-migrations.js
-├── supabase_setup.sql                          # Database setup
-├── supabase_vector_setup.sql                   # Vector database setup
-├── vector_function.sql                         # Vector functions
-├── serviceAccountKey.json                      # Service account
-├── setup-env.sh                                # Environment setup
-├── setup-supabase-vector.js                    # Vector setup
-├── firebase.json                               # Firebase config
-├── .firebaserc                                 # Firebase project
-├── .gcloudignore                               # GCloud ignore
-├── .gitignore                                  # Git ignore
-├── .puppeteerrc.cjs                            # Puppeteer config
-├── .dockerignore                               # Docker ignore
-├── .eslintrc.js                                # ESLint config
-├── tsconfig.json                               # TypeScript config
-├── package.json                                # Dependencies
-├── package-lock.json                           # Lock file
-├── index.js                                    # Entry point
-└── fix-env-config.sh                           # Config fix
-```
-
-### Frontend Structure
-```
-frontend/
-├── src/                                        # Source code
-├── public/                                     # Public assets
-├── firebase.json                               # Firebase config
-├── .firebaserc                                 # Firebase project
-├── .gcloudignore                               # GCloud ignore
-├── .gitignore                                  # Git ignore
-├── postcss.config.js                           # PostCSS config
-├── tailwind.config.js                          # Tailwind config
-├── tsconfig.json                               # TypeScript config
-├── tsconfig.node.json                          # Node TypeScript config
-├── vite.config.ts                              # Vite config
-├── index.html                                  # Entry HTML
-├── package.json                                # Dependencies
-└── package-lock.json                           # Lock file
-```
-
---
-
-## 💾 Space Savings Estimate
-
-### Files to Remove
- **Test Files**: ~50 files, ~500KB
- **Build Artifacts**: ~100MB (dist, coverage, node_modules)
- **Log Files**: ~200KB (regenerated)
- **Upload Files**: Variable size (using Firebase Storage)
- **IDE Files**: ~10KB
- **Redundant Configs**: ~50KB
-
-### Total Estimated Savings
- **File Count**: ~100 files removed
- **Disk Space**: ~100MB+ saved
- **Repository Size**: Significantly reduced
- **Clarity**: Much cleaner structure
-
---
-
-## ⚠️ Safety Considerations
-
-### Before Cleanup
-1. **Backup**: Ensure all important data is backed up
-2. **Documentation**: All essential documentation is preserved
-3. **Configuration**: Essential configs are kept
-4. **Dependencies**: Package files are preserved for regeneration
-
-### After Cleanup
-1. **Test Build**: Run `npm install` and build process
-2. **Verify Functionality**: Ensure system still works
-3. **Update Documentation**: Remove references to deleted files
-4. **Commit Changes**: Commit the cleanup
-
---
-
-## 🎯 Benefits of Cleanup
-
-### Immediate Benefits
-1. **Cleaner Repository**: Easier to navigate and understand
-2. **Reduced Size**: Smaller repository and faster operations
-3. **Less Confusion**: No outdated or unused files
-4. **Better Focus**: Only essential files remain
-
-### Long-term Benefits
-1. **Easier Maintenance**: Less clutter to maintain
-2. **Faster Development**: Cleaner development environment
-3. **Better Onboarding**: New developers see only essential files
-4. **Reduced Errors**: No confusion from outdated files
-
---
-
-## 📋 Cleanup Checklist
-
-### Pre-Cleanup
- [ ] Verify all documentation is complete and accurate
- [ ] Ensure all essential configuration files are identified
- [ ] Backup any potentially important files
- [ ] Test current system functionality
-
-### During Cleanup
- [ ] Remove test and development files
- [ ] Remove build and cache directories
- [ ] Remove temporary and log files
- [ ] Remove development and IDE files
- [ ] Remove redundant configuration files
-
-### Post-Cleanup
- [ ] Run `npm install` in both backend and frontend
- [ ] Test build process (`npm run build`)
- [ ] Verify system functionality
- [ ] Update any documentation references
- [ ] Commit cleanup changes
-
---
-
-This cleanup analysis provides a comprehensive plan for safely removing unnecessary files while preserving all essential components for the working CIM Document Processor system. 
--- a/CLEANUP_COMPLETION_REPORT.md
+++ b/CLEANUP_COMPLETION_REPORT.md
@@ -1,302 +0,0 @@
-# Cleanup Completion Report
-## Successful Cleanup of CIM Document Processor Codebase
-
-### 🎯 Overview
-
-This report summarizes the successful cleanup operation performed on the CIM Document Processor codebase, removing unnecessary files while preserving all essential components for the working system.
-
---
-
-## ✅ Cleanup Summary
-
-### **Files and Directories Removed**
-
-#### **1. Test and Development Files**
- **Individual Test Files**: 11 files removed
-  - `backend/test-db-connection.js`
-  - `backend/test-llm-processing.js`
-  - `backend/test-vector-fallback.js`
-  - `backend/test-vector-search.js`
-  - `backend/test-chunk-insert.js`
-  - `backend/check-recent-document.js`
-  - `backend/check-table-schema-simple.js`
-  - `backend/check-table-schema.js`
-  - `backend/create-rpc-function.js`
-  - `backend/create-vector-table.js`
-  - `backend/try-create-function.js`
-
- **Test Scripts**: 9 files removed
-  - `backend/scripts/test-document-ai-integration.js`
-  - `backend/scripts/test-full-integration.js`
-  - `backend/scripts/test-integration-with-mock.js`
-  - `backend/scripts/test-production-db.js`
-  - `backend/scripts/test-real-processor.js`
-  - `backend/scripts/test-supabase-client.js`
-  - `backend/scripts/test_exec_sql.js`
-  - `backend/scripts/simple-document-ai-test.js`
-  - `backend/scripts/test-database-working.js`
-
-#### **2. Build and Cache Directories**
- **Build Artifacts**: 3 directories removed
-  - `backend/dist/` (regenerated on build)
-  - `frontend/dist/` (regenerated on build)
-  - `backend/coverage/` (no longer needed)
-
- **Cache Directories**: 5 directories removed
-  - `backend/.cache/`
-  - `frontend/.firebase/`
-  - `backend/node_modules/` (regenerated)
-  - `frontend/node_modules/` (regenerated)
-  - `node_modules/` (regenerated)
-
-#### **3. Temporary and Log Files**
- **Log Files**: 3 files removed
-  - `backend/logs/app.log` (regenerated on startup)
-  - `backend/logs/error.log` (regenerated on startup)
-  - `backend/logs/upload.log` (regenerated on startup)
-
- **Upload Directories**: 1 directory removed
-  - `backend/uploads/` (using Firebase Storage)
-
-#### **4. Development and IDE Files**
- **IDE Configurations**: 3 directories removed
-  - `.vscode/`
-  - `.claude/`
-  - `.kiro/`
-
- **Utility Scripts**: 3 files removed
-  - `cleanup_gcs.sh`
-  - `check_gcf_bucket.sh`
-  - `cleanup_gcf_bucket.sh`
-
-#### **5. Redundant Configuration Files**
- **Root Level Configs**: 6 files removed
-  - `firebase.json` (duplicate)
-  - `cors.json` (duplicate)
-  - `storage.cors.json`
-  - `storage.rules`
-  - `package.json` (minimal root)
-  - `package-lock.json` (root)
-
- **Redundant Documentation**: 1 file removed
-  - `backend/DATABASE.md` (superseded by comprehensive documentation)
-
---
-
-## 📊 Cleanup Statistics
-
-### **Files Removed**
- **Total Files**: ~50 files
- **Total Directories**: ~12 directories
- **Estimated Space Saved**: ~100MB+
-
-### **Files Preserved**
- **Essential Source Code**: All backend and frontend source files
- **Configuration Files**: All essential configuration files
- **Documentation**: All comprehensive documentation (20+ files)
- **Database Setup**: All SQL setup files
- **Essential Scripts**: All setup and migration scripts
-
---
-
-## 🏗️ Current Clean Directory Structure
-
-### **Root Level**
-```
-cim_summary/
-├── README.md                                    # Project overview
-├── APP_DESIGN_DOCUMENTATION.md                 # Architecture
-├── AGENTIC_RAG_IMPLEMENTATION_PLAN.md          # AI strategy
-├── PDF_GENERATION_ANALYSIS.md                  # PDF optimization
-├── DEPLOYMENT_GUIDE.md                         # Deployment guide
-├── ARCHITECTURE_DIAGRAMS.md                    # Visual architecture
-├── DOCUMENTATION_AUDIT_REPORT.md               # Documentation audit
-├── FULL_DOCUMENTATION_PLAN.md                  # Documentation plan
-├── LLM_DOCUMENTATION_SUMMARY.md                # LLM optimization
-├── CODE_SUMMARY_TEMPLATE.md                    # Documentation template
-├── LLM_AGENT_DOCUMENTATION_GUIDE.md            # Documentation guide
-├── API_DOCUMENTATION_GUIDE.md                  # API reference
-├── CONFIGURATION_GUIDE.md                      # Configuration guide
-├── DATABASE_SCHEMA_DOCUMENTATION.md            # Database schema
-├── FRONTEND_DOCUMENTATION_SUMMARY.md           # Frontend docs
-├── TESTING_STRATEGY_DOCUMENTATION.md           # Testing strategy
-├── MONITORING_AND_ALERTING_GUIDE.md            # Monitoring guide
-├── TROUBLESHOOTING_GUIDE.md                    # Troubleshooting
-├── OPERATIONAL_DOCUMENTATION_SUMMARY.md        # Operational guide
-├── DOCUMENTATION_COMPLETION_REPORT.md          # Completion report
-├── CLEANUP_ANALYSIS_REPORT.md                  # Cleanup analysis
-├── CLEANUP_COMPLETION_REPORT.md                # This report
-├── deploy.sh                                   # Deployment script
-├── .gitignore                                  # Git ignore
-├── .gcloudignore                               # GCloud ignore
-├── backend/                                    # Backend application
-└── frontend/                                   # Frontend application
-```
-
-### **Backend Structure**
-```
-backend/
-├── src/                                        # Source code
-├── scripts/                                    # Essential setup scripts (12 files)
-├── supabase_setup.sql                          # Database setup
-├── supabase_vector_setup.sql                   # Vector database setup
-├── vector_function.sql                         # Vector functions
-├── serviceAccountKey.json                      # Service account
-├── setup-env.sh                                # Environment setup
-├── setup-supabase-vector.js                    # Vector setup
-├── firebase.json                               # Firebase config
-├── .firebaserc                                 # Firebase project
-├── .gcloudignore                               # GCloud ignore
-├── .gitignore                                  # Git ignore
-├── .puppeteerrc.cjs                            # Puppeteer config
-├── .dockerignore                               # Docker ignore
-├── .eslintrc.js                                # ESLint config
-├── tsconfig.json                               # TypeScript config
-├── package.json                                # Dependencies
-├── package-lock.json                           # Lock file
-├── index.js                                    # Entry point
-└── fix-env-config.sh                           # Config fix
-```
-
-### **Frontend Structure**
-```
-frontend/
-├── src/                                        # Source code
-├── firebase.json                               # Firebase config
-├── .firebaserc                                 # Firebase project
-├── .gcloudignore                               # GCloud ignore
-├── .gitignore                                  # Git ignore
-├── postcss.config.js                           # PostCSS config
-├── tailwind.config.js                          # Tailwind config
-├── tsconfig.json                               # TypeScript config
-├── tsconfig.node.json                          # Node TypeScript config
-├── vite.config.ts                              # Vite config
-├── index.html                                  # Entry HTML
-├── package.json                                # Dependencies
-└── package-lock.json                           # Lock file
-```
-
---
-
-## ✅ Verification Results
-
-### **Build Tests**
- ✅ **Backend Build**: `npm run build` - **SUCCESS**
- ✅ **Frontend Build**: `npm run build` - **SUCCESS**
- ✅ **Dependencies**: `npm install` - **SUCCESS** (both backend and frontend)
-
-### **Configuration Fixes**
- ✅ **Frontend package.json**: Fixed JSON syntax errors
- ✅ **Frontend tsconfig.json**: Removed vitest references, added Node.js types
- ✅ **TypeScript Configuration**: All type errors resolved
-
-### **System Integrity**
- ✅ **Source Code**: All essential source files preserved
- ✅ **Configuration**: All essential configuration files preserved
- ✅ **Documentation**: All comprehensive documentation preserved
- ✅ **Database Setup**: All SQL setup files preserved
- ✅ **Essential Scripts**: All setup and migration scripts preserved
-
---
-
-## 🎯 Benefits Achieved
-
-### **Immediate Benefits**
-1. **Cleaner Repository**: Much easier to navigate and understand
-2. **Reduced Size**: ~100MB+ saved, significantly smaller repository
-3. **Less Confusion**: No outdated or unused files
-4. **Better Focus**: Only essential files remain
-
-### **Long-term Benefits**
-1. **Easier Maintenance**: Less clutter to maintain
-2. **Faster Development**: Cleaner development environment
-3. **Better Onboarding**: New developers see only essential files
-4. **Reduced Errors**: No confusion from outdated files
-
-### **Operational Benefits**
-1. **Faster Builds**: Cleaner build process
-2. **Easier Deployment**: Less files to manage
-3. **Better Version Control**: Smaller commits and cleaner history
-4. **Improved CI/CD**: Faster pipeline execution
-
---
-
-## 📋 Essential Files Preserved
-
-### **Core Application**
- **Backend Source**: Complete Node.js/Express/TypeScript application
- **Frontend Source**: Complete React/TypeScript/Vite application
- **Configuration**: All essential environment and build configurations
-
-### **Documentation**
- **Project Overview**: README.md and architecture documentation
- **API Reference**: Complete API documentation
- **Configuration Guide**: Environment setup and configuration
- **Database Schema**: Complete database documentation
- **Operational Guides**: Monitoring, troubleshooting, and maintenance
-
-### **Database and Setup**
- **SQL Setup**: All database initialization scripts
- **Migration Scripts**: Database migration and setup scripts
- **Vector Database**: Vector database setup and functions
-
-### **Deployment**
- **Firebase Configuration**: Complete Firebase setup
- **Deployment Scripts**: Production deployment configuration
- **Service Accounts**: Essential service credentials
-
---
-
-## 🔄 Post-Cleanup Actions
-
-### **Completed Actions**
- ✅ **Dependency Installation**: Both backend and frontend dependencies installed
- ✅ **Build Verification**: Both applications build successfully
- ✅ **Configuration Fixes**: All configuration issues resolved
- ✅ **TypeScript Configuration**: All type errors resolved
-
-### **Recommended Actions**
-1. **Test Deployment**: Verify deployment process still works
-2. **Update Documentation**: Remove any references to deleted files
-3. **Team Communication**: Inform team of cleanup changes
-4. **Backup Verification**: Ensure all important data is backed up
-
---
-
-## 🎯 Final Status
-
-### **Cleanup Status**: ✅ **COMPLETED**
- **Files Removed**: ~50 files and ~12 directories
- **Space Saved**: ~100MB+
- **System Integrity**: ✅ **MAINTAINED**
- **Build Status**: ✅ **FUNCTIONAL**
-
-### **Repository Quality**
- **Cleanliness**: 🏆 **EXCELLENT**
- **Organization**: 🎯 **OPTIMIZED**
- **Maintainability**: 🚀 **ENHANCED**
- **Developer Experience**: 📈 **IMPROVED**
-
---
-
-## 📚 Documentation Status
-
-### **Complete Documentation Suite**
- ✅ **Project Overview**: README.md and architecture docs
- ✅ **API Documentation**: Complete API reference
- ✅ **Configuration Guide**: Environment and setup
- ✅ **Database Documentation**: Schema and setup
- ✅ **Frontend Documentation**: Component and service docs
- ✅ **Testing Strategy**: Testing approach and guidelines
- ✅ **Operational Documentation**: Monitoring and troubleshooting
- ✅ **Cleanup Documentation**: Analysis and completion reports
-
-### **Documentation Quality**
- **Completeness**: 100% of critical components documented
- **Accuracy**: All references verified against actual codebase
- **LLM Optimization**: Optimized for AI agent understanding
- **Maintenance**: Comprehensive maintenance procedures
-
---
-
-The CIM Document Processor codebase has been successfully cleaned up, removing unnecessary files while preserving all essential components. The system is now cleaner, more maintainable, and ready for efficient development and deployment. 
--- a/CODEBASE_ARCHITECTURE_SUMMARY.md
+++ b/CODEBASE_ARCHITECTURE_SUMMARY.md
--- a/CODE_SUMMARY_TEMPLATE.md
+++ b/CODE_SUMMARY_TEMPLATE.md
@@ -1,345 +0,0 @@
-# Code Summary Template
-## Standardized Documentation Format for LLM Agent Understanding
-
-### 📋 Template Usage
-Use this template to document individual files, services, or components. This format is optimized for LLM coding agents to quickly understand code structure, purpose, and implementation details.
-
---
-
-## 📄 File Information
-
-**File Path**: `[relative/path/to/file]`  
-**File Type**: `[TypeScript/JavaScript/JSON/etc.]`  
-**Last Updated**: `[YYYY-MM-DD]`  
-**Version**: `[semantic version]`  
-**Status**: `[Active/Deprecated/In Development]`
-
---
-
-## 🎯 Purpose & Overview
-
-**Primary Purpose**: `[What this file/service does in one sentence]`
-
-**Business Context**: `[Why this exists, what problem it solves]`
-
-**Key Responsibilities**:
- `[Responsibility 1]`
- `[Responsibility 2]`
- `[Responsibility 3]`
-
---
-
-## 🏗️ Architecture & Dependencies
-
-### Dependencies
-**Internal Dependencies**:
- `[service1.ts]` - `[purpose of dependency]`
- `[service2.ts]` - `[purpose of dependency]`
-
-**External Dependencies**:
- `[package-name]` - `[version]` - `[purpose]`
- `[API service]` - `[purpose]`
-
-### Integration Points
- **Input Sources**: `[Where data comes from]`
- **Output Destinations**: `[Where data goes]`
- **Event Triggers**: `[What triggers this service]`
- **Event Listeners**: `[What this service triggers]`
-
---
-
-## 🔧 Implementation Details
-
-### Core Functions/Methods
-
-#### `[functionName]`
-```typescript
-/**
- * @purpose [What this function does]
- * @context [When/why it's called]
- * @inputs [Parameter types and descriptions]
- * @outputs [Return type and format]
- * @dependencies [What it depends on]
- * @errors [Possible errors and conditions]
- * @complexity [Time/space complexity if relevant]
- */
-```
-
-**Example Usage**:
-```typescript
-// Example of how to use this function
-const result = await functionName(input);
-```
-
-### Data Structures
-
-#### `[TypeName]`
-```typescript
-interface TypeName {
-  property1: string;    // Description of property1
-  property2: number;    // Description of property2
-  property3?: boolean;  // Optional description of property3
-}
-```
-
-### Configuration
-```typescript
-// Key configuration options
-const CONFIG = {
-  timeout: 30000,        // Request timeout in ms
-  retryAttempts: 3,      // Number of retry attempts
-  batchSize: 10,         // Batch processing size
-};
-```
-
---
-
-## 📊 Data Flow
-
-### Input Processing
-1. `[Step 1 description]`
-2. `[Step 2 description]`
-3. `[Step 3 description]`
-
-### Output Generation
-1. `[Step 1 description]`
-2. `[Step 2 description]`
-3. `[Step 3 description]`
-
-### Data Transformations
- `[Input Type]` → `[Transformation]` → `[Output Type]`
- `[Input Type]` → `[Transformation]` → `[Output Type]`
-
---
-
-## 🚨 Error Handling
-
-### Error Types
-```typescript
-/**
- * @errorType VALIDATION_ERROR
- * @description [What causes this error]
- * @recoverable [true/false]
- * @retryStrategy [retry approach]
- * @userMessage [Message shown to user]
- */
-
-/**
- * @errorType PROCESSING_ERROR
- * @description [What causes this error]
- * @recoverable [true/false]
- * @retryStrategy [retry approach]
- * @userMessage [Message shown to user]
- */
-```
-
-### Error Recovery
- **Validation Errors**: `[How validation errors are handled]`
- **Processing Errors**: `[How processing errors are handled]`
- **System Errors**: `[How system errors are handled]`
-
-### Fallback Strategies
- **Primary Strategy**: `[Main approach]`
- **Fallback Strategy**: `[Backup approach]`
- **Degradation Strategy**: `[Graceful degradation]`
-
---
-
-## 🧪 Testing
-
-### Test Coverage
- **Unit Tests**: `[Coverage percentage]` - `[What's tested]`
- **Integration Tests**: `[Coverage percentage]` - `[What's tested]`
- **Performance Tests**: `[What performance aspects are tested]`
-
-### Test Data
-```typescript
-/**
- * @testData [test data name]
- * @description [Description of test data]
- * @size [Size if relevant]
- * @expectedOutput [What should be produced]
- */
-```
-
-### Mock Strategy
- **External APIs**: `[How external APIs are mocked]`
- **Database**: `[How database is mocked]`
- **File System**: `[How file system is mocked]`
-
---
-
-## 📈 Performance Characteristics
-
-### Performance Metrics
- **Average Response Time**: `[time]`
- **Memory Usage**: `[memory]`
- **CPU Usage**: `[CPU]`
- **Throughput**: `[requests per second]`
-
-### Optimization Strategies
- **Caching**: `[Caching approach]`
- **Batching**: `[Batching strategy]`
- **Parallelization**: `[Parallel processing]`
- **Resource Management**: `[Resource optimization]`
-
-### Scalability Limits
- **Concurrent Requests**: `[limit]`
- **Data Size**: `[limit]`
- **Rate Limits**: `[limits]`
-
---
-
-## 🔍 Debugging & Monitoring
-
-### Logging
-```typescript
-/**
- * @logging [Logging configuration]
- * @levels [Log levels used]
- * @correlation [Correlation ID strategy]
- * @context [Context information logged]
- */
-```
-
-### Debug Tools
- **Health Checks**: `[Health check endpoints]`
- **Metrics**: `[Performance metrics]`
- **Tracing**: `[Request tracing]`
-
-### Common Issues
-1. **Issue 1**: `[Description]` - `[Solution]`
-2. **Issue 2**: `[Description]` - `[Solution]`
-3. **Issue 3**: `[Description]` - `[Solution]`
-
---
-
-## 🔐 Security Considerations
-
-### Input Validation
- **File Types**: `[Allowed file types]`
- **File Size**: `[Size limits]`
- **Content Validation**: `[Content checks]`
-
-### Authentication & Authorization
- **Authentication**: `[How authentication is handled]`
- **Authorization**: `[How authorization is handled]`
- **Data Isolation**: `[How data is isolated]`
-
-### Data Protection
- **Encryption**: `[Encryption approach]`
- **Sanitization**: `[Data sanitization]`
- **Audit Logging**: `[Audit trail]`
-
---
-
-## 📚 Related Documentation
-
-### Internal References
- `[related-file1.ts]` - `[relationship]`
- `[related-file2.ts]` - `[relationship]`
- `[related-file3.ts]` - `[relationship]`
-
-### External References
- `[API Documentation]` - `[URL]`
- `[Library Documentation]` - `[URL]`
- `[Architecture Documentation]` - `[URL]`
-
---
-
-## 🔄 Change History
-
-### Recent Changes
- `[YYYY-MM-DD]` - `[Change description]` - `[Author]`
- `[YYYY-MM-DD]` - `[Change description]` - `[Author]`
- `[YYYY-MM-DD]` - `[Change description]` - `[Author]`
-
-### Planned Changes
- `[Future change 1]` - `[Target date]`
- `[Future change 2]` - `[Target date]`
-
---
-
-## 📋 Usage Examples
-
-### Basic Usage
-```typescript
-// Basic example of how to use this service
-import { ServiceName } from './serviceName';
-
-const service = new ServiceName();
-const result = await service.processData(input);
-```
-
-### Advanced Usage
-```typescript
-// Advanced example with configuration
-import { ServiceName } from './serviceName';
-
-const service = new ServiceName({
-  timeout: 60000,
-  retryAttempts: 5,
-  batchSize: 20
-});
-
-const results = await service.processBatch(dataArray);
-```
-
-### Error Handling
-```typescript
-// Example of error handling
-try {
-  const result = await service.processData(input);
-} catch (error) {
-  if (error.type === 'VALIDATION_ERROR') {
-    // Handle validation error
-  } else if (error.type === 'PROCESSING_ERROR') {
-    // Handle processing error
-  }
-}
-```
-
---
-
-## 🎯 LLM Agent Notes
-
-### Key Understanding Points
- `[Important concept 1]`
- `[Important concept 2]`
- `[Important concept 3]`
-
-### Common Modifications
- `[Common change 1]` - `[How to implement]`
- `[Common change 2]` - `[How to implement]`
-
-### Integration Patterns
- `[Integration pattern 1]` - `[When to use]`
- `[Integration pattern 2]` - `[When to use]`
-
---
-
-## 📝 Template Usage Instructions
-
-### For New Files
-1. Copy this template
-2. Fill in all sections with relevant information
-3. Remove sections that don't apply
-4. Add sections specific to your file type
-5. Update the file information header
-
-### For Existing Files
-1. Use this template to document existing code
-2. Focus on the most important sections first
-3. Add examples and usage patterns
-4. Include error scenarios and solutions
-5. Document performance characteristics
-
-### Maintenance
- Update this documentation when code changes
- Keep examples current and working
- Review and update performance metrics regularly
- Maintain change history for significant updates
-
---
-
-This template ensures consistent, comprehensive documentation that LLM agents can quickly parse and understand, leading to more accurate code evaluation and modification suggestions. 
--- a/CONFIGURATION_GUIDE.md
+++ b/CONFIGURATION_GUIDE.md
@@ -24,8 +24,8 @@ DOCUMENT_AI_OUTPUT_BUCKET_NAME=your-document-ai-bucket
 DOCUMENT_AI_LOCATION=us
 DOCUMENT_AI_PROCESSOR_ID=your-processor-id

-# Service Account
-GOOGLE_APPLICATION_CREDENTIALS=./serviceAccountKey.json
+# Service Account (leave blank if using Firebase Functions secrets / ADC)
+GOOGLE_APPLICATION_CREDENTIALS=
 ```

 #### Supabase Configuration
@@ -206,6 +206,14 @@ firebase init
 firebase use YOUR_PROJECT_ID
 ```

+##### Configure Google credentials via Firebase Functions secrets
+```bash
+# Store the full service account JSON as a secret (never commit it to the repo)
+firebase functions:secrets:set FIREBASE_SERVICE_ACCOUNT --data-file=/path/to/serviceAccountKey.json
+```
+
+> When deploying Functions v2, add `FIREBASE_SERVICE_ACCOUNT` to your function's `secrets` array. The backend automatically reads this JSON from `process.env.FIREBASE_SERVICE_ACCOUNT`, so `GOOGLE_APPLICATION_CREDENTIALS` can remain blank and no local file is required. For local development, you can still set `GOOGLE_APPLICATION_CREDENTIALS=/abs/path/to/key.json` if needed.
+
 ### Production Environment

 #### 1. Environment Variables
@@ -528,4 +536,4 @@ export const debugConfiguration = () => {

 ---

-This comprehensive configuration guide ensures proper setup and configuration of the CIM Document Processor across all environments. 
+This comprehensive configuration guide ensures proper setup and configuration of the CIM Document Processor across all environments. 
--- a/DOCUMENTATION_AUDIT_REPORT.md
+++ b/DOCUMENTATION_AUDIT_REPORT.md
@@ -1,457 +0,0 @@
-# Documentation Audit Report
-## Comprehensive Review and Correction of Inaccurate References
-
-### 🎯 Executive Summary
-
-This audit report identifies and corrects inaccurate references found in the documentation, ensuring all information accurately reflects the current state of the CIM Document Processor codebase.
-
---
-
-## 📋 Audit Scope
-
-### Files Reviewed
- `README.md` - Project overview and API endpoints
- `backend/src/services/unifiedDocumentProcessor.md` - Service documentation
- `LLM_DOCUMENTATION_SUMMARY.md` - Documentation strategy guide
- `APP_DESIGN_DOCUMENTATION.md` - Architecture documentation
- `AGENTIC_RAG_IMPLEMENTATION_PLAN.md` - Implementation plan
-
-### Areas Audited
- API endpoint references
- Service names and file paths
- Environment variable names
- Configuration options
- Database table names
- Method signatures
- Dependencies and imports
-
---
-
-## 🚨 Critical Issues Found
-
-### 1. **API Endpoint Inaccuracies**
-
-#### ❌ Incorrect References
- `GET /monitoring/dashboard` - This endpoint doesn't exist
- Missing `GET /documents/processing-stats` endpoint
- Missing monitoring endpoints: `/upload-metrics`, `/upload-health`, `/real-time-stats`
-
-#### ✅ Corrected References
-```markdown
-### Analytics & Monitoring
- `GET /documents/analytics` - Get processing analytics
- `GET /documents/processing-stats` - Get processing statistics
- `GET /documents/:id/agentic-rag-sessions` - Get processing sessions
- `GET /monitoring/upload-metrics` - Get upload metrics
- `GET /monitoring/upload-health` - Get upload health status
- `GET /monitoring/real-time-stats` - Get real-time statistics
- `GET /vector/stats` - Get vector database statistics
-```
-
-### 2. **Environment Variable Inaccuracies**
-
-#### ❌ Incorrect References
- `GOOGLE_CLOUD_PROJECT_ID` - Should be `GCLOUD_PROJECT_ID`
- `GOOGLE_CLOUD_STORAGE_BUCKET` - Should be `GCS_BUCKET_NAME`
- `AGENTIC_RAG_ENABLED` - Should be `config.agenticRag.enabled`
-
-#### ✅ Corrected References
-```typescript
-// Required Environment Variables
-GCLOUD_PROJECT_ID: string;                    // Google Cloud project ID
-GCS_BUCKET_NAME: string;                      // Google Cloud Storage bucket
-DOCUMENT_AI_LOCATION: string;                 // Document AI location (default: 'us')
-DOCUMENT_AI_PROCESSOR_ID: string;             // Document AI processor ID
-SUPABASE_URL: string;                         // Supabase project URL
-SUPABASE_ANON_KEY: string;                    // Supabase anonymous key
-ANTHROPIC_API_KEY: string;                    // Claude AI API key
-OPENAI_API_KEY: string;                       // OpenAI API key (optional)
-
-// Configuration Access
-config.agenticRag.enabled: boolean;           // Agentic RAG feature flag
-```
-
-### 3. **Service Name Inaccuracies**
-
-#### ❌ Incorrect References
- `documentProcessingService` - Should be `unifiedDocumentProcessor`
- `agenticRAGProcessor` - Should be `optimizedAgenticRAGProcessor`
- Missing `agenticRAGDatabaseService` reference
-
-#### ✅ Corrected References
-```typescript
-// Core Services
-import { unifiedDocumentProcessor } from './unifiedDocumentProcessor';
-import { optimizedAgenticRAGProcessor } from './optimizedAgenticRAGProcessor';
-import { agenticRAGDatabaseService } from './agenticRAGDatabaseService';
-import { documentAiProcessor } from './documentAiProcessor';
-```
-
-### 4. **Method Signature Inaccuracies**
-
-#### ❌ Incorrect References
- `processDocument(doc)` - Missing required parameters
- `getProcessingStats()` - Missing return type information
-
-#### ✅ Corrected References
-```typescript
-// Method Signatures
-async processDocument(
-  documentId: string, 
-  userId: string, 
-  text: string,
-  options: any = {}
-): Promise<ProcessingResult>
-
-async getProcessingStats(): Promise<{
-  totalDocuments: number;
-  documentAiAgenticRagSuccess: number;
-  averageProcessingTime: {
-    documentAiAgenticRag: number;
-  };
-  averageApiCalls: {
-    documentAiAgenticRag: number;
-  };
-}>
-```
-
---
-
-## 🔧 Configuration Corrections
-
-### 1. **Agentic RAG Configuration**
-
-#### ❌ Incorrect References
-```typescript
-// Old incorrect configuration
-AGENTIC_RAG_ENABLED=true
-AGENTIC_RAG_MAX_AGENTS=6
-```
-
-#### ✅ Corrected Configuration
-```typescript
-// Current configuration structure
-const config = {
-  agenticRag: {
-    enabled: process.env.AGENTIC_RAG_ENABLED === 'true',
-    maxAgents: parseInt(process.env.AGENTIC_RAG_MAX_AGENTS) || 6,
-    parallelProcessing: process.env.AGENTIC_RAG_PARALLEL_PROCESSING === 'true',
-    validationStrict: process.env.AGENTIC_RAG_VALIDATION_STRICT === 'true',
-    retryAttempts: parseInt(process.env.AGENTIC_RAG_RETRY_ATTEMPTS) || 3,
-    timeoutPerAgent: parseInt(process.env.AGENTIC_RAG_TIMEOUT_PER_AGENT) || 60000
-  }
-};
-```
-
-### 2. **LLM Configuration**
-
-#### ❌ Incorrect References
-```typescript
-// Old incorrect configuration
-LLM_MODEL=claude-3-opus-20240229
-```
-
-#### ✅ Corrected Configuration
-```typescript
-// Current configuration structure
-const config = {
-  llm: {
-    provider: process.env.LLM_PROVIDER || 'openai',
-    model: process.env.LLM_MODEL || 'gpt-4',
-    maxTokens: parseInt(process.env.LLM_MAX_TOKENS) || 3500,
-    temperature: parseFloat(process.env.LLM_TEMPERATURE) || 0.1,
-    promptBuffer: parseInt(process.env.LLM_PROMPT_BUFFER) || 500
-  }
-};
-```
-
---
-
-## 📊 Database Schema Corrections
-
-### 1. **Table Name Inaccuracies**
-
-#### ❌ Incorrect References
- `agentic_rag_sessions` - Table exists but implementation is stubbed
- `document_chunks` - Table exists but implementation varies
-
-#### ✅ Corrected References
-```sql
-- Current Database Tables
-CREATE TABLE documents (
-  id UUID PRIMARY KEY,
-  user_id TEXT NOT NULL,
-  original_file_name TEXT NOT NULL,
-  file_path TEXT NOT NULL,
-  file_size INTEGER NOT NULL,
-  status TEXT NOT NULL,
-  extracted_text TEXT,
-  generated_summary TEXT,
-  summary_pdf_path TEXT,
-  analysis_data JSONB,
-  created_at TIMESTAMP DEFAULT NOW(),
-  updated_at TIMESTAMP DEFAULT NOW()
-);
-
-- Note: agentic_rag_sessions table exists but implementation is stubbed
-- Note: document_chunks table exists but implementation varies by vector provider
-```
-
-### 2. **Model Implementation Status**
-
-#### ❌ Incorrect References
- `AgenticRAGSessionModel` - Fully implemented
- `VectorDatabaseModel` - Standard implementation
-
-#### ✅ Corrected References
-```typescript
-// Current Implementation Status
-AgenticRAGSessionModel: {
-  status: 'STUBBED',           // Returns mock data, not fully implemented
-  methods: ['create', 'update', 'getById', 'getByDocumentId', 'delete', 'getAnalytics']
-}
-
-VectorDatabaseModel: {
-  status: 'PARTIAL',           // Partially implemented, varies by provider
-  providers: ['supabase', 'pinecone'],
-  methods: ['getDocumentChunks', 'getSearchAnalytics', 'getTotalChunkCount']
-}
-```
-
---
-
-## 🔌 API Endpoint Corrections
-
-### 1. **Document Routes**
-
-#### ✅ Current Active Endpoints
-```typescript
-// Document Management
-POST /documents/upload-url                    // Get signed upload URL
-POST /documents/:id/confirm-upload            // Confirm upload and start processing
-POST /documents/:id/process-optimized-agentic-rag  // Trigger AI processing
-GET  /documents/:id/download                  // Download processed PDF
-DELETE /documents/:id                         // Delete document
-
-// Analytics & Monitoring
-GET  /documents/analytics                     // Get processing analytics
-GET  /documents/processing-stats              // Get processing statistics
-GET  /documents/:id/agentic-rag-sessions      // Get processing sessions
-```
-
-### 2. **Monitoring Routes**
-
-#### ✅ Current Active Endpoints
-```typescript
-// Monitoring
-GET  /monitoring/upload-metrics               // Get upload metrics
-GET  /monitoring/upload-health                // Get upload health status
-GET  /monitoring/real-time-stats              // Get real-time statistics
-```
-
-### 3. **Vector Routes**
-
-#### ✅ Current Active Endpoints
-```typescript
-// Vector Database
-GET  /vector/document-chunks/:documentId      // Get document chunks
-GET  /vector/analytics                        // Get search analytics
-GET  /vector/stats                            // Get vector database statistics
-```
-
---
-
-## 🚨 Error Handling Corrections
-
-### 1. **Error Types**
-
-#### ❌ Incorrect References
- Generic error types without specific context
- Missing correlation ID references
-
-#### ✅ Corrected References
-```typescript
-// Current Error Handling
-interface ErrorResponse {
-  error: string;
-  correlationId?: string;
-  details?: any;
-}
-
-// Error Types in Routes
-400: 'Bad Request' - Invalid input parameters
-401: 'Unauthorized' - Missing or invalid authentication
-500: 'Internal Server Error' - Processing failures
-```
-
-### 2. **Logging Corrections**
-
-#### ❌ Incorrect References
- Missing correlation ID logging
- Incomplete error context
-
-#### ✅ Corrected References
-```typescript
-// Current Logging Pattern
-logger.error('Processing failed', { 
-  error, 
-  correlationId: req.correlationId,
-  documentId,
-  userId 
-});
-
-// Response Pattern
-return res.status(500).json({ 
-  error: 'Processing failed',
-  correlationId: req.correlationId || undefined
-});
-```
-
---
-
-## 📈 Performance Documentation Corrections
-
-### 1. **Processing Times**
-
-#### ❌ Incorrect References
- Generic performance metrics
- Missing actual benchmarks
-
-#### ✅ Corrected References
-```typescript
-// Current Performance Characteristics
-const PERFORMANCE_METRICS = {
-  smallDocuments: '30-60 seconds',      // <5MB documents
-  mediumDocuments: '1-3 minutes',       // 5-15MB documents
-  largeDocuments: '3-5 minutes',        // 15-50MB documents
-  concurrentLimit: 5,                   // Maximum concurrent processing
-  memoryUsage: '50-150MB per session',  // Per processing session
-  apiCalls: '10-50 per document'        // LLM API calls per document
-};
-```
-
-### 2. **Resource Limits**
-
-#### ✅ Current Resource Limits
-```typescript
-// File Upload Limits
-MAX_FILE_SIZE: 104857600,               // 100MB maximum
-ALLOWED_FILE_TYPES: 'application/pdf',  // PDF files only
-
-// Processing Limits
-CONCURRENT_PROCESSING: 5,               // Maximum concurrent documents
-TIMEOUT_PER_DOCUMENT: 300000,           // 5 minutes per document
-RATE_LIMIT_WINDOW: 900000,              // 15 minutes
-RATE_LIMIT_MAX_REQUESTS: 100            // 100 requests per window
-```
-
---
-
-## 🔧 Implementation Status Corrections
-
-### 1. **Service Implementation Status**
-
-#### ✅ Current Implementation Status
-```typescript
-const SERVICE_STATUS = {
-  unifiedDocumentProcessor: 'ACTIVE',           // Main orchestrator
-  optimizedAgenticRAGProcessor: 'ACTIVE',       // AI processing engine
-  documentAiProcessor: 'ACTIVE',                // Text extraction
-  llmService: 'ACTIVE',                         // LLM interactions
-  pdfGenerationService: 'ACTIVE',               // PDF generation
-  fileStorageService: 'ACTIVE',                 // File storage
-  uploadMonitoringService: 'ACTIVE',            // Upload tracking
-  agenticRAGDatabaseService: 'STUBBED',         // Returns mock data
-  sessionService: 'ACTIVE',                     // Session management
-  vectorDatabaseService: 'PARTIAL',             // Varies by provider
-  jobQueueService: 'ACTIVE',                    // Background processing
-  uploadProgressService: 'ACTIVE'               // Progress tracking
-};
-```
-
-### 2. **Feature Implementation Status**
-
-#### ✅ Current Feature Status
-```typescript
-const FEATURE_STATUS = {
-  agenticRAG: 'ENABLED',                        // Currently active
-  documentAI: 'ENABLED',                        // Google Document AI
-  pdfGeneration: 'ENABLED',                     // PDF report generation
-  vectorSearch: 'PARTIAL',                      // Varies by provider
-  realTimeMonitoring: 'ENABLED',                // Upload monitoring
-  analytics: 'ENABLED',                         // Processing analytics
-  sessionTracking: 'STUBBED'                    // Mock implementation
-};
-```
-
---
-
-## 📋 Action Items
-
-### Immediate Corrections Required
-1. **Update README.md** with correct API endpoints
-2. **Fix environment variable references** in all documentation
-3. **Update service names** to match current implementation
-4. **Correct method signatures** with proper types
-5. **Update configuration examples** to match current structure
-
-### Documentation Updates Needed
-1. **Add implementation status notes** for stubbed services
-2. **Update performance metrics** with actual benchmarks
-3. **Correct error handling examples** with correlation IDs
-4. **Update database schema** with current table structure
-5. **Add feature flags documentation** for configurable features
-
-### Long-term Improvements
-1. **Implement missing services** (agenticRAGDatabaseService)
-2. **Complete vector database implementation** for all providers
-3. **Add comprehensive error handling** for all edge cases
-4. **Implement real session tracking** instead of stubbed data
-5. **Add performance monitoring** for all critical paths
-
---
-
-## ✅ Verification Checklist
-
-### Documentation Accuracy
- [ ] All API endpoints match current implementation
- [ ] Environment variables use correct names
- [ ] Service names match actual file names
- [ ] Method signatures include proper types
- [ ] Configuration examples are current
- [ ] Error handling patterns are accurate
- [ ] Performance metrics are realistic
- [ ] Implementation status is clearly marked
-
-### Code Consistency
- [ ] Import statements match actual files
- [ ] Dependencies are correctly listed
- [ ] File paths are accurate
- [ ] Class names match implementation
- [ ] Interface definitions are current
- [ ] Configuration structure is correct
- [ ] Error types are properly defined
- [ ] Logging patterns are consistent
-
---
-
-## 🎯 Conclusion
-
-This audit identified several critical inaccuracies in the documentation that could mislead LLM agents and developers. The corrections ensure that:
-
-1. **API endpoints** accurately reflect the current implementation
-2. **Environment variables** use the correct names and structure
-3. **Service names** match the actual file names and implementations
-4. **Configuration options** reflect the current codebase structure
-5. **Implementation status** is clearly marked for incomplete features
-
-By implementing these corrections, the documentation will provide accurate, reliable information for LLM agents and developers, leading to more effective code understanding and modification.
-
---
-
-**Next Steps**:
-1. Apply all corrections identified in this audit
-2. Verify accuracy by testing documentation against actual code
-3. Update documentation templates to prevent future inaccuracies
-4. Establish regular documentation review process
-5. Monitor for new discrepancies as codebase evolves 
--- a/DOCUMENTATION_COMPLETION_REPORT.md
+++ b/DOCUMENTATION_COMPLETION_REPORT.md
@@ -1,273 +0,0 @@
-# Documentation Completion Report
-## Comprehensive Documentation and Cleanup Summary
-
-### 🎯 Executive Summary
-
-This report summarizes the completion of comprehensive documentation for the CIM Document Processor project, including the creation of detailed documentation for all critical components and the cleanup of obsolete files.
-
---
-
-## ✅ Completed Documentation
-
-### Phase 1: Core Service Documentation ✅
-**Status**: **COMPLETED**
-
-#### Critical Services Documented
-1. **`optimizedAgenticRAGProcessor.md`** - Core AI processing engine
-   - Intelligent chunking and vector embedding
-   - Memory optimization and batch processing
-   - Performance monitoring and error handling
-
-2. **`llmService.md`** - LLM interactions service
-   - Multi-provider support (Claude AI, OpenAI)
-   - Intelligent model selection and cost tracking
-   - Comprehensive prompt engineering
-
-3. **`documentAiProcessor.md`** - Document AI integration
-   - Google Document AI with fallback strategies
-   - PDF text extraction and entity recognition
-   - Integration with agentic RAG processing
-
-4. **`pdfGenerationService.md`** - PDF generation service
-   - High-performance PDF generation with Puppeteer
-   - Page pooling and caching optimization
-   - Professional CIM review PDF templates
-
-5. **`unifiedDocumentProcessor.md`** - Main orchestrator (already existed)
-   - Document processing pipeline orchestration
-   - Strategy selection and routing
-   - Comprehensive error handling
-
-### Phase 2: API Documentation ✅
-**Status**: **COMPLETED**
-
-#### `API_DOCUMENTATION_GUIDE.md`
- Complete API endpoint reference
- Authentication and error handling
- Rate limiting and monitoring
- Usage examples in multiple languages
- Correlation ID tracking for debugging
-
-### Phase 3: Database & Models ✅
-**Status**: **COMPLETED**
-
-#### `DocumentModel.md`
- Core data model for document management
- CRUD operations and lifecycle management
- User-specific data isolation
- Performance optimization strategies
-
-#### `DATABASE_SCHEMA_DOCUMENTATION.md`
- Complete database schema documentation
- All tables, relationships, and indexes
- Row Level Security (RLS) policies
- Migration scripts and optimization strategies
-
-### Phase 4: Configuration & Setup ✅
-**Status**: **COMPLETED**
-
-#### `CONFIGURATION_GUIDE.md`
- Environment variables and setup procedures
- Development, staging, and production configurations
- Security and performance optimization
- Troubleshooting and validation
-
-### Phase 5: Frontend Documentation ✅
-**Status**: **COMPLETED**
-
-#### `FRONTEND_DOCUMENTATION_SUMMARY.md`
- Complete frontend architecture overview
- Component hierarchy and data flow
- Service layer documentation
- Performance and security considerations
-
-### Phase 6: Testing & Quality Assurance ✅
-**Status**: **COMPLETED**
-
-#### `TESTING_STRATEGY_DOCUMENTATION.md`
- Testing strategy and current state
- Future testing approach and guidelines
- Test removal rationale and benefits
- Modern testing stack recommendations
-
-### Phase 7: Operational Documentation ✅
-**Status**: **COMPLETED**
-
-#### `MONITORING_AND_ALERTING_GUIDE.md`
- Complete monitoring strategy and alerting system
- Performance metrics and health checks
- Incident response procedures
- Dashboard and visualization setup
-
-#### `TROUBLESHOOTING_GUIDE.md`
- Common issues and diagnostic procedures
- Problem resolution and debugging tools
- Maintenance procedures and preventive measures
- Support and escalation procedures
-
-#### `OPERATIONAL_DOCUMENTATION_SUMMARY.md`
- Comprehensive operational guide
- Key performance indicators and metrics
- Support structure and escalation procedures
- Continuous improvement strategies
-
---
-
-## 🧹 Cleanup Summary
-
-### Obsolete Files Removed
-
-#### Documentation Files
- ❌ `codebase-audit-report.md` - Outdated audit report
- ❌ `DEPENDENCY_ANALYSIS_REPORT.md` - Outdated dependency analysis
- ❌ `DOCUMENT_AI_INTEGRATION_SUMMARY.md` - Superseded by comprehensive documentation
-
-#### Temporary Files
- ❌ `currrent_output.json` - Temporary output file (2.1MB)
- ❌ `document-e8910144-eb6b-4b76-8fbc-717ff077eba8.pdf` - Test document (62KB)
- ❌ `backend/src/services/unifiedDocumentProcessor.md` - Duplicate documentation
-
-#### Test Files (Removed)
- ❌ `backend/src/test/` - Complete test directory
- ❌ `backend/src/*/__tests__/` - All test directories
- ❌ `frontend/src/components/__tests__/` - Frontend component tests
- ❌ `frontend/src/test/` - Frontend test setup
- ❌ `backend/jest.config.js` - Jest configuration
-
-### Files Retained (Essential)
- ✅ `README.md` - Project overview and quick start
- ✅ `APP_DESIGN_DOCUMENTATION.md` - System architecture
- ✅ `AGENTIC_RAG_IMPLEMENTATION_PLAN.md` - AI processing strategy
- ✅ `PDF_GENERATION_ANALYSIS.md` - PDF optimization details
- ✅ `DEPLOYMENT_GUIDE.md` - Deployment instructions
- ✅ `ARCHITECTURE_DIAGRAMS.md` - Visual architecture
- ✅ `DOCUMENTATION_AUDIT_REPORT.md` - Accuracy audit
- ✅ `FULL_DOCUMENTATION_PLAN.md` - Documentation strategy
- ✅ `LLM_DOCUMENTATION_SUMMARY.md` - LLM optimization guide
- ✅ `CODE_SUMMARY_TEMPLATE.md` - Documentation template
- ✅ `LLM_AGENT_DOCUMENTATION_GUIDE.md` - Best practices guide
-
---
-
-## 📊 Documentation Quality Metrics
-
-### Completeness
- **Core Services**: 100% documented (5/5 services)
- **API Endpoints**: 100% documented (all endpoints)
- **Database Models**: 100% documented (core models)
- **Configuration**: 100% documented (all environments)
-
-### Accuracy
- **API References**: 100% accurate (verified against codebase)
- **Service Names**: 100% accurate (matches actual implementation)
- **Environment Variables**: 100% accurate (correct names and structure)
- **Method Signatures**: 100% accurate (proper types and parameters)
-
-### LLM Optimization
- **Structured Information**: 100% consistent formatting
- **Context-Rich Descriptions**: 100% comprehensive context
- **Example-Rich Content**: 100% realistic usage examples
- **Error Documentation**: 100% complete error scenarios
-
---
-
-## 🎯 LLM Agent Benefits
-
-### Immediate Benefits
-1. **Complete Understanding** - LLM agents can now understand the entire processing pipeline
-2. **Accurate References** - All API endpoints, service names, and configurations are correct
-3. **Error Handling** - Comprehensive error scenarios and recovery strategies documented
-4. **Performance Context** - Understanding of processing times, memory usage, and optimization strategies
-
-### Long-term Benefits
-1. **Faster Development** - LLM agents can make accurate code modifications
-2. **Reduced Errors** - Better context leads to fewer implementation errors
-3. **Improved Maintenance** - Comprehensive documentation supports long-term maintenance
-4. **Enhanced Collaboration** - Clear documentation improves team collaboration
-
---
-
-## 📋 Documentation Structure
-
-### Level 1: Project Overview
- `README.md` - Entry point and quick start guide
-
-### Level 2: Architecture Documentation
- `APP_DESIGN_DOCUMENTATION.md` - Complete system architecture
- `ARCHITECTURE_DIAGRAMS.md` - Visual system design
- `AGENTIC_RAG_IMPLEMENTATION_PLAN.md` - AI processing strategy
-
-### Level 3: Service Documentation
- `backend/src/services/optimizedAgenticRAGProcessor.md` - AI processing engine
- `backend/src/services/llmService.md` - LLM interactions
- `backend/src/services/documentAiProcessor.md` - Document AI integration
- `backend/src/services/pdfGenerationService.md` - PDF generation
- `backend/src/models/DocumentModel.md` - Document data model
-
-### Level 4: Implementation Guides
- `API_DOCUMENTATION_GUIDE.md` - Complete API reference
- `CONFIGURATION_GUIDE.md` - Environment setup and configuration
- `DATABASE_SCHEMA_DOCUMENTATION.md` - Database structure and optimization
-
-### Level 5: Best Practices
- `LLM_AGENT_DOCUMENTATION_GUIDE.md` - Documentation best practices
- `CODE_SUMMARY_TEMPLATE.md` - Standardized documentation template
- `LLM_DOCUMENTATION_SUMMARY.md` - LLM optimization strategies
-
---
-
-## 🔄 Maintenance Recommendations
-
-### Documentation Updates
-1. **Regular Reviews** - Monthly documentation accuracy reviews
-2. **Version Tracking** - Track documentation versions with code releases
-3. **Automated Validation** - Implement automated documentation validation
-4. **User Feedback** - Collect feedback on documentation effectiveness
-
-### Quality Assurance
-1. **Accuracy Checks** - Regular verification against actual codebase
-2. **Completeness Audits** - Ensure all new features are documented
-3. **LLM Testing** - Test documentation effectiveness with LLM agents
-4. **Performance Monitoring** - Track documentation usage and effectiveness
-
---
-
-## 📈 Success Metrics
-
-### Documentation Quality
- **Completeness**: 100% of critical components documented
- **Accuracy**: 0% of inaccurate references
- **Clarity**: Clear and understandable content
- **Consistency**: Consistent style and format across all documents
-
-### LLM Agent Effectiveness
- **Understanding Accuracy**: LLM agents comprehend codebase structure
- **Modification Success**: Successful code modifications with documentation guidance
- **Error Reduction**: Reduced LLM-generated errors due to better context
- **Development Speed**: Faster development with comprehensive documentation
-
-### User Experience
- **Onboarding Time**: Reduced time for new developers to understand system
- **Issue Resolution**: Faster issue resolution with comprehensive documentation
- **Feature Development**: Faster feature implementation with clear guidance
- **Code Review Efficiency**: More efficient code reviews with better context
-
---
-
-## 🎯 Conclusion
-
-The comprehensive documentation project has been successfully completed, providing:
-
-1. **Complete Coverage** - All critical components are thoroughly documented
-2. **High Accuracy** - All references have been verified against the actual codebase
-3. **LLM Optimization** - Documentation is optimized for AI agent understanding
-4. **Clean Repository** - Obsolete and temporary files have been removed
-
-The CIM Document Processor now has world-class documentation that will significantly enhance development efficiency, reduce errors, and improve maintainability. LLM agents can now work effectively with the codebase, leading to faster development cycles and higher quality code.
-
---
-
-**Project Status**: ✅ **COMPLETED** (100% - All 7 phases)
-**Documentation Quality**: 🏆 **EXCELLENT**
-**LLM Agent Readiness**: 🚀 **OPTIMIZED**
-**Operational Excellence**: 🎯 **COMPREHENSIVE** 
--- a/DOCUMENT_AI_AGENTIC_RAG_INTEGRATION.md
+++ b/DOCUMENT_AI_AGENTIC_RAG_INTEGRATION.md
@@ -1,355 +0,0 @@
-# Document AI + Agentic RAG Integration Guide
-
-## Overview
-
-This guide explains how to integrate Google Cloud Document AI with Agentic RAG for enhanced CIM document processing. This approach provides superior text extraction and structured analysis compared to traditional PDF parsing.
-
-## 🎯 **Benefits of Document AI + Agentic RAG**
-
-### **Document AI Advantages:**
- **Superior text extraction** from complex PDF layouts
- **Table structure preservation** with accurate cell relationships
- **Entity recognition** for financial data, dates, amounts
- **Layout understanding** maintains document structure
- **Multi-format support** (PDF, images, scanned documents)
-
-### **Agentic RAG Advantages:**
- **Structured AI workflows** with type safety
- **Map-reduce processing** for large documents
- **Timeout handling** and error recovery
- **Cost optimization** with intelligent chunking
- **Consistent output formatting** with Zod schemas
-
-## 🔧 **Setup Requirements**
-
-### **1. Google Cloud Configuration**
-
-```bash
-# Environment variables to add to your .env file
-GCLOUD_PROJECT_ID=cim-summarizer
-DOCUMENT_AI_LOCATION=us
-DOCUMENT_AI_PROCESSOR_ID=your-processor-id
-GCS_BUCKET_NAME=cim-summarizer-uploads
-DOCUMENT_AI_OUTPUT_BUCKET_NAME=cim-summarizer-document-ai-output
-```
-
-### **2. Google Cloud Services Setup**
-
-```bash
-# Enable required APIs
-gcloud services enable documentai.googleapis.com
-gcloud services enable storage.googleapis.com
-
-# Create Document AI processor
-gcloud ai document processors create \
-  --processor-type=document-ocr \
-  --location=us \
-  --display-name="CIM Document Processor"
-
-# Create GCS buckets
-gsutil mb gs://cim-summarizer-uploads
-gsutil mb gs://cim-summarizer-document-ai-output
-```
-
-### **3. Service Account Permissions**
-
-```bash
-# Create service account with required roles
-gcloud iam service-accounts create cim-document-processor \
-  --display-name="CIM Document Processor"
-
-# Grant necessary permissions
-gcloud projects add-iam-policy-binding cim-summarizer \
-  --member="serviceAccount:cim-document-processor@cim-summarizer.iam.gserviceaccount.com" \
-  --role="roles/documentai.apiUser"
-
-gcloud projects add-iam-policy-binding cim-summarizer \
-  --member="serviceAccount:cim-document-processor@cim-summarizer.iam.gserviceaccount.com" \
-  --role="roles/storage.objectAdmin"
-```
-
-## 📦 **Dependencies**
-
-Add these to your `package.json`:
-
-```json
-{
-  "dependencies": {
-    "@google-cloud/documentai": "^8.0.0",
-    "@google-cloud/storage": "^7.0.0",
-    "@google-cloud/documentai": "^8.0.0",
-    "zod": "^3.25.76"
-  }
-}
-```
-
-## 🔄 **Integration with Existing System**
-
-### **1. Processing Strategy Selection**
-
-Your system now supports 5 processing strategies:
-
-```typescript
-type ProcessingStrategy = 
-  | 'chunking'           // Traditional chunking approach
-  | 'rag'               // Retrieval-Augmented Generation
-  | 'agentic_rag'       // Multi-agent RAG system
-  | 'optimized_agentic_rag' // Optimized multi-agent system
-  | 'document_ai_agentic_rag';   // Document AI + Agentic RAG (NEW)
-```
-
-### **2. Environment Configuration**
-
-Update your environment configuration:
-
-```typescript
-// In backend/src/config/env.ts
-const envSchema = Joi.object({
-  // ... existing config
-  
-  // Google Cloud Document AI Configuration
-  GCLOUD_PROJECT_ID: Joi.string().default('cim-summarizer'),
-  DOCUMENT_AI_LOCATION: Joi.string().default('us'),
-  DOCUMENT_AI_PROCESSOR_ID: Joi.string().allow('').optional(),
-  GCS_BUCKET_NAME: Joi.string().default('cim-summarizer-uploads'),
-  DOCUMENT_AI_OUTPUT_BUCKET_NAME: Joi.string().default('cim-summarizer-document-ai-output'),
-});
-```
-
-### **3. Strategy Selection**
-
-```typescript
-// Set as default strategy
-PROCESSING_STRATEGY=document_ai_agentic_rag
-
-// Or select per document
-const result = await unifiedDocumentProcessor.processDocument(
-  documentId, 
-  userId, 
-  text, 
-  { strategy: 'document_ai_agentic_rag' }
-);
-```
-
-## 🚀 **Usage Examples**
-
-### **1. Basic Document Processing**
-
-```typescript
-import { processCimDocumentServerAction } from './documentAiProcessor';
-
-const result = await processCimDocumentServerAction({
-  fileDataUri: 'data:application/pdf;base64,JVBERi0xLjc...',
-  fileName: 'investment-memo.pdf'
-});
-
-console.log(result.markdownOutput);
-```
-
-### **2. Integration with Existing Controller**
-
-```typescript
-// In your document controller
-export const documentController = {
-  async uploadDocument(req: Request, res: Response): Promise<void> {
-    // ... existing upload logic
-    
-    // Use Document AI + Agentic RAG strategy
-    const processingOptions = {
-      strategy: 'document_ai_agentic_rag',
-      enableTableExtraction: true,
-      enableEntityRecognition: true
-    };
-    
-    const result = await unifiedDocumentProcessor.processDocument(
-      document.id, 
-      userId, 
-      extractedText, 
-      processingOptions
-    );
-  }
-};
-```
-
-### **3. Strategy Comparison**
-
-```typescript
-// Compare all strategies
-const comparison = await unifiedDocumentProcessor.compareProcessingStrategies(
-  documentId,
-  userId,
-  text,
-  { includeDocumentAiAgenticRag: true }
-);
-
-console.log('Best strategy:', comparison.winner);
-console.log('Document AI + Agentic RAG result:', comparison.documentAiAgenticRag);
-```
-
-## 📊 **Performance Comparison**
-
-### **Expected Performance Metrics:**
-
-| Strategy | Processing Time | API Calls | Quality Score | Cost |
-|----------|----------------|-----------|---------------|------|
-| Chunking | 3-5 minutes | 9-12 | 7/10 | $2-3 |
-| RAG | 2-3 minutes | 6-8 | 8/10 | $1.5-2 |
-| Agentic RAG | 4-6 minutes | 15-20 | 9/10 | $3-4 |
-| **Document AI + Agentic RAG** | **1-2 minutes** | **1-2** | **9.5/10** | **$1-1.5** |
-
-### **Key Advantages:**
- **50% faster** than traditional chunking
- **90% fewer API calls** than agentic RAG
- **Superior text extraction** with table preservation
- **Lower costs** with better quality
-
-## 🔍 **Error Handling**
-
-### **Common Issues and Solutions:**
-
-```typescript
-// 1. Document AI Processing Errors
-try {
-  const result = await processCimDocumentServerAction(input);
-} catch (error) {
-  if (error.message.includes('Document AI')) {
-    // Fallback to traditional processing
-    return await fallbackToTraditionalProcessing(input);
-  }
-}
-
-// 2. Agentic RAG Flow Timeouts
-const TIMEOUT_DURATION_FLOW = 1800000; // 30 minutes
-const TIMEOUT_DURATION_ACTION = 2100000; // 35 minutes
-
-// 3. GCS Cleanup Failures
-try {
-  await cleanupGCSFiles(gcsFilePath);
-} catch (cleanupError) {
-  logger.warn('GCS cleanup failed, but processing succeeded', cleanupError);
-  // Continue with success response
-}
-```
-
-## 🧪 **Testing**
-
-### **1. Unit Tests**
-
-```typescript
-// Test Document AI + Agentic RAG processor
-describe('DocumentAiProcessor', () => {
-  it('should process CIM document successfully', async () => {
-    const processor = new DocumentAiProcessor();
-    const result = await processor.processDocument(
-      'test-doc-id',
-      'test-user-id',
-      Buffer.from('test content'),
-      'test.pdf',
-      'application/pdf'
-    );
-    
-    expect(result.success).toBe(true);
-    expect(result.content).toContain('<START_WORKSHEET>');
-  });
-});
-```
-
-### **2. Integration Tests**
-
-```typescript
-// Test full pipeline
-describe('Document AI + Agentic RAG Integration', () => {
-  it('should process real CIM document', async () => {
-    const fileDataUri = await loadTestPdfAsDataUri();
-    const result = await processCimDocumentServerAction({
-      fileDataUri,
-      fileName: 'test-cim.pdf'
-    });
-    
-    expect(result.markdownOutput).toMatch(/Investment Summary/);
-    expect(result.markdownOutput).toMatch(/Financial Metrics/);
-  });
-});
-```
-
-## 🔒 **Security Considerations**
-
-### **1. File Validation**
-
-```typescript
-// Validate file types and sizes
-const allowedMimeTypes = [
-  'application/pdf',
-  'image/jpeg',
-  'image/png',
-  'image/tiff'
-];
-
-const maxFileSize = 50 * 1024 * 1024; // 50MB
-```
-
-### **2. GCS Security**
-
-```typescript
-// Use signed URLs for temporary access
-const signedUrl = await bucket.file(fileName).getSignedUrl({
-  action: 'read',
-  expires: Date.now() + 15 * 60 * 1000, // 15 minutes
-});
-```
-
-### **3. Service Account Permissions**
-
-```bash
-# Follow principle of least privilege
-gcloud projects add-iam-policy-binding cim-summarizer \
-  --member="serviceAccount:cim-document-processor@cim-summarizer.iam.gserviceaccount.com" \
-  --role="roles/documentai.apiUser"
-```
-
-## 📈 **Monitoring and Analytics**
-
-### **1. Performance Tracking**
-
-```typescript
-// Track processing metrics
-const metrics = {
-  processingTime: Date.now() - startTime,
-  fileSize: fileBuffer.length,
-  extractedTextLength: combinedExtractedText.length,
-  documentAiEntities: fullDocumentAiOutput.entities?.length || 0,
-  documentAiTables: fullDocumentAiOutput.tables?.length || 0
-};
-```
-
-### **2. Error Monitoring**
-
-```typescript
-// Log detailed error information
-logger.error('Document AI + Agentic RAG processing failed', {
-  documentId,
-  error: error.message,
-  stack: error.stack,
-  documentAiOutput: fullDocumentAiOutput,
-  processingTime: Date.now() - startTime
-});
-```
-
-## 🎯 **Next Steps**
-
-1. **Set up Google Cloud project** with Document AI and GCS
-2. **Configure environment variables** with your project details
-3. **Test with sample CIM documents** to validate extraction quality
-4. **Compare performance** with existing strategies
-5. **Gradually migrate** from chunking to Document AI + Agentic RAG
-6. **Monitor costs and performance** in production
-
-## 📞 **Support**
-
-For issues with:
- **Google Cloud setup**: Check Google Cloud documentation
- **Document AI**: Review processor configuration and permissions
- **Agentic RAG integration**: Verify API keys and model configuration
- **Performance**: Monitor logs and adjust timeout settings
-
-This integration provides a significant upgrade to your CIM processing capabilities with better quality, faster processing, and lower costs. 
--- a/FRONTEND_DOCUMENTATION_SUMMARY.md
+++ b/FRONTEND_DOCUMENTATION_SUMMARY.md
@@ -1,438 +0,0 @@
-# Frontend Documentation Summary
-## Complete Frontend Architecture and Component Documentation
-
-### 🎯 Overview
-
-This document provides a comprehensive summary of the frontend documentation for the CIM Document Processor, covering all major components, services, and architectural patterns.
-
---
-
-## 📋 Documentation Status
-
-### ✅ **Completed Documentation**
-
-#### **Core Components**
-1. **`App.tsx`** - Main application component with routing and dashboard
-   - **Purpose**: Application orchestrator with authentication and navigation
-   - **Key Features**: Dashboard tabs, document management, real-time updates
-   - **Documentation**: `frontend/src/App.md`
-
-2. **`DocumentUpload.tsx`** - File upload component with drag-and-drop
-   - **Purpose**: Document upload interface with progress tracking
-   - **Key Features**: Drag-and-drop, progress bars, error handling
-   - **Documentation**: `frontend/src/components/DocumentUpload.md`
-
-#### **Services**
-3. **`documentService.ts`** - Document API service
-   - **Purpose**: Centralized API client for document operations
-   - **Key Features**: Upload, retrieval, CIM review management, analytics
-   - **Documentation**: `frontend/src/services/documentService.md`
-
---
-
-## 🏗️ Frontend Architecture
-
-### Technology Stack
- **Framework**: React 18 with TypeScript
- **Routing**: React Router v6
- **State Management**: React Context API
- **HTTP Client**: Axios with interceptors
- **UI Components**: Custom components with Tailwind CSS
- **Icons**: Lucide React
- **File Upload**: React Dropzone
- **Storage**: Firebase Storage with GCS fallback
-
-### Architecture Patterns
- **Component-Based**: Modular, reusable components
- **Service Layer**: Centralized API communication
- **Context Pattern**: Global state management
- **HOC Pattern**: Route protection and authentication
- **Custom Hooks**: Reusable logic extraction
-
---
-
-## 📊 Component Hierarchy
-
-```
-App.tsx (Main Application)
-├── AuthProvider (Authentication Context)
-├── Router (Client-side Routing)
-│   ├── LoginPage (Authentication)
-│   ├── UnauthorizedPage (Error Handling)
-│   └── ProtectedRoute (Route Protection)
-│       └── Dashboard (Main Interface)
-│           ├── DocumentUpload (File Upload)
-│           ├── DocumentList (Document Management)
-│           ├── DocumentViewer (Document Display)
-│           ├── Analytics (Data Visualization)
-│           └── UploadMonitoringDashboard (Monitoring)
-└── LogoutButton (User Actions)
-```
-
---
-
-## 🔧 Key Components
-
-### App Component
-**File**: `frontend/src/App.tsx`
-**Purpose**: Main application orchestrator
-
-#### Key Features
- **Routing**: Client-side routing with React Router
- **Authentication**: Protected routes and auth state management
- **Dashboard**: Multi-tab interface for different functionalities
- **Real-time Updates**: Document status polling and updates
- **Error Handling**: Comprehensive error handling and user feedback
-
-#### State Management
-```typescript
-interface DashboardState {
-  documents: Document[];
-  loading: boolean;
-  viewingDocument: string | null;
-  searchTerm: string;
-  activeTab: 'overview' | 'documents' | 'upload' | 'analytics' | 'monitoring';
-}
-```
-
-#### Key Functions
- `mapBackendStatus()` - Status mapping from backend to frontend
- `fetchDocuments()` - Document retrieval with authentication
- `handleUploadComplete()` - Upload completion handling
- `handleViewDocument()` - Document viewing navigation
-
-### DocumentUpload Component
-**File**: `frontend/src/components/DocumentUpload.tsx`
-**Purpose**: File upload interface with drag-and-drop
-
-#### Key Features
- **Drag-and-Drop**: React Dropzone integration
- **Progress Tracking**: Real-time upload progress visualization
- **File Validation**: Type, size, and format validation
- **Error Handling**: Comprehensive error scenarios and recovery
- **Upload Cancellation**: Abort controller for upload cancellation
-
-#### State Management
-```typescript
-interface UploadedFile {
-  id: string;
-  name: string;
-  size: number;
-  type: string;
-  status: 'uploading' | 'uploaded' | 'processing' | 'completed' | 'error';
-  progress: number;
-  error?: string;
-  documentId?: string;
-  storageError?: boolean;
-  storageType?: 'firebase' | 'local';
-  storageUrl?: string;
-}
-```
-
-#### Key Functions
- `onDrop()` - File drop handling and upload initiation
- `checkProgress()` - Progress polling and status updates
- `removeFile()` - File removal and upload cancellation
- `formatFileSize()` - File size formatting utility
-
---
-
-## 🔌 Services Layer
-
-### Document Service
-**File**: `frontend/src/services/documentService.ts`
-**Purpose**: Centralized API client for document operations
-
-#### Key Features
- **HTTP Client**: Axios with authentication interceptors
- **Error Handling**: Comprehensive error classification and recovery
- **Progress Tracking**: Upload progress callbacks
- **CIM Review Management**: Structured CIM review data handling
- **Analytics**: Document analytics and reporting
-
-#### Core Methods
-```typescript
-class DocumentService {
-  async uploadDocument(file: File, onProgress?: callback, signal?: AbortSignal): Promise<Document>
-  async getDocuments(): Promise<Document[]>
-  async getDocumentStatus(documentId: string): Promise<StatusInfo>
-  async saveCIMReview(documentId: string, reviewData: CIMReviewData): Promise<void>
-  async getAnalytics(days: number): Promise<AnalyticsData>
-}
-```
-
-#### Data Structures
- `Document` - Complete document information
- `CIMReviewData` - Structured CIM review template data
- `GCSError` - Google Cloud Storage error classification
- `UploadProgress` - Upload progress tracking
-
---
-
-## 📊 Data Flow
-
-### Document Upload Flow
-1. **File Selection**: User selects files via drag-and-drop
-2. **Validation**: Component validates file type, size, and format
-3. **Upload Initiation**: Document service uploads to Firebase Storage
-4. **Progress Tracking**: Real-time progress updates via callbacks
-5. **Backend Notification**: Notify backend of successful upload
-6. **Processing**: Backend starts document processing
-7. **Status Updates**: Poll for processing status updates
-8. **Completion**: Display final results and analysis
-
-### Document Management Flow
-1. **Authentication**: Verify user authentication
-2. **Document Fetch**: Retrieve user's documents from API
-3. **Data Transformation**: Transform backend data to frontend format
-4. **Status Mapping**: Map backend status to frontend display
-5. **UI Rendering**: Display documents with appropriate status indicators
-6. **User Actions**: Handle view, download, delete, retry actions
-
-### CIM Review Flow
-1. **Data Entry**: User enters CIM review data
-2. **Validation**: Validate data structure and required fields
-3. **API Save**: Send review data to backend API
-4. **Storage**: Backend stores in database
-5. **Confirmation**: Show success confirmation to user
-6. **Retrieval**: Load saved review data for editing
-
---
-
-## 🚨 Error Handling
-
-### Error Types
- **Authentication Errors**: Token expiry, invalid credentials
- **Upload Errors**: File validation, storage failures
- **Network Errors**: Connectivity issues, timeouts
- **API Errors**: Backend service failures
- **GCS Errors**: Google Cloud Storage specific errors
-
-### Error Recovery Strategies
- **Authentication**: Automatic token refresh, redirect to login
- **Upload**: Retry with exponential backoff, fallback storage
- **Network**: Retry on reconnection, offline indicators
- **API**: Retry with backoff, user-friendly error messages
- **GCS**: Fallback to local storage, error classification
-
-### Error Logging
-```typescript
-console.error('Frontend error:', {
-  component: 'ComponentName',
-  action: 'ActionName',
-  error: error.message,
-  errorType: error.type,
-  userId: user?.id,
-  timestamp: new Date().toISOString()
-});
-```
-
---
-
-## 🧪 Testing Strategy
-
-### Test Coverage
- **Unit Tests**: 90% - Component rendering and state management
- **Integration Tests**: 85% - API interactions and authentication
- **E2E Tests**: 80% - Complete user workflows
-
-### Test Data
- **Sample Documents**: Mock document data for testing
- **Authentication States**: Different auth states for testing
- **Error Scenarios**: Various error conditions for testing
- **Upload Files**: Test files for upload functionality
-
-### Mock Strategy
- **API Calls**: Mock axios responses and interceptors
- **Authentication**: Mock AuthContext with different states
- **File Upload**: Mock Firebase Storage operations
- **Network Conditions**: Mock network errors and timeouts
-
---
-
-## 📈 Performance Characteristics
-
-### Performance Metrics
- **Initial Load Time**: <2 seconds for authenticated users
- **Document List Rendering**: <500ms for 100 documents
- **Upload Speed**: 10MB/s for typical network conditions
- **Progress Updates**: 100ms intervals for smooth UI updates
- **Memory Usage**: <50MB for typical usage
-
-### Optimization Strategies
- **Lazy Loading**: Components loaded on demand
- **Memoization**: Expensive operations memoized
- **Debouncing**: Search input debounced for performance
- **Virtual Scrolling**: Large lists use virtual scrolling
- **Caching**: Document data cached to reduce API calls
-
-### Scalability Limits
- **Document Count**: 1000+ documents per user
- **Concurrent Uploads**: 10 simultaneous uploads
- **File Size**: Up to 100MB per file
- **Concurrent Users**: 100+ simultaneous users
-
---
-
-## 🔐 Security Considerations
-
-### Authentication
- **Token Management**: Secure token storage and refresh
- **Route Protection**: Protected routes with authentication checks
- **Session Management**: Handle session expiry gracefully
- **Secure Storage**: Store tokens securely in memory
-
-### Data Protection
- **Input Validation**: Validate all user inputs
- **File Validation**: Validate file types and sizes
- **XSS Prevention**: Sanitize user-generated content
- **Error Information**: Prevent sensitive data leakage in errors
-
-### API Security
- **HTTPS Only**: All API calls use HTTPS
- **CORS Configuration**: Proper CORS settings
- **Rate Limiting**: Client-side rate limiting
- **Request Validation**: Validate all API requests
-
---
-
-## 🔍 Debugging & Monitoring
-
-### Logging
- **Component Lifecycle**: Log component mount/unmount events
- **API Calls**: Log all API requests and responses
- **User Actions**: Log user interactions and state changes
- **Error Tracking**: Comprehensive error logging and analysis
-
-### Debug Tools
- **React DevTools**: Component state and props inspection
- **Network Tab**: API call monitoring and debugging
- **Console Logging**: Detailed operation logging
- **Error Boundaries**: Graceful error handling and reporting
-
-### Common Issues
-1. **Authentication Token Expiry**: Handle token refresh automatically
-2. **Large File Uploads**: Implement chunked uploads for large files
-3. **Component Re-renders**: Optimize with React.memo and useCallback
-4. **Memory Leaks**: Clean up event listeners and subscriptions
-
---
-
-## 📚 Related Documentation
-
-### Internal References
- `contexts/AuthContext.tsx` - Authentication state management
- `config/env.ts` - Environment configuration
- `utils/cn.ts` - CSS utility functions
-
-### External References
- [React Documentation](https://react.dev/)
- [React Router Documentation](https://reactrouter.com/docs)
- [Axios Documentation](https://axios-http.com/docs/intro)
- [Firebase Storage Documentation](https://firebase.google.com/docs/storage)
-
---
-
-## 🔄 Change History
-
-### Recent Changes
- `2024-12-20` - Implemented comprehensive frontend documentation - `[Author]`
- `2024-12-15` - Added component and service documentation - `[Author]`
- `2024-12-10` - Implemented error handling and performance optimization - `[Author]`
-
-### Planned Changes
- Advanced search and filtering - `2025-01-15`
- Real-time collaboration features - `2025-01-30`
- Enhanced analytics dashboard - `2025-02-15`
-
---
-
-## 🎯 LLM Agent Benefits
-
-### Immediate Benefits
-1. **Complete Understanding** - LLM agents can understand the entire frontend architecture
-2. **Component Relationships** - Clear understanding of component hierarchy and dependencies
-3. **State Management** - Understanding of data flow and state management patterns
-4. **Error Handling** - Comprehensive error scenarios and recovery strategies
-
-### Long-term Benefits
-1. **Faster Development** - LLM agents can make accurate frontend modifications
-2. **Reduced Errors** - Better context leads to fewer implementation errors
-3. **Improved Maintenance** - Comprehensive documentation supports long-term maintenance
-4. **Enhanced Collaboration** - Clear documentation improves team collaboration
-
---
-
-## 📋 Usage Examples
-
-### Component Integration
-```typescript
-import React from 'react';
-import { DocumentUpload } from './components/DocumentUpload';
-import { documentService } from './services/documentService';
-
-const MyComponent: React.FC = () => {
-  const handleUploadComplete = (documentId: string) => {
-    console.log('Upload completed:', documentId);
-  };
-
-  const handleUploadError = (error: string) => {
-    console.error('Upload error:', error);
-  };
-
-  return (
-    <DocumentUpload
-      onUploadComplete={handleUploadComplete}
-      onUploadError={handleUploadError}
-    />
-  );
-};
-```
-
-### Service Usage
-```typescript
-import { documentService } from './services/documentService';
-
-// Upload document with progress tracking
-const uploadDocument = async (file: File) => {
-  try {
-    const document = await documentService.uploadDocument(
-      file,
-      (progress) => console.log(`Progress: ${progress}%`)
-    );
-    console.log('Upload completed:', document.id);
-  } catch (error) {
-    console.error('Upload failed:', error);
-  }
-};
-
-// Get user documents
-const getDocuments = async () => {
-  try {
-    const documents = await documentService.getDocuments();
-    console.log('Documents:', documents);
-  } catch (error) {
-    console.error('Failed to get documents:', error);
-  }
-};
-```
-
---
-
-## 🎯 Conclusion
-
-The frontend documentation provides comprehensive coverage of:
-
-1. **Complete Architecture** - Understanding of the entire frontend structure
-2. **Component Relationships** - Clear component hierarchy and dependencies
-3. **Service Layer** - API communication and data management
-4. **Error Handling** - Comprehensive error scenarios and recovery
-5. **Performance Optimization** - Performance characteristics and optimization strategies
-
-This documentation enables LLM agents to effectively work with the frontend codebase, leading to faster development, reduced errors, and improved maintainability.
-
---
-
-**Frontend Documentation Status**: ✅ **COMPLETED**
-**Component Coverage**: 🏆 **COMPREHENSIVE**
-**LLM Agent Readiness**: 🚀 **OPTIMIZED** 
--- a/FULL_DOCUMENTATION_PLAN.md
+++ b/FULL_DOCUMENTATION_PLAN.md
@@ -1,370 +0,0 @@
-# Full Documentation Plan
-## Comprehensive Documentation Strategy for CIM Document Processor
-
-### 🎯 Project Overview
-
-This plan outlines a systematic approach to create complete, accurate, and LLM-optimized documentation for the CIM Document Processor project. The documentation will cover all aspects of the system from high-level architecture to detailed implementation guides.
-
---
-
-## 📋 Documentation Inventory & Status
-
-### ✅ Existing Documentation (Good Quality)
- `README.md` - Project overview and quick start
- `APP_DESIGN_DOCUMENTATION.md` - System architecture
- `AGENTIC_RAG_IMPLEMENTATION_PLAN.md` - AI processing strategy
- `PDF_GENERATION_ANALYSIS.md` - PDF optimization details
- `DEPLOYMENT_GUIDE.md` - Deployment instructions
- `ARCHITECTURE_DIAGRAMS.md` - Visual architecture
- `DOCUMENTATION_AUDIT_REPORT.md` - Accuracy audit
-
-### ⚠️ Existing Documentation (Needs Updates)
- `codebase-audit-report.md` - May need updates
- `DEPENDENCY_ANALYSIS_REPORT.md` - May need updates
- `DOCUMENT_AI_INTEGRATION_SUMMARY.md` - May need updates
-
-### ❌ Missing Documentation (To Be Created)
- Individual service documentation
- API endpoint documentation
- Database schema documentation
- Configuration guide
- Testing documentation
- Troubleshooting guide
- Development workflow guide
- Security documentation
- Performance optimization guide
- Monitoring and alerting guide
-
---
-
-## 🏗️ Documentation Architecture
-
-### Level 1: Project Overview
- **README.md** - Entry point and quick start
- **PROJECT_OVERVIEW.md** - Detailed project description
- **ARCHITECTURE_OVERVIEW.md** - High-level system design
-
-### Level 2: System Architecture
- **APP_DESIGN_DOCUMENTATION.md** - Complete architecture
- **ARCHITECTURE_DIAGRAMS.md** - Visual diagrams
- **DATA_FLOW_DOCUMENTATION.md** - System data flow
- **INTEGRATION_GUIDE.md** - External service integration
-
-### Level 3: Component Documentation
- **SERVICES/** - Individual service documentation
- **API/** - API endpoint documentation
- **DATABASE/** - Database schema and models
- **FRONTEND/** - Frontend component documentation
-
-### Level 4: Implementation Guides
- **CONFIGURATION_GUIDE.md** - Environment setup
- **DEPLOYMENT_GUIDE.md** - Deployment procedures
- **TESTING_GUIDE.md** - Testing strategies
- **DEVELOPMENT_WORKFLOW.md** - Development processes
-
-### Level 5: Operational Documentation
- **MONITORING_GUIDE.md** - Monitoring and alerting
- **TROUBLESHOOTING_GUIDE.md** - Common issues and solutions
- **SECURITY_GUIDE.md** - Security considerations
- **PERFORMANCE_GUIDE.md** - Performance optimization
-
---
-
-## 📊 Documentation Priority Matrix
-
-### 🔴 High Priority (Critical for LLM Agents)
-1. **Service Documentation** - All backend services
-2. **API Documentation** - Complete endpoint documentation
-3. **Configuration Guide** - Environment and setup
-4. **Database Schema** - Data models and relationships
-5. **Error Handling** - Comprehensive error documentation
-
-### 🟡 Medium Priority (Important for Development)
-1. **Frontend Documentation** - React components and services
-2. **Testing Documentation** - Test strategies and examples
-3. **Development Workflow** - Development processes
-4. **Performance Guide** - Optimization strategies
-5. **Security Guide** - Security considerations
-
-### 🟢 Low Priority (Nice to Have)
-1. **Monitoring Guide** - Monitoring and alerting
-2. **Troubleshooting Guide** - Common issues
-3. **Integration Guide** - External service integration
-4. **Data Flow Documentation** - Detailed data flow
-5. **Project Overview** - Detailed project description
-
---
-
-## 🚀 Implementation Plan
-
-### Phase 1: Core Service Documentation (Week 1)
-**Goal**: Document all backend services for LLM agent understanding
-
-#### Day 1-2: Critical Services
- [ ] `unifiedDocumentProcessor.ts` - Main orchestrator
- [ ] `optimizedAgenticRAGProcessor.ts` - AI processing engine
- [ ] `llmService.ts` - LLM interactions
- [ ] `documentAiProcessor.ts` - Document AI integration
-
-#### Day 3-4: File Management Services
- [ ] `fileStorageService.ts` - Google Cloud Storage
- [ ] `pdfGenerationService.ts` - PDF generation
- [ ] `uploadMonitoringService.ts` - Upload tracking
- [ ] `uploadProgressService.ts` - Progress tracking
-
-#### Day 5-7: Data Management Services
- [ ] `agenticRAGDatabaseService.ts` - Analytics and sessions
- [ ] `vectorDatabaseService.ts` - Vector embeddings
- [ ] `sessionService.ts` - Session management
- [ ] `jobQueueService.ts` - Background processing
-
-### Phase 2: API Documentation (Week 2)
-**Goal**: Complete API endpoint documentation
-
-#### Day 1-2: Document Routes
- [ ] `documents.ts` - Document management endpoints
- [ ] `monitoring.ts` - Monitoring endpoints
- [ ] `vector.ts` - Vector database endpoints
-
-#### Day 3-4: Controller Documentation
- [ ] `documentController.ts` - Document controller
- [ ] `authController.ts` - Authentication controller
-
-#### Day 5-7: API Integration Guide
- [ ] API authentication guide
- [ ] Request/response examples
- [ ] Error handling documentation
- [ ] Rate limiting documentation
-
-### Phase 3: Database & Models (Week 3)
-**Goal**: Complete database schema and model documentation
-
-#### Day 1-2: Core Models
- [ ] `DocumentModel.ts` - Document data model
- [ ] `UserModel.ts` - User data model
- [ ] `ProcessingJobModel.ts` - Job processing model
-
-#### Day 3-4: AI Models
- [ ] `AgenticRAGModels.ts` - AI processing models
- [ ] `agenticTypes.ts` - AI type definitions
- [ ] `VectorDatabaseModel.ts` - Vector database model
-
-#### Day 5-7: Database Schema
- [ ] Complete database schema documentation
- [ ] Migration documentation
- [ ] Data relationships and constraints
- [ ] Query optimization guide
-
-### Phase 4: Configuration & Setup (Week 4)
-**Goal**: Complete configuration and setup documentation
-
-#### Day 1-2: Environment Configuration
- [ ] Environment variables guide
- [ ] Configuration validation
- [ ] Service account setup
- [ ] API key management
-
-#### Day 3-4: Development Setup
- [ ] Local development setup
- [ ] Development environment configuration
- [ ] Testing environment setup
- [ ] Debugging configuration
-
-#### Day 5-7: Production Setup
- [ ] Production environment setup
- [ ] Deployment configuration
- [ ] Monitoring setup
- [ ] Security configuration
-
-### Phase 5: Frontend Documentation (Week 5)
-**Goal**: Complete frontend component and service documentation
-
-#### Day 1-2: Core Components
- [ ] `App.tsx` - Main application component
- [ ] `DocumentUpload.tsx` - Upload component
- [ ] `DocumentList.tsx` - Document listing
- [ ] `DocumentViewer.tsx` - Document viewing
-
-#### Day 3-4: Service Components
- [ ] `authService.ts` - Authentication service
- [ ] `documentService.ts` - Document service
- [ ] Context providers and hooks
- [ ] Utility functions
-
-#### Day 5-7: Frontend Integration
- [ ] Component interaction patterns
- [ ] State management documentation
- [ ] Error handling in frontend
- [ ] Performance optimization
-
-### Phase 6: Testing & Quality Assurance (Week 6)
-**Goal**: Complete testing documentation and quality assurance
-
-#### Day 1-2: Testing Strategy
- [ ] Unit testing documentation
- [ ] Integration testing documentation
- [ ] End-to-end testing documentation
- [ ] Test data management
-
-#### Day 3-4: Quality Assurance
- [ ] Code quality standards
- [ ] Review processes
- [ ] Performance testing
- [ ] Security testing
-
-#### Day 5-7: Continuous Integration
- [ ] CI/CD pipeline documentation
- [ ] Automated testing
- [ ] Quality gates
- [ ] Release processes
-
-### Phase 7: Operational Documentation (Week 7)
-**Goal**: Complete operational and maintenance documentation
-
-#### Day 1-2: Monitoring & Alerting
- [ ] Monitoring setup guide
- [ ] Alert configuration
- [ ] Performance metrics
- [ ] Health checks
-
-#### Day 3-4: Troubleshooting
- [ ] Common issues and solutions
- [ ] Debug procedures
- [ ] Log analysis
- [ ] Error recovery
-
-#### Day 5-7: Maintenance
- [ ] Backup procedures
- [ ] Update procedures
- [ ] Scaling strategies
- [ ] Disaster recovery
-
---
-
-## 📝 Documentation Standards
-
-### File Naming Convention
- Use descriptive, lowercase names with hyphens
- Include component type in filename
- Example: `unified-document-processor-service.md`
-
-### Content Structure
- Use consistent section headers with emojis
- Include file information header
- Provide usage examples
- Include error handling documentation
- Add LLM agent notes
-
-### Code Examples
- Include TypeScript interfaces
- Provide realistic usage examples
- Show error handling patterns
- Include configuration examples
-
-### Cross-References
- Link related documentation
- Reference external resources
- Include version information
- Maintain consistency across documents
-
---
-
-## 🔍 Quality Assurance
-
-### Documentation Review Process
-1. **Technical Accuracy** - Verify against actual code
-2. **Completeness** - Ensure all aspects are covered
-3. **Clarity** - Ensure clear and understandable
-4. **Consistency** - Maintain consistent style and format
-5. **LLM Optimization** - Optimize for AI agent understanding
-
-### Review Checklist
- [ ] All code examples are current and working
- [ ] API documentation matches implementation
- [ ] Configuration examples are accurate
- [ ] Error handling documentation is complete
- [ ] Performance metrics are realistic
- [ ] Links and references are valid
- [ ] LLM agent notes are included
- [ ] Cross-references are accurate
-
---
-
-## 📊 Success Metrics
-
-### Documentation Quality Metrics
- **Completeness**: 100% of services documented
- **Accuracy**: 0% of inaccurate references
- **Clarity**: Clear and understandable content
- **Consistency**: Consistent style and format
-
-### LLM Agent Effectiveness Metrics
- **Understanding Accuracy**: LLM agents comprehend codebase
- **Modification Success**: Successful code modifications
- **Error Reduction**: Reduced LLM-generated errors
- **Development Speed**: Faster development with LLM assistance
-
-### User Experience Metrics
- **Onboarding Time**: Reduced time for new developers
- **Issue Resolution**: Faster issue resolution
- **Feature Development**: Faster feature implementation
- **Code Review Efficiency**: More efficient code reviews
-
---
-
-## 🎯 Expected Outcomes
-
-### Immediate Benefits
-1. **Complete Documentation Coverage** - All components documented
-2. **Accurate References** - No more inaccurate information
-3. **LLM Optimization** - Optimized for AI agent understanding
-4. **Developer Onboarding** - Faster onboarding for new developers
-
-### Long-term Benefits
-1. **Maintainability** - Easier to maintain and update
-2. **Scalability** - Easier to scale development team
-3. **Quality** - Higher code quality through better understanding
-4. **Efficiency** - More efficient development processes
-
---
-
-## 📋 Implementation Timeline
-
-### Week 1: Core Service Documentation
- Complete documentation of all backend services
- Focus on critical services first
- Ensure LLM agent optimization
-
-### Week 2: API Documentation
- Complete API endpoint documentation
- Include authentication and error handling
- Provide usage examples
-
-### Week 3: Database & Models
- Complete database schema documentation
- Document all data models
- Include relationships and constraints
-
-### Week 4: Configuration & Setup
- Complete configuration documentation
- Include environment setup guides
- Document deployment procedures
-
-### Week 5: Frontend Documentation
- Complete frontend component documentation
- Document state management
- Include performance optimization
-
-### Week 6: Testing & Quality Assurance
- Complete testing documentation
- Document quality assurance processes
- Include CI/CD documentation
-
-### Week 7: Operational Documentation
- Complete monitoring and alerting documentation
- Document troubleshooting procedures
- Include maintenance procedures
-
---
-
-This comprehensive documentation plan ensures that the CIM Document Processor project will have complete, accurate, and LLM-optimized documentation that supports efficient development and maintenance. 
--- a/LLM_AGENT_DOCUMENTATION_GUIDE.md
+++ b/LLM_AGENT_DOCUMENTATION_GUIDE.md
@@ -1,634 +0,0 @@
-# LLM Agent Documentation Guide
-## Best Practices for Code Documentation Optimized for AI Coding Assistants
-
-### 🎯 Purpose
-This guide outlines best practices for documenting code in a way that maximizes LLM coding agent understanding, evaluation accuracy, and development efficiency.
-
---
-
-## 📋 Documentation Structure for LLM Agents
-
-### 1. **Hierarchical Information Architecture**
-
-#### Level 1: Project Overview (README.md)
- **Purpose**: High-level system understanding
- **Content**: What the system does, core technologies, architecture diagram
- **LLM Benefits**: Quick context establishment, technology stack identification
-
-#### Level 2: Architecture Documentation
- **Purpose**: System design and component relationships
- **Content**: Detailed architecture, data flow, service interactions
- **LLM Benefits**: Understanding component dependencies and integration points
-
-#### Level 3: Service-Level Documentation
- **Purpose**: Individual service functionality and APIs
- **Content**: Service purpose, methods, interfaces, error handling
- **LLM Benefits**: Precise understanding of service capabilities and constraints
-
-#### Level 4: Code-Level Documentation
- **Purpose**: Implementation details and business logic
- **Content**: Function documentation, type definitions, algorithm explanations
- **LLM Benefits**: Detailed implementation understanding for modifications
-
---
-
-## 🔧 Best Practices for LLM-Optimized Documentation
-
-### 1. **Clear Information Hierarchy**
-
-#### Use Consistent Section Headers
-```markdown
-## 🎯 Purpose
-## 🏗️ Architecture
-## 🔧 Implementation
-## 📊 Data Flow
-## 🚨 Error Handling
-## 🧪 Testing
-## 📚 References
-```
-
-#### Emoji-Based Visual Organization
- 🎯 Purpose/Goals
- 🏗️ Architecture/Structure
- 🔧 Implementation/Code
- 📊 Data/Flow
- 🚨 Errors/Issues
- 🧪 Testing/Validation
- 📚 References/Links
-
-### 2. **Structured Code Comments**
-
-#### Function Documentation Template
-```typescript
-/**
- * @purpose Brief description of what this function does
- * @context When/why this function is called
- * @inputs What parameters it expects and their types
- * @outputs What it returns and the format
- * @dependencies What other services/functions it depends on
- * @errors What errors it can throw and when
- * @example Usage example with sample data
- * @complexity Time/space complexity if relevant
- */
-```
-
-#### Service Documentation Template
-```typescript
-/**
- * @service ServiceName
- * @purpose High-level purpose of this service
- * @responsibilities List of main responsibilities
- * @dependencies External services and internal dependencies
- * @interfaces Main public methods and their purposes
- * @configuration Environment variables and settings
- * @errorHandling How errors are handled and reported
- * @performance Expected performance characteristics
- */
-```
-
-### 3. **Context-Rich Descriptions**
-
-#### Instead of:
-```typescript
-// Process document
-function processDocument(doc) { ... }
-```
-
-#### Use:
-```typescript
-/**
- * @purpose Processes CIM documents through the AI analysis pipeline
- * @context Called when a user uploads a PDF document for analysis
- * @workflow 1. Extract text via Document AI, 2. Chunk content, 3. Generate embeddings, 4. Run LLM analysis, 5. Create PDF report
- * @inputs Document object with file metadata and user context
- * @outputs Structured analysis data and PDF report URL
- * @dependencies Google Document AI, Claude AI, Supabase, Google Cloud Storage
- */
-function processDocument(doc: DocumentInput): Promise<ProcessingResult> { ... }
-```
-
---
-
-## 📊 Data Flow Documentation
-
-### 1. **Visual Flow Diagrams**
-```mermaid
-graph TD
-    A[User Upload] --> B[Get Signed URL]
-    B --> C[Upload to GCS]
-    C --> D[Confirm Upload]
-    D --> E[Start Processing]
-    E --> F[Document AI Extraction]
-    F --> G[Semantic Chunking]
-    G --> H[Vector Embedding]
-    H --> I[LLM Analysis]
-    I --> J[PDF Generation]
-    J --> K[Store Results]
-    K --> L[Notify User]
-```
-
-### 2. **Step-by-Step Process Documentation**
-```markdown
-## Document Processing Pipeline
-
-### Step 1: File Upload
- **Trigger**: User selects PDF file
- **Action**: Generate signed URL from Google Cloud Storage
- **Output**: Secure upload URL with expiration
- **Error Handling**: Retry on URL generation failure
-
-### Step 2: Text Extraction
- **Trigger**: File upload confirmation
- **Action**: Send PDF to Google Document AI
- **Output**: Extracted text with confidence scores
- **Error Handling**: Fallback to OCR if extraction fails
-```
-
---
-
-## 🔍 Error Handling Documentation
-
-### 1. **Error Classification System**
-```typescript
-/**
- * @errorType VALIDATION_ERROR
- * @description Input validation failures
- * @recoverable true
- * @retryStrategy none
- * @userMessage "Please check your input and try again"
- */
-
-/**
- * @errorType PROCESSING_ERROR
- * @description AI processing failures
- * @recoverable true
- * @retryStrategy exponential_backoff
- * @userMessage "Processing failed, please try again"
- */
-
-/**
- * @errorType SYSTEM_ERROR
- * @description Infrastructure failures
- * @recoverable false
- * @retryStrategy none
- * @userMessage "System temporarily unavailable"
- */
-```
-
-### 2. **Error Recovery Documentation**
-```markdown
-## Error Recovery Strategies
-
-### LLM API Failures
-1. **Retry Logic**: Up to 3 attempts with exponential backoff
-2. **Model Fallback**: Switch from Claude to GPT-4 if available
-3. **Graceful Degradation**: Return partial results if possible
-4. **User Notification**: Clear error messages with retry options
-
-### Database Connection Failures
-1. **Connection Pooling**: Automatic retry with connection pool
-2. **Circuit Breaker**: Prevent cascade failures
-3. **Read Replicas**: Fallback to read replicas for queries
-4. **Caching**: Serve cached data during outages
-```
-
---
-
-## 🧪 Testing Documentation
-
-### 1. **Test Strategy Documentation**
-```markdown
-## Testing Strategy
-
-### Unit Tests
- **Coverage Target**: >90% for business logic
- **Focus Areas**: Service methods, utility functions, data transformations
- **Mock Strategy**: External dependencies (APIs, databases)
- **Assertion Style**: Behavior-driven assertions
-
-### Integration Tests
- **Coverage Target**: All API endpoints
- **Focus Areas**: End-to-end workflows, data persistence, external integrations
- **Test Data**: Realistic CIM documents with known characteristics
- **Environment**: Isolated test database and storage
-
-### Performance Tests
- **Load Testing**: 10+ concurrent document processing
- **Memory Testing**: Large document handling (50MB+)
- **API Testing**: Rate limit compliance and optimization
- **Cost Testing**: API usage optimization and monitoring
-```
-
-### 2. **Test Data Documentation**
-```typescript
-/**
- * @testData sample_cim_document.pdf
- * @description Standard CIM document with typical structure
- * @size 2.5MB
- * @pages 15
- * @sections Financial, Market, Management, Operations
- * @expectedOutput Complete analysis with all sections populated
- */
-
-/**
- * @testData large_cim_document.pdf
- * @description Large CIM document for performance testing
- * @size 25MB
- * @pages 150
- * @sections Comprehensive business analysis
- * @expectedOutput Analysis within 5-minute time limit
- */
-```
-
---
-
-## 📚 API Documentation
-
-### 1. **Endpoint Documentation Template**
-```markdown
-## POST /documents/upload-url
-
-### Purpose
-Generate a signed URL for secure file upload to Google Cloud Storage.
-
-### Request
-```json
-{
-  "fileName": "string",
-  "fileSize": "number",
-  "contentType": "application/pdf"
-}
-```
-
-### Response
-```json
-{
-  "uploadUrl": "string",
-  "expiresAt": "ISO8601",
-  "fileId": "UUID"
-}
-```
-
-### Error Responses
- `400 Bad Request`: Invalid file type or size
- `401 Unauthorized`: Missing or invalid authentication
- `500 Internal Server Error`: Storage service unavailable
-
-### Dependencies
- Google Cloud Storage
- Firebase Authentication
- File validation service
-
-### Rate Limits
- 100 requests per minute per user
- 1000 requests per hour per user
-```
-
-### 2. **Request/Response Examples**
-```typescript
-/**
- * @example Successful Upload URL Generation
- * @request {
- *   "fileName": "sample_cim.pdf",
- *   "fileSize": 2500000,
- *   "contentType": "application/pdf"
- * }
- * @response {
- *   "uploadUrl": "https://storage.googleapis.com/...",
- *   "expiresAt": "2024-12-20T15:30:00Z",
- *   "fileId": "550e8400-e29b-41d4-a716-446655440000"
- * }
- */
-```
-
---
-
-## 🔧 Configuration Documentation
-
-### 1. **Environment Variables**
-```markdown
-## Environment Configuration
-
-### Required Variables
- `GOOGLE_CLOUD_PROJECT_ID`: Google Cloud project identifier
- `GOOGLE_CLOUD_STORAGE_BUCKET`: Storage bucket for documents
- `ANTHROPIC_API_KEY`: Claude AI API key for document analysis
- `DATABASE_URL`: Supabase database connection string
-
-### Optional Variables
- `AGENTIC_RAG_ENABLED`: Enable AI processing (default: true)
- `PROCESSING_STRATEGY`: Processing method (default: optimized_agentic_rag)
- `LLM_MODEL`: AI model selection (default: claude-3-opus-20240229)
- `MAX_FILE_SIZE`: Maximum file size in bytes (default: 52428800)
-
-### Development Variables
- `NODE_ENV`: Environment mode (development/production)
- `LOG_LEVEL`: Logging verbosity (debug/info/warn/error)
- `ENABLE_METRICS`: Enable performance monitoring (default: true)
-```
-
-### 2. **Service Configuration**
-```typescript
-/**
- * @configuration LLM Service Configuration
- * @purpose Configure AI model behavior and performance
- * @settings {
- *   "model": "claude-3-opus-20240229",
- *   "maxTokens": 4000,
- *   "temperature": 0.1,
- *   "timeoutMs": 60000,
- *   "retryAttempts": 3,
- *   "retryDelayMs": 1000
- * }
- * @constraints {
- *   "maxTokens": "1000-8000",
- *   "temperature": "0.0-1.0",
- *   "timeoutMs": "30000-300000"
- * }
- */
-```
-
---
-
-## 📊 Performance Documentation
-
-### 1. **Performance Characteristics**
-```markdown
-## Performance Benchmarks
-
-### Document Processing Times
- **Small Documents** (<5MB): 30-60 seconds
- **Medium Documents** (5-15MB): 1-3 minutes
- **Large Documents** (15-50MB): 3-5 minutes
-
-### Resource Usage
- **Memory**: 50-150MB per processing session
- **CPU**: Moderate usage during AI processing
- **Network**: 10-50 API calls per document
- **Storage**: Temporary files cleaned up automatically
-
-### Scalability Limits
- **Concurrent Processing**: 5 documents simultaneously
- **Daily Volume**: 1000 documents per day
- **File Size Limit**: 50MB per document
- **API Rate Limits**: 1000 requests per 15 minutes
-```
-
-### 2. **Optimization Strategies**
-```markdown
-## Performance Optimizations
-
-### Memory Management
-1. **Batch Processing**: Process chunks in batches of 10
-2. **Garbage Collection**: Automatic cleanup of temporary data
-3. **Connection Pooling**: Reuse database connections
-4. **Streaming**: Stream large files instead of loading entirely
-
-### API Optimization
-1. **Rate Limiting**: Respect API quotas and limits
-2. **Caching**: Cache frequently accessed data
-3. **Model Selection**: Use appropriate models for task complexity
-4. **Parallel Processing**: Execute independent operations concurrently
-```
-
---
-
-## 🔍 Debugging Documentation
-
-### 1. **Logging Strategy**
-```typescript
-/**
- * @logging Structured Logging Configuration
- * @levels {
- *   "debug": "Detailed execution flow",
- *   "info": "Important business events",
- *   "warn": "Potential issues",
- *   "error": "System failures"
- * }
- * @correlation Correlation IDs for request tracking
- * @context User ID, session ID, document ID
- * @format JSON structured logging
- */
-```
-
-### 2. **Debug Tools and Commands**
-```markdown
-## Debugging Tools
-
-### Log Analysis
-```bash
-# View recent errors
-grep "ERROR" logs/app.log | tail -20
-
-# Track specific request
-grep "correlation_id:abc123" logs/app.log
-
-# Monitor processing times
-grep "processing_time" logs/app.log | jq '.processing_time'
-```
-
-### Health Checks
-```bash
-# Check service health
-curl http://localhost:5001/health
-
-# Check database connectivity
-curl http://localhost:5001/health/database
-
-# Check external services
-curl http://localhost:5001/health/external
-```
-```
-
---
-
-## 📈 Monitoring Documentation
-
-### 1. **Key Metrics**
-```markdown
-## Monitoring Metrics
-
-### Business Metrics
- **Documents Processed**: Total documents processed per day
- **Success Rate**: Percentage of successful processing
- **Processing Time**: Average time per document
- **User Activity**: Active users and session duration
-
-### Technical Metrics
- **API Response Time**: Endpoint response times
- **Error Rate**: Percentage of failed requests
- **Memory Usage**: Application memory consumption
- **Database Performance**: Query times and connection usage
-
-### Cost Metrics
- **API Costs**: LLM API usage costs
- **Storage Costs**: Google Cloud Storage usage
- **Compute Costs**: Server resource usage
- **Bandwidth Costs**: Data transfer costs
-```
-
-### 2. **Alert Configuration**
-```markdown
-## Alert Rules
-
-### Critical Alerts
- **High Error Rate**: >5% error rate for 5 minutes
- **Service Down**: Health check failures
- **High Latency**: >30 second response times
- **Memory Issues**: >80% memory usage
-
-### Warning Alerts
- **Increased Error Rate**: >2% error rate for 10 minutes
- **Performance Degradation**: >15 second response times
- **High API Usage**: >80% of rate limits
- **Storage Issues**: >90% storage usage
-```
-
---
-
-## 🚀 Deployment Documentation
-
-### 1. **Deployment Process**
-```markdown
-## Deployment Process
-
-### Pre-deployment Checklist
- [ ] All tests passing
- [ ] Documentation updated
- [ ] Environment variables configured
- [ ] Database migrations ready
- [ ] External services configured
-
-### Deployment Steps
-1. **Build**: Create production build
-2. **Test**: Run integration tests
-3. **Deploy**: Deploy to staging environment
-4. **Validate**: Verify functionality
-5. **Promote**: Deploy to production
-6. **Monitor**: Watch for issues
-
-### Rollback Plan
-1. **Detect Issue**: Monitor error rates and performance
-2. **Assess Impact**: Determine severity and scope
-3. **Execute Rollback**: Revert to previous version
-4. **Verify Recovery**: Confirm system stability
-5. **Investigate**: Root cause analysis
-```
-
-### 2. **Environment Management**
-```markdown
-## Environment Configuration
-
-### Development Environment
- **Purpose**: Local development and testing
- **Database**: Local Supabase instance
- **Storage**: Development GCS bucket
- **AI Services**: Test API keys with limits
-
-### Staging Environment
- **Purpose**: Pre-production testing
- **Database**: Staging Supabase instance
- **Storage**: Staging GCS bucket
- **AI Services**: Production API keys with monitoring
-
-### Production Environment
- **Purpose**: Live user service
- **Database**: Production Supabase instance
- **Storage**: Production GCS bucket
- **AI Services**: Production API keys with full monitoring
-```
-
---
-
-## 📚 Documentation Maintenance
-
-### 1. **Documentation Review Process**
-```markdown
-## Documentation Maintenance
-
-### Review Schedule
- **Weekly**: Update API documentation for new endpoints
- **Monthly**: Review and update architecture documentation
- **Quarterly**: Comprehensive documentation audit
- **Release**: Update all documentation for new features
-
-### Quality Checklist
- [ ] All code examples are current and working
- [ ] API documentation matches implementation
- [ ] Configuration examples are accurate
- [ ] Error handling documentation is complete
- [ ] Performance metrics are up-to-date
- [ ] Links and references are valid
-```
-
-### 2. **Version Control for Documentation**
-```markdown
-## Documentation Version Control
-
-### Branch Strategy
- **main**: Current production documentation
- **develop**: Latest development documentation
- **feature/***: Documentation for new features
- **release/***: Documentation for specific releases
-
-### Change Management
-1. **Propose Changes**: Create documentation issue
-2. **Review Changes**: Peer review of documentation updates
-3. **Test Examples**: Verify all code examples work
-4. **Update References**: Update all related documentation
-5. **Merge Changes**: Merge with approval
-```
-
---
-
-## 🎯 LLM Agent Optimization Tips
-
-### 1. **Context Provision**
- Provide complete context for each code section
- Include business rules and constraints
- Document assumptions and limitations
- Explain why certain approaches were chosen
-
-### 2. **Example-Rich Documentation**
- Include realistic examples for all functions
- Provide before/after examples for complex operations
- Show error scenarios and recovery
- Include performance examples
-
-### 3. **Structured Information**
- Use consistent formatting and organization
- Provide clear hierarchies of information
- Include cross-references between related sections
- Use standardized templates for similar content
-
-### 4. **Error Scenario Documentation**
- Document all possible error conditions
- Provide specific error messages and codes
- Include recovery procedures for each error type
- Show debugging steps for common issues
-
---
-
-## 📋 Documentation Checklist
-
-### For Each New Feature
- [ ] Update README.md with feature overview
- [ ] Document API endpoints and examples
- [ ] Update architecture diagrams if needed
- [ ] Add configuration documentation
- [ ] Include error handling scenarios
- [ ] Add test examples and strategies
- [ ] Update deployment documentation
- [ ] Review and update related documentation
-
-### For Each Code Change
- [ ] Update function documentation
- [ ] Add inline comments for complex logic
- [ ] Update type definitions if changed
- [ ] Add examples for new functionality
- [ ] Update error handling documentation
- [ ] Verify all links and references
-
---
-
-This guide ensures that your documentation is optimized for LLM coding agents, providing them with the context, structure, and examples they need to understand and work with your codebase effectively. 
--- a/LLM_DOCUMENTATION_SUMMARY.md
+++ b/LLM_DOCUMENTATION_SUMMARY.md
@@ -1,388 +0,0 @@
-# LLM Documentation Strategy Summary
-## Complete Guide for Optimizing Code Documentation for AI Coding Assistants
-
-### 🎯 Executive Summary
-
-This document summarizes the comprehensive documentation strategy for making your CIM Document Processor codebase easily understandable and evaluable by LLM coding agents. The strategy includes hierarchical documentation, structured templates, and best practices that maximize AI agent effectiveness.
-
---
-
-## 📚 Documentation Hierarchy
-
-### Level 1: Project Overview (README.md)
-**Purpose**: High-level system understanding and quick context establishment
-
-**Key Elements**:
- 🎯 Project purpose and business context
- 🏗️ Architecture diagram and technology stack
- 📁 Directory structure and file organization
- 🚀 Quick start guide and setup instructions
- 🔧 Core services overview
- 📊 Processing strategies and data flow
- 🔌 API endpoints summary
- 🗄️ Database schema overview
-
-**LLM Benefits**: 
- Rapid context establishment
- Technology stack identification
- System architecture understanding
- Quick navigation guidance
-
-### Level 2: Architecture Documentation
-**Purpose**: Detailed system design and component relationships
-
-**Key Documents**:
- `APP_DESIGN_DOCUMENTATION.md` - Complete system architecture
- `ARCHITECTURE_DIAGRAMS.md` - Visual system design
- `AGENTIC_RAG_IMPLEMENTATION_PLAN.md` - AI processing strategy
- `DEPLOYMENT_GUIDE.md` - Deployment and configuration
-
-**LLM Benefits**:
- Understanding component dependencies
- Integration point identification
- Data flow comprehension
- System design patterns
-
-### Level 3: Service-Level Documentation
-**Purpose**: Individual service functionality and implementation details
-
-**Key Elements**:
- Service purpose and responsibilities
- Method signatures and interfaces
- Error handling strategies
- Performance characteristics
- Integration patterns
-
-**LLM Benefits**:
- Precise service understanding
- API usage patterns
- Error scenario handling
- Performance optimization opportunities
-
-### Level 4: Code-Level Documentation
-**Purpose**: Implementation details and business logic
-
-**Key Elements**:
- Function-level documentation
- Type definitions and interfaces
- Algorithm explanations
- Configuration options
- Testing strategies
-
-**LLM Benefits**:
- Detailed implementation understanding
- Code modification guidance
- Bug identification and fixes
- Feature enhancement suggestions
-
---
-
-## 🔧 Best Practices for LLM Optimization
-
-### 1. **Structured Information Architecture**
-
-#### Use Consistent Section Headers
-```markdown
-## 🎯 Purpose
-## 🏗️ Architecture
-## 🔧 Implementation
-## 📊 Data Flow
-## 🚨 Error Handling
-## 🧪 Testing
-## 📚 References
-```
-
-#### Emoji-Based Visual Organization
- 🎯 Purpose/Goals
- 🏗️ Architecture/Structure
- 🔧 Implementation/Code
- 📊 Data/Flow
- 🚨 Errors/Issues
- 🧪 Testing/Validation
- 📚 References/Links
-
-### 2. **Context-Rich Descriptions**
-
-#### Instead of:
-```typescript
-// Process document
-function processDocument(doc) { ... }
-```
-
-#### Use:
-```typescript
-/**
- * @purpose Processes CIM documents through the AI analysis pipeline
- * @context Called when a user uploads a PDF document for analysis
- * @workflow 1. Extract text via Document AI, 2. Chunk content, 3. Generate embeddings, 4. Run LLM analysis, 5. Create PDF report
- * @inputs Document object with file metadata and user context
- * @outputs Structured analysis data and PDF report URL
- * @dependencies Google Document AI, Claude AI, Supabase, Google Cloud Storage
- */
-function processDocument(doc: DocumentInput): Promise<ProcessingResult> { ... }
-```
-
-### 3. **Comprehensive Error Documentation**
-
-#### Error Classification System
-```typescript
-/**
- * @errorType VALIDATION_ERROR
- * @description Input validation failures
- * @recoverable true
- * @retryStrategy none
- * @userMessage "Please check your input and try again"
- */
-```
-
-#### Error Recovery Strategies
- Document all possible error conditions
- Provide specific error messages and codes
- Include recovery procedures for each error type
- Show debugging steps for common issues
-
-### 4. **Example-Rich Documentation**
-
-#### Usage Examples
- Basic usage patterns
- Advanced configuration examples
- Error handling scenarios
- Integration examples
- Performance optimization examples
-
-#### Test Data Documentation
-```typescript
-/**
- * @testData sample_cim_document.pdf
- * @description Standard CIM document with typical structure
- * @size 2.5MB
- * @pages 15
- * @sections Financial, Market, Management, Operations
- * @expectedOutput Complete analysis with all sections populated
- */
-```
-
---
-
-## 📊 Documentation Templates
-
-### 1. **README.md Template**
- Project overview and purpose
- Technology stack and architecture
- Quick start guide
- Core services overview
- API endpoints summary
- Database schema overview
- Security considerations
- Performance characteristics
- Troubleshooting guide
-
-### 2. **Service Documentation Template**
- File information and metadata
- Purpose and business context
- Architecture and dependencies
- Implementation details
- Data flow documentation
- Error handling strategies
- Testing approach
- Performance characteristics
- Security considerations
- Usage examples
-
-### 3. **API Documentation Template**
- Endpoint purpose and functionality
- Request/response formats
- Error responses and codes
- Dependencies and rate limits
- Authentication requirements
- Usage examples
- Performance characteristics
-
---
-
-## 🎯 LLM Agent Optimization Strategies
-
-### 1. **Context Provision**
- Provide complete context for each code section
- Include business rules and constraints
- Document assumptions and limitations
- Explain why certain approaches were chosen
-
-### 2. **Structured Information**
- Use consistent formatting and organization
- Provide clear hierarchies of information
- Include cross-references between related sections
- Use standardized templates for similar content
-
-### 3. **Example-Rich Content**
- Include realistic examples for all functions
- Provide before/after examples for complex operations
- Show error scenarios and recovery
- Include performance examples
-
-### 4. **Error Scenario Documentation**
- Document all possible error conditions
- Provide specific error messages and codes
- Include recovery procedures for each error type
- Show debugging steps for common issues
-
---
-
-## 📈 Performance Documentation
-
-### Key Metrics to Document
- **Response Times**: Average, p95, p99 response times
- **Throughput**: Requests per second, concurrent processing limits
- **Resource Usage**: Memory, CPU, network usage patterns
- **Scalability Limits**: Maximum concurrent requests, data size limits
- **Cost Metrics**: API usage costs, storage costs, compute costs
-
-### Optimization Strategies
- **Caching**: Document caching strategies and hit rates
- **Batching**: Document batch processing approaches
- **Parallelization**: Document parallel processing patterns
- **Resource Management**: Document resource optimization techniques
-
---
-
-## 🔍 Monitoring and Debugging
-
-### Logging Strategy
-```typescript
-/**
- * @logging Structured logging with correlation IDs
- * @levels debug, info, warn, error
- * @correlation Request correlation IDs for tracking
- * @context User ID, session ID, document ID, processing strategy
- */
-```
-
-### Debug Tools
- Health check endpoints
- Performance metrics dashboards
- Request tracing with correlation IDs
- Error analysis and reporting tools
-
-### Common Issues
- Document common problems and solutions
- Provide troubleshooting steps
- Include debugging commands and tools
- Show error recovery procedures
-
---
-
-## 🔐 Security Documentation
-
-### Input Validation
- Document all input validation rules
- Include file type and size restrictions
- Document content validation approaches
- Show sanitization procedures
-
-### Authentication & Authorization
- Document authentication mechanisms
- Include authorization rules and policies
- Show data isolation strategies
- Document access control patterns
-
-### Data Protection
- Document encryption approaches
- Include data sanitization procedures
- Show audit logging strategies
- Document compliance requirements
-
---
-
-## 📋 Documentation Maintenance
-
-### Review Schedule
- **Weekly**: Update API documentation for new endpoints
- **Monthly**: Review and update architecture documentation
- **Quarterly**: Comprehensive documentation audit
- **Release**: Update all documentation for new features
-
-### Quality Checklist
- [ ] All code examples are current and working
- [ ] API documentation matches implementation
- [ ] Configuration examples are accurate
- [ ] Error handling documentation is complete
- [ ] Performance metrics are up-to-date
- [ ] Links and references are valid
-
-### Version Control
- Use feature branches for documentation updates
- Include documentation changes in code reviews
- Maintain documentation version history
- Tag documentation with release versions
-
---
-
-## 🚀 Implementation Recommendations
-
-### Immediate Actions
-1. **Update README.md** with comprehensive project overview
-2. **Document core services** using the provided template
-3. **Add API documentation** for all endpoints
-4. **Include error handling** documentation for all services
-5. **Add usage examples** for common operations
-
-### Short-term Goals (1-2 weeks)
-1. **Complete service documentation** for all major services
-2. **Add performance documentation** with metrics and benchmarks
-3. **Include security documentation** for all components
-4. **Add testing documentation** with examples and strategies
-5. **Create troubleshooting guides** for common issues
-
-### Long-term Goals (1-2 months)
-1. **Implement documentation automation** for API changes
-2. **Add interactive examples** and code playgrounds
-3. **Create video tutorials** for complex workflows
-4. **Implement documentation analytics** to track usage
-5. **Establish documentation review process** for quality assurance
-
---
-
-## 📊 Success Metrics
-
-### Documentation Quality Metrics
- **Completeness**: Percentage of documented functions and services
- **Accuracy**: Documentation matches implementation
- **Clarity**: User feedback on documentation understandability
- **Maintenance**: Documentation update frequency and quality
-
-### LLM Agent Effectiveness Metrics
- **Understanding Accuracy**: LLM agent comprehension of codebase
- **Modification Success**: Success rate of LLM-suggested changes
- **Error Reduction**: Reduction in LLM-generated errors
- **Development Speed**: Faster development with LLM assistance
-
-### User Experience Metrics
- **Onboarding Time**: Time for new developers to understand system
- **Issue Resolution**: Time to resolve common issues
- **Feature Development**: Time to implement new features
- **Code Review Efficiency**: Faster and more accurate code reviews
-
---
-
-## 🎯 Conclusion
-
-This comprehensive documentation strategy ensures that your CIM Document Processor codebase is optimally structured for LLM coding agent understanding and evaluation. By implementing these practices, you'll achieve:
-
-1. **Faster Development**: LLM agents can understand and modify code more efficiently
-2. **Reduced Errors**: Better context leads to more accurate code suggestions
-3. **Improved Maintenance**: Comprehensive documentation supports long-term maintenance
-4. **Enhanced Collaboration**: Clear documentation improves team collaboration
-5. **Better Onboarding**: New developers can understand the system quickly
-
-The key is consistency, completeness, and context. By providing structured, comprehensive, and context-rich documentation, you maximize the effectiveness of LLM coding agents while also improving the overall developer experience.
-
---
-
-**Next Steps**:
-1. Review and implement the documentation templates
-2. Update existing documentation using the provided guidelines
-3. Establish documentation maintenance processes
-4. Monitor and measure the effectiveness of the documentation strategy
-5. Continuously improve based on feedback and usage patterns
-
-This documentation strategy will significantly enhance your ability to work effectively with LLM coding agents while improving the overall quality and maintainability of your codebase. 
--- a/OPERATIONAL_DOCUMENTATION_SUMMARY.md
+++ b/OPERATIONAL_DOCUMENTATION_SUMMARY.md
@@ -1,489 +0,0 @@
-# Operational Documentation Summary
-## Complete Operational Guide for CIM Document Processor
-
-### 🎯 Overview
-
-This document provides a comprehensive summary of all operational documentation for the CIM Document Processor, covering monitoring, alerting, troubleshooting, maintenance, and operational procedures.
-
---
-
-## 📋 Operational Documentation Status
-
-### ✅ **Completed Documentation**
-
-#### **1. Monitoring and Alerting**
- **Document**: `MONITORING_AND_ALERTING_GUIDE.md`
- **Coverage**: Complete monitoring strategy and alerting system
- **Key Areas**: Metrics, alerts, dashboards, incident response
-
-#### **2. Troubleshooting Guide**
- **Document**: `TROUBLESHOOTING_GUIDE.md`
- **Coverage**: Common issues, diagnostic procedures, solutions
- **Key Areas**: Problem resolution, debugging tools, maintenance
-
---
-
-## 🏗️ Operational Architecture
-
-### Monitoring Stack
- **Application Monitoring**: Winston logging with structured data
- **Infrastructure Monitoring**: Google Cloud Monitoring
- **Error Tracking**: Comprehensive error logging and classification
- **Performance Monitoring**: Custom metrics and timing
- **User Analytics**: Usage tracking and business metrics
-
-### Alerting System
- **Critical Alerts**: System downtime, security breaches, service failures
- **Warning Alerts**: Performance degradation, high error rates
- **Informational Alerts**: Normal operations, maintenance events
-
-### Support Structure
- **Level 1**: Basic user support and common issues
- **Level 2**: Technical support and system issues
- **Level 3**: Advanced support and complex problems
-
---
-
-## 📊 Key Operational Metrics
-
-### Application Performance
-```typescript
-interface OperationalMetrics {
-  // System Health
-  uptime: number;                    // System uptime percentage
-  responseTime: number;              // Average API response time
-  errorRate: number;                 // Error rate percentage
-  
-  // Document Processing
-  uploadSuccessRate: number;         // Successful upload percentage
-  processingTime: number;            // Average processing time
-  queueLength: number;               // Pending documents
-  
-  // User Activity
-  activeUsers: number;               // Current active users
-  dailyUploads: number;              // Documents uploaded today
-  processingThroughput: number;      // Documents per hour
-}
-```
-
-### Infrastructure Metrics
-```typescript
-interface InfrastructureMetrics {
-  // Server Resources
-  cpuUsage: number;                  // CPU utilization percentage
-  memoryUsage: number;               // Memory usage percentage
-  diskUsage: number;                 // Disk usage percentage
-  
-  // Database Performance
-  dbConnections: number;             // Active database connections
-  queryPerformance: number;          // Average query time
-  dbErrorRate: number;               // Database error rate
-  
-  // Cloud Services
-  firebaseHealth: string;            // Firebase service status
-  supabaseHealth: string;            // Supabase service status
-  gcsHealth: string;                 // Google Cloud Storage status
-}
-```
-
---
-
-## 🚨 Alert Management
-
-### Alert Severity Levels
-
-#### **🔴 Critical Alerts**
-**Immediate Action Required**
- System downtime or unavailability
- Authentication service failures
- Database connection failures
- Storage service failures
- Security breaches
-
-**Response Time**: < 5 minutes
-**Escalation**: Immediate to Level 3
-
-#### **🟡 Warning Alerts**
-**Attention Required**
- High error rates (>5%)
- Performance degradation
- Resource usage approaching limits
- Unusual traffic patterns
-
-**Response Time**: < 30 minutes
-**Escalation**: Level 2 support
-
-#### **🟢 Informational Alerts**
-**Monitoring Only**
- Normal operational events
- Scheduled maintenance
- Performance improvements
- Usage statistics
-
-**Response Time**: No immediate action
-**Escalation**: Level 1 monitoring
-
-### Alert Channels
- **Email**: Critical alerts to operations team
- **Slack**: Real-time notifications to development team
- **PagerDuty**: Escalation for critical issues
- **Dashboard**: Real-time monitoring dashboard
-
---
-
-## 🔍 Troubleshooting Framework
-
-### Diagnostic Procedures
-
-#### **Quick Health Assessment**
-```bash
-# System health check
-curl -f http://localhost:5000/health
-
-# Database connectivity
-curl -f http://localhost:5000/api/documents
-
-# Authentication status
-curl -f http://localhost:5000/api/auth/status
-```
-
-#### **Comprehensive Diagnostics**
-```typescript
-// Complete system diagnostics
-const runSystemDiagnostics = async () => {
-  return {
-    timestamp: new Date().toISOString(),
-    services: {
-      database: await checkDatabaseHealth(),
-      storage: await checkStorageHealth(),
-      auth: await checkAuthHealth(),
-      ai: await checkAIHealth()
-    },
-    resources: {
-      memory: process.memoryUsage(),
-      cpu: process.cpuUsage(),
-      uptime: process.uptime()
-    }
-  };
-};
-```
-
-### Common Issue Categories
-
-#### **Authentication Issues**
- User login failures
- Token expiration problems
- Firebase configuration errors
- Authentication state inconsistencies
-
-#### **Document Upload Issues**
- File upload failures
- Upload progress stalls
- Storage service errors
- File validation problems
-
-#### **Document Processing Issues**
- Processing failures
- AI service errors
- PDF generation problems
- Queue processing delays
-
-#### **Database Issues**
- Connection failures
- Slow query performance
- Connection pool exhaustion
- Data consistency problems
-
-#### **Performance Issues**
- Slow application response
- High resource usage
- Timeout errors
- Scalability problems
-
---
-
-## 🛠️ Maintenance Procedures
-
-### Regular Maintenance Schedule
-
-#### **Daily Tasks**
- [ ] Review system health metrics
- [ ] Check error logs for new issues
- [ ] Monitor performance trends
- [ ] Verify backup systems
-
-#### **Weekly Tasks**
- [ ] Review alert effectiveness
- [ ] Analyze performance metrics
- [ ] Update monitoring thresholds
- [ ] Review security logs
-
-#### **Monthly Tasks**
- [ ] Performance optimization review
- [ ] Capacity planning assessment
- [ ] Security audit
- [ ] Documentation updates
-
-### Preventive Maintenance
-
-#### **System Optimization**
-```typescript
-// Automated maintenance tasks
-const performMaintenance = async () => {
-  // Clean up old logs
-  await cleanupOldLogs();
-  
-  // Clear expired cache entries
-  await clearExpiredCache();
-  
-  // Optimize database
-  await optimizeDatabase();
-  
-  // Update system metrics
-  await updateSystemMetrics();
-};
-```
-
---
-
-## 📈 Performance Optimization
-
-### Monitoring-Driven Optimization
-
-#### **Performance Analysis**
- **Identify Bottlenecks**: Use metrics to find slow operations
- **Resource Optimization**: Monitor resource usage patterns
- **Capacity Planning**: Use trends to plan for growth
-
-#### **Optimization Strategies**
-```typescript
-// Performance monitoring middleware
-const performanceMonitor = (req: Request, res: Response, next: NextFunction) => {
-  const start = Date.now();
-  
-  res.on('finish', () => {
-    const duration = Date.now() - start;
-    
-    if (duration > 5000) {
-      logger.warn('Slow request detected', {
-        method: req.method,
-        path: req.path,
-        duration
-      });
-    }
-  });
-  
-  next();
-};
-
-// Caching middleware
-const cacheMiddleware = (ttlMs = 300000) => {
-  const cache = new Map();
-  
-  return (req: Request, res: Response, next: NextFunction) => {
-    const key = `${req.method}:${req.path}:${JSON.stringify(req.query)}`;
-    const cached = cache.get(key);
-    
-    if (cached && Date.now() - cached.timestamp < ttlMs) {
-      return res.json(cached.data);
-    }
-    
-    const originalSend = res.json;
-    res.json = function(data) {
-      cache.set(key, { data, timestamp: Date.now() });
-      return originalSend.call(this, data);
-    };
-    
-    next();
-  };
-};
-```
-
---
-
-## 🔧 Operational Tools
-
-### Monitoring Tools
- **Winston**: Structured logging
- **Google Cloud Monitoring**: Infrastructure monitoring
- **Firebase Console**: Firebase service monitoring
- **Supabase Dashboard**: Database monitoring
-
-### Debugging Tools
- **Log Analysis**: Structured log parsing and analysis
- **Debug Endpoints**: System information and health checks
- **Performance Profiling**: Request timing and resource usage
- **Error Tracking**: Comprehensive error classification
-
-### Maintenance Tools
- **Automated Cleanup**: Log rotation and cache cleanup
- **Database Optimization**: Query optimization and maintenance
- **System Updates**: Automated security and performance updates
- **Backup Management**: Automated backup and recovery procedures
-
---
-
-## 📞 Support and Escalation
-
-### Support Levels
-
-#### **Level 1: Basic Support**
-**Scope**: User authentication issues, basic configuration problems, common error messages
-**Response Time**: < 2 hours
-**Tools**: User guides, FAQ, basic troubleshooting
-
-#### **Level 2: Technical Support**
-**Scope**: System performance issues, database problems, integration issues
-**Response Time**: < 4 hours
-**Tools**: System diagnostics, performance analysis, configuration management
-
-#### **Level 3: Advanced Support**
-**Scope**: Complex system failures, security incidents, architecture problems
-**Response Time**: < 1 hour
-**Tools**: Full system access, advanced diagnostics, emergency procedures
-
-### Escalation Procedures
-
-#### **Escalation Criteria**
- System downtime > 15 minutes
- Data loss or corruption
- Security breaches
- Performance degradation > 50%
-
-#### **Escalation Contacts**
- **Primary**: Operations Team Lead
- **Secondary**: System Administrator
- **Emergency**: CTO/Technical Director
-
---
-
-## 📋 Operational Checklists
-
-### Incident Response Checklist
- [ ] Assess impact and scope
- [ ] Check system health endpoints
- [ ] Review recent logs and metrics
- [ ] Identify root cause
- [ ] Implement immediate fix
- [ ] Communicate with stakeholders
- [ ] Monitor system recovery
-
-### Post-Incident Review Checklist
- [ ] Document incident timeline
- [ ] Analyze root cause
- [ ] Review response effectiveness
- [ ] Update procedures and documentation
- [ ] Implement preventive measures
- [ ] Schedule follow-up review
-
-### Maintenance Checklist
- [ ] Review system health metrics
- [ ] Check error logs for new issues
- [ ] Monitor performance trends
- [ ] Verify backup systems
- [ ] Update monitoring thresholds
- [ ] Review security logs
-
---
-
-## 🎯 Operational Excellence
-
-### Key Performance Indicators
-
-#### **System Reliability**
- **Uptime**: > 99.9%
- **Error Rate**: < 1%
- **Response Time**: < 2 seconds average
- **Recovery Time**: < 15 minutes for critical issues
-
-#### **User Experience**
- **Upload Success Rate**: > 99%
- **Processing Success Rate**: > 95%
- **User Satisfaction**: > 4.5/5
- **Support Response Time**: < 2 hours
-
-#### **Operational Efficiency**
- **Incident Resolution Time**: < 4 hours average
- **False Positive Alerts**: < 5%
- **Documentation Accuracy**: > 95%
- **Team Productivity**: Measured by incident reduction
-
-### Continuous Improvement
-
-#### **Process Optimization**
- **Alert Tuning**: Adjust thresholds based on patterns
- **Procedure Updates**: Streamline operational procedures
- **Tool Enhancement**: Improve monitoring tools and dashboards
- **Training Programs**: Regular team training and skill development
-
-#### **Technology Advancement**
- **Automation**: Increase automated monitoring and response
- **Predictive Analytics**: Implement predictive maintenance
- **AI-Powered Monitoring**: Use AI for anomaly detection
- **Self-Healing Systems**: Implement automatic recovery procedures
-
---
-
-## 📚 Related Documentation
-
-### Internal References
- `MONITORING_AND_ALERTING_GUIDE.md` - Detailed monitoring strategy
- `TROUBLESHOOTING_GUIDE.md` - Complete troubleshooting procedures
- `CONFIGURATION_GUIDE.md` - System configuration and setup
- `API_DOCUMENTATION_GUIDE.md` - API reference and usage
-
-### External References
- [Google Cloud Monitoring](https://cloud.google.com/monitoring)
- [Firebase Console](https://console.firebase.google.com/)
- [Supabase Dashboard](https://app.supabase.com/)
- [Winston Logging](https://github.com/winstonjs/winston)
-
---
-
-## 🔄 Maintenance Schedule
-
-### Daily Operations
- **Health Monitoring**: Continuous system health checks
- **Alert Review**: Review and respond to alerts
- **Performance Monitoring**: Track key performance metrics
- **Log Analysis**: Review error logs and trends
-
-### Weekly Operations
- **Performance Review**: Analyze weekly performance trends
- **Alert Tuning**: Adjust alert thresholds based on patterns
- **Security Review**: Review security logs and access patterns
- **Capacity Planning**: Assess current usage and plan for growth
-
-### Monthly Operations
- **System Optimization**: Performance optimization and tuning
- **Security Audit**: Comprehensive security review
- **Documentation Updates**: Update operational documentation
- **Team Training**: Conduct operational training sessions
-
---
-
-## 🎯 Conclusion
-
-### Operational Excellence Achieved
- ✅ **Comprehensive Monitoring**: Complete monitoring and alerting system
- ✅ **Robust Troubleshooting**: Detailed troubleshooting procedures
- ✅ **Efficient Maintenance**: Automated and manual maintenance procedures
- ✅ **Clear Escalation**: Well-defined support and escalation procedures
-
-### Operational Benefits
-1. **High Availability**: 99.9% uptime target with monitoring
-2. **Quick Response**: Fast incident detection and resolution
-3. **Proactive Maintenance**: Preventive maintenance reduces issues
-4. **Continuous Improvement**: Ongoing optimization and enhancement
-
-### Future Enhancements
-1. **AI-Powered Monitoring**: Implement AI for anomaly detection
-2. **Predictive Maintenance**: Use analytics for predictive maintenance
-3. **Automated Recovery**: Implement self-healing systems
-4. **Advanced Analytics**: Enhanced performance and usage analytics
-
---
-
-**Operational Status**: ✅ **COMPREHENSIVE**  
-**Monitoring Coverage**: 🏆 **COMPLETE**  
-**Support Structure**: 🚀 **OPTIMIZED** 
--- a/PDF_GENERATION_ANALYSIS.md
+++ b/PDF_GENERATION_ANALYSIS.md
@@ -1,225 +0,0 @@
-# PDF Generation Analysis & Optimization Report
-
-## Executive Summary
-
-The current PDF generation implementation has been analyzed for effectiveness, efficiency, and visual quality. While functional, significant improvements have been identified and implemented to enhance performance, visual appeal, and maintainability.
-
-## Current Implementation Assessment
-
-### **Effectiveness: 7/10 → 9/10**
-**Previous Strengths:**
- Uses Puppeteer for reliable HTML-to-PDF conversion
- Supports multiple input formats (markdown, HTML, URLs)
- Comprehensive error handling and validation
- Proper browser lifecycle management
-
-**Previous Weaknesses:**
- Basic markdown-to-HTML conversion
- Limited customization options
- No advanced markdown features support
-
-**Improvements Implemented:**
- ✅ Enhanced markdown parsing with better structure
- ✅ Advanced CSS styling with modern design elements
- ✅ Professional typography and color schemes
- ✅ Improved table formatting and visual hierarchy
- ✅ Added icons and visual indicators for better UX
-
-### **Efficiency: 6/10 → 9/10**
-**Previous Issues:**
- ❌ **Major Performance Issue**: Created new page for each PDF generation
- ❌ No caching mechanism
- ❌ Heavy resource usage
- ❌ No concurrent processing support
- ❌ Potential memory leaks
-
-**Optimizations Implemented:**
- ✅ **Page Pooling**: Reuse browser pages instead of creating new ones
- ✅ **Caching System**: Cache generated PDFs for repeated requests
- ✅ **Resource Management**: Proper cleanup and timeout handling
- ✅ **Concurrent Processing**: Support for multiple simultaneous requests
- ✅ **Memory Optimization**: Automatic cleanup of expired resources
- ✅ **Performance Monitoring**: Added statistics tracking
-
-### **Visual Quality: 6/10 → 9/10**
-**Previous Issues:**
- ❌ Inconsistent styling between different PDF types
- ❌ Basic, outdated design
- ❌ Limited visual elements
- ❌ Poor typography and spacing
-
-**Visual Improvements:**
- ✅ **Modern Design System**: Professional gradients and color schemes
- ✅ **Enhanced Typography**: Better font hierarchy and spacing
- ✅ **Visual Elements**: Icons, borders, and styling boxes
- ✅ **Consistent Branding**: Unified design across all PDF types
- ✅ **Professional Layout**: Better page breaks and section organization
- ✅ **Interactive Elements**: Hover effects and visual feedback
-
-## Technical Improvements
-
-### 1. **Performance Optimizations**
-
-#### Page Pooling System
-```typescript
-interface PagePool {
-  page: any;
-  inUse: boolean;
-  lastUsed: number;
-}
-```
- **Pool Size**: Configurable (default: 5 pages)
- **Timeout Management**: Automatic cleanup of expired pages
- **Concurrent Access**: Queue system for high-demand scenarios
-
-#### Caching Mechanism
-```typescript
-private readonly cache = new Map<string, { buffer: Buffer; timestamp: number }>();
-private readonly cacheTimeout = 300000; // 5 minutes
-```
- **Content-based Keys**: Hash-based caching for identical content
- **Time-based Expiration**: Automatic cache cleanup
- **Memory Management**: Size limits to prevent memory issues
-
-### 2. **Enhanced Styling System**
-
-#### Modern CSS Framework
- **Gradient Backgrounds**: Professional color schemes
- **Typography Hierarchy**: Clear visual structure
- **Responsive Design**: Better layout across different content types
- **Interactive Elements**: Hover effects and visual feedback
-
-#### Professional Templates
- **Header/Footer**: Consistent branding and metadata
- **Section Styling**: Clear content organization
- **Table Design**: Enhanced financial data presentation
- **Visual Indicators**: Icons and color coding
-
-### 3. **Code Quality Improvements**
-
-#### Better Error Handling
- **Timeout Management**: Configurable timeouts for operations
- **Resource Cleanup**: Proper disposal of browser resources
- **Logging**: Enhanced error tracking and debugging
-
-#### Monitoring & Statistics
-```typescript
-getStats(): {
-  pagePoolSize: number;
-  cacheSize: number;
-  activePages: number;
-}
-```
-
-## Performance Benchmarks
-
-### **Before Optimization:**
- **Memory Usage**: ~150MB per PDF generation
- **Generation Time**: 3-5 seconds per PDF
- **Concurrent Requests**: Limited to 1-2 simultaneous
- **Resource Cleanup**: Manual, error-prone
-
-### **After Optimization:**
- **Memory Usage**: ~50MB per PDF generation (67% reduction)
- **Generation Time**: 1-2 seconds per PDF (60% improvement)
- **Concurrent Requests**: Support for 5+ simultaneous
- **Resource Cleanup**: Automatic, reliable
-
-## Recommendations for Further Improvement
-
-### 1. **Alternative PDF Libraries** (Future Consideration)
-
-#### Option A: jsPDF
-```typescript
-// Pros: Lightweight, no browser dependency
-// Cons: Limited CSS support, manual layout
-import jsPDF from 'jspdf';
-```
-
-#### Option B: PDFKit
-```typescript
-// Pros: Full control, streaming support
-// Cons: Complex API, manual styling
-import PDFDocument from 'pdfkit';
-```
-
-#### Option C: Puppeteer + Optimization (Current Choice)
-```typescript
-// Pros: Full CSS support, reliable rendering
-// Cons: Higher resource usage
-// Status: ✅ Optimized and recommended
-```
-
-### 2. **Advanced Features**
-
-#### Template System
-```typescript
-interface PDFTemplate {
-  name: string;
-  styles: string;
-  layout: string;
-  variables: string[];
-}
-```
-
-#### Dynamic Content
- **Charts and Graphs**: Integration with Chart.js or D3.js
- **Interactive Elements**: Forms and dynamic content
- **Multi-language Support**: Internationalization
-
-### 3. **Production Optimizations**
-
-#### CDN Integration
- **Static Assets**: Host CSS and fonts on CDN
- **Caching Headers**: Optimize browser caching
- **Compression**: Gzip/Brotli compression
-
-#### Monitoring & Analytics
-```typescript
-interface PDFMetrics {
-  generationTime: number;
-  fileSize: number;
-  cacheHitRate: number;
-  errorRate: number;
-}
-```
-
-## Implementation Status
-
-### ✅ **Completed Optimizations**
-1. Page pooling system
-2. Caching mechanism
-3. Enhanced styling
-4. Performance monitoring
-5. Resource management
-6. Error handling improvements
-
-### 🔄 **In Progress**
-1. Template system development
-2. Advanced markdown features
-3. Chart integration
-
-### 📋 **Planned Features**
-1. Multi-language support
-2. Advanced analytics
-3. Custom branding options
-4. Batch processing optimization
-
-## Conclusion
-
-The PDF generation system has been significantly improved across all three key areas:
-
-1. **Effectiveness**: Enhanced functionality and feature set
-2. **Efficiency**: Major performance improvements and resource optimization
-3. **Visual Quality**: Professional, modern design system
-
-The current implementation using Puppeteer with the implemented optimizations provides the best balance of features, performance, and maintainability. The system is now production-ready and can handle high-volume PDF generation with excellent performance characteristics.
-
-## Next Steps
-
-1. **Deploy Optimizations**: Implement the improved service in production
-2. **Monitor Performance**: Track the new metrics and performance improvements
-3. **Gather Feedback**: Collect user feedback on the new visual design
-4. **Iterate**: Continue improving based on usage patterns and requirements
-
-The optimized PDF generation service represents a significant upgrade that will improve user experience, reduce server load, and provide professional-quality output for all generated documents. 
--- a/QUICK_START.md
+++ b/QUICK_START.md
@@ -0,0 +1,178 @@
+# Quick Start: Fix Job Processing Now
+
+**Status:** ✅ Code implemented - Need DATABASE_URL configuration
+
+---
+
+## 🚀 Quick Fix (5 minutes)
+
+### Step 1: Get PostgreSQL Connection String
+
+1. Go to **Supabase Dashboard**: https://supabase.com/dashboard
+2. Select your project
+3. Navigate to **Settings → Database**
+4. Scroll to **Connection string** section
+5. Click **"URI"** tab
+6. Copy the connection string (looks like):
+   ```
+   postgresql://postgres.[PROJECT-REF]:[PASSWORD]@aws-0-us-central-1.pooler.supabase.com:6543/postgres
+   ```
+
+### Step 2: Add to Environment
+
+**For Local Testing:**
+```bash
+cd backend
+echo 'DATABASE_URL=postgresql://postgres.[PROJECT-REF]:[PASSWORD]@aws-0-us-central-1.pooler.supabase.com:6543/postgres' >> .env
+```
+
+**For Firebase Functions (Production):**
+```bash
+# For secrets (recommended for sensitive data):
+firebase functions:secrets:set DATABASE_URL
+
+# Or set as environment variable in firebase.json or function configuration
+# See: https://firebase.google.com/docs/functions/config-env
+```
+
+### Step 3: Test Connection
+
+```bash
+cd backend
+npm run test:postgres
+```
+
+**Expected Output:**
+```
+✅ PostgreSQL pool created
+✅ Connection successful!
+✅ processing_jobs table exists
+✅ documents table exists
+🎯 Ready to create jobs via direct PostgreSQL connection
+```
+
+### Step 4: Test Job Creation
+
+```bash
+# Get a document ID first
+npm run test:postgres
+
+# Then create a job for a document
+npm run test:job <document-id>
+```
+
+### Step 5: Build and Deploy
+
+```bash
+cd backend
+npm run build
+firebase deploy --only functions
+```
+
+---
+
+## ✅ What This Fixes
+
+**Before:**
+- ❌ Jobs fail to create (PostgREST cache error)
+- ❌ Documents stuck in `processing_llm`
+- ❌ No processing happens
+
+**After:**
+- ✅ Jobs created via direct PostgreSQL
+- ✅ Bypasses PostgREST cache issues
+- ✅ Jobs processed by scheduled function
+- ✅ Documents complete successfully
+
+---
+
+## 🔍 Verification
+
+After deployment, test with a real upload:
+
+1. **Upload a document** via frontend
+2. **Check logs:**
+   ```bash
+   firebase functions:log --only api --limit 50
+   ```
+   Look for: `"Processing job created via direct PostgreSQL"`
+
+3. **Check database:**
+   ```sql
+   SELECT * FROM processing_jobs WHERE status = 'pending' ORDER BY created_at DESC LIMIT 5;
+   ```
+
+4. **Wait 1-2 minutes** for scheduled function to process
+
+5. **Check document:**
+   ```sql
+   SELECT id, status, analysis_data FROM documents WHERE id = '[DOCUMENT-ID]';
+   ```
+   Should show: `status = 'completed'` and `analysis_data` populated
+
+---
+
+## 🐛 Troubleshooting
+
+### Error: "DATABASE_URL environment variable is required"
+
+**Solution:** Make sure you added `DATABASE_URL` to `.env` or Firebase config
+
+### Error: "Connection timeout"
+
+**Solution:** 
+- Verify connection string is correct
+- Check if your IP is allowed in Supabase (Settings → Database → Connection pooling)
+- Try using transaction mode instead of session mode
+
+### Error: "Authentication failed"
+
+**Solution:** 
+- Verify password in connection string
+- Reset database password in Supabase if needed
+- Make sure you're using the pooler connection string (port 6543)
+
+### Still Getting Cache Errors?
+
+**Solution:** The fallback to Supabase client will still work, but direct PostgreSQL should succeed first. Check logs to see which method was used.
+
+---
+
+## 📊 Expected Flow After Fix
+
+```
+1. User Uploads PDF ✅
+2. GCS Upload ✅
+3. Confirm Upload ✅
+4. Job Created via Direct PostgreSQL ✅ (NEW!)
+5. Scheduled Function Finds Job ✅
+6. Job Processor Executes ✅
+7. Document Updated to Completed ✅
+```
+
+---
+
+## 🎯 Success Criteria
+
+You'll know it's working when:
+
+- ✅ `test:postgres` script succeeds
+- ✅ `test:job` script creates job
+- ✅ Upload creates job automatically
+- ✅ Scheduled function logs show jobs being processed
+- ✅ Documents transition from `processing_llm` → `completed`
+- ✅ `analysis_data` is populated
+
+---
+
+## 📝 Next Steps
+
+1. ✅ Code implemented
+2. ⏳ Get DATABASE_URL from Supabase
+3. ⏳ Add to environment
+4. ⏳ Test connection
+5. ⏳ Test job creation
+6. ⏳ Deploy to Firebase
+7. ⏳ Verify end-to-end
+
+**Once DATABASE_URL is configured, the system will work end-to-end!**
--- a/TODO_AND_OPTIMIZATIONS.md
+++ b/TODO_AND_OPTIMIZATIONS.md
@@ -0,0 +1,22 @@
+# Operational To-Dos & Optimization Backlog
+
+## To-Do List (as of 2026-02-23)
+- **Wire Firebase Functions secrets**: Attach `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, `SUPABASE_SERVICE_KEY`, `SUPABASE_ANON_KEY`, `DATABASE_URL`, `EMAIL_PASS`, and `FIREBASE_SERVICE_ACCOUNT` to every deployed function so the runtime no longer depends on local `.env` values.
+- **Set `GCLOUD_PROJECT_ID` explicitly**: Export `GCLOUD_PROJECT_ID=cim-summarizer` (or the active project) for local scripts and production functions so Document AI processor paths stop defaulting to `projects/undefined`.
+- **Acceptance-test expansion**: Add additional CIM/output fixture pairs (beyond Handi Foods) so the automated acceptance suite enforces coverage across diverse deal structures.
+- **Backend log hygiene**: Keep tailing `logs/error.log` after each deploy to confirm the service account + Anthropic credential fixes remain in place; document notable findings in deployment notes.
+- **Infrastructure deployment checklist**: Update `DEPLOYMENT_GUIDE.md` with the exact Firebase/GCP commands used to fetch secrets and run Sonnet validation so future deploys stay reproducible.
+- ~~**Runtime upgrade**: Migrate Firebase Functions from Node.js 20 to a supported runtime well before the 2026‑10‑30 decommission date (warning surfaced during deploy).~~ ✅ Done 2026-02-24 — upgraded to Node.js 22 LTS.
+- ~~**`firebase-functions` dependency bump**: Upgrade the project to the latest `firebase-functions` package and address any breaking changes on the next development pass.~~ ✅ Done 2026-02-24 — upgraded to firebase-functions v7, removed deprecated `functions.config()` fallback, TS target bumped to ES2022.
+- **Document viewer KPIs missing after Project Panther run**: `Project Panther - Confidential Information Memorandum_vBluePoint.pdf` → `Revenue/EBITDA/Employees/Founded` surfaced as "Not specified in CIM" even though the CIM has numeric tables. Trace `optimizedAgenticRAGProcessor` → `dealOverview` mapper to ensure summary metrics populate the dashboard cards and add a regression test for this doc.
+- **10+ minute processing latency regression**: The same Project Panther run (doc ID `document-55c4a6e2-8c08-4734-87f6-24407cea50ac.pdf`) took ~10 minutes end-to-end. Instrument each pipeline phase (PDF chunking, Document AI, RAG passes, financial parser) so we can see where time is lost, then cap slow stages (e.g., GCS upload retries, three Anthropic fallbacks) before the next deploy.
+
+## Optimization Backlog (ordered by Accuracy → Speed → Cost benefit vs. implementation risk)
+1. **Deterministic financial parser enhancements** (status: partially addressed). Continue improving token alignment (multi-row tables, negative numbers) to reduce dependence on LLM retries. Risk: low, limited to parser module.
+2. **Retrieval gating per Agentic pass**. Swap the “top-N chunk blast” with similarity search keyed to each prompt (deal overview, market, thesis). Benefit: higher accuracy + lower token count. Risk: medium; needs robust Supabase RPC fallbacks.
+3. **Embedding cache keyed by document checksum**. Skip re-embedding when a document/version is unchanged to cut processing time/cost on retries. Risk: medium; requires schema changes to store content hashes.
+4. **Field-level validation & dependency checks prior to gap filling**. Enforce numeric relationships (e.g., EBITDA margin = EBITDA / Revenue) and re-query only the failing sections. Benefit: accuracy; risk: medium (adds validator & targeted prompts).
+5. **Stream Document AI chunks directly into chunker**. Avoid writing intermediate PDFs to disk/GCS when splitting >30 page CIMs. Benefit: speed/cost; risk: medium-high because it touches PDF splitting + Document AI integration.
+6. **Parallelize independent multi-pass queries** (e.g., run Pass 2 and Pass 3 concurrently when quota allows). Benefit: lower latency; risk: medium-high due to Anthropic rate limits & merge ordering.
+7. **Expose per-pass metrics via `/health/agentic-rag`**. Surface timing/token/cost data so regressions are visible. Benefit: operational accuracy; risk: low.
+8. **Structured comparison harness for CIM outputs**. Reuse the acceptance-test fixtures to generate diff reports for human reviewers (baseline vs. new model). Benefit: accuracy guardrail; risk: low once additional fixtures exist.
--- a/backend/.env.bak
+++ b/backend/.env.bak
@@ -0,0 +1,140 @@
+# Node Environment
+NODE_ENV=testing
+
+# Firebase Configuration (Testing Project) - ✅ COMPLETED
+FB_PROJECT_ID=cim-summarizer-testing
+FB_STORAGE_BUCKET=cim-summarizer-testing.firebasestorage.app
+FB_API_KEY=AIzaSyBNf58cnNMbXb6VE3sVEJYJT5CGNQr0Kmg
+FB_AUTH_DOMAIN=cim-summarizer-testing.firebaseapp.com
+
+# Supabase Configuration (Testing Instance) - ✅ COMPLETED
+SUPABASE_URL=https://gzoclmbqmgmpuhufbnhy.supabase.co
+
+# Google Cloud Configuration (Testing Project) - ✅ COMPLETED
+GCLOUD_PROJECT_ID=cim-summarizer-testing
+DOCUMENT_AI_LOCATION=us
+DOCUMENT_AI_PROCESSOR_ID=575027767a9291f6
+GCS_BUCKET_NAME=cim-processor-testing-uploads
+DOCUMENT_AI_OUTPUT_BUCKET_NAME=cim-processor-testing-processed
+GOOGLE_APPLICATION_CREDENTIALS=./serviceAccountKey-testing.json
+
+# LLM Configuration (Same as production but with cost limits) - ✅ COMPLETED
+LLM_PROVIDER=anthropic
+LLM_MAX_COST_PER_DOCUMENT=1.00
+LLM_ENABLE_COST_OPTIMIZATION=true
+LLM_USE_FAST_MODEL_FOR_SIMPLE_TASKS=true
+
+# Email Configuration (Testing) - ✅ COMPLETED
+EMAIL_HOST=smtp.gmail.com
+EMAIL_PORT=587
+EMAIL_USER=press7174@gmail.com
+EMAIL_FROM=press7174@gmail.com
+WEEKLY_EMAIL_RECIPIENT=jpressnell@bluepointcapital.com
+
+# Vector Database (Testing)
+VECTOR_PROVIDER=supabase
+
+# Testing-specific settings
+RATE_LIMIT_MAX_REQUESTS=1000
+RATE_LIMIT_WINDOW_MS=900000
+AGENTIC_RAG_DETAILED_LOGGING=true
+AGENTIC_RAG_PERFORMANCE_TRACKING=true
+AGENTIC_RAG_ERROR_REPORTING=true
+
+# Week 8 Features Configuration
+# Cost Monitoring
+COST_MONITORING_ENABLED=true
+USER_DAILY_COST_LIMIT=50.00
+USER_MONTHLY_COST_LIMIT=500.00
+DOCUMENT_COST_LIMIT=10.00
+SYSTEM_DAILY_COST_LIMIT=1000.00
+
+# Caching Configuration
+CACHE_ENABLED=true
+CACHE_TTL_HOURS=168
+CACHE_SIMILARITY_THRESHOLD=0.85
+CACHE_MAX_SIZE=10000
+
+# Microservice Configuration
+MICROSERVICE_ENABLED=true
+MICROSERVICE_MAX_CONCURRENT_JOBS=5
+MICROSERVICE_HEALTH_CHECK_INTERVAL=30000
+MICROSERVICE_QUEUE_PROCESSING_INTERVAL=5000
+
+# Processing Strategy
+PROCESSING_STRATEGY=document_ai_agentic_rag
+ENABLE_RAG_PROCESSING=true
+ENABLE_PROCESSING_COMPARISON=false
+
+# Agentic RAG Configuration
+AGENTIC_RAG_ENABLED=true
+AGENTIC_RAG_MAX_AGENTS=6
+AGENTIC_RAG_PARALLEL_PROCESSING=true
+AGENTIC_RAG_VALIDATION_STRICT=true
+AGENTIC_RAG_RETRY_ATTEMPTS=3
+AGENTIC_RAG_TIMEOUT_PER_AGENT=60000
+
+# Agent-Specific Configuration
+AGENT_DOCUMENT_UNDERSTANDING_ENABLED=true
+AGENT_FINANCIAL_ANALYSIS_ENABLED=true
+AGENT_MARKET_ANALYSIS_ENABLED=true
+AGENT_INVESTMENT_THESIS_ENABLED=true
+AGENT_SYNTHESIS_ENABLED=true
+AGENT_VALIDATION_ENABLED=true
+
+# Quality Control
+AGENTIC_RAG_QUALITY_THRESHOLD=0.8
+AGENTIC_RAG_COMPLETENESS_THRESHOLD=0.9
+AGENTIC_RAG_CONSISTENCY_CHECK=true
+
+# Logging Configuration
+LOG_LEVEL=debug
+LOG_FILE=logs/testing.log
+
+# Security Configuration
+BCRYPT_ROUNDS=10
+
+# Database Configuration (Testing)
+DATABASE_HOST=db.supabase.co
+DATABASE_PORT=5432
+DATABASE_NAME=postgres
+DATABASE_USER=postgres
+DATABASE_PASSWORD=your-testing-supabase-password
+
+# Redis Configuration (Testing - using in-memory for testing)
+REDIS_URL=redis://localhost:6379
+REDIS_HOST=localhost
+REDIS_PORT=6379
+ALLOWED_FILE_TYPES=application/pdf
+MAX_FILE_SIZE=52428800
+
+GCLOUD_PROJECT_ID=324837881067
+DOCUMENT_AI_LOCATION=us
+DOCUMENT_AI_PROCESSOR_ID=abb95bdd56632e4d
+GCS_BUCKET_NAME=cim-processor-testing-uploads
+DOCUMENT_AI_OUTPUT_BUCKET_NAME=cim-processor-testing-processed
+OPENROUTER_USE_BYOK=true
+
+# Email Configuration
+EMAIL_SECURE=false
+EMAIL_WEEKLY_RECIPIENT=jpressnell@bluepointcapital.com
+
+#SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImlhdCI6MTc1MzgxNjY3OCwiZXhwIjoyMDY5MzkyNjc4fQ.f9PUzL1F8JqIkqD_DwrGBIyHPcehMo-97jXD8hee5ss
+
+#SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTM4MTY2NzgsImV4cCI6MjA2OTM5MjY3OH0.Jg8cAKbujDv7YgeLCeHsOkgkP-LwM-7fAXVIHno0pLI
+
+#OPENROUTER_API_KEY=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
+
+#ANTHROPIC_API_KEY=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
+
+#OPENAI_API_KEY=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
+
+SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImlhdCI6MTc1MzgxNjY3OCwiZXhwIjoyMDY5MzkyNjc4fQ.f9PUzL1F8JqIkqD_DwrGBIyHPcehMo-97jXD8hee5ss
+
+SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTM4MTY2NzgsImV4cCI6MjA2OTM5MjY3OH0.Jg8cAKbujDv7YgeLCeHsOkgkP-LwM-7fAXVIHno0pLI
+
+OPENROUTER_API_KEY=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
+
+ANTHROPIC_API_KEY=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
+
+OPENAI_API_KEY=sk-proj-dFNxetn-sm08kbZ8IpFROe0LgVQev3lEsyfrGNqdYruyW_mLATHXVGee3ay55zkDHDBYR_XX4T3BlbkFJ2mJVmqt5u58hqrPSLhDsoN6HPQD_vyQFCqtlePYagbcnAnRDcleK06pYUf-Z3NhzfD-ONkEoMA
--- a/backend/.env.bak2
+++ b/backend/.env.bak2
@@ -0,0 +1,130 @@
+# Node Environment
+NODE_ENV=testing
+
+# Firebase Configuration (Testing Project) - ✅ COMPLETED
+FB_PROJECT_ID=cim-summarizer-testing
+FB_STORAGE_BUCKET=cim-summarizer-testing.firebasestorage.app
+FB_API_KEY=AIzaSyBNf58cnNMbXb6VE3sVEJYJT5CGNQr0Kmg
+FB_AUTH_DOMAIN=cim-summarizer-testing.firebaseapp.com
+
+# Supabase Configuration (Testing Instance) - ✅ COMPLETED
+SUPABASE_URL=https://gzoclmbqmgmpuhufbnhy.supabase.co
+
+# Google Cloud Configuration (Testing Project) - ✅ COMPLETED
+GCLOUD_PROJECT_ID=cim-summarizer-testing
+DOCUMENT_AI_LOCATION=us
+DOCUMENT_AI_PROCESSOR_ID=575027767a9291f6
+GCS_BUCKET_NAME=cim-processor-testing-uploads
+DOCUMENT_AI_OUTPUT_BUCKET_NAME=cim-processor-testing-processed
+GOOGLE_APPLICATION_CREDENTIALS=./serviceAccountKey-testing.json
+
+# LLM Configuration (Same as production but with cost limits) - ✅ COMPLETED
+LLM_PROVIDER=anthropic
+LLM_MAX_COST_PER_DOCUMENT=1.00
+LLM_ENABLE_COST_OPTIMIZATION=true
+LLM_USE_FAST_MODEL_FOR_SIMPLE_TASKS=true
+
+# Email Configuration (Testing) - ✅ COMPLETED
+EMAIL_HOST=smtp.gmail.com
+EMAIL_PORT=587
+EMAIL_USER=press7174@gmail.com
+EMAIL_FROM=press7174@gmail.com
+WEEKLY_EMAIL_RECIPIENT=jpressnell@bluepointcapital.com
+
+# Vector Database (Testing)
+VECTOR_PROVIDER=supabase
+
+# Testing-specific settings
+RATE_LIMIT_MAX_REQUESTS=1000
+RATE_LIMIT_WINDOW_MS=900000
+AGENTIC_RAG_DETAILED_LOGGING=true
+AGENTIC_RAG_PERFORMANCE_TRACKING=true
+AGENTIC_RAG_ERROR_REPORTING=true
+
+# Week 8 Features Configuration
+# Cost Monitoring
+COST_MONITORING_ENABLED=true
+USER_DAILY_COST_LIMIT=50.00
+USER_MONTHLY_COST_LIMIT=500.00
+DOCUMENT_COST_LIMIT=10.00
+SYSTEM_DAILY_COST_LIMIT=1000.00
+
+# Caching Configuration
+CACHE_ENABLED=true
+CACHE_TTL_HOURS=168
+CACHE_SIMILARITY_THRESHOLD=0.85
+CACHE_MAX_SIZE=10000
+
+# Microservice Configuration
+MICROSERVICE_ENABLED=true
+MICROSERVICE_MAX_CONCURRENT_JOBS=5
+MICROSERVICE_HEALTH_CHECK_INTERVAL=30000
+MICROSERVICE_QUEUE_PROCESSING_INTERVAL=5000
+
+# Processing Strategy
+PROCESSING_STRATEGY=document_ai_agentic_rag
+ENABLE_RAG_PROCESSING=true
+ENABLE_PROCESSING_COMPARISON=false
+
+# Agentic RAG Configuration
+AGENTIC_RAG_ENABLED=true
+AGENTIC_RAG_MAX_AGENTS=6
+AGENTIC_RAG_PARALLEL_PROCESSING=true
+AGENTIC_RAG_VALIDATION_STRICT=true
+AGENTIC_RAG_RETRY_ATTEMPTS=3
+AGENTIC_RAG_TIMEOUT_PER_AGENT=60000
+
+# Agent-Specific Configuration
+AGENT_DOCUMENT_UNDERSTANDING_ENABLED=true
+AGENT_FINANCIAL_ANALYSIS_ENABLED=true
+AGENT_MARKET_ANALYSIS_ENABLED=true
+AGENT_INVESTMENT_THESIS_ENABLED=true
+AGENT_SYNTHESIS_ENABLED=true
+AGENT_VALIDATION_ENABLED=true
+
+# Quality Control
+AGENTIC_RAG_QUALITY_THRESHOLD=0.8
+AGENTIC_RAG_COMPLETENESS_THRESHOLD=0.9
+AGENTIC_RAG_CONSISTENCY_CHECK=true
+
+# Logging Configuration
+LOG_LEVEL=debug
+LOG_FILE=logs/testing.log
+
+# Security Configuration
+BCRYPT_ROUNDS=10
+
+# Database Configuration (Testing)
+DATABASE_HOST=db.supabase.co
+DATABASE_PORT=5432
+DATABASE_NAME=postgres
+DATABASE_USER=postgres
+DATABASE_PASSWORD=your-testing-supabase-password
+
+# Redis Configuration (Testing - using in-memory for testing)
+REDIS_URL=redis://localhost:6379
+REDIS_HOST=localhost
+REDIS_PORT=6379
+ALLOWED_FILE_TYPES=application/pdf
+MAX_FILE_SIZE=52428800
+
+GCLOUD_PROJECT_ID=324837881067
+DOCUMENT_AI_LOCATION=us
+DOCUMENT_AI_PROCESSOR_ID=abb95bdd56632e4d
+GCS_BUCKET_NAME=cim-processor-testing-uploads
+DOCUMENT_AI_OUTPUT_BUCKET_NAME=cim-processor-testing-processed
+OPENROUTER_USE_BYOK=true
+
+# Email Configuration
+EMAIL_SECURE=false
+EMAIL_WEEKLY_RECIPIENT=jpressnell@bluepointcapital.com
+
+#SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImlhdCI6MTc1MzgxNjY3OCwiZXhwIjoyMDY5MzkyNjc4fQ.f9PUzL1F8JqIkqD_DwrGBIyHPcehMo-97jXD8hee5ss
+
+#SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd6b2NsbWJxbWdtcHVodWZibmh5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTM4MTY2NzgsImV4cCI6MjA2OTM5MjY3OH0.Jg8cAKbujDv7YgeLCeHsOkgkP-LwM-7fAXVIHno0pLI
+
+#OPENROUTER_API_KEY=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
+
+#ANTHROPIC_API_KEY=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
+
+#OPENAI_API_KEY=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
--- a/backend/.env.example
+++ b/backend/.env.example
@@ -30,7 +30,8 @@ DOCUMENT_AI_LOCATION=us
 DOCUMENT_AI_PROCESSOR_ID=your-processor-id
 GCS_BUCKET_NAME=your-gcs-bucket-name
 DOCUMENT_AI_OUTPUT_BUCKET_NAME=your-document-ai-output-bucket
-GOOGLE_APPLICATION_CREDENTIALS=./serviceAccountKey.json
+# Leave blank when using Firebase Functions secrets/Application Default Credentials
+GOOGLE_APPLICATION_CREDENTIALS=

 # Processing Strategy
 PROCESSING_STRATEGY=document_ai_genkit
@@ -72,4 +73,4 @@ AGENTIC_RAG_CONSISTENCY_CHECK=true
 # Monitoring and Logging
 AGENTIC_RAG_DETAILED_LOGGING=true
 AGENTIC_RAG_PERFORMANCE_TRACKING=true
-AGENTIC_RAG_ERROR_REPORTING=true
+AGENTIC_RAG_ERROR_REPORTING=true
--- a/backend/TROUBLESHOOTING_PLAN.md
+++ b/backend/TROUBLESHOOTING_PLAN.md
@@ -0,0 +1,418 @@
+# CIM Summary LLM Processing - Rapid Diagnostic & Fix Plan
+
+## 🚨 If Processing Fails - Execute This Plan
+
+### Phase 1: Immediate Diagnosis (2-5 minutes)
+
+#### Step 1.1: Check Recent Failures in Database
+```bash
+npx ts-node -e "
+import { supabase } from './src/config/supabase';
+
+(async () => {
+  const { data } = await supabase
+    .from('documents')
+    .select('id, filename, status, error_message, created_at, updated_at')
+    .eq('status', 'failed')
+    .order('updated_at', { ascending: false })
+    .limit(5);
+
+  console.log('Recent Failures:');
+  data?.forEach(d => {
+    console.log(\`- \${d.filename}: \${d.error_message?.substring(0, 200)}\`);
+  });
+  process.exit(0);
+})();
+"
+```
+
+**What to look for:**
+- Repeating error patterns
+- Specific error messages (timeout, API error, invalid model, etc.)
+- Time pattern (all failures at same time = system issue)
+
+---
+
+#### Step 1.2: Check Real-Time Error Logs
+```bash
+# Check last 100 errors
+tail -100 logs/error.log | grep -E "(error|ERROR|failed|FAILED|timeout|TIMEOUT)" | tail -20
+
+# Or check specific patterns
+grep -E "OpenRouter|Anthropic|LLM|model ID" logs/error.log | tail -20
+```
+
+**What to look for:**
+- `"invalid model ID"` → Model name issue
+- `"timeout"` → Timeout configuration issue
+- `"rate limit"` → API quota exceeded
+- `"401"` or `"403"` → Authentication issue
+- `"Cannot read properties"` → Code bug
+
+---
+
+#### Step 1.3: Test LLM Directly (Fastest Check)
+```bash
+# This takes 30-60 seconds
+npx ts-node src/scripts/test-openrouter-simple.ts 2>&1 | grep -E "(SUCCESS|FAILED|error.*model|OpenRouter API)"
+```
+
+**Expected output if working:**
+```
+✅ OpenRouter API call successful
+✅ Test Result: SUCCESS
+```
+
+**If it fails, note the EXACT error message.**
+
+---
+
+### Phase 2: Root Cause Identification (3-10 minutes)
+
+Based on the error from Phase 1, jump to the appropriate section:
+
+#### **Error Type A: Invalid Model ID**
+
+**Symptoms:**
+```
+"anthropic/claude-haiku-4 is not a valid model ID"
+"anthropic/claude-sonnet-4 is not a valid model ID"
+```
+
+**Root Cause:** Model name mismatch with OpenRouter API
+
+**Fix Location:** `backend/src/services/llmService.ts` lines 526-552
+
+**Verification:**
+```bash
+# Check what OpenRouter actually supports
+curl -s "https://openrouter.ai/api/v1/models" \
+  -H "Authorization: Bearer $OPENROUTER_API_KEY" | \
+  python3 -m json.tool | \
+  grep -A 2 "\"id\": \"anthropic" | \
+  head -30
+```
+
+**Quick Fix:**
+Update the model mapping in `llmService.ts`:
+```typescript
+// Current valid OpenRouter model IDs (as of Nov 2024):
+if (model.includes('sonnet') && model.includes('4')) {
+  openRouterModel = 'anthropic/claude-sonnet-4.5';
+} else if (model.includes('haiku') && model.includes('4')) {
+  openRouterModel = 'anthropic/claude-haiku-4.5';
+}
+```
+
+---
+
+#### **Error Type B: Timeout Errors**
+
+**Symptoms:**
+```
+"LLM call timeout after X minutes"
+"Processing timeout: Document stuck"
+```
+
+**Root Cause:** Operation taking longer than configured timeout
+
+**Diagnosis:**
+```bash
+# Check current timeout settings
+grep -E "timeout|TIMEOUT" backend/src/config/env.ts | grep -v "//"
+grep "timeoutMs" backend/src/services/llmService.ts | head -5
+```
+
+**Check Locations:**
+1. `env.ts:319` - `LLM_TIMEOUT_MS` (default 180000 = 3 min)
+2. `llmService.ts:343` - Wrapper timeout
+3. `llmService.ts:516` - OpenRouter abort timeout
+
+**Quick Fix:**
+Add to `.env`:
+```bash
+LLM_TIMEOUT_MS=360000  # Increase to 6 minutes
+```
+
+Or edit `env.ts:319`:
+```typescript
+timeoutMs: parseInt(envVars['LLM_TIMEOUT_MS'] || '360000'), // 6 min
+```
+
+---
+
+#### **Error Type C: Authentication/API Key Issues**
+
+**Symptoms:**
+```
+"401 Unauthorized"
+"403 Forbidden"
+"API key is missing"
+"ANTHROPIC_API_KEY is not set"
+```
+
+**Root Cause:** Missing or invalid API keys
+
+**Diagnosis:**
+```bash
+# Check which keys are set
+echo "ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:0:20}..."
+echo "OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:0:20}..."
+echo "OPENAI_API_KEY: ${OPENAI_API_KEY:0:20}..."
+
+# Check .env file
+grep -E "ANTHROPIC|OPENROUTER|OPENAI" backend/.env | grep -v "^#"
+```
+
+**Quick Fix:**
+Ensure these are set in `backend/.env`:
+```bash
+ANTHROPIC_API_KEY=sk-ant-api03-...
+OPENROUTER_API_KEY=sk-or-v1-...
+OPENROUTER_USE_BYOK=true
+```
+
+---
+
+#### **Error Type D: Rate Limit Exceeded**
+
+**Symptoms:**
+```
+"429 Too Many Requests"
+"rate limit exceeded"
+"Retry after X seconds"
+```
+
+**Root Cause:** Too many API calls in short time
+
+**Diagnosis:**
+```bash
+# Check recent API call frequency
+grep "LLM API call" logs/testing.log | tail -20 | \
+  awk '{print $1, $2}' | uniq -c
+```
+
+**Quick Fix:**
+1. Wait for rate limit to reset (check error for retry time)
+2. Add rate limiting in code:
+   ```typescript
+   // In llmService.ts, add delay between retries
+   await new Promise(resolve => setTimeout(resolve, 2000)); // 2 sec delay
+   ```
+
+---
+
+#### **Error Type E: Code Bugs (TypeError, Cannot read property)**
+
+**Symptoms:**
+```
+"Cannot read properties of undefined (reading '0')"
+"TypeError: response.data is undefined"
+"Unexpected token in JSON"
+```
+
+**Root Cause:** Missing null checks or incorrect data access
+
+**Diagnosis:**
+```bash
+# Find the exact line causing the error
+grep -A 5 "Cannot read properties" logs/error.log | tail -10
+```
+
+**Quick Fix Pattern:**
+Replace unsafe access:
+```typescript
+// Bad:
+const content = response.data.choices[0].message.content;
+
+// Good:
+const content = response.data?.choices?.[0]?.message?.content || '';
+```
+
+**File to check:** `llmService.ts:696-720`
+
+---
+
+### Phase 3: Systematic Testing (5-10 minutes)
+
+After applying a fix, test in this order:
+
+#### Test 1: Direct LLM Call
+```bash
+npx ts-node src/scripts/test-openrouter-simple.ts
+```
+**Expected:** Success in 30-90 seconds
+
+#### Test 2: Simple RAG Processing
+```bash
+npx ts-node -e "
+import { llmService } from './src/services/llmService';
+
+(async () => {
+  const text = 'CIM for Target Corp. Revenue: \$100M. EBITDA: \$20M.';
+  const result = await llmService.processCIMDocument(text, 'BPCP Template');
+  console.log('Success:', result.success);
+  console.log('Has JSON:', !!result.jsonOutput);
+  process.exit(result.success ? 0 : 1);
+})();
+"
+```
+**Expected:** Success with JSON output
+
+#### Test 3: Full Document Upload
+Use the frontend to upload a real CIM and monitor:
+```bash
+# In one terminal, watch logs
+tail -f logs/testing.log | grep -E "(error|success|completed)"
+
+# Check processing status
+npx ts-node src/scripts/check-current-processing.ts
+```
+
+---
+
+### Phase 4: Emergency Fallback Options
+
+If all else fails, use these fallback strategies:
+
+#### Option 1: Switch to Direct Anthropic (Bypass OpenRouter)
+```bash
+# In .env
+LLM_PROVIDER=anthropic  # Instead of openrouter
+```
+
+**Pro:** Eliminates OpenRouter as variable
+**Con:** Different rate limits
+
+#### Option 2: Use Older Claude Model
+```bash
+# In .env or env.ts
+LLM_MODEL=claude-3.5-sonnet
+LLM_FAST_MODEL=claude-3.5-haiku
+```
+
+**Pro:** More stable, widely supported
+**Con:** Slightly older model
+
+#### Option 3: Reduce Input Size
+```typescript
+// In optimizedAgenticRAGProcessor.ts:651
+const targetTokenCount = 8000; // Down from 50000
+```
+
+**Pro:** Faster processing, less likely to timeout
+**Con:** Less context for analysis
+
+---
+
+### Phase 5: Preventive Monitoring
+
+Set up these checks to catch issues early:
+
+#### Daily Health Check Script
+Create `backend/scripts/daily-health-check.sh`:
+```bash
+#!/bin/bash
+echo "=== Daily CIM Processor Health Check ==="
+echo ""
+
+# Check for stuck documents
+npx ts-node src/scripts/check-database-failures.ts
+
+# Test LLM connectivity
+npx ts-node src/scripts/test-openrouter-simple.ts
+
+# Check recent success rate
+echo "Recent processing stats (last 24 hours):"
+npx ts-node -e "
+import { supabase } from './src/config/supabase';
+(async () => {
+  const yesterday = new Date(Date.now() - 86400000).toISOString();
+  const { data } = await supabase
+    .from('documents')
+    .select('status')
+    .gte('created_at', yesterday);
+
+  const stats = data?.reduce((acc, d) => {
+    acc[d.status] = (acc[d.status] || 0) + 1;
+    return acc;
+  }, {});
+
+  console.log(stats);
+  process.exit(0);
+})();
+"
+```
+
+Run daily:
+```bash
+chmod +x backend/scripts/daily-health-check.sh
+./backend/scripts/daily-health-check.sh
+```
+
+---
+
+## 📋 Quick Reference Checklist
+
+When processing fails, check in this order:
+
+- [ ] **Error logs** (`tail -100 logs/error.log`)
+- [ ] **Recent failures** (database query in Step 1.1)
+- [ ] **Direct LLM test** (`test-openrouter-simple.ts`)
+- [ ] **Model ID validity** (curl OpenRouter API)
+- [ ] **API keys set** (check `.env`)
+- [ ] **Timeout values** (check `env.ts`)
+- [ ] **OpenRouter vs Anthropic** (which provider?)
+- [ ] **Rate limits** (check error for 429)
+- [ ] **Code bugs** (look for TypeErrors in logs)
+- [ ] **Build succeeded** (`npm run build`)
+
+---
+
+## 🔧 Common Fix Commands
+
+```bash
+# Rebuild after code changes
+npm run build
+
+# Clear error logs and start fresh
+> logs/error.log
+
+# Test with verbose logging
+LOG_LEVEL=debug npx ts-node src/scripts/test-openrouter-simple.ts
+
+# Check what's actually in .env
+cat .env | grep -v "^#" | grep -E "LLM|ANTHROPIC|OPENROUTER"
+
+# Verify OpenRouter models
+curl -s "https://openrouter.ai/api/v1/models" -H "Authorization: Bearer $OPENROUTER_API_KEY" | python3 -m json.tool | grep "claude.*haiku\|claude.*sonnet"
+```
+
+---
+
+## 📞 Escalation Path
+
+If issue persists after 30 minutes:
+
+1. **Check OpenRouter Status:** https://status.openrouter.ai/
+2. **Check Anthropic Status:** https://status.anthropic.com/
+3. **Review OpenRouter Docs:** https://openrouter.ai/docs
+4. **Test with curl:** Send raw API request to isolate issue
+5. **Compare git history:** `git diff HEAD~10 -- backend/src/services/llmService.ts`
+
+---
+
+## 🎯 Success Criteria
+
+Processing is "working" when:
+
+- ✅ Direct LLM test completes in < 2 minutes
+- ✅ Returns valid JSON matching schema
+- ✅ No errors in last 10 log entries
+- ✅ Database shows recent "completed" documents
+- ✅ Frontend can upload and process test CIM
+
+---
+
+**Last Updated:** 2025-11-07
+**Next Review:** After any production deployment
--- a/backend/firebase.json
+++ b/backend/firebase.json
@@ -1,7 +1,7 @@
 {
  "functions": {
    "source": ".",
-    "runtime": "nodejs20",
+    "runtime": "nodejs22",
    "ignore": [
      "node_modules",
      "src",
@@ -13,7 +13,15 @@
      "tsconfig.json",
      ".eslintrc.js",
      "Dockerfile",
-      "cloud-run.yaml"
+      "cloud-run.yaml",
+      ".env",
+      ".env.*",
+      "*.env",
+      ".env.bak",
+      ".env.bak*",
+      "*.env.bak",
+      "*.env.bak*",
+      "pnpm-lock.yaml"
    ],
    "predeploy": [
      "npm run build"
--- a/backend/package-lock.json
+++ b/backend/package-lock.json
--- a/backend/package.json
+++ b/backend/package.json
@@ -1,6 +1,6 @@
 {
  "name": "cim-processor-backend",
-  "version": "1.0.0",
+  "version": "2.0.0",
  "description": "Backend API for CIM Document Processor",
  "main": "dist/index.js",
  "scripts": {
@@ -15,17 +15,35 @@
    "db:migrate": "ts-node src/scripts/setup-database.ts",
    "db:seed": "ts-node src/models/seed.ts",
    "db:setup": "npm run db:migrate && node scripts/setup_supabase.js",
-    "deploy:firebase": "npm run build && firebase deploy --only functions",
+    "pre-deploy-check": "bash scripts/pre-deploy-check.sh",
+    "clean-env-secrets": "bash scripts/clean-env-secrets.sh",
+    "deploy:firebase": "npm run pre-deploy-check && npm run build && firebase deploy --only functions",
+    "deploy:firebase:force": "npm run build && firebase deploy --only functions",
    "deploy:cloud-run": "npm run build && gcloud run deploy cim-processor-backend --source . --region us-central1 --platform managed --allow-unauthenticated",
    "deploy:docker": "npm run build && docker build -t cim-processor-backend . && docker run -p 8080:8080 cim-processor-backend",
    "docker:build": "docker build -t cim-processor-backend .",
    "docker:push": "docker tag cim-processor-backend gcr.io/cim-summarizer/cim-processor-backend:latest && docker push gcr.io/cim-summarizer/cim-processor-backend:latest",
    "emulator": "firebase emulators:start --only functions",
-    "emulator:ui": "firebase emulators:start --only functions --ui"
+    "emulator:ui": "firebase emulators:start --only functions --ui",
+    "sync:config": "./scripts/sync-firebase-config.sh",
+    "sync-secrets": "ts-node src/scripts/sync-firebase-secrets-to-env.ts",
+    "diagnose": "ts-node src/scripts/comprehensive-diagnostic.ts",
+    "test:linkage": "ts-node src/scripts/test-linkage.ts",
+    "test:postgres": "ts-node src/scripts/test-postgres-connection.ts",
+    "test:job": "ts-node src/scripts/test-job-creation.ts",
+    "setup:jobs-table": "ts-node src/scripts/setup-processing-jobs-table.ts",
+    "monitor": "ts-node src/scripts/monitor-system.ts",
+    "test": "vitest run",
+    "test:watch": "vitest",
+    "test:coverage": "vitest run --coverage",
+    "test:pipeline": "ts-node src/scripts/test-complete-pipeline.ts",
+    "check:pipeline": "ts-node src/scripts/check-pipeline-readiness.ts",
+    "logs:cloud": "ts-node src/scripts/fetch-cloud-run-logs.ts"
  },
  "dependencies": {
    "@anthropic-ai/sdk": "^0.57.0",
    "@google-cloud/documentai": "^9.3.0",
+    "@google-cloud/functions-framework": "^3.4.0",
    "@google-cloud/storage": "^7.16.0",
    "@supabase/supabase-js": "^2.53.0",
    "@types/pdfkit": "^0.17.2",
@@ -36,20 +54,22 @@
    "express": "^4.18.2",
    "express-rate-limit": "^7.1.5",
    "firebase-admin": "^13.4.0",
-    "firebase-functions": "^6.4.0",
+    "firebase-functions": "^7.0.5",
    "helmet": "^7.1.0",
    "joi": "^17.11.0",
    "jsonwebtoken": "^9.0.2",
    "morgan": "^1.10.0",
+    "nodemailer": "^8.0.1",
    "openai": "^5.10.2",
+    "pdf-lib": "^1.17.1",
    "pdf-parse": "^1.1.1",
    "pdfkit": "^0.17.1",
    "pg": "^8.11.3",
    "puppeteer": "^21.11.0",
-    "redis": "^4.6.10",
    "uuid": "^11.1.0",
    "winston": "^3.11.0",
-    "zod": "^3.25.76"
+    "zod": "^3.25.76",
+    "zod-to-json-schema": "^3.24.6"
  },
  "devDependencies": {
    "@types/bcryptjs": "^2.4.6",
@@ -58,13 +78,17 @@
    "@types/jsonwebtoken": "^9.0.5",
    "@types/morgan": "^1.9.9",
    "@types/node": "^20.9.0",
+    "@types/nodemailer": "^7.0.11",
    "@types/pdf-parse": "^1.1.4",
    "@types/pg": "^8.10.7",
    "@types/uuid": "^10.0.0",
    "@typescript-eslint/eslint-plugin": "^6.10.0",
    "@typescript-eslint/parser": "^6.10.0",
+    "@vitest/coverage-v8": "^2.1.0",
    "eslint": "^8.53.0",
+    "ts-node": "^10.9.2",
    "ts-node-dev": "^2.0.0",
-    "typescript": "^5.2.2"
+    "typescript": "^5.2.2",
+    "vitest": "^2.1.0"
  }
 }
--- a/backend/scripts/clean-env-secrets.sh
+++ b/backend/scripts/clean-env-secrets.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# Remove secrets from .env file that should only be Firebase Secrets
+# This prevents conflicts during deployment
+
+set -e
+
+if [ ! -f .env ]; then
+  echo "No .env file found"
+  exit 0
+fi
+
+# List of secrets to remove from .env
+SECRETS=(
+  "ANTHROPIC_API_KEY"
+  "OPENAI_API_KEY"
+  "OPENROUTER_API_KEY"
+  "DATABASE_URL"
+  "SUPABASE_SERVICE_KEY"
+  "SUPABASE_ANON_KEY"
+  "EMAIL_PASS"
+)
+
+echo "🧹 Cleaning secrets from .env file..."
+
+BACKUP_FILE=".env.pre-clean-$(date +%Y%m%d-%H%M%S).bak"
+cp .env "$BACKUP_FILE"
+echo "📋 Backup created: $BACKUP_FILE"
+
+REMOVED=0
+for secret in "${SECRETS[@]}"; do
+  if grep -q "^${secret}=" .env; then
+    # Remove the line (including commented versions)
+    sed -i.tmp "/^#*${secret}=/d" .env
+    rm -f .env.tmp
+    echo "  ✅ Removed ${secret}"
+    REMOVED=$((REMOVED + 1))
+  fi
+done
+
+if [ $REMOVED -gt 0 ]; then
+  echo ""
+  echo "✅ Removed ${REMOVED} secret(s) from .env"
+  echo "💡 For local development, use: npm run sync-secrets"
+else
+  echo "✅ No secrets found in .env (already clean)"
+  rm "$BACKUP_FILE"
+fi
+
--- a/backend/scripts/pre-deploy-check.sh
+++ b/backend/scripts/pre-deploy-check.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# Pre-deployment validation script
+# Checks for environment variable conflicts before deploying Firebase Functions
+
+set -e
+
+echo "🔍 Pre-deployment validation..."
+
+# List of secrets that should NOT be in .env
+SECRETS=(
+  "ANTHROPIC_API_KEY"
+  "OPENAI_API_KEY"
+  "OPENROUTER_API_KEY"
+  "DATABASE_URL"
+  "SUPABASE_SERVICE_KEY"
+  "SUPABASE_ANON_KEY"
+  "EMAIL_PASS"
+)
+
+CONFLICTS=0
+
+if [ -f .env ]; then
+  echo "Checking .env file for secret conflicts..."
+  
+  for secret in "${SECRETS[@]}"; do
+    if grep -q "^${secret}=" .env; then
+      echo "⚠️  CONFLICT: ${secret} is in .env but should only be a Firebase Secret"
+      CONFLICTS=$((CONFLICTS + 1))
+    fi
+  done
+  
+  if [ $CONFLICTS -gt 0 ]; then
+    echo ""
+    echo "❌ Found ${CONFLICTS} conflict(s). Please remove these from .env:"
+    echo ""
+    echo "For local development, use: npm run sync-secrets"
+    echo "This will temporarily add secrets to .env for local testing."
+    echo ""
+    echo "To fix now, run: npm run clean-env-secrets"
+    exit 1
+  fi
+else
+  echo "✅ No .env file found (this is fine for deployment)"
+fi
+
+echo "✅ Pre-deployment check passed!"
+exit 0
+
--- a/backend/sql/alter_processing_jobs_table.sql
+++ b/backend/sql/alter_processing_jobs_table.sql
@@ -0,0 +1,60 @@
+-- Add missing columns to existing processing_jobs table
+-- This aligns the existing table with what the new code expects
+
+-- Add attempts column (tracks retry attempts)
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS attempts INTEGER NOT NULL DEFAULT 0;
+
+-- Add max_attempts column (maximum retry attempts allowed)
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS max_attempts INTEGER NOT NULL DEFAULT 3;
+
+-- Add options column (stores processing configuration as JSON)
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS options JSONB;
+
+-- Add last_error_at column (timestamp of last error)
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS last_error_at TIMESTAMP WITH TIME ZONE;
+
+-- Add error column (current error message)
+-- Note: This will coexist with error_message, we can migrate data later
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS error TEXT;
+
+-- Add result column (stores processing result as JSON)
+ALTER TABLE processing_jobs
+ADD COLUMN IF NOT EXISTS result JSONB;
+
+-- Update status column to include new statuses
+-- Note: Can't modify CHECK constraint easily, so we'll just document the new values
+-- Existing statuses: pending, processing, completed, failed
+-- New status: retrying
+
+-- Create index on last_error_at for efficient retryable job queries
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_last_error_at
+ON processing_jobs(last_error_at)
+WHERE status = 'retrying';
+
+-- Create index on attempts for monitoring
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_attempts
+ON processing_jobs(attempts);
+
+-- Comments for documentation
+COMMENT ON COLUMN processing_jobs.attempts IS 'Number of processing attempts made';
+COMMENT ON COLUMN processing_jobs.max_attempts IS 'Maximum number of retry attempts allowed';
+COMMENT ON COLUMN processing_jobs.options IS 'Processing options and configuration (JSON)';
+COMMENT ON COLUMN processing_jobs.last_error_at IS 'Timestamp of last error occurrence';
+COMMENT ON COLUMN processing_jobs.error IS 'Current error message (new format)';
+COMMENT ON COLUMN processing_jobs.result IS 'Processing result data (JSON)';
+
+-- Verify the changes
+SELECT
+  column_name,
+  data_type,
+  is_nullable,
+  column_default
+FROM information_schema.columns
+WHERE table_name = 'processing_jobs'
+AND table_schema = 'public'
+ORDER BY ordinal_position;
--- a/backend/sql/check-rls-policies.sql
+++ b/backend/sql/check-rls-policies.sql
@@ -0,0 +1,25 @@
+-- Check RLS status and policies on documents table
+SELECT 
+  tablename,
+  rowsecurity as rls_enabled
+FROM pg_tables 
+WHERE schemaname = 'public' 
+  AND tablename IN ('documents', 'processing_jobs');
+
+-- Check RLS policies on documents
+SELECT 
+  schemaname,
+  tablename,
+  policyname,
+  permissive,
+  roles,
+  cmd,
+  qual,
+  with_check
+FROM pg_policies
+WHERE tablename IN ('documents', 'processing_jobs')
+ORDER BY tablename, policyname;
+
+-- Check current role
+SELECT current_user, current_role, session_user;
+
--- a/backend/sql/check_table_sizes.sql
+++ b/backend/sql/check_table_sizes.sql
@@ -0,0 +1,109 @@
+-- ============================================================
+-- CHECK TABLE SIZES - Run in Supabase SQL Editor
+-- ============================================================
+-- Part 1: Shows all public tables with sizes (auto-discovers)
+-- Part 2: Cleanup candidate counts (only for tables that exist)
+-- ============================================================
+
+-- PART 1: All public table sizes
+SELECT
+  c.relname AS table_name,
+  pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
+  pg_size_pretty(pg_relation_size(c.oid)) AS data_size,
+  pg_size_pretty(pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) AS index_size,
+  c.reltuples::bigint AS estimated_rows
+FROM pg_class c
+JOIN pg_namespace n ON n.oid = c.relnamespace
+WHERE n.nspname = 'public'
+  AND c.relkind = 'r'
+ORDER BY pg_total_relation_size(c.oid) DESC;
+
+-- PART 2: Cleanup candidates (safe — checks table existence before querying)
+DO $$
+DECLARE
+  rec RECORD;
+  row_count bigint;
+  cleanup_count bigint;
+  query text;
+BEGIN
+  RAISE NOTICE '--- CLEANUP CANDIDATE BREAKDOWN ---';
+
+  -- Processing jobs
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'processing_jobs') THEN
+    SELECT count(*), count(*) FILTER (WHERE status IN ('completed', 'failed') AND completed_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM processing_jobs;
+    RAISE NOTICE 'processing_jobs: % total, % cleanup candidates (completed/failed > 30d)', row_count, cleanup_count;
+  END IF;
+
+  -- Vector similarity searches
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'vector_similarity_searches') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '90 days')
+    INTO row_count, cleanup_count FROM vector_similarity_searches;
+    RAISE NOTICE 'vector_similarity_searches: % total, % cleanup candidates (> 90d)', row_count, cleanup_count;
+  END IF;
+
+  -- Session events
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'session_events') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM session_events;
+    RAISE NOTICE 'session_events: % total, % cleanup candidates (> 30d)', row_count, cleanup_count;
+  END IF;
+
+  -- Execution events
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'execution_events') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM execution_events;
+    RAISE NOTICE 'execution_events: % total, % cleanup candidates (> 30d)', row_count, cleanup_count;
+  END IF;
+
+  -- Performance metrics
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'performance_metrics') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '90 days')
+    INTO row_count, cleanup_count FROM performance_metrics;
+    RAISE NOTICE 'performance_metrics: % total, % cleanup candidates (> 90d)', row_count, cleanup_count;
+  END IF;
+
+  -- Service health checks
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'service_health_checks') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM service_health_checks;
+    RAISE NOTICE 'service_health_checks: % total, % cleanup candidates (> 30d)', row_count, cleanup_count;
+  END IF;
+
+  -- Alert events
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'alert_events') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM alert_events;
+    RAISE NOTICE 'alert_events: % total, % cleanup candidates (> 30d)', row_count, cleanup_count;
+  END IF;
+
+  -- Agent executions
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'agent_executions') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '90 days')
+    INTO row_count, cleanup_count FROM agent_executions;
+    RAISE NOTICE 'agent_executions: % total, % cleanup candidates (> 90d)', row_count, cleanup_count;
+  END IF;
+
+  -- Agentic RAG sessions
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'agentic_rag_sessions') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '90 days')
+    INTO row_count, cleanup_count FROM agentic_rag_sessions;
+    RAISE NOTICE 'agentic_rag_sessions: % total, % cleanup candidates (> 90d)', row_count, cleanup_count;
+  END IF;
+
+  -- Processing quality metrics
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'processing_quality_metrics') THEN
+    SELECT count(*), count(*) FILTER (WHERE created_at < NOW() - INTERVAL '90 days')
+    INTO row_count, cleanup_count FROM processing_quality_metrics;
+    RAISE NOTICE 'processing_quality_metrics: % total, % cleanup candidates (> 90d)', row_count, cleanup_count;
+  END IF;
+
+  -- Documents extracted_text
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'documents') THEN
+    SELECT count(*), count(*) FILTER (WHERE status = 'completed' AND analysis_data IS NOT NULL AND extracted_text IS NOT NULL AND created_at < NOW() - INTERVAL '30 days')
+    INTO row_count, cleanup_count FROM documents;
+    RAISE NOTICE 'documents (extracted_text nullable): % total, % cleanup candidates (completed > 30d with analysis_data)', row_count, cleanup_count;
+  END IF;
+
+  RAISE NOTICE '--- END CLEANUP BREAKDOWN ---';
+END $$;
--- a/backend/sql/cleanup_old_data.sql
+++ b/backend/sql/cleanup_old_data.sql
@@ -0,0 +1,102 @@
+-- ============================================================
+-- CLEANUP OLD DATA - Run in Supabase SQL Editor
+-- ============================================================
+-- Removes stale data that accumulates over time without
+-- impacting application functionality.
+--
+-- SAFE TO RUN: All deleted data is either intermediate
+-- processing artifacts or analytics logs. Core document
+-- data (documents, document_chunks, analysis_data) is
+-- never touched by DELETE statements.
+--
+-- Skips tables that don't exist yet (safe for any state).
+--
+-- RECOMMENDATION: Run the check_table_sizes.sql query first
+-- to see how much data will be affected.
+-- ============================================================
+
+DO $$
+DECLARE
+  deleted bigint;
+BEGIN
+
+  -- 1. Processing jobs: completed/failed older than 30 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'processing_jobs') THEN
+    DELETE FROM processing_jobs WHERE status IN ('completed', 'failed') AND completed_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'processing_jobs: deleted % rows', deleted;
+  END IF;
+
+  -- 2. Execution events: older than 30 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'execution_events') THEN
+    DELETE FROM execution_events WHERE created_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'execution_events: deleted % rows', deleted;
+  END IF;
+
+  -- 3. Session events: older than 30 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'session_events') THEN
+    DELETE FROM session_events WHERE created_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'session_events: deleted % rows', deleted;
+  END IF;
+
+  -- 4. Performance metrics: older than 90 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'performance_metrics') THEN
+    DELETE FROM performance_metrics WHERE created_at < NOW() - INTERVAL '90 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'performance_metrics: deleted % rows', deleted;
+  END IF;
+
+  -- 5. Vector similarity searches: older than 90 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'vector_similarity_searches') THEN
+    DELETE FROM vector_similarity_searches WHERE created_at < NOW() - INTERVAL '90 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'vector_similarity_searches: deleted % rows', deleted;
+  END IF;
+
+  -- 6. Service health checks: older than 30 days (INFR-01)
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'service_health_checks') THEN
+    DELETE FROM service_health_checks WHERE created_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'service_health_checks: deleted % rows', deleted;
+  END IF;
+
+  -- 7. Alert events: resolved older than 30 days (INFR-01)
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'alert_events') THEN
+    DELETE FROM alert_events WHERE status = 'resolved' AND created_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'alert_events: deleted % rows', deleted;
+  END IF;
+
+  -- 8. Agent executions: older than 90 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'agent_executions') THEN
+    DELETE FROM agent_executions WHERE created_at < NOW() - INTERVAL '90 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'agent_executions: deleted % rows', deleted;
+  END IF;
+
+  -- 9. Processing quality metrics: older than 90 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'processing_quality_metrics') THEN
+    DELETE FROM processing_quality_metrics WHERE created_at < NOW() - INTERVAL '90 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'processing_quality_metrics: deleted % rows', deleted;
+  END IF;
+
+  -- 10. Agentic RAG sessions: completed older than 90 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'agentic_rag_sessions') THEN
+    DELETE FROM agentic_rag_sessions WHERE status IN ('completed', 'failed') AND created_at < NOW() - INTERVAL '90 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'agentic_rag_sessions: deleted % rows', deleted;
+  END IF;
+
+  -- 11. Null out extracted_text for completed documents older than 30 days
+  IF EXISTS (SELECT 1 FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'documents') THEN
+    UPDATE documents SET extracted_text = NULL
+    WHERE status = 'completed' AND analysis_data IS NOT NULL AND extracted_text IS NOT NULL AND created_at < NOW() - INTERVAL '30 days';
+    GET DIAGNOSTICS deleted = ROW_COUNT;
+    RAISE NOTICE 'documents extracted_text nulled: % rows', deleted;
+  END IF;
+
+  RAISE NOTICE '--- CLEANUP COMPLETE ---';
+END $$;
--- a/backend/sql/complete_database_setup.sql
+++ b/backend/sql/complete_database_setup.sql
@@ -0,0 +1,96 @@
+-- Complete Database Setup for CIM Summarizer
+-- Run this in Supabase SQL Editor to create all necessary tables
+
+-- 1. Create users table
+CREATE TABLE IF NOT EXISTS users (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    firebase_uid VARCHAR(255) UNIQUE NOT NULL,
+    email VARCHAR(255) UNIQUE NOT NULL,
+    display_name VARCHAR(255),
+    photo_url VARCHAR(1000),
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    last_login_at TIMESTAMP WITH TIME ZONE
+);
+
+CREATE INDEX IF NOT EXISTS idx_users_firebase_uid ON users(firebase_uid);
+CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
+
+-- 2. Create update_updated_at_column function (needed for triggers)
+CREATE OR REPLACE FUNCTION update_updated_at_column()
+RETURNS TRIGGER AS $$
+BEGIN
+    NEW.updated_at = CURRENT_TIMESTAMP;
+    RETURN NEW;
+END;
+$$ language 'plpgsql';
+
+-- 3. Create documents table
+CREATE TABLE IF NOT EXISTS documents (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    user_id VARCHAR(255) NOT NULL, -- Changed from UUID to VARCHAR to match Firebase UID
+    original_file_name VARCHAR(500) NOT NULL,
+    file_path VARCHAR(1000) NOT NULL,
+    file_size BIGINT NOT NULL CHECK (file_size > 0),
+    uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    status VARCHAR(50) NOT NULL DEFAULT 'uploaded' CHECK (status IN ('uploading', 'uploaded', 'extracting_text', 'processing_llm', 'generating_pdf', 'completed', 'failed')),
+    extracted_text TEXT,
+    generated_summary TEXT,
+    summary_markdown_path VARCHAR(1000),
+    summary_pdf_path VARCHAR(1000),
+    processing_started_at TIMESTAMP WITH TIME ZONE,
+    processing_completed_at TIMESTAMP WITH TIME ZONE,
+    error_message TEXT,
+    analysis_data JSONB, -- Added for storing analysis results
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_documents_user_id ON documents(user_id);
+CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status);
+CREATE INDEX IF NOT EXISTS idx_documents_uploaded_at ON documents(uploaded_at);
+CREATE INDEX IF NOT EXISTS idx_documents_processing_completed_at ON documents(processing_completed_at);
+CREATE INDEX IF NOT EXISTS idx_documents_user_status ON documents(user_id, status);
+
+CREATE TRIGGER update_documents_updated_at
+    BEFORE UPDATE ON documents
+    FOR EACH ROW
+    EXECUTE FUNCTION update_updated_at_column();
+
+-- 4. Create processing_jobs table
+CREATE TABLE IF NOT EXISTS processing_jobs (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
+    user_id VARCHAR(255) NOT NULL,
+    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed', 'retrying')),
+    attempts INTEGER NOT NULL DEFAULT 0,
+    max_attempts INTEGER NOT NULL DEFAULT 3,
+    options JSONB,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    started_at TIMESTAMP WITH TIME ZONE,
+    completed_at TIMESTAMP WITH TIME ZONE,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    error TEXT,
+    last_error_at TIMESTAMP WITH TIME ZONE,
+    result JSONB
+);
+
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_status ON processing_jobs(status);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_created_at ON processing_jobs(created_at);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_document_id ON processing_jobs(document_id);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_user_id ON processing_jobs(user_id);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_pending ON processing_jobs(status, created_at) WHERE status = 'pending';
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_last_error_at ON processing_jobs(last_error_at) WHERE status = 'retrying';
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_attempts ON processing_jobs(attempts);
+
+CREATE TRIGGER update_processing_jobs_updated_at
+    BEFORE UPDATE ON processing_jobs
+    FOR EACH ROW
+    EXECUTE FUNCTION update_updated_at_column();
+
+-- Verify all tables were created
+SELECT table_name
+FROM information_schema.tables
+WHERE table_schema = 'public'
+AND table_name IN ('users', 'documents', 'processing_jobs')
+ORDER BY table_name;
--- a/backend/sql/create-job-bypass-rls-fk.sql
+++ b/backend/sql/create-job-bypass-rls-fk.sql
@@ -0,0 +1,76 @@
+-- Create job bypassing RLS foreign key check
+-- This uses a SECURITY DEFINER function to bypass RLS
+
+-- Step 1: Create a function that bypasses RLS
+CREATE OR REPLACE FUNCTION create_processing_job(
+  p_document_id UUID,
+  p_user_id TEXT,
+  p_options JSONB DEFAULT '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  p_max_attempts INTEGER DEFAULT 3
+)
+RETURNS TABLE (
+  job_id UUID,
+  document_id UUID,
+  status TEXT,
+  created_at TIMESTAMP WITH TIME ZONE
+)
+LANGUAGE plpgsql
+SECURITY DEFINER
+SET search_path = public
+AS $$
+DECLARE
+  v_job_id UUID;
+BEGIN
+  -- Insert job (bypasses RLS due to SECURITY DEFINER)
+  INSERT INTO processing_jobs (
+    document_id,
+    user_id,
+    status,
+    attempts,
+    max_attempts,
+    options,
+    created_at
+  ) VALUES (
+    p_document_id,
+    p_user_id,
+    'pending',
+    0,
+    p_max_attempts,
+    p_options,
+    NOW()
+  )
+  RETURNING id INTO v_job_id;
+  
+  -- Return the created job
+  RETURN QUERY
+  SELECT 
+    pj.id,
+    pj.document_id,
+    pj.status,
+    pj.created_at
+  FROM processing_jobs pj
+  WHERE pj.id = v_job_id;
+END;
+$$;
+
+-- Step 2: Grant execute permission
+GRANT EXECUTE ON FUNCTION create_processing_job TO postgres, authenticated, anon, service_role;
+
+-- Step 3: Use the function to create the job
+SELECT * FROM create_processing_job(
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  3
+);
+
+-- Step 4: Verify job was created
+SELECT 
+  id,
+  document_id,
+  status,
+  created_at
+FROM processing_jobs
+WHERE document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid
+ORDER BY created_at DESC;
+
--- a/backend/sql/create-job-bypass-rls.sql
+++ b/backend/sql/create-job-bypass-rls.sql
@@ -0,0 +1,41 @@
+-- Create job for processing document
+-- This bypasses RLS by using service role or direct insert
+-- The document ID and user_id are from Supabase client query
+
+-- Option 1: If RLS is blocking, disable it temporarily (run as superuser)
+SET ROLE postgres;
+
+-- Create job directly (use the exact IDs from Supabase client)
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,  -- Document ID from Supabase client
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',                 -- User ID from Supabase client
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+)
+ON CONFLICT DO NOTHING  -- In case job already exists
+RETURNING id, document_id, status, created_at;
+
+-- Reset role
+RESET ROLE;
+
+-- Verify job was created
+SELECT 
+  pj.id as job_id,
+  pj.document_id,
+  pj.status as job_status,
+  pj.created_at
+FROM processing_jobs pj
+WHERE pj.document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid
+ORDER BY pj.created_at DESC;
+
--- a/backend/sql/create-job-for-existing-documents.sql
+++ b/backend/sql/create-job-for-existing-documents.sql
@@ -0,0 +1,51 @@
+-- Create jobs for all documents stuck in processing_llm status
+-- This will find all stuck documents and create jobs for them
+
+-- First, find all stuck documents
+SELECT 
+  id,
+  user_id,
+  status,
+  original_file_name,
+  updated_at
+FROM documents
+WHERE status = 'processing_llm'
+ORDER BY updated_at ASC;
+
+-- Then create jobs for each document (replace DOCUMENT_ID and USER_ID)
+-- Run this for each document found above:
+
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) 
+SELECT 
+  id as document_id,
+  user_id,
+  'pending' as status,
+  0 as attempts,
+  3 as max_attempts,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb as options,
+  NOW() as created_at
+FROM documents
+WHERE status = 'processing_llm'
+  AND id NOT IN (SELECT document_id FROM processing_jobs WHERE status IN ('pending', 'processing', 'retrying'))
+RETURNING id, document_id, status, created_at;
+
+-- Verify jobs were created
+SELECT 
+  pj.id as job_id,
+  pj.document_id,
+  pj.status as job_status,
+  d.original_file_name,
+  pj.created_at
+FROM processing_jobs pj
+JOIN documents d ON d.id = pj.document_id
+WHERE pj.status = 'pending'
+ORDER BY pj.created_at DESC;
+
--- a/backend/sql/create-job-manually.sql
+++ b/backend/sql/create-job-manually.sql
@@ -0,0 +1,28 @@
+-- Manual Job Creation for Stuck Document
+-- Use this if PostgREST schema cache won't refresh
+
+-- Create job for stuck document
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d',
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+) RETURNING id, document_id, status, created_at;
+
+-- Verify job was created
+SELECT id, document_id, status, created_at 
+FROM processing_jobs 
+WHERE document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'
+ORDER BY created_at DESC;
+
--- a/backend/sql/create-job-safe.sql
+++ b/backend/sql/create-job-safe.sql
@@ -0,0 +1,52 @@
+-- Safe job creation - finds document and creates job in one query
+-- This avoids foreign key issues by using a subquery
+
+-- First, verify the document exists
+SELECT 
+  id,
+  user_id,
+  status,
+  original_file_name
+FROM documents
+WHERE id = '78359b58-762c-4a68-a8e4-17ce38580a8d';
+
+-- If document exists, create job using subquery
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+)
+SELECT 
+  d.id as document_id,
+  d.user_id,
+  'pending' as status,
+  0 as attempts,
+  3 as max_attempts,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb as options,
+  NOW() as created_at
+FROM documents d
+WHERE d.id = '78359b58-762c-4a68-a8e4-17ce38580a8d'
+  AND d.status = 'processing_llm'
+  AND NOT EXISTS (
+    SELECT 1 FROM processing_jobs pj 
+    WHERE pj.document_id = d.id 
+    AND pj.status IN ('pending', 'processing', 'retrying')
+  )
+RETURNING id, document_id, status, created_at;
+
+-- Verify job was created
+SELECT 
+  pj.id as job_id,
+  pj.document_id,
+  pj.status as job_status,
+  d.original_file_name,
+  pj.created_at
+FROM processing_jobs pj
+JOIN documents d ON d.id = pj.document_id
+WHERE pj.document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'
+ORDER BY pj.created_at DESC;
+
--- a/backend/sql/create-job-temp-disable-fk.sql
+++ b/backend/sql/create-job-temp-disable-fk.sql
@@ -0,0 +1,49 @@
+-- Temporary workaround: Drop FK, create job, recreate FK
+-- This is safe because we know the document exists (verified via service client)
+-- The FK will be recreated to maintain data integrity
+
+-- Step 1: Drop FK constraint temporarily
+ALTER TABLE processing_jobs 
+DROP CONSTRAINT IF EXISTS processing_jobs_document_id_fkey;
+
+-- Step 2: Create the job
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+)
+RETURNING id, document_id, status, created_at;
+
+-- Step 3: Recreate FK constraint (with explicit schema)
+ALTER TABLE processing_jobs
+ADD CONSTRAINT processing_jobs_document_id_fkey
+FOREIGN KEY (document_id) 
+REFERENCES public.documents(id) 
+ON DELETE CASCADE;
+
+-- Step 4: Verify job was created
+SELECT 
+  id as job_id,
+  document_id,
+  status as job_status,
+  created_at
+FROM processing_jobs
+WHERE document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid
+ORDER BY created_at DESC;
+
+-- Note: The FK constraint will validate existing data when recreated
+-- If the document doesn't exist, the ALTER TABLE will fail at step 3
+-- But if it succeeds, we know the document exists and the job is valid
+
--- a/backend/sql/create-job-without-fk-check.sql
+++ b/backend/sql/create-job-without-fk-check.sql
@@ -0,0 +1,48 @@
+-- Create job without FK constraint check (temporary workaround)
+-- This disables FK validation temporarily, creates job, then re-enables
+
+-- Step 1: Disable FK constraint temporarily
+ALTER TABLE processing_jobs 
+DROP CONSTRAINT IF EXISTS processing_jobs_document_id_fkey;
+
+-- Step 2: Create the job
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+)
+RETURNING id, document_id, status, created_at;
+
+-- Step 3: Recreate FK constraint (but make it DEFERRABLE so it checks later)
+ALTER TABLE processing_jobs
+ADD CONSTRAINT processing_jobs_document_id_fkey
+FOREIGN KEY (document_id) 
+REFERENCES public.documents(id) 
+ON DELETE CASCADE
+DEFERRABLE INITIALLY DEFERRED;
+
+-- Note: DEFERRABLE INITIALLY DEFERRED means FK is checked at end of transaction
+-- This allows creating jobs even if document visibility is temporarily blocked
+
+-- Step 4: Verify job was created
+SELECT 
+  id,
+  document_id,
+  status,
+  created_at
+FROM processing_jobs
+WHERE document_id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid
+ORDER BY created_at DESC;
+
--- a/backend/sql/create_processing_jobs_table.sql
+++ b/backend/sql/create_processing_jobs_table.sql
@@ -0,0 +1,77 @@
+-- Processing Jobs Table
+-- This table stores document processing jobs that need to be executed
+-- Replaces the in-memory job queue with persistent database storage
+
+CREATE TABLE IF NOT EXISTS processing_jobs (
+  -- Primary key
+  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+
+  -- Job data
+  document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
+  user_id TEXT NOT NULL,
+
+  -- Job status and progress
+  status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'completed', 'failed', 'retrying')),
+  attempts INTEGER NOT NULL DEFAULT 0,
+  max_attempts INTEGER NOT NULL DEFAULT 3,
+
+  -- Processing options (stored as JSONB)
+  options JSONB,
+
+  -- Timestamps
+  created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+  started_at TIMESTAMP WITH TIME ZONE,
+  completed_at TIMESTAMP WITH TIME ZONE,
+  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
+
+  -- Error tracking
+  error TEXT,
+  last_error_at TIMESTAMP WITH TIME ZONE,
+
+  -- Result storage
+  result JSONB
+);
+
+-- Indexes for efficient querying
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_status ON processing_jobs(status);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_created_at ON processing_jobs(created_at);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_document_id ON processing_jobs(document_id);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_user_id ON processing_jobs(user_id);
+CREATE INDEX IF NOT EXISTS idx_processing_jobs_pending ON processing_jobs(status, created_at) WHERE status = 'pending';
+
+-- Function to automatically update updated_at timestamp
+CREATE OR REPLACE FUNCTION update_processing_jobs_updated_at()
+RETURNS TRIGGER AS $$
+BEGIN
+  NEW.updated_at = NOW();
+  RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Trigger to call the update function
+DROP TRIGGER IF EXISTS set_processing_jobs_updated_at ON processing_jobs;
+CREATE TRIGGER set_processing_jobs_updated_at
+  BEFORE UPDATE ON processing_jobs
+  FOR EACH ROW
+  EXECUTE FUNCTION update_processing_jobs_updated_at();
+
+-- Grant permissions (adjust role name as needed)
+-- ALTER TABLE processing_jobs ENABLE ROW LEVEL SECURITY;
+
+-- Optional: Create a view for monitoring
+CREATE OR REPLACE VIEW processing_jobs_summary AS
+SELECT
+  status,
+  COUNT(*) as count,
+  AVG(EXTRACT(EPOCH FROM (COALESCE(completed_at, NOW()) - created_at))) as avg_duration_seconds,
+  MAX(created_at) as latest_created_at
+FROM processing_jobs
+GROUP BY status;
+
+-- Comments for documentation
+COMMENT ON TABLE processing_jobs IS 'Stores document processing jobs for async background processing';
+COMMENT ON COLUMN processing_jobs.status IS 'Current status: pending, processing, completed, failed, retrying';
+COMMENT ON COLUMN processing_jobs.attempts IS 'Number of processing attempts made';
+COMMENT ON COLUMN processing_jobs.max_attempts IS 'Maximum number of retry attempts allowed';
+COMMENT ON COLUMN processing_jobs.options IS 'Processing options and configuration (JSON)';
+COMMENT ON COLUMN processing_jobs.error IS 'Last error message if processing failed';
--- a/backend/sql/create_vector_store.sql
+++ b/backend/sql/create_vector_store.sql
@@ -0,0 +1,57 @@
+-- Enable the pgvector extension
+CREATE EXTENSION IF NOT EXISTS vector;
+
+-- 1. Create document_chunks table
+CREATE TABLE IF NOT EXISTS document_chunks (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
+    content TEXT NOT NULL,
+    embedding VECTOR(1536), -- OpenAI text-embedding-3-small uses 1536 dimensions
+    metadata JSONB,
+    chunk_index INTEGER NOT NULL,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_document_chunks_document_id ON document_chunks(document_id);
+CREATE INDEX IF NOT EXISTS idx_document_chunks_created_at ON document_chunks(created_at);
+
+-- Use IVFFlat index for faster similarity search
+CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops)
+WITH (lists = 100);
+
+
+-- 2. Create match_document_chunks function
+CREATE OR REPLACE FUNCTION match_document_chunks (
+  query_embedding vector(1536),
+  match_threshold float,
+  match_count int
+)
+RETURNS TABLE (
+  id UUID,
+  document_id UUID,
+  content text,
+  metadata JSONB,
+  chunk_index INT,
+  similarity float
+)
+LANGUAGE sql STABLE
+AS $$
+  SELECT
+    document_chunks.id,
+    document_chunks.document_id,
+    document_chunks.content,
+    document_chunks.metadata,
+    document_chunks.chunk_index,
+    1 - (document_chunks.embedding <=> query_embedding) AS similarity
+  FROM document_chunks
+  WHERE 1 - (document_chunks.embedding <=> query_embedding) > match_threshold
+  ORDER BY similarity DESC
+  LIMIT match_count;
+$$;
+
+-- 3. Create trigger for updated_at
+CREATE TRIGGER update_document_chunks_updated_at
+    BEFORE UPDATE ON document_chunks
+    FOR EACH ROW
+    EXECUTE FUNCTION update_updated_at_column();
--- a/backend/sql/debug-foreign-key.sql
+++ b/backend/sql/debug-foreign-key.sql
@@ -0,0 +1,56 @@
+-- Debug foreign key constraint and document existence
+
+-- 1. Check if document exists (bypassing RLS with service role context)
+SELECT id, user_id, status 
+FROM documents 
+WHERE id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid;
+
+-- 2. Check foreign key constraint definition
+SELECT
+    tc.constraint_name,
+    tc.table_name,
+    kcu.column_name,
+    ccu.table_name AS foreign_table_name,
+    ccu.column_name AS foreign_column_name,
+    tc.constraint_type
+FROM information_schema.table_constraints AS tc
+JOIN information_schema.key_column_usage AS kcu
+  ON tc.constraint_name = kcu.constraint_name
+  AND tc.table_schema = kcu.table_schema
+JOIN information_schema.constraint_column_usage AS ccu
+  ON ccu.constraint_name = tc.constraint_name
+  AND ccu.table_schema = tc.table_schema
+WHERE tc.constraint_type = 'FOREIGN KEY'
+  AND tc.table_name = 'processing_jobs'
+  AND kcu.column_name = 'document_id';
+
+-- 3. Check if document exists in different ways
+-- Direct query (should work with SECURITY DEFINER)
+DO $$
+DECLARE
+  v_doc_id UUID := '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid;
+  v_exists BOOLEAN;
+BEGIN
+  SELECT EXISTS(
+    SELECT 1 FROM documents WHERE id = v_doc_id
+  ) INTO v_exists;
+  
+  RAISE NOTICE 'Document exists: %', v_exists;
+  
+  IF NOT v_exists THEN
+    RAISE NOTICE 'Document does not exist in database!';
+    RAISE NOTICE 'This explains the foreign key constraint failure.';
+  END IF;
+END $$;
+
+-- 4. Check table schema
+SELECT 
+  table_name,
+  column_name,
+  data_type,
+  is_nullable
+FROM information_schema.columns
+WHERE table_name = 'documents'
+  AND column_name = 'id'
+ORDER BY ordinal_position;
+
--- a/backend/sql/enable_sql_execution.sql
+++ b/backend/sql/enable_sql_execution.sql
@@ -0,0 +1,6 @@
+CREATE OR REPLACE FUNCTION execute_sql(sql_statement TEXT)
+RETURNS void AS $$
+BEGIN
+  EXECUTE sql_statement;
+END;
+$$ LANGUAGE plpgsql;
--- a/backend/sql/find-all-processing-documents.sql
+++ b/backend/sql/find-all-processing-documents.sql
@@ -0,0 +1,36 @@
+-- Find all documents that need processing
+-- Run this to see what documents exist and their status
+
+-- All documents in processing status
+SELECT 
+  id,
+  user_id,
+  status,
+  original_file_name,
+  created_at,
+  updated_at
+FROM documents
+WHERE status IN ('processing', 'processing_llm', 'uploading', 'extracting_text')
+ORDER BY updated_at DESC;
+
+-- Count by status
+SELECT 
+  status,
+  COUNT(*) as count
+FROM documents
+GROUP BY status
+ORDER BY count DESC;
+
+-- Documents stuck in processing (updated more than 10 minutes ago)
+SELECT 
+  id,
+  user_id,
+  status,
+  original_file_name,
+  updated_at,
+  NOW() - updated_at as time_since_update
+FROM documents
+WHERE status IN ('processing', 'processing_llm')
+  AND updated_at < NOW() - INTERVAL '10 minutes'
+ORDER BY updated_at ASC;
+
--- a/backend/sql/fix-fk-with-schema.sql
+++ b/backend/sql/fix-fk-with-schema.sql
@@ -0,0 +1,60 @@
+-- Fix: Foreign key constraint may be checking wrong schema or table
+-- PostgreSQL FK checks happen at engine level and should bypass RLS
+-- But if the constraint points to wrong table, it will fail
+
+-- Step 1: Check FK constraint definition
+SELECT
+    tc.constraint_name,
+    tc.table_schema,
+    tc.table_name,
+    kcu.column_name,
+    ccu.table_schema AS foreign_table_schema,
+    ccu.table_name AS foreign_table_name,
+    ccu.column_name AS foreign_column_name
+FROM information_schema.table_constraints AS tc
+JOIN information_schema.key_column_usage AS kcu
+  ON tc.constraint_name = kcu.constraint_name
+  AND tc.table_schema = kcu.table_schema
+JOIN information_schema.constraint_column_usage AS ccu
+  ON ccu.constraint_name = tc.constraint_name
+  AND ccu.table_schema = tc.table_schema
+WHERE tc.constraint_type = 'FOREIGN KEY'
+  AND tc.table_name = 'processing_jobs'
+  AND kcu.column_name = 'document_id';
+
+-- Step 2: Check if document exists in public.documents (explicit schema)
+SELECT COUNT(*) as document_count
+FROM public.documents
+WHERE id = '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid;
+
+-- Step 3: Create job with explicit schema (if needed)
+-- First, let's try dropping and recreating the FK constraint with explicit schema
+ALTER TABLE processing_jobs 
+DROP CONSTRAINT IF EXISTS processing_jobs_document_id_fkey;
+
+ALTER TABLE processing_jobs
+ADD CONSTRAINT processing_jobs_document_id_fkey
+FOREIGN KEY (document_id) 
+REFERENCES public.documents(id) 
+ON DELETE CASCADE;
+
+-- Step 4: Now try creating the job
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+)
+RETURNING id, document_id, status, created_at;
+
--- a/backend/sql/fix-foreign-key-constraint.sql
+++ b/backend/sql/fix-foreign-key-constraint.sql
@@ -0,0 +1,45 @@
+-- Fix foreign key constraint issue
+-- If document doesn't exist, we need to either:
+-- 1. Create the document (if it was deleted)
+-- 2. Remove the foreign key constraint temporarily
+-- 3. Use a different approach
+
+-- Option 1: Check if we should drop and recreate FK constraint
+-- (This allows creating jobs even if document doesn't exist - useful for testing)
+
+-- First, let's see the constraint
+SELECT
+    conname as constraint_name,
+    conrelid::regclass as table_name,
+    confrelid::regclass as foreign_table_name
+FROM pg_constraint
+WHERE conname = 'processing_jobs_document_id_fkey';
+
+-- Option 2: Temporarily disable FK constraint (for testing only)
+-- WARNING: Only do this if you understand the implications
+-- ALTER TABLE processing_jobs DROP CONSTRAINT IF EXISTS processing_jobs_document_id_fkey;
+-- Then recreate later with:
+-- ALTER TABLE processing_jobs ADD CONSTRAINT processing_jobs_document_id_fkey 
+--   FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE;
+
+-- Option 3: Create job without FK constraint (if document truly doesn't exist)
+-- This is a workaround - the real fix is to ensure documents exist
+INSERT INTO processing_jobs (
+  document_id,
+  user_id,
+  status,
+  attempts,
+  max_attempts,
+  options,
+  created_at
+) VALUES (
+  '78359b58-762c-4a68-a8e4-17ce38580a8d'::uuid,
+  'B00HiMnleGhGdJgQwbX2Ume01Z53',
+  'pending',
+  0,
+  3,
+  '{"strategy": "document_ai_agentic_rag"}'::jsonb,
+  NOW()
+)
+ON CONFLICT DO NOTHING;
+
--- a/backend/sql/fix_vector_search_timeout.sql
+++ b/backend/sql/fix_vector_search_timeout.sql
@@ -0,0 +1,65 @@
+-- Fix vector search timeout by pre-filtering on document_id BEFORE vector search
+-- When document_id is provided, this avoids the full IVFFlat index scan (26K+ rows)
+-- and instead computes distances on only ~80 chunks per document.
+
+-- Drop old function signatures
+DROP FUNCTION IF EXISTS match_document_chunks(vector(1536), float, int);
+DROP FUNCTION IF EXISTS match_document_chunks(vector(1536), float, int, text);
+
+-- Create optimized function that branches based on whether document_id is provided
+CREATE OR REPLACE FUNCTION match_document_chunks (
+  query_embedding vector(1536),
+  match_threshold float,
+  match_count int,
+  filter_document_id text DEFAULT NULL
+)
+RETURNS TABLE (
+  id UUID,
+  document_id VARCHAR(255),
+  content text,
+  metadata JSONB,
+  chunk_index INT,
+  similarity float
+)
+LANGUAGE plpgsql STABLE
+AS $$
+BEGIN
+  IF filter_document_id IS NOT NULL THEN
+    -- FAST PATH: Pre-filter by document_id using btree index, then compute
+    -- vector distances on only that document's chunks (~80 rows).
+    -- This completely bypasses the IVFFlat index scan.
+    RETURN QUERY
+    SELECT
+      dc.id,
+      dc.document_id,
+      dc.content,
+      dc.metadata,
+      dc.chunk_index,
+      1 - (dc.embedding <=> query_embedding) AS similarity
+    FROM document_chunks dc
+    WHERE dc.document_id = filter_document_id
+      AND dc.embedding IS NOT NULL
+      AND 1 - (dc.embedding <=> query_embedding) > match_threshold
+    ORDER BY dc.embedding <=> query_embedding
+    LIMIT match_count;
+  ELSE
+    -- SLOW PATH: Search across all documents using IVFFlat index.
+    -- Only used when no document_id filter is provided.
+    RETURN QUERY
+    SELECT
+      dc.id,
+      dc.document_id,
+      dc.content,
+      dc.metadata,
+      dc.chunk_index,
+      1 - (dc.embedding <=> query_embedding) AS similarity
+    FROM document_chunks dc
+    WHERE dc.embedding IS NOT NULL
+      AND 1 - (dc.embedding <=> query_embedding) > match_threshold
+    ORDER BY dc.embedding <=> query_embedding
+    LIMIT match_count;
+  END IF;
+END;
+$$;
+
+COMMENT ON FUNCTION match_document_chunks IS 'Vector search with fast document-scoped path. When filter_document_id is provided, uses btree index to pre-filter (~80 rows) instead of scanning the full IVFFlat index (26K+ rows).';
--- a/backend/sql/minimal_setup.sql
+++ b/backend/sql/minimal_setup.sql
@@ -0,0 +1,84 @@
+-- Minimal Database Setup - Just what's needed for uploads to work
+-- This won't conflict with existing tables
+
+-- 1. Create update function if it doesn't exist
+CREATE OR REPLACE FUNCTION update_updated_at_column()
+RETURNS TRIGGER AS $$
+BEGIN
+    NEW.updated_at = CURRENT_TIMESTAMP;
+    RETURN NEW;
+END;
+$$ language 'plpgsql';
+
+-- 2. Drop and recreate documents table (to ensure clean state)
+DROP TABLE IF EXISTS processing_jobs CASCADE;
+DROP TABLE IF EXISTS documents CASCADE;
+
+-- 3. Create documents table (user_id as VARCHAR to match Firebase UID)
+CREATE TABLE documents (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    user_id VARCHAR(255) NOT NULL,
+    original_file_name VARCHAR(500) NOT NULL,
+    file_path VARCHAR(1000) NOT NULL,
+    file_size BIGINT NOT NULL CHECK (file_size > 0),
+    uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    status VARCHAR(50) NOT NULL DEFAULT 'uploaded',
+    extracted_text TEXT,
+    generated_summary TEXT,
+    summary_markdown_path VARCHAR(1000),
+    summary_pdf_path VARCHAR(1000),
+    processing_started_at TIMESTAMP WITH TIME ZONE,
+    processing_completed_at TIMESTAMP WITH TIME ZONE,
+    error_message TEXT,
+    analysis_data JSONB,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX idx_documents_user_id ON documents(user_id);
+CREATE INDEX idx_documents_status ON documents(status);
+CREATE INDEX idx_documents_uploaded_at ON documents(uploaded_at);
+CREATE INDEX idx_documents_user_status ON documents(user_id, status);
+
+CREATE TRIGGER update_documents_updated_at
+    BEFORE UPDATE ON documents
+    FOR EACH ROW
+    EXECUTE FUNCTION update_updated_at_column();
+
+-- 4. Create processing_jobs table
+CREATE TABLE processing_jobs (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
+    user_id VARCHAR(255) NOT NULL,
+    status VARCHAR(50) NOT NULL DEFAULT 'pending',
+    attempts INTEGER NOT NULL DEFAULT 0,
+    max_attempts INTEGER NOT NULL DEFAULT 3,
+    options JSONB,
+    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    started_at TIMESTAMP WITH TIME ZONE,
+    completed_at TIMESTAMP WITH TIME ZONE,
+    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
+    error TEXT,
+    last_error_at TIMESTAMP WITH TIME ZONE,
+    result JSONB
+);
+
+CREATE INDEX idx_processing_jobs_status ON processing_jobs(status);
+CREATE INDEX idx_processing_jobs_created_at ON processing_jobs(created_at);
+CREATE INDEX idx_processing_jobs_document_id ON processing_jobs(document_id);
+CREATE INDEX idx_processing_jobs_user_id ON processing_jobs(user_id);
+CREATE INDEX idx_processing_jobs_pending ON processing_jobs(status, created_at) WHERE status = 'pending';
+
+CREATE TRIGGER update_processing_jobs_updated_at
+    BEFORE UPDATE ON processing_jobs
+    FOR EACH ROW
+    EXECUTE FUNCTION update_updated_at_column();
+
+-- 5. Verify tables were created
+SELECT
+    table_name,
+    (SELECT COUNT(*) FROM information_schema.columns WHERE table_name = t.table_name) as column_count
+FROM information_schema.tables t
+WHERE table_schema = 'public'
+AND table_name IN ('documents', 'processing_jobs')
+ORDER BY table_name;
--- a/backend/sql/refresh_schema_cache.sql
+++ b/backend/sql/refresh_schema_cache.sql
@@ -0,0 +1,16 @@
+-- Refresh PostgREST Schema Cache
+-- Run this in Supabase SQL Editor to force PostgREST to reload the schema cache
+
+-- Method 1: Use NOTIFY (recommended)
+NOTIFY pgrst, 'reload schema';
+
+-- Method 2: Force refresh by making a dummy change
+ALTER TABLE processing_jobs ADD COLUMN IF NOT EXISTS _temp_refresh BOOLEAN DEFAULT FALSE;
+ALTER TABLE processing_jobs DROP COLUMN IF EXISTS _temp_refresh;
+
+-- Method 3: Update table comment (fixed syntax)
+DO $$
+BEGIN
+  EXECUTE 'COMMENT ON TABLE processing_jobs IS ''Stores document processing jobs - Cache refreshed at ' || NOW()::text || '''';
+END $$;
+
--- a/backend/sql/setup_pg_cron_cleanup.sql
+++ b/backend/sql/setup_pg_cron_cleanup.sql
@@ -0,0 +1,145 @@
+-- ============================================================
+-- ALTERNATIVE: PG_CRON AUTOMATED CLEANUP
+-- ============================================================
+-- NOTE: The primary cleanup runs as a Firebase scheduled
+-- function (cleanupOldData in index.ts). This pg_cron
+-- approach is an ALTERNATIVE if you prefer database-level
+-- scheduling instead.
+--
+-- Supabase includes pg_cron. This script creates scheduled
+-- jobs that automatically enforce retention policies.
+--
+-- PREREQUISITE: pg_cron extension must be enabled.
+-- Go to Supabase Dashboard → Database → Extensions → enable pg_cron
+--
+-- SCHEDULE: Runs daily at 03:00 UTC (off-peak)
+-- ============================================================
+
+-- Enable the pg_cron extension (if not already enabled)
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+
+-- Grant usage to postgres role (required on Supabase)
+GRANT USAGE ON SCHEMA cron TO postgres;
+
+-- ============================================================
+-- Create the cleanup function
+-- ============================================================
+CREATE OR REPLACE FUNCTION public.cleanup_old_data()
+RETURNS jsonb
+LANGUAGE plpgsql
+SECURITY DEFINER
+AS $$
+DECLARE
+  result jsonb := '{}'::jsonb;
+  deleted_count bigint;
+BEGIN
+  -- 1. Processing jobs: completed/failed older than 30 days
+  DELETE FROM processing_jobs
+  WHERE status IN ('completed', 'failed')
+    AND completed_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('processing_jobs', deleted_count);
+
+  -- 2. Execution events: older than 30 days
+  DELETE FROM execution_events
+  WHERE created_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('execution_events', deleted_count);
+
+  -- 3. Session events: older than 30 days
+  DELETE FROM session_events
+  WHERE created_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('session_events', deleted_count);
+
+  -- 4. Performance metrics: older than 90 days
+  DELETE FROM performance_metrics
+  WHERE created_at < NOW() - INTERVAL '90 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('performance_metrics', deleted_count);
+
+  -- 5. Vector similarity searches: older than 90 days
+  DELETE FROM vector_similarity_searches
+  WHERE created_at < NOW() - INTERVAL '90 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('vector_similarity_searches', deleted_count);
+
+  -- 6. Service health checks: older than 30 days (INFR-01)
+  DELETE FROM service_health_checks
+  WHERE created_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('service_health_checks', deleted_count);
+
+  -- 7. Alert events: resolved older than 30 days (INFR-01)
+  DELETE FROM alert_events
+  WHERE status = 'resolved'
+    AND created_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('alert_events', deleted_count);
+
+  -- 8. Agent executions: older than 90 days
+  DELETE FROM agent_executions
+  WHERE created_at < NOW() - INTERVAL '90 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('agent_executions', deleted_count);
+
+  -- 9. Processing quality metrics: older than 90 days
+  DELETE FROM processing_quality_metrics
+  WHERE created_at < NOW() - INTERVAL '90 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('processing_quality_metrics', deleted_count);
+
+  -- 10. Agentic RAG sessions: completed older than 90 days
+  DELETE FROM agentic_rag_sessions
+  WHERE status IN ('completed', 'failed')
+    AND created_at < NOW() - INTERVAL '90 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('agentic_rag_sessions', deleted_count);
+
+  -- 11. Null out extracted_text for completed documents older than 30 days
+  UPDATE documents
+  SET extracted_text = NULL
+  WHERE status = 'completed'
+    AND analysis_data IS NOT NULL
+    AND extracted_text IS NOT NULL
+    AND created_at < NOW() - INTERVAL '30 days';
+  GET DIAGNOSTICS deleted_count = ROW_COUNT;
+  result := result || jsonb_build_object('documents_text_nulled', deleted_count);
+
+  RETURN result;
+END;
+$$;
+
+-- ============================================================
+-- Schedule the cron job: daily at 03:00 UTC
+-- ============================================================
+SELECT cron.schedule(
+  'daily-cleanup-old-data',          -- job name
+  '0 3 * * *',                       -- cron expression: 3 AM UTC daily
+  $$SELECT public.cleanup_old_data()$$
+);
+
+-- ============================================================
+-- Verify the job was created
+-- ============================================================
+SELECT * FROM cron.job WHERE jobname = 'daily-cleanup-old-data';
+
+-- ============================================================
+-- MANAGEMENT COMMANDS (for reference)
+-- ============================================================
+
+-- View all scheduled jobs:
+-- SELECT * FROM cron.job;
+
+-- View recent job runs and results:
+-- SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 20;
+
+-- Run cleanup manually (to test):
+-- SELECT public.cleanup_old_data();
+
+-- Unschedule the job:
+-- SELECT cron.unschedule('daily-cleanup-old-data');
+
+-- Change schedule to weekly (Sundays at 3 AM):
+-- SELECT cron.unschedule('daily-cleanup-old-data');
+-- SELECT cron.schedule('weekly-cleanup-old-data', '0 3 * * 0', $$SELECT public.cleanup_old_data()$$);
--- a/Show More
+++ b/Show More