docs: map existing codebase

admin
2026-02-24 10:28:22 -05:00
parent 9a906763c7
commit e6e1b1fa6f
7 changed files with 1969 additions and 0 deletions

@@ -0,0 +1,243 @@
# Architecture
**Analysis Date:** 2026-02-24
## Pattern Overview
**Overall:** Full-stack distributed system combining Express.js backend with React frontend, implementing a **multi-stage document processing pipeline** with queued background jobs and real-time monitoring.
**Key Characteristics:**
- Server-rendered PDF generation with single-pass LLM processing
- Asynchronous job queue for background document processing (max 3 concurrent)
- Firebase authentication with Supabase PostgreSQL + pgvector for embeddings
- Multi-provider LLM support (Anthropic, OpenAI, OpenRouter)
- Structured schema extraction using Zod and LLM-driven analysis
- Google Document AI for OCR and text extraction
- Real-time upload progress tracking via SSE/polling
- Correlation ID tracking throughout distributed pipeline
## Layers
**API Layer (Express + TypeScript):**
- Purpose: HTTP request routing, authentication, and response handling
- Location: `backend/src/index.ts`, `backend/src/routes/`, `backend/src/controllers/`
- Contains: Route definitions, request validation, error handling
- Depends on: Middleware (auth, validation), Services
- Used by: Frontend and external clients
**Authentication Layer:**
- Purpose: Firebase ID token verification and user identity validation
- Location: `backend/src/middleware/firebaseAuth.ts`, `backend/src/config/firebase.ts`
- Contains: Token verification, service account initialization, session recovery
- Depends on: Firebase Admin SDK, configuration
- Used by: All protected routes via `verifyFirebaseToken` middleware
**Controller Layer:**
- Purpose: Request handling, input validation, service orchestration
- Location: `backend/src/controllers/documentController.ts`, `backend/src/controllers/authController.ts`
- Contains: `getUploadUrl()`, `processDocument()`, `getDocumentStatus()` handlers
- Depends on: Models, Services, Middleware
- Used by: Routes
**Service Layer:**
- Purpose: Business logic, external API integration, document processing orchestration
- Location: `backend/src/services/`
- Contains:
- `unifiedDocumentProcessor.ts` - Main orchestrator, strategy selection
- `singlePassProcessor.ts` - 2-LLM-call extraction (pass 1 + quality check)
- `documentAiProcessor.ts` - Google Document AI text extraction
- `llmService.ts` - LLM API calls with retry logic (3 attempts, exponential backoff)
- `jobQueueService.ts` - Background job processing (EventEmitter-based)
- `fileStorageService.ts` - Google Cloud Storage signed URLs and uploads
- `vectorDatabaseService.ts` - Supabase vector embeddings and search
- `pdfGenerationService.ts` - Puppeteer-based PDF rendering
- `csvExportService.ts` - Financial data export
- Depends on: Models, Config, Utilities
- Used by: Controllers, Job Queue
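
The retry behavior described for `llmService.ts` (3 attempts, exponential backoff) follows a common pattern that can be sketched generically. The helper name `withRetry` and its defaults are illustrative, not the actual implementation:

```typescript
// Generic retry helper in the spirit of llmService's 3-attempt
// exponential backoff: delays grow as base, 2x base, 4x base.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Sleep only between attempts, not after the final failure.
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```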
**Model Layer (Data Access):**
- Purpose: Database interactions, query execution, schema validation
- Location: `backend/src/models/`
- Contains: `DocumentModel.ts`, `ProcessingJobModel.ts`, `UserModel.ts`, `VectorDatabaseModel.ts`
- Depends on: Supabase client, configuration
- Used by: Services, Controllers
**Job Queue Layer:**
- Purpose: Asynchronous background processing with priority and retry handling
- Location: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`
- Contains: In-memory queue, worker pool (max 3 concurrent), Firebase scheduled function trigger
- Depends on: Services (document processor), Models
- Used by: Controllers (to enqueue work), Scheduled functions (to trigger processing)
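
A minimal sketch of the EventEmitter-based in-memory queue shape described above, capped at three concurrent workers. Class, method, and event names here are illustrative, not the actual `jobQueueService` API:

```typescript
import { EventEmitter } from "node:events";

type Job = { id: string; run: () => Promise<void> };

// In-memory queue: jobs wait in `pending` until one of the
// (at most 3) worker slots frees up; results surface as events.
class JobQueue extends EventEmitter {
  private pending: Job[] = [];
  private active = 0;

  constructor(private readonly maxConcurrent = 3) {
    super();
  }

  enqueue(job: Job): void {
    this.pending.push(job);
    this.drain();
  }

  private drain(): void {
    while (this.active < this.maxConcurrent && this.pending.length > 0) {
      const job = this.pending.shift()!;
      this.active++;
      job
        .run()
        .then(() => this.emit("completed", job.id))
        .catch((err) => this.emit("failed", job.id, err))
        .finally(() => {
          this.active--;
          this.drain(); // pull the next pending job into the freed slot
        });
    }
  }
}
```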
**Frontend Layer (React + TypeScript):**
- Purpose: User interface for document upload, processing monitoring, and review
- Location: `frontend/src/`
- Contains: Components (Upload, List, Viewer, Analytics), Services, Contexts
- Depends on: Backend API, Firebase Auth, Axios
- Used by: Web browsers
## Data Flow
**Document Upload & Processing Flow:**
1. **Upload Initiation** (Frontend)
- User selects PDF file via `DocumentUpload` component
- Calls `documentService.getUploadUrl()` → Backend `/documents/upload-url` endpoint
- Backend creates document record (status: 'uploading') and generates signed GCS URL
2. **File Upload** (Frontend → GCS)
- Frontend uploads file directly to Google Cloud Storage via signed URL
- Frontend polls `documentService.getDocumentStatus()` for upload completion
- `UploadMonitoringDashboard` displays real-time progress
3. **Processing Trigger** (Frontend → Backend)
- Frontend calls `POST /documents/{id}/process` once upload complete
- Controller creates processing job and enqueues to `jobQueueService`
- Controller immediately returns job ID
4. **Background Job Execution** (Job Queue)
- Scheduled Firebase function (`processDocumentJobs`) runs every 1 minute
- Calls `jobProcessorService.processJobs()` to dequeue and execute
- For each queued document:
- Fetch file from GCS
- Update status to 'extracting_text'
- Call `unifiedDocumentProcessor.processDocument()`
5. **Document Processing** (Single-Pass Strategy)
- **Pass 1 - LLM Extraction:**
- `documentAiProcessor.extractText()` (if needed) - Google Document AI OCR
- `llmService.processCIMDocument()` - Claude/OpenAI structured extraction
- Produces `CIMReview` object with financial, market, management data
- Updates document status to 'processing_llm'
- **Pass 2 - Quality Check:**
- `llmService.validateCIMReview()` - Verify completeness and accuracy
- Updates status to 'quality_validation'
- **PDF Generation:**
- `pdfGenerationService.generatePDF()` - Puppeteer renders HTML template
- Uploads PDF to GCS
- Updates status to 'generating_pdf'
- **Vector Indexing (Background):**
- `vectorDatabaseService.createDocumentEmbedding()` - Generate 3072-dim embeddings
- Chunk document semantically, store in Supabase with vector index
- Status moves to 'vector_indexing' then 'completed'
6. **Result Delivery** (Backend → Frontend)
- Frontend polls `GET /documents/{id}` to check completion
- When status = 'completed', fetches summary and analysis data
- `DocumentViewer` displays results, allows regeneration with feedback
**State Management:**
- Backend: Document status progresses through `uploading → extracting_text → processing_llm → quality_validation → generating_pdf → vector_indexing → completed`, or moves to `failed` at any step
- Frontend: AuthContext manages user/token, component state tracks selected document and loading states
- Job Queue: In-memory queue with EventEmitter for state transitions
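
The status progression above can be expressed as a small transition table. This is a sketch of the state machine implied by the pipeline, not code from the repo; the `canTransition` helper is hypothetical:

```typescript
type DocumentStatus =
  | "uploading"
  | "extracting_text"
  | "processing_llm"
  | "quality_validation"
  | "generating_pdf"
  | "vector_indexing"
  | "completed"
  | "failed";

// Each status may only advance to the next pipeline stage.
const transitions: Record<DocumentStatus, DocumentStatus[]> = {
  uploading: ["extracting_text"],
  extracting_text: ["processing_llm"],
  processing_llm: ["quality_validation"],
  quality_validation: ["generating_pdf"],
  generating_pdf: ["vector_indexing"],
  vector_indexing: ["completed"],
  completed: [],
  failed: [],
};

function canTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  // Any in-flight state may fail; terminal states stay terminal.
  if (to === "failed") return from !== "completed" && from !== "failed";
  return transitions[from].includes(to);
}
```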
## Key Abstractions
**Unified Processor:**
- Purpose: Strategy pattern for document processing (single-pass vs. agentic RAG vs. simple)
- Examples: `singlePassProcessor`, `simpleDocumentProcessor`, `optimizedAgenticRAGProcessor`
- Pattern: Pluggable strategies via `ProcessingStrategy` selection in config
**LLM Service:**
- Purpose: Unified interface for multiple LLM providers with retry logic
- Examples: `backend/src/services/llmService.ts` (Anthropic, OpenAI, OpenRouter)
- Pattern: Provider-agnostic API with `processCIMDocument()` returning structured `CIMReview`
**Vector Database Abstraction:**
- Purpose: PostgreSQL pgvector operations via Supabase for semantic search
- Examples: `backend/src/services/vectorDatabaseService.ts`
- Pattern: Embedding + chunking → vector search via cosine similarity
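
The ranking metric behind this abstraction is plain cosine similarity (pgvector's `<=>` operator returns cosine *distance*, i.e. 1 minus similarity). A self-contained reference implementation for intuition:

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction, 0 = orthogonal, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```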
**File Storage Abstraction:**
- Purpose: Google Cloud Storage operations with signed URLs
- Examples: `backend/src/services/fileStorageService.ts`
- Pattern: Signed upload/download URLs for temporary access without IAM burden
**Job Queue Pattern:**
- Purpose: Async processing with retry and priority handling
- Examples: `backend/src/services/jobQueueService.ts` (EventEmitter-based)
- Pattern: Priority queue with exponential backoff retry
## Entry Points
**API Entry Point:**
- Location: `backend/src/index.ts`
- Triggers: Process startup or Firebase Functions invocation
- Responsibilities:
- Initialize Express app
- Set up middleware (CORS, helmet, rate limiting, authentication)
- Register routes (`/documents`, `/vector`, `/monitoring`, `/api/audit`)
- Start job queue service
- Export Firebase Functions v2 handlers (`api`, `processDocumentJobs`)
**Scheduled Job Processing:**
- Location: `backend/src/index.ts` (line 252: `processDocumentJobs` function export)
- Triggers: Firebase Cloud Scheduler every 1 minute
- Responsibilities:
- Health check database connection
- Detect stuck jobs (processing > 15 min, pending > 2 min)
- Call `jobProcessorService.processJobs()`
- Log metrics and errors
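
The stuck-job thresholds above (processing > 15 min, pending > 2 min) reduce to a simple timestamp filter. A sketch, assuming a `JobRecord` shape with a last-updated epoch timestamp (the type and function names are illustrative):

```typescript
type JobRecord = { id: string; status: "pending" | "processing"; updatedAt: number };

// A job is "stuck" when it has sat in processing for >15 minutes
// or in pending for >2 minutes, relative to `now` (epoch ms).
function findStuckJobs(jobs: JobRecord[], now: number): JobRecord[] {
  return jobs.filter(
    (j) =>
      (j.status === "processing" && now - j.updatedAt > 15 * 60_000) ||
      (j.status === "pending" && now - j.updatedAt > 2 * 60_000),
  );
}
```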
**Frontend Entry Point:**
- Location: `frontend/src/main.tsx`
- Triggers: Browser navigation
- Responsibilities:
- Initialize React app with AuthProvider
- Set up Firebase client
- Render routing structure (Login → Dashboard)
**Document Processing Controller:**
- Location: `backend/src/controllers/documentController.ts`
- Route: `POST /documents/{id}/process`
- Responsibilities:
- Validate user authentication
- Enqueue processing job
- Return job ID to client
## Error Handling
**Strategy:** Multi-layer error recovery with structured logging and graceful degradation
**Patterns:**
- **Retry Logic:** DocumentModel uses exponential backoff (1s → 2s → 4s) for network errors
- **LLM Retry:** `llmService` retries API calls 3 times with exponential backoff
- **Firebase Auth Recovery:** `firebaseAuth.ts` attempts session recovery on token verify failure
- **Job Queue Retry:** Jobs retry up to 3 times with configurable backoff (5s → 300s max)
- **Structured Error Logging:** All errors include correlation ID, stack trace, and context metadata
- **Circuit Breaker Pattern:** Database health check in `processDocumentJobs` prevents cascading failures
**Error Boundaries:**
- Global error handler at end of Express middleware chain (`errorHandler`)
- Try/catch in all async functions with context-aware logging
- Unhandled rejection listener at process level (line 24 of `index.ts`)
## Cross-Cutting Concerns
**Logging:**
- Framework: Winston (JSON output; console transport in development)
- Approach: Structured logger with correlation IDs, Winston transports for error/upload logs
- Location: `backend/src/utils/logger.ts`
- Pattern: `logger.info()`, `logger.error()`, `StructuredLogger` for operations
**Validation:**
- Approach: Joi schema in environment config, Zod for API request/response types
- Location: `backend/src/config/env.ts`, `backend/src/services/llmSchemas.ts`
- Pattern: Joi for config, Zod for runtime validation
**Authentication:**
- Approach: Firebase ID tokens verified via `verifyFirebaseToken` middleware
- Location: `backend/src/middleware/firebaseAuth.ts`
- Pattern: Bearer token in Authorization header, cached in req.user
**Correlation Tracking:**
- Approach: UUID correlation ID added to all requests, propagated through job processing
- Location: `backend/src/middleware/validation.ts` (addCorrelationId)
- Pattern: X-Correlation-ID header or generated UUID, included in all logs
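
The header-or-UUID pattern can be sketched as a small pure function; `resolveCorrelationId` is a hypothetical name, not the actual `addCorrelationId` middleware:

```typescript
import { randomUUID } from "node:crypto";

// Reuse the caller-supplied X-Correlation-ID header when present
// (Express lowercases header names); otherwise mint a fresh UUID.
function resolveCorrelationId(headers: Record<string, string | undefined>): string {
  const incoming = headers["x-correlation-id"];
  return incoming && incoming.trim().length > 0 ? incoming : randomUUID();
}
```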
---
*Architecture analysis: 2026-02-24*

@@ -0,0 +1,329 @@
# Codebase Concerns
**Analysis Date:** 2026-02-24
## Tech Debt
**Console.log Debug Statements in Controllers:**
- Issue: Excessive `console.log()` calls with emoji prefixes left throughout `documentController.ts` instead of using proper structured logging via Winston logger
- Files: `backend/src/controllers/documentController.ts` (lines 12-80, multiple scattered instances)
- Impact: Production logs become noisy and unstructured; debug output leaks to stdout/stderr; makes it harder to parse logs for errors and metrics
- Fix approach: Replace all `console.log()` calls with `logger.info()`, `logger.debug()`, `logger.error()` via imported `logger` from `utils/logger.ts`. Follow pattern established in other services.
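
The fix in miniature: emit one parseable JSON line per event, carrying level and context, instead of emoji-prefixed free text. This hand-rolled `logLine` is a stand-in for the Winston logger, for illustration only:

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Structured log entry: one JSON line with level, message, timestamp,
// and arbitrary context (e.g. correlationId), parseable downstream --
// unlike ad-hoc console.log output.
function logLine(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...context,
  });
}
```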
**Incomplete Job Statistics Tracking:**
- Issue: `jobQueueService.ts` and `jobProcessorService.ts` both have TODO markers indicating completed/failed job counts are not tracked (lines 606-607, 635-636)
- Files: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`
- Impact: Job queue health metrics are incomplete; cannot audit success/failure rates; monitoring dashboards will show incomplete data
- Fix approach: Implement `completedJobs` and `failedJobs` counters in both services using persistent storage or Redis. Update schema if needed.
**Config Migration Debug Cruft:**
- Issue: Multiple `console.log()` debug statements in `config/env.ts` (lines 23, 46, 51, 292) for Firebase Functions v1→v2 migration are still present
- Files: `backend/src/config/env.ts`
- Impact: Production logs polluted with migration warnings; makes it harder to spot real issues; clutters server startup output
- Fix approach: Remove all `[CONFIG DEBUG]` console.log statements once migration to Firebase Functions v2 is confirmed complete. Wrap remaining fallback logic in logger.debug() if diagnostics needed.
**Hardcoded Processing Strategy:**
- Issue: Historical commit shows processing strategy was hardcoded, potential for incomplete refactoring
- Files: `backend/src/services/`, controller logic
- Impact: May not correctly use configured strategy; processing may default unexpectedly
- Fix approach: Verify all processing paths read from `config.processingStrategy` and have proper fallback logic
**Type Safety Issues - `any` Type Usage:**
- Issue: 378 instances of `any` or `unknown` types found across backend TypeScript files
- Files: Widespread including `optimizedAgenticRAGProcessor.ts:17`, `pdfGenerationService.ts`, `vectorDatabaseService.ts`
- Impact: Loses type safety guarantees; harder to catch errors at compile time; refactoring becomes risky
- Fix approach: Gradually replace `any` with proper types. Start with service boundaries and public APIs. Create typed interfaces for common patterns.
## Known Bugs
**Project Panther CIM KPI Missing After Processing:**
- Symptoms: Document `Project Panther - Confidential Information Memorandum_vBluePoint.pdf` processed but dashboard shows "Not specified in CIM" for Revenue, EBITDA, Employees, Founded even though numeric tables exist in PDF
- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (dealOverview mapper), processing pipeline
- Trigger: Process Project Panther test document through full agentic RAG pipeline
- Impact: Dashboard KPI cards remain empty; users see incomplete summaries
- Workaround: Manual data entry in dashboard; skip financial summary display for affected documents
- Fix approach: Trace through `optimizedAgenticRAGProcessor.generateLLMAnalysisMultiPass()` into the `dealOverview` mapper. Add regression test for this specific document. Check if structured table extraction is working correctly.
**10+ Minute Processing Latency Regression:**
- Symptoms: Document `document-55c4a6e2-8c08-4734-87f6-24407cea50ac.pdf` (Project Panther) took ~10 minutes end-to-end despite typical processing being 2-3 minutes
- Files: `backend/src/services/unifiedDocumentProcessor.ts`, `optimizedAgenticRAGProcessor.ts`, `documentAiProcessor.ts`, `llmService.ts`
- Trigger: Large or complex CIM documents (30+ pages with tables)
- Impact: Users experience timeouts; processing approaching or exceeding 14-minute Firebase Functions limit
- Workaround: None currently; document fails to process if latency exceeds timeout
- Fix approach: Instrument each pipeline phase (PDF chunking, Document AI extraction, RAG passes, financial parser) with timing logs. Identify bottleneck(s). Profile GCS upload retries, Anthropic fallbacks. Consider parallel multi-pass queries within quota limits.
**Vector Search Timeouts After Index Growth:**
- Symptoms: Supabase vector search RPC calls timeout after 30 seconds; fallback to document-scoped search with limited results
- Files: `backend/src/services/vectorDatabaseService.ts` (lines 122-182)
- Trigger: Large embedded document collections (1000+ chunks); similarity search under load
- Impact: Retrieval quality degrades as index grows; fallback search returns fewer contextual chunks; RAG quality suffers
- Workaround: Fallback query uses document-scoped filtering and direct embedding lookup
- Fix approach: Implement query batching, result caching by content hash, or query optimization. Consider Pinecone migration if Supabase vector performance doesn't improve. Add metrics to track timeout frequency.
## Security Considerations
**Unencrypted Debug Logs in Production:**
- Risk: Sensitive document content, user IDs, and processing details may be exposed in logs if debug mode enabled in production
- Files: `backend/src/middleware/firebaseAuth.ts` (AUTH_DEBUG flag), `backend/src/config/env.ts`, `backend/src/controllers/documentController.ts`
- Current mitigation: Debug logging controlled by `AUTH_DEBUG` environment variable; not enabled by default
- Recommendations:
1. Ensure `AUTH_DEBUG` is never set to `true` in production
2. Implement log redaction middleware to strip PII (API keys, document content, user data)
3. Use correlation IDs instead of logging full request bodies
4. Add log level enforcement (error/warn only in production)
**Hardcoded Service Account Credentials Path:**
- Risk: If service account key JSON is accidentally committed or exposed, attacker gains full GCS and Document AI access
- Files: `backend/src/config/env.ts`, `backend/src/utils/googleServiceAccount.ts`
- Current mitigation: `.env` file in `.gitignore`; credentials path via env var
- Recommendations:
1. Use Firebase Function secrets (defineSecret()) instead of env files
2. Implement credential rotation policy
3. Add pre-commit hook to prevent `.json` key files in commits
4. Audit GCS bucket permissions quarterly
**Concurrent LLM Rate Limiting Insufficient:**
- Risk: Although `llmService.ts` limits concurrent calls to 1 (line 52), burst requests could still trigger Anthropic 429 rate limit errors during high load
- Files: `backend/src/services/llmService.ts` (MAX_CONCURRENT_LLM_CALLS = 1)
- Current mitigation: Max 1 concurrent call; retry with exponential backoff (3 attempts)
- Recommendations:
1. Queue requests with enforced spacing between calls during peak hours, rather than firing each as soon as the single slot frees
2. Add request batching for multi-pass analysis
3. Implement circuit breaker pattern for cascading failures
4. Monitor token spend and throttle proactively
**No Request Rate Limiting on Upload Endpoint:**
- Risk: Attackers (including authenticated users or holders of stolen tokens) could flood the `/documents/upload-url` endpoint to exhaust quota or fill storage
- Files: `backend/src/controllers/documentController.ts` (getUploadUrl endpoint), `backend/src/routes/documents.ts`
- Current mitigation: Firebase Auth check; file size limit enforced
- Recommendations:
1. Add rate limiter middleware (e.g., express-rate-limit) with per-user quotas
2. Implement request signing for upload URLs
3. Add CORS restrictions to known frontend domains
4. Monitor upload rate and alert on anomalies
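
Recommendation 1 above amounts to a per-user sliding-window limiter. A dependency-free sketch of the idea (a real deployment would likely use express-rate-limit; this class and its names are illustrative):

```typescript
// Sliding-window rate limiter: at most `limit` calls per key
// within the trailing `windowMs` milliseconds.
class RateLimiter {
  private hits = new Map<string, number[]>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  allow(userId: string, now = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      return false; // over quota
    }
    recent.push(now);
    this.hits.set(userId, recent);
    return true;
  }
}
```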
## Performance Bottlenecks
**Large File PDF Chunking Memory Usage:**
- Problem: Documents larger than 50 MB may cause OOM errors during chunking; no memory limit guards
- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (line 35, 4000-char chunks), `backend/src/services/unifiedDocumentProcessor.ts`
- Cause: Entire document text loaded into memory before chunking; large overlap between chunks multiplies footprint
- Improvement path:
1. Implement streaming chunk processing from GCS (read chunks, embed, write to DB before next chunk)
2. Reduce overlap from 200 to 100 characters or make dynamic based on document size
3. Add memory threshold checks; fail early with user-friendly error if approaching limit
4. Profile heap usage in tests with 50+ MB documents
**Embedding Generation for Large Documents:**
- Problem: Embedding 1000+ chunks sequentially takes 2-3 minutes; no concurrency despite `maxConcurrentEmbeddings = 5` setting
- Files: `backend/src/services/optimizedAgenticRAGProcessor.ts` (lines 37, 172-180 region)
- Cause: Batch size of 10 may be inefficient; OpenAI/Anthropic API concurrency not fully utilized
- Improvement path:
1. Increase batch size to 25-50 chunks per concurrent request (test quota limits)
2. Use Promise.all() instead of sequential embedding calls
3. Cache embeddings by content hash to skip re-embedding on retries
4. Add progress callback to track batch completion
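
The batching improvement (point 2) can be sketched as a generic helper: slice the chunk list into batches and run each batch concurrently with `Promise.all`, instead of awaiting every embedding call in sequence. The helper name is hypothetical:

```typescript
// Process items in batches of `batchSize`; items inside a batch run
// concurrently, batches themselves run one after another so total
// concurrency stays bounded.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```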
**Multiple LLM Retries on Network Failure:**
- Problem: 3 retry attempts for each LLM call with exponential backoff means up to 30+ seconds per call; multi-pass analysis does 3+ passes
- Files: `backend/src/services/llmService.ts` (retry logic, lines 320+), `backend/src/services/optimizedAgenticRAGProcessor.ts` (line 83 multi-pass)
- Cause: No circuit breaker; all retries execute even if service degraded
- Improvement path:
1. Track consecutive failures; disable retries if failure rate >50% in last minute
2. Use adaptive retry backoff (double wait time only after first failure)
3. Implement multi-pass fallback: if Pass 2 fails, use Pass 1 results instead of failing entire document
4. Add metrics endpoint to show retry frequency and success rates
**PDF Generation Memory Leak with Puppeteer Page Pool:**
- Problem: Page pool in `pdfGenerationService.ts` may not properly release browser resources; max pool size 5 but no eviction policy
- Files: `backend/src/services/pdfGenerationService.ts` (lines 66-71, page pool)
- Cause: Pages may not be closed if PDF generation errors mid-stream; no cleanup on timeout
- Improvement path:
1. Implement LRU eviction: close oldest page if pool reaches max size
2. Add page timeout with forced close after 30s
3. Add memory monitoring; close all pages if heap >500MB
4. Log page pool stats every 5 minutes to detect leaks
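
The mid-stream leak scenario (a page never released when rendering throws) is exactly what a try/finally acquire-release wrapper prevents. A sketch with a minimal page interface standing in for a Puppeteer page; all names here are hypothetical:

```typescript
interface PooledPage {
  close(): Promise<void>;
  render(html: string): Promise<Buffer>;
}

// Run `fn` with an acquired page and guarantee the page is closed
// afterwards, even when rendering throws mid-stream.
async function withPage<T>(
  acquire: () => Promise<PooledPage>,
  fn: (page: PooledPage) => Promise<T>,
): Promise<T> {
  const page = await acquire();
  try {
    return await fn(page);
  } finally {
    await page.close(); // release happens on success AND on error
  }
}
```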
## Fragile Areas
**Job Queue State Machine:**
- Files: `backend/src/services/jobQueueService.ts`, `backend/src/services/jobProcessorService.ts`, `backend/src/models/ProcessingJobModel.ts`
- Why fragile:
1. Job status transitions (pending → processing → completed) not atomic; race condition if two workers pick same job
2. Stuck job detection relies on timestamp comparison; clock skew or server restart breaks detection
3. No idempotency tokens; job retry on network error could trigger duplicate processing
- Safe modification:
1. Add database-level unique constraint on job ID + processing timestamp
2. Use database transactions for status updates
3. Implement idempotency with request deduplication ID
- Test coverage:
1. No unit tests found for concurrent job processing scenario
2. No integration tests with actual database
3. Add tests for: concurrent workers, stuck job reset, duplicate submissions
**Document Processing Pipeline Error Handling:**
- Files: `backend/src/controllers/documentController.ts` (lines 200+), `backend/src/services/unifiedDocumentProcessor.ts`
- Why fragile:
1. Hybrid approach tries job queue then fallback to immediate processing; error in job queue doesn't fully propagate
2. Document status not updated if processing fails mid-pipeline (remains 'processing_llm')
3. No compensating transaction to roll back partial results
- Safe modification:
1. Separate job submission from immediate processing; always update document status atomically
2. Add processing stage tracking (document_ai → chunking → embedding → llm → pdf)
3. Implement rollback logic: delete chunks and embeddings if LLM stage fails
- Test coverage:
1. Add tests for each pipeline stage failure
2. Test document status consistency after each failure
3. Add integration test with network failure injection
**Vector Database Search Fallback Chain:**
- Files: `backend/src/services/vectorDatabaseService.ts` (lines 110-182)
- Why fragile:
1. Three-level fallback (RPC search → document-scoped search → direct lookup) masks underlying issues
2. If Supabase RPC is degraded, system degrades silently instead of alerting
3. Fallback search may return stale or incorrect results without indication
- Safe modification:
1. Add circuit breaker: if timeout happens 3x in 5 minutes, stop trying RPC search
2. Return metadata flag indicating which fallback was used (for logging/debugging)
3. Wrap the search call in an explicit timeout with try/catch rather than relying on Promise.race() (cleaner control flow)
- Test coverage:
1. Mock Supabase timeout at each RPC level
2. Verify correct fallback is triggered
3. Add performance benchmarks for each search method
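
Safe modification 1 above ("if timeout happens 3x in 5 minutes, stop trying RPC search") is a windowed circuit breaker. A minimal sketch, with hypothetical names:

```typescript
// Windowed circuit breaker: once `threshold` failures land inside the
// trailing `windowMs`, the breaker opens and callers should skip the
// degraded path until old failures age out of the window.
class CircuitBreaker {
  private failures: number[] = [];

  constructor(
    private readonly threshold: number,
    private readonly windowMs: number,
  ) {}

  recordFailure(now = Date.now()): void {
    this.failures.push(now);
  }

  isOpen(now = Date.now()): boolean {
    // Drop failures that have aged out of the window.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    return this.failures.length >= this.threshold;
  }
}
```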
**Config Initialization Race Condition:**
- Files: `backend/src/config/env.ts` (lines 15-52)
- Why fragile:
1. Firebase Functions v1 fallback (`functions.config()`) may not be thread-safe
2. If multiple instances start simultaneously, config merge may be incomplete
3. No validation that config merge was successful
- Safe modification:
1. Remove v1 fallback entirely; require explicit Firebase Functions v2 setup
2. Validate all critical env vars before allowing service startup
3. Fail fast with clear error message if required vars missing
- Test coverage:
1. Add test for missing required env vars
2. Test with incomplete config to verify error message clarity
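
Safe modifications 2 and 3 (validate critical vars, fail fast with a clear message) can be sketched as a single startup check; `requireEnv` is a hypothetical helper, not code from `env.ts`:

```typescript
// Fail-fast env validation: collect EVERY missing variable and abort
// startup with one clear message, instead of failing later mid-request.
function requireEnv(
  env: Record<string, string | undefined>,
  keys: string[],
): Record<string, string> {
  const missing = keys.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(keys.map((k) => [k, env[k]!]));
}
```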
## Scaling Limits
**Supabase Concurrent Vector Search Connections:**
- Current capacity: RPC timeout 30 seconds; Supabase connection pool typically 100 max
- Limit: With 3 concurrent workers × multiple users, could exhaust connection pool during peak load
- Scaling path:
1. Implement connection pooling via PgBouncer (already in Supabase Pro tier)
2. Reduce timeout from 30s to 10s; fail faster and retry
3. Migrate to Pinecone if vector search becomes >30% of workload
**Firebase Functions Timeout (~14 minutes usable):**
- Current capacity: Serverless function execution is capped at 15 minutes; the pipeline treats ~14 minutes as the practical ceiling (1-minute safety buffer before the hard timeout)
- Limit: Document processing hitting ~10 minutes; adding new features could exceed limit
- Scaling path:
1. Move processing to Cloud Run (1 hour limit) for large documents
2. Implement processing timeout failover: if a job approaches 12 minutes, checkpoint and requeue
3. Add background worker pool for long-running jobs (separate from request path)
**LLM API Rate Limits (Anthropic/OpenAI):**
- Current capacity: 1 concurrent call; 3 retries per call; no per-minute or per-second throttling beyond single-call serialization
- Limit: Burst requests from multiple users could trigger 429 rate limit errors
- Scaling path:
1. Negotiate higher rate limits with API providers
2. Implement request queuing with exponential backoff per user
3. Add cost monitoring and soft-limit alerts (warn at 80% of quota)
**PDF Generation Browser Pool:**
- Current capacity: 5 browser pages maximum
- Limit: With 3+ concurrent document processing jobs, pool contention causes delays (queue wait time)
- Scaling path:
1. Increase pool size to 10 (requires more memory)
2. Move PDF generation to separate worker queue (decouple from request path)
3. Implement adaptive pool sizing based on available memory
**GCS Upload/Download Throughput:**
- Current capacity: Single-threaded upload/download; file transfer waits on GCS API latency
- Limit: Large documents (50+ MB) may timeout or be slow
- Scaling path:
1. Implement resumable uploads with multi-part chunks
2. Add parallel chunk uploads for files >10 MB
3. Cache frequently accessed documents in Redis
## Dependencies at Risk
**Firebase Functions v1 Deprecation (EOL Dec 31, 2025):**
- Risk: Runtime will be decommissioned; Node.js 20 support ending Oct 30, 2026 (warning already surfaced)
- Impact: Functions will stop working after deprecation date; forced migration required
- Migration plan:
1. Migrate to Firebase Functions v2 runtime (already partially done; fallback code still present)
2. Update `firebase-functions` package to latest major version
3. Remove deprecated `functions.config()` fallback once migration confirmed
4. Test all functions after upgrade
**Puppeteer Version Pinning:**
- Risk: Puppeteer has frequent security updates; pinned version likely outdated
- Impact: Browser vulnerabilities in PDF generation; potential sandbox bypass
- Migration plan:
1. Audit current Puppeteer version in `package.json`
2. Test upgrade path (may have breaking API changes)
3. Implement automated dependency security scanning
**Document AI API Versioning:**
- Risk: Google Cloud Document AI API may deprecate current processor version
- Impact: Processing pipeline breaks if processor ID no longer valid
- Migration plan:
1. Document current processor version and creation date
2. Subscribe to Google Cloud deprecation notices
3. Add feature flag to switch processor versions
4. Test new processor version before migration
## Missing Critical Features
**Job Processing Observability:**
- Problem: No metrics for job success rate, average processing time per stage, or failure breakdown by error type
- Blocks: Cannot diagnose performance regressions; cannot identify bottlenecks
- Implementation: Add `/health/agentic-rag` endpoint exposing per-pass timing, token usage, cost data
**Document Version History:**
- Problem: Processing pipeline overwrites `analysis_data` on each run; no ability to compare old vs. new results
- Blocks: Cannot detect if new model version improves accuracy; hard to debug regression
- Implementation: Add `document_versions` table; keep historical results; implement diff UI
**Retry Mechanism for Failed Documents:**
- Problem: Failed documents stay in failed state; no way to retry after infrastructure recovers
- Blocks: User must re-upload document; processing failures are permanent per upload
- Implementation: Add "Retry" button to failed document status; re-queue without user re-upload
## Test Coverage Gaps
**End-to-End Pipeline with Large Documents:**
- What's not tested: Full processing pipeline with 50+ MB documents, covering PDF chunking, Document AI extraction, embeddings, LLM analysis, and PDF generation
- Files: No integration test covering full flow with large fixture
- Risk: Cannot detect if scaling to large documents introduces timeouts or memory issues
- Priority: High (Project Panther regression was not caught by tests)
**Concurrent Job Processing:**
- What's not tested: Multiple jobs submitted simultaneously; verify no race conditions in job queue or database
- Files: `backend/src/services/jobQueueService.ts`, `backend/src/models/ProcessingJobModel.ts`
- Risk: Race condition causes duplicate processing or lost job state in production
- Priority: High (affects reliability)
**Vector Database Fallback Scenarios:**
- What's not tested: Simulate Supabase RPC timeout and verify correct fallback search is executed
- Files: `backend/src/services/vectorDatabaseService.ts` (lines 110-182)
- Risk: Fallback search silent failures or incorrect results not detected
- Priority: Medium (affects search quality)
**LLM API Provider Switching:**
- What's not tested: Switch between Anthropic, OpenAI, OpenRouter; verify each provider works correctly
- Files: `backend/src/services/llmService.ts` (provider selection logic)
- Risk: Provider-specific bugs not caught until production usage
- Priority: Medium (currently only Anthropic heavily used)
**Error Propagation in Hybrid Processing:**
- What's not tested: Job queue failure → immediate processing fallback; verify document status and error reporting
- Files: `backend/src/controllers/documentController.ts` (lines 200+)
- Risk: Silent failures or incorrect status updates if fallback error not properly handled
- Priority: High (affects user experience)
---
*Concerns audit: 2026-02-24*

@@ -0,0 +1,286 @@
# Coding Conventions
**Analysis Date:** 2026-02-24
## Naming Patterns
**Files:**
- Backend service files: `camelCase.ts` (e.g., `llmService.ts`, `unifiedDocumentProcessor.ts`, `vectorDatabaseService.ts`)
- Backend middleware/controllers: `camelCase.ts` (e.g., `errorHandler.ts`, `firebaseAuth.ts`)
- Frontend components: `PascalCase.tsx` (e.g., `DocumentUpload.tsx`, `LoginForm.tsx`, `ProtectedRoute.tsx`)
- Frontend utility files: `camelCase.ts` (e.g., `cn.ts` for class name utilities)
- Ambient type declaration files: camelCase with a `.d.ts` suffix (e.g., `express.d.ts`)
- Model files: `PascalCase.ts` in `backend/src/models/` (e.g., `DocumentModel.ts`)
- Config files: `camelCase.ts` (e.g., `env.ts`, `firebase.ts`, `supabase.ts`)
**Functions:**
- Both backend and frontend use camelCase: `processDocument()`, `validateUUID()`, `handleUpload()`
- React components are PascalCase: `DocumentUpload`, `ErrorHandler`
- Handler functions use `handle` or verb prefix: `handleVisibilityChange()`, `onDrop()`
- Async functions use descriptive names: `fetchDocuments()`, `uploadDocument()`, `processDocument()`
**Variables:**
- camelCase for all variables: `documentId`, `correlationId`, `isUploading`, `uploadedFiles`
- Constants occasionally use UPPER_SNAKE_CASE: `MAX_CONCURRENT_LLM_CALLS`, `MAX_TOKEN_LIMITS`
- Boolean prefixes: `is*` (isUploading, isAdmin), `has*` (hasError), `can*` (canProcess)
**Types:**
- Interfaces use PascalCase: `LLMRequest`, `UploadedFile`, `DocumentUploadProps`, `CIMReview`
- Type unions use PascalCase: `ErrorCategory`, `ProcessingStrategy`
- Generic types use single uppercase letter or descriptive name: `T`, `K`, `V`
- Enum values use UPPER_SNAKE_CASE: `ErrorCategory.VALIDATION`, `ErrorCategory.AUTHENTICATION`
**Interfaces vs Types:**
- **Interfaces** for object shapes that represent entities or components: `interface Document`, `interface UploadedFile`
- **Types** for unions, primitives, and specialized patterns: `type ProcessingStrategy = 'document_ai_agentic_rag' | 'simple_full_document'`
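A minimal sketch of this convention, using the `UploadedFile` and `ProcessingStrategy` names from above (field names here are illustrative, not the actual definitions):

```typescript
// Interface for an entity shape (illustrative fields).
interface UploadedFile {
  id: string;
  fileName: string;
  fileSize: number;
}

// Type alias for a closed union of specialized values.
type ProcessingStrategy = 'document_ai_agentic_rag' | 'simple_full_document';

function describeFile(file: UploadedFile, strategy: ProcessingStrategy): string {
  return `${file.fileName} via ${strategy}`;
}
```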
## Code Style
**Formatting:**
- No formal Prettier config detected in the repo (formatting varies slightly between files)
- 2-space indentation (observed in TypeScript files)
- Semicolons required at end of statements
- Single quotes for strings in TypeScript, double quotes in JSX attributes
- Line length: preferably under 100 characters but not enforced
**Linting:**
- Tool: ESLint with TypeScript support
- Config: `.eslintrc.js` in backend
- Key rules:
- `@typescript-eslint/no-unused-vars`: error (allows leading underscore for intentionally unused)
- `@typescript-eslint/no-explicit-any`: warn (use `unknown` instead)
- `@typescript-eslint/no-non-null-assertion`: warn (use proper type guards)
- `no-console`: off in backend (logging used via Winston)
- `no-undef`: error (strict undefined checking)
- Frontend ESLint reports unused disable directives and fails on any warning (`--max-warnings 0`)
**TypeScript Standards:**
- Strict mode not fully enabled (noImplicitAny disabled in tsconfig.json for legacy reasons)
- Prefer explicit typing over `any`: use `unknown` when type is truly unknown
- Type guards required for safety checks: `error instanceof Error ? error.message : String(error)`
- No type assertions with `as` for complex types; use proper type narrowing
## Import Organization
**Order:**
1. External framework/library imports (`express`, `react`, `winston`)
2. Google Cloud/Firebase imports (`@google-cloud/storage`, `firebase-admin`)
3. Third-party service imports (`axios`, `zod`, `joi`)
4. Internal config imports (`'../config/env'`, `'../config/firebase'`)
5. Internal utility imports (`'../utils/logger'`, `'../utils/cn'`)
6. Internal model imports (`'../models/DocumentModel'`)
7. Internal service imports (`'../services/llmService'`)
8. Internal middleware/helper imports (`'../middleware/errorHandler'`)
9. Type-only imports at the end: `import type { ProcessingStrategy } from '...'`
**Examples:**
Backend service pattern from `optimizedAgenticRAGProcessor.ts`:
```typescript
import { logger } from '../utils/logger';
import { vectorDatabaseService } from './vectorDatabaseService';
import { VectorDatabaseModel } from '../models/VectorDatabaseModel';
import { llmService } from './llmService';
import { CIMReview } from './llmSchemas';
import { config } from '../config/env';
import type { ParsedFinancials } from './financialTableParser';
import type { StructuredTable } from './documentAiProcessor';
```
Frontend component pattern from `DocumentList.tsx`:
```typescript
import React from 'react';
import {
FileText,
Eye,
Download,
Trash2,
Calendar,
User,
Clock
} from 'lucide-react';
import { cn } from '../utils/cn';
```
**Path Aliases:**
- No @ alias imports detected; all use relative `../` patterns
- Monorepo structure: frontend and backend in separate directories with independent module resolution
## Error Handling
**Patterns:**
1. **Structured Error Objects with Categories:**
- Use `ErrorCategory` enum for classification: `VALIDATION`, `AUTHENTICATION`, `AUTHORIZATION`, `NOT_FOUND`, `EXTERNAL_SERVICE`, `PROCESSING`, `DATABASE`, `SYSTEM`
- Attach `AppError` interface properties: `statusCode`, `isOperational`, `code`, `correlationId`, `category`, `retryable`, `context`
- Example from `errorHandler.ts`:
```typescript
const enhancedError: AppError = {
category: ErrorCategory.VALIDATION,
statusCode: 400,
code: 'INVALID_UUID_FORMAT',
retryable: false
};
```
2. **Try-Catch with Structured Logging:**
- Always catch errors with explicit type checking
- Log with structured data including correlation ID
- Example pattern:
```typescript
try {
await operation();
} catch (error) {
logger.error('Operation failed', {
error: error instanceof Error ? error.message : String(error),
stack: error instanceof Error ? error.stack : undefined,
context: { documentId, userId }
});
throw error;
}
```
3. **HTTP Response Pattern:**
- Success responses: `{ success: true, data: {...} }`
- Error responses: `{ success: false, error: { code, message, details, correlationId, timestamp, retryable } }`
- User-friendly messages mapped by error category
- Include `X-Correlation-ID` header in responses
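The two envelope shapes can be sketched as small helpers (a sketch of the documented pattern; field contents are illustrative):

```typescript
interface ApiSuccess<T> { success: true; data: T; }
interface ApiError {
  success: false;
  error: {
    code: string;
    message: string;
    correlationId: string;
    timestamp: string;
    retryable: boolean;
    details?: unknown;
  };
}

// Build the documented success envelope.
function ok<T>(data: T): ApiSuccess<T> {
  return { success: true, data };
}

// Build the documented error envelope with a correlation ID for tracing.
function fail(code: string, message: string, correlationId: string, retryable = false): ApiError {
  return {
    success: false,
    error: { code, message, correlationId, timestamp: new Date().toISOString(), retryable },
  };
}
```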
4. **Retry Logic:**
- LLM service implements concurrency limiting: max 1 concurrent call to prevent rate limits
- 3 retry attempts for LLM API calls with exponential backoff (see `llmService.ts` lines 236-450)
- Jobs respect 14-minute timeout limit with graceful status updates
5. **External Service Errors:**
- Firebase Auth errors: extract from `error.message` and `error.name` (TokenExpiredError, JsonWebTokenError)
- Supabase errors: check `error.code` and `error.message`, handle UUID validation errors
- GCS errors: extract from error objects with proper null checks
## Logging
**Framework:** Winston logger from `backend/src/utils/logger.ts`
**Levels:**
- `logger.debug()`: Detailed diagnostic info (disabled in production)
- `logger.info()`: Normal operation information, upload start/completion, processing status
- `logger.warn()`: Warning conditions, CORS rejections, non-critical issues
- `logger.error()`: Error conditions with full context and stack traces
**Structured Logging Pattern:**
```typescript
logger.info('Message', {
correlationId: correlationId,
category: 'operation_type',
operation: 'specific_action',
documentId: documentId,
userId: userId,
metadata: value,
timestamp: new Date().toISOString()
});
```
**StructuredLogger Class:**
- Use for operations requiring correlation ID tracking
- Constructor: `const logger = new StructuredLogger(correlationId)`
- Specialized methods:
- `uploadStart()`, `uploadSuccess()`, `uploadError()` - for file operations
- `processingStart()`, `processingSuccess()`, `processingError()` - for document processing
- `storageOperation()` - for file storage operations
- `jobQueueOperation()` - for background jobs
- `info()`, `warn()`, `error()`, `debug()` - general logging
- All methods automatically attach correlation ID to metadata
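The core of the class can be approximated as follows. This is only a sketch of the behaviour described above: the real implementation lives in `backend/src/utils/logger.ts`, and the injected `base` logger parameter is an assumption made here so the example is self-contained.

```typescript
type Meta = Record<string, unknown>;

// Minimal stand-in for the underlying Winston logger interface.
interface BaseLogger {
  info(message: string, meta: Meta): void;
  error(message: string, meta: Meta): void;
}

class StructuredLogger {
  constructor(
    private readonly correlationId: string,
    private readonly base: BaseLogger,
  ) {}

  // Every method merges the correlation ID into the metadata.
  private withId(meta: Meta): Meta {
    return { ...meta, correlationId: this.correlationId };
  }

  info(message: string, meta: Meta = {}): void {
    this.base.info(message, this.withId(meta));
  }

  uploadStart(fileName: string): void {
    this.info('Upload started', { category: 'upload', fileName });
  }
}
```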
**What NOT to Log:**
- Credentials, API keys, or sensitive data
- Large file contents or binary data
- User passwords or tokens (log only presence: "token available" or "NO_TOKEN")
- Request body contents (sanitized in error handler - only whitelisted fields: documentId, id, status, fileName, fileSize, contentType, correlationId)
**Console Usage:**
- Backend: `console.log` disabled by ESLint in production code; only Winston logger used
- Frontend: `console.log` used in development (observed in DocumentUpload, App components)
- Special case: logger initialization may use console.warn for setup diagnostics
## Comments
**When to Comment:**
- Complex algorithms or business logic: explain "why", not "what" the code does
- Non-obvious type conversions or workarounds
- Links to related issues, tickets, or documentation
- Critical security considerations or performance implications
- TODO items for incomplete work (format: `// TODO: [description]`)
**JSDoc/TSDoc:**
- Used for function and class documentation in utility and service files
- Function signature example from `test-helpers.ts`:
```typescript
/**
* Creates a mock correlation ID for testing
*/
export function createMockCorrelationId(): string
```
- Parameter and return types documented via TypeScript typing (preferred over verbose JSDoc)
- Service classes include operation summaries: `/** Process document using Document AI + Agentic RAG strategy */`
## Function Design
**Size:**
- Keep functions focused on single responsibility
- Long services (300+ lines) separate concerns into helper methods
- Controller/middleware functions stay under 50 lines
**Parameters:**
- Max 3-4 required parameters; use object for additional config
- Example: `processDocument(documentId: string, userId: string, text: string, options?: { strategy?: string })`
- Use destructuring for config objects: `{ strategy, maxTokens, temperature }`
**Return Values:**
- Async operations return Promise with typed success/error objects
- Pattern: `Promise<{ success: boolean; data: T; error?: string }>`
- Avoid throwing in service methods; return error in object
- Controllers/middleware can throw for Express error handler
**Type Signatures:**
- Always specify parameter and return types (no implicit `any`)
- Use generics for reusable patterns: `Promise<T>`, `Array<Document>`
- Union types for multiple possibilities: `'uploading' | 'uploaded' | 'processing' | 'completed' | 'error'`
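The result-object return pattern described above can be sketched like this (the `fetchStatus` function is hypothetical, named only for illustration):

```typescript
interface ServiceResult<T> {
  success: boolean;
  data?: T;
  error?: string;
}

type DocumentStatus = 'uploading' | 'uploaded' | 'processing' | 'completed' | 'error';

// Hypothetical service method: returns the error in the object instead of throwing.
async function fetchStatus(documentId: string): Promise<ServiceResult<DocumentStatus>> {
  if (!documentId) {
    return { success: false, error: 'documentId is required' };
  }
  return { success: true, data: 'completed' };
}
```

Callers branch on `success` rather than wrapping every service call in try-catch; only controllers and middleware throw, so the Express error handler stays the single place that maps failures to HTTP responses.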
## Module Design
**Exports:**
- Services exported as singleton instances: `export const llmService = new LLMService()`
- Utility functions exported as named exports: `export function validateUUID() { ... }`
- Type definitions exported from dedicated type files or alongside implementation
- Classes exported as default or named based on usage pattern
**Barrel Files:**
- Not consistently used; services import directly from implementation files
- Example: `import { llmService } from './llmService'` not from `./services/index`
- Consider adding for cleaner imports when services directory grows
**Service Singletons:**
- All services instantiated once and exported as singletons
- Examples:
- `backend/src/services/llmService.ts`: `export const llmService = new LLMService()`
- `backend/src/services/fileStorageService.ts`: `export const fileStorageService = new FileStorageService()`
- `backend/src/services/vectorDatabaseService.ts`: `export const vectorDatabaseService = new VectorDatabaseService()`
- Prevents multiple initialization and enables dependency sharing
**Frontend Context Pattern:**
- React Context for auth: `AuthContext` exports a `useAuth()` hook
- API access goes through `documentService`, a shared object of API methods
- Unlike the backend, the frontend has no class-based service singletons; other instances are recreated as needed
## Deprecated Patterns (DO NOT USE)
- ❌ Direct PostgreSQL connections - Use Supabase client instead
- ❌ JWT authentication - Use Firebase Auth tokens
- ❌ `console.log` in production code - Use Winston logger
- ❌ Type assertions with `as` for complex types - Use type guards
- ❌ Manual error handling without correlation IDs
- ❌ Redis caching - Not used in current architecture
- ❌ Jest testing - Use Vitest instead
---
*Convention analysis: 2026-02-24*

# External Integrations
**Analysis Date:** 2026-02-24
## APIs & External Services
**Document Processing:**
- Google Document AI
- Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
- Client: `@google-cloud/documentai` 9.3.0
- Implementation: `backend/src/services/documentAiProcessor.ts`
- Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS` or default credentials
- Configuration: Processor ID from `DOCUMENT_AI_PROCESSOR_ID`, location from `DOCUMENT_AI_LOCATION` (default: 'us')
- Max pages per chunk: 15 pages (configurable)
**Large Language Models:**
- OpenAI
- Purpose: LLM analysis of document content, embeddings for vector search
- SDK/Client: `openai` 5.10.2
- Auth: API key from `OPENAI_API_KEY`
- Models: Default `gpt-4-turbo`, embeddings via `text-embedding-3-small`
- Implementation: `backend/src/services/llmService.ts` with provider abstraction
- Retry: 3 attempts with exponential backoff
- Anthropic Claude
- Purpose: LLM analysis and document summary generation
- SDK/Client: `@anthropic-ai/sdk` 0.57.0
- Auth: API key from `ANTHROPIC_API_KEY`
- Models: Default `claude-sonnet-4-20250514` (configurable via `LLM_MODEL`)
- Implementation: `backend/src/services/llmService.ts`
- Concurrency: Max 1 concurrent LLM call to prevent rate limiting (Anthropic 429 errors)
- Retry: 3 attempts with exponential backoff
- OpenRouter
- Purpose: Alternative LLM provider supporting multiple models through single API
- SDK/Client: HTTP requests via `axios` to OpenRouter API
- Auth: `OPENROUTER_API_KEY` or optional Bring-Your-Own-Key mode (`OPENROUTER_USE_BYOK`)
- Configuration: `LLM_PROVIDER: 'openrouter'` activates this provider
- Implementation: `backend/src/services/llmService.ts`
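The retry behaviour shared by these providers can be sketched as a generic helper. This is a simplified illustration of "3 attempts with exponential backoff", not the actual code in `llmService.ts`; the base delay is shortened here so the example runs instantly.

```typescript
// Retry an async call up to `attempts` times, doubling the delay after each failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff: 1x, 2x, 4x the base delay.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```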
**File Storage:**
- Google Cloud Storage (GCS)
- Purpose: Store uploaded PDFs, processed documents, and generated PDFs
- SDK/Client: `@google-cloud/storage` 7.16.0
- Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS`
- Buckets:
- Input: `GCS_BUCKET_NAME` for uploaded documents
- Output: `DOCUMENT_AI_OUTPUT_BUCKET_NAME` for processing results
- Implementation: `backend/src/services/fileStorageService.ts` and `backend/src/services/documentAiProcessor.ts`
- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
## Data Storage
**Databases:**
- Supabase PostgreSQL
- Connection: `SUPABASE_URL` for PostgREST API, `DATABASE_URL` for direct PostgreSQL
- Client: `@supabase/supabase-js` 2.53.0 for REST API, `pg` 8.11.3 for direct pool connections
- Auth: `SUPABASE_ANON_KEY` for client operations, `SUPABASE_SERVICE_KEY` for server operations
- Implementation:
- `backend/src/config/supabase.ts` - Client initialization with 30-second request timeout
- `backend/src/models/` - All data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
- Vector Support: pgvector extension for semantic search
- Tables:
- `users` - User accounts and authentication data
- `documents` - CIM documents with status tracking
- `document_chunks` - Text chunks with embeddings for vector search
- `document_feedback` - User feedback on summaries
- `document_versions` - Document version history
- `document_audit_logs` - Audit trail for compliance
- `processing_jobs` - Background job queue with status tracking
- `performance_metrics` - System performance data
- Connection pooling: Max 5 connections, 30-second idle timeout, 2-second connection timeout
**Vector Database:**
- Supabase pgvector (built into PostgreSQL)
- Purpose: Semantic search and RAG context retrieval
- Implementation: `backend/src/services/vectorDatabaseService.ts`
- Embedding generation: Via OpenAI `text-embedding-3-small` (embedded in service)
- Search: Cosine similarity via Supabase RPC calls
- Semantic cache: 1-hour TTL for cached embeddings
**File Storage:**
- Google Cloud Storage (primary storage above)
- Local filesystem (fallback for development, stored in `uploads/` directory)
**Caching:**
- In-memory semantic cache (Supabase vector embeddings) with 1-hour TTL
- No external cache service (Redis, Memcached) currently used
## Authentication & Identity
**Auth Provider:**
- Firebase Authentication
- Purpose: User authentication, JWT token generation and verification
- Client: `firebase` 12.0.0 (frontend at `frontend/src/config/firebase.ts`)
- Admin: `firebase-admin` 13.4.0 (backend at `backend/src/config/firebase.ts`)
- Implementation:
- Frontend: `frontend/src/services/authService.ts` - Login, logout, token refresh
- Backend: `backend/src/middleware/firebaseAuth.ts` - Token verification middleware
- Project: `cim-summarizer` (hardcoded in config)
- Flow: User logs in with Firebase, receives ID token, frontend sends token in Authorization header
**Token-Based Auth:**
- JWT (JSON Web Tokens)
- Purpose: API request authentication
- Implementation: `backend/src/middleware/firebaseAuth.ts`
- Verification: Firebase Admin SDK verifies token signature and expiration
- Header: `Authorization: Bearer <token>`
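Before the Admin SDK can verify anything, the middleware has to pull the token out of that header; a minimal version of that first step (the verification call itself is delegated to Firebase and omitted here):

```typescript
// Extract the bearer token from an Authorization header value.
// Returns null when the header is missing or not in `Bearer <token>` form.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const [scheme, token] = header.split(' ');
  if (scheme !== 'Bearer' || !token) return null;
  return token;
}
```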
**Fallback Auth (for service-to-service):**
- API Key based (not currently exposed but framework supports it in `backend/src/config/env.ts`)
## Monitoring & Observability
**Error Tracking:**
- No external error tracking service configured
- Errors logged via Winston logger with correlation IDs for tracing
**Logs:**
- Winston logger 3.11.0 - Structured JSON logging at `backend/src/utils/logger.ts`
- Transports: Console (development), File-based for production logs
- Correlation ID middleware at `backend/src/middleware/errorHandler.ts` - Every request traced
- Request logging: Morgan 1.10.0 with Winston transport
- Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments
**Monitoring Endpoints:**
- `GET /health` - Basic health check with uptime and environment info
- `GET /health/config` - Configuration validation status
- `GET /health/agentic-rag` - Agentic RAG system health (placeholder)
- `GET /monitoring/dashboard` - Aggregated system metrics (queryable by time range)
## CI/CD & Deployment
**Hosting:**
- **Backend**:
- Firebase Cloud Functions (default, Node.js 20 runtime)
- Google Cloud Run (alternative containerized deployment)
- Configuration: `backend/firebase.json` defines function source, runtime, and predeploy hooks
- **Frontend**:
- Firebase Hosting (CDN-backed static hosting)
- Configuration: Defined in `frontend/` directory with `firebase.json`
**Deployment Commands:**
```bash
# Backend deployment
npm run deploy:firebase # Deploy functions to Firebase
npm run deploy:cloud-run # Deploy to Cloud Run
npm run docker:build # Build Docker image
npm run docker:push # Push to GCR
# Frontend deployment
npm run deploy:firebase # Deploy to Firebase Hosting
npm run deploy:preview # Deploy to preview channel
# Emulator
npm run emulator # Run Firebase emulator locally
npm run emulator:ui # Run emulator with UI
```
**Build Pipeline:**
- TypeScript compilation: `tsc` targets ES2020
- Predeploy: Defined in `firebase.json` - runs `npm run build`
- Docker image for Cloud Run: `Dockerfile` in backend root
## Environment Configuration
**Required env vars (Production):**
```
NODE_ENV=production
LLM_PROVIDER=anthropic
GCLOUD_PROJECT_ID=cim-summarizer
DOCUMENT_AI_PROCESSOR_ID=<processor-id>
GCS_BUCKET_NAME=<bucket-name>
DOCUMENT_AI_OUTPUT_BUCKET_NAME=<output-bucket>
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_ANON_KEY=<anon-key>
SUPABASE_SERVICE_KEY=<service-key>
DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
FIREBASE_PROJECT_ID=cim-summarizer
```
**Optional env vars:**
```
DOCUMENT_AI_LOCATION=us
VECTOR_PROVIDER=supabase
LLM_MODEL=claude-sonnet-4-20250514
LLM_MAX_TOKENS=16000
LLM_TEMPERATURE=0.1
OPENROUTER_API_KEY=<key>
OPENROUTER_USE_BYOK=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
**Secrets location:**
- Development: `.env` file (gitignored, never committed)
- Production: Firebase Functions secrets via `firebase functions:secrets:set`
- Google Credentials: `backend/serviceAccountKey.json` for local dev, service account in Cloud Functions environment
## Webhooks & Callbacks
**Incoming:**
- No external webhooks currently configured
- All document processing triggered by HTTP POST to `POST /documents/upload`
**Outgoing:**
- No outgoing webhooks implemented
- Document processing is synchronous (within 14-minute Cloud Function timeout) or async via job queue
**Real-time Monitoring:**
- Server-Sent Events (SSE) not implemented
- Polling endpoints for progress:
- `GET /documents/{id}/progress` - Document processing progress
- `GET /documents/queue/status` - Job queue status (frontend polls every 5 seconds)
## Rate Limiting & Quotas
**API Rate Limits:**
- Express rate limiter: 1000 requests per 15 minutes per IP
- LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
- OpenAI rate limits: Handled by SDK with backoff
**File Upload Limits:**
- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
- Allowed MIME types: `application/pdf` (configurable via `ALLOWED_FILE_TYPES`)
## Network Configuration
**CORS Origins (Allowed):**
- `https://cim-summarizer.web.app` (production)
- `https://cim-summarizer.firebaseapp.com` (production)
- `http://localhost:3000` (development)
- `http://localhost:5173` (development)
- `https://localhost:3000` (SSL local dev)
- `https://localhost:5173` (SSL local dev)
**Port Mappings:**
- Frontend dev: Port 5173 (Vite dev server)
- Backend dev: Port 5001 (Firebase Functions emulator)
- Backend API: Port 5000 (Express in standard deployment)
- Vite proxy to backend: `/api` routes proxied from port 5173 to `http://localhost:5000`
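The proxy described above corresponds to a Vite config along these lines (a sketch under the port assumptions listed above; the actual `vite.config.ts` may differ):

```typescript
// vite.config.ts (sketch)
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    proxy: {
      // Forward /api requests to the Express backend on port 5000.
      '/api': {
        target: 'http://localhost:5000',
        changeOrigin: true,
      },
    },
  },
});
```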
---
*Integration audit: 2026-02-24*

# Technology Stack
**Analysis Date:** 2026-02-24
## Languages
**Primary:**
- TypeScript 5.2.2 - Both backend and frontend; strict mode enabled (with `noImplicitAny` relaxed for legacy code)
- JavaScript (CommonJS) - Build outputs and configuration
**Supporting:**
- SQL - Supabase PostgreSQL database via migrations in `backend/src/models/migrations/`
## Runtime
**Environment:**
- Node.js 20 (specified in `backend/firebase.json`)
- Browser (ES2020 target for both client and server)
**Package Manager:**
- npm - Primary package manager for both backend and frontend
- Lockfile: `package-lock.json` present in both `backend/` and `frontend/`
## Frameworks
**Backend - Core:**
- Express.js 4.18.2 - HTTP server and REST API framework at `backend/src/index.ts`
- Firebase Admin SDK 13.4.0 - Authentication and service account management at `backend/src/config/firebase.ts`
- Firebase Functions 6.4.0 - Cloud Functions deployment runtime at port 5001
**Frontend - Core:**
- React 18.2.0 - UI framework with TypeScript support
- Vite 4.5.0 - Build tool and dev server (port 5173 for dev, port 3000 production)
**Backend - Testing:**
- Vitest 2.1.0 - Test runner with v8 coverage provider at `backend/vitest.config.ts`
- Configuration: Global test environment set to 'node', 30-second test timeout
**Backend - Build/Dev:**
- ts-node 10.9.2 - TypeScript execution for scripts
- ts-node-dev 2.0.0 - Live reload development server with `--transpile-only` flag
- TypeScript Compiler (tsc) 5.2.2 - Strict type checking, ES2020 target
**Frontend - Build/Dev:**
- Vite React plugin 4.1.1 - React JSX transformation
- TailwindCSS 3.3.5 - Utility-first CSS framework with PostCSS 8.4.31
## Key Dependencies
**Critical Infrastructure:**
- `@google-cloud/documentai` 9.3.0 - Google Document AI OCR/text extraction at `backend/src/services/documentAiProcessor.ts`
- `@google-cloud/storage` 7.16.0 - Google Cloud Storage (GCS) for file uploads and processing
- `@supabase/supabase-js` 2.53.0 - PostgreSQL database client with vector support at `backend/src/config/supabase.ts`
- `pg` 8.11.3 - Direct PostgreSQL connection pool for critical operations bypassing PostgREST
**LLM & AI:**
- `@anthropic-ai/sdk` 0.57.0 - Claude API integration with support for Anthropic provider
- `openai` 5.10.2 - OpenAI API and embeddings (text-embedding-3-small)
- Both providers abstracted via `backend/src/services/llmService.ts`
**PDF Processing:**
- `pdf-lib` 1.17.1 - PDF generation and manipulation at `backend/src/services/pdfGenerationService.ts`
- `pdf-parse` 1.1.1 - PDF text extraction
- `pdfkit` 0.17.1 - PDF document creation
**Document Processing:**
- `puppeteer` 21.11.0 - Headless Chrome for HTML/PDF conversion
**Security & Authentication:**
- `firebase` 12.0.0 (frontend) - Firebase client SDK for authentication at `frontend/src/config/firebase.ts`
- `firebase-admin` 13.4.0 (backend) - Admin SDK for token verification at `backend/src/middleware/firebaseAuth.ts`
- `jsonwebtoken` 9.0.2 - JWT token creation and verification
- `bcryptjs` 2.4.3 - Password hashing with 12 rounds default
**API & HTTP:**
- `axios` 1.11.0 - HTTP client for both frontend and backend
- `cors` 2.8.5 - Cross-Origin Resource Sharing middleware for Express
- `helmet` 7.1.0 - Security headers middleware
- `morgan` 1.10.0 - HTTP request logging middleware
- `express-rate-limit` 7.1.5 - Rate limiting middleware (1000 requests per 15 minutes)
**Data Validation & Schema:**
- `zod` 3.25.76 - TypeScript-first schema validation at `backend/src/services/llmSchemas.ts`
- `zod-to-json-schema` 3.24.6 - Convert Zod schemas to JSON Schema for LLM structured output
- `joi` 17.11.0 - Environment variable validation in `backend/src/config/env.ts`
**Logging & Monitoring:**
- `winston` 3.11.0 - Structured logging framework with multiple transports at `backend/src/utils/logger.ts`
**Frontend - UI Components:**
- `lucide-react` 0.294.0 - Icon library
- `react-dom` 18.2.0 - React rendering for web
- `react-router-dom` 6.20.1 - Client-side routing
- `react-dropzone` 14.3.8 - File upload handling
- `clsx` 2.0.0 - Conditional className utility
- `tailwind-merge` 2.0.0 - Merge Tailwind classes with conflict resolution
**Utilities:**
- `uuid` 11.1.0 - Unique identifier generation
- `dotenv` 16.3.1 - Environment variable loading from `.env` files
## Configuration
**Environment:**
- **.env file support** - Dotenv loads from `.env` for local development in `backend/src/config/env.ts`
- **Environment validation** - Joi schema at `backend/src/config/env.ts` validates all required/optional env vars
- **Firebase Functions v2** - Uses `defineString()` and `defineSecret()` for secure configuration (migration from v1 functions.config())
**Key Configuration Variables (Backend):**
- `NODE_ENV` - 'development' | 'production' | 'test'
- `LLM_PROVIDER` - 'openai' | 'anthropic' | 'openrouter' (default: 'openai')
- `GCLOUD_PROJECT_ID` - Google Cloud project ID (required)
- `DOCUMENT_AI_PROCESSOR_ID` - Document AI processor ID (required)
- `GCS_BUCKET_NAME` - Google Cloud Storage bucket (required)
- `SUPABASE_URL`, `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_KEY` - Supabase PostgreSQL connection
- `DATABASE_URL` - Direct PostgreSQL connection string for bypass operations
- `OPENAI_API_KEY` - OpenAI API key for embeddings and models
- `ANTHROPIC_API_KEY` - Anthropic Claude API key
- `OPENROUTER_API_KEY` - OpenRouter API key (optional, uses BYOK with Anthropic key)
**Key Configuration Variables (Frontend):**
- `VITE_API_BASE_URL` - Backend API endpoint
- `VITE_FIREBASE_*` - Firebase configuration (API key, auth domain, project ID, etc.)
**Build Configuration:**
- **Backend**: `backend/tsconfig.json` - Strict TypeScript, CommonJS module output, ES2020 target
- **Frontend**: `frontend/tsconfig.json` - ES2020 target, JSX React support, path alias `@/*`
- **Firebase**: `backend/firebase.json` - Node.js 20 runtime, Firebase Functions emulator on port 5001
## Platform Requirements
**Development:**
- Node.js 20.x
- npm 9+
- Google Cloud credentials (for Document AI and GCS)
- Firebase project credentials (service account key)
- Supabase project URL and keys
**Production:**
- **Backend**: Firebase Cloud Functions (Node.js 20 runtime) or Google Cloud Run
- **Frontend**: Firebase Hosting (CDN-backed static hosting)
- **Database**: Supabase PostgreSQL with pgvector extension for vector search
- **Storage**: Google Cloud Storage for documents and generated PDFs
- **Memory Limits**: Backend configured with `--max-old-space-size=8192` for large document processing
---
*Stack analysis: 2026-02-24*

# Codebase Structure
**Analysis Date:** 2026-02-24
## Directory Layout
```
cim_summary/
├── backend/ # Express.js + TypeScript backend (Node.js)
│ ├── src/
│ │ ├── index.ts # Express app + Firebase Functions exports
│ │ ├── controllers/ # Request handlers
│ │ ├── models/ # Database access + schema
│ │ ├── services/ # Business logic + external integrations
│ │ ├── routes/ # Express route definitions
│ │ ├── middleware/ # Express middleware (auth, validation, error)
│ │ ├── config/ # Configuration (env, firebase, supabase)
│ │ ├── utils/ # Utilities (logger, validation, parsing)
│ │ ├── types/ # TypeScript type definitions
│ │ ├── scripts/ # One-off CLI scripts (diagnostics, setup)
│ │ ├── assets/ # Static assets (HTML templates)
│ │ └── __tests__/ # Test suites (unit, integration, acceptance)
│ ├── package.json # Node dependencies
│ ├── tsconfig.json # TypeScript config
│ ├── .eslintrc.json # ESLint config
│ └── dist/ # Compiled JavaScript (generated)
├── frontend/ # React + Vite + TypeScript frontend
│ ├── src/
│ │ ├── main.tsx # React entry point
│ │ ├── App.tsx # Root component with routing
│ │ ├── components/ # React components (UI)
│ │ ├── services/ # API clients (documentService, authService)
│ │ ├── contexts/ # React Context (AuthContext)
│ │ ├── config/ # Configuration (env, firebase)
│ │ ├── types/ # TypeScript interfaces
│ │ ├── utils/ # Utilities (validation, cn, auth debug)
│ │ └── assets/ # Static images and icons
│ ├── package.json # Node dependencies
│ ├── tsconfig.json # TypeScript config
│ ├── vite.config.ts # Vite bundler config
│ ├── eslintrc.json # ESLint config
│ ├── tailwind.config.js # Tailwind CSS config
│ ├── postcss.config.js # PostCSS config
│ └── dist/ # Built static assets (generated)
├── .planning/ # GSD planning directory
│ └── codebase/ # Codebase analysis documents
├── package.json # Monorepo root package (if used)
├── .git/ # Git repository
├── .gitignore # Git ignore rules
├── .cursorrules # Cursor IDE configuration
├── README.md # Project overview
├── CONFIGURATION_GUIDE.md # Setup instructions
├── CODEBASE_ARCHITECTURE_SUMMARY.md # Existing architecture notes
└── [PDF documents] # Sample CIM documents for testing
```
## Directory Purposes
**backend/src/:**
- Purpose: All backend server code
- Contains: TypeScript source files
- Key files: `index.ts` (main app), routes, controllers, services, models
**backend/src/controllers/:**
- Purpose: HTTP request handlers
- Contains: `documentController.ts`, `authController.ts`
- Functions: Map HTTP requests to service calls, handle validation, construct responses
**backend/src/services/:**
- Purpose: Business logic and external integrations
- Contains: Document processing, LLM integration, file storage, database, job queue
- Key files:
- `unifiedDocumentProcessor.ts` - Orchestrator, strategy selection
- `singlePassProcessor.ts` - 2-LLM extraction (current default)
- `optimizedAgenticRAGProcessor.ts` - Advanced agentic processing (stub)
- `documentAiProcessor.ts` - Google Document AI OCR
- `llmService.ts` - LLM API calls (Anthropic/OpenAI/OpenRouter)
- `jobQueueService.ts` - Async job queue (in-memory, EventEmitter)
- `jobProcessorService.ts` - Dequeue and execute jobs
- `fileStorageService.ts` - GCS signed URLs and upload
- `vectorDatabaseService.ts` - Supabase pgvector operations
- `pdfGenerationService.ts` - Puppeteer PDF rendering
- `uploadProgressService.ts` - Track upload status
- `uploadMonitoringService.ts` - Monitor processing progress
- `llmSchemas.ts` - Zod schemas for LLM extraction (CIMReview, financial data)
**backend/src/models/:**
- Purpose: Database access layer and schema definitions
- Contains: Document, User, ProcessingJob, Feedback models
- Key files:
- `types.ts` - TypeScript interfaces (Document, ProcessingJob, ProcessingStatus)
- `DocumentModel.ts` - Document CRUD with retry logic
- `ProcessingJobModel.ts` - Job tracking in database
- `UserModel.ts` - User management
- `VectorDatabaseModel.ts` - Vector embedding queries
- `migrate.ts` - Database migrations
- `seed.ts` - Test data seeding
- `migrations/` - SQL migration files
**backend/src/routes/:**
- Purpose: Express route definitions
- Contains: Route handlers and middleware bindings
- Key files:
- `documents.ts` - GET/POST/PUT/DELETE document endpoints
- `vector.ts` - Vector search endpoints
- `monitoring.ts` - Health and status endpoints
- `documentAudit.ts` - Audit log endpoints
**backend/src/middleware/:**
- Purpose: Express middleware for cross-cutting concerns
- Contains: Authentication, validation, error handling
- Key files:
- `firebaseAuth.ts` - Firebase ID token verification
- `errorHandler.ts` - Global error handling + correlation ID
- `notFoundHandler.ts` - 404 handler
- `validation.ts` - Request validation (UUID, pagination)
**backend/src/config/:**
- Purpose: Configuration and initialization
- Contains: Environment setup, service initialization
- Key files:
- `env.ts` - Environment variable validation (Joi schema)
- `firebase.ts` - Firebase Admin SDK initialization
- `supabase.ts` - Supabase client and pool setup
- `database.ts` - PostgreSQL connection (legacy)
- `errorConfig.ts` - Error handling config
**backend/src/utils/:**
- Purpose: Shared utility functions
- Contains: Logging, validation, parsing
- Key files:
- `logger.ts` - Winston logger setup (console + file transports)
- `validation.ts` - UUID and pagination validators
- `googleServiceAccount.ts` - Google Cloud credentials resolution
- `financialExtractor.ts` - Financial data parsing (superseded by the single-pass processor)
- `templateParser.ts` - CIM template utilities
- `auth.ts` - Authentication helpers
**backend/src/scripts/:**
- Purpose: One-off CLI scripts for diagnostics and setup
- Contains: Database setup, testing, monitoring
- Key files:
- `setup-database.ts` - Initialize database schema
- `monitor-document-processing.ts` - Watch job queue status
- `check-current-job.ts` - Debug stuck jobs
- `test-full-llm-pipeline.ts` - End-to-end testing
- `comprehensive-diagnostic.ts` - System health check
**backend/src/__tests__/:**
- Purpose: Test suites
- Contains: Unit, integration, acceptance tests
- Subdirectories:
- `unit/` - Isolated component tests
- `integration/` - Multi-component tests
- `acceptance/` - End-to-end flow tests
- `mocks/` - Mock data and fixtures
- `utils/` - Test utilities
**frontend/src/:**
- Purpose: All frontend code
- Contains: React components, services, types
**frontend/src/components/:**
- Purpose: React UI components
- Contains: Page components, reusable widgets
- Key files:
- `DocumentUpload.tsx` - File upload UI with drag-and-drop
- `DocumentList.tsx` - List of processed documents
- `DocumentViewer.tsx` - View and edit extracted data
- `ProcessingProgress.tsx` - Real-time processing status
- `UploadMonitoringDashboard.tsx` - Admin view of active jobs
- `LoginForm.tsx` - Firebase auth login UI
- `ProtectedRoute.tsx` - Route guard for authenticated pages
- `Analytics.tsx` - Document analytics and statistics
- `CIMReviewTemplate.tsx` - Display extracted CIM review data
**frontend/src/services/:**
- Purpose: API clients and external service integration
- Contains: HTTP clients for backend
- Key files:
- `documentService.ts` - Document API calls (upload, list, process, status)
- `authService.ts` - Firebase authentication (login, logout, token)
- `adminService.ts` - Admin-only operations
**frontend/src/contexts/:**
- Purpose: React Context for global state
- Contains: AuthContext for user and authentication state
- Key files:
- `AuthContext.tsx` - User, token, login/logout state
**frontend/src/config/:**
- Purpose: Configuration
- Contains: Environment variables, Firebase setup
- Key files:
- `env.ts` - VITE_API_BASE_URL and other env vars
- `firebase.ts` - Firebase client initialization
**frontend/src/types/:**
- Purpose: TypeScript interfaces
- Contains: API response types, component props
- Key files:
- `auth.ts` - User, LoginCredentials, AuthContextType
**frontend/src/utils/:**
- Purpose: Shared utility functions
- Contains: Validation, CSS utilities
- Key files:
- `validation.ts` - Email, password validators
- `cn.ts` - Classname merger (clsx wrapper)
- `authDebug.ts` - Authentication debugging helpers
## Key File Locations
**Entry Points:**
- `backend/src/index.ts` - Main Express app and Firebase Functions exports
- `frontend/src/main.tsx` - React entry point
- `frontend/src/App.tsx` - Root component with routing
**Configuration:**
- `backend/src/config/env.ts` - Environment variable schema and validation
- `backend/src/config/firebase.ts` - Firebase Admin SDK setup
- `backend/src/config/supabase.ts` - Supabase client and connection pool
- `frontend/src/config/firebase.ts` - Firebase client configuration
- `frontend/src/config/env.ts` - Frontend environment variables
**Core Logic:**
- `backend/src/services/unifiedDocumentProcessor.ts` - Main document processing orchestrator
- `backend/src/services/singlePassProcessor.ts` - Single-pass 2-LLM strategy
- `backend/src/services/llmService.ts` - LLM API integration with retry
- `backend/src/services/jobQueueService.ts` - Background job queue
- `backend/src/services/vectorDatabaseService.ts` - Vector search implementation
**Testing:**
- `backend/src/__tests__/unit/` - Unit tests
- `backend/src/__tests__/integration/` - Integration tests
- `backend/src/__tests__/acceptance/` - End-to-end tests
**Database:**
- `backend/src/models/types.ts` - TypeScript type definitions
- `backend/src/models/DocumentModel.ts` - Document CRUD operations
- `backend/src/models/ProcessingJobModel.ts` - Job tracking
- `backend/src/models/migrations/` - SQL migration files
**Middleware:**
- `backend/src/middleware/firebaseAuth.ts` - JWT authentication
- `backend/src/middleware/errorHandler.ts` - Global error handling
- `backend/src/middleware/validation.ts` - Input validation
**Logging:**
- `backend/src/utils/logger.ts` - Winston logger configuration
## Naming Conventions
**Files:**
- Controllers: `{resource}Controller.ts` (e.g., `documentController.ts`)
- Services: `{service}Service.ts` or descriptive (e.g., `llmService.ts`, `singlePassProcessor.ts`)
- Models: `{Entity}Model.ts` (e.g., `DocumentModel.ts`)
- Routes: `{resource}.ts` (e.g., `documents.ts`)
- Middleware: `{purpose}Handler.ts` or `{purpose}.ts` (e.g., `firebaseAuth.ts`)
- Types/Interfaces: `types.ts` or `{name}Types.ts`
- Tests: `{file}.test.ts` or `{file}.spec.ts`
**Directories:**
- Plurals for collections: `services/`, `models/`, `utils/`, `routes/`, `controllers/`
- Singular for specific features: `config/`, `middleware/`, `types/`, `contexts/`
- Nested by feature in larger directories: `__tests__/unit/`, `models/migrations/`
**Functions/Variables:**
- Camel case: `processDocument()`, `getUserId()`, `documentId`
- Constants: UPPER_SNAKE_CASE: `MAX_RETRIES`, `TIMEOUT_MS`
- Private methods: Prefix with `_` or use TypeScript `private`: `_retryOperation()`
**Classes:**
- Pascal case: `DocumentModel`, `JobQueueService`, `SinglePassProcessor`
- Service instances exported as singletons: `export const llmService = new LLMService()`
**React Components:**
- Pascal case: `DocumentUpload.tsx`, `ProtectedRoute.tsx`
- Hooks: `use{Feature}` (e.g., `useAuth` from AuthContext)
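The conventions above can be combined in one hypothetical service (all names are illustrative, not files from this repo): a PascalCase class, camelCase methods, an UPPER_SNAKE_CASE constant, TypeScript `private` for internal methods, and a singleton instance export in the `export const llmService = new LLMService()` style.

```typescript
// Illustrative only: demonstrates the naming conventions, not a real service.
const MAX_RETRIES = 3;

export class ExampleRetryService {
  // TypeScript `private` rather than an underscore prefix.
  private async attemptOnce(fn: () => Promise<string>): Promise<string> {
    return fn();
  }

  async fetchWithRetry(fn: () => Promise<string>): Promise<string> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
      try {
        return await this.attemptOnce(fn);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}

// Singleton export, matching the service-instance convention above.
export const exampleRetryService = new ExampleRetryService();
```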
## Where to Add New Code
**New Document Processing Strategy:**
- Primary code: `backend/src/services/{strategyName}Processor.ts`
- Schema: Add types to `backend/src/services/llmSchemas.ts`
- Integration: Register in `backend/src/services/unifiedDocumentProcessor.ts`
- Tests: `backend/src/__tests__/integration/{strategyName}.test.ts`
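A new strategy plugs into the orchestrator roughly as follows. This is a dependency-free sketch; the interface and registry names are assumptions for illustration, not the actual `unifiedDocumentProcessor.ts` API:

```typescript
// Hypothetical strategy registration; names are illustrative.
interface DocumentProcessorStrategy {
  name: string;
  process(text: string): Promise<{ summary: string }>;
}

const strategies = new Map<string, DocumentProcessorStrategy>();

function registerStrategy(strategy: DocumentProcessorStrategy): void {
  strategies.set(strategy.name, strategy);
}

async function processWith(name: string, text: string): Promise<{ summary: string }> {
  const strategy = strategies.get(name);
  if (!strategy) throw new Error(`Unknown strategy: ${name}`);
  return strategy.process(text);
}

// A stand-in for a processor module registering itself:
registerStrategy({
  name: 'singlePass',
  async process(text) {
    return { summary: text.slice(0, 80) };
  },
});
```

Registering strategies by name keeps the orchestrator closed to modification: adding a processor means one new file plus one `registerStrategy` call.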
**New API Endpoint:**
- Route: `backend/src/routes/{resource}.ts`
- Controller: `backend/src/controllers/{resource}Controller.ts`
- Service: `backend/src/services/{resource}Service.ts` (if needed)
- Model: `backend/src/models/{Resource}Model.ts` (if database access)
- Tests: `backend/src/__tests__/integration/{endpoint}.test.ts`
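The route → controller → service layering can be sketched with Express's request/response types stubbed down to minimal shapes (`widgetService` and `getWidgetController` are hypothetical names, not part of this codebase):

```typescript
// Minimal stand-ins for express.Request / express.Response.
type Req = { params: Record<string, string | undefined> };
type Res = { status: (code: number) => Res; json: (body: unknown) => void };

// Service layer: business logic only, no HTTP concerns.
const widgetService = {
  async getWidget(id: string): Promise<{ id: string; name: string }> {
    return { id, name: 'example' };
  },
};

// Controller: validate input, call the service, shape the response.
export async function getWidgetController(req: Req, res: Res): Promise<void> {
  const id = req.params.id;
  if (!id) {
    res.status(400).json({ error: 'id is required' });
    return;
  }
  const widget = await widgetService.getWidget(id);
  res.status(200).json({ data: widget });
}
```

The route file would then only bind the path, middleware, and controller, keeping validation and response shaping out of `routes/`.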
**New React Component:**
- Component: `frontend/src/components/{ComponentName}.tsx`
- Types: Add to `frontend/src/types/` or inline in component
- Services: Use existing `frontend/src/services/documentService.ts`
- Tests: `frontend/src/__tests__/{ComponentName}.test.tsx` (if added)
**Shared Utilities:**
- Backend: `backend/src/utils/{utility}.ts`
- Frontend: `frontend/src/utils/{utility}.ts`
- Avoid duplicating logic between backend and frontend; extract shared patterns where practical
**Database Schema Changes:**
- Migration file: `backend/src/models/migrations/{timestamp}_{description}.sql`
- TypeScript interface: Update `backend/src/models/types.ts`
- Model methods: Update corresponding `*Model.ts` file
- Run: `npm run db:migrate` in backend
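The interface-plus-model half of a schema change can be sketched like this (the `reviewed_at` column and field names are made up for illustration, not real schema):

```typescript
// Hypothetical: after a migration adds a reviewed_at column, mirror it
// in the TypeScript interface and the model's row mapper.
interface DocumentRow {
  id: string;
  title: string;
  reviewedAt: string | null; // new field backing the reviewed_at column
}

// Map a raw snake_case DB row to the camelCase interface.
function mapRow(row: Record<string, unknown>): DocumentRow {
  return {
    id: String(row.id),
    title: String(row.title),
    reviewedAt: row.reviewed_at == null ? null : String(row.reviewed_at),
  };
}
```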
**Configuration Changes:**
- Environment: Update `backend/src/config/env.ts` (Joi schema)
- Frontend env: Update `frontend/src/config/env.ts`
- Firebase secrets: Use `firebase functions:secrets:set VAR_NAME`
- Local dev: Add to `.env` file (gitignored)
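The fail-fast idea behind `env.ts` reduces to a check like this dependency-free sketch (the real file uses a Joi schema; variable names here are illustrative):

```typescript
// Illustrative fail-fast env validation; the actual env.ts uses Joi.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Validate everything up front so misconfiguration fails at boot, not mid-request.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    supabaseUrl: requireEnv('SUPABASE_URL', env),
    firebaseProjectId: requireEnv('FIREBASE_PROJECT_ID', env),
  };
}
```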
## Special Directories
**backend/src/__tests__/mocks/:**
- Purpose: Mock data and fixtures for testing
- Generated: No (manually maintained)
- Committed: Yes
- Usage: Import in tests for consistent test data
**backend/src/scripts/:**
- Purpose: One-off CLI utilities for development and operations
- Generated: No (manually maintained)
- Committed: Yes
- Execution: `ts-node src/scripts/{script}.ts` or `npm run {script}`
**backend/src/assets/:**
- Purpose: Static HTML templates for PDF generation
- Generated: No (manually maintained)
- Committed: Yes
- Usage: Rendered by Puppeteer in `pdfGenerationService.ts`
**backend/src/models/migrations/:**
- Purpose: Database schema migration SQL files
- Generated: No (manually created)
- Committed: Yes
- Execution: Run via `npm run db:migrate`
**frontend/src/assets/:**
- Purpose: Images, icons, logos
- Generated: No (manually added)
- Committed: Yes
- Usage: Import in components (e.g., `bluepoint-logo.png`)
**backend/dist/ and frontend/dist/:**
- Purpose: Compiled JavaScript and optimized bundles
- Generated: Yes (build output)
- Committed: No (gitignored)
- Regeneration: `npm run build` in respective directory
**backend/node_modules/ and frontend/node_modules/:**
- Purpose: Installed dependencies
- Generated: Yes (npm install)
- Committed: No (gitignored)
- Regeneration: `npm install`
**backend/logs/:**
- Purpose: Runtime log files
- Generated: Yes (runtime)
- Committed: No (gitignored)
- Contents: `error.log`, `upload.log`, combined logs
---
*Structure analysis: 2026-02-24*
# Testing Patterns
**Analysis Date:** 2026-02-24
## Test Framework
**Runner:**
- Vitest 2.1.0
- Config: No dedicated `vitest.config.ts` found (uses defaults)
- Node.js test environment
**Assertion Library:**
- Vitest native assertions via `expect()`
- Examples: `expect(value).toBe()`, `expect(value).toBeDefined()`, `expect(array).toContain()`
**Run Commands:**
```bash
npm test # Run all tests once
npm run test:watch # Watch mode for continuous testing
npm run test:coverage # Generate coverage report
```
**Coverage Tool:**
- `@vitest/coverage-v8` 2.1.0
- Tracks line, branch, function, and statement coverage
- V8 backend for accurate coverage metrics
## Test File Organization
**Location:**
- Centralized in the `backend/src/__tests__/` directory (not co-located with source files)
- Subdirectories for logical grouping:
- `backend/src/__tests__/utils/` - Utility function tests
- `backend/src/__tests__/mocks/` - Mock implementations
- `backend/src/__tests__/acceptance/` - Acceptance/integration tests
**Naming:**
- Pattern: `[feature].test.ts` or `[feature].spec.ts`
- Examples:
- `backend/src/__tests__/financial-summary.test.ts`
- `backend/src/__tests__/acceptance/handiFoods.acceptance.test.ts`
**Structure:**
```
backend/src/__tests__/
├── utils/
│ └── test-helpers.ts # Test utility functions
├── mocks/
│ └── logger.mock.ts # Mock implementations
└── acceptance/
└── handiFoods.acceptance.test.ts # Acceptance tests
```
## Test Structure
**Suite Organization:**
```typescript
import { describe, test, expect, beforeAll } from 'vitest';
describe('Feature Category', () => {
describe('Nested Behavior Group', () => {
test('should do specific thing', () => {
expect(result).toBe(expected);
});
test('should handle edge case', () => {
expect(edge).toBeDefined();
});
});
});
```
From `financial-summary.test.ts`:
```typescript
describe('Financial Summary Fixes', () => {
describe('Period Ordering', () => {
test('Summary table should display periods in chronological order (FY3 → FY2 → FY1 → LTM)', () => {
const periods = ['fy3', 'fy2', 'fy1', 'ltm'];
const expectedOrder = ['FY3', 'FY2', 'FY1', 'LTM'];
expect(periods[0]).toBe('fy3');
expect(periods[3]).toBe('ltm');
});
});
});
```
**Patterns:**
1. **Setup Pattern:**
- Use `beforeAll()` for shared test data initialization
- Example from `handiFoods.acceptance.test.ts`:
```typescript
beforeAll(() => {
const normalize = (text: string) => text.replace(/\s+/g, ' ').toLowerCase();
const cimRaw = fs.readFileSync(cimTextPath, 'utf-8');
const outputRaw = fs.readFileSync(outputTextPath, 'utf-8');
cimNormalized = normalize(cimRaw);
outputNormalized = normalize(outputRaw);
});
```
2. **Teardown Pattern:**
- Not explicitly shown in current tests
- Use `afterAll()` for resource cleanup if needed
3. **Assertion Pattern:**
- Descriptive test names that read as sentences: `'should display periods in chronological order'`
- Multiple assertions per test acceptable for related checks
- Use `expect().toContain()` for array/string membership
- Use `expect().toBeDefined()` for existence checks
- Use `expect().toBeGreaterThan()` for numeric comparisons
## Mocking
**Framework:** Vitest `vi` mock utilities
**Patterns:**
1. **Mock Logger:**
```typescript
import { vi } from 'vitest';
export const mockLogger = {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
};
export const mockStructuredLogger = {
uploadStart: vi.fn(),
uploadSuccess: vi.fn(),
uploadError: vi.fn(),
processingStart: vi.fn(),
processingSuccess: vi.fn(),
processingError: vi.fn(),
storageOperation: vi.fn(),
jobQueueOperation: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
debug: vi.fn(),
};
```
2. **Mock Service Pattern:**
- Create mock implementations in `backend/src/__tests__/mocks/`
- Export as named exports: `export const mockLogger`, `export const mockStructuredLogger`
- Use `vi.fn()` for all callable methods to track calls and arguments
3. **What to Mock:**
- External services: Firebase Auth, Supabase, Google Cloud APIs
- Logger: always mock to prevent log spam during tests
- File system operations (in unit tests; use real files in acceptance tests)
- LLM API calls: mock responses to avoid quota usage
4. **What NOT to Mock:**
- Core utility functions: use real implementations
- Type definitions: no need to mock types
- Pure functions: test directly without mocks
- Business logic calculations: test with real data
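What `vi.fn()` provides — recorded calls you can assert on — can be shown with a minimal hand-rolled spy. This is purely illustrative of the mechanism; in real tests, use `vi.fn()` directly:

```typescript
// Minimal spy that records its call arguments, like vi.fn() does.
type Spy<T extends unknown[]> = ((...args: T) => void) & { calls: T[] };

function makeSpy<T extends unknown[]>(): Spy<T> {
  const calls: T[] = [];
  return Object.assign((...args: T) => { calls.push(args); }, { calls });
}

// A unit under test that takes its logger as an injected dependency:
function processDocumentStub(id: string, log: (msg: string) => void): void {
  log(`processing ${id}`);
}

const logSpy = makeSpy<[string]>();
processDocumentStub('doc-1', logSpy);
// logSpy.calls[0][0] === 'processing doc-1'
```

Injecting the logger (rather than importing it) is what makes this kind of spying possible without module-level mocking.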
## Fixtures and Factories
**Test Data:**
1. **Helper Factory Pattern:**
From `backend/src/__tests__/utils/test-helpers.ts`:
```typescript
export function createMockCorrelationId(): string {
return `test-correlation-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
}
export function createMockUserId(): string {
return `test-user-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
}
export function createMockDocumentId(): string {
return `test-doc-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
}
export function createMockJobId(): string {
return `test-job-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
}
export function wait(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
```
2. **Acceptance Test Fixtures:**
- Located in `backend/test-fixtures/` directory
- Example: `backend/test-fixtures/handiFoods/` contains:
- `handi-foods-cim.txt` - Reference CIM content
- `handi-foods-output.txt` - Expected processor output
- Loaded via `fs.readFileSync()` in `beforeAll()` hooks
**Location:**
- Test helpers: `backend/src/__tests__/utils/test-helpers.ts`
- Acceptance fixtures: `backend/test-fixtures/` (outside src)
- Mocks: `backend/src/__tests__/mocks/`
## Coverage
**Requirements:**
- No automated coverage enforcement detected (no threshold in config)
- Manual review recommended for critical paths
**View Coverage:**
```bash
npm run test:coverage
```
## Test Types
**Unit Tests:**
- **Scope:** Individual functions, services, utilities
- **Approach:** Test in isolation with mocks for dependencies
- **Examples:**
- Financial parser tests: parse tables with various formats
- Period ordering tests: verify chronological order logic
- Validate UUID format tests: regex pattern matching
- **Location:** `backend/src/__tests__/[feature].test.ts`
**Integration Tests:**
- **Scope:** Multiple components working together
- **Approach:** May use real Supabase/Firebase or mocks depending on test level
- **Current state:** Not heavily used; only minimal integration test infrastructure exists
- **Pattern:** Could use real database in test environment with cleanup
**Acceptance Tests:**
- **Scope:** End-to-end feature validation with real artifacts
- **Approach:** Load reference files, process through entire pipeline, verify output
- **Example:** `handiFoods.acceptance.test.ts`
- Loads CIM text file
- Loads processor output file
- Validates all reference facts exist in both
- Validates key fields resolved instead of fallback messages
- **Location:** `backend/src/__tests__/acceptance/`
**E2E Tests:**
- Not implemented in current setup
- Would require browser automation (no Playwright/Cypress config found)
- Frontend testing: not currently automated
## Common Patterns
**Async Testing:**
```typescript
test('should process document asynchronously', async () => {
const result = await processDocument(documentId, userId, text);
expect(result.success).toBe(true);
});
```
**Error Testing:**
```typescript
test('should validate UUID format', () => {
const id = 'invalid-id';
const uuidRegex = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
expect(uuidRegex.test(id)).toBe(false);
});
```
**Array/Collection Testing:**
```typescript
test('should extract all financial periods', () => {
const result = parseFinancialsFromText(tableText);
expect(result.data.fy3.revenue).toBeDefined();
expect(result.data.fy2.revenue).toBeDefined();
expect(result.data.fy1.revenue).toBeDefined();
expect(result.data.ltm.revenue).toBeDefined();
});
```
**Text/Content Testing (Acceptance):**
```typescript
test('verifies each reference fact exists in CIM and generated output', () => {
for (const fact of referenceFacts) {
for (const token of fact.tokens) {
expect(cimNormalized).toContain(token);
expect(outputNormalized).toContain(token);
}
}
});
```
**Normalization for Content Testing:**
```typescript
// Normalize whitespace and case for robust text matching
const normalize = (text: string) => text.replace(/\s+/g, ' ').toLowerCase();
const normalizedCIM = normalize(cimRaw);
expect(normalizedCIM).toContain('reference-phrase');
```
## Test Coverage Priorities
**Critical Paths (Test First):**
1. Document upload and file storage operations
2. Firebase authentication and token validation
3. LLM service API interactions with retry logic
4. Error handling and correlation ID tracking
5. Financial data extraction and parsing
6. PDF generation pipeline
**Important Paths (Test Early):**
1. Vector embeddings and database operations
2. Job queue processing and timeout handling
3. Google Document AI text extraction
4. Supabase Row Level Security policies
**Nice-to-Have (Test Later):**
1. UI component rendering (would require React Testing Library)
2. CSS/styling validation
3. Frontend form submission flows
4. Analytics tracking
## Current Testing Gaps
**Untested Areas:**
- Backend services: Most services lack unit tests (llmService, fileStorageService, etc.)
- Database models: No model tests for Supabase operations
- Controllers/Endpoints: No API endpoint tests
- Frontend components: No React component tests
- Integration flows: Document upload through processing to PDF generation
**Missing Patterns:**
- No database integration test setup (fixtures, transactions)
- No API request/response validation tests
- No performance/load tests
- No security tests (auth bypass, XSS, injection)
## Deprecated Test Patterns (DO NOT USE)
- ❌ Jest test suite - Use Vitest instead
- ❌ Direct PostgreSQL connection tests - Use Supabase in test mode
- ❌ Legacy test files referencing removed services - Test only the current implementations
---
*Testing analysis: 2026-02-24*