Major release with significant performance improvements and new processing strategy.

## Core Changes
- Implemented simple_full_document processing strategy (default)
- Full document → LLM approach: 1-2 passes, ~5-6 minutes processing time
- Achieved 100% completeness with 2 API calls (down from 5+)
- Removed redundant Document AI passes for faster processing

## Financial Data Extraction
- Enhanced deterministic financial table parser
- Improved FY3/FY2/FY1/LTM identification from varying CIM formats
- Automatic merging of parser results with LLM extraction

## Code Quality & Infrastructure
- Cleaned up debug logging (removed emoji markers from production code)
- Fixed Firebase Secrets configuration (using modern defineSecret approach)
- Updated OpenAI API key
- Resolved deployment conflicts (secrets vs environment variables)
- Added .env files to Firebase ignore list

## Deployment
- Firebase Functions v2 deployment successful
- All 7 required secrets verified and configured
- Function URL: https://api-y56ccs6wva-uc.a.run.app

## Performance Improvements
- Processing time: ~5-6 minutes (down from 23+ minutes)
- API calls: 1-2 (down from 5+)
- Completeness: 100% achievable
- LLM Model: claude-3-7-sonnet-latest

## Breaking Changes
- Default processing strategy changed to 'simple_full_document'
- RAG processor available as alternative strategy 'document_ai_agentic_rag'

## Files Changed
- 36 files changed, 5642 insertions(+), 4451 deletions(-)
- Removed deprecated documentation files
- Cleaned up unused services and models

This release represents a major refactoring focused on speed, accuracy, and maintainability.
CIM Summary: Codebase Architecture
Last Updated: December 2024
Purpose: Comprehensive technical reference for senior developers optimizing and debugging the codebase
Table of Contents
- System Overview
- Application Entry Points
- Request Flow & API Architecture
- Document Processing Pipeline (Critical Path)
- Core Services Deep Dive
- Data Models & Database Schema
- Component Handoffs & Integration Points
- Error Handling & Resilience
- Performance Optimization Points
- Background Processing Architecture
- Frontend Architecture
- Configuration & Environment
- Debugging Guide
- Optimization Opportunities
1. System Overview
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ DocumentUpload│ │ DocumentList │ │ Analytics │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ documentService │ │
│ │ (Axios Client) │ │
│ └────────┬────────┘ │
└────────────────────────────┼────────────────────────────────────┘
│ HTTPS + JWT
┌────────────────────────────▼────────────────────────────────────┐
│ Backend (Express + Node.js) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Middleware Chain: CORS → Auth → Validation → Error Handler │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌────────▼────────┐ ┌─────▼──────┐ │
│ │ Routes │ │ Controllers │ │ Services │ │
│ └──────┬──────┘ └────────┬────────┘ └─────┬──────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
┌────▼────┐ ┌──────▼──────┐ ┌───────▼───────┐
│Supabase │ │Google Cloud│ │ LLM APIs │
│(Postgres)│ │ Storage │ │(Claude/OpenAI)│
└─────────┘ └────────────┘ └───────────────┘
Technology Stack
Frontend:
- React 18 + TypeScript
- Vite (build tool)
- Axios (HTTP client)
- Firebase Auth (authentication)
- React Router (routing)
Backend:
- Node.js + Express + TypeScript
- Firebase Functions v2 (deployment)
- Supabase (PostgreSQL + Vector DB)
- Google Cloud Storage (file storage)
- Google Document AI (PDF text extraction)
- Puppeteer (PDF generation)
AI/ML Services:
- Anthropic Claude (primary LLM)
- OpenAI (fallback LLM)
- OpenRouter (LLM routing)
- OpenAI Embeddings (vector embeddings)
Core Purpose
Automated processing and analysis of Confidential Information Memorandums (CIMs) using:
- Text Extraction: Google Document AI extracts text from PDFs
- Semantic Chunking: Split text into 4000-char chunks with overlap
- Vector Embeddings: Generate embeddings for semantic search
- LLM Analysis: Claude AI analyzes chunks and generates structured CIMReview data
- PDF Generation: Create summary PDF with analysis results
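The five stages above can be sketched as a pipeline of injected stage functions. This is an illustrative composition only: the stage names and signatures are assumptions, not the repo's actual service interfaces.

```typescript
// Minimal sketch of the five-stage pipeline; the Stage signatures are
// illustrative assumptions, not the real service APIs.
type Stage<I, O> = (input: I) => Promise<O>;

interface PipelineStages {
  extractText: Stage<Buffer, string>;                     // 1. Document AI
  chunk: Stage<string, string[]>;                         // 2. semantic chunking
  embed: Stage<string[], number[][]>;                     // 3. vector embeddings
  analyze: Stage<string[], Record<string, unknown>>;      // 4. LLM → CIMReview data
  renderPdf: Stage<Record<string, unknown>, Buffer>;      // 5. summary PDF
}

async function runPipeline(file: Buffer, s: PipelineStages): Promise<Buffer> {
  const text = await s.extractText(file);
  const chunks = await s.chunk(text);
  await s.embed(chunks); // stored for later vector search
  const review = await s.analyze(chunks);
  return s.renderPdf(review);
}
```

Each stage maps onto one of the services described in section 5; the injection style also makes the orchestration testable with stubs.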
2. Application Entry Points
Backend Entry Point
File: backend/src/index.ts
// Initialize Firebase Admin SDK first
import './config/firebase';
import express from 'express';
import cors from 'cors';
import helmet from 'helmet';
import morgan from 'morgan';
import rateLimit from 'express-rate-limit';
import { config } from './config/env';
import { logger } from './utils/logger';
import documentRoutes from './routes/documents';
import vectorRoutes from './routes/vector';
import monitoringRoutes from './routes/monitoring';
import auditRoutes from './routes/documentAudit';
import { jobQueueService } from './services/jobQueueService';
import { errorHandler, correlationIdMiddleware } from './middleware/errorHandler';
import { notFoundHandler } from './middleware/notFoundHandler';
// Start the job queue service for background processing
jobQueueService.start();
Key Initialization Steps:
- Firebase Admin SDK initialization (./config/firebase)
- Express app setup with middleware chain
- Route registration (/documents, /vector, /monitoring, /api/audit)
- Job queue service startup (legacy in-memory queue)
- Firebase Functions export for Cloud deployment
Scheduled Function: processDocumentJobs (backend/src/index.ts:210-267)
- Runs every minute via Firebase Cloud Scheduler
- Processes pending/retrying jobs from database
- Detects and resets stuck jobs
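Stuck-job detection reduces to comparing a job's started_at against a timeout. A hedged sketch of that check (the real logic lives in ProcessingJobModel.resetStuckJobs and is not shown in this document, so the cutoff rule here is an assumption):

```typescript
// Sketch of the stuck-job check; the field names mirror the processing_jobs
// table described in section 6, but the exact cutoff rule is an assumption.
interface JobRow {
  status: string;
  started_at: string | null; // ISO timestamp
}

function isStuck(job: JobRow, timeoutMinutes: number, now: Date = new Date()): boolean {
  if (job.status !== 'processing' || !job.started_at) return false;
  const elapsedMs = now.getTime() - new Date(job.started_at).getTime();
  return elapsedMs > timeoutMinutes * 60_000;
}
```

With JOB_TIMEOUT_MINUTES = 15, any job still marked 'processing' more than 15 minutes after it started would be reset for retry.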
Frontend Entry Point
File: frontend/src/main.tsx
import React from 'react';
import ReactDOM from 'react-dom/client';
import App from './App';
import './index.css';
ReactDOM.createRoot(document.getElementById('root')!).render(
<React.StrictMode>
<App />
</React.StrictMode>
);
Main App Component: frontend/src/App.tsx
- Sets up React Router
- Provides AuthContext
- Renders protected routes and dashboard
3. Request Flow & API Architecture
Request Lifecycle
Client Request
│
▼
┌─────────────────────────────────────┐
│ 1. CORS Middleware │
│ - Validates origin │
│ - Sets CORS headers │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 2. Correlation ID Middleware │
│ - Generates/reads X-Correlation-ID│
│ - Adds to request object │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 3. Firebase Auth Middleware │
│ - Verifies JWT token │
│ - Attaches user to req.user │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 4. Rate Limiting │
│ - 1000 requests per 15 minutes │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 5. Body Parsing │
│ - JSON (10MB limit) │
│ - URL-encoded (10MB limit) │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 6. Route Handler │
│ - Matches route pattern │
│ - Calls controller method │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 7. Controller │
│ - Validates input │
│ - Calls service methods │
│ - Returns response │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 8. Service Layer │
│ - Business logic │
│ - Database operations │
│ - External API calls │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 9. Error Handler (if error) │
│ - Categorizes error │
│ - Logs with correlation ID │
│ - Returns structured response │
└─────────────────────────────────────┘
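The nine steps above are ordinary Express-style middleware composed via next(). A dependency-free sketch of how such a chain threads a request through (simplified req shape for illustration, not the real Express types or the app's actual wiring):

```typescript
// Minimal middleware-chain runner in the Express style; Req is a simplified
// stand-in for the real Express Request type.
type Req = { headers: Record<string, string>; user?: string; correlationId?: string };
type Middleware = (req: Req, next: () => void) => void;

function runChain(req: Req, chain: Middleware[]): void {
  let i = 0;
  const next = () => {
    const mw = chain[i++];
    if (mw) mw(req, next); // each middleware decides whether to continue
  };
  next();
}
```

A middleware that does not call next() (e.g. auth rejecting a request) short-circuits the chain, which is exactly how step 3 stops unauthenticated requests before they reach the route handler.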
Authentication Flow
Middleware: backend/src/middleware/firebaseAuth.ts
export const verifyFirebaseToken = async (
req: FirebaseAuthenticatedRequest,
res: Response,
next: NextFunction
): Promise<void> => {
try {
console.log('🔐 Authentication middleware called for:', req.method, req.url);
console.log('🔐 Request headers:', Object.keys(req.headers));
// Debug Firebase Admin initialization
console.log('🔐 Firebase apps available:', admin.apps.length);
console.log('🔐 Firebase app names:', admin.apps.filter(app => app !== null).map(app => app!.name));
const authHeader = req.headers.authorization;
console.log('🔐 Auth header present:', !!authHeader);
console.log('🔐 Auth header starts with Bearer:', authHeader?.startsWith('Bearer '));
if (!authHeader || !authHeader.startsWith('Bearer ')) {
console.log('❌ No valid authorization header');
res.status(401).json({ error: 'No valid authorization header' });
return;
}
const idToken = authHeader.split('Bearer ')[1];
console.log('🔐 Token extracted, length:', idToken?.length);
if (!idToken) {
console.log('❌ No token provided');
res.status(401).json({ error: 'No token provided' });
return;
}
console.log('🔐 Attempting to verify Firebase ID token...');
console.log('🔐 Token preview:', idToken.substring(0, 20) + '...');
// Verify the Firebase ID token
const decodedToken = await admin.auth().verifyIdToken(idToken, true);
console.log('✅ Token verified successfully for user:', decodedToken.email);
console.log('✅ Token UID:', decodedToken.uid);
console.log('✅ Token issuer:', decodedToken.iss);
// Check if token is expired
const now = Math.floor(Date.now() / 1000);
if (decodedToken.exp && decodedToken.exp < now) {
logger.warn('Token expired for user:', decodedToken.uid);
res.status(401).json({ error: 'Token expired' });
return;
}
req.user = decodedToken;
// Log successful authentication
logger.info('Authenticated request for user:', decodedToken.email);
next();
} catch (error) {
logger.error('Token verification failed:', error);
res.status(401).json({ error: 'Invalid token' });
}
};
Frontend Auth: frontend/src/services/authService.ts
- Manages Firebase Auth state
- Provides token via getToken()
- Axios interceptor adds token to requests
Route Structure
Main Routes (backend/src/routes/documents.ts):
- POST /documents/upload-url - Get signed upload URL
- POST /documents/:id/confirm-upload - Confirm upload and start processing
- GET /documents - List user's documents
- GET /documents/:id - Get document details
- GET /documents/:id/download - Download processed PDF
- GET /documents/analytics - Get processing analytics
- POST /documents/:id/process-optimized-agentic-rag - Trigger AI processing
Middleware Applied:
// Apply authentication and correlation ID to all routes
router.use(verifyFirebaseToken);
router.use(addCorrelationId);
// Add logging middleware for document routes
router.use((req, res, next) => {
console.log(`📄 Document route accessed: ${req.method} ${req.path}`);
next();
});
4. Document Processing Pipeline (Critical Path)
Complete Flow Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ DOCUMENT PROCESSING PIPELINE │
└─────────────────────────────────────────────────────────────────────┘
1. UPLOAD PHASE
┌─────────────────────────────────────────────────────────────┐
│ User selects PDF │
│ ↓ │
│ DocumentUpload component │
│ ↓ │
│ documentService.uploadDocument() │
│ ↓ │
│ POST /documents/upload-url │
│ ↓ │
│ documentController.getUploadUrl() │
│ ↓ │
│ DocumentModel.create() → documents table │
│ ↓ │
│ fileStorageService.generateSignedUploadUrl() │
│ ↓ │
│ Direct upload to GCS via signed URL │
│ ↓ │
│ POST /documents/:id/confirm-upload │
└─────────────────────────────────────────────────────────────┘
2. JOB CREATION PHASE
┌─────────────────────────────────────────────────────────────┐
│ documentController.confirmUpload() │
│ ↓ │
│ ProcessingJobModel.create() → processing_jobs table │
│ ↓ │
│ Status: 'pending' │
│ ↓ │
│ Returns 202 Accepted (async processing) │
└─────────────────────────────────────────────────────────────┘
3. JOB PROCESSING PHASE (Background)
┌─────────────────────────────────────────────────────────────┐
│ Scheduled Function: processDocumentJobs (every 1 minute) │
│ OR │
│ Immediate processing via jobProcessorService.processJob() │
│ ↓ │
│ JobProcessorService.processJob() │
│ ↓ │
│ Download file from GCS │
│ ↓ │
│ unifiedDocumentProcessor.processDocument() │
└─────────────────────────────────────────────────────────────┘
4. TEXT EXTRACTION PHASE
┌─────────────────────────────────────────────────────────────┐
│ documentAiProcessor.processDocument() │
│ ↓ │
│ Google Document AI API │
│ ↓ │
│ Extracted text returned │
└─────────────────────────────────────────────────────────────┘
5. CHUNKING & EMBEDDING PHASE
┌─────────────────────────────────────────────────────────────┐
│ optimizedAgenticRAGProcessor.processLargeDocument() │
│ ↓ │
│ createIntelligentChunks() │
│ - Semantic boundary detection │
│ - 4000-char chunks with 200-char overlap │
│ ↓ │
│ processChunksInBatches() │
│ - Batch size: 10 │
│ - Max concurrent: 5 │
│ ↓ │
│ storeChunksOptimized() │
│ ↓ │
│ vectorDatabaseService.storeEmbedding() │
│ - OpenAI embeddings API │
│ - Store in document_chunks table │
└─────────────────────────────────────────────────────────────┘
6. LLM ANALYSIS PHASE
┌─────────────────────────────────────────────────────────────┐
│ generateLLMAnalysisHybrid() │
│ ↓ │
│ llmService.processCIMDocument() │
│ ↓ │
│ Vector search for relevant chunks │
│ ↓ │
│ Claude/OpenAI API call with structured prompt │
│ ↓ │
│ Parse and validate CIMReview JSON │
│ ↓ │
│ Return structured analysisData │
└─────────────────────────────────────────────────────────────┘
7. PDF GENERATION PHASE
┌─────────────────────────────────────────────────────────────┐
│ pdfGenerationService.generatePDF() │
│ ↓ │
│ Puppeteer browser instance │
│ ↓ │
│ Render HTML template with analysisData │
│ ↓ │
│ Generate PDF buffer │
│ ↓ │
│ Upload PDF to GCS │
│ ↓ │
│ Update document record with PDF path │
└─────────────────────────────────────────────────────────────┘
8. STATUS UPDATE PHASE
┌─────────────────────────────────────────────────────────────┐
│ DocumentModel.updateById() │
│ - status: 'completed' │
│ - pdf_path: GCS path │
│ ↓ │
│ ProcessingJobModel.markAsCompleted() │
│ ↓ │
│ Frontend polls /documents/:id for status updates │
└─────────────────────────────────────────────────────────────┘
Key Handoff Points
1. Upload to Job Creation
async confirmUpload(req: Request, res: Response): Promise<void> {
// ... validation ...
// Update status to processing
await DocumentModel.updateById(documentId, {
status: 'processing_llm'
});
// Acknowledge the request immediately
res.status(202).json({
message: 'Upload confirmed, processing has started.',
document: document,
status: 'processing'
});
// CRITICAL FIX: Use database-backed job queue
const { ProcessingJobModel } = await import('../models/ProcessingJobModel');
await ProcessingJobModel.create({
document_id: documentId,
user_id: userId,
options: {
fileName: document.original_file_name,
mimeType: 'application/pdf'
}
});
}
2. Job Processing to Document Processing
private async processJob(jobId: string): Promise<{ success: boolean; error?: string }> {
// Get job details
job = await ProcessingJobModel.findById(jobId);
// Mark job as processing
await ProcessingJobModel.markAsProcessing(jobId);
// Download file from GCS
const fileBuffer = await fileStorageService.downloadFile(document.file_path);
// Process document
const result = await unifiedDocumentProcessor.processDocument(
job.document_id,
job.user_id,
fileBuffer.toString('utf-8'), // This will be re-read as buffer
{
fileBuffer,
fileName: job.options?.fileName || 'document.pdf',
mimeType: job.options?.mimeType || 'application/pdf'
}
);
}
3. Document Processing to Text Extraction
async processDocument(
documentId: string,
userId: string,
fileBuffer: Buffer,
fileName: string,
mimeType: string
): Promise<ProcessingResult> {
// Step 1: Extract text using Document AI or fallback
const extractedText = await this.extractTextFromDocument(fileBuffer, fileName, mimeType);
// Step 2: Process extracted text through Agentic RAG
const agenticRagResult = await this.processWithAgenticRAG(documentId, extractedText);
}
4. Text to Chunking
async processLargeDocument(
documentId: string,
text: string,
options: {
enableSemanticChunking?: boolean;
enableMetadataEnrichment?: boolean;
similarityThreshold?: number;
} = {}
): Promise<ProcessingResult> {
// Step 1: Create intelligent chunks with semantic boundaries
const chunks = await this.createIntelligentChunks(text, documentId, options.enableSemanticChunking);
// Step 2: Process chunks in batches to manage memory
const processedChunks = await this.processChunksInBatches(chunks, documentId, options);
// Step 3: Store chunks with optimized batching
const embeddingApiCalls = await this.storeChunksOptimized(processedChunks, documentId);
// Step 4: Generate LLM analysis using HYBRID approach
const llmResult = await this.generateLLMAnalysisHybrid(documentId, text, processedChunks);
}
5. Core Services Deep Dive
5.1 UnifiedDocumentProcessor
File: backend/src/services/unifiedDocumentProcessor.ts
Purpose: Main orchestrator for document processing strategies
Key Method:
async processDocument(
documentId: string,
userId: string,
text: string,
options: any = {}
): Promise<ProcessingResult> {
const strategy = options.strategy || 'document_ai_agentic_rag';
logger.info('Processing document with unified processor', {
documentId,
strategy,
textLength: text.length
});
// Only support document_ai_agentic_rag strategy
if (strategy === 'document_ai_agentic_rag') {
return await this.processWithDocumentAiAgenticRag(documentId, userId, text, options);
} else {
throw new Error(`Unsupported processing strategy: ${strategy}. Only 'document_ai_agentic_rag' is supported.`);
}
}
Dependencies:
- documentAiProcessor - Text extraction
- optimizedAgenticRAGProcessor - AI processing
- llmService - LLM interactions
- pdfGenerationService - PDF generation
Error Handling: Wraps errors with detailed context, validates analysisData presence
5.2 OptimizedAgenticRAGProcessor
File: backend/src/services/optimizedAgenticRAGProcessor.ts (1885 lines)
Purpose: Core AI processing engine for chunking, embeddings, and LLM analysis
Key Configuration:
private readonly maxChunkSize = 4000; // Optimal chunk size for embeddings
private readonly overlapSize = 200; // Overlap between chunks
private readonly maxConcurrentEmbeddings = 5; // Limit concurrent API calls
private readonly batchSize = 10; // Process chunks in batches
Key Methods:
- processLargeDocument() - Main entry point
- createIntelligentChunks() - Semantic chunking with boundary detection
- processChunksInBatches() - Batch processing for memory efficiency
- storeChunksOptimized() - Embedding generation and storage
- generateLLMAnalysisHybrid() - LLM analysis with vector search
Performance Optimizations:
- Semantic boundary detection (paragraphs, sections)
- Batch processing to limit memory usage
- Concurrent embedding generation (max 5)
- Vector search with document_id filtering
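The 4000/200 chunking configuration can be sketched as follows. This is a simplified character-based version for illustration: the real createIntelligentChunks additionally snaps boundaries to paragraphs and sections, which is omitted here.

```typescript
// Character-based chunking with overlap; the production version additionally
// performs semantic boundary detection (paragraphs, sections).
function chunkWithOverlap(text: string, maxChunkSize = 4000, overlapSize = 200): string[] {
  const chunks: string[] = [];
  const step = maxChunkSize - overlapSize; // each chunk starts 3800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}
```

The 200-char overlap means each chunk repeats the tail of the previous one, so sentences spanning a boundary appear whole in at least one chunk.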
5.3 JobProcessorService
File: backend/src/services/jobProcessorService.ts
Purpose: Database-backed job processor (replaces legacy in-memory queue)
Key Method:
async processJobs(): Promise<{
processed: number;
succeeded: number;
failed: number;
skipped: number;
}> {
// Prevent concurrent processing runs
if (this.isProcessing) {
logger.info('Job processor already running, skipping this run');
return { processed: 0, succeeded: 0, failed: 0, skipped: 0 };
}
this.isProcessing = true;
const stats = { processed: 0, succeeded: 0, failed: 0, skipped: 0 };
try {
// Reset stuck jobs first
const resetCount = await ProcessingJobModel.resetStuckJobs(this.JOB_TIMEOUT_MINUTES);
// Get pending jobs
const pendingJobs = await ProcessingJobModel.getPendingJobs(this.MAX_CONCURRENT_JOBS);
// Get retrying jobs
const retryingJobs = await ProcessingJobModel.getRetryableJobs(
Math.max(0, this.MAX_CONCURRENT_JOBS - pendingJobs.length)
);
const allJobs = [...pendingJobs, ...retryingJobs];
// Process jobs in parallel (up to MAX_CONCURRENT_JOBS)
const results = await Promise.allSettled(
allJobs.map((job) => this.processJob(job.id))
);
Configuration:
- MAX_CONCURRENT_JOBS = 3
- JOB_TIMEOUT_MINUTES = 15
Features:
- Stuck job detection and recovery
- Retry logic with exponential backoff
- Parallel processing with concurrency limit
- Database-backed state management
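Retry with exponential backoff typically doubles the delay per attempt up to a cap. A hedged sketch of the delay schedule (the base delay and cap here are illustrative, not the service's actual configuration):

```typescript
// Exponential backoff with a cap; baseDelayMs and maxDelayMs are illustrative
// defaults, not values read from jobProcessorService.
function retryDelayMs(attempt: number, baseDelayMs = 1_000, maxDelayMs = 60_000): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}
```

With max_attempts = 3 (the processing_jobs default), a job would be retried at roughly 1s, 2s, and 4s delays before being marked failed.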
5.4 VectorDatabaseService
File: backend/src/services/vectorDatabaseService.ts
Purpose: Vector embeddings and similarity search
Key Method - Vector Search:
async searchSimilar(
embedding: number[],
limit: number = 10,
threshold: number = 0.7,
documentId?: string
): Promise<VectorSearchResult[]> {
try {
if (this.provider === 'supabase') {
// Use optimized Supabase vector search function with document_id filtering
// This prevents timeouts by only searching within a specific document
const rpcParams: any = {
query_embedding: embedding,
match_threshold: threshold,
match_count: limit
};
// Add document_id filter if provided (critical for performance)
if (documentId) {
rpcParams.filter_document_id = documentId;
}
// Set a timeout for the RPC call (10 seconds)
const searchPromise = this.supabaseClient
.rpc('match_document_chunks', rpcParams);
const timeoutPromise = new Promise<{ data: null; error: { message: string } }>((_, reject) => {
setTimeout(() => reject(new Error('Vector search timeout after 10s')), 10000);
});
let result: any;
try {
result = await Promise.race([searchPromise, timeoutPromise]);
} catch (timeoutError: any) {
if (timeoutError.message?.includes('timeout')) {
logger.error('Vector search timed out', { documentId, timeout: '10s' });
throw new Error('Vector search timeout after 10s');
}
throw timeoutError;
}
Critical Optimization: Always pass documentId to filter search scope and prevent timeouts
SQL Function: backend/sql/fix_vector_search_timeout.sql
CREATE OR REPLACE FUNCTION match_document_chunks (
query_embedding vector(1536),
match_threshold float,
match_count int,
filter_document_id text DEFAULT NULL
)
RETURNS TABLE (
id UUID,
document_id TEXT,
content text,
metadata JSONB,
chunk_index INT,
similarity float
)
LANGUAGE sql STABLE
AS $$
SELECT
document_chunks.id,
document_chunks.document_id,
document_chunks.content,
document_chunks.metadata,
document_chunks.chunk_index,
1 - (document_chunks.embedding <=> query_embedding) AS similarity
FROM document_chunks
WHERE document_chunks.embedding IS NOT NULL
AND (filter_document_id IS NULL OR document_chunks.document_id = filter_document_id)
AND 1 - (document_chunks.embedding <=> query_embedding) > match_threshold
ORDER BY document_chunks.embedding <=> query_embedding
LIMIT match_count;
$$;
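The SQL above scores chunks by cosine similarity (pgvector's `<=>` is cosine distance, so similarity is `1 - distance`). The same scoring in TypeScript, useful for sanity-checking thresholds locally; this is a reimplementation for illustration, not code from the repo:

```typescript
// Cosine similarity matching pgvector's `1 - (a <=> b)` scoring; assumes
// non-zero vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory equivalent of match_document_chunks: filter by threshold,
// order by similarity, limit to match_count.
function matchChunks<T extends { embedding: number[] }>(
  chunks: T[], query: number[], threshold: number, count: number
): (T & { similarity: number })[] {
  return chunks
    .map((c) => ({ ...c, similarity: cosineSimilarity(c.embedding, query) }))
    .filter((c) => c.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, count);
}
```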
5.5 LLMService
File: backend/src/services/llmService.ts
Purpose: LLM interactions (Claude/OpenAI/OpenRouter)
Provider Selection:
constructor() {
// Read provider from config (supports openrouter, anthropic, openai)
this.provider = config.llm.provider;
// CRITICAL: If provider is not set correctly, log and use fallback
if (!this.provider || (this.provider !== 'openrouter' && this.provider !== 'anthropic' && this.provider !== 'openai')) {
logger.error('LLM provider is invalid or not set', {
provider: this.provider,
configProvider: config.llm.provider,
processEnvProvider: process.env['LLM_PROVIDER'],
defaultingTo: 'anthropic'
});
this.provider = 'anthropic'; // Fallback
}
// Set API key based on provider
if (this.provider === 'openai') {
this.apiKey = config.llm.openaiApiKey!;
} else if (this.provider === 'openrouter') {
// OpenRouter: Use OpenRouter key if provided, otherwise use Anthropic key for BYOK
this.apiKey = config.llm.openrouterApiKey || config.llm.anthropicApiKey!;
} else {
this.apiKey = config.llm.anthropicApiKey!;
}
// Use configured model instead of hardcoded value
this.defaultModel = config.llm.model;
this.maxTokens = config.llm.maxTokens;
this.temperature = config.llm.temperature;
}
Key Method:
async processCIMDocument(text: string, template: string, analysis?: Record<string, any>): Promise<CIMAnalysisResult> {
// Check and truncate text if it exceeds maxInputTokens
const maxInputTokens = config.llm.maxInputTokens || 200000;
const systemPromptTokens = this.estimateTokenCount(this.getCIMSystemPrompt());
const templateTokens = this.estimateTokenCount(template);
const promptBuffer = config.llm.promptBuffer || 1000;
// Calculate available tokens for document text
const reservedTokens = systemPromptTokens + templateTokens + promptBuffer + (config.llm.maxTokens || 16000);
const availableTokens = maxInputTokens - reservedTokens;
const textTokens = this.estimateTokenCount(text);
let processedText = text;
let wasTruncated = false;
if (textTokens > availableTokens) {
logger.warn('Document text exceeds token limit, truncating', {
textTokens,
availableTokens,
maxInputTokens,
reservedTokens,
truncationRatio: (availableTokens / textTokens * 100).toFixed(1) + '%'
});
processedText = this.truncateText(text, availableTokens);
wasTruncated = true;
}
Features:
- Automatic token counting and truncation
- Model selection based on task complexity
- JSON schema validation with Zod
- Retry logic with exponential backoff
- Cost tracking
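The token budgeting in processCIMDocument can be sketched with the common ~4-characters-per-token heuristic. The actual estimateTokenCount implementation is not shown in this document, so the ratio and truncation method below are assumptions:

```typescript
// Rough token estimate (~4 chars/token) and budget-aware truncation; the
// chars-per-token ratio is a heuristic, not the service's exact method.
function estimateTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Mirrors the reservation arithmetic from processCIMDocument above.
function availableDocumentTokens(opts: {
  maxInputTokens: number; systemPromptTokens: number;
  templateTokens: number; promptBuffer: number; maxOutputTokens: number;
}): number {
  const reserved = opts.systemPromptTokens + opts.templateTokens +
    opts.promptBuffer + opts.maxOutputTokens;
  return opts.maxInputTokens - reserved;
}
```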
5.6 DocumentAiProcessor
File: backend/src/services/documentAiProcessor.ts
Purpose: Google Document AI integration for text extraction
Key Method:
async processDocument(
documentId: string,
userId: string,
fileBuffer: Buffer,
fileName: string,
mimeType: string
): Promise<ProcessingResult> {
const startTime = Date.now();
try {
logger.info('Starting Document AI + Agentic RAG processing', {
documentId,
userId,
fileName,
fileSize: fileBuffer.length,
mimeType
});
// Step 1: Extract text using Document AI or fallback
const extractedText = await this.extractTextFromDocument(fileBuffer, fileName, mimeType);
if (!extractedText) {
throw new Error('Failed to extract text from document');
}
logger.info('Text extraction completed', {
textLength: extractedText.length
});
// Step 2: Process extracted text through Agentic RAG
const agenticRagResult = await this.processWithAgenticRAG(documentId, extractedText);
const processingTime = Date.now() - startTime;
return {
success: true,
content: agenticRagResult.summary || extractedText,
metadata: {
processingStrategy: 'document_ai_agentic_rag',
processingTime,
extractedTextLength: extractedText.length,
agenticRagResult,
fileSize: fileBuffer.length,
fileName,
mimeType
}
};
Fallback Strategy: Uses pdf-parse if Document AI fails
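The Document-AI-with-pdf-parse-fallback pattern generalizes to a small helper. A sketch under the assumption that both extractors return the same result type; the repo's actual extractTextFromDocument may be structured differently:

```typescript
// Generic primary-with-fallback runner; the validate predicate rejects empty
// results so a silently empty primary response also triggers the fallback.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  validate: (result: T) => boolean = () => true
): Promise<T> {
  try {
    const result = await primary();
    if (validate(result)) return result;
  } catch {
    // primary threw; fall through to fallback
  }
  return fallback();
}
```

For text extraction this would be called roughly as `withFallback(documentAiExtract, pdfParseExtract, (t) => t.length > 0)` (names hypothetical).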
5.7 PDFGenerationService
File: backend/src/services/pdfGenerationService.ts
Purpose: PDF generation using Puppeteer
Key Features:
- Page pooling for performance
- Caching for repeated requests
- Browser instance reuse
- Fallback to PDFKit if Puppeteer fails
Configuration:
class PDFGenerationService {
private browser: any = null;
private pagePool: PagePool[] = [];
private readonly maxPoolSize = 5;
private readonly pageTimeout = 30000; // 30 seconds
private readonly cache = new Map<string, { buffer: Buffer; timestamp: number }>();
private readonly cacheTimeout = 300000; // 5 minutes
private readonly defaultOptions: PDFGenerationOptions = {
format: 'A4',
margin: {
top: '1in',
right: '1in',
bottom: '1in',
left: '1in',
},
displayHeaderFooter: true,
printBackground: true,
quality: 'high',
timeout: 30000,
};
5.8 FileStorageService
File: backend/src/services/fileStorageService.ts
Purpose: Google Cloud Storage operations
Key Methods:
- generateSignedUploadUrl() - Generate signed URL for direct upload
- downloadFile() - Download file from GCS
- saveBuffer() - Save buffer to GCS
- deleteFile() - Delete file from GCS
Credential Handling:
constructor() {
this.bucketName = config.googleCloud.gcsBucketName;
// Check if we're in Firebase Functions/Cloud Run environment
const isCloudEnvironment = process.env.FUNCTION_TARGET ||
process.env.FUNCTION_NAME ||
process.env.K_SERVICE ||
process.env.GOOGLE_CLOUD_PROJECT ||
!!process.env.GCLOUD_PROJECT ||
process.env.X_GOOGLE_GCLOUD_PROJECT;
// Initialize Google Cloud Storage
const storageConfig: any = {
projectId: config.googleCloud.projectId,
};
// Only use keyFilename in local development
// In Firebase Functions/Cloud Run, use Application Default Credentials
if (isCloudEnvironment) {
// In cloud, ALWAYS clear GOOGLE_APPLICATION_CREDENTIALS to force use of ADC
// Firebase Functions automatically provides credentials via metadata service
// These credentials have signing capabilities for generating signed URLs
const originalCreds = process.env.GOOGLE_APPLICATION_CREDENTIALS;
if (originalCreds) {
delete process.env.GOOGLE_APPLICATION_CREDENTIALS;
logger.info('Using Application Default Credentials for GCS (cloud environment)', {
clearedEnvVar: 'GOOGLE_APPLICATION_CREDENTIALS',
originalValue: originalCreds,
projectId: config.googleCloud.projectId
});
}
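The environment check in the constructor reduces to a pure predicate over process.env. Factored out here for testability; this sketch mirrors the variables checked above but is not the repo's own code:

```typescript
// Pure version of the cloud-environment check from the constructor; takes the
// env record as a parameter so it can be tested without mutating process.env.
function isCloudEnvironment(env: Record<string, string | undefined>): boolean {
  return Boolean(
    env.FUNCTION_TARGET ||        // Firebase Functions
    env.FUNCTION_NAME ||
    env.K_SERVICE ||              // Cloud Run
    env.GOOGLE_CLOUD_PROJECT ||
    env.GCLOUD_PROJECT ||
    env.X_GOOGLE_GCLOUD_PROJECT
  );
}
```

When it returns true, the service deletes GOOGLE_APPLICATION_CREDENTIALS so the Storage client falls back to Application Default Credentials from the metadata service, which can sign URLs.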
6. Data Models & Database Schema
Core Models
DocumentModel (backend/src/models/DocumentModel.ts):
- create() - Create document record
- findById() - Get document by ID
- updateById() - Update document status/metadata
- findByUserId() - List user's documents
ProcessingJobModel (backend/src/models/ProcessingJobModel.ts):
- create() - Create processing job (uses direct PostgreSQL to bypass PostgREST cache)
- findById() - Get job by ID
- getPendingJobs() - Get pending jobs (limit by concurrency)
- getRetryableJobs() - Get jobs ready for retry
- markAsProcessing() - Update job status
- markAsCompleted() - Mark job complete
- markAsFailed() - Mark job failed with error
- resetStuckJobs() - Reset jobs stuck in processing
VectorDatabaseModel (backend/src/models/VectorDatabaseModel.ts):
- Chunk storage and retrieval
- Embedding management
Database Tables
documents:
- id (UUID, primary key)
- user_id (UUID, foreign key)
- original_file_name (text)
- file_path (text, GCS path)
- file_size (bigint)
- status (text: 'uploading', 'uploaded', 'processing_llm', 'completed', 'failed')
- pdf_path (text, GCS path for generated PDF)
- created_at, updated_at (timestamps)
processing_jobs:
- id (UUID, primary key)
- document_id (UUID, foreign key)
- user_id (UUID, foreign key)
- status (text: 'pending', 'processing', 'completed', 'failed', 'retrying')
- attempts (int)
- max_attempts (int, default 3)
- options (JSONB, processing options)
- error (text, error message if failed)
- result (JSONB, processing result)
- created_at, started_at, completed_at, updated_at (timestamps)
document_chunks:
- id (UUID, primary key)
- document_id (text, foreign key)
- content (text)
- embedding (vector(1536))
- metadata (JSONB)
- chunk_index (int)
- created_at, updated_at (timestamps)
agentic_rag_sessions:
- id (UUID, primary key)
- document_id (UUID, foreign key)
- user_id (UUID, foreign key)
- status (text)
- metadata (JSONB)
- created_at, updated_at (timestamps)
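The tables above map naturally onto TypeScript row types. The types below are hand-written from the schema description for illustration; the repo's actual model types may differ:

```typescript
// Row shapes inferred from the schema description above; column names match
// the tables, but these are illustrative types, not the repo's own models.
type DocumentStatus = 'uploading' | 'uploaded' | 'processing_llm' | 'completed' | 'failed';
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed' | 'retrying';

interface DocumentRow {
  id: string;
  user_id: string;
  original_file_name: string;
  file_path: string;
  file_size: number;
  status: DocumentStatus;
  pdf_path: string | null;
  created_at: string;
  updated_at: string;
}

interface ProcessingJobRow {
  id: string;
  document_id: string;
  user_id: string;
  status: JobStatus;
  attempts: number;
  max_attempts: number;
  options: Record<string, unknown>;
  error: string | null;
  result: Record<string, unknown> | null;
}

// A terminal job needs no further scheduling by the job processor.
function isTerminalJobStatus(status: JobStatus): boolean {
  return status === 'completed' || status === 'failed';
}
```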
Vector Search Optimization
Critical SQL Function: match_document_chunks with document_id filtering
(Full definition shown in section 5.4 VectorDatabaseService, from backend/sql/fix_vector_search_timeout.sql.)
Always pass filter_document_id to prevent timeouts when searching across all documents.
7. Component Handoffs & Integration Points
Frontend ↔ Backend
Axios Interceptor (frontend/src/services/documentService.ts):
export const apiClient = axios.create({
baseURL: API_BASE_URL,
timeout: 300000, // 5 minutes
});
// Add auth token to requests
apiClient.interceptors.request.use(async (config) => {
const token = await authService.getToken();
if (token) {
config.headers.Authorization = `Bearer ${token}`;
}
return config;
});
// Handle auth errors with retry logic
apiClient.interceptors.response.use(
(response) => response,
async (error) => {
const originalRequest = error.config;
if (error.response?.status === 401 && !originalRequest._retry) {
originalRequest._retry = true;
try {
// Attempt to refresh the token
const newToken = await authService.getToken();
if (newToken) {
// Retry the original request with the new token
originalRequest.headers.Authorization = `Bearer ${newToken}`;
return apiClient(originalRequest);
}
} catch (refreshError) {
console.error('Token refresh failed:', refreshError);
}
// If token refresh fails, logout the user
authService.logout();
window.location.href = '/login';
}
return Promise.reject(error);
}
);
Backend ↔ Database
Two Connection Methods:
- Supabase Client (default for most operations):
import { getSupabaseServiceClient } from '../config/supabase';
const supabase = getSupabaseServiceClient();
- Direct PostgreSQL (for critical operations, bypasses PostgREST cache):
static async create(data: CreateProcessingJobData): Promise<ProcessingJob> {
try {
// Use direct PostgreSQL connection to bypass PostgREST cache
// This is critical because PostgREST cache issues can block entire processing pipeline
const pool = getPostgresPool();
const result = await pool.query(
`INSERT INTO processing_jobs (
document_id, user_id, status, attempts, max_attempts, options, created_at
) VALUES ($1, $2, $3, $4, $5, $6, $7)
RETURNING *`,
[
data.document_id,
data.user_id,
'pending',
0,
data.max_attempts || 3,
JSON.stringify(data.options || {}),
new Date().toISOString()
]
);
if (result.rows.length === 0) {
throw new Error('Failed to create processing job: No data returned');
}
const job = result.rows[0];
logger.info('Processing job created via direct PostgreSQL', {
jobId: job.id,
documentId: data.document_id,
userId: data.user_id,
});
    return job;
  } catch (error) {
    // ... error logging and re-throw ...
    throw error;
  }
}
Backend ↔ GCS
Signed URL Generation:
const uploadUrl = await fileStorageService.generateSignedUploadUrl(filePath, contentType);
Direct Upload (frontend):
const fetchPromise = fetch(uploadUrl, {
method: 'PUT',
headers: {
'Content-Type': contentType, // Must match exactly what was used in signed URL generation
},
body: file,
signal: signal,
});
File Download (for processing):
const fileBuffer = await fileStorageService.downloadFile(document.file_path);
Backend ↔ Document AI
Text Extraction:
private async extractTextFromDocument(fileBuffer: Buffer, fileName: string, mimeType: string): Promise<string> {
try {
// Check document size first
// ... size validation ...
// Upload to GCS for Document AI processing
const gcsFileName = `temp/${Date.now()}_${fileName}`;
await this.storageClient.bucket(this.gcsBucketName).file(gcsFileName).save(fileBuffer);
// Process with Document AI
const request = {
name: this.processorName,
rawDocument: {
gcsSource: {
uri: `gs://${this.gcsBucketName}/${gcsFileName}`
},
mimeType: mimeType
}
};
const [result] = await this.documentAiClient.processDocument(request);
// Extract text from result
const text = result.document?.text || '';
// Clean up temp file
await this.storageClient.bucket(this.gcsBucketName).file(gcsFileName).delete();
return text;
} catch (error) {
// Fallback to pdf-parse
logger.warn('Document AI failed, using pdf-parse fallback', { error });
const data = await pdf(fileBuffer);
return data.text;
}
}
Backend ↔ LLM APIs
Provider Selection (Claude/OpenAI/OpenRouter):
- Configured via the `LLM_PROVIDER` environment variable
- Automatic API key selection based on provider
- Model selection based on task complexity
Request Flow:
// 1. Token counting and truncation
const processedText = this.truncateText(text, availableTokens);
// 2. Model selection
const model = this.selectModel(taskComplexity);
// 3. API call with retry logic
const response = await this.callLLMAPI({
prompt: processedText,
systemPrompt: systemPrompt,
model: model,
maxTokens: this.maxTokens,
temperature: this.temperature
});
// 4. JSON parsing and validation
const parsed = JSON.parse(response.content);
const validated = cimReviewSchema.parse(parsed);
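The truncation step above can be sketched as follows. This is a simplified illustration assuming a rough 4-characters-per-token heuristic; the actual service may use a real tokenizer, and `truncateText` here is only a stand-in for the real method:

```typescript
// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Truncate text to a token budget, backing off to the last paragraph
// boundary when one exists past the halfway point of the budget.
function truncateText(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const charBudget = maxTokens * 4;
  const cut = text.slice(0, charBudget);
  const lastBreak = cut.lastIndexOf("\n\n");
  return lastBreak > charBudget * 0.5 ? cut.slice(0, lastBreak) : cut;
}
```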
Services ↔ Services
Event-Driven Patterns:
- `jobQueueService` emits events: `job:added`, `job:started`, `job:completed`, `job:failed`
- `uploadMonitoringService` tracks upload events
Direct Method Calls:
- Most service interactions are direct method calls
- Services are exported as singletons for easy access
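A minimal sketch of the event pattern used by the legacy queue, built on Node's `EventEmitter` (the payload shape is illustrative, not the actual service's):

```typescript
import { EventEmitter } from "node:events";

// Event names match those listed above for jobQueueService.
type JobEvent = "job:added" | "job:started" | "job:completed" | "job:failed";

class JobQueueEvents extends EventEmitter {
  // Emit a job lifecycle event with a small illustrative payload.
  emitJob(event: JobEvent, jobId: string): void {
    this.emit(event, { jobId, at: new Date().toISOString() });
  }
}

const queueEvents = new JobQueueEvents();
const seen: string[] = [];
queueEvents.on("job:completed", ({ jobId }) => seen.push(jobId));
queueEvents.emitJob("job:completed", "job-123");
```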
8. Error Handling & Resilience
Error Propagation Path
Service Method
│
▼ (throws error)
Controller
│
▼ (catches, logs, re-throws)
Express Error Handler
│
▼ (categorizes, logs, responds)
Client (structured error response)
Error Categories
File: backend/src/middleware/errorHandler.ts
export enum ErrorCategory {
VALIDATION = 'validation',
AUTHENTICATION = 'authentication',
AUTHORIZATION = 'authorization',
NOT_FOUND = 'not_found',
EXTERNAL_SERVICE = 'external_service',
PROCESSING = 'processing',
SYSTEM = 'system',
DATABASE = 'database'
}
Error Response Structure:
export interface ErrorResponse {
success: false;
error: {
code: string;
message: string;
details?: any;
correlationId: string;
timestamp: string;
retryable: boolean;
};
}
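As an illustration of how such a response might be assembled, here is a hypothetical builder (the interface is re-declared for self-containment; the mapping of categories to `retryable` is an assumption for illustration, not the actual logic in `errorHandler.ts`):

```typescript
// Assumed mapping: transient failures are retryable, client errors are not.
const RETRYABLE_CATEGORIES = new Set(["external_service", "processing", "database"]);

interface ErrorResponse {
  success: false;
  error: {
    code: string;
    message: string;
    details?: unknown;
    correlationId: string;
    timestamp: string;
    retryable: boolean;
  };
}

function buildErrorResponse(
  category: string,
  code: string,
  message: string,
  correlationId: string
): ErrorResponse {
  return {
    success: false,
    error: {
      code,
      message,
      correlationId,
      timestamp: new Date().toISOString(),
      retryable: RETRYABLE_CATEGORIES.has(category),
    },
  };
}
```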
Retry Mechanisms
1. Job Retries:
- Max attempts: 3 (configurable per job)
- Exponential backoff between retries
- Jobs are marked with `retrying` status
2. API Retries:
- LLM API calls: 3 retries with exponential backoff
- Document AI: Fallback to pdf-parse
- Vector search: 10-second timeout, fallback to direct query
3. Database Retries:
private static async retryOperation<T>(
operation: () => Promise<T>,
operationName: string,
maxRetries: number = 3,
baseDelay: number = 1000
): Promise<T> {
let lastError: any;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await operation();
} catch (error: any) {
lastError = error;
const isNetworkError = error?.message?.includes('fetch failed') ||
error?.message?.includes('ENOTFOUND') ||
error?.message?.includes('ECONNREFUSED') ||
error?.message?.includes('ETIMEDOUT') ||
error?.name === 'TypeError';
if (!isNetworkError || attempt === maxRetries) {
throw error;
}
const delay = baseDelay * Math.pow(2, attempt - 1);
logger.warn(`${operationName} failed (attempt ${attempt}/${maxRetries}), retrying in ${delay}ms`, {
error: error?.message || String(error),
code: error?.code,
attempt,
maxRetries
});
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw lastError;
}
Timeout Handling
Vector Search Timeout:
// Set a timeout for the RPC call (10 seconds)
const searchPromise = this.supabaseClient
.rpc('match_document_chunks', rpcParams);
const timeoutPromise = new Promise<{ data: null; error: { message: string } }>((_, reject) => {
setTimeout(() => reject(new Error('Vector search timeout after 10s')), 10000);
});
let result: any;
try {
result = await Promise.race([searchPromise, timeoutPromise]);
} catch (timeoutError: any) {
if (timeoutError.message?.includes('timeout')) {
logger.error('Vector search timed out', { documentId, timeout: '10s' });
throw new Error('Vector search timeout after 10s');
}
throw timeoutError;
}
LLM API Timeout: Handled by axios timeout configuration
Job Timeout: 15 minutes, jobs stuck longer are reset
Stuck Job Detection and Recovery
// Reset stuck jobs first
const resetCount = await ProcessingJobModel.resetStuckJobs(this.JOB_TIMEOUT_MINUTES);
if (resetCount > 0) {
logger.info('Reset stuck jobs', { count: resetCount });
}
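The stuck-job cutoff can be sketched as a simple predicate (field names mirror the `processing_jobs` schema; the helper itself is illustrative, not the actual model method):

```typescript
interface JobRow {
  status: string;
  started_at: string | null;
}

// A job is "stuck" if it has been in 'processing' longer than the timeout.
function isStuck(job: JobRow, timeoutMinutes: number, now: Date = new Date()): boolean {
  if (job.status !== "processing" || !job.started_at) return false;
  const ageMinutes = (now.getTime() - new Date(job.started_at).getTime()) / 60000;
  return ageMinutes > timeoutMinutes;
}
```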
Scheduled Function Monitoring:
// Check for jobs stuck in processing status
const stuckProcessingJobs = await ProcessingJobModel.getStuckJobs(15); // Jobs stuck > 15 minutes
if (stuckProcessingJobs.length > 0) {
logger.warn('Found stuck processing jobs', {
count: stuckProcessingJobs.length,
jobIds: stuckProcessingJobs.map(j => j.id),
timestamp: new Date().toISOString(),
});
}
// Check for jobs stuck in pending status (alert if > 2 minutes)
const stuckPendingJobs = await ProcessingJobModel.getStuckPendingJobs(2); // Jobs pending > 2 minutes
if (stuckPendingJobs.length > 0) {
logger.warn('Found stuck pending jobs (may indicate processing issues)', {
count: stuckPendingJobs.length,
jobIds: stuckPendingJobs.map(j => j.id),
oldestJobAge: stuckPendingJobs[0] ? Math.round((Date.now() - new Date(stuckPendingJobs[0].created_at).getTime()) / 1000 / 60) : 0,
timestamp: new Date().toISOString(),
});
}
Graceful Degradation
Document AI Failure: Falls back to pdf-parse library
Vector Search Failure: Falls back to direct database query without similarity calculation
LLM API Failure: Returns error with retryable flag, job can be retried
PDF Generation Failure: Falls back to PDFKit if Puppeteer fails
9. Performance Optimization Points
Vector Search Optimization
Critical: Always pass the `document_id` filter to prevent timeouts
// Add document_id filter if provided (critical for performance)
if (documentId) {
rpcParams.filter_document_id = documentId;
}
SQL Function Optimization: `match_document_chunks` filters by `document_id` first before vector similarity calculation
Chunking Strategy
Optimal Configuration:
private readonly maxChunkSize = 4000; // Optimal chunk size for embeddings
private readonly overlapSize = 200; // Overlap between chunks
private readonly maxConcurrentEmbeddings = 5; // Limit concurrent API calls
private readonly batchSize = 10; // Process chunks in batches
Semantic Chunking: Detects paragraph and section boundaries for better chunk quality
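The configuration above implies a sliding window with overlap. A naive fixed-window sketch of the idea (the real implementation additionally snaps chunks to paragraph and section boundaries; sizes here are in characters):

```typescript
// Split text into overlapping windows: each chunk is at most maxChunkSize
// characters and shares overlapSize characters with the previous chunk.
function chunkText(text: string, maxChunkSize = 4000, overlapSize = 200): string[] {
  const chunks: string[] = [];
  const step = maxChunkSize - overlapSize;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.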
Batch Processing
Embedding Generation:
- Processes chunks in batches of 10
- Max 5 concurrent embedding API calls
- Prevents memory overflow and API rate limiting
Chunk Storage:
- Batched database inserts
- Reduces database round trips
Memory Management
Chunk Processing:
- Processes chunks in batches to limit memory usage
- Cleans up processed chunks from memory after storage
PDF Generation:
- Page pooling (max 5 pages)
- Page timeout (30 seconds)
- Cache with 5-minute TTL
Database Optimization
Direct PostgreSQL for Critical Operations:
- Job creation uses direct PostgreSQL to bypass PostgREST cache issues
- Ensures reliable job creation even when PostgREST schema cache is stale
Connection Pooling:
- Supabase client uses connection pooling
- Direct PostgreSQL uses pg pool
API Call Optimization
LLM Token Management:
- Automatic token counting
- Text truncation if exceeds limits
- Model selection based on complexity (smaller models for simpler tasks)
Embedding Caching:
private semanticCache: Map<string, { embedding: number[]; timestamp: number }> = new Map();
private readonly CACHE_TTL = 3600000; // 1 hour cache TTL
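The cache pattern above (value plus timestamp, with TTL checked on read) can be sketched as a small generic wrapper; this is illustrative, not the actual service code:

```typescript
// Map-backed cache that lazily evicts entries older than ttlMs on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; timestamp: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now - entry.timestamp > this.ttlMs) {
      this.store.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, timestamp: now });
  }
}
```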
10. Background Processing Architecture
Legacy vs Current System
Legacy: In-Memory Queue (jobQueueService)
- EventEmitter-based
- In-memory job storage
- Still initialized but being phased out
- Location: `backend/src/services/jobQueueService.ts`
Current: Database-Backed Queue (jobProcessorService)
- Database-backed job storage
- Scheduled processing via Firebase Cloud Scheduler
- Location: `backend/src/services/jobProcessorService.ts`
Job Processing Flow
Job Creation
│
▼
ProcessingJobModel.create()
│
▼
Status: 'pending' in database
│
▼
Scheduled Function (every 1 minute)
OR
Immediate processing via API
│
▼
JobProcessorService.processJobs()
│
▼
Get pending/retrying jobs (max 3 concurrent)
│
▼
Process jobs in parallel
│
▼
For each job:
- Mark as 'processing'
- Download file from GCS
- Call unifiedDocumentProcessor
- Update document status
- Mark job as 'completed' or 'failed'
Scheduled Function
File: backend/src/index.ts
export const processDocumentJobs = onSchedule({
schedule: 'every 1 minutes', // Minimum interval for Firebase Cloud Scheduler
timeoutSeconds: 900, // 15 minutes (max for Gen2 scheduled functions)
memory: '1GiB',
retryCount: 2, // Retry up to 2 times on failure
}, async (event) => {
logger.info('Processing document jobs scheduled function triggered', {
timestamp: new Date().toISOString(),
scheduleTime: event.scheduleTime,
});
try {
const { jobProcessorService } = await import('./services/jobProcessorService');
// Check for stuck jobs before processing (monitoring)
const { ProcessingJobModel } = await import('./models/ProcessingJobModel');
// Check for jobs stuck in processing status
const stuckProcessingJobs = await ProcessingJobModel.getStuckJobs(15); // Jobs stuck > 15 minutes
if (stuckProcessingJobs.length > 0) {
logger.warn('Found stuck processing jobs', {
count: stuckProcessingJobs.length,
jobIds: stuckProcessingJobs.map(j => j.id),
timestamp: new Date().toISOString(),
});
}
// Check for jobs stuck in pending status (alert if > 2 minutes)
const stuckPendingJobs = await ProcessingJobModel.getStuckPendingJobs(2); // Jobs pending > 2 minutes
if (stuckPendingJobs.length > 0) {
logger.warn('Found stuck pending jobs (may indicate processing issues)', {
count: stuckPendingJobs.length,
jobIds: stuckPendingJobs.map(j => j.id),
oldestJobAge: stuckPendingJobs[0] ? Math.round((Date.now() - new Date(stuckPendingJobs[0].created_at).getTime()) / 1000 / 60) : 0,
timestamp: new Date().toISOString(),
});
}
const result = await jobProcessorService.processJobs();
logger.info('Document jobs processing completed', {
...result,
timestamp: new Date().toISOString(),
});
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
const errorStack = error instanceof Error ? error.stack : undefined;
logger.error('Error processing document jobs', {
error: errorMessage,
stack: errorStack,
timestamp: new Date().toISOString(),
});
// Re-throw to trigger retry mechanism (up to retryCount times)
throw error;
}
});
Job States
pending → processing → completed
│ │
│ ▼
│ failed
│ │
└──────────────────────┘
│
▼
retrying
│
▼
(back to pending)
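The diagram above can be encoded as an allowed-transitions map; a small illustrative validator (not the actual model code):

```typescript
// Allowed job state transitions, taken from the state diagram.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  failed: ["retrying"],
  retrying: ["pending"],
  completed: [], // terminal state
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```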
Concurrency Control
Max Concurrent Jobs: 3
private readonly MAX_CONCURRENT_JOBS = 3;
private readonly JOB_TIMEOUT_MINUTES = 15;
Processing Logic:
// Get pending jobs
const pendingJobs = await ProcessingJobModel.getPendingJobs(this.MAX_CONCURRENT_JOBS);
// Get retrying jobs (enabled - schema is updated)
const retryingJobs = await ProcessingJobModel.getRetryableJobs(
Math.max(0, this.MAX_CONCURRENT_JOBS - pendingJobs.length)
);
const allJobs = [...pendingJobs, ...retryingJobs];
if (allJobs.length === 0) {
logger.debug('No jobs to process');
return stats;
}
logger.info('Processing jobs', {
totalJobs: allJobs.length,
pendingJobs: pendingJobs.length,
retryingJobs: retryingJobs.length,
});
// Process jobs in parallel (up to MAX_CONCURRENT_JOBS)
const results = await Promise.allSettled(
allJobs.map((job) => this.processJob(job.id))
);
11. Frontend Architecture
Component Structure
Main Components:
- `DocumentUpload` - File upload with drag-and-drop
- `DocumentList` - List of user's documents with status
- `DocumentViewer` - View processed document and PDF
- `Analytics` - Processing statistics dashboard
- `UploadMonitoringDashboard` - Real-time upload monitoring
State Management
AuthContext (frontend/src/contexts/AuthContext.tsx):
export const AuthProvider: React.FC<AuthProviderProps> = ({ children }) => {
const [user, setUser] = useState<User | null>(null);
const [token, setToken] = useState<string | null>(null);
const [isLoading, setIsLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [isInitialized, setIsInitialized] = useState(false);
useEffect(() => {
setIsLoading(true);
// Listen for Firebase auth state changes
const unsubscribe = authService.onAuthStateChanged(async (firebaseUser) => {
try {
if (firebaseUser) {
const user = authService.getCurrentUser();
const token = await authService.getToken();
setUser(user);
setToken(token);
} else {
setUser(null);
setToken(null);
}
} catch (error) {
console.error('Auth state change error:', error);
setError('Authentication error occurred');
setUser(null);
setToken(null);
} finally {
setIsLoading(false);
setIsInitialized(true);
}
});
// Cleanup subscription on unmount
return () => unsubscribe();
}, []);
API Communication
Document Service (frontend/src/services/documentService.ts):
- Axios client with auth interceptor
- Automatic token refresh on 401 errors
- Progress tracking for uploads
- Error handling with user-friendly messages
Upload Flow:
async uploadDocument(
file: File,
onProgress?: (progress: number) => void,
signal?: AbortSignal
): Promise<Document> {
try {
// Check authentication before upload
const token = await authService.getToken();
if (!token) {
throw new Error('Authentication required. Please log in to upload documents.');
}
// Step 1: Get signed upload URL
onProgress?.(5); // 5% - Getting upload URL
const uploadUrlResponse = await apiClient.post('/documents/upload-url', {
fileName: file.name,
fileSize: file.size,
contentType: contentTypeForSigning
}, { signal });
const { documentId, uploadUrl } = uploadUrlResponse.data;
// Step 2: Upload directly to Firebase Storage
onProgress?.(10); // 10% - Starting direct upload
await this.uploadToFirebaseStorage(
file,
uploadUrl,
contentTypeForSigning,
(uploadProgress) => {
// Map upload progress (10-90%)
const mappedProgress = 10 + (uploadProgress * 0.8);
onProgress?.(mappedProgress);
},
signal
);
// Step 3: Confirm upload
onProgress?.(90); // 90% - Confirming upload
const confirmResponse = await apiClient.post(
`/documents/${documentId}/confirm-upload`,
{},
{ signal }
);
onProgress?.(100); // 100% - Complete
return confirmResponse.data.document;
} catch (error) {
// ... error handling ...
}
}
Real-Time Updates
Polling for Processing Status:
- Frontend polls the `/documents/:id` endpoint
- Updates UI when status changes from 'processing' to 'completed'
- Shows error messages if status is 'failed'
Upload Progress:
- Real-time progress tracking via the `onProgress` callback
- Visual progress bar in the `DocumentUpload` component
12. Configuration & Environment
Environment Variables
File: backend/src/config/env.ts
Key Configuration Categories:
- LLM Provider:
  - `LLM_PROVIDER` - 'anthropic', 'openai', or 'openrouter'
  - `ANTHROPIC_API_KEY` - Claude API key
  - `OPENAI_API_KEY` - OpenAI API key
  - `OPENROUTER_API_KEY` - OpenRouter API key
  - `LLM_MODEL` - Model name (e.g., 'claude-sonnet-4-5-20250929')
  - `LLM_MAX_TOKENS` - Max output tokens
  - `LLM_MAX_INPUT_TOKENS` - Max input tokens (default 200000)
- Database:
  - `SUPABASE_URL` - Supabase project URL
  - `SUPABASE_SERVICE_KEY` - Service role key
  - `SUPABASE_ANON_KEY` - Anonymous key
- Google Cloud:
  - `GCLOUD_PROJECT_ID` - GCP project ID
  - `GCS_BUCKET_NAME` - Storage bucket name
  - `DOCUMENT_AI_PROCESSOR_ID` - Document AI processor ID
  - `DOCUMENT_AI_LOCATION` - Processor location (default 'us')
- Feature Flags:
  - `AGENTIC_RAG_ENABLED` - Enable/disable agentic RAG processing
Configuration Loading
Priority Order:
1. `process.env` (Firebase Functions v2)
2. `functions.config()` (Firebase Functions v1 fallback)
3. `.env` file (local development)
Validation: Joi schema validates all required environment variables
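The priority order can be sketched as a simple fallback chain (function and parameter names are illustrative, not the actual `env.ts` API):

```typescript
// Resolve a config key by priority: process.env wins, then the
// functions.config() fallback, then values loaded from a .env file.
function resolveConfig(
  key: string,
  processEnv: Record<string, string | undefined>,
  functionsConfig: Record<string, string | undefined>,
  dotEnv: Record<string, string | undefined>
): string | undefined {
  return processEnv[key] ?? functionsConfig[key] ?? dotEnv[key];
}
```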
13. Debugging Guide
Key Log Points
Correlation IDs: Every request has a correlation ID for tracing
Structured Logging: Winston logger with structured data
Key Log Locations:
- Request Entry: `backend/src/index.ts` - All incoming requests
- Authentication: `backend/src/middleware/firebaseAuth.ts` - Auth success/failure
- Job Processing: `backend/src/services/jobProcessorService.ts` - Job lifecycle
- Document Processing: `backend/src/services/unifiedDocumentProcessor.ts` - Processing steps
- LLM Calls: `backend/src/services/llmService.ts` - API calls and responses
- Vector Search: `backend/src/services/vectorDatabaseService.ts` - Search operations
- Error Handling: `backend/src/middleware/errorHandler.ts` - All errors with categorization
Common Failure Points
1. Vector Search Timeouts
- Symptom: "Vector search timeout after 10s"
- Cause: Searching across all documents without a `document_id` filter
- Fix: Always pass `documentId` to `vectorDatabaseService.searchSimilar()`
2. LLM API Failures
- Symptom: "LLM API call failed" or "Invalid JSON response"
- Cause: API rate limits, network issues, or invalid response format
- Fix: Check API keys, retry logic, and response validation
3. GCS Upload Failures
- Symptom: "Failed to upload to GCS" or "Signed URL expired"
- Cause: Credential issues, bucket permissions, or URL expiration
- Fix: Check GCS credentials and bucket configuration
4. Job Stuck in Processing
- Symptom: Job status remains 'processing' for > 15 minutes
- Cause: Process crashed, timeout, or error not caught
- Fix: Check logs, reset stuck jobs, investigate error
5. Document AI Failures
- Symptom: "Failed to extract text from document"
- Cause: Document AI API error or invalid file format
- Fix: Check Document AI processor configuration, fallback to pdf-parse
Diagnostic Tools
Health Check Endpoints:
- `GET /health` - Basic health check
- `GET /health/config` - Configuration health
- `GET /health/agentic-rag` - Agentic RAG health status
Monitoring Endpoints:
- `GET /monitoring/upload-metrics` - Upload statistics
- `GET /monitoring/upload-health` - Upload health
- `GET /monitoring/real-time-stats` - Real-time statistics
Database Debugging:
-- Check pending jobs
SELECT * FROM processing_jobs WHERE status = 'pending' ORDER BY created_at DESC;
-- Check stuck jobs
SELECT * FROM processing_jobs
WHERE status = 'processing'
AND started_at < NOW() - INTERVAL '15 minutes';
-- Check document status
SELECT id, original_file_name, status, created_at
FROM documents
WHERE user_id = '<user_id>'
ORDER BY created_at DESC;
Job Inspection:
// Get job details
const job = await ProcessingJobModel.findById(jobId);
// Check job error
console.log('Job error:', job.error);
// Check job result
console.log('Job result:', job.result);
Debugging Workflow
- Identify the Issue: Check error logs with correlation ID
- Trace the Request: Follow correlation ID through logs
- Check Job Status: Query the `processing_jobs` table for job state
- Check Document Status: Query the `documents` table for document state
- Review Service Logs: Check specific service logs for detailed errors
- Test Components: Test individual services in isolation
- Check External Services: Verify GCS, Document AI, LLM APIs are accessible
14. Optimization Opportunities
Identified Bottlenecks
1. Vector Search Performance
- Current: 10-second timeout, can be slow for large document sets
- Optimization: Ensure the `document_id` filter is always used
- Future: Consider indexing optimizations, batch search
2. LLM API Calls
- Current: Sequential processing, no caching of similar requests
- Optimization: Implement response caching for similar documents
- Future: Batch API calls, use smaller models for simpler tasks
3. PDF Generation
- Current: Puppeteer can be memory-intensive
- Optimization: Page pooling already implemented
- Future: Consider serverless PDF generation service
4. Database Queries
- Current: Some queries don't use indexes effectively
- Optimization: Add indexes on frequently queried columns
- Future: Query optimization, connection pooling tuning
Memory Usage Patterns
Chunk Processing:
- Processes chunks in batches to limit memory
- Cleans up processed chunks after storage
- Optimization: Consider streaming for very large documents
PDF Generation:
- Page pooling limits memory usage
- Browser instance reuse reduces overhead
- Optimization: Consider headless browser optimization
API Call Optimization
Embedding Generation:
- Current: Max 5 concurrent calls
- Optimization: Tune based on API rate limits
- Future: Batch embedding API if available
LLM Calls:
- Current: Single call per document
- Optimization: Use smaller models for simpler tasks
- Future: Implement response caching
Database Query Optimization
Frequently Queried Tables:
- `documents` - Add index on `user_id`, `status`
- `processing_jobs` - Add index on `status`, `created_at`
- `document_chunks` - Add index on `document_id`, `chunk_index`
Vector Search:
- Current: Uses the `match_document_chunks` function
- Optimization: Ensure the `document_id` filter is always used
- Future: Consider HNSW index for faster similarity search
Appendix: Key File Locations
Backend Services
- `backend/src/services/unifiedDocumentProcessor.ts` - Main orchestrator
- `backend/src/services/optimizedAgenticRAGProcessor.ts` - AI processing engine
- `backend/src/services/jobProcessorService.ts` - Job processor
- `backend/src/services/vectorDatabaseService.ts` - Vector operations
- `backend/src/services/llmService.ts` - LLM interactions
- `backend/src/services/documentAiProcessor.ts` - Document AI integration
- `backend/src/services/pdfGenerationService.ts` - PDF generation
- `backend/src/services/fileStorageService.ts` - GCS operations
Backend Models
- `backend/src/models/DocumentModel.ts` - Document data model
- `backend/src/models/ProcessingJobModel.ts` - Job data model
- `backend/src/models/VectorDatabaseModel.ts` - Vector data model
Backend Routes
- `backend/src/routes/documents.ts` - Document endpoints
- `backend/src/routes/vector.ts` - Vector endpoints
- `backend/src/routes/monitoring.ts` - Monitoring endpoints
Backend Controllers
- `backend/src/controllers/documentController.ts` - Document controller
Frontend Services
- `frontend/src/services/documentService.ts` - Document API client
- `frontend/src/services/authService.ts` - Authentication service
Frontend Components
- `frontend/src/components/DocumentUpload.tsx` - Upload component
- `frontend/src/components/DocumentList.tsx` - Document list
- `frontend/src/components/DocumentViewer.tsx` - Document viewer
Configuration
- `backend/src/config/env.ts` - Environment configuration
- `backend/src/config/supabase.ts` - Supabase configuration
- `backend/src/config/firebase.ts` - Firebase configuration
SQL
- `backend/sql/fix_vector_search_timeout.sql` - Vector search optimization
End of Architecture Summary