Major release with significant performance improvements and new processing strategy.

## Core Changes
- Implemented simple_full_document processing strategy (default)
- Full document → LLM approach: 1-2 passes, ~5-6 minutes processing time
- Achieved 100% completeness with 2 API calls (down from 5+)
- Removed redundant Document AI passes for faster processing

## Financial Data Extraction
- Enhanced deterministic financial table parser
- Improved FY3/FY2/FY1/LTM identification from varying CIM formats
- Automatic merging of parser results with LLM extraction

## Code Quality & Infrastructure
- Cleaned up debug logging (removed emoji markers from production code)
- Fixed Firebase Secrets configuration (using modern defineSecret approach)
- Updated OpenAI API key
- Resolved deployment conflicts (secrets vs environment variables)
- Added .env files to Firebase ignore list

## Deployment
- Firebase Functions v2 deployment successful
- All 7 required secrets verified and configured
- Function URL: https://api-y56ccs6wva-uc.a.run.app

## Performance Improvements
- Processing time: ~5-6 minutes (down from 23+ minutes)
- API calls: 1-2 (down from 5+)
- Completeness: 100% achievable
- LLM Model: claude-3-7-sonnet-latest

## Breaking Changes
- Default processing strategy changed to 'simple_full_document'
- RAG processor available as alternative strategy 'document_ai_agentic_rag'

## Files Changed
- 36 files changed, 5642 insertions(+), 4451 deletions(-)
- Removed deprecated documentation files
- Cleaned up unused services and models

This release represents a major refactoring focused on speed, accuracy, and maintainability.
CIM Summary: Codebase Architecture
Last Updated: December 2024
Purpose: Comprehensive technical reference for senior developers optimizing and debugging the codebase
Table of Contents
- System Overview
- Application Entry Points
- Request Flow & API Architecture
- Document Processing Pipeline (Critical Path)
- Core Services Deep Dive
- Data Models & Database Schema
- Component Handoffs & Integration Points
- Error Handling & Resilience
- Performance Optimization Points
- Background Processing Architecture
- Frontend Architecture
- Configuration & Environment
- Debugging Guide
- Optimization Opportunities
1. System Overview
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ DocumentUpload│ │ DocumentList │ │ Analytics │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ documentService │ │
│ │ (Axios Client) │ │
│ └────────┬────────┘ │
└────────────────────────────┼────────────────────────────────────┘
│ HTTPS + JWT
┌────────────────────────────▼────────────────────────────────────┐
│ Backend (Express + Node.js) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Middleware Chain: CORS → Auth → Validation → Error Handler │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌────────▼────────┐ ┌─────▼──────┐ │
│ │ Routes │ │ Controllers │ │ Services │ │
│ └──────┬──────┘ └────────┬────────┘ └─────┬──────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
┌────▼────┐ ┌──────▼──────┐ ┌───────▼───────┐
│Supabase │ │Google Cloud│ │ LLM APIs │
│(Postgres)│ │ Storage │ │(Claude/OpenAI)│
└─────────┘ └────────────┘ └───────────────┘
Technology Stack
Frontend:
- React 18 + TypeScript
- Vite (build tool)
- Axios (HTTP client)
- Firebase Auth (authentication)
- React Router (routing)
Backend:
- Node.js + Express + TypeScript
- Firebase Functions v2 (deployment)
- Supabase (PostgreSQL + Vector DB)
- Google Cloud Storage (file storage)
- Google Document AI (PDF text extraction)
- Puppeteer (PDF generation)
AI/ML Services:
- Anthropic Claude (primary LLM)
- OpenAI (fallback LLM)
- OpenRouter (LLM routing)
- OpenAI Embeddings (vector embeddings)
Core Purpose
Automated processing and analysis of Confidential Information Memorandums (CIMs) using:
- Text Extraction: Google Document AI extracts text from PDFs
- Semantic Chunking: Split text into 4000-char chunks with overlap
- Vector Embeddings: Generate embeddings for semantic search
- LLM Analysis: Claude AI analyzes chunks and generates structured CIMReview data
- PDF Generation: Create summary PDF with analysis results
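The five stages above can be sketched as a pipeline of injected stage functions. This is an illustrative composition only: the stage names and signatures are assumptions, not the repo's actual service interfaces.

```typescript
// Minimal sketch of the five-stage pipeline; the Stage signatures are
// illustrative assumptions, not the real service APIs.
type Stage<I, O> = (input: I) => Promise<O>;

interface PipelineStages {
  extractText: Stage<Buffer, string>;                     // 1. Document AI
  chunk: Stage<string, string[]>;                         // 2. semantic chunking
  embed: Stage<string[], number[][]>;                     // 3. vector embeddings
  analyze: Stage<string[], Record<string, unknown>>;      // 4. LLM → CIMReview data
  renderPdf: Stage<Record<string, unknown>, Buffer>;      // 5. summary PDF
}

async function runPipeline(file: Buffer, s: PipelineStages): Promise<Buffer> {
  const text = await s.extractText(file);
  const chunks = await s.chunk(text);
  await s.embed(chunks); // stored for later vector search
  const review = await s.analyze(chunks);
  return s.renderPdf(review);
}
```

Each stage maps onto one of the services described in section 5; the injection style also makes the orchestration testable with stubs.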
2. Application Entry Points
Backend Entry Point
File: backend/src/index.ts
// Initialize Firebase Admin SDK first
import './config/firebase';
import express from 'express';
import cors from 'cors';
import helmet from 'helmet';
import morgan from 'morgan';
import rateLimit from 'express-rate-limit';
import { config } from './config/env';
import { logger } from './utils/logger';
import documentRoutes from './routes/documents';
import vectorRoutes from './routes/vector';
import monitoringRoutes from './routes/monitoring';
import auditRoutes from './routes/documentAudit';
import { jobQueueService } from './services/jobQueueService';
import { errorHandler, correlationIdMiddleware } from './middleware/errorHandler';
import { notFoundHandler } from './middleware/notFoundHandler';
// Start the job queue service for background processing
jobQueueService.start();
Key Initialization Steps:
- Firebase Admin SDK initialization (./config/firebase)
- Express app setup with middleware chain
- Route registration (/documents, /vector, /monitoring, /api/audit)
- Job queue service startup (legacy in-memory queue)
- Firebase Functions export for Cloud deployment
Scheduled Function: processDocumentJobs (backend/src/index.ts:210-267)
- Runs every minute via Firebase Cloud Scheduler
- Processes pending/retrying jobs from database
- Detects and resets stuck jobs
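Stuck-job detection reduces to comparing a job's started_at against a timeout. A hedged sketch of that check (the real logic lives in ProcessingJobModel.resetStuckJobs and is not shown in this document, so the cutoff rule here is an assumption):

```typescript
// Sketch of the stuck-job check; the field names mirror the processing_jobs
// table described in section 6, but the exact cutoff rule is an assumption.
interface JobRow {
  status: string;
  started_at: string | null; // ISO timestamp
}

function isStuck(job: JobRow, timeoutMinutes: number, now: Date = new Date()): boolean {
  if (job.status !== 'processing' || !job.started_at) return false;
  const elapsedMs = now.getTime() - new Date(job.started_at).getTime();
  return elapsedMs > timeoutMinutes * 60_000;
}
```

With JOB_TIMEOUT_MINUTES = 15, any job still marked 'processing' more than 15 minutes after it started would be reset for retry.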
Frontend Entry Point
File: frontend/src/main.tsx
import React from 'react';
import ReactDOM from 'react-dom/client';
import App from './App';
import './index.css';
ReactDOM.createRoot(document.getElementById('root')!).render(
<React.StrictMode>
<App />
</React.StrictMode>
);
Main App Component: frontend/src/App.tsx
- Sets up React Router
- Provides AuthContext
- Renders protected routes and dashboard
3. Request Flow & API Architecture
Request Lifecycle
Client Request
│
▼
┌─────────────────────────────────────┐
│ 1. CORS Middleware │
│ - Validates origin │
│ - Sets CORS headers │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 2. Correlation ID Middleware │
│ - Generates/reads X-Correlation-ID│
│ - Adds to request object │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 3. Firebase Auth Middleware │
│ - Verifies JWT token │
│ - Attaches user to req.user │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 4. Rate Limiting │
│ - 1000 requests per 15 minutes │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 5. Body Parsing │
│ - JSON (10MB limit) │
│ - URL-encoded (10MB limit) │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 6. Route Handler │
│ - Matches route pattern │
│ - Calls controller method │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 7. Controller │
│ - Validates input │
│ - Calls service methods │
│ - Returns response │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 8. Service Layer │
│ - Business logic │
│ - Database operations │
│ - External API calls │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 9. Error Handler (if error) │
│ - Categorizes error │
│ - Logs with correlation ID │
│ - Returns structured response │
└─────────────────────────────────────┘
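The nine steps above are ordinary Express-style middleware composed via next(). A dependency-free sketch of how such a chain threads a request through (simplified req shape for illustration, not the real Express types or the app's actual wiring):

```typescript
// Minimal middleware-chain runner in the Express style; Req is a simplified
// stand-in for the real Express Request type.
type Req = { headers: Record<string, string>; user?: string; correlationId?: string };
type Middleware = (req: Req, next: () => void) => void;

function runChain(req: Req, chain: Middleware[]): void {
  let i = 0;
  const next = () => {
    const mw = chain[i++];
    if (mw) mw(req, next); // each middleware decides whether to continue
  };
  next();
}
```

A middleware that does not call next() (e.g. auth rejecting a request) short-circuits the chain, which is exactly how step 3 stops unauthenticated requests before they reach the route handler.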
Authentication Flow
Middleware: backend/src/middleware/firebaseAuth.ts
export const verifyFirebaseToken = async (
req: FirebaseAuthenticatedRequest,
res: Response,
next: NextFunction
): Promise<void> => {
try {
console.log('🔐 Authentication middleware called for:', req.method, req.url);
console.log('🔐 Request headers:', Object.keys(req.headers));
// Debug Firebase Admin initialization
console.log('🔐 Firebase apps available:', admin.apps.length);
console.log('🔐 Firebase app names:', admin.apps.filter(app => app !== null).map(app => app!.name));
const authHeader = req.headers.authorization;
console.log('🔐 Auth header present:', !!authHeader);
console.log('🔐 Auth header starts with Bearer:', authHeader?.startsWith('Bearer '));
if (!authHeader || !authHeader.startsWith('Bearer ')) {
console.log('❌ No valid authorization header');
res.status(401).json({ error: 'No valid authorization header' });
return;
}
const idToken = authHeader.split('Bearer ')[1];
console.log('🔐 Token extracted, length:', idToken?.length);
if (!idToken) {
console.log('❌ No token provided');
res.status(401).json({ error: 'No token provided' });
return;
}
console.log('🔐 Attempting to verify Firebase ID token...');
console.log('🔐 Token preview:', idToken.substring(0, 20) + '...');
// Verify the Firebase ID token
const decodedToken = await admin.auth().verifyIdToken(idToken, true);
console.log('✅ Token verified successfully for user:', decodedToken.email);
console.log('✅ Token UID:', decodedToken.uid);
console.log('✅ Token issuer:', decodedToken.iss);
// Check if token is expired
const now = Math.floor(Date.now() / 1000);
if (decodedToken.exp && decodedToken.exp < now) {
logger.warn('Token expired for user:', decodedToken.uid);
res.status(401).json({ error: 'Token expired' });
return;
}
req.user = decodedToken;
// Log successful authentication
logger.info('Authenticated request for user:', decodedToken.email);
next();
} catch (error) {
logger.error('Token verification failed:', error);
res.status(401).json({ error: 'Invalid token' });
}
};
Frontend Auth: frontend/src/services/authService.ts
- Manages Firebase Auth state
- Provides token via getToken()
- Axios interceptor adds token to requests
Route Structure
Main Routes (backend/src/routes/documents.ts):
- POST /documents/upload-url - Get signed upload URL
- POST /documents/:id/confirm-upload - Confirm upload and start processing
- GET /documents - List user's documents
- GET /documents/:id - Get document details
- GET /documents/:id/download - Download processed PDF
- GET /documents/analytics - Get processing analytics
- POST /documents/:id/process-optimized-agentic-rag - Trigger AI processing
Middleware Applied:
// Apply authentication and correlation ID to all routes
router.use(verifyFirebaseToken);
router.use(addCorrelationId);
// Add logging middleware for document routes
router.use((req, res, next) => {
console.log(`📄 Document route accessed: ${req.method} ${req.path}`);
next();
});
4. Document Processing Pipeline (Critical Path)
Complete Flow Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ DOCUMENT PROCESSING PIPELINE │
└─────────────────────────────────────────────────────────────────────┘
1. UPLOAD PHASE
┌─────────────────────────────────────────────────────────────┐
│ User selects PDF │
│ ↓ │
│ DocumentUpload component │
│ ↓ │
│ documentService.uploadDocument() │
│ ↓ │
│ POST /documents/upload-url │
│ ↓ │
│ documentController.getUploadUrl() │
│ ↓ │
│ DocumentModel.create() → documents table │
│ ↓ │
│ fileStorageService.generateSignedUploadUrl() │
│ ↓ │
│ Direct upload to GCS via signed URL │
│ ↓ │
│ POST /documents/:id/confirm-upload │
└─────────────────────────────────────────────────────────────┘
2. JOB CREATION PHASE
┌─────────────────────────────────────────────────────────────┐
│ documentController.confirmUpload() │
│ ↓ │
│ ProcessingJobModel.create() → processing_jobs table │
│ ↓ │
│ Status: 'pending' │
│ ↓ │
│ Returns 202 Accepted (async processing) │
└─────────────────────────────────────────────────────────────┘
3. JOB PROCESSING PHASE (Background)
┌─────────────────────────────────────────────────────────────┐
│ Scheduled Function: processDocumentJobs (every 1 minute) │
│ OR │
│ Immediate processing via jobProcessorService.processJob() │
│ ↓ │
│ JobProcessorService.processJob() │
│ ↓ │
│ Download file from GCS │
│ ↓ │
│ unifiedDocumentProcessor.processDocument() │
└─────────────────────────────────────────────────────────────┘
4. TEXT EXTRACTION PHASE
┌─────────────────────────────────────────────────────────────┐
│ documentAiProcessor.processDocument() │
│ ↓ │
│ Google Document AI API │
│ ↓ │
│ Extracted text returned │
└─────────────────────────────────────────────────────────────┘
5. CHUNKING & EMBEDDING PHASE
┌─────────────────────────────────────────────────────────────┐
│ optimizedAgenticRAGProcessor.processLargeDocument() │
│ ↓ │
│ createIntelligentChunks() │
│ - Semantic boundary detection │
│ - 4000-char chunks with 200-char overlap │
│ ↓ │
│ processChunksInBatches() │
│ - Batch size: 10 │
│ - Max concurrent: 5 │
│ ↓ │
│ storeChunksOptimized() │
│ ↓ │
│ vectorDatabaseService.storeEmbedding() │
│ - OpenAI embeddings API │
│ - Store in document_chunks table │
└─────────────────────────────────────────────────────────────┘
6. LLM ANALYSIS PHASE
┌─────────────────────────────────────────────────────────────┐
│ generateLLMAnalysisHybrid() │
│ ↓ │
│ llmService.processCIMDocument() │
│ ↓ │
│ Vector search for relevant chunks │
│ ↓ │
│ Claude/OpenAI API call with structured prompt │
│ ↓ │
│ Parse and validate CIMReview JSON │
│ ↓ │
│ Return structured analysisData │
└─────────────────────────────────────────────────────────────┘
7. PDF GENERATION PHASE
┌─────────────────────────────────────────────────────────────┐
│ pdfGenerationService.generatePDF() │
│ ↓ │
│ Puppeteer browser instance │
│ ↓ │
│ Render HTML template with analysisData │
│ ↓ │
│ Generate PDF buffer │
│ ↓ │
│ Upload PDF to GCS │
│ ↓ │
│ Update document record with PDF path │
└─────────────────────────────────────────────────────────────┘
8. STATUS UPDATE PHASE
┌─────────────────────────────────────────────────────────────┐
│ DocumentModel.updateById() │
│ - status: 'completed' │
│ - pdf_path: GCS path │
│ ↓ │
│ ProcessingJobModel.markAsCompleted() │
│ ↓ │
│ Frontend polls /documents/:id for status updates │
└─────────────────────────────────────────────────────────────┘
Key Handoff Points
1. Upload to Job Creation
async confirmUpload(req: Request, res: Response): Promise<void> {
// ... validation ...
// Update status to processing
await DocumentModel.updateById(documentId, {
status: 'processing_llm'
});
// Acknowledge the request immediately
res.status(202).json({
message: 'Upload confirmed, processing has started.',
document: document,
status: 'processing'
});
// CRITICAL FIX: Use database-backed job queue
const { ProcessingJobModel } = await import('../models/ProcessingJobModel');
await ProcessingJobModel.create({
document_id: documentId,
user_id: userId,
options: {
fileName: document.original_file_name,
mimeType: 'application/pdf'
}
});
}
2. Job Processing to Document Processing
private async processJob(jobId: string): Promise<{ success: boolean; error?: string }> {
// Get job details
job = await ProcessingJobModel.findById(jobId);
// Mark job as processing
await ProcessingJobModel.markAsProcessing(jobId);
// Download file from GCS
const fileBuffer = await fileStorageService.downloadFile(document.file_path);
// Process document
const result = await unifiedDocumentProcessor.processDocument(
job.document_id,
job.user_id,
fileBuffer.toString('utf-8'), // This will be re-read as buffer
{
fileBuffer,
fileName: job.options?.fileName || 'document.pdf',
mimeType: job.options?.mimeType || 'application/pdf'
}
);
}
3. Document Processing to Text Extraction
async processDocument(
documentId: string,
userId: string,
fileBuffer: Buffer,
fileName: string,
mimeType: string
): Promise<ProcessingResult> {
// Step 1: Extract text using Document AI or fallback
const extractedText = await this.extractTextFromDocument(fileBuffer, fileName, mimeType);
// Step 2: Process extracted text through Agentic RAG
const agenticRagResult = await this.processWithAgenticRAG(documentId, extractedText);
}
4. Text to Chunking
async processLargeDocument(
documentId: string,
text: string,
options: {
enableSemanticChunking?: boolean;
enableMetadataEnrichment?: boolean;
similarityThreshold?: number;
} = {}
): Promise<ProcessingResult> {
// Step 1: Create intelligent chunks with semantic boundaries
const chunks = await this.createIntelligentChunks(text, documentId, options.enableSemanticChunking);
// Step 2: Process chunks in batches to manage memory
const processedChunks = await this.processChunksInBatches(chunks, documentId, options);
// Step 3: Store chunks with optimized batching
const embeddingApiCalls = await this.storeChunksOptimized(processedChunks, documentId);
// Step 4: Generate LLM analysis using HYBRID approach
const llmResult = await this.generateLLMAnalysisHybrid(documentId, text, processedChunks);
}
5. Core Services Deep Dive
5.1 UnifiedDocumentProcessor
File: backend/src/services/unifiedDocumentProcessor.ts
Purpose: Main orchestrator for document processing strategies
Key Method:
async processDocument(
documentId: string,
userId: string,
text: string,
options: any = {}
): Promise<ProcessingResult> {
const strategy = options.strategy || 'document_ai_agentic_rag';
logger.info('Processing document with unified processor', {
documentId,
strategy,
textLength: text.length
});
// Only support document_ai_agentic_rag strategy
if (strategy === 'document_ai_agentic_rag') {
return await this.processWithDocumentAiAgenticRag(documentId, userId, text, options);
} else {
throw new Error(`Unsupported processing strategy: ${strategy}. Only 'document_ai_agentic_rag' is supported.`);
}
}
Dependencies:
- documentAiProcessor - Text extraction
- optimizedAgenticRAGProcessor - AI processing
- llmService - LLM interactions
- pdfGenerationService - PDF generation
Error Handling: Wraps errors with detailed context, validates analysisData presence
5.2 OptimizedAgenticRAGProcessor
File: backend/src/services/optimizedAgenticRAGProcessor.ts (1885 lines)
Purpose: Core AI processing engine for chunking, embeddings, and LLM analysis
Key Configuration:
private readonly maxChunkSize = 4000; // Optimal chunk size for embeddings
private readonly overlapSize = 200; // Overlap between chunks
private readonly maxConcurrentEmbeddings = 5; // Limit concurrent API calls
private readonly batchSize = 10; // Process chunks in batches
Key Methods:
- processLargeDocument() - Main entry point
- createIntelligentChunks() - Semantic chunking with boundary detection
- processChunksInBatches() - Batch processing for memory efficiency
- storeChunksOptimized() - Embedding generation and storage
- generateLLMAnalysisHybrid() - LLM analysis with vector search
Performance Optimizations:
- Semantic boundary detection (paragraphs, sections)
- Batch processing to limit memory usage
- Concurrent embedding generation (max 5)
- Vector search with document_id filtering
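The 4000/200 chunking configuration can be sketched as follows. This is a simplified character-based version for illustration: the real createIntelligentChunks additionally snaps boundaries to paragraphs and sections, which is omitted here.

```typescript
// Character-based chunking with overlap; the production version additionally
// performs semantic boundary detection (paragraphs, sections).
function chunkWithOverlap(text: string, maxChunkSize = 4000, overlapSize = 200): string[] {
  const chunks: string[] = [];
  const step = maxChunkSize - overlapSize; // each chunk starts 3800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}
```

The 200-char overlap means each chunk repeats the tail of the previous one, so sentences spanning a boundary appear whole in at least one chunk.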
5.3 JobProcessorService
File: backend/src/services/jobProcessorService.ts
Purpose: Database-backed job processor (replaces legacy in-memory queue)
Key Method:
async processJobs(): Promise<{
processed: number;
succeeded: number;
failed: number;
skipped: number;
}> {
// Prevent concurrent processing runs
if (this.isProcessing) {
logger.info('Job processor already running, skipping this run');
return { processed: 0, succeeded: 0, failed: 0, skipped: 0 };
}
this.isProcessing = true;
const stats = { processed: 0, succeeded: 0, failed: 0, skipped: 0 };
try {
// Reset stuck jobs first
const resetCount = await ProcessingJobModel.resetStuckJobs(this.JOB_TIMEOUT_MINUTES);
// Get pending jobs
const pendingJobs = await ProcessingJobModel.getPendingJobs(this.MAX_CONCURRENT_JOBS);
// Get retrying jobs
const retryingJobs = await ProcessingJobModel.getRetryableJobs(
Math.max(0, this.MAX_CONCURRENT_JOBS - pendingJobs.length)
);
const allJobs = [...pendingJobs, ...retryingJobs];
// Process jobs in parallel (up to MAX_CONCURRENT_JOBS)
const results = await Promise.allSettled(
allJobs.map((job) => this.processJob(job.id))
);
Configuration:
- MAX_CONCURRENT_JOBS = 3
- JOB_TIMEOUT_MINUTES = 15
Features:
- Stuck job detection and recovery
- Retry logic with exponential backoff
- Parallel processing with concurrency limit
- Database-backed state management
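Retry with exponential backoff typically doubles the delay per attempt up to a cap. A hedged sketch of the delay schedule (the base delay and cap here are illustrative, not the service's actual configuration):

```typescript
// Exponential backoff with a cap; baseDelayMs and maxDelayMs are illustrative
// defaults, not values read from jobProcessorService.
function retryDelayMs(attempt: number, baseDelayMs = 1_000, maxDelayMs = 60_000): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}
```

With max_attempts = 3 (the processing_jobs default), a job would be retried at roughly 1s, 2s, and 4s delays before being marked failed.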
5.4 VectorDatabaseService
File: backend/src/services/vectorDatabaseService.ts
Purpose: Vector embeddings and similarity search
Key Method - Vector Search:
async searchSimilar(
embedding: number[],
limit: number = 10,
threshold: number = 0.7,
documentId?: string
): Promise<VectorSearchResult[]> {
try {
if (this.provider === 'supabase') {
// Use optimized Supabase vector search function with document_id filtering
// This prevents timeouts by only searching within a specific document
const rpcParams: any = {
query_embedding: embedding,
match_threshold: threshold,
match_count: limit
};
// Add document_id filter if provided (critical for performance)
if (documentId) {
rpcParams.filter_document_id = documentId;
}
// Set a timeout for the RPC call (10 seconds)
const searchPromise = this.supabaseClient
.rpc('match_document_chunks', rpcParams);
const timeoutPromise = new Promise<{ data: null; error: { message: string } }>((_, reject) => {
setTimeout(() => reject(new Error('Vector search timeout after 10s')), 10000);
});
let result: any;
try {
result = await Promise.race([searchPromise, timeoutPromise]);
} catch (timeoutError: any) {
if (timeoutError.message?.includes('timeout')) {
logger.error('Vector search timed out', { documentId, timeout: '10s' });
throw new Error('Vector search timeout after 10s');
}
throw timeoutError;
}
Critical Optimization: Always pass documentId to filter search scope and prevent timeouts
SQL Function: backend/sql/fix_vector_search_timeout.sql
CREATE OR REPLACE FUNCTION match_document_chunks (
query_embedding vector(1536),
match_threshold float,
match_count int,
filter_document_id text DEFAULT NULL
)
RETURNS TABLE (
id UUID,
document_id TEXT,
content text,
metadata JSONB,
chunk_index INT,
similarity float
)
LANGUAGE sql STABLE
AS $$
SELECT
document_chunks.id,
document_chunks.document_id,
document_chunks.content,
document_chunks.metadata,
document_chunks.chunk_index,
1 - (document_chunks.embedding <=> query_embedding) AS similarity
FROM document_chunks
WHERE document_chunks.embedding IS NOT NULL
AND (filter_document_id IS NULL OR document_chunks.document_id = filter_document_id)
AND 1 - (document_chunks.embedding <=> query_embedding) > match_threshold
ORDER BY document_chunks.embedding <=> query_embedding
LIMIT match_count;
$$;
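The SQL above scores chunks by cosine similarity (pgvector's `<=>` is cosine distance, so similarity is `1 - distance`). The same scoring in TypeScript, useful for sanity-checking thresholds locally; this is a reimplementation for illustration, not code from the repo:

```typescript
// Cosine similarity matching pgvector's `1 - (a <=> b)` scoring; assumes
// non-zero vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory equivalent of match_document_chunks: filter by threshold,
// order by similarity, limit to match_count.
function matchChunks<T extends { embedding: number[] }>(
  chunks: T[], query: number[], threshold: number, count: number
): (T & { similarity: number })[] {
  return chunks
    .map((c) => ({ ...c, similarity: cosineSimilarity(c.embedding, query) }))
    .filter((c) => c.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, count);
}
```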
5.5 LLMService
File: backend/src/services/llmService.ts
Purpose: LLM interactions (Claude/OpenAI/OpenRouter)
Provider Selection:
constructor() {
// Read provider from config (supports openrouter, anthropic, openai)
this.provider = config.llm.provider;
// CRITICAL: If provider is not set correctly, log and use fallback
if (!this.provider || (this.provider !== 'openrouter' && this.provider !== 'anthropic' && this.provider !== 'openai')) {
logger.error('LLM provider is invalid or not set', {
provider: this.provider,
configProvider: config.llm.provider,
processEnvProvider: process.env['LLM_PROVIDER'],
defaultingTo: 'anthropic'
});
this.provider = 'anthropic'; // Fallback
}
// Set API key based on provider
if (this.provider === 'openai') {
this.apiKey = config.llm.openaiApiKey!;
} else if (this.provider === 'openrouter') {
// OpenRouter: Use OpenRouter key if provided, otherwise use Anthropic key for BYOK
this.apiKey = config.llm.openrouterApiKey || config.llm.anthropicApiKey!;
} else {
this.apiKey = config.llm.anthropicApiKey!;
}
// Use configured model instead of hardcoded value
this.defaultModel = config.llm.model;
this.maxTokens = config.llm.maxTokens;
this.temperature = config.llm.temperature;
}
Key Method:
async processCIMDocument(text: string, template: string, analysis?: Record<string, any>): Promise<CIMAnalysisResult> {
// Check and truncate text if it exceeds maxInputTokens
const maxInputTokens = config.llm.maxInputTokens || 200000;
const systemPromptTokens = this.estimateTokenCount(this.getCIMSystemPrompt());
const templateTokens = this.estimateTokenCount(template);
const promptBuffer = config.llm.promptBuffer || 1000;
// Calculate available tokens for document text
const reservedTokens = systemPromptTokens + templateTokens + promptBuffer + (config.llm.maxTokens || 16000);
const availableTokens = maxInputTokens - reservedTokens;
const textTokens = this.estimateTokenCount(text);
let processedText = text;
let wasTruncated = false;
if (textTokens > availableTokens) {
logger.warn('Document text exceeds token limit, truncating', {
textTokens,
availableTokens,
maxInputTokens,
reservedTokens,
truncationRatio: (availableTokens / textTokens * 100).toFixed(1) + '%'
});
processedText = this.truncateText(text, availableTokens);
wasTruncated = true;
}
Features:
- Automatic token counting and truncation
- Model selection based on task complexity
- JSON schema validation with Zod
- Retry logic with exponential backoff
- Cost tracking
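The token budgeting in processCIMDocument can be sketched with the common ~4-characters-per-token heuristic. The actual estimateTokenCount implementation is not shown in this document, so the ratio and truncation method below are assumptions:

```typescript
// Rough token estimate (~4 chars/token) and budget-aware truncation; the
// chars-per-token ratio is a heuristic, not the service's exact method.
function estimateTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Mirrors the reservation arithmetic from processCIMDocument above.
function availableDocumentTokens(opts: {
  maxInputTokens: number; systemPromptTokens: number;
  templateTokens: number; promptBuffer: number; maxOutputTokens: number;
}): number {
  const reserved = opts.systemPromptTokens + opts.templateTokens +
    opts.promptBuffer + opts.maxOutputTokens;
  return opts.maxInputTokens - reserved;
}
```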
5.6 DocumentAiProcessor
File: backend/src/services/documentAiProcessor.ts
Purpose: Google Document AI integration for text extraction
Key Method:
async processDocument(
documentId: string,
userId: string,
fileBuffer: Buffer,
fileName: string,
mimeType: string
): Promise<ProcessingResult> {
const startTime = Date.now();
try {
logger.info('Starting Document AI + Agentic RAG processing', {
documentId,
userId,
fileName,
fileSize: fileBuffer.length,
mimeType
});
// Step 1: Extract text using Document AI or fallback
const extractedText = await this.extractTextFromDocument(fileBuffer, fileName, mimeType);
if (!extractedText) {
throw new Error('Failed to extract text from document');
}
logger.info('Text extraction completed', {
textLength: extractedText.length
});
// Step 2: Process extracted text through Agentic RAG
const agenticRagResult = await this.processWithAgenticRAG(documentId, extractedText);
const processingTime = Date.now() - startTime;
return {
success: true,
content: agenticRagResult.summary || extractedText,
metadata: {
processingStrategy: 'document_ai_agentic_rag',
processingTime,
extractedTextLength: extractedText.length,
agenticRagResult,
fileSize: fileBuffer.length,
fileName,
mimeType
}
};
Fallback Strategy: Uses pdf-parse if Document AI fails
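The Document-AI-with-pdf-parse-fallback pattern generalizes to a small helper. A sketch under the assumption that both extractors return the same result type; the repo's actual extractTextFromDocument may be structured differently:

```typescript
// Generic primary-with-fallback runner; the validate predicate rejects empty
// results so a silently empty primary response also triggers the fallback.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  validate: (result: T) => boolean = () => true
): Promise<T> {
  try {
    const result = await primary();
    if (validate(result)) return result;
  } catch {
    // primary threw; fall through to fallback
  }
  return fallback();
}
```

For text extraction this would be called roughly as `withFallback(documentAiExtract, pdfParseExtract, (t) => t.length > 0)` (names hypothetical).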
5.7 PDFGenerationService
File: backend/src/services/pdfGenerationService.ts
Purpose: PDF generation using Puppeteer
Key Features:
- Page pooling for performance
- Caching for repeated requests
- Browser instance reuse
- Fallback to PDFKit if Puppeteer fails
Configuration:
class PDFGenerationService {
private browser: any = null;
private pagePool: PagePool[] = [];
private readonly maxPoolSize = 5;
private readonly pageTimeout = 30000; // 30 seconds
private readonly cache = new Map<string, { buffer: Buffer; timestamp: number }>();
private readonly cacheTimeout = 300000; // 5 minutes
private readonly defaultOptions: PDFGenerationOptions = {
format: 'A4',
margin: {
top: '1in',
right: '1in',
bottom: '1in',
left: '1in',
},
displayHeaderFooter: true,
printBackground: true,
quality: 'high',
timeout: 30000,
};
5.8 FileStorageService
File: backend/src/services/fileStorageService.ts
Purpose: Google Cloud Storage operations
Key Methods:
- generateSignedUploadUrl() - Generate signed URL for direct upload
- downloadFile() - Download file from GCS
- saveBuffer() - Save buffer to GCS
- deleteFile() - Delete file from GCS
Credential Handling:
constructor() {
this.bucketName = config.googleCloud.gcsBucketName;
// Check if we're in Firebase Functions/Cloud Run environment
const isCloudEnvironment = process.env.FUNCTION_TARGET ||
process.env.FUNCTION_NAME ||
process.env.K_SERVICE ||
process.env.GOOGLE_CLOUD_PROJECT ||
!!process.env.GCLOUD_PROJECT ||
process.env.X_GOOGLE_GCLOUD_PROJECT;
// Initialize Google Cloud Storage
const storageConfig: any = {
projectId: config.googleCloud.projectId,
};
// Only use keyFilename in local development
// In Firebase Functions/Cloud Run, use Application Default Credentials
if (isCloudEnvironment) {
// In cloud, ALWAYS clear GOOGLE_APPLICATION_CREDENTIALS to force use of ADC
// Firebase Functions automatically provides credentials via metadata service
// These credentials have signing capabilities for generating signed URLs
const originalCreds = process.env.GOOGLE_APPLICATION_CREDENTIALS;
if (originalCreds) {
delete process.env.GOOGLE_APPLICATION_CREDENTIALS;
logger.info('Using Application Default Credentials for GCS (cloud environment)', {
clearedEnvVar: 'GOOGLE_APPLICATION_CREDENTIALS',
originalValue: originalCreds,
projectId: config.googleCloud.projectId
});
}
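The environment check in the constructor reduces to a pure predicate over process.env. Factored out here for testability; this sketch mirrors the variables checked above but is not the repo's own code:

```typescript
// Pure version of the cloud-environment check from the constructor; takes the
// env record as a parameter so it can be tested without mutating process.env.
function isCloudEnvironment(env: Record<string, string | undefined>): boolean {
  return Boolean(
    env.FUNCTION_TARGET ||        // Firebase Functions
    env.FUNCTION_NAME ||
    env.K_SERVICE ||              // Cloud Run
    env.GOOGLE_CLOUD_PROJECT ||
    env.GCLOUD_PROJECT ||
    env.X_GOOGLE_GCLOUD_PROJECT
  );
}
```

When it returns true, the service deletes GOOGLE_APPLICATION_CREDENTIALS so the Storage client falls back to Application Default Credentials from the metadata service, which can sign URLs.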
6. Data Models & Database Schema
Core Models
DocumentModel (backend/src/models/DocumentModel.ts):
- create() - Create document record
- findById() - Get document by ID
- updateById() - Update document status/metadata
- findByUserId() - List user's documents
ProcessingJobModel (backend/src/models/ProcessingJobModel.ts):
- create() - Create processing job (uses direct PostgreSQL to bypass PostgREST cache)
- findById() - Get job by ID
- getPendingJobs() - Get pending jobs (limit by concurrency)
- getRetryableJobs() - Get jobs ready for retry
- markAsProcessing() - Update job status
- markAsCompleted() - Mark job complete
- markAsFailed() - Mark job failed with error
- resetStuckJobs() - Reset jobs stuck in processing
VectorDatabaseModel (backend/src/models/VectorDatabaseModel.ts):
- Chunk storage and retrieval
- Embedding management
Database Tables
documents:
- id (UUID, primary key)
- user_id (UUID, foreign key)
- original_file_name (text)
- file_path (text, GCS path)
- file_size (bigint)
- status (text: 'uploading', 'uploaded', 'processing_llm', 'completed', 'failed')
- pdf_path (text, GCS path for generated PDF)
- created_at, updated_at (timestamps)
processing_jobs:
- id (UUID, primary key)
- document_id (UUID, foreign key)
- user_id (UUID, foreign key)
- status (text: 'pending', 'processing', 'completed', 'failed', 'retrying')
- attempts (int)
- max_attempts (int, default 3)
- options (JSONB, processing options)
- error (text, error message if failed)
- result (JSONB, processing result)
- created_at, started_at, completed_at, updated_at (timestamps)
document_chunks:
- id (UUID, primary key)
- document_id (text, foreign key)
- content (text)
- embedding (vector(1536))
- metadata (JSONB)
- chunk_index (int)
- created_at, updated_at (timestamps)
agentic_rag_sessions:
- id (UUID, primary key)
- document_id (UUID, foreign key)
- user_id (UUID, foreign key)
- status (text)
- metadata (JSONB)
- created_at, updated_at (timestamps)
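The tables above map naturally onto TypeScript row types. The types below are hand-written from the schema description for illustration; the repo's actual model types may differ:

```typescript
// Row shapes inferred from the schema description above; column names match
// the tables, but these are illustrative types, not the repo's own models.
type DocumentStatus = 'uploading' | 'uploaded' | 'processing_llm' | 'completed' | 'failed';
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed' | 'retrying';

interface DocumentRow {
  id: string;
  user_id: string;
  original_file_name: string;
  file_path: string;
  file_size: number;
  status: DocumentStatus;
  pdf_path: string | null;
  created_at: string;
  updated_at: string;
}

interface ProcessingJobRow {
  id: string;
  document_id: string;
  user_id: string;
  status: JobStatus;
  attempts: number;
  max_attempts: number;
  options: Record<string, unknown>;
  error: string | null;
  result: Record<string, unknown> | null;
}

// A terminal job needs no further scheduling by the job processor.
function isTerminalJobStatus(status: JobStatus): boolean {
  return status === 'completed' || status === 'failed';
}
```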
Vector Search Optimization
Critical SQL Function: match_document_chunks with document_id filtering
(Full definition shown in section 5.4 VectorDatabaseService, from backend/sql/fix_vector_search_timeout.sql.)
Always pass filter_document_id to prevent timeouts when searching across all documents.
7. Component Handoffs & Integration Points
Frontend ↔ Backend
Axios Interceptor (frontend/src/services/documentService.ts):
export const apiClient = axios.create({
baseURL: API_BASE_URL,
timeout: 300000, // 5 minutes
});
// Add auth token to requests
apiClient.interceptors.request.use(async (config) => {
const token = await authService.getToken();
if (token) {
config.headers.Authorization = `Bearer ${token}`;
}
return config;
});
// Handle auth errors with retry logic
apiClient.interceptors.response.use(
(response) => response,
async (error) => {
const originalRequest = error.config;
if (error.response?.status === 401 && !originalRequest._retry) {
originalRequest._retry = true;
try {
// Attempt to refresh the token
const newToken = await authService.getToken();
if (newToken) {
// Retry the original request with the new token
originalRequest.headers.Authorization = `Bearer ${newToken}`;
return apiClient(originalRequest);
}
} catch (refreshError) {
console.error('Token refresh failed:', refreshError);
}
// If token refresh fails, logout the user
authService.logout();
window.location.href = '/login';
}
return Promise.reject(error);
}
);
Backend ↔ Database
Two Connection Methods:
- Supabase Client (default for most operations):
import { getSupabaseServiceClient } from '../config/supabase';
const supabase = getSupabaseServiceClient();
- Direct PostgreSQL (for critical operations, bypasses PostgREST cache):
static async create(data: CreateProcessingJobData): Promise<ProcessingJob> {
try {
// Use direct PostgreSQL connection to bypass PostgREST cache
// This is critical because PostgREST cache issues can block entire processing pipeline
const pool = getPostgresPool();
const result = await pool.query(
`INSERT INTO processing_jobs (
document_id, user_id, status, attempts, max_attempts, options, created_at
) VALUES ($1, $2, $3, $4, $5, $6, $7)
RETURNING *`,
[
data.document_id,
data.user_id,
'pending',
0,
data.max_attempts || 3,
JSON.stringify(data.options || {}),
new Date().toISOString()
]
);
if (result.rows.length === 0) {
throw new Error('Failed to create processing job: No data returned');
}
const job = result.rows[0];
logger.info('Processing job created via direct PostgreSQL', {
jobId: job.id,
documentId: data.document_id,
userId: data.user_id,
});
    return job;
  } catch (error) {
    // ... error logging and re-throw ...
    throw error;
  }
}
Backend ↔ GCS
Signed URL Generation:
const uploadUrl = await fileStorageService.generateSignedUploadUrl(filePath, contentType);
Direct Upload (frontend):
const fetchPromise = fetch(uploadUrl, {
method: 'PUT',
headers: {
'Content-Type': contentType, // Must match exactly what was used in signed URL generation
},
body: file,
signal: signal,
});
File Download (for processing):
const fileBuffer = await fileStorageService.downloadFile(document.file_path);
Backend ↔ Document AI
Text Extraction:
private async extractTextFromDocument(fileBuffer: Buffer, fileName: string, mimeType: string): Promise<string> {
try {
// Check document size first
// ... size validation ...
// Upload to GCS for Document AI processing
const gcsFileName = `temp/${Date.now()}_${fileName}`;
await this.storageClient.bucket(this.gcsBucketName).file(gcsFileName).save(fileBuffer);
// Process with Document AI
const request = {
name: this.processorName,
rawDocument: {
gcsSource: {
uri: `gs://${this.gcsBucketName}/${gcsFileName}`
},
mimeType: mimeType
}
};
const [result] = await this.documentAiClient.processDocument(request);
// Extract text from result
const text = result.document?.text || '';
// Clean up temp file
await this.storageClient.bucket(this.gcsBucketName).file(gcsFileName).delete();
return text;
} catch (error) {
// Fallback to pdf-parse
logger.warn('Document AI failed, using pdf-parse fallback', { error });
const data = await pdf(fileBuffer);
return data.text;
}
}
Backend ↔ LLM APIs
Provider Selection (Claude/OpenAI/OpenRouter):
- Configured via the `LLM_PROVIDER` environment variable
- Automatic API key selection based on provider
- Model selection based on task complexity
Request Flow:
// 1. Token counting and truncation
const processedText = this.truncateText(text, availableTokens);
// 2. Model selection
const model = this.selectModel(taskComplexity);
// 3. API call with retry logic
const response = await this.callLLMAPI({
prompt: processedText,
systemPrompt: systemPrompt,
model: model,
maxTokens: this.maxTokens,
temperature: this.temperature
});
// 4. JSON parsing and validation
const parsed = JSON.parse(response.content);
const validated = cimReviewSchema.parse(parsed);
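The truncation step above can be sketched as follows. This is a simplified illustration assuming a rough 4-characters-per-token heuristic; the actual service may use a real tokenizer, and `truncateText` here is only a stand-in for the real method:

```typescript
// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Truncate text to a token budget, backing off to the last paragraph
// boundary when one exists past the halfway point of the budget.
function truncateText(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const charBudget = maxTokens * 4;
  const cut = text.slice(0, charBudget);
  const lastBreak = cut.lastIndexOf("\n\n");
  return lastBreak > charBudget * 0.5 ? cut.slice(0, lastBreak) : cut;
}
```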
Services ↔ Services
Event-Driven Patterns:
- `jobQueueService` emits events: `job:added`, `job:started`, `job:completed`, `job:failed`
- `uploadMonitoringService` tracks upload events
Direct Method Calls:
- Most service interactions are direct method calls
- Services are exported as singletons for easy access
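A minimal sketch of the event pattern used by the legacy queue, built on Node's `EventEmitter` (the payload shape is illustrative, not the actual service's):

```typescript
import { EventEmitter } from "node:events";

// Event names match those listed above for jobQueueService.
type JobEvent = "job:added" | "job:started" | "job:completed" | "job:failed";

class JobQueueEvents extends EventEmitter {
  // Emit a job lifecycle event with a small illustrative payload.
  emitJob(event: JobEvent, jobId: string): void {
    this.emit(event, { jobId, at: new Date().toISOString() });
  }
}

const queueEvents = new JobQueueEvents();
const seen: string[] = [];
queueEvents.on("job:completed", ({ jobId }) => seen.push(jobId));
queueEvents.emitJob("job:completed", "job-123");
```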
8. Error Handling & Resilience
Error Propagation Path
Service Method
│
▼ (throws error)
Controller
│
▼ (catches, logs, re-throws)
Express Error Handler
│
▼ (categorizes, logs, responds)
Client (structured error response)
Error Categories
File: backend/src/middleware/errorHandler.ts
export enum ErrorCategory {
VALIDATION = 'validation',
AUTHENTICATION = 'authentication',
AUTHORIZATION = 'authorization',
NOT_FOUND = 'not_found',
EXTERNAL_SERVICE = 'external_service',
PROCESSING = 'processing',
SYSTEM = 'system',
DATABASE = 'database'
}
Error Response Structure:
export interface ErrorResponse {
success: false;
error: {
code: string;
message: string;
details?: any;
correlationId: string;
timestamp: string;
retryable: boolean;
};
}
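As an illustration of how such a response might be assembled, here is a hypothetical builder (the interface is re-declared for self-containment; the mapping of categories to `retryable` is an assumption for illustration, not the actual logic in `errorHandler.ts`):

```typescript
// Assumed mapping: transient failures are retryable, client errors are not.
const RETRYABLE_CATEGORIES = new Set(["external_service", "processing", "database"]);

interface ErrorResponse {
  success: false;
  error: {
    code: string;
    message: string;
    details?: unknown;
    correlationId: string;
    timestamp: string;
    retryable: boolean;
  };
}

function buildErrorResponse(
  category: string,
  code: string,
  message: string,
  correlationId: string
): ErrorResponse {
  return {
    success: false,
    error: {
      code,
      message,
      correlationId,
      timestamp: new Date().toISOString(),
      retryable: RETRYABLE_CATEGORIES.has(category),
    },
  };
}
```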
Retry Mechanisms
1. Job Retries:
- Max attempts: 3 (configurable per job)
- Exponential backoff between retries
- Jobs are marked with `retrying` status
2. API Retries:
- LLM API calls: 3 retries with exponential backoff
- Document AI: Fallback to pdf-parse
- Vector search: 10-second timeout, fallback to direct query
3. Database Retries:
private static async retryOperation<T>(
operation: () => Promise<T>,
operationName: string,
maxRetries: number = 3,
baseDelay: number = 1000
): Promise<T> {
let lastError: any;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await operation();
} catch (error: any) {
lastError = error;
const isNetworkError = error?.message?.includes('fetch failed') ||
error?.message?.includes('ENOTFOUND') ||
error?.message?.includes('ECONNREFUSED') ||
error?.message?.includes('ETIMEDOUT') ||
error?.name === 'TypeError';
if (!isNetworkError || attempt === maxRetries) {
throw error;
}
const delay = baseDelay * Math.pow(2, attempt - 1);
logger.warn(`${operationName} failed (attempt ${attempt}/${maxRetries}), retrying in ${delay}ms`, {
error: error?.message || String(error),
code: error?.code,
attempt,
maxRetries
});
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw lastError;
}
Timeout Handling
Vector Search Timeout:
// Set a timeout for the RPC call (10 seconds)
const searchPromise = this.supabaseClient
.rpc('match_document_chunks', rpcParams);
const timeoutPromise = new Promise<{ data: null; error: { message: string } }>((_, reject) => {
setTimeout(() => reject(new Error('Vector search timeout after 10s')), 10000);
});
let result: any;
try {
result = await Promise.race([searchPromise, timeoutPromise]);
} catch (timeoutError: any) {
if (timeoutError.message?.includes('timeout')) {
logger.error('Vector search timed out', { documentId, timeout: '10s' });
throw new Error('Vector search timeout after 10s');
}
throw timeoutError;
}
LLM API Timeout: Handled by axios timeout configuration
Job Timeout: 15 minutes, jobs stuck longer are reset
Stuck Job Detection and Recovery
// Reset stuck jobs first
const resetCount = await ProcessingJobModel.resetStuckJobs(this.JOB_TIMEOUT_MINUTES);
if (resetCount > 0) {
logger.info('Reset stuck jobs', { count: resetCount });
}
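The stuck-job cutoff can be sketched as a simple predicate (field names mirror the `processing_jobs` schema; the helper itself is illustrative, not the actual model method):

```typescript
interface JobRow {
  status: string;
  started_at: string | null;
}

// A job is "stuck" if it has been in 'processing' longer than the timeout.
function isStuck(job: JobRow, timeoutMinutes: number, now: Date = new Date()): boolean {
  if (job.status !== "processing" || !job.started_at) return false;
  const ageMinutes = (now.getTime() - new Date(job.started_at).getTime()) / 60000;
  return ageMinutes > timeoutMinutes;
}
```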
Scheduled Function Monitoring:
// Check for jobs stuck in processing status
const stuckProcessingJobs = await ProcessingJobModel.getStuckJobs(15); // Jobs stuck > 15 minutes
if (stuckProcessingJobs.length > 0) {
logger.warn('Found stuck processing jobs', {
count: stuckProcessingJobs.length,
jobIds: stuckProcessingJobs.map(j => j.id),
timestamp: new Date().toISOString(),
});
}
// Check for jobs stuck in pending status (alert if > 2 minutes)
const stuckPendingJobs = await ProcessingJobModel.getStuckPendingJobs(2); // Jobs pending > 2 minutes
if (stuckPendingJobs.length > 0) {
logger.warn('Found stuck pending jobs (may indicate processing issues)', {
count: stuckPendingJobs.length,
jobIds: stuckPendingJobs.map(j => j.id),
oldestJobAge: stuckPendingJobs[0] ? Math.round((Date.now() - new Date(stuckPendingJobs[0].created_at).getTime()) / 1000 / 60) : 0,
timestamp: new Date().toISOString(),
});
}
Graceful Degradation
Document AI Failure: Falls back to pdf-parse library
Vector Search Failure: Falls back to direct database query without similarity calculation
LLM API Failure: Returns error with retryable flag, job can be retried
PDF Generation Failure: Falls back to PDFKit if Puppeteer fails
9. Performance Optimization Points
Vector Search Optimization
Critical: Always pass the `document_id` filter to prevent timeouts
// Add document_id filter if provided (critical for performance)
if (documentId) {
rpcParams.filter_document_id = documentId;
}
SQL Function Optimization: `match_document_chunks` filters by `document_id` first before vector similarity calculation
Chunking Strategy
Optimal Configuration:
private readonly maxChunkSize = 4000; // Optimal chunk size for embeddings
private readonly overlapSize = 200; // Overlap between chunks
private readonly maxConcurrentEmbeddings = 5; // Limit concurrent API calls
private readonly batchSize = 10; // Process chunks in batches
Semantic Chunking: Detects paragraph and section boundaries for better chunk quality
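The configuration above implies a sliding window with overlap. A naive fixed-window sketch of the idea (the real implementation additionally snaps chunks to paragraph and section boundaries; sizes here are in characters):

```typescript
// Split text into overlapping windows: each chunk is at most maxChunkSize
// characters and shares overlapSize characters with the previous chunk.
function chunkText(text: string, maxChunkSize = 4000, overlapSize = 200): string[] {
  const chunks: string[] = [];
  const step = maxChunkSize - overlapSize;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.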
Batch Processing
Embedding Generation:
- Processes chunks in batches of 10
- Max 5 concurrent embedding API calls
- Prevents memory overflow and API rate limiting
Chunk Storage:
- Batched database inserts
- Reduces database round trips
Memory Management
Chunk Processing:
- Processes chunks in batches to limit memory usage
- Cleans up processed chunks from memory after storage
PDF Generation:
- Page pooling (max 5 pages)
- Page timeout (30 seconds)
- Cache with 5-minute TTL
Database Optimization
Direct PostgreSQL for Critical Operations:
- Job creation uses direct PostgreSQL to bypass PostgREST cache issues
- Ensures reliable job creation even when PostgREST schema cache is stale
Connection Pooling:
- Supabase client uses connection pooling
- Direct PostgreSQL uses pg pool
API Call Optimization
LLM Token Management:
- Automatic token counting
- Text truncation if exceeds limits
- Model selection based on complexity (smaller models for simpler tasks)
Embedding Caching:
private semanticCache: Map<string, { embedding: number[]; timestamp: number }> = new Map();
private readonly CACHE_TTL = 3600000; // 1 hour cache TTL
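The cache pattern above (value plus timestamp, with TTL checked on read) can be sketched as a small generic wrapper; this is illustrative, not the actual service code:

```typescript
// Map-backed cache that lazily evicts entries older than ttlMs on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; timestamp: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now - entry.timestamp > this.ttlMs) {
      this.store.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, timestamp: now });
  }
}
```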
10. Background Processing Architecture
Legacy vs Current System
Legacy: In-Memory Queue (jobQueueService)
- EventEmitter-based
- In-memory job storage
- Still initialized but being phased out
- Location: `backend/src/services/jobQueueService.ts`
Current: Database-Backed Queue (jobProcessorService)
- Database-backed job storage
- Scheduled processing via Firebase Cloud Scheduler
- Location: `backend/src/services/jobProcessorService.ts`
Job Processing Flow
Job Creation
│
▼
ProcessingJobModel.create()
│
▼
Status: 'pending' in database
│
▼
Scheduled Function (every 1 minute)
OR
Immediate processing via API
│
▼
JobProcessorService.processJobs()
│
▼
Get pending/retrying jobs (max 3 concurrent)
│
▼
Process jobs in parallel
│
▼
For each job:
- Mark as 'processing'
- Download file from GCS
- Call unifiedDocumentProcessor
- Update document status
- Mark job as 'completed' or 'failed'
Scheduled Function
File: backend/src/index.ts
export const processDocumentJobs = onSchedule({
schedule: 'every 1 minutes', // Minimum interval for Firebase Cloud Scheduler
timeoutSeconds: 900, // 15 minutes (max for Gen2 scheduled functions)
memory: '1GiB',
retryCount: 2, // Retry up to 2 times on failure
}, async (event) => {
logger.info('Processing document jobs scheduled function triggered', {
timestamp: new Date().toISOString(),
scheduleTime: event.scheduleTime,
});
try {
const { jobProcessorService } = await import('./services/jobProcessorService');
// Check for stuck jobs before processing (monitoring)
const { ProcessingJobModel } = await import('./models/ProcessingJobModel');
// Check for jobs stuck in processing status
const stuckProcessingJobs = await ProcessingJobModel.getStuckJobs(15); // Jobs stuck > 15 minutes
if (stuckProcessingJobs.length > 0) {
logger.warn('Found stuck processing jobs', {
count: stuckProcessingJobs.length,
jobIds: stuckProcessingJobs.map(j => j.id),
timestamp: new Date().toISOString(),
});
}
// Check for jobs stuck in pending status (alert if > 2 minutes)
const stuckPendingJobs = await ProcessingJobModel.getStuckPendingJobs(2); // Jobs pending > 2 minutes
if (stuckPendingJobs.length > 0) {
logger.warn('Found stuck pending jobs (may indicate processing issues)', {
count: stuckPendingJobs.length,
jobIds: stuckPendingJobs.map(j => j.id),
oldestJobAge: stuckPendingJobs[0] ? Math.round((Date.now() - new Date(stuckPendingJobs[0].created_at).getTime()) / 1000 / 60) : 0,
timestamp: new Date().toISOString(),
});
}
const result = await jobProcessorService.processJobs();
logger.info('Document jobs processing completed', {
...result,
timestamp: new Date().toISOString(),
});
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
const errorStack = error instanceof Error ? error.stack : undefined;
logger.error('Error processing document jobs', {
error: errorMessage,
stack: errorStack,
timestamp: new Date().toISOString(),
});
// Re-throw to trigger retry mechanism (up to retryCount times)
throw error;
}
});
Job States
pending → processing → completed
│ │
│ ▼
│ failed
│ │
└──────────────────────┘
│
▼
retrying
│
▼
(back to pending)
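The diagram above can be encoded as an allowed-transitions map; a small illustrative validator (not the actual model code):

```typescript
// Allowed job state transitions, taken from the state diagram.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  failed: ["retrying"],
  retrying: ["pending"],
  completed: [], // terminal state
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```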
Concurrency Control
Max Concurrent Jobs: 3
private readonly MAX_CONCURRENT_JOBS = 3;
private readonly JOB_TIMEOUT_MINUTES = 15;
Processing Logic:
// Get pending jobs
const pendingJobs = await ProcessingJobModel.getPendingJobs(this.MAX_CONCURRENT_JOBS);
// Get retrying jobs (enabled - schema is updated)
const retryingJobs = await ProcessingJobModel.getRetryableJobs(
Math.max(0, this.MAX_CONCURRENT_JOBS - pendingJobs.length)
);
const allJobs = [...pendingJobs, ...retryingJobs];
if (allJobs.length === 0) {
logger.debug('No jobs to process');
return stats;
}
logger.info('Processing jobs', {
totalJobs: allJobs.length,
pendingJobs: pendingJobs.length,
retryingJobs: retryingJobs.length,
});
// Process jobs in parallel (up to MAX_CONCURRENT_JOBS)
const results = await Promise.allSettled(
allJobs.map((job) => this.processJob(job.id))
);
11. Frontend Architecture
Component Structure
Main Components:
- `DocumentUpload` - File upload with drag-and-drop
- `DocumentList` - List of user's documents with status
- `DocumentViewer` - View processed document and PDF
- `Analytics` - Processing statistics dashboard
- `UploadMonitoringDashboard` - Real-time upload monitoring
State Management
AuthContext (frontend/src/contexts/AuthContext.tsx):
export const AuthProvider: React.FC<AuthProviderProps> = ({ children }) => {
const [user, setUser] = useState<User | null>(null);
const [token, setToken] = useState<string | null>(null);
const [isLoading, setIsLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [isInitialized, setIsInitialized] = useState(false);
useEffect(() => {
setIsLoading(true);
// Listen for Firebase auth state changes
const unsubscribe = authService.onAuthStateChanged(async (firebaseUser) => {
try {
if (firebaseUser) {
const user = authService.getCurrentUser();
const token = await authService.getToken();
setUser(user);
setToken(token);
} else {
setUser(null);
setToken(null);
}
} catch (error) {
console.error('Auth state change error:', error);
setError('Authentication error occurred');
setUser(null);
setToken(null);
} finally {
setIsLoading(false);
setIsInitialized(true);
}
});
// Cleanup subscription on unmount
return () => unsubscribe();
}, []);
API Communication
Document Service (frontend/src/services/documentService.ts):
- Axios client with auth interceptor
- Automatic token refresh on 401 errors
- Progress tracking for uploads
- Error handling with user-friendly messages
Upload Flow:
async uploadDocument(
file: File,
onProgress?: (progress: number) => void,
signal?: AbortSignal
): Promise<Document> {
try {
// Check authentication before upload
const token = await authService.getToken();
if (!token) {
throw new Error('Authentication required. Please log in to upload documents.');
}
// Step 1: Get signed upload URL
onProgress?.(5); // 5% - Getting upload URL
const uploadUrlResponse = await apiClient.post('/documents/upload-url', {
fileName: file.name,
fileSize: file.size,
contentType: contentTypeForSigning
}, { signal });
const { documentId, uploadUrl } = uploadUrlResponse.data;
// Step 2: Upload directly to Firebase Storage
onProgress?.(10); // 10% - Starting direct upload
await this.uploadToFirebaseStorage(
file,
uploadUrl,
contentTypeForSigning,
(uploadProgress) => {
// Map upload progress (10-90%)
const mappedProgress = 10 + (uploadProgress * 0.8);
onProgress?.(mappedProgress);
},
signal
);
// Step 3: Confirm upload
onProgress?.(90); // 90% - Confirming upload
const confirmResponse = await apiClient.post(
`/documents/${documentId}/confirm-upload`,
{},
{ signal }
);
onProgress?.(100); // 100% - Complete
return confirmResponse.data.document;
} catch (error) {
// ... error handling ...
}
}
Real-Time Updates
Polling for Processing Status:
- Frontend polls the `/documents/:id` endpoint
- Updates UI when status changes from 'processing' to 'completed'
- Shows error messages if status is 'failed'
Upload Progress:
- Real-time progress tracking via the `onProgress` callback
- Visual progress bar in the `DocumentUpload` component
12. Configuration & Environment
Environment Variables
File: backend/src/config/env.ts
Key Configuration Categories:
- LLM Provider:
  - `LLM_PROVIDER` - 'anthropic', 'openai', or 'openrouter'
  - `ANTHROPIC_API_KEY` - Claude API key
  - `OPENAI_API_KEY` - OpenAI API key
  - `OPENROUTER_API_KEY` - OpenRouter API key
  - `LLM_MODEL` - Model name (e.g., 'claude-sonnet-4-5-20250929')
  - `LLM_MAX_TOKENS` - Max output tokens
  - `LLM_MAX_INPUT_TOKENS` - Max input tokens (default 200000)
- Database:
  - `SUPABASE_URL` - Supabase project URL
  - `SUPABASE_SERVICE_KEY` - Service role key
  - `SUPABASE_ANON_KEY` - Anonymous key
- Google Cloud:
  - `GCLOUD_PROJECT_ID` - GCP project ID
  - `GCS_BUCKET_NAME` - Storage bucket name
  - `DOCUMENT_AI_PROCESSOR_ID` - Document AI processor ID
  - `DOCUMENT_AI_LOCATION` - Processor location (default 'us')
- Feature Flags:
  - `AGENTIC_RAG_ENABLED` - Enable/disable agentic RAG processing
Configuration Loading
Priority Order:
1. `process.env` (Firebase Functions v2)
2. `functions.config()` (Firebase Functions v1 fallback)
3. `.env` file (local development)
Validation: Joi schema validates all required environment variables
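The priority order can be sketched as a simple fallback chain (function and parameter names are illustrative, not the actual `env.ts` API):

```typescript
// Resolve a config key by priority: process.env wins, then the
// functions.config() fallback, then values loaded from a .env file.
function resolveConfig(
  key: string,
  processEnv: Record<string, string | undefined>,
  functionsConfig: Record<string, string | undefined>,
  dotEnv: Record<string, string | undefined>
): string | undefined {
  return processEnv[key] ?? functionsConfig[key] ?? dotEnv[key];
}
```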
13. Debugging Guide
Key Log Points
Correlation IDs: Every request has a correlation ID for tracing
Structured Logging: Winston logger with structured data
Key Log Locations:
- Request Entry: `backend/src/index.ts` - All incoming requests
- Authentication: `backend/src/middleware/firebaseAuth.ts` - Auth success/failure
- Job Processing: `backend/src/services/jobProcessorService.ts` - Job lifecycle
- Document Processing: `backend/src/services/unifiedDocumentProcessor.ts` - Processing steps
- LLM Calls: `backend/src/services/llmService.ts` - API calls and responses
- Vector Search: `backend/src/services/vectorDatabaseService.ts` - Search operations
- Error Handling: `backend/src/middleware/errorHandler.ts` - All errors with categorization
Common Failure Points
1. Vector Search Timeouts
- Symptom: "Vector search timeout after 10s"
- Cause: Searching across all documents without a `document_id` filter
- Fix: Always pass `documentId` to `vectorDatabaseService.searchSimilar()`
2. LLM API Failures
- Symptom: "LLM API call failed" or "Invalid JSON response"
- Cause: API rate limits, network issues, or invalid response format
- Fix: Check API keys, retry logic, and response validation
3. GCS Upload Failures
- Symptom: "Failed to upload to GCS" or "Signed URL expired"
- Cause: Credential issues, bucket permissions, or URL expiration
- Fix: Check GCS credentials and bucket configuration
4. Job Stuck in Processing
- Symptom: Job status remains 'processing' for > 15 minutes
- Cause: Process crashed, timeout, or error not caught
- Fix: Check logs, reset stuck jobs, investigate error
5. Document AI Failures
- Symptom: "Failed to extract text from document"
- Cause: Document AI API error or invalid file format
- Fix: Check Document AI processor configuration, fallback to pdf-parse
Diagnostic Tools
Health Check Endpoints:
- `GET /health` - Basic health check
- `GET /health/config` - Configuration health
- `GET /health/agentic-rag` - Agentic RAG health status
Monitoring Endpoints:
- `GET /monitoring/upload-metrics` - Upload statistics
- `GET /monitoring/upload-health` - Upload health
- `GET /monitoring/real-time-stats` - Real-time statistics
Database Debugging:
-- Check pending jobs
SELECT * FROM processing_jobs WHERE status = 'pending' ORDER BY created_at DESC;
-- Check stuck jobs
SELECT * FROM processing_jobs
WHERE status = 'processing'
AND started_at < NOW() - INTERVAL '15 minutes';
-- Check document status
SELECT id, original_file_name, status, created_at
FROM documents
WHERE user_id = '<user_id>'
ORDER BY created_at DESC;
Job Inspection:
// Get job details
const job = await ProcessingJobModel.findById(jobId);
// Check job error
console.log('Job error:', job.error);
// Check job result
console.log('Job result:', job.result);
Debugging Workflow
- Identify the Issue: Check error logs with correlation ID
- Trace the Request: Follow correlation ID through logs
- Check Job Status: Query the `processing_jobs` table for job state
- Check Document Status: Query the `documents` table for document state
- Review Service Logs: Check specific service logs for detailed errors
- Test Components: Test individual services in isolation
- Check External Services: Verify GCS, Document AI, LLM APIs are accessible
14. Optimization Opportunities
Identified Bottlenecks
1. Vector Search Performance
- Current: 10-second timeout, can be slow for large document sets
- Optimization: Ensure the `document_id` filter is always used
- Future: Consider indexing optimizations, batch search
2. LLM API Calls
- Current: Sequential processing, no caching of similar requests
- Optimization: Implement response caching for similar documents
- Future: Batch API calls, use smaller models for simpler tasks
3. PDF Generation
- Current: Puppeteer can be memory-intensive
- Optimization: Page pooling already implemented
- Future: Consider serverless PDF generation service
4. Database Queries
- Current: Some queries don't use indexes effectively
- Optimization: Add indexes on frequently queried columns
- Future: Query optimization, connection pooling tuning
Memory Usage Patterns
Chunk Processing:
- Processes chunks in batches to limit memory
- Cleans up processed chunks after storage
- Optimization: Consider streaming for very large documents
PDF Generation:
- Page pooling limits memory usage
- Browser instance reuse reduces overhead
- Optimization: Consider headless browser optimization
API Call Optimization
Embedding Generation:
- Current: Max 5 concurrent calls
- Optimization: Tune based on API rate limits
- Future: Batch embedding API if available
LLM Calls:
- Current: Single call per document
- Optimization: Use smaller models for simpler tasks
- Future: Implement response caching
Database Query Optimization
Frequently Queried Tables:
- `documents` - Add index on `user_id`, `status`
- `processing_jobs` - Add index on `status`, `created_at`
- `document_chunks` - Add index on `document_id`, `chunk_index`
Vector Search:
- Current: Uses the `match_document_chunks` function
- Optimization: Ensure the `document_id` filter is always used
- Future: Consider HNSW index for faster similarity search
Appendix: Key File Locations
Backend Services
- `backend/src/services/unifiedDocumentProcessor.ts` - Main orchestrator
- `backend/src/services/optimizedAgenticRAGProcessor.ts` - AI processing engine
- `backend/src/services/jobProcessorService.ts` - Job processor
- `backend/src/services/vectorDatabaseService.ts` - Vector operations
- `backend/src/services/llmService.ts` - LLM interactions
- `backend/src/services/documentAiProcessor.ts` - Document AI integration
- `backend/src/services/pdfGenerationService.ts` - PDF generation
- `backend/src/services/fileStorageService.ts` - GCS operations
Backend Models
- `backend/src/models/DocumentModel.ts` - Document data model
- `backend/src/models/ProcessingJobModel.ts` - Job data model
- `backend/src/models/VectorDatabaseModel.ts` - Vector data model
Backend Routes
- `backend/src/routes/documents.ts` - Document endpoints
- `backend/src/routes/vector.ts` - Vector endpoints
- `backend/src/routes/monitoring.ts` - Monitoring endpoints
Backend Controllers
- `backend/src/controllers/documentController.ts` - Document controller
Frontend Services
- `frontend/src/services/documentService.ts` - Document API client
- `frontend/src/services/authService.ts` - Authentication service
Frontend Components
- `frontend/src/components/DocumentUpload.tsx` - Upload component
- `frontend/src/components/DocumentList.tsx` - Document list
- `frontend/src/components/DocumentViewer.tsx` - Document viewer
Configuration
- `backend/src/config/env.ts` - Environment configuration
- `backend/src/config/supabase.ts` - Supabase configuration
- `backend/src/config/firebase.ts` - Firebase configuration
SQL
- `backend/sql/fix_vector_search_timeout.sql` - Vector search optimization
End of Architecture Summary