External Integrations
Analysis Date: 2026-02-24
APIs & External Services
Document Processing:
- Google Document AI
  - Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
  - Client: `@google-cloud/documentai` 9.3.0
  - Implementation: `backend/src/services/documentAiProcessor.ts`
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS` or default credentials
  - Configuration: Processor ID from `DOCUMENT_AI_PROCESSOR_ID`, location from `DOCUMENT_AI_LOCATION` (default: `us`)
  - Max pages per chunk: 15 (configurable)
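The 15-page chunk limit implies that large PDFs are split into page ranges before each slice is sent to Document AI. A minimal sketch of that range computation, assuming 1-based inclusive page numbering (the function name `pageRanges` is illustrative, not taken from the codebase):

```typescript
// Split a document of `totalPages` pages into inclusive, 1-based page ranges,
// each at most `chunkSize` pages (the per-chunk limit documented above is 15).
export function pageRanges(totalPages: number, chunkSize = 15): Array<[number, number]> {
  if (totalPages <= 0 || chunkSize <= 0) return [];
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize - 1, totalPages)]);
  }
  return ranges;
}
```

Each resulting range would then map to one Document AI request for that page slice.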
Large Language Models:
- OpenAI
  - Purpose: LLM analysis of document content, embeddings for vector search
  - SDK/Client: `openai` 5.10.2
  - Auth: API key from `OPENAI_API_KEY`
  - Models: Default `gpt-4-turbo`, embeddings via `text-embedding-3-small`
  - Implementation: `backend/src/services/llmService.ts` with provider abstraction
  - Retry: 3 attempts with exponential backoff
- Anthropic Claude
  - Purpose: LLM analysis and document summary generation
  - SDK/Client: `@anthropic-ai/sdk` 0.57.0
  - Auth: API key from `ANTHROPIC_API_KEY`
  - Models: Default `claude-sonnet-4-20250514` (configurable via `LLM_MODEL`)
  - Implementation: `backend/src/services/llmService.ts`
  - Concurrency: Max 1 concurrent LLM call to prevent rate limiting (Anthropic 429 errors)
  - Retry: 3 attempts with exponential backoff
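The "3 attempts with exponential backoff" policy can be sketched as a generic retry wrapper. This is an illustration, not the actual `llmService.ts` code; the base delay of 500 ms and the injectable `sleep` parameter are assumptions made for clarity and testability:

```typescript
// Delay before retrying after a failed attempt: doubles each time
// (attempt 0 -> baseMs, attempt 1 -> 2*baseMs, ...).
export function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Run an async operation up to `attempts` times, sleeping between failures.
// `sleep` is injectable so tests don't have to wait in real time.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // e.g. an Anthropic 429 rate-limit response
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Combined with the max-1-concurrency rule above, this bounds the pressure put on the Anthropic API during bursts.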
- OpenRouter
  - Purpose: Alternative LLM provider supporting multiple models through a single API
  - SDK/Client: HTTP requests via `axios` to the OpenRouter API
  - Auth: `OPENROUTER_API_KEY`, or optional Bring-Your-Own-Key mode (`OPENROUTER_USE_BYOK`)
  - Configuration: `LLM_PROVIDER: 'openrouter'` activates this provider
  - Implementation: `backend/src/services/llmService.ts`
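Since all three providers sit behind `LLM_PROVIDER`, the selection step reduces to a small env-driven switch. A hypothetical helper (the real abstraction lives in `backend/src/services/llmService.ts`; the default of `anthropic` is an assumption based on the production env vars listed later in this document):

```typescript
type LlmProvider = "anthropic" | "openai" | "openrouter";

// Resolve the active provider from LLM_PROVIDER, defaulting to Anthropic.
// Unknown values fail fast rather than silently falling back.
export function pickProvider(env: Record<string, string | undefined>): LlmProvider {
  const value = (env.LLM_PROVIDER ?? "anthropic").toLowerCase();
  if (value === "anthropic" || value === "openai" || value === "openrouter") {
    return value;
  }
  throw new Error(`Unsupported LLM_PROVIDER: ${value}`);
}
```

Failing fast on typos here avoids a misconfigured deployment quietly using the wrong (and differently billed) provider.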
File Storage:
- Google Cloud Storage (GCS)
  - Purpose: Store uploaded PDFs, processed documents, and generated PDFs
  - SDK/Client: `@google-cloud/storage` 7.16.0
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS`
  - Buckets:
    - Input: `GCS_BUCKET_NAME` for uploaded documents
    - Output: `DOCUMENT_AI_OUTPUT_BUCKET_NAME` for processing results
  - Implementation: `backend/src/services/fileStorageService.ts` and `backend/src/services/documentAiProcessor.ts`
  - Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
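The size and MIME-type limits (here and in the Rate Limiting section below) amount to a simple pre-upload check. A sketch, assuming the documented defaults of 100 MB and PDF-only; the helper name and error strings are illustrative:

```typescript
// Defaults mirroring MAX_FILE_SIZE (100 MB) and ALLOWED_FILE_TYPES (PDF only).
const MAX_FILE_SIZE = 100 * 1024 * 1024;
const ALLOWED_FILE_TYPES = ["application/pdf"];

// Returns null when the upload is acceptable, otherwise a rejection reason.
export function validateUpload(sizeBytes: number, mimeType: string): string | null {
  if (sizeBytes <= 0) return "empty file";
  if (sizeBytes > MAX_FILE_SIZE) return `file exceeds ${MAX_FILE_SIZE} bytes`;
  if (!ALLOWED_FILE_TYPES.includes(mimeType)) return `unsupported type: ${mimeType}`;
  return null;
}
```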
Data Storage
Databases:
- Supabase PostgreSQL
  - Connection: `SUPABASE_URL` for PostgREST API, `DATABASE_URL` for direct PostgreSQL
  - Client: `@supabase/supabase-js` 2.53.0 for REST API, `pg` 8.11.3 for direct pool connections
  - Auth: `SUPABASE_ANON_KEY` for client operations, `SUPABASE_SERVICE_KEY` for server operations
  - Implementation:
    - `backend/src/config/supabase.ts` - Client initialization with 30-second request timeout
    - `backend/src/models/` - All data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
  - Vector Support: pgvector extension for semantic search
  - Tables:
    - `users` - User accounts and authentication data
    - `documents` - CIM documents with status tracking
    - `document_chunks` - Text chunks with embeddings for vector search
    - `document_feedback` - User feedback on summaries
    - `document_versions` - Document version history
    - `document_audit_logs` - Audit trail for compliance
    - `processing_jobs` - Background job queue with status tracking
    - `performance_metrics` - System performance data
  - Connection pooling: Max 5 connections, 30-second idle timeout, 2-second connection timeout
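The documented pooling limits translate directly into a `pg` pool configuration. A sketch of that config object (the helper name is hypothetical; the real initialization is in `backend/src/config/supabase.ts`):

```typescript
// Pool settings matching the documented limits: max 5 connections,
// 30-second idle timeout, 2-second connection timeout.
// The returned object has the shape expected by pg's Pool constructor.
export function buildPoolConfig(databaseUrl: string) {
  return {
    connectionString: databaseUrl,
    max: 5,
    idleTimeoutMillis: 30_000,
    connectionTimeoutMillis: 2_000,
  };
}
```

Usage would look like `new Pool(buildPoolConfig(process.env.DATABASE_URL!))` with `Pool` imported from `pg`. The low `max` suits a Cloud Functions environment, where many short-lived instances would otherwise exhaust Supabase's pooler.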
Vector Database:
- Supabase pgvector (built into PostgreSQL)
  - Purpose: Semantic search and RAG context retrieval
  - Implementation: `backend/src/services/vectorDatabaseService.ts`
  - Embedding generation: Via OpenAI `text-embedding-3-small` (embedded in service)
  - Search: Cosine similarity via Supabase RPC calls
  - Semantic cache: 1-hour TTL for cached embeddings
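The ranking metric behind the RPC search is plain cosine similarity between embedding vectors. pgvector computes it server-side; a local version just to illustrate the math:

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Returns 1 for identical directions, 0 for orthogonal vectors.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```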
File Storage:
- Google Cloud Storage (primary storage above)
- Local filesystem (fallback for development, stored in the `uploads/` directory)
Caching:
- In-memory semantic cache (Supabase vector embeddings) with 1-hour TTL
- No external cache service (Redis, Memcached) currently used
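With no Redis or Memcached in the stack, the 1-hour semantic cache is just an in-process map with expiry. A minimal sketch in that spirit (the class name is illustrative; the clock is injectable so expiry can be tested without waiting an hour):

```typescript
// Minimal in-memory cache with a fixed TTL (default: 1 hour).
export class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs = 60 * 60 * 1000,
    private now: () => number = Date.now,
  ) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for missing or expired entries; expired ones are evicted.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Note that an in-process cache is per-instance: on Cloud Functions, each instance warms its own copy.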
Authentication & Identity
Auth Provider:
- Firebase Authentication
  - Purpose: User authentication, JWT token generation and verification
  - Client: `firebase` 12.0.0 (frontend, at `frontend/src/config/firebase.ts`)
  - Admin: `firebase-admin` 13.4.0 (backend, at `backend/src/config/firebase.ts`)
  - Implementation:
    - Frontend: `frontend/src/services/authService.ts` - Login, logout, token refresh
    - Backend: `backend/src/middleware/firebaseAuth.ts` - Token verification middleware
  - Project: `cim-summarizer` (hardcoded in config)
  - Flow: User logs in with Firebase, receives an ID token, and the frontend sends the token in the Authorization header
Token-Based Auth:
- JWT (JSON Web Tokens)
  - Purpose: API request authentication
  - Implementation: `backend/src/middleware/firebaseAuth.ts`
  - Verification: Firebase Admin SDK verifies token signature and expiration
  - Header: `Authorization: Bearer <token>`
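Before the Admin SDK can verify anything, the middleware has to pull the token out of the `Authorization` header. A hypothetical helper showing that extraction step (the actual middleware in `backend/src/middleware/firebaseAuth.ts` may differ):

```typescript
// Extract the bearer token from an `Authorization: Bearer <token>` header.
// Returns null for missing, malformed, or non-Bearer headers.
export function parseBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```

The extracted token would then be passed to the Admin SDK's `verifyIdToken`, which checks the signature and expiration as described above.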
Fallback Auth (for service-to-service):
- API key based (not currently exposed, but the framework supports it in `backend/src/config/env.ts`)
Monitoring & Observability
Error Tracking:
- No external error tracking service configured
- Errors logged via Winston logger with correlation IDs for tracing
Logs:
- Winston logger 3.11.0 - Structured JSON logging at `backend/src/utils/logger.ts`
- Transports: Console (development), file-based for production logs
- Correlation ID middleware at `backend/src/middleware/errorHandler.ts` - Every request traced
- Request logging: Morgan 1.10.0 with Winston transport
- Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments
Monitoring Endpoints:
- `GET /health` - Basic health check with uptime and environment info
- `GET /health/config` - Configuration validation status
- `GET /health/agentic-rag` - Agentic RAG system health (placeholder)
- `GET /monitoring/dashboard` - Aggregated system metrics (queryable by time range)
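A basic health payload with uptime and environment info can be built as a pure function and wired into any of these endpoints. The field names below are illustrative, not the service's actual response contract:

```typescript
// Build a /health-style response body. `uptimeSeconds` would typically come
// from process.uptime(), and `nodeEnv` from process.env.NODE_ENV.
export function healthPayload(uptimeSeconds: number, nodeEnv: string) {
  return {
    status: "ok" as const,
    uptimeSeconds: Math.floor(uptimeSeconds),
    environment: nodeEnv,
    timestamp: new Date().toISOString(),
  };
}
```

In Express this would be served as `app.get("/health", (_req, res) => res.json(healthPayload(process.uptime(), process.env.NODE_ENV ?? "development")))`.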
CI/CD & Deployment
Hosting:
- Backend:
  - Firebase Cloud Functions (default, Node.js 20 runtime)
  - Google Cloud Run (alternative containerized deployment)
  - Configuration: `backend/firebase.json` defines function source, runtime, and predeploy hooks
- Frontend:
  - Firebase Hosting (CDN-backed static hosting)
  - Configuration: Defined in the `frontend/` directory with `firebase.json`
Deployment Commands:
```shell
# Backend deployment
npm run deploy:firebase    # Deploy functions to Firebase
npm run deploy:cloud-run   # Deploy to Cloud Run
npm run docker:build       # Build Docker image
npm run docker:push        # Push to GCR

# Frontend deployment
npm run deploy:firebase    # Deploy to Firebase Hosting
npm run deploy:preview     # Deploy to preview channel

# Emulator
npm run emulator           # Run Firebase emulator locally
npm run emulator:ui        # Run emulator with UI
```
Build Pipeline:
- TypeScript compilation: `tsc` targets ES2020
- Predeploy: Defined in `firebase.json` - runs `npm run build`
- Docker image for Cloud Run: `Dockerfile` in backend root
Environment Configuration
Required env vars (Production):
```
NODE_ENV=production
LLM_PROVIDER=anthropic
GCLOUD_PROJECT_ID=cim-summarizer
DOCUMENT_AI_PROCESSOR_ID=<processor-id>
GCS_BUCKET_NAME=<bucket-name>
DOCUMENT_AI_OUTPUT_BUCKET_NAME=<output-bucket>
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_ANON_KEY=<anon-key>
SUPABASE_SERVICE_KEY=<service-key>
DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
FIREBASE_PROJECT_ID=cim-summarizer
```
Optional env vars:
```
DOCUMENT_AI_LOCATION=us
VECTOR_PROVIDER=supabase
LLM_MODEL=claude-sonnet-4-20250514
LLM_MAX_TOKENS=16000
LLM_TEMPERATURE=0.1
OPENROUTER_API_KEY=<key>
OPENROUTER_USE_BYOK=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
Secrets location:
- Development: `.env` file (gitignored, never committed)
- Production: Firebase Functions secrets via `firebase functions:secrets:set`
- Google Credentials: `backend/serviceAccountKey.json` for local dev; service account in the Cloud Functions environment
Webhooks & Callbacks
Incoming:
- No external webhooks currently configured
- All document processing is triggered by an HTTP request to `POST /documents/upload`
Outgoing:
- No outgoing webhooks implemented
- Document processing is synchronous (within 14-minute Cloud Function timeout) or async via job queue
Real-time Monitoring:
- Server-Sent Events (SSE) not implemented
- Polling endpoints for progress:
  - `GET /documents/{id}/progress` - Document processing progress
  - `GET /documents/queue/status` - Job queue status (frontend polls every 5 seconds)
Rate Limiting & Quotas
API Rate Limits:
- Express rate limiter: 1000 requests per 15 minutes per IP
- LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
- OpenAI rate limits: Handled by SDK with backoff
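The 1000-requests-per-15-minutes policy is a classic fixed-window counter keyed by IP. The production limiter is Express middleware; this sketch only illustrates the windowing math (class name and injectable clock are assumptions for testability):

```typescript
// Fixed-window rate limiter: at most `limit` requests per `windowMs` per key.
export class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit = 1000,
    private windowMs = 15 * 60 * 1000,
    private now: () => number = Date.now,
  ) {}

  // Returns true if the request is allowed, false if the window is exhausted.
  allow(ip: string): boolean {
    const t = this.now();
    const entry = this.counts.get(ip);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(ip, { windowStart: t, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```

As with the semantic cache, per-instance state means each Cloud Functions instance counts independently unless a shared store is added.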
File Upload Limits:
- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
- Allowed MIME types: `application/pdf` (configurable via `ALLOWED_FILE_TYPES`)
Network Configuration
CORS Origins (Allowed):
- `https://cim-summarizer.web.app` (production)
- `https://cim-summarizer.firebaseapp.com` (production)
- `http://localhost:3000` (development)
- `http://localhost:5173` (development)
- `https://localhost:3000` (SSL local dev)
- `https://localhost:5173` (SSL local dev)
Port Mappings:
- Frontend dev: Port 5173 (Vite dev server)
- Backend dev: Port 5001 (Firebase Functions emulator)
- Backend API: Port 5000 (Express in standard deployment)
- Vite proxy to backend: `/api` routes proxied from port 5173 to `http://localhost:5000`
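The CORS allow-list above reduces to an exact-match set check on the request's `Origin` header. A sketch of that check (illustrative; the real setup lives in the Express CORS configuration):

```typescript
// The six origins from the documented CORS allow-list.
const ALLOWED_ORIGINS = new Set([
  "https://cim-summarizer.web.app",
  "https://cim-summarizer.firebaseapp.com",
  "http://localhost:3000",
  "http://localhost:5173",
  "https://localhost:3000",
  "https://localhost:5173",
]);

// Exact match only: scheme, host, and port must all agree.
export function isAllowedOrigin(origin: string | undefined): boolean {
  return origin !== undefined && ALLOWED_ORIGINS.has(origin);
}
```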
Integration audit: 2026-02-24