# External Integrations

**Analysis Date:** 2026-02-24

## APIs & External Services

**Document Processing:**

- Google Document AI
  - Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
  - Client: `@google-cloud/documentai` 9.3.0
  - Implementation: `backend/src/services/documentAiProcessor.ts`
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS` or default credentials
  - Configuration: Processor ID from `DOCUMENT_AI_PROCESSOR_ID`, location from `DOCUMENT_AI_LOCATION` (default: `us`)
  - Max pages per chunk: 15 pages (configurable)

**Large Language Models:**

- OpenAI
  - Purpose: LLM analysis of document content, embeddings for vector search
  - SDK/Client: `openai` 5.10.2
  - Auth: API key from `OPENAI_API_KEY`
  - Models: Default `gpt-4-turbo`, embeddings via `text-embedding-3-small`
  - Implementation: `backend/src/services/llmService.ts` with provider abstraction
  - Retry: 3 attempts with exponential backoff
- Anthropic Claude
  - Purpose: LLM analysis and document summary generation
  - SDK/Client: `@anthropic-ai/sdk` 0.57.0
  - Auth: API key from `ANTHROPIC_API_KEY`
  - Models: Default `claude-sonnet-4-20250514` (configurable via `LLM_MODEL`)
  - Implementation: `backend/src/services/llmService.ts`
  - Concurrency: Max 1 concurrent LLM call to prevent rate limiting (Anthropic 429 errors)
  - Retry: 3 attempts with exponential backoff
- OpenRouter
  - Purpose: Alternative LLM provider supporting multiple models through a single API
  - SDK/Client: HTTP requests via `axios` to the OpenRouter API
  - Auth: `OPENROUTER_API_KEY`, or optional Bring-Your-Own-Key mode (`OPENROUTER_USE_BYOK`)
  - Configuration: `LLM_PROVIDER=openrouter` activates this provider
  - Implementation: `backend/src/services/llmService.ts`

**File Storage:**

- Google Cloud Storage (GCS)
  - Purpose: Store uploaded PDFs, processed documents, and generated PDFs
  - SDK/Client: `@google-cloud/storage` 7.16.0
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS`
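The 15-page chunk limit noted under Document Processing above implies that large PDFs must be split into page ranges before each Document AI request. A minimal sketch of that splitting logic (the helper name and shape are assumptions for illustration, not the actual `documentAiProcessor.ts` code):

```typescript
// Illustrative only: split a document's pages into inclusive [start, end]
// ranges no larger than the configured chunk size (15 pages by default).
// splitIntoChunks is an assumed helper name, not the real processor code.
function splitIntoChunks(totalPages: number, maxPages = 15): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += maxPages) {
    ranges.push([start, Math.min(start + maxPages - 1, totalPages)]);
  }
  return ranges;
}
```

For example, a 40-page PDF would be processed as three page ranges: `[1, 15]`, `[16, 30]`, and `[31, 40]`.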
  - Buckets:
    - Input: `GCS_BUCKET_NAME` for uploaded documents
    - Output: `DOCUMENT_AI_OUTPUT_BUCKET_NAME` for processing results
  - Implementation: `backend/src/services/fileStorageService.ts` and `backend/src/services/documentAiProcessor.ts`
  - Max file size: 100MB (configurable via `MAX_FILE_SIZE`)

## Data Storage

**Databases:**

- Supabase PostgreSQL
  - Connection: `SUPABASE_URL` for the PostgREST API, `DATABASE_URL` for direct PostgreSQL
  - Client: `@supabase/supabase-js` 2.53.0 for the REST API, `pg` 8.11.3 for direct pool connections
  - Auth: `SUPABASE_ANON_KEY` for client operations, `SUPABASE_SERVICE_KEY` for server operations
  - Implementation:
    - `backend/src/config/supabase.ts` - Client initialization with 30-second request timeout
    - `backend/src/models/` - All data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
  - Vector support: pgvector extension for semantic search
  - Tables:
    - `users` - User accounts and authentication data
    - `documents` - CIM documents with status tracking
    - `document_chunks` - Text chunks with embeddings for vector search
    - `document_feedback` - User feedback on summaries
    - `document_versions` - Document version history
    - `document_audit_logs` - Audit trail for compliance
    - `processing_jobs` - Background job queue with status tracking
    - `performance_metrics` - System performance data
  - Connection pooling: Max 5 connections, 30-second idle timeout, 2-second connection timeout

**Vector Database:**

- Supabase pgvector (built into PostgreSQL)
  - Purpose: Semantic search and RAG context retrieval
  - Implementation: `backend/src/services/vectorDatabaseService.ts`
  - Embedding generation: Via OpenAI `text-embedding-3-small` (embedded in the service)
  - Search: Cosine similarity via Supabase RPC calls
  - Semantic cache: 1-hour TTL for cached embeddings

**File Storage:**

- Google Cloud Storage (primary storage, see above)
- Local filesystem (fallback for development, stored in the `uploads/` directory)

**Caching:**

- In-memory semantic cache
  (Supabase vector embeddings) with 1-hour TTL
- No external cache service (Redis, Memcached) currently used

## Authentication & Identity

**Auth Provider:**

- Firebase Authentication
  - Purpose: User authentication, JWT token generation and verification
  - Client: `firebase` 12.0.0 (frontend, at `frontend/src/config/firebase.ts`)
  - Admin: `firebase-admin` 13.4.0 (backend, at `backend/src/config/firebase.ts`)
  - Implementation:
    - Frontend: `frontend/src/services/authService.ts` - Login, logout, token refresh
    - Backend: `backend/src/middleware/firebaseAuth.ts` - Token verification middleware
  - Project: `cim-summarizer` (hardcoded in config)
  - Flow: User logs in with Firebase and receives an ID token; the frontend sends the token in the Authorization header

**Token-Based Auth:**

- JWT (JSON Web Tokens)
  - Purpose: API request authentication
  - Implementation: `backend/src/middleware/firebaseAuth.ts`
  - Verification: Firebase Admin SDK verifies token signature and expiration
  - Header: `Authorization: Bearer <token>`

**Fallback Auth (for service-to-service):**

- API key based (not currently exposed, but the framework supports it in `backend/src/config/env.ts`)

## Monitoring & Observability

**Error Tracking:**

- No external error tracking service configured
- Errors logged via the Winston logger with correlation IDs for tracing

**Logs:**

- Winston logger 3.11.0
  - Structured JSON logging at `backend/src/utils/logger.ts`
  - Transports: Console (development), file-based for production logs
  - Correlation ID middleware at `backend/src/middleware/errorHandler.ts` - every request traced
  - Request logging: Morgan 1.10.0 with a Winston transport
  - Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments

**Monitoring Endpoints:**

- `GET /health` - Basic health check with uptime and environment info
- `GET /health/config` - Configuration validation status
- `GET /health/agentic-rag` - Agentic RAG system health (placeholder)
- `GET /monitoring/dashboard` - Aggregated system metrics (queryable by
  time range)

## CI/CD & Deployment

**Hosting:**

- **Backend**:
  - Firebase Cloud Functions (default, Node.js 20 runtime)
  - Google Cloud Run (alternative containerized deployment)
  - Configuration: `backend/firebase.json` defines function source, runtime, and predeploy hooks
- **Frontend**:
  - Firebase Hosting (CDN-backed static hosting)
  - Configuration: Defined in the `frontend/` directory with `firebase.json`

**Deployment Commands:**

```bash
# Backend deployment
npm run deploy:firebase     # Deploy functions to Firebase
npm run deploy:cloud-run    # Deploy to Cloud Run
npm run docker:build        # Build Docker image
npm run docker:push         # Push to GCR

# Frontend deployment
npm run deploy:firebase     # Deploy to Firebase Hosting
npm run deploy:preview      # Deploy to a preview channel

# Emulator
npm run emulator            # Run Firebase emulator locally
npm run emulator:ui         # Run emulator with UI
```

**Build Pipeline:**

- TypeScript compilation: `tsc` targets ES2020
- Predeploy: Defined in `firebase.json` - runs `npm run build`
- Docker image for Cloud Run: `Dockerfile` in the backend root

## Environment Configuration

**Required env vars (Production):**

```
NODE_ENV=production
LLM_PROVIDER=anthropic
GCLOUD_PROJECT_ID=cim-summarizer
DOCUMENT_AI_PROCESSOR_ID=
GCS_BUCKET_NAME=
DOCUMENT_AI_OUTPUT_BUCKET_NAME=
SUPABASE_URL=https://<project-ref>.supabase.co
SUPABASE_ANON_KEY=
SUPABASE_SERVICE_KEY=
DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
FIREBASE_PROJECT_ID=cim-summarizer
```

**Optional env vars:**

```
DOCUMENT_AI_LOCATION=us
VECTOR_PROVIDER=supabase
LLM_MODEL=claude-sonnet-4-20250514
LLM_MAX_TOKENS=16000
LLM_TEMPERATURE=0.1
OPENROUTER_API_KEY=
OPENROUTER_USE_BYOK=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```

**Secrets location:**

- Development: `.env` file (gitignored, never committed)
- Production: Firebase Functions secrets via `firebase functions:secrets:set`
- Google credentials: `backend/serviceAccountKey.json` for local dev, service account in the Cloud Functions environment

## Webhooks & Callbacks

**Incoming:**

- No external webhooks currently configured
- All document processing triggered by an HTTP `POST` to `/documents/upload`

**Outgoing:**

- No outgoing webhooks implemented
- Document processing is synchronous (within the 14-minute Cloud Function timeout) or async via the job queue

**Real-time Monitoring:**

- Server-Sent Events (SSE) not implemented
- Polling endpoints for progress:
  - `GET /documents/{id}/progress` - Document processing progress
  - `GET /documents/queue/status` - Job queue status (frontend polls every 5 seconds)

## Rate Limiting & Quotas

**API Rate Limits:**

- Express rate limiter: 1000 requests per 15 minutes per IP
- LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
- OpenAI rate limits: Handled by the SDK with backoff

**File Upload Limits:**

- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
- Allowed MIME types: `application/pdf` (configurable via `ALLOWED_FILE_TYPES`)

## Network Configuration

**CORS Origins (Allowed):**

- `https://cim-summarizer.web.app` (production)
- `https://cim-summarizer.firebaseapp.com` (production)
- `http://localhost:3000` (development)
- `http://localhost:5173` (development)
- `https://localhost:3000` (SSL local dev)
- `https://localhost:5173` (SSL local dev)

**Port Mappings:**

- Frontend dev: Port 5173 (Vite dev server)
- Backend dev: Port 5001 (Firebase Functions emulator)
- Backend API:
  Port 5000 (Express in standard deployment)
- Vite proxy to backend: `/api` routes proxied from port 5173 to `http://localhost:5000`

---

*Integration audit: 2026-02-24*
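The LLM throttling described in the Rate Limiting section (3 attempts with exponential backoff, at most one concurrent Anthropic call) can be sketched with two generic helpers. This is a minimal illustration of the pattern only: `withRetry`, `createLimiter`, and the 1-second base delay are assumed names and values, not the actual `llmService.ts` internals.

```typescript
// Illustrative retry helper: up to `attempts` tries with exponential backoff.
// The base delay (1s -> 2s -> 4s ...) is an assumption for illustration.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Illustrative concurrency gate: at most `limit` calls in flight
// (limit = 1 mirrors the Anthropic throttling described above).
function createLimiter(limit: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  return async function run<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait for a running call to hand over its slot.
      await new Promise<void>((resolve) => queue.push(resolve));
    } else {
      active++;
    }
    try {
      return await fn();
    } finally {
      const next = queue.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else active--;
    }
  };
}
```

A provider call would then be wrapped as `limiter(() => withRetry(() => client.messages.create(...)))`, so that the backoff happens inside the concurrency gate and a retrying call keeps its slot.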
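The port mappings above imply a Vite dev-server proxy along these lines. This is a sketch, not the project's actual `vite.config.ts`; whether the backend expects the `/api` prefix (i.e. whether a `rewrite` is also configured) is unknown and omitted here.

```typescript
// vite.config.ts — assumed shape of the dev proxy described above.
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    port: 5173, // Vite dev server
    proxy: {
      // Forward /api requests to the Express backend on port 5000.
      '/api': {
        target: 'http://localhost:5000',
        changeOrigin: true,
      },
    },
  },
});
```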