cim_summary/.planning/codebase/INTEGRATIONS.md
2026-02-24 10:28:22 -05:00

# External Integrations
**Analysis Date:** 2026-02-24
## APIs & External Services
**Document Processing:**
- Google Document AI
  - Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
  - Client: `@google-cloud/documentai` 9.3.0
  - Implementation: `backend/src/services/documentAiProcessor.ts`
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS` or default credentials
  - Configuration: processor ID from `DOCUMENT_AI_PROCESSOR_ID`, location from `DOCUMENT_AI_LOCATION` (default: `us`)
  - Max pages per chunk: 15 (configurable)
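The 15-page chunking can be sketched as a small pure helper (the function name is illustrative; the actual logic lives in `documentAiProcessor.ts`):

```typescript
// Split a document's pages into contiguous chunks of at most maxPages,
// mirroring the 15-page-per-request Document AI limit described above.
// Hypothetical helper, not the actual documentAiProcessor.ts code.
function chunkPageRanges(
  totalPages: number,
  maxPages = 15
): Array<{ start: number; end: number }> {
  const ranges: Array<{ start: number; end: number }> = [];
  for (let start = 1; start <= totalPages; start += maxPages) {
    ranges.push({ start, end: Math.min(start + maxPages - 1, totalPages) });
  }
  return ranges;
}

// A 40-page PDF yields three chunks: 1-15, 16-30, 31-40.
```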
**Large Language Models:**
- OpenAI
  - Purpose: LLM analysis of document content, embeddings for vector search
  - SDK/Client: `openai` 5.10.2
  - Auth: API key from `OPENAI_API_KEY`
  - Models: default `gpt-4-turbo`; embeddings via `text-embedding-3-small`
  - Implementation: `backend/src/services/llmService.ts` with provider abstraction
  - Retry: 3 attempts with exponential backoff
- Anthropic Claude
  - Purpose: LLM analysis and document summary generation
  - SDK/Client: `@anthropic-ai/sdk` 0.57.0
  - Auth: API key from `ANTHROPIC_API_KEY`
  - Models: default `claude-sonnet-4-20250514` (configurable via `LLM_MODEL`)
  - Implementation: `backend/src/services/llmService.ts`
  - Concurrency: max 1 concurrent LLM call, to prevent Anthropic 429 rate-limit errors
  - Retry: 3 attempts with exponential backoff
- OpenRouter
  - Purpose: alternative LLM provider supporting multiple models through a single API
  - SDK/Client: HTTP requests via `axios` to the OpenRouter API
  - Auth: `OPENROUTER_API_KEY`, or optional Bring-Your-Own-Key mode (`OPENROUTER_USE_BYOK`)
  - Configuration: `LLM_PROVIDER=openrouter` activates this provider
  - Implementation: `backend/src/services/llmService.ts`
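The retry policy shared by the providers (3 attempts, exponential backoff) can be sketched as below; the single-concurrent-call throttle for Anthropic sits in front of this. Names are illustrative, not the actual `llmService.ts` implementation:

```typescript
// attempt 0 -> 1s, attempt 1 -> 2s, attempt 2 -> 4s (assumed 1s base)
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}

// Retry a provider call up to `attempts` times, sleeping between failures.
// The sleep function is injectable so the policy can be tested without delays.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(backoffDelayMs(i));
    }
  }
  throw lastError;
}
```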
**File Storage:**
- Google Cloud Storage (GCS)
  - Purpose: store uploaded PDFs, processed documents, and generated PDFs
  - SDK/Client: `@google-cloud/storage` 7.16.0
  - Auth: Google Application Credentials via `GOOGLE_APPLICATION_CREDENTIALS`
  - Buckets:
    - Input: `GCS_BUCKET_NAME` for uploaded documents
    - Output: `DOCUMENT_AI_OUTPUT_BUCKET_NAME` for processing results
  - Implementation: `backend/src/services/fileStorageService.ts` and `backend/src/services/documentAiProcessor.ts`
  - Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
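The upload guardrails (100MB cap, PDF-only) could look like the following; the helper name is hypothetical, and the real checks live in the upload path around `fileStorageService.ts`:

```typescript
// Validate an upload against the documented limits. Returns an error
// message, or null when the file is acceptable.
const MAX_FILE_SIZE = 100 * 1024 * 1024; // bytes; overridable via MAX_FILE_SIZE env var
const ALLOWED_FILE_TYPES = ["application/pdf"]; // overridable via ALLOWED_FILE_TYPES

function validateUpload(sizeBytes: number, mimeType: string): string | null {
  if (sizeBytes > MAX_FILE_SIZE) return "file exceeds the 100MB limit";
  if (!ALLOWED_FILE_TYPES.includes(mimeType)) return "only PDF uploads are accepted";
  return null;
}
```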
## Data Storage
**Databases:**
- Supabase PostgreSQL
  - Connection: `SUPABASE_URL` for the PostgREST API, `DATABASE_URL` for direct PostgreSQL
  - Client: `@supabase/supabase-js` 2.53.0 for the REST API, `pg` 8.11.3 for direct pool connections
  - Auth: `SUPABASE_ANON_KEY` for client operations, `SUPABASE_SERVICE_KEY` for server operations
  - Implementation:
    - `backend/src/config/supabase.ts` - client initialization with a 30-second request timeout
    - `backend/src/models/` - all data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
  - Vector support: pgvector extension for semantic search
  - Tables:
    - `users` - user accounts and authentication data
    - `documents` - CIM documents with status tracking
    - `document_chunks` - text chunks with embeddings for vector search
    - `document_feedback` - user feedback on summaries
    - `document_versions` - document version history
    - `document_audit_logs` - audit trail for compliance
    - `processing_jobs` - background job queue with status tracking
    - `performance_metrics` - system performance data
  - Connection pooling: max 5 connections, 30-second idle timeout, 2-second connection timeout
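The pooling limits above map onto the `pg` Pool options roughly as follows (option names follow the `pg` API; the exact wiring in `backend/src/config` is assumed):

```typescript
// Pool options mirroring the documented limits; pass to `new Pool(poolConfig)`.
const poolConfig = {
  connectionString: process.env.DATABASE_URL,
  max: 5,                          // max 5 connections
  idleTimeoutMillis: 30_000,       // 30-second idle timeout
  connectionTimeoutMillis: 2_000,  // 2-second connection timeout
};
```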
**Vector Database:**
- Supabase pgvector (built into PostgreSQL)
  - Purpose: semantic search and RAG context retrieval
  - Implementation: `backend/src/services/vectorDatabaseService.ts`
  - Embedding generation: via OpenAI `text-embedding-3-small` (embedded in the service)
  - Search: cosine similarity via Supabase RPC calls
  - Semantic cache: 1-hour TTL for cached embeddings
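The metric behind the RPC search is cosine similarity; a reference implementation over two embedding vectors, for illustration (the database computes this server-side via pgvector):

```typescript
// Cosine similarity of two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|). Ranges from -1 (opposite) to 1 (identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```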
**File Storage:**
- Google Cloud Storage (primary storage above)
- Local filesystem (fallback for development, stored in `uploads/` directory)
**Caching:**
- In-memory semantic cache (Supabase vector embeddings) with 1-hour TTL
- No external cache service (Redis, Memcached) currently used
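A minimal sketch of an in-memory cache with a 1-hour TTL, as described above for embedding caching (hypothetical shape, not the actual service code; the clock is injectable for testing):

```typescript
// Map-backed cache; entries expire ttlMs after insertion.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs = 60 * 60 * 1000, // 1 hour
    private now: () => number = Date.now
  ) {}

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}
```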
## Authentication & Identity
**Auth Provider:**
- Firebase Authentication
  - Purpose: user authentication, JWT token generation and verification
  - Client: `firebase` 12.0.0 (frontend, at `frontend/src/config/firebase.ts`)
  - Admin: `firebase-admin` 13.4.0 (backend, at `backend/src/config/firebase.ts`)
  - Implementation:
    - Frontend: `frontend/src/services/authService.ts` - login, logout, token refresh
    - Backend: `backend/src/middleware/firebaseAuth.ts` - token verification middleware
  - Project: `cim-summarizer` (hardcoded in config)
  - Flow: the user logs in with Firebase and receives an ID token; the frontend sends the token in the Authorization header
**Token-Based Auth:**
- JWT (JSON Web Tokens)
  - Purpose: API request authentication
  - Implementation: `backend/src/middleware/firebaseAuth.ts`
  - Verification: the Firebase Admin SDK verifies the token signature and expiration
  - Header: `Authorization: Bearer <token>`
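The middleware's first step is extracting the bearer token from the header before handing it to the Firebase Admin SDK's `verifyIdToken`; a sketch of that extraction (helper name is illustrative):

```typescript
// Pull the raw JWT out of an "Authorization: Bearer <token>" header.
// Returns null for a missing header, a non-Bearer scheme, or an empty token.
function extractBearerToken(authorization: string | undefined): string | null {
  if (!authorization) return null;
  const [scheme, token] = authorization.split(" ");
  if (scheme !== "Bearer" || !token) return null;
  return token;
}
```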
**Fallback Auth (for service-to-service):**
- API Key based (not currently exposed but framework supports it in `backend/src/config/env.ts`)
## Monitoring & Observability
**Error Tracking:**
- No external error tracking service configured
- Errors logged via Winston logger with correlation IDs for tracing
**Logs:**
- Winston logger 3.11.0 - Structured JSON logging at `backend/src/utils/logger.ts`
- Transports: console in development, file-based in production
- Correlation ID middleware at `backend/src/middleware/errorHandler.ts` - Every request traced
- Request logging: Morgan 1.10.0 with Winston transport
- Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments
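Correlation-ID tagging can be sketched as below: every request gets an ID that is echoed into each structured log entry so log lines can be joined per request. Field names are illustrative, not the exact `errorHandler.ts` shape:

```typescript
import { randomUUID } from "node:crypto";

// Build a structured JSON log entry carrying the request's correlation ID.
// A fresh UUID is generated when no ID was propagated from upstream.
function makeLogEntry(level: string, message: string, correlationId?: string) {
  return {
    level,
    message,
    correlationId: correlationId ?? randomUUID(),
    timestamp: new Date().toISOString(),
  };
}
```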
**Monitoring Endpoints:**
- `GET /health` - Basic health check with uptime and environment info
- `GET /health/config` - Configuration validation status
- `GET /health/agentic-rag` - Agentic RAG system health (placeholder)
- `GET /monitoring/dashboard` - Aggregated system metrics (queryable by time range)
## CI/CD & Deployment
**Hosting:**
- **Backend**:
  - Firebase Cloud Functions (default; Node.js 20 runtime)
  - Google Cloud Run (alternative containerized deployment)
  - Configuration: `backend/firebase.json` defines function source, runtime, and predeploy hooks
- **Frontend**:
  - Firebase Hosting (CDN-backed static hosting)
  - Configuration: defined in the `frontend/` directory with `firebase.json`
**Deployment Commands:**
```bash
# Backend deployment
npm run deploy:firebase # Deploy functions to Firebase
npm run deploy:cloud-run # Deploy to Cloud Run
npm run docker:build # Build Docker image
npm run docker:push # Push to GCR
# Frontend deployment
npm run deploy:firebase # Deploy to Firebase Hosting
npm run deploy:preview # Deploy to preview channel
# Emulator
npm run emulator # Run Firebase emulator locally
npm run emulator:ui # Run emulator with UI
```
**Build Pipeline:**
- TypeScript compilation: `tsc` targets ES2020
- Predeploy: Defined in `firebase.json` - runs `npm run build`
- Docker image for Cloud Run: `Dockerfile` in backend root
## Environment Configuration
**Required env vars (Production):**
```
NODE_ENV=production
LLM_PROVIDER=anthropic
GCLOUD_PROJECT_ID=cim-summarizer
DOCUMENT_AI_PROCESSOR_ID=<processor-id>
GCS_BUCKET_NAME=<bucket-name>
DOCUMENT_AI_OUTPUT_BUCKET_NAME=<output-bucket>
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_ANON_KEY=<anon-key>
SUPABASE_SERVICE_KEY=<service-key>
DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
FIREBASE_PROJECT_ID=cim-summarizer
```
**Optional env vars:**
```
DOCUMENT_AI_LOCATION=us
VECTOR_PROVIDER=supabase
LLM_MODEL=claude-sonnet-4-20250514
LLM_MAX_TOKENS=16000
LLM_TEMPERATURE=0.1
OPENROUTER_API_KEY=<key>
OPENROUTER_USE_BYOK=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
**Secrets location:**
- Development: `.env` file (gitignored, never committed)
- Production: Firebase Functions secrets via `firebase functions:secrets:set`
- Google Credentials: `backend/serviceAccountKey.json` for local dev, service account in Cloud Functions environment
## Webhooks & Callbacks
**Incoming:**
- No external webhooks currently configured
- All document processing triggered by HTTP POST to `POST /documents/upload`
**Outgoing:**
- No outgoing webhooks implemented
- Document processing is synchronous (within 14-minute Cloud Function timeout) or async via job queue
**Real-time Monitoring:**
- Server-Sent Events (SSE) not implemented
- Polling endpoints for progress:
  - `GET /documents/{id}/progress` - document processing progress
  - `GET /documents/queue/status` - job queue status (frontend polls every 5 seconds)
## Rate Limiting & Quotas
**API Rate Limits:**
- Express rate limiter: 1000 requests per 15 minutes per IP
- LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
- OpenAI rate limits: Handled by SDK with backoff
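The Express limiter's fixed-window policy (1000 requests per 15 minutes per IP) behaves roughly like this hand-rolled equivalent (a sketch for illustration; the app uses the Express rate-limiter middleware, not this class):

```typescript
// Per-key fixed-window counter: the first request in a window resets
// the count; requests beyond `limit` within the window are rejected.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit = 1000,
    private windowMs = 15 * 60 * 1000, // 15 minutes
    private now: () => number = Date.now
  ) {}

  allow(ip: string): boolean {
    const t = this.now();
    const entry = this.counts.get(ip);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(ip, { windowStart: t, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```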
**File Upload Limits:**
- Max file size: 100MB (configurable via `MAX_FILE_SIZE`)
- Allowed MIME types: `application/pdf` (configurable via `ALLOWED_FILE_TYPES`)
## Network Configuration
**CORS Origins (Allowed):**
- `https://cim-summarizer.web.app` (production)
- `https://cim-summarizer.firebaseapp.com` (production)
- `http://localhost:3000` (development)
- `http://localhost:5173` (development)
- `https://localhost:3000` (SSL local dev)
- `https://localhost:5173` (SSL local dev)
**Port Mappings:**
- Frontend dev: Port 5173 (Vite dev server)
- Backend dev: Port 5001 (Firebase Functions emulator)
- Backend API: Port 5000 (Express in standard deployment)
- Vite proxy to backend: `/api` routes proxied from port 5173 to `http://localhost:5000`
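The proxy mapping above corresponds to a Vite config fragment of roughly this shape (assumed; the actual `frontend/vite.config.ts` may differ in detail):

```typescript
// Dev-server config: /api requests hitting the Vite server on 5173
// are forwarded to the Express backend on 5000.
export default {
  server: {
    port: 5173,
    proxy: {
      "/api": {
        target: "http://localhost:5000",
        changeOrigin: true,
      },
    },
  },
};
```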
---
*Integration audit: 2026-02-24*