cim_summary/.planning/codebase/INTEGRATIONS.md
2026-02-24 10:28:22 -05:00

External Integrations

Analysis Date: 2026-02-24

APIs & External Services

Document Processing:

  • Google Document AI
    • Purpose: OCR and text extraction from PDF documents with entity recognition and table parsing
    • Client: @google-cloud/documentai 9.3.0
    • Implementation: backend/src/services/documentAiProcessor.ts
    • Auth: Google Application Credentials via GOOGLE_APPLICATION_CREDENTIALS or default credentials
    • Configuration: Processor ID from DOCUMENT_AI_PROCESSOR_ID, location from DOCUMENT_AI_LOCATION (default: 'us')
    • Max pages per chunk: 15 pages (configurable)
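
The page-chunking behavior above can be sketched as a small helper. This is illustrative only; `chunkPageRanges` is a hypothetical name, not the actual function in documentAiProcessor.ts.

```typescript
// Hypothetical helper: split a document's page count into ranges no larger
// than the Document AI per-request page limit (15 pages per the doc above).
function chunkPageRanges(
  totalPages: number,
  maxPagesPerChunk = 15
): Array<{ start: number; end: number }> {
  const ranges: Array<{ start: number; end: number }> = [];
  for (let start = 1; start <= totalPages; start += maxPagesPerChunk) {
    // Pages are 1-indexed; `end` is inclusive and clamped to the last page.
    ranges.push({ start, end: Math.min(start + maxPagesPerChunk - 1, totalPages) });
  }
  return ranges;
}
```

For example, a 40-page PDF would be processed as three chunks: pages 1-15, 16-30, and 31-40.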

Large Language Models:

  • OpenAI

    • Purpose: LLM analysis of document content, embeddings for vector search
    • SDK/Client: openai 5.10.2
    • Auth: API key from OPENAI_API_KEY
    • Models: Default gpt-4-turbo, embeddings via text-embedding-3-small
    • Implementation: backend/src/services/llmService.ts with provider abstraction
    • Retry: 3 attempts with exponential backoff
  • Anthropic Claude

    • Purpose: LLM analysis and document summary generation
    • SDK/Client: @anthropic-ai/sdk 0.57.0
    • Auth: API key from ANTHROPIC_API_KEY
    • Models: Default claude-sonnet-4-20250514 (configurable via LLM_MODEL)
    • Implementation: backend/src/services/llmService.ts
    • Concurrency: Max 1 concurrent LLM call to prevent rate limiting (Anthropic 429 errors)
    • Retry: 3 attempts with exponential backoff
  • OpenRouter

    • Purpose: Alternative LLM provider supporting multiple models through single API
    • SDK/Client: HTTP requests via axios to the OpenRouter API
    • Auth: OPENROUTER_API_KEY or optional Bring-Your-Own-Key mode (OPENROUTER_USE_BYOK)
    • Configuration: LLM_PROVIDER: 'openrouter' activates this provider
    • Implementation: backend/src/services/llmService.ts
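
The two throttling behaviors noted above (3 retry attempts with exponential backoff, and at most one concurrent LLM call) could be sketched as follows. This is a minimal illustration under assumed defaults (a 500 ms base delay is an assumption; the real values live in llmService.ts):

```typescript
// Delay before retry attempt n (0-indexed): base * 2^n, e.g. 500, 1000, 2000 ms.
// The 500 ms base is an assumed default, not taken from the actual service.
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Serializes async calls: each call waits for the previous one to settle,
// keeping at most one LLM request in flight (to avoid Anthropic 429s).
class SingleFlightGate {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const next = this.tail.then(task, task);
    this.tail = next.catch(() => undefined); // don't poison the chain on failure
    return next;
  }
}

// Retry wrapper: up to `attempts` tries with exponential backoff between them.
async function withRetry<T>(task: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(i)));
    }
  }
  throw lastError;
}
```

A call site would combine the two, e.g. `gate.run(() => withRetry(() => callAnthropic(prompt)))`, so retries of one request never overlap with another request.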

File Storage:

  • Google Cloud Storage (GCS)
    • Purpose: Store uploaded PDFs, processed documents, and generated PDFs
    • SDK/Client: @google-cloud/storage 7.16.0
    • Auth: Google Application Credentials via GOOGLE_APPLICATION_CREDENTIALS
    • Buckets:
      • Input: GCS_BUCKET_NAME for uploaded documents
      • Output: DOCUMENT_AI_OUTPUT_BUCKET_NAME for processing results
    • Implementation: backend/src/services/fileStorageService.ts and backend/src/services/documentAiProcessor.ts
    • Max file size: 100MB (configurable via MAX_FILE_SIZE)

Data Storage

Databases:

  • Supabase PostgreSQL
    • Connection: SUPABASE_URL for PostgREST API, DATABASE_URL for direct PostgreSQL
    • Client: @supabase/supabase-js 2.53.0 for REST API, pg 8.11.3 for direct pool connections
    • Auth: SUPABASE_ANON_KEY for client operations, SUPABASE_SERVICE_KEY for server operations
    • Implementation:
      • backend/src/config/supabase.ts - Client initialization with 30-second request timeout
      • backend/src/models/ - All data models (DocumentModel, UserModel, ProcessingJobModel, VectorDatabaseModel)
    • Vector Support: pgvector extension for semantic search
    • Tables:
      • users - User accounts and authentication data
      • documents - CIM documents with status tracking
      • document_chunks - Text chunks with embeddings for vector search
      • document_feedback - User feedback on summaries
      • document_versions - Document version history
      • document_audit_logs - Audit trail for compliance
      • processing_jobs - Background job queue with status tracking
      • performance_metrics - System performance data
    • Connection pooling: Max 5 connections, 30-second idle timeout, 2-second connection timeout
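
The pooling parameters above map onto pg's `Pool` options. A sketch of a config builder, assuming the field names pg expects (the real configuration lives in the backend config, not in this shape):

```typescript
// Hypothetical builder for pg Pool options mirroring the limits above.
interface PoolOptions {
  connectionString: string;
  max: number;                     // max pooled connections
  idleTimeoutMillis: number;       // close idle clients after this long
  connectionTimeoutMillis: number; // fail fast when no connection is available
}

function buildPoolOptions(databaseUrl: string): PoolOptions {
  return {
    connectionString: databaseUrl,
    max: 5,
    idleTimeoutMillis: 30_000,
    connectionTimeoutMillis: 2_000,
  };
}
```

Usage would resemble `new Pool(buildPoolOptions(process.env.DATABASE_URL!))` with pg's `Pool` class.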

Vector Database:

  • Supabase pgvector (built into PostgreSQL)
    • Purpose: Semantic search and RAG context retrieval
    • Implementation: backend/src/services/vectorDatabaseService.ts
    • Embedding generation: Via OpenAI text-embedding-3-small (embedded in service)
    • Search: Cosine similarity via Supabase RPC calls
    • Semantic cache: 1-hour TTL for cached embeddings
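
For reference, the cosine similarity that pgvector computes inside Postgres is equivalent to the following (illustration only; the actual search runs server-side via Supabase RPC, not in Node):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0; pgvector's `<=>` operator returns cosine *distance*, i.e. one minus this value.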

File Storage:

  • Google Cloud Storage (primary storage, detailed above)
  • Local filesystem (fallback for development, stored in uploads/ directory)

Caching:

  • In-memory semantic cache (Supabase vector embeddings) with 1-hour TTL
  • No external cache service (Redis, Memcached) currently used
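
A minimal sketch of the 1-hour-TTL in-memory cache described above, assuming entries map a text key to its embedding vector (class and method names are illustrative, not taken from the codebase):

```typescript
// In-memory cache with per-entry TTL and lazy eviction on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number = 60 * 60 * 1000) {} // default: 1 hour

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Injecting `now` keeps the expiry logic testable; production callers would simply omit it. Note that, being per-process memory, this cache is cold on every new Cloud Functions instance.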

Authentication & Identity

Auth Provider:

  • Firebase Authentication
    • Purpose: User authentication, JWT token generation and verification
    • Client: firebase 12.0.0 (frontend at frontend/src/config/firebase.ts)
    • Admin: firebase-admin 13.4.0 (backend at backend/src/config/firebase.ts)
    • Implementation:
      • Frontend: frontend/src/services/authService.ts - Login, logout, token refresh
      • Backend: backend/src/middleware/firebaseAuth.ts - Token verification middleware
    • Project: cim-summarizer (hardcoded in config)
    • Flow: User logs in with Firebase, receives an ID token, and the frontend sends it in the Authorization header

Token-Based Auth:

  • JWT (JSON Web Tokens)
    • Purpose: API request authentication
    • Implementation: backend/src/middleware/firebaseAuth.ts
    • Verification: Firebase Admin SDK verifies token signature and expiration
    • Header: Authorization: Bearer <token>
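
Extracting the token from that header can be sketched as below; the actual signature and expiration checks are then delegated to the Firebase Admin SDK (`verifyIdToken`), not reimplemented:

```typescript
// Pull the raw token out of an "Authorization: Bearer <token>" header.
// Returns null for a missing header, wrong scheme, or empty token.
function extractBearerToken(authorizationHeader: string | undefined): string | null {
  if (!authorizationHeader) return null;
  const [scheme, token] = authorizationHeader.split(" ");
  if (scheme !== "Bearer" || !token) return null;
  return token;
}
```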

Fallback Auth (for service-to-service):

  • API-key based (not currently exposed, but the framework supports it in backend/src/config/env.ts)

Monitoring & Observability

Error Tracking:

  • No external error tracking service configured
  • Errors logged via Winston logger with correlation IDs for tracing

Logs:

  • Winston logger 3.11.0 - Structured JSON logging at backend/src/utils/logger.ts
  • Transports: Console (development), file-based (production)
  • Correlation ID middleware at backend/src/middleware/errorHandler.ts - Every request traced
  • Request logging: Morgan 1.10.0 with Winston transport
  • Firebase Functions Cloud Logging: Automatic integration for Cloud Functions deployments
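
The structured log lines might look like the following sketch; the field names are assumptions for illustration, and the real formatting is done by Winston's JSON format, not hand-rolled:

```typescript
// Illustrative shape of a structured JSON log line that carries the
// request's correlation ID for tracing across services.
interface LogEntry {
  level: "info" | "warn" | "error";
  message: string;
  correlationId: string;
  timestamp: string; // ISO-8601
}

function makeLogEntry(
  level: LogEntry["level"],
  message: string,
  correlationId: string,
  now: Date = new Date()
): string {
  const entry: LogEntry = { level, message, correlationId, timestamp: now.toISOString() };
  return JSON.stringify(entry);
}
```

Because every request gets a correlation ID from the middleware, all log lines for one request can be grepped or queried by that single field.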

Monitoring Endpoints:

  • GET /health - Basic health check with uptime and environment info
  • GET /health/config - Configuration validation status
  • GET /health/agentic-rag - Agentic RAG system health (placeholder)
  • GET /monitoring/dashboard - Aggregated system metrics (queryable by time range)

CI/CD & Deployment

Hosting:

  • Backend:

    • Firebase Cloud Functions (default, Node.js 20 runtime)
    • Google Cloud Run (alternative containerized deployment)
    • Configuration: backend/firebase.json defines function source, runtime, and predeploy hooks
  • Frontend:

    • Firebase Hosting (CDN-backed static hosting)
    • Configuration: Defined in frontend/ directory with firebase.json

Deployment Commands:

# Backend deployment
npm run deploy:firebase         # Deploy functions to Firebase
npm run deploy:cloud-run        # Deploy to Cloud Run
npm run docker:build            # Build Docker image
npm run docker:push             # Push to GCR

# Frontend deployment
npm run deploy:firebase         # Deploy to Firebase Hosting
npm run deploy:preview          # Deploy to preview channel

# Emulator
npm run emulator                # Run Firebase emulator locally
npm run emulator:ui             # Run emulator with UI

Build Pipeline:

  • TypeScript compilation: tsc targets ES2020
  • Predeploy: Defined in firebase.json - runs npm run build
  • Docker image for Cloud Run: Dockerfile in backend root

Environment Configuration

Required env vars (Production):

NODE_ENV=production
LLM_PROVIDER=anthropic
GCLOUD_PROJECT_ID=cim-summarizer
DOCUMENT_AI_PROCESSOR_ID=<processor-id>
GCS_BUCKET_NAME=<bucket-name>
DOCUMENT_AI_OUTPUT_BUCKET_NAME=<output-bucket>
SUPABASE_URL=https://<project>.supabase.co
SUPABASE_ANON_KEY=<anon-key>
SUPABASE_SERVICE_KEY=<service-key>
DATABASE_URL=postgresql://postgres:<password>@aws-0-us-central-1.pooler.supabase.com:6543/postgres
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
FIREBASE_PROJECT_ID=cim-summarizer
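
A startup check for the required variables above could look like this sketch (the real validation lives in backend/src/config/env.ts; this just mirrors the list):

```typescript
// The production variables listed above, checked before the app boots.
const REQUIRED_ENV_VARS = [
  "NODE_ENV", "LLM_PROVIDER", "GCLOUD_PROJECT_ID", "DOCUMENT_AI_PROCESSOR_ID",
  "GCS_BUCKET_NAME", "DOCUMENT_AI_OUTPUT_BUCKET_NAME", "SUPABASE_URL",
  "SUPABASE_ANON_KEY", "SUPABASE_SERVICE_KEY", "DATABASE_URL",
  "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "FIREBASE_PROJECT_ID",
] as const;

// Returns the names of any required variables that are missing or empty.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}
```

At boot, `missingEnvVars(process.env)` returning a non-empty array would be grounds for failing fast with a clear error rather than crashing mid-request.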

Optional env vars:

DOCUMENT_AI_LOCATION=us
VECTOR_PROVIDER=supabase
LLM_MODEL=claude-sonnet-4-20250514
LLM_MAX_TOKENS=16000
LLM_TEMPERATURE=0.1
OPENROUTER_API_KEY=<key>
OPENROUTER_USE_BYOK=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

Secrets location:

  • Development: .env file (gitignored, never committed)
  • Production: Firebase Functions secrets via firebase functions:secrets:set
  • Google Credentials: backend/serviceAccountKey.json for local dev, service account in Cloud Functions environment

Webhooks & Callbacks

Incoming:

  • No external webhooks currently configured
  • All document processing is triggered by an HTTP POST to /documents/upload

Outgoing:

  • No outgoing webhooks implemented
  • Document processing is synchronous (within the 14-minute Cloud Function timeout) or asynchronous via the job queue

Real-time Monitoring:

  • Server-Sent Events (SSE) not implemented
  • Polling endpoints for progress:
    • GET /documents/{id}/progress - Document processing progress
    • GET /documents/queue/status - Job queue status (frontend polls every 5 seconds)

Rate Limiting & Quotas

API Rate Limits:

  • Express rate limiter: 1000 requests per 15 minutes per IP
  • LLM provider limits: Anthropic limited to 1 concurrent call (application-level throttling)
  • OpenAI rate limits: Handled by SDK with backoff
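
The Express limit above is a fixed-window policy, which can be sketched as follows (the production setup uses an Express rate-limiter middleware; this illustrates the counting logic, not the actual implementation):

```typescript
// Fixed-window limiter: `limit` requests per `windowMs` window, keyed by IP.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit = 1000,
    private windowMs = 15 * 60 * 1000 // 15 minutes
  ) {}

  allow(ip: string, now = Date.now()): boolean {
    const entry = this.counts.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed.
      this.counts.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Fixed windows are simple but allow bursts at window boundaries; a sliding-window or token-bucket variant smooths that out at the cost of more bookkeeping.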

File Upload Limits:

  • Max file size: 100MB (configurable via MAX_FILE_SIZE)
  • Allowed MIME types: application/pdf (configurable via ALLOWED_FILE_TYPES)
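
The two upload checks above can be sketched together; the defaults here mirror the doc (100MB, PDF only), but in practice both come from MAX_FILE_SIZE and ALLOWED_FILE_TYPES:

```typescript
// Assumed defaults mirroring the limits above; the real values are env-driven.
const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024; // 100MB
const ALLOWED_MIME_TYPES = ["application/pdf"];

// Returns an error message for a rejected upload, or null when acceptable.
function validateUpload(sizeBytes: number, mimeType: string): string | null {
  if (sizeBytes > MAX_FILE_SIZE_BYTES) return "file too large";
  if (!ALLOWED_MIME_TYPES.includes(mimeType)) return "unsupported file type";
  return null;
}
```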

Network Configuration

CORS Origins (Allowed):

  • https://cim-summarizer.web.app (production)
  • https://cim-summarizer.firebaseapp.com (production)
  • http://localhost:3000 (development)
  • http://localhost:5173 (development)
  • https://localhost:3000 (SSL local dev)
  • https://localhost:5173 (SSL local dev)
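
An exact-match allow-list check over the origins above might look like this (a sketch; the production CORS handling would be done by Express CORS middleware configured with this same list):

```typescript
// The allowed origins listed above, checked with exact string matching.
const ALLOWED_ORIGINS = new Set([
  "https://cim-summarizer.web.app",
  "https://cim-summarizer.firebaseapp.com",
  "http://localhost:3000",
  "http://localhost:5173",
  "https://localhost:3000",
  "https://localhost:5173",
]);

function isAllowedOrigin(origin: string | undefined): boolean {
  return origin !== undefined && ALLOWED_ORIGINS.has(origin);
}
```

Exact matching avoids the classic pitfall of substring or regex checks that accidentally admit origins like `https://cim-summarizer.web.app.evil.com`.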

Port Mappings:

  • Frontend dev: Port 5173 (Vite dev server)
  • Backend dev: Port 5001 (Firebase Functions emulator)
  • Backend API: Port 5000 (Express in standard deployment)
  • Vite proxy to backend: /api routes proxied from port 5173 to http://localhost:5000
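
The proxy mapping above would correspond to a vite.config.ts fragment along these lines (a hypothetical reconstruction from the ports stated above, not the project's actual config file):

```typescript
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    port: 5173, // Vite dev server
    proxy: {
      // Forward /api requests to the Express backend on port 5000.
      "/api": {
        target: "http://localhost:5000",
        changeOrigin: true,
      },
    },
  },
});
```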

Integration audit: 2026-02-24