# Configuration Guide

## Complete Environment Setup and Configuration for CIM Document Processor

### 🎯 Overview

This guide explains how to configure the CIM Document Processor for development, staging, and production environments.

---

## 🔧 Environment Variables

### Required Environment Variables

#### Google Cloud Configuration

```bash
# Google Cloud Project
GCLOUD_PROJECT_ID=your-project-id

# Google Cloud Storage
GCS_BUCKET_NAME=your-storage-bucket
DOCUMENT_AI_OUTPUT_BUCKET_NAME=your-document-ai-bucket

# Document AI Configuration
DOCUMENT_AI_LOCATION=us
DOCUMENT_AI_PROCESSOR_ID=your-processor-id

# Service Account (leave blank if using Firebase Functions secrets / ADC)
GOOGLE_APPLICATION_CREDENTIALS=
```

#### Supabase Configuration

```bash
# Supabase Project
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-key
```

#### LLM Configuration

```bash
# LLM Provider Selection
LLM_PROVIDER=anthropic # or 'openai'

# Anthropic (Claude AI)
ANTHROPIC_API_KEY=your-anthropic-key

# OpenAI (Alternative)
OPENAI_API_KEY=your-openai-key

# LLM Settings (the model must belong to the selected provider)
LLM_MODEL=claude-3-opus-20240229 # or 'gpt-4' when LLM_PROVIDER=openai
LLM_MAX_TOKENS=3500
LLM_TEMPERATURE=0.1
LLM_PROMPT_BUFFER=500
```

#### Firebase Configuration

```bash
# Firebase Project
FB_PROJECT_ID=your-firebase-project
FB_STORAGE_BUCKET=your-firebase-bucket
FB_API_KEY=your-firebase-api-key
FB_AUTH_DOMAIN=your-project.firebaseapp.com
```

### Optional Environment Variables

#### Vector Database Configuration

```bash
# Vector Provider
VECTOR_PROVIDER=supabase # or 'pinecone'

# Pinecone (if using Pinecone)
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX=your-pinecone-index
```

#### Security Configuration

```bash
# JWT Configuration
JWT_SECRET=your-jwt-secret
JWT_EXPIRES_IN=1h
JWT_REFRESH_SECRET=your-refresh-secret
JWT_REFRESH_EXPIRES_IN=7d

# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000 # 15 minutes
RATE_LIMIT_MAX_REQUESTS=100
```

#### File Upload Configuration

```bash
# File Limits
MAX_FILE_SIZE=104857600 # 100MB
ALLOWED_FILE_TYPES=application/pdf

# Security
BCRYPT_ROUNDS=12
```

#### Logging Configuration

```bash
# Logging
LOG_LEVEL=info # error, warn, info, debug
LOG_FILE=logs/app.log
```

#### Agentic RAG Configuration

```bash
# Agentic RAG Settings
AGENTIC_RAG_ENABLED=true
AGENTIC_RAG_MAX_AGENTS=6
AGENTIC_RAG_PARALLEL_PROCESSING=true
AGENTIC_RAG_VALIDATION_STRICT=true
AGENTIC_RAG_RETRY_ATTEMPTS=3
AGENTIC_RAG_TIMEOUT_PER_AGENT=60000
```

---

## 🚀 Environment Setup

### Development Environment

#### 1. Clone Repository

```bash
git clone <repository-url>
cd cim_summary
```

#### 2. Install Dependencies

```bash
# Backend dependencies
cd backend
npm install

# Frontend dependencies
cd ../frontend
npm install
```

#### 3. Environment Configuration

```bash
# Backend environment
cd backend
cp .env.example .env
# Edit .env with your configuration

# Frontend environment
cd ../frontend
cp .env.example .env
# Edit .env with your configuration
```

#### 4. Google Cloud Setup

```bash
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Authenticate with Google Cloud
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable documentai.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable cloudfunctions.googleapis.com

# Create service account
gcloud iam service-accounts create cim-processor \
  --display-name="CIM Document Processor"

# Download service account key
gcloud iam service-accounts keys create serviceAccountKey.json \
  --iam-account=cim-processor@YOUR_PROJECT_ID.iam.gserviceaccount.com
```

#### 5. Supabase Setup

```bash
# Install Supabase CLI
npm install -g supabase

# Login to Supabase
supabase login

# Initialize Supabase project
supabase init

# Link to your Supabase project
supabase link --project-ref YOUR_PROJECT_REF
```

#### 6. Firebase Setup

```bash
# Install Firebase CLI
npm install -g firebase-tools

# Login to Firebase
firebase login

# Initialize Firebase project
firebase init

# Select your project
firebase use YOUR_PROJECT_ID
```

##### Configure Google credentials via Firebase Functions secrets

```bash
# Store the full service account JSON as a secret (never commit it to the repo)
firebase functions:secrets:set FIREBASE_SERVICE_ACCOUNT --data-file=/path/to/serviceAccountKey.json
```

> When deploying Functions v2, add `FIREBASE_SERVICE_ACCOUNT` to your function's `secrets` array. The backend automatically reads this JSON from `process.env.FIREBASE_SERVICE_ACCOUNT`, so `GOOGLE_APPLICATION_CREDENTIALS` can remain blank and no local file is required. For local development, you can still set `GOOGLE_APPLICATION_CREDENTIALS=/abs/path/to/key.json` if needed.

### Production Environment

#### 1. Environment Variables

```bash
# Production environment variables
NODE_ENV=production
PORT=5001

# Ensure all required variables are set
GCLOUD_PROJECT_ID=your-production-project
SUPABASE_URL=https://your-production-project.supabase.co
ANTHROPIC_API_KEY=your-production-anthropic-key
```

#### 2. Security Configuration

```bash
# Use strong secrets in production
JWT_SECRET=your-very-strong-jwt-secret
JWT_REFRESH_SECRET=your-very-strong-refresh-secret

# Enable strict validation
AGENTIC_RAG_VALIDATION_STRICT=true
```

#### 3. Monitoring Configuration

```bash
# Enable detailed logging
LOG_LEVEL=info
LOG_FILE=/var/log/cim-processor/app.log

# Set appropriate rate limits
RATE_LIMIT_MAX_REQUESTS=50
```

---

## 🔍 Configuration Validation

### Validation Script

```bash
# Run configuration validation
cd backend
npm run validate-config
```

### Configuration Health Check

```typescript
// Configuration validation function
export const validateConfiguration = () => {
  const errors: string[] = [];

  // Check required environment variables
  if (!process.env.GCLOUD_PROJECT_ID) {
    errors.push('GCLOUD_PROJECT_ID is required');
  }

  if (!process.env.SUPABASE_URL) {
    errors.push('SUPABASE_URL is required');
  }

  if (!process.env.ANTHROPIC_API_KEY && !process.env.OPENAI_API_KEY) {
    errors.push('Either ANTHROPIC_API_KEY or OPENAI_API_KEY is required');
  }

  // Check file size limits (explicit radix avoids surprises with unusual input)
  const maxFileSize = parseInt(process.env.MAX_FILE_SIZE || '104857600', 10);
  if (maxFileSize > 104857600) {
    errors.push('MAX_FILE_SIZE cannot exceed 100MB');
  }

  return {
    isValid: errors.length === 0,
    errors
  };
};
```

### Health Check Endpoint

```bash
# Check configuration health
curl -X GET http://localhost:5001/api/health/config \
  -H "Authorization: Bearer <token>"
```

---

## 🔐 Security Configuration

### Authentication Setup

#### Firebase Authentication

```typescript
// Firebase configuration
const firebaseConfig = {
  apiKey: process.env.FB_API_KEY,
  authDomain: process.env.FB_AUTH_DOMAIN,
  projectId: process.env.FB_PROJECT_ID,
  storageBucket: process.env.FB_STORAGE_BUCKET,
  messagingSenderId: process.env.FB_MESSAGING_SENDER_ID,
  appId: process.env.FB_APP_ID
};
```

#### JWT Configuration

```typescript
// JWT settings
// NOTE: the fallback secrets are insecure placeholders; always set
// JWT_SECRET and JWT_REFRESH_SECRET outside local development.
const jwtConfig = {
  secret: process.env.JWT_SECRET || 'default-secret',
  expiresIn: process.env.JWT_EXPIRES_IN || '1h',
  refreshSecret: process.env.JWT_REFRESH_SECRET || 'default-refresh-secret',
  refreshExpiresIn: process.env.JWT_REFRESH_EXPIRES_IN || '7d'
};
```

### Rate Limiting

```typescript
// Rate limiting configuration
const rateLimitConfig = {
  windowMs: parseInt(process.env.RATE_LIMIT_WINDOW_MS || '900000', 10),
  max: parseInt(process.env.RATE_LIMIT_MAX_REQUESTS || '100', 10),
  message: 'Too many requests from this IP'
};
```

### CORS Configuration

```typescript
// CORS settings
const corsConfig = {
  origin: process.env.ALLOWED_ORIGINS?.split(',') || ['http://localhost:3000'],
  credentials: true,
  methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS'],
  allowedHeaders: ['Content-Type', 'Authorization']
};
```

---

## 📊 Performance Configuration

### Memory and CPU Limits

```bash
# Node.js memory limits
NODE_OPTIONS="--max-old-space-size=2048"

# Process limits
PM2_MAX_MEMORY_RESTART=2G
PM2_INSTANCES=4
```

### Database Connection Pooling

```typescript
// Database connection settings
const dbConfig = {
  pool: {
    min: 2,
    max: 10,
    acquireTimeoutMillis: 30000,
    createTimeoutMillis: 30000,
    destroyTimeoutMillis: 5000,
    idleTimeoutMillis: 30000,
    reapIntervalMillis: 1000,
    createRetryIntervalMillis: 100
  }
};
```

### Caching Configuration

```typescript
// Cache settings
const cacheConfig = {
  ttl: 300000, // 5 minutes
  maxSize: 100,
  checkPeriod: 60000 // 1 minute
};
```

---

## 🧪 Testing Configuration

### Test Environment Variables

```bash
# Test environment
NODE_ENV=test
TEST_DATABASE_URL=postgresql://test:test@localhost:5432/cim_test
TEST_GCLOUD_PROJECT_ID=test-project
TEST_ANTHROPIC_API_KEY=test-key
```

### Test Configuration

```typescript
// Test settings
const testConfig = {
  timeout: 30000,
  retries: 3,
  parallel: true,
  coverage: {
    threshold: {
      global: {
        branches: 80,
        functions: 80,
        lines: 80,
        statements: 80
      }
    }
  }
};
```

---

## 🔄 Environment-Specific Configurations

### Development

```bash
# Development settings
NODE_ENV=development
LOG_LEVEL=debug
AGENTIC_RAG_VALIDATION_STRICT=false
RATE_LIMIT_MAX_REQUESTS=1000
```

### Staging

```bash
# Staging settings
NODE_ENV=staging
LOG_LEVEL=info
AGENTIC_RAG_VALIDATION_STRICT=true
RATE_LIMIT_MAX_REQUESTS=100
```

### Production

```bash
# Production settings
NODE_ENV=production
LOG_LEVEL=warn
AGENTIC_RAG_VALIDATION_STRICT=true
RATE_LIMIT_MAX_REQUESTS=50
```

---

## 📋 Configuration Checklist

### Pre-Deployment Checklist

- [ ] All required environment variables are set
- [ ] Google Cloud APIs are enabled
- [ ] Service account has proper permissions
- [ ] Supabase project is configured
- [ ] Firebase project is set up
- [ ] LLM API keys are valid
- [ ] Database migrations are run
- [ ] File storage buckets are created
- [ ] CORS is properly configured
- [ ] Rate limiting is configured
- [ ] Logging is set up
- [ ] Monitoring is configured

### Security Checklist

- [ ] JWT secrets are strong and unique
- [ ] API keys are properly secured
- [ ] CORS origins are restricted
- [ ] Rate limiting is enabled
- [ ] Input validation is configured
- [ ] Error messages don't leak sensitive information
- [ ] HTTPS is enabled in production
- [ ] Service account permissions are minimal

### Performance Checklist

- [ ] Database connection pooling is configured
- [ ] Caching is enabled
- [ ] Memory limits are set
- [ ] Process limits are configured
- [ ] Monitoring is set up
- [ ] Log rotation is configured
- [ ] Backup procedures are in place

---

## 🚨 Troubleshooting

### Common Configuration Issues

#### Missing Environment Variables

```bash
# Check for missing variables
npm run check-env
```

#### Google Cloud Authentication

```bash
# Verify authentication
gcloud auth list
gcloud config list
```

#### Database Connection

```bash
# Test database connection
npm run test-db
```

#### API Key Validation

```bash
# Test API keys
npm run test-apis
```

### Configuration Debugging

```typescript
// Debug configuration (prints only non-secret values)
export const debugConfiguration = () => {
  console.log('Environment:', process.env.NODE_ENV);
  console.log('Google Cloud Project:', process.env.GCLOUD_PROJECT_ID);
  console.log('Supabase URL:', process.env.SUPABASE_URL);
  console.log('LLM Provider:', process.env.LLM_PROVIDER);
  console.log('Agentic RAG Enabled:', process.env.AGENTIC_RAG_ENABLED);
};
```

---
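When troubleshooting missing environment variables, it can help to see exactly which names are unset. The sketch below is one way such a check might look. It is not the project's actual `check-env` script; `findMissingEnv`, `ALWAYS_REQUIRED`, and the choice to treat an unset `LLM_PROVIDER` as `anthropic` are assumptions made for illustration.

```typescript
// Hypothetical helper for illustration; not part of the CIM Document Processor codebase.
type Env = Record<string, string | undefined>;

// Names taken from the "Required Environment Variables" section of this guide.
const ALWAYS_REQUIRED: string[] = [
  'GCLOUD_PROJECT_ID',
  'GCS_BUCKET_NAME',
  'DOCUMENT_AI_OUTPUT_BUCKET_NAME',
  'DOCUMENT_AI_LOCATION',
  'DOCUMENT_AI_PROCESSOR_ID',
  'SUPABASE_URL',
  'SUPABASE_ANON_KEY',
  'SUPABASE_SERVICE_KEY',
];

// Returns the names of required variables that are unset or blank.
export function findMissingEnv(env: Env): string[] {
  const isBlank = (name: string): boolean => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  };

  const missing = ALWAYS_REQUIRED.filter(isBlank);

  // Only the API key of the selected provider is checked.
  // Assumption: an unset LLM_PROVIDER is treated as 'anthropic'.
  const provider = env.LLM_PROVIDER ?? 'anthropic';
  const keyName = provider === 'openai' ? 'OPENAI_API_KEY' : 'ANTHROPIC_API_KEY';
  if (isBlank(keyName)) {
    missing.push(keyName);
  }

  return missing;
}
```

Calling `findMissingEnv(process.env)` at startup and logging the returned names would surface gaps before the first request is served. Taking the environment as a parameter keeps the function easy to test with plain objects.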
With these settings in place, the CIM Document Processor should be configured consistently across development, staging, and production.