cim_summary/.kiro/specs/codebase-cleanup-and-upload-fix/design.md
Jon 6057d1d7fd 🔧 Fix authentication and document upload issues
## What was done:
- Fixed Firebase Admin initialization to use default credentials for Firebase Functions
- Updated frontend to use correct Firebase Functions URL (was using Cloud Run URL)
- Added comprehensive debugging to authentication middleware
- Added debugging to file upload middleware and CORS handling
- Added debug buttons to frontend for troubleshooting authentication
- Enhanced error handling and logging throughout the stack

## Current issues:
- Document upload still returns 400 Bad Request despite authentication working
- GET requests work fine (200 OK) but POST upload requests fail
- Frontend authentication is working correctly (valid JWT tokens)
- Backend authentication middleware is working (rejects invalid tokens)
- CORS is configured correctly and allowing requests

## Root cause analysis:
- Authentication is NOT the issue (tokens are valid, GET requests work)
- The problem appears to be in the file upload handling or multer configuration
- Request reaches the server but fails during upload processing
- Need to identify exactly where in the upload pipeline the failure occurs

## TODO next steps:
1. 🔍 Check Firebase Functions logs after next upload attempt to see debugging output
2. 🔍 Verify if request reaches upload middleware (look for 'Upload middleware called' logs)
3. 🔍 Check if file validation is triggered (look for '🔍 File filter called' logs)
4. 🔍 Identify specific error in upload pipeline (multer, file processing, etc.)
5. 🔍 Test with smaller file or different file type to isolate issue
6. 🔍 Check if issue is with Firebase Functions file size limits or timeout
7. 🔍 Verify multer configuration and file handling in Firebase Functions environment

## Technical details:
- Frontend: https://cim-summarizer.web.app
- Backend: https://us-central1-cim-summarizer.cloudfunctions.net/api
- Authentication: Firebase Auth with JWT tokens (working correctly)
- File upload: Multer with memory storage for immediate GCS upload
- Debug buttons available in production frontend for troubleshooting
2025-07-31 16:18:53 -04:00


Design Document

Overview

This design addresses the systematic cleanup of a document processing application that has accumulated technical debt during migration from local deployment to Firebase/GCloud infrastructure. The application currently suffers from configuration inconsistencies, redundant files, and document upload errors that need to be resolved through a structured cleanup and debugging approach.

Current Architecture Analysis

The application consists of:

  • Backend: Node.js/TypeScript API deployed on Google Cloud Run
  • Frontend: React/TypeScript SPA deployed on Firebase Hosting
  • Database: Supabase (PostgreSQL) for document metadata
  • Storage: Currently using local file storage (MUST migrate to Google Cloud Storage, GCS)
  • Processing: Document AI + Agentic RAG pipeline
  • Authentication: Firebase Auth

Key Issues Identified

  1. Configuration Drift: Multiple environment files with conflicting settings
  2. Local Dependencies: Still using local file storage and local PostgreSQL references (MUST use only Supabase)
  3. Upload Errors: Invalid UUID errors in document retrieval
  4. Deployment Complexity: Mixed local/cloud deployment artifacts
  5. Error Handling: Insufficient error logging and debugging capabilities
  6. Architecture Inconsistency: Local storage and database incompatible with cloud deployment

Architecture

Target Architecture

graph TB
    subgraph "Frontend (Firebase Hosting)"
        A[React App] --> B[Document Upload Component]
        B --> C[Auth Context]
    end
    
    subgraph "Backend (Cloud Run)"
        D[Express API] --> E[Document Controller]
        E --> F[Upload Middleware]
        F --> G[File Storage Service]
        G --> H[GCS Bucket]
        E --> I[Document Model]
        I --> J[Supabase DB]
    end
    
    subgraph "Processing Pipeline"
        K[Job Queue] --> L[Document AI]
        L --> M[Agentic RAG]
        M --> N[PDF Generation]
    end
    
    A --> D
    E --> K
    
    subgraph "Authentication"
        O[Firebase Auth] --> A
        O --> D
    end

Configuration Management Strategy

  1. Environment Separation: Clear distinction between development, staging, and production
  2. Service-Specific Configs: Separate Firebase, GCloud, and Supabase configurations
  3. Secret Management: Proper handling of API keys and service account credentials
  4. Deployment Consistency: Single deployment strategy per environment

Components and Interfaces

1. Configuration Cleanup Service

Purpose: Consolidate and standardize environment configurations

Interface:

interface ConfigurationService {
  validateEnvironment(): Promise<ValidationResult>;
  consolidateConfigs(): Promise<void>;
  removeRedundantFiles(): Promise<string[]>;
  updateDeploymentConfigs(): Promise<void>;
}

Responsibilities:

  • Remove duplicate/conflicting environment files
  • Standardize Firebase and GCloud configurations
  • Validate required environment variables
  • Update deployment scripts and configurations

2. Storage Migration Service

Purpose: Complete migration from local storage to Google Cloud Storage (no local storage going forward)

Interface:

interface StorageMigrationService {
  migrateExistingFiles(): Promise<MigrationResult>;
  replaceFileStorageService(): Promise<void>;
  validateGCSConfiguration(): Promise<boolean>;
  removeAllLocalStorageDependencies(): Promise<void>;
  updateDatabaseReferences(): Promise<void>;
}

Responsibilities:

  • Migrate ALL existing uploaded files to GCS
  • Completely replace file storage service to use ONLY GCS
  • Update all file path references in database to GCS URLs
  • Remove ALL local storage code and dependencies
  • Ensure no fallback to local storage exists

3. Upload Error Diagnostic Service

Purpose: Identify and resolve document upload errors

Interface:

interface UploadDiagnosticService {
  analyzeUploadErrors(): Promise<ErrorAnalysis>;
  validateUploadPipeline(): Promise<ValidationResult>;
  fixRouteHandling(): Promise<void>;
  improveErrorLogging(): Promise<void>;
}

Responsibilities:

  • Analyze current upload error patterns
  • Fix UUID validation issues in routes
  • Improve error handling and logging
  • Validate complete upload pipeline

4. Deployment Standardization Service

Purpose: Standardize deployment processes and remove legacy artifacts

Interface:

interface DeploymentService {
  standardizeDeploymentScripts(): Promise<void>;
  removeLocalDeploymentArtifacts(): Promise<string[]>;
  validateCloudDeployment(): Promise<ValidationResult>;
  updateDocumentation(): Promise<void>;
}

Responsibilities:

  • Remove local deployment scripts and configurations
  • Standardize Cloud Run and Firebase deployment
  • Update package.json scripts
  • Create deployment documentation

Data Models

Configuration Validation Model

interface ConfigValidation {
  environment: 'development' | 'staging' | 'production';
  requiredVars: string[];
  optionalVars: string[];
  conflicts: ConfigConflict[];
  missing: string[];
  status: 'valid' | 'invalid' | 'warning';
}

interface ConfigConflict {
  variable: string;
  values: string[];
  files: string[];
  resolution: string;
}
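
To show how these two models might be populated, here is a sketch that compares parsed environment files and reports conflicts and missing variables. The `EnvFiles` input shape, the function name, and the rule for deriving `status` are assumptions made for this example.

```typescript
interface ConfigConflict {
  variable: string;
  values: string[];
  files: string[];
  resolution: string;
}

interface ConfigValidation {
  environment: "development" | "staging" | "production";
  requiredVars: string[];
  optionalVars: string[];
  conflicts: ConfigConflict[];
  missing: string[];
  status: "valid" | "invalid" | "warning";
}

// Hypothetical input: file name -> already-parsed KEY=value pairs.
type EnvFiles = Record<string, Record<string, string>>;

function validateConfig(
  environment: ConfigValidation["environment"],
  envFiles: EnvFiles,
  requiredVars: string[],
  optionalVars: string[] = []
): ConfigValidation {
  // variable -> (value -> files that define the variable with that value)
  const seen = new Map<string, Map<string, string[]>>();
  for (const file of Object.keys(envFiles)) {
    for (const name of Object.keys(envFiles[file])) {
      const value = envFiles[file][name];
      const byValue = seen.get(name) ?? new Map<string, string[]>();
      byValue.set(value, (byValue.get(value) ?? []).concat(file));
      seen.set(name, byValue);
    }
  }

  // A variable conflicts when different files give it different values.
  const conflicts: ConfigConflict[] = [];
  seen.forEach((byValue, variable) => {
    if (byValue.size > 1) {
      const values: string[] = [];
      const files: string[] = [];
      byValue.forEach((definingFiles, value) => {
        values.push(value);
        files.push(...definingFiles);
      });
      conflicts.push({ variable, values, files, resolution: "unresolved" });
    }
  });

  const missing = requiredVars.filter((name) => !seen.has(name));
  // Assumed rule: missing variables are fatal, conflicts are only a warning.
  const status = missing.length > 0 ? "invalid" : conflicts.length > 0 ? "warning" : "valid";
  return { environment, requiredVars, optionalVars, conflicts, missing, status };
}
```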

Migration Status Model

interface MigrationStatus {
  totalFiles: number;
  migratedFiles: number;
  failedFiles: FileError[];
  storageUsage: {
    local: number;
    cloud: number;
  };
  status: 'pending' | 'in-progress' | 'completed' | 'failed';
}

interface FileError {
  filePath: string;
  error: string;
  retryCount: number;
  lastAttempt: Date;
}
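
A migration runner could keep this model current with small pure update functions. This is a sketch only: the helper names and the rule that moves `status` between states are assumptions.

```typescript
interface FileError {
  filePath: string;
  error: string;
  retryCount: number;
  lastAttempt: Date;
}

interface MigrationStatus {
  totalFiles: number;
  migratedFiles: number;
  failedFiles: FileError[];
  storageUsage: { local: number; cloud: number };
  status: "pending" | "in-progress" | "completed" | "failed";
}

function startMigration(totalFiles: number, localBytes: number): MigrationStatus {
  return {
    totalFiles,
    migratedFiles: 0,
    failedFiles: [],
    storageUsage: { local: localBytes, cloud: 0 },
    status: "pending",
  };
}

// Assumed rule: finished once every file has either migrated or failed.
function settle(next: MigrationStatus): MigrationStatus {
  const processed = next.migratedFiles + next.failedFiles.length;
  if (processed < next.totalFiles) return { ...next, status: "in-progress" };
  return { ...next, status: next.failedFiles.length === 0 ? "completed" : "failed" };
}

function recordSuccess(current: MigrationStatus, filePath: string, bytes: number): MigrationStatus {
  return settle({
    ...current,
    migratedFiles: current.migratedFiles + 1,
    // A successful retry clears any earlier failure for the same file.
    failedFiles: current.failedFiles.filter((f) => f.filePath !== filePath),
    storageUsage: { ...current.storageUsage, cloud: current.storageUsage.cloud + bytes },
  });
}

function recordFailure(current: MigrationStatus, filePath: string, error: string, now: Date): MigrationStatus {
  const previous = current.failedFiles.filter((f) => f.filePath === filePath)[0];
  const entry: FileError = {
    filePath,
    error,
    retryCount: previous ? previous.retryCount + 1 : 0,
    lastAttempt: now,
  };
  const others = current.failedFiles.filter((f) => f.filePath !== filePath);
  return settle({ ...current, failedFiles: others.concat(entry) });
}
```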

Upload Error Analysis Model

interface UploadErrorAnalysis {
  errorTypes: {
    [key: string]: {
      count: number;
      examples: string[];
      severity: 'low' | 'medium' | 'high';
    };
  };
  affectedRoutes: string[];
  timeRange: {
    start: Date;
    end: Date;
  };
  recommendations: string[];
}
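
The analysis model can be filled by aggregating error events pulled from the logs. In the sketch below, the `UploadErrorEvent` shape, the severity thresholds, and the recommendation wording are all assumptions chosen for illustration.

```typescript
interface ErrorTypeSummary {
  count: number;
  examples: string[];
  severity: "low" | "medium" | "high";
}

interface UploadErrorAnalysis {
  errorTypes: { [key: string]: ErrorTypeSummary };
  affectedRoutes: string[];
  timeRange: { start: Date; end: Date };
  recommendations: string[];
}

// Hypothetical shape of one parsed log entry.
interface UploadErrorEvent {
  type: string;
  message: string;
  route: string;
  timestamp: Date;
}

// Assumed thresholds; tune to real traffic.
function severityFor(count: number): ErrorTypeSummary["severity"] {
  if (count >= 10) return "high";
  if (count >= 3) return "medium";
  return "low";
}

function analyzeUploadErrors(events: UploadErrorEvent[], maxExamples = 3): UploadErrorAnalysis {
  if (events.length === 0) throw new Error("No upload error events to analyze");

  // Null-prototype object so error type names never collide with Object members.
  const errorTypes: { [key: string]: ErrorTypeSummary } = Object.create(null);
  const affectedRoutes: string[] = [];
  let start = events[0].timestamp;
  let end = events[0].timestamp;

  for (const event of events) {
    const summary: ErrorTypeSummary =
      errorTypes[event.type] ?? { count: 0, examples: [], severity: "low" };
    summary.count += 1;
    if (summary.examples.length < maxExamples) summary.examples.push(event.message);
    summary.severity = severityFor(summary.count);
    errorTypes[event.type] = summary;

    if (!affectedRoutes.includes(event.route)) affectedRoutes.push(event.route);
    if (event.timestamp.getTime() < start.getTime()) start = event.timestamp;
    if (event.timestamp.getTime() > end.getTime()) end = event.timestamp;
  }

  const recommendations = Object.keys(errorTypes)
    .filter((type) => errorTypes[type].severity !== "low")
    .map((type) => `Investigate "${type}" (${errorTypes[type].count} occurrences)`);

  return { errorTypes, affectedRoutes, timeRange: { start, end }, recommendations };
}
```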

Error Handling

Upload Error Resolution Strategy

  1. Route Parameter Validation: Fix UUID validation in document routes
  2. Error Logging Enhancement: Add structured logging with correlation IDs
  3. Graceful Degradation: Implement fallback mechanisms for upload failures
  4. User Feedback: Provide clear error messages to users
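
Items 2 and 4 fit together: the same correlation ID can appear in every structured log line for a request and in the error returned to the user. The following sketch assumes hypothetical error codes, field names, and messages. The ID generator is for illustration only and is not cryptographically strong.

```typescript
// Hypothetical error codes for the upload pipeline.
type UploadErrorCode =
  | "INVALID_DOCUMENT_ID"
  | "FILE_TOO_LARGE"
  | "UNSUPPORTED_FILE_TYPE"
  | "STORAGE_UNAVAILABLE";

interface UploadLogEntry {
  timestamp: string;
  level: "info" | "warn" | "error";
  correlationId: string;
  stage: string;
  message: string;
  code?: UploadErrorCode;
}

// Illustrative ID: time prefix plus a random suffix. Not suitable for security purposes.
function newCorrelationId(): string {
  return `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`;
}

// One JSON object per line keeps the logs easy to filter by correlationId.
function formatLogEntry(entry: Omit<UploadLogEntry, "timestamp">, now: Date = new Date()): string {
  const full: UploadLogEntry = { timestamp: now.toISOString(), ...entry };
  return JSON.stringify(full);
}

// User-facing text stays separate from internal log messages.
const USER_MESSAGES: Record<UploadErrorCode, string> = {
  INVALID_DOCUMENT_ID: "The document link is not valid.",
  FILE_TOO_LARGE: "The file is larger than the upload limit.",
  UNSUPPORTED_FILE_TYPE: "This file type is not supported.",
  STORAGE_UNAVAILABLE: "The upload could not be saved. Please try again.",
};

function toUserError(code: UploadErrorCode, correlationId: string) {
  return { error: USER_MESSAGES[code], code, correlationId };
}
```

Because the user-facing error includes the correlation ID, a reported failure can be matched to its log lines without exposing internal details.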

Configuration Error Handling

  1. Validation on Startup: Validate all configurations before service startup
  2. Fallback Configurations: Provide sensible defaults for non-critical settings
  3. Environment Detection: Automatically detect and configure for deployment environment
  4. Configuration Monitoring: Monitor configuration drift in production
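
Validation on startup and fallback defaults can be combined in a single loader that fails fast on missing required settings and fills in defaults for the rest. The variable names and default values below are placeholders, not the project's real settings.

```typescript
interface AppConfig {
  gcsBucket: string;
  supabaseUrl: string;
  logLevel: string;
  maxUploadMb: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical variable names and defaults.
const REQUIRED_VARS = ["GCS_BUCKET_NAME", "SUPABASE_URL"];
const DEFAULTS = { LOG_LEVEL: "info", MAX_UPLOAD_MB: "50" };

function loadConfig(env: Env): AppConfig {
  const read = (name: string): string => (env[name] ?? "").trim();

  // Report every missing variable at once rather than one per restart.
  const missing = REQUIRED_VARS.filter((name) => read(name) === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const maxUploadMb = Number(read("MAX_UPLOAD_MB") || DEFAULTS.MAX_UPLOAD_MB);
  if (!Number.isFinite(maxUploadMb) || maxUploadMb <= 0) {
    throw new Error("MAX_UPLOAD_MB must be a positive number");
  }

  return {
    gcsBucket: read("GCS_BUCKET_NAME"),
    supabaseUrl: read("SUPABASE_URL"),
    logLevel: read("LOG_LEVEL") || DEFAULTS.LOG_LEVEL,
    maxUploadMb,
  };
}

// Intended use at service startup (not executed here): loadConfig(process.env)
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the loader easy to unit test.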

Storage Error Handling

  1. Retry Logic: Implement exponential backoff for GCS operations
  2. Migration Safety: Backup existing files before migration, then remove local storage completely
  3. Integrity Checks: Validate file integrity after migration to GCS
  4. GCS-Only Operations: All storage operations must use GCS exclusively (no local fallbacks)
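
The retry logic in item 1 could take roughly the following form. This is a generic sketch, not a wrapper around any particular GCS client call. The base delay, cap, and attempt count are assumed values, and the injectable `sleep` parameter exists only to make the helper testable.

```typescript
// Delay doubles with each attempt (0-based) and is capped at `capMs`.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, waiting between failed attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not wait after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A production version would usually retry only errors known to be transient and add random jitter to the delay so that parallel uploads do not retry in lockstep.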

Testing Strategy

Configuration Testing

  1. Environment Validation Tests: Verify all required configurations are present
  2. Configuration Conflict Tests: Detect and report configuration conflicts
  3. Deployment Tests: Validate deployment configurations work correctly
  4. Integration Tests: Verify that configuration changes don't break existing functionality

Upload Pipeline Testing

  1. Unit Tests: Test individual upload components
  2. Integration Tests: Test complete upload pipeline
  3. Error Scenario Tests: Test various error conditions and recovery
  4. Performance Tests: Validate upload performance after changes
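
Unit tests of upload components are simplest when file validation is a pure function that the upload middleware calls. The sketch below pairs such a function with table-driven cases. The allowed MIME types and size limit are assumptions, and `validateUpload` is a hypothetical name.

```typescript
interface UploadCandidate {
  originalName: string;
  mimeType: string;
  sizeBytes: number;
}

type UploadCheck = { accepted: true } | { accepted: false; reason: string };

// Assumed limits for illustration.
const ALLOWED_MIME_TYPES = ["application/pdf"];
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

function validateUpload(file: UploadCandidate): UploadCheck {
  if (file.sizeBytes <= 0) {
    return { accepted: false, reason: "File is empty" };
  }
  if (file.sizeBytes > MAX_UPLOAD_BYTES) {
    return { accepted: false, reason: "File exceeds the size limit" };
  }
  if (!ALLOWED_MIME_TYPES.includes(file.mimeType)) {
    return { accepted: false, reason: `Unsupported file type: ${file.mimeType}` };
  }
  return { accepted: true };
}

// Table-driven cases: each row names a scenario and the expected outcome.
const uploadCases: Array<{ name: string; file: UploadCandidate; expected: boolean }> = [
  { name: "small PDF", file: { originalName: "a.pdf", mimeType: "application/pdf", sizeBytes: 1024 }, expected: true },
  { name: "empty file", file: { originalName: "b.pdf", mimeType: "application/pdf", sizeBytes: 0 }, expected: false },
  { name: "oversized PDF", file: { originalName: "c.pdf", mimeType: "application/pdf", sizeBytes: MAX_UPLOAD_BYTES + 1 }, expected: false },
  { name: "wrong type", file: { originalName: "d.exe", mimeType: "application/octet-stream", sizeBytes: 1024 }, expected: false },
];

// Returns the names of the cases whose outcome differs from the expectation.
function failingUploadCases(): string[] {
  return uploadCases
    .filter((c) => validateUpload(c.file).accepted !== c.expected)
    .map((c) => c.name);
}
```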

Storage Migration Testing

  1. Migration Tests: Test file migration process
  2. Data Integrity Tests: Verify files are correctly migrated
  3. Rollback Tests: Test the ability to roll back the migration
  4. Performance Tests: Compare storage performance before/after migration
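
A data integrity test might compare a manifest of the local files against a manifest built from the migrated objects. The sketch below assumes that sizes and checksums have already been collected for both sides. The record shape and function names are hypothetical.

```typescript
interface FileRecord {
  key: string;       // path relative to the upload root or bucket
  sizeBytes: number;
  checksum: string;  // digest computed the same way on both sides
}

interface IntegrityReport {
  matched: string[];
  missing: string[];     // present locally, absent after migration
  mismatched: string[];  // present on both sides with a different size or checksum
}

function compareManifests(source: FileRecord[], migrated: FileRecord[]): IntegrityReport {
  const migratedByKey = new Map<string, FileRecord>();
  for (const record of migrated) {
    migratedByKey.set(record.key, record);
  }

  const report: IntegrityReport = { matched: [], missing: [], mismatched: [] };
  for (const record of source) {
    const copy = migratedByKey.get(record.key);
    if (copy === undefined) {
      report.missing.push(record.key);
    } else if (copy.sizeBytes !== record.sizeBytes || copy.checksum !== record.checksum) {
      report.mismatched.push(record.key);
    } else {
      report.matched.push(record.key);
    }
  }
  return report;
}

// Local files should be removed only when this returns true.
function isMigrationIntact(report: IntegrityReport): boolean {
  return report.missing.length === 0 && report.mismatched.length === 0;
}
```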

End-to-End Testing

  1. User Journey Tests: Test complete user upload journey
  2. Cross-Environment Tests: Verify functionality across all environments
  3. Regression Tests: Ensure cleanup doesn't break existing features
  4. Load Tests: Validate system performance under load

Implementation Phases

Phase 1: Analysis and Planning

  • Audit current configuration files and identify conflicts
  • Analyze upload error patterns and root causes
  • Document current deployment process and identify issues
  • Create detailed cleanup and migration plan

Phase 2: Configuration Cleanup

  • Remove redundant and conflicting configuration files
  • Standardize environment variable naming and structure
  • Update deployment configurations for consistency
  • Validate configurations across all environments

Phase 3: Storage Migration

  • Implement Google Cloud Storage integration
  • Migrate existing files from local storage to GCS
  • Update file storage service and database references
  • Test and validate storage functionality

Phase 4: Upload Error Resolution

  • Fix UUID validation issues in document routes
  • Improve error handling and logging throughout upload pipeline
  • Implement better user feedback for upload errors
  • Add monitoring and alerting for upload failures

Phase 5: Deployment Standardization

  • Remove local deployment artifacts and scripts
  • Standardize Cloud Run and Firebase deployment processes
  • Update documentation and deployment guides
  • Implement automated deployment validation

Phase 6: Testing and Validation

  • Comprehensive testing of all changes
  • Performance validation and optimization
  • User acceptance testing
  • Production deployment and monitoring