cim_summary/DEPENDENCY_ANALYSIS_REPORT.md
2025-08-01 15:46:43 -04:00

Dependency Analysis Report - CIM Document Processor

Executive Summary

This report analyzes the dependencies in both backend and frontend packages to identify:

  • Unused dependencies that can be removed
  • Outdated packages that should be updated
  • Consolidation opportunities
  • Dependencies in active use versus those that only back placeholder implementations

Backend Dependencies Analysis

Core Dependencies (Actively Used)

Essential Dependencies

  • express - Main web framework
  • cors - CORS middleware
  • helmet - Security middleware
  • morgan - HTTP request logging
  • express-rate-limit - Rate limiting
  • dotenv - Environment variable management
  • winston - Logging framework
  • @supabase/supabase-js - Database client
  • @google-cloud/storage - Google Cloud Storage
  • @google-cloud/documentai - Document AI processing
  • @anthropic-ai/sdk - Claude AI integration
  • openai - OpenAI integration
  • puppeteer - PDF generation
  • uuid - UUID generation
  • axios - HTTP client

Conditionally Used Dependencies

  • bcryptjs - Used in auth.ts and seed.ts (legacy auth system)
  • jsonwebtoken - Used in auth.ts (legacy JWT system)
  • joi - Used for environment validation and middleware validation
  • zod - Used in llmSchemas.ts and llmService.ts for schema validation
  • multer - Used in upload middleware (legacy multipart upload)
  • pdf-parse - Used in documentAiProcessor.ts (Document AI fallback)

⚠️ Potentially Unused Dependencies

  • redis - Only imported in sessionService.ts but may not be actively used
  • pg - PostgreSQL client (may be redundant with Supabase)

Development Dependencies (Actively Used)

Essential Dev Dependencies

  • typescript - TypeScript compiler
  • ts-node-dev - Development server
  • jest - Testing framework
  • supertest - API testing
  • @types/* - TypeScript type definitions
  • eslint - Code linting
  • @typescript-eslint/* - TypeScript ESLint rules

Unused Dependencies Analysis

Confirmed Unused

None identified in the backend - all backend dependencies appear to be used somewhere in the codebase.

⚠️ Potentially Redundant

  1. Validation Libraries: Both joi and zod are used for validation

    • joi: Environment validation, middleware validation
    • zod: LLM schemas, service validation
    • Recommendation: Consider consolidating to just zod for consistency
  2. Database Clients: Both pg and @supabase/supabase-js

    • pg: Direct PostgreSQL client
    • @supabase/supabase-js: Supabase client (reaches PostgreSQL through Supabase's HTTP API rather than a direct connection)
    • Recommendation: Remove pg if only using Supabase
  3. Authentication: Both bcryptjs/jsonwebtoken and Firebase Auth

    • Legacy JWT system vs. Firebase Authentication
    • Recommendation: Remove legacy auth dependencies if fully migrated to Firebase

Frontend Dependencies Analysis

Core Dependencies (Actively Used)

Essential Dependencies

  • react - React framework
  • react-dom - React DOM rendering
  • react-router-dom - Client-side routing
  • axios - HTTP client for API calls
  • firebase - Firebase Authentication
  • lucide-react - Icon library (used in 6 components)
  • react-dropzone - File upload component

Unused Dependencies

  • clsx - Not imported anywhere
  • tailwind-merge - Not imported anywhere
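
A finding such as "not imported anywhere" can be checked mechanically. The following is a minimal sketch, assuming the source files have already been read into strings; `findUnusedDependencies` and its regular expression are illustrative and not part of the project. The pattern only covers static `import`/`require` forms, so packages referenced from config files or npm scripts would need a separate check.

```typescript
// Hypothetical helper: reports which declared dependencies never appear as an
// import/require specifier in the given source texts.
function findUnusedDependencies(declared: string[], sources: string[]): string[] {
  // Matches: import x from "pkg", import "pkg", export ... from "pkg",
  // require("pkg") and import("pkg").
  const specifier = /(?:from\s+|import\s+|require\(\s*|import\(\s*)["']([^"']+)["']/g;
  const used = new Set<string>();

  for (const text of sources) {
    specifier.lastIndex = 0; // restart the global regex for each file
    let match: RegExpExecArray | null;
    while ((match = specifier.exec(text)) !== null) {
      const spec = match[1];
      // Relative and absolute paths are local files, not packages.
      if (spec.startsWith(".") || spec.startsWith("/")) continue;
      // Reduce "pkg/sub/path" to "pkg" and "@scope/pkg/sub" to "@scope/pkg".
      const parts = spec.split("/");
      used.add(spec.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0]);
    }
  }
  return declared.filter((name) => !used.has(name));
}
```

Passing the `dependencies` keys from package.json as `declared` and the contents of the `src` tree as `sources` yields the candidates for removal, which should still be reviewed by hand before uninstalling.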

Development Dependencies (Actively Used)

Essential Dev Dependencies

  • typescript - TypeScript compiler
  • vite - Build tool and dev server
  • @vitejs/plugin-react - React plugin for Vite
  • tailwindcss - CSS framework
  • postcss - CSS processing
  • autoprefixer - CSS vendor prefixing
  • eslint - Code linting
  • @typescript-eslint/* - TypeScript ESLint rules
  • vitest - Testing framework
  • @testing-library/* - React testing utilities

Processing Strategy Analysis

Current Active Strategy

Based on the code analysis, the current processing strategy is:

  • Primary: optimized_agentic_rag (most actively used)
  • Fallback: document_ai_agentic_rag (Document AI + Agentic RAG)
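
The primary/fallback relationship can be pictured as follows. This is a sketch only: it assumes the fallback is invoked when the primary strategy throws, which the report does not state, and the `Processor` signature and `processWithFallback` name are hypothetical stand-ins for the real services.

```typescript
type StrategyName = "optimized_agentic_rag" | "document_ai_agentic_rag";

interface StrategyResult {
  strategy: StrategyName;
  summary: string;
}

// Hypothetical processor signature; the real services take richer inputs.
type Processor = (documentText: string) => Promise<string>;

// Tries the primary strategy first and uses the fallback only if it fails.
async function processWithFallback(
  documentText: string,
  processors: Record<StrategyName, Processor>,
): Promise<StrategyResult> {
  try {
    const summary = await processors.optimized_agentic_rag(documentText);
    return { strategy: "optimized_agentic_rag", summary };
  } catch {
    // Assumption: any primary failure triggers the Document AI based fallback.
    const summary = await processors.document_ai_agentic_rag(documentText);
    return { strategy: "document_ai_agentic_rag", summary };
  }
}
```

Returning the strategy name alongside the result makes it possible to measure how often the fallback actually runs, which is useful evidence before deciding whether it can be retired.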

Unused Processing Strategies

The following strategies are implemented but not actively used:

  1. chunking - Legacy chunking strategy
  2. rag - Basic RAG strategy
  3. agentic_rag - Basic agentic RAG (superseded by optimized version)

Services Analysis

Actively Used Services

  • unifiedDocumentProcessor - Main orchestrator
  • optimizedAgenticRAGProcessor - Core AI processing
  • llmService - LLM interactions
  • pdfGenerationService - PDF generation
  • fileStorageService - GCS operations
  • uploadMonitoringService - Real-time tracking
  • sessionService - Session management
  • jobQueueService - Background processing

⚠️ Legacy Services (Can be removed)

  • documentProcessingService - Legacy chunking service
  • documentAiProcessor - Document AI + Agentic RAG processor (still backs the document_ai_agentic_rag fallback; remove only after that fallback is retired)
  • ragDocumentProcessor - Basic RAG processor

Outdated Packages Analysis

Backend Outdated Packages

  • @types/express: 4.17.23 → 5.0.3 (major version update)
  • @types/jest: 29.5.14 → 30.0.0 (major version update)
  • @types/multer: 1.4.13 → 2.0.0 (major version update)
  • @types/node: 20.19.9 → 24.1.0 (major version update)
  • @types/pg: 8.15.4 → 8.15.5 (patch update)
  • @types/supertest: 2.0.16 → 6.0.3 (major version update)
  • @typescript-eslint/*: 6.21.0 → 8.38.0 (major version update)
  • bcryptjs: 2.4.3 → 3.0.2 (major version update)
  • dotenv: 16.6.1 → 17.2.1 (major version update)
  • eslint: 8.57.1 → 9.32.0 (major version update)
  • express: 4.21.2 → 5.1.0 (major version update)
  • express-rate-limit: 7.5.1 → 8.0.1 (major version update)
  • helmet: 7.2.0 → 8.1.0 (major version update)
  • jest: 29.7.0 → 30.0.5 (major version update)
  • multer: 1.4.5-lts.2 → 2.0.2 (major version update)
  • openai: 5.10.2 → 5.11.0 (minor update)
  • puppeteer: 21.11.0 → 24.15.0 (major version update)
  • redis: 4.7.1 → 5.7.0 (major version update)
  • supertest: 6.3.4 → 7.1.4 (major version update)
  • typescript: 5.8.3 → 5.9.2 (minor update)
  • zod: 3.25.76 → 4.0.14 (major version update)

Frontend Outdated Packages

  • @testing-library/jest-dom: 6.6.3 → 6.6.4 (patch update)
  • @testing-library/react: 13.4.0 → 16.3.0 (major version update)
  • @types/react: 18.3.23 → 19.1.9 (major version update)
  • @types/react-dom: 18.3.7 → 19.1.7 (major version update)
  • @typescript-eslint/*: 6.21.0 → 8.38.0 (major version update)
  • eslint: 8.57.1 → 9.32.0 (major version update)
  • eslint-plugin-react-hooks: 4.6.2 → 5.2.0 (major version update)
  • lucide-react: 0.294.0 → 0.536.0 (minor update on a 0.x release; may include breaking changes)
  • react: 18.3.1 → 19.1.1 (major version update)
  • react-dom: 18.3.1 → 19.1.1 (major version update)
  • react-router-dom: 6.30.1 → 7.7.1 (major version update)
  • tailwind-merge: 2.6.0 → 3.3.1 (major version update)
  • tailwindcss: 3.4.17 → 4.1.11 (major version update)
  • typescript: 5.8.3 → 5.9.2 (minor update)
  • vite: 4.5.14 → 7.0.6 (major version update)
  • vitest: 0.34.6 → 3.2.4 (major version update)
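
The major/minor/patch labels in the lists above follow from comparing version components. Here is a small sketch of that comparison; the function names are illustrative. It reads only the leading `MAJOR.MINOR.PATCH` numbers, ignoring suffixes such as `-lts.2`, and treats minor bumps on 0.x packages as needing testing, since semantic versioning permits breaking changes before 1.0.0.

```typescript
type UpdateKind = "major" | "minor" | "patch" | "none";

// Parses the leading "MAJOR.MINOR.PATCH" numbers and ignores any suffix.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) throw new Error(`Unrecognized version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Classifies an update by the first version component that differs.
function classifyUpdate(current: string, latest: string): UpdateKind {
  const [curMajor, curMinor, curPatch] = parseVersion(current);
  const [newMajor, newMinor, newPatch] = parseVersion(latest);
  if (newMajor !== curMajor) return "major";
  if (newMinor !== curMinor) return "minor";
  if (newPatch !== curPatch) return "patch";
  return "none";
}

// Assumption: a minor bump on a 0.x package is grouped with major updates.
function needsTesting(current: string, latest: string): boolean {
  const kind = classifyUpdate(current, latest);
  const isPreOne = parseVersion(current)[0] === 0;
  return kind === "major" || (isPreOne && kind === "minor");
}
```

For example, `classifyUpdate("8.15.4", "8.15.5")` returns `"patch"` and `classifyUpdate("1.4.5-lts.2", "2.0.2")` returns `"major"`, while `needsTesting("0.294.0", "0.536.0")` returns `true` even though only the minor component changed.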

Update Strategy

⚠️ Warning: Many packages have major version updates that may include breaking changes. Update strategy:

  1. Immediate Updates (Low Risk):

    • @types/pg: 8.15.4 → 8.15.5 (patch update)
    • openai: 5.10.2 → 5.11.0 (minor update)
    • typescript: 5.8.3 → 5.9.2 (minor update)
    • @testing-library/jest-dom: 6.6.3 → 6.6.4 (patch update)
  2. Major Version Updates (Require Testing):

    • React ecosystem updates (React 18 → 19)
    • Express updates (Express 4 → 5)
    • Testing framework updates (Jest 29 → 30, Vitest 0.34 → 3.2)
    • Build tool updates (Vite 4 → 7)
  3. Recommendation: Update major versions after dependency cleanup to minimize risk

Recommendations

Phase 1: Immediate Cleanup (Low Risk)

Backend

  1. Consolidate validation libraries:

    • Migrate from joi to zod for consistency
    • Remove joi dependency
  2. Remove legacy auth dependencies (if Firebase auth is fully implemented):

    npm uninstall bcryptjs jsonwebtoken
    npm uninstall @types/bcryptjs @types/jsonwebtoken
    

Frontend

  1. Remove unused dependencies:
    npm uninstall clsx tailwind-merge
    

Phase 2: Service Consolidation (Medium Risk)

  1. Remove legacy processing services:

    • documentProcessingService.ts
    • documentAiProcessor.ts
    • ragDocumentProcessor.ts
  2. Simplify unifiedDocumentProcessor:

    • Remove unused strategy methods
    • Keep only optimized_agentic_rag strategy
  3. Remove unused database client:

    • Remove pg if only using Supabase

Phase 3: Configuration Cleanup (Low Risk)

  1. Remove unused environment variables:

    • Legacy auth configuration
    • Unused processing strategy configs
    • Unused LLM configurations
  2. Update configuration validation:

    • Remove validation for unused configs
    • Simplify environment schema

Phase 4: Route Cleanup (Medium Risk)

  1. Remove legacy upload endpoints:

    • Keep only /upload-url and /confirm-upload
    • Remove multipart upload endpoints
  2. Remove unused analytics endpoints:

    • Keep only actively used monitoring endpoints

Impact Assessment

Risk Levels

  • Low Risk: Removing unused dependencies, updating minor/patch versions
  • Medium Risk: Removing legacy services, consolidating routes
  • High Risk: Changing core processing logic

Testing Requirements

  • Unit tests for all active services
  • Integration tests for upload flow
  • End-to-end tests for document processing
  • Performance testing for optimized agentic RAG

Rollback Plan

  • Keep backup of removed files for 1-2 weeks
  • Maintain feature flags for major changes
  • Document all changes for easy rollback
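
One way to keep a removed code path recoverable during the rollback window is to gate it behind an environment-driven flag. The sketch below assumes flags are read from environment variables; the flag names and the `isEnabled` helper are hypothetical and do not exist in the project.

```typescript
// Hypothetical flag names; the project may use different ones.
type FeatureFlag = "LEGACY_AUTH_ENABLED" | "LEGACY_MULTIPART_UPLOAD_ENABLED";

// Reads a boolean flag from an environment-like object. Unset or empty flags
// fall back to the supplied default, so legacy paths stay off unless they are
// explicitly re-enabled.
function isEnabled(
  flag: FeatureFlag,
  env: Record<string, string | undefined>,
  defaultValue = false,
): boolean {
  const raw = env[flag];
  if (raw === undefined || raw.trim() === "") return defaultValue;
  const value = raw.trim().toLowerCase();
  return value === "true" || value === "1";
}
```

In a route setup, a call such as `isEnabled("LEGACY_MULTIPART_UPLOAD_ENABLED", process.env)` could decide whether the legacy endpoint is registered, so a rollback becomes a configuration change rather than a redeploy of restored code.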

Next Steps

  1. Start with Phase 1 (unused dependencies)
  2. Test thoroughly after each phase
  3. Document changes for team reference
  4. Update deployment scripts if needed
  5. Monitor performance after cleanup

Estimated Savings

Bundle Size Reduction

  • Frontend: ~50KB (removing unused dependencies)
  • Backend: ~200KB (removing legacy services and dependencies)

Maintenance Reduction

  • Fewer dependencies to maintain and update
  • Simplified codebase with fewer moving parts
  • Reduced security vulnerabilities from unused packages

Performance Improvement

  • Faster builds with fewer dependencies
  • Reduced memory usage from removed services
  • Simplified deployment with fewer configuration options

Summary

Key Findings

  1. Unused Dependencies: 2 frontend dependencies (clsx, tailwind-merge) are completely unused
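  2. Legacy Services: 3 processing services are candidates for removal (documentProcessingService, documentAiProcessor, ragDocumentProcessor); documentAiProcessor still backs the fallback strategy and should go last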
  3. Redundant Dependencies: Both joi and zod for validation, both pg and Supabase for database
  4. Outdated Packages: 21 backend and 16 frontend packages have updates available
  5. Major Version Updates: Many packages require major version updates with potential breaking changes

Immediate Actions (Step 2 Complete)

  1. Dependency Analysis Complete - All dependencies mapped and usage identified
  2. Outdated Packages Identified - Version updates documented with risk assessment
  3. Cleanup Strategy Defined - Phased approach with risk levels assigned
  4. Impact Assessment Complete - Bundle size and maintenance savings estimated

Next Steps (Step 3 - Service Layer Consolidation)

  1. Remove unused frontend dependencies (clsx, tailwind-merge)
  2. Remove legacy processing services
  3. Consolidate validation libraries (migrate from joi to zod)
  4. Remove redundant database client (pg if only using Supabase)
  5. Update low-risk package versions

Risk Assessment

  • Low Risk: Removing unused dependencies, updating minor/patch versions
  • Medium Risk: Removing legacy services, consolidating libraries
  • High Risk: Major version updates, core processing logic changes

This dependency analysis provides a clear roadmap for cleaning up the codebase while maintaining functionality and minimizing risk.