CIM Document Processor - AI-Powered CIM Analysis System
🎯 Project Overview
Purpose: Automated processing and analysis of Confidential Information Memorandums (CIMs) using AI-powered document understanding and structured data extraction.
Core Technology Stack:
- Frontend: React + TypeScript + Vite
- Backend: Node.js + Express + TypeScript
- Database: Supabase (PostgreSQL) + Vector Database
- AI Services: Google Document AI + Claude AI + OpenAI
- Storage: Google Cloud Storage
- Authentication: Firebase Auth
🏗️ Architecture Summary
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │    External     │
│     (React)     │◄──►│    (Node.js)    │◄──►│    Services     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │    Database     │    │  Google Cloud   │
                       │   (Supabase)    │    │    Services     │
                       └─────────────────┘    └─────────────────┘
📁 Key Directories & Files
Core Application
- frontend/src/ - React frontend application
- backend/src/ - Node.js backend services
- backend/src/services/ - Core business logic services
- backend/src/models/ - Database models and types
- backend/src/routes/ - API route definitions
Documentation
- APP_DESIGN_DOCUMENTATION.md - Complete system architecture
- AGENTIC_RAG_IMPLEMENTATION_PLAN.md - AI processing strategy
- PDF_GENERATION_ANALYSIS.md - PDF generation optimization
- DEPLOYMENT_GUIDE.md - Deployment instructions
- ARCHITECTURE_DIAGRAMS.md - Visual architecture documentation
Configuration
- backend/src/config/ - Environment and service configuration
- frontend/src/config/ - Frontend configuration
- backend/scripts/ - Setup and utility scripts
🚀 Quick Start
Prerequisites
- Node.js 18+
- Google Cloud Platform account
- Supabase account
- Firebase project
Environment Setup
# Backend
cd backend
npm install
cp .env.example .env
# Configure environment variables in .env
# Frontend
cd ../frontend
npm install
cp .env.example .env
# Configure environment variables in .env
Development
# Run each command in its own terminal, from the repository root
# Backend (port 5001)
cd backend && npm run dev
# Frontend (port 5173)
cd frontend && npm run dev
🔧 Core Services
1. Document Processing Pipeline
- unifiedDocumentProcessor.ts - Main orchestrator
- optimizedAgenticRAGProcessor.ts - AI-powered analysis
- documentAiProcessor.ts - Google Document AI integration
- llmService.ts - LLM interactions (Claude AI/OpenAI)
2. File Management
- fileStorageService.ts - Google Cloud Storage operations
- pdfGenerationService.ts - PDF report generation
- uploadMonitoringService.ts - Real-time upload tracking
3. Data Management
- agenticRAGDatabaseService.ts - Analytics and session management
- vectorDatabaseService.ts - Vector embeddings and search
- sessionService.ts - User session management
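To illustrate the kind of lookup a vector search service performs, here is a minimal sketch of a brute-force similarity search. It assumes cosine similarity as the metric and an in-memory array of chunks; the `EmbeddedChunk` shape and both function names are hypothetical and are not taken from vectorDatabaseService.ts.

```typescript
// Hypothetical shape of a stored chunk; the real document_chunks schema may differ.
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force top-k search: score every chunk, sort descending, keep the first k.
// A production vector store would normally use an index instead of a full scan.
function topKSimilar(query: number[], chunks: EmbeddedChunk[], k: number) {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A full scan like this is only reasonable for small collections; it is shown here because it makes the scoring step explicit.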
📊 Processing Strategies
Current Active Strategy: Optimized Agentic RAG
- Text Extraction - Google Document AI extracts text from PDF
- Semantic Chunking - Split text into 4000-char chunks with overlap
- Vector Embedding - Generate embeddings for each chunk
- LLM Analysis - Claude AI analyzes chunks and generates structured data
- PDF Generation - Create summary PDF with analysis results
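The chunking step above can be sketched as a fixed-size sliding window. This is a simplification: it splits on character counts only and does not detect semantic boundaries, which the actual processor may do. The 4000-character size comes from the list above; the 200-character overlap and the names `TextChunk` and `chunkText` are assumptions for illustration.

```typescript
// Hypothetical chunk record; field names are illustrative only.
interface TextChunk {
  index: number; // position of the chunk in the sequence
  start: number; // offset of the chunk's first character in the source text
  text: string;
}

// Splits text into windows of chunkSize characters, each sharing `overlap`
// characters with the previous window. The overlap default (200) is an assumption.
function chunkText(text: string, chunkSize = 4000, overlap = 200): TextChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: TextChunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + chunkSize) });
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some text twice.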
Output Format
Structured CIM Review data including:
- Deal Overview
- Business Description
- Market Analysis
- Financial Summary
- Management Team
- Investment Thesis
- Key Questions & Next Steps
🔌 API Endpoints
Document Management
- POST /documents/upload-url - Get signed upload URL
- POST /documents/:id/confirm-upload - Confirm upload and start processing
- POST /documents/:id/process-optimized-agentic-rag - Trigger AI processing
- GET /documents/:id/download - Download processed PDF
- DELETE /documents/:id - Delete document
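The first two endpoints suggest a three-step upload flow: request a signed URL, send the file to storage, then confirm. The sketch below shows that sequence from a client's point of view. The request and response payloads are not documented here, so the `fileName` request field and the `documentId` and `uploadUrl` response fields are assumptions, as is the use of PUT for the signed URL. `HttpCall` is a hypothetical abstraction so the example runs without a network.

```typescript
// Minimal HTTP abstraction; in a browser this could be a thin wrapper around fetch.
type HttpCall = (method: string, url: string, body?: unknown) => Promise<any>;

// ASSUMPTION: `documentId` and `uploadUrl` are hypothetical response field names.
async function uploadDocument(
  http: HttpCall,
  apiBase: string,
  fileName: string,
  fileBytes: Uint8Array
): Promise<string> {
  // 1. Ask the backend for a signed upload URL.
  const created = await http("POST", apiBase + "/documents/upload-url", { fileName });
  const documentId: string = created.documentId;
  const uploadUrl: string = created.uploadUrl;

  // 2. Send the file bytes directly to storage through the signed URL.
  await http("PUT", uploadUrl, fileBytes);

  // 3. Confirm the upload so the backend can start processing.
  await http("POST", apiBase + "/documents/" + documentId + "/confirm-upload");
  return documentId;
}
```

Uploading through a signed URL keeps large PDFs off the API server; the backend only issues the URL and reacts to the confirmation.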
Analytics & Monitoring
- GET /documents/analytics - Get processing analytics
- GET /documents/processing-stats - Get processing statistics
- GET /documents/:id/agentic-rag-sessions - Get processing sessions
- GET /monitoring/upload-metrics - Get upload metrics
- GET /monitoring/upload-health - Get upload health status
- GET /monitoring/real-time-stats - Get real-time statistics
- GET /vector/stats - Get vector database statistics
🗄️ Database Schema
Core Tables
- documents - Document metadata and processing status
- agentic_rag_sessions - AI processing session tracking
- document_chunks - Vector embeddings and chunk data
- processing_jobs - Background job management
- users - User authentication and profiles
🔐 Security
- Firebase Authentication with JWT validation
- Protected API endpoints with user-specific data isolation
- Signed URLs for secure file uploads
- Rate limiting and input validation
- CORS configuration for cross-origin requests
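As a rough illustration of the token check behind a protected endpoint, the sketch below extracts a bearer token from an Authorization header and passes it to a verifier. The verifier is injected so the example stays library-free; with the Firebase Admin SDK that role would typically be filled by its `verifyIdToken` call. The names `TokenVerifier`, `AuthResult`, `extractBearerToken`, and `authenticate` are hypothetical and not taken from this codebase.

```typescript
// Hypothetical verifier: resolves with the user's uid or rejects for a bad token.
type TokenVerifier = (token: string) => Promise<{ uid: string }>;

interface AuthResult {
  ok: boolean;
  status: number;  // 200 when authenticated, 401 otherwise
  uid?: string;    // set only when ok is true
  reason?: string; // set only when ok is false
}

// Extracts the token from an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

async function authenticate(header: string | undefined, verify: TokenVerifier): Promise<AuthResult> {
  const token = extractBearerToken(header);
  if (token === null) {
    return { ok: false, status: 401, reason: "missing bearer token" };
  }
  try {
    const decoded = await verify(token);
    // Downstream queries would filter by this uid to keep data user-specific.
    return { ok: true, status: 200, uid: decoded.uid };
  } catch (err) {
    return { ok: false, status: 401, reason: "invalid or expired token" };
  }
}
```

In an Express app this logic would usually sit in middleware that either attaches the uid to the request or responds with 401.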
📈 Performance & Monitoring
Real-time Monitoring
- Upload progress tracking
- Processing status updates
- Error rate monitoring
- Performance metrics
- API usage tracking
- Cost monitoring
Analytics Dashboard
- Processing success rates
- Average processing times
- API usage statistics
- Cost tracking
- User activity metrics
- Error analysis reports
🚨 Error Handling
Frontend Error Handling
- Network errors with automatic retry
- Authentication errors with token refresh
- Upload errors with user-friendly messages
- Processing errors with real-time display
Backend Error Handling
- Validation errors with detailed messages
- Processing errors with graceful degradation
- Storage errors with retry logic
- Database errors with connection pooling
- LLM API errors with exponential backoff
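The exponential backoff mentioned for LLM API errors can be sketched as a generic retry wrapper. This is a minimal version under stated assumptions: it retries on every error and uses no jitter, whereas a production wrapper would likely retry only rate-limit and transient server errors and add randomness to the delay. `RetryOptions`, `backoffDelay`, and `withRetry` are hypothetical names, not functions from llmService.ts.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `retry` (0-based): base * 2^retry, capped at maxDelayMs.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, retry));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn until it succeeds or maxAttempts is reached, waiting longer after each failure.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts)); // no wait after the final failure
      }
    }
  }
  throw lastError;
}
```

With `baseDelayMs: 500` and `maxDelayMs: 8000`, the waits between attempts would be 500, 1000, 2000, 4000, and then 8000 ms for every later retry.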
🧪 Testing
Test Structure
- Unit Tests: Jest for backend, Vitest for frontend
- Integration Tests: End-to-end testing
- API Tests: Supertest for backend endpoints
Test Coverage
- Service layer testing
- API endpoint testing
- Error handling scenarios
- Performance testing
- Security testing
📚 Documentation Index
Technical Documentation
- Application Design Documentation - Complete system architecture
- Agentic RAG Implementation Plan - AI processing strategy
- PDF Generation Analysis - PDF optimization details
- Architecture Diagrams - Visual system design
- Deployment Guide - Deployment instructions
Analysis Reports
- Codebase Audit Report - Code quality analysis
- Dependency Analysis Report - Dependency management
- Document AI Integration Summary - Google Document AI setup
🤝 Contributing
Development Workflow
- Create feature branch from main
- Implement changes with tests
- Update documentation
- Submit pull request
- Code review and approval
- Merge to main
Code Standards
- TypeScript for type safety
- ESLint for code quality
- Prettier for formatting
- Jest (backend) and Vitest (frontend) for testing
- Conventional commits for version control
📞 Support
Common Issues
- Upload Failures - Check GCS permissions and bucket configuration
- Processing Timeouts - Increase timeout limits for large documents
- Memory Issues - Monitor memory usage and adjust batch sizes
- API Quotas - Check API usage and implement rate limiting
- PDF Generation Failures - Check Puppeteer installation and memory
- LLM API Errors - Verify API keys and check rate limits
Debug Tools
- Real-time logging with correlation IDs
- Upload monitoring dashboard
- Processing session details
- Error analysis reports
- Performance metrics dashboard
📄 License
This project is proprietary software developed for BPCP. All rights reserved.
Last Updated: December 2024
Version: 1.0.0
Status: Production Ready