
CIM Document Processor - AI-Powered CIM Analysis System

🎯 Project Overview

Purpose: Automated processing and analysis of Confidential Information Memorandums (CIMs) using AI-powered document understanding and structured data extraction.

Core Technology Stack:

  • Frontend: React + TypeScript + Vite
  • Backend: Node.js + Express + TypeScript
  • Database: Supabase (PostgreSQL) + Vector Database
  • AI Services: Google Document AI + Claude AI + OpenAI
  • Storage: Google Cloud Storage
  • Authentication: Firebase Auth

🏗️ Architecture Summary

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend       │    │   External      │
│   (React)       │◄──►│   (Node.js)     │◄──►│   Services      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │                        │
                              ▼                        ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │   Database      │    │   Google Cloud  │
                       │   (Supabase)    │    │   Services      │
                       └─────────────────┘    └─────────────────┘

📁 Key Directories & Files

Core Application

  • frontend/src/ - React frontend application
  • backend/src/ - Node.js backend services
  • backend/src/services/ - Core business logic services
  • backend/src/models/ - Database models and types
  • backend/src/routes/ - API route definitions

Documentation

  • APP_DESIGN_DOCUMENTATION.md - Complete system architecture
  • AGENTIC_RAG_IMPLEMENTATION_PLAN.md - AI processing strategy
  • PDF_GENERATION_ANALYSIS.md - PDF generation optimization
  • DEPLOYMENT_GUIDE.md - Deployment instructions
  • ARCHITECTURE_DIAGRAMS.md - Visual architecture documentation

Configuration

  • backend/src/config/ - Environment and service configuration
  • frontend/src/config/ - Frontend configuration
  • backend/scripts/ - Setup and utility scripts

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Google Cloud Platform account
  • Supabase account
  • Firebase project

Environment Setup

# Backend
cd backend
npm install
cp .env.example .env
# Configure environment variables

# Frontend
cd frontend
npm install
cp .env.example .env
# Configure environment variables

Development

# Backend (port 5001)
cd backend && npm run dev

# Frontend (port 5173)
cd frontend && npm run dev

🔧 Core Services

1. Document Processing Pipeline

  • unifiedDocumentProcessor.ts - Main orchestrator
  • optimizedAgenticRAGProcessor.ts - AI-powered analysis
  • documentAiProcessor.ts - Google Document AI integration
  • llmService.ts - LLM interactions (Claude AI/OpenAI)
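Conceptually, unifiedDocumentProcessor chains these services in a fixed order. A minimal sketch of that orchestration, using stand-in stubs rather than the real service APIs (stage names and signatures here are assumptions, not the actual code):

```typescript
// Hypothetical sketch of the processing pipeline's stage sequencing.
// Each stage is a stub that records its work; the real services call
// Document AI, the embedding API, Claude/OpenAI, and the PDF renderer.
type Stage = (input: string) => string;

const stages: { name: string; run: Stage }[] = [
  { name: "extractText", run: (pdf) => `text(${pdf})` },     // documentAiProcessor
  { name: "chunk", run: (t) => `chunks(${t})` },             // semantic chunking
  { name: "embed", run: (c) => `vectors(${c})` },            // vectorDatabaseService
  { name: "analyze", run: (v) => `analysis(${v})` },         // llmService
  { name: "renderPdf", run: (a) => `pdf(${a})` },            // pdfGenerationService
];

// Run every stage in order, feeding each stage's output to the next.
function processDocument(pdfPath: string): string {
  return stages.reduce((acc, s) => s.run(acc), pdfPath);
}
```

The point of the shape is that each service stays independently testable while the orchestrator only owns the ordering.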

2. File Management

  • fileStorageService.ts - Google Cloud Storage operations
  • pdfGenerationService.ts - PDF report generation
  • uploadMonitoringService.ts - Real-time upload tracking

3. Data Management

  • agenticRAGDatabaseService.ts - Analytics and session management
  • vectorDatabaseService.ts - Vector embeddings and search
  • sessionService.ts - User session management

📊 Processing Strategies

Current Active Strategy: Optimized Agentic RAG

  1. Text Extraction - Google Document AI extracts text from PDF
  2. Semantic Chunking - Split text into 4000-char chunks with overlap
  3. Vector Embedding - Generate embeddings for each chunk
  4. LLM Analysis - Claude AI analyzes chunks and generates structured data
  5. PDF Generation - Create summary PDF with analysis results
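Step 2 above can be sketched as a simple sliding-window chunker. The 4000-character size comes from this document; the 200-character overlap is an illustrative assumption, not the service's actual setting:

```typescript
// Fixed-size chunking with overlap: each chunk repeats the tail of the
// previous one so sentences spanning a boundary stay intact in at least
// one chunk.
function chunkText(text: string, size = 4000, overlap = 200): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```

A 9000-character document therefore yields three chunks, with the first 200 characters of each chunk after the first duplicating the end of its predecessor.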

Output Format

Structured CIM Review data including:

  • Deal Overview
  • Business Description
  • Market Analysis
  • Financial Summary
  • Management Team
  • Investment Thesis
  • Key Questions & Next Steps

🔌 API Endpoints

Document Management

  • POST /documents/upload-url - Get signed upload URL
  • POST /documents/:id/confirm-upload - Confirm upload and start processing
  • POST /documents/:id/process-optimized-agentic-rag - Trigger AI processing
  • GET /documents/:id/download - Download processed PDF
  • DELETE /documents/:id - Delete document
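A client uses the first two endpoints as a two-step flow: request a signed URL, PUT the file to storage, then confirm. A hedged sketch (request/response field names such as `documentId` and `uploadUrl` are assumptions):

```typescript
// Assumed local dev base URL; see the Development section for ports.
const API_BASE = "http://localhost:5001";

// Pure helper: build the endpoint path for a document action.
function documentPath(id: string, action?: string): string {
  return action ? `/documents/${id}/${action}` : `/documents/${id}`;
}

// Hypothetical client-side upload flow.
async function uploadCim(file: Blob, token: string): Promise<string> {
  // 1. Ask the backend for a signed GCS upload URL.
  const res = await fetch(`${API_BASE}/documents/upload-url`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ filename: "deal.pdf", contentType: "application/pdf" }),
  });
  const { documentId, uploadUrl } = await res.json();

  // 2. PUT the file directly to storage via the signed URL.
  await fetch(uploadUrl, { method: "PUT", body: file });

  // 3. Confirm the upload so processing starts.
  await fetch(`${API_BASE}${documentPath(documentId, "confirm-upload")}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  return documentId;
}
```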

Analytics & Monitoring

  • GET /documents/analytics - Get processing analytics
  • GET /documents/processing-stats - Get processing statistics
  • GET /documents/:id/agentic-rag-sessions - Get processing sessions
  • GET /monitoring/upload-metrics - Get upload metrics
  • GET /monitoring/upload-health - Get upload health status
  • GET /monitoring/real-time-stats - Get real-time statistics
  • GET /vector/stats - Get vector database statistics

🗄️ Database Schema

Core Tables

  • documents - Document metadata and processing status
  • agentic_rag_sessions - AI processing session tracking
  • document_chunks - Vector embeddings and chunk data
  • processing_jobs - Background job management
  • users - User authentication and profiles
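For orientation, a `documents` row might look like the following on the TypeScript side. Column names and the status values are illustrative assumptions; the Supabase migrations hold the real schema:

```typescript
// Assumed shape of a `documents` row, not the actual schema.
type ProcessingStatus = "uploaded" | "processing" | "completed" | "failed";

interface DocumentRow {
  id: string;        // UUID primary key
  userId: string;    // owner, enforcing per-user data isolation
  filename: string;
  gcsPath: string;   // object path in Google Cloud Storage
  status: ProcessingStatus;
  createdAt: string; // ISO timestamp
}
```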

🔐 Security

  • Firebase Authentication with JWT validation
  • Protected API endpoints with user-specific data isolation
  • Signed URLs for secure file uploads
  • Rate limiting and input validation
  • CORS configuration for cross-origin requests
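The JWT validation step starts by pulling the token out of the `Authorization` header. A minimal sketch of that extraction (the actual verification uses the Firebase Admin SDK's `verifyIdToken`, omitted here):

```typescript
// Return the bearer token from an Authorization header, or null if the
// header is missing or not in "Bearer <token>" form.
function extractBearerToken(authHeader: string | undefined): string | null {
  if (!authHeader) return null;
  const [scheme, token] = authHeader.split(" ");
  return scheme === "Bearer" && token ? token : null;
}
```

In an Express middleware, a `null` result would short-circuit to a 401 before any user-specific data is touched.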

📈 Performance & Monitoring

Real-time Monitoring

  • Upload progress tracking
  • Processing status updates
  • Error rate monitoring
  • Performance metrics
  • API usage tracking
  • Cost monitoring

Analytics Dashboard

  • Processing success rates
  • Average processing times
  • API usage statistics
  • Cost tracking
  • User activity metrics
  • Error analysis reports

🚨 Error Handling

Frontend Error Handling

  • Network errors with automatic retry
  • Authentication errors with token refresh
  • Upload errors with user-friendly messages
  • Processing errors with real-time display

Backend Error Handling

  • Validation errors with detailed messages
  • Processing errors with graceful degradation
  • Storage errors with retry logic
  • Database errors with connection pooling
  • LLM API errors with exponential backoff
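The exponential backoff for LLM API errors can be sketched as follows. The base delay, cap, and "full jitter" choice are assumptions for illustration, not the service's actual settings:

```typescript
// Exponential backoff with full jitter: delay grows as base * 2^attempt,
// capped, then scaled by a uniform random factor to avoid thundering herds.
function backoffDelayMs(
  attempt: number,
  base = 500,
  cap = 30_000,
  jitter: () => number = Math.random,
): number {
  const exp = Math.min(cap, base * 2 ** attempt);
  return Math.floor(jitter() * exp);
}

// Retry an async operation, sleeping between failed attempts.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastErr;
}
```

Injecting the jitter function keeps the delay calculation deterministic under test.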

🧪 Testing

Test Structure

  • Unit Tests: Jest for backend, Vitest for frontend
  • Integration Tests: End-to-end testing
  • API Tests: Supertest for backend endpoints

Test Coverage

  • Service layer testing
  • API endpoint testing
  • Error handling scenarios
  • Performance testing
  • Security testing

📚 Documentation Index

Technical Documentation

Analysis Reports

🤝 Contributing

Development Workflow

  1. Create feature branch from main
  2. Implement changes with tests
  3. Update documentation
  4. Submit pull request
  5. Code review and approval
  6. Merge to main

Code Standards

  • TypeScript for type safety
  • ESLint for code quality
  • Prettier for formatting
  • Jest for testing
  • Conventional commits for version control

📞 Support

Common Issues

  1. Upload Failures - Check GCS permissions and bucket configuration
  2. Processing Timeouts - Increase timeout limits for large documents
  3. Memory Issues - Monitor memory usage and adjust batch sizes
  4. API Quotas - Check API usage and implement rate limiting
  5. PDF Generation Failures - Check Puppeteer installation and memory
  6. LLM API Errors - Verify API keys and check rate limits

Debug Tools

  • Real-time logging with correlation IDs
  • Upload monitoring dashboard
  • Processing session details
  • Error analysis reports
  • Performance metrics dashboard

📄 License

This project is proprietary software developed for BPCP. All rights reserved.


Last Updated: December 2024
Version: 1.0.0
Status: Production Ready
