RAG Processing System for CIM Analysis

Overview

This document describes the RAG (Retrieval-Augmented Generation) processing system, an alternative to the current sequential-chunking approach for CIM (Confidential Information Memorandum) document analysis.

Why RAG?

Current Chunking Issues

  • 9 sequential chunks per document (inefficient)
  • Context fragmentation (each chunk analyzed in isolation)
  • Redundant processing (same company analyzed 9 times)
  • Inconsistent results (contradictions between chunks)
  • High costs (more API calls = higher total cost)

RAG Benefits

  • 6-8 focused queries instead of 9+ chunks
  • Full document context maintained throughout
  • Intelligent retrieval of relevant sections
  • Lower costs with better quality
  • Faster processing with parallel capability

Architecture

Components

  1. RAG Document Processor (ragDocumentProcessor.ts)

    • Intelligent document segmentation
    • Section-specific analysis
    • Context-aware retrieval
    • Performance tracking
  2. Unified Document Processor (unifiedDocumentProcessor.ts)

    • Strategy switching
    • Performance comparison
    • Quality assessment
    • Statistics tracking
  3. API Endpoints (enhanced documents.ts)

    • /api/documents/:id/process-rag - Process with RAG
    • /api/documents/:id/compare-strategies - Compare both approaches
    • /api/documents/:id/switch-strategy - Switch processing strategy
    • /api/documents/processing-stats - Get performance statistics

Configuration

Environment Variables

# Processing Strategy (default: 'chunking')
PROCESSING_STRATEGY=rag

# Enable RAG Processing
ENABLE_RAG_PROCESSING=true

# Enable Processing Comparison
ENABLE_PROCESSING_COMPARISON=true

# LLM Configuration for RAG
LLM_CHUNK_SIZE=15000          # Increased from 4000
LLM_MAX_TOKENS=4000          # Increased from 3500
LLM_MAX_INPUT_TOKENS=200000  # Increased from 180000
LLM_PROMPT_BUFFER=1000       # Increased from 500
LLM_TIMEOUT_MS=180000        # Increased from 120000
LLM_MAX_COST_PER_DOCUMENT=3.00  # Increased from 2.00
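
The variables above might be consumed roughly as follows. This is a hypothetical sketch — `loadLLMConfig` and its field names are illustrative, not the backend's actual config module; only the variable names and defaults come from the list above.

```typescript
// Illustrative config loader: read the RAG-related environment variables
// with fallbacks matching the documented defaults.
interface LLMConfig {
  strategy: 'rag' | 'chunking';
  chunkSize: number;
  maxTokens: number;
  maxCostPerDocument: number;
}

function loadLLMConfig(env: Record<string, string | undefined>): LLMConfig {
  // Parse a numeric variable, falling back when unset or malformed.
  const num = (key: string, fallback: number): number => {
    const raw = env[key];
    const parsed = raw === undefined ? NaN : Number(raw);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    // Default strategy is 'chunking', per the comment above.
    strategy: env.PROCESSING_STRATEGY === 'rag' ? 'rag' : 'chunking',
    chunkSize: num('LLM_CHUNK_SIZE', 15000),
    maxTokens: num('LLM_MAX_TOKENS', 4000),
    maxCostPerDocument: num('LLM_MAX_COST_PER_DOCUMENT', 3.0),
  };
}
```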

Usage

1. Process Document with RAG

// Using the unified processor (import path assumed; adjust to your layout)
import { unifiedDocumentProcessor } from './unifiedDocumentProcessor';

const result = await unifiedDocumentProcessor.processDocument(
  documentId,
  userId,
  documentText,
  { strategy: 'rag' }
);

console.log('RAG Processing Results:', {
  success: result.success,
  processingTime: result.processingTime,
  apiCalls: result.apiCalls,
  summary: result.summary
});

2. Compare Both Strategies

const comparison = await unifiedDocumentProcessor.compareProcessingStrategies(
  documentId,
  userId,
  documentText
);

console.log('Comparison Results:', {
  winner: comparison.winner,
  timeDifference: comparison.performanceMetrics.timeDifference,
  apiCallDifference: comparison.performanceMetrics.apiCallDifference,
  qualityScore: comparison.performanceMetrics.qualityScore
});

3. API Endpoints

Process with RAG

POST /api/documents/{id}/process-rag

Compare Strategies

POST /api/documents/{id}/compare-strategies

Switch Strategy

POST /api/documents/{id}/switch-strategy
Content-Type: application/json

{
  "strategy": "rag"
}

The "strategy" field accepts "rag" or "chunking" (JSON does not allow inline comments).

Get Processing Stats

GET /api/documents/processing-stats

Processing Flow

RAG Approach

  1. Document Segmentation - Identify logical sections (executive summary, business description, financials, etc.)
  2. Key Metrics Extraction - Extract financial and business metrics from each section
  3. Query-Based Analysis - Process 6 focused queries for BPCP template sections
  4. Context Synthesis - Combine results with full document context
  5. Final Summary - Generate comprehensive markdown summary
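
The first step above can be sketched as follows. This is a minimal illustration — the heading list and `segmentDocument` are made up for the example; the real logic in ragDocumentProcessor.ts is more sophisticated.

```typescript
// Illustrative segmentation: split a document into logical sections on
// known CIM heading lines (a stand-in for the processor's real heuristics).
const SECTION_HEADINGS = [
  'Executive Summary',
  'Business Description',
  'Financial Overview',
];

interface Section {
  title: string;
  body: string;
}

function segmentDocument(text: string): Section[] {
  // Match any known heading on a line of its own.
  const pattern = new RegExp(`^(${SECTION_HEADINGS.join('|')})$`, 'gm');
  const matches = [...text.matchAll(pattern)];
  return matches.map((m, i) => {
    const start = m.index! + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index! : text.length;
    return { title: m[1], body: text.slice(start, end).trim() };
  });
}
```

Each resulting section can then feed the metrics-extraction and query steps with only its relevant text.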

Comparison with Chunking

| Aspect | Chunking | RAG |
| --- | --- | --- |
| Processing | 9 sequential chunks | 6-8 focused queries |
| Context | Fragmented per chunk | Full document context |
| Quality | Inconsistent across chunks | Consistent, focused analysis |
| Cost | High (9+ API calls) | Lower (6-8 API calls) |
| Speed | Slow (sequential) | Faster (parallel possible) |
| Accuracy | Context loss issues | Precise, relevant retrieval |

Testing

Run RAG Test

cd backend
npm run build
node test-rag-processing.js

Expected Output

🚀 Testing RAG Processing Approach
==================================

📋 Testing RAG Processing...
✅ RAG Processing Results:
- Success: true
- Processing Time: 45000ms
- API Calls: 8
- Error: None

📊 Analysis Summary:
- Company: ABC Manufacturing
- Industry: Aerospace & Defense
- Revenue: $62M
- EBITDA: $12.1M

🔄 Testing Unified Processor Comparison...
✅ Comparison Results:
- Winner: rag
- Time Difference: -15000ms
- API Call Difference: -1
- Quality Score: 0.75

Performance Metrics

Quality Assessment

  • Summary Length - Longer summaries tend to be more comprehensive
  • Markdown Structure - Headers, lists, and formatting indicate better structure
  • Content Completeness - Coverage of all BPCP template sections
  • Consistency - No contradictions between sections
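
The first two heuristics could be combined into a single score along these lines. This is a rough sketch, not the actual scoring in unifiedDocumentProcessor.ts; the weights and caps are arbitrary.

```typescript
// Toy quality score in [0, 1]: rewards summary length up to a cap and
// the presence of markdown structure (headers and list items).
function qualityScore(summary: string): number {
  const lengthScore = Math.min(summary.length / 5000, 1); // cap at 5000 chars
  const headerCount = (summary.match(/^#{1,6} /gm) ?? []).length;
  const listCount = (summary.match(/^[-*] /gm) ?? []).length;
  const structureScore = Math.min((headerCount + listCount) / 20, 1);
  return 0.5 * lengthScore + 0.5 * structureScore;
}
```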

Cost Analysis

  • API Calls - RAG typically uses 6-8 calls vs 9+ for chunking
  • Token Usage - More efficient token usage with focused queries
  • Processing Time - Faster due to parallel processing capability
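
As a back-of-the-envelope illustration of the API-call difference, assuming a flat per-call cost (real LLM pricing is token-based, so this is only indicative):

```typescript
// Toy cost comparison: 9 chunking calls vs 7 RAG queries at an assumed
// flat rate per call.
const COST_PER_CALL = 0.25; // assumed flat rate, for illustration only
const chunkingCost = 9 * COST_PER_CALL;
const ragCost = 7 * COST_PER_CALL;
const savings = chunkingCost - ragCost; // saved per document
```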

Migration Strategy

Phase 1: Parallel Testing

  • Keep current chunking system
  • Add RAG system alongside
  • Use comparison endpoints to evaluate performance
  • Collect statistics on both approaches

Phase 2: Gradual Migration

  • Switch to RAG for new documents
  • Use comparison to validate results
  • Monitor performance and quality metrics

Phase 3: Full Migration

  • Make RAG the default strategy
  • Keep chunking as fallback option
  • Optimize based on collected data

Troubleshooting

Common Issues

  1. RAG Processing Fails

    • Check LLM API configuration
    • Verify document text extraction
    • Review error logs for specific issues
  2. Poor Quality Results

    • Adjust section relevance thresholds
    • Review query prompts
    • Check document structure
  3. High Processing Time

    • Monitor API response times
    • Check network connectivity
    • Consider parallel processing optimization
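
For slow queries, a timeout wrapper is one way to fail fast rather than stall the whole document; `LLM_TIMEOUT_MS` in the configuration above is the analogous knob. This is a generic sketch, not the backend's actual timeout handling.

```typescript
// Sketch: race a promise against a timer so a hung LLM call rejects
// after `ms` milliseconds instead of blocking indefinitely.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    ),
  ]);
}
```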

Debug Mode

# Enable debug logging
LOG_LEVEL=debug
ENABLE_PROCESSING_COMPARISON=true

Future Enhancements

  1. Vector Embeddings - Add semantic search capabilities
  2. Caching - Cache section analysis for repeated queries
  3. Parallel Processing - Process queries in parallel for speed
  4. Custom Queries - Allow user-defined analysis queries
  5. Quality Feedback - Learn from user feedback to improve prompts
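
Enhancement 3 (parallel processing) could look roughly like this, assuming the focused queries are independent of each other; `runQuery` is a placeholder for a single LLM call, not an existing function.

```typescript
// Sketch: run independent analysis queries concurrently. With
// Promise.all, wall-clock time is bounded by the slowest query rather
// than the sum of all queries.
async function runQuery(query: string): Promise<string> {
  // Placeholder for an actual LLM API call.
  return `result for: ${query}`;
}

async function runQueriesInParallel(queries: string[]): Promise<string[]> {
  // Fire all queries at once; resolves when every query has finished.
  return Promise.all(queries.map(runQuery));
}
```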

Support

For issues or questions about the RAG processing system:

  1. Check the logs for detailed error information
  2. Run the test script to validate functionality
  3. Compare with chunking approach to identify issues
  4. Review configuration settings