RAG Processing System for CIM Analysis

Overview

This document describes the RAG (Retrieval-Augmented Generation) processing system, an alternative to the current sequential-chunking approach for CIM (Confidential Information Memorandum) document analysis.

Why RAG?

Current Chunking Issues

  • 9 sequential chunks per document (inefficient)
  • Context fragmentation (each chunk analyzed in isolation)
  • Redundant processing (same company analyzed 9 times)
  • Inconsistent results (contradictions between chunks)
  • High costs (more API calls = higher total cost)

RAG Benefits

  • 6-8 focused queries instead of 9+ chunks
  • Full document context maintained throughout
  • Intelligent retrieval of relevant sections
  • Lower costs with better quality
  • Faster processing with parallel capability

Architecture

Components

  1. RAG Document Processor (ragDocumentProcessor.ts)

    • Intelligent document segmentation
    • Section-specific analysis
    • Context-aware retrieval
    • Performance tracking
  2. Unified Document Processor (unifiedDocumentProcessor.ts)

    • Strategy switching
    • Performance comparison
    • Quality assessment
    • Statistics tracking
  3. API Endpoints (enhanced documents.ts)

    • /api/documents/:id/process-rag - Process with RAG
    • /api/documents/:id/compare-strategies - Compare both approaches
    • /api/documents/:id/switch-strategy - Switch processing strategy
    • /api/documents/processing-stats - Get performance statistics

Configuration

Environment Variables

# Processing Strategy (default: 'chunking')
PROCESSING_STRATEGY=rag

# Enable RAG Processing
ENABLE_RAG_PROCESSING=true

# Enable Processing Comparison
ENABLE_PROCESSING_COMPARISON=true

# LLM Configuration for RAG
LLM_CHUNK_SIZE=15000          # Increased from 4000
LLM_MAX_TOKENS=4000          # Increased from 3500
LLM_MAX_INPUT_TOKENS=200000  # Increased from 180000
LLM_PROMPT_BUFFER=1000       # Increased from 500
LLM_TIMEOUT_MS=180000        # Increased from 120000
LLM_MAX_COST_PER_DOCUMENT=3.00  # Increased from 2.00
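
The variables above might be consumed roughly as follows. This is a hypothetical sketch — `loadLLMConfig` and its field names are illustrative, not the backend's actual config module; only the variable names and defaults come from the list above.

```typescript
// Illustrative config loader: read the RAG-related environment variables
// with fallbacks matching the documented defaults.
interface LLMConfig {
  strategy: 'rag' | 'chunking';
  chunkSize: number;
  maxTokens: number;
  maxCostPerDocument: number;
}

function loadLLMConfig(env: Record<string, string | undefined>): LLMConfig {
  // Parse a numeric variable, falling back when unset or malformed.
  const num = (key: string, fallback: number): number => {
    const raw = env[key];
    const parsed = raw === undefined ? NaN : Number(raw);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    // Default strategy is 'chunking', per the comment above.
    strategy: env.PROCESSING_STRATEGY === 'rag' ? 'rag' : 'chunking',
    chunkSize: num('LLM_CHUNK_SIZE', 15000),
    maxTokens: num('LLM_MAX_TOKENS', 4000),
    maxCostPerDocument: num('LLM_MAX_COST_PER_DOCUMENT', 3.0),
  };
}
```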

Usage

1. Process Document with RAG

// Using the unified processor (import path assumed; adjust to your layout)
import { unifiedDocumentProcessor } from './unifiedDocumentProcessor';

const result = await unifiedDocumentProcessor.processDocument(
  documentId,
  userId,
  documentText,
  { strategy: 'rag' }
);

console.log('RAG Processing Results:', {
  success: result.success,
  processingTime: result.processingTime,
  apiCalls: result.apiCalls,
  summary: result.summary
});

2. Compare Both Strategies

const comparison = await unifiedDocumentProcessor.compareProcessingStrategies(
  documentId,
  userId,
  documentText
);

console.log('Comparison Results:', {
  winner: comparison.winner,
  timeDifference: comparison.performanceMetrics.timeDifference,
  apiCallDifference: comparison.performanceMetrics.apiCallDifference,
  qualityScore: comparison.performanceMetrics.qualityScore
});

3. API Endpoints

Process with RAG

POST /api/documents/{id}/process-rag

Compare Strategies

POST /api/documents/{id}/compare-strategies

Switch Strategy

POST /api/documents/{id}/switch-strategy
Content-Type: application/json

{
  "strategy": "rag"
}

The "strategy" field accepts "rag" or "chunking" (JSON does not allow inline comments).

Get Processing Stats

GET /api/documents/processing-stats

Processing Flow

RAG Approach

  1. Document Segmentation - Identify logical sections (executive summary, business description, financials, etc.)
  2. Key Metrics Extraction - Extract financial and business metrics from each section
  3. Query-Based Analysis - Process 6 focused queries for BPCP template sections
  4. Context Synthesis - Combine results with full document context
  5. Final Summary - Generate comprehensive markdown summary
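
The first step above can be sketched as follows. This is a minimal illustration — the heading list and `segmentDocument` are made up for the example; the real logic in ragDocumentProcessor.ts is more sophisticated.

```typescript
// Illustrative segmentation: split a document into logical sections on
// known CIM heading lines (a stand-in for the processor's real heuristics).
const SECTION_HEADINGS = [
  'Executive Summary',
  'Business Description',
  'Financial Overview',
];

interface Section {
  title: string;
  body: string;
}

function segmentDocument(text: string): Section[] {
  // Match any known heading on a line of its own.
  const pattern = new RegExp(`^(${SECTION_HEADINGS.join('|')})$`, 'gm');
  const matches = [...text.matchAll(pattern)];
  return matches.map((m, i) => {
    const start = m.index! + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index! : text.length;
    return { title: m[1], body: text.slice(start, end).trim() };
  });
}
```

Each resulting section can then feed the metrics-extraction and query steps with only its relevant text.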

Comparison with Chunking

| Aspect | Chunking | RAG |
| --- | --- | --- |
| Processing | 9 sequential chunks | 6-8 focused queries |
| Context | Fragmented per chunk | Full document context |
| Quality | Inconsistent across chunks | Consistent, focused analysis |
| Cost | High (9+ API calls) | Lower (6-8 API calls) |
| Speed | Slow (sequential) | Faster (parallel possible) |
| Accuracy | Context loss issues | Precise, relevant retrieval |

Testing

Run RAG Test

cd backend
npm run build
node test-rag-processing.js

Expected Output

🚀 Testing RAG Processing Approach
==================================

📋 Testing RAG Processing...
✅ RAG Processing Results:
- Success: true
- Processing Time: 45000ms
- API Calls: 8
- Error: None

📊 Analysis Summary:
- Company: ABC Manufacturing
- Industry: Aerospace & Defense
- Revenue: $62M
- EBITDA: $12.1M

🔄 Testing Unified Processor Comparison...
✅ Comparison Results:
- Winner: rag
- Time Difference: -15000ms
- API Call Difference: -1
- Quality Score: 0.75

Performance Metrics

Quality Assessment

  • Summary Length - Longer summaries tend to be more comprehensive
  • Markdown Structure - Headers, lists, and formatting indicate better structure
  • Content Completeness - Coverage of all BPCP template sections
  • Consistency - No contradictions between sections
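
The first two heuristics could be combined into a single score along these lines. This is a rough sketch, not the actual scoring in unifiedDocumentProcessor.ts; the weights and caps are arbitrary.

```typescript
// Toy quality score in [0, 1]: rewards summary length up to a cap and
// the presence of markdown structure (headers and list items).
function qualityScore(summary: string): number {
  const lengthScore = Math.min(summary.length / 5000, 1); // cap at 5000 chars
  const headerCount = (summary.match(/^#{1,6} /gm) ?? []).length;
  const listCount = (summary.match(/^[-*] /gm) ?? []).length;
  const structureScore = Math.min((headerCount + listCount) / 20, 1);
  return 0.5 * lengthScore + 0.5 * structureScore;
}
```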

Cost Analysis

  • API Calls - RAG typically uses 6-8 calls vs 9+ for chunking
  • Token Usage - More efficient token usage with focused queries
  • Processing Time - Faster due to parallel processing capability
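
As a back-of-the-envelope illustration of the API-call difference, assuming a flat per-call cost (real LLM pricing is token-based, so this is only indicative):

```typescript
// Toy cost comparison: 9 chunking calls vs 7 RAG queries at an assumed
// flat rate per call.
const COST_PER_CALL = 0.25; // assumed flat rate, for illustration only
const chunkingCost = 9 * COST_PER_CALL;
const ragCost = 7 * COST_PER_CALL;
const savings = chunkingCost - ragCost; // saved per document
```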

Migration Strategy

Phase 1: Parallel Testing

  • Keep current chunking system
  • Add RAG system alongside
  • Use comparison endpoints to evaluate performance
  • Collect statistics on both approaches

Phase 2: Gradual Migration

  • Switch to RAG for new documents
  • Use comparison to validate results
  • Monitor performance and quality metrics

Phase 3: Full Migration

  • Make RAG the default strategy
  • Keep chunking as fallback option
  • Optimize based on collected data

Troubleshooting

Common Issues

  1. RAG Processing Fails

    • Check LLM API configuration
    • Verify document text extraction
    • Review error logs for specific issues
  2. Poor Quality Results

    • Adjust section relevance thresholds
    • Review query prompts
    • Check document structure
  3. High Processing Time

    • Monitor API response times
    • Check network connectivity
    • Consider parallel processing optimization
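
For slow queries, a timeout wrapper is one way to fail fast rather than stall the whole document; `LLM_TIMEOUT_MS` in the configuration above is the analogous knob. This is a generic sketch, not the backend's actual timeout handling.

```typescript
// Sketch: race a promise against a timer so a hung LLM call rejects
// after `ms` milliseconds instead of blocking indefinitely.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    ),
  ]);
}
```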

Debug Mode

# Enable debug logging
LOG_LEVEL=debug
ENABLE_PROCESSING_COMPARISON=true

Future Enhancements

  1. Vector Embeddings - Add semantic search capabilities
  2. Caching - Cache section analysis for repeated queries
  3. Parallel Processing - Process queries in parallel for speed
  4. Custom Queries - Allow user-defined analysis queries
  5. Quality Feedback - Learn from user feedback to improve prompts
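
Enhancement 3 (parallel processing) could look roughly like this, assuming the focused queries are independent of each other; `runQuery` is a placeholder for a single LLM call, not an existing function.

```typescript
// Sketch: run independent analysis queries concurrently. With
// Promise.all, wall-clock time is bounded by the slowest query rather
// than the sum of all queries.
async function runQuery(query: string): Promise<string> {
  // Placeholder for an actual LLM API call.
  return `result for: ${query}`;
}

async function runQueriesInParallel(queries: string[]): Promise<string[]> {
  // Fire all queries at once; resolves when every query has finished.
  return Promise.all(queries.map(runQuery));
}
```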

Support

For issues or questions about the RAG processing system:

  1. Check the logs for detailed error information
  2. Run the test script to validate functionality
  3. Compare with chunking approach to identify issues
  4. Review configuration settings