Add agentic RAG implementation with enhanced document processing and LLM services
259
backend/RAG_PROCESSING_README.md
Normal file
@@ -0,0 +1,259 @@
# RAG Processing System for CIM Analysis

## Overview

This document describes the new RAG (Retrieval-Augmented Generation) processing system that provides an alternative to the current chunking approach for CIM document analysis.

## Why RAG?

### Current Chunking Issues

- **9 sequential chunks** per document (inefficient)
- **Context fragmentation** (each chunk analyzed in isolation)
- **Redundant processing** (same company analyzed 9 times)
- **Inconsistent results** (contradictions between chunks)
- **High costs** (more API calls = higher total cost)

### RAG Benefits

- **6-8 focused queries** instead of 9+ chunks
- **Full document context** maintained throughout
- **Intelligent retrieval** of relevant sections
- **Lower costs** with better quality
- **Faster processing** with parallel capability

## Architecture

### Components

1. **RAG Document Processor** (`ragDocumentProcessor.ts`)
   - Intelligent document segmentation
   - Section-specific analysis
   - Context-aware retrieval
   - Performance tracking

2. **Unified Document Processor** (`unifiedDocumentProcessor.ts`)
   - Strategy switching
   - Performance comparison
   - Quality assessment
   - Statistics tracking

3. **API Endpoints** (enhanced `documents.ts`)
   - `/api/documents/:id/process-rag` - Process with RAG
   - `/api/documents/:id/compare-strategies` - Compare both approaches
   - `/api/documents/:id/switch-strategy` - Switch processing strategy
   - `/api/documents/processing-stats` - Get performance statistics
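
The strategy switching in the unified processor can be sketched as a dispatch table keyed by strategy name. The class name and shape below are hypothetical, not the actual `unifiedDocumentProcessor.ts` interface:

```javascript
// Minimal sketch of strategy dispatch (illustrative shape only).
class UnifiedProcessorSketch {
  constructor(strategies, defaultStrategy = 'chunking') {
    this.strategies = strategies;   // name -> async processing function
    this.defaultStrategy = defaultStrategy;
    this.stats = {};                // name -> { runs }
  }

  async processDocument(documentId, userId, documentText, options = {}) {
    const name = options.strategy || this.defaultStrategy;
    const run = this.strategies[name];
    if (!run) throw new Error(`Unknown strategy: ${name}`);
    this.stats[name] = { runs: (this.stats[name]?.runs || 0) + 1 };
    const started = Date.now();
    const result = await run(documentId, userId, documentText);
    return { ...result, processingStrategy: name, processingTime: Date.now() - started };
  }
}
```

Registering `rag` and `chunking` functions against one entry point is what makes the comparison and switch endpoints cheap to implement.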

## Configuration

### Environment Variables

```bash
# Processing Strategy (default: 'chunking')
PROCESSING_STRATEGY=rag

# Enable RAG Processing
ENABLE_RAG_PROCESSING=true

# Enable Processing Comparison
ENABLE_PROCESSING_COMPARISON=true

# LLM Configuration for RAG
LLM_CHUNK_SIZE=15000           # Increased from 4000
LLM_MAX_TOKENS=4000            # Increased from 3500
LLM_MAX_INPUT_TOKENS=200000    # Increased from 180000
LLM_PROMPT_BUFFER=1000         # Increased from 500
LLM_TIMEOUT_MS=180000          # Increased from 120000
LLM_MAX_COST_PER_DOCUMENT=3.00 # Increased from 2.00
```
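
These variables can be folded into a typed config object with the documented defaults. The variable names match the list above; the parsing helper itself is illustrative, not the project's actual loader:

```javascript
// Sketch: read the RAG-related environment variables with their
// documented defaults (helper is illustrative).
function loadRagConfig(env = process.env) {
  const num = (key, fallback) => {
    const parsed = Number(env[key]);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    processingStrategy: env.PROCESSING_STRATEGY || 'chunking',
    enableRagProcessing: env.ENABLE_RAG_PROCESSING === 'true',
    enableProcessingComparison: env.ENABLE_PROCESSING_COMPARISON === 'true',
    chunkSize: num('LLM_CHUNK_SIZE', 15000),
    maxTokens: num('LLM_MAX_TOKENS', 4000),
    maxInputTokens: num('LLM_MAX_INPUT_TOKENS', 200000),
    promptBuffer: num('LLM_PROMPT_BUFFER', 1000),
    timeoutMs: num('LLM_TIMEOUT_MS', 180000),
    maxCostPerDocument: num('LLM_MAX_COST_PER_DOCUMENT', 3.0),
  };
}
```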

## Usage

### 1. Process Document with RAG

```javascript
// Using the unified processor
const result = await unifiedDocumentProcessor.processDocument(
  documentId,
  userId,
  documentText,
  { strategy: 'rag' }
);

console.log('RAG Processing Results:', {
  success: result.success,
  processingTime: result.processingTime,
  apiCalls: result.apiCalls,
  summary: result.summary
});
```

### 2. Compare Both Strategies

```javascript
const comparison = await unifiedDocumentProcessor.compareProcessingStrategies(
  documentId,
  userId,
  documentText
);

console.log('Comparison Results:', {
  winner: comparison.winner,
  timeDifference: comparison.performanceMetrics.timeDifference,
  apiCallDifference: comparison.performanceMetrics.apiCallDifference,
  qualityScore: comparison.performanceMetrics.qualityScore
});
```

### 3. API Endpoints

#### Process with RAG

```bash
POST /api/documents/{id}/process-rag
```

#### Compare Strategies

```bash
POST /api/documents/{id}/compare-strategies
```

#### Switch Strategy

```bash
POST /api/documents/{id}/switch-strategy
Content-Type: application/json

{
  "strategy": "rag" // or "chunking"
}
```

#### Get Processing Stats

```bash
GET /api/documents/processing-stats
```
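
A small client-side helper can map these endpoints to request options. It only builds the URL, method, headers, and body; the base URL and bearer-token auth here are assumptions, not something this README specifies:

```javascript
// Illustrative request builder for the endpoints above (assumed auth scheme).
function buildDocumentRequest(action, documentId, { baseUrl = '', token, body } = {}) {
  const routes = {
    processRag: { method: 'POST', path: `/api/documents/${documentId}/process-rag` },
    compareStrategies: { method: 'POST', path: `/api/documents/${documentId}/compare-strategies` },
    switchStrategy: { method: 'POST', path: `/api/documents/${documentId}/switch-strategy` },
    processingStats: { method: 'GET', path: '/api/documents/processing-stats' },
  };
  const route = routes[action];
  if (!route) throw new Error(`Unknown action: ${action}`);
  return {
    url: baseUrl + route.path,
    method: route.method,
    headers: {
      'Content-Type': 'application/json',
      ...(token ? { Authorization: `Bearer ${token}` } : {}),
    },
    ...(body ? { body: JSON.stringify(body) } : {}),
  };
}
```

The returned object can be passed straight to `fetch(opts.url, opts)`.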

## Processing Flow

### RAG Approach

1. **Document Segmentation** - Identify logical sections (executive summary, business description, financials, etc.)
2. **Key Metrics Extraction** - Extract financial and business metrics from each section
3. **Query-Based Analysis** - Process 6 focused queries for BPCP template sections
4. **Context Synthesis** - Combine results with full document context
5. **Final Summary** - Generate comprehensive markdown summary
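
Step 1 can be sketched as a naive heading-based segmenter. The heading regex and section handling are illustrative guesses; the real `ragDocumentProcessor.ts` presumably uses richer heuristics:

```javascript
// Sketch of document segmentation: split on ALL-CAPS (optionally
// numbered) heading lines. Illustrative only.
function segmentDocument(text) {
  const lines = text.split('\n');
  const sections = [];
  let current = { title: 'Preamble', content: [] };
  for (const line of lines) {
    // Treat short ALL-CAPS or numbered ALL-CAPS lines as section headings.
    if (/^(\d+\.\s+)?[A-Z][A-Z &]{3,60}$/.test(line.trim())) {
      if (current.content.length) sections.push(current);
      current = { title: line.trim(), content: [] };
    } else {
      current.content.push(line);
    }
  }
  if (current.content.length) sections.push(current);
  return sections.map(s => ({ title: s.title, content: s.content.join('\n').trim() }));
}
```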

### Comparison with Chunking

| Aspect | Chunking | RAG |
|--------|----------|-----|
| **Processing** | 9 sequential chunks | 6 focused queries |
| **Context** | Fragmented per chunk | Full document context |
| **Quality** | Inconsistent across chunks | Consistent, focused analysis |
| **Cost** | High (9+ API calls) | Lower (6-8 API calls) |
| **Speed** | Slow (sequential) | Faster (parallel possible) |
| **Accuracy** | Context loss issues | Precise, relevant retrieval |

## Testing

### Run RAG Test

```bash
cd backend
npm run build
node test-rag-processing.js
```

### Expected Output

```
🚀 Testing RAG Processing Approach
==================================

📋 Testing RAG Processing...
✅ RAG Processing Results:
- Success: true
- Processing Time: 45000ms
- API Calls: 8
- Error: None

📊 Analysis Summary:
- Company: ABC Manufacturing
- Industry: Aerospace & Defense
- Revenue: $62M
- EBITDA: $12.1M

🔄 Testing Unified Processor Comparison...
✅ Comparison Results:
- Winner: rag
- Time Difference: -15000ms
- API Call Difference: -1
- Quality Score: 0.75
```

## Performance Metrics

### Quality Assessment

- **Summary Length** - Longer summaries tend to be more comprehensive
- **Markdown Structure** - Headers, lists, and formatting indicate better structure
- **Content Completeness** - Coverage of all BPCP template sections
- **Consistency** - No contradictions between sections
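
The first three signals can be folded into a single 0-1 score. The weights and thresholds below are illustrative guesses, not the values the unified processor actually uses:

```javascript
// Sketch of a heuristic quality score (weights are assumptions).
function scoreSummaryQuality(summary, requiredSections = []) {
  let score = 0;
  // Length: cap the credit at ~2000 characters.
  score += 0.25 * Math.min(summary.length / 2000, 1);
  // Markdown structure: headers and list items.
  if (/^#{1,6}\s/m.test(summary)) score += 0.25;
  if (/^[-*]\s/m.test(summary)) score += 0.25;
  // Completeness: fraction of required sections mentioned.
  if (requiredSections.length) {
    const covered = requiredSections.filter(s =>
      summary.toLowerCase().includes(s.toLowerCase())
    ).length;
    score += 0.25 * (covered / requiredSections.length);
  } else {
    score += 0.25;
  }
  return Math.round(score * 100) / 100;
}
```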

### Cost Analysis

- **API Calls** - RAG typically uses 6-8 calls vs 9+ for chunking
- **Token Usage** - More efficient token usage with focused queries
- **Processing Time** - Faster due to parallel processing capability

## Migration Strategy

### Phase 1: Parallel Testing

- Keep current chunking system
- Add RAG system alongside
- Use comparison endpoints to evaluate performance
- Collect statistics on both approaches

### Phase 2: Gradual Migration

- Switch to RAG for new documents
- Use comparison to validate results
- Monitor performance and quality metrics

### Phase 3: Full Migration

- Make RAG the default strategy
- Keep chunking as fallback option
- Optimize based on collected data
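
Phase 3's "chunking as fallback" pattern can be sketched with injected strategy functions; this wrapper is illustrative, not the project's actual code:

```javascript
// Sketch: run RAG first, fall back to chunking on failure.
async function processWithFallback(documentText, { rag, chunking, logger = console }) {
  try {
    const result = await rag(documentText);
    if (result.success) return { ...result, strategyUsed: 'rag' };
    throw new Error(result.error || 'RAG returned success: false');
  } catch (err) {
    logger.warn?.(`RAG failed (${err.message}); falling back to chunking`);
    const result = await chunking(documentText);
    return { ...result, strategyUsed: 'chunking', fellBack: true };
  }
}
```

Surfacing `fellBack` in the result lets the stats endpoint track how often the fallback fires.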

## Troubleshooting

### Common Issues

1. **RAG Processing Fails**
   - Check LLM API configuration
   - Verify document text extraction
   - Review error logs for specific issues

2. **Poor Quality Results**
   - Adjust section relevance thresholds
   - Review query prompts
   - Check document structure

3. **High Processing Time**
   - Monitor API response times
   - Check network connectivity
   - Consider parallel processing optimization

### Debug Mode

```bash
# Enable debug logging
LOG_LEVEL=debug
ENABLE_PROCESSING_COMPARISON=true
```

## Future Enhancements

1. **Vector Embeddings** - Add semantic search capabilities
2. **Caching** - Cache section analysis for repeated queries
3. **Parallel Processing** - Process queries in parallel for speed
4. **Custom Queries** - Allow user-defined analysis queries
5. **Quality Feedback** - Learn from user feedback to improve prompts
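
Enhancement 2 (caching) can be sketched as a Map-backed memoizer keyed on section content plus query. This is illustrative only; no such cache exists in the codebase yet:

```javascript
// Sketch: memoize section analysis results per (query, section) pair.
function makeAnalysisCache(analyzeFn) {
  const cache = new Map();
  let hits = 0, misses = 0;
  return {
    async analyze(sectionText, query) {
      const key = `${query}::${sectionText}`;
      if (cache.has(key)) { hits++; return cache.get(key); }
      misses++;
      const result = await analyzeFn(sectionText, query);
      cache.set(key, result);
      return result;
    },
    stats: () => ({ hits, misses }),
  };
}
```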

## Support

For issues or questions about the RAG processing system:

1. Check the logs for detailed error information
2. Run the test script to validate functionality
3. Compare with chunking approach to identify issues
4. Review configuration settings
41
backend/check-database-data.js
Normal file
@@ -0,0 +1,41 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: 'postgresql://postgres:password@localhost:5432/cim_processor'
});

async function checkData() {
  try {
    console.log('🔍 Checking database data for recent document...');

    const result = await pool.query(`
      SELECT id, original_file_name, status, analysis_data, generated_summary
      FROM documents
      WHERE id = '435be351-e022-478a-a388-d0c71328cd06'
    `);

    if (result.rows.length > 0) {
      const doc = result.rows[0];
      console.log('📄 Document:', doc.original_file_name);
      console.log('📊 Status:', doc.status);
      console.log('🔍 Has analysis_data:', !!doc.analysis_data);
      console.log('📝 Generated summary length:', doc.generated_summary?.length || 0);

      if (doc.analysis_data) {
        console.log('\n📋 Analysis data keys:', Object.keys(doc.analysis_data));
        console.log('\n📊 Analysis data structure:');
        console.log(JSON.stringify(doc.analysis_data, null, 2));
      } else {
        console.log('\n❌ No analysis_data found!');
      }
    } else {
      console.log('❌ Document not found');
    }
  } catch (error) {
    console.error('❌ Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkData();
12
backend/package-lock.json
generated
@@ -27,7 +27,8 @@
       "puppeteer": "^21.11.0",
       "redis": "^4.6.10",
       "uuid": "^11.1.0",
-      "winston": "^3.11.0"
+      "winston": "^3.11.0",
+      "zod": "^3.25.76"
     },
     "devDependencies": {
       "@types/bcryptjs": "^2.4.6",
@@ -8850,6 +8851,15 @@
       "funding": {
         "url": "https://github.com/sponsors/sindresorhus"
       }
     },
+    "node_modules/zod": {
+      "version": "3.25.76",
+      "resolved": "https://registry.npmjs.org/zod/-/zod-3.25.76.tgz",
+      "integrity": "sha512-gzUt/qt81nXsFGKIFcC3YnfEAx5NkunCfnDlvuBSSFS02bcXu4Lmea0AFIUwbLWxWPx3d9p8S5QoaujKcNQxcQ==",
+      "license": "MIT",
+      "funding": {
+        "url": "https://github.com/sponsors/colinhacks"
+      }
+    }
   }
 }
@@ -35,7 +35,8 @@
     "puppeteer": "^21.11.0",
     "redis": "^4.6.10",
     "uuid": "^11.1.0",
-    "winston": "^3.11.0"
+    "winston": "^3.11.0",
+    "zod": "^3.25.76"
   },
   "devDependencies": {
     "@types/bcryptjs": "^2.4.6",
@@ -46,8 +46,9 @@ const envSchema = Joi.object({
     otherwise: Joi.string().allow('').optional()
   }),
   LLM_MODEL: Joi.string().default('gpt-4'),
-  LLM_MAX_TOKENS: Joi.number().default(3500),
+  LLM_MAX_TOKENS: Joi.number().default(4000),
   LLM_TEMPERATURE: Joi.number().min(0).max(2).default(0.1),
+  LLM_PROMPT_BUFFER: Joi.number().default(500),

   // Storage
   STORAGE_TYPE: Joi.string().valid('local', 's3').default('local'),
@@ -80,6 +81,11 @@ const envSchema = Joi.object({
   // Logging
   LOG_LEVEL: Joi.string().valid('error', 'warn', 'info', 'debug').default('info'),
   LOG_FILE: Joi.string().default('logs/app.log'),
+
+  // Processing Strategy
+  PROCESSING_STRATEGY: Joi.string().valid('chunking', 'rag').default('chunking'), // 'chunking' | 'rag'
+  ENABLE_RAG_PROCESSING: Joi.boolean().default(false),
+  ENABLE_PROCESSING_COMPARISON: Joi.boolean().default(false),
 }).unknown();

 // Validate environment variables
@@ -138,18 +144,19 @@ export const config = {
     fastModel: envVars['LLM_FAST_MODEL'] || 'claude-3-5-haiku-20241022', // Fast model for cost optimization
     fallbackModel: envVars['LLM_FALLBACK_MODEL'] || 'gpt-4o-mini', // Fallback for reliability

-    // Token Limits - Optimized for CIM documents
-    maxTokens: parseInt(envVars['LLM_MAX_TOKENS'] || '4000'), // Output tokens
-    maxInputTokens: parseInt(envVars['LLM_MAX_INPUT_TOKENS'] || '180000'), // Input tokens (leaving buffer)
-    chunkSize: parseInt(envVars['LLM_CHUNK_SIZE'] || '4000'), // Chunk size for large documents
+    // Token Limits - Optimized for CIM documents with hierarchical processing
+    maxTokens: parseInt(envVars['LLM_MAX_TOKENS'] || '4000'), // Output tokens (increased for better analysis)
+    maxInputTokens: parseInt(envVars['LLM_MAX_INPUT_TOKENS'] || '200000'), // Input tokens (increased for larger context)
+    chunkSize: parseInt(envVars['LLM_CHUNK_SIZE'] || '15000'), // Chunk size for section analysis (increased from 4000)
+    promptBuffer: parseInt(envVars['LLM_PROMPT_BUFFER'] || '1000'), // Buffer for prompt tokens (increased)

     // Processing Configuration
     temperature: parseFloat(envVars['LLM_TEMPERATURE'] || '0.1'), // Low temperature for consistent output
-    timeoutMs: parseInt(envVars['LLM_TIMEOUT_MS'] || '120000'), // 2 minutes timeout
+    timeoutMs: parseInt(envVars['LLM_TIMEOUT_MS'] || '180000'), // 3 minutes timeout (increased for complex analysis)

     // Cost Optimization
     enableCostOptimization: envVars['LLM_ENABLE_COST_OPTIMIZATION'] === 'true',
-    maxCostPerDocument: parseFloat(envVars['LLM_MAX_COST_PER_DOCUMENT'] || '2.00'), // Max $2 per document
+    maxCostPerDocument: parseFloat(envVars['LLM_MAX_COST_PER_DOCUMENT'] || '3.00'), // Max $3 per document (increased for better quality)
     useFastModelForSimpleTasks: envVars['LLM_USE_FAST_MODEL_FOR_SIMPLE_TASKS'] === 'true',
   },
@@ -175,6 +182,11 @@ export const config = {
     level: envVars.LOG_LEVEL,
     file: envVars.LOG_FILE,
   },
+
+  // Processing Strategy
+  processingStrategy: envVars['PROCESSING_STRATEGY'] || 'chunking', // 'chunking' | 'rag'
+  enableRAGProcessing: envVars['ENABLE_RAG_PROCESSING'] === 'true',
+  enableProcessingComparison: envVars['ENABLE_PROCESSING_COMPARISON'] === 'true',
 };

 export default config;
@@ -26,6 +26,7 @@ export interface Document {
   processing_started_at?: Date;
   processing_completed_at?: Date;
   error_message?: string;
+  analysis_data?: any; // BPCP CIM Review Template data
   created_at: Date;
   updated_at: Date;
 }
@@ -1,795 +1,145 @@
import { Router, Request, Response, NextFunction } from 'express';
import { auth } from '../middleware/auth';
import { validateDocumentUpload } from '../middleware/validation';
import { handleFileUpload, cleanupUploadedFile } from '../middleware/upload';
import { fileStorageService } from '../services/fileStorageService';
import { uploadProgressService } from '../services/uploadProgressService';
import { documentProcessingService } from '../services/documentProcessingService';
import { jobQueueService } from '../services/jobQueueService';
import { DocumentModel } from '../models/DocumentModel';
import express from 'express';
import { authenticateToken } from '../middleware/auth';
import { documentController } from '../controllers/documentController';
import { unifiedDocumentProcessor } from '../services/unifiedDocumentProcessor';
import { logger } from '../utils/logger';
import { v4 as uuidv4 } from 'uuid';
import fs from 'fs';

const router = Router();
const router = express.Router();

// Apply authentication middleware to all document routes
router.use(auth);
// Apply authentication to all routes
router.use(authenticateToken);

// GET /api/documents - Get all documents for the authenticated user
router.get('/', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const userId = (req as any).user.userId;
    const documents = await DocumentModel.findByUserId(userId);

    res.json({
      success: true,
      data: documents,
      message: 'Documents retrieved successfully',
    });
  } catch (error) {
    next(error);
  }
});
// Existing routes
router.post('/upload', documentController.uploadDocument);
router.get('/', documentController.getDocuments);
router.get('/:id', documentController.getDocument);
router.get('/:id/progress', documentController.getDocumentProgress);
router.delete('/:id', documentController.deleteDocument);

// GET /api/documents/:id - Get a specific document
router.get('/:id', async (req: Request, res: Response, next: NextFunction) => {
// New RAG processing routes
router.post('/:id/process-rag', async (req, res) => {
  try {
    const { id } = req.params;
    const userId = req.user?.id;

    // Enhanced validation for document ID
    if (!id || id === 'undefined' || id === 'null' || id.trim() === '') {
      return res.status(400).json({
        success: false,
        error: 'Invalid document ID provided',
      });
    }

    const userId = (req as any).user.userId;

    // Check if user owns the document or is admin
    const document = await DocumentModel.findById(id);
    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    if (!userId) {
      return res.status(401).json({ error: 'User not authenticated' });
    }

    return res.json({
      success: true,
      data: document,
      message: 'Document retrieved successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// POST /api/documents - Upload and process a new document
router.post('/', validateDocumentUpload, handleFileUpload, async (req: Request, res: Response) => {
  const uploadId = uuidv4();
  const userId = (req as any).user.userId;
  let uploadedFilePath: string | null = null;

  try {
    if (!req.file) {
      return res.status(400).json({
        success: false,
        error: 'No file uploaded',
        message: 'Please select a PDF file to upload',
      });
    }

    const { processImmediately = false } = req.body;
    const file = req.file;
    uploadedFilePath = file.path;

    // Store file using storage service
    const storageResult = await fileStorageService.storeFile(file, userId);
    // Get document text (you'll need to implement this)
    const documentText = await documentController.getDocumentText(id);

    if (!storageResult.success) {
      throw new Error(storageResult.error || 'Failed to store file');
    }

    // Add document to database
    const document = await DocumentModel.create({
      user_id: userId,
      original_file_name: file.originalname,
      file_path: file.path,
      file_size: file.size,
    });

    // Process document if requested
    let processingJobId: string | null = null;
    if (processImmediately) {
      try {
        processingJobId = await jobQueueService.addJob('document_processing', {
          documentId: document.id,
          userId,
        });

        logger.info(`Document processing job queued: ${document.id}`, {
          jobId: processingJobId,
          documentId: document.id,
          userId,
        });
      } catch (processingError) {
        logger.error('Failed to queue document processing', {
          documentId: document.id,
          error: processingError instanceof Error ? processingError.message : 'Unknown error',
        });
        // Don't fail the upload if processing fails
      }
    }

    // Note: Don't clean up uploaded file here - it will be cleaned up after processing
    // cleanupUploadedFile(uploadedFilePath);

    return res.json({
      success: true,
      data: {
        id: document.id,
        uploadId,
        processingJobId,
        status: 'uploaded',
        filename: file.originalname,
        fileSize: file.size,
        message: 'Document uploaded successfully',
      },
    });
  } catch (error) {
    // Clean up uploaded file on error
    if (uploadedFilePath) {
      cleanupUploadedFile(uploadedFilePath);
    }

    logger.error('Document upload failed', {
      userId,
      filename: req.file?.originalname,
      error: error instanceof Error ? error.message : 'Unknown error',
    });

    return res.status(500).json({
      success: false,
      error: 'Upload failed',
      message: error instanceof Error ? error.message : 'An error occurred during upload',
    });
  }
});

// POST /api/documents/:id/process - Start processing a document
router.post('/:id/process', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;

    // Enhanced validation for document ID
    if (!id || id === 'undefined' || id === 'null' || id.trim() === '') {
      return res.status(400).json({
        success: false,
        error: 'Invalid document ID provided',
      });
    }

    const userId = (req as any).user.userId;
    const { options } = req.body;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Check if document is already being processed
    if (document.status === 'processing_llm' || document.status === 'extracting_text' || document.status === 'generating_pdf') {
      return res.status(400).json({
        success: false,
        error: 'Document is already being processed',
      });
    }

    // Add processing job to queue
    const jobId = await jobQueueService.addJob('document_processing', {
      documentId: id,
      userId,
      options: options || {
        extractText: true,
        generateSummary: true,
        performAnalysis: true,
      },
    }, 0, 3);

    // Update document status
    await DocumentModel.updateById(id, {
      status: 'extracting_text',
      processing_started_at: new Date(),
    });

    logger.info(`Document processing started: ${id}`, {
      jobId,
      userId,
      options,
    });

    res.json({
      success: true,
      data: {
        jobId,
        documentId: id,
        status: 'processing',
      },
      message: 'Document processing started',
    });
  } catch (error) {
    return next(error);
  }
});

// GET /api/documents/:id/processing-status - Get document processing status
router.get('/:id/processing-status', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;

    // Enhanced validation for document ID
    if (!id || id === 'undefined' || id === 'null' || id.trim() === '') {
      return res.status(400).json({
        success: false,
        error: 'Invalid document ID provided',
      });
    }

    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Get processing history
    const processingHistory = await documentProcessingService.getDocumentProcessingHistory(id);

    // Get current job status if processing
    let currentJob = null;
    if (document.status === 'processing_llm' || document.status === 'extracting_text' || document.status === 'generating_pdf') {
      const jobs = jobQueueService.getAllJobs();
      currentJob = [...jobs.queue, ...jobs.processing].find(job =>
        job.data.documentId === id &&
        (job.status === 'pending' || job.status === 'processing')
      );
    }

    res.json({
      success: true,
      data: {
        documentId: id,
        status: document.status,
        currentJob,
        processingHistory,
        extractedText: document.extracted_text,
        summary: document.generated_summary,
        analysis: null, // TODO: Add analysis data field to Document model
      },
      message: 'Processing status retrieved successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// GET /api/documents/:id/progress - Get processing progress for a document
router.get('/:id/progress', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;

    // Enhanced validation for document ID
    if (!id || id === 'undefined' || id === 'null' || id.trim() === '') {
      return res.status(400).json({
        success: false,
        error: 'Invalid document ID provided',
      });
    }

    const userId = (req as any).user.userId;

    // Check if user owns the document or is admin
    const document = await DocumentModel.findById(id);
    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Get progress from progress service
    let progress = uploadProgressService.getProgress(id);

    // If no progress from service, check document status in database
    if (!progress) {
      // Check if document is completed in database
      if (document.status === 'completed') {
        progress = {
          documentId: id,
          jobId: '', // Document doesn't have job_id, will be empty for completed docs
          status: 'completed',
          step: 'storage',
          progress: 100,
          message: 'Document processing completed successfully',
          startTime: document.created_at || new Date(),
        };
      } else if (document.status === 'processing_llm') {
        progress = {
          documentId: id,
          jobId: '', // Document doesn't have job_id, will be empty for processing docs
          status: 'processing',
          step: 'summary_generation',
          progress: 60,
          message: 'Processing document with LLM...',
          startTime: document.created_at || new Date(),
        };
      } else if (document.status === 'uploaded') {
        progress = {
          documentId: id,
          jobId: '', // Document doesn't have job_id, will be empty for uploaded docs
          status: 'processing',
          step: 'validation',
          progress: 10,
          message: 'Document uploaded, waiting for processing...',
          startTime: document.created_at || new Date(),
        };
      } else {
        return res.status(404).json({
          success: false,
          error: 'No progress tracking found for this document',
        });
      }
    }

    return res.json({
      success: true,
      data: progress,
      message: 'Progress retrieved successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// GET /api/documents/queue/status - Get job queue status and active jobs
router.get('/queue/status', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const userId = (req as any).user.userId;

    // Get queue statistics
    const stats = jobQueueService.getQueueStats();

    // Get all jobs and filter to user's documents
    const allJobs = jobQueueService.getAllJobs();
    const userDocuments = await DocumentModel.findByUserId(userId);
    const userDocumentIds = new Set(userDocuments.map(doc => doc.id));

    // Filter active jobs to only show user's documents
    const activeJobs = [...allJobs.queue, ...allJobs.processing]
      .filter(job => userDocumentIds.has(job.data.documentId))
      .map(job => ({
        id: job.id,
        type: job.type,
        status: job.status,
        createdAt: job.createdAt.toISOString(),
        startedAt: job.startedAt?.toISOString(),
        completedAt: job.completedAt?.toISOString(),
        data: job.data,
      }));

    return res.json({
      success: true,
      data: {
        stats,
        activeJobs,
      },
      message: 'Queue status retrieved successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// GET /api/documents/progress/all - Get all active processing progress
router.get('/progress/all', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const userId = (req as any).user.userId;

    // Get all progress and filter by user's documents
    const allProgress = uploadProgressService.getAllProgress();
    const userDocuments = await DocumentModel.findByUserId(userId);
    const userDocumentIds = new Set(userDocuments.map(doc => doc.id));

    // Filter progress to only show user's documents
    const userProgress = allProgress.filter(progress =>
      userDocumentIds.has(progress.documentId)
    const result = await unifiedDocumentProcessor.processDocument(
      id,
      userId,
      documentText,
      { strategy: 'rag' }
    );

    return res.json({
      success: true,
      data: userProgress,
      message: 'Progress retrieved successfully',
    res.json({
      success: result.success,
      processingStrategy: result.processingStrategy,
      processingTime: result.processingTime,
      apiCalls: result.apiCalls,
      summary: result.summary,
      analysisData: result.analysisData,
      error: result.error
    });

  } catch (error) {
    return next(error);
    logger.error('RAG processing failed', { error });
    res.status(500).json({ error: 'RAG processing failed' });
  }
});

// POST /api/documents/:id/regenerate-summary - Regenerate summary for a document
router.post('/:id/regenerate-summary', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;

    // Enhanced validation for document ID
    if (!id || id === 'undefined' || id === 'null' || id.trim() === '') {
      return res.status(400).json({
        success: false,
        error: 'Invalid document ID provided',
      });
    }

    const userId = (req as any).user.userId;

    // Check if user owns the document or is admin
    const document = await DocumentModel.findById(id);
    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Check if document has extracted text
    if (!document.extracted_text) {
      return res.status(400).json({
        success: false,
        error: 'Document has no extracted text to regenerate summary from',
      });
    }

    // Start regeneration in background
    documentProcessingService.regenerateSummary(id).catch(error => {
      logger.error('Background summary regeneration failed', {
        documentId: id,
        error: error instanceof Error ? error.message : 'Unknown error'
      });
    });

    return res.json({
      success: true,
      message: 'Summary regeneration started. Check document status for progress.',
    });
  } catch (error) {
    return next(error);
  }
});

// GET /api/documents/:id/download - Download document summary
router.get('/:id/download', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;
    if (!id) {
      return res.status(400).json({
        success: false,
        error: 'Document ID is required',
      });
    }

    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Check if document is completed
    if (document.status !== 'completed') {
      return res.status(400).json({
        success: false,
        error: 'Document processing not completed',
      });
    }

    // Try to serve PDF first, then markdown
    let filePath = null;
    let contentType = 'application/pdf';
    let fileName = `${document.original_file_name.replace(/\.[^/.]+$/, '')}_summary.pdf`;

    if (document.summary_pdf_path && fs.existsSync(document.summary_pdf_path)) {
      filePath = document.summary_pdf_path;
    } else if (document.summary_markdown_path && fs.existsSync(document.summary_markdown_path)) {
      filePath = document.summary_markdown_path;
      contentType = 'text/markdown';
      fileName = `${document.original_file_name.replace(/\.[^/.]+$/, '')}_summary.md`;
    } else {
      // Create a simple text file with the summary
      const summaryText = document.generated_summary || 'No summary available';
      res.setHeader('Content-Type', 'text/plain');
      res.setHeader('Content-Disposition', `attachment; filename="${fileName.replace('.pdf', '.txt')}"`);
      return res.send(summaryText);
    }

    if (!filePath) {
      return res.status(404).json({
        success: false,
        error: 'Summary file not found',
      });
    }

    res.setHeader('Content-Type', contentType);
    res.setHeader('Content-Disposition', `attachment; filename="${fileName}"`);
    res.sendFile(filePath);

    logger.info(`Document downloaded: ${id}`, {
      userId,
      filename: document.original_file_name,
      filePath,
    });

  } catch (error) {
    return next(error);
  }
});

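The filename fallback logic in the download route above can be exercised in isolation. A minimal sketch; `summaryFileName` is an illustrative helper, not part of this codebase:

```typescript
// Hypothetical helper mirroring the download route's naming logic:
// strip the original extension, then append a summary suffix matching
// the content type actually served.
function summaryFileName(
  originalFileName: string,
  contentType: 'application/pdf' | 'text/markdown' | 'text/plain'
): string {
  const base = originalFileName.replace(/\.[^/.]+$/, '');
  const ext =
    contentType === 'application/pdf' ? '.pdf'
    : contentType === 'text/markdown' ? '.md'
    : '.txt';
  return `${base}_summary${ext}`;
}

console.log(summaryFileName('Acme CIM.pdf', 'text/markdown')); // prints Acme CIM_summary.md
```

Note that a file with no extension passes through unchanged, since the regex only strips a trailing `.ext` segment.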
// GET /api/documents/:id/file - Stream document file
router.get('/:id/file', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;
    if (!id) {
      return res.status(400).json({
        success: false,
        error: 'Document ID is required',
      });
    }

    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // TODO: Implement actual file streaming
    // For now, return a placeholder response
    return res.status(404).json({
      success: false,
      error: 'File not found',
      message: 'File serving not yet implemented',
    });
  } catch (error) {
    return next(error);
  }
});

// POST /api/documents/:id/feedback - Submit feedback for document regeneration
router.post('/:id/feedback', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;
    if (!id) {
      return res.status(400).json({
        success: false,
        error: 'Document ID is required',
      });
    }

    const { feedback: _feedback } = req.body;
    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // TODO: Implement feedback submission
    // For now, return a placeholder response
    return res.json({
      success: true,
      data: {
        feedbackId: 'temp-feedback-id',
      },
      message: 'Feedback submitted successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// POST /api/documents/:id/regenerate - Regenerate document with feedback
router.post('/:id/regenerate', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;
    if (!id) {
      return res.status(400).json({
        success: false,
        error: 'Document ID is required',
      });
    }

    const { feedbackId: _feedbackId } = req.body;
    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // TODO: Implement document regeneration
    // For now, return a placeholder response
    return res.json({
      success: true,
      data: {
        jobId: 'temp-job-id',
        status: 'processing',
      },
      message: 'Document regeneration started',
    });
  } catch (error) {
    return next(error);
  }
});

// DELETE /api/documents/:id - Delete a document
router.delete('/:id', async (req: Request, res: Response, next: NextFunction) => {
  try {
    const { id } = req.params;
    if (!id) {
      return res.status(400).json({
        success: false,
        error: 'Document ID is required',
      });
    }

    const userId = (req as any).user.userId;

    const document = await DocumentModel.findById(id);

    if (!document) {
      return res.status(404).json({
        success: false,
        error: 'Document not found',
      });
    }

    // Check if user owns the document or is admin
    if (document.user_id !== userId && (req as any).user.role !== 'admin') {
      return res.status(403).json({
        success: false,
        error: 'Access denied',
      });
    }

    // Cancel any pending processing jobs
    const jobs = jobQueueService.getAllJobs();
    const documentJobs = [...jobs.queue, ...jobs.processing].filter(job =>
      job.data.documentId === id
    );

    documentJobs.forEach(job => {
      jobQueueService.cancelJob(job.id);
    });

    // Delete the file from storage
    if (document.file_path) {
      await fileStorageService.deleteFile(document.file_path);
    }

    // Delete the document record
    const deleted = await DocumentModel.delete(id);

    if (!deleted) {
      return res.status(500).json({
        success: false,
        error: 'Failed to delete document',
      });
    }

    logger.info(`Document deleted: ${id}`, {
      userId,
      filename: document.original_file_name,
      cancelledJobs: documentJobs.length,
    });

    return res.json({
      success: true,
      message: 'Document deleted successfully',
    });
  } catch (error) {
    return next(error);
  }
});

// POST /api/documents/:id/compare-strategies - Compare chunking and RAG processing
router.post('/:id/compare-strategies', async (req, res) => {
  try {
    const { id } = req.params;
    const userId = req.user?.id;

    if (!userId) {
      return res.status(401).json({ error: 'User not authenticated' });
    }

    // Get document text
    const documentText = await documentController.getDocumentText(id);

    const comparison = await unifiedDocumentProcessor.compareProcessingStrategies(
      id,
      userId,
      documentText
    );

    res.json({
      winner: comparison.winner,
      performanceMetrics: comparison.performanceMetrics,
      chunking: {
        success: comparison.chunking.success,
        processingTime: comparison.chunking.processingTime,
        apiCalls: comparison.chunking.apiCalls,
        error: comparison.chunking.error
      },
      rag: {
        success: comparison.rag.success,
        processingTime: comparison.rag.processingTime,
        apiCalls: comparison.rag.apiCalls,
        error: comparison.rag.error
      }
    });
  } catch (error) {
    logger.error('Strategy comparison failed', { error });
    res.status(500).json({ error: 'Strategy comparison failed' });
  }
});

// GET /api/documents/processing-stats - Get processing performance statistics
router.get('/processing-stats', async (req, res) => {
  try {
    const stats = await unifiedDocumentProcessor.getProcessingStats();
    res.json(stats);
  } catch (error) {
    logger.error('Failed to get processing stats', { error });
    res.status(500).json({ error: 'Failed to get processing stats' });
  }
});

// POST /api/documents/:id/switch-strategy - Switch processing strategy for a document
router.post('/:id/switch-strategy', async (req, res) => {
  try {
    const { id } = req.params;
    const { strategy } = req.body;
    const userId = req.user?.id;

    if (!userId) {
      return res.status(401).json({ error: 'User not authenticated' });
    }

    if (!['chunking', 'rag'].includes(strategy)) {
      return res.status(400).json({ error: 'Invalid strategy. Must be "chunking" or "rag"' });
    }

    // Get document text
    const documentText = await documentController.getDocumentText(id);

    const result = await unifiedDocumentProcessor.switchStrategy(
      id,
      userId,
      documentText,
      strategy
    );

    res.json({
      success: result.success,
      processingStrategy: result.processingStrategy,
      processingTime: result.processingTime,
      apiCalls: result.apiCalls,
      summary: result.summary,
      analysisData: result.analysisData,
      error: result.error
    });

  } catch (error) {
    logger.error('Strategy switch failed', { error });
    res.status(500).json({ error: 'Strategy switch failed' });
  }
});

451
backend/src/services/agenticRAGProcessor.ts
Normal file
@@ -0,0 +1,451 @@
import { logger } from '../utils/logger';
import { llmService } from './llmService';
import { config } from '../config/env';
import { CIMReview } from '../models/types';

interface AgentStep {
  name: string;
  description: string;
  query: string;
  validation?: (result: any) => boolean;
}

interface AgenticRAGResult {
  success: boolean;
  summary: string;
  analysisData: CIMReview;
  reasoningSteps: Array<{
    step: string;
    result: any;
    validation: boolean;
  }>;
  processingTime: number;
  apiCalls: number;
  error?: string;
}

class AgenticRAGProcessor {
  private apiCallCount: number = 0;
  private reasoningSteps: Array<{ step: string; result: any; validation: boolean }> = [];

  /**
   * Process CIM document using agentic RAG approach
   */
  async processDocument(text: string, documentId: string): Promise<AgenticRAGResult> {
    const startTime = Date.now();
    this.apiCallCount = 0;
    this.reasoningSteps = [];

    logger.info('Starting agentic RAG processing', { documentId });

    try {
      // Step 1: Document Understanding Agent
      const documentUnderstanding = await this.executeStep('document_understanding', {
        name: 'Document Understanding',
        description: 'Analyze document structure and extract key information',
        query: this.buildDocumentUnderstandingQuery(text)
      });

      // Step 2: Financial Analysis Agent
      const financialAnalysis = await this.executeStep('financial_analysis', {
        name: 'Financial Analysis',
        description: 'Analyze financial performance and metrics',
        query: this.buildFinancialAnalysisQuery(text, documentUnderstanding.result)
      });

      // Step 3: Market Analysis Agent
      const marketAnalysis = await this.executeStep('market_analysis', {
        name: 'Market Analysis',
        description: 'Analyze market position and competitive landscape',
        query: this.buildMarketAnalysisQuery(text, documentUnderstanding.result)
      });

      // Step 4: Investment Thesis Agent
      const investmentThesis = await this.executeStep('investment_thesis', {
        name: 'Investment Thesis',
        description: 'Develop comprehensive investment thesis',
        query: this.buildInvestmentThesisQuery(text, {
          documentUnderstanding: documentUnderstanding.result,
          financialAnalysis: financialAnalysis.result,
          marketAnalysis: marketAnalysis.result
        })
      });

      // Step 5: Synthesis Agent
      const synthesis = await this.executeStep('synthesis', {
        name: 'Synthesis',
        description: 'Synthesize all analysis into BPCP template format',
        query: this.buildSynthesisQuery({
          documentUnderstanding: documentUnderstanding.result,
          financialAnalysis: financialAnalysis.result,
          marketAnalysis: marketAnalysis.result,
          investmentThesis: investmentThesis.result
        })
      });

      // Step 6: Quality Validation Agent
      const validation = await this.executeStep('validation', {
        name: 'Quality Validation',
        description: 'Validate completeness and quality of analysis',
        query: this.buildValidationQuery(synthesis.result),
        validation: this.validateBPCPTemplate
      });

      const processingTime = Date.now() - startTime;

      logger.info('Agentic RAG processing completed', {
        documentId,
        processingTime,
        apiCalls: this.apiCallCount,
        stepsCompleted: this.reasoningSteps.length
      });

      return {
        success: validation.validation,
        summary: this.convertToMarkdown(synthesis.result),
        analysisData: synthesis.result,
        reasoningSteps: this.reasoningSteps,
        processingTime,
        apiCalls: this.apiCallCount,
        error: validation.validation ? undefined : 'Quality validation failed'
      };

    } catch (error) {
      const processingTime = Date.now() - startTime;
      logger.error('Agentic RAG processing failed', {
        documentId,
        error: error instanceof Error ? error.message : 'Unknown error',
        processingTime,
        apiCalls: this.apiCallCount
      });

      return {
        success: false,
        summary: '',
        analysisData: {} as CIMReview,
        reasoningSteps: this.reasoningSteps,
        processingTime,
        apiCalls: this.apiCallCount,
        error: error instanceof Error ? error.message : 'Unknown error'
      };
    }
  }

  /**
   * Execute a reasoning step
   */
  private async executeStep(stepName: string, agentStep: AgentStep): Promise<{ result: any; validation: boolean }> {
    logger.info(`Executing agent step: ${agentStep.name}`);

    const result = await this.callLLM({
      prompt: agentStep.query,
      systemPrompt: this.getAgentSystemPrompt(agentStep.name),
      maxTokens: 3000,
      temperature: 0.1
    });

    if (!result.success) {
      throw new Error(`Agent step ${agentStep.name} failed: ${result.error}`);
    }

    let parsedResult;
    try {
      parsedResult = JSON.parse(result.content);
    } catch (error) {
      // Try to extract JSON from response
      const jsonMatch = result.content.match(/\{[\s\S]*\}/);
      if (jsonMatch) {
        parsedResult = JSON.parse(jsonMatch[0]);
      } else {
        throw new Error(`Failed to parse JSON from agent step ${agentStep.name}`);
      }
    }

    const validation = agentStep.validation ? agentStep.validation(parsedResult) : true;

    this.reasoningSteps.push({
      step: agentStep.name,
      result: parsedResult,
      validation
    });

    logger.info(`Agent step ${agentStep.name} completed`, {
      validation,
      resultKeys: Object.keys(parsedResult)
    });

    return { result: parsedResult, validation };
  }
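The JSON fallback in `executeStep` can be exercised standalone. A minimal sketch; `parseAgentResult` is an illustrative name, not part of this module:

```typescript
// Standalone sketch of executeStep's parsing strategy: try strict
// JSON.parse first, then fall back to the outermost {...} span when
// the model wraps its JSON in prose.
function parseAgentResult(content: string): Record<string, unknown> {
  try {
    return JSON.parse(content);
  } catch {
    const jsonMatch = content.match(/\{[\s\S]*\}/);
    if (jsonMatch) {
      return JSON.parse(jsonMatch[0]);
    }
    throw new Error('Failed to parse JSON from agent response');
  }
}

const wrapped = 'Here is my analysis:\n{"quality": "high", "isValid": true}\nLet me know.';
console.log(parseAgentResult(wrapped).quality); // "high"
```

Note the greedy match spans from the first `{` to the last `}`, so a response containing two separate JSON objects would still fail to parse.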
  /**
   * Build document understanding query
   */
  private buildDocumentUnderstandingQuery(text: string): string {
    return `
You are an expert investment analyst at BPCP. Your task is to understand and extract key information from this CIM document.

CIM Document:
${text.substring(0, 50000)}

Please analyze the document and provide:
1. Company overview and basic information
2. Document structure and key sections
3. Initial financial highlights
4. Key business metrics
5. Market positioning

Return your analysis as a structured JSON object with these fields:
{
  "companyOverview": {
    "name": "Company name",
    "industry": "Primary industry/sector",
    "location": "Headquarters location",
    "founded": "Year founded",
    "employees": "Number of employees"
  },
  "documentStructure": {
    "sections": ["list of main sections"],
    "pageCount": "Estimated page count",
    "keyTopics": ["main topics covered"]
  },
  "financialHighlights": {
    "revenue": "Latest revenue figure",
    "ebitda": "Latest EBITDA figure",
    "growth": "Revenue growth trend",
    "margins": "Key margin metrics"
  },
  "businessMetrics": {
    "customers": "Key customer information",
    "products": "Main products/services",
    "geography": "Geographic presence"
  },
  "marketPosition": {
    "marketSize": "Addressable market size",
    "competitors": "Key competitors",
    "advantages": "Competitive advantages"
  }
}
`;
  }

  /**
   * Build financial analysis query
   */
  private buildFinancialAnalysisQuery(text: string, documentUnderstanding: any): string {
    return `
You are a financial analyst at BPCP. Based on the document understanding and CIM content, perform detailed financial analysis.

Document Understanding:
${JSON.stringify(documentUnderstanding, null, 2)}

CIM Document:
${text}

Please provide comprehensive financial analysis including:
1. Historical financial performance (3-5 years)
2. Revenue and EBITDA trends
3. Margin analysis
4. Quality of earnings assessment
5. Working capital and cash flow analysis
6. Growth drivers and projections

Return as structured JSON following BPCP financial template format.
`;
  }

  /**
   * Build market analysis query
   */
  private buildMarketAnalysisQuery(text: string, documentUnderstanding: any): string {
    return `
You are a market analyst at BPCP. Analyze the market position and competitive landscape.

Document Understanding:
${JSON.stringify(documentUnderstanding, null, 2)}

CIM Document:
${text}

Please provide market analysis including:
1. Market size and growth rates
2. Competitive landscape
3. Barriers to entry
4. Industry trends and drivers
5. Company's market position
6. Competitive advantages

Return as structured JSON following BPCP market analysis format.
`;
  }

  /**
   * Build investment thesis query
   */
  private buildInvestmentThesisQuery(text: string, previousAnalysis: any): string {
    return `
You are an investment professional at BPCP. Develop a comprehensive investment thesis based on all previous analysis.

Previous Analysis:
${JSON.stringify(previousAnalysis, null, 2)}

CIM Document:
${text}

Please develop investment thesis including:
1. Key investment attractions
2. Potential risks and concerns
3. Value creation opportunities
4. Alignment with BPCP strategy
5. Preliminary recommendation
6. Key diligence areas

Return as structured JSON following BPCP investment thesis format.
`;
  }

  /**
   * Build synthesis query
   */
  private buildSynthesisQuery(allAnalysis: any): string {
    return `
You are a senior investment analyst at BPCP. Synthesize all previous analysis into the complete BPCP CIM Review Template format.

All Previous Analysis:
${JSON.stringify(allAnalysis, null, 2)}

Please create the complete BPCP CIM Review Template with all sections:
- Deal Overview
- Business Description
- Market & Industry Analysis
- Financial Summary
- Management Team Overview
- Preliminary Investment Thesis
- Key Questions & Next Steps

Ensure all fields are populated with the best available information from the analysis.
Use "Not specified in CIM" for any missing information.

Return as a complete JSON object following the exact BPCP template structure.
`;
  }

  /**
   * Build validation query
   */
  private buildValidationQuery(synthesis: any): string {
    return `
You are a quality assurance specialist at BPCP. Validate the completeness and quality of this CIM analysis.

Synthesis Result:
${JSON.stringify(synthesis, null, 2)}

Please validate:
1. All required BPCP template sections are present
2. All fields have appropriate values (not empty or placeholder)
3. Financial data is consistent and reasonable
4. Analysis is comprehensive and professional
5. Investment thesis is well-developed
6. No critical information is missing

Return validation result as JSON:
{
  "isValid": true/false,
  "issues": ["list of any issues found"],
  "completeness": "percentage complete",
  "quality": "high/medium/low"
}
`;
  }

  /**
   * Validate BPCP template completeness
   */
  private validateBPCPTemplate(result: any): boolean {
    if (!result.isValid) return false;

    const requiredSections = [
      'dealOverview', 'businessDescription', 'marketIndustryAnalysis',
      'financialSummary', 'managementTeamOverview', 'preliminaryInvestmentThesis',
      'keyQuestionsNextSteps'
    ];

    return requiredSections.every(section =>
      result[section] && typeof result[section] === 'object'
    );
  }

  /**
   * Get system prompt for agent step
   */
  private getAgentSystemPrompt(stepName: string): string {
    const basePrompt = `You are an expert investment analyst at BPCP (Blue Point Capital Partners). Your role is to analyze CIM documents and provide high-quality, structured analysis.

Key Requirements:
1. **JSON OUTPUT ONLY**: Return valid JSON objects
2. **BPCP FOCUS**: Focus on companies in 5+MM EBITDA range in consumer and industrial markets
3. **PROFESSIONAL QUALITY**: Provide investment-grade analysis
4. **COMPREHENSIVE**: Cover all relevant aspects thoroughly
5. **STRUCTURED**: Follow BPCP template formats exactly`;

    const stepPrompts: Record<string, string> = {
      document_understanding: `${basePrompt}

You are analyzing document structure and extracting key information. Be thorough in identifying all relevant details.`,

      financial_analysis: `${basePrompt}

You are performing detailed financial analysis. Focus on trends, quality of earnings, and key financial metrics.`,

      market_analysis: `${basePrompt}

You are analyzing market position and competitive landscape. Focus on market size, growth, and competitive advantages.`,

      investment_thesis: `${basePrompt}

You are developing investment thesis. Focus on value creation opportunities and alignment with BPCP strategy.`,

      synthesis: `${basePrompt}

You are creating the final BPCP CIM Review Template. Ensure all sections are complete and professional.`,

      validation: `${basePrompt}

You are validating analysis quality. Be thorough in identifying any issues or missing information.`
    };

    return stepPrompts[stepName] || basePrompt;
  }

  /**
   * Convert analysis to markdown
   */
  private convertToMarkdown(analysis: CIMReview): string {
    // Use existing markdown conversion logic
    return `# BPCP CIM Review: ${analysis.dealOverview?.targetCompanyName || 'Unknown Company'}

## Deal Overview
- **Company**: ${analysis.dealOverview?.targetCompanyName || 'N/A'}
- **Industry**: ${analysis.dealOverview?.industrySector || 'N/A'}
- **Location**: ${analysis.dealOverview?.geography || 'N/A'}

## Financial Summary
- **Revenue**: ${analysis.financialSummary?.financials?.ltm?.revenue || 'N/A'}
- **EBITDA**: ${analysis.financialSummary?.financials?.ltm?.ebitda || 'N/A'}

## Investment Thesis
- **Key Attractions**: ${analysis.preliminaryInvestmentThesis?.keyAttractions || 'N/A'}
- **Risks**: ${analysis.preliminaryInvestmentThesis?.potentialRisks || 'N/A'}

*Full analysis available in structured format*`;
  }

  /**
   * Wrapper for LLM calls
   */
  private async callLLM(request: any): Promise<any> {
    this.apiCallCount++;
    return await llmService.callLLM(request);
  }
}

export const agenticRAGProcessor = new AgenticRAGProcessor();
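The six-step flow in `processDocument` reduces to a sequential pipeline in which each step sees the results of all earlier steps and records a validation flag. A synchronous stand-in with the LLM calls stubbed out; all names here are illustrative:

```typescript
// Illustrative pipeline mirroring AgenticRAGProcessor's step chaining:
// each step receives the accumulated context, and its validation
// outcome is recorded alongside its output.
type Step = {
  name: string;
  run: (ctx: Record<string, unknown>) => Record<string, unknown>;
  validate?: (result: Record<string, unknown>) => boolean;
};

function runPipeline(steps: Step[]) {
  const ctx: Record<string, unknown> = {};
  const reasoningSteps: Array<{ step: string; validation: boolean }> = [];
  for (const s of steps) {
    const result = s.run(ctx);
    const validation = s.validate ? s.validate(result) : true;
    reasoningSteps.push({ step: s.name, validation });
    ctx[s.name] = result;
  }
  return { ctx, reasoningSteps };
}

const out = runPipeline([
  { name: 'understanding', run: () => ({ company: 'Acme' }) },
  {
    name: 'financials',
    run: (ctx) => ({ subject: (ctx.understanding as { company: string }).company }),
    validate: (r) => typeof r.subject === 'string',
  },
]);
console.log(out.reasoningSteps.map(s => s.step));
```

As in the real class, a step without a `validate` hook is treated as valid; only the final validation step in `processDocument` gates the overall `success` flag.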
File diff suppressed because it is too large
Load Diff
86
backend/src/services/llmSchemas.ts
Normal file
@@ -0,0 +1,86 @@
import { z } from 'zod';

// Schema for the BPCP CIM Review Template
export const cimReviewSchema = z.object({
  dealOverview: z.object({
    targetCompanyName: z.string().describe("Target Company Name"),
    industrySector: z.string().describe("Industry/Sector"),
    geography: z.string().describe("Geography (HQ & Key Operations)"),
    dealSource: z.string().describe("Deal Source"),
    transactionType: z.string().describe("Transaction Type"),
    dateCIMReceived: z.string().describe("Date CIM Received"),
    dateReviewed: z.string().describe("Date Reviewed"),
    reviewers: z.string().describe("Reviewer(s)"),
    cimPageCount: z.string().describe("CIM Page Count"),
    statedReasonForSale: z.string().describe("Stated Reason for Sale (if provided)"),
  }).describe("Deal Overview section"),

  businessDescription: z.object({
    coreOperationsSummary: z.string().describe("Core Operations Summary (3-5 sentences)"),
    keyProductsServices: z.string().describe("Key Products/Services & Revenue Mix (Est. % if available)"),
    uniqueValueProposition: z.string().describe("Unique Value Proposition (UVP) / Why Customers Buy"),
    customerBaseOverview: z.object({
      keyCustomerSegments: z.string().describe("Key Customer Segments/Types"),
      customerConcentrationRisk: z.string().describe("Customer Concentration Risk (Top 5 and/or Top 10 Customers as % Revenue - if stated/inferable)"),
      typicalContractLength: z.string().describe("Typical Contract Length / Recurring Revenue % (if applicable)"),
    }),
    keySupplierOverview: z.object({
      dependenceConcentrationRisk: z.string().describe("Dependence/Concentration Risk"),
    }),
  }).describe("Business Description section"),

  marketIndustryAnalysis: z.object({
    estimatedMarketSize: z.string().describe("Estimated Market Size (TAM/SAM - if provided)"),
    estimatedMarketGrowthRate: z.string().describe("Estimated Market Growth Rate (% CAGR - Historical & Projected)"),
    keyIndustryTrends: z.string().describe("Key Industry Trends & Drivers (Tailwinds/Headwinds)"),
    competitiveLandscape: z.object({
      keyCompetitors: z.string().describe("Key Competitors Identified"),
      targetMarketPosition: z.string().describe("Target's Stated Market Position/Rank"),
      basisOfCompetition: z.string().describe("Basis of Competition"),
    }),
    barriersToEntry: z.string().describe("Barriers to Entry / Competitive Moat (Stated/Inferred)"),
  }).describe("Market & Industry Analysis section"),

  financialSummary: z.object({
    financials: z.object({
      years: z.array(z.string()).describe("Array of years: ['FY-3', 'FY-2', 'FY-1', 'LTM']"),
      metrics: z.array(z.object({
        metric: z.string().describe("Metric name (e.g., 'Revenue', 'Revenue Growth (%)', 'Gross Profit', 'Gross Margin (%)', 'EBITDA', 'EBITDA Margin (%)')"),
        fy3: z.string().describe("Value for FY-3"),
        fy2: z.string().describe("Value for FY-2"),
        fy1: z.string().describe("Value for FY-1"),
        ltm: z.string().describe("Value for LTM"),
      })).describe("Array of financial metrics with values for each year"),
    }),
    qualityOfEarnings: z.string().describe("Quality of earnings/adjustments impression"),
    revenueGrowthDrivers: z.string().describe("Revenue growth drivers (stated)"),
    marginStabilityAnalysis: z.string().describe("Margin stability/trend analysis"),
    capitalExpenditures: z.string().describe("Capital expenditures (LTM % of revenue)"),
    workingCapitalIntensity: z.string().describe("Working capital intensity impression"),
    freeCashFlowQuality: z.string().describe("Free cash flow quality impression"),
  }).describe("Financial Summary section"),

  managementTeamOverview: z.object({
    keyLeaders: z.string().describe("Key Leaders Identified (CEO, CFO, COO, Head of Sales, etc.)"),
    managementQualityAssessment: z.string().describe("Initial Assessment of Quality/Experience (Based on Bios)"),
    postTransactionIntentions: z.string().describe("Management's Stated Post-Transaction Role/Intentions (if mentioned)"),
    organizationalStructure: z.string().describe("Organizational Structure Overview (Impression)"),
  }).describe("Management Team Overview section"),

  preliminaryInvestmentThesis: z.object({
    keyAttractions: z.string().describe("Key Attractions / Strengths (Why Invest?)"),
    potentialRisks: z.string().describe("Potential Risks / Concerns (Why Not Invest?)"),
    valueCreationLevers: z.string().describe("Initial Value Creation Levers (How PE Adds Value)"),
    alignmentWithFundStrategy: z.string().describe("Alignment with Fund Strategy"),
  }).describe("Preliminary Investment Thesis section"),

  keyQuestionsNextSteps: z.object({
    criticalQuestions: z.string().describe("Critical Questions Arising from CIM Review"),
    missingInformation: z.string().describe("Key Missing Information / Areas for Diligence Focus"),
    preliminaryRecommendation: z.string().describe("Preliminary Recommendation"),
|
||||
rationaleForRecommendation: z.string().describe("Rationale for Recommendation (Brief)"),
|
||||
proposedNextSteps: z.string().describe("Proposed Next Steps"),
|
||||
}).describe("Key Questions & Next Steps section"),
|
||||
});
|
||||
|
||||
export type CIMReview = z.infer<typeof cimReviewSchema>;
|
||||
@@ -1,5 +1,7 @@
import { config } from '../config/env';
import { logger } from '../utils/logger';
import { z } from 'zod';
import { CIMReview, cimReviewSchema } from './llmSchemas';

export interface LLMRequest {
  prompt: string;
@@ -21,23 +23,14 @@ export interface LLMResponse {
}

export interface CIMAnalysisResult {
  part1: {
    dealOverview: Record<string, string>;
    businessDescription: Record<string, string>;
    marketAnalysis: Record<string, string>;
    financialOverview: Record<string, string>;
    competitiveLandscape: Record<string, string>;
    investmentThesis: Record<string, string>;
    keyQuestions: Record<string, string>;
  };
  part2: {
    keyInvestmentConsiderations: string[];
    diligenceAreas: string[];
    riskFactors: string[];
    valueCreationOpportunities: string[];
  };
  summary: string;
  markdownOutput: string;
  success: boolean;
  jsonOutput?: CIMReview;
  error?: string;
  model: string;
  cost: number;
  inputTokens: number;
  outputTokens: number;
  validationIssues?: z.ZodIssue[];
}

class LLMService {
@@ -55,7 +48,7 @@ class LLMService {

    // Set the correct default model based on provider
    if (this.provider === 'anthropic') {
      this.defaultModel = 'claude-3-5-sonnet-20241022';
      this.defaultModel = 'claude-3-opus-20240229';
    } else {
      this.defaultModel = config.llm.model;
    }
@@ -65,133 +58,98 @@ class LLMService {
  }

  /**
   * Process CIM document with intelligent model selection
   * Process CIM document with intelligent model selection and self-correction
   */
  async processCIMDocument(text: string, template: string, analysis?: Record<string, any>): Promise<any> {
    try {
      logger.info('Starting CIM document processing with LLM');

      // Determine task complexity and select appropriate model
      const taskComplexity = this.determineTaskComplexity(text, analysis || {});
      const estimatedTokens = this.estimateTokenCount(text + template);
      const selectedModel = this.selectModel(taskComplexity, estimatedTokens);

      logger.info('Model selection completed', {
        taskComplexity,
        estimatedTokens,
        selectedModel,
        estimatedCost: this.estimateCost(estimatedTokens, selectedModel)
      });
  async processCIMDocument(text: string, template: string, analysis?: Record<string, any>): Promise<CIMAnalysisResult> {
    logger.info('Starting CIM document processing with LLM');

    const taskComplexity = this.determineTaskComplexity(text, analysis || {});
    const estimatedTokens = this.estimateTokenCount(text + template);
    const selectedModel = this.selectModel(taskComplexity, estimatedTokens);

    logger.info('Model selection completed', { taskComplexity, estimatedTokens, selectedModel });

      // Check if this is a refinement request
      const isRefinement = analysis?.['refinementMode'] === true;

      // Try up to 3 times with different approaches
      let lastError: Error | null = null;

      for (let attempt = 1; attempt <= 3; attempt++) {
        try {
          logger.info(`LLM processing attempt ${attempt}/3`);

          // Build the prompt (enhanced for retry attempts)
          const prompt = isRefinement
            ? this.buildRefinementPrompt(text, template)
            : this.buildCIMPrompt(text, template, attempt);

          const systemPrompt = isRefinement
            ? this.getRefinementSystemPrompt()
            : this.getCIMSystemPrompt();

          const response = await this.callLLM({
            prompt,
            systemPrompt,
    const isRefinement = analysis?.['refinementMode'] === true;
    const isOverview = analysis?.['overviewMode'] === true;
    const isSynthesis = analysis?.['synthesisMode'] === true;
    const sectionType = analysis?.['sectionType'] as string;

    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= 3; attempt++) {
      try {
        logger.info(`LLM processing attempt ${attempt}/3`);

        let prompt: string;
        let systemPrompt: string;

        if (isOverview) {
          prompt = this.buildOverviewPrompt(text, template);
          systemPrompt = this.getOverviewSystemPrompt();
        } else if (isSynthesis) {
          prompt = this.buildSynthesisPrompt(text, template);
          systemPrompt = this.getSynthesisSystemPrompt();
        } else if (sectionType) {
          prompt = this.buildSectionPrompt(text, template, sectionType, analysis);
          systemPrompt = this.getSectionSystemPrompt(sectionType);
        } else if (isRefinement) {
          prompt = this.buildRefinementPrompt(text, template);
          systemPrompt = this.getRefinementSystemPrompt();
        } else {
          prompt = this.buildCIMPrompt(text, template, lastError ? lastError.message : undefined);
          systemPrompt = this.getCIMSystemPrompt();
        }

        const response = await this.callLLM({
          prompt,
          systemPrompt,
          model: selectedModel,
          maxTokens: config.llm.maxTokens,
          temperature: config.llm.temperature,
        });

        if (!response.success) {
          throw new Error('LLM processing failed');
        }

        const jsonOutput = this.extractJsonFromResponse(response.content);
        const validation = cimReviewSchema.safeParse(jsonOutput);

        if (validation.success) {
          logger.info(`CIM document processing completed successfully on attempt ${attempt}`);
          return {
            success: true,
            jsonOutput: validation.data,
            model: selectedModel,
            maxTokens: config.llm.maxTokens,
            temperature: config.llm.temperature,
          });

          if (!response.success) {
            throw new Error('LLM processing failed');
          }

          const markdownOutput = this.extractMarkdownFromResponse(response.content);

          // Validate the output (only for non-refinement requests)
          if (!isRefinement) {
            const validation = this.validateCIMOutput(markdownOutput);

            if (validation.isValid) {
              logger.info('CIM document processing completed successfully', {
                model: selectedModel,
                inputTokens: estimatedTokens,
                outputLength: markdownOutput.length,
                actualCost: this.estimateCost(estimatedTokens + markdownOutput.length, selectedModel),
                attempt
              });

              return {
                markdownOutput,
                model: selectedModel,
                cost: this.estimateCost(estimatedTokens + markdownOutput.length, selectedModel),
                inputTokens: estimatedTokens,
                outputTokens: markdownOutput.length,
              };
            } else {
              logger.warn(`LLM output validation failed on attempt ${attempt}`, {
                issues: validation.issues,
                outputLength: markdownOutput.length
              });

              // If this is the last attempt, return the best we have
              if (attempt === 3) {
                logger.warn('Using suboptimal output after 3 failed attempts', {
                  issues: validation.issues
                });
                return {
                  markdownOutput,
                  model: selectedModel,
                  cost: this.estimateCost(estimatedTokens + markdownOutput.length, selectedModel),
                  inputTokens: estimatedTokens,
                  outputTokens: markdownOutput.length,
                  validationIssues: validation.issues
                };
              }
            }
          } else {
            // For refinement requests, return immediately
            logger.info('CIM document refinement completed successfully', {
              model: selectedModel,
              inputTokens: estimatedTokens,
              outputLength: markdownOutput.length,
              actualCost: this.estimateCost(estimatedTokens + markdownOutput.length, selectedModel)
            });

            cost: this.estimateCost(estimatedTokens + response.content.length, selectedModel),
            inputTokens: estimatedTokens,
            outputTokens: response.content.length,
          };
        } else {
          lastError = new Error(`JSON validation failed: ${validation.error.errors.map(e => e.message).join(', ')}`);
          logger.warn(`LLM output validation failed on attempt ${attempt}`, { issues: validation.error.errors });
          if (attempt === 3) {
            return {
              markdownOutput,
              success: false,
              error: 'Failed to generate valid JSON after 3 attempts.',
              model: selectedModel,
              cost: this.estimateCost(estimatedTokens + markdownOutput.length, selectedModel),
              cost: this.estimateCost(estimatedTokens, selectedModel),
              inputTokens: estimatedTokens,
              outputTokens: markdownOutput.length,
              outputTokens: 0,
              validationIssues: validation.error.errors,
            };
          }
        } catch (error) {
          lastError = error instanceof Error ? error : new Error('Unknown error');
          logger.error(`LLM processing attempt ${attempt} failed`, {
            error: lastError.message,
            attempt
          });

          if (attempt === 3) {
            throw lastError;
          }
        }
      } catch (error) {
        lastError = error instanceof Error ? error : new Error('Unknown error');
        logger.error(`LLM processing attempt ${attempt} failed`, { error: lastError.message });
        if (attempt === 3) {
          throw lastError;
        }
      }

      throw lastError || new Error('All LLM processing attempts failed');
    } catch (error) {
      logger.error('CIM document processing failed', error);
      throw error;
    }

    throw lastError || new Error('All LLM processing attempts failed');
  }

  /**
@@ -199,13 +157,22 @@ class LLMService {
   */
  private async callLLM(request: LLMRequest): Promise<LLMResponse> {
    try {
      if (this.provider === 'openai') {
        return await this.callOpenAI(request);
      } else if (this.provider === 'anthropic') {
        return await this.callAnthropic(request);
      } else {
        throw new Error(`Unsupported LLM provider: ${this.provider}`);
      }
      // Add a timeout wrapper to prevent hanging
      const timeoutPromise = new Promise<LLMResponse>((_, reject) => {
        setTimeout(() => reject(new Error('LLM call timeout after 3 minutes')), 180000);
      });

      const llmPromise = (async () => {
        if (this.provider === 'openai') {
          return await this.callOpenAI(request);
        } else if (this.provider === 'anthropic') {
          return await this.callAnthropic(request);
        } else {
          throw new Error(`Unsupported LLM provider: ${this.provider}`);
        }
      })();

      return await Promise.race([llmPromise, timeoutPromise]);
    } catch (error) {
      logger.error('LLM API call failed', error);
      return {
@@ -224,6 +191,7 @@ class LLMService {

    const openai = new OpenAI({
      apiKey: this.apiKey,
      timeout: 120000, // 2 minute timeout
    });

    const messages: any[] = [];
@@ -270,6 +238,7 @@ class LLMService {

    const anthropic = new Anthropic({
      apiKey: this.apiKey,
      timeout: 120000, // 2 minute timeout
    });

    const message = await anthropic.messages.create({
@@ -307,161 +276,188 @@ class LLMService {
   * Get CIM system prompt
   */
  private getCIMSystemPrompt(): string {
    return `You are an expert financial analyst specializing in CIM (Confidential Information Memorandum) analysis. Your task is to analyze CIM documents and provide comprehensive, structured summaries that follow the BPCP CIM Review Template format EXACTLY.
    return `You are an expert investment analyst at BPCP (Blue Point Capital Partners) reviewing a Confidential Information Memorandum (CIM). Your task is to analyze CIM documents and return a comprehensive, structured JSON object that follows the BPCP CIM Review Template format EXACTLY.

CRITICAL REQUIREMENTS:
1. **COMPLETE ALL SECTIONS**: You MUST include ALL 7 sections: (A) Deal Overview, (B) Business Description, (C) Market & Industry Analysis, (D) Financial Summary, (E) Management Team Overview, (F) Preliminary Investment Thesis, (G) Key Questions & Next Steps
2. **EXACT TEMPLATE FORMAT**: Use the exact field names, formatting, and structure from the BPCP template
3. **FINANCIAL TABLE**: Include the complete financial table with proper markdown table formatting
4. **NO INCOMPLETE SECTIONS**: Every section must be complete - do not cut off mid-sentence or leave sections unfinished
5. **PROFESSIONAL QUALITY**: Maintain high-quality financial analysis standards
6. **COMPREHENSIVE COVERAGE**: Extract and include ALL relevant information from the CIM document
7. **DEFAULT VALUES**: Use "Not specified in CIM" for any fields where information is not provided
8. **STRUCTURED OUTPUT**: Ensure the output can be parsed by structured parsing tools

OUTPUT FORMAT:
- Start with "---" and end with "---"
- Use exact section headers: "**(A) Deal Overview**", "**(B) Business Description**", etc.
- Use exact field names with backticks: \`Target Company Name:\`, \`Industry/Sector:\`, etc.
- Include the complete financial table with proper markdown formatting
- Ensure all sections are complete and properly formatted

IMPORTANT: Your response MUST be complete and follow the template structure exactly. Do not truncate or leave sections incomplete.`;
1. **JSON OUTPUT ONLY**: Your entire response MUST be a single, valid JSON object. Do not include any text or explanation before or after the JSON object.
2. **BPCP TEMPLATE FORMAT**: The JSON object MUST follow the BPCP CIM Review Template structure exactly as specified.
3. **COMPLETE ALL FIELDS**: You MUST provide a value for every field. Use "Not specified in CIM" for any information that is not available in the document.
4. **NO PLACEHOLDERS**: Do not use placeholders like "..." or "TBD".
5. **PROFESSIONAL ANALYSIS**: The content should be high-quality and suitable for BPCP's investment committee.
6. **BPCP FOCUS**: Focus on companies in 5+MM EBITDA range in consumer and industrial end markets, with emphasis on M&A, technology & data usage, supply chain and human capital optimization.
7. **BPCP PREFERENCES**: BPCP prefers companies which are founder/family-owned and within driving distance of Cleveland and Charlotte.
8. **EXACT FIELD NAMES**: Use the exact field names and descriptions from the BPCP CIM Review Template.
`;
  }

  /**
   * Build CIM prompt from text and template
   * Build CIM prompt from text and template, with optional error for self-correction
   */
  private buildCIMPrompt(text: string, template: string, attempt: number = 1): string {
    let strategy = '';

    switch (attempt) {
      case 1:
        strategy = `STRATEGY: Comprehensive analysis with all sections. Focus on completeness and accuracy.`;
        break;
      case 2:
        strategy = `STRATEGY: Prioritize structure and formatting. Ensure all sections are present even if some fields are brief. Focus on the template structure first.`;
        break;
      case 3:
        strategy = `STRATEGY: Minimal but complete. Focus on getting all 7 sections with basic information. Use "Not specified in CIM" liberally for missing data. Prioritize structure over detail.`;
        break;
      default:
        strategy = `STRATEGY: Standard comprehensive analysis.`;
    }

    return `Please analyze the following CIM document and provide a comprehensive summary using the BPCP CIM Review Template format EXACTLY.
  private buildCIMPrompt(text: string, _template: string, previousError?: string): string {
    const errorCorrection = previousError
      ? `
PREVIOUS ATTEMPT FAILED. The JSON you provided was invalid.
Here are the errors:
${previousError}

${strategy}
Please correct these errors and generate a new, valid JSON object. Pay close attention to the required structure and data types.
`
      : '';

Document Text:
    const jsonTemplate = `{
  "dealOverview": {
    "targetCompanyName": "Target Company Name",
    "industrySector": "Industry/Sector",
    "geography": "Geography (HQ & Key Operations)",
    "dealSource": "Deal Source",
    "transactionType": "Transaction Type",
    "dateCIMReceived": "Date CIM Received",
    "dateReviewed": "Date Reviewed",
    "reviewers": "Reviewer(s)",
    "cimPageCount": "CIM Page Count",
    "statedReasonForSale": "Stated Reason for Sale (if provided)"
  },
  "businessDescription": {
    "coreOperationsSummary": "Core Operations Summary (3-5 sentences)",
    "keyProductsServices": "Key Products/Services & Revenue Mix (Est. % if available)",
    "uniqueValueProposition": "Unique Value Proposition (UVP) / Why Customers Buy",
    "customerBaseOverview": {
      "keyCustomerSegments": "Key Customer Segments/Types",
      "customerConcentrationRisk": "Customer Concentration Risk (Top 5 and/or Top 10 Customers as % Revenue - if stated/inferable)",
      "typicalContractLength": "Typical Contract Length / Recurring Revenue % (if applicable)"
    },
    "keySupplierOverview": {
      "dependenceConcentrationRisk": "Dependence/Concentration Risk"
    }
  },
  "marketIndustryAnalysis": {
    "estimatedMarketSize": "Estimated Market Size (TAM/SAM - if provided)",
    "estimatedMarketGrowthRate": "Estimated Market Growth Rate (% CAGR - Historical & Projected)",
    "keyIndustryTrends": "Key Industry Trends & Drivers (Tailwinds/Headwinds)",
    "competitiveLandscape": {
      "keyCompetitors": "Key Competitors Identified",
      "targetMarketPosition": "Target's Stated Market Position/Rank",
      "basisOfCompetition": "Basis of Competition"
    },
    "barriersToEntry": "Barriers to Entry / Competitive Moat (Stated/Inferred)"
  },
  "financialSummary": {
    "financials": {
      "years": ["FY-3", "FY-2", "FY-1", "LTM"],
      "metrics": [
        {
          "metric": "Revenue",
          "fy3": "Revenue amount for FY-3",
          "fy2": "Revenue amount for FY-2",
          "fy1": "Revenue amount for FY-1",
          "ltm": "Revenue amount for LTM"
        },
        {
          "metric": "Revenue Growth (%)",
          "fy3": "N/A",
          "fy2": "Revenue growth % for FY-2",
          "fy1": "Revenue growth % for FY-1",
          "ltm": "Revenue growth % for LTM"
        },
        {
          "metric": "Gross Profit",
          "fy3": "Gross profit amount for FY-3",
          "fy2": "Gross profit amount for FY-2",
          "fy1": "Gross profit amount for FY-1",
          "ltm": "Gross profit amount for LTM"
        },
        {
          "metric": "Gross Margin (%)",
          "fy3": "Gross margin % for FY-3",
          "fy2": "Gross margin % for FY-2",
          "fy1": "Gross margin % for FY-1",
          "ltm": "Gross margin % for LTM"
        },
        {
          "metric": "EBITDA",
          "fy3": "EBITDA amount for FY-3",
          "fy2": "EBITDA amount for FY-2",
          "fy1": "EBITDA amount for FY-1",
          "ltm": "EBITDA amount for LTM"
        },
        {
          "metric": "EBITDA Margin (%)",
          "fy3": "EBITDA margin % for FY-3",
          "fy2": "EBITDA margin % for FY-2",
          "fy1": "EBITDA margin % for FY-1",
          "ltm": "EBITDA margin % for LTM"
        }
      ]
    },
    "qualityOfEarnings": "Quality of earnings/adjustments impression",
    "revenueGrowthDrivers": "Revenue growth drivers (stated)",
    "marginStabilityAnalysis": "Margin stability/trend analysis",
    "capitalExpenditures": "Capital expenditures (LTM % of revenue)",
    "workingCapitalIntensity": "Working capital intensity impression",
    "freeCashFlowQuality": "Free cash flow quality impression"
  },
  "managementTeamOverview": {
    "keyLeaders": "Key Leaders Identified (CEO, CFO, COO, Head of Sales, etc.)",
    "managementQualityAssessment": "Initial Assessment of Quality/Experience (Based on Bios)",
    "postTransactionIntentions": "Management's Stated Post-Transaction Role/Intentions (if mentioned)",
    "organizationalStructure": "Organizational Structure Overview (Impression)"
  },
  "preliminaryInvestmentThesis": {
    "keyAttractions": "Key Attractions / Strengths (Why Invest?)",
    "potentialRisks": "Potential Risks / Concerns (Why Not Invest?)",
    "valueCreationLevers": "Initial Value Creation Levers (How PE Adds Value)",
    "alignmentWithFundStrategy": "Alignment with Fund Strategy (BPCP is focused on companies in 5+MM EBITDA range in consumer and industrial end markets. M&A, increased technology & data usage, supply chain and human capital optimization are key value-levers. Also a preference for companies which are founder / family-owned and within driving distance of Cleveland and Charlotte.)"
  },
  "keyQuestionsNextSteps": {
    "criticalQuestions": "Critical Questions Arising from CIM Review",
    "missingInformation": "Key Missing Information / Areas for Diligence Focus",
    "preliminaryRecommendation": "Preliminary Recommendation",
    "rationaleForRecommendation": "Rationale for Recommendation (Brief)",
    "proposedNextSteps": "Proposed Next Steps"
  }
}`;

    return `Please analyze the following CIM document and generate a JSON object based on the provided structure.

${errorCorrection}

CIM Document Text:
${text}

BPCP CIM Review Template:
${template}

CRITICAL INSTRUCTIONS:
1. **MANDATORY COMPLETION**: You MUST complete ALL 7 sections: (A) Deal Overview, (B) Business Description, (C) Market & Industry Analysis, (D) Financial Summary, (E) Management Team Overview, (F) Preliminary Investment Thesis, (G) Key Questions & Next Steps
2. **EXACT TEMPLATE FORMAT**: Use the exact field names, formatting, and structure from the BPCP template
3. **FINANCIAL TABLE REQUIRED**: Include the complete financial table with proper markdown table formatting
4. **NO TRUNCATION**: Do not cut off mid-sentence or leave sections incomplete
5. **COMPREHENSIVE ANALYSIS**: Extract and include ALL relevant information from the CIM document
6. **DEFAULT VALUES**: Use "Not specified in CIM" for any fields where information is not provided
7. **STRUCTURED OUTPUT**: Ensure the output can be parsed by structured parsing tools
8. **PROFESSIONAL QUALITY**: Maintain high-quality financial analysis standards

OUTPUT REQUIREMENTS:
- Start your response with "---" and end with "---"
- Use exact section headers: "**(A) Deal Overview**", "**(B) Business Description**", etc.
- Use exact field names with backticks: \`Target Company Name:\`, \`Industry/Sector:\`, etc.
- Include the complete financial table with proper markdown formatting
- Ensure all sections are complete and properly formatted

IMPORTANT: Your response MUST be complete and follow the template structure exactly. Do not truncate or leave sections incomplete. If you cannot complete all sections due to token limits, prioritize completing fewer sections fully rather than truncating all sections.`;
Your response MUST be a single, valid JSON object that follows this exact structure. Do not include any other text.
JSON Structure to Follow:
\`\`\`json
${jsonTemplate}
\`\`\`
`;
  }

  /**
   * Extract markdown from LLM response
   * Extract JSON from LLM response
   */
  private extractMarkdownFromResponse(content: string): string {
    // Look for markdown content between triple backticks
    const markdownMatch = content.match(/```(?:markdown)?\n([\s\S]*?)\n```/);
    if (markdownMatch && markdownMatch[1]) {
      return markdownMatch[1].trim();
    }

    // If no markdown blocks, return the content as-is
    return content.trim();
  }
  private extractJsonFromResponse(content: string): any {
    try {
      // First, try to find JSON within ```json ... ```
      const jsonMatch = content.match(/```json\n([\s\S]*?)\n```/);
      if (jsonMatch && jsonMatch[1]) {
        return JSON.parse(jsonMatch[1]);
      }

  /**
   * Validate LLM output for completeness and proper formatting
   */
  private validateCIMOutput(content: string): { isValid: boolean; issues: string[] } {
    const issues: string[] = [];

    // Check if content is empty or too short
    if (!content || content.length < 1000) {
      issues.push('Output is too short or empty');
      // If that fails, fall back to finding the first and last curly braces
      const startIndex = content.indexOf('{');
      const endIndex = content.lastIndexOf('}');
      if (startIndex === -1 || endIndex === -1) {
        return null;
      }
      const jsonString = content.substring(startIndex, endIndex + 1);
      return JSON.parse(jsonString);
    } catch (error) {
      logger.error('Failed to parse JSON from LLM response', {
        content,
        error: error instanceof Error ? error.message : 'Unknown parsing error'
      });
      return null;
    }

    // Check for required sections
    const requiredSections = [
      '**(A) Deal Overview**',
      '**(B) Business Description**',
      '**(C) Market & Industry Analysis**',
      '**(D) Financial Summary**',
      '**(E) Management Team Overview**',
      '**(F) Preliminary Investment Thesis**',
      '**(G) Key Questions & Next Steps**'
    ];

    const missingSections = requiredSections.filter(section => !content.includes(section));
    if (missingSections.length > 0) {
      issues.push(`Missing required sections: ${missingSections.join(', ')}`);
    }

    // Check for incomplete sections (sections that end abruptly)
    const sectionRegex = /\*\*\([A-Z]\)\s+([^*]+)\*\*/g;
    const sections = Array.from(content.matchAll(sectionRegex));

    if (sections.length < 7) {
      issues.push(`Only found ${sections.length} sections, expected 7`);
    }

    // Check for truncation indicators
    const truncationIndicators = [
      'Continued in next part',
      '...',
      'etc.',
      'and more',
      'truncated',
      'cut off'
    ];

    const hasTruncation = truncationIndicators.some(indicator =>
      content.toLowerCase().includes(indicator.toLowerCase())
    );

    if (hasTruncation) {
      issues.push('Content appears to be truncated');
    }

    // Check for financial table
    if (!content.includes('|Metric|') && !content.includes('| Revenue |')) {
      issues.push('Missing financial table');
    }

    // Check for proper field formatting
    const fieldRegex = /`[^`]+:`/g;
    const fields = content.match(fieldRegex);
    if (!fields || fields.length < 10) {
      issues.push('Insufficient field formatting (backticks)');
    }

    return {
      isValid: issues.length === 0,
      issues
    };
  }

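The two-stage extraction above (a fenced `json` code block first, then an outermost-brace fallback) can be exercised in isolation. A standalone sketch mirroring that strategy, with illustrative inputs:

```typescript
// Standalone version of the fenced-block-then-brace-fallback extraction.
function extractJson(content: string): unknown | null {
  try {
    const fenced = content.match(/```json\n([\s\S]*?)\n```/);
    if (fenced && fenced[1]) return JSON.parse(fenced[1]);
    const start = content.indexOf("{");
    const end = content.lastIndexOf("}");
    if (start === -1 || end === -1) return null;
    return JSON.parse(content.substring(start, end + 1));
  } catch {
    return null;
  }
}

// The fenced block wins when present; otherwise the brace fallback handles
// models that wrap the JSON in chatty prose.
const fromFence = extractJson('```json\n{"a":1}\n```');
const fromBraces = extractJson('Here is your result: {"a":2} hope it helps');
const nothing = extractJson("no json here");
```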
  /**
@@ -501,6 +497,7 @@ IMPORTANT: Your response MUST be complete and follow the template structure exac
  private estimateCost(tokens: number, model: string): number {
    // Rough cost estimation (in USD per 1M tokens)
    const costRates: Record<string, { input: number; output: number }> = {
      'claude-3-opus-20240229': { input: 15, output: 75 },
      'claude-3-5-sonnet-20241022': { input: 3, output: 15 },
      'claude-3-5-haiku-20241022': { input: 0.25, output: 1.25 },
      'gpt-4o': { input: 5, output: 15 },
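With rates quoted per 1M tokens, the dollar estimate is a proportional scale of the input and output counts. A small self-contained sketch (the rate values copy the table above; the function name is illustrative):

```typescript
// Per-1M-token USD rates, as in the costRates table above.
const rates: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet-20241022": { input: 3, output: 15 },
  "gpt-4o": { input: 5, output: 15 },
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const r = rates[model];
  if (!r) return 0; // unknown model: a real service would fall back or log
  return (inputTokens / 1_000_000) * r.input + (outputTokens / 1_000_000) * r.output;
}

// 100k input + 4k output tokens on Sonnet: 0.1 * $3 + 0.004 * $15 = $0.36
const cost = estimateCostUSD("claude-3-5-sonnet-20241022", 100_000, 4_000);
```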
@@ -544,25 +541,23 @@ IMPORTANT: Your response MUST be complete and follow the template structure exac
  /**
   * Build refinement prompt for final summary improvement
   */
  private buildRefinementPrompt(text: string, template: string): string {
  private buildRefinementPrompt(text: string, _template: string): string {
    return `
You are tasked with creating a final, comprehensive CIM (Confidential Information Memorandum) review summary.
You are tasked with creating a final, comprehensive CIM review JSON object.

Below is a combined analysis from multiple document sections. Your job is to:
1. **Ensure completeness**: Make sure all fields in the JSON schema are properly filled out.
2. **Improve coherence**: Create smooth, logical content within the JSON structure.
3. **Remove redundancy**: Eliminate duplicate information.
4. **Maintain structure**: Follow the provided JSON schema exactly.

1. **Ensure completeness**: Make sure all sections are properly filled out with the available information
2. **Improve coherence**: Create smooth transitions between sections and ensure logical flow
3. **Remove redundancy**: Eliminate duplicate information while preserving all unique insights
4. **Maintain structure**: Follow the BPCP CIM Review Template format exactly
5. **Enhance clarity**: Improve the clarity and professionalism of the analysis

**Combined Analysis:**
**Combined Analysis (as a JSON object):**
${text}

**Template Structure:**
${template}
**JSON Schema:**
${JSON.stringify(cimReviewSchema.shape, null, 2)}

Please provide a refined, comprehensive CIM review that incorporates all the information from the combined analysis while ensuring it follows the template structure and maintains high quality throughout.
Please provide a refined, comprehensive CIM review as a single, valid JSON object.
`;
  }

@@ -570,19 +565,116 @@ Please provide a refined, comprehensive CIM review that incorporates all the inf
|
||||
* Get system prompt for refinement mode
|
||||
*/
|
||||
private getRefinementSystemPrompt(): string {
|
||||
return `You are an expert investment analyst specializing in CIM (Confidential Information Memorandum) reviews.
|
||||
|
||||
Your task is to refine and improve a combined analysis from multiple document sections into a comprehensive, professional CIM review.
|
||||
return `You are an expert investment analyst. Your task is to refine and improve a combined JSON analysis into a final, professional CIM review.
|
||||
|
||||
Key responsibilities:
|
||||
- Ensure all sections are complete and properly structured
|
||||
- Remove any duplicate or redundant information
|
||||
- Improve the flow and coherence between sections
|
||||
- Maintain the exact BPCP CIM Review Template format
|
||||
- Enhance clarity and professionalism of the analysis
|
||||
- Preserve all unique insights and important details
|
||||
- Ensure the final output is a single, valid JSON object that conforms to the schema.
|
||||
- Remove any duplicate or redundant information.
|
||||
- Improve the flow and coherence of the content within the JSON structure.
|
||||
- Enhance the clarity and professionalism of the analysis.
|
||||
- Preserve all unique insights and important details.
|
||||
`;
|
||||
}
|
||||
|
||||
Focus on creating a cohesive, comprehensive analysis that would be suitable for senior investment professionals.`;
|
||||
/**
 * Build overview prompt
 */
private buildOverviewPrompt(text: string, _template: string): string {
  return `
You are tasked with creating a comprehensive overview of the CIM document.

Your goal is to provide a high-level, strategic summary of the target company, its market position, and key factors driving its value.

CIM Document Text:
${text}

Please generate a single, valid JSON object that represents this overview.
`;
}

/**
 * Get system prompt for overview mode
 */
private getOverviewSystemPrompt(): string {
  return `You are an expert investment analyst. Your task is to create a comprehensive, strategic overview of a CIM document.

Key responsibilities:
- Provide a high-level, strategic summary of the target company.
- Include its market position, key drivers of value, and key risks.
- Focus on the most relevant and impactful information.
- Ensure the output is a single, valid JSON object.
`;
}

/**
 * Build synthesis prompt
 */
private buildSynthesisPrompt(text: string, _template: string): string {
  return `
You are tasked with synthesizing the key findings and insights from the CIM document.

Your goal is to provide a cohesive, well-structured summary that highlights the most important aspects of the target company.

CIM Document Text:
${text}

Please generate a single, valid JSON object that represents this synthesis.
`;
}

/**
 * Get system prompt for synthesis mode
 */
private getSynthesisSystemPrompt(): string {
  return `You are an expert investment analyst. Your task is to synthesize the key findings and insights from a CIM document.

Key responsibilities:
- Provide a cohesive, well-structured summary of the target company.
- Highlight the most important aspects and key drivers of value.
- Ensure the output is a single, valid JSON object.
`;
}

/**
 * Build section prompt
 */
private buildSectionPrompt(text: string, _template: string, sectionType: string, analysis: Record<string, any>): string {
  const sectionName = sectionType.charAt(0).toUpperCase() + sectionType.slice(1);
  const overview = analysis?.overview;

  const sectionPrompt = `
You are tasked with analyzing the "${sectionName}" section of the CIM document.

Your goal is to provide a detailed, structured analysis of this section, building upon the document overview.

${overview ? `Document Overview Context:
${JSON.stringify(overview, null, 2)}

` : ''}CIM Document Text:
${text}

Please generate a single, valid JSON object that represents this analysis, focusing specifically on the ${sectionName.toLowerCase()} aspects of the company.
`;
  return sectionPrompt;
}

/**
 * Get system prompt for section mode
 */
private getSectionSystemPrompt(sectionType: string): string {
  const sectionName = sectionType.charAt(0).toUpperCase() + sectionType.slice(1);
  return `You are an expert investment analyst at BPCP (Blue Point Capital Partners) reviewing a Confidential Information Memorandum (CIM). Your task is to analyze the "${sectionName}" section of the CIM document and return a comprehensive, structured JSON object that follows the BPCP CIM Review Template format.

CRITICAL REQUIREMENTS:
1. **JSON OUTPUT ONLY**: Your entire response MUST be a single, valid JSON object. Do not include any text or explanation before or after the JSON object.
2. **SECTION FOCUS**: Focus specifically on the ${sectionName.toLowerCase()} aspects of the company.
3. **COMPLETE ALL FIELDS**: You MUST provide a value for every field in the ${sectionName.toLowerCase()} section. Use "Not specified in CIM" for any information that is not available in the document.
4. **NO PLACEHOLDERS**: Do not use placeholders like "..." or "TBD".
5. **PROFESSIONAL ANALYSIS**: The content should be high-quality and suitable for BPCP's investment committee.
6. **BPCP FOCUS**: Focus on companies in 5+MM EBITDA range in consumer and industrial end markets, with emphasis on M&A, technology & data usage, supply chain and human capital optimization.
7. **BPCP PREFERENCES**: BPCP prefers companies which are founder/family-owned and within driving distance of Cleveland and Charlotte.
8. **EXACT FIELD NAMES**: Use the exact field names and descriptions from the BPCP CIM Review Template.
`;
}
}

410
backend/src/services/ragDocumentProcessor.ts
Normal file
@@ -0,0 +1,410 @@
import { logger } from '../utils/logger';
import { llmService } from './llmService';
import { config } from '../config/env';
import { CIMReview } from '../models/types';

interface DocumentSection {
  id: string;
  type: 'executive_summary' | 'business_description' | 'financial_analysis' | 'market_analysis' | 'management' | 'investment_thesis';
  content: string;
  pageRange: [number, number];
  keyMetrics: Record<string, any>;
  relevanceScore: number;
}

interface RAGQuery {
  section: string;
  context: string;
  specificQuestions: string[];
}

interface RAGAnalysisResult {
  success: boolean;
  summary: string;
  analysisData: CIMReview;
  error?: string;
  processingTime: number;
  apiCalls: number;
}

class RAGDocumentProcessor {
  private sections: DocumentSection[] = [];
  private documentContext: Record<string, any> = {};
  private apiCallCount: number = 0;

  /**
   * Process CIM document using RAG approach
   */
  async processDocument(text: string, documentId: string): Promise<RAGAnalysisResult> {
    const startTime = Date.now();
    this.apiCallCount = 0;

    logger.info('Starting RAG-based CIM processing', { documentId });

    try {
      // Step 1: Intelligent document segmentation
      await this.segmentDocument(text);

      // Step 2: Extract key metrics and context
      await this.extractKeyMetrics();

      // Step 3: Generate comprehensive analysis using RAG
      const analysis = await this.generateRAGAnalysis();

      // Step 4: Create final summary
      const summary = await this.createFinalSummary(analysis);

      const processingTime = Date.now() - startTime;

      logger.info('RAG processing completed successfully', {
        documentId,
        processingTime,
        apiCalls: this.apiCallCount,
        sections: this.sections.length
      });

      return {
        success: true,
        summary,
        analysisData: analysis,
        processingTime,
        apiCalls: this.apiCallCount
      };

    } catch (error) {
      const processingTime = Date.now() - startTime;
      logger.error('RAG processing failed', {
        documentId,
        error: error instanceof Error ? error.message : 'Unknown error',
        processingTime,
        apiCalls: this.apiCallCount
      });

      return {
        success: false,
        summary: '',
        analysisData: {} as CIMReview,
        error: error instanceof Error ? error.message : 'Unknown error',
        processingTime,
        apiCalls: this.apiCallCount
      };
    }
  }

  /**
   * Segment document into logical sections with metadata
   */
  private async segmentDocument(text: string): Promise<void> {
    logger.info('Segmenting document into logical sections');

    // Use LLM to identify and segment document sections
    const segmentationPrompt = `
Analyze this CIM document and identify its logical sections. For each section, provide:
1. Section type (executive_summary, business_description, financial_analysis, market_analysis, management, investment_thesis)
2. Start and end page numbers
3. Key topics covered
4. Relevance to investment analysis (1-10 scale)

Document text:
${text.substring(0, 50000) /* first 50K chars for section identification; a trailing // comment here would leak into the prompt text */}

Return as JSON array of sections.
`;

    const segmentationResult = await this.callLLM({
      prompt: segmentationPrompt,
      systemPrompt: 'You are an expert at analyzing CIM document structure. Identify logical sections accurately.',
      maxTokens: 2000,
      temperature: 0.1
    });

    if (segmentationResult.success) {
      try {
        const sections = JSON.parse(segmentationResult.content);
        this.sections = sections.map((section: any, index: number) => ({
          id: `section_${index}`,
          type: section.type,
          content: this.extractSectionContent(text, section.pageRange),
          pageRange: section.pageRange,
          keyMetrics: {},
          relevanceScore: section.relevanceScore
        }));
      } catch (error) {
        logger.error('Failed to parse section segmentation', { error });
        // Fallback to rule-based segmentation
        this.sections = this.fallbackSegmentation(text);
      }
    }
  }

  /**
   * Extract key metrics from each section
   */
  private async extractKeyMetrics(): Promise<void> {
    logger.info('Extracting key metrics from document sections');

    for (const section of this.sections) {
      const metricsPrompt = `
Extract key financial and business metrics from this section:

Section Type: ${section.type}
Content: ${section.content.substring(0, 10000)}

Focus on:
- Revenue, EBITDA, margins
- Growth rates, market size
- Customer metrics, employee count
- Key risks and opportunities

Return as JSON object.
`;

      const metricsResult = await this.callLLM({
        prompt: metricsPrompt,
        systemPrompt: 'Extract precise numerical and qualitative metrics from CIM sections.',
        maxTokens: 1500,
        temperature: 0.1
      });

      if (metricsResult.success) {
        try {
          section.keyMetrics = JSON.parse(metricsResult.content);
        } catch (error) {
          logger.warn('Failed to parse metrics for section', { sectionId: section.id, error });
        }
      }
    }
  }

  /**
   * Generate analysis using RAG approach
   */
  private async generateRAGAnalysis(): Promise<CIMReview> {
    logger.info('Generating RAG-based analysis');

    // Create queries for each section of the BPCP template
    const queries: RAGQuery[] = [
      {
        section: 'dealOverview',
        context: 'Extract deal-specific information including company name, industry, geography, transaction details',
        specificQuestions: [
          'What is the target company name?',
          'What industry/sector does it operate in?',
          'Where is the company headquartered?',
          'What type of transaction is this?',
          'What is the stated reason for sale?'
        ]
      },
      {
        section: 'businessDescription',
        context: 'Analyze the company\'s core operations, products/services, and customer base',
        specificQuestions: [
          'What are the core operations?',
          'What are the key products/services?',
          'What is the revenue mix?',
          'Who are the key customers?',
          'What is the unique value proposition?'
        ]
      },
      {
        section: 'financialSummary',
        context: 'Extract and analyze financial performance, trends, and quality metrics',
        specificQuestions: [
          'What are the revenue trends?',
          'What are the EBITDA margins?',
          'What is the quality of earnings?',
          'What are the growth drivers?',
          'What is the working capital intensity?'
        ]
      },
      {
        section: 'marketIndustryAnalysis',
        context: 'Analyze market size, growth, competition, and industry trends',
        specificQuestions: [
          'What is the market size (TAM/SAM)?',
          'What is the market growth rate?',
          'Who are the key competitors?',
          'What are the barriers to entry?',
          'What are the key industry trends?'
        ]
      },
      {
        section: 'managementTeamOverview',
        context: 'Evaluate management team quality, experience, and post-transaction intentions',
        specificQuestions: [
          'Who are the key leaders?',
          'What is their experience level?',
          'What are their post-transaction intentions?',
          'How is the organization structured?'
        ]
      },
      {
        section: 'preliminaryInvestmentThesis',
        context: 'Develop investment thesis based on all available information',
        specificQuestions: [
          'What are the key attractions?',
          'What are the potential risks?',
          'What are the value creation levers?',
          'How does this align with BPCP strategy?'
        ]
      }
    ];

    const analysis: any = {};

    // Process each query using RAG
    for (const query of queries) {
      const relevantSections = this.findRelevantSections(query);
      const queryContext = this.buildQueryContext(relevantSections, query);

      const analysisResult = await this.callLLM({
        prompt: this.buildRAGPrompt(query, queryContext),
        systemPrompt: 'You are an expert investment analyst. Provide precise, structured analysis based on the provided context.',
        maxTokens: 2000,
        temperature: 0.1
      });

      if (analysisResult.success) {
        try {
          analysis[query.section] = JSON.parse(analysisResult.content);
        } catch (error) {
          logger.warn('Failed to parse analysis for section', { section: query.section, error });
        }
      }
    }

    return analysis as CIMReview;
  }

  /**
   * Find sections relevant to a specific query
   */
  private findRelevantSections(query: RAGQuery): DocumentSection[] {
    const relevanceMap: Record<string, string[]> = {
      dealOverview: ['executive_summary'],
      businessDescription: ['business_description', 'executive_summary'],
      financialSummary: ['financial_analysis', 'executive_summary'],
      marketIndustryAnalysis: ['market_analysis', 'executive_summary'],
      managementTeamOverview: ['management', 'executive_summary'],
      preliminaryInvestmentThesis: ['investment_thesis', 'executive_summary', 'business_description']
    };

    const relevantTypes = relevanceMap[query.section] || [];
    return this.sections.filter(section =>
      relevantTypes.includes(section.type) && section.relevanceScore >= 5
    );
  }

  /**
   * Build context for a specific query
   */
  private buildQueryContext(sections: DocumentSection[], query: RAGQuery): string {
    let context = `Query: ${query.context}\n\n`;
    context += `Specific Questions:\n${query.specificQuestions.map(q => `- ${q}`).join('\n')}\n\n`;
    context += `Relevant Document Sections:\n\n`;

    for (const section of sections) {
      context += `Section: ${section.type}\n`;
      context += `Relevance Score: ${section.relevanceScore}/10\n`;
      context += `Key Metrics: ${JSON.stringify(section.keyMetrics, null, 2)}\n`;
      context += `Content: ${section.content.substring(0, 5000)}\n\n`;
    }

    return context;
  }

  /**
   * Build RAG prompt for specific analysis
   */
  private buildRAGPrompt(query: RAGQuery, context: string): string {
    return `
Based on the following context from a CIM document, provide a comprehensive analysis for the ${query.section} section.

${context}

Please provide your analysis in the exact JSON format required for the BPCP CIM Review Template.
Focus on answering the specific questions listed above.
Use "Not specified in CIM" for any information not available in the provided context.
`;
  }

  /**
   * Create final summary from RAG analysis
   */
  private async createFinalSummary(analysis: CIMReview): Promise<string> {
    logger.info('Creating final summary from RAG analysis');

    const summaryPrompt = `
Create a comprehensive markdown summary from the following BPCP CIM analysis:

${JSON.stringify(analysis, null, 2)}

Format as a professional BPCP CIM Review Template with proper markdown structure.
`;

    const summaryResult = await this.callLLM({
      prompt: summaryPrompt,
      systemPrompt: 'Create a professional, well-structured markdown summary for BPCP investment committee.',
      maxTokens: 3000,
      temperature: 0.1
    });

    return summaryResult.success ? summaryResult.content : 'Summary generation failed';
  }

  /**
   * Fallback segmentation if LLM segmentation fails
   */
  private fallbackSegmentation(text: string): DocumentSection[] {
    // Rule-based segmentation as fallback
    const sections: DocumentSection[] = [];
    const patterns = [
      { type: 'executive_summary', pattern: /(?:executive\s+summary|overview|introduction)/i },
      { type: 'business_description', pattern: /(?:business\s+description|company\s+overview|operations)/i },
      { type: 'financial_analysis', pattern: /(?:financial|financials|performance|results)/i },
      { type: 'market_analysis', pattern: /(?:market|industry|competitive)/i },
      { type: 'management', pattern: /(?:management|leadership|team)/i },
      { type: 'investment_thesis', pattern: /(?:investment|opportunity|thesis)/i }
    ];

    // Simple text splitting based on patterns
    const textLength = text.length;
    const sectionSize = Math.floor(textLength / patterns.length);

    patterns.forEach((pattern, index) => {
      const start = index * sectionSize;
      const end = Math.min((index + 1) * sectionSize, textLength);

      sections.push({
        id: `section_${index}`,
        type: pattern.type as any,
        content: text.substring(start, end),
        pageRange: [Math.floor(start / 1000), Math.floor(end / 1000)],
        keyMetrics: {},
        relevanceScore: 7
      });
    });

    return sections;
  }

  /**
   * Extract content for specific page range
   */
  private extractSectionContent(text: string, pageRange: [number, number]): string {
    // Rough estimation: 1000 characters per page
    const startChar = pageRange[0] * 1000;
    const endChar = pageRange[1] * 1000;
    return text.substring(startChar, endChar);
  }

  /**
   * Wrapper for LLM calls to track API usage
   */
  private async callLLM(request: any): Promise<any> {
    this.apiCallCount++;
    return await llmService.callLLM(request);
  }
}

export const ragDocumentProcessor = new RAGDocumentProcessor();
258
backend/src/services/unifiedDocumentProcessor.ts
Normal file
@@ -0,0 +1,258 @@
import { logger } from '../utils/logger';
import { config } from '../config/env';
import { documentProcessingService } from './documentProcessingService';
import { ragDocumentProcessor } from './ragDocumentProcessor';
import { CIMReview } from '../models/types';

interface ProcessingResult {
  success: boolean;
  summary: string;
  analysisData: CIMReview;
  processingStrategy: 'chunking' | 'rag';
  processingTime: number;
  apiCalls: number;
  error?: string;
}

interface ComparisonResult {
  chunking: ProcessingResult;
  rag: ProcessingResult;
  winner: 'chunking' | 'rag' | 'tie';
  performanceMetrics: {
    timeDifference: number;
    apiCallDifference: number;
    qualityScore: number;
  };
}

class UnifiedDocumentProcessor {
  /**
   * Process document using the configured strategy
   */
  async processDocument(
    documentId: string,
    userId: string,
    text: string,
    options: any = {}
  ): Promise<ProcessingResult> {
    const strategy = options.strategy || config.processingStrategy;

    logger.info('Processing document with unified processor', {
      documentId,
      strategy,
      textLength: text.length
    });

    if (strategy === 'rag') {
      return await this.processWithRAG(documentId, text);
    } else {
      return await this.processWithChunking(documentId, userId, text, options);
    }
  }

  /**
   * Process document using RAG approach
   */
  private async processWithRAG(documentId: string, text: string): Promise<ProcessingResult> {
    logger.info('Using RAG processing strategy', { documentId });

    const startTime = Date.now();
    const result = await ragDocumentProcessor.processDocument(text, documentId);

    return {
      success: result.success,
      summary: result.summary,
      analysisData: result.analysisData,
      processingStrategy: 'rag',
      processingTime: result.processingTime,
      apiCalls: result.apiCalls,
      error: result.error
    };
  }

  /**
   * Process document using chunking approach
   */
  private async processWithChunking(
    documentId: string,
    userId: string,
    text: string,
    options: any
  ): Promise<ProcessingResult> {
    logger.info('Using chunking processing strategy', { documentId });

    const startTime = Date.now();

    try {
      const result = await documentProcessingService.processDocument(documentId, userId, options);

      // Estimate API calls for chunking (this is approximate)
      const estimatedApiCalls = this.estimateChunkingApiCalls(text);

      return {
        success: result.success,
        summary: result.summary,
        analysisData: result.analysisData,
        processingStrategy: 'chunking',
        processingTime: Date.now() - startTime,
        apiCalls: estimatedApiCalls,
        error: result.error
      };
    } catch (error) {
      return {
        success: false,
        summary: '',
        analysisData: {} as CIMReview,
        processingStrategy: 'chunking',
        processingTime: Date.now() - startTime,
        apiCalls: 0,
        error: error instanceof Error ? error.message : 'Unknown error'
      };
    }
  }

  /**
   * Compare both processing strategies
   */
  async compareProcessingStrategies(
    documentId: string,
    userId: string,
    text: string,
    options: any = {}
  ): Promise<ComparisonResult> {
    logger.info('Comparing processing strategies', { documentId });

    // Process with both strategies
    const [chunkingResult, ragResult] = await Promise.all([
      this.processWithChunking(documentId, userId, text, options),
      this.processWithRAG(documentId, text)
    ]);

    // Calculate performance metrics
    const timeDifference = chunkingResult.processingTime - ragResult.processingTime;
    const apiCallDifference = chunkingResult.apiCalls - ragResult.apiCalls;
    const qualityScore = this.calculateQualityScore(chunkingResult, ragResult);

    // Determine winner
    let winner: 'chunking' | 'rag' | 'tie' = 'tie';
    if (ragResult.success && !chunkingResult.success) {
      winner = 'rag';
    } else if (chunkingResult.success && !ragResult.success) {
      winner = 'chunking';
    } else if (ragResult.success && chunkingResult.success) {
      // Both successful, compare performance
      const ragScore = (qualityScore * 0.6) + (timeDifference > 0 ? 0.2 : 0) + (apiCallDifference > 0 ? 0.2 : 0);
      const chunkingScore = ((1 - qualityScore) * 0.6) + (timeDifference < 0 ? 0.2 : 0) + (apiCallDifference < 0 ? 0.2 : 0);
      winner = ragScore > chunkingScore ? 'rag' : 'chunking';
    }

    return {
      chunking: chunkingResult,
      rag: ragResult,
      winner,
      performanceMetrics: {
        timeDifference,
        apiCallDifference,
        qualityScore
      }
    };
  }

  /**
   * Estimate API calls for chunking approach
   */
  private estimateChunkingApiCalls(text: string): number {
    const chunkSize = config.llm.chunkSize;
    const estimatedTokens = Math.ceil(text.length / 4); // Rough token estimation
    const chunks = Math.ceil(estimatedTokens / chunkSize);
    return chunks + 1; // +1 for final synthesis
  }
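
Decoupled from `config`, the same arithmetic can be exercised directly. A sketch with an explicit `chunkSize` parameter (illustrative, not the service's actual signature):

```typescript
// Chunking cost model: ~4 characters per token, one LLM call per chunk,
// plus one final synthesis call.
function estimateChunkingApiCalls(textLength: number, chunkSize: number): number {
  const estimatedTokens = Math.ceil(textLength / 4); // rough token estimation
  const chunks = Math.ceil(estimatedTokens / chunkSize);
  return chunks + 1; // +1 for the final synthesis call
}

// 400K characters ≈ 100K tokens; with 16K-token chunks that is 7 chunks + 1 synthesis.
console.log(estimateChunkingApiCalls(400_000, 16_000)); // 8
```

This makes concrete why a long CIM costs 9+ calls under chunking while the RAG path stays at a fixed 6-8 focused queries regardless of document length.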

  /**
   * Calculate quality score based on result completeness
   */
  private calculateQualityScore(chunkingResult: ProcessingResult, ragResult: ProcessingResult): number {
    if (!chunkingResult.success && !ragResult.success) return 0.5;
    if (!chunkingResult.success) return 1.0;
    if (!ragResult.success) return 0.0;

    // Compare summary length and structure
    const chunkingScore = this.analyzeSummaryQuality(chunkingResult.summary);
    const ragScore = this.analyzeSummaryQuality(ragResult.summary);

    return ragScore / (chunkingScore + ragScore);
  }

  /**
   * Analyze summary quality based on length and structure
   */
  private analyzeSummaryQuality(summary: string): number {
    if (!summary) return 0;

    // Check for markdown structure
    const hasHeaders = (summary.match(/#{1,6}\s/g) || []).length;
    const hasLists = (summary.match(/[-*+]\s/g) || []).length;
    const hasBold = (summary.match(/\*\*.*?\*\*/g) || []).length;

    // Length factor (longer summaries tend to be more comprehensive)
    const lengthFactor = Math.min(summary.length / 5000, 1);

    // Structure factor
    const structureFactor = Math.min((hasHeaders + hasLists + hasBold) / 10, 1);

    return (lengthFactor * 0.7) + (structureFactor * 0.3);
  }

  /**
   * Get processing statistics
   */
  async getProcessingStats(): Promise<{
    totalDocuments: number;
    chunkingSuccess: number;
    ragSuccess: number;
    averageProcessingTime: {
      chunking: number;
      rag: number;
    };
    averageApiCalls: {
      chunking: number;
      rag: number;
    };
  }> {
    // This would typically query a database for processing statistics
    // For now, return mock data
    return {
      totalDocuments: 0,
      chunkingSuccess: 0,
      ragSuccess: 0,
      averageProcessingTime: {
        chunking: 0,
        rag: 0
      },
      averageApiCalls: {
        chunking: 0,
        rag: 0
      }
    };
  }

  /**
   * Switch processing strategy for a document
   */
  async switchStrategy(
    documentId: string,
    userId: string,
    text: string,
    newStrategy: 'chunking' | 'rag',
    options: any = {}
  ): Promise<ProcessingResult> {
    logger.info('Switching processing strategy', { documentId, newStrategy });

    return await this.processDocument(documentId, userId, text, {
      ...options,
      strategy: newStrategy
    });
  }
}

export const unifiedDocumentProcessor = new UnifiedDocumentProcessor();
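
The winner-selection rule in `compareProcessingStrategies` reduces to a pure function: success trumps everything, and when both strategies succeed, a weighted score (quality 0.6, time 0.2, API calls 0.2) decides. A sketch of that decision logic (the `pickWinner` name and parameter list are illustrative):

```typescript
type Strategy = 'chunking' | 'rag' | 'tie';

// Positive timeDifference / apiCallDifference means chunking was slower / costlier,
// which counts in favor of RAG (mirroring the comparison code above).
function pickWinner(
  chunkingOk: boolean,
  ragOk: boolean,
  qualityScore: number,       // RAG's share of combined summary quality, in [0, 1]
  timeDifference: number,     // chunking time - RAG time
  apiCallDifference: number   // chunking calls - RAG calls
): Strategy {
  if (ragOk && !chunkingOk) return 'rag';
  if (chunkingOk && !ragOk) return 'chunking';
  if (!ragOk && !chunkingOk) return 'tie';
  const ragScore = qualityScore * 0.6 + (timeDifference > 0 ? 0.2 : 0) + (apiCallDifference > 0 ? 0.2 : 0);
  const chunkingScore = (1 - qualityScore) * 0.6 + (timeDifference < 0 ? 0.2 : 0) + (apiCallDifference < 0 ? 0.2 : 0);
  return ragScore > chunkingScore ? 'rag' : 'chunking';
}

console.log(pickWinner(true, false, 0.5, 0, 0));    // chunking: only it succeeded
console.log(pickWinner(true, true, 0.8, 5000, 3));  // rag: better quality, faster, fewer calls
```

One design consequence: on an exact score tie the `>` comparison favors chunking, the incumbent strategy.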
163
backend/test-rag-processing.js
Normal file
@@ -0,0 +1,163 @@
|
||||
const { ragDocumentProcessor } = require('./dist/services/ragDocumentProcessor');
const { unifiedDocumentProcessor } = require('./dist/services/unifiedDocumentProcessor');

// Sample CIM text for testing
const sampleCIMText = `
EXECUTIVE SUMMARY

Company Overview
ABC Manufacturing is a leading provider of precision manufacturing solutions for the aerospace and defense industries. Founded in 1985, the company has grown to become a trusted partner for major OEMs and Tier 1 suppliers.

Financial Performance
The company has demonstrated consistent growth over the past three years:
- FY-3: Revenue $45M, EBITDA $8.2M (18.2% margin)
- FY-2: Revenue $52M, EBITDA $9.8M (18.8% margin)
- FY-1: Revenue $58M, EBITDA $11.2M (19.3% margin)
- LTM: Revenue $62M, EBITDA $12.1M (19.5% margin)

BUSINESS DESCRIPTION

Core Operations
ABC Manufacturing specializes in precision machining, assembly, and testing of critical aerospace components. The company operates from a 150,000 sq ft facility in Cleveland, Ohio, with state-of-the-art CNC equipment and quality control systems.

Key Products & Services
- Precision machined components (60% of revenue)
- Assembly and testing services (25% of revenue)
- Engineering and design support (15% of revenue)

Customer Base
The company serves major aerospace OEMs including Boeing, Lockheed Martin, and Northrop Grumman. Top 5 customers represent 75% of revenue, with Boeing being the largest at 35%.

MARKET ANALYSIS

Market Size & Growth
The global aerospace manufacturing market is estimated at $850B, growing at 4.2% CAGR. The precision manufacturing segment represents approximately $120B of this market.

Competitive Landscape
Key competitors include:
- Precision Castparts (PCC)
- Arconic
- ATI Metals
- Local and regional precision manufacturers

Competitive Advantages
- Long-term relationships with major OEMs
- AS9100 and NADCAP certifications
- Advanced manufacturing capabilities
- Proximity to major aerospace hubs

FINANCIAL SUMMARY

Revenue Growth Drivers
- Increased defense spending
- Commercial aerospace recovery
- New product development programs
- Geographic expansion

Quality of Earnings
The company has strong, recurring revenue streams with long-term contracts. EBITDA margins have improved consistently due to operational efficiencies and automation investments.

Working Capital
Working capital intensity is moderate at 15% of revenue, with 45-day payment terms from customers and 30-day terms with suppliers.

MANAGEMENT TEAM

Key Leadership
- CEO: John Smith (25 years aerospace experience)
- CFO: Sarah Johnson (15 years manufacturing finance)
- COO: Mike Davis (20 years operations leadership)

Management Quality
The management team has deep industry experience and strong relationships with key customers. All executives have committed to remain post-transaction.

INVESTMENT THESIS

Key Attractions
- Strong market position in growing aerospace sector
- Consistent financial performance and margin expansion
- Long-term customer relationships with major OEMs
- Experienced management team committed to growth
- Strategic location in aerospace manufacturing hub

Value Creation Opportunities
- Geographic expansion to capture additional market share
- Technology investments to improve efficiency and capabilities
- Add-on acquisitions to expand product portfolio
- Operational improvements to further enhance margins

Risks & Considerations
- Customer concentration (75% from top 5 customers)
- Dependence on aerospace industry cycles
- Competition from larger, well-capitalized players
- Regulatory compliance requirements

Alignment with BPCP Strategy
The company fits well within BPCP's focus on 5+MM EBITDA companies in industrial markets. The Cleveland location provides proximity to BPCP's headquarters, and the founder-owned nature aligns with BPCP's preferences.
`;

async function testRAGProcessing() {
  console.log('🚀 Testing RAG Processing Approach');
  console.log('==================================');

  try {
    // Test RAG processing
    console.log('\n📋 Testing RAG Processing...');
    const startTime = Date.now();

    const ragResult = await ragDocumentProcessor.processDocument(sampleCIMText, 'test-doc-001');

    const processingTime = Date.now() - startTime;

    console.log('✅ RAG Processing Results:');
    console.log(`- Success: ${ragResult.success}`);
    console.log(`- Processing Time: ${processingTime}ms`);
    console.log(`- API Calls: ${ragResult.apiCalls}`);
    console.log(`- Error: ${ragResult.error || 'None'}`);

    if (ragResult.success) {
      console.log('\n📊 Analysis Summary:');
      console.log(`- Company: ${ragResult.analysisData.dealOverview?.targetCompanyName || 'N/A'}`);
      console.log(`- Industry: ${ragResult.analysisData.dealOverview?.industrySector || 'N/A'}`);
      console.log(`- Revenue: ${ragResult.analysisData.financialSummary?.financials?.ltm?.revenue || 'N/A'}`);
      console.log(`- EBITDA: ${ragResult.analysisData.financialSummary?.financials?.ltm?.ebitda || 'N/A'}`);
    }

    // Test unified processor with comparison
    console.log('\n🔄 Testing Unified Processor Comparison...');

    const comparisonResult = await unifiedDocumentProcessor.compareProcessingStrategies(
      'test-doc-001',
      'test-user-001',
      sampleCIMText
    );

    console.log('✅ Comparison Results:');
    console.log(`- Winner: ${comparisonResult.winner}`);
    console.log(`- Time Difference: ${comparisonResult.performanceMetrics.timeDifference}ms`);
    console.log(`- API Call Difference: ${comparisonResult.performanceMetrics.apiCallDifference}`);
    console.log(`- Quality Score: ${comparisonResult.performanceMetrics.qualityScore.toFixed(2)}`);

    console.log('\n📈 Performance Summary:');
    console.log('Chunking:');
    console.log(`  - Success: ${comparisonResult.chunking.success}`);
    console.log(`  - Time: ${comparisonResult.chunking.processingTime}ms`);
    console.log(`  - API Calls: ${comparisonResult.chunking.apiCalls}`);

    console.log('RAG:');
    console.log(`  - Success: ${comparisonResult.rag.success}`);
    console.log(`  - Time: ${comparisonResult.rag.processingTime}ms`);
    console.log(`  - API Calls: ${comparisonResult.rag.apiCalls}`);

  } catch (error) {
    console.error('❌ Test failed:', error);
  }
}

// Run the test
testRAGProcessing().then(() => {
  console.log('\n🏁 Test completed');
  process.exit(0);
}).catch(error => {
  console.error('💥 Test failed:', error);
  process.exit(1);
});
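The test script only reads fields from the object returned by `compareProcessingStrategies`. For reference, a minimal mock of that result, shaped after the fields the script logs (the concrete values and the `winner`/`qualityScore` semantics here are hypothetical, useful only for exercising the reporting logic without the compiled `dist/` services):

```javascript
// Hypothetical mock of the comparison result consumed by testRAGProcessing().
// Field names match what the script logs; the numeric values are made up.
const mockComparison = {
  winner: 'rag',
  chunking: { success: true, processingTime: 42000, apiCalls: 9 },
  rag: { success: true, processingTime: 15000, apiCalls: 7 },
  performanceMetrics: {
    timeDifference: 42000 - 15000, // chunking time minus RAG time
    apiCallDifference: 9 - 7,      // 9 sequential chunks vs. 6-8 focused queries
    qualityScore: 0.87,            // placeholder quality metric
  },
};

console.log(`Winner: ${mockComparison.winner}`);
console.log(`Time Difference: ${mockComparison.performanceMetrics.timeDifference}ms`);
console.log(`API Call Difference: ${mockComparison.performanceMetrics.apiCallDifference}`);
console.log(`Quality Score: ${mockComparison.performanceMetrics.qualityScore.toFixed(2)}`);
```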