LLM Service Documentation
📄 File Information
File Path: backend/src/services/llmService.ts
File Type: TypeScript
Last Updated: 2024-12-20
Version: 1.0.0
Status: Active
🎯 Purpose & Overview
Primary Purpose: Centralized service for all LLM (Large Language Model) interactions, providing intelligent model selection, prompt engineering, and structured output generation for CIM document analysis.
Business Context: Handles the AI-powered analysis of Confidential Information Memorandums by orchestrating interactions with Claude AI and OpenAI, ensuring optimal model selection, cost management, and quality output generation.
Key Responsibilities:
- Intelligent model selection based on task complexity
- Prompt engineering and system prompt management
- Multi-provider LLM integration (Claude AI, OpenAI)
- Structured output generation and validation
- Cost tracking and optimization
- Error handling and retry logic
- CIM-specific analysis and synthesis
🏗️ Architecture & Dependencies
Dependencies
Internal Dependencies:
- config/env.ts - Environment configuration and API keys
- logger.ts - Structured logging utility
- llmSchemas.ts - CIM review data structure definitions and validation
External Dependencies:
- @anthropic-ai/sdk - Claude AI API client
- openai - OpenAI API client
- zod - TypeScript-first schema validation
Integration Points
- Input Sources: Document text from processing services
- Output Destinations: Structured CIM analysis data, summaries, section analysis
- Event Triggers: Document analysis requests from processing pipeline
- Event Listeners: Analysis completion events, error events
🔧 Implementation Details
Core Functions/Methods
processCIMDocument
/**
* @purpose Main entry point for CIM document processing with intelligent model selection
* @context Called when document analysis is needed with structured output requirements
* @inputs text: string, template: string, analysis?: Record<string, any>
* @outputs CIMAnalysisResult with structured data, cost tracking, and validation
* @dependencies Claude AI/OpenAI APIs, schema validation, cost estimation
* @errors API failures, validation errors, parsing errors
* @complexity O(1) - Single LLM call with comprehensive prompt engineering
*/
Example Usage:
const llmService = new LLMService();
const result = await llmService.processCIMDocument(
  documentText,
  cimTemplate,
  { refinementMode: false, overviewMode: true }
);
callLLM
/**
* @purpose Generic LLM call method with provider abstraction
* @context Called for all LLM interactions regardless of provider
* @inputs request: LLMRequest with prompt and configuration
* @outputs LLMResponse with content and usage metrics
* @dependencies Provider-specific API clients
* @errors API failures, rate limiting, authentication errors
* @complexity O(1) - Direct API call with error handling
*/
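The provider abstraction behind callLLM can be sketched as below. Only callLLM, callAnthropic, and callOpenAI are names from this service; resolveProvider, the handlers map, and the model-prefix routing rule are illustrative assumptions.

```typescript
// Interfaces as documented in the Data Structures section below.
interface LLMRequest {
  prompt: string;
  systemPrompt?: string;
  maxTokens?: number;
  temperature?: number;
  model?: string;
}

interface LLMResponse {
  success: boolean;
  content: string;
  usage?: { promptTokens: number; completionTokens: number; totalTokens: number };
  error?: string;
}

type Provider = "anthropic" | "openai";

// Hypothetical routing rule: pick the provider from the model prefix when a
// model is requested, otherwise fall back to the configured default.
function resolveProvider(model?: string, fallback: Provider = "anthropic"): Provider {
  if (model?.startsWith("claude")) return "anthropic";
  if (model?.startsWith("gpt")) return "openai";
  return fallback;
}

// Generic entry point: dispatch to a provider-specific handler (in the real
// service, callAnthropic or callOpenAI) and normalize failures into an
// error-carrying LLMResponse rather than throwing.
async function callLLM(
  request: LLMRequest,
  handlers: Record<Provider, (r: LLMRequest) => Promise<LLMResponse>>
): Promise<LLMResponse> {
  const provider = resolveProvider(request.model);
  try {
    return await handlers[provider](request);
  } catch (err) {
    return { success: false, content: "", error: (err as Error).message };
  }
}
```

Keeping the dispatch behind one function is what lets the fallback strategy swap providers without touching callers.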
callAnthropic
/**
* @purpose Claude AI specific API interactions
* @context Called when using Claude AI as the LLM provider
* @inputs request: LLMRequest with Claude-specific parameters
* @outputs LLMResponse with Claude AI response and token usage
* @dependencies @anthropic-ai/sdk
* @errors Claude API failures, rate limiting, model errors
* @complexity O(1) - Direct Claude API call
*/
callOpenAI
/**
* @purpose OpenAI specific API interactions
* @context Called when using OpenAI as the LLM provider
* @inputs request: LLMRequest with OpenAI-specific parameters
* @outputs LLMResponse with OpenAI response and token usage
* @dependencies openai
* @errors OpenAI API failures, rate limiting, model errors
* @complexity O(1) - Direct OpenAI API call
*/
Data Structures
LLMRequest
interface LLMRequest {
  prompt: string;        // Main prompt text
  systemPrompt?: string; // System prompt for context
  maxTokens?: number;    // Maximum tokens for response
  temperature?: number;  // Response creativity (0-1 for Claude, 0-2 for OpenAI)
  model?: string;        // Specific model to use
}
LLMResponse
interface LLMResponse {
  success: boolean;              // Request success status
  content: string;               // LLM response content
  usage?: {                      // Token usage metrics
    promptTokens: number;        // Input tokens used
    completionTokens: number;    // Output tokens used
    totalTokens: number;         // Total tokens used
  };
  error?: string;                // Error message if failed
}
CIMAnalysisResult
interface CIMAnalysisResult {
  success: boolean;                // Analysis success status
  jsonOutput?: CIMReview;          // Structured analysis data
  error?: string;                  // Error message if failed
  model: string;                   // Model used for analysis
  cost: number;                    // Estimated cost in USD
  inputTokens: number;             // Input tokens consumed
  outputTokens: number;            // Output tokens consumed
  validationIssues?: z.ZodIssue[]; // Schema validation issues
}
Configuration
// Key configuration options
const LLM_CONFIG = {
  provider: 'anthropic' as 'anthropic' | 'openai', // LLM provider selection
  defaultModel: 'claude-3-opus-20240229',          // Default model for provider
  maxTokens: 3500,                                 // Default max tokens
  temperature: 0.1,                                // Default temperature
  promptBuffer: 500,                               // Buffer for prompt engineering
  retryAttempts: 3,                                // Number of retry attempts
  costThreshold: 5.0,                              // Cost threshold per request (USD)
};
📊 Data Flow
Input Processing
- Task Analysis: Determine task complexity and requirements
- Model Selection: Select optimal model based on complexity and tokens
- Prompt Engineering: Build appropriate prompt based on analysis type
- System Prompt: Generate context-appropriate system prompt
- Parameter Optimization: Optimize temperature, tokens, and other parameters
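The task-analysis and model-selection steps above might look like the following sketch. The token thresholds and model choices here are illustrative assumptions, not the service's actual criteria; the real logic lives in the selectModel method.

```typescript
type Complexity = "simple" | "standard" | "complex";

// Hypothetical complexity assessment: refinement passes and very large inputs
// get the strongest model; thresholds are illustrative.
function assessComplexity(inputTokens: number, refinementMode: boolean): Complexity {
  if (refinementMode || inputTokens > 50_000) return "complex";
  if (inputTokens > 10_000) return "standard";
  return "simple";
}

// Map complexity onto a model tier; the specific model IDs are assumptions.
function selectModel(complexity: Complexity): string {
  switch (complexity) {
    case "complex":
      return "claude-3-opus-20240229"; // highest quality, highest cost
    case "standard":
      return "claude-3-sonnet-20240229"; // balanced quality and cost
    case "simple":
      return "claude-3-haiku-20240307"; // fast and cheap
  }
}
```

Routing simple tasks to cheaper models is where most of the cost savings described later come from.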
Processing Pipeline
- Provider Selection: Route to appropriate provider (Claude/OpenAI)
- API Call: Execute LLM API call with retry logic
- Response Processing: Extract and validate response content
- JSON Parsing: Parse structured output from response
- Schema Validation: Validate output against CIM review schema
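The JSON-parsing step above has to cope with replies that wrap the JSON in a markdown fence or surrounding prose, a common failure mode. This is a hypothetical sketch of that extraction, not the service's actual parser; the regex writes the backtick as \u0060 only to keep the example readable here.

```typescript
// Match a fenced code block (optionally tagged "json") and capture its body.
const FENCED_JSON = /\u0060{3}(?:json)?\s*([\s\S]*?)\u0060{3}/;

function extractJson(raw: string): unknown | null {
  // Prefer an explicit fenced block if the model emitted one; otherwise take
  // the outermost brace-delimited span.
  const fenced = raw.match(FENCED_JSON);
  const candidate = fenced
    ? fenced[1]
    : raw.slice(raw.indexOf("{"), raw.lastIndexOf("}") + 1);
  try {
    return JSON.parse(candidate);
  } catch {
    return null; // caller retries with explicit JSON-formatting instructions
  }
}
```

A null result here is what triggers the PARSING_ERROR retry path described under Error Handling.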
Output Generation
- Content Extraction: Extract structured data from LLM response
- Cost Calculation: Calculate and track API usage costs
- Validation: Validate output against expected schema
- Error Handling: Handle parsing and validation errors
- Result Formatting: Format final analysis result
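The cost-calculation step could take roughly this shape. The per-1K-token rates below are assumptions for illustration; production code should read rates from a maintained pricing table rather than hard-coding them.

```typescript
// Illustrative per-1K-token prices in USD (input / output).
const PRICE_PER_1K: Record<string, { input: number; output: number }> = {
  "claude-3-opus-20240229": { input: 0.015, output: 0.075 },
  "claude-3-haiku-20240307": { input: 0.00025, output: 0.00125 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const rate = PRICE_PER_1K[model];
  if (!rate) return 0; // unknown model: report zero rather than guess
  return (inputTokens / 1000) * rate.input + (outputTokens / 1000) * rate.output;
}
```

The result feeds the cost field of CIMAnalysisResult and the costThreshold check in the configuration.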
Data Transformations
Document Text → Task Analysis → Model Selection → Prompt Engineering → LLM Response
Analysis Requirements → Prompt Strategy → System Context → Structured Output
Raw Response → JSON Parsing → Schema Validation → Validated Data
🚨 Error Handling
Error Types
/**
* @errorType API_ERROR
* @description LLM API call failed due to network or service issues
* @recoverable true
* @retryStrategy exponential_backoff
* @userMessage "LLM service temporarily unavailable"
*/
/**
* @errorType RATE_LIMIT_ERROR
* @description API rate limit exceeded
* @recoverable true
* @retryStrategy exponential_backoff
* @userMessage "Rate limit exceeded, retrying shortly"
*/
/**
* @errorType VALIDATION_ERROR
* @description LLM response failed schema validation
* @recoverable true
* @retryStrategy retry_with_different_prompt
* @userMessage "Response validation failed, retrying with improved prompt"
*/
/**
* @errorType PARSING_ERROR
* @description Failed to parse JSON from LLM response
* @recoverable true
* @retryStrategy retry_with_json_formatting
* @userMessage "Response parsing failed, retrying with JSON formatting"
*/
Error Recovery
- API Errors: Implement exponential backoff and retry logic
- Rate Limit Errors: Respect rate limits and implement backoff
- Validation Errors: Retry with improved prompts and formatting
- Parsing Errors: Retry with explicit JSON formatting instructions
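The exponential-backoff recovery for API and rate-limit errors can be sketched as below. backoffDelayMs and withRetry are hypothetical names, and the base delay and cap are assumptions; the attempt count mirrors retryAttempts in the configuration above.

```typescript
// Delay doubles per attempt (attempt is 0-based): 1s, 2s, 4s, ... capped.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry a failing async operation, sleeping between attempts, and rethrow
// the last error once attempts are exhausted.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(i)));
    }
  }
  throw lastError;
}
```

In practice many providers return a Retry-After hint on rate-limit responses, which should take precedence over the computed delay.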
Fallback Strategies
- Primary Strategy: Claude AI with comprehensive prompts
- Fallback Strategy: OpenAI with similar prompts
- Degradation Strategy: Simplified analysis with basic prompts
🧪 Testing
Test Coverage
- Unit Tests: 95% - Core LLM interaction logic and prompt engineering
- Integration Tests: 90% - End-to-end LLM processing workflows
- Performance Tests: API response time and cost optimization
Test Data
/**
* @testData sample_cim_text.txt
* @description Standard CIM document text for testing
* @size 10KB
* @sections Financial, Market, Management
* @expectedOutput Valid CIMReview with all sections populated
*/
/**
* @testData complex_cim_text.txt
* @description Complex CIM document for model selection testing
* @size 50KB
* @sections Comprehensive business analysis
* @expectedOutput Complex analysis with appropriate model selection
*/
/**
* @testData malformed_response.json
* @description Malformed LLM response for error handling testing
* @size 2KB
* @format Invalid JSON structure
* @expectedOutput Proper error handling and retry logic
*/
Mock Strategy
- External APIs: Mock Claude AI and OpenAI responses
- Schema Validation: Mock validation scenarios and error cases
- Cost Tracking: Mock token usage and cost calculations
📈 Performance Characteristics
Performance Metrics
- Average Response Time: 10-30 seconds per LLM call
- Token Usage: 1000-4000 tokens per analysis
- Cost per Analysis: $0.01-$0.10 per document
- Success Rate: 95%+ with retry logic
- Validation Success: 90%+ with prompt engineering
Optimization Strategies
- Model Selection: Intelligent model selection based on task complexity
- Prompt Engineering: Optimized prompts for better response quality
- Cost Management: Token usage optimization and cost tracking
- Caching: Cache similar requests to reduce API calls
- Batch Processing: Process multiple sections in single requests
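The caching strategy above could take roughly this shape: a response cache keyed on a hash of model and prompt, so identical analyses skip a second API call. ResponseCache and its one-hour TTL are illustrative assumptions, not the service's actual implementation.

```typescript
import { createHash } from "node:crypto";

class ResponseCache {
  private store = new Map<string, { value: string; expires: number }>();
  constructor(private readonly ttlMs = 60 * 60 * 1000) {}

  // Hash model and prompt together; the NUL separator prevents key collisions
  // between e.g. ("ab", "c") and ("a", "bc").
  private key(model: string, prompt: string): string {
    return createHash("sha256").update(model).update("\u0000").update(prompt).digest("hex");
  }

  get(model: string, prompt: string, now = Date.now()): string | undefined {
    const entry = this.store.get(this.key(model, prompt));
    if (!entry || entry.expires < now) return undefined;
    return entry.value;
  }

  set(model: string, prompt: string, value: string, now = Date.now()): void {
    this.store.set(this.key(model, prompt), { value, expires: now + this.ttlMs });
  }
}
```

Injecting the clock (the now parameter) keeps expiry logic testable without real waiting.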
Scalability Limits
- API Rate Limits: Respect provider-specific rate limits
- Cost Limits: Maximum cost per request and daily limits
- Token Limits: Maximum input/output token limits per model
- Concurrent Requests: Limit concurrent API calls
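Rate and concurrency limits like those above are often enforced client-side with a token bucket. This sketch is a hypothetical illustration, not the service's actual limiter; capacity and refill rate are assumptions.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,     // burst size
    private readonly refillPerSec: number, // sustained request rate
    now = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a request may proceed now; refills lazily based on
  // elapsed time so no background timer is needed.
  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A caller that gets false would queue the request or fall back to the backoff path under Error Handling.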
🔍 Debugging & Monitoring
Logging
/**
* @logging Structured logging with detailed LLM interaction metrics
* @levels debug, info, warn, error
* @correlation Request ID and model tracking
* @context Prompt engineering, model selection, cost tracking, validation
*/
Debug Tools
- Prompt Analysis: Detailed prompt engineering and system prompt analysis
- Model Selection: Model selection logic and reasoning
- Cost Tracking: Detailed cost analysis and optimization
- Response Validation: Schema validation and error analysis
Common Issues
- API Failures: Monitor API health and implement proper retry logic
- Rate Limiting: Implement proper rate limiting and backoff strategies
- Validation Errors: Improve prompt engineering for better response quality
- Cost Optimization: Monitor and optimize token usage and model selection
🔐 Security Considerations
Input Validation
- Text Content: Sanitization of input text for prompt injection prevention
- API Keys: Secure storage and rotation of API keys
- Request Validation: Validation of all input parameters
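As an illustration of the input-sanitization point above, a minimal pre-filter might look like this. A real prompt-injection defense needs far more than this sketch; the 200,000-character cap is an assumption.

```typescript
function sanitizeForPrompt(text: string, maxChars = 200_000): string {
  return text
    // Strip control characters (keeping tab, newline, carriage return).
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "")
    // Enforce a hard length cap so one document cannot exhaust the context.
    .slice(0, maxChars);
}
```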
Authentication & Authorization
- API Access: Secure access to LLM provider APIs
- Key Management: Secure API key management and rotation
- Request Logging: Secure logging of requests and responses
Data Protection
- Text Processing: Secure handling of sensitive document content
- Response Storage: Secure storage of LLM responses and analysis
- Cost Tracking: Secure tracking and reporting of API usage costs
📚 Related Documentation
Internal References
- optimizedAgenticRAGProcessor.ts - Uses this service for LLM analysis
- llmSchemas.ts - CIM review data structure definitions
- config/env.ts - Environment configuration and API keys
- logger.ts - Structured logging utility
🔄 Change History
Recent Changes
- 2024-12-20 - Implemented intelligent model selection and cost tracking - [Author]
- 2024-12-15 - Added comprehensive prompt engineering and validation - [Author]
- 2024-12-10 - Implemented multi-provider support (Claude AI, OpenAI) - [Author]
Planned Changes
- Advanced prompt engineering improvements - 2025-01-15
- Multi-language support for international documents - 2025-01-30
- Enhanced cost optimization and caching - 2025-02-15
📋 Usage Examples
Basic Usage
import { LLMService } from './llmService';

const llmService = new LLMService();
const result = await llmService.processCIMDocument(
  documentText,
  cimTemplate
);

if (result.success) {
  console.log('Analysis completed:', result.jsonOutput);
  console.log('Cost:', result.cost, 'USD');
  console.log('Tokens used:', result.inputTokens + result.outputTokens);
} else {
  console.error('Analysis failed:', result.error);
}
Advanced Usage
import { LLMService } from './llmService';

const llmService = new LLMService();
const result = await llmService.processCIMDocument(
  documentText,
  cimTemplate,
  {
    refinementMode: true,
    overviewMode: false,
    sectionType: 'financial'
  }
);

// Monitor detailed metrics
console.log('Model used:', result.model);
console.log('Input tokens:', result.inputTokens);
console.log('Output tokens:', result.outputTokens);
console.log('Total cost:', result.cost, 'USD');
Error Handling
try {
  const result = await llmService.processCIMDocument(
    documentText,
    cimTemplate
  );

  if (!result.success) {
    logger.error('LLM analysis failed', {
      error: result.error,
      model: result.model,
      cost: result.cost
    });
  }

  if (result.validationIssues) {
    logger.warn('Validation issues found', {
      issues: result.validationIssues
    });
  }
} catch (error) {
  logger.error('Unexpected error during LLM analysis', {
    error: error instanceof Error ? error.message : String(error)
  });
}
🎯 LLM Agent Notes
Key Understanding Points
- This service is the central hub for all LLM interactions in the system
- Implements intelligent model selection based on task complexity
- Provides comprehensive prompt engineering for different analysis types
- Handles multi-provider support (Claude AI, OpenAI) with fallback logic
- Includes cost tracking and optimization for API usage
Common Modifications
- Adding new providers - Implement new provider methods and update selection logic
- Modifying prompt engineering - Update prompt building methods for different analysis types
- Adjusting model selection - Modify selectModel method for different complexity criteria
- Enhancing validation - Extend schema validation and error handling
- Optimizing costs - Adjust cost thresholds and token optimization strategies
Integration Patterns
- Strategy Pattern - Different providers and models for different tasks
- Factory Pattern - Creating different types of prompts and system contexts
- Observer Pattern - Cost tracking and performance monitoring
- Chain of Responsibility - Retry logic and fallback strategies
This documentation provides comprehensive information about the LLM service, enabling LLM agents to understand its purpose, implementation, and usage patterns for effective code evaluation and modification.