Files
cim_summary/docs/PROMPT_ENGINEERING_ANALYSIS.md
admin e406d33074 Refactor: Codebase cleanup and modularization
- Remove outdated documentation files (7 files)
- Remove deprecated code (database.ts, authController.ts, auth.ts)
- Extract constants to backend/src/config/constants.ts
- Consolidate shared types (processing, llm, document, job)
- Create LLM modularization structure:
  - llmPrompts/ directory for prompt builders
  - llmProviders/ directory for provider implementations
  - llmUtils/ directory for utility functions
- Extract common error handling patterns to errorHandlers.ts
- Organize scripts into subdirectories (monitoring/, testing/, debugging/, setup/)
- Update README.md with current documentation references

All functionality preserved, structure improved for maintainability.
2025-11-11 06:52:10 -05:00

22 KiB

Prompt Engineering Deep Dive Analysis

CIM Document Processing System

Analysis Date: 2025-01-XX
Analyst: AI Prompt Engineering Specialist
Scope: 12 prompt constructs across 2 core service files


Executive Summary

This analysis identifies 15 specific, actionable recommendations to optimize AI prompts across the CIM processing system. Recommendations are prioritized by success likelihood and implementation difficulty, targeting improvements in:

  1. Financial Accuracy: 100% correctness of extracted financial data
  2. Data Completeness: All key data extracted (management, customers, KPIs)
  3. Format & Readability: Correct, easy-to-read, programmatically parsable JSON
  4. Processing Efficiency: Fastest possible processing and response
  5. Insight Quality: World-class PE investor-level analysis

Key Findings:

  • Strong foundation with comprehensive validation frameworks
  • Opportunities for enhanced few-shot examples and cross-validation
  • Potential for improved RAG query specificity and dynamic instruction generation
  • Need for more structured PE investment framework integration

Analysis Methodology

Each prompt construct was evaluated against:

  • Current effectiveness for the 5 objectives
  • Specific weaknesses and gaps
  • Improvement opportunities with success likelihood vs implementation difficulty
  • Recommended enhancements prioritized by impact

Detailed Recommendations

QUICK WINS (High Success, Low Difficulty)

Recommendation 1: Add Financial Table Detection Examples with Edge Cases

Location: buildFinancialPrompt (llmService.ts:2460-2845)

Current State:

  • Has 10 few-shot examples covering various formats
  • Missing examples for: multi-table scenarios, conflicting data, partial tables, merged cells

Proposed Improvement: Add 3-5 additional few-shot examples covering:

  • Multiple tables with conflicting values (how to identify PRIMARY)
  • Tables with merged cells or irregular formatting
  • Partial tables (only 2-3 periods available)
  • Tables with footnotes containing critical adjustments
  • Pro forma vs historical side-by-side comparison

Success Likelihood: High (90%)
Implementation Difficulty: Low (2-3 hours)
Expected Impact:

  • Financial Accuracy: +5-8% (better handling of edge cases)
  • Data Completeness: +2-3% (fewer "Not specified" for valid data)
  • Format & Readability: Neutral
  • Processing Efficiency: Neutral
  • Insight Quality: Neutral

Code Reference: Lines 2721-2792 in llmService.ts


Recommendation 2: Enhance JSON Template with Inline Validation Hints

Location: buildCIMPrompt (llmService.ts:1069-1361)

Current State:

  • JSON template has format comments but lacks validation hints
  • No examples of correct vs incorrect values

Proposed Improvement: Add inline validation examples to JSON template:

"revenue": "Revenue amount for FY-3", // Format: "$XX.XM" (e.g., "$64.2M"). Must be $10M+ for target companies. If <$10M, likely wrong table.
"revenueGrowth": "N/A (baseline year)", // Format: "XX.X%" or "N/A". Calculate if not provided: ((FY-2 - FY-3) / FY-3) * 100

Success Likelihood: High (85%)
Implementation Difficulty: Low (1-2 hours)
Expected Impact:

  • Financial Accuracy: +3-5% (clearer format expectations)
  • Format & Readability: +5-7% (better format consistency)
  • Data Completeness: +1-2%
  • Processing Efficiency: Neutral
  • Insight Quality: Neutral

Code Reference: Lines 1069-1361 in llmService.ts


Recommendation 3: Add Explicit Format Standardization Examples

Location: buildCIMPrompt (llmService.ts:1395-1410)

Current State:

  • Format requirements are stated but lack concrete examples
  • No examples of incorrect formats to avoid

Proposed Improvement: Add "DO/DON'T" format examples:

**Currency Values**:
✓ CORRECT: "$64.2M", "$1.2B", "$20.5M" (from thousands)
✗ INCORRECT: "$64,200,000", "$64M revenue", "64.2 million"

**Percentages**:
✓ CORRECT: "29.3%", "(4.4)%" (negative)
✗ INCORRECT: "29.3 percent", "29.3", "-4.4%"

Success Likelihood: High (88%)
Implementation Difficulty: Low (1 hour)
Expected Impact:

  • Format & Readability: +8-10% (dramatically better format consistency)
  • Financial Accuracy: +2-3% (fewer parsing errors)
  • Processing Efficiency: +2-3% (less post-processing needed)
  • Data Completeness: Neutral
  • Insight Quality: Neutral

Code Reference: Lines 1395-1410 in llmService.ts


Recommendation 4: Enhance Cross-Table Validation Instructions

Location: buildFinancialPrompt (llmService.ts:2562-2584)

Current State:

  • Cross-table validation is mentioned but lacks step-by-step process
  • No specific validation rules for discrepancies

Proposed Improvement: Add structured cross-validation workflow:

**Step 5: Cross-Table Validation (CRITICAL)**
1. Extract from PRIMARY table first
2. Check executive summary for key metrics (revenue, EBITDA)
3. If discrepancy >10%, investigate:
   - Is executive summary using adjusted/pro forma numbers?
   - Is PRIMARY table using different period definitions?
   - Which source is more authoritative? (Usually detailed table)
4. Document any discrepancies in qualityOfEarnings field
5. Use PRIMARY table as authoritative source unless executive summary explicitly states adjustments

Success Likelihood: High (87%)
Implementation Difficulty: Low (2 hours)
Expected Impact:

  • Financial Accuracy: +6-9% (better handling of discrepancies)
  • Data Completeness: +2-3% (captures adjustments)
  • Format & Readability: Neutral
  • Processing Efficiency: Neutral
  • Insight Quality: +1-2% (better quality of earnings notes)

Code Reference: Lines 2562-2584 in llmService.ts


HIGH-IMPACT IMPROVEMENTS (High Success, Medium Difficulty)

Recommendation 5: Implement Multi-Pass Financial Validation

Location: extractPass1CombinedMetadataFinancial + new validation method

Current State:

  • Financial extraction happens in single pass
  • Validation occurs within the prompt but not systematically

Proposed Improvement: Add post-extraction validation pass:

  1. After Pass 1 extraction, run validation check
  2. If validation fails (magnitude, trends, calculations), trigger targeted re-extraction
  3. Use focused prompt asking LLM to re-check specific periods/metrics
  4. Compare results and flag discrepancies

Success Likelihood: High (82%)
Implementation Difficulty: Medium (6-8 hours)
Expected Impact:

  • Financial Accuracy: +10-15% (catches errors before final output)
  • Data Completeness: +3-5% (fills gaps found during validation)
  • Format & Readability: +2-3%
  • Processing Efficiency: -5-8% (additional pass adds time)
  • Insight Quality: +2-3%

Code Reference: New method needed in optimizedAgenticRAGProcessor.ts after line 1562


Recommendation 6: Enhance RAG Query with Field-Specific Semantic Boosts

Location: createCIMAnalysisQuery (optimizedAgenticRAGProcessor.ts:634-678)

Current State:

  • Priority weighting exists but is generic
  • Semantic specificity is good but could be more targeted

Proposed Improvement: Add field-specific semantic boost patterns:

**FINANCIAL DATA SEMANTIC BOOSTS** (Weight: 15/10 for financial chunks):
- Boost: "historical financial performance table", "income statement", "P&L statement"
- Boost: "revenue for FY-3 FY-2 FY-1 LTM", "EBITDA margin percentage"
- Boost: "trailing twelve months", "fiscal year end", "last twelve months"
- Penalize: "projected", "forecast", "budget", "plan" (unless explicitly historical)

**MARKET DATA SEMANTIC BOOSTS** (Weight: 12/10 for market chunks):
- Boost: "total addressable market TAM", "serviceable addressable market SAM"
- Boost: "market share percentage", "competitive positioning", "market leader"
- Boost: "compound annual growth rate CAGR", "market growth rate"

Success Likelihood: High (80%)
Implementation Difficulty: Medium (4-5 hours)
Expected Impact:

  • Data Completeness: +5-8% (better chunk retrieval)
  • Financial Accuracy: +3-5% (more relevant context)
  • Processing Efficiency: +3-5% (fewer irrelevant chunks)
  • Format & Readability: Neutral
  • Insight Quality: +2-4% (better context for analysis)

Code Reference: Lines 634-678 in optimizedAgenticRAGProcessor.ts


Recommendation 7: Add Dynamic Few-Shot Example Selection

Location: buildCIMPrompt + new helper method

Current State:

  • Fixed set of 10 financial examples
  • Examples don't adapt to document characteristics

Proposed Improvement: Create dynamic example selection based on detected document characteristics:

  • If document has fiscal year end different from calendar: include fiscal year examples
  • If document has thousands format: include conversion examples
  • If document has only 2-3 periods: include partial period examples
  • If document has pro forma tables: include pro forma vs historical examples

Success Likelihood: High (78%)
Implementation Difficulty: Medium (5-6 hours)
Expected Impact:

  • Financial Accuracy: +8-12% (examples match document format)
  • Data Completeness: +3-5%
  • Format & Readability: +2-3%
  • Processing Efficiency: Neutral (selection is fast)
  • Insight Quality: Neutral

Code Reference: New helper method in llmService.ts, modify buildCIMPrompt around line 1226


Recommendation 8: Enhance Gap-Filling Query with Field-Specific Inference Rules

Location: createGapFillingQuery (optimizedAgenticRAGProcessor.ts:2626-2750)

Current State:

  • Has inference rules but they're generic
  • Missing field-specific calculation rules

Proposed Improvement: Add comprehensive field-specific inference rules:

**FINANCIAL FIELD INFERENCE RULES**:
- revenueGrowth: If revenue for 2 periods available, calculate: ((Current - Prior) / Prior) * 100
- ebitdaMargin: If revenue and EBITDA available, calculate: (EBITDA / Revenue) * 100
- grossMargin: If revenue and grossProfit available, calculate: (Gross Profit / Revenue) * 100
- CAGR: If multiple periods available, calculate: ((End/Start)^(1/Periods) - 1) * 100

**MARKET FIELD INFERENCE RULES**:
- Market share: If TAM and company revenue available, calculate: (Revenue / TAM) * 100
- Market growth: If TAM for 2 periods available, calculate growth rate

**BUSINESS FIELD INFERENCE RULES**:
- Customer concentration: If top customers mentioned, sum percentages
- Recurring revenue %: If MRR/ARR and total revenue available, calculate percentage

Success Likelihood: High (85%)
Implementation Difficulty: Medium (4-5 hours)
Expected Impact:

  • Data Completeness: +8-12% (calculates missing derived fields)
  • Financial Accuracy: +3-5% (validates through calculation)
  • Format & Readability: +1-2%
  • Processing Efficiency: Neutral
  • Insight Quality: +2-3%

Code Reference: Lines 2725-2734 in optimizedAgenticRAGProcessor.ts


Recommendation 9: Add PE Investment Framework Scoring Template

Location: extractPass5InvestmentThesis (optimizedAgenticRAGProcessor.ts:1897-2067)

Current State:

  • BPCP alignment scoring exists but lacks detailed scoring rubric
  • No examples of high vs low scores

Proposed Improvement: Add detailed scoring rubric with examples:

**BPCP ALIGNMENT SCORING RUBRIC** (1-10 scale):

1. **EBITDA Fit** (Target: 5-20MM):
   - 10: $5-20MM EBITDA, perfect fit
   - 8: $3-5MM or $20-30MM, good fit with growth potential
   - 5: $1-3MM or $30-50MM, acceptable but outside sweet spot
   - 3: <$1MM or >$50MM, poor fit

2. **Industry Fit** (Consumer/Industrial):
   - 10: Pure consumer or industrial, core focus
   - 8: Mixed consumer/industrial, good fit
   - 5: Adjacent sector (e.g., healthcare services), acceptable
   - 3: Outside focus (e.g., tech, healthcare), poor fit

[Continue for all 7 criteria with specific examples]

Success Likelihood: High (83%)
Implementation Difficulty: Medium (3-4 hours)
Expected Impact:

  • Insight Quality: +10-15% (more consistent, quantitative scoring)
  • Data Completeness: +2-3% (ensures all criteria scored)
  • Format & Readability: +3-5% (standardized scores)
  • Financial Accuracy: Neutral
  • Processing Efficiency: Neutral

Code Reference: Lines 1994-2005 in optimizedAgenticRAGProcessor.ts


STRATEGIC ENHANCEMENTS (High Success, High Difficulty)

Recommendation 10: Implement Multi-Pass Cross-Validation System

Location: New validation service + integration points

Current State:

  • Each pass extracts independently
  • No systematic cross-validation between passes

Proposed Improvement: Create validation service that:

  1. After all passes complete, runs cross-validation checks
  2. Identifies inconsistencies (e.g., company name differs, financials don't match)
  3. Triggers targeted re-extraction for inconsistent fields
  4. Maintains validation log for debugging

Success Likelihood: High (75%)
Implementation Difficulty: High (12-15 hours)
Expected Impact:

  • Financial Accuracy: +12-18% (catches cross-pass inconsistencies)
  • Data Completeness: +5-8% (fills gaps found during validation)
  • Format & Readability: +3-5%
  • Processing Efficiency: -8-12% (additional validation pass)
  • Insight Quality: +5-7%

Code Reference: New file: backend/src/services/crossValidationService.ts


Recommendation 11: Add Context-Aware Prompt Adaptation

Location: buildEnhancedExtractionInstructions + document analysis

Current State:

  • Dynamic instructions exist but are rule-based
  • Doesn't adapt to document-specific patterns

Proposed Improvement: Add document pattern detection and adaptive prompts:

  1. Analyze document structure (sections, table locations, format patterns)
  2. Detect document "type" (e.g., "bank-prepared CIM", "company-prepared", "auction process")
  3. Adapt prompts based on detected patterns:
    • Bank-prepared: Emphasize executive summary cross-reference
    • Company-prepared: Emphasize narrative text extraction
    • Auction: Emphasize competitive positioning

Success Likelihood: Medium-High (70%)
Implementation Difficulty: High (10-12 hours)
Expected Impact:

  • Financial Accuracy: +8-12% (better extraction for document type)
  • Data Completeness: +6-10% (targets right sections)
  • Format & Readability: +2-3%
  • Processing Efficiency: +5-8% (more targeted extraction)
  • Insight Quality: +4-6%

Code Reference: Enhance buildEnhancedExtractionInstructions in optimizedAgenticRAGProcessor.ts:2194-2322


Recommendation 12: Implement Confidence Scoring and Uncertainty Handling

Location: New confidence scoring system + prompt enhancements

Current State:

  • Confidence scoring mentioned in getFinancialSystemPrompt but not used
  • No systematic uncertainty handling

Proposed Improvement:

  1. Add confidence scores to extraction output (High/Medium/Low)
  2. For Low confidence fields, trigger targeted re-extraction
  3. Add uncertainty indicators to JSON output
  4. Use confidence scores to prioritize gap-filling

Success Likelihood: Medium-High (72%)
Implementation Difficulty: High (10-12 hours)
Expected Impact:

  • Financial Accuracy: +10-15% (flags uncertain extractions)
  • Data Completeness: +5-8% (targeted re-extraction)
  • Format & Readability: +2-3% (uncertainty indicators)
  • Processing Efficiency: -5-10% (additional passes for low confidence)
  • Insight Quality: +3-5%

Code Reference: New method in llmService.ts, modify schema in llmSchemas.ts


Recommendation 13: Add PE Investment Thesis Template with Examples

Location: extractPass5InvestmentThesis (optimizedAgenticRAGProcessor.ts:1897-2067)

Current State:

  • Framework exists but lacks concrete examples
  • No "good vs bad" investment thesis examples

Proposed Improvement: Add comprehensive investment thesis template with examples:

**EXAMPLE: HIGH-QUALITY INVESTMENT THESIS**

Key Attractions:
1. Market-leading position with 25% market share in $2.5B TAM, providing pricing power and competitive moat. Revenue grew 15% CAGR over 3 years to $64M, demonstrating strong execution. This market position supports 2-3x revenue growth potential through geographic expansion and product line extensions.

[Continue with 4-7 more examples showing specificity, quantification, and investment impact]

**EXAMPLE: LOW-QUALITY INVESTMENT THESIS (AVOID)**

Key Attractions:
1. Strong market position. [TOO VAGUE - lacks specificity, quantification, investment impact]
2. Good management team. [TOO GENERIC - no details, no track record, no investment significance]

Success Likelihood: High (88%)
Implementation Difficulty: Medium-High (6-8 hours)
Expected Impact:

  • Insight Quality: +15-20% (dramatically better investment thesis quality)
  • Data Completeness: +3-5% (ensures all required elements)
  • Format & Readability: +5-7% (consistent structure)
  • Financial Accuracy: Neutral
  • Processing Efficiency: Neutral

Code Reference: Lines 1901-2067 in optimizedAgenticRAGProcessor.ts


Recommendation 14: Enhance List Field Repair with Document-Specific Context

Location: repairListField (optimizedAgenticRAGProcessor.ts:2832-3000)

Current State:

  • Uses first 5 chunks for context (4000 chars)
  • Doesn't prioritize most relevant chunks

Proposed Improvement:

  1. Use RAG to find most relevant chunks for the specific field being repaired
  2. Increase context to 6000-8000 chars for better understanding
  3. Add field-specific chunk prioritization (e.g., for "risks", prioritize risk sections)
  4. Include examples of high-quality items from similar fields

Success Likelihood: High (80%)
Implementation Difficulty: Medium (5-6 hours)
Expected Impact:

  • Insight Quality: +8-12% (better list item quality)
  • Data Completeness: +3-5% (more comprehensive lists)
  • Format & Readability: +2-3%
  • Financial Accuracy: Neutral
  • Processing Efficiency: -2-3% (more context processing)

Code Reference: Lines 2841-2926 in optimizedAgenticRAGProcessor.ts


Recommendation 15: Add Structured Extraction Workflow with Checkpoints

Location: buildCIMPrompt (llmService.ts:1365-1394)

Current State:

  • Workflow exists but is linear
  • No validation checkpoints

Proposed Improvement: Add checkpoint-based workflow:

**Phase 1: Document Structure Analysis** [CHECKPOINT: Verify sections identified]
1. Identify document sections...
2. Locate key sections...
[VALIDATION: If <5 sections found, expand search]

**Phase 2: Financial Data Extraction** [CHECKPOINT: Validate financial table found]
1. Locate PRIMARY historical financial table
[VALIDATION: If revenue <$10M, search for alternative table]
2. Extract financial metrics...
[VALIDATION: Verify magnitude, trends, calculations]

Success Likelihood: Medium-High (75%)
Implementation Difficulty: High (8-10 hours)
Expected Impact:

  • Financial Accuracy: +10-15% (catches errors at checkpoints)
  • Data Completeness: +5-8% (expands search when needed)
  • Format & Readability: +2-3%
  • Processing Efficiency: -5-8% (additional validation steps)
  • Insight Quality: +3-5%

Code Reference: Lines 1365-1394 in llmService.ts


Prioritized Implementation Roadmap

Phase 1: Quick Wins (Week 1-2)

Total Effort: 6-8 hours
Expected Impact: +15-25% improvement across objectives

  1. Recommendation 3: Add Explicit Format Standardization Examples
  2. Recommendation 2: Enhance JSON Template with Inline Validation Hints
  3. Recommendation 1: Add Financial Table Detection Examples with Edge Cases
  4. Recommendation 4: Enhance Cross-Table Validation Instructions

Phase 2: High-Impact Improvements (Week 3-4)

Total Effort: 22-28 hours
Expected Impact: +25-40% improvement across objectives

  1. Recommendation 6: Enhance RAG Query with Field-Specific Semantic Boosts
  2. Recommendation 8: Enhance Gap-Filling Query with Field-Specific Inference Rules
  3. Recommendation 9: Add PE Investment Framework Scoring Template
  4. Recommendation 7: Add Dynamic Few-Shot Example Selection
  5. Recommendation 13: Add PE Investment Thesis Template with Examples

Phase 3: Strategic Enhancements (Week 5-8)

Total Effort: 40-50 hours
Expected Impact: +30-50% improvement across objectives

  1. Recommendation 14: Enhance List Field Repair with Document-Specific Context
  2. Recommendation 5: Implement Multi-Pass Financial Validation
  3. Recommendation 10: Implement Multi-Pass Cross-Validation System
  4. Recommendation 11: Add Context-Aware Prompt Adaptation
  5. Recommendation 12: Implement Confidence Scoring and Uncertainty Handling
  6. Recommendation 15: Add Structured Extraction Workflow with Checkpoints

Success Metrics

Baseline (Current State)

  • Financial Accuracy: ~85-90% (estimated)
  • Data Completeness: ~80-85% (estimated)
  • Format Consistency: ~75-80% (estimated)
  • Processing Speed: Baseline
  • Investment Quality: ~7/10 (estimated)

Target (After All Recommendations)

  • Financial Accuracy: >99% (validated against manual review)
  • Data Completeness: >95% (excluding truly unavailable data)
  • Format Consistency: >98% (adherence to format specifications)
  • Processing Speed: <30% increase (despite improvements)
  • Investment Quality: >8.5/10 (investment committee feedback)

Risk Assessment

Low Risk (Recommendations 1-4, 6-9, 13-14)

  • Well-defined scope
  • Clear implementation path
  • Low chance of breaking existing functionality
  • Easy to roll back if needed

Medium Risk (Recommendations 5, 10, 11, 15)

  • More complex implementation
  • May require architectural changes
  • Testing required to ensure no regressions
  • May impact processing time

High Risk (Recommendation 12)

  • Requires schema changes
  • May impact downstream systems
  • Requires comprehensive testing
  • Most complex to implement

Conclusion

This analysis identifies 15 specific, actionable recommendations to optimize AI prompts across the CIM processing system. The recommendations are prioritized by success likelihood and implementation difficulty, with a clear roadmap for implementation over 8 weeks.

Key Takeaways:

  1. Quick wins can deliver 15-25% improvement with minimal effort
  2. High-impact improvements can deliver 25-40% improvement with moderate effort
  3. Strategic enhancements can deliver 30-50% improvement but require significant effort

Recommended Approach:

  • Start with Phase 1 (Quick Wins) to build momentum
  • Validate improvements with real CIM documents
  • Iterate based on results before moving to Phase 2
  • Consider Phase 3 enhancements based on business priorities and resource availability

All recommendations include specific code references, success likelihood assessments, and expected impact on the 5 core objectives.