- Phase 2 plan 4 complete — two scheduled Cloud Function exports added
- SUMMARY.md created with decisions, deviations, and phase readiness notes
- STATE.md updated: phase 2 complete, plan counter at 4/4
- ROADMAP.md updated: phase 2 all 4 plans complete
- Requirements HLTH-03 and INFR-03 marked complete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Created 02-01-SUMMARY.md with full execution documentation
- Updated ROADMAP.md with phase 2 plan progress (2 of 4 plans with summaries)
- Marked requirements ANLY-01 and ANLY-03 complete in REQUIREMENTS.md
- Added 02-01 key decisions to STATE.md
- 9 tests covering all 4 probers and the orchestrator
- Verifies runAllProbes returns 4 ProbeResults with correct service names
- Verifies results persisted via HealthCheckModel.create 4 times
- Verifies one probe failure does not abort other probes
- Verifies a 429 from the LLM probe yields status 'degraded', not 'down'
- Verifies Supabase probe uses getPostgresPool (not PostgREST)
- Verifies Firebase Auth distinguishes expected vs unexpected errors
- Verifies latency_ms is a non-negative number
- Verifies HealthCheckModel.create failure is isolated
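The behaviors these tests pin down can be sketched for a single prober. This is an illustrative sketch, not the actual module API: `ProbeResult`, `probeLLM`, and the injected `callApi` stand in for the real types and the real authenticated call.

```typescript
type ProbeStatus = 'up' | 'degraded' | 'down';

interface ProbeResult {
  service_name: string;
  status: ProbeStatus;
  latency_ms: number;
  error_message: string | null;
}

// callApi stands in for the real authenticated API call.
async function probeLLM(
  callApi: () => Promise<{ httpStatus: number }>
): Promise<ProbeResult> {
  const start = Date.now();
  try {
    const res = await callApi();
    // A 429 means the service is reachable but rate-limited: degraded, not down.
    const status: ProbeStatus = res.httpStatus === 429 ? 'degraded' : 'up';
    return {
      service_name: 'llm_api',
      status,
      latency_ms: Date.now() - start,
      error_message: null,
    };
  } catch (err) {
    // Any thrown error marks the service down, with latency still recorded.
    return {
      service_name: 'llm_api',
      status: 'down',
      latency_ms: Date.now() - start,
      error_message: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Measuring latency with a single `Date.now()` baseline also guarantees the non-negative `latency_ms` the tests assert.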
- Install nodemailer + @types/nodemailer (needed by Plan 03)
- Create healthProbeService.ts with 4 probers: document_ai, llm_api, supabase, firebase_auth
- Each probe makes a real authenticated API call
- Each probe returns structured ProbeResult with status, latency_ms, error_message
- LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5
- Supabase probe uses getPostgresPool().query('SELECT 1') not PostgREST
- Firebase Auth probe distinguishes expected vs unexpected errors
- runAllProbes orchestrator uses Promise.allSettled for fault isolation
- Results persisted via HealthCheckModel.create() after each probe
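The orchestration described above can be sketched as follows. The signatures are assumptions (the real `runAllProbes` persists via `HealthCheckModel.create()`; here a `persist` callback stands in), but the two isolation properties are the ones the plan names: `Promise.allSettled` keeps one failing probe from aborting the rest, and a persistence failure is caught per result.

```typescript
type Probe = () => Promise<{ service_name: string; status: string }>;

async function runAllProbes(
  probes: Probe[],
  persist: (r: { service_name: string; status: string }) => Promise<void>
): Promise<{ service_name: string; status: string }[]> {
  // allSettled isolates faults: one rejected probe never aborts the others.
  const settled = await Promise.allSettled(probes.map((p) => p()));
  const results = settled.map((s, i) =>
    s.status === 'fulfilled'
      ? s.value
      : { service_name: `probe_${i}`, status: 'down' }
  );
  for (const r of results) {
    // Persistence failures are isolated too: log and continue.
    try {
      await persist(r);
    } catch (err) {
      console.error('persist failed for', r.service_name, err);
    }
  }
  return results;
}
```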
- Add service_health_checks table with status CHECK constraint, JSONB probe_details, checked_at column
- Add alert_events table with alert_type and status CHECK constraints, lifecycle timestamps
- Add created_at indexes on both tables (INFR-01 requirement)
- Add composite indexes for common query patterns
- Enable RLS on both tables (service role bypasses RLS per Supabase pattern)
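On the application side, the `service_health_checks` row and its status CHECK constraint can be mirrored in TypeScript so invalid statuses fail before they hit Postgres. The column names follow the migration described above; the exact allowed status values are an assumption.

```typescript
const HEALTH_STATUSES = ['up', 'degraded', 'down'] as const;
type HealthStatus = (typeof HEALTH_STATUSES)[number];

interface ServiceHealthCheckRow {
  id: string;
  status: HealthStatus;                   // enforced by a CHECK constraint in SQL
  probe_details: Record<string, unknown>; // JSONB column
  checked_at: string;                     // timestamptz as ISO string
  created_at: string;                     // indexed per INFR-01
}

// Mirrors the CHECK constraint in application code so bad rows fail fast.
function isHealthStatus(v: string): v is HealthStatus {
  return (HEALTH_STATUSES as readonly string[]).includes(v);
}
```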
Node.js 20 is being decommissioned on 2026-10-30. This upgrades the runtime
to Node.js 22 (LTS), bumps firebase-functions from v6 to v7, removes the
deprecated functions.config() fallback, and aligns the TS target to ES2022.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These are completed implementation plans, one-time analysis artifacts,
and generic guides that no longer reflect the current codebase.
All useful content is either implemented in code or captured in
TODO_AND_OPTIMIZATIONS.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The confirmUpload and inline processing paths were hardcoded to
'document_ai_agentic_rag', ignoring the config setting. Now reads
from config.processingStrategy so the single-pass processor is
actually used when configured.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New processing strategy `single_pass_quality_check` replaces the multi-pass
agentic RAG pipeline (15-25 min) with a streamlined 2-call approach:
1. Full-document LLM extraction (Sonnet) — single call with complete CIM text
2. Delta quality-check (Haiku) — reviews extraction, returns only corrections
Key changes:
- New singlePassProcessor.ts with extraction + quality check flow
- llmService: qualityCheckCIMDocument() with delta-only corrections array
- llmService: improved prompt requiring professional inferences for qualitative
fields instead of defaulting to "Not specified in CIM"
- Removed deterministic financial parser from single-pass flow (LLM outperforms
it — parser matched footnotes and narrative text as financials)
- Default strategy changed to single_pass_quality_check
- Completeness scoring with diagnostic logging of empty fields
Tested on 2 real CIMs: 100% completeness, correct financials, ~150s each.
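The merge step of the two-call flow can be sketched as follows: the quality-check pass returns only corrections, which are applied over the initial extraction. The correction shape (dot-separated field path plus value) is an assumption, not the actual `qualityCheckCIMDocument()` contract.

```typescript
interface Correction {
  field: string; // dot-separated path, e.g. "financials.ltm_revenue"
  value: unknown;
}

function applyCorrections(
  extraction: Record<string, unknown>,
  corrections: Correction[]
): Record<string, unknown> {
  const out = structuredClone(extraction);
  for (const { field, value } of corrections) {
    const path = field.split('.');
    let node: Record<string, unknown> = out;
    // Walk (or create) intermediate objects along the path.
    for (const key of path.slice(0, -1)) {
      if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
      node = node[key] as Record<string, unknown>;
    }
    node[path[path.length - 1]] = value;
  }
  return out;
}
```

Returning only a delta keeps the second (Haiku) call cheap: the reviewer never re-emits fields it agrees with.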
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix invalid model name claude-3-7-sonnet-latest → use config.llm.model
- Increase LLM timeout from 3 min to 6 min for complex CIM analysis
- Improve RAG fallback to use evenly-spaced chunks when keyword matching
finds too few results (prevents sending tiny fragments to LLM)
- Add model name normalization for Claude 4.x family
- Add googleServiceAccount utility for unified credential resolution
- Add Cloud Run log fetching script
- Update default models to Claude 4.6/4.5 family
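The evenly-spaced fallback above can be sketched like this: when keyword matching yields too few chunks, sample the whole document at uniform intervals so the LLM sees representative context instead of tiny fragments. The function name and threshold handling are illustrative, not the actual implementation.

```typescript
function selectChunks<T>(matched: T[], allChunks: T[], minCount: number): T[] {
  // Enough keyword matches: use them as-is.
  if (matched.length >= minCount) return matched;
  // Document smaller than the floor: just take everything.
  if (allChunks.length <= minCount) return allChunks;
  // Otherwise pick minCount chunks at evenly spaced indices across the document.
  const step = allChunks.length / minCount;
  const picked: T[] = [];
  for (let i = 0; i < minCount; i++) {
    picked.push(allChunks[Math.floor(i * step)]);
  }
  return picked;
}
```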
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pre-deploy-check.sh script to validate .env doesn't contain secrets
- Add clean-env-secrets.sh script to remove secrets from .env before deployment
- Update deploy:firebase script to run validation automatically
- Add sync-secrets npm script for local development
- Add deploy:firebase:force for deployments that skip validation
This prevents 'Secret environment variable overlaps non secret environment variable' errors
by ensuring secrets defined via defineSecret() are not also in .env file.
## Completed Todos
- ✅ Test financial extraction with Stax Holding Company CIM - All values correct (FY-3: $64M, FY-2: $71M, FY-1: $71M, LTM: $76M)
- ✅ Implement deterministic parser fallback - Integrated into simpleDocumentProcessor
- ✅ Implement few-shot examples - Added comprehensive examples for PRIMARY table identification
- ✅ Fix primary table identification - Financial extraction now correctly identifies PRIMARY table (millions) vs subsidiary tables (thousands)
## Pending Todos
1. Review older commits (1-2 months ago) to see how financial extraction was working then
- Check commits: 185c780 (Claude 3.7), 5b3b1bf (Document AI fixes), 0ec3d14 (multi-pass extraction)
- Compare prompt simplicity - older versions may have had simpler, more effective prompts
- Check if deterministic parser was being used more effectively
2. Review best practices for structured financial data extraction from PDFs/CIMs
- Research: LLM prompt engineering for tabular data (few-shot examples, chain-of-thought)
- Period identification strategies
- Validation techniques
- Hybrid approaches (deterministic + LLM)
- Error handling patterns
- Check academic papers and industry case studies
3. Determine how to reduce processing time without sacrificing accuracy
- Options:
  1) Use Claude Haiku 4.5 for initial extraction, Sonnet 4.5 for validation
  2) Parallel extraction of different sections
  3) Caching common patterns
  4) Streaming responses
  5) Incremental processing with early validation
  6) Reduce prompt verbosity while maintaining clarity
4. Add unit tests for financial extraction validation logic
- Test: invalid value rejection, cross-period validation, numeric extraction
- Period identification from various formats (years, FY-X, mixed)
- Include edge cases: missing periods, projections mixed with historical, inconsistent formatting
5. Monitor production financial extraction accuracy
- Track: extraction success rate, validation rejection rate, common error patterns
- User feedback on extracted financial data
- Set up alerts for validation failures and extraction inconsistencies
6. Optimize prompt size for financial extraction
- Current prompts may be too verbose
- Test shorter, more focused prompts that maintain accuracy
- Consider: removing redundant instructions, using more concise examples, focusing on critical rules only
7. Add financial data visualization
- Consider adding a financial data preview/validation step in the UI
- Allow users to verify/correct extracted values if needed
- Provides human-in-the-loop validation for critical financial data
8. Document extraction strategies
- Document the different financial table formats found in CIMs
- Create a reference guide for common patterns (years format, FY-X format, mixed format, etc.)
- This will help with prompt engineering and parser improvements
9. Compare RAG-based extraction vs simple full-document extraction for financial accuracy
- Determine which approach produces more accurate financial data and why
- May need a hybrid approach
10. Add confidence scores to financial extraction results
- Flag low-confidence extractions for manual review
- Helps identify when extraction may be incorrect and needs human validation
- Upgrade to Claude Sonnet 4.5 for better accuracy
- Simplify and clarify financial extraction prompts
- Add flexible period identification (years, FY-X, LTM formats)
- Add cross-validation to catch wrong column extraction
- Reject implausibly small values (minimum thresholds for revenue and EBITDA)
- Add monitoring scripts for document processing
- Improve validation to catch inconsistent values across periods
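The flexible period identification mentioned above can be sketched with a small normalizer handling calendar years, FY-X offsets, and LTM labels. The normalized label scheme (`'LTM'`, `'FY-1'`, bare year) is an assumption about the internal convention.

```typescript
function identifyPeriod(label: string): string | null {
  const s = label.trim().toUpperCase();
  // "LTM Jun-24", "Trailing Twelve Months"
  if (/^LTM\b/.test(s) || /TRAILING\s+TWELVE/.test(s)) return 'LTM';
  // "FY-1", "FY 2"
  const fy = s.match(/^FY[-\s]?(\d)$/);
  if (fy) return `FY-${fy[1]}`;
  // "2023", "2024E" (drop estimate/actual suffix)
  const year = s.match(/\b(19|20)\d{2}[EA]?\b/);
  if (year) return year[0].replace(/[EA]$/, '');
  return null;
}
```

Returning `null` for unrecognized headers lets the caller flag a column for review instead of guessing, which supports the cross-validation goal above.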
Replaces single-pass RAG extraction with a 6-pass targeted extraction strategy:
**Pass 1: Metadata & Structure**
- Deal overview fields (company name, industry, geography, employees)
- Targeted RAG query for basic company information
- 20 chunks focused on executive summary and overview sections
**Pass 2: Financial Data**
- All financial metrics (FY-3, FY-2, FY-1, LTM)
- Revenue, EBITDA, margins, cash flow
- 30 chunks with emphasis on financial tables and appendices
- Extracts quality of earnings, capex, working capital
**Pass 3: Market Analysis**
- TAM/SAM market sizing, growth rates
- Competitive landscape and positioning
- Industry trends and barriers to entry
- 25 chunks focused on market and industry sections
**Pass 4: Business & Operations**
- Products/services and value proposition
- Customer and supplier information
- Management team and org structure
- 25 chunks covering business model and operations
**Pass 5: Investment Thesis**
- Strategic analysis and recommendations
- Value creation levers and risks
- Alignment with fund strategy
- 30 chunks for synthesis and high-level analysis
**Pass 6: Validation & Gap-Filling**
- Identifies fields still marked "Not specified in CIM"
- Groups missing fields into logical batches
- Makes targeted RAG queries for each batch
- Dynamic API usage based on gaps found
**Key Improvements:**
- Each pass uses targeted RAG queries optimized for that data type
- Smart merge strategy preserves first non-empty value for each field
- Gap-filling pass catches data missed in initial passes
- Total ~5-10 LLM API calls vs. 1 (controlled cost increase)
- Expected to achieve 95-98% data coverage vs. ~40-50% currently
**Technical Details:**
- Updated processLargeDocument to use generateLLMAnalysisMultiPass
- Added processingStrategy: 'document_ai_multi_pass_rag'
- Each pass includes keyword fallback if RAG search fails
- Deep merge utility prevents "Not specified" from overwriting good data
- Comprehensive logging for debugging each pass
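The smart-merge rule above can be sketched as a deep merge that keeps the first non-empty value for each field and never lets the "Not specified in CIM" placeholder overwrite real data. `deepMerge` and `isEmpty` are illustrative names, not the actual utility's API.

```typescript
const PLACEHOLDER = 'Not specified in CIM';

function isEmpty(v: unknown): boolean {
  return v === undefined || v === null || v === '' || v === PLACEHOLDER;
}

function deepMerge(
  base: Record<string, unknown>,
  incoming: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...base };
  for (const [key, value] of Object.entries(incoming)) {
    const existing = out[key];
    const bothObjects =
      typeof existing === 'object' && existing !== null && !Array.isArray(existing) &&
      typeof value === 'object' && value !== null && !Array.isArray(value);
    if (bothObjects) {
      // Recurse into nested sections so later passes can fill nested gaps.
      out[key] = deepMerge(
        existing as Record<string, unknown>,
        value as Record<string, unknown>
      );
    } else if (isEmpty(existing) && !isEmpty(value)) {
      out[key] = value; // fill gaps only; earlier non-empty values win
    }
  }
  return out;
}
```

Run over the six pass outputs in order, this is what lets the gap-filling pass add data without clobbering anything the earlier passes already extracted.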
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>