cim_summary/TODO_AND_OPTIMIZATIONS.md
admin f4bd60ca38 Fix CIM processing pipeline: embeddings, model refs, and timeouts
- Fix invalid model name claude-3-7-sonnet-latest → use config.llm.model
- Increase LLM timeout from 3 min to 6 min for complex CIM analysis
- Improve RAG fallback to use evenly-spaced chunks when keyword matching
  finds too few results (prevents sending tiny fragments to LLM)
- Add model name normalization for Claude 4.x family
- Add googleServiceAccount utility for unified credential resolution
- Add Cloud Run log fetching script
- Update default models to Claude 4.6/4.5 family

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:33:31 -05:00
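
The evenly-spaced fallback described in the commit could look roughly like the sketch below (function and parameter names are hypothetical, not the actual implementation): when keyword matching returns too few chunks, sample chunks at regular intervals across the whole document so the LLM still sees representative context instead of tiny fragments.

```javascript
// Hypothetical sketch of the RAG fallback: if keyword matching found
// fewer than minResults chunks, fall back to evenly-spaced sampling
// across all chunks of the document.
function selectChunks(keywordMatches, allChunks, minResults = 8) {
  if (keywordMatches.length >= minResults) return keywordMatches;
  const count = Math.min(minResults, allChunks.length);
  const step = allChunks.length / count;
  const sampled = [];
  for (let i = 0; i < count; i++) {
    // Indices 0, step, 2*step, ... cover the document end to end.
    sampled.push(allChunks[Math.floor(i * step)]);
  }
  return sampled;
}
```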

Operational To-Dos & Optimization Backlog

To-Do List (as of 2026-02-23)

  • Wire Firebase Functions secrets: Attach ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, SUPABASE_SERVICE_KEY, SUPABASE_ANON_KEY, DATABASE_URL, EMAIL_PASS, and FIREBASE_SERVICE_ACCOUNT to every deployed function so the runtime no longer depends on local .env values.
  • Set GCLOUD_PROJECT_ID explicitly: Export GCLOUD_PROJECT_ID=cim-summarizer (or the active project) for local scripts and production functions so Document AI processor paths stop defaulting to projects/undefined.
  • Acceptance-test expansion: Add additional CIM/output fixture pairs (beyond Handi Foods) so the automated acceptance suite enforces coverage across diverse deal structures.
  • Backend log hygiene: Keep tailing logs/error.log after each deploy to confirm the service account + Anthropic credential fixes remain in place; document notable findings in deployment notes.
  • Infrastructure deployment checklist: Update DEPLOYMENT_GUIDE.md with the exact Firebase/GCP commands used to fetch secrets and run Sonnet validation so future deploys stay reproducible.
  • Runtime upgrade: Migrate Firebase Functions from Node.js 20 to a supported runtime well before the 2026-10-30 decommission date (warning surfaced during deploy).
  • firebase-functions dependency bump: Upgrade the project to the latest firebase-functions package and address any breaking changes on the next development pass.
  • Document viewer KPIs missing after Project Panther run: In Project Panther - Confidential Information Memorandum_vBluePoint.pdf, Revenue/EBITDA/Employees/Founded surfaced as "Not specified in CIM" even though the CIM has numeric tables. Trace the dealOverview mapper in optimizedAgenticRAGProcessor to ensure summary metrics populate the dashboard cards, and add a regression test for this doc.
  • 10+ minute processing latency regression: The same Project Panther run (doc ID document-55c4a6e2-8c08-4734-87f6-24407cea50ac.pdf) took ~10 minutes end-to-end. Instrument each pipeline phase (PDF chunking, Document AI, RAG passes, financial parser) so we can see where time is lost, then cap slow stages (e.g., GCS upload retries, three Anthropic fallbacks) before the next deploy.
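
The per-phase instrumentation called for in the last item could be a small wrapper like the sketch below (names are hypothetical): each stage (PDF chunking, Document AI, RAG passes, financial parser) runs through the timer, and a summary at the end of a run shows where the ~10 minutes went.

```javascript
// Hypothetical per-phase timer: wraps each pipeline stage, accumulates
// elapsed milliseconds per phase, and reports phases sorted slowest-first.
class PhaseTimer {
  constructor() {
    this.timings = {};
  }
  async run(phase, fn) {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      this.timings[phase] = (this.timings[phase] || 0) + (Date.now() - start);
    }
  }
  summary() {
    return Object.entries(this.timings)
      .sort((a, b) => b[1] - a[1])
      .map(([phase, ms]) => `${phase}: ${ms}ms`);
  }
}
```

Usage would be wrapping each call site, e.g. `await timer.run('documentAi', () => processPdf(doc))`, then logging `timer.summary()` once the run completes so the slow phase is visible in Cloud Run logs.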

Optimization Backlog (ordered by Accuracy → Speed → Cost benefit vs. implementation risk)

  1. Deterministic financial parser enhancements (status: partially addressed). Continue improving token alignment (multi-row tables, negative numbers) to reduce dependence on LLM retries. Risk: low, limited to parser module.
  2. Retrieval gating per Agentic pass. Swap the “top-N chunk blast” with similarity search keyed to each prompt (deal overview, market, thesis). Benefit: higher accuracy + lower token count. Risk: medium; needs robust Supabase RPC fallbacks.
  3. Embedding cache keyed by document checksum. Skip re-embedding when a document/version is unchanged to cut processing time/cost on retries. Risk: medium; requires schema changes to store content hashes.
  4. Field-level validation & dependency checks prior to gap filling. Enforce numeric relationships (e.g., EBITDA margin = EBITDA / Revenue) and re-query only the failing sections. Benefit: accuracy; risk: medium (adds validator & targeted prompts).
  5. Stream Document AI chunks directly into chunker. Avoid writing intermediate PDFs to disk/GCS when splitting >30 page CIMs. Benefit: speed/cost; risk: medium-high because it touches PDF splitting + Document AI integration.
  6. Parallelize independent multi-pass queries (e.g., run Pass 2 and Pass 3 concurrently when quota allows). Benefit: lower latency; risk: medium-high due to Anthropic rate limits & merge ordering.
  7. Expose per-pass metrics via /health/agentic-rag. Surface timing/token/cost data so regressions are visible. Benefit: operational accuracy; risk: low.
  8. Structured comparison harness for CIM outputs. Reuse the acceptance-test fixtures to generate diff reports for human reviewers (baseline vs. new model). Benefit: accuracy guardrail; risk: low once additional fixtures exist.
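
The dependency check in backlog item 4 could be sketched as follows (field names and tolerance are illustrative assumptions, not the actual schema): verify that derived metrics agree with their inputs before gap filling, and return only the fields that need a targeted re-query.

```javascript
// Hypothetical field-level validator: checks that EBITDA margin is
// consistent with EBITDA / Revenue within a tolerance, and reports any
// failing fields so only those sections are re-queried.
function validateFinancials({ revenue, ebitda, ebitdaMargin }, tolerance = 0.02) {
  const failures = [];
  if (revenue != null && ebitda != null && ebitdaMargin != null) {
    const expected = ebitda / revenue;
    if (Math.abs(expected - ebitdaMargin) > tolerance) {
      failures.push({ field: 'ebitdaMargin', expected, actual: ebitdaMargin });
    }
  }
  // Empty array means the fields are mutually consistent; otherwise
  // re-query just the listed fields instead of the whole document.
  return failures;
}
```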