# Operational To-Dos & Optimization Backlog
## To-Do List (as of 2026-02-23)
- **Wire Firebase Functions secrets**: Attach `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, `SUPABASE_SERVICE_KEY`, `SUPABASE_ANON_KEY`, `DATABASE_URL`, `EMAIL_PASS`, and `FIREBASE_SERVICE_ACCOUNT` to every deployed function so the runtime no longer depends on local `.env` values.
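
A minimal sketch of the secret wiring, using the `firebase-functions` v2 params API. The function name `processCim` and the subset of secrets shown are illustrative, not the project's actual exports:

```typescript
// Declare each secret once and attach it to every deployed function so the
// runtime reads Secret Manager instead of local .env values.
// NOTE: `processCim` is a hypothetical function name for illustration.
import { onRequest } from "firebase-functions/v2/https";
import { defineSecret } from "firebase-functions/params";

const anthropicKey = defineSecret("ANTHROPIC_API_KEY");
const supabaseServiceKey = defineSecret("SUPABASE_SERVICE_KEY");

export const processCim = onRequest(
  { secrets: [anthropicKey, supabaseServiceKey] },
  (req, res) => {
    // Secret values are only readable inside the handler at runtime.
    const hasKey = anthropicKey.value().length > 0;
    res.status(200).send(`anthropic key loaded: ${hasKey}`);
  }
);
```

Each secret listed in the to-do (`OPENAI_API_KEY`, `DATABASE_URL`, etc.) would get its own `defineSecret` declaration and be added to the `secrets` array of every function that needs it.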
- **Set `GCLOUD_PROJECT_ID` explicitly**: Export `GCLOUD_PROJECT_ID=cim-summarizer` (or the active project) for local scripts and production functions so Document AI processor paths stop defaulting to `projects/undefined`.
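
A fail-fast helper removes the `projects/undefined` failure mode entirely; `processorPath` and its parameters are hypothetical names for illustration, not the codebase's actual API:

```typescript
// Hypothetical helper: build a Document AI processor path, throwing when
// GCLOUD_PROJECT_ID is unset instead of silently emitting "projects/undefined".
function processorPath(location: string, processorId: string): string {
  const projectId = process.env.GCLOUD_PROJECT_ID;
  if (!projectId) {
    throw new Error(
      "GCLOUD_PROJECT_ID is not set; run `export GCLOUD_PROJECT_ID=cim-summarizer` first"
    );
  }
  return `projects/${projectId}/locations/${location}/processors/${processorId}`;
}
```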
- **Acceptance-test expansion**: Add additional CIM/output fixture pairs (beyond Handi Foods) so the automated acceptance suite enforces coverage across diverse deal structures.
- **Backend log hygiene**: Keep tailing `logs/error.log` after each deploy to confirm the service account + Anthropic credential fixes remain in place; document notable findings in deployment notes.
- **Infrastructure deployment checklist**: Update `DEPLOYMENT_GUIDE.md` with the exact Firebase/GCP commands used to fetch secrets and run Sonnet validation so future deploys stay reproducible.
- **Runtime upgrade**: Migrate Firebase Functions from Node.js 20 to a supported runtime well before the 2026‑10‑30 decommission date (warning surfaced during deploy).
- **`firebase-functions` dependency bump**: Upgrade the project to the latest `firebase-functions` package and address any breaking changes on the next development pass.
- **Document viewer KPIs missing after Project Panther run**: `Project Panther - Confidential Information Memorandum_vBluePoint.pdf` → `Revenue/EBITDA/Employees/Founded` surfaced as "Not specified in CIM" even though the CIM has numeric tables. Trace `optimizedAgenticRAGProcessor` → `dealOverview` mapper to ensure summary metrics populate the dashboard cards and add a regression test for this doc.
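
The mapper fix could distinguish "field genuinely absent" from "field lost in mapping". A sketch of the guard, with illustrative type and field names (the real mapper lives in `optimizedAgenticRAGProcessor` → `dealOverview`):

```typescript
// Hypothetical shape of the metrics the dashboard cards read.
interface ExtractedMetrics {
  revenue?: string;
  ebitda?: string;
  employees?: string;
  founded?: string;
}

const NOT_SPECIFIED = "Not specified in CIM";

// Only fall back to NOT_SPECIFIED when the field is genuinely absent; treat
// empty/whitespace strings from a failed parse the same as missing values.
function toDealOverviewCard(metrics: ExtractedMetrics): Record<string, string> {
  const pick = (v?: string) =>
    v && v.trim().length > 0 ? v.trim() : NOT_SPECIFIED;
  return {
    Revenue: pick(metrics.revenue),
    EBITDA: pick(metrics.ebitda),
    Employees: pick(metrics.employees),
    Founded: pick(metrics.founded),
  };
}
```

A regression test for the Project Panther document would assert that none of the four card values equal `NOT_SPECIFIED` after a full pipeline run.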
- **10+ minute processing latency regression**: The same Project Panther run (doc ID `document-55c4a6e2-8c08-4734-87f6-24407cea50ac.pdf`) took ~10 minutes end-to-end. Instrument each pipeline phase (PDF chunking, Document AI, RAG passes, financial parser) so we can see where time is lost, then cap slow stages (e.g., GCS upload retries, three Anthropic fallbacks) before the next deploy.
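
The per-phase instrumentation could be as small as a timing wrapper around each stage; the phase names and accumulator shape here are illustrative, not the processor's real internals:

```typescript
// Accumulates wall-clock milliseconds per pipeline phase so a log line can
// show where the ~10 minutes go (chunking vs. Document AI vs. RAG passes).
type PhaseTimings = Record<string, number>;

async function timePhase<T>(
  name: string,
  timings: PhaseTimings,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    // Record elapsed time even if the phase throws, so failures still show up.
    timings[name] = (timings[name] ?? 0) + (Date.now() - start);
  }
}
```

Wrapping each stage (`timePhase("documentAi", timings, () => runDocumentAi(doc))`) would yield a single timings object to log at the end of the run.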
## Optimization Backlog (ordered by Accuracy → Speed → Cost benefit vs. implementation risk)
1. **Deterministic financial parser enhancements** (status: partially addressed). Continue improving token alignment (multi-row tables, negative numbers) to reduce dependence on LLM retries. Risk: low, limited to parser module.
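
The negative-number case in particular is mechanical; a sketch of deterministic token parsing that handles accounting-style negatives and thousands separators without any LLM involvement (function name is illustrative):

```typescript
// Parse financial tokens like "1,234", "(567)" (accounting negative), and
// "$12.3" into numbers; return null for non-numeric tokens such as "n/a".
function parseFinancialToken(token: string): number | null {
  const t = token.trim();
  // Accounting-style negatives: "(1,234)" means -1234.
  const negative = t.startsWith("(") && t.endsWith(")");
  const inner = (negative ? t.slice(1, -1) : t).replace(/[$,\s]/g, "");
  if (!/^-?\d+(\.\d+)?$/.test(inner)) return null;
  const value = parseFloat(inner);
  return negative ? -value : value;
}
```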
2. **Retrieval gating per Agentic pass**. Swap the “top-N chunk blast” with similarity search keyed to each prompt (deal overview, market, thesis). Benefit: higher accuracy + lower token count. Risk: medium; needs robust Supabase RPC fallbacks.
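
A sketch of the gating idea: score every chunk against the embedding of each pass's prompt and keep only the top k, instead of sending the same top-N blast to all passes. Embeddings are plain `number[]` here; in production they would come from the embedding model via the Supabase RPC:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Return the ids of the k chunks most similar to this pass's prompt embedding.
function topKForPass(
  passEmbedding: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number
): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(passEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.id);
}
```

Each pass (deal overview, market, thesis) would call `topKForPass` with its own prompt embedding, so the token budget is spent on chunks relevant to that prompt.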
3. **Embedding cache keyed by document checksum**. Skip re-embedding when a document/version is unchanged to cut processing time/cost on retries. Risk: medium; requires schema changes to store content hashes.
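
A sketch of the checksum gate, assuming a stored `content_hash` column (which is the schema change this item calls for — it does not exist yet):

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the document bytes; stable across retries of the same version.
function contentChecksum(buf: Buffer): string {
  return createHash("sha256").update(buf).digest("hex");
}

// Re-embed only when the stored hash differs from the current content's hash.
function needsReembedding(storedHash: string | null, buf: Buffer): boolean {
  return storedHash !== contentChecksum(buf);
}
```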
4. **Field-level validation & dependency checks prior to gap filling**. Enforce numeric relationships (e.g., EBITDA margin = EBITDA / Revenue) and re-query only the failing sections. Benefit: accuracy; risk: medium (adds validator & targeted prompts).
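
The dependency check could return only the inconsistent field names, so the gap-filling step re-queries just those sections. Field names and the tolerance are illustrative:

```typescript
// Hypothetical extracted financials; margin is as reported, e.g. 21.5 for 21.5%.
interface Financials {
  revenue: number;
  ebitda: number;
  ebitdaMarginPct: number;
}

// Flag fields whose values are mutually inconsistent beyond a tolerance
// (in percentage points), so only those sections get re-queried.
function inconsistentFields(f: Financials, tolPct = 1.0): string[] {
  const issues: string[] = [];
  if (f.revenue !== 0) {
    const impliedMargin = (f.ebitda / f.revenue) * 100;
    if (Math.abs(impliedMargin - f.ebitdaMarginPct) > tolPct) {
      issues.push("ebitdaMarginPct");
    }
  }
  return issues;
}
```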
5. **Stream Document AI chunks directly into chunker**. Avoid writing intermediate PDFs to disk/GCS when splitting >30 page CIMs. Benefit: speed/cost; risk: medium-high because it touches PDF splitting + Document AI integration.
6. **Parallelize independent multi-pass queries** (e.g., run Pass 2 and Pass 3 concurrently when quota allows). Benefit: lower latency; risk: medium-high due to Anthropic rate limits & merge ordering.
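
The merge-ordering half of the risk is tractable because `Promise.all` resolves in input order regardless of completion order; a sketch, with stand-in pass runners in place of the real Anthropic calls:

```typescript
// Run independent passes concurrently; results come back in the same order
// as the `passes` array, so downstream merging stays deterministic.
async function runPassesConcurrently(
  passes: Array<() => Promise<string>>
): Promise<string[]> {
  return Promise.all(passes.map((p) => p()));
}
```

Rate limiting is the remaining half: the real implementation would need a concurrency cap (or token-bucket) in front of the Anthropic client before fanning out.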
7. **Expose per-pass metrics via `/health/agentic-rag`**. Surface timing/token/cost data so regressions are visible. Benefit: operational accuracy; risk: low.
8. **Structured comparison harness for CIM outputs**. Reuse the acceptance-test fixtures to generate diff reports for human reviewers (baseline vs. new model). Benefit: accuracy guardrail; risk: low once additional fixtures exist.