- Add pre-deploy-check.sh script to validate that .env does not contain secrets
- Add clean-env-secrets.sh script to remove secrets from .env before deployment
- Update deploy:firebase script to run validation automatically
- Add sync-secrets npm script for local development
- Add deploy:firebase:force script for deployments that skip validation

This prevents 'Secret environment variable overlaps non secret environment variable' errors by ensuring that secrets defined via defineSecret() are not also set in the .env file.
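
A minimal TypeScript sketch of the overlap check and cleanup the two scripts perform. The real scripts are shell; this only illustrates the logic. It assumes .env holds simple `KEY=value` lines (optionally prefixed with `export `) and that the secret names are the same ones passed to `defineSecret()`. `findSecretOverlaps`, `stripSecrets` and the `ANTHROPIC_API_KEY` name are hypothetical.

```typescript
// Hypothetical helpers mirroring what pre-deploy-check.sh and clean-env-secrets.sh do.
// Assumes simple KEY=value lines, optionally prefixed with "export ".

/** Returns the variable name defined on a .env line, or null for blanks and comments. */
function envKey(line: string): string | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null;
  const body = trimmed.startsWith("export ") ? trimmed.slice(7).trim() : trimmed;
  const eq = body.indexOf("=");
  return eq > 0 ? body.slice(0, eq).trim() : null;
}

/** Lists secret names that are also defined in the .env text (the overlap behind the error). */
function findSecretOverlaps(envText: string, secretNames: string[]): string[] {
  const secrets = new Set(secretNames);
  const found = new Set<string>();
  for (const line of envText.split(/\r?\n/)) {
    const key = envKey(line);
    if (key !== null && secrets.has(key)) found.add(key);
  }
  return Array.from(found);
}

/** Returns the .env text with every line that defines a secret removed. */
function stripSecrets(envText: string, secretNames: string[]): string {
  const secrets = new Set(secretNames);
  return envText
    .split(/\r?\n/)
    .filter((line) => {
      const key = envKey(line);
      return key === null || !secrets.has(key);
    })
    .join("\n");
}

// Usage: report the overlap (blocking the deploy) and show the cleaned file.
const envText = "PROJECT_ID=demo\nANTHROPIC_API_KEY=sk-test\n# local only\n";
const overlaps = findSecretOverlaps(envText, ["ANTHROPIC_API_KEY"]);
if (overlaps.length > 0) {
  console.log(`Secrets also defined in .env: ${overlaps.join(", ")}`);
  console.log(stripSecrets(envText, ["ANTHROPIC_API_KEY"]));
}
```

Comment lines are left untouched by `stripSecrets`, so a commented-out secret does not count as an overlap.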
## Completed Todos
- ✅ Test financial extraction with Stax Holding Company CIM - All values correct (FY-3: $64M, FY-2: $71M, FY-1: $71M, LTM: $76M)
- ✅ Implement deterministic parser fallback - Integrated into simpleDocumentProcessor
- ✅ Implement few-shot examples - Added comprehensive examples for PRIMARY table identification
- ✅ Fix primary table identification - Financial extraction now correctly identifies PRIMARY table (millions) vs subsidiary tables (thousands)
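
A sketch of one way to separate a PRIMARY table reported in millions from a subsidiary table in thousands: read the unit label from the table caption and normalize values to $M. The caption patterns and the `detectScale`/`toMillions` helpers are assumptions for illustration, not the logic in simpleDocumentProcessor.

```typescript
// Hypothetical unit-scale detection for financial tables; the patterns are illustrative.
type Scale = "millions" | "thousands" | "unknown";

/** Guesses the reporting scale from a caption such as "($ in millions)" or "($000s)". */
function detectScale(caption: string): Scale {
  const text = caption.toLowerCase();
  if (/\bmillions?\b|\$\s*mm?\b/.test(text)) return "millions";
  if (/\bthousands?\b|\$\s*000s?\b|\$\s*k\b/.test(text)) return "thousands";
  return "unknown";
}

/** Converts a value to millions, or returns null when the scale is unknown. */
function toMillions(value: number, scale: Scale): number | null {
  if (scale === "millions") return value;
  if (scale === "thousands") return value / 1000;
  return null; // leave unknown scales to validation or manual review
}

// A table in thousands and a table in millions normalize to the same figure.
console.log(toMillions(71000, detectScale("($ in thousands)"))); // 71
console.log(toMillions(71, detectScale("(Dollars in millions)"))); // 71
```

Returning null for an unlabeled table, rather than guessing, keeps a silent 1000x error out of the results.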
## Pending Todos
1. Review older commits (1-2 months old) to see how financial extraction worked then
   - Check commits: 185c780 (Claude 3.7), 5b3b1bf (Document AI fixes), 0ec3d14 (multi-pass extraction)
   - Compare prompt simplicity: older versions may have had simpler, more effective prompts
   - Check whether the deterministic parser was being used more effectively
2. Review best practices for structured financial data extraction from PDFs/CIMs
   - Research LLM prompt engineering for tabular data (few-shot examples, chain-of-thought)
   - Period identification strategies
   - Validation techniques
   - Hybrid approaches (deterministic + LLM)
   - Error-handling patterns
   - Check academic papers and industry case studies
3. Determine how to reduce processing time without sacrificing accuracy. Options:
   1. Use Claude Haiku 4.5 for initial extraction and Sonnet 4.5 for validation
   2. Extract different sections in parallel
   3. Cache common patterns
   4. Stream responses
   5. Process incrementally with early validation
   6. Reduce prompt verbosity while maintaining clarity
4. Add unit tests for financial extraction validation logic
   - Test invalid value rejection, cross-period validation, and numeric extraction
   - Test period identification from various formats (years, FY-X, mixed)
   - Include edge cases: missing periods, projections mixed with historical data, inconsistent formatting
5. Monitor production financial extraction accuracy
   - Track extraction success rate, validation rejection rate, and common error patterns
   - Collect user feedback on extracted financial data
   - Set up alerts for validation failures and extraction inconsistencies
6. Optimize prompt size for financial extraction
   - Current prompts may be too verbose
   - Test shorter, more focused prompts that maintain accuracy
   - Consider removing redundant instructions, using more concise examples, and keeping only the critical rules
7. Add financial data visualization
   - Consider adding a financial data preview/validation step in the UI
   - Allow users to verify and correct extracted values when needed
   - This provides human-in-the-loop validation for critical financial data
8. Document extraction strategies
   - Document the different financial table formats found in CIMs
   - Create a reference guide for common patterns (years format, FY-X format, mixed format, etc.)
   - This will help with prompt engineering and parser improvements
9. Compare RAG-based extraction with simple full-document extraction for financial accuracy
   - Determine which approach produces more accurate financial data, and why
   - A hybrid approach may be needed
10. Add confidence scores to financial extraction results
    - Flag low-confidence extractions for manual review
    - Helps identify when an extraction may be incorrect and needs human validation
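
Options 1-3 under item 3 can be combined in one sketch: sections run in parallel, each tries a cheaper model first and escalates to a stronger one only when a validation check fails (a variant of option 1, which uses the stronger model for validation), and identical section text is served from a cache. `ModelCall`, `Validator` and the stub models are hypothetical; no SDK calls are shown.

```typescript
// Hypothetical tiered, parallel, cached extraction. Model calls are injected stubs,
// e.g. a Haiku 4.5 call as `fast` and a Sonnet 4.5 call as `strong`.
type Extraction = Record<string, number>;
type ModelCall = (sectionText: string) => Promise<Extraction>;
type Validator = (result: Extraction) => boolean;

function extractSection(
  text: string,
  fast: ModelCall,
  strong: ModelCall,
  isValid: Validator,
  cache: Map<string, Promise<Extraction>>,
): Promise<Extraction> {
  const cached = cache.get(text);
  if (cached) return cached;
  // Try the cheaper model first; escalate only when its result fails validation.
  const pending = fast(text).then((first) => (isValid(first) ? first : strong(text)));
  // Cache the promise so duplicate sections running concurrently share one call.
  cache.set(text, pending);
  // Drop failed entries so a later retry is not served a cached rejection.
  pending.catch(() => cache.delete(text));
  return pending;
}

/** Extracts all sections concurrently; identical section text is only processed once. */
function extractAll(
  sections: string[],
  fast: ModelCall,
  strong: ModelCall,
  isValid: Validator,
): Promise<Extraction[]> {
  const cache = new Map<string, Promise<Extraction>>();
  return Promise.all(sections.map((s) => extractSection(s, fast, strong, isValid, cache)));
}

// Usage with stub models: sections containing "bad" fail validation and escalate.
const fastStub: ModelCall = async (t) => (t.includes("bad") ? { revenue: -1 } : { revenue: 1 });
const strongStub: ModelCall = async () => ({ revenue: 2 });
extractAll(["good", "bad", "good"], fastStub, strongStub, (r) => r.revenue > 0).then((r) =>
  console.log(r),
);
```

Caching the promise rather than the resolved value is what lets the second "good" section reuse the first call even though both start at the same time.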
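
Item 4 is easiest if validation is a set of pure functions with no LLM calls. The sketch below assumes a revenue-like series in $M that must be positive and treats a period-over-period change above 50% as suspicious, which also catches a thousands-vs-millions mix-up. `parseAmount`, `validateSeries` and the thresholds are illustrative, not the project's current rules.

```typescript
// Hypothetical validation helpers for one revenue-like series in $M (values must be positive).
interface PeriodValue {
  period: string;       // e.g. "FY-3", "FY-2", "FY-1", "LTM"
  value: number | null; // null when the period could not be extracted
}

/** Parses strings such as "$64.2M", "(1,200)" or "71" into numbers; null if unparseable. */
function parseAmount(raw: string): number | null {
  let text = raw.trim();
  const negative = /^\(.*\)$/.test(text) || text.startsWith("-");
  text = text.replace(/^[(-]+|\)$/g, "").replace(/[$,]/g, "").replace(/\s*mm?$/i, "");
  if (!/^\d+(\.\d+)?$/.test(text)) return null;
  const n = Number(text);
  return negative ? -n : n;
}

/** Returns human-readable problems; an empty array means the series passed. */
function validateSeries(series: PeriodValue[], maxChange = 0.5): string[] {
  const problems: string[] = [];
  for (const { period, value } of series) {
    if (value === null) problems.push(`${period}: missing value`);
    else if (!Number.isFinite(value) || value <= 0) problems.push(`${period}: invalid value ${value}`);
  }
  // Cross-period check: flag implausible jumps between consecutive periods.
  for (let i = 1; i < series.length; i++) {
    const prev = series[i - 1].value;
    const cur = series[i].value;
    if (prev !== null && cur !== null && prev > 0 && Math.abs(cur - prev) / prev > maxChange) {
      problems.push(`${series[i - 1].period} -> ${series[i].period}: change exceeds ${maxChange * 100}%`);
    }
  }
  return problems;
}

// The Stax values listed under Completed Todos pass; a scale mix-up would not.
const stax: PeriodValue[] = [
  { period: "FY-3", value: 64 },
  { period: "FY-2", value: 71 },
  { period: "FY-1", value: 71 },
  { period: "LTM", value: 76 },
];
console.log(validateSeries(stax)); // []
```

A 50% limit is deliberately loose; it targets unit and table-selection errors, not ordinary growth.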
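
For item 5, success and rejection rates can come from per-document outcome counters, with an alert once enough documents have been seen and the rejection rate crosses a threshold. A minimal in-memory sketch; the outcome categories, the 20% threshold and the 20-document minimum are assumptions, and `ExtractionMonitor` is a hypothetical name.

```typescript
// Hypothetical in-memory tracker; a real deployment would export these counts to a metrics backend.
type Outcome = "success" | "validation_rejected" | "extraction_failed";

class ExtractionMonitor {
  private readonly counts: Record<Outcome, number> = { success: 0, validation_rejected: 0, extraction_failed: 0 };
  private readonly errorPatterns = new Map<string, number>();
  private readonly alertThreshold: number;
  private readonly minSamples: number;

  constructor(alertThreshold = 0.2, minSamples = 20) {
    this.alertThreshold = alertThreshold; // rejection rate that triggers an alert
    this.minSamples = minSamples; // avoid alerting on a handful of documents
  }

  record(outcome: Outcome, errorPattern?: string): void {
    this.counts[outcome] += 1;
    if (errorPattern !== undefined) {
      this.errorPatterns.set(errorPattern, (this.errorPatterns.get(errorPattern) ?? 0) + 1);
    }
  }

  total(): number {
    return this.counts.success + this.counts.validation_rejected + this.counts.extraction_failed;
  }

  successRate(): number {
    return this.total() === 0 ? 0 : this.counts.success / this.total();
  }

  rejectionRate(): number {
    return this.total() === 0 ? 0 : this.counts.validation_rejected / this.total();
  }

  shouldAlert(): boolean {
    return this.total() >= this.minSamples && this.rejectionRate() > this.alertThreshold;
  }

  /** Most frequent error patterns, highest count first. */
  topErrors(limit = 3): Array<[string, number]> {
    return Array.from(this.errorPatterns.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit);
  }
}

const monitor = new ExtractionMonitor(0.2, 5);
monitor.record("success");
monitor.record("success");
monitor.record("success");
monitor.record("validation_rejected", "subsidiary table picked (thousands)");
monitor.record("validation_rejected", "subsidiary table picked (thousands)");
console.log(monitor.rejectionRate(), monitor.shouldAlert()); // 0.4 true
```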
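
Whether a shorter prompt (item 6) or a different pipeline (item 9) maintains accuracy can be measured by scoring each variant against known-correct values, such as the Stax figures under Completed Todos. A sketch assuming each variant returns a period-to-value map in $M; the 0.5% tolerance, `scoreExtraction`, and the variant outputs are made up for illustration.

```typescript
// Hypothetical scorer for comparing prompt or pipeline variants against known-correct values.
type PeriodMap = Record<string, number>;

interface Score {
  correct: number;
  total: number;
  mismatches: string[];
}

/** Counts expected periods whose extracted value is within a relative tolerance. */
function scoreExtraction(expected: PeriodMap, actual: PeriodMap, tolerance = 0.005): Score {
  const mismatches: string[] = [];
  let correct = 0;
  const periods = Object.keys(expected);
  for (const period of periods) {
    const want = expected[period];
    const got = period in actual ? actual[period] : undefined;
    if (got !== undefined && Math.abs(got - want) <= Math.abs(want) * tolerance) {
      correct += 1;
    } else {
      mismatches.push(`${period}: expected ${want}, got ${got ?? "missing"}`);
    }
  }
  return { correct, total: periods.length, mismatches };
}

// Ground truth from the Stax Holding Company CIM test under Completed Todos ($M).
const staxExpected: PeriodMap = { "FY-3": 64, "FY-2": 71, "FY-1": 71, LTM: 76 };

// Made-up variant outputs, only to show how a regression would surface.
const variants: Array<[string, PeriodMap]> = [
  ["current prompt", { "FY-3": 64, "FY-2": 71, "FY-1": 71, LTM: 76 }],
  ["shorter prompt", { "FY-3": 64, "FY-2": 71, "FY-1": 71000 }],
];
for (const [label, output] of variants) {
  const s = scoreExtraction(staxExpected, output);
  console.log(`${label}: ${s.correct}/${s.total}`, s.mismatches);
}
```

One CIM is a thin fixture; the same scorer can run over every document with hand-checked values.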
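
The reference guide in item 8 could be paired with a small classifier that labels a header row as years format, FY-X format, or mixed. A sketch assuming column labels such as 2022, FY2022, FY-1, LTM or 2025E; `parseLabel`, `classifyHeader`, and treating E/P/F suffixes as projections are assumptions, not a documented CIM convention.

```typescript
// Hypothetical header classifier; the label patterns are assumptions, not an inventory of real CIMs.
type HeaderFormat = "years" | "fy-relative" | "mixed" | "unknown";

interface PeriodLabel {
  raw: string;
  kind: "year" | "fy-relative" | "ltm" | "unknown";
  year?: number; // for labels such as "2022" or "FY2022"
  offset?: number; // for labels such as "FY-2" (offset -2)
  projected: boolean; // "E", "P" or "F" suffix, e.g. "2025E"
}

function parseLabel(raw: string): PeriodLabel {
  const t = raw.trim().toUpperCase();
  const rel = /^FY\s*-\s*(\d{1,2})$/.exec(t);
  if (rel) return { raw, kind: "fy-relative", offset: -Number(rel[1]), projected: false };
  if (/^(LTM|TTM)\b/.test(t)) return { raw, kind: "ltm", projected: false };
  const yr = /^(?:FY\s*)?((?:19|20)\d{2})\s*([AEPF])?$/.exec(t);
  if (yr) {
    const suffix = yr[2]; // undefined when no suffix is present
    return { raw, kind: "year", year: Number(yr[1]), projected: suffix !== undefined && suffix !== "A" };
  }
  return { raw, kind: "unknown", projected: false };
}

/** Classifies a header row, ignoring LTM columns and unrecognized labels. */
function classifyHeader(labels: string[]): HeaderFormat {
  const kinds = new Set(
    labels.map((l) => parseLabel(l).kind).filter((k) => k === "year" || k === "fy-relative"),
  );
  if (kinds.size === 0) return "unknown";
  if (kinds.size > 1) return "mixed";
  return kinds.has("year") ? "years" : "fy-relative";
}

console.log(classifyHeader(["2021", "2022", "2023", "LTM"])); // years
console.log(classifyHeader(["FY-3", "FY-2", "FY-1", "LTM"])); // fy-relative
console.log(classifyHeader(["FY2022", "FY-1", "LTM"])); // mixed
```

The `projected` flag also gives item 4 a direct test for the "projections mixed with historical data" edge case.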
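
For item 10, a heuristic confidence score can combine signals the pipeline already produces: agreement between the deterministic parser and the LLM, whether validation passed, and whether the unit scale was found. `scoreValue`, the weights, the preference for the LLM value, and the 0.7 review threshold are assumptions to tune against labeled CIMs; the result is a ranking aid, not a calibrated probability.

```typescript
// Hypothetical heuristic confidence score for one extracted value.
interface ExtractionSignals {
  parserValue: number | null; // deterministic parser result, if any
  llmValue: number | null; // LLM extraction result, if any
  passedValidation: boolean; // e.g. no invalid-value or cross-period problems
  scaleDetected: boolean; // a millions/thousands label was found
}

interface ScoredValue {
  value: number | null;
  confidence: number; // heuristic score in [0, 1], not a calibrated probability
  needsReview: boolean;
}

function scoreValue(s: ExtractionSignals, reviewThreshold = 0.7): ScoredValue {
  // Prefer the LLM value when both exist (an assumption; either could be chosen).
  const value = s.llmValue ?? s.parserValue;
  let confidence = value !== null ? 0.3 : 0;
  if (s.parserValue !== null && s.llmValue !== null) {
    // Agreement (within 1%) between two independent methods is weighted most heavily.
    const agree = Math.abs(s.parserValue - s.llmValue) <= Math.abs(s.llmValue) * 0.01;
    confidence += agree ? 0.4 : -0.2;
  }
  if (s.passedValidation) confidence += 0.2;
  if (s.scaleDetected) confidence += 0.1;
  confidence = Math.min(1, Math.max(0, confidence));
  return { value, confidence, needsReview: confidence < reviewThreshold };
}

console.log(scoreValue({ parserValue: 71, llmValue: 71, passedValidation: true, scaleDetected: true }).needsReview); // false
console.log(scoreValue({ parserValue: 71000, llmValue: 71, passedValidation: true, scaleDetected: true }).needsReview); // true
```

With these weights, a value backed by only one method never clears the threshold on its own, so every single-source extraction is routed to review.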