Major release with significant performance improvements and new processing strategy.

## Core Changes
- Implemented simple_full_document processing strategy (default)
- Full document → LLM approach: 1-2 passes, ~5-6 minutes processing time
- Achieved 100% completeness with 2 API calls (down from 5+)
- Removed redundant Document AI passes for faster processing

## Financial Data Extraction
- Enhanced deterministic financial table parser
- Improved FY3/FY2/FY1/LTM identification from varying CIM formats
- Automatic merging of parser results with LLM extraction

## Code Quality & Infrastructure
- Cleaned up debug logging (removed emoji markers from production code)
- Fixed Firebase Secrets configuration (using modern defineSecret approach)
- Updated OpenAI API key
- Resolved deployment conflicts (secrets vs environment variables)
- Added .env files to Firebase ignore list

## Deployment
- Firebase Functions v2 deployment successful
- All 7 required secrets verified and configured
- Function URL: https://api-y56ccs6wva-uc.a.run.app

## Performance Improvements
- Processing time: ~5-6 minutes (down from 23+ minutes)
- API calls: 1-2 (down from 5+)
- Completeness: 100% achievable
- LLM Model: claude-3-7-sonnet-latest

## Breaking Changes
- Default processing strategy changed to 'simple_full_document'
- RAG processor available as alternative strategy 'document_ai_agentic_rag'

## Files Changed
- 36 files changed, 5642 insertions(+), 4451 deletions(-)
- Removed deprecated documentation files
- Cleaned up unused services and models

This release represents a major refactoring focused on speed, accuracy, and maintainability.
-- Find all documents that need processing
-- Run this to see what documents exist and their status

-- All documents in processing status
SELECT
  id,
  user_id,
  status,
  original_file_name,
  created_at,
  updated_at
FROM documents
WHERE status IN ('processing', 'processing_llm', 'uploading', 'extracting_text')
ORDER BY updated_at DESC;

-- Count by status
SELECT
  status,
  COUNT(*) AS count
FROM documents
GROUP BY status
ORDER BY count DESC;

-- Documents stuck in processing (updated more than 10 minutes ago)
SELECT
  id,
  user_id,
  status,
  original_file_name,
  updated_at,
  NOW() - updated_at AS time_since_update
FROM documents
WHERE status IN ('processing', 'processing_llm')
  AND updated_at < NOW() - INTERVAL '10 minutes'
ORDER BY updated_at ASC;
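
-- Sketch (not part of the original script): a per-status summary of stuck documents.
-- It may help to see how many documents are stuck and for how long before
-- inspecting individual rows. Assumptions: PostgreSQL (as NOW() and INTERVAL
-- suggest), the same documents table and 10-minute threshold used in this file;
-- the aliases stuck_count, oldest_update, and longest_wait are illustrative names.
SELECT
  status,
  COUNT(*) AS stuck_count,
  MIN(updated_at) AS oldest_update,
  MAX(NOW() - updated_at) AS longest_wait
FROM documents
WHERE status IN ('processing', 'processing_llm')
  AND updated_at < NOW() - INTERVAL '10 minutes'
GROUP BY status
ORDER BY stuck_count DESC;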