# API Documentation Guide
## Complete API Reference for CIM Document Processor

### 🎯 Overview
This document provides comprehensive API documentation for the CIM Document Processor, including all endpoints, authentication, error handling, and usage examples.

---

## 🔐 Authentication

### Firebase JWT Authentication
All API endpoints require Firebase JWT authentication. Include the JWT token in the Authorization header:

```http
Authorization: Bearer <firebase_jwt_token>
```

### Token Validation
- Tokens are validated on every request
- Invalid or expired tokens return 401 Unauthorized
- User context is extracted from the token for data isolation

---

## 📊 Base URL

### Development
```
http://localhost:5001/api
```

### Production
```
https://your-domain.com/api
```

---

## 🔌 API Endpoints

### Document Management

#### `POST /documents/upload-url`
Get a signed upload URL for direct file upload to Google Cloud Storage.

**Request Body**:
```json
{
  "fileName": "sample_cim.pdf",
  "fileType": "application/pdf",
  "fileSize": 2500000
}
```

**Response**:
```json
{
  "success": true,
  "uploadUrl": "https://storage.googleapis.com/...",
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid file type or size
- `401 Unauthorized` - Missing or invalid authentication
- `500 Internal Server Error` - Upload URL generation failed

#### `POST /documents/:id/confirm-upload`
Confirm file upload and start document processing.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Request Body**:
```json
{
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "fileSize": 2500000,
  "fileName": "sample_cim.pdf"
}
```

**Response**:
```json
{
  "success": true,
  "documentId": "doc-456",
  "status": "processing",
  "message": "Document processing started",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid document ID or file path
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed to start

#### `POST /documents/:id/process-optimized-agentic-rag`
Trigger AI processing using the optimized agentic RAG strategy.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Request Body**:
```json
{
  "strategy": "optimized_agentic_rag",
  "options": {
    "enableSemanticChunking": true,
    "enableMetadataEnrichment": true
  }
}
```

**Response**:
```json
{
  "success": true,
  "processingStrategy": "optimized_agentic_rag",
  "processingTime": 180000,
  "apiCalls": 25,
  "summary": "Comprehensive CIM analysis completed...",
  "analysisData": {
    "dealOverview": { ... },
    "businessDescription": { ... },
    "financialSummary": { ... }
  },
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid strategy or options
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed

#### `GET /documents/:id/download`
Download the processed PDF report.

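Because this endpoint returns the report as an attachment, a client may want to reuse the server-provided file name when saving it. The following is a minimal sketch, assuming the `Content-Disposition` header has the simple `filename="..."` form documented for this endpoint. `filenameFromContentDisposition` and its fallback default are hypothetical names chosen for illustration, not part of the API.

```typescript
// Hypothetical client-side helper; not provided by the API or by any library.
// It extracts the file name from a Content-Disposition header value and falls
// back to a caller-supplied default when the header is absent or has no name.
function filenameFromContentDisposition(
  header: string | undefined,
  fallback = 'cim_report.pdf', // assumption: mirrors the documented example name
): string {
  if (!header) {
    return fallback;
  }
  // Handles filename="name.pdf" and filename=name.pdf.
  // The extended RFC 6266 form (filename*=...) is deliberately not handled.
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(header);
  const name = match?.[1] ?? match?.[2];
  return name && name.length > 0 ? name : fallback;
}
```

With an HTTP client such as axios, the header value would typically be read from `response.headers['content-disposition']` before writing the response body to disk.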
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
- `200 OK` - PDF file stream
- `Content-Type: application/pdf`
- `Content-Disposition: attachment; filename="cim_report.pdf"`

**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document or PDF not found
- `500 Internal Server Error` - Download failed

#### `DELETE /documents/:id`
Delete a document and all associated data.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "message": "Document deleted successfully",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Deletion failed

### Analytics & Monitoring

#### `GET /documents/analytics`
Get processing analytics for the current user.

**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)

**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalDocuments": 150,
    "processingSuccessRate": 0.95,
    "averageProcessingTime": 180000,
    "totalApiCalls": 3750,
    "estimatedCost": 45.50,
    "documentsByStatus": {
      "completed": 142,
      "processing": 5,
      "failed": 3
    },
    "processingTrends": [
      {
        "date": "2024-12-20",
        "documentsProcessed": 8,
        "averageTime": 175000
      }
    ]
  },
  "correlationId": "req-789"
}
```

#### `GET /documents/processing-stats`
Get real-time processing statistics.

**Response**:
```json
{
  "success": true,
  "stats": {
    "totalDocuments": 150,
    "documentAiAgenticRagSuccess": 142,
    "averageProcessingTime": {
      "documentAiAgenticRag": 180000
    },
    "averageApiCalls": {
      "documentAiAgenticRag": 25
    },
    "activeProcessing": 3,
    "queueLength": 2
  },
  "correlationId": "req-789"
}
```

#### `GET /documents/:id/agentic-rag-sessions`
Get agentic RAG processing sessions for a document.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "sessions": [
    {
      "id": "session-123",
      "strategy": "optimized_agentic_rag",
      "status": "completed",
      "totalAgents": 6,
      "completedAgents": 6,
      "failedAgents": 0,
      "overallValidationScore": 0.92,
      "processingTimeMs": 180000,
      "apiCallsCount": 25,
      "totalCost": 0.35,
      "createdAt": "2024-12-20T10:30:00Z",
      "completedAt": "2024-12-20T10:33:00Z"
    }
  ],
  "correlationId": "req-789"
}
```

### Monitoring Endpoints

#### `GET /monitoring/upload-metrics`
Get upload metrics for a specified time period.

**Query Parameters**:
- `hours` (number, required) - Number of hours to analyze (1-168)

**Response**:
```json
{
  "success": true,
  "data": {
    "totalUploads": 45,
    "successfulUploads": 43,
    "failedUploads": 2,
    "successRate": 0.956,
    "averageFileSize": 2500000,
    "totalDataTransferred": 112500000,
    "uploadTrends": [
      {
        "hour": "2024-12-20T10:00:00Z",
        "uploads": 8,
        "successRate": 1.0
      }
    ]
  },
  "correlationId": "req-789"
}
```

#### `GET /monitoring/upload-health`
Get upload pipeline health status.

**Response**:
```json
{
  "success": true,
  "data": {
    "status": "healthy",
    "successRate": 0.956,
    "averageResponseTime": 1500,
    "errorRate": 0.044,
    "activeConnections": 12,
    "lastError": null,
    "lastErrorTime": null,
    "uptime": 86400000
  },
  "correlationId": "req-789"
}
```

#### `GET /monitoring/real-time-stats`
Get real-time upload statistics.

**Response**:
```json
{
  "success": true,
  "data": {
    "currentUploads": 3,
    "queueLength": 2,
    "processingRate": 8.5,
    "averageProcessingTime": 180000,
    "memoryUsage": 45.2,
    "cpuUsage": 23.1,
    "activeUsers": 15,
    "systemLoad": 0.67
  },
  "correlationId": "req-789"
}
```

### Vector Database Endpoints

#### `GET /vector/document-chunks/:documentId`
Get document chunks for a specific document.

**Path Parameters**:
- `documentId` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "chunks": [
    {
      "id": "chunk-123",
      "content": "Document chunk content...",
      "embedding": [0.1, 0.2, 0.3, ...],
      "metadata": {
        "sectionType": "financial",
        "confidence": 0.95
      },
      "createdAt": "2024-12-20T10:30:00Z"
    }
  ],
  "correlationId": "req-789"
}
```

#### `GET /vector/analytics`
Get search analytics for the current user.

**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)

**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalSearches": 125,
    "averageSearchTime": 250,
    "searchSuccessRate": 0.98,
    "popularQueries": [
      "financial performance",
      "market analysis",
      "management team"
    ],
    "searchTrends": [
      {
        "date": "2024-12-20",
        "searches": 8,
        "averageTime": 245
      }
    ]
  },
  "correlationId": "req-789"
}
```

#### `GET /vector/stats`
Get vector database statistics.

**Response**:
```json
{
  "success": true,
  "stats": {
    "totalChunks": 1500,
    "totalDocuments": 150,
    "averageChunkSize": 4000,
    "embeddingDimensions": 1536,
    "indexSize": 2500000,
    "queryPerformance": {
      "averageQueryTime": 250,
      "cacheHitRate": 0.85
    }
  },
  "correlationId": "req-789"
}
```

---

## 🚨 Error Handling

### Standard Error Response Format
All error responses follow this format:

```json
{
  "success": false,
  "error": "Error message description",
  "errorCode": "ERROR_CODE",
  "correlationId": "req-789",
  "details": {
    "field": "Additional error details"
  }
}
```

### Common Error Codes

#### `400 Bad Request`
- `INVALID_INPUT` - Invalid request parameters
- `MISSING_REQUIRED_FIELD` - Required field is missing
- `INVALID_FILE_TYPE` - Unsupported file type
- `FILE_TOO_LARGE` - File size exceeds limit

#### `401 Unauthorized`
- `MISSING_TOKEN` - Authentication token is missing
- `INVALID_TOKEN` - Authentication token is invalid
- `EXPIRED_TOKEN` - Authentication token has expired

#### `404 Not Found`
- `DOCUMENT_NOT_FOUND` - Document does not exist
- `SESSION_NOT_FOUND` - Processing session not found
- `FILE_NOT_FOUND` - File does not exist

#### `500 Internal Server Error`
- `PROCESSING_FAILED` - Document processing failed
- `STORAGE_ERROR` - File storage operation failed
- `DATABASE_ERROR` - Database operation failed
- `EXTERNAL_SERVICE_ERROR` - External service unavailable

### Error Recovery Strategies

#### Retry Logic
- **Transient Errors**: Automatically retry with exponential backoff
- **Rate Limiting**: Respect rate limits and implement backoff
- **Service Unavailable**: Retry with increasing delays

#### Fallback Strategies
- **Primary Strategy**: Optimized agentic RAG processing
- **Fallback Strategy**: Basic processing without advanced features
- **Degradation Strategy**: Simple text extraction only

---

## 📊 Rate Limiting

### Limits
- **Upload Endpoints**: 10 requests per minute per user
- **Processing Endpoints**: 5 requests per minute per user
- **Analytics Endpoints**: 30 requests per minute per user
- **Download Endpoints**: 20 requests per minute per user

### Rate Limit Headers
```http
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1640000000
```

### Rate Limit Exceeded Response
```json
{
  "success": false,
  "error": "Rate limit exceeded",
  "errorCode": "RATE_LIMIT_EXCEEDED",
  "retryAfter": 60,
  "correlationId": "req-789"
}
```

---

## 📋 Usage Examples

### Complete Document Processing Workflow

#### 1. Get Upload URL
```bash
curl -X POST http://localhost:5001/api/documents/upload-url \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "sample_cim.pdf",
    "fileType": "application/pdf",
    "fileSize": 2500000
  }'
```

#### 2. Upload File to GCS
```bash
curl -X PUT "<uploadUrl>" \
  -H "Content-Type: application/pdf" \
  --upload-file sample_cim.pdf
```

#### 3. Confirm Upload
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/confirm-upload \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "uploads/user-123/doc-123/sample_cim.pdf",
    "fileSize": 2500000,
    "fileName": "sample_cim.pdf"
  }'
```

#### 4. Trigger AI Processing
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/process-optimized-agentic-rag \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "optimized_agentic_rag",
    "options": {
      "enableSemanticChunking": true,
      "enableMetadataEnrichment": true
    }
  }'
```

#### 5. Download PDF Report
```bash
curl -X GET http://localhost:5001/api/documents/doc-123/download \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  --output cim_report.pdf
```

### JavaScript/TypeScript Examples

#### Document Upload and Processing
```typescript
import axios from 'axios';
import { readFile } from 'node:fs/promises';

const API_BASE = 'http://localhost:5001/api';
const AUTH_TOKEN = 'firebase_jwt_token';

// Assumption: the document ID is already known to the caller; the upload-url
// response documented above does not include it.
const documentId = 'doc-123';

// Read the PDF to upload (the local path is illustrative)
const fileBuffer = await readFile('sample_cim.pdf');

// Get upload URL
const uploadUrlResponse = await axios.post(`${API_BASE}/documents/upload-url`, {
  fileName: 'sample_cim.pdf',
  fileType: 'application/pdf',
  fileSize: 2500000
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

const { uploadUrl, filePath } = uploadUrlResponse.data;

// Upload file to GCS
await axios.put(uploadUrl, fileBuffer, {
  headers: { 'Content-Type': 'application/pdf' }
});

// Confirm upload
await axios.post(`${API_BASE}/documents/${documentId}/confirm-upload`, {
  filePath,
  fileSize: 2500000,
  fileName: 'sample_cim.pdf'
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

// Trigger AI processing
const processingResponse = await axios.post(
  `${API_BASE}/documents/${documentId}/process-optimized-agentic-rag`,
  {
    strategy: 'optimized_agentic_rag',
    options: {
      enableSemanticChunking: true,
      enableMetadataEnrichment: true
    }
  },
  {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  }
);

console.log('Processing result:', processingResponse.data);
```

#### Error Handling
```typescript
// Reuses axios, API_BASE, and AUTH_TOKEN from the previous example
try {
  const response = await axios.post(`${API_BASE}/documents/upload-url`, {
    fileName: 'sample_cim.pdf',
    fileType: 'application/pdf',
    fileSize: 2500000
  }, {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  });

  console.log('Upload URL:', response.data.uploadUrl);
} catch (error) {
  // The server answered with an error status
  if (axios.isAxiosError(error) && error.response) {
    const { status, data } = error.response;

    switch (status) {
      case 400:
        console.error('Bad request:', data.error);
        break;
      case 401:
        console.error('Authentication failed:', data.error);
        break;
      case 429:
        console.error('Rate limit exceeded, retry after:', data.retryAfter, 'seconds');
        break;
      case 500:
        console.error('Server error:', data.error);
        break;
      default:
        console.error('Unexpected error:', data.error);
    }
  } else {
    // No response was received (for example, a network failure)
    console.error('Network error:', error instanceof Error ? error.message : error);
  }
}
```

---

## 🔍 Monitoring and Debugging

### Correlation IDs
All API responses include a `correlationId` for request tracking:

```json
{
  "success": true,
  "data": { ... },
  "correlationId": "req-789"
}
```

### Request Logging
Include the correlation ID in logs for debugging:

```typescript
logger.info('API request', {
  correlationId: response.data.correlationId,
  endpoint: '/documents/upload-url',
  userId: 'user-123'
});
```

### Health Checks
Check API health with the upload health endpoint; the response carries a `correlationId` like every other call:

```bash
curl -X GET http://localhost:5001/api/monitoring/upload-health \
  -H "Authorization: Bearer <firebase_jwt_token>"
```

---

This API documentation covers what is needed to integrate with the CIM Document Processor API: authentication, endpoints, error handling, and usage examples.
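The retry guidance in the Error Recovery Strategies and Rate Limiting sections can be sketched on the client roughly as follows. This is one possible approach under stated assumptions, not the service's own implementation. The helper names (`classify`, `backoffDelayMs`, `withRetry`), the base delay, the cap, and the attempt limit are all hypothetical. Treating `503` and `EXTERNAL_SERVICE_ERROR` as transient is likewise an assumption, since the document does not say which errors are safe to retry.

```typescript
// Sketch only: names and default values are assumptions, not part of the API.

/** Subset of the documented error body that matters for retry decisions. */
interface ErrorBody {
  errorCode?: string;
  retryAfter?: number; // seconds, as in the RATE_LIMIT_EXCEEDED example
}

interface RetryDecision {
  retry: boolean;
  retryAfterSeconds?: number;
}

/** Decide whether a failed call is worth retrying (assumed policy). */
function classify(status: number, body: ErrorBody = {}): RetryDecision {
  if (status === 429) {
    return { retry: true, retryAfterSeconds: body.retryAfter };
  }
  if (status === 503) {
    return { retry: true };
  }
  if (status === 500 && body.errorCode === 'EXTERNAL_SERVICE_ERROR') {
    return { retry: true };
  }
  return { retry: false };
}

/** Exponential backoff, overridden by the server's retryAfter when present. */
function backoffDelayMs(
  attempt: number, // zero-based index of the attempt that just failed
  baseMs = 500,
  maxMs = 30000,
  retryAfterSeconds?: number,
): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds > 0) {
    return retryAfterSeconds * 1000;
  }
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Run `operation`, retrying while `decide` says the error is transient. */
async function withRetry<T>(
  operation: () => Promise<T>,
  decide: (error: unknown) => RetryDecision,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const decision = decide(error);
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!decision.retry || attempt + 1 >= maxAttempts) {
        throw error;
      }
      await sleep(backoffDelayMs(attempt, 500, 30000, decision.retryAfterSeconds));
    }
  }
}
```

With axios, `decide` could map an error to `classify(error.response.status, error.response.data)` when a response is present and return `{ retry: false }` otherwise. The delays here are deterministic to keep the sketch easy to follow; adding random jitter is a common refinement when many clients may retry at once.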