# API Documentation Guide
## Complete API Reference for CIM Document Processor
### 🎯 Overview
This document provides comprehensive API documentation for the CIM Document Processor, including all endpoints, authentication, error handling, and usage examples.
---
## 🔐 Authentication
### Firebase JWT Authentication
All API endpoints require Firebase JWT authentication. Include the token in the `Authorization` header:
```http
Authorization: Bearer <firebase_jwt_token>
```
### Token Validation
- Tokens are validated on every request
- Invalid or expired tokens return 401 Unauthorized
- User context is extracted from the token for data isolation
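
A minimal client-side sketch of building the header shown above. `authHeaders` is a hypothetical helper, not part of the API, and obtaining the token from Firebase is out of scope here.
```typescript
// Hypothetical helper: builds the Authorization header for a given Firebase JWT.
function authHeaders(token: string): { Authorization: string } {
  const trimmed = token.trim();
  if (trimmed.length === 0) {
    // Fail fast locally instead of waiting for a 401 from the server.
    throw new Error('Missing Firebase JWT');
  }
  return { Authorization: `Bearer ${trimmed}` };
}

console.log(authHeaders('<firebase_jwt_token>'));
```
The returned object can be passed as the `headers` option of whichever HTTP client is in use.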
---
## 📊 Base URL
### Development
```
http://localhost:5001/api
```
### Production
```
https://your-domain.com/api
```
---
## 🔌 API Endpoints
### Document Management
#### `POST /documents/upload-url`
Get a signed upload URL for direct file upload to Google Cloud Storage.
**Request Body**:
```json
{
  "fileName": "sample_cim.pdf",
  "fileType": "application/pdf",
  "fileSize": 2500000
}
```
**Response**:
```json
{
  "success": true,
  "uploadUrl": "https://storage.googleapis.com/...",
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "correlationId": "req-789"
}
```
**Error Responses**:
- `400 Bad Request` - Invalid file type or size
- `401 Unauthorized` - Missing or invalid authentication
- `500 Internal Server Error` - Upload URL generation failed
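
Since an invalid file type or size produces a `400`, a client may want to check the request before sending it. The sketch below is one way to do that. The `UploadLimits` type and the example limit values are assumptions: this guide does not state which types are accepted or what the size cap is, and it does not say that `fileSize` is in bytes (assumed here).
```typescript
// Shape taken from the request sample above.
interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number; // assumed to be bytes
}

// Hypothetical limits, supplied by the caller; real values are not documented here.
interface UploadLimits {
  allowedTypes: string[];
  maxBytes: number;
}

// Returns a list of problems; an empty list means the request looks valid.
function validateUploadRequest(req: UploadUrlRequest, limits: UploadLimits): string[] {
  const problems: string[] = [];
  if (req.fileName.trim() === '') problems.push('fileName is empty');
  if (!limits.allowedTypes.includes(req.fileType)) {
    problems.push(`unsupported fileType: ${req.fileType}`);
  }
  if (!Number.isInteger(req.fileSize) || req.fileSize <= 0) {
    problems.push('fileSize must be a positive integer');
  } else if (req.fileSize > limits.maxBytes) {
    problems.push('fileSize exceeds the limit');
  }
  return problems;
}

// Example limits are placeholders, not documented values.
const exampleLimits: UploadLimits = { allowedTypes: ['application/pdf'], maxBytes: 50000000 };
console.log(validateUploadRequest(
  { fileName: 'sample_cim.pdf', fileType: 'application/pdf', fileSize: 2500000 },
  exampleLimits
)); // []
```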
#### `POST /documents/:id/confirm-upload`
Confirm file upload and start document processing.
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)
**Request Body**:
```json
{
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "fileSize": 2500000,
  "fileName": "sample_cim.pdf"
}
```
**Response**:
```json
{
  "success": true,
  "documentId": "doc-456",
  "status": "processing",
  "message": "Document processing started",
  "correlationId": "req-789"
}
```
**Error Responses**:
- `400 Bad Request` - Invalid document ID or file path
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed to start
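
This guide does not say how the client learns the document ID used in the path. In the samples, the `filePath` returned by the upload-url call contains the same ID that the confirm-upload response reports (`doc-456`). The sketch below relies on that observation only: the layout `uploads/<userId>/<documentId>/<fileName>` is an assumption inferred from the sample values, and both helper names are hypothetical.
```typescript
// Assumed layout, inferred from the sample value only:
// uploads/<userId>/<documentId>/<fileName>
function documentIdFromPath(filePath: string): string | null {
  const parts = filePath.split('/');
  if (parts.length !== 4 || parts[0] !== 'uploads') return null;
  const documentId = parts[2];
  return documentId ? documentId : null;
}

// Builds the confirm-upload path and body from values the client already has.
function confirmUploadRequest(filePath: string, fileName: string, fileSize: number) {
  const documentId = documentIdFromPath(filePath);
  if (documentId === null) throw new Error(`Unexpected filePath: ${filePath}`);
  return {
    path: `/documents/${encodeURIComponent(documentId)}/confirm-upload`,
    body: { filePath, fileSize, fileName },
  };
}

const confirm = confirmUploadRequest(
  'uploads/user-123/doc-456/sample_cim.pdf', 'sample_cim.pdf', 2500000
);
console.log(confirm.path); // /documents/doc-456/confirm-upload
```
If the server exposes the document ID some other way, prefer that over parsing the path.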
#### `POST /documents/:id/process-optimized-agentic-rag`
Trigger AI processing using the optimized agentic RAG strategy.
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)
**Request Body**:
```json
{
  "strategy": "optimized_agentic_rag",
  "options": {
    "enableSemanticChunking": true,
    "enableMetadataEnrichment": true
  }
}
```
**Response**:
```json
{
  "success": true,
  "processingStrategy": "optimized_agentic_rag",
  "processingTime": 180000,
  "apiCalls": 25,
  "summary": "Comprehensive CIM analysis completed...",
  "analysisData": {
    "dealOverview": { ... },
    "businessDescription": { ... },
    "financialSummary": { ... }
  },
  "correlationId": "req-789"
}
```
**Error Responses**:
- `400 Bad Request` - Invalid strategy or options
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed
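
To make the numeric fields of the response easier to read, the sketch below turns them into a one-line summary. It assumes `processingTime` is in milliseconds, which matches the `processingTimeMs` field and timestamps in the session sample but is not stated for this endpoint. `ProcessingResult` and `describeProcessing` are hypothetical names, and `analysisData` is left out because its contents are elided in this guide.
```typescript
// Subset of the response fields shown above.
interface ProcessingResult {
  success: boolean;
  processingStrategy: string;
  processingTime: number; // assumed to be milliseconds
  apiCalls: number;
  summary: string;
}

function describeProcessing(r: ProcessingResult): string {
  const seconds = r.processingTime / 1000;
  // Guard against division by zero when no API calls were made.
  const perCall = r.apiCalls > 0 ? seconds / r.apiCalls : 0;
  return `${r.processingStrategy}: ${seconds.toFixed(0)} s total, ` +
    `${r.apiCalls} API calls, ${perCall.toFixed(1)} s per call`;
}

console.log(describeProcessing({
  success: true,
  processingStrategy: 'optimized_agentic_rag',
  processingTime: 180000,
  apiCalls: 25,
  summary: 'Comprehensive CIM analysis completed...',
}));
// optimized_agentic_rag: 180 s total, 25 API calls, 7.2 s per call
```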
#### `GET /documents/:id/download`
Download the processed PDF report.
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)
**Response**:
- `200 OK` - PDF file stream
- `Content-Type: application/pdf`
- `Content-Disposition: attachment; filename="cim_report.pdf"`
**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document or PDF not found
- `500 Internal Server Error` - Download failed
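
A client saving the stream may want to reuse the filename from the `Content-Disposition` header. The helper below is a hypothetical sketch that handles only the plain `filename=` form shown above, not the extended `filename*=` form.
```typescript
// Extracts the filename from a Content-Disposition header value.
// Falls back to a default when the header is absent or has no filename.
function filenameFromContentDisposition(
  header: string | undefined,
  fallback = 'cim_report.pdf'
): string {
  if (!header) return fallback;
  // Matches filename="name" or filename=name, stopping at a quote or semicolon.
  const match = /filename="?([^";]+)"?/i.exec(header);
  return match && match[1] ? match[1].trim() : fallback;
}

console.log(filenameFromContentDisposition('attachment; filename="cim_report.pdf"'));
// cim_report.pdf
```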
#### `DELETE /documents/:id`
Delete a document and all associated data.
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)
**Response**:
```json
{
  "success": true,
  "message": "Document deleted successfully",
  "correlationId": "req-789"
}
```
**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Deletion failed
### Analytics & Monitoring
#### `GET /documents/analytics`
Get processing analytics for the current user.
**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)
**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalDocuments": 150,
    "processingSuccessRate": 0.95,
    "averageProcessingTime": 180000,
    "totalApiCalls": 3750,
    "estimatedCost": 45.50,
    "documentsByStatus": {
      "completed": 142,
      "processing": 5,
      "failed": 3
    },
    "processingTrends": [
      {
        "date": "2024-12-20",
        "documentsProcessed": 8,
        "averageTime": 175000
      }
    ]
  },
  "correlationId": "req-789"
}
```
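The analytics payload lends itself to a few derived figures and a sanity check. The sketch below uses hypothetical names (`Analytics`, `summarizeAnalytics`) and makes no claim about how the server computes `processingSuccessRate`; it only checks that the status counts add up and derives a per-document cost. The currency of `estimatedCost` is not stated in this guide.
```typescript
interface DocumentsByStatus {
  completed: number;
  processing: number;
  failed: number;
}

// Subset of the `analytics` object shown above.
interface Analytics {
  totalDocuments: number;
  estimatedCost: number; // currency not stated in this guide
  documentsByStatus: DocumentsByStatus;
}

function summarizeAnalytics(a: Analytics) {
  const { completed, processing, failed } = a.documentsByStatus;
  const total = a.totalDocuments;
  return {
    // True when the per-status counts account for every document.
    statusTotalMatches: completed + processing + failed === total,
    completedShare: total > 0 ? completed / total : 0,
    costPerDocument: total > 0 ? a.estimatedCost / total : 0,
  };
}

const summary = summarizeAnalytics({
  totalDocuments: 150,
  estimatedCost: 45.5,
  documentsByStatus: { completed: 142, processing: 5, failed: 3 },
});
console.log(summary.statusTotalMatches); // true
```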
#### `GET /documents/processing-stats`
Get real-time processing statistics.
**Response**:
```json
{
  "success": true,
  "stats": {
    "totalDocuments": 150,
    "documentAiAgenticRagSuccess": 142,
    "averageProcessingTime": {
      "documentAiAgenticRag": 180000
    },
    "averageApiCalls": {
      "documentAiAgenticRag": 25
    },
    "activeProcessing": 3,
    "queueLength": 2
  },
  "correlationId": "req-789"
}
```
#### `GET /documents/:id/agentic-rag-sessions`
Get agentic RAG processing sessions for a document.
**Path Parameters**:
- `id` (string, required) - Document ID (UUID)
**Response**:
```json
{
  "success": true,
  "sessions": [
    {
      "id": "session-123",
      "strategy": "optimized_agentic_rag",
      "status": "completed",
      "totalAgents": 6,
      "completedAgents": 6,
      "failedAgents": 0,
      "overallValidationScore": 0.92,
      "processingTimeMs": 180000,
      "apiCallsCount": 25,
      "totalCost": 0.35,
      "createdAt": "2024-12-20T10:30:00Z",
      "completedAt": "2024-12-20T10:33:00Z"
    }
  ],
  "correlationId": "req-789"
}
```
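In the sample session, the gap between `createdAt` and `completedAt` is three minutes, which equals `processingTimeMs` (180000). The sketch below derives that duration and checks agent completion. The type and function names are hypothetical, and it assumes both timestamps are present, which may not hold for sessions that are still running.
```typescript
// Subset of the session fields shown above.
interface AgenticRagSession {
  id: string;
  status: string;
  totalAgents: number;
  completedAgents: number;
  failedAgents: number;
  processingTimeMs: number;
  createdAt: string;   // ISO 8601 timestamp
  completedAt: string; // ISO 8601 timestamp
}

// Wall-clock duration between the two timestamps, in milliseconds.
function wallClockMs(s: AgenticRagSession): number {
  return Date.parse(s.completedAt) - Date.parse(s.createdAt);
}

// True when every agent finished and none failed.
function allAgentsSucceeded(s: AgenticRagSession): boolean {
  return s.failedAgents === 0 && s.completedAgents === s.totalAgents;
}

const sampleSession: AgenticRagSession = {
  id: 'session-123',
  status: 'completed',
  totalAgents: 6,
  completedAgents: 6,
  failedAgents: 0,
  processingTimeMs: 180000,
  createdAt: '2024-12-20T10:30:00Z',
  completedAt: '2024-12-20T10:33:00Z',
};
console.log(wallClockMs(sampleSession), allAgentsSucceeded(sampleSession)); // 180000 true
```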
### Monitoring Endpoints
#### `GET /monitoring/upload-metrics`
Get upload metrics for a specified time period.
**Query Parameters**:
- `hours` (number, required) - Number of hours to analyze (1-168)
**Response**:
```json
{
  "success": true,
  "data": {
    "totalUploads": 45,
    "successfulUploads": 43,
    "failedUploads": 2,
    "successRate": 0.956,
    "averageFileSize": 2500000,
    "totalDataTransferred": 112500000,
    "uploadTrends": [
      {
        "hour": "2024-12-20T10:00:00Z",
        "uploads": 8,
        "successRate": 1.0
      }
    ]
  },
  "correlationId": "req-789"
}
```
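Because `hours` is required and limited to 1-168, a client can reject bad values before making the call. The helper below is a hypothetical sketch; treating `hours` as a whole number is an assumption, since the guide only says "number".
```typescript
// Builds the request path for GET /monitoring/upload-metrics.
// Throws if `hours` is outside the documented 1-168 range.
function uploadMetricsPath(hours: number): string {
  if (!Number.isInteger(hours) || hours < 1 || hours > 168) {
    throw new RangeError(`hours must be an integer from 1 to 168, got ${hours}`);
  }
  return `/monitoring/upload-metrics?hours=${hours}`;
}

console.log(uploadMetricsPath(24)); // /monitoring/upload-metrics?hours=24
```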
#### `GET /monitoring/upload-health`
Get upload pipeline health status.
**Response**:
```json
{
  "success": true,
  "data": {
    "status": "healthy",
    "successRate": 0.956,
    "averageResponseTime": 1500,
    "errorRate": 0.044,
    "activeConnections": 12,
    "lastError": null,
    "lastErrorTime": null,
    "uptime": 86400000
  },
  "correlationId": "req-789"
}
```
#### `GET /monitoring/real-time-stats`
Get real-time upload statistics.
**Response**:
```json
{
  "success": true,
  "data": {
    "currentUploads": 3,
    "queueLength": 2,
    "processingRate": 8.5,
    "averageProcessingTime": 180000,
    "memoryUsage": 45.2,
    "cpuUsage": 23.1,
    "activeUsers": 15,
    "systemLoad": 0.67
  },
  "correlationId": "req-789"
}
```
### Vector Database Endpoints
#### `GET /vector/document-chunks/:documentId`
Get document chunks for a specific document.
**Path Parameters**:
- `documentId` (string, required) - Document ID (UUID)
**Response**:
```json
{
  "success": true,
  "chunks": [
    {
      "id": "chunk-123",
      "content": "Document chunk content...",
      "embedding": [0.1, 0.2, 0.3, ...],
      "metadata": {
        "sectionType": "financial",
        "confidence": 0.95
      },
      "createdAt": "2024-12-20T10:30:00Z"
    }
  ],
  "correlationId": "req-789"
}
```
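One thing a client might do with the returned chunks is narrow them by the `metadata` fields. The sketch below filters by `sectionType` and a minimum `confidence`, then orders the result by confidence. `DocumentChunk` and `selectChunks` are hypothetical names, and the value `"financial"` is the only section type this guide shows.
```typescript
// Shape taken from the response sample above.
interface DocumentChunk {
  id: string;
  content: string;
  embedding: number[];
  metadata: { sectionType: string; confidence: number };
  createdAt: string;
}

// Returns matching chunks, highest confidence first. The input array is not modified.
function selectChunks(
  chunks: DocumentChunk[],
  sectionType: string,
  minConfidence: number
): DocumentChunk[] {
  return chunks
    .filter(c => c.metadata.sectionType === sectionType && c.metadata.confidence >= minConfidence)
    .sort((a, b) => b.metadata.confidence - a.metadata.confidence);
}

// Small helper to build sample data for the demonstration below.
const makeChunk = (id: string, sectionType: string, confidence: number): DocumentChunk => ({
  id,
  content: 'Document chunk content...',
  embedding: [0.1, 0.2, 0.3],
  metadata: { sectionType, confidence },
  createdAt: '2024-12-20T10:30:00Z',
});

const selected = selectChunks(
  [makeChunk('a', 'financial', 0.8), makeChunk('b', 'financial', 0.95), makeChunk('c', 'market', 0.99)],
  'financial',
  0.9
);
console.log(selected.map(c => c.id)); // [ 'b' ]
```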
#### `GET /vector/analytics`
Get search analytics for the current user.
**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)
**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalSearches": 125,
    "averageSearchTime": 250,
    "searchSuccessRate": 0.98,
    "popularQueries": [
      "financial performance",
      "market analysis",
      "management team"
    ],
    "searchTrends": [
      {
        "date": "2024-12-20",
        "searches": 8,
        "averageTime": 245
      }
    ]
  },
  "correlationId": "req-789"
}
```
#### `GET /vector/stats`
Get vector database statistics.
**Response**:
```json
{
  "success": true,
  "stats": {
    "totalChunks": 1500,
    "totalDocuments": 150,
    "averageChunkSize": 4000,
    "embeddingDimensions": 1536,
    "indexSize": 2500000,
    "queryPerformance": {
      "averageQueryTime": 250,
      "cacheHitRate": 0.85
    }
  },
  "correlationId": "req-789"
}
```
---
## 🚨 Error Handling
### Standard Error Response Format
All error responses follow this format:
```json
{
  "success": false,
  "error": "Error message description",
  "errorCode": "ERROR_CODE",
  "correlationId": "req-789",
  "details": {
    "field": "Additional error details"
  }
}
```
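On the client, the standard format can be captured as a type plus a runtime check, so that unexpected bodies are not mistaken for API errors. This is a sketch with hypothetical names (`ApiErrorBody`, `isApiErrorBody`, `formatApiError`); `details` is treated as optional, which is an assumption.
```typescript
// Mirrors the standard error response format shown above.
interface ApiErrorBody {
  success: false;
  error: string;
  errorCode: string;
  correlationId: string;
  details?: Record<string, unknown>; // assumed optional
}

// Runtime type guard: checks the fields every error response is documented to carry.
function isApiErrorBody(value: unknown): value is ApiErrorBody {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return v['success'] === false &&
    typeof v['error'] === 'string' &&
    typeof v['errorCode'] === 'string' &&
    typeof v['correlationId'] === 'string';
}

// One-line message that keeps the correlation ID for support and log lookups.
function formatApiError(body: ApiErrorBody): string {
  return `[${body.errorCode}] ${body.error} (correlationId: ${body.correlationId})`;
}

const sampleError: unknown = {
  success: false,
  error: 'Document does not exist',
  errorCode: 'DOCUMENT_NOT_FOUND',
  correlationId: 'req-789',
};
if (isApiErrorBody(sampleError)) {
  console.log(formatApiError(sampleError));
  // [DOCUMENT_NOT_FOUND] Document does not exist (correlationId: req-789)
}
```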
### Common Error Codes
#### `400 Bad Request`
- `INVALID_INPUT` - Invalid request parameters
- `MISSING_REQUIRED_FIELD` - Required field is missing
- `INVALID_FILE_TYPE` - Unsupported file type
- `FILE_TOO_LARGE` - File size exceeds limit
#### `401 Unauthorized`
- `MISSING_TOKEN` - Authentication token is missing
- `INVALID_TOKEN` - Authentication token is invalid
- `EXPIRED_TOKEN` - Authentication token has expired
#### `404 Not Found`
- `DOCUMENT_NOT_FOUND` - Document does not exist
- `SESSION_NOT_FOUND` - Processing session not found
- `FILE_NOT_FOUND` - File does not exist
#### `500 Internal Server Error`
- `PROCESSING_FAILED` - Document processing failed
- `STORAGE_ERROR` - File storage operation failed
- `DATABASE_ERROR` - Database operation failed
- `EXTERNAL_SERVICE_ERROR` - External service unavailable
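
The groupings above can be expressed as a lookup table, which is handy when only the `errorCode` is at hand (for example, in logs). The sketch below contains exactly the codes listed in this section; the constant and function names are hypothetical.
```typescript
// Error codes from the list above, mapped to their HTTP status.
const ERROR_CODE_STATUS = {
  INVALID_INPUT: 400,
  MISSING_REQUIRED_FIELD: 400,
  INVALID_FILE_TYPE: 400,
  FILE_TOO_LARGE: 400,
  MISSING_TOKEN: 401,
  INVALID_TOKEN: 401,
  EXPIRED_TOKEN: 401,
  DOCUMENT_NOT_FOUND: 404,
  SESSION_NOT_FOUND: 404,
  FILE_NOT_FOUND: 404,
  PROCESSING_FAILED: 500,
  STORAGE_ERROR: 500,
  DATABASE_ERROR: 500,
  EXTERNAL_SERVICE_ERROR: 500,
} as const;

type ErrorCode = keyof typeof ERROR_CODE_STATUS;

// Returns the documented status for a code, or undefined for codes not listed here.
function statusForErrorCode(code: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(ERROR_CODE_STATUS, code)
    ? ERROR_CODE_STATUS[code as ErrorCode]
    : undefined;
}

console.log(statusForErrorCode('FILE_TOO_LARGE')); // 400
```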
### Error Recovery Strategies
#### Retry Logic
- **Transient Errors**: Automatically retry with exponential backoff
- **Rate Limiting**: Respect rate limits and implement backoff
- **Service Unavailable**: Retry with increasing delays
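
The retry bullets above can be sketched as a small wrapper that doubles the delay after each failed attempt. This is one possible client-side shape, not the service's own implementation. `withRetry` and `RetryOptions` are hypothetical names, and which errors count as transient is left to the caller through `isRetryable`, because this guide does not list them.
```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number;  // upper bound on any single delay
  isRetryable: (error: unknown) => boolean; // caller decides what is transient
  sleep?: (ms: number) => Promise<void>;    // injectable so tests need not wait
}

const defaultSleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

async function withRetry<T>(operation: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up when attempts are exhausted or the error is not retryable.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(error)) throw error;
      // Delay doubles each time: base, 2*base, 4*base, ... capped at maxDelayMs.
      const delay = Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
      await sleep(delay);
    }
  }
}
```
A caller might pass an `isRetryable` that returns true for network failures and `500` responses and false for `400`, `401`, and `404`. Adding random jitter to `delay` is a common refinement when many clients retry at once.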
#### Fallback Strategies
- **Primary Strategy**: Optimized agentic RAG processing
- **Fallback Strategy**: Basic processing without advanced features
- **Degradation Strategy**: Simple text extraction only
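
This guide does not say whether the fallback chain runs on the server or is driven by the client. If a client needed similar behaviour, trying strategies in order until one succeeds could be sketched as below. `runWithFallback` and its types are hypothetical, and the strategy labels in the usage note are placeholders, since only `optimized_agentic_rag` is a documented strategy value.
```typescript
interface Strategy<T> {
  name: string;          // label used for reporting only
  run: () => Promise<T>; // performs the actual work
}

interface FallbackResult<T> {
  strategy: string; // name of the strategy that succeeded
  value: T;
  failures: { strategy: string; error: unknown }[]; // earlier strategies that failed
}

// Tries each strategy in order and returns the first success.
async function runWithFallback<T>(strategies: Strategy<T>[]): Promise<FallbackResult<T>> {
  const failures: { strategy: string; error: unknown }[] = [];
  for (const s of strategies) {
    try {
      const value = await s.run();
      return { strategy: s.name, value, failures };
    } catch (error) {
      failures.push({ strategy: s.name, error });
    }
  }
  throw new Error(`All ${strategies.length} strategies failed`);
}
```
For example, a caller could pass three entries labelled `primary`, `basic`, and `text-only`, each wrapping whatever request implements that level of processing.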
---
## 📊 Rate Limiting
### Limits
- **Upload Endpoints**: 10 requests per minute per user
- **Processing Endpoints**: 5 requests per minute per user
- **Analytics Endpoints**: 30 requests per minute per user
- **Download Endpoints**: 20 requests per minute per user
### Rate Limit Headers
```http
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1640000000
```
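A client can read these headers to decide whether to pause before the next request. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this guide does not state, and that header names arrive lower-cased, as Node's HTTP layer reports them. `parseRateLimit` and `msUntilNextRequest` are hypothetical names.
```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetAtMs: number; // reset time as milliseconds since the Unix epoch
}

// Returns null when any of the three headers is missing or not numeric.
function parseRateLimit(headers: Record<string, string | undefined>): RateLimitInfo | null {
  const limit = Number(headers['x-ratelimit-limit']);
  const remaining = Number(headers['x-ratelimit-remaining']);
  const reset = Number(headers['x-ratelimit-reset']);
  if (![limit, remaining, reset].every(n => Number.isFinite(n))) return null;
  return { limit, remaining, resetAtMs: reset * 1000 }; // assumes reset is in epoch seconds
}

// How long to wait before sending another request: zero while quota remains.
function msUntilNextRequest(info: RateLimitInfo, nowMs: number): number {
  return info.remaining > 0 ? 0 : Math.max(0, info.resetAtMs - nowMs);
}

const info = parseRateLimit({
  'x-ratelimit-limit': '10',
  'x-ratelimit-remaining': '7',
  'x-ratelimit-reset': '1640000000',
});
console.log(info); // { limit: 10, remaining: 7, resetAtMs: 1640000000000 }
```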
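### Rate Limit Exceeded Response
Requests over the limit receive a `429 Too Many Requests` status with the following body; `retryAfter` is in seconds: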
```json
{
  "success": false,
  "error": "Rate limit exceeded",
  "errorCode": "RATE_LIMIT_EXCEEDED",
  "retryAfter": 60,
  "correlationId": "req-789"
}
```
---
## 📋 Usage Examples
### Complete Document Processing Workflow
#### 1. Get Upload URL
```bash
curl -X POST http://localhost:5001/api/documents/upload-url \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "sample_cim.pdf",
    "fileType": "application/pdf",
    "fileSize": 2500000
  }'
```
#### 2. Upload File to GCS
```bash
curl -X PUT "<upload_url>" \
  -H "Content-Type: application/pdf" \
  --upload-file sample_cim.pdf
```
#### 3. Confirm Upload
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/confirm-upload \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "uploads/user-123/doc-123/sample_cim.pdf",
    "fileSize": 2500000,
    "fileName": "sample_cim.pdf"
  }'
```
#### 4. Trigger AI Processing
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/process-optimized-agentic-rag \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "optimized_agentic_rag",
    "options": {
      "enableSemanticChunking": true,
      "enableMetadataEnrichment": true
    }
  }'
```
#### 5. Download PDF Report
```bash
curl -X GET http://localhost:5001/api/documents/doc-123/download \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  --output cim_report.pdf
```
### JavaScript/TypeScript Examples
#### Document Upload and Processing
```typescript
import axios from 'axios';
import { readFile } from 'node:fs/promises';
const API_BASE = 'http://localhost:5001/api';
const AUTH_TOKEN = '<firebase_jwt_token>'; // placeholder: replace with a real Firebase JWT
const documentId = 'doc-123'; // placeholder document ID, as in the curl examples
const headers = { Authorization: `Bearer ${AUTH_TOKEN}` };
// Read the file to upload (Node.js; top-level await requires an ES module)
const fileBuffer = await readFile('sample_cim.pdf');
// Get upload URL
const uploadUrlResponse = await axios.post(`${API_BASE}/documents/upload-url`, {
  fileName: 'sample_cim.pdf',
  fileType: 'application/pdf',
  fileSize: fileBuffer.length
}, { headers });
const { uploadUrl, filePath } = uploadUrlResponse.data;
// Upload file to GCS (the signed URL carries its own authorization)
await axios.put(uploadUrl, fileBuffer, {
  headers: { 'Content-Type': 'application/pdf' }
});
// Confirm upload
await axios.post(`${API_BASE}/documents/${documentId}/confirm-upload`, {
  filePath,
  fileSize: fileBuffer.length,
  fileName: 'sample_cim.pdf'
}, { headers });
// Trigger AI processing
const processingResponse = await axios.post(
  `${API_BASE}/documents/${documentId}/process-optimized-agentic-rag`,
  {
    strategy: 'optimized_agentic_rag',
    options: {
      enableSemanticChunking: true,
      enableMetadataEnrichment: true
    }
  },
  { headers }
);
console.log('Processing result:', processingResponse.data);
```
#### Error Handling
```typescript
// Uses axios, API_BASE, and AUTH_TOKEN as defined in the previous example
try {
  const response = await axios.post(`${API_BASE}/documents/upload-url`, {
    fileName: 'sample_cim.pdf',
    fileType: 'application/pdf',
    fileSize: 2500000
  }, {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  });
  console.log('Upload URL:', response.data.uploadUrl);
} catch (error) {
  // In TypeScript the caught value is `unknown`, so narrow it before reading `response`
  if (axios.isAxiosError(error) && error.response) {
    const { status, data } = error.response;
    switch (status) {
      case 400:
        console.error('Bad request:', data.error);
        break;
      case 401:
        console.error('Authentication failed:', data.error);
        break;
      case 404:
        console.error('Not found:', data.error);
        break;
      case 429:
        console.error('Rate limit exceeded, retry after:', data.retryAfter, 'seconds');
        break;
      case 500:
        console.error('Server error:', data.error);
        break;
      default:
        console.error('Unexpected error:', data.error);
    }
  } else {
    // No response received: network failure, timeout, or a non-axios error
    console.error('Network error:', error instanceof Error ? error.message : error);
  }
}
```
---
## 🔍 Monitoring and Debugging
### Correlation IDs
All JSON responses, both success and error, include a `correlationId` for request tracking:
```json
{
  "success": true,
  "data": { ... },
  "correlationId": "req-789"
}
```
### Request Logging
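Include the correlation ID in client-side logs so a request can be matched to server-side records when debugging:
```typescript
// `logger` is your application's logger; `response` is the result of an API call
logger.info('API request', {
  correlationId: response.data.correlationId,
  endpoint: '/documents/upload-url',
  userId: 'user-123'
});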
```
### Health Checks
Check upload pipeline health by polling the health endpoint; like other JSON responses, the result carries a `correlationId`:
```bash
curl -X GET http://localhost:5001/api/monitoring/upload-health \
  -H "Authorization: Bearer <firebase_jwt_token>"
```
---
This comprehensive API documentation provides all the information needed to integrate with the CIM Document Processor API, including authentication, endpoints, error handling, and usage examples.