# API Documentation Guide
## Complete API Reference for CIM Document Processor

### 🎯 Overview

This document provides comprehensive API documentation for the CIM Document Processor, including all endpoints, authentication, error handling, and usage examples.

---

## 🔐 Authentication

### Firebase JWT Authentication
All API endpoints require Firebase JWT authentication. Include the JWT token in the Authorization header:

```http
Authorization: Bearer <firebase_jwt_token>
```

### Token Validation
- Tokens are validated on every request
- Invalid or expired tokens return 401 Unauthorized
- User context is extracted from the token for data isolation
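Because every request carries the same header, a client may want to build it in one place. The following is a minimal sketch, not part of the API: `buildAuthHeaders` is a hypothetical helper name, and rejecting an empty token on the client is an assumption meant to avoid a round trip that would end in `401 Unauthorized`.

```typescript
// Hypothetical helper (not part of the API): builds the headers for a JSON request.
function buildAuthHeaders(firebaseJwt: string): Record<string, string> {
  const token = firebaseJwt.trim();
  if (token.length === 0) {
    // Fail early instead of sending a request that would return 401 Unauthorized.
    throw new Error('A Firebase JWT is required for every API request');
  }
  return {
    Authorization: 'Bearer ' + token,
    'Content-Type': 'application/json',
  };
}
```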

---

## 📊 Base URL

### Development
```
http://localhost:5001/api
```

### Production
```
https://your-domain.com/api
```

---

## 🔌 API Endpoints

### Document Management

#### `POST /documents/upload-url`
Get a signed upload URL for direct file upload to Google Cloud Storage.

**Request Body**:
```json
{
  "fileName": "sample_cim.pdf",
  "fileType": "application/pdf",
  "fileSize": 2500000
}
```

**Response**:
```json
{
  "success": true,
  "uploadUrl": "https://storage.googleapis.com/...",
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid file type or size
- `401 Unauthorized` - Missing or invalid authentication
- `500 Internal Server Error` - Upload URL generation failed
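Since an invalid file type or size produces `400 Bad Request`, a client can check the request body before sending it. The sketch below is illustrative only: the allowed type list and the 100 MiB limit are assumptions (this guide does not state the server's actual rules), and `validateUploadRequest` is a hypothetical helper name.

```typescript
// Shape of the request body documented above.
interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number; // bytes
}

// Assumptions: only PDFs are accepted and the limit is 100 MiB.
// Neither value is stated in this guide; adjust them to the real server rules.
const ASSUMED_ALLOWED_TYPES: string[] = ['application/pdf'];
const ASSUMED_MAX_BYTES = 100 * 1024 * 1024;

// Returns a list of problems; an empty list means the request looks valid.
function validateUploadRequest(req: UploadUrlRequest): string[] {
  const problems: string[] = [];
  if (req.fileName.trim().length === 0) {
    problems.push('fileName is empty');
  }
  if (ASSUMED_ALLOWED_TYPES.indexOf(req.fileType) === -1) {
    problems.push('unsupported fileType: ' + req.fileType);
  }
  if (Math.floor(req.fileSize) !== req.fileSize || req.fileSize <= 0) {
    problems.push('fileSize must be a positive integer number of bytes');
  } else if (req.fileSize > ASSUMED_MAX_BYTES) {
    problems.push('fileSize exceeds the assumed limit');
  }
  return problems;
}
```

An empty result means the request can be sent; otherwise the messages can be shown to the user without spending one of the rate-limited upload calls.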

#### `POST /documents/:id/confirm-upload`
Confirm file upload and start document processing.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Request Body**:
```json
{
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "fileSize": 2500000,
  "fileName": "sample_cim.pdf"
}
```

**Response**:
```json
{
  "success": true,
  "documentId": "doc-456",
  "status": "processing",
  "message": "Document processing started",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid document ID or file path
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed to start

#### `POST /documents/:id/process-optimized-agentic-rag`
Trigger AI processing using the optimized agentic RAG strategy.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Request Body**:
```json
{
  "strategy": "optimized_agentic_rag",
  "options": {
    "enableSemanticChunking": true,
    "enableMetadataEnrichment": true
  }
}
```

**Response**:
```json
{
  "success": true,
  "processingStrategy": "optimized_agentic_rag",
  "processingTime": 180000,
  "apiCalls": 25,
  "summary": "Comprehensive CIM analysis completed...",
  "analysisData": {
    "dealOverview": { ... },
    "businessDescription": { ... },
    "financialSummary": { ... }
  },
  "correlationId": "req-789"
}
```

**Error Responses**:
- `400 Bad Request` - Invalid strategy or options
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Processing failed

#### `GET /documents/:id/download`
Download the processed PDF report.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
- `200 OK` - PDF file stream
- `Content-Type: application/pdf`
- `Content-Disposition: attachment; filename="cim_report.pdf"`

**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document or PDF not found
- `500 Internal Server Error` - Download failed

#### `DELETE /documents/:id`
Delete a document and all associated data.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "message": "Document deleted successfully",
  "correlationId": "req-789"
}
```

**Error Responses**:
- `401 Unauthorized` - Missing or invalid authentication
- `404 Not Found` - Document not found
- `500 Internal Server Error` - Deletion failed

### Analytics & Monitoring

#### `GET /documents/analytics`
Get processing analytics for the current user.

**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)

**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalDocuments": 150,
    "processingSuccessRate": 0.95,
    "averageProcessingTime": 180000,
    "totalApiCalls": 3750,
    "estimatedCost": 45.50,
    "documentsByStatus": {
      "completed": 142,
      "processing": 5,
      "failed": 3
    },
    "processingTrends": [
      {
        "date": "2024-12-20",
        "documentsProcessed": 8,
        "averageTime": 175000
      }
    ]
  },
  "correlationId": "req-789"
}
```
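A dashboard that consumes this response may want a few derived figures. The sketch below is one possible reading of the sample, under stated assumptions: `summarizeAnalytics` is a hypothetical helper, the status counts are assumed to partition `totalDocuments` (they do in the sample: 142 + 5 + 3 = 150), and the currency unit of `estimatedCost` is not specified in this guide.

```typescript
// Subset of the `analytics` object shown above.
interface DocumentAnalytics {
  totalDocuments: number;
  estimatedCost: number; // unit not stated in this guide
  documentsByStatus: { completed: number; processing: number; failed: number };
}

// Hypothetical helper: derives per-document figures and a basic consistency check.
function summarizeAnalytics(a: DocumentAnalytics) {
  const s = a.documentsByStatus;
  const counted = s.completed + s.processing + s.failed;
  const hasDocuments = a.totalDocuments > 0;
  return {
    // Assumption: every document is in exactly one of the three statuses.
    statusCountsMatchTotal: counted === a.totalDocuments,
    completedShare: hasDocuments ? s.completed / a.totalDocuments : 0,
    costPerDocument: hasDocuments ? a.estimatedCost / a.totalDocuments : 0,
  };
}
```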

#### `GET /documents/processing-stats`
Get real-time processing statistics.

**Response**:
```json
{
  "success": true,
  "stats": {
    "totalDocuments": 150,
    "documentAiAgenticRagSuccess": 142,
    "averageProcessingTime": {
      "documentAiAgenticRag": 180000
    },
    "averageApiCalls": {
      "documentAiAgenticRag": 25
    },
    "activeProcessing": 3,
    "queueLength": 2
  },
  "correlationId": "req-789"
}
```

#### `GET /documents/:id/agentic-rag-sessions`
Get agentic RAG processing sessions for a document.

**Path Parameters**:
- `id` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "sessions": [
    {
      "id": "session-123",
      "strategy": "optimized_agentic_rag",
      "status": "completed",
      "totalAgents": 6,
      "completedAgents": 6,
      "failedAgents": 0,
      "overallValidationScore": 0.92,
      "processingTimeMs": 180000,
      "apiCallsCount": 25,
      "totalCost": 0.35,
      "createdAt": "2024-12-20T10:30:00Z",
      "completedAt": "2024-12-20T10:33:00Z"
    }
  ],
  "correlationId": "req-789"
}
```
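The session fields are enough to show progress and elapsed time in a client. The following sketch assumes that `createdAt` and `completedAt` are ISO 8601 timestamps (as in the sample) and that `completedAt` is `null` while a session is still running; the second point is an assumption, and `describeSession` is a hypothetical helper name.

```typescript
// Subset of a session object from the response above.
interface AgenticRagSession {
  totalAgents: number;
  completedAgents: number;
  failedAgents: number;
  processingTimeMs: number;
  createdAt: string; // ISO 8601 timestamp
  completedAt: string | null; // assumption: null until the session finishes
}

// Hypothetical helper: fraction of agents that have finished, plus wall-clock duration.
function describeSession(s: AgenticRagSession) {
  const finishedAgents = s.completedAgents + s.failedAgents;
  const progress = s.totalAgents > 0 ? finishedAgents / s.totalAgents : 0;
  const wallClockMs =
    s.completedAt === null ? null : Date.parse(s.completedAt) - Date.parse(s.createdAt);
  return { progress, wallClockMs };
}
```

For the sample session, the difference between `completedAt` and `createdAt` is three minutes, which agrees with the reported `processingTimeMs` of 180000.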

### Monitoring Endpoints

#### `GET /monitoring/upload-metrics`
Get upload metrics for a specified time period.

**Query Parameters**:
- `hours` (number, required) - Number of hours to analyze (1-168)
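Because `hours` is required and bounded, a client can validate it while building the request path. This is a minimal sketch: `uploadMetricsPath` is a hypothetical helper, and treating out-of-range values as a client-side error (rather than letting the server reject them) is a design assumption.

```typescript
// Hypothetical helper: builds the path (relative to the base URL) for the metrics call.
function uploadMetricsPath(hours: number): string {
  // The documented range is 1-168 hours (up to one week).
  if (!isFinite(hours) || hours < 1 || hours > 168) {
    throw new RangeError('hours must be a number between 1 and 168');
  }
  return '/monitoring/upload-metrics?hours=' + encodeURIComponent(String(hours));
}
```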

**Response**:
```json
{
  "success": true,
  "data": {
    "totalUploads": 45,
    "successfulUploads": 43,
    "failedUploads": 2,
    "successRate": 0.956,
    "averageFileSize": 2500000,
    "totalDataTransferred": 112500000,
    "uploadTrends": [
      {
        "hour": "2024-12-20T10:00:00Z",
        "uploads": 8,
        "successRate": 1.0
      }
    ]
  },
  "correlationId": "req-789"
}
```
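The fields in this response are related: in the sample, 43 successful plus 2 failed uploads give 45 in total, and 43 / 45 ≈ 0.956. A monitoring client could sanity-check a payload along those lines. The sketch below assumes `successRate` is `successfulUploads / totalUploads` rounded to three decimals, which matches the sample but is not stated explicitly; `checkUploadMetrics` is a hypothetical helper name.

```typescript
// Subset of the `data` object shown above.
interface UploadMetrics {
  totalUploads: number;
  successfulUploads: number;
  failedUploads: number;
  successRate: number;
}

// Hypothetical helper: verifies that the counts and the reported rate agree.
function checkUploadMetrics(m: UploadMetrics) {
  const countsAddUp = m.successfulUploads + m.failedUploads === m.totalUploads;
  const computedRate = m.totalUploads > 0 ? m.successfulUploads / m.totalUploads : 0;
  // Allow for rounding to three decimals in the reported value.
  const rateMatches = Math.abs(computedRate - m.successRate) < 0.001;
  return { countsAddUp, computedRate, rateMatches };
}
```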

#### `GET /monitoring/upload-health`
Get upload pipeline health status.

**Response**:
```json
{
  "success": true,
  "data": {
    "status": "healthy",
    "successRate": 0.956,
    "averageResponseTime": 1500,
    "errorRate": 0.044,
    "activeConnections": 12,
    "lastError": null,
    "lastErrorTime": null,
    "uptime": 86400000
  },
  "correlationId": "req-789"
}
```

#### `GET /monitoring/real-time-stats`
Get real-time upload statistics.

**Response**:
```json
{
  "success": true,
  "data": {
    "currentUploads": 3,
    "queueLength": 2,
    "processingRate": 8.5,
    "averageProcessingTime": 180000,
    "memoryUsage": 45.2,
    "cpuUsage": 23.1,
    "activeUsers": 15,
    "systemLoad": 0.67
  },
  "correlationId": "req-789"
}
```

### Vector Database Endpoints

#### `GET /vector/document-chunks/:documentId`
Get document chunks for a specific document.

**Path Parameters**:
- `documentId` (string, required) - Document ID (UUID)

**Response**:
```json
{
  "success": true,
  "chunks": [
    {
      "id": "chunk-123",
      "content": "Document chunk content...",
      "embedding": [0.1, 0.2, 0.3, ...],
      "metadata": {
        "sectionType": "financial",
        "confidence": 0.95
      },
      "createdAt": "2024-12-20T10:30:00Z"
    }
  ],
  "correlationId": "req-789"
}
```

#### `GET /vector/analytics`
Get search analytics for the current user.

**Query Parameters**:
- `days` (number, optional) - Number of days to analyze (default: 30)

**Response**:
```json
{
  "success": true,
  "analytics": {
    "totalSearches": 125,
    "averageSearchTime": 250,
    "searchSuccessRate": 0.98,
    "popularQueries": [
      "financial performance",
      "market analysis",
      "management team"
    ],
    "searchTrends": [
      {
        "date": "2024-12-20",
        "searches": 8,
        "averageTime": 245
      }
    ]
  },
  "correlationId": "req-789"
}
```

#### `GET /vector/stats`
Get vector database statistics.

**Response**:
```json
{
  "success": true,
  "stats": {
    "totalChunks": 1500,
    "totalDocuments": 150,
    "averageChunkSize": 4000,
    "embeddingDimensions": 1536,
    "indexSize": 2500000,
    "queryPerformance": {
      "averageQueryTime": 250,
      "cacheHitRate": 0.85
    }
  },
  "correlationId": "req-789"
}
```

---

## 🚨 Error Handling

### Standard Error Response Format
All error responses follow this format:

```json
{
  "success": false,
  "error": "Error message description",
  "errorCode": "ERROR_CODE",
  "correlationId": "req-789",
  "details": {
    "field": "Additional error details"
  }
}
```
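A typed client can recognize this shape at runtime before reading its fields. The sketch below is one way to do that: the interface mirrors the format above, `details` is treated as optional (an assumption, since the rate-limit response in this guide omits it), and `isApiErrorResponse` and `formatApiError` are hypothetical helper names.

```typescript
// Mirrors the standard error format documented above.
interface ApiErrorResponse {
  success: false;
  error: string;
  errorCode: string;
  correlationId: string;
  details?: Record<string, unknown>; // assumption: not always present
}

// Type guard: true only when the value has the documented error fields.
function isApiErrorResponse(value: unknown): value is ApiErrorResponse {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    v['success'] === false &&
    typeof v['error'] === 'string' &&
    typeof v['errorCode'] === 'string' &&
    typeof v['correlationId'] === 'string'
  );
}

// Produces a single log-friendly line that keeps the correlation ID.
function formatApiError(e: ApiErrorResponse): string {
  return '[' + e.errorCode + '] ' + e.error + ' (correlationId: ' + e.correlationId + ')';
}
```

Keeping the `correlationId` in the formatted message makes it easier to match a client-side failure with server logs.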

### Common Error Codes

#### `400 Bad Request`
- `INVALID_INPUT` - Invalid request parameters
- `MISSING_REQUIRED_FIELD` - Required field is missing
- `INVALID_FILE_TYPE` - Unsupported file type
- `FILE_TOO_LARGE` - File size exceeds limit

#### `401 Unauthorized`
- `MISSING_TOKEN` - Authentication token is missing
- `INVALID_TOKEN` - Authentication token is invalid
- `EXPIRED_TOKEN` - Authentication token has expired

#### `404 Not Found`
- `DOCUMENT_NOT_FOUND` - Document does not exist
- `SESSION_NOT_FOUND` - Processing session not found
- `FILE_NOT_FOUND` - File does not exist

#### `500 Internal Server Error`
- `PROCESSING_FAILED` - Document processing failed
- `STORAGE_ERROR` - File storage operation failed
- `DATABASE_ERROR` - Database operation failed
- `EXTERNAL_SERVICE_ERROR` - External service unavailable

### Error Recovery Strategies

#### Retry Logic
- **Transient Errors**: Automatically retry with exponential backoff
- **Rate Limiting**: Respect rate limits and implement backoff
- **Service Unavailable**: Retry with increasing delays
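The retry rules above can be sketched as a delay schedule plus a check for which responses are worth retrying. Everything here is an assumption for illustration: the base delay, the cap, and the choice of `429` and `503` as retryable statuses are not specified by this guide, and `backoffDelayMs`, `retrySchedule`, and `isRetryableStatus` are hypothetical names.

```typescript
// Assumption: only rate-limit (429) and service-unavailable (503) responses are retried.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status === 503;
}

// Exponential backoff: the delay doubles on each attempt (0-based) up to a cap.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Delays to wait before each retry, for example before retries 1..maxRetries.
function retrySchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

With the assumed defaults, four retries wait 500, 1000, 2000, and 4000 ms. When a response carries `retryAfter` or rate limit headers, those values should take precedence over the computed delay, and adding random jitter helps avoid many clients retrying at the same moment.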

#### Fallback Strategies
- **Primary Strategy**: Optimized agentic RAG processing
- **Fallback Strategy**: Basic processing without advanced features
- **Degradation Strategy**: Simple text extraction only
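The ordering of these tiers can be sketched as a small loop that moves to the next tier when the current one fails. This guide does not say whether the server or the client performs the fallback, so the code only illustrates the ordering: the tier labels and `runWithFallback` are hypothetical, and the sketch is synchronous for brevity where real processing calls would be asynchronous.

```typescript
// Hypothetical labels for the three tiers listed above, in the order they are tried.
type Tier = 'primary' | 'fallback' | 'degradation';
const TIER_ORDER: Tier[] = ['primary', 'fallback', 'degradation'];

// Returns the tier to try after `current`, or null when none is left.
function nextTier(current: Tier): Tier | null {
  const next = TIER_ORDER[TIER_ORDER.indexOf(current) + 1];
  return next === undefined ? null : next;
}

// Tries each tier in order and returns the first result; rethrows the last error if all fail.
function runWithFallback<T>(attempt: (tier: Tier) => T): { tier: Tier; result: T } {
  let tier: Tier | null = 'primary';
  let lastError: unknown = new Error('no tier attempted');
  while (tier !== null) {
    const current: Tier = tier;
    try {
      return { tier: current, result: attempt(current) };
    } catch (err) {
      lastError = err;
      tier = nextTier(current);
    }
  }
  throw lastError;
}
```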

---

## 📊 Rate Limiting

### Limits
- **Upload Endpoints**: 10 requests per minute per user
- **Processing Endpoints**: 5 requests per minute per user
- **Analytics Endpoints**: 30 requests per minute per user
- **Download Endpoints**: 20 requests per minute per user

### Rate Limit Headers
```http
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1640000000
```
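A client can read these headers to pace itself before hitting the limit. The sketch below rests on two assumptions: that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value has that form, but the guide does not say so), and that header names arrive lowercased, as Node-based HTTP clients typically deliver them. `parseRateLimitHeaders` and `msUntilReset` are hypothetical helper names.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetEpochSeconds: number; // assumption: Unix time in seconds
}

// Parses the three rate limit headers; returns null if any is missing or not numeric.
function parseRateLimitHeaders(headers: Record<string, string | undefined>): RateLimitInfo | null {
  const limit = parseInt(headers['x-ratelimit-limit'] || '', 10);
  const remaining = parseInt(headers['x-ratelimit-remaining'] || '', 10);
  const reset = parseInt(headers['x-ratelimit-reset'] || '', 10);
  if (isNaN(limit) || isNaN(remaining) || isNaN(reset)) {
    return null;
  }
  return { limit: limit, remaining: remaining, resetEpochSeconds: reset };
}

// Milliseconds until the window resets, never negative.
function msUntilReset(info: RateLimitInfo, nowMs: number): number {
  return Math.max(0, info.resetEpochSeconds * 1000 - nowMs);
}
```

When `remaining` reaches 0, waiting `msUntilReset(info, Date.now())` milliseconds before the next call avoids a rejected request.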

### Rate Limit Exceeded Response
Requests over the limit receive `429 Too Many Requests`; `retryAfter` is the number of seconds to wait before retrying:
```json
{
  "success": false,
  "error": "Rate limit exceeded",
  "errorCode": "RATE_LIMIT_EXCEEDED",
  "retryAfter": 60,
  "correlationId": "req-789"
}
```

---

## 📋 Usage Examples

### Complete Document Processing Workflow

#### 1. Get Upload URL
```bash
curl -X POST http://localhost:5001/api/documents/upload-url \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "sample_cim.pdf",
    "fileType": "application/pdf",
    "fileSize": 2500000
  }'
```

#### 2. Upload File to GCS
```bash
curl -X PUT "<upload_url>" \
  -H "Content-Type: application/pdf" \
  --upload-file sample_cim.pdf
```

#### 3. Confirm Upload
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/confirm-upload \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "uploads/user-123/doc-123/sample_cim.pdf",
    "fileSize": 2500000,
    "fileName": "sample_cim.pdf"
  }'
```

#### 4. Trigger AI Processing
```bash
curl -X POST http://localhost:5001/api/documents/doc-123/process-optimized-agentic-rag \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "optimized_agentic_rag",
    "options": {
      "enableSemanticChunking": true,
      "enableMetadataEnrichment": true
    }
  }'
```

#### 5. Download PDF Report
```bash
curl -X GET http://localhost:5001/api/documents/doc-123/download \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  --output cim_report.pdf
```

### JavaScript/TypeScript Examples

#### Document Upload and Processing
```typescript
import { readFile } from 'node:fs/promises';
import axios from 'axios';

const API_BASE = 'http://localhost:5001/api';
const AUTH_TOKEN = 'firebase_jwt_token';

// Placeholder: the UUID of the document being uploaded
const documentId = 'doc-123';
// Contents of the local PDF to upload
const fileBuffer = await readFile('sample_cim.pdf');

// Get upload URL
const uploadUrlResponse = await axios.post(`${API_BASE}/documents/upload-url`, {
  fileName: 'sample_cim.pdf',
  fileType: 'application/pdf',
  fileSize: 2500000
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

const { uploadUrl, filePath } = uploadUrlResponse.data;

// Upload file to GCS
await axios.put(uploadUrl, fileBuffer, {
  headers: { 'Content-Type': 'application/pdf' }
});

// Confirm upload
await axios.post(`${API_BASE}/documents/${documentId}/confirm-upload`, {
  filePath,
  fileSize: 2500000,
  fileName: 'sample_cim.pdf'
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

// Trigger AI processing
const processingResponse = await axios.post(
  `${API_BASE}/documents/${documentId}/process-optimized-agentic-rag`,
  {
    strategy: 'optimized_agentic_rag',
    options: {
      enableSemanticChunking: true,
      enableMetadataEnrichment: true
    }
  },
  {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  }
);

console.log('Processing result:', processingResponse.data);
```

#### Error Handling
```typescript
try {
  const response = await axios.post(`${API_BASE}/documents/upload-url`, {
    fileName: 'sample_cim.pdf',
    fileType: 'application/pdf',
    fileSize: 2500000
  }, {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  });

  console.log('Upload URL:', response.data.uploadUrl);
} catch (error) {
  // The caught value is `unknown` in TypeScript, so narrow it before reading fields
  if (axios.isAxiosError(error) && error.response) {
    const { status, data } = error.response;

    switch (status) {
      case 400:
        console.error('Bad request:', data.error);
        break;
      case 401:
        console.error('Authentication failed:', data.error);
        break;
      case 429:
        console.error('Rate limit exceeded, retry after:', data.retryAfter, 'seconds');
        break;
      case 500:
        console.error('Server error:', data.error);
        break;
      default:
        console.error('Unexpected error:', data.error);
    }
  } else {
    // No HTTP response was received (for example, the connection failed)
    console.error('Network error:', error instanceof Error ? error.message : error);
  }
}
```

---

## 🔍 Monitoring and Debugging

### Correlation IDs
JSON API responses include a `correlationId` for request tracking:

```json
{
  "success": true,
  "data": { ... },
  "correlationId": "req-789"
}
```

### Request Logging
Include the correlation ID in log entries so a client-side event can be matched to the server-side request:

```typescript
// `logger` stands for whatever structured logger the application uses
logger.info('API request', {
  correlationId: response.data.correlationId,
  endpoint: '/documents/upload-url',
  userId: 'user-123'
});
```

### Health Checks
Check upload pipeline health through the monitoring endpoint; the response carries a `correlationId` like any other:

```bash
curl -X GET http://localhost:5001/api/monitoring/upload-health \
  -H "Authorization: Bearer <firebase_jwt_token>"
```

---

This guide covers what is needed to integrate with the CIM Document Processor API: authentication, endpoints, error handling, and usage examples.