API Documentation Guide

Complete API Reference for CIM Document Processor

🎯 Overview

This document describes the CIM Document Processor REST API: authentication, base URLs, endpoints, error handling, rate limiting, and usage examples.


🔐 Authentication

Firebase JWT Authentication

All API endpoints require Firebase authentication. Send the user's Firebase ID token (a JWT) as a Bearer token in the Authorization header:

Authorization: Bearer <firebase_jwt_token>

Token Validation

  • Tokens are validated on every request
  • Invalid or expired tokens return 401 Unauthorized
  • User context is extracted from the token for data isolation
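Because invalid or expired tokens are rejected with 401, a client can recover from expiry by refreshing the token once and retrying. This is a minimal TypeScript sketch under the assumption that the token comes from a Firebase Auth user object. That object's getIdToken(forceRefresh?) method returns a cached ID token, refreshing it when it is close to expiry, and with forceRefresh set to true it always fetches a new one. The IdTokenSource and AuthedCall types and the callWithAuth helper are hypothetical names for this example, not part of the API.

```typescript
// Structural stand-in for the firebase/auth `User` type (only the method used here).
interface IdTokenSource {
  getIdToken(forceRefresh?: boolean): Promise<string>;
}

// Hypothetical shape of one API call: receives headers, resolves to status + body.
type AuthedCall<T> = (headers: Record<string, string>) => Promise<{ status: number; body: T }>;

async function callWithAuth<T>(
  user: IdTokenSource,
  call: AuthedCall<T>,
): Promise<{ status: number; body: T }> {
  const token = await user.getIdToken();
  const first = await call({ Authorization: `Bearer ${token}` });
  if (first.status !== 401) return first;

  // A 401 may mean the token expired between issue and use: force a refresh and retry once.
  const fresh = await user.getIdToken(true);
  return call({ Authorization: `Bearer ${fresh}` });
}
```

Retrying only once keeps a genuinely invalid token from looping. A second 401 is returned to the caller unchanged.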

📊 Base URL

Development

http://localhost:5001/api

Production

https://your-domain.com/api
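Several routes take path parameters (:id, :documentId) and query parameters (days, hours). A small helper that fills these in and URL-encodes the values helps avoid malformed requests. The sketch below is hypothetical: BASE_URLS mirrors the two base URLs above, and the production entry is this guide's placeholder domain.

```typescript
type Env = "development" | "production";

const BASE_URLS: Record<Env, string> = {
  development: "http://localhost:5001/api",
  production: "https://your-domain.com/api", // placeholder from this guide
};

// Fill ":param" segments of a route template and append query parameters.
function buildUrl(
  env: Env,
  template: string,
  params: Record<string, string> = {},
  query: Record<string, string | number> = {},
): string {
  const path = template.replace(/:([A-Za-z]+)/g, (_match: string, name: string) => {
    const value = params[name];
    if (value === undefined) throw new Error(`Missing path parameter: ${name}`);
    return encodeURIComponent(value);
  });
  const url = new URL(BASE_URLS[env] + path);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

For example, buildUrl("development", "/documents/:id/confirm-upload", { id: "doc-123" }) yields the confirm-upload URL used in the curl workflow below.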

🔌 API Endpoints

Document Management

POST /documents/upload-url

Get a signed upload URL for direct file upload to Google Cloud Storage.

Request Body:

{
  "fileName": "sample_cim.pdf",
  "fileType": "application/pdf",
  "fileSize": 2500000
}

Response:

{
  "success": true,
  "uploadUrl": "https://storage.googleapis.com/...",
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "correlationId": "req-789"
}

Error Responses:

  • 400 Bad Request - Invalid file type or size
  • 401 Unauthorized - Missing or invalid authentication
  • 500 Internal Server Error - Upload URL generation failed
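The signed-URL request and the upload itself can be chained in one function. This sketch assumes a fetch-compatible function is passed in, such as the global fetch in Node.js 18+ or a test double, and it hard-codes application/pdf. FetchLike and uploadViaSignedUrl are hypothetical names. The PUT to the signed URL carries no Bearer token, because the signature itself authorizes it. Its Content-Type should match the fileType sent in the first step, because a GCS signed URL can be bound to a content type.

```typescript
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
}

// Minimal fetch-compatible signature (hypothetical; the global fetch fits it).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string | Uint8Array },
) => Promise<HttpResponse>;

interface UploadUrlResponse {
  success: boolean;
  uploadUrl: string;
  filePath: string;
  correlationId: string;
}

// Request a signed URL, PUT the bytes to GCS, and return the server-assigned filePath.
async function uploadViaSignedUrl(
  fetchFn: FetchLike,
  apiBase: string,
  token: string,
  fileName: string,
  bytes: Uint8Array,
): Promise<string> {
  const fileType = "application/pdf";

  // Step 1: ask the API for a signed upload URL.
  const res = await fetchFn(`${apiBase}/documents/upload-url`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ fileName, fileType, fileSize: bytes.byteLength }),
  });
  if (!res.ok) throw new Error(`upload-url failed with HTTP ${res.status}`);
  const { uploadUrl, filePath } = (await res.json()) as UploadUrlResponse;

  // Step 2: upload directly to GCS; the signed URL authorizes this request.
  const put = await fetchFn(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": fileType },
    body: bytes,
  });
  if (!put.ok) throw new Error(`GCS upload failed with HTTP ${put.status}`);
  return filePath;
}
```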

POST /documents/:id/confirm-upload

Confirm file upload and start document processing.

Path Parameters:

  • id (string, required) - Document ID (UUID)

Request Body:

{
  "filePath": "uploads/user-123/doc-456/sample_cim.pdf",
  "fileSize": 2500000,
  "fileName": "sample_cim.pdf"
}

Response:

{
  "success": true,
  "documentId": "doc-456",
  "status": "processing",
  "message": "Document processing started",
  "correlationId": "req-789"
}

Error Responses:

  • 400 Bad Request - Invalid document ID or file path
  • 401 Unauthorized - Missing or invalid authentication
  • 404 Not Found - Document not found
  • 500 Internal Server Error - Processing failed to start
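This route needs a document ID, but the upload-url response documented above returns only uploadUrl, filePath and correlationId. In this guide's examples the ID appears as the third segment of filePath: uploads/user-123/doc-456/sample_cim.pdf goes with documentId doc-456. The sketch below relies on that layout. The layout is an assumption inferred from the examples, not a documented contract, so prefer an ID supplied by the server if one is available. documentIdFromFilePath and confirmUploadRequest are hypothetical helpers.

```typescript
// ASSUMPTION: filePath uses the layout seen in this guide's examples,
// uploads/<userId>/<documentId>/<fileName>. The API does not document this.
function documentIdFromFilePath(filePath: string): string {
  const parts = filePath.split("/");
  if (parts.length !== 4 || parts[0] !== "uploads" || parts[2] === "") {
    throw new Error(`Unexpected filePath layout: ${filePath}`);
  }
  return parts[2];
}

interface ConfirmUploadBody {
  filePath: string;
  fileSize: number;
  fileName: string;
}

// Build the route and JSON body for POST /documents/:id/confirm-upload.
function confirmUploadRequest(
  filePath: string,
  fileSize: number,
): { path: string; body: ConfirmUploadBody } {
  const documentId = documentIdFromFilePath(filePath);
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  return {
    path: `/documents/${encodeURIComponent(documentId)}/confirm-upload`,
    body: { filePath, fileSize, fileName },
  };
}
```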

POST /documents/:id/process-optimized-agentic-rag

Trigger AI processing using the optimized agentic RAG strategy.

Path Parameters:

  • id (string, required) - Document ID (UUID)

Request Body:

{
  "strategy": "optimized_agentic_rag",
  "options": {
    "enableSemanticChunking": true,
    "enableMetadataEnrichment": true
  }
}

Response:

{
  "success": true,
  "processingStrategy": "optimized_agentic_rag",
  "processingTime": 180000,
  "apiCalls": 25,
  "summary": "Comprehensive CIM analysis completed...",
  "analysisData": {
    "dealOverview": { ... },
    "businessDescription": { ... },
    "financialSummary": { ... }
  },
  "correlationId": "req-789"
}

Error Responses:

  • 400 Bad Request - Invalid strategy or options
  • 401 Unauthorized - Missing or invalid authentication
  • 404 Not Found - Document not found
  • 500 Internal Server Error - Processing failed
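The example response reports a processingTime of 180000. That is about three minutes if the unit is milliseconds, as the sessions endpoint's processingTimeMs suggests. The analysis is returned in the same response, so a client should allow a long request timeout. Here is a sketch using AbortController, assuming a fetch-compatible function that honours an abort signal (the global fetch in Node.js 18+ does). processDocument, FetchWithSignal and the ten-minute default are illustrative choices, not values the API specifies.

```typescript
interface ProcessingOptions {
  enableSemanticChunking: boolean;
  enableMetadataEnrichment: boolean;
}

interface ProcessingResponse {
  success: boolean;
  processingStrategy: string;
  processingTime: number;
  apiCalls: number;
  summary: string;
  analysisData: Record<string, unknown>;
  correlationId: string;
}

// Minimal fetch-compatible signature; the global fetch in Node.js 18+ fits it.
type FetchWithSignal = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function processDocument(
  fetchFn: FetchWithSignal,
  apiBase: string,
  token: string,
  documentId: string,
  options: ProcessingOptions,
  timeoutMs = 10 * 60 * 1000, // illustrative ceiling, not an API limit
): Promise<ProcessingResponse> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(
      `${apiBase}/documents/${encodeURIComponent(documentId)}/process-optimized-agentic-rag`,
      {
        method: "POST",
        headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
        body: JSON.stringify({ strategy: "optimized_agentic_rag", options }),
        signal: controller.signal,
      },
    );
    if (!res.ok) throw new Error(`Processing request failed with HTTP ${res.status}`);
    return (await res.json()) as ProcessingResponse;
  } finally {
    clearTimeout(timer); // stop the timer whether the call succeeded, failed or aborted
  }
}
```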

GET /documents/:id/download

Download the processed PDF report.

Path Parameters:

  • id (string, required) - Document ID (UUID)

Response:

  • 200 OK - PDF file stream
  • Content-Type: application/pdf
  • Content-Disposition: attachment; filename="cim_report.pdf"

Error Responses:

  • 401 Unauthorized - Missing or invalid authentication
  • 404 Not Found - Document or PDF not found
  • 500 Internal Server Error - Download failed
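To save the report under the name the server suggests, a client has to read the filename from Content-Disposition. This is a minimal, hypothetical parser that handles only the plain filename= form shown above. It falls back to cim_report.pdf for anything else, including the RFC 6266 filename*= form. It also strips directory components, so a crafted header cannot direct the write elsewhere.

```typescript
// Pull the filename out of a header like: attachment; filename="cim_report.pdf"
// Handles plain `filename=` (quoted or bare) only, not the RFC 6266 `filename*=` form.
function filenameFromContentDisposition(header: string | null, fallback = "cim_report.pdf"): string {
  if (!header) return fallback;
  const match = /filename="([^"]+)"|filename=([^;]+)/i.exec(header);
  const raw = match ? (match[1] ?? match[2]).trim() : "";
  // Keep only the last path segment so a crafted header cannot point outside the target folder.
  const base = raw.split(/[\\/]/).pop() ?? "";
  return base !== "" && base !== "." && base !== ".." ? base : fallback;
}
```

Suppose the download uses the global fetch in Node.js 18+. In that case the file could then be written with writeFile from node:fs/promises, passing filenameFromContentDisposition(res.headers.get("content-disposition")) as the path and Buffer.from(await res.arrayBuffer()) as the data.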

DELETE /documents/:id

Delete a document and all associated data.

Path Parameters:

  • id (string, required) - Document ID (UUID)

Response:

{
  "success": true,
  "message": "Document deleted successfully",
  "correlationId": "req-789"
}

Error Responses:

  • 401 Unauthorized - Missing or invalid authentication
  • 404 Not Found - Document not found
  • 500 Internal Server Error - Deletion failed

Analytics & Monitoring

GET /documents/analytics

Get processing analytics for the current user.

Query Parameters:

  • days (number, optional) - Number of days to analyze (default: 30)

Response:

{
  "success": true,
  "analytics": {
    "totalDocuments": 150,
    "processingSuccessRate": 0.95,
    "averageProcessingTime": 180000,
    "totalApiCalls": 3750,
    "estimatedCost": 45.50,
    "documentsByStatus": {
      "completed": 142,
      "processing": 5,
      "failed": 3
    },
    "processingTrends": [
      {
        "date": "2024-12-20",
        "documentsProcessed": 8,
        "averageTime": 175000
      }
    ]
  },
  "correlationId": "req-789"
}
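The analytics payload supports a few derived figures that are easy to get wrong by hand. This sketch types the response exactly as shown in the example above. It treats averageProcessingTime as milliseconds, which is an assumption. It leaves estimatedCost in whatever currency the server uses, since the guide does not state one. ProcessingAnalytics and summarizeAnalytics are hypothetical names.

```typescript
// Response shape of GET /documents/analytics, transcribed from the example above.
interface ProcessingAnalytics {
  totalDocuments: number;
  processingSuccessRate: number;
  averageProcessingTime: number; // assumed milliseconds
  totalApiCalls: number;
  estimatedCost: number; // currency not specified by the API
  documentsByStatus: Record<string, number>;
  processingTrends: { date: string; documentsProcessed: number; averageTime: number }[];
}

interface AnalyticsSummary {
  costPerDocument: number;
  apiCallsPerDocument: number;
  statusCountsMatchTotal: boolean;
}

// Derive per-document figures and sanity-check the status breakdown.
function summarizeAnalytics(a: ProcessingAnalytics): AnalyticsSummary {
  const perDoc = (value: number) => (a.totalDocuments > 0 ? value / a.totalDocuments : 0);
  const counted = Object.values(a.documentsByStatus).reduce((sum, n) => sum + n, 0);
  return {
    costPerDocument: perDoc(a.estimatedCost),
    apiCallsPerDocument: perDoc(a.totalApiCalls),
    statusCountsMatchTotal: counted === a.totalDocuments,
  };
}
```

With the example data, 3750 API calls over 150 documents gives 25 calls per document. That matches the apiCalls value in the processing response. The status counts (142 + 5 + 3) add up to totalDocuments.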

GET /documents/processing-stats

Get real-time processing statistics.

Response:

{
  "success": true,
  "stats": {
    "totalDocuments": 150,
    "documentAiAgenticRagSuccess": 142,
    "averageProcessingTime": {
      "documentAiAgenticRag": 180000
    },
    "averageApiCalls": {
      "documentAiAgenticRag": 25
    },
    "activeProcessing": 3,
    "queueLength": 2
  },
  "correlationId": "req-789"
}

GET /documents/:id/agentic-rag-sessions

Get agentic RAG processing sessions for a document.

Path Parameters:

  • id (string, required) - Document ID (UUID)

Response:

{
  "success": true,
  "sessions": [
    {
      "id": "session-123",
      "strategy": "optimized_agentic_rag",
      "status": "completed",
      "totalAgents": 6,
      "completedAgents": 6,
      "failedAgents": 0,
      "overallValidationScore": 0.92,
      "processingTimeMs": 180000,
      "apiCallsCount": 25,
      "totalCost": 0.35,
      "createdAt": "2024-12-20T10:30:00Z",
      "completedAt": "2024-12-20T10:33:00Z"
    }
  ],
  "correlationId": "req-789"
}
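A document can have several sessions, so a client usually wants the most recent clean run. This sketch picks the newest session with status "completed" and no failed agents. It also computes wall-clock time from createdAt and completedAt. In the example that gives 180000 ms, matching processingTimeMs. It assumes completedAt is null while a session is still running, which the guide does not state. latestCleanSession and wallClockMs are hypothetical helpers.

```typescript
// Session shape from GET /documents/:id/agentic-rag-sessions.
interface AgenticRagSession {
  id: string;
  strategy: string;
  status: string;
  totalAgents: number;
  completedAgents: number;
  failedAgents: number;
  overallValidationScore: number;
  processingTimeMs: number;
  apiCallsCount: number;
  totalCost: number;
  createdAt: string;
  completedAt: string | null; // ASSUMPTION: null while a session is still running
}

// Most recent session that completed with no failed agents, if any.
function latestCleanSession(sessions: AgenticRagSession[]): AgenticRagSession | undefined {
  return sessions
    .filter((s) => s.status === "completed" && s.failedAgents === 0)
    .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt))[0];
}

// Wall-clock duration from the timestamps; useful to cross-check processingTimeMs.
function wallClockMs(s: AgenticRagSession): number | undefined {
  return s.completedAt ? Date.parse(s.completedAt) - Date.parse(s.createdAt) : undefined;
}
```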

Monitoring Endpoints

GET /monitoring/upload-metrics

Get upload metrics for a specified time period.

Query Parameters:

  • hours (number, required) - Number of hours to analyze (1-168)

Response:

{
  "success": true,
  "data": {
    "totalUploads": 45,
    "successfulUploads": 43,
    "failedUploads": 2,
    "successRate": 0.956,
    "averageFileSize": 2500000,
    "totalDataTransferred": 112500000,
    "uploadTrends": [
      {
        "hour": "2024-12-20T10:00:00Z",
        "uploads": 8,
        "successRate": 1.0
      }
    ]
  },
  "correlationId": "req-789"
}
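The hours parameter is required and limited to 1-168, which is up to one week. Validating it on the client gives a clearer error than a 400 from the server. This sketch additionally assumes hours must be a whole number, which the guide does not say. uploadMetricsPath is a hypothetical helper.

```typescript
// Build the path for GET /monitoring/upload-metrics.
// ASSUMPTION: `hours` must be a whole number; the guide only states the 1-168 range.
function uploadMetricsPath(hours: number): string {
  if (!Number.isInteger(hours) || hours < 1 || hours > 168) {
    throw new RangeError(`hours must be a whole number from 1 to 168, got ${hours}`);
  }
  return `/monitoring/upload-metrics?hours=${hours}`;
}
```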

GET /monitoring/upload-health

Get upload pipeline health status.

Response:

{
  "success": true,
  "data": {
    "status": "healthy",
    "successRate": 0.956,
    "averageResponseTime": 1500,
    "errorRate": 0.044,
    "activeConnections": 12,
    "lastError": null,
    "lastErrorTime": null,
    "uptime": 86400000
  },
  "correlationId": "req-789"
}

GET /monitoring/real-time-stats

Get real-time upload statistics.

Response:

{
  "success": true,
  "data": {
    "currentUploads": 3,
    "queueLength": 2,
    "processingRate": 8.5,
    "averageProcessingTime": 180000,
    "memoryUsage": 45.2,
    "cpuUsage": 23.1,
    "activeUsers": 15,
    "systemLoad": 0.67
  },
  "correlationId": "req-789"
}

Vector Database Endpoints

GET /vector/document-chunks/:documentId

Get document chunks for a specific document.

Path Parameters:

  • documentId (string, required) - Document ID (UUID)

Response:

{
  "success": true,
  "chunks": [
    {
      "id": "chunk-123",
      "content": "Document chunk content...",
      "embedding": [0.1, 0.2, 0.3, ...],
      "metadata": {
        "sectionType": "financial",
        "confidence": 0.95
      },
      "createdAt": "2024-12-20T10:30:00Z"
    }
  ],
  "correlationId": "req-789"
}
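Each chunk carries its full embedding, which is 1536 dimensions according to the /vector/stats example. Responses can therefore be large, and callers often need only the chunks from one section. This sketch filters by metadata.sectionType and a confidence floor, best first. It treats both metadata fields as optional, which is an assumption, and the 0.8 default threshold is arbitrary. chunksForSection is a hypothetical helper.

```typescript
// Chunk shape from GET /vector/document-chunks/:documentId.
interface DocumentChunk {
  id: string;
  content: string;
  embedding: number[];
  metadata: { sectionType?: string; confidence?: number; [key: string]: unknown };
  createdAt: string;
}

// Keep chunks of one section type at or above a confidence floor, highest confidence first.
// The 0.8 default is an arbitrary example threshold.
function chunksForSection(
  chunks: DocumentChunk[],
  sectionType: string,
  minConfidence = 0.8,
): DocumentChunk[] {
  const confidence = (c: DocumentChunk) => c.metadata.confidence ?? 0;
  return chunks
    .filter((c) => c.metadata.sectionType === sectionType && confidence(c) >= minConfidence)
    .sort((a, b) => confidence(b) - confidence(a));
}
```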

GET /vector/analytics

Get search analytics for the current user.

Query Parameters:

  • days (number, optional) - Number of days to analyze (default: 30)

Response:

{
  "success": true,
  "analytics": {
    "totalSearches": 125,
    "averageSearchTime": 250,
    "searchSuccessRate": 0.98,
    "popularQueries": [
      "financial performance",
      "market analysis",
      "management team"
    ],
    "searchTrends": [
      {
        "date": "2024-12-20",
        "searches": 8,
        "averageTime": 245
      }
    ]
  },
  "correlationId": "req-789"
}

GET /vector/stats

Get vector database statistics.

Response:

{
  "success": true,
  "stats": {
    "totalChunks": 1500,
    "totalDocuments": 150,
    "averageChunkSize": 4000,
    "embeddingDimensions": 1536,
    "indexSize": 2500000,
    "queryPerformance": {
      "averageQueryTime": 250,
      "cacheHitRate": 0.85
    }
  },
  "correlationId": "req-789"
}

🚨 Error Handling

Standard Error Response Format

All error responses follow this format:

{
  "success": false,
  "error": "Error message description",
  "errorCode": "ERROR_CODE",
  "correlationId": "req-789",
  "details": {
    "field": "Additional error details"
  }
}
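Since every error body shares this shape, a single type guard lets client code branch on errorCode safely. Here is a sketch in which details is treated as optional, an assumption because the guide shows it only as an example. ApiError, isApiError and describeApiError are hypothetical names.

```typescript
interface ApiError {
  success: false;
  error: string;
  errorCode: string;
  correlationId: string;
  details?: Record<string, unknown>; // ASSUMPTION: may be omitted
}

// Narrow an unknown response body to the standard error format.
function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.success === false && typeof b.error === "string" && typeof b.errorCode === "string";
}

// One-line description that keeps the correlation ID for support requests.
function describeApiError(e: ApiError): string {
  return `${e.errorCode}: ${e.error} (correlationId ${e.correlationId})`;
}
```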

Common Error Codes

400 Bad Request

  • INVALID_INPUT - Invalid request parameters
  • MISSING_REQUIRED_FIELD - Required field is missing
  • INVALID_FILE_TYPE - Unsupported file type
  • FILE_TOO_LARGE - File size exceeds limit

401 Unauthorized

  • MISSING_TOKEN - Authentication token is missing
  • INVALID_TOKEN - Authentication token is invalid
  • EXPIRED_TOKEN - Authentication token has expired

404 Not Found

  • DOCUMENT_NOT_FOUND - Document does not exist
  • SESSION_NOT_FOUND - Processing session not found
  • FILE_NOT_FOUND - File does not exist

429 Too Many Requests

  • RATE_LIMIT_EXCEEDED - Request rate limit exceeded (see Rate Limiting)

500 Internal Server Error

  • PROCESSING_FAILED - Document processing failed
  • STORAGE_ERROR - File storage operation failed
  • DATABASE_ERROR - Database operation failed
  • EXTERNAL_SERVICE_ERROR - External service unavailable

Error Recovery Strategies

Retry Logic

  • Transient Errors: Automatically retry with exponential backoff
  • Rate Limiting: Respect rate limits and implement backoff
  • Service Unavailable: Retry with increasing delays
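The guide does not say whether these retries happen in the server, the client, or both. For a client, the following is a minimal sketch of exponential backoff with full jitter. It assumes 429 and 5xx responses are worth retrying and other 4xx responses are not. Thrown network errors are passed through rather than retried. isRetryable, backoffDelayMs and withRetry are hypothetical helpers. Use care with non-idempotent calls such as the processing trigger, where retrying after a 500 could start the work twice.

```typescript
// ASSUMPTION: 429 and 5xx responses are transient; other 4xx responses are not retried.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

// Exponential backoff with "full jitter": a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Re-send until the status is not retryable or attempts run out.
// A server hint (e.g. retryAfter in seconds from a 429 body) overrides the computed delay.
async function withRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxAttempts = 5,
  retryAfterSeconds: (res: T) => number | undefined = () => undefined,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (!isRetryable(res.status) || attempt + 1 >= maxAttempts) return res;
    const hint = retryAfterSeconds(res);
    await sleep(hint !== undefined ? hint * 1000 : backoffDelayMs(attempt));
  }
}
```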

Fallback Strategies

  • Primary Strategy: Optimized agentic RAG processing
  • Fallback Strategy: Basic processing without advanced features
  • Degradation Strategy: Simple text extraction only

📊 Rate Limiting

Limits

  • Upload Endpoints: 10 requests per minute per user
  • Processing Endpoints: 5 requests per minute per user
  • Analytics Endpoints: 30 requests per minute per user
  • Download Endpoints: 20 requests per minute per user

Rate Limit Headers

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1640000000

Rate Limit Exceeded Response (HTTP 429)

{
  "success": false,
  "error": "Rate limit exceeded",
  "errorCode": "RATE_LIMIT_EXCEEDED",
  "retryAfter": 60,
  "correlationId": "req-789"
}
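The X-RateLimit-Reset value in the example (1640000000) looks like a Unix timestamp in seconds. This sketch assumes so and computes how long to wait before sending another request. Header names are looked up in lowercase, the form in which Node.js reports incoming headers. msUntilReset is a hypothetical helper.

```typescript
// ASSUMPTION: X-RateLimit-Reset is a Unix timestamp in seconds.
// Returns 0 if requests remain, the wait in ms if the window is exhausted,
// or undefined when either header is missing or not numeric.
function msUntilReset(
  headers: Record<string, string | undefined>,
  nowMs: number = Date.now(),
): number | undefined {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const reset = Number(headers["x-ratelimit-reset"]);
  if (!Number.isFinite(remaining) || !Number.isFinite(reset)) return undefined;
  if (remaining > 0) return 0;
  return Math.max(0, reset * 1000 - nowMs);
}
```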

📋 Usage Examples

Complete Document Processing Workflow

1. Get Upload URL

curl -X POST http://localhost:5001/api/documents/upload-url \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "sample_cim.pdf",
    "fileType": "application/pdf",
    "fileSize": 2500000
  }'

2. Upload File to GCS

curl -X PUT "<upload_url>" \
  -H "Content-Type: application/pdf" \
  --upload-file sample_cim.pdf

3. Confirm Upload

curl -X POST http://localhost:5001/api/documents/doc-123/confirm-upload \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "uploads/user-123/doc-123/sample_cim.pdf",
    "fileSize": 2500000,
    "fileName": "sample_cim.pdf"
  }'

4. Trigger AI Processing

curl -X POST http://localhost:5001/api/documents/doc-123/process-optimized-agentic-rag \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "optimized_agentic_rag",
    "options": {
      "enableSemanticChunking": true,
      "enableMetadataEnrichment": true
    }
  }'

5. Download PDF Report

curl -X GET http://localhost:5001/api/documents/doc-123/download \
  -H "Authorization: Bearer <firebase_jwt_token>" \
  --output cim_report.pdf

JavaScript/TypeScript Examples

Document Upload and Processing

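import axios from 'axios';
import { readFile } from 'fs/promises';

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module")
const API_BASE = 'http://localhost:5001/api';
const AUTH_TOKEN = 'firebase_jwt_token'; // replace with a real Firebase ID token

// Read the PDF to upload (Node.js)
const fileBuffer = await readFile('sample_cim.pdf');

// Get upload URL
const uploadUrlResponse = await axios.post(`${API_BASE}/documents/upload-url`, {
  fileName: 'sample_cim.pdf',
  fileType: 'application/pdf',
  fileSize: fileBuffer.length
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

const { uploadUrl, filePath } = uploadUrlResponse.data;

// Upload file to GCS (the signed URL authorizes this request; no Bearer token)
await axios.put(uploadUrl, fileBuffer, {
  headers: { 'Content-Type': 'application/pdf' }
});

// Placeholder: the upload-url response does not return a document ID,
// so set this to the ID of the document being confirmed (e.g. 'doc-123')
const documentId = '<document_id>';

// Confirm upload
await axios.post(`${API_BASE}/documents/${documentId}/confirm-upload`, {
  filePath,
  fileSize: fileBuffer.length,
  fileName: 'sample_cim.pdf'
}, {
  headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
});

// Trigger AI processing (may run for minutes; see processingTime in the response)
const processingResponse = await axios.post(
  `${API_BASE}/documents/${documentId}/process-optimized-agentic-rag`,
  {
    strategy: 'optimized_agentic_rag',
    options: {
      enableSemanticChunking: true,
      enableMetadataEnrichment: true
    }
  },
  {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  }
);

console.log('Processing result:', processingResponse.data);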

Error Handling

try {
  const response = await axios.post(`${API_BASE}/documents/upload-url`, {
    fileName: 'sample_cim.pdf',
    fileType: 'application/pdf',
    fileSize: 2500000
  }, {
    headers: { Authorization: `Bearer ${AUTH_TOKEN}` }
  });
  
  console.log('Upload URL:', response.data.uploadUrl);
} catch (error) {
  // axios.isAxiosError narrows `error`, which TypeScript types as unknown
  if (axios.isAxiosError(error) && error.response) {
    const { status, data } = error.response;
    
    switch (status) {
      case 400:
        console.error('Bad request:', data.error);
        break;
      case 401:
        console.error('Authentication failed:', data.error);
        break;
      case 429:
        console.error('Rate limit exceeded, retry after:', data.retryAfter, 'seconds');
        break;
      case 500:
        console.error('Server error:', data.error, 'correlationId:', data.correlationId);
        break;
      default:
        console.error('Unexpected error:', data.error);
    }
  } else {
    console.error('No response received:', error instanceof Error ? error.message : error);
  }
}

🔍 Monitoring and Debugging

Correlation IDs

Every JSON response includes a correlationId for request tracking:

{
  "success": true,
  "data": { ... },
  "correlationId": "req-789"
}

Request Logging

Include the correlation ID in client logs so individual requests can be traced when debugging:

// `logger` stands for your application's structured logger (for example, a winston logger)
logger.info('API request', {
  correlationId: response.data.correlationId,
  endpoint: '/documents/upload-url',
  userId: 'user-123'
});

Health Checks

Check upload pipeline health (like other JSON responses, the result carries a correlationId):

curl -X GET http://localhost:5001/api/monitoring/upload-health \
  -H "Authorization: Bearer <firebase_jwt_token>"

This guide covers authentication, endpoints, error handling, rate limiting, and usage examples for integrating with the CIM Document Processor API.