Troubleshooting Guide

Complete Problem Resolution for CIM Document Processor

🎯 Overview

This guide provides comprehensive troubleshooting procedures for common issues in the CIM Document Processor, including diagnostic steps, solutions, and prevention strategies.


🔍 Diagnostic Procedures

System Health Check

Quick Health Assessment

# Check application health
curl -f http://localhost:5000/health

# Check database connectivity
curl -f http://localhost:5000/api/documents

# Check authentication service
curl -f http://localhost:5000/api/auth/status

Comprehensive Health Check

// utils/diagnostics.ts
export const runSystemDiagnostics = async () => {
  const diagnostics = {
    timestamp: new Date().toISOString(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth(),
      ai: await checkAIHealth()
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
      uptime: process.uptime()
    }
  };
  
  return diagnostics;
};
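
The `check*Health` helpers called above are not shown in this guide. As a rough sketch of one way they could be built, the wrapper below runs any async probe with a timeout and reports status and latency without throwing. The names `probeService` and `HealthResult` are hypothetical and not part of the project.

```typescript
// Hypothetical result shape; the guide does not define what the check*Health helpers return.
interface HealthResult {
  name: string;
  status: 'healthy' | 'unhealthy';
  latencyMs: number;
  error?: string;
}

// Runs one probe and never throws, so a single failing service
// cannot prevent the remaining diagnostics from being collected.
const probeService = async (
  name: string,
  probe: () => Promise<unknown>,
  timeoutMs = 3000
): Promise<HealthResult> => {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;

  try {
    await Promise.race([
      probe(),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${name} probe timed out`)), timeoutMs);
      })
    ]);
    return { name, status: 'healthy', latencyMs: Date.now() - start };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { name, status: 'unhealthy', latencyMs: Date.now() - start, error: message };
  } finally {
    clearTimeout(timer);
  }
};
```

A database check could then plausibly be written as `probeService('database', () => pool.query('SELECT 1'))`, assuming a node-postgres pool is in scope.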

🚨 Common Issues and Solutions

Authentication Issues

Problem: User cannot log in

Symptoms:

  • Login form shows "Invalid credentials"
  • Firebase authentication errors
  • Token validation failures

Diagnostic Steps:

  1. Check Firebase project configuration
  2. Verify authentication tokens
  3. Check network connectivity to Firebase
  4. Review authentication logs
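
Step 1 can be partly automated. The sketch below reports which of the three environment variables read by the Firebase configuration in this section are unset or blank. The function name `findMissingFirebaseVars` is an assumption made for illustration.

```typescript
// Variable names taken from the Firebase configuration in this section.
const REQUIRED_FIREBASE_VARS = [
  'FIREBASE_API_KEY',
  'FIREBASE_AUTH_DOMAIN',
  'FIREBASE_PROJECT_ID'
] as const;

// Returns the names of variables that are unset or contain only whitespace.
// `env` is passed in (rather than read from process.env here) to keep the check testable.
const findMissingFirebaseVars = (env: Record<string, string | undefined>): string[] =>
  REQUIRED_FIREBASE_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
```

Calling `findMissingFirebaseVars(process.env)` at startup and logging the result would surface a misconfigured deployment before the first login attempt fails.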

Solutions:

// Check Firebase configuration
const firebaseConfig = {
  apiKey: process.env.FIREBASE_API_KEY,
  authDomain: process.env.FIREBASE_AUTH_DOMAIN,
  projectId: process.env.FIREBASE_PROJECT_ID
};

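// Verify token validation (requires: import * as admin from 'firebase-admin')
const verifyToken = async (token: string) => {
  try {
    const decodedToken = await admin.auth().verifyIdToken(token);
    return { valid: true, user: decodedToken };
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Token verification failed', { error: message });
    return { valid: false, error: message };
  }
};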

Prevention:

  • Regular Firebase configuration validation
  • Token refresh mechanism
  • Proper error handling in authentication flow

Problem: Token expiration issues

Symptoms:

  • Users logged out unexpectedly
  • API requests returning 401 errors
  • Authentication state inconsistencies

Solutions:

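// Implement token refresh
const refreshToken = async (refreshToken: string) => {
  try {
    // Same web API key as in the Firebase configuration
    const apiKey = process.env.FIREBASE_API_KEY;
    const response = await fetch(`https://securetoken.googleapis.com/v1/token?key=${apiKey}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        grant_type: 'refresh_token',
        refresh_token: refreshToken
      })
    });

    const data = await response.json();
    // fetch resolves on non-2xx responses too, so check before reading the token
    if (!response.ok) {
      return { success: false, error: data?.error?.message ?? `HTTP ${response.status}` };
    }
    return { success: true, token: data.id_token };
  } catch (error) {
    return { success: false, error: error instanceof Error ? error.message : String(error) };
  }
};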
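
Firebase ID tokens expire after one hour, so unexpected logouts and 401 responses can often be avoided by refreshing shortly before expiry instead of after a failure. The following is a minimal sketch of such a check; `readTokenExpiry` and `shouldRefresh` are hypothetical helper names, and the five-minute margin is an arbitrary assumption.

```typescript
// Reads the `exp` claim (seconds since epoch) from a JWT WITHOUT verifying the signature.
// This is only suitable for scheduling a refresh, never for authorization decisions.
const readTokenExpiry = (token: string): number | null => {
  const segments = token.split('.');
  if (segments.length !== 3) return null;

  try {
    // JWT segments are base64url encoded; convert to standard base64 before decoding.
    const base64 = segments[1].replace(/-/g, '+').replace(/_/g, '/');
    const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, '=');
    const payload = JSON.parse(atob(padded));
    return typeof payload.exp === 'number' ? payload.exp : null;
  } catch {
    return null;
  }
};

// True when the token expires within `skewSeconds`, or when its expiry cannot be read.
const shouldRefresh = (token: string, nowMs: number, skewSeconds = 300): boolean => {
  const exp = readTokenExpiry(token);
  if (exp === null) return true;
  return exp * 1000 - nowMs <= skewSeconds * 1000;
};
```

A client could call `shouldRefresh(idToken, Date.now())` before each API request and run the refresh flow only when it returns `true`.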

Document Upload Issues

Problem: File upload fails

Symptoms:

  • Upload progress stops
  • Error messages about file size or type
  • Storage service errors

Diagnostic Steps:

  1. Check file size and type validation
  2. Verify Firebase Storage configuration
  3. Check network connectivity
  4. Review storage permissions

Solutions:

// Enhanced file validation
const validateFile = (file: File) => {
  const maxSize = 100 * 1024 * 1024; // 100MB
  const allowedTypes = ['application/pdf', 'application/msword', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
  
  if (file.size > maxSize) {
    return { valid: false, error: 'File too large' };
  }
  
  if (!allowedTypes.includes(file.type)) {
    return { valid: false, error: 'Invalid file type' };
  }
  
  return { valid: true };
};

// Storage error handling
const uploadWithRetry = async (file: File, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await uploadToStorage(file);
      return result;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
};

Problem: Upload progress stalls

Symptoms:

  • Progress bar stops advancing
  • No error messages
  • Upload appears to hang

Solutions:

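// Implement upload timeout
const uploadWithTimeout = async (file: File, timeoutMs = 300000) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Upload timeout')), timeoutMs);
  });

  try {
    // Note: the race only stops waiting; it does not cancel the underlying upload
    return await Promise.race([uploadToStorage(file), timeoutPromise]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind after a successful upload
  }
};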

// Add progress monitoring
const monitorUploadProgress = (uploadTask: any, onProgress: (progress: number) => void) => {
  uploadTask.on('state_changed', 
    (snapshot: any) => {
      const progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
      onProgress(progress);
    },
    (error: any) => {
      console.error('Upload error:', error);
    },
    () => {
      onProgress(100);
    }
  );
};
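
Because a stalled upload produces no error, the progress callback alone cannot detect it. One possible approach, sketched here with a hypothetical `createStallDetector` helper, is to record when the byte count last increased and flag the upload once no new bytes have arrived for a set interval.

```typescript
// Tracks the last time the transferred byte count increased.
// The clock is injectable so the logic can be tested without real delays.
const createStallDetector = (stallMs: number, now: () => number = Date.now) => {
  let lastBytes = 0;
  let lastProgressAt = now();

  return {
    // Call from the 'state_changed' handler with snapshot.bytesTransferred.
    record(bytesTransferred: number): void {
      if (bytesTransferred > lastBytes) {
        lastBytes = bytesTransferred;
        lastProgressAt = now();
      }
    },
    // True when no new bytes have arrived for at least stallMs.
    isStalled(): boolean {
      return now() - lastProgressAt >= stallMs;
    }
  };
};
```

In practice the detector would be polled on an interval while the upload runs, and a stalled upload could then be cancelled and retried.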

Document Processing Issues

Problem: Document processing fails

Symptoms:

  • Documents stuck in "processing" status
  • AI processing errors
  • PDF generation failures

Diagnostic Steps:

  1. Check Document AI service status
  2. Verify LLM API credentials
  3. Review processing logs
  4. Check system resources
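
When reviewing documents stuck in "processing", it helps to separate documents that are merely slow from those that have stopped. The sketch below filters records by age; the `DocumentRecord` shape, its field names, and the 15-minute default are assumptions for illustration, not the project's schema.

```typescript
// Hypothetical record shape; field names are assumptions.
interface DocumentRecord {
  id: string;
  status: 'uploaded' | 'processing' | 'completed' | 'failed';
  updatedAt: string; // ISO 8601 timestamp of the last status change
}

// Returns documents that have been in "processing" for longer than maxAgeMs.
const findStuckDocuments = (
  docs: DocumentRecord[],
  nowMs: number,
  maxAgeMs = 15 * 60 * 1000
): DocumentRecord[] =>
  docs.filter(
    (doc) => doc.status === 'processing' && nowMs - Date.parse(doc.updatedAt) > maxAgeMs
  );
```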

Solutions:

// Enhanced error handling for Document AI
// Caught values are `unknown` in strict TypeScript, so narrow before reading `.message`
const errorMessage = (e: unknown) => (e instanceof Error ? e.message : String(e));

const processWithFallback = async (document: Document) => {
  try {
    // Try Document AI first
    return await processWithDocumentAI(document);
  } catch (error) {
    logger.warn('Document AI failed, trying fallback', { error: errorMessage(error) });

    // Fallback to local processing
    try {
      return await processWithLocalParser(document);
    } catch (fallbackError) {
      logger.error('Both Document AI and fallback failed', {
        documentAIError: errorMessage(error),
        fallbackError: errorMessage(fallbackError)
      });
      throw new Error('Document processing failed');
    }
  }
};

// LLM service error handling
const callLLMWithRetry = async (prompt: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await callLLM(prompt);
      return response;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      
      // Exponential backoff
      const delay = Math.pow(2, attempt) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
};
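
The retry loops in this guide all follow the same pattern, so they could be folded into one helper. The sketch below is one possible version, not the project's implementation: `withRetry` and `RetryOptions` are hypothetical names, and it adds two things the loops above lack, random jitter (so many clients do not retry in lockstep) and an `isRetryable` hook (so permanent failures such as invalid credentials are not retried).

```typescript
interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  // Decides whether an error is worth retrying; by default every error is retried.
  isRetryable?: (error: unknown) => boolean;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const withRetry = async <T>(
  operation: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> => {
  const { maxRetries = 3, baseDelayMs = 1000, isRetryable = () => true } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on the last attempt or on errors marked as permanent.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;

      // Exponential backoff with full jitter: a random delay in [0, base * 2^attempt).
      const ceiling = baseDelayMs * Math.pow(2, attempt);
      await sleep(Math.random() * ceiling);
    }
  }
};
```

With this helper, the LLM call above could be expressed as `withRetry(() => callLLM(prompt))`.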

Problem: PDF generation fails

Symptoms:

  • PDF generation errors
  • Missing PDF files
  • Generation timeout

Solutions:

// PDF generation with error handling
const generatePDFWithRetry = async (content: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const pdf = await generatePDF(content);
      return pdf;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      
      // Clear browser cache and retry
      await clearBrowserCache();
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }
};

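// Browser resource management
// Restarting the browser frees leaked pages and memory. Assumes Puppeteer and a
// module-level `let browser`; launch() belongs to the puppeteer module, not to a Browser instance.
const clearBrowserCache = async () => {
  try {
    await browser.close();
    browser = await puppeteer.launch();
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Failed to clear browser cache', { error: message });
  }
};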

Database Issues

Problem: Database connection failures

Symptoms:

  • API errors with database connection messages
  • Slow response times
  • Connection pool exhaustion

Diagnostic Steps:

  1. Check Supabase service status
  2. Verify database credentials
  3. Check connection pool settings
  4. Review query performance
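
For step 3, a node-postgres `Pool` exposes the counters `totalCount`, `idleCount`, and `waitingCount`. The sketch below turns them into a coarse state; the function name `classifyPool` and the three state labels are assumptions chosen for illustration.

```typescript
// Subset of the counters that node-postgres exposes on a Pool instance.
interface PoolCounters {
  totalCount: number;   // clients currently created (idle plus checked out)
  idleCount: number;    // clients sitting idle in the pool
  waitingCount: number; // callers queued waiting for a client
}

type PoolState = 'ok' | 'saturated' | 'exhausted';

// `max` should match the pool's configured maximum number of connections.
const classifyPool = (pool: PoolCounters, max: number): PoolState => {
  const allCheckedOut = pool.totalCount >= max && pool.idleCount === 0;
  if (allCheckedOut && pool.waitingCount > 0) return 'exhausted';
  if (allCheckedOut) return 'saturated';
  return 'ok';
};
```

Logging `classifyPool(pool, 20)` periodically would show whether connection errors coincide with callers queuing for a client.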

Solutions:

// Connection pool management
const createConnectionPool = () => {
  return new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 20, // Maximum number of connections
    idleTimeoutMillis: 30000, // Close idle connections after 30 seconds
    connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
  });
};

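// Query timeout handling
const executeQueryWithTimeout = async (query: string, params: any[], timeoutMs = 5000) => {
  const client = await pool.connect();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  try {
    return await Promise.race([
      client.query(query, params),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => {
          timedOut = true;
          reject(new Error('Query timeout'));
        }, timeoutMs);
      })
    ]);
  } finally {
    clearTimeout(timer);
    // On timeout the query may still be running, so destroy the client
    // instead of returning it to the pool
    client.release(timedOut);
  }
};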

Problem: Slow database queries

Symptoms:

  • Long response times
  • Database timeout errors
  • High CPU usage

Solutions:

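// Query optimization
// Placeholder: this function returns the query unchanged. The actual work happens elsewhere:
//   - add indexes on columns used in WHERE, JOIN, and ORDER BY clauses
//   - inspect the query plan with EXPLAIN ANALYZE
//   - paginate large result sets
const optimizeQuery = (query: string) => {
  return query;
};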

// Implement query caching
const queryCache = new Map();

const cachedQuery = async (key: string, queryFn: () => Promise<any>, ttlMs = 300000) => {
  const cached = queryCache.get(key);
  if (cached && Date.now() - cached.timestamp < ttlMs) {
    return cached.data;
  }
  
  const data = await queryFn();
  queryCache.set(key, { data, timestamp: Date.now() });
  return data;
};
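
Pagination is mentioned above without an example. One common technique is keyset pagination, which filters on a cursor column instead of using `OFFSET`. The sketch below builds a parameterized query for it; the `documents` table, its columns, and the function name are illustrative assumptions rather than the project's schema.

```typescript
interface PageQuery {
  text: string;
  values: Array<string | number>;
}

// Keyset pagination: fetch rows created before a cursor instead of using OFFSET,
// so the database does not have to scan and discard the skipped rows.
// Table and column names are illustrative assumptions.
const buildDocumentsPageQuery = (limit: number, beforeCreatedAt?: string): PageQuery => {
  // Clamp the page size to the range 1..100; fall back to 1 for non-finite input.
  const safeLimit = Number.isFinite(limit)
    ? Math.min(Math.max(Math.trunc(limit), 1), 100)
    : 1;

  if (beforeCreatedAt === undefined) {
    return {
      text: 'SELECT id, status, created_at FROM documents ORDER BY created_at DESC LIMIT $1',
      values: [safeLimit]
    };
  }
  return {
    text:
      'SELECT id, status, created_at FROM documents ' +
      'WHERE created_at < $1 ORDER BY created_at DESC LIMIT $2',
    values: [beforeCreatedAt, safeLimit]
  };
};
```

The caller passes the `created_at` of the last row from the previous page as the cursor. If several rows can share a timestamp, a tie-breaking column such as `id` would be needed in both the filter and the ordering.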

Performance Issues

Problem: Slow application response

Symptoms:

  • High response times
  • Timeout errors
  • User complaints about slowness

Diagnostic Steps:

  1. Monitor CPU and memory usage
  2. Check database query performance
  3. Review external service response times
  4. Analyze request patterns
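
Averages hide slow outliers, so request patterns are usually easier to read as percentiles. As a minimal sketch, the hypothetical helper below computes a nearest-rank percentile over a list of request durations.

```typescript
// Nearest-rank percentile over request durations in milliseconds.
const percentile = (durations: number[], p: number): number => {
  if (durations.length === 0) throw new Error('No samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');

  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
};
```

For the samples `[100, 200, 300, 400, 5000]`, the median is 300 ms while the 95th percentile is 5000 ms, which is the kind of gap that points at a slow minority of requests.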

Solutions:

// Performance monitoring
const performanceMiddleware = (req: Request, res: Response, next: NextFunction) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = Date.now() - start;
    
    if (duration > 5000) {
      logger.warn('Slow request detected', {
        method: req.method,
        path: req.path,
        duration,
        userAgent: req.get('User-Agent')
      });
    }
  });
  
  next();
};

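// Implement caching
// Only GET responses with a 2xx status are cached. The key ignores the caller's identity,
// so do not mount this on routes that return user-specific data.
const cacheMiddleware = (ttlMs = 300000) => {
  const cache = new Map<string, { data: unknown; timestamp: number }>();

  return (req: Request, res: Response, next: NextFunction) => {
    if (req.method !== 'GET') return next();

    const key = `${req.path}:${JSON.stringify(req.query)}`;
    const cached = cache.get(key);

    if (cached && Date.now() - cached.timestamp < ttlMs) {
      return res.json(cached.data);
    }

    const originalJson = res.json.bind(res);
    res.json = (data: unknown) => {
      if (res.statusCode >= 200 && res.statusCode < 300) {
        cache.set(key, { data, timestamp: Date.now() });
      }
      return originalJson(data);
    };

    next();
  };
};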

🔧 Debugging Tools

Log Analysis

Structured Logging

// Enhanced logging
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { 
    service: 'cim-processor',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV
  },
  transports: [
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});

Log Analysis Commands

# Find errors in logs
grep -i "error" logs/combined.log | tail -20

# Find slow requests (duration of 5000 ms or more in the JSON log output)
grep -E '"duration":([5-9][0-9]{3}|[0-9]{5,})' logs/combined.log

# Find authentication failures
grep -i "auth.*fail" logs/combined.log

# Monitor real-time logs
tail -f logs/combined.log | grep -E "(error|warn|critical)"
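
Since the logger writes JSON, the log can also be parsed rather than pattern-matched, which avoids false matches on digits elsewhere in a line. The sketch below assumes each line is one JSON object with numeric `duration` and string `path` fields, as logged by the slow-request middleware in this guide; `findSlowRequests` is a hypothetical name.

```typescript
interface SlowRequest {
  path: string;
  duration: number;
}

// Parses JSON log lines and returns entries whose numeric `duration` meets the threshold,
// slowest first. Lines that are not valid JSON are skipped.
const findSlowRequests = (lines: string[], thresholdMs = 5000): SlowRequest[] => {
  const slow: SlowRequest[] = [];

  for (const line of lines) {
    try {
      const entry = JSON.parse(line);
      if (typeof entry?.duration === 'number' && entry.duration >= thresholdMs) {
        slow.push({ path: String(entry.path ?? 'unknown'), duration: entry.duration });
      }
    } catch {
      // Ignore malformed lines rather than aborting the scan.
    }
  }

  return slow.sort((a, b) => b.duration - a.duration);
};
```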

Debug Endpoints

Debug Information Endpoint

// routes/debug.ts
router.get('/debug/info', async (req: Request, res: Response) => {
  const debugInfo = {
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    cpu: process.cpuUsage(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth()
    }
  };
  
  res.json(debugInfo);
});

📋 Troubleshooting Checklist

Pre-Incident Preparation

  • Set up monitoring and alerting
  • Configure structured logging
  • Create runbooks for common issues
  • Establish escalation procedures
  • Document system architecture

During Incident Response

  • Assess impact and scope
  • Check system health endpoints
  • Review recent logs and metrics
  • Identify root cause
  • Implement immediate fix
  • Communicate with stakeholders
  • Monitor system recovery

Post-Incident Review

  • Document incident timeline
  • Analyze root cause
  • Review response effectiveness
  • Update procedures and documentation
  • Implement preventive measures
  • Schedule follow-up review

🛠️ Maintenance Procedures

Regular Maintenance Tasks

Daily Tasks

  • Review system health metrics
  • Check error logs for new issues
  • Monitor performance trends
  • Verify backup systems

Weekly Tasks

  • Review alert effectiveness
  • Analyze performance metrics
  • Update monitoring thresholds
  • Review security logs

Monthly Tasks

  • Performance optimization review
  • Capacity planning assessment
  • Security audit
  • Documentation updates

Preventive Maintenance

System Optimization

// Regular cleanup tasks
const performMaintenance = async () => {
  // Clean up old logs
  await cleanupOldLogs();
  
  // Clear expired cache entries
  await clearExpiredCache();
  
  // Optimize database
  await optimizeDatabase();
  
  // Update system metrics
  await updateSystemMetrics();
};
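
The helpers called in `performMaintenance` are not defined in this guide. As one illustration, expired-entry cleanup for a `Map` cache that stores `{ data, timestamp }` values (the shape used by the query cache earlier in this guide) could look like the sketch below. The name `clearExpiredEntries` and its signature are assumptions.

```typescript
interface CacheEntry<T> {
  data: T;
  timestamp: number; // milliseconds since epoch, set when the entry was stored
}

// Deletes entries at least ttlMs old and returns how many were removed.
const clearExpiredEntries = <T>(
  cache: Map<string, CacheEntry<T>>,
  ttlMs: number,
  nowMs: number = Date.now()
): number => {
  let removed = 0;

  // Deleting the current entry while iterating a Map with forEach is safe in JavaScript.
  cache.forEach((entry, key) => {
    if (nowMs - entry.timestamp >= ttlMs) {
      cache.delete(key);
      removed++;
    }
  });

  return removed;
};
```

Without a sweep like this, a cache that only checks the TTL on read keeps stale entries in memory indefinitely.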

📞 Support and Escalation

Support Levels

Level 1: Basic Support

  • User authentication issues
  • Basic configuration problems
  • Common error messages

Level 2: Technical Support

  • System performance issues
  • Database problems
  • Integration issues

Level 3: Advanced Support

  • Complex system failures
  • Security incidents
  • Architecture problems

Escalation Procedures

Escalation Criteria

  • System downtime > 15 minutes
  • Data loss or corruption
  • Security breaches
  • Performance degradation > 50%
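
These criteria can be encoded so that alerting applies them consistently. The sketch below is illustrative only: the `IncidentSnapshot` shape is hypothetical, and it reads "performance degradation > 50%" as latency more than 1.5 times a known baseline, which is an interpretation the criteria themselves do not specify.

```typescript
// Hypothetical incident snapshot; how these values are measured is an assumption.
interface IncidentSnapshot {
  downtimeMinutes: number;
  dataLossOrCorruption: boolean;
  securityBreach: boolean;
  baselineLatencyMs: number; // typical latency before the incident
  currentLatencyMs: number;
}

// Returns the criteria that were met; an empty list means no escalation is required.
const escalationReasons = (incident: IncidentSnapshot): string[] => {
  const reasons: string[] = [];

  if (incident.downtimeMinutes > 15) reasons.push('downtime > 15 minutes');
  if (incident.dataLossOrCorruption) reasons.push('data loss or corruption');
  if (incident.securityBreach) reasons.push('security breach');

  // Assumption: degradation is measured as latency relative to the baseline.
  if (
    incident.baselineLatencyMs > 0 &&
    incident.currentLatencyMs > incident.baselineLatencyMs * 1.5
  ) {
    reasons.push('performance degradation > 50%');
  }

  return reasons;
};
```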

Escalation Contacts

  • Primary: Operations Team Lead
  • Secondary: System Administrator
  • Emergency: CTO/Technical Director

This comprehensive troubleshooting guide provides the tools and procedures needed to quickly identify and resolve issues in the CIM Document Processor, ensuring high availability and user satisfaction.