# Troubleshooting Guide
## Complete Problem Resolution for CIM Document Processor

### 🎯 Overview

This guide provides comprehensive troubleshooting procedures for common issues in the CIM Document Processor, including diagnostic steps, solutions, and prevention strategies.

---

## 🔍 Diagnostic Procedures

### System Health Check

#### **Quick Health Assessment**
```bash
# Check application health
curl -f http://localhost:5000/health

# Check database connectivity
curl -f http://localhost:5000/api/documents

# Check authentication service
curl -f http://localhost:5000/api/auth/status
```

#### **Comprehensive Health Check**
```typescript
// utils/diagnostics.ts
export const runSystemDiagnostics = async () => {
  const diagnostics = {
    timestamp: new Date().toISOString(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth(),
      ai: await checkAIHealth()
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
      uptime: process.uptime()
    }
  };
  
  return diagnostics;
};
```
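
The `checkDatabaseHealth`, `checkStorageHealth`, `checkAuthHealth`, and `checkAIHealth` helpers are not defined in this guide. One possible way to build them is to wrap each service probe in a shared helper that enforces a timeout and records latency, so that a hung dependency cannot hang the diagnostics call itself. The sketch below is an assumption, not the project's actual implementation; `checkWithTimeout` and `HealthResult` are hypothetical names.

```typescript
// Hypothetical helper: runs any async probe and reports status plus latency.
type HealthResult = {
  status: 'healthy' | 'unhealthy';
  latencyMs: number;
  error?: string;
};

const checkWithTimeout = async (
  probe: () => Promise<unknown>,
  timeoutMs = 3000
): Promise<HealthResult> => {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;

  try {
    // Fail the check if the probe does not settle within timeoutMs.
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs}ms`)), timeoutMs);
    });
    await Promise.race([probe(), timeout]);
    return { status: 'healthy', latencyMs: Date.now() - start };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { status: 'unhealthy', latencyMs: Date.now() - start, error: message };
  } finally {
    clearTimeout(timer); // Do not leave a pending timer behind
  }
};
```

A database check could then be written as `checkWithTimeout(() => pool.query('SELECT 1'))`, assuming a `pool` object with a `query` method is available.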

---

## 🚨 Common Issues and Solutions

### Authentication Issues

#### **Problem**: User cannot log in
**Symptoms**:
- Login form shows "Invalid credentials"
- Firebase authentication errors
- Token validation failures

**Diagnostic Steps**:
1. Check Firebase project configuration
2. Verify authentication tokens
3. Check network connectivity to Firebase
4. Review authentication logs

**Solutions**:
```typescript
import * as admin from 'firebase-admin';

// Check Firebase configuration
const firebaseConfig = {
  apiKey: process.env.FIREBASE_API_KEY,
  authDomain: process.env.FIREBASE_AUTH_DOMAIN,
  projectId: process.env.FIREBASE_PROJECT_ID
};

// Verify token validation (assumes admin.initializeApp() has already been called)
const verifyToken = async (token: string) => {
  try {
    const decodedToken = await admin.auth().verifyIdToken(token);
    return { valid: true, user: decodedToken };
  } catch (error) {
    // Caught values are typed `unknown` under strict TypeScript settings
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Token verification failed', { error: message });
    return { valid: false, error: message };
  }
};
```

**Prevention**:
- Regular Firebase configuration validation
- Token refresh mechanism
- Proper error handling in authentication flow
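
Regular configuration validation can be as simple as checking at startup that every required variable is set. The sketch below assumes the three variable names used in the `firebaseConfig` example; `findMissingFirebaseVars` is a hypothetical helper, and the environment is passed in as an argument so the function can be tested without touching `process.env`.

```typescript
// Variable names taken from the firebaseConfig example; extend the list as needed.
const REQUIRED_FIREBASE_VARS = [
  'FIREBASE_API_KEY',
  'FIREBASE_AUTH_DOMAIN',
  'FIREBASE_PROJECT_ID'
];

// Returns the names of required variables that are unset or blank.
const findMissingFirebaseVars = (
  env: Record<string, string | undefined>
): string[] =>
  REQUIRED_FIREBASE_VARS.filter((name) => (env[name] ?? '').trim() === '');
```

Calling `findMissingFirebaseVars(process.env)` during startup and failing fast when the result is non-empty turns a confusing login error into an explicit configuration error.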

#### **Problem**: Token expiration issues
**Symptoms**:
- Users logged out unexpectedly
- API requests returning 401 errors
- Authentication state inconsistencies

**Solutions**:
```typescript
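// Implement token refresh: exchange a refresh token for a new ID token
const refreshToken = async (refreshTokenValue: string) => {
  const apiKey = process.env.FIREBASE_API_KEY;

  try {
    const response = await fetch(`https://securetoken.googleapis.com/v1/token?key=${apiKey}`, {
      method: 'POST',
      // The Firebase Auth REST API documents this endpoint with a form-encoded body
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'refresh_token',
        refresh_token: refreshTokenValue
      })
    });
    
    const data = await response.json();
    if (!response.ok) {
      // Error responses carry no id_token, so report the failure instead of success
      return { success: false, error: data?.error?.message ?? `HTTP ${response.status}` };
    }
    return { success: true, token: data.id_token };
  } catch (error) {
    return { success: false, error: error instanceof Error ? error.message : String(error) };
  }
};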
```
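
Unexpected logouts and 401 responses often mean the ID token expired before the client refreshed it. Firebase ID tokens are JWTs with an `exp` claim expressed in seconds since the epoch, so a client can read that claim and refresh ahead of time. The following is a minimal sketch; `secondsUntilExpiry` and `shouldRefresh` are hypothetical helpers, and the five-minute margin is an arbitrary choice.

```typescript
// Reads the `exp` claim of a JWT WITHOUT verifying its signature.
// Use this only to schedule refreshes, never to make authorization decisions.
const secondsUntilExpiry = (idToken: string, nowMs = Date.now()): number | null => {
  const payloadPart = idToken.split('.')[1];
  if (!payloadPart) return null;

  try {
    // JWT segments are base64url-encoded; convert to standard base64 for atob.
    const base64 = payloadPart.replace(/-/g, '+').replace(/_/g, '/');
    const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, '=');
    const payload = JSON.parse(atob(padded)) as { exp?: unknown };
    if (typeof payload.exp !== 'number') return null;
    return payload.exp - Math.floor(nowMs / 1000);
  } catch {
    return null; // Malformed token
  }
};

// Refresh when the token is unreadable or fewer than marginSeconds remain.
const shouldRefresh = (idToken: string, marginSeconds = 300, nowMs = Date.now()): boolean => {
  const remaining = secondsUntilExpiry(idToken, nowMs);
  return remaining === null || remaining < marginSeconds;
};
```

Checking `shouldRefresh` before each API call, or on a timer, avoids sending a request that is already certain to fail with a 401.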

### Document Upload Issues

#### **Problem**: File upload fails
**Symptoms**:
- Upload progress stops
- Error messages about file size or type
- Storage service errors

**Diagnostic Steps**:
1. Check file size and type validation
2. Verify Firebase Storage configuration
3. Check network connectivity
4. Review storage permissions

**Solutions**:
```typescript
// Enhanced file validation
const validateFile = (file: File) => {
  const maxSize = 100 * 1024 * 1024; // 100MB
  const allowedTypes = ['application/pdf', 'application/msword', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
  
  if (file.size > maxSize) {
    return { valid: false, error: 'File too large' };
  }
  
  if (!allowedTypes.includes(file.type)) {
    return { valid: false, error: 'Invalid file type' };
  }
  
  return { valid: true };
};

// Storage error handling
const uploadWithRetry = async (file: File, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await uploadToStorage(file);
      return result;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
};
```
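
`file.type` is reported by the client and is usually derived from the file extension, so a renamed file can pass the MIME check above. A complementary check is to inspect the first bytes of the file. The sketch below covers only PDF, whose files begin with the ASCII bytes `%PDF-`; `looksLikePdf` and `fileLooksLikePdf` are hypothetical helpers, not part of the existing codebase.

```typescript
// PDF files begin with the ASCII bytes "%PDF-".
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Pure check on a byte buffer, usable on both client and server.
const looksLikePdf = (bytes: Uint8Array): boolean =>
  bytes.length >= PDF_SIGNATURE.length &&
  PDF_SIGNATURE.every((byte, i) => bytes[i] === byte);

// Reads only the first few bytes of a File or Blob before checking them.
const fileLooksLikePdf = async (file: Blob): Promise<boolean> => {
  const header = new Uint8Array(await file.slice(0, PDF_SIGNATURE.length).arrayBuffer());
  return looksLikePdf(header);
};
```

Running this alongside `validateFile` lets the upload fail early with a clear "Invalid file type" message instead of failing later during processing.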

#### **Problem**: Upload progress stalls
**Symptoms**:
- Progress bar stops advancing
- No error messages
- Upload appears to hang

**Solutions**:
```typescript
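// Implement upload timeout
const uploadWithTimeout = async (file: File, timeoutMs = 300000) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Upload timeout')), timeoutMs);
  });
  
  try {
    // Note: a timeout rejects the caller but does not cancel the underlying upload
    return await Promise.race([uploadToStorage(file), timeoutPromise]);
  } finally {
    clearTimeout(timer); // Avoid leaving a pending timer after the race settles
  }
};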

// Add progress monitoring
const monitorUploadProgress = (uploadTask: any, onProgress: (progress: number) => void) => {
  uploadTask.on('state_changed', 
    (snapshot: any) => {
      const progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
      onProgress(progress);
    },
    (error: any) => {
      console.error('Upload error:', error);
    },
    () => {
      onProgress(100);
    }
  );
};
```
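
A fixed overall timeout cannot distinguish a slow but healthy upload from one that has silently stalled. One way to detect the "no error, no progress" symptom is to track when `bytesTransferred` last increased. The sketch below is a hypothetical helper (`createStallDetector`); the current time is passed in as an argument so the logic can be tested deterministically, and the 30-second default is an arbitrary choice.

```typescript
// Tracks the last time bytesTransferred increased.
const createStallDetector = (stallMs = 30000) => {
  let lastBytes = -1;
  let lastProgressAt = 0;

  return {
    // Call from the 'state_changed' progress callback.
    update(bytesTransferred: number, nowMs = Date.now()): void {
      if (bytesTransferred > lastBytes) {
        lastBytes = bytesTransferred;
        lastProgressAt = nowMs;
      }
    },
    // Call periodically (for example from setInterval) to decide whether to retry.
    // Returns false until update() has been called at least once.
    isStalled(nowMs = Date.now()): boolean {
      return lastBytes >= 0 && nowMs - lastProgressAt > stallMs;
    }
  };
};
```

If `uploadTask` is a Firebase Storage `UploadTask`, a stalled upload can be stopped with `uploadTask.cancel()` and then retried, for example through `uploadWithRetry`.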

### Document Processing Issues

#### **Problem**: Document processing fails
**Symptoms**:
- Documents stuck in "processing" status
- AI processing errors
- PDF generation failures

**Diagnostic Steps**:
1. Check Document AI service status
2. Verify LLM API credentials
3. Review processing logs
4. Check system resources

**Solutions**:
```typescript
// Enhanced error handling for Document AI
const processWithFallback = async (document: Document) => {
  try {
    // Try Document AI first
    const result = await processWithDocumentAI(document);
    return result;
  } catch (error) {
    logger.warn('Document AI failed, trying fallback', { error: error.message });
    
    // Fallback to local processing
    try {
      const result = await processWithLocalParser(document);
      return result;
    } catch (fallbackError) {
      logger.error('Both Document AI and fallback failed', { 
        documentAIError: error.message,
        fallbackError: fallbackError.message 
      });
      throw new Error('Document processing failed');
    }
  }
};

// LLM service error handling
const callLLMWithRetry = async (prompt: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await callLLM(prompt);
      return response;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      
      // Exponential backoff
      const delay = Math.pow(2, attempt) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
};
```
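
Documents stuck in "processing" are easier to handle when they can be listed automatically. The sketch below filters a list of records by status and age; the `DocumentRecord` shape, the `updatedAt` field, and the 30-minute threshold are assumptions and should be adapted to the actual schema.

```typescript
// Hypothetical document shape; adjust field names to your schema.
type DocumentRecord = {
  id: string;
  status: string;
  updatedAt: string; // ISO 8601 timestamp
};

// Returns documents that have been in "processing" for longer than maxAgeMs.
const findStuckDocuments = (
  docs: DocumentRecord[],
  maxAgeMs = 30 * 60 * 1000,
  nowMs = Date.now()
): DocumentRecord[] =>
  docs.filter(
    (doc) => doc.status === 'processing' && nowMs - Date.parse(doc.updatedAt) > maxAgeMs
  );
```

The result can feed a scheduled job that re-queues each stuck document or marks it as failed so users are not left waiting indefinitely.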

#### **Problem**: PDF generation fails
**Symptoms**:
- PDF generation errors
- Missing PDF files
- Generation timeout

**Solutions**:
```typescript
// PDF generation with error handling
const generatePDFWithRetry = async (content: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const pdf = await generatePDF(content);
      return pdf;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      
      // Clear browser cache and retry
      await clearBrowserCache();
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }
};

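// Browser resource management
// Despite its name, this restarts the browser rather than clearing a cache.
// Assumes Puppeteer and a module-level `let browser`: launch() lives on the
// library entry point, not on the Browser instance.
const clearBrowserCache = async () => {
  try {
    await browser.close();
    browser = await puppeteer.launch();
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Failed to restart browser', { error: message });
  }
};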
```

### Database Issues

#### **Problem**: Database connection failures
**Symptoms**:
- API errors with database connection messages
- Slow response times
- Connection pool exhaustion

**Diagnostic Steps**:
1. Check Supabase service status
2. Verify database credentials
3. Check connection pool settings
4. Review query performance

**Solutions**:
```typescript
// Connection pool management
const createConnectionPool = () => {
  return new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 20, // Maximum number of connections
    idleTimeoutMillis: 30000, // Close idle connections after 30 seconds
    connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
  });
};

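// Query timeout handling (assumes `pool` is a node-postgres Pool)
const executeQueryWithTimeout = async (query: string, params: any[], timeoutMs = 5000) => {
  const client = await pool.connect();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;
  
  try {
    return await Promise.race([
      client.query(query, params),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => {
          timedOut = true;
          reject(new Error('Query timeout'));
        }, timeoutMs);
      })
    ]);
  } finally {
    clearTimeout(timer);
    // After a timeout the query may still be running on this connection,
    // so destroy the client instead of returning it to the pool
    client.release(timedOut);
  }
};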
```
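
To confirm connection pool exhaustion rather than guess at it, the pool's own counters can be logged or exposed on a health endpoint. Assuming `Pool` comes from node-postgres (`pg`), a pool instance exposes `totalCount`, `idleCount`, and `waitingCount`. The sketch below interprets those counters; `describePoolPressure` is a hypothetical helper, and `max` should match the value passed to `new Pool`.

```typescript
// Subset of the counters exposed by a node-postgres Pool instance.
type PoolCounters = {
  totalCount: number;   // clients currently held by the pool
  idleCount: number;    // clients not checked out
  waitingCount: number; // queued requests waiting for a client
};

const describePoolPressure = (pool: PoolCounters, max = 20) => {
  const inUse = pool.totalCount - pool.idleCount;
  return {
    inUse,
    utilization: inUse / max,
    // Queued requests while every client is busy indicates exhaustion.
    exhausted: pool.waitingCount > 0 && inUse >= max
  };
};
```

A sustained `exhausted: true` usually points to leaked clients (a missing `release()`) or long-running queries, not to a `max` value that is too low.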

#### **Problem**: Slow database queries
**Symptoms**:
- Long response times
- Database timeout errors
- High CPU usage

**Solutions**:
```typescript
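// Query optimization
// Placeholder: this function returns the query unchanged. The actual work
// happens outside the query string:
//   - add indexes for the columns used in WHERE, JOIN, and ORDER BY clauses
//   - inspect the query plan (EXPLAIN ANALYZE) to find sequential scans
//   - paginate large result sets instead of fetching every row
const optimizeQuery = (query: string) => {
  return query;
};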

// Implement query caching
const queryCache = new Map();

const cachedQuery = async (key: string, queryFn: () => Promise<any>, ttlMs = 300000) => {
  const cached = queryCache.get(key);
  if (cached && Date.now() - cached.timestamp < ttlMs) {
    return cached.data;
  }
  
  const data = await queryFn();
  queryCache.set(key, { data, timestamp: Date.now() });
  return data;
};
```
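
Pagination is often the cheapest fix for slow list queries. The sketch below builds a keyset (cursor-based) page query for PostgreSQL, which avoids the growing cost of large `OFFSET` values. The table name `documents`, the columns `id` and `created_at`, and the helper `buildDocumentsPageQuery` are all hypothetical; substitute the real schema.

```typescript
// Parameterized query in the { text, values } shape accepted by node-postgres.
type PageQuery = { text: string; values: unknown[] };

// Keyset pagination: fetch rows that sort after the last row of the previous page.
const buildDocumentsPageQuery = (
  limit: number,
  cursor?: { createdAt: string; id: string }
): PageQuery => {
  // Clamp the page size to 1..100; fall back to 20 for non-numeric input.
  const safeLimit = Number.isFinite(limit) ? Math.min(Math.max(Math.trunc(limit), 1), 100) : 20;

  if (!cursor) {
    return {
      text: 'SELECT id, created_at FROM documents ORDER BY created_at DESC, id DESC LIMIT $1',
      values: [safeLimit]
    };
  }
  return {
    text:
      'SELECT id, created_at FROM documents ' +
      'WHERE (created_at, id) < ($1, $2) ' +
      'ORDER BY created_at DESC, id DESC LIMIT $3',
    values: [cursor.createdAt, cursor.id, safeLimit]
  };
};
```

For this pattern to be fast, an index covering `(created_at, id)` is needed so the database can seek directly to the cursor position.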

### Performance Issues

#### **Problem**: Slow application response
**Symptoms**:
- High response times
- Timeout errors
- User complaints about slowness

**Diagnostic Steps**:
1. Monitor CPU and memory usage
2. Check database query performance
3. Review external service response times
4. Analyze request patterns

**Solutions**:
```typescript
// Performance monitoring
const performanceMiddleware = (req: Request, res: Response, next: NextFunction) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = Date.now() - start;
    
    if (duration > 5000) {
      logger.warn('Slow request detected', {
        method: req.method,
        path: req.path,
        duration,
        userAgent: req.get('User-Agent')
      });
    }
  });
  
  next();
};

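// Implement caching
// The cache key ignores the caller's identity, so apply this middleware only
// to routes whose responses are the same for every user.
const cacheMiddleware = (ttlMs = 300000) => {
  const cache = new Map<string, { data: unknown; timestamp: number }>();
  
  return (req: Request, res: Response, next: NextFunction) => {
    // Only cache reads; serving a cached POST/PUT/DELETE would skip the handler
    if (req.method !== 'GET') return next();
    
    const key = `${req.path}:${JSON.stringify(req.query)}`;
    const cached = cache.get(key);
    
    if (cached && Date.now() - cached.timestamp < ttlMs) {
      return res.json(cached.data);
    }
    
    const originalJson = res.json.bind(res);
    res.json = (data: unknown) => {
      // Do not cache error responses
      if (res.statusCode >= 200 && res.statusCode < 300) {
        cache.set(key, { data, timestamp: Date.now() });
      }
      return originalJson(data);
    };
    
    next();
  };
};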
```
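
A single slow-request threshold hides how response times are distributed. When analyzing request patterns, percentiles such as p50 and p95 give a clearer picture than an average. The sketch below computes a nearest-rank percentile over a list of durations; `percentile` is a hypothetical helper, and collecting the durations (for example in `performanceMiddleware`) is left to the caller.

```typescript
// Nearest-rank percentile over a list of request durations (milliseconds).
const percentile = (durations: number[], p: number): number => {
  if (durations.length === 0) throw new Error('No samples');
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]');

  // Sort a copy so the caller's array is left untouched.
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
};
```

Comparing p95 before and after a change is a simple way to confirm that an optimization helped the slowest requests and not only the typical ones.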

---

## 🔧 Debugging Tools

### Log Analysis

#### **Structured Logging**
```typescript
// Enhanced logging
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { 
    service: 'cim-processor',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV
  },
  transports: [
    // Written under logs/ so the paths match the log analysis commands in this guide
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});
```

#### **Log Analysis Commands**
```bash
# Find errors in logs
grep -i "error" logs/combined.log | tail -20

# Find slow requests (duration of 5000 ms or more, including 5+ digit values)
grep -E '"duration":([5-9][0-9]{3}|[0-9]{5,})' logs/combined.log

# Find authentication failures
grep -i "auth.*fail" logs/combined.log

# Monitor real-time logs
tail -f logs/combined.log | grep -E "(error|warn|critical)"
```
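
Because the logger writes one JSON object per line, the same analysis can be done in TypeScript with exact numeric comparisons instead of regular expressions. The sketch below assumes the `duration` field logged by `performanceMiddleware`; `parseLogLines` and `findSlowRequests` are hypothetical helpers.

```typescript
type LogEntry = {
  level?: string;
  message?: string;
  duration?: number;
  [key: string]: unknown;
};

// Parses newline-delimited JSON and skips lines that are not JSON objects.
const parseLogLines = (text: string): LogEntry[] => {
  const entries: LogEntry[] = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
        entries.push(parsed as LogEntry);
      }
    } catch {
      // Skip non-JSON lines such as plain-text console output
    }
  }
  return entries;
};

// Keeps only entries whose numeric duration exceeds the threshold.
const findSlowRequests = (entries: LogEntry[], thresholdMs = 5000): LogEntry[] =>
  entries.filter((entry) => typeof entry.duration === 'number' && entry.duration > thresholdMs);
```

The log file contents can be read with `fs.readFileSync(path, 'utf8')` and passed to `parseLogLines`; for very large files, a streaming reader would be a better fit.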

### Debug Endpoints

#### **Debug Information Endpoint**
```typescript
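// routes/debug.ts
// This route exposes runtime details; restrict it to administrators or
// non-production environments.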
router.get('/debug/info', async (req: Request, res: Response) => {
  const debugInfo = {
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    cpu: process.cpuUsage(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth()
    }
  };
  
  res.json(debugInfo);
});
```

---

## 📋 Troubleshooting Checklist

### Pre-Incident Preparation
- [ ] Set up monitoring and alerting
- [ ] Configure structured logging
- [ ] Create runbooks for common issues
- [ ] Establish escalation procedures
- [ ] Document system architecture

### During Incident Response
- [ ] Assess impact and scope
- [ ] Check system health endpoints
- [ ] Review recent logs and metrics
- [ ] Identify root cause
- [ ] Implement immediate fix
- [ ] Communicate with stakeholders
- [ ] Monitor system recovery

### Post-Incident Review
- [ ] Document incident timeline
- [ ] Analyze root cause
- [ ] Review response effectiveness
- [ ] Update procedures and documentation
- [ ] Implement preventive measures
- [ ] Schedule follow-up review

---

## 🛠️ Maintenance Procedures

### Regular Maintenance Tasks

#### **Daily Tasks**
- [ ] Review system health metrics
- [ ] Check error logs for new issues
- [ ] Monitor performance trends
- [ ] Verify backup systems

#### **Weekly Tasks**
- [ ] Review alert effectiveness
- [ ] Analyze performance metrics
- [ ] Update monitoring thresholds
- [ ] Review security logs

#### **Monthly Tasks**
- [ ] Performance optimization review
- [ ] Capacity planning assessment
- [ ] Security audit
- [ ] Documentation updates

### Preventive Maintenance

#### **System Optimization**
```typescript
// Regular cleanup tasks
const performMaintenance = async () => {
  // Clean up old logs
  await cleanupOldLogs();
  
  // Clear expired cache entries
  await clearExpiredCache();
  
  // Optimize database
  await optimizeDatabase();
  
  // Update system metrics
  await updateSystemMetrics();
};
```
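
The maintenance helpers above are not defined in this guide. As one illustration, `clearExpiredCache` could be built on a function like the one below, which removes stale entries from a `Map` that uses the same `{ data, timestamp }` entry shape as `queryCache`. This is a sketch under that assumption; `clearExpiredEntries` is a hypothetical name.

```typescript
type CacheEntry<T> = { data: T; timestamp: number };

// Deletes entries older than ttlMs and returns how many were removed.
const clearExpiredEntries = <T>(
  cache: Map<string, CacheEntry<T>>,
  ttlMs = 300000,
  nowMs = Date.now()
): number => {
  let removed = 0;
  for (const [key, entry] of cache) {
    if (nowMs - entry.timestamp >= ttlMs) {
      cache.delete(key); // Deleting the current key during Map iteration is safe
      removed++;
    }
  }
  return removed;
};
```

Running this on a schedule keeps an in-memory cache from growing without bound, since entries are otherwise replaced only when the same key is requested again.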

---

## 📞 Support and Escalation

### Support Levels

#### **Level 1: Basic Support**
- User authentication issues
- Basic configuration problems
- Common error messages

#### **Level 2: Technical Support**
- System performance issues
- Database problems
- Integration issues

#### **Level 3: Advanced Support**
- Complex system failures
- Security incidents
- Architecture problems

### Escalation Procedures

#### **Escalation Criteria**
- System downtime > 15 minutes
- Data loss or corruption
- Security breaches
- Performance degradation > 50%
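
Encoding these criteria in code lets alerting tools apply them consistently. The sketch below mirrors the four criteria listed above; the `Incident` shape and the `escalationReasons` helper are hypothetical, and the baseline used to measure performance degradation is left to the team to define.

```typescript
// Hypothetical incident shape; thresholds mirror the escalation criteria.
type Incident = {
  downtimeMinutes: number;
  dataLossOrCorruption: boolean;
  securityBreach: boolean;
  performanceDegradationPct: number; // relative to your own baseline
};

// Returns every criterion the incident meets; an empty list means no escalation.
const escalationReasons = (incident: Incident): string[] => {
  const reasons: string[] = [];
  if (incident.downtimeMinutes > 15) reasons.push('System downtime > 15 minutes');
  if (incident.dataLossOrCorruption) reasons.push('Data loss or corruption');
  if (incident.securityBreach) reasons.push('Security breach');
  if (incident.performanceDegradationPct > 50) reasons.push('Performance degradation > 50%');
  return reasons;
};
```

Returning the list of reasons instead of a single boolean gives the person being paged immediate context about why the incident was escalated.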

#### **Escalation Contacts**
- **Primary**: Operations Team Lead
- **Secondary**: System Administrator
- **Emergency**: CTO/Technical Director

---

Use the diagnostics, fixes, and checklists in this guide to identify and resolve issues in the CIM Document Processor quickly and to keep the service available.