Files
cim_summary/TROUBLESHOOTING_GUIDE.md

606 lines
15 KiB
Markdown

# Troubleshooting Guide
## Complete Problem Resolution for CIM Document Processor
### 🎯 Overview
This guide provides comprehensive troubleshooting procedures for common issues in the CIM Document Processor, including diagnostic steps, solutions, and prevention strategies.
---
## 🔍 Diagnostic Procedures
### System Health Check
#### **Quick Health Assessment**
```bash
# Check application health
curl -f http://localhost:5000/health
# Check database connectivity
curl -f http://localhost:5000/api/documents
# Check authentication service
curl -f http://localhost:5000/api/auth/status
```
#### **Comprehensive Health Check**
```typescript
// utils/diagnostics.ts
export const runSystemDiagnostics = async () => {
const diagnostics = {
timestamp: new Date().toISOString(),
services: {
database: await checkDatabaseHealth(),
storage: await checkStorageHealth(),
auth: await checkAuthHealth(),
ai: await checkAIHealth()
},
resources: {
memory: process.memoryUsage(),
cpu: process.cpuUsage(),
uptime: process.uptime()
}
};
return diagnostics;
};
```
---
## 🚨 Common Issues and Solutions
### Authentication Issues
#### **Problem**: User cannot log in
**Symptoms**:
- Login form shows "Invalid credentials"
- Firebase authentication errors
- Token validation failures
**Diagnostic Steps**:
1. Check Firebase project configuration
2. Verify authentication tokens
3. Check network connectivity to Firebase
4. Review authentication logs
**Solutions**:
```typescript
// Check Firebase configuration
const firebaseConfig = {
apiKey: process.env.FIREBASE_API_KEY,
authDomain: process.env.FIREBASE_AUTH_DOMAIN,
projectId: process.env.FIREBASE_PROJECT_ID
};
// Verify token validation
const verifyToken = async (token: string) => {
try {
const decodedToken = await admin.auth().verifyIdToken(token);
return { valid: true, user: decodedToken };
} catch (error) {
logger.error('Token verification failed', { error: error.message });
return { valid: false, error: error.message };
}
};
```
**Prevention**:
- Regular Firebase configuration validation
- Token refresh mechanism
- Proper error handling in authentication flow
#### **Problem**: Token expiration issues
**Symptoms**:
- Users logged out unexpectedly
- API requests returning 401 errors
- Authentication state inconsistencies
**Solutions**:
```typescript
// Implement token refresh
const refreshToken = async (refreshToken: string) => {
try {
const response = await fetch(`https://securetoken.googleapis.com/v1/token?key=${apiKey}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
grant_type: 'refresh_token',
refresh_token: refreshToken
})
});
const data = await response.json();
return { success: true, token: data.id_token };
} catch (error) {
return { success: false, error: error.message };
}
};
```
### Document Upload Issues
#### **Problem**: File upload fails
**Symptoms**:
- Upload progress stops
- Error messages about file size or type
- Storage service errors
**Diagnostic Steps**:
1. Check file size and type validation
2. Verify Firebase Storage configuration
3. Check network connectivity
4. Review storage permissions
**Solutions**:
```typescript
// Enhanced file validation
const validateFile = (file: File) => {
const maxSize = 100 * 1024 * 1024; // 100MB
const allowedTypes = ['application/pdf', 'application/msword', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
if (file.size > maxSize) {
return { valid: false, error: 'File too large' };
}
if (!allowedTypes.includes(file.type)) {
return { valid: false, error: 'Invalid file type' };
}
return { valid: true };
};
// Storage error handling
const uploadWithRetry = async (file: File, maxRetries = 3) => {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await uploadToStorage(file);
return result;
} catch (error) {
if (attempt === maxRetries) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
}
}
};
```
#### **Problem**: Upload progress stalls
**Symptoms**:
- Progress bar stops advancing
- No error messages
- Upload appears to hang
**Solutions**:
```typescript
// Implement upload timeout
const uploadWithTimeout = async (file: File, timeoutMs = 300000) => {
const uploadPromise = uploadToStorage(file);
const timeoutPromise = new Promise((_, reject) => {
setTimeout(() => reject(new Error('Upload timeout')), timeoutMs);
});
return Promise.race([uploadPromise, timeoutPromise]);
};
// Add progress monitoring
const monitorUploadProgress = (uploadTask: any, onProgress: (progress: number) => void) => {
uploadTask.on('state_changed',
(snapshot: any) => {
const progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
onProgress(progress);
},
(error: any) => {
console.error('Upload error:', error);
},
() => {
onProgress(100);
}
);
};
```
### Document Processing Issues
#### **Problem**: Document processing fails
**Symptoms**:
- Documents stuck in "processing" status
- AI processing errors
- PDF generation failures
**Diagnostic Steps**:
1. Check Document AI service status
2. Verify LLM API credentials
3. Review processing logs
4. Check system resources
**Solutions**:
```typescript
// Enhanced error handling for Document AI
const processWithFallback = async (document: Document) => {
try {
// Try Document AI first
const result = await processWithDocumentAI(document);
return result;
} catch (error) {
logger.warn('Document AI failed, trying fallback', { error: error.message });
// Fallback to local processing
try {
const result = await processWithLocalParser(document);
return result;
} catch (fallbackError) {
logger.error('Both Document AI and fallback failed', {
documentAIError: error.message,
fallbackError: fallbackError.message
});
throw new Error('Document processing failed');
}
}
};
// LLM service error handling
const callLLMWithRetry = async (prompt: string, maxRetries = 3) => {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const response = await callLLM(prompt);
return response;
} catch (error) {
if (attempt === maxRetries) throw error;
// Exponential backoff
const delay = Math.pow(2, attempt) * 1000;
await new Promise(resolve => setTimeout(resolve, delay));
}
}
};
```
#### **Problem**: PDF generation fails
**Symptoms**:
- PDF generation errors
- Missing PDF files
- Generation timeout
**Solutions**:
```typescript
// PDF generation with error handling
const generatePDFWithRetry = async (content: string, maxRetries = 3) => {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const pdf = await generatePDF(content);
return pdf;
} catch (error) {
if (attempt === maxRetries) throw error;
// Clear browser cache and retry
await clearBrowserCache();
await new Promise(resolve => setTimeout(resolve, 2000));
}
}
};
// Browser resource management
const clearBrowserCache = async () => {
try {
await browser.close();
await browser.launch();
} catch (error) {
logger.error('Failed to clear browser cache', { error: error.message });
}
};
```
### Database Issues
#### **Problem**: Database connection failures
**Symptoms**:
- API errors with database connection messages
- Slow response times
- Connection pool exhaustion
**Diagnostic Steps**:
1. Check Supabase service status
2. Verify database credentials
3. Check connection pool settings
4. Review query performance
**Solutions**:
```typescript
// Connection pool management
const createConnectionPool = () => {
return new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // Maximum number of connections
idleTimeoutMillis: 30000, // Close idle connections after 30 seconds
connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
});
};
// Query timeout handling
const executeQueryWithTimeout = async (query: string, params: any[], timeoutMs = 5000) => {
const client = await pool.connect();
try {
const result = await Promise.race([
client.query(query, params),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Query timeout')), timeoutMs)
)
]);
return result;
} finally {
client.release();
}
};
```
#### **Problem**: Slow database queries
**Symptoms**:
- Long response times
- Database timeout errors
- High CPU usage
**Solutions**:
```typescript
// Query optimization
const optimizeQuery = (query: string) => {
// Add proper indexes
// Use query planning
// Implement pagination
return query;
};
// Implement query caching
const queryCache = new Map();
const cachedQuery = async (key: string, queryFn: () => Promise<any>, ttlMs = 300000) => {
const cached = queryCache.get(key);
if (cached && Date.now() - cached.timestamp < ttlMs) {
return cached.data;
}
const data = await queryFn();
queryCache.set(key, { data, timestamp: Date.now() });
return data;
};
```
### Performance Issues
#### **Problem**: Slow application response
**Symptoms**:
- High response times
- Timeout errors
- User complaints about slowness
**Diagnostic Steps**:
1. Monitor CPU and memory usage
2. Check database query performance
3. Review external service response times
4. Analyze request patterns
**Solutions**:
```typescript
// Performance monitoring
const performanceMiddleware = (req: Request, res: Response, next: NextFunction) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
if (duration > 5000) {
logger.warn('Slow request detected', {
method: req.method,
path: req.path,
duration,
userAgent: req.get('User-Agent')
});
}
});
next();
};
// Implement caching
const cacheMiddleware = (ttlMs = 300000) => {
const cache = new Map();
return (req: Request, res: Response, next: NextFunction) => {
const key = `${req.method}:${req.path}:${JSON.stringify(req.query)}`;
const cached = cache.get(key);
if (cached && Date.now() - cached.timestamp < ttlMs) {
return res.json(cached.data);
}
const originalSend = res.json;
res.json = function(data) {
cache.set(key, { data, timestamp: Date.now() });
return originalSend.call(this, data);
};
next();
};
};
```
---
## 🔧 Debugging Tools
### Log Analysis
#### **Structured Logging**
```typescript
// Enhanced logging
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'cim-processor',
version: process.env.APP_VERSION,
environment: process.env.NODE_ENV
},
transports: [
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' }),
new winston.transports.Console({
format: winston.format.simple()
})
]
});
```
#### **Log Analysis Commands**
```bash
# Find errors in logs
grep -i "error" logs/combined.log | tail -20
# Find slow requests
grep "duration.*[5-9][0-9][0-9][0-9]" logs/combined.log
# Find authentication failures
grep -i "auth.*fail" logs/combined.log
# Monitor real-time logs
tail -f logs/combined.log | grep -E "(error|warn|critical)"
```
### Debug Endpoints
#### **Debug Information Endpoint**
```typescript
// routes/debug.ts
router.get('/debug/info', async (req: Request, res: Response) => {
const debugInfo = {
timestamp: new Date().toISOString(),
environment: process.env.NODE_ENV,
version: process.env.APP_VERSION,
uptime: process.uptime(),
memory: process.memoryUsage(),
cpu: process.cpuUsage(),
services: {
database: await checkDatabaseHealth(),
storage: await checkStorageHealth(),
auth: await checkAuthHealth()
}
};
res.json(debugInfo);
});
```
---
## 📋 Troubleshooting Checklist
### Pre-Incident Preparation
- [ ] Set up monitoring and alerting
- [ ] Configure structured logging
- [ ] Create runbooks for common issues
- [ ] Establish escalation procedures
- [ ] Document system architecture
### During Incident Response
- [ ] Assess impact and scope
- [ ] Check system health endpoints
- [ ] Review recent logs and metrics
- [ ] Identify root cause
- [ ] Implement immediate fix
- [ ] Communicate with stakeholders
- [ ] Monitor system recovery
### Post-Incident Review
- [ ] Document incident timeline
- [ ] Analyze root cause
- [ ] Review response effectiveness
- [ ] Update procedures and documentation
- [ ] Implement preventive measures
- [ ] Schedule follow-up review
---
## 🛠️ Maintenance Procedures
### Regular Maintenance Tasks
#### **Daily Tasks**
- [ ] Review system health metrics
- [ ] Check error logs for new issues
- [ ] Monitor performance trends
- [ ] Verify backup systems
#### **Weekly Tasks**
- [ ] Review alert effectiveness
- [ ] Analyze performance metrics
- [ ] Update monitoring thresholds
- [ ] Review security logs
#### **Monthly Tasks**
- [ ] Performance optimization review
- [ ] Capacity planning assessment
- [ ] Security audit
- [ ] Documentation updates
### Preventive Maintenance
#### **System Optimization**
```typescript
// Regular cleanup tasks
const performMaintenance = async () => {
// Clean up old logs
await cleanupOldLogs();
// Clear expired cache entries
await clearExpiredCache();
// Optimize database
await optimizeDatabase();
// Update system metrics
await updateSystemMetrics();
};
```
---
## 📞 Support and Escalation
### Support Levels
#### **Level 1: Basic Support**
- User authentication issues
- Basic configuration problems
- Common error messages
#### **Level 2: Technical Support**
- System performance issues
- Database problems
- Integration issues
#### **Level 3: Advanced Support**
- Complex system failures
- Security incidents
- Architecture problems
### Escalation Procedures
#### **Escalation Criteria**
- System downtime > 15 minutes
- Data loss or corruption
- Security breaches
- Performance degradation > 50%
#### **Escalation Contacts**
- **Primary**: Operations Team Lead
- **Secondary**: System Administrator
- **Emergency**: CTO/Technical Director
---
This comprehensive troubleshooting guide provides the tools and procedures needed to quickly identify and resolve issues in the CIM Document Processor, ensuring high availability and user satisfaction.