# Troubleshooting Guide
## Complete Problem Resolution for CIM Document Processor

### 🎯 Overview

This guide provides comprehensive troubleshooting procedures for common issues in the CIM Document Processor, including diagnostic steps, solutions, and prevention strategies.

---

## 🔍 Diagnostic Procedures

### System Health Check

#### **Quick Health Assessment**
```bash
# Check application health
curl -f http://localhost:5000/health

# Check database connectivity
curl -f http://localhost:5000/api/documents

# Check authentication service
curl -f http://localhost:5000/api/auth/status
```

#### **Comprehensive Health Check**
```typescript
// utils/diagnostics.ts
// The check*Health helpers are assumed to be defined elsewhere in the codebase.
export const runSystemDiagnostics = async () => {
  const diagnostics = {
    timestamp: new Date().toISOString(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth(),
      ai: await checkAIHealth()
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
      uptime: process.uptime()
    }
  };

  return diagnostics;
};
```
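
The `check*Health` helpers are not shown in this guide. The following is a minimal sketch of how one such probe could be wrapped so that a hung dependency cannot block the whole health check. `checkWithTimeout`, `HealthResult`, and the 2-second default are hypothetical names and values chosen for illustration, not part of the CIM Document Processor codebase.

```typescript
// Hypothetical result shape for a single service probe.
interface HealthResult {
  healthy: boolean;
  latencyMs: number;
  error?: string;
}

// Runs a probe and reports failure if it rejects or takes longer than timeoutMs.
const checkWithTimeout = async (
  probe: () => Promise<unknown>,
  timeoutMs = 2000
): Promise<HealthResult> => {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });

  try {
    await Promise.race([probe(), timeout]);
    return { healthy: true, latencyMs: Date.now() - start };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { healthy: false, latencyMs: Date.now() - start, error: message };
  } finally {
    // Clear the timer so it does not keep the process alive or fire late
    clearTimeout(timer);
  }
};
```

Under these assumptions, a database check could pass a cheap query as the probe (for example a `SELECT 1` through the application's pool), and the other services could follow the same pattern.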

---

## 🚨 Common Issues and Solutions

### Authentication Issues

#### **Problem**: User cannot log in
**Symptoms**:
- Login form shows "Invalid credentials"
- Firebase authentication errors
- Token validation failures

**Diagnostic Steps**:
1. Check Firebase project configuration
2. Verify authentication tokens
3. Check network connectivity to Firebase
4. Review authentication logs

**Solutions**:
```typescript
import * as admin from 'firebase-admin';

// Check Firebase configuration
const firebaseConfig = {
  apiKey: process.env.FIREBASE_API_KEY,
  authDomain: process.env.FIREBASE_AUTH_DOMAIN,
  projectId: process.env.FIREBASE_PROJECT_ID
};

// Verify token validation
const verifyToken = async (token: string) => {
  try {
    const decodedToken = await admin.auth().verifyIdToken(token);
    return { valid: true, user: decodedToken };
  } catch (error) {
    // Catch variables are `unknown` under strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Token verification failed', { error: message });
    return { valid: false, error: message };
  }
};
```

**Prevention**:
- Regular Firebase configuration validation
- Token refresh mechanism
- Proper error handling in authentication flow

#### **Problem**: Token expiration issues
**Symptoms**:
- Users logged out unexpectedly
- API requests returning 401 errors
- Authentication state inconsistencies

**Solutions**:
```typescript
// Implement token refresh
// The Firebase Auth REST documentation specifies a form-encoded body for this endpoint
const refreshToken = async (token: string) => {
  try {
    const apiKey = process.env.FIREBASE_API_KEY;
    const response = await fetch(`https://securetoken.googleapis.com/v1/token?key=${apiKey}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'refresh_token',
        refresh_token: token
      })
    });

    // fetch only rejects on network failure, so check the HTTP status explicitly
    if (!response.ok) {
      return { success: false, error: `Token refresh failed with status ${response.status}` };
    }

    const data = await response.json();
    return { success: true, token: data.id_token };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: message };
  }
};
```
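
Unexpected logouts and intermittent 401 responses often mean the ID token expired before the client refreshed it (Firebase ID tokens are valid for one hour). The sketch below shows one way to decide when to refresh proactively, assuming the `exp` claim (seconds since the Unix epoch) is read from the decoded token that `verifyIdToken` returns. `shouldRefreshToken` and the five-minute margin are illustrative choices, not existing application code.

```typescript
// Returns true when the token has expired or will expire within marginSeconds.
// expSeconds is the token's `exp` claim, in seconds since the Unix epoch.
const shouldRefreshToken = (
  expSeconds: number,
  nowMs: number = Date.now(),
  marginSeconds = 300
): boolean => {
  const secondsLeft = expSeconds - Math.floor(nowMs / 1000);
  return secondsLeft <= marginSeconds;
};
```

Passing the current time as a parameter keeps the check deterministic in tests; in application code the default `Date.now()` is usually enough.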

### Document Upload Issues

#### **Problem**: File upload fails
**Symptoms**:
- Upload progress stops
- Error messages about file size or type
- Storage service errors

**Diagnostic Steps**:
1. Check file size and type validation
2. Verify Firebase Storage configuration
3. Check network connectivity
4. Review storage permissions

**Solutions**:
```typescript
// Enhanced file validation
const validateFile = (file: File) => {
  const maxSize = 100 * 1024 * 1024; // 100MB
  const allowedTypes = ['application/pdf', 'application/msword', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'];

  if (file.size > maxSize) {
    return { valid: false, error: 'File too large' };
  }

  if (!allowedTypes.includes(file.type)) {
    return { valid: false, error: 'Invalid file type' };
  }

  return { valid: true };
};

// Storage error handling
const uploadWithRetry = async (file: File, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await uploadToStorage(file);
      return result;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
};
```

#### **Problem**: Upload progress stalls
**Symptoms**:
- Progress bar stops advancing
- No error messages
- Upload appears to hang

**Solutions**:
```typescript
// Implement upload timeout
const uploadWithTimeout = async (file: File, timeoutMs = 300000) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Upload timeout')), timeoutMs);
  });

  try {
    // Promise.race only stops waiting; it does not cancel the underlying upload
    return await Promise.race([uploadToStorage(file), timeoutPromise]);
  } finally {
    // Clear the timer so it cannot fire after the upload has settled
    clearTimeout(timer);
  }
};

// Add progress monitoring
const monitorUploadProgress = (uploadTask: any, onProgress: (progress: number) => void) => {
  uploadTask.on('state_changed',
    (snapshot: any) => {
      const progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
      onProgress(progress);
    },
    (error: any) => {
      console.error('Upload error:', error);
    },
    () => {
      onProgress(100);
    }
  );
};
```
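
A fixed timeout cannot tell a slow but healthy upload from one that has stopped moving. The following sketch tracks when the byte count last increased, so a stall can be reported even though no error was raised. `StallDetector` and its method names are hypothetical; timestamps are passed in explicitly so the logic can be exercised without real timers.

```typescript
// Tracks the last time the transferred byte count increased.
class StallDetector {
  private stallMs: number;
  private lastBytes = -1;
  private lastProgressAt: number;

  constructor(stallMs: number, startedAt: number = Date.now()) {
    this.stallMs = stallMs;
    this.lastProgressAt = startedAt;
  }

  // Call from the progress callback with the current byte count.
  record(bytesTransferred: number, nowMs: number = Date.now()): void {
    if (bytesTransferred > this.lastBytes) {
      this.lastBytes = bytesTransferred;
      this.lastProgressAt = nowMs;
    }
  }

  // True when no new bytes have arrived for at least stallMs.
  isStalled(nowMs: number = Date.now()): boolean {
    return nowMs - this.lastProgressAt >= this.stallMs;
  }
}
```

One possible wiring is to call `record(snapshot.bytesTransferred)` inside the `state_changed` handler of `monitorUploadProgress` and poll `isStalled()` on an interval, surfacing an error or cancelling the upload when it returns `true`.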

### Document Processing Issues

#### **Problem**: Document processing fails
**Symptoms**:
- Documents stuck in "processing" status
- AI processing errors
- PDF generation failures

**Diagnostic Steps**:
1. Check Document AI service status
2. Verify LLM API credentials
3. Review processing logs
4. Check system resources

**Solutions**:
```typescript
// Narrow an unknown catch value to a loggable message
const errorMessage = (error: unknown) =>
  error instanceof Error ? error.message : String(error);

// Enhanced error handling for Document AI
const processWithFallback = async (document: Document) => {
  try {
    // Try Document AI first
    const result = await processWithDocumentAI(document);
    return result;
  } catch (error) {
    logger.warn('Document AI failed, trying fallback', { error: errorMessage(error) });

    // Fallback to local processing
    try {
      const result = await processWithLocalParser(document);
      return result;
    } catch (fallbackError) {
      logger.error('Both Document AI and fallback failed', {
        documentAIError: errorMessage(error),
        fallbackError: errorMessage(fallbackError)
      });
      throw new Error('Document processing failed');
    }
  }
};

// LLM service error handling
const callLLMWithRetry = async (prompt: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await callLLM(prompt);
      return response;
    } catch (error) {
      if (attempt === maxRetries) throw error;

      // Exponential backoff: 2 s, 4 s, 8 s, ...
      const delay = Math.pow(2, attempt) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
};
```
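
Documents stuck in "processing" are easier to act on once they can be listed. A minimal sketch of that check follows, assuming each document record carries a status and a last-updated timestamp. `DocumentRecord`, its field names, and `findStuckDocuments` are assumptions for illustration; the actual schema may differ.

```typescript
// Hypothetical document record; field names are assumptions for illustration.
interface DocumentRecord {
  id: string;
  status: string;
  updatedAt: string; // ISO 8601 timestamp
}

// Returns documents that have been in "processing" for longer than maxAgeMs.
const findStuckDocuments = (
  docs: DocumentRecord[],
  maxAgeMs: number,
  nowMs: number = Date.now()
): DocumentRecord[] =>
  docs.filter(
    (doc) => doc.status === 'processing' && nowMs - Date.parse(doc.updatedAt) > maxAgeMs
  );
```

The returned records could then be re-queued or marked as failed, depending on how the processing pipeline handles retries.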

#### **Problem**: PDF generation fails
**Symptoms**:
- PDF generation errors
- Missing PDF files
- Generation timeout

**Solutions**:
```typescript
// Assumption: the headless browser is managed with Puppeteer
import puppeteer, { Browser } from 'puppeteer';

let browser: Browser;

// PDF generation with error handling
const generatePDFWithRetry = async (content: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const pdf = await generatePDF(content);
      return pdf;
    } catch (error) {
      if (attempt === maxRetries) throw error;

      // Clear browser cache and retry
      await clearBrowserCache();
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }
};

// Browser resource management
const clearBrowserCache = async () => {
  try {
    // A closed browser instance cannot be relaunched, so start a fresh one
    // through the library entry point and keep the new handle
    await browser.close();
    browser = await puppeteer.launch();
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.error('Failed to clear browser cache', { error: message });
  }
};
```

### Database Issues

#### **Problem**: Database connection failures
**Symptoms**:
- API errors with database connection messages
- Slow response times
- Connection pool exhaustion

**Diagnostic Steps**:
1. Check Supabase service status
2. Verify database credentials
3. Check connection pool settings
4. Review query performance

**Solutions**:
```typescript
import { Pool } from 'pg';

// Connection pool management
const createConnectionPool = () => {
  return new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 20, // Maximum number of connections
    idleTimeoutMillis: 30000, // Close idle connections after 30 seconds
    connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
  });
};

const pool = createConnectionPool();

// Query timeout handling
const executeQueryWithTimeout = async (query: string, params: any[], timeoutMs = 5000) => {
  const client = await pool.connect();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  try {
    const result = await Promise.race([
      client.query(query, params),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => {
          timedOut = true;
          reject(new Error('Query timeout'));
        }, timeoutMs);
      })
    ]);

    return result;
  } finally {
    clearTimeout(timer);
    // After a timeout the query is still running on this connection,
    // so destroy the client instead of returning it to the pool
    client.release(timedOut);
  }
};
```
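
When checking connection pool settings, it helps to look at live pool counters rather than configuration alone. The sketch below classifies pool pressure from the `totalCount`, `idleCount`, and `waitingCount` counters, assuming `Pool` comes from node-postgres (`pg`), which exposes those properties. `classifyPool` and the three state names are hypothetical.

```typescript
// Subset of the counters exposed by a node-postgres Pool.
interface PoolStats {
  totalCount: number;   // clients currently open
  idleCount: number;    // open clients that are not checked out
  waitingCount: number; // requests queued waiting for a client
}

type PoolState = 'ok' | 'saturated' | 'exhausted';

// Classifies pool pressure; `max` is the pool's configured connection limit.
const classifyPool = (stats: PoolStats, max: number): PoolState => {
  const inUse = stats.totalCount - stats.idleCount;
  if (stats.waitingCount > 0) return 'exhausted'; // callers are already queuing
  if (inUse >= max) return 'saturated';           // no headroom for the next request
  return 'ok';
};
```

Because the function only reads three numeric fields, the pool object itself can be passed in, for example `classifyPool(pool, 20)` with the `max: 20` setting used above. Logging the result periodically would show whether exhaustion correlates with the API errors.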

#### **Problem**: Slow database queries
**Symptoms**:
- Long response times
- Database timeout errors
- High CPU usage

**Solutions**:
```typescript
// Query optimization
const optimizeQuery = (query: string) => {
  // Add proper indexes
  // Use query planning
  // Implement pagination
  return query;
};

// Implement query caching
const queryCache = new Map();

const cachedQuery = async (key: string, queryFn: () => Promise<any>, ttlMs = 300000) => {
  const cached = queryCache.get(key);
  if (cached && Date.now() - cached.timestamp < ttlMs) {
    return cached.data;
  }

  const data = await queryFn();
  queryCache.set(key, { data, timestamp: Date.now() });
  return data;
};
```
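
The `optimizeQuery` stub lists pagination without showing it. The following is one possible sketch using keyset pagination, which fetches rows after a cursor instead of using `OFFSET`, so PostgreSQL can serve each page from an index on the sort columns. The `documents` table, the `created_at` and `id` columns, the page-size cap of 100, and `buildDocumentsPageQuery` are all assumptions for illustration.

```typescript
interface PageQuery {
  text: string;
  values: Array<string | number>;
}

// Builds a keyset-paginated query ordered by newest first.
// `cursor` holds the sort values of the last row on the previous page.
const buildDocumentsPageQuery = (
  pageSize: number,
  cursor?: { createdAt: string; id: string }
): PageQuery => {
  // Clamp the page size so a client cannot request an unbounded result set
  const limit = Math.min(Math.max(Math.floor(pageSize), 1), 100);

  if (!cursor) {
    return {
      text: 'SELECT id, created_at FROM documents ORDER BY created_at DESC, id DESC LIMIT $1',
      values: [limit]
    };
  }

  return {
    text:
      'SELECT id, created_at FROM documents ' +
      'WHERE (created_at, id) < ($1, $2) ' +
      'ORDER BY created_at DESC, id DESC LIMIT $3',
    values: [cursor.createdAt, cursor.id, limit]
  };
};
```

The returned `text` and `values` can be passed to a parameterized query call. Including `id` in both the ordering and the cursor keeps pages stable when several rows share the same `created_at`.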

### Performance Issues

#### **Problem**: Slow application response
**Symptoms**:
- High response times
- Timeout errors
- User complaints about slowness

**Diagnostic Steps**:
1. Monitor CPU and memory usage
2. Check database query performance
3. Review external service response times
4. Analyze request patterns

**Solutions**:
```typescript
import { Request, Response, NextFunction } from 'express';

// Performance monitoring
const performanceMiddleware = (req: Request, res: Response, next: NextFunction) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - start;

    if (duration > 5000) {
      logger.warn('Slow request detected', {
        method: req.method,
        path: req.path,
        duration,
        userAgent: req.get('User-Agent')
      });
    }
  });

  next();
};

// Implement caching
// Note: the cache key ignores the caller's identity, so only apply this
// to responses that are the same for every user
const cacheMiddleware = (ttlMs = 300000) => {
  const cache = new Map();

  return (req: Request, res: Response, next: NextFunction) => {
    // Only cache reads; serving a cached reply to POST/PUT/DELETE would skip their side effects
    if (req.method !== 'GET') return next();

    const key = `${req.method}:${req.path}:${JSON.stringify(req.query)}`;
    const cached = cache.get(key);

    if (cached && Date.now() - cached.timestamp < ttlMs) {
      return res.json(cached.data);
    }

    const originalJson = res.json;
    res.json = function(data) {
      // Store successful responses only, so errors are not replayed from the cache
      if (res.statusCode >= 200 && res.statusCode < 300) {
        cache.set(key, { data, timestamp: Date.now() });
      }
      return originalJson.call(this, data);
    };

    next();
  };
};
```
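
When analyzing request patterns, averages tend to hide the slow tail that users actually notice. A small sketch of a nearest-rank percentile over collected request durations is shown below; `percentile` is a hypothetical helper, and the durations are assumed to be in milliseconds, as recorded by `performanceMiddleware`.

```typescript
// Nearest-rank percentile of a list of durations (in milliseconds).
// p is a percentage between 0 and 100.
const percentile = (durations: number[], p: number): number => {
  if (durations.length === 0) throw new Error('No samples');
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  // Clamp so that p = 0 returns the smallest sample
  return sorted[Math.max(rank, 1) - 1];
};
```

Comparing the 50th and 95th percentiles for the same route gives a quick sense of whether slowness affects every request or only a few outliers.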

---

## 🔧 Debugging Tools

### Log Analysis

#### **Structured Logging**
```typescript
import * as winston from 'winston';

// Enhanced logging
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'cim-processor',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV
  },
  transports: [
    // Files live under logs/ so they match the paths used by the log analysis commands
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});
```

#### **Log Analysis Commands**
```bash
# Find errors in logs
grep -i "error" logs/combined.log | tail -20

# Find slow requests (duration of 5000 ms or more in the JSON log lines)
grep -E '"duration": ?([5-9][0-9]{3}|[0-9]{5,})' logs/combined.log

# Find authentication failures
grep -i "auth.*fail" logs/combined.log

# Monitor real-time logs
tail -f logs/combined.log | grep -E "(error|warn|critical)"
```
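
Because the logger writes one JSON object per line, the same slow-request search can also be done programmatically, which avoids regular-expression edge cases. The following is a sketch only: `findSlowRequests` is a hypothetical helper, and it assumes each entry carries the `path` and `duration` fields that `performanceMiddleware` logs.

```typescript
interface SlowRequest {
  path: string;
  duration: number;
}

// Extracts requests at or above thresholdMs from newline-delimited JSON log text.
const findSlowRequests = (logText: string, thresholdMs = 5000): SlowRequest[] => {
  const slow: SlowRequest[] = [];
  for (const line of logText.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const entry = JSON.parse(line);
      if (typeof entry.duration === 'number' && entry.duration >= thresholdMs) {
        slow.push({ path: String(entry.path), duration: entry.duration });
      }
    } catch {
      // Skip lines that are not valid JSON objects (for example, partial writes)
    }
  }
  return slow;
};
```

The log text could come from reading `logs/combined.log` into a string; for very large files a streaming line reader would be a better fit than loading everything at once.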

### Debug Endpoints

#### **Debug Information Endpoint**
```typescript
// routes/debug.ts
// This route exposes runtime details, so restrict it to authenticated operators
router.get('/debug/info', async (req: Request, res: Response) => {
  const debugInfo = {
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    cpu: process.cpuUsage(),
    services: {
      database: await checkDatabaseHealth(),
      storage: await checkStorageHealth(),
      auth: await checkAuthHealth()
    }
  };

  res.json(debugInfo);
});
```

---

## 📋 Troubleshooting Checklist

### Pre-Incident Preparation
- [ ] Set up monitoring and alerting
- [ ] Configure structured logging
- [ ] Create runbooks for common issues
- [ ] Establish escalation procedures
- [ ] Document system architecture

### During Incident Response
- [ ] Assess impact and scope
- [ ] Check system health endpoints
- [ ] Review recent logs and metrics
- [ ] Identify root cause
- [ ] Implement immediate fix
- [ ] Communicate with stakeholders
- [ ] Monitor system recovery

### Post-Incident Review
- [ ] Document incident timeline
- [ ] Analyze root cause
- [ ] Review response effectiveness
- [ ] Update procedures and documentation
- [ ] Implement preventive measures
- [ ] Schedule follow-up review

---

## 🛠️ Maintenance Procedures

### Regular Maintenance Tasks

#### **Daily Tasks**
- [ ] Review system health metrics
- [ ] Check error logs for new issues
- [ ] Monitor performance trends
- [ ] Verify backup systems

#### **Weekly Tasks**
- [ ] Review alert effectiveness
- [ ] Analyze performance metrics
- [ ] Update monitoring thresholds
- [ ] Review security logs

#### **Monthly Tasks**
- [ ] Performance optimization review
- [ ] Capacity planning assessment
- [ ] Security audit
- [ ] Documentation updates

### Preventive Maintenance

#### **System Optimization**
```typescript
// Regular cleanup tasks
const performMaintenance = async () => {
  // Clean up old logs
  await cleanupOldLogs();

  // Clear expired cache entries
  await clearExpiredCache();

  // Optimize database
  await optimizeDatabase();

  // Update system metrics
  await updateSystemMetrics();
};
```
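
The maintenance helpers are not defined in this guide. As one example, here is a sketch of what clearing expired cache entries could look like for an in-memory `Map` whose entries store a `timestamp`, as `queryCache` does. The signature shown (taking the cache, TTL, and current time as parameters) is an assumption made so the logic is self-contained; without some form of eviction, such a cache only ever grows.

```typescript
interface CacheEntry {
  data: unknown;
  timestamp: number; // ms since epoch when the entry was stored
}

// Deletes entries older than ttlMs and returns how many were removed.
const clearExpiredCache = (
  cache: Map<string, CacheEntry>,
  ttlMs = 300000,
  nowMs: number = Date.now()
): number => {
  let removed = 0;
  // Deleting the current entry while iterating a Map is safe in JavaScript
  cache.forEach((entry, key) => {
    if (nowMs - entry.timestamp >= ttlMs) {
      cache.delete(key);
      removed++;
    }
  });
  return removed;
};
```

Returning the number of removed entries makes it easy to log how much each maintenance run reclaimed.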

---

## 📞 Support and Escalation

### Support Levels

#### **Level 1: Basic Support**
- User authentication issues
- Basic configuration problems
- Common error messages

#### **Level 2: Technical Support**
- System performance issues
- Database problems
- Integration issues

#### **Level 3: Advanced Support**
- Complex system failures
- Security incidents
- Architecture problems

### Escalation Procedures

#### **Escalation Criteria**
- System downtime > 15 minutes
- Data loss or corruption
- Security breaches
- Performance degradation > 50%
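
These criteria can be encoded so that alerting applies them consistently. The sketch below is illustrative only: `IncidentSnapshot`, its field names, and `escalationReasons` are hypothetical, and it assumes "performance degradation" means the relative increase in response time over a pre-incident baseline, which the criteria above do not define.

```typescript
// Hypothetical incident snapshot; field names are assumptions for illustration.
interface IncidentSnapshot {
  downtimeMinutes: number;
  dataLossOrCorruption: boolean;
  securityBreach: boolean;
  baselineLatencyMs: number; // typical response time before the incident
  currentLatencyMs: number;
}

// Returns the escalation criteria that the incident currently meets.
const escalationReasons = (incident: IncidentSnapshot): string[] => {
  const reasons: string[] = [];
  if (incident.downtimeMinutes > 15) reasons.push('System downtime > 15 minutes');
  if (incident.dataLossOrCorruption) reasons.push('Data loss or corruption');
  if (incident.securityBreach) reasons.push('Security breach');

  // Assumption: degradation is the relative increase in response time
  if (incident.baselineLatencyMs > 0) {
    const increase =
      (incident.currentLatencyMs - incident.baselineLatencyMs) / incident.baselineLatencyMs;
    if (increase > 0.5) reasons.push('Performance degradation > 50%');
  }
  return reasons;
};
```

An empty result means no criterion is met; any non-empty result would trigger escalation.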

#### **Escalation Contacts**
- **Primary**: Operations Team Lead
- **Secondary**: System Administrator
- **Emergency**: CTO/Technical Director

---

This guide provides the tools and procedures needed to quickly identify and resolve issues in the CIM Document Processor.