HomeAudit/PAPERLESS_AI_DATABASE_ISSUE_FIX.md
admin 45363040f3 feat: Complete infrastructure cleanup phase documentation and status updates
2025-09-01 16:50:37 -04:00

# Paperless AI Database Issue - Complete Fix
## 🚨 Problem Summary
**Paperless AI and Paperless-ngx are using different databases**, so tags and titles applied by Paperless AI do not match the documents in Paperless-ngx.
## 🔍 Root Cause Analysis
### **Database Mismatch**
- **Paperless-ngx**: Uses PostgreSQL with host `postgresql_postgresql_primary`
- **Paperless AI**: Uses its own local database in `/app/data`
### **Configuration Differences**
- **Paperless-ngx**: Properly configured with external PostgreSQL database
- **Paperless AI**: Runs with `network_mode: bridge`, so it is not attached to the network the PostgreSQL service is on and does not connect to the same database
### **Missing Integration**
- Paperless AI lacks proper environment variables to connect to Paperless-ngx
- No shared database connection between the two services
- Different network configurations preventing proper communication
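One way to confirm the network mismatch described above is to compare the networks each container is attached to. The following is a minimal sketch, not one of the repository's scripts. It assumes the containers are named `paperless` and `paperless-ai` (the names used in the log commands later in this document); under Docker Swarm the actual task container names will differ.

```bash
# Print the networks a container is attached to, one per line.
# Uses a Go template over .NetworkSettings.Networks from `docker inspect`.
container_networks() {
  docker inspect \
    --format '{{range $name, $cfg := .NetworkSettings.Networks}}{{println $name}}{{end}}' \
    "$1"
}

# Print the network names that appear in both newline-separated lists.
common_networks() {
  printf '%s\n' "$1" | sort -u | while IFS= read -r net; do
    [ -n "$net" ] || continue
    # -F: literal match, -x: whole line, -q: quiet
    if printf '%s\n' "$2" | grep -Fxq -- "$net"; then
      printf '%s\n' "$net"
    fi
  done
}
```

Running `common_networks "$(container_networks paperless)" "$(container_networks paperless-ai)"` prints nothing when the two containers share no network, which matches the symptom described here.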
## 🛠️ Complete Solution
### **1. New Paperless AI Configuration**
I've created a new configuration file: `stacks/ai/paperless-ai.yml`
**Key Features:**
- ✅ Connects to the same PostgreSQL database as Paperless-ngx
- ✅ Uses the same Redis instance
- ✅ Shares the same network configuration
- ✅ Proper environment variable configuration
- ✅ Health checks and monitoring
- ✅ Secure secrets management
### **2. Setup Scripts**
#### **Diagnostic Script**
```bash
./scripts/diagnose_paperless_issues.sh
```
- Analyzes current configuration
- Identifies specific issues
- Provides detailed recommendations
#### **Quick Fix Script**
```bash
./scripts/quick_fix_paperless_ai.sh
```
- Stops problematic containers
- Creates backups
- Sets up proper integration
#### **Complete Setup Script**
```bash
./scripts/setup_paperless_ai_integration.sh
```
- Interactive configuration
- Environment file creation
- Deployment automation
### **3. Environment Configuration**
The new setup requires the following environment variables:
```bash
# Paperless-ngx Connection
PAPERLESS_URL=https://paperless.pressmess.duckdns.org
PAPERLESS_USERNAME=admin
PAPERLESS_PASSWORD=your_password
# Database Connection (same as Paperless-ngx)
PAPERLESS_DBHOST=postgresql_postgresql_primary
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=postgres
PAPERLESS_DBPASS_FILE=/run/secrets/pg_root_password
# AI Provider (configure at least one)
OPENAI_API_KEY=your_openai_key
OLLAMA_BASE_URL=http://ollama:11434
DEEPSEEK_API_KEY=your_deepseek_key
```
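Before deploying, it can help to check that the env file actually defines the variables listed above. The helper below is a hypothetical sketch (the function name `missing_env_vars` is not part of the repository's scripts); it only checks that each name is present with a non-empty value and does not validate the values themselves.

```bash
# Report required variables that are missing or empty in an env file.
# Prints each missing name and returns non-zero if any are missing.
# Usage: missing_env_vars <env_file> <NAME>...
missing_env_vars() {
  local env_file="$1" missing=0 name
  shift
  for name in "$@"; do
    # Require NAME=<at least one character> at the start of a line.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_env_vars stacks/ai/.env PAPERLESS_URL PAPERLESS_USERNAME PAPERLESS_PASSWORD PAPERLESS_DBHOST PAPERLESS_DBNAME PAPERLESS_DBUSER` lists any connection settings that still need to be filled in. The AI provider variables are left out of this call because only one of them is required.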
## 🚀 Implementation Steps
### **Step 1: Run Diagnostic**
```bash
./scripts/diagnose_paperless_issues.sh
```
### **Step 2: Quick Fix (Immediate)**
```bash
./scripts/quick_fix_paperless_ai.sh
```
### **Step 3: Complete Setup**
```bash
./scripts/setup_paperless_ai_integration.sh
```
### **Step 4: Deploy**
```bash
cd stacks/ai
docker-compose -f paperless-ai.yml --env-file .env up -d
```
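After `up -d` returns, the container may still be starting. A small retry loop can wait for the health check before moving on to verification. This is a sketch under two assumptions: the container is named `paperless-ai`, and the new configuration defines a health check as described in the key features. Both helper names are illustrative.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay_seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Succeeds when Docker reports the container's health status as "healthy".
# If the container has no health check, the template fails and this returns non-zero.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

A call such as `retry 30 2 is_healthy paperless-ai` polls up to 30 times, two seconds apart, and returns non-zero if the container never becomes healthy.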
### **Step 5: Verify**
```bash
./scripts/verify_paperless_ai.sh
```
## 🔧 Configuration Details
### **Database Integration**
- Both services now use the same PostgreSQL database
- Shared Redis instance for caching and messaging
- Proper network connectivity between containers
### **Document Processing**
- Paperless AI can access the same document storage
- Tags and titles are applied directly to the shared database
- Changes made by one service are visible to the other without a separate sync step
### **Security**
- Uses Docker secrets for sensitive data
- Proper network isolation
- Secure API token management
## 📊 Expected Results
After implementing this fix:
1. **✅ Unified Database**: Both services use the same PostgreSQL database
2. **✅ Synchronized Tags**: Tags applied by Paperless AI appear in Paperless-ngx
3. **✅ Consistent Titles**: Document titles are properly synchronized
4. **✅ Real-time Updates**: Changes are immediately visible in both interfaces
5. **✅ Proper Integration**: Seamless communication between services
## 🛡️ Backup and Recovery
### **Automatic Backups**
- Current Paperless AI data is automatically backed up
- Backup location: `backups/paperless-ai-YYYYMMDD_HHMMSS/`
- Includes all configuration and data
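The backup layout above can be reproduced by hand if the scripts are not available. The function below is an illustrative sketch rather than the code the setup scripts actually run. The source directory is passed in by the caller, and the archive name matches the one used in the rollback procedure.

```bash
# Archive a data directory into <backup_root>/paperless-ai-YYYYMMDD_HHMMSS/.
# Prints the path of the archive it created.
# Usage: backup_data_dir <data_dir> [backup_root]
backup_data_dir() {
  local src="$1" backup_root="${2:-backups}"
  local dest="$backup_root/paperless-ai-$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$dest" || return 1
  # -C makes paths inside the archive relative to the data directory's parent.
  tar czf "$dest/paperless-ai-data-backup.tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$dest/paperless-ai-data-backup.tar.gz"
}
```

Because the archive stores the directory by its base name, extracting it recreates that directory under whatever location `tar` is run from.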
### **Rollback Procedure**
If issues occur:
```bash
# Stop new configuration
cd stacks/ai
docker-compose -f paperless-ai.yml down
# Restore from backup
tar xzf backups/paperless-ai-YYYYMMDD_HHMMSS/paperless-ai-data-backup.tar.gz
```
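The `YYYYMMDD_HHMMSS` placeholder in the restore command has to be replaced with a real timestamp. A hypothetical helper like the one below can pick the most recent backup directory. It relies on the timestamp format sorting chronologically as plain text.

```bash
# Print the most recent backup directory under a backup root.
# Returns non-zero if no backup directory exists.
# Usage: latest_backup [backup_root]
latest_backup() {
  local backup_root="${1:-backups}" newest="" dir
  for dir in "$backup_root"/paperless-ai-*/; do
    [ -d "$dir" ] || continue
    # Globs expand in sorted order, so the last match is the newest.
    newest="${dir%/}"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The restore step then becomes `tar xzf "$(latest_backup backups)/paperless-ai-data-backup.tar.gz"`, run from the directory the backup should be extracted into.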
## 🔍 Monitoring and Troubleshooting
### **Health Checks**
- Container health monitoring
- Database connectivity verification
- API endpoint testing
### **Logs and Debugging**
```bash
# View Paperless AI logs
docker-compose -f stacks/ai/paperless-ai.yml logs -f
# View Paperless-ngx logs
docker logs paperless
# Check database connectivity (requires pg_isready to be installed in the paperless-ai image)
docker exec paperless-ai pg_isready -h postgresql_postgresql_primary
```
### **Common Issues and Solutions**
| Issue | Solution |
|-------|----------|
| Database connection failed | Verify PostgreSQL container is running |
| API authentication failed | Check PAPERLESS_USERNAME/PAPERLESS_PASSWORD |
| AI processing not working | Configure at least one AI provider (an API key or `OLLAMA_BASE_URL`) |
| Network connectivity issues | Ensure both containers are on same network |
## 📚 Additional Resources
- **Paperless AI Documentation**: https://github.com/clusterzx/paperless-ai
- **Paperless-ngx API Documentation**: https://docs.paperless-ngx.com/api/
- **Docker Compose Documentation**: https://docs.docker.com/compose/
## 🎯 Success Criteria
The fix is successful when:
- [ ] Paperless AI container starts without errors
- [ ] Database connectivity is established
- [ ] API authentication works
- [ ] Tags applied by Paperless AI appear in Paperless-ngx
- [ ] Document titles are properly synchronized
- [ ] Health checks pass
- [ ] No error messages in logs
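The last item in the checklist can be checked mechanically. The filter below is a rough sketch: the keyword list is an assumption about what an error line looks like, and it counts any line containing one of those words, including harmless ones.

```bash
# Count lines on stdin that mention an error (case-insensitive).
# Prints 0 when nothing matches.
count_error_lines() {
  # grep -c exits non-zero on zero matches, so mask the status.
  grep -Eic 'error|exception|traceback' || true
}
```

For example, `docker logs --since 10m paperless-ai 2>&1 | count_error_lines` reports how many suspicious lines appeared in the last ten minutes, assuming the container is named `paperless-ai`.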
---
**Note**: With this setup, Paperless AI and Paperless-ngx read from and write to the same PostgreSQL database, so tags and titles stay consistent across both interfaces.