HomeAudit/dev_documentation/QUICK_START.md
admin 705a2757c1 Major infrastructure migration and Vaultwarden PostgreSQL troubleshooting
COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: Working and accessible externally
- Vaultwarden: PostgreSQL configuration issues, old instance still working
- Monitoring: Deployed and operational
- Caddy: Updated and working for external access
- PostgreSQL: Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
2025-08-30 20:18:44 -04:00


# QUICK START GUIDE - HOMEAUDIT MIGRATION
**Generated:** 2025-08-29
**Status:** READY FOR SERVICE MIGRATION - 99% Complete
---
## 🎯 **PROJECT OVERVIEW**
**Home infrastructure migration to Docker Swarm with optimized service distribution.** All critical infrastructure is now in place and ready for service migration.
---
## 📊 **CURRENT STATUS DASHBOARD**
### **✅ COMPLETED INFRASTRUCTURE**
- **Docker Swarm**: All 6 nodes joined and labeled ✅
- **Caddy Reverse Proxy**: Deployed and secured on surface ✅
- **Storage Configuration**: SMB/NFS hybrid complete ✅
- **Service Analysis**: Complete with security hardening ✅
- **Node Renaming**: lenovo410 (formerly jonathan-2518f5u) ✅
- **Backup Infrastructure**: Comprehensive system with RAID-1 ✅
- **Paperless Services**: Paperless-NGX and Paperless-AI both running on OMV800 ✅
### **🔄 NEXT STEPS**
- **Service Migration**: Move services to Docker Swarm
- **Database Services**: Deploy PostgreSQL and MariaDB
- **Monitoring Stack**: Deploy Grafana + Netdata
- **GPU Acceleration**: Configure for Jellyfin/Immich
---
## 🏗️ **INFRASTRUCTURE ARCHITECTURE**
### **Docker Swarm Nodes:**
```
OMV800 (Manager) - role=storage, cpu=high, memory=high, gpu=false
fedora - role=compute, cpu=medium, memory=medium, gpu=false
lenovo410 - role=compute, cpu=medium, memory=medium, gpu=false
audrey - role=compute, cpu=medium, memory=medium, gpu=false
surface - role=compute, cpu=medium, memory=medium, gpu=false
lenovo420 - role=ai-ml, cpu=high, memory=high, gpu=true
```
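Labels like these are normally applied from a manager node with `docker node update --label-add`. A minimal sketch, assuming the swarm hostnames match the names listed and the label keys are exactly `role`, `cpu`, `memory`, and `gpu`. The `label_cmd` helper is hypothetical and only prints the commands so they can be reviewed first:

```bash
#!/usr/bin/env bash
# label_cmd is a hypothetical helper: it prints the `docker node update`
# command that would apply the given key=value labels to a node.
# It does not call docker itself.
label_cmd() {
  local node kv flags
  node="$1"; shift
  flags=""
  for kv in "$@"; do
    flags="$flags --label-add $kv"
  done
  echo "docker node update$flags $node"
}

# Two of the nodes from the list above (hostnames are assumptions).
label_cmd OMV800 role=storage cpu=high memory=high gpu=false
label_cmd lenovo420 role=ai-ml cpu=high memory=high gpu=true
```

Once the printed commands look right, they can be run on the manager. Stack files can then pin services to nodes with placement constraints such as `node.labels.role == storage`.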
### **Networks:**
- **swarm-public**: Overlay network for service communication
- **database-network**: For database services
- **monitoring-network**: For monitoring services
- **ingress**: For ingress traffic
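Before deploying stacks that reference these networks as external, it can help to confirm they exist. A rough sketch, assuming the network names above are exact. The `missing_networks` helper is hypothetical and reads a list of existing network names on stdin:

```bash
#!/usr/bin/env bash
# missing_networks is a hypothetical helper: given existing network names on
# stdin (one per line), print each expected name that is not present.
missing_networks() {
  local existing expected
  existing="$(cat)"
  for expected in "$@"; do
    printf '%s\n' "$existing" | grep -qxF -- "$expected" || echo "$expected"
  done
}

# Sample input in the shape produced by: docker network ls --format "{{.Name}}"
printf '%s\n' ingress swarm-public |
  missing_networks swarm-public database-network monitoring-network ingress
```

On the manager, the real list could be piped in from `docker network ls --format "{{.Name}}"`. A missing overlay network can then be created with `docker network create --driver overlay <name>`. The `ingress` network is created by Docker when the swarm is initialized.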
### **Reverse Proxy:**
- **Caddy**: Running on surface (192.168.50.254)
- **SSL**: Automatic certificates via DuckDNS
- **Security**: High-risk services removed from external access
### **Storage Infrastructure:**
- **SMB/NFS Hybrid**: Both protocols available
- **Exports Available**: adguard, appflowy, caddy, homeassistant, immich, jellyfin, media, nextcloud, ollama, paperless, vaultwarden
- **Permissions**: Properly configured for service access
### **Backup Infrastructure:**
- **Primary Storage**: raspberrypi with 7.3TB RAID-1 array
- **Automated Backups**: Comprehensive backup system with validation
- **Offsite Capability**: Cloud integration ready
- **Restoration Testing**: Automated verification procedures
- **Discovery Complete**: Comprehensive backup targets identified
- **Backup Size**: 1-15GB estimated total
- **Critical Data**: Databases, volumes, configurations, secrets, user data
---
## 🚀 **IMMEDIATE ACTIONS**
### **1. Deploy Database Services**
```bash
# Deploy PostgreSQL and MariaDB on OMV800
ssh root@omv800.local "cd /opt/stacks/databases && docker stack deploy -c postgresql.yml databases"
ssh root@omv800.local "cd /opt/stacks/databases && docker stack deploy -c mariadb.yml databases"
```
### **2. Migrate Services to Swarm**
```bash
# Start with simple services first
ssh root@omv800.local "cd /opt/stacks/apps && docker stack deploy -c jellyfin.yml media"
```
### **3. Deploy Monitoring**
```bash
# Deploy basic monitoring stack
ssh root@omv800.local "cd /opt/stacks/monitoring && docker stack deploy -c grafana.yml monitoring"
```
---
## 🔧 **DEVELOPMENT WORKFLOW**
### **Service Deployment Process:**
1. **Test locally** with docker-compose
2. **Convert to stack** format
3. **Deploy to swarm** with proper labels
4. **Update Caddy** if needed
5. **Test access** via domain
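Steps 2 and 3 can be sketched as follows. This is only an illustration: the service name `newservice`, the `nginx:stable` image, and the `role == compute` constraint are placeholders, and it assumes the `swarm-public` overlay network already exists. The `write_stack` helper is hypothetical:

```bash
#!/usr/bin/env bash
# write_stack is a hypothetical helper: it writes a minimal Swarm stack file
# for a placeholder service to the path given as its first argument.
write_stack() {
  local out="$1"
  cat > "$out" <<'EOF'
version: "3.8"
services:
  newservice:
    image: nginx:stable   # placeholder image
    networks:
      - swarm-public
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.role == compute
networks:
  swarm-public:
    external: true
EOF
}

# Write to a temporary file and show the result for review.
stack_file="$(mktemp)"
write_stack "$stack_file"
cat "$stack_file"
```

The generated file could then be deployed on the manager with `docker stack deploy -c <file> apps`. The `deploy.placement.constraints` entry is what ties the service to nodes carrying the matching label.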
### **Configuration Management:**
- **Stack files**: `/opt/stacks/` on OMV800
- **Secrets**: Docker Swarm secrets
- **Volumes**: NFS/SMB mounts from OMV800
- **Networks**: Overlay networks for service communication
---
## 📋 **ESSENTIAL FILES**
### **Infrastructure:**
- `dev_documentation/infrastructure/SERVICE_ANALYSIS_AND_CADDYFILE.md` - Service mapping and routing
- `dev_documentation/infrastructure/HARDWARE_SPECIFICATIONS.md` - Hardware details
- `dev_documentation/infrastructure/COMPREHENSIVE_END_STATE_ANALYSIS.md` - Optimization strategy
### **Migration:**
- `dev_documentation/migration/COMPREHENSIVE_MIGRATION_ISSUES_REPORT.md` - Migration status
- `migration_scripts/scripts/` - Automation scripts
- `stacks/` - Docker Swarm stack files
### **Monitoring:**
- `dev_documentation/monitoring/` - Monitoring configuration
- `configs/monitoring/` - Prometheus/Grafana configs
---
## 🛠️ **COMMON TASKS**
### **Deploy a New Service:**
```bash
# 1. Create stack file
vim /opt/stacks/apps/newservice.yml
# 2. Deploy to swarm (full path, so it works from any directory)
docker stack deploy -c /opt/stacks/apps/newservice.yml apps
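# 3. Update Caddy if needed (validate the new file before replacing the live one)
scp caddyfile.txt jon@192.168.50.254:/tmp/
ssh jon@192.168.50.254 "caddy validate --config /tmp/caddyfile.txt --adapter caddyfile && sudo cp /tmp/caddyfile.txt /etc/caddy/Caddyfile && sudo systemctl reload caddy"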
```
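A new Caddy site entry is usually a hostname block with a `reverse_proxy` directive. A minimal sketch, using the Paperless-NGX hostname and port from the commit notes as an example. The real Caddyfile layout and its DuckDNS TLS settings are not shown in this guide, so they are left out here. The `caddy_site` helper is hypothetical:

```bash
#!/usr/bin/env bash
# caddy_site is a hypothetical helper: it prints a Caddyfile site block that
# reverse-proxies a hostname to an upstream host:port.
caddy_site() {
  local host="$1" upstream="$2"
  printf '%s {\n    reverse_proxy %s\n}\n' "$host" "$upstream"
}

# Example values; the actual upstream address is an assumption.
caddy_site paperless.pressmess.duckdns.org 192.168.50.229:8000
```

The printed block can be appended to the local `caddyfile.txt` before it is copied to surface.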
### **Check Service Status:**
```bash
# Check all services
ssh root@omv800.local "docker service ls"
# Check a specific service (services deployed from a stack are named <stack>_<service>)
ssh root@omv800.local "docker service ps servicename"
# Check logs
ssh root@omv800.local "docker service logs servicename"
```
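With many services, it is quicker to filter the list down to those running fewer replicas than requested. A small sketch, assuming input in the `NAME RUNNING/DESIRED` shape. The `degraded_services` helper and the sample service names are hypothetical:

```bash
#!/usr/bin/env bash
# degraded_services is a hypothetical helper: it reads "NAME RUNNING/DESIRED"
# lines on stdin and prints the names of services below their desired count.
degraded_services() {
  awk '{
    split($2, r, "/")
    if (r[1] + 0 < r[2] + 0) print $1
  }'
}

# Sample input in the shape produced by:
#   docker service ls --format "{{.Name}} {{.Replicas}}"
printf '%s\n' \
  "databases_postgresql 1/1" \
  "media_jellyfin 0/1" \
  "monitoring_grafana 1/1" | degraded_services
```

Against the live swarm, the same filter could be fed from `ssh root@omv800.local 'docker service ls --format "{{.Name}} {{.Replicas}}"'`.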
### **Scale Services:**
```bash
# Scale a service
ssh root@omv800.local "docker service scale servicename=3"
# Update service
ssh root@omv800.local "docker service update --image newimage:tag servicename"
```
---
## 🚨 **EMERGENCY PROCEDURES**
### **Service Down:**
```bash
# Check service status
ssh root@omv800.local "docker service ls"
# Restart service
ssh root@omv800.local "docker service update --force servicename"
# Check logs
ssh root@omv800.local "docker service logs servicename"
```
### **Node Issues:**
```bash
# Check node status
ssh root@omv800.local "docker node ls"
# Drain node (reschedules its tasks onto other nodes)
ssh root@omv800.local "docker node update --availability drain nodename"
# Remove node (it must be down or have run `docker swarm leave` first)
ssh root@omv800.local "docker node rm nodename"
```
### **Caddy Issues:**
```bash
# Check Caddy status
ssh jon@192.168.50.254 "sudo systemctl status caddy"
# Restart Caddy
ssh jon@192.168.50.254 "sudo systemctl restart caddy"
# Check logs
ssh jon@192.168.50.254 "sudo journalctl -u caddy -f"
```
---
## ⚠️ **IMPORTANT WARNINGS**
### **Security:**
- **Never expose** system management interfaces externally
- **Use secrets** for all passwords and API keys
- **Keep AdGuard Home** local-only for DNS security
- **Monitor access** to sensitive services
### **Data Safety:**
- **Backup before** major changes
- **Test migrations** on non-critical services first
- **Verify data integrity** after service moves
- **Keep original** configurations as backup
### **Performance:**
- **Monitor resource usage** during migration
- **Scale gradually** to avoid overwhelming nodes
- **Test under load** before going live
- **Have rollback plan** ready
---
## 📞 **SUPPORT CONTACTS**
### **Infrastructure:**
- **OMV800**: Primary storage and database host
- **surface**: Caddy reverse proxy
- **lenovo410**: Home automation services
- **lenovo420**: AI/ML processing
- **audrey**: Monitoring services
- **fedora**: Development and automation
### **Access Methods:**
- **SSH**: Use inventory.ini for correct usernames
- **Web**: Services accessible via Caddy domains
- **Monitoring**: Uptime Kuma for service status
---
**Status: READY FOR SERVICE MIGRATION** 🚀
**Last Updated:** 2025-08-29
**Next Review:** After database deployment