COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: ✅ Working and accessible externally
- Vaultwarden: ❌ PostgreSQL configuration issues, old instance still working
- Monitoring: ✅ Deployed and operational
- Caddy: ✅ Updated and working for external access
- PostgreSQL: ✅ Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
Home Lab Network Architecture Diagrams
Current Infrastructure State (As-Is)
┌─────────────────────────────────────────────────────────────┐
│ CURRENT NETWORK TOPOLOGY │
│ 192.168.50.0/24 │
└─────────────────────────────────────────────────────────────┘
│
┌─────────┴─────────┐
│ Router/Gateway │
│ 192.168.50.1 │
└─────────┬─────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
┌───────▼────────┐ ┌────────▼────────┐ ┌───────▼────────┐
│ TAILSCALE │ │ LOCAL NETWORK │ │ INTERNET │
│ MESH VPN │ │ ETHERNET/WiFi │ │ CONNECTION │
│ │ │ │ │ │
└────────────────┘ └─────────────────┘ └────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ DEVICE LAYOUT │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ OMV800 │ │ FEDORA │ │ LENOVO420 │ │ LENOVO │
│ (Primary NAS) │ │ (Workstation) │ │ (Secondary) │ │ (Workstation) │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ IP: .229/.112 │ │ IP: .28/.21 │ │ IP: .66/.95 │ │ IP: .181/.80 │
│ CPU: i5-6400 │ │ CPU: N95 │ │ CPU: i5-2520M │ │ CPU: i5 M540 │
│ RAM: 31GB │ │ RAM: 15GB │ │ RAM: 15GB │ │ RAM: 7.6GB │
│ OS: Debian 12 │ │ OS: Fedora 42 │ │ OS: Ubuntu 24 │ │ OS: Ubuntu 24 │
│ │ │ │ │ │ │ │
│ ⚠️ OVERLOADED │ │ ✅ UNDERUSED │ │ ✅ BALANCED │ │ ⚠️ OVERLOADED │
│ 19 CONTAINERS │ │ 1 CONTAINER │ │ 7 CONTAINERS │ │ 15 CONTAINERS │
│ │ │ │ │ │ │ │
│ • Immich │ │ • Portainer │ │ • Portainer │ │ • Home Assist │
│ • Jellyfin │ │ Agent │ │ Agent │ │ • ESPHome │
│ • Nextcloud │ │ │ │ • DuckDNS │ │ • N8N │
│ • Paperless │ │ │ │ • OpenWakeWord │ │ • Paperless │
│ • PostgreSQL │ │ │ │ • Whisper │ │ • MariaDB │
│ • Redis │ │ │ │ • Mosquitto │ │ • Redis │
│ • Vikunja │ │ │ │ • Omni-tools │ │ • Music Assist │
│ • Joplin │ │ │ │ • Filebrowser │ │ • Homeway │
│ • Traefik │ │ │ │ • Watchtower │ │ • Z-Wave JS UI │
│ • + 10 more... │ │ │ │ │ │ • + 6 more... │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ SURFACE │ │ OMVBACKUP │ │ COMPROMISED │ │ AUDREY │
│ (Portable) │ │ (Backup RPi) │ │ DEVICE │ │ (Offline RPi) │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ IP: .254/.97 │ │ IP: .107 │ │ IP: .81 │ │ IP: .45 │
│ CPU: i5-6300U │ │ CPU: ARM A72 │ │ MAC: cc:f7:35 │ │ Status: OFFLINE │
│ RAM: 7.7GB │ │ RAM: 906MB │ │ ❌ BLOCKED │ │ ❌ ISSUES │
│ OS: Ubuntu 24 │ │ OS: Debian 12 │ │ Amazon Device │ │ │
│ │ │ │ │ │ │ │
│ ✅ SPECIALIZED │ │ ✅ STORAGE ONLY │ │ 🚨 MALWARE │ │ │
│ 9 CONTAINERS │ │ 0 CONTAINERS │ │ DETECTED │ │ │
│ │ │ │ │ │ │ │
│ • AppFlowy │ │ • NFS Exports │ │ • Porn sites │ │ │
│ Cloud Stack │ │ • Backup Store │ │ • Malware DL │ │ │
│ • PostgreSQL │ │ • 7.3TB RAID │ │ • Firewall │ │ │
│ • Redis │ │ │ │ BLOCKED │ │ │
│ • Minio │ │ │ │ │ │ │
│ • Nginx │ │ │ │ │ │ │
│ • + 5 more... │ │ │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
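The overloaded and underused labels in the device layout can be reproduced from the container counts alone. The following is a minimal sketch, assuming a hypothetical tolerance of 50% around the fleet average; the counts come from the diagram, while the function name, labels, and threshold are illustrative only.

```python
from statistics import mean

# Container counts as shown in the device layout diagram.
containers = {
    "OMV800": 19,
    "FEDORA": 1,
    "LENOVO420": 7,
    "LENOVO": 15,
    "SURFACE": 9,
    "OMVBACKUP": 0,
}


def load_report(counts, tolerance=0.5):
    """Label each host relative to the average container count.

    `tolerance` is a hypothetical threshold: hosts more than 50% above
    or below the average are flagged.
    """
    avg = mean(counts.values())
    report = {}
    for host, n in counts.items():
        if n > avg * (1 + tolerance):
            report[host] = "above average"
        elif n < avg * (1 - tolerance):
            report[host] = "below average"
        else:
            report[host] = "near average"
    return avg, report


avg, report = load_report(containers)
print(f"total={sum(containers.values())} average={avg:.1f}")
for host, label in report.items():
    print(f"{host:10s} {containers[host]:2d}  {label}")
```

With these numbers OMV800 and LENOVO land above the average, FEDORA and OMVBACKUP below it, and LENOVO420 and SURFACE near it, which lines up with the status markers in the diagram.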
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ STORAGE TOPOLOGY │
└─────────────────────────────────────────────────────────────────────────────────────────┘
OMV800 Storage Pool OMVBackup Storage
┌─────────────────────┐ ┌─────────────────────┐
│ Primary Storage │ ←── Replication ──→ │ Backup Repository │
│ • 17TB DataPool │ (Real-time) │ • 7.3TB RAID Array │
│ • 456GB System SSD │ │ • Automated Backup │
│ • MergerFS Pool │ │ • NFS/SMB Exports │
│ • Multiple Drives │ │ • Redundancy Store │
└─────────────────────┘ └─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ NETWORK TRAFFIC FLOWS │
└─────────────────────────────────────────────────────────────────────────────────────────┘
Internet ←→ Router ←→ Local Network
↓
┌───────────────┐
│ High Traffic │
│ • Media │ ←── OMV800 (Jellyfin, Immich)
│ • Backups │ ←── All devices → OMVBackup
│ • IoT Data │ ←── Lenovo (Home Assistant)
└───────────────┘
↓
┌───────────────┐
│ Medium Traffic │
│ • Web Apps │ ←── Fedora, Surface
│ • Databases │ ←── Cross-device queries
│ • Monitoring │ ←── All devices
└───────────────┘
Proposed Optimized Architecture (To-Be)
┌─────────────────────────────────────────────────────────────┐
│ OPTIMIZED NETWORK TOPOLOGY │
│ Segmented VLANs + High Availability │
└─────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ Core Router │
│ 192.168.50.1 │
│ + VLAN Support │
└─────────┬───────┘
│
┌─────────────────────────┼─────────────────────────┐
│ │ │
┌───────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
│ VLAN 10 │ │ VLAN 20 │ │ VLAN 30 │
│ Core Services │ │ IoT & Smart │ │ Backup & │
│ .10.0/24 │ │ Home .20.0/24 │ │ Storage .30.0/24│
└────────────────┘ └─────────────────┘ └─────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ TIER-BASED SERVICE DISTRIBUTION │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────┐
│ TIER 1: CORE │
│ High Availability │
└─────────────────────────────────┘
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ OMV800 │ │ FEDORA │
│ Primary Storage Hub │ │ Application Server │
├─────────────────────────────────────┤ ├─────────────────────────────────────┤
│ VLAN: 10.229 | Tailscale: .112 │ │ VLAN: 10.28 | Tailscale: .21 │
│ CPU: i5-6400 (4c) | RAM: 31GB │ │ CPU: N95 (4c) | RAM: 15GB │
│ Role: Centralized Storage + Media │ │ Role: Compute + Development │
│ │ │ │
│ ✅ OPTIMIZED CONTAINERS (8-10): │ │ ✅ OPTIMIZED CONTAINERS (6-8): │
│ ┌─────────────────────────────────┐ │ │ ┌─────────────────────────────────┐ │
│ │ 📸 Immich (Photo Management) │ │ │ │ 🏠 Home Assistant (Smart Home) │ │
│ │ 🎬 Jellyfin (Media Server) │ │ │ │ ⚙️ N8N (Automation Workflows) │ │
│ │ ☁️ Nextcloud (File Sync) │ │ │ │ 📄 Paperless-NGX (Documents) │ │
│ │ 🗄️ MariaDB (Database Hub) │ │ │ │ 🔌 ESPHome (IoT Management) │ │
│ │ ⚡ Redis (Caching Layer) │ │ │ │ 💻 Code-Server (Development) │ │
│ │ 🌐 Traefik (Reverse Proxy) │ │ │ │ 📊 Monitoring (Prometheus) │ │
│ │ 🔄 Watchtower (Auto-updates) │ │ │ │ 📈 Grafana (Dashboards) │ │
│ │ 🐳 Portainer (Management) │ │ │ │ │ │
│ └─────────────────────────────────┘ │ │ └─────────────────────────────────┘ │
│ │ │ │
│ 💾 STORAGE OPTIMIZATION: │ │ 🎯 COMPUTE OPTIMIZATION: │
│ • 17TB: Media & Photos │ │ • CPU-intensive applications │
│ • 456GB SSD: Databases │ │ • Development environments │
│ • Real-time backup to OMVBackup │ │ • Monitoring & automation │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
┌─────────────────────────────────┐
│ TIER 2: SPECIALIZED │
│ Medium Priority │
└─────────────────────────────────┘
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ LENOVO420 │ │ SURFACE │
│ Backup & Monitoring │ │ Mobile & Cloud Services │
├─────────────────────────────────────┤ ├─────────────────────────────────────┤
│ VLAN: 30.66 | Tailscale: .95 │ │ VLAN: 10.254 | Tailscale: .97 │
│ CPU: i5-2520M (2c/4t) | RAM: 15GB │ │ CPU: i5-6300U (2c/4t) | RAM: 7.7GB │
│ Role: Backup Orchestration │ │ Role: Personal Productivity │
│ │ │ │
│ ✅ OPTIMIZED CONTAINERS (5-7): │ │ ✅ OPTIMIZED CONTAINERS (4-6): │
│ ┌─────────────────────────────────┐ │ │ ┌─────────────────────────────────┐ │
│ │ 🐳 Portainer Agent (Cluster) │ │ │ │ 📝 AppFlowy Cloud (Workspace) │ │
│ │ 💾 Backup Orchestration │ │ │ │ 🔐 VPN Server (Remote Access) │ │
│ │ 📊 Uptime Kuma (Monitoring) │ │ │ │ 🔑 Vaultwarden (Passwords) │ │
│ │ 🗨️ MQTT Broker (IoT Comms) │ │ │ │ 🔄 Syncthing (File Sync) │ │
│ │ 📹 Frigate (Security Cameras) │ │ │ │ 🌐 Lightweight Web Services │ │
│ │ 🛡️ Secondary Databases │ │ │ │ │ │
│ └─────────────────────────────────┘ │ │ └─────────────────────────────────┘ │
│ │ │ │
│ 🎯 BACKUP OPTIMIZATION: │ │ 🎯 MOBILITY OPTIMIZATION: │
│ • Centralized backup coordination │ │ • Personal productivity tools │
│ • Disaster recovery planning │ │ • Remote access capabilities │
│ • Security monitoring │ │ • Portable service deployment │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
┌─────────────────────────────────┐
│ TIER 3: SUPPORT │
│ Specialized Functions │
└─────────────────────────────────┘
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ LENOVO │ │ OMVBACKUP │
│ IoT & Smart Home Hub │ │ Backup Repository │
├─────────────────────────────────────┤ ├─────────────────────────────────────┤
│ VLAN: 20.181 | Tailscale: .80 │ │ VLAN: 30.107 | Physical only │
│ CPU: i5 M540 (2c/4t) | RAM: 7.6GB │ │ CPU: ARM A72 (4c) | RAM: 906MB │
│ Role: Real-time IoT Processing │ │ Role: Data Safety & Recovery │
│ │ │ │
│ ✅ OPTIMIZED CONTAINERS (6-8): │ │ ✅ STORAGE SERVICES: │
│ ┌─────────────────────────────────┐ │ │ ┌─────────────────────────────────┐ │
│ │ 🌊 Z-Wave JS UI (Z-Wave) │ │ │ │ 💾 7.3TB RAID Array │ │
│ │ 🐝 Zigbee2MQTT (Zigbee) │ │ │ │ 📂 NFS/SMB Export Services │ │
│ │ 🎵 Music Assistant (Audio) │ │ │ │ 🔄 Real-time Backup Scripts │ │
│ │ 🎤 OpenWakeWord (Voice AI) │ │ │ │ ✅ Data Integrity Monitoring │ │
│ │ 🔗 Node-RED (IoT Automation) │ │ │ │ 📊 Basic Web Management │ │
│ │ 📊 InfluxDB (Time Series) │ │ │ │ 🛡️ Backup Verification │ │
│ │ 🗨️ Mosquitto MQTT │ │ │ │ │ │
│ └─────────────────────────────────┘ │ │ └─────────────────────────────────┘ │
│ │ │ │
│ 🎯 IOT OPTIMIZATION: │ │ 🎯 STORAGE OPTIMIZATION: │
│ • Low-latency device response │ │ • Continuous data protection │
│ • Real-time automation │ │ • Automated backup verification │
│ • Voice processing & AI │ │ • Disaster recovery ready │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
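The VLAN assignments in the tier boxes can be checked as an address plan. This is a sketch only: it assumes that ".10.0/24" abbreviates 192.168.10.0/24 (and likewise for VLANs 20 and 30), which the diagram does not state explicitly, and the helper names are hypothetical.

```python
import ipaddress

# Assumption: ".10.0/24" in the diagram abbreviates 192.168.10.0/24, etc.
VLANS = {
    10: ipaddress.ip_network("192.168.10.0/24"),  # Core Services
    20: ipaddress.ip_network("192.168.20.0/24"),  # IoT & Smart Home
    30: ipaddress.ip_network("192.168.30.0/24"),  # Backup & Storage
}

# Host -> (VLAN id, last octet), as listed in the tier boxes.
HOSTS = {
    "OMV800": (10, 229),
    "FEDORA": (10, 28),
    "SURFACE": (10, 254),
    "LENOVO": (20, 181),
    "LENOVO420": (30, 66),
    "OMVBACKUP": (30, 107),
}


def planned_address(host):
    """Return the planned IPv4 address of a host inside its VLAN subnet."""
    vlan, octet = HOSTS[host]
    net = VLANS[vlan]
    addr = net.network_address + octet
    if addr not in net:
        raise ValueError(f"{host}: octet {octet} is outside {net}")
    return addr


def vlan_of(address):
    """Return the VLAN id whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for vlan_id, net in VLANS.items():
        if ip in net:
            return vlan_id
    return None


for host in HOSTS:
    print(f"{host:10s} VLAN {HOSTS[host][0]}  {planned_address(host)}")
```

Addresses still on the current flat 192.168.50.0/24 network map to no VLAN, which gives a quick way to spot hosts that have not been moved yet.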
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ HIGH AVAILABILITY DESIGN │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ FAILOVER STRATEGY │
└─────────────────────────────────────────┘
OMV800 (Primary)
↓ ↑
┌─────────────────────┐
│ Load Balancer │ ←── Traefik/HAProxy
│ (Traefik HA) │
└─────────┬───────────┘
↓
┌─────────────────────┐
│ Service Mesh │
│ Discovery │ ←── Consul/etcd
└─────────┬───────────┘
↓
┌─────────────────┼─────────────────┐
↓ ↓ ↓
FEDORA (App Tier) LENOVO420 (Backup) SURFACE (Edge)
↓ ↓ ↓
[Auto-failover] [Secondary DBs] [Remote Access]
Database Replication Strategy:
┌─────────────────────────────────┐
│ Primary: OMV800 (MariaDB) │
│ ↓ Real-time sync │
│ Secondary: Lenovo420 (Replica) │
│ ↓ Backup sync │
│ Backup: OMVBackup (Cold) │
└─────────────────────────────────┘
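The replication chain above implies an order in which database nodes would take over. A minimal sketch of that selection logic follows; it assumes the cold backup on OMVBackup is never promoted automatically (it would need a manual restore), and the class and function names are hypothetical rather than part of any tool named in the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbNode:
    host: str
    role: str  # "primary", "replica", or "cold"


# Order follows the replication strategy box: primary, replica, cold backup.
CHAIN = [
    DbNode("OMV800", "primary"),
    DbNode("Lenovo420", "replica"),
    DbNode("OMVBackup", "cold"),
]


def pick_active(chain, healthy_hosts):
    """Return the first healthy node able to serve queries, or None."""
    for node in chain:
        if node.role == "cold":
            # Assumption: cold backups require a manual restore.
            continue
        if node.host in healthy_hosts:
            return node
    return None


# Primary is down, replica is up: the replica is selected.
active = pick_active(CHAIN, {"Lenovo420", "OMVBackup"})
print(active.host if active else "no live database, restore from cold backup")
```

Returning None when only the cold backup remains keeps the manual-restore case explicit instead of silently pointing services at stale data.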
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE METRICS │
└─────────────────────────────────────────────────────────────────────────────────────────┘
📊 CURRENT vs OPTIMIZED COMPARISON:
┌───────────────────┬─────────────────┬─────────────────┬───────────────────┐
│ METRIC            │ CURRENT         │ OPTIMIZED       │ IMPROVEMENT       │
├───────────────────┼─────────────────┼─────────────────┼───────────────────┤
│ Resource Usage    │ 70% avg         │ 45% avg         │ 35% reduction     │
│ Container Density │ Unbalanced      │ Optimized       │ 40% better        │
│ Failover Time     │ Manual (hours)  │ Auto (30 sec)   │ >99% faster       │
│ Media Response    │ 2-5 seconds     │ <1 second       │ 3x improvement    │
│ Backup Speed      │ 4 hours         │ 1.5 hours       │ 62% faster        │
│ Network Traffic   │ High cross-dev  │ Segmented       │ 50% reduction     │
│ Uptime            │ 95% (SPOF)      │ 99.9% (HA)      │ 50x less downtime │
│ Management        │ 6 separate UIs  │ Centralized     │ Single pane       │
└───────────────────┴─────────────────┴─────────────────┴───────────────────┘
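The relative changes implied by the CURRENT and OPTIMIZED columns can be recomputed directly. The sketch below does that for the rows that carry numbers; the manual failover time is only given as "hours", so one hour is assumed here as a lower bound.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Resource usage: 70% average down to 45% average.
resource = percent_reduction(70, 45)

# Backup duration: 4 hours down to 1.5 hours.
backup = percent_reduction(4.0, 1.5)

# Failover: assumed 1 hour of manual work (3600 s) down to 30 s.
failover = percent_reduction(3600, 30)

print(f"resource usage: {resource:.1f}% lower")
print(f"backup time:    {backup:.1f}% shorter")
print(f"failover time:  {failover:.1f}% shorter")
```

Longer manual recovery times push the failover figure closer to 100%, so the one-hour assumption gives the most conservative value.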
🔄 MIGRATION TIMELINE:
Week 1: Core Infrastructure
├── Day 1-2: Set up VLAN segmentation
├── Day 3-4: Migrate critical services to OMV800
└── Day 5-7: Implement Traefik load balancing
Week 2: Service Distribution
├── Day 1-3: Move compute services to Fedora
├── Day 4-5: Consolidate IoT services to Lenovo
└── Day 6-7: Set up backup orchestration
Week 3: High Availability
├── Day 1-3: Implement database replication
├── Day 4-5: Configure automated failover
└── Day 6-7: Testing and optimization
🎯 SUCCESS METRICS:
• ✅ 99.9% service availability
• ✅ <30 second failover times
• ✅ 50% reduction in manual maintenance
• ✅ Centralized monitoring and management
• ✅ Improved security with network segmentation
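The 99.9% availability target translates into a yearly downtime budget. The sketch below works that out alongside the 95% figure quoted for the current setup; it ignores leap years, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of allowed downtime per year at a given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


current = downtime_hours_per_year(95.0)
target = downtime_hours_per_year(99.9)

# How many 30-second failovers fit into the target budget.
failovers_in_budget = target * 3600 / 30

print(f"95%   availability: {current:.1f} h/year of downtime")
print(f"99.9% availability: {target:.2f} h/year of downtime")
print(f"ratio: {current / target:.0f}x less downtime")
print(f"30 s failovers within budget: {failovers_in_budget:.0f}")
```

Seen this way, the move from 95% to 99.9% cuts allowed downtime from roughly 18 days to under 9 hours per year.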
Security Enhancement Overlay
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ SECURITY ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────┐
│ Internet │
│ Threats │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Firewall/IDS │
│ + Threat Intel │
└──────────┬──────────┘
│
┌────────────────▼────────────────┐
│ VPN Gateway │
│ (WireGuard/Tailscale) │
└────────────────┬────────────────┘
│
┌────────▼────────┐
│ DMZ Zone │
│ (Surface) │
└────────┬────────┘
│
┌────────────────────▼────────────────────┐
│ Internal Network │
│ Zero Trust Segmentation │
└─────────────────────────────────────────┘
🛡️ SECURITY ZONES:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ ZONE 1: DMZ │ │ ZONE 2: CORE │ │ ZONE 3: BACKUP │
│ (Surface) │ │ (OMV800,Fedora) │ │ (Lenovo420,Pi) │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ • Public facing │ │ • Internal apps │ │ • Data storage │
│ • VPN endpoint │ │ • Databases │ │ • Cold backups │
│ • Rate limited │ │ • Media serving │ │ • Air-gapped │
└─────────────────┘ └─────────────────┘ └─────────────────┘
🔐 ACCESS CONTROL MATRIX:
External Users ────────▶ DMZ Zone Only
Trusted Users ────────▶ Core Zone (authenticated)
Admin Users ────────▶ All Zones (MFA required)
Service-to-Service ─────▶ Encrypted + Cert-based
Backup Jobs ────────▶ Dedicated backup network
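The user rows of the access control matrix can be expressed as a default-deny lookup. This is a sketch under assumptions: the zone and role identifiers are hypothetical, admins are assumed to need ordinary authentication in addition to MFA, and the service-to-service and backup-job rows are left out because they describe transport rather than zone access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    zones: frozenset
    needs_auth: bool = False
    needs_mfa: bool = False


ALL_ZONES = frozenset({"dmz", "core", "backup"})

# One entry per user row in the access control matrix.
MATRIX = {
    "external": Rule(zones=frozenset({"dmz"})),
    "trusted": Rule(zones=frozenset({"core"}), needs_auth=True),
    # Assumption: MFA implies the admin has also authenticated.
    "admin": Rule(zones=ALL_ZONES, needs_auth=True, needs_mfa=True),
}


def is_allowed(role, zone, authenticated=False, mfa=False):
    """Default deny: access needs a matching rule and its conditions."""
    rule = MATRIX.get(role)
    if rule is None or zone not in rule.zones:
        return False
    if rule.needs_auth and not authenticated:
        return False
    if rule.needs_mfa and not mfa:
        return False
    return True


print(is_allowed("external", "core"))                           # False
print(is_allowed("trusted", "core", authenticated=True))        # True
print(is_allowed("admin", "backup", authenticated=True))        # False
print(is_allowed("admin", "backup", authenticated=True, mfa=True))  # True
```

Unknown roles and unlisted zones fall through to a denial, which matches the zero-trust intent of the diagram.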
🚨 MONITORING & ALERTING:
[Threat Detection] ──▶ [SIEM Collection] ──▶ [Automated Response]
│ │ │
▼ ▼ ▼
• Network traffic • Centralized logs • Auto-blocking
• Failed logins • Security events • Alert notifications
• Resource anomalies • Performance metrics • Incident response
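The path from failed logins to auto-blocking can be illustrated with a sliding-window counter. This is only a sketch of the idea: the threshold of 5 failures within 60 seconds is a hypothetical value, the class is not part of any tool named in this document, and a real deployment would pass the blocked address on to the firewall.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Block a source after too many failed logins in a time window."""

    def __init__(self, max_failures=5, window_seconds=60):
        # Both defaults are illustrative, not taken from the source.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # source IP -> failure timestamps
        self.blocked = set()

    def record_failure(self, source_ip, timestamp):
        """Record one failed login; return True if the source is blocked."""
        recent = self.events[source_ip]
        recent.append(timestamp)
        # Drop failures that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked.add(source_ip)
        return source_ip in self.blocked


monitor = FailedLoginMonitor()
# Five failures one second apart from the same address trigger a block.
for second in range(5):
    is_blocked = monitor.record_failure("192.168.50.81", second)
print("blocked:", is_blocked)
```

Failures spread out more widely than the window never accumulate, so occasional mistyped passwords do not trigger a block.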