Files
HomeAudit/README.md
admin 705a2757c1 Major infrastructure migration and Vaultwarden PostgreSQL troubleshooting
COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services:  Working and accessible externally
- Vaultwarden:  PostgreSQL configuration issues, old instance still working
- Monitoring:  Deployed and operational
- Caddy:  Updated and working for external access
- PostgreSQL:  Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
2025-08-30 20:18:44 -04:00

6.0 KiB

HomeAudit - Infrastructure Migration and Monitoring

A comprehensive home infrastructure audit, migration, and monitoring system for Docker Swarm deployment.

🏗️ Infrastructure Overview

Current Deployment Status

  • Paperless Stack: Paperless-NGX (port 8000) + Paperless AI (port 3000) on OMV800
  • Monitoring Stack: Prometheus + Grafana + Node Exporter + Blackbox Exporter
  • Caddy Reverse Proxy: SSL termination and domain routing
  • 🔄 Migration Progress: 85% complete

Device Inventory

Device IP Role Status
OMV800 192.168.50.229 Docker Swarm Manager Active
Surface 192.168.50.254 Caddy Reverse Proxy Active
jonathan-2518f5u 192.168.50.181 Worker Node Active
lenovo420 192.168.50.66 Worker Node Active
audrey 192.168.50.145 Worker Node Active
fedora 192.168.50.225 Worker Node Active

📊 Monitoring Stack

Components

  • Prometheus (port 9091): Metrics collection and storage
  • Grafana (port 3002): Data visualization and dashboards
  • Node Exporter (port 9100): System metrics collection
  • Blackbox Exporter (port 9115): Service health monitoring

Metrics Coverage

  • 15 Active Targets: Services, system, and health checks
  • 784 Metrics: Comprehensive infrastructure monitoring
  • Real-time Data: 15-60 second scrape intervals
  • 30-day Retention: Historical trend analysis

Dashboards

  1. Infrastructure Overview: Service health and availability
  2. System Overview: CPU, memory, disk, network monitoring

Access URLs

🔧 Services Status

Active Services

  • Paperless-NGX: Document management (port 8000)
  • Paperless AI: AI-powered document processing (port 3000)
  • Nextcloud: File storage and sync (port 8081)
  • Home Assistant: Home automation (port 8123)
  • Portainer: Container management (port 9000)
  • AppFlowy: Note-taking (port 9080)

Database Services

  • PostgreSQL: Primary database
  • MariaDB: Secondary database
  • Redis: Caching layer
  • Mosquitto: MQTT broker

🚀 Quick Start

1. Access Monitoring

# Grafana Dashboard
open https://grafana.pressmess.duckdns.org
# Login: admin / admin123

# Prometheus Metrics
open https://prometheus.pressmess.duckdns.org

2. Check Service Health

# View all monitoring targets
curl "http://192.168.50.229:9091/api/v1/targets"

# Check system metrics
curl "http://192.168.50.229:9091/api/v1/query?query=up"

3. Monitor System Resources

# CPU Usage
curl "http://192.168.50.229:9091/api/v1/query?query=100%20-%20(avg%20by%20(instance)%20(irate(node_cpu_seconds_total{mode=\"idle\"}[5m]))%20*%20100)"

# Memory Usage
curl "http://192.168.50.229:9091/api/v1/query?query=(1%20-%20(node_memory_MemAvailable_bytes%20/%20node_memory_MemTotal_bytes))%20*%20100"

📁 Project Structure

HomeAudit/
├── stacks/                    # Docker Swarm stacks
│   └── monitoring/           # Monitoring stack configuration
├── configs/                  # Configuration files
│   └── monitoring/          # Prometheus, Grafana configs
├── scripts/                  # Utility scripts
├── dev_documentation/        # Detailed documentation
└── comprehensive_discovery_results/  # Audit results

🔍 Monitoring Features

System Monitoring

  • CPU Usage: Per-core and overall utilization
  • Memory Usage: Total, available, cached, buffers
  • Disk Usage: Space, I/O, mount points
  • Network I/O: Bytes sent/received per interface
  • System Load: 1m, 5m, 15m averages

Service Monitoring

  • HTTP Health Checks: Web service availability
  • TCP Health Checks: Database and backend services
  • Response Times: Service performance tracking
  • Availability Metrics: Uptime and reliability

Infrastructure Monitoring

  • Docker Swarm: Service health and resource usage
  • Container Metrics: Resource consumption per container
  • Network Connectivity: Inter-service communication
  • Hardware Health: System temperature and status

🛠️ Maintenance

Update Monitoring Stack

# Deploy updated configuration
ssh root@192.168.50.229 "cd /opt/stacks/monitoring && docker stack deploy -c final-monitoring.yml monitoring"

# Check service status
ssh root@192.168.50.229 "docker service ls | grep monitoring"

View Logs

# Prometheus logs
ssh root@192.168.50.229 "docker service logs monitoring_prometheus"

# Grafana logs
ssh root@192.168.50.229 "docker service logs monitoring_grafana"

📈 Performance Metrics

Current System Specs

  • Total Memory: 31GB
  • CPU Cores: Multi-core system
  • Storage: SSD-based storage
  • Network: Gigabit connectivity

Monitoring Performance

  • Scrape Interval: 15-60 seconds
  • Data Retention: 30 days
  • Metrics Count: 784 different metrics
  • Target Health: 15/15 targets healthy

🔮 Future Enhancements

Planned Improvements

  1. AlertManager: Smart alerting and notifications
  2. cAdvisor: Container resource monitoring
  3. Application Exporters: Database and service-specific metrics
  4. Centralized Logging: Log aggregation and analysis

Optional Enhancements

  1. Distributed Tracing: Request flow tracking
  2. APM: Application performance monitoring
  3. Synthetic Monitoring: User journey testing
  4. Automated Incident Response: Self-healing infrastructure

📞 Support

For issues or questions:

  1. Check the monitoring dashboards for system health
  2. Review service logs for error details
  3. Consult the comprehensive documentation in dev_documentation/
  4. Check the migration status in comprehensive_discovery_results/

Last Updated: August 30, 2025
Monitoring Status: Fully Operational
Migration Progress: 85% Complete