HomeAudit/configs/monitoring/prometheus-production.yml
admin 705a2757c1 Major infrastructure migration and Vaultwarden PostgreSQL troubleshooting
COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to disable SQLite write-ahead logging, which does not work reliably over NFS
- Current status: the old Vaultwarden instance on lenovo410 is still working; the new instance still has configuration issues
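One possible cause of the SQLite fallback, offered as an assumption rather than a diagnosis: Vaultwarden selects its database backend from `DATABASE_URL`, and if that variable never reaches the container it uses a SQLite file in its data folder. A minimal Swarm stack excerpt that sets it explicitly might look like this. The service, network, secret, user and database names, the image tags and the password placeholder are all hypothetical, and the Vaultwarden image is assumed to include PostgreSQL support.

```yaml
# Hypothetical excerpt of a Vaultwarden stack file for `docker stack deploy`.
# Service, network, secret, user and database names are placeholders.
version: "3.8"

services:
  vaultwarden:
    image: vaultwarden/server:latest  # assumed to include PostgreSQL support
    environment:
      # The postgresql:// scheme selects the PostgreSQL backend; without
      # DATABASE_URL, Vaultwarden uses a SQLite file in its data folder.
      # CHANGE_ME must match the contents of the vaultwarden_db_password secret.
      DATABASE_URL: "postgresql://vaultwarden:CHANGE_ME@postgres:5432/vaultwarden"
      # Only affects the SQLite backend; harmless once PostgreSQL is in use.
      ENABLE_DB_WAL: "false"
    networks:
      - backend

  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: vaultwarden
      POSTGRES_DB: vaultwarden
      # The official postgres image reads the password from this file
      POSTGRES_PASSWORD_FILE: /run/secrets/vaultwarden_db_password
    secrets:
      - vaultwarden_db_password
    networks:
      - backend

networks:
  backend:
    driver: overlay

secrets:
  vaultwarden_db_password:
    external: true
```

After deploying, `docker service inspect` on the Vaultwarden service shows the environment under `Spec.TaskTemplate.ContainerSpec.Env`, which is a quick way to confirm whether `DATABASE_URL` was actually passed through.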

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services
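The scrape configuration in prometheus-production.yml defines no rules or Alertmanager target, so alerting on the blackbox probes would live in a separate rules file. A sketch, assuming the `http-service-health` and `tcp-service-health` job names from that file; the file path, alert names, `for` durations and severity labels are hypothetical.

```yaml
# Hypothetical rules file, e.g. /etc/prometheus/rules/service-health.yml
groups:
  - name: service-health
    rules:
      - alert: ServiceProbeFailed
        # probe_success is 0 when a blackbox exporter probe fails
        expr: probe_success{job=~"http-service-health|tcp-service-health"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
          description: "Job {{ $labels.job }} has reported probe_success == 0 for 5 minutes."
      - alert: ScrapeTargetDown
        # up is 0 when Prometheus itself cannot scrape the target
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Scrape target {{ $labels.instance }} ({{ $labels.job }}) is down"
```

Prometheus would load such a file through a top-level `rule_files:` list in the main config, and firing alerts are only delivered if an `alerting:` section points at an Alertmanager.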

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: Working and accessible externally
- Vaultwarden: PostgreSQL configuration issues, old instance still working
- Monitoring: Deployed and operational
- Caddy: Updated and working for external access
- PostgreSQL: Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
2025-08-30 20:18:44 -04:00

71 lines
2.0 KiB
YAML

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Blackbox exporter
  - job_name: 'blackbox'
    static_configs:
      - targets: ['192.168.50.229:9115']

  # Node exporter - system metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.50.229:9100']
    scrape_interval: 30s

  # Docker Swarm services that expose metrics
  - job_name: 'docker-swarm-metrics'
    static_configs:
      - targets:
          - '192.168.50.229:9091' # Prometheus
          - '192.168.50.229:3002' # Grafana
    scrape_interval: 30s

  # HTTP service health checks via blackbox exporter
  - job_name: 'http-service-health'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - 'http://192.168.50.229:8000' # Paperless-NGX
          - 'http://192.168.50.229:3000' # Paperless-AI
          - 'http://192.168.50.229:8081' # Nextcloud
          - 'http://192.168.50.181:8123' # Home Assistant
          - 'http://192.168.50.181:9000' # Portainer
          - 'http://192.168.50.66:9080' # AppFlowy
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.50.229:9115
    scrape_interval: 60s

  # TCP service health checks via blackbox exporter
  - job_name: 'tcp-service-health'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - '192.168.50.229:6379' # Redis
          - '192.168.50.229:5432' # PostgreSQL
          - '192.168.50.229:3306' # MariaDB
          - '192.168.50.229:1883' # Mosquitto
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.50.229:9115
    scrape_interval: 60s
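The two probe jobs reference the `http_2xx` and `tcp_connect` modules, which are defined in the blackbox exporter's own configuration rather than in this file. With the relabeling above, Prometheus sends every probe to 192.168.50.229:9115, passing the listed address as the `target` parameter (for example `/probe?module=http_2xx&target=http://192.168.50.229:8000`) and keeping it as the `instance` label. A minimal `blackbox.yml` defining both modules might look like this; the timeouts and the IPv4 preference are assumptions, not values taken from the deployed exporter.

```yaml
# Sketch of a blackbox exporter config (blackbox.yml); the real file may differ.
modules:
  http_2xx:
    prober: http
    timeout: 5s   # kept below Prometheus' default 10s scrape timeout
    http:
      method: GET
      # With no valid_status_codes set, any 2xx response counts as success
      preferred_ip_protocol: ip4
  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      # Success means a TCP connection to host:port could be opened
      preferred_ip_protocol: ip4
```

The exporter's own `/probe` endpoint can be queried by hand with the same parameters to check a single target before relying on the scraped `probe_success` series.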