HomeAudit/stacks/core/traefik-production.yml
admin 705a2757c1 Major infrastructure migration and Vaultwarden PostgreSQL troubleshooting
COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: Working and accessible externally
- Vaultwarden: PostgreSQL configuration issues, old instance still working
- Monitoring: Deployed and operational
- Caddy: Updated and working for external access
- PostgreSQL: Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
2025-08-30 20:18:44 -04:00

version: '3.9'

services:
  traefik:
    image: traefik:v3.1  # Updated to latest stable version
    user: "0:0"  # Run as root for Docker socket access
    command:
      # Swarm provider configuration (v3.1 syntax)
      - --providers.swarm=true
      - --providers.swarm.exposedbydefault=false
      - --providers.swarm.network=traefik-public
      - --providers.swarm.endpoint=tcp://172.17.0.1:2375
      # Entry points
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --entrypoints.traefik.address=:8080
      # API and Dashboard
      - --api.dashboard=true
      - --api.insecure=false
      # Ping endpoint (provides ping@internal, used by the healthcheck below)
      - --ping=true
      # SSL/TLS Configuration
      - --certificatesresolvers.letsencrypt.acme.email=admin@localhost
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.letsencrypt.acme.httpchallenge=true
      - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
      # Logging
      - --log.level=INFO
      - --log.format=json
      - --log.filePath=/logs/traefik.log
      - --accesslog=true
      - --accesslog.format=json
      - --accesslog.filePath=/logs/access.log
      - --accesslog.filters.statuscodes=400-599
      # Metrics
      - --metrics.prometheus=true
      - --metrics.prometheus.addEntryPointsLabels=true
      - --metrics.prometheus.addServicesLabels=true
      - --metrics.prometheus.buckets=0.1,0.3,1.2,5.0
      # Global settings
      - --global.checknewversion=false
      - --global.sendanonymoususage=false
      # Rate limiting (v3.1 syntax removed for simplicity)
      # Rate limiting can be configured via middleware instead
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik_letsencrypt:/letsencrypt
      - traefik_logs:/logs
    networks:
      - traefik-public
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
        preferences:
          - spread: node.id
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        order: start-first
      # In Swarm mode Traefik reads service labels, so they live under deploy
      labels:
        # Enable Traefik for this service
        - traefik.enable=true
        - traefik.docker.network=traefik-public
        # Dashboard configuration with authentication
        - traefik.http.routers.dashboard.rule=Host(`traefik.${DOMAIN:-localhost}`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))
        - traefik.http.routers.dashboard.service=api@internal
        - traefik.http.routers.dashboard.entrypoints=websecure
        - traefik.http.routers.dashboard.tls=true
        - traefik.http.routers.dashboard.tls.certresolver=letsencrypt
        - traefik.http.routers.dashboard.middlewares=dashboard-auth,security-headers
        # Authentication middleware (bcrypt hash for password: secure_password_2024)
        # Each $ is doubled so Compose does not treat it as variable interpolation
        - traefik.http.middlewares.dashboard-auth.basicauth.users=admin:$$2y$$10$$xvzBkbKKvRX.jGG6F7L.ReEMyEx.7BkqNGQO2rFt/1aBgx8jPElXW
        - traefik.http.middlewares.dashboard-auth.basicauth.realm=Traefik Dashboard
        # Security headers middleware
        # (the sslredirect option no longer exists in Traefik v3; the
        # redirect-to-https middleware below handles the HTTP to HTTPS redirect)
        - traefik.http.middlewares.security-headers.headers.framedeny=true
        - traefik.http.middlewares.security-headers.headers.browserxssfilter=true
        - traefik.http.middlewares.security-headers.headers.contenttypenosniff=true
        - traefik.http.middlewares.security-headers.headers.forcestsheader=true
        - traefik.http.middlewares.security-headers.headers.stsincludesubdomains=true
        - traefik.http.middlewares.security-headers.headers.stsseconds=63072000
        - traefik.http.middlewares.security-headers.headers.stspreload=true
        # Global HTTP to HTTPS redirect (v3 rule syntax: plain regexp, no named group)
        - traefik.http.routers.http-catchall.rule=HostRegexp(`.+`)
        - traefik.http.routers.http-catchall.entrypoints=web
        - traefik.http.routers.http-catchall.middlewares=redirect-to-https
        - traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https
        - traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true
        # Dummy service for Swarm compatibility
        - traefik.http.services.dummy-svc.loadbalancer.server.port=9999
        # Health check
        - traefik.http.routers.ping.rule=Path(`/ping`)
        - traefik.http.routers.ping.service=ping@internal
        - traefik.http.routers.ping.entrypoints=traefik
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/ping"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

volumes:
  traefik_letsencrypt:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/traefik/letsencrypt
  traefik_logs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/traefik/logs

networks:
  traefik-public:
    # Pre-created overlay network. driver and attachable are chosen when the
    # network is created and cannot be combined with external: true here.
    external: true
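
# Illustrative sketch, not part of the original file. The command section notes
# that rate limiting can be configured via middleware; the labels below show
# one way that might look with Traefik's rateLimit middleware. The middleware
# name "rate-limit" and all numeric values are assumptions, not tuned settings.
# The x- prefix marks an extension field, which Compose should ignore, so this
# block has no effect until the labels are copied into deploy.labels and
# "rate-limit" is appended to a router's middlewares list.
x-rate-limit-labels:
  # Average number of requests allowed per period, per source
  - traefik.http.middlewares.rate-limit.ratelimit.average=100
  # Extra requests tolerated in a short burst
  - traefik.http.middlewares.rate-limit.ratelimit.burst=50
  # Length of the period that "average" is measured over
  - traefik.http.middlewares.rate-limit.ratelimit.period=1s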