# Traefik Production Deployment Guide

## Overview

This guide describes how to deploy Traefik v3.1 in production on Docker Swarm with SELinux in enforcing mode, including authentication, monitoring, and security hardening.

## Architecture Components

### Core Services

- **Traefik v3.1**: Load balancer and reverse proxy with authentication
- **Prometheus**: Metrics collection and alerting
- **Grafana**: Monitoring dashboards and visualization
- **AlertManager**: Alert routing and notification management
- **Loki + Promtail**: Log aggregation and analysis

### Security Features

- ✅ Basic authentication with bcrypt hashing
- ✅ TLS/SSL termination with automatic certificates
- ✅ Security headers (HSTS, XSS protection, etc.)
- ✅ Rate limiting and DDoS protection
- ✅ SELinux policy compliance
- ✅ Prometheus metrics for security monitoring

## Prerequisites

### System Requirements

- Docker Swarm cluster (single manager minimum)
- SELinux enabled (Fedora/RHEL/CentOS)
- Minimum 4GB RAM, 20GB disk space
- Network ports: 80 (HTTP), 443 (HTTPS), 8080 (Traefik API), 9090 (Prometheus), 3000 (Grafana)

### Directory Structure

```bash
# Create the Traefik and monitoring directory trees
sudo mkdir -p /opt/{traefik,monitoring}/{letsencrypt,logs,prometheus,grafana,alertmanager,loki}
sudo mkdir -p /opt/monitoring/{prometheus/{data,config},grafana/{data,config}}
sudo mkdir -p /opt/monitoring/{alertmanager/{data,config},loki/data,promtail/config}
# Grafana's container runs as UID/GID 472 (the same ownership is applied in Step 3)
sudo chown -R 472:472 /opt/monitoring/grafana
```

## Installation Steps

### Step 1: SELinux Policy Configuration

```bash
# Install SELinux development tools
sudo dnf install -y selinux-policy-devel

# Install custom SELinux policy (adjust the path to your repository checkout)
cd /home/jonathan/Coding/HomeAudit/selinux
./install_selinux_policy.sh
```

### Step 2: Docker Swarm Network Setup

```bash
# Create overlay network
docker network create --driver overlay --attachable traefik-public
```

### Step 3: Configuration Deployment

```bash
# Copy monitoring configurations
sudo cp configs/monitoring/prometheus.yml /opt/monitoring/prometheus/config/
sudo cp \
  configs/monitoring/traefik_rules.yml /opt/monitoring/prometheus/config/
sudo cp configs/monitoring/alertmanager.yml /opt/monitoring/alertmanager/config/

# Set ownership to match the container users (Prometheus: 65534, Grafana: 472)
sudo chown -R 65534:65534 /opt/monitoring/prometheus
sudo chown -R 472:472 /opt/monitoring/grafana
```

### Step 4: Environment Variables

Create `/opt/traefik/.env`:

```bash
DOMAIN=yourdomain.com
EMAIL=admin@yourdomain.com
```

### Step 5: Deploy Services

`docker stack deploy` does not read `.env` files, so export the variables into the shell before deploying.

```bash
# Export DOMAIN and EMAIL from the environment file
set -a; . /opt/traefik/.env; set +a

# Deploy Traefik
docker stack deploy -c stacks/core/traefik-production.yml traefik

# Deploy monitoring stack
docker stack deploy -c stacks/monitoring/traefik-monitoring.yml monitoring
```

## Configuration Details

### Authentication Credentials

- **Username**: `admin`
- **Password**: `secure_password_2024` (bcrypt hash included)
- **Change in production**: Generate a new hash with `htpasswd -nbB admin newpassword`. When the hash is placed in a Docker label or stack file, escape each `$` as `$$`.

### SSL/TLS Configuration

- Automatic Let's Encrypt certificates
- HTTPS redirect for all HTTP traffic
- HSTS headers with 2-year max-age (63072000 seconds)
- Secure cipher suites only

### Monitoring Access Points

- **Traefik Dashboard**: `https://traefik.yourdomain.com/dashboard/`
- **Prometheus**: `https://prometheus.yourdomain.com`
- **Grafana**: `https://grafana.yourdomain.com`
- **AlertManager**: `https://alertmanager.yourdomain.com`

## Security Monitoring

### Key Metrics Monitored

1. **Authentication Failures**: Rate of 401/403 responses
2. **Brute Force Attacks**: High-frequency auth failures
3. **Service Availability**: Backend health status
4. **Response Times**: 95th percentile latency
5. **Error Rates**: 5xx error percentage
6. **Certificate Expiration**: TLS cert validity
7.
   **Rate Limiting**: 429 response frequency

### Alert Thresholds

- **Critical**: More than 50 auth failures per second (possible brute force)
- **Warning**: More than 10 auth failures per minute (high failure rate)
- **Critical**: Service backend down for more than 1 minute
- **Warning**: 95th percentile response time above 2 seconds
- **Warning**: Error rate above 10% for 5 minutes
- **Warning**: TLS certificate expires in fewer than 7 days
- **Critical**: TLS certificate expired

## Production Checklist

### Pre-Deployment

- [ ] SELinux policy installed and tested
- [ ] Docker Swarm initialized and nodes joined
- [ ] Directory structure created with correct permissions
- [ ] Environment variables configured
- [ ] DNS records pointing to Swarm manager
- [ ] Firewall rules configured for ports 80, 443, 8080

### Post-Deployment Verification

- [ ] Traefik dashboard accessible with authentication
- [ ] HTTPS redirects working correctly
- [ ] Security headers present in responses
- [ ] Prometheus collecting Traefik metrics
- [ ] Grafana dashboards displaying data
- [ ] AlertManager receiving and routing alerts
- [ ] Log aggregation working in Loki
- [ ] Certificate auto-renewal configured

### Security Validation

- [ ] Authentication required for all admin interfaces
- [ ] TLS certificates valid and auto-renewing
- [ ] Security headers (HSTS, XSS protection) enabled
- [ ] Rate limiting functional
- [ ] Monitoring alerts triggering correctly
- [ ] SELinux in enforcing mode without denials

## Maintenance Operations

### Certificate Management

```bash
# Check certificate status (run on the node hosting the Traefik task)
docker exec "$(docker ps -q -f name=traefik | head -n 1)" ls -la /letsencrypt/acme.json

# Force certificate re-issuance (if needed).
# Warning: this discards all stored certificates and counts against Let's Encrypt rate limits.
docker exec "$(docker ps -q -f name=traefik | head -n 1)" rm /letsencrypt/acme.json
docker service update --force traefik_traefik
```

### Log Management

```bash
# Rotate Traefik logs
sudo logrotate -f /etc/logrotate.d/traefik

# Check log sizes
du -sh /opt/traefik/logs/*
```

### Monitoring Maintenance

```bash
# Check Prometheus targets
curl -s \
  http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'

# Grafana backup
tar -czf "grafana-backup-$(date +%Y%m%d).tar.gz" /opt/monitoring/grafana/data
```

## Troubleshooting

### Common Issues

**SELinux Permission Denied**

```bash
# Check for denials
sudo ausearch -m avc -ts recent | grep traefik

# Temporarily switch to permissive mode to test, then return to enforcing
sudo setenforce 0
sudo setenforce 1

# Re-install policy if needed
cd selinux && ./install_selinux_policy.sh
```

**Authentication Not Working**

```bash
# Check service labels
docker service inspect traefik_traefik | jq '.[0].Spec.Labels'

# Verify a password against the bcrypt hash (htpasswd prompts for the password)
printf '%s\n' 'admin:$2y$10$xvzBkbKKvRX.jGG6F7L.ReEMyEx.7BkqNGQO2rFt/1aBgx8jPElXW' > /tmp/htpasswd.check
htpasswd -v /tmp/htpasswd.check admin
rm -f /tmp/htpasswd.check
```

**Certificate Issues**

```bash
# Check ACME log (rate-limit errors from Let's Encrypt show up here)
docker service logs traefik_traefik 2>&1 | grep -i acme

# Verify DNS resolution
nslookup yourdomain.com

# Check that the Let's Encrypt ACME API is reachable
curl -I https://acme-v02.api.letsencrypt.org/directory
```

### Health Checks

```bash
# Traefik API health
curl -f http://localhost:8080/ping

# Service discovery
curl -s http://localhost:8080/api/http/services | jq '.'
# Prometheus metrics
curl -s http://localhost:8080/metrics | grep traefik_
```

## Performance Tuning

### Resource Limits

- **Traefik**: 1 CPU, 512MB RAM
- **Prometheus**: 1 CPU, 1GB RAM
- **Grafana**: 0.5 CPU, 512MB RAM
- **AlertManager**: 0.2 CPU, 256MB RAM

### Scaling Recommendations

- Single Traefik instance per manager node
- Prometheus data retention: 30 days
- Log rotation: Daily, keep 7 days
- Monitoring scrape interval: 15 seconds

## Backup Strategy

### Critical Data

- `/opt/traefik/letsencrypt/`: TLS certificates
- `/opt/monitoring/prometheus/data/`: Metrics data
- `/opt/monitoring/prometheus/config/`: Scrape configuration and alert rules
- `/opt/monitoring/grafana/data/`: Dashboards and config
- `/opt/monitoring/alertmanager/config/`: Alert routing configuration

### Backup Script

```bash
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/backup/traefik-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# acme.json contains private keys, so keep the backup directory private
chmod 700 "$BACKUP_DIR"

tar -czf "$BACKUP_DIR/traefik-config.tar.gz" /opt/traefik/
tar -czf "$BACKUP_DIR/monitoring-config.tar.gz" /opt/monitoring/
```

## Support and Documentation

### Log Locations

- **Traefik Logs**: `/opt/traefik/logs/`
- **Access Logs**: `/opt/traefik/logs/access.log`
- **Service Logs**: `docker service logs traefik_traefik`

### Monitoring Queries

```promql
# Authentication failure rate
rate(traefik_service_requests_total{code=~"401|403"}[5m])

# Service availability
up{job="traefik"}

# Response time, 95th percentile across all services
histogram_quantile(0.95, sum by (le) (rate(traefik_service_request_duration_seconds_bucket[5m])))
```

Together, these steps yield a production Traefik deployment with authentication, TLS, monitoring, alerting, and documented maintenance procedures.
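The HSTS item in the Security Validation checklist can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the deployment itself: `hsts_max_age_ok` and `MIN_HSTS_MAX_AGE` are hypothetical names introduced here, and the 63072000-second threshold assumes the 2-year max-age stated under SSL/TLS Configuration. The function reads HTTP response headers on standard input and succeeds only if a `Strict-Transport-Security` header with a large enough `max-age` is present.

```bash
#!/usr/bin/env bash
# Sketch: verify that response headers advertise an HSTS max-age of at least 2 years.
# MIN_HSTS_MAX_AGE and hsts_max_age_ok are illustrative names (assumptions),
# not something provided by Traefik or by this repository.

MIN_HSTS_MAX_AGE=63072000  # 2 years in seconds (2 * 365 * 86400)

# Reads HTTP response headers on stdin.
# Returns 0 if a Strict-Transport-Security header with
# max-age >= MIN_HSTS_MAX_AGE is found, 1 otherwise.
hsts_max_age_ok() {
  local line max_age

  # Header names are case-insensitive; strip the trailing CR that HTTP lines carry.
  line=$(grep -i '^strict-transport-security:' | head -n 1 | tr -d '\r') || true
  [ -n "$line" ] || return 1

  # Extract the digits that follow "max-age=" (unquoted form only).
  max_age=$(printf '%s\n' "$line" |
    sed -n 's/.*[Mm][Aa][Xx]-[Aa][Gg][Ee]=\([0-9][0-9]*\).*/\1/p')
  [ -n "$max_age" ] || return 1

  [ "$max_age" -ge "$MIN_HSTS_MAX_AGE" ]
}
```

After sourcing the snippet, one possible use is `curl -sI https://grafana.yourdomain.com | hsts_max_age_ok && echo "HSTS OK"`, substituting any host from the Monitoring Access Points list. Whether the header appears on unauthenticated responses depends on how the middlewares are ordered in the stack file, so a failure here is a prompt to inspect the headers, not proof of misconfiguration.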