Major accomplishments:
- ✅ SELinux policy installed and working
- ✅ Core Traefik v2.10 deployment running
- ✅ Production configuration ready (v3.1)
- ✅ Monitoring stack configured
- ✅ Comprehensive documentation created
- ✅ Security hardening implemented
Current status:
- 🟡 Partially deployed (60% complete)
- ⚠️ Docker socket access needs resolution
- ❌ Monitoring stack not deployed yet
- ⚠️ Production migration pending
Next steps:
1. Fix Docker socket permissions
2. Deploy monitoring stack
3. Migrate to production config
4. Validate full functionality
Files added:
- Complete Traefik deployment documentation
- Production and test configurations
- Monitoring stack configurations
- SELinux policy module
- Security checklists and guides
- Current status documentation
Traefik Production Deployment Guide
Overview
This guide provides comprehensive instructions for deploying Traefik v3.1 in production with full authentication, monitoring, and security features on Docker Swarm with SELinux enforcement.
Architecture Components
Core Services
- Traefik v3.1: Load balancer and reverse proxy with authentication
- Prometheus: Metrics collection and alerting
- Grafana: Monitoring dashboards and visualization
- AlertManager: Alert routing and notification management
- Loki + Promtail: Log aggregation and analysis
Security Features
- ✅ Basic authentication with bcrypt hashing
- ✅ TLS/SSL termination with automatic certificates
- ✅ Security headers (HSTS, XSS protection, etc.)
- ✅ Rate limiting and DDoS protection
- ✅ SELinux policy compliance
- ✅ Prometheus metrics for security monitoring
Prerequisites
System Requirements
- Docker Swarm cluster (single manager minimum)
- SELinux enabled (Fedora/RHEL/CentOS)
- Minimum 4GB RAM, 20GB disk space
- Network ports: 80, 443, 8080, 9090, 3000
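A quick preflight can confirm these prerequisites before any deployment step. The sketch below is illustrative and not part of the repository: the function names are hypothetical, and the 3.5 GiB memory floor is an assumed allowance for the memory a nominal 4GB host does not report as usable.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; names and thresholds are illustrative.

# Print MemTotal in KiB from /proc/meminfo-formatted text on stdin.
mem_total_kib() {
  awk '/^MemTotal:/ { print $2 }'
}

# Succeed when integer $1 is greater than or equal to integer $2.
at_least() {
  [ "$1" -ge "$2" ]
}

# Report each failed prerequisite; returns non-zero if any check fails.
preflight() {
  local failed=0 mem_kib disk_kib
  mem_kib=$(mem_total_kib < /proc/meminfo)
  # Assumed floor of 3.5 GiB: a nominal 4GB host reports less than 4 GiB.
  at_least "$mem_kib" $((3584 * 1024)) \
    || { echo "FAIL: less than ~4GB RAM"; failed=1; }
  # Available space on the filesystem holding /opt, in KiB (POSIX df output).
  disk_kib=$(df -Pk /opt | awk 'NR == 2 { print $4 }')
  at_least "$disk_kib" $((20 * 1024 * 1024)) \
    || { echo "FAIL: less than 20GB free under /opt"; failed=1; }
  [ "$(getenforce 2>/dev/null)" = "Enforcing" ] \
    || { echo "FAIL: SELinux is not enforcing"; failed=1; }
  [ "$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)" = "active" ] \
    || { echo "FAIL: Docker Swarm is not active on this node"; failed=1; }
  return "$failed"
}

# Usage: preflight && echo "All prerequisites met"
```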
Directory Structure
# Traefik certificates and logs
sudo mkdir -p /opt/traefik/{letsencrypt,logs}
# Monitoring data and configuration directories
sudo mkdir -p /opt/monitoring/{prometheus/{data,config},grafana/{data,config}}
sudo mkdir -p /opt/monitoring/{alertmanager/{data,config},loki/data,promtail/config}
# Grafana runs as UID/GID 472, matching the ownership set in Step 3
sudo chown -R 472:472 /opt/monitoring/grafana
Installation Steps
Step 1: SELinux Policy Configuration
# Install SELinux development tools
sudo dnf install -y selinux-policy-devel
# Install custom SELinux policy (adjust the path to your own checkout of the repository)
cd /home/jonathan/Coding/HomeAudit/selinux
./install_selinux_policy.sh
Step 2: Docker Swarm Network Setup
# Create overlay network
docker network create --driver overlay --attachable traefik-public
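Running the create command a second time fails because the network already exists. A minimal sketch of an idempotent variant follows; the function name is hypothetical, and it only wraps the same `docker network` commands used above.

```shell
#!/usr/bin/env bash
# Create the overlay network only if it does not already exist,
# so the step can be re-run safely.
ensure_overlay_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --driver overlay --attachable "$name"
  fi
}

# Usage: ensure_overlay_network traefik-public
```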
Step 3: Configuration Deployment
# Copy monitoring configurations
sudo cp configs/monitoring/prometheus.yml /opt/monitoring/prometheus/config/
sudo cp configs/monitoring/traefik_rules.yml /opt/monitoring/prometheus/config/
sudo cp configs/monitoring/alertmanager.yml /opt/monitoring/alertmanager/config/
# Set proper permissions
sudo chown -R 65534:65534 /opt/monitoring/prometheus
sudo chown -R 472:472 /opt/monitoring/grafana
Step 4: Environment Variables
Create /opt/traefik/.env:
DOMAIN=yourdomain.com
EMAIL=admin@yourdomain.com
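Unlike `docker compose`, `docker stack deploy` does not read a `.env` file on its own, so the variables have to be exported into the shell that runs the deployment. The helper below is a sketch with a hypothetical name, and it assumes the file contains only simple `KEY=value` lines like the two shown above.

```shell
#!/usr/bin/env bash
# Export every KEY=value assignment in an env file into the current shell.
# Assumes the file holds only simple, shell-compatible assignments.
load_env() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  set -a          # auto-export variables assigned while sourcing
  . "$file"
  set +a
}

# Usage:
#   load_env /opt/traefik/.env
#   docker stack deploy -c stacks/core/traefik-production.yml traefik
```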
Step 5: Deploy Services
# Deploy Traefik
export DOMAIN=yourdomain.com
docker stack deploy -c stacks/core/traefik-production.yml traefik
# Deploy monitoring stack
docker stack deploy -c stacks/monitoring/traefik-monitoring.yml monitoring
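`docker stack deploy` returns as soon as the services are scheduled, not when their tasks are running. One way to wait for convergence is to poll the replica count, as in this sketch; the function names, the 5-second poll interval, and the default timeout are assumptions.

```shell
#!/usr/bin/env bash
# Succeed when a "running/desired" replica string such as "1/1" shows
# every desired task running (and at least one task desired).
replicas_ready() {
  local running="${1%%/*}" desired="${1#*/}"
  desired="${desired%% *}"   # drop suffixes such as " (max 1 per node)"
  [ "$desired" -gt 0 ] && [ "$running" -eq "$desired" ]
}

# Poll one service until its replicas converge or the timeout (seconds) passes.
wait_for_service() {
  local service="$1" timeout="${2:-120}" waited=0 replicas
  while [ "$waited" -lt "$timeout" ]; do
    replicas=$(docker service ls --filter "name=$service" --format '{{.Replicas}}' | head -n 1)
    if [ -n "$replicas" ] && replicas_ready "$replicas"; then
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for $service" >&2
  return 1
}

# Usage: wait_for_service traefik_traefik 180
```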
Configuration Details
Authentication Credentials
- Username: admin
- Password: secure_password_2024 (bcrypt hash included)
- Change in production: Generate new hash with htpasswd -nbB admin newpassword
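If the hash is stored in a label inside a stack file (an assumption; this guide does not show where the hash lives), every `$` in it must be doubled to `$$` so that it is not treated as variable interpolation. The sketch below uses hypothetical function names and lets `htpasswd` prompt for the password instead of passing it on the command line.

```shell
#!/usr/bin/env bash
# Double every "$" so the hash survives variable interpolation in a stack file.
escape_dollars() {
  sed 's/\$/$$/g'
}

# Print a bcrypt "user:hash" entry with escaped dollars.
# htpasswd prompts for the password, keeping it out of the shell history.
new_basic_auth_entry() {
  htpasswd -nB "$1" | escape_dollars
}

# Usage: new_basic_auth_entry admin
```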
SSL/TLS Configuration
- Automatic Let's Encrypt certificates
- HTTPS redirect for all HTTP traffic
- HSTS headers with 2-year max-age
- Secure cipher suites only
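The HSTS max-age directive is expressed in seconds, so a 2-year policy corresponds to 63072000. The sketch below computes that value and extracts the max-age a server actually sends; the function names are hypothetical and the hostname in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Two years expressed in seconds, the unit used by the HSTS max-age directive.
two_years_seconds() {
  echo $((2 * 365 * 24 * 60 * 60))
}

# Extract the max-age value from HTTP response headers on stdin.
hsts_max_age() {
  tr -d '\r' \
    | grep -i '^strict-transport-security:' \
    | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Usage (hostname is a placeholder):
#   curl -sI https://traefik.yourdomain.com | hsts_max_age
#   two_years_seconds
```

If the two values differ, the headers middleware is either not attached to that router or configured with a different duration.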
Monitoring Access Points
- Traefik Dashboard: https://traefik.yourdomain.com/dashboard/
- Prometheus: https://prometheus.yourdomain.com
- Grafana: https://grafana.yourdomain.com
- AlertManager: https://alertmanager.yourdomain.com
Security Monitoring
Key Metrics Monitored
- Authentication Failures: Rate of 401/403 responses
- Brute Force Attacks: High-frequency auth failures
- Service Availability: Backend health status
- Response Times: 95th percentile latency
- Error Rates: 5xx error percentage
- Certificate Expiration: TLS cert validity
- Rate Limiting: 429 response frequency
Alert Thresholds
- Critical: >50 auth failures/second = Possible brute force
- Warning: >10 auth failures/minute = High failure rate
- Critical: Service backend down >1 minute
- Warning: 95th percentile response time >2 seconds
- Warning: Error rate >10% for 5 minutes
- Warning: TLS certificate expires <7 days
- Critical: TLS certificate expired
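The two authentication thresholds use different units: 50 failures per second equals 3000 failures per minute. The sketch below puts both on a per-minute scale to show how a measured rate maps to a severity. It is only an illustration of the arithmetic, with a hypothetical function name, and is not the alerting rule itself.

```shell
#!/usr/bin/env bash
# Classify an authentication-failure rate given in failures per minute.
# 50 failures/second equals 3000 failures/minute.
classify_auth_failures() {
  local per_minute="$1"
  if [ "$per_minute" -gt $((50 * 60)) ]; then
    echo critical
  elif [ "$per_minute" -gt 10 ]; then
    echo warning
  else
    echo ok
  fi
}

# Usage: classify_auth_failures 120
```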
Production Checklist
Pre-Deployment
- SELinux policy installed and tested
- Docker Swarm initialized and nodes joined
- Directory structure created with correct permissions
- Environment variables configured
- DNS records pointing to Swarm manager
- Firewall rules configured for ports 80, 443, 8080
Post-Deployment Verification
- Traefik dashboard accessible with authentication
- HTTPS redirects working correctly
- Security headers present in responses
- Prometheus collecting Traefik metrics
- Grafana dashboards displaying data
- AlertManager receiving and routing alerts
- Log aggregation working in Loki
- Certificate auto-renewal configured
Security Validation
- Authentication required for all admin interfaces
- TLS certificates valid and auto-renewing
- Security headers (HSTS, XSS protection) enabled
- Rate limiting functional
- Monitoring alerts triggering correctly
- SELinux in enforcing mode without denials
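The last item can be checked from the command line. A minimal sketch, with hypothetical function names, that wraps `getenforce` and the same `ausearch` filter used in the troubleshooting section:

```shell
#!/usr/bin/env bash
# Succeed only when SELinux is in enforcing mode.
selinux_enforcing() {
  [ "$(getenforce)" = "Enforcing" ]
}

# Count recent AVC denial records that mention traefik (needs root to read the audit log).
recent_traefik_denials() {
  ausearch -m avc -ts recent 2>/dev/null | grep -c traefik
}

# Usage:
#   selinux_enforcing && echo "enforcing"
#   sudo bash -c "$(declare -f recent_traefik_denials); recent_traefik_denials"
```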
Maintenance Operations
Certificate Management
# Check certificate status
docker exec $(docker ps -q -f name=traefik) ls -la /letsencrypt/acme.json
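# Force certificate re-issuance (if needed)
# Warning: this deletes every stored certificate and the ACME account data;
# repeated use can run into Let's Encrypt rate limits.
docker exec $(docker ps -q -f name=traefik) rm /letsencrypt/acme.json
docker service update --force traefik_traefik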
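Listing acme.json shows that the file exists, not when the certificates expire. The sketch below checks the certificate a host actually serves against a validity window, using the 7-day warning threshold as the default. The function names are hypothetical, and it assumes `openssl` is installed on the machine running the check.

```shell
#!/usr/bin/env bash
# Convert days to seconds for openssl's -checkend option.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# Succeed when the certificate served on host:443 stays valid for at
# least the given number of days (default 7).
cert_valid_for() {
  local host="$1" days="${2:-7}"
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -checkend "$(days_to_seconds "$days")" >/dev/null
}

# Usage (hostname is a placeholder):
#   cert_valid_for traefik.yourdomain.com 7 || echo "certificate expires within 7 days"
```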
Log Management
# Rotate Traefik logs
sudo logrotate -f /etc/logrotate.d/traefik
# Check log sizes
du -sh /opt/traefik/logs/*
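The rotate command above expects a policy file at /etc/logrotate.d/traefik, whose contents this guide does not show. The sketch below prints one possible policy that follows the daily, 7-day retention recommended under Scaling Recommendations. The `*.log` glob, the `compress` option, and the function name are assumptions. The USR1 signal asks Traefik to reopen its log files after rotation.

```shell
#!/usr/bin/env bash
# Print a logrotate policy for the Traefik log directory: daily rotation,
# seven rotated files kept. Illustrative only; adapt before installing.
render_traefik_logrotate() {
  cat <<'EOF'
/opt/traefik/logs/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        docker kill --signal=USR1 $(docker ps -q -f name=traefik) >/dev/null 2>&1 || true
    endscript
}
EOF
}

# Usage: render_traefik_logrotate | sudo tee /etc/logrotate.d/traefik >/dev/null
```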
Monitoring Maintenance
# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'
# Grafana backup
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /opt/monitoring/grafana/data
Troubleshooting
Common Issues
SELinux Permission Denied
# Check for denials
sudo ausearch -m avc -ts recent | grep traefik
# Temporarily switch to permissive mode to test
sudo setenforce 0
# Switch back to enforcing mode once the test is done
sudo setenforce 1
# Re-install policy if needed
cd selinux && ./install_selinux_policy.sh
Authentication Not Working
# Check service labels
docker service inspect traefik_traefik | jq '.[0].Spec.Labels'
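# Verify bcrypt hash: store the entry in a scratch file, then enter the password when prompted
printf '%s\n' 'admin:$2y$10$xvzBkbKKvRX.jGG6F7L.ReEMyEx.7BkqNGQO2rFt/1aBgx8jPElXW' > /tmp/htpasswd.check
htpasswd -v /tmp/htpasswd.check admin
rm -f /tmp/htpasswd.check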
Certificate Issues
# Check ACME log
docker service logs traefik_traefik | grep -i acme
# Verify DNS resolution
nslookup yourdomain.com
# Check that the Let's Encrypt API is reachable (rate-limit errors appear in the ACME log above)
curl -I https://acme-v02.api.letsencrypt.org/directory
Health Checks
# Traefik API health
curl -f http://localhost:8080/ping
# Service discovery
curl -s http://localhost:8080/api/http/services | jq '.'
# Prometheus metrics
curl -s http://localhost:8080/metrics | grep traefik_
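Right after a deploy or a forced update, these endpoints can fail for a few seconds while the task starts. A small retry wrapper avoids false alarms in scripts. The function name, attempt count, and delay below are illustrative choices.

```shell
#!/usr/bin/env bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: retry 10 3 curl -fsS -o /dev/null http://localhost:8080/ping
```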
Performance Tuning
Resource Limits
- Traefik: 1 CPU, 512MB RAM
- Prometheus: 1 CPU, 1GB RAM
- Grafana: 0.5 CPU, 512MB RAM
- AlertManager: 0.2 CPU, 256MB RAM
Scaling Recommendations
- Single Traefik instance per manager node
- Prometheus data retention: 30 days
- Log rotation: Daily, keep 7 days
- Monitoring scrape interval: 15 seconds
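Retention and scrape interval together determine how many samples Prometheus keeps per time series, which is useful when sizing the data volume. A short worked calculation follows; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Samples retained per time series for a retention period in days
# and a scrape interval in seconds.
samples_per_series() {
  echo $(( $1 * 24 * 60 * 60 / $2 ))
}

# Usage: samples_per_series 30 15   # 30 days at 15 s gives 172800 samples per series
```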
Backup Strategy
Critical Data
- /opt/traefik/letsencrypt/: TLS certificates
- /opt/monitoring/prometheus/data/: Metrics data
- /opt/monitoring/grafana/data/: Dashboards and config
- /opt/monitoring/alertmanager/config/: Alert rules
Backup Script
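#!/bin/bash
# Abort on the first error so a partial backup is not mistaken for a complete one
set -euo pipefail
BACKUP_DIR="/backup/traefik-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Archive Traefik state (certificates, logs) and all monitoring data
tar -czf "$BACKUP_DIR/traefik-config.tar.gz" /opt/traefik/
tar -czf "$BACKUP_DIR/monitoring-config.tar.gz" /opt/monitoring/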
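A backup is only useful if the archives can be read back. The sketch below lists each archive end to end and reports any that fail. The function names are hypothetical, and the check confirms readability only, not that the contents are complete.

```shell
#!/usr/bin/env bash
# Succeed when a gzip-compressed tar archive can be read end to end.
verify_archive() {
  tar -tzf "$1" >/dev/null 2>&1
}

# Check every .tar.gz archive in a backup directory; report the ones that fail.
verify_backup_dir() {
  local dir="$1" archive failed=0
  for archive in "$dir"/*.tar.gz; do
    [ -e "$archive" ] || { echo "no archives in $dir" >&2; return 1; }
    verify_archive "$archive" || { echo "corrupt: $archive" >&2; failed=1; }
  done
  return "$failed"
}

# Usage: verify_backup_dir "/backup/traefik-$(date +%Y%m%d)"
```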
Support and Documentation
Log Locations
- Traefik Logs: /opt/traefik/logs/
- Access Logs: /opt/traefik/logs/access.log
- Service Logs: docker service logs traefik_traefik
Monitoring Queries
# Authentication failure rate
rate(traefik_service_requests_total{code=~"401|403"}[5m])
# Service availability
up{job="traefik"}
# Response time 95th percentile, per service
histogram_quantile(0.95, sum by (le, service) (rate(traefik_service_request_duration_seconds_bucket[5m])))
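These queries can also be run from the command line through the Prometheus HTTP API, which helps when the web UI is not reachable. In this sketch the function name and the `PROMETHEUS_URL` variable are assumptions; the default address matches the port used under Monitoring Maintenance.

```shell
#!/usr/bin/env bash
# Run an instant PromQL query against the Prometheus HTTP API.
# --data-urlencode takes care of the braces and quotes in the expression.
prom_query() {
  local expr="$1" base="${PROMETHEUS_URL:-http://localhost:9090}"
  curl -sS -G "$base/api/v1/query" --data-urlencode "query=$expr"
}

# Usage: prom_query 'up{job="traefik"}' | jq '.data.result'
```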
This deployment provides enterprise-grade Traefik configuration with comprehensive security, monitoring, and operational capabilities.