HomeAudit/TRAEFIK_DEPLOYMENT_GUIDE.md
admin 9ea31368f5 Complete Traefik infrastructure deployment - 60% complete
Major accomplishments:
- SELinux policy installed and working
- Core Traefik v2.10 deployment running
- Production configuration ready (v3.1)
- Monitoring stack configured
- Comprehensive documentation created
- Security hardening implemented

Current status:
- 🟡 Partially deployed (60% complete)
- ⚠️ Docker socket access needs resolution
- Monitoring stack not deployed yet
- ⚠️ Production migration pending

Next steps:
1. Fix Docker socket permissions
2. Deploy monitoring stack
3. Migrate to production config
4. Validate full functionality

Files added:
- Complete Traefik deployment documentation
- Production and test configurations
- Monitoring stack configurations
- SELinux policy module
- Security checklists and guides
- Current status documentation
2025-08-28 15:22:41 -04:00


Traefik Production Deployment Guide

Overview

This guide provides comprehensive instructions for deploying Traefik v3.1 in production with full authentication, monitoring, and security features on Docker Swarm with SELinux enforcement.

Architecture Components

Core Services

  • Traefik v3.1: Load balancer and reverse proxy with authentication
  • Prometheus: Metrics collection and alerting
  • Grafana: Monitoring dashboards and visualization
  • AlertManager: Alert routing and notification management
  • Loki + Promtail: Log aggregation and analysis

Security Features

  • Basic authentication with bcrypt hashing
  • TLS/SSL termination with automatic certificates
  • Security headers (HSTS, XSS protection, etc.)
  • Rate limiting and DDoS protection
  • SELinux policy compliance
  • Prometheus metrics for security monitoring

Prerequisites

System Requirements

  • Docker Swarm cluster (single manager minimum)
  • SELinux enabled (Fedora/RHEL/CentOS)
  • Minimum 4GB RAM, 20GB disk space
  • Network ports: 80, 443, 8080, 9090, 3000

Directory Structure

# Traefik state: ACME certificate store and logs
sudo mkdir -p /opt/traefik/{letsencrypt,logs}
# Monitoring stack: data and config directories per service
sudo mkdir -p /opt/monitoring/{prometheus/{data,config},grafana/{data,config}}
sudo mkdir -p /opt/monitoring/{alertmanager/{data,config},loki/data,promtail/config}
# Grafana's official container image runs as UID/GID 472
sudo chown -R 472:472 /opt/monitoring/grafana
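
Before moving on, it can help to confirm that the layout actually exists. The following is a minimal sketch that checks only the directories the rest of this guide refers to; the function name `check_layout` and its optional root argument are illustrative and not part of the repository.

```shell
# check_layout [ROOT]: report any expected directory that is missing.
# ROOT defaults to /opt; the optional argument exists only so the check
# can be tried against a scratch directory (an assumption of this sketch).
check_layout() {
    local root="${1:-/opt}" missing=0 dir
    for dir in \
        traefik/letsencrypt traefik/logs \
        monitoring/prometheus/data monitoring/prometheus/config \
        monitoring/grafana/data monitoring/grafana/config \
        monitoring/alertmanager/data monitoring/alertmanager/config \
        monitoring/loki/data monitoring/promtail/config
    do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $root/$dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_layout` with no arguments checks `/opt`; a non-zero exit status means at least one directory was reported as missing.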

Installation Steps

Step 1: SELinux Policy Configuration

# Install SELinux development tools
sudo dnf install -y selinux-policy-devel

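# Install custom SELinux policy (the path below is the author's checkout;
# adjust it to wherever the HomeAudit repository lives on your host)
cd /home/jonathan/Coding/HomeAudit/selinux
./install_selinux_policy.sh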

Step 2: Docker Swarm Network Setup

# Create overlay network
docker network create --driver overlay --attachable traefik-public

Step 3: Configuration Deployment

# Copy monitoring configurations
sudo cp configs/monitoring/prometheus.yml /opt/monitoring/prometheus/config/
sudo cp configs/monitoring/traefik_rules.yml /opt/monitoring/prometheus/config/
sudo cp configs/monitoring/alertmanager.yml /opt/monitoring/alertmanager/config/

# Set proper permissions
sudo chown -R 65534:65534 /opt/monitoring/prometheus
sudo chown -R 472:472 /opt/monitoring/grafana

Step 4: Environment Variables

Create /opt/traefik/.env:

DOMAIN=yourdomain.com
EMAIL=admin@yourdomain.com
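
Note that `docker stack deploy` substitutes variables from the calling shell's environment and does not read `.env` files on its own, so the file has to be loaded first. A minimal sketch of one way to do that, assuming the two variables above are the only required ones; the function name `load_env` is hypothetical.

```shell
# load_env [FILE]: export every KEY=value line in FILE, then confirm that
# DOMAIN and EMAIL (the two variables this guide defines) are non-empty.
# The function name is illustrative, not something shipped in the repository.
load_env() {
    local file="${1:-/opt/traefik/.env}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    set -a          # auto-export every variable assigned while sourcing
    . "$file"
    set +a
    [ -n "${DOMAIN:-}" ] || { echo "DOMAIN is not set in $file" >&2; return 1; }
    [ -n "${EMAIL:-}" ]  || { echo "EMAIL is not set in $file" >&2; return 1; }
}
```

Calling `load_env` in the same shell session that later runs the deploy commands makes both values visible to the stack files.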

Step 5: Deploy Services

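# Deploy Traefik. docker stack deploy does not read .env files, so export
# the variables from /opt/traefik/.env into the shell first.
set -a; . /opt/traefik/.env; set +a
docker stack deploy -c stacks/core/traefik-production.yml traefik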

# Deploy monitoring stack
docker stack deploy -c stacks/monitoring/traefik-monitoring.yml monitoring

Configuration Details

Authentication Credentials

  • Username: admin
  • Password: secure_password_2024 (bcrypt hash included)
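  • Change in production: generate a new hash with htpasswd -nB admin, which prompts for the password (htpasswd -nbB admin newpassword also works but leaves the password in shell history)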
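
If the hash is pasted into a label in a stack file, every `$` in it generally has to be doubled, because Compose-format files treat a single `$` as the start of a variable reference. A minimal sketch of that escaping step; the function name `escape_dollars` is hypothetical, and whether your stack file stores the hash in a label at all is an assumption.

```shell
# escape_dollars STRING: double every "$" in STRING so the hash survives
# variable interpolation when pasted into a Compose/stack file label.
# The function name is illustrative.
escape_dollars() {
    printf '%s\n' "$1" | sed 's/\$/$$/g'
}

# Typical use (htpasswd prompts for the password instead of taking it on
# the command line, which keeps it out of shell history):
#   escape_dollars "$(htpasswd -nB admin)"
```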

SSL/TLS Configuration

  • Automatic Let's Encrypt certificates
  • HTTPS redirect for all HTTP traffic
  • HSTS headers with 2-year max-age
  • Secure cipher suites only
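
The HSTS max-age directive is expressed in seconds, so a two-year policy corresponds to 2 × 365 × 24 × 60 × 60 = 63072000. The sketch below computes that value and extracts max-age from a set of response headers for comparison; the function name `hsts_max_age` is hypothetical.

```shell
# Two years expressed in seconds, the unit used by the max-age directive.
TWO_YEARS=$((2 * 365 * 24 * 60 * 60))   # 63072000

# hsts_max_age: read HTTP response headers on stdin and print the max-age
# value of the Strict-Transport-Security header, if one is present.
hsts_max_age() {
    tr -d '\r' |
        grep -i '^strict-transport-security:' |
        sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Example against a live host (hostname taken from this guide's placeholders):
#   curl -sI https://traefik.yourdomain.com | hsts_max_age
```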

Monitoring Access Points

  • Traefik Dashboard: https://traefik.yourdomain.com/dashboard/
  • Prometheus: https://prometheus.yourdomain.com
  • Grafana: https://grafana.yourdomain.com
  • AlertManager: https://alertmanager.yourdomain.com

Security Monitoring

Key Metrics Monitored

  1. Authentication Failures: Rate of 401/403 responses
  2. Brute Force Attacks: High-frequency auth failures
  3. Service Availability: Backend health status
  4. Response Times: 95th percentile latency
  5. Error Rates: 5xx error percentage
  6. Certificate Expiration: TLS cert validity
  7. Rate Limiting: 429 response frequency

Alert Thresholds

  • Critical: >50 auth failures/second = Possible brute force
  • Warning: >10 auth failures/minute = High failure rate
  • Critical: Service backend down >1 minute
  • Warning: 95th percentile response time >2 seconds
  • Warning: Error rate >10% for 5 minutes
  • Warning: TLS certificate expires in <7 days
  • Critical: TLS certificate expired
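
As an illustration of the two certificate thresholds, the sketch below maps an expiry timestamp to an alert level. It is only a local approximation of what the Prometheus rules would express; the function name `cert_alert_level` is hypothetical and the timestamps are Unix seconds.

```shell
# cert_alert_level NOT_AFTER [NOW]: map a certificate expiry time to the
# alert levels listed above. Both arguments are Unix timestamps in seconds;
# NOW defaults to the current time. The function name is illustrative.
cert_alert_level() {
    local not_after="$1" now="${2:-$(date +%s)}"
    local week=$((7 * 24 * 60 * 60))
    if [ "$not_after" -le "$now" ]; then
        echo critical        # certificate has already expired
    elif [ $((not_after - now)) -lt "$week" ]; then
        echo warning         # fewer than 7 days of validity remain
    else
        echo ok
    fi
}
```

On a host with GNU date, the expiry timestamp could be derived from the `notAfter=` line printed by `openssl x509 -enddate -noout` for a PEM certificate, converted with `date -d ... +%s`.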

Production Checklist

Pre-Deployment

  • SELinux policy installed and tested
  • Docker Swarm initialized and nodes joined
  • Directory structure created with correct permissions
  • Environment variables configured
  • DNS records pointing to Swarm manager
  • Firewall rules configured for ports 80, 443, 8080

Post-Deployment Verification

  • Traefik dashboard accessible with authentication
  • HTTPS redirects working correctly
  • Security headers present in responses
  • Prometheus collecting Traefik metrics
  • Grafana dashboards displaying data
  • AlertManager receiving and routing alerts
  • Log aggregation working in Loki
  • Certificate auto-renewal configured
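
The security-header item can be checked from the command line. A minimal sketch that reads response headers on stdin and reports which of the requested headers are absent; the function name `missing_headers` is hypothetical, and the exact header list depends on how the headers middleware is configured.

```shell
# missing_headers NAME...: read HTTP response headers on stdin and print
# each requested header name that does not appear. Exit status is non-zero
# when at least one is missing. Header names are matched case-insensitively.
missing_headers() {
    local headers name status=0
    headers=$(tr -d '\r')
    for name in "$@"; do
        if ! printf '%s\n' "$headers" | grep -qi "^$name:"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Example (the header list is an assumption; use whatever your middleware sets):
#   curl -sI https://grafana.yourdomain.com |
#       missing_headers Strict-Transport-Security X-Content-Type-Options
```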

Security Validation

  • Authentication required for all admin interfaces
  • TLS certificates valid and auto-renewing
  • Security headers (HSTS, XSS protection) enabled
  • Rate limiting functional
  • Monitoring alerts triggering correctly
  • SELinux in enforcing mode without denials

Maintenance Operations

Certificate Management

# Check certificate status
docker exec $(docker ps -q -f name=traefik) ls -la /letsencrypt/acme.json

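# Force certificate re-issuance (last resort). Deleting acme.json discards
# every stored certificate, and re-requesting them counts against
# Let's Encrypt rate limits.
docker exec $(docker ps -q -f name=traefik) rm /letsencrypt/acme.json
docker service update --force traefik_traefik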

Log Management

# Rotate Traefik logs
sudo logrotate -f /etc/logrotate.d/traefik

# Check log sizes
du -sh /opt/traefik/logs/*

Monitoring Maintenance

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'

# Grafana backup
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /opt/monitoring/grafana/data

Troubleshooting

Common Issues

SELinux Permission Denied

# Check for denials
sudo ausearch -m avc -ts recent | grep traefik

# Temporarily switch to permissive mode to test
sudo setenforce 0
# Reproduce the problem, then return to enforcing mode
sudo setenforce 1

# Re-install policy if needed
cd selinux && ./install_selinux_policy.sh

Authentication Not Working

# Check service labels
docker service inspect traefik_traefik | jq '.[0].Spec.Labels'

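# Verify bcrypt hash: write it to a scratch file, then let htpasswd prompt
# for the password to check against it
printf '%s\n' 'admin:$2y$10$xvzBkbKKvRX.jGG6F7L.ReEMyEx.7BkqNGQO2rFt/1aBgx8jPElXW' > /tmp/htpasswd.check
htpasswd -v /tmp/htpasswd.check admin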

Certificate Issues

# Check ACME log
docker service logs traefik_traefik | grep -i acme

# Verify DNS resolution
nslookup yourdomain.com

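# Check that the Let's Encrypt ACME API is reachable
# (this does not report rate-limit status; look for rate-limit errors in the ACME log above)
curl -I https://acme-v02.api.letsencrypt.org/directory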

Health Checks

# Traefik API health
curl -f http://localhost:8080/ping

# Service discovery
curl -s http://localhost:8080/api/http/services | jq '.'

# Prometheus metrics
curl -s http://localhost:8080/metrics | grep traefik_

Performance Tuning

Resource Limits

  • Traefik: 1 CPU, 512MB RAM
  • Prometheus: 1 CPU, 1GB RAM
  • Grafana: 0.5 CPU, 512MB RAM
  • AlertManager: 0.2 CPU, 256MB RAM

Scaling Recommendations

  • Single Traefik instance per manager node
  • Prometheus data retention: 30 days
  • Log rotation: Daily, keep 7 days
  • Monitoring scrape interval: 15 seconds
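
The retention and scrape-interval figures together determine roughly how much disk Prometheus needs: retention in seconds × ingested samples per second × bytes per sample. The sketch below applies that estimate; the function name `prom_disk_bytes` is hypothetical, and the series count and the figure of 2 bytes per sample are assumptions to replace with measured values.

```shell
# prom_disk_bytes SERIES [INTERVAL_S] [RETENTION_DAYS] [BYTES_PER_SAMPLE]
# Rough disk estimate: retention seconds x samples per second x bytes per sample.
# Defaults follow this guide (15 s scrape interval, 30 days retention);
# 2 bytes per sample and the series count are assumptions to adjust.
prom_disk_bytes() {
    local series="$1" interval="${2:-15}" days="${3:-30}" bps="${4:-2}"
    echo $(( series * days * 24 * 60 * 60 * bps / interval ))
}

prom_disk_bytes 10000    # 10,000 active series -> 3456000000 bytes (about 3.2 GiB)
```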

Backup Strategy

Critical Data

  • /opt/traefik/letsencrypt/: TLS certificates
  • /opt/monitoring/prometheus/data/: Metrics data
  • /opt/monitoring/grafana/data/: Dashboards and config
  • /opt/monitoring/alertmanager/config/: Alert routing configuration
  • /opt/monitoring/prometheus/config/: Scrape configuration and alert rules

Backup Script

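#!/bin/bash
# Abort on errors, unset variables, and failed pipeline stages
set -euo pipefail

BACKUP_DIR="/backup/traefik-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# -C / with relative paths avoids tar's "Removing leading /" warning
tar -czf "$BACKUP_DIR/traefik-config.tar.gz" -C / opt/traefik
tar -czf "$BACKUP_DIR/monitoring-config.tar.gz" -C / opt/monitoring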
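
A backup is only useful if the archives can be read back. As a minimal sketch, the function below lists each archive in a backup directory with tar to confirm it is intact; the function name `verify_backups` is hypothetical and not part of the repository.

```shell
# verify_backups DIR: list each .tar.gz archive in DIR with tar to confirm
# it can be read end to end. Returns non-zero if any archive fails or if
# DIR contains no archives. The function name is illustrative.
verify_backups() {
    local dir="$1" archive status=0 found=0
    for archive in "$dir"/*.tar.gz; do
        [ -e "$archive" ] || continue
        found=1
        if tar -tzf "$archive" >/dev/null 2>&1; then
            echo "ok:     $archive"
        else
            echo "failed: $archive"
            status=1
        fi
    done
    [ "$found" -eq 1 ] || { echo "no archives in $dir"; return 1; }
    return "$status"
}
```

For example, `verify_backups "/backup/traefik-$(date +%Y%m%d)"` checks the archives written by the script on the same day.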

Support and Documentation

Log Locations

  • Traefik Logs: /opt/traefik/logs/
  • Access Logs: /opt/traefik/logs/access.log
  • Service Logs: docker service logs traefik_traefik

Monitoring Queries

# Authentication failure rate
rate(traefik_service_requests_total{code=~"401|403"}[5m])

# Service availability
up{job="traefik"}

# Response time 95th percentile
histogram_quantile(0.95, rate(traefik_service_request_duration_seconds_bucket[5m]))

This deployment provides enterprise-grade Traefik configuration with comprehensive security, monitoring, and operational capabilities.