
# HomeAudit - Infrastructure Migration and Monitoring

A home infrastructure audit, migration, and monitoring system built around a Docker Swarm deployment.

## 🏗️ Infrastructure Overview

### **Current Deployment Status**
- **✅ Paperless Stack**: Paperless-NGX (port 8000) + Paperless AI (port 3000) on OMV800
- **✅ Monitoring Stack**: Prometheus + Grafana + Node Exporter + Blackbox Exporter
- **✅ Caddy Reverse Proxy**: SSL termination and domain routing
- **🔄 Migration Progress**: 85% complete

### **Device Inventory**
| Device | IP | Role | Status |
|--------|----|------|--------|
| OMV800 | 192.168.50.229 | Docker Swarm Manager | ✅ Active |
| Surface | 192.168.50.254 | Caddy Reverse Proxy | ✅ Active |
| jonathan-2518f5u | 192.168.50.181 | Worker Node | ✅ Active |
| lenovo420 | 192.168.50.66 | Worker Node | ✅ Active |
| audrey | 192.168.50.145 | Worker Node | ✅ Active |
| fedora | 192.168.50.225 | Worker Node | ✅ Active |
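
The Status column can be cross-checked against what the Swarm manager reports. The following is a minimal sketch, not part of the repository: `nodes_not_ready` is a hypothetical helper name, and it assumes the Swarm hostnames resemble the device names in the table.

```bash
# Print the hostname of every Swarm node that is not both Ready and Active.
# Reads lines of "<hostname> <status> <availability>" on stdin.
nodes_not_ready() {
  awk '$2 != "Ready" || $3 != "Active" { print $1 }'
}

# Usage (from a machine with SSH access to the manager, OMV800):
#   ssh root@192.168.50.229 \
#     "docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}}'" \
#     | nodes_not_ready
```

Empty output means every node is Ready and Active.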

## 📊 Monitoring Stack

### **Components**
- **Prometheus** (port 9091): Metrics collection and storage
- **Grafana** (port 3002): Data visualization and dashboards
- **Node Exporter** (port 9100): System metrics collection
- **Blackbox Exporter** (port 9115): Service health monitoring

### **Metrics Coverage**
- **15 Active Targets**: Services, system, and health checks
- **784 Metrics**: Comprehensive infrastructure monitoring
- **Near-real-time Data**: Scrape intervals of 15 to 60 seconds, depending on the target
- **30-day Retention**: Historical trend analysis
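
For orientation, the sketch below writes a minimal Prometheus configuration showing how mixed scrape intervals and a Blackbox Exporter probe are typically expressed. It is an illustration under assumptions, not the deployed config: the job names, the choice of Paperless-NGX as the probed URL, the Node Exporter host, and the `http_2xx` module name are all assumed. `write_prometheus_sketch` is a hypothetical helper.

```bash
# Write a minimal, illustrative Prometheus config to the path given as $1.
write_prometheus_sketch() {
  cat > "$1" <<'EOF'
global:
  scrape_interval: 60s        # default for slow-moving targets

scrape_configs:
  - job_name: node            # hypothetical job name
    scrape_interval: 15s      # tighter interval for system metrics
    static_configs:
      - targets: ['192.168.50.229:9100']   # Node Exporter (host assumed)

  - job_name: blackbox_http   # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]      # assumes this module exists in blackbox.yml
    static_configs:
      - targets: ['http://192.168.50.229:8000']   # Paperless-NGX
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.50.229:9115   # Blackbox Exporter
EOF
}

# Retention is not set in this file; it is a Prometheus startup flag:
#   --storage.tsdb.retention.time=30d
```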

### **Dashboards**
1. **Infrastructure Overview**: Service health and availability
2. **System Overview**: CPU, memory, disk, network monitoring

### **Access URLs**
- **Grafana**: https://grafana.pressmess.duckdns.org (admin/admin123)
- **Prometheus**: https://prometheus.pressmess.duckdns.org

## 🔧 Services Status

### **Active Services**
- **Paperless-NGX**: Document management (port 8000)
- **Paperless AI**: AI-powered document processing (port 3000)
- **Nextcloud**: File storage and sync (port 8081)
- **Home Assistant**: Home automation (port 8123)
- **Portainer**: Container management (port 9000)
- **AppFlowy**: Note-taking (port 9080)

### **Database Services**
- **PostgreSQL**: Primary database
- **MariaDB**: Secondary database
- **Redis**: Caching layer
- **Mosquitto**: MQTT broker

## 🚀 Quick Start

### **1. Access Monitoring**
```bash
# Grafana Dashboard (`open` is the macOS command; use `xdg-open` on Linux)
open https://grafana.pressmess.duckdns.org
# Login: admin / admin123

# Prometheus Metrics
open https://prometheus.pressmess.duckdns.org
```

### **2. Check Service Health**
```bash
# View all monitoring targets
curl "http://192.168.50.229:9091/api/v1/targets"

# Check system metrics
curl "http://192.168.50.229:9091/api/v1/query?query=up"
```
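
To turn the `up` query into a pass/fail check, one option is to ask Prometheus for targets where `up == 0` and test whether the result vector is empty. This is a rough sketch: `no_targets_down` is a hypothetical helper, and it matches the JSON text with `grep` rather than parsing it, so a real JSON tool such as `jq` would be sturdier if it is installed.

```bash
# Succeed (exit 0) when an instant-query response has an empty result vector.
# Reads the JSON response on stdin. For the query `up == 0`, an empty result
# means no scrape target is currently down.
no_targets_down() {
  grep -Eq '"result"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}

# Usage:
#   curl -sG "http://192.168.50.229:9091/api/v1/query" \
#     --data-urlencode 'query=up == 0' | no_targets_down \
#     && echo "all targets up" || echo "a target is down or the query failed"
```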

### **3. Monitor System Resources**
```bash
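# curl treats {} and [] in a URL as glob patterns, so let curl encode the
# PromQL itself: -G appends each --data-urlencode value to the query string.

# CPU usage (%) per instance
curl -G "http://192.168.50.229:9091/api/v1/query" \
  --data-urlencode 'query=100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Memory usage (%)
curl -G "http://192.168.50.229:9091/api/v1/query" \
  --data-urlencode 'query=(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100'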
```
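
For repeated ad-hoc queries, a small wrapper keeps the PromQL readable. The sketch below is a suggestion rather than an existing script in `scripts/`: `prom_query` and the `PROM_URL` variable are hypothetical names, with the default taken from the address used above.

```bash
# Base URL of the Prometheus API; override by exporting PROM_URL.
PROM_URL="${PROM_URL:-http://192.168.50.229:9091}"

# Run an instant query. curl -G sends the --data-urlencode value as a
# percent-encoded query string, so PromQL can be passed unescaped.
prom_query() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   prom_query 'up'
#   prom_query '(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100'
```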

## 📁 Project Structure

```
HomeAudit/
├── stacks/                    # Docker Swarm stacks
│   └── monitoring/           # Monitoring stack configuration
├── configs/                  # Configuration files
│   └── monitoring/          # Prometheus, Grafana configs
├── scripts/                  # Utility scripts
├── dev_documentation/        # Detailed documentation
└── comprehensive_discovery_results/  # Audit results
```

## 🔍 Monitoring Features

### **System Monitoring**
- **CPU Usage**: Per-core and overall utilization
- **Memory Usage**: Total, available, cached, buffers
- **Disk Usage**: Space, I/O, mount points
- **Network I/O**: Bytes sent/received per interface
- **System Load**: 1m, 5m, 15m averages
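
As a rough guide to how these items map onto Node Exporter metrics, the sketch below returns one plausible PromQL expression per feature. The feature keys and the `node_expr` helper are hypothetical, and the dashboards may use different expressions.

```bash
# Map a feature name to a PromQL expression over standard Node Exporter metrics.
node_expr() {
  case "$1" in
    disk)   echo '100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes)' ;;
    net_rx) echo 'rate(node_network_receive_bytes_total[5m])' ;;
    net_tx) echo 'rate(node_network_transmit_bytes_total[5m])' ;;
    load)   echo 'node_load1' ;;
    *)      echo "unknown feature: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   curl -sG "http://192.168.50.229:9091/api/v1/query" \
#     --data-urlencode "query=$(node_expr disk)"
```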

### **Service Monitoring**
- **HTTP Health Checks**: Web service availability
- **TCP Health Checks**: Database and backend services
- **Response Times**: Service performance tracking
- **Availability Metrics**: Uptime and reliability
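
A single health check can also be triggered by hand through the Blackbox Exporter's `/probe` endpoint, which returns `probe_success` and `probe_duration_seconds` among its metrics. The sketch below summarises that output. `probe_summary` is a hypothetical helper, and the `http_2xx` module name is an assumption about the exporter's configuration.

```bash
# Summarise Blackbox Exporter /probe output read from stdin.
# Prints "up <seconds>" or "down <seconds>" and exits non-zero on failure.
probe_summary() {
  awk '
    $1 == "probe_success"          { ok = $2 }
    $1 == "probe_duration_seconds" { dur = $2 }
    END {
      if (ok == 1) { print "up", dur; exit 0 }
      print "down", dur
      exit 1
    }
  '
}

# Usage (probing Paperless-NGX on OMV800):
#   curl -sG "http://192.168.50.229:9115/probe" \
#     --data-urlencode 'target=http://192.168.50.229:8000' \
#     --data-urlencode 'module=http_2xx' | probe_summary
```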

### **Infrastructure Monitoring**
- **Docker Swarm**: Service health and resource usage
- **Container Metrics**: Resource consumption per container
- **Network Connectivity**: Inter-service communication
- **Hardware Health**: System temperature and status

## 🛠️ Maintenance

### **Update Monitoring Stack**
```bash
# Deploy updated configuration
ssh root@192.168.50.229 "cd /opt/stacks/monitoring && docker stack deploy -c final-monitoring.yml monitoring"

# Check service status
ssh root@192.168.50.229 "docker service ls | grep monitoring"
```
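
After a deploy, it can help to confirm that every service has reached its desired replica count. The following sketch filters the replica column for mismatches. `services_not_converged` is a hypothetical helper, not an existing script in this repository.

```bash
# Print every service whose running replica count differs from the desired
# count. Reads lines of "<name> <running>/<desired>" on stdin.
services_not_converged() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}

# Usage:
#   ssh root@192.168.50.229 \
#     "docker stack services monitoring --format '{{.Name}} {{.Replicas}}'" \
#     | services_not_converged
```

Empty output means all services in the stack have converged; rerun after a few seconds if a service is still starting.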

### **View Logs**
```bash
# Prometheus logs
ssh root@192.168.50.229 "docker service logs monitoring_prometheus"

# Grafana logs
ssh root@192.168.50.229 "docker service logs monitoring_grafana"
```

## 📈 Performance Metrics

### **Current System Specs**
- **Total Memory**: 31GB
- **CPU Cores**: Multi-core system
- **Storage**: SSD-based storage
- **Network**: Gigabit connectivity

### **Monitoring Performance**
- **Scrape Interval**: 15-60 seconds
- **Data Retention**: 30 days
- **Metrics Count**: 784 different metrics
- **Target Health**: 15/15 targets healthy

## 🔮 Future Enhancements

### **Planned Improvements**
1. **AlertManager**: Smart alerting and notifications
2. **cAdvisor**: Container resource monitoring
3. **Application Exporters**: Database and service-specific metrics
4. **Centralized Logging**: Log aggregation and analysis

### **Optional Enhancements**
1. **Distributed Tracing**: Request flow tracking
2. **APM**: Application performance monitoring
3. **Synthetic Monitoring**: User journey testing
4. **Automated Incident Response**: Self-healing infrastructure

## 📞 Support

For issues or questions:
1. Check the monitoring dashboards for system health
2. Review service logs for error details
3. Consult the comprehensive documentation in `dev_documentation/`
4. Check the migration status in `comprehensive_discovery_results/`

---

- **Last Updated**: August 30, 2025
- **Monitoring Status**: ✅ Fully Operational
- **Migration Progress**: 85% Complete