COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: ✅ Working and accessible externally
- Vaultwarden: ❌ PostgreSQL configuration issues, old instance still working
- Monitoring: ✅ Deployed and operational
- Caddy: ✅ Updated and working for external access
- PostgreSQL: ✅ Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
# IMAGE PINNING PLAN - CURRENT STATE

**Purpose:** Eliminate non-deterministic `:latest` pulls and ensure reproducible deployments across hosts by pinning images to immutable digests.

**Status:** ✅ **SCRIPT AVAILABLE - READY FOR IMPLEMENTATION**
**Next Action:** Generate `image-digest-lock.yaml` from current running containers

---

## 🎯 **WHY DIGESTS INSTEAD OF TAGS**
- Tags can move; digests are immutable
- Works even when upstream versioning varies across services
- Zero guesswork about "which stable version" for every image

## 📋 **CURRENT SCOPE (FROM AUDIT)**
The audit flagged many containers using `:latest` across all hosts:
- `portainer`, `watchtower`, `duckdns`, `paperless-ai`
- `mosquitto`, `vaultwarden`, `zwave-js-ui`, `n8n`
- `esphome`, `dozzle`, `uptime-kuma`
- Several AppFlowy images and others

**Target:** Pin all images actually in use on each host, not just those tagged `:latest`.

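The target above can be checked mechanically. The following is a minimal sketch, assuming the running image references for each host have already been collected into a dictionary (for example from `docker ps` output); the function names and the input shape are illustrative and are not part of the existing scripts.

```python
HEX_DIGITS = set("0123456789abcdef")


def is_pinned(ref: str) -> bool:
    """Return True if the image reference ends in a full sha256 digest."""
    _, sep, digest = ref.partition("@sha256:")
    return bool(sep) and len(digest) == 64 and set(digest) <= HEX_DIGITS


def unpinned_images(images_by_host: dict) -> dict:
    """Map each host to its sorted image references that lack a digest.

    `images_by_host` is an assumed input shape: {host: [image_ref, ...]}.
    Hosts whose images are all pinned are omitted from the result.
    """
    report = {}
    for host, refs in images_by_host.items():
        missing = sorted({ref for ref in refs if not is_pinned(ref)})
        if missing:
            report[host] = missing
    return report
```

Note that a versioned tag such as `postgres:16` is reported as well as `:latest`, which matches the stated target of pinning every image in use.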
## ✅ **CURRENT STATUS**

### **Script Available**
- ✅ **`migration_scripts/scripts/generate_image_digest_lock.sh`** - Available and executable
- ⚠️ **`image-digest-lock.yaml`** - Not yet generated; ready to be produced by the script

### **Required Actions**
1. **Generate initial lock file** from current running containers (Priority: HIGH)
2. **Update stack files** to use digest references
3. **Integrate into deployment pipeline**

## 📦 **DELIVERABLES READY**

### **1. Script Available**
```bash
# File: migration_scripts/scripts/generate_image_digest_lock.sh
# Purpose: Gathers exact digests for images running on specified hosts
# Output: image-digest-lock.yaml with canonical mapping
# Status: ✅ AVAILABLE AND EXECUTABLE
```

### **2. Lock File Structure**
```yaml
# image-digest-lock.yaml
# Canonical mapping of image:tag -> image@sha256:<digest> per host
hosts:
  omv800:
    "portainer/portainer:latest": "portainer/portainer@sha256:abc123..."
    "containrrr/watchtower:latest": "containrrr/watchtower@sha256:def456..."
  surface:
    # ... other hosts and images
```

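A structural check on the lock data could look like the sketch below. It assumes the file has already been loaded into a dictionary (for instance with `yaml.safe_load`) and that every value should be a `name@sha256:<64 hex characters>` reference. The function name `validate_lock` and the error messages are illustrative, not part of the existing script.

```python
import re

# A pinned reference: a repository name, then "@sha256:", then 64 hex characters.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def validate_lock(data) -> list:
    """Return a list of human-readable problems; an empty list means the data looks sound."""
    hosts = data.get("hosts") if isinstance(data, dict) else None
    if not isinstance(hosts, dict):
        return ["top-level 'hosts' mapping is missing"]

    errors = []
    for host, images in hosts.items():
        if images is None:
            continue  # host is listed but has no entries yet
        if not isinstance(images, dict):
            errors.append(f"{host}: expected a mapping of image:tag -> digest reference")
            continue
        for tag_ref, pinned in images.items():
            if not isinstance(pinned, str) or not DIGEST_REF.match(pinned):
                errors.append(f"{host}: {tag_ref} is not pinned to a full sha256 digest")
    return errors
```

Abbreviated placeholders such as `sha256:abc123...` are rejected by this check, so it is meant for a generated lock file rather than for the illustrative structure above.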
## 🔧 **USAGE (SCRIPT READY)**

### **Step 1: Generate Lock File**
```bash
bash migration_scripts/scripts/generate_image_digest_lock.sh \
  --hosts "omv800 jonathan-2518f5u surface fedora audrey" \
  --output image-digest-lock.yaml
```

### **Step 2: Review Lock File**
```bash
cat image-digest-lock.yaml
```

### **Step 3: Apply Digests During Deployment**
- For Swarm stacks and Compose files, use the digest form: `repo/image@sha256:<digest>`
- When generating stacks from automation, resolve each `image:tag` via the lock file
- If a digest is not present in the lock file, fail closed, or explicitly pull the image and add it to the lock

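The fail-closed rule can be sketched as a small lookup helper. This is an assumption about how deploy automation might consume the lock data, not the behavior of an existing script; `resolve_image` and `UnpinnedImageError` are hypothetical names, and `lock` is the dictionary loaded from `image-digest-lock.yaml`.

```python
class UnpinnedImageError(LookupError):
    """Raised when an image has no digest recorded in the lock data."""


def resolve_image(lock: dict, host: str, ref: str) -> str:
    """Return the digest-pinned reference for `ref` on `host`, or fail closed."""
    if "@sha256:" in ref:
        return ref  # already in digest form, nothing to resolve

    images = (lock.get("hosts") or {}).get(host) or {}
    try:
        return images[ref]
    except KeyError:
        # Fail closed: never fall back to the mutable tag.
        raise UnpinnedImageError(
            f"no digest locked for {ref!r} on host {host!r}"
        ) from None
```

Raising instead of returning the original tag keeps a missing entry from silently reintroducing a `:latest` pull; the caller can catch the error and decide whether to pull and lock explicitly.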
## 📅 **ROLLOUT STRATEGY**

### **Phase A: Foundation (Ready)**
- ✅ Create `generate_image_digest_lock.sh` script
- ⏳ Lock currently running images to capture a consistent baseline
- ⏳ Generate initial `image-digest-lock.yaml`

### **Phase B: Implementation**
- [ ] Update internal Compose/Stack definitions to use digests
- [ ] Start with critical services (DNS, HA, Databases)
- [ ] Apply to remaining services

### **Phase C: Automation**
- [ ] Integrate lock resolution into CI/deploy scripts
- [ ] New services automatically pin digests at deploy time

## 🔄 **RENEWAL POLICY**
- Regenerate the lock file weekly or during change windows
- Only adopt updated digests after services pass health checks in canary
- Keep human-readable tags alongside each digest for context

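When a regenerated lock file is reviewed before canary testing, it helps to see exactly which entries changed. The sketch below compares two lock dictionaries of the `hosts -> image:tag -> pinned reference` shape; `diff_locks` is a hypothetical helper, not part of the existing tooling.

```python
def _entries(lock: dict) -> dict:
    """Flatten lock data into {(host, image_ref): pinned_ref}."""
    flat = {}
    for host, images in (lock.get("hosts") or {}).items():
        for ref, pinned in (images or {}).items():
            flat[(host, ref)] = pinned
    return flat


def diff_locks(old: dict, new: dict) -> list:
    """Return sorted (host, image_ref, old_pinned, new_pinned) tuples for entries that differ.

    A value of None means the entry is absent on that side.
    """
    before, after = _entries(old), _entries(new)
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if before.get(key) != after.get(key):
            changes.append((*key, before.get(key), after.get(key)))
    return changes
```

Only the entries returned here need to go through canary health checks; everything else is unchanged from the previous baseline.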
## 📝 **NOTES**
- For images with strict vendor guidance (e.g., Home Assistant), prefer vendor-recommended channels
- Even for those images, still pin by digest for deployment consistency
- Script is **READY** for implementation

## 🚀 **IMMEDIATE NEXT STEPS**

### **Generate Initial Lock File**
```bash
# Generate lock file from current running containers
bash migration_scripts/scripts/generate_image_digest_lock.sh \
  --hosts "omv800 jonathan-2518f5u surface fedora audrey" \
  --output image-digest-lock.yaml

# Review the generated lock file
cat image-digest-lock.yaml

# Validate the lock file structure (requires PyYAML)
python3 -c "
import yaml
with open('image-digest-lock.yaml', 'r') as f:
    data = yaml.safe_load(f) or {}
hosts = data.get('hosts') or {}
print(f'Lock file contains {len(hosts)} hosts')
for host, images in hosts.items():
    # A host with no entries loads as None, so fall back to an empty mapping
    print(f'{host}: {len(images or {})} images')
"
```

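### **Update Stack Files**
```bash
# Example: Update a stack file to use digests
# Before: image: portainer/portainer:latest
# After:  image: portainer/portainer@sha256:abc123...

# Process all stack files
# NOTE: DIGEST is a literal placeholder. This command writes the same text for
# every image, so substitute the real per-image digest from image-digest-lock.yaml
# before deploying. It also only matches images tagged :latest.
find stacks/ -name "*.yml" -exec sed -i 's/image: \(.*\):latest/image: \1@sha256:DIGEST/g' {} \;
```
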
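Because a blanket `sed` cannot know which digest belongs to which image, a lock-driven rewrite is one possible alternative. The following sketch rewrites `image:` lines in the text of a stack file using a mapping taken from one host's section of the lock file. The function name, the regular expression, and the trailing `# was ...` comment are assumptions for illustration; an existing trailing comment on an `image:` line is replaced.

```python
import re

# Matches lines such as "    image: portainer/portainer:latest" (optionally quoted,
# optionally followed by a comment) and captures the prefix and the reference.
IMAGE_LINE = re.compile(r"^(\s*image:\s*)['\"]?([^'\"\s#]+)['\"]?\s*(?:#.*)?$")


def pin_stack_text(text: str, images: dict) -> str:
    """Rewrite each unpinned `image:` line using `images` (image:tag -> pinned reference).

    Raises KeyError if an image has no locked digest, so nothing is silently left unpinned.
    """
    out = []
    for line in text.splitlines():
        match = IMAGE_LINE.match(line)
        if match and "@sha256:" not in match.group(2):
            ref = match.group(2)
            if ref not in images:
                raise KeyError(f"no digest locked for {ref}")
            # Keep the original tag as a comment for human context.
            line = f"{match.group(1)}{images[ref]}  # was {ref}"
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Lines that are already in digest form are left untouched, so running the rewrite twice produces the same output as running it once.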
---

**Plan Status:** ✅ READY FOR IMPLEMENTATION
**Last Updated:** 2025-08-29
**Next Review:** After initial lock file generation