HomeAudit/dev_documentation/automation/IMAGE_PINNING_PLAN.md
admin 705a2757c1 Major infrastructure migration and Vaultwarden PostgreSQL troubleshooting
COMPREHENSIVE CHANGES:

INFRASTRUCTURE MIGRATION:
- Migrated services to Docker Swarm on OMV800 (192.168.50.229)
- Deployed PostgreSQL database for Vaultwarden migration
- Updated all stack configurations for Docker Swarm compatibility
- Added comprehensive monitoring stack (Prometheus, Grafana, Blackbox)
- Implemented proper secret management for all services

VAULTWARDEN POSTGRESQL MIGRATION:
- Attempted migration from SQLite to PostgreSQL for NFS compatibility
- Created PostgreSQL stack with proper user/password configuration
- Built custom Vaultwarden image with PostgreSQL support
- Troubleshot persistent SQLite fallback issue despite PostgreSQL config
- Identified known issue where Vaultwarden silently falls back to SQLite
- Added ENABLE_DB_WAL=false to prevent filesystem compatibility issues
- Current status: Old Vaultwarden on lenovo410 still working, new one has config issues

PAPERLESS SERVICES:
- Successfully deployed Paperless-NGX and Paperless-AI on OMV800
- Both services running on ports 8000 and 3000 respectively
- Caddy configuration updated for external access
- Services accessible via paperless.pressmess.duckdns.org and paperless-ai.pressmess.duckdns.org

CADDY CONFIGURATION:
- Updated Caddyfile on Surface (192.168.50.254) for new service locations
- Fixed Vaultwarden reverse proxy to point to new Docker Swarm service
- Removed old notification hub reference that was causing conflicts
- All services properly configured for external access via DuckDNS

BACKUP AND DISCOVERY:
- Created comprehensive backup system for all hosts
- Generated detailed discovery reports for infrastructure analysis
- Implemented automated backup validation scripts
- Created migration progress tracking and verification reports

MONITORING STACK:
- Deployed Prometheus, Grafana, and Blackbox monitoring
- Created infrastructure and system overview dashboards
- Added proper service discovery and alerting configuration
- Implemented performance monitoring for all critical services

DOCUMENTATION:
- Reorganized documentation into logical structure
- Created comprehensive migration playbook and troubleshooting guides
- Added hardware specifications and optimization recommendations
- Documented all configuration changes and service dependencies

CURRENT STATUS:
- Paperless services: Working and accessible externally
- Vaultwarden: PostgreSQL configuration issues, old instance still working
- Monitoring: Deployed and operational
- Caddy: Updated and working for external access
- PostgreSQL: Database running, connection issues with Vaultwarden

NEXT STEPS:
- Continue troubleshooting Vaultwarden PostgreSQL configuration
- Consider alternative approaches for Vaultwarden migration
- Validate all external service access
- Complete final migration validation

TECHNICAL NOTES:
- Used Docker Swarm for orchestration on OMV800
- Implemented proper secret management for sensitive data
- Added comprehensive logging and monitoring
- Created automated backup and validation scripts
2025-08-30 20:18:44 -04:00


# IMAGE PINNING PLAN - CURRENT STATE
**Purpose:** Eliminate non-deterministic `:latest` pulls and ensure reproducible deployments across hosts by pinning images to immutable digests.

**Status:** ✅ **SCRIPT AVAILABLE - READY FOR IMPLEMENTATION**

**Next Action:** Generate `image-digest-lock.yaml` from the currently running containers

---
## 🎯 **WHY DIGESTS INSTEAD OF TAGS**
- Tags can move; digests are immutable
- Works even when upstream versioning varies across services
- Zero guesswork about "which stable version" for every image
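
The tag-versus-digest distinction can be made concrete. A registry digest is the SHA-256 of the image manifest's exact bytes, so any change in content produces a new digest. A tag, by contrast, is only a name that can be re-pointed. The sketch below uses simplified, hypothetical manifest JSON (real manifests carry more fields), and the helper name `content_digest` is illustrative only.

```python
import hashlib
import json


def content_digest(manifest_bytes: bytes) -> str:
    """Return an OCI-style digest: 'sha256:' plus the hex SHA-256 of the exact bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


if __name__ == "__main__":
    # Simplified stand-ins for two manifests that were both published as ':latest'
    # (hypothetical data: upstream swapped one layer between pulls).
    before = json.dumps({"layers": ["sha256:aaa"]}).encode()
    after = json.dumps({"layers": ["sha256:bbb"]}).encode()
    # Same tag name, different content, therefore a different digest.
    print(content_digest(before) != content_digest(after))  # True
```

A deployment that references the digest keeps getting the manifest it was tested with, even after the tag moves.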
## 📋 **CURRENT SCOPE (FROM AUDIT)**
The audit flagged many containers using `:latest` across all hosts:
- `portainer`, `watchtower`, `duckdns`, `paperless-ai`
- `mosquitto`, `vaultwarden`, `zwave-js-ui`, `n8n`
- `esphome`, `dozzle`, `uptime-kuma`
- Several AppFlowy images and others

**Target:** Pin every image actually in use on each host, not just those tagged `:latest`.
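
Finding every unpinned reference, not only those tagged `:latest`, takes a little parsing. An image with no tag implicitly means `:latest`, and a registry port (`host:5000/app`) must not be mistaken for a tag. The following is a minimal sketch. The helper names `classify_image_ref` and `find_unpinned` are hypothetical, and the image list in the usage block is illustrative.

```python
def classify_image_ref(ref: str) -> str:
    """Return 'digest', 'latest' or 'tag' for a Docker image reference."""
    # Anything carrying a sha256 digest is already pinned.
    if "@sha256:" in ref:
        return "digest"
    # The tag follows a ':' in the last path segment only, so a registry
    # port such as 'registry.local:5000/n8n' is not read as a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
    return "latest" if tag == "latest" else "tag"


def find_unpinned(refs):
    """Group references that still float: explicit/implicit ':latest' and other tags."""
    report = {"latest": [], "tag": []}
    for ref in refs:
        kind = classify_image_ref(ref)
        if kind != "digest":
            report[kind].append(ref)
    return report


if __name__ == "__main__":
    running = [
        "portainer/portainer:latest",
        "containrrr/watchtower",          # no tag: implicitly ':latest'
        "postgres:16",                    # pinned to a tag, but tags can move
        "registry.local:5000/n8n",        # port, not a tag: implicitly ':latest'
        "vaultwarden/server@sha256:" + "0" * 64,
    ]
    print(find_unpinned(running))
```

Versioned tags such as `postgres:16` are reported separately because they are less volatile than `:latest`, but they still fall under the "pin everything in use" target.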
## ✅ **CURRENT STATUS**
### **Script Available**
- ✅ **`migration_scripts/scripts/generate_image_digest_lock.sh`** - Available and executable
- ⚠️ **`image-digest-lock.yaml`** - Ready to be generated
### **Required Actions**
1. **Generate initial lock file** from current running containers (Priority: HIGH)
2. **Update stack files** to use digest references
3. **Integrate into deployment pipeline**
## 📦 **DELIVERABLES READY**
### **1. Script Available**
```bash
# File: migration_scripts/scripts/generate_image_digest_lock.sh
# Purpose: Gathers exact digests for images running on specified hosts
# Output: image-digest-lock.yaml with canonical mapping
# Status: ✅ AVAILABLE AND EXECUTABLE
```
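
The script's internals are not reproduced here. As a hedged illustration of how digests *could* be gathered on a host, the sketch below reads the JSON printed by `docker image inspect` and uses its `RepoTags` and `RepoDigests` fields to map each `repo:tag` to `repo@sha256:...`. That JSON could, for example, be produced with `docker image inspect $(docker ps --format '{{.Image}}')`. The function name and the choice to record `None` for images without a registry digest (typically locally built ones) are assumptions. They do not describe the script's actual behavior.

```python
import json


def lock_entries_from_inspect(inspect_json: str) -> dict:
    """Map each 'repo:tag' to 'repo@sha256:<digest>' using `docker image inspect` JSON."""
    entries = {}
    for image in json.loads(inspect_json):
        # RepoDigests looks like ["portainer/portainer@sha256:..."]; it is empty
        # for images that were built locally and never pushed or pulled.
        digests = {}
        for repo_digest in image.get("RepoDigests") or []:
            repo, _, digest = repo_digest.partition("@")
            digests[repo] = digest
        # RepoTags looks like ["portainer/portainer:latest"]; the tag follows the last ':'.
        for tag_ref in image.get("RepoTags") or []:
            repo = tag_ref.rsplit(":", 1)[0]
            # None marks an image that cannot be pinned yet, so later steps can fail closed.
            entries[tag_ref] = f"{repo}@{digests[repo]}" if repo in digests else None
    return entries


if __name__ == "__main__":
    # Hypothetical inspect output: one registry image, one locally built image.
    sample = """[
      {"RepoTags": ["portainer/portainer:latest"],
       "RepoDigests": ["portainer/portainer@sha256:%s"]},
      {"RepoTags": ["local/custom-vaultwarden:dev"], "RepoDigests": []}
    ]""" % ("a" * 64)
    for tag_ref, pinned in lock_entries_from_inspect(sample).items():
        print(tag_ref, "->", pinned)
```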
### **2. Lock File Structure**
```yaml
# image-digest-lock.yaml
# Canonical mapping of image:tag -> image@sha256:<digest> per host
hosts:
  omv800:
    "portainer/portainer:latest": "portainer/portainer@sha256:abc123..."
    "containrrr/watchtower:latest": "containrrr/watchtower@sha256:def456..."
  surface:
    # ... other hosts and images
```
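
A generated lock file is only useful if every value is a complete digest. The check below is a sketch that works on the structure shown above after it has been loaded (for example with PyYAML's `yaml.safe_load`). It rejects placeholders such as `abc123...` and flags hosts with no images. The function name and the exact rules are assumptions, not part of the existing script.

```python
import re

# Repository, '@sha256:', then exactly 64 lowercase hex characters.
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def validate_lock(data) -> list:
    """Return human-readable problems found in a parsed lock file (empty list means OK)."""
    hosts = data.get("hosts") if isinstance(data, dict) else None
    if not isinstance(hosts, dict):
        return ["top-level 'hosts' mapping is missing"]
    problems = []
    for host, images in hosts.items():
        # A host key with only a comment under it loads as None.
        if not isinstance(images, dict) or not images:
            problems.append(f"{host}: no images recorded")
            continue
        for tag_ref, pinned in images.items():
            if not isinstance(pinned, str) or not DIGEST_REF.fullmatch(pinned):
                problems.append(f"{host}: {tag_ref} -> {pinned!r} is not a full sha256 digest reference")
    return problems


if __name__ == "__main__":
    # The illustrative structure above, as a YAML loader would return it.
    example = {
        "hosts": {
            "omv800": {"portainer/portainer:latest": "portainer/portainer@sha256:abc123..."},
            "surface": None,
        }
    }
    for problem in validate_lock(example):
        print(problem)
```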
## 🔧 **USAGE (SCRIPT READY)**
### **Step 1: Generate Lock File**
```bash
bash migration_scripts/scripts/generate_image_digest_lock.sh \
--hosts "omv800 jonathan-2518f5u surface fedora audrey" \
--output image-digest-lock.yaml
```
### **Step 2: Review Lock File**
```bash
cat image-digest-lock.yaml
```
### **Step 3: Apply Digests During Deployment**
- For Swarm stacks and Compose files, use the digest form: `repo/image@sha256:<digest>`
- When generating stacks from automation, resolve each `image:tag` through the lock file
- If no digest is present, fail closed, or explicitly pull and lock the image first
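
The resolve-or-fail-closed rule can be expressed as a small lookup. The sketch below assumes the lock file has already been parsed into a dict with the structure shown earlier. It treats an untagged reference as `:latest` and raises instead of falling back to the floating tag. `resolve_image` and `UnpinnedImageError` are hypothetical names.

```python
class UnpinnedImageError(LookupError):
    """Raised when an image has no locked digest for the target host."""


def _with_explicit_tag(ref: str) -> str:
    # 'containrrr/watchtower' means 'containrrr/watchtower:latest'; a ':' before the
    # last '/' belongs to a registry port, not a tag.
    return ref if ":" in ref.rsplit("/", 1)[-1] else ref + ":latest"


def resolve_image(lock: dict, host: str, ref: str) -> str:
    """Return the digest reference recorded for `ref` on `host`, or fail closed."""
    if "@sha256:" in ref:
        return ref  # already pinned, nothing to resolve
    images = ((lock or {}).get("hosts") or {}).get(host) or {}
    pinned = images.get(_with_explicit_tag(ref))
    if not pinned:
        raise UnpinnedImageError(f"{ref}: no locked digest for host {host}; pull and lock it first")
    return pinned


if __name__ == "__main__":
    lock = {"hosts": {"omv800": {
        "containrrr/watchtower:latest": "containrrr/watchtower@sha256:" + "d" * 64}}}
    print(resolve_image(lock, "omv800", "containrrr/watchtower"))
    try:
        resolve_image(lock, "surface", "containrrr/watchtower:latest")
    except UnpinnedImageError as err:
        print("refusing to deploy:", err)
```

Raising an exception, rather than returning the original tag, is the "fail closed" choice. A deploy script has to handle the error explicitly by pulling and locking the image before it can proceed.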
## 📅 **ROLLOUT STRATEGY**
### **Phase A: Foundation (Ready)**
- ✅ Create `generate_image_digest_lock.sh` script
- ⏳ Lock currently running images to capture a consistent baseline
- ⏳ Generate initial `image-digest-lock.yaml`
### **Phase B: Implementation**
- [ ] Update internal Compose/Stack definitions to use digests
- [ ] Start with critical services (DNS, HA, Databases)
- [ ] Apply to remaining services
### **Phase C: Automation**
- [ ] Integrate lock resolution into CI/deploy scripts
- [ ] New services automatically pin digests at deploy time
## 🔄 **RENEWAL POLICY**
- Regenerate the lock file weekly or during change windows
- Only adopt updated digests after the affected services pass health checks in a canary deployment
- Keep human-readable tags alongside digests for context
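
Under this policy, it helps to know exactly which images changed between the last approved lock and a freshly regenerated one, because only those need a canary run. The sketch below compares two parsed lock files per host. The function name `diff_locks` and its output shape are illustrative assumptions.

```python
def diff_locks(old: dict, new: dict) -> dict:
    """Per host, list image:tag keys whose digest changed, appeared, or disappeared."""
    changes = {}
    old_hosts = (old or {}).get("hosts") or {}
    new_hosts = (new or {}).get("hosts") or {}
    for host in sorted(set(old_hosts) | set(new_hosts)):
        before = old_hosts.get(host) or {}
        after = new_hosts.get(host) or {}
        # Same image:tag, different digest: upstream moved the tag.
        changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
        added = sorted(after.keys() - before.keys())
        removed = sorted(before.keys() - after.keys())
        if changed or added or removed:
            changes[host] = {"changed": changed, "added": added, "removed": removed}
    return changes


if __name__ == "__main__":
    approved = {"hosts": {"omv800": {"containrrr/watchtower:latest": "containrrr/watchtower@sha256:" + "a" * 64}}}
    candidate = {"hosts": {"omv800": {"containrrr/watchtower:latest": "containrrr/watchtower@sha256:" + "b" * 64}}}
    print(diff_locks(approved, candidate))
```

Services listed under `changed` or `added` are the ones to send through the canary before the candidate lock replaces the approved one.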
## 📝 **NOTES**
- For images with strict vendor guidance (e.g., Home Assistant), prefer vendor-recommended channels
- Still pin by digest for deployment consistency
- Script is **READY** for implementation
## 🚀 **IMMEDIATE NEXT STEPS**
### **Generate Initial Lock File**
```bash
# Generate lock file from current running containers
bash migration_scripts/scripts/generate_image_digest_lock.sh \
--hosts "omv800 jonathan-2518f5u surface fedora audrey" \
--output image-digest-lock.yaml
# Review the generated lock file
cat image-digest-lock.yaml
# Validate the lock file structure
python3 -c "
import yaml
with open('image-digest-lock.yaml', 'r') as f:
    data = yaml.safe_load(f) or {}
hosts = data.get('hosts') or {}
print(f'Lock file contains {len(hosts)} hosts')
for host, images in hosts.items():
    count = len(images) if images else 0  # an empty host entry loads as None
    print(f'{host}: {count} images')
"
```
### **Update Stack Files**
```bash
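# Example: update stack files to use digests, one image at a time
# Before: image: portainer/portainer:latest
# After:  image: portainer/portainer@sha256:abc123...
# IMAGE and DIGEST are placeholders: copy the real values from image-digest-lock.yaml
IMAGE="portainer/portainer"
DIGEST="sha256:abc123..."
find stacks/ -name "*.yml" -exec sed -i "s|image: ${IMAGE}:latest|image: ${IMAGE}@${DIGEST}|g" {} +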
```
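
Editing with `sed` one image at a time does not scale well. A lock-driven rewrite can instead substitute every `image:` line in one pass and stop on any image that has no locked digest. The sketch below operates on the text of a Compose/Stack file. It keeps indentation, quotes and trailing comments, and it expects mapping keys to match the references exactly as they are written in the file. `pin_stack_text` is a hypothetical helper, not an existing script.

```python
import re

# "image: <ref>" with optional quotes and anything after the reference (e.g. a comment).
IMAGE_LINE = re.compile(r"""^(?P<prefix>\s*image:\s*)(?P<quote>["']?)(?P<ref>[^"'\s#]+)(?P=quote)(?P<rest>.*)$""")


def pin_stack_text(text: str, mapping: dict) -> str:
    """Replace each unpinned image reference with its locked digest; fail on unknown ones."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        m = IMAGE_LINE.match(body)
        if m and "@sha256:" not in m["ref"]:
            if m["ref"] not in mapping:
                # Fail closed instead of leaving a floating tag in the stack.
                raise KeyError(f"no locked digest for {m['ref']}")
            body = f'{m["prefix"]}{m["quote"]}{mapping[m["ref"]]}{m["quote"]}{m["rest"]}'
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    stack = "services:\n  portainer:\n    image: portainer/portainer:latest  # UI\n"
    lock_for_host = {"portainer/portainer:latest": "portainer/portainer@sha256:" + "a" * 64}
    print(pin_stack_text(stack, lock_for_host), end="")
```

The `mapping` argument would be the per-host section of the lock file, for example `lock["hosts"]["omv800"]`.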
---
**Plan Status:** ✅ READY FOR IMPLEMENTATION

**Last Updated:** 2025-08-29

**Next Review:** After initial lock file generation