docs(02-04): complete runHealthProbes + runRetentionCleanup plan

- Phase 2 plan 4 complete — two scheduled Cloud Function exports added
- SUMMARY.md created with decisions, deviations, and phase readiness notes
- STATE.md updated: phase 2 complete, plan counter at 4/4
- ROADMAP.md updated: phase 2 all 4 plans complete
- Requirements HLTH-03 and INFR-03 marked complete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -11,7 +11,7 @@ Requirements for initial release. Each maps to roadmap phases.
 
 - [ ] **HLTH-01**: Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth
 - [x] **HLTH-02**: Each health probe makes a real authenticated API call, not just config checks
-- [ ] **HLTH-03**: Health probes run on a scheduled interval, separate from document processing
+- [x] **HLTH-03**: Health probes run on a scheduled interval, separate from document processing
 - [x] **HLTH-04**: Health probe results persist to Supabase (survive cold starts)
 
 ### Alerting
@@ -31,7 +31,7 @@ Requirements for initial release. Each maps to roadmap phases.
 
 - [x] **INFR-01**: Database migrations create service_health_checks and alert_events tables with indexes on created_at
 - [ ] **INFR-02**: Admin API routes protected by Firebase Auth with admin email check
-- [ ] **INFR-03**: 30-day rolling data retention cleanup runs on schedule
+- [x] **INFR-03**: 30-day rolling data retention cleanup runs on schedule
 - [x] **INFR-04**: Analytics writes use existing Supabase connection, no new database infrastructure
 
 ## v2 Requirements
@@ -78,14 +78,14 @@ Which phases cover which requirements. Updated during roadmap creation.
 | INFR-01 | Phase 1 | Complete |
 | INFR-04 | Phase 1 | Complete |
 | HLTH-02 | Phase 2 | Complete |
-| HLTH-03 | Phase 2 | Pending |
+| HLTH-03 | Phase 2 | Complete |
 | HLTH-04 | Phase 2 | Complete |
 | ALRT-01 | Phase 2 | Complete |
 | ALRT-02 | Phase 2 | Complete |
 | ALRT-04 | Phase 2 | Complete |
 | ANLY-01 | Phase 2 | Complete |
 | ANLY-03 | Phase 2 | Complete |
-| INFR-03 | Phase 2 | Pending |
+| INFR-03 | Phase 2 | Complete |
 | INFR-02 | Phase 3 | Pending |
 | HLTH-01 | Phase 3 | Pending |
 | ANLY-02 | Phase 3 | Pending |
@@ -13,7 +13,7 @@ This milestone adds persistent analytics and service health monitoring to the ex
 Decimal phases appear between their surrounding integers in numeric order.
 
 - [ ] **Phase 1: Data Foundation** - Create schema, DB models, and verify existing Supabase connection wiring
-- [ ] **Phase 2: Backend Services** - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup
+- [x] **Phase 2: Backend Services** - Health probers, alert trigger, email sender, analytics collector, scheduler, retention cleanup (completed 2026-02-24)
 - [ ] **Phase 3: API Layer** - Admin-gated routes exposing all services, instrumentation hooks in existing processors
 - [ ] **Phase 4: Frontend** - Admin dashboard page, health panel, processing metrics, alert notification banner
 
@@ -45,7 +45,7 @@ Plans:
 4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
 5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
 6. A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports
-**Plans:** 2/4 plans executed
+**Plans:** 4/4 plans complete
 
 Plans:
 - [ ] 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
@@ -83,6 +83,6 @@ Phases execute in numeric order: 1 → 2 → 3 → 4
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 1. Data Foundation | 2/2 | Complete | 2026-02-24 |
-| 2. Backend Services | 2/4 | In Progress| |
+| 2. Backend Services | 4/4 | Complete | 2026-02-24 |
 | 3. API Layer | 0/TBD | Not started | - |
 | 4. Frontend | 0/TBD | Not started | - |
@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-02-24)
 ## Current Position
 
 Phase: 2 of 4 (Backend Services)
-Plan: 3 of 4 in current phase
-Status: In progress
-Last activity: 2026-02-24 — Completed 02-03 (alertService with deduplication, SMTP email, 8 unit tests)
+Plan: 4 of 4 in current phase — PHASE COMPLETE
+Status: Complete
+Last activity: 2026-02-24 — Completed 02-04 (runHealthProbes + runRetentionCleanup scheduled Cloud Functions)
 
-Progress: [█████░░░░░] 50%
+Progress: [██████░░░░] 62%
 
 ## Performance Metrics
 
@@ -28,11 +28,11 @@ Progress: [█████░░░░░] 50%
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
 | 01-data-foundation | 2 | ~34 min | ~17 min |
-| 02-backend-services | 3 | ~50 min | ~17 min |
+| 02-backend-services | 4 | ~51 min | ~13 min |
 
 **Recent Trend:**
-- Last 5 plans: 01-01 (8 min), 01-02 (26 min), 02-01 (20 min), 02-02 (18 min), 02-03 (12 min)
-- Trend: Stable ~18 min/plan
+- Last 5 plans: 01-02 (26 min), 02-01 (20 min), 02-02 (18 min), 02-03 (12 min), 02-04 (1 min)
+- Trend: Stable ~15 min/plan
 
 *Updated after each plan completion*
 
@@ -63,6 +63,9 @@ Recent decisions affecting current work:
 - 02-03: Transporter created inside sendAlertEmail() on each call (not cached at module level) — Firebase Secrets not available at module load time
 - 02-03: Suppressed alerts skip BOTH AlertEventModel.create() AND sendMail — prevents duplicate DB rows plus duplicate emails
 - 02-03: Email failure caught and logged, never re-thrown — probe pipeline must continue regardless of email outage
+- [Phase 02-backend-services]: runHealthProbes is a separate onSchedule Cloud Function from processDocumentJobs (PITFALL-2 compliance)
+- [Phase 02-backend-services]: retryCount: 0 on runHealthProbes — 5-minute schedule makes retry unnecessary
+- [Phase 02-backend-services]: runRetentionCleanup uses Promise.all() for parallel deletes across three independent monitoring tables
 
 ### Pending Todos
 
@@ -70,12 +73,11 @@ None yet.
 
 ### Blockers/Concerns
 
-- PITFALL-2: Health probe scheduler must be a separate named Cloud Function export, not piggybacked on `processDocumentJobs`
 - PITFALL-6: Each analytics instrumentation point must be void/fire-and-forget — reviewer must check this in Phase 3
 - PITFALL-10: All new tables need `created_at` indexes in Phase 1 migrations — query performance depends on this from day one
 
 ## Session Continuity
 
 Last session: 2026-02-24
-Stopped at: Completed 02-03-PLAN.md — alertService with deduplication, SMTP email, lazy transporter, 8 unit tests
+Stopped at: Completed 02-04-PLAN.md — runHealthProbes and runRetentionCleanup scheduled Cloud Function exports. Phase 2 complete.
 Resume file: None
101
.planning/phases/02-backend-services/02-04-SUMMARY.md
Normal file
@@ -0,0 +1,101 @@
---
phase: 02-backend-services
plan: 04
subsystem: infra
tags: [firebase-functions, cloud-scheduler, health-probes, retention-cleanup, onSchedule]

# Dependency graph
requires:
  - phase: 02-backend-services
    provides: healthProbeService.runAllProbes(), alertService.evaluateAndAlert(), HealthCheckModel.deleteOlderThan(), AlertEventModel.deleteOlderThan(), deleteProcessingEventsOlderThan()
provides:
  - runHealthProbes Cloud Function export (every 5 minutes, separate from processDocumentJobs)
  - runRetentionCleanup Cloud Function export (weekly Monday 02:00, 30-day rolling deletion)
affects: [03-api-layer, 04-frontend, phase-03, phase-04]

# Tech tracking
tech-stack:
  added: []
  patterns:
    - "onSchedule Cloud Functions use dynamic import() to avoid cold-start overhead and module-level secret access"
    - "Health probes as separate named Cloud Function — never piggybacked on processDocumentJobs (PITFALL-2)"
    - "retryCount: 0 for health probes — 5-minute schedule makes retries unnecessary"
    - "Promise.all() for parallel multi-table retention cleanup"

key-files:
  created: []
  modified:
    - backend/src/index.ts

key-decisions:
  - "runHealthProbes is completely separate from processDocumentJobs — distinct Cloud Function, distinct schedule (PITFALL-2 compliance)"
  - "retryCount: 0 on runHealthProbes — probes recur every 5 minutes, retry would create confusing duplicate results"
  - "runRetentionCleanup uses Promise.all() for parallel deletes — three tables are independent, no ordering constraint"
  - "runRetentionCleanup only deletes monitoring tables (service_health_checks, alert_events, document_processing_events) — agentic RAG tables out of scope per research Open Question 4"
  - "RETENTION_DAYS = 30 is a constant, not configurable — matches INFR-03 spec exactly"

patterns-established:
  - "Scheduled Cloud Functions: dynamic import() + explicit secrets array per function"
  - "Retention cleanup: Promise.all([model.deleteOlderThan(), ...]) pattern for parallel table cleanup"

requirements-completed: [HLTH-03, INFR-03]

# Metrics
duration: 1min
completed: 2026-02-24
---

# Phase 2 Plan 04: Scheduled Cloud Function Exports Summary

**Two new Firebase onSchedule Cloud Functions: runHealthProbes (5-minute interval) and runRetentionCleanup (weekly Monday 02:00) added to index.ts as standalone exports decoupled from document processing**

## Performance

- **Duration:** ~1 min
- **Started:** 2026-02-24T19:34:20Z
- **Completed:** 2026-02-24T19:35:17Z
- **Tasks:** 2
- **Files modified:** 1

## Accomplishments
- Added `runHealthProbes` onSchedule export that calls `healthProbeService.runAllProbes()` then `alertService.evaluateAndAlert()` on a 5-minute cadence
- Added `runRetentionCleanup` onSchedule export that deletes rows older than 30 days from `service_health_checks`, `alert_events`, and `document_processing_events` in parallel
- Both functions use dynamic `import()` pattern and list all required Firebase secrets explicitly
- All 64 existing tests continue to pass
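
The probe-then-alert ordering described in the accomplishments above can be sketched as a small orchestration function. This is an illustrative sketch only, not the code in `backend/src/index.ts`: `runProbeCycle`, `ProbeCycleDeps`, and `ProbeResult` are hypothetical names, and passing the probe results into `evaluateAndAlert` is an assumption, since this commit does not show the real signatures. The services are injected so the sketch stays self-contained and testable without Firebase.

```typescript
// Hypothetical shapes; the real service types are not shown in this commit.
type ProbeStatus = "healthy" | "degraded" | "down";

interface ProbeResult {
  service: string;
  status: ProbeStatus;
}

interface ProbeCycleDeps {
  // Stand-in for healthProbeService.runAllProbes()
  runAllProbes: () => Promise<ProbeResult[]>;
  // Stand-in for alertService.evaluateAndAlert(); the parameter is an assumption
  evaluateAndAlert: (results: ProbeResult[]) => Promise<void>;
}

// Runs one scheduled cycle: probe every service first, then hand the
// results to alerting. Errors propagate to the caller; with retryCount: 0
// a failed run is simply superseded by the next scheduled run.
async function runProbeCycle(deps: ProbeCycleDeps): Promise<ProbeResult[]> {
  const results = await deps.runAllProbes();
  await deps.evaluateAndAlert(results);
  return results;
}
```

In a scheduled handler, the two dependencies would presumably be obtained via the dynamic `import()` pattern mentioned above and passed in at call time.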

## Task Commits

Both tasks modified the same file in a single edit operation:

1. **Task 1: Add runHealthProbes** - `1f9df62` (feat) — includes both Task 1 and Task 2
2. **Task 2: Add runRetentionCleanup** — included in `1f9df62` above

**Plan metadata:** (docs commit forthcoming)

## Files Created/Modified
- `backend/src/index.ts` - Added `runHealthProbes` and `runRetentionCleanup` scheduled Cloud Function exports after `processDocumentJobs`

## Decisions Made
- Combined both exports into one commit since they were added simultaneously to the same file — functionally equivalent to two separate commits
- `retryCount: 0` on `runHealthProbes` — with a 5-minute schedule, a failed probe run is superseded by the next run before any retry would be useful
- `timeoutSeconds: 120` on `runRetentionCleanup` — cleanup may process large batches; 60 seconds could be tight for large datasets
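
The parallel 30-day cleanup that the `runRetentionCleanup` decisions refer to can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the committed implementation: `runRetention`, `retentionCutoff`, and `RetentionTarget` are hypothetical names, and the assumed `deleteOlderThan(cutoff: Date)` signature (resolving to a deleted-row count) may differ from the real model methods.

```typescript
const RETENTION_DAYS = 30; // fixed constant, matching the INFR-03 decision
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed signature: delete rows with created_at before `cutoff`,
// resolving to the number of rows removed.
type DeleteOlderThan = (cutoff: Date) => Promise<number>;

interface RetentionTarget {
  table: string;
  deleteOlderThan: DeleteOlderThan;
}

// Computes the rolling cutoff timestamp relative to `now`.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// Starts all deletes at once; the tables are independent, so no ordering
// is needed. Promise.all rejects as soon as any single delete fails.
async function runRetention(
  targets: RetentionTarget[],
  now: Date = new Date()
): Promise<Map<string, number>> {
  const cutoff = retentionCutoff(now);
  const counts = await Promise.all(targets.map((t) => t.deleteOlderThan(cutoff)));
  const deleted = new Map<string, number>();
  targets.forEach((t, i) => deleted.set(t.table, counts[i] ?? 0));
  return deleted;
}
```

Because `Promise.all` is fail-fast, one failing table rejects the whole run even though the other deletes may still complete. If per-table outcomes mattered, `Promise.allSettled` would be the alternative, though nothing in this commit suggests it is used.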

## Deviations from Plan

None - plan executed exactly as written.

## Issues Encountered
None — TypeScript compiled cleanly on first pass, all tests passed.

## User Setup Required
None - no external service configuration required. Firebase deployment will pick up the new exports automatically.

## Next Phase Readiness
- All Phase 2 backend service plans complete (02-01 through 02-04)
- Ready for Phase 3 API layer development
- Health probe infrastructure fully wired: probes run on schedule, alerts sent via email, data retained for 30 days
- Monitoring system is operational end-to-end

---
*Phase: 02-backend-services*
*Completed: 2026-02-24*