docs(02-03): complete alertService plan
- SUMMARY.md with deduplication, lazy transporter, and email decisions
- STATE.md: plan 3/4, 50% progress, decisions recorded
- ROADMAP.md: phase 02 updated (3/4 summaries)
- REQUIREMENTS.md: ALRT-01, ALRT-02, ALRT-04 marked complete
@@ -16,10 +16,10 @@ Requirements for initial release. Each maps to roadmap phases.
 
 ### Alerting
 
-- [ ] **ALRT-01**: Admin receives email alert when a service goes down or degrades
-- [ ] **ALRT-02**: Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period)
+- [x] **ALRT-01**: Admin receives email alert when a service goes down or degrades
+- [x] **ALRT-02**: Alert deduplication prevents repeat emails for the same ongoing issue (cooldown period)
 - [ ] **ALRT-03**: Admin sees in-app alert banner for active critical issues
-- [ ] **ALRT-04**: Alert recipient stored as configuration, not hardcoded
+- [x] **ALRT-04**: Alert recipient stored as configuration, not hardcoded
 
 ### Processing Analytics
 
@@ -80,9 +80,9 @@ Which phases cover which requirements. Updated during roadmap creation.
 | HLTH-02 | Phase 2 | Complete |
 | HLTH-03 | Phase 2 | Pending |
 | HLTH-04 | Phase 2 | Complete |
-| ALRT-01 | Phase 2 | Pending |
-| ALRT-02 | Phase 2 | Pending |
-| ALRT-04 | Phase 2 | Pending |
+| ALRT-01 | Phase 2 | Complete |
+| ALRT-02 | Phase 2 | Complete |
+| ALRT-04 | Phase 2 | Complete |
 | ANLY-01 | Phase 2 | Complete |
 | ANLY-03 | Phase 2 | Complete |
 | INFR-03 | Phase 2 | Pending |
@@ -10,28 +10,28 @@ See: .planning/PROJECT.md (updated 2026-02-24)
 ## Current Position
 
 Phase: 2 of 4 (Backend Services)
-Plan: 2 of TBD in current phase
+Plan: 3 of 4 in current phase
 Status: In progress
-Last activity: 2026-02-24 — Completed 02-02 (healthProbeService with 4 probers + 9 unit tests)
+Last activity: 2026-02-24 — Completed 02-03 (alertService with deduplication, SMTP email, 8 unit tests)
 
-Progress: [████░░░░░░] 40%
+Progress: [█████░░░░░] 50%
 
 ## Performance Metrics
 
 **Velocity:**
-- Total plans completed: 4
-- Average duration: ~18 min
-- Total execution time: ~1.2 hours
+- Total plans completed: 5
+- Average duration: ~17 min
+- Total execution time: ~1.4 hours
 
 **By Phase:**
 
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
 | 01-data-foundation | 2 | ~34 min | ~17 min |
-| 02-backend-services | 2 | ~38 min | ~19 min |
+| 02-backend-services | 3 | ~50 min | ~17 min |
 
 **Recent Trend:**
-- Last 5 plans: 01-01 (8 min), 01-02 (26 min), 02-01 (20 min), 02-02 (18 min)
+- Last 5 plans: 01-01 (8 min), 01-02 (26 min), 02-01 (20 min), 02-02 (18 min), 02-03 (12 min)
 - Trend: Stable ~18 min/plan
 
 *Updated after each plan completion*
@@ -60,6 +60,9 @@ Recent decisions affecting current work:
 - 02-02: Promise.allSettled for probe orchestration — all 4 probes run even if one throws outside its own try/catch
 - 02-02: Per-probe HealthCheckModel.create failure swallowed with logger.error — probe results still returned to caller
 - [Phase 02-backend-services]: 02-01: recordProcessingEvent return type is void (not Promise<void>) — type system prevents accidental await on critical path
+- 02-03: Transporter created inside sendAlertEmail() on each call (not cached at module level) — Firebase Secrets not available at module load time
+- 02-03: Suppressed alerts skip BOTH AlertEventModel.create() AND sendMail — prevents duplicate DB rows plus duplicate emails
+- 02-03: Email failure caught and logged, never re-thrown — probe pipeline must continue regardless of email outage
 
 ### Pending Todos
 
@@ -74,5 +77,5 @@ None yet.
 ## Session Continuity
 
 Last session: 2026-02-24
-Stopped at: Completed 02-02-PLAN.md — healthProbeService with 4 probers + 9 unit tests (nodemailer installed)
+Stopped at: Completed 02-03-PLAN.md — alertService with deduplication, SMTP email, lazy transporter, 8 unit tests
 Resume file: None
124
.planning/phases/02-backend-services/02-03-SUMMARY.md
Normal file
@@ -0,0 +1,124 @@
---
phase: 02-backend-services
plan: 03
subsystem: infra
tags: [nodemailer, smtp, alerting, deduplication, email, vitest]

# Dependency graph
requires:
  - phase: 02-backend-services
    provides: "AlertEventModel with findRecentByService() and create() for deduplication"
  - phase: 02-backend-services
    provides: "ProbeResult type from healthProbeService for alert evaluation"
provides:
  - "alertService with evaluateAndAlert(probeResults) — deduplication, row creation, email send"
  - "SMTP email via nodemailer with lazy transporter (Firebase Secret timing safe)"
  - "Config-based recipient via process.env.EMAIL_WEEKLY_RECIPIENT (never hardcoded)"
  - "8 unit tests covering all alert scenarios and edge cases"
affects: [02-04-scheduler, 03-api]

# Tech tracking
tech-stack:
  added: []
  patterns:
    - "Lazy transporter pattern: nodemailer.createTransport() called inside function, not at module level (Firebase Secret timing)"
    - "Alert deduplication: findRecentByService() cooldown check before row creation AND email"
    - "Non-throwing email: catch email errors, log them, never re-throw (probe pipeline safety)"
    - "vi.mock factories with inline vi.fn() only — no outer variable references to avoid TDZ hoisting"

key-files:
  created:
    - backend/src/services/alertService.ts
    - backend/src/__tests__/unit/alertService.test.ts
  modified: []

key-decisions:
  - "Transporter created inside sendAlertEmail() on each call — not at module level — avoids Firebase Secret not-yet-available error (PITFALL A)"
  - "Suppressed alerts skip BOTH AlertEventModel.create() AND sendMail — prevents duplicate DB rows in addition to duplicate emails"
  - "Email failure caught in try/catch and logged via logger.error — never re-thrown so probe pipeline continues"

patterns-established:
  - "Alert deduplication pattern: check findRecentByService before creating row or sending email"
  - "Non-throwing side effects: email, analytics, and similar fire-and-forget paths must never throw"

requirements-completed: [ALRT-01, ALRT-02, ALRT-04]

# Metrics
duration: 12min
completed: 2026-02-24
---

# Phase 2 Plan 03: Alert Service Summary

**Nodemailer SMTP alert service with cooldown deduplication via AlertEventModel, config-based recipient, and lazy transporter pattern for Firebase Secret compatibility**

## Performance

- **Duration:** 12 min
- **Started:** 2026-02-24T19:27:42Z
- **Completed:** 2026-02-24T19:39:30Z
- **Tasks:** 2
- **Files modified:** 2

## Accomplishments

- `alertService.evaluateAndAlert()` evaluates ProbeResults and sends email alerts for degraded/down services
- Deduplication via `AlertEventModel.findRecentByService()` with configurable `ALERT_COOLDOWN_MINUTES` env var
- Email recipient read from `process.env.EMAIL_WEEKLY_RECIPIENT` — never hardcoded (ALRT-04)
- Lazy transporter pattern: `nodemailer.createTransport()` called inside `sendAlertEmail()` function (Firebase Secret timing fix)
- 8 unit tests cover all alert scenarios: healthy skip, down/degraded alerts, deduplication, recipient config, missing recipient, email failure, and multi-probe processing

|
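
The cooldown window behind the deduplication bullet can be sketched as two small pure functions. This is a minimal illustration under stated assumptions, not the code in `alertService.ts`: the helper names `cooldownMinutes` and `isWithinCooldown` and the 60-minute fallback are hypothetical; only the `ALERT_COOLDOWN_MINUTES` variable comes from the summary.

```typescript
// Assumed fallback when ALERT_COOLDOWN_MINUTES is unset or invalid.
const DEFAULT_COOLDOWN_MINUTES = 60;

// Reads the cooldown from an env-like record at call time.
function cooldownMinutes(env: Record<string, string | undefined>): number {
  const parsed = Number(env["ALERT_COOLDOWN_MINUTES"]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_COOLDOWN_MINUTES;
}

// True when a previous alert exists and is still inside the cooldown window,
// meaning a new alert for the same service should be suppressed.
function isWithinCooldown(lastAlertAt: Date | null, now: Date, minutes: number): boolean {
  if (lastAlertAt === null) return false;
  const elapsedMs = now.getTime() - lastAlertAt.getTime();
  return elapsedMs < minutes * 60 * 1000;
}
```

Keeping the window arithmetic separate from the database lookup makes the suppression rule easy to unit test without mocks.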

## Task Commits

Each task was committed atomically:

1. **Task 1: Create alertService with deduplication and email** - `91f609c` (feat)
2. **Task 2: Create alertService unit tests** - `4b5afe2` (test)

**Plan metadata:** (docs commit hash TBD)

## Files Created/Modified

- `backend/src/services/alertService.ts` - Alert evaluation, deduplication, and email delivery
- `backend/src/__tests__/unit/alertService.test.ts` - 8 unit tests, all passing

## Decisions Made

- **Lazy transporter:** `nodemailer.createTransport()` called inside `sendAlertEmail()` on each call, not cached at module level. This is required because Firebase Secrets (`EMAIL_PASS`) are not injected into `process.env` at module load time — only when the function is invoked.
- **Suppress both row and email:** When `findRecentByService()` returns a non-null alert, both `AlertEventModel.create()` and `sendMail` are skipped. This prevents duplicate DB rows in the alert_events table in addition to preventing duplicate emails.
- **Non-throwing email path:** Email send failures are caught in try/catch and logged via `logger.error`. The function never re-throws, so email outages cannot break the health probe pipeline.

|
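
The three decisions above can be read together as one control flow. The sketch below is an approximation under stated assumptions, not the actual `alertService.ts`: the `ProbeResult` shape, the `AlertDeps` interface, the `createAlertRow` name, the `findRecentByService` signature, the 60-minute fallback, and the log-and-skip handling of a missing recipient are all hypothetical. Dependencies are injected so the example runs without nodemailer or a database.

```typescript
type Status = "healthy" | "degraded" | "down";

// Hypothetical shape; the real ProbeResult type comes from healthProbeService.
interface ProbeResult {
  service: string;
  status: Status;
}

interface Mail {
  to: string;
  subject: string;
  text: string;
}

// Injected stand-ins for AlertEventModel, nodemailer, and the logger.
interface AlertDeps {
  env: Record<string, string | undefined>;
  findRecentByService(service: string, cooldownMinutes: number): Promise<object | null>;
  createAlertRow(service: string, status: Status): Promise<void>;
  createTransport(): { sendMail(mail: Mail): Promise<unknown> };
  logError(message: string, error?: unknown): void;
}

async function evaluateAndAlert(results: ProbeResult[], deps: AlertDeps): Promise<void> {
  // The fallback of 60 minutes is an assumption for this sketch.
  const cooldown = Number(deps.env["ALERT_COOLDOWN_MINUTES"] ?? "60");

  for (const result of results) {
    if (result.status === "healthy") continue;

    // Suppressed alerts skip BOTH the row and the email.
    const recent = await deps.findRecentByService(result.service, cooldown);
    if (recent !== null) continue;

    await deps.createAlertRow(result.service, result.status);

    const to = deps.env["EMAIL_WEEKLY_RECIPIENT"];
    if (!to) {
      deps.logError("Alert recipient not configured; email skipped");
      continue;
    }

    try {
      // Lazy transporter: built per call, so secrets are read at invoke time.
      const transporter = deps.createTransport();
      await transporter.sendMail({
        to,
        subject: `[${result.status}] ${result.service}`,
        text: `Service ${result.service} is ${result.status}.`,
      });
    } catch (error) {
      // Never re-thrown: an email outage must not break the probe pipeline.
      deps.logError("Alert email failed", error);
    }
  }
}
```

In this sketch the alert row is written before the email attempt, so a failed send still leaves a record that the cooldown check will find on the next run.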

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] Restructured nodemailer mock to avoid Vitest TDZ hoisting error**
- **Found during:** Task 2 (alertService unit tests)
- **Issue:** Test file declared `const mockSendMail = vi.fn()` outside the `vi.mock()` factory and referenced it inside. Because `vi.mock()` is hoisted to the top of the file, `mockSendMail` was accessed before initialization, causing `ReferenceError: Cannot access 'mockSendMail' before initialization`
- **Fix:** Removed the outer `mockSendMail` variable. The nodemailer mock factory uses only inline `vi.fn()` calls. Tests access the mock's `sendMail` via `vi.mocked(nodemailer.createTransport).mock.results[0].value` through a `getMockSendMail()` helper. This is consistent with the project decision: "vi.mock() factories must use only inline vi.fn() to avoid Vitest hoisting TDZ errors" (established in 01-02)
- **Files modified:** `backend/src/__tests__/unit/alertService.test.ts`
- **Verification:** All 8 tests pass after fix
- **Committed in:** `4b5afe2` (Task 2 commit)

---

**Total deviations:** 1 auto-fixed (1 blocking — Vitest TDZ hoisting)
**Impact on plan:** Required fix for tests to run. No scope creep. Consistent with established project pattern from 01-02.

## Issues Encountered

None beyond the auto-fixed TDZ hoisting issue above.

## User Setup Required

None - no external service configuration required beyond the existing email env vars (`EMAIL_HOST`, `EMAIL_PORT`, `EMAIL_SECURE`, `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_WEEKLY_RECIPIENT`, `ALERT_COOLDOWN_MINUTES`) documented in prior research.

|
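
As a rough illustration of how the email env vars listed above might be turned into SMTP options at call time, consider the sketch below. The function name `readSmtpOptions`, the choice of which variables are required, and the fallbacks (port 587, `secure` false) are assumptions for this example; the returned object uses the `host`, `port`, `secure`, and `auth` fields that `nodemailer.createTransport()` accepts.

```typescript
type Env = Record<string, string | undefined>;

interface SmtpOptions {
  host: string;
  port: number;
  secure: boolean;
  auth: { user: string; pass: string };
}

// Reads SMTP settings when called, not at module load, so a secret such as
// EMAIL_PASS that is injected late is still picked up.
function readSmtpOptions(env: Env): SmtpOptions {
  const host = env["EMAIL_HOST"];
  const user = env["EMAIL_USER"];
  const pass = env["EMAIL_PASS"];
  if (!host || !user || !pass) {
    throw new Error("Missing one of EMAIL_HOST, EMAIL_USER, EMAIL_PASS");
  }
  return {
    host,
    port: Number(env["EMAIL_PORT"] ?? "587"), // assumed fallback
    secure: env["EMAIL_SECURE"] === "true",   // assumed: anything else is false
    auth: { user, pass },
  };
}
```

Calling a reader like this inside the send function, with `process.env` as the argument, is one way to realize the lazy transporter pattern: a read at module load would miss `EMAIL_PASS`, while a read at invocation sees it.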

## Next Phase Readiness

- `alertService.evaluateAndAlert()` ready to be called from the health probe scheduler (Plan 02-04)
- All 3 alert requirements satisfied: ALRT-01 (email on degraded/down), ALRT-02 (cooldown deduplication), ALRT-04 (recipient from config)
- No blockers for Phase 2 Plan 04 (scheduler)

---
*Phase: 02-backend-services*
*Completed: 2026-02-24*