chore: complete v1.0 Analytics & Monitoring milestone
Archive milestone artifacts (roadmap, requirements, audit, phase directories) to .planning/milestones/. Evolve PROJECT.md with validated requirements and decision outcomes. Create MILESTONES.md and RETROSPECTIVE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,157 @@
---
phase: 02-backend-services
verified: 2026-02-24T14:38:30Z
status: passed
score: 14/14 must-haves verified
re_verification: false
---

# Phase 2: Backend Services Verification Report

**Phase Goal:** All monitoring logic runs correctly: health probes make real API calls, alerts fire with deduplication, analytics events are written to Supabase without blocking, and data is cleaned up on schedule
**Verified:** 2026-02-24T14:38:30Z
**Status:** PASSED
**Re-verification:** No (initial verification)

---

## Goal Achievement

### Observable Truths

| # | Truth | Status | Evidence |
|----|-------|--------|----------|
| 1 | `recordProcessingEvent()` writes to `document_processing_events` table via Supabase | VERIFIED | `analyticsService.ts:34`: `void supabase.from('document_processing_events').insert(...)` |
| 2 | `recordProcessingEvent()` returns `void` (not `Promise`) so callers cannot accidentally await it | VERIFIED | `analyticsService.ts:31`: `export function recordProcessingEvent(data: ProcessingEventData): void` |
| 3 | A deliberate Supabase write failure logs an error but does not throw or reject | VERIFIED | `analyticsService.ts:45-52`: `.then(({ error }) => { if (error) logger.error(...) })` with no rethrow; test 3 passes |
| 4 | `deleteProcessingEventsOlderThan(30)` removes rows older than 30 days | VERIFIED | `analyticsService.ts:68-88`: `.lt('created_at', cutoff)` with JS-computed ISO date; tests 5-6 pass |
| 5 | Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken) | VERIFIED | `healthProbeService.ts:32-173`: 4 individual probe functions each call real SDK clients; tests 1, 5 pass |
| 6 | Each probe returns a structured `ProbeResult` with `service_name`, `status`, `latency_ms`, and optional `error_message` | VERIFIED | `healthProbeService.ts:13-19`: `ProbeResult` interface; all probe functions return it; test 1 passes |
| 7 | Probe results are persisted to Supabase via `HealthCheckModel.create()` | VERIFIED | `healthProbeService.ts:219-225`: `await HealthCheckModel.create({...})` inside post-probe loop; test 2 passes |
| 8 | A single probe failure does not prevent other probes from running | VERIFIED | `healthProbeService.ts:198`: `Promise.allSettled()` + individual try/catch on persist; test 3 passes |
| 9 | LLM probe uses cheapest model (`claude-haiku-4-5`) with `max_tokens 5` | VERIFIED | `healthProbeService.ts:63-66`: `model: 'claude-haiku-4-5', max_tokens: 5` |
| 10 | Supabase probe uses `getPostgresPool().query('SELECT 1')`, not PostgREST client | VERIFIED | `healthProbeService.ts:105-106`: `const pool = getPostgresPool(); await pool.query('SELECT 1')`; test 5 passes |
| 11 | An alert email is sent when a probe returns 'degraded' or 'down'; deduplication prevents duplicate emails within cooldown | VERIFIED | `alertService.ts:103-143`: `evaluateAndAlert()` checks `findRecentByService()` before creating row and sending email; tests 2-4 pass |
| 12 | Alert recipient is read from `process.env.EMAIL_WEEKLY_RECIPIENT`, never hardcoded in runtime logic | VERIFIED | `alertService.ts:43`: `const recipient = process.env['EMAIL_WEEKLY_RECIPIENT']`; no hardcoded address in runtime path; test 5 passes |
| 13 | `runHealthProbes` Cloud Function export runs on 'every 5 minutes' schedule, separate from `processDocumentJobs` | VERIFIED | `index.ts:340-363`: `export const runHealthProbes = onSchedule({ schedule: 'every 5 minutes', ... })`, a separate export |
| 14 | `runRetentionCleanup` deletes from `service_health_checks`, `alert_events`, and `document_processing_events` older than 30 days on schedule | VERIFIED | `index.ts:366-390`: `schedule: 'every monday 02:00'`; `Promise.all([HealthCheckModel.deleteOlderThan(30), AlertEventModel.deleteOlderThan(30), deleteProcessingEventsOlderThan(30)])` |

**Score:** 14/14 truths verified

---
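Truths 1-3 above describe a fire-and-forget write. The sketch below illustrates that pattern under stated assumptions: `EventSink`, `recordEvent`, and the in-memory `loggedErrors` array are hypothetical stand-ins for the real Supabase client, `recordProcessingEvent()`, and the project logger, and the `.catch` branch is an addition for illustration that the report does not confirm.

```typescript
// Hypothetical result shape, loosely modelled on an insert that reports
// failure through an `error` field instead of throwing.
type InsertResult = { error: { message: string } | null };

// Hypothetical sink standing in for the real database client.
interface EventSink {
  insert(row: Record<string, unknown>): Promise<InsertResult>;
}

// Stand-in for a logger so the example stays self-contained.
const loggedErrors: string[] = [];

// Returns `void`, not a Promise, so callers have nothing to await.
function recordEvent(sink: EventSink, row: Record<string, unknown>): void {
  // The `void` operator marks the promise as intentionally not awaited.
  void sink
    .insert(row)
    .then(({ error }) => {
      // A failed write is logged and never rethrown.
      if (error) loggedErrors.push(`analytics write failed: ${error.message}`);
    })
    .catch((err: unknown) => {
      // Assumption: rejections (for example network errors) are swallowed too.
      loggedErrors.push(`analytics write rejected: ${String(err)}`);
    });
}

// A sink that always reports a write failure.
const failingSink: EventSink = {
  insert: () => Promise.resolve({ error: { message: "simulated failure" } }),
};

// The call returns immediately with `undefined`; the failure is logged later.
const returned: unknown = recordEvent(failingSink, { document_id: "doc-1" });
```

Because the function's declared return type is `void`, an accidental `await recordEvent(...)` resolves immediately and cannot delay or fail the caller.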

### Required Artifacts

| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `backend/src/models/migrations/013_create_processing_events_table.sql` | DDL with indexes and RLS | VERIFIED | 34 lines; `CREATE TABLE IF NOT EXISTS document_processing_events`, 2 indexes on `created_at`/`document_id`, `ENABLE ROW LEVEL SECURITY` |
| `backend/src/services/analyticsService.ts` | Fire-and-forget analytics writer | VERIFIED | 88 lines; exports `recordProcessingEvent` (void), `deleteProcessingEventsOlderThan` (Promise<number>), `ProcessingEventData` |
| `backend/src/__tests__/unit/analyticsService.test.ts` | Unit tests, min 50 lines | VERIFIED | 205 lines; 6 tests, all pass |
| `backend/src/services/healthProbeService.ts` | 4 probers + orchestrator | VERIFIED | 248 lines; exports `healthProbeService.runAllProbes()` and `ProbeResult` |
| `backend/src/__tests__/unit/healthProbeService.test.ts` | Unit tests, min 80 lines | VERIFIED | 317 lines; 9 tests, all pass |
| `backend/src/services/alertService.ts` | Alert deduplication + email | VERIFIED | 146 lines; exports `alertService.evaluateAndAlert()` |
| `backend/src/__tests__/unit/alertService.test.ts` | Unit tests, min 80 lines | VERIFIED | 235 lines; 8 tests, all pass |
| `backend/src/index.ts` | Two new `onSchedule` Cloud Function exports | VERIFIED | `export const runHealthProbes` (line 340), `export const runRetentionCleanup` (line 366) |

---
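The probe orchestrator listed above relies on `Promise.allSettled()` so that one failing probe cannot block the others. The following is a rough sketch of that idea, not the code in `healthProbeService.ts`: the `Probe` type is hypothetical, and mapping a rejected probe to `status: "down"` with `latency_ms: 0` is an assumption made for the example.

```typescript
type ProbeStatus = "healthy" | "degraded" | "down";

// Field names follow the ProbeResult shape described in the report.
interface ProbeResult {
  service_name: string;
  status: ProbeStatus;
  latency_ms: number;
  error_message?: string;
}

// Hypothetical wrapper pairing a service name with its probe function.
type Probe = { name: string; run: () => Promise<ProbeResult> };

async function runAllProbes(probes: Probe[]): Promise<ProbeResult[]> {
  // allSettled waits for every probe and never rejects, so a thrown
  // error in one probe does not short-circuit the rest.
  const settled = await Promise.allSettled(probes.map((p) => p.run()));

  return probes.map((probe, i): ProbeResult => {
    const outcome = settled[i];
    if (outcome !== undefined && outcome.status === "fulfilled") {
      return outcome.value;
    }
    // Assumption: a probe that throws is reported as "down".
    return {
      service_name: probe.name,
      status: "down",
      latency_ms: 0,
      error_message: outcome ? String(outcome.reason) : "probe did not settle",
    };
  });
}
```

With `Promise.all()` the first rejection would discard the results of the remaining probes; `allSettled` keeps one result per probe, which is what makes per-service persistence and alerting possible afterwards.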

### Key Link Verification

| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| `analyticsService.ts` | `config/supabase.ts` | `getSupabaseServiceClient()` call | WIRED | `analyticsService.ts:1,32,70`: imported and called inside both exported functions |
| `analyticsService.ts` | `document_processing_events` table | `void supabase.from('document_processing_events').insert(...)` | WIRED | `analyticsService.ts:34-35`: pattern matches exactly |
| `healthProbeService.ts` | `HealthCheckModel.ts` | `HealthCheckModel.create()` for persistence | WIRED | `healthProbeService.ts:5,219`: imported statically and called for each probe result |
| `healthProbeService.ts` | `config/supabase.ts` | `getPostgresPool()` for Supabase probe | WIRED | `healthProbeService.ts:4,105`: imported and called inside `probeSupabase()` |
| `alertService.ts` | `AlertEventModel.ts` | `findRecentByService()` and `create()` | WIRED | `alertService.ts:3,113,135`: imported and both methods called in `evaluateAndAlert()` |
| `alertService.ts` | `nodemailer` | `createTransport` inside function scope | WIRED | `alertService.ts:1,22`: imported; `createTransporter()` is called lazily inside `sendAlertEmail()` |
| `alertService.ts` | `process.env.EMAIL_WEEKLY_RECIPIENT` | Config-based recipient | WIRED | `alertService.ts:43`: `process.env['EMAIL_WEEKLY_RECIPIENT']` with no hardcoded fallback |
| `index.ts (runHealthProbes)` | `healthProbeService.ts` | `dynamic import('./services/healthProbeService')` | WIRED | `index.ts:353`: `const { healthProbeService } = await import('./services/healthProbeService')` |
| `index.ts (runHealthProbes)` | `alertService.ts` | `dynamic import('./services/alertService')` | WIRED | `index.ts:354`: `const { alertService } = await import('./services/alertService')` |
| `index.ts (runRetentionCleanup)` | `HealthCheckModel.ts` | `HealthCheckModel.deleteOlderThan(30)` | WIRED | `index.ts:372,379`: dynamically imported and called in `Promise.all` |
| `index.ts (runRetentionCleanup)` | `analyticsService.ts` | `deleteProcessingEventsOlderThan(30)` | WIRED | `index.ts:374,381`: dynamically imported and called in `Promise.all` |

---

### Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| ANLY-01 | 02-01 | Document processing events persist to Supabase at write time | SATISFIED | `analyticsService.ts` writes to `document_processing_events` via Supabase on each `recordProcessingEvent()` call |
| ANLY-03 | 02-01 | Analytics instrumentation is non-blocking (fire-and-forget) | SATISFIED | `recordProcessingEvent()` return type is `void`; uses `void supabase...insert(...).then(...)` with no `await`; test 2 verifies return is `undefined` |
| HLTH-02 | 02-02 | Each health probe makes a real authenticated API call | SATISFIED | `healthProbeService.ts`: Document AI calls `client.listProcessors()`, LLM calls `client.messages.create()`, Supabase calls `pool.query('SELECT 1')`, Firebase calls `admin.auth().verifyIdToken()` |
| HLTH-04 | 02-02 | Health probe results persist to Supabase | SATISFIED | `healthProbeService.ts:219-225`: `HealthCheckModel.create()` called for every probe result |
| ALRT-01 | 02-03 | Admin receives email alert when a service goes down or degrades | SATISFIED | `alertService.ts`: `sendAlertEmail()` called after `AlertEventModel.create()` for any non-healthy probe status |
| ALRT-02 | 02-03 | Alert deduplication prevents repeat emails within cooldown period | SATISFIED | `alertService.ts:113-128`: `AlertEventModel.findRecentByService()` gates both row creation and email; test 4 verifies suppression |
| ALRT-04 | 02-03 | Alert recipient stored as configuration, not hardcoded | SATISFIED | `alertService.ts:43`: `process.env['EMAIL_WEEKLY_RECIPIENT']` with no hardcoded default; service skips email if env var missing |
| HLTH-03 | 02-04 | Health probes run on a scheduled interval, separate from document processing | SATISFIED | `index.ts:340-363`: `export const runHealthProbes = onSchedule({ schedule: 'every 5 minutes' })`, a distinct export from `processDocumentJobs` |
| INFR-03 | 02-04 | 30-day rolling data retention cleanup runs on schedule | SATISFIED | `index.ts:366-390`: `export const runRetentionCleanup = onSchedule({ schedule: 'every monday 02:00' })`; deletes from all 3 monitoring tables |

**Orphaned requirements check:** Requirements INFR-01 and INFR-04 are mapped to Phase 1 in REQUIREMENTS.md and are not claimed by any Phase 2 plan, so they are correctly out of scope. HLTH-01, ANLY-02, INFR-02, and ALRT-03 are mapped to Phase 3/4 and are likewise correctly out of scope.

All 9 Phase 2 requirement IDs (HLTH-02, HLTH-03, HLTH-04, ALRT-01, ALRT-02, ALRT-04, ANLY-01, ANLY-03, INFR-03) are accounted for with implementation evidence.

---
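The deduplication gate behind ALRT-01 and ALRT-02 above can be pictured with a small in-memory sketch. Everything here is illustrative: `InMemoryAlertStore` is a hypothetical replacement for `AlertEventModel`, the method signatures are assumed, and the 60-minute cooldown is taken from the human verification notes rather than from the source code.

```typescript
// Hypothetical alert row; timestamps are epoch milliseconds for simplicity.
interface AlertEvent {
  service_name: string;
  created_at: number;
}

// In-memory stand-in for the alert events table.
class InMemoryAlertStore {
  private events: AlertEvent[] = [];

  // Returns an alert for the service created at or after `sinceMs`, if any.
  findRecentByService(service: string, sinceMs: number): AlertEvent | undefined {
    return this.events.find(
      (e) => e.service_name === service && e.created_at >= sinceMs
    );
  }

  create(event: AlertEvent): void {
    this.events.push(event);
  }
}

// Assumption: a 60-minute cooldown window.
const COOLDOWN_MS = 60 * 60 * 1000;

// Returns true when an alert was recorded and sent, false when skipped.
function evaluateAndAlert(
  store: InMemoryAlertStore,
  probe: { service_name: string; status: "healthy" | "degraded" | "down" },
  nowMs: number,
  send: (message: string) => void
): boolean {
  if (probe.status === "healthy") return false;

  // The recent-alert lookup gates both the row creation and the email.
  if (store.findRecentByService(probe.service_name, nowMs - COOLDOWN_MS)) {
    return false;
  }

  store.create({ service_name: probe.service_name, created_at: nowMs });
  send(`Alert: ${probe.service_name} is ${probe.status}`);
  return true;
}
```

Checking the store before writing means a suppressed alert leaves no new row behind, so the cooldown is measured from the last alert that was actually sent.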

### Anti-Patterns Found

| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| `backend/src/index.ts` | 225 | `defineString('EMAIL_WEEKLY_RECIPIENT', { default: 'jpressnell@bluepointcapital.com' })` | Info | Personal email address as Firebase `defineString` deployment default. The `emailWeeklyRecipient` variable is defined but never passed to any function or included in any secrets array, so it is effectively unused. The runtime `alertService.ts` reads `process.env['EMAIL_WEEKLY_RECIPIENT']` correctly with no hardcoded default. **Not an ALRT-04 violation** (the `defineString` default is deployment infrastructure config, not source-code-hardcoded logic). Recommend removing the personal email from this default or replacing it with a placeholder in a follow-up. |

No blockers. No stubs. No placeholder implementations found.

---
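To make the distinction in the finding above concrete, here is a minimal sketch of a recipient lookup with no hardcoded fallback. The helper names `resolveRecipient` and `maybeSendAlert` are hypothetical, and the environment is passed in as a plain object so the example does not depend on `process.env`; the real `alertService.ts` may be structured differently.

```typescript
// Environment modelled as a plain record so the example is self-contained.
type Env = Record<string, string | undefined>;

// Returns the configured recipient, or null when none is set.
// There is deliberately no default address in this code path.
function resolveRecipient(env: Env): string | null {
  const recipient = env["EMAIL_WEEKLY_RECIPIENT"];
  return recipient !== undefined && recipient.trim() !== "" ? recipient : null;
}

// Sends through the supplied callback, or skips when no recipient is configured.
function maybeSendAlert(
  env: Env,
  subject: string,
  send: (to: string, subject: string) => void
): boolean {
  const to = resolveRecipient(env);
  if (to === null) return false; // skip instead of falling back to a fixed address
  send(to, subject);
  return true;
}
```

Returning `null` and skipping the send keeps a missing variable visible as a configuration problem instead of silently routing alerts to a fixed address.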

### TypeScript Compilation

```
npx tsc --noEmit  # exit 0 (no output, no errors)
```

All new files compile cleanly with no TypeScript errors.

---

### Test Results

```
Test Files  3 passed (3)
Tests       23 passed (23)
Duration    924ms
```

All 23 unit tests across `analyticsService.test.ts`, `healthProbeService.test.ts`, and `alertService.test.ts` pass.

---

### Human Verification Required

#### 1. Live Firebase Deployment: Health Probe Execution

**Test:** Deploy to Firebase and wait for a `runHealthProbes` trigger (5-minute schedule). Check Firebase Cloud Logging for the `healthProbeService: all probes complete` log entry and verify 4 new rows in the `service_health_checks` table.
**Expected:** 4 rows inserted, all with real latency values. `document_ai` and `firebase_auth` probes return either healthy or degraded (not a connection failure).
**Why human:** Cannot run Firebase scheduled functions locally; requires live GCP credentials and deployed infrastructure.

#### 2. Alert Email Delivery: End-to-End

**Test:** Temporarily set `ANTHROPIC_API_KEY` to an invalid value and trigger `runHealthProbes`. Verify an email arrives at the `EMAIL_WEEKLY_RECIPIENT` address with subject `[CIM Summary] Alert: llm_api — service_down`.
**Expected:** Email received within 5 minutes of probe run. A second probe cycle within 60 minutes should NOT send a duplicate email.
**Why human:** SMTP delivery requires live credentials and network routing; deduplication cooldown requires real-time waiting.

#### 3. Retention Cleanup: Data Deletion Verification

**Test:** Insert rows into `document_processing_events`, `service_health_checks`, and `alert_events` with `created_at` older than 30 days, then trigger `runRetentionCleanup` manually. Verify old rows are deleted and recent rows remain.
**Expected:** Only rows older than 30 days deleted; row counts logged accurately.
**Why human:** Requires live Supabase access and insertion of backdated test data.

---
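When preparing the backdated rows for the retention check in item 3 above, it helps to compute the cutoff the same way the cleanup does, as a JS-computed ISO date. The sketch below is an assumption-laden illustration: `retentionCutoff` and `splitByCutoff` are hypothetical helpers, and the real deletion happens in the database via `.lt('created_at', cutoff)` rather than in memory.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// ISO timestamp exactly `days` days before `now`.
function retentionCutoff(days: number, now: Date): string {
  return new Date(now.getTime() - days * MS_PER_DAY).toISOString();
}

// Hypothetical row shape with an ISO `created_at` column.
interface Row {
  id: string;
  created_at: string;
}

// Mirrors a strict "less than cutoff" filter: rows older than the cutoff
// are expired, rows at or after it are kept.
function splitByCutoff(rows: Row[], cutoffIso: string): { expired: Row[]; kept: Row[] } {
  const cutoffMs = Date.parse(cutoffIso);
  const expired = rows.filter((r) => Date.parse(r.created_at) < cutoffMs);
  const kept = rows.filter((r) => Date.parse(r.created_at) >= cutoffMs);
  return { expired, kept };
}

// Example using the verification timestamp from this report as "now".
const now = new Date("2026-02-24T14:38:30Z");
const cutoff = retentionCutoff(30, now); // "2026-01-25T14:38:30.000Z"

const { expired, kept } = splitByCutoff(
  [
    { id: "old", created_at: "2026-01-20T00:00:00Z" },
    { id: "recent", created_at: "2026-02-20T00:00:00Z" },
  ],
  cutoff
);
```

Here the row dated 2026-01-20 falls before the cutoff and would be deleted, while the row dated 2026-02-20 would remain.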

### Gaps Summary

None. All must-haves verified. Phase goal achieved.

---

_Verified: 2026-02-24T14:38:30Z_
_Verifier: Claude (gsd-verifier)_