docs(02-backend-services): create phase plan

@@ -45,7 +45,13 @@ Plans:
4. Alert recipient is read from configuration (environment variable or Supabase config row), not hardcoded in source
5. Analytics events fire as fire-and-forget calls — a deliberately introduced 500ms Supabase delay does not increase processing pipeline duration
6. A scheduled probe function and a weekly retention cleanup function exist as separate Firebase Cloud Function exports
-**Plans**: TBD
+**Plans:** 4 plans

Plans:
- [ ] 02-01-PLAN.md — Analytics migration + analyticsService (fire-and-forget)
- [ ] 02-02-PLAN.md — Health probe service (4 real API probers + orchestrator)
- [ ] 02-03-PLAN.md — Alert service (deduplication + email via nodemailer)
- [ ] 02-04-PLAN.md — Cloud Function exports (runHealthProbes + runRetentionCleanup)

### Phase 3: API Layer

**Goal**: Admin-authenticated HTTP endpoints expose health status, alerts, and processing analytics; existing service processors emit analytics instrumentation
@@ -77,6 +83,6 @@ Phases execute in numeric order: 1 → 2 → 3 → 4

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Data Foundation | 2/2 | Complete | 2026-02-24 |
-| 2. Backend Services | 0/TBD | Not started | - |
+| 2. Backend Services | 0/4 | Not started | - |
| 3. API Layer | 0/TBD | Not started | - |
| 4. Frontend | 0/TBD | Not started | - |

176 .planning/phases/02-backend-services/02-01-PLAN.md Normal file
@@ -0,0 +1,176 @@
---
phase: 02-backend-services
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - backend/src/models/migrations/013_create_processing_events_table.sql
  - backend/src/services/analyticsService.ts
  - backend/src/__tests__/unit/analyticsService.test.ts
autonomous: true
requirements: [ANLY-01, ANLY-03]

must_haves:
  truths:
    - "recordProcessingEvent() writes to document_processing_events table via Supabase"
    - "recordProcessingEvent() returns void (not Promise) so callers cannot accidentally await it"
    - "A deliberate Supabase write failure logs an error but does not throw or reject"
    - "deleteProcessingEventsOlderThan(30) removes rows older than 30 days"
  artifacts:
    - path: "backend/src/models/migrations/013_create_processing_events_table.sql"
      provides: "document_processing_events table DDL with indexes and RLS"
      contains: "CREATE TABLE IF NOT EXISTS document_processing_events"
    - path: "backend/src/services/analyticsService.ts"
      provides: "Fire-and-forget analytics event writer and retention delete"
      exports: ["recordProcessingEvent", "deleteProcessingEventsOlderThan"]
    - path: "backend/src/__tests__/unit/analyticsService.test.ts"
      provides: "Unit tests for analyticsService"
      min_lines: 50
  key_links:
    - from: "backend/src/services/analyticsService.ts"
      to: "backend/src/config/supabase.ts"
      via: "getSupabaseServiceClient() call"
      pattern: "getSupabaseServiceClient"
    - from: "backend/src/services/analyticsService.ts"
      to: "document_processing_events table"
      via: "void supabase.from('document_processing_events').insert(...)"
      pattern: "void.*from\\('document_processing_events'\\)"
---

<objective>
Create the analytics migration and fire-and-forget analytics service for persisting document processing events to Supabase.

Purpose: ANLY-01 requires processing events to persist (not in-memory), and ANLY-03 requires instrumentation to be non-blocking. This plan creates the database table and the service that writes to it without blocking the processing pipeline.

Output: Migration 013 SQL file, analyticsService.ts with recordProcessingEvent() and deleteProcessingEventsOlderThan(), and unit tests.
</objective>

<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/01-data-foundation/01-01-SUMMARY.md
@.planning/phases/01-data-foundation/01-02-SUMMARY.md
@backend/src/models/migrations/012_create_monitoring_tables.sql
@backend/src/config/supabase.ts
@backend/src/utils/logger.ts
</context>

<tasks>

<task type="auto">
<name>Task 1: Create analytics migration and analyticsService</name>
<files>
backend/src/models/migrations/013_create_processing_events_table.sql
backend/src/services/analyticsService.ts
</files>
<action>
**Migration 013:** Create `backend/src/models/migrations/013_create_processing_events_table.sql` following the exact pattern from migration 012. The table:

```sql
CREATE TABLE IF NOT EXISTS document_processing_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID NOT NULL,
  user_id UUID NOT NULL,
  event_type TEXT NOT NULL CHECK (event_type IN ('upload_started', 'processing_started', 'completed', 'failed')),
  duration_ms INTEGER,
  error_message TEXT,
  stage TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_document_processing_events_created_at
  ON document_processing_events(created_at);
CREATE INDEX IF NOT EXISTS idx_document_processing_events_document_id
  ON document_processing_events(document_id);

ALTER TABLE document_processing_events ENABLE ROW LEVEL SECURITY;
```

**analyticsService.ts:** Create `backend/src/services/analyticsService.ts` with two exports:

1. `recordProcessingEvent(data: ProcessingEventData): void` — Return type MUST be `void` (not `Promise<void>`) to prevent accidental `await`. Inside, call `getSupabaseServiceClient()` (per-method, not module level), then `void supabase.from('document_processing_events').insert({...}).then(({ error }) => { if (error) logger.error(...) })`. Never throw, never reject.

2. `deleteProcessingEventsOlderThan(days: number): Promise<number>` — Compute the cutoff date in JS (`new Date(Date.now() - days * 86400000).toISOString()`), then delete with `.lt('created_at', cutoff)`. Return the count of deleted rows. This follows the same pattern as `HealthCheckModel.deleteOlderThan()`.

Export the `ProcessingEventData` interface:

```typescript
export interface ProcessingEventData {
  document_id: string;
  user_id: string;
  event_type: 'upload_started' | 'processing_started' | 'completed' | 'failed';
  duration_ms?: number;
  error_message?: string;
  stage?: string;
}
```

Use the Winston logger (`import { logger } from '../utils/logger'`). Use `getSupabaseServiceClient` from `'../config/supabase'`. Follow project naming conventions (camelCase file, named exports).
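A minimal sketch of the fire-and-forget shape described above. The `EventsClient` interface here is a hypothetical stand-in for the real Supabase client so the example stays self-contained; the real service would obtain the client from `getSupabaseServiceClient()` and log through Winston instead of the injected callback.

```typescript
// Hypothetical minimal shape of the insert chain this plan relies on.
type InsertResult = { error: { message: string } | null };
interface EventsTable { insert(row: object): Promise<InsertResult> }
interface EventsClient { from(table: string): EventsTable }

export interface ProcessingEventData {
  document_id: string;
  user_id: string;
  event_type: 'upload_started' | 'processing_started' | 'completed' | 'failed';
  duration_ms?: number;
  error_message?: string;
  stage?: string;
}

// Fire-and-forget: the void return type means callers cannot await it,
// and the floating promise logs (never throws) on failure.
export function recordProcessingEvent(
  client: EventsClient,
  data: ProcessingEventData,
  logError: (msg: string) => void = console.error,
): void {
  void client
    .from('document_processing_events')
    .insert({ ...data, created_at: new Date().toISOString() })
    .then(({ error }) => {
      if (error) logError(`analytics insert failed: ${error.message}`);
    })
    .catch((e) => logError(`analytics insert rejected: ${String(e)}`));
}

// Retention cutoff math from step 2: days -> ISO timestamp `days` ago.
export function retentionCutoffIso(days: number): string {
  return new Date(Date.now() - days * 86_400_000).toISOString();
}
```

The deliberate `void` prefix on the insert chain marks the promise as intentionally floating, which keeps `@typescript-eslint/no-floating-promises` quiet while preserving the non-blocking contract.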
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify 013 migration file exists and analyticsService exports recordProcessingEvent and deleteProcessingEventsOlderThan</manual>
</verify>
<done>Migration 013 creates document_processing_events table with indexes and RLS. analyticsService.ts exports recordProcessingEvent (void return) and deleteProcessingEventsOlderThan (Promise<number>). TypeScript compiles.</done>
</task>

<task type="auto">
<name>Task 2: Create analyticsService unit tests</name>
<files>
backend/src/__tests__/unit/analyticsService.test.ts
</files>
<action>
Create `backend/src/__tests__/unit/analyticsService.test.ts` using the Vitest + Supabase mock pattern established in Phase 1 (01-02-SUMMARY.md).

Mock setup:
- `vi.mock('../../config/supabase')` with an inline `vi.fn()` factory
- `vi.mock('../../utils/logger')` with an inline `vi.fn()` factory
- Use `vi.mocked()` after import for typed access
- `makeSupabaseChain()` helper per test (fresh mock state)

Test cases for `recordProcessingEvent`:
1. **Calls Supabase insert with correct data** — verify `.from('document_processing_events').insert(...)` called with the expected fields including `created_at`
2. **Return type is void (not a Promise)** — call `recordProcessingEvent(data)` and verify the return value is `undefined` (void), not a thenable
3. **Logs error on Supabase failure but does not throw** — mock the `.then` callback with `{ error: { message: 'test error' } }`, verify `logger.error` was called
4. **Handles optional fields (duration_ms, error_message, stage) as null** — pass data without optional fields, verify insert called with `null` for those columns

Test cases for `deleteProcessingEventsOlderThan`:
5. **Computes correct cutoff date and deletes** — mock the Supabase delete chain, verify `.lt('created_at', ...)` called with an ISO date string ~30 days ago
6. **Returns count of deleted rows** — mock response with `data: [{}, {}, {}]` (3 rows), verify it returns 3

Use `beforeEach(() => vi.clearAllMocks())` for test isolation.
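Test case 2's "not a thenable" assertion can be expressed without awaiting anything. A small helper (hypothetical, not part of the plan's required API) makes the intent explicit:

```typescript
// Returns true when a value looks awaitable (has a callable .then).
function isThenable(x: unknown): x is PromiseLike<unknown> {
  return typeof x === 'object' && x !== null &&
    typeof (x as { then?: unknown }).then === 'function';
}

// In the Vitest suite the check would read roughly:
// expect(isThenable(recordProcessingEvent(data))).toBe(false);
```

This catches the regression where someone changes the signature to `async`, which would silently make the call awaitable again.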
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/analyticsService.test.ts --reporter=verbose 2>&1</automated>
</verify>
<done>All analyticsService tests pass. recordProcessingEvent verified as fire-and-forget (void return, error-swallowing). deleteProcessingEventsOlderThan verified with correct date math and row count return.</done>
</task>

</tasks>

<verification>
1. `npx tsc --noEmit` passes with no errors from new files
2. `npx vitest run src/__tests__/unit/analyticsService.test.ts` — all tests pass
3. Migration 013 SQL is valid and follows the 012 pattern
4. `recordProcessingEvent` return type is `void` (not `Promise<void>`)
</verification>

<success_criteria>
- Migration 013 creates document_processing_events table with id, document_id, user_id, event_type (CHECK constraint), duration_ms, error_message, stage, created_at
- Indexes on created_at and document_id exist
- RLS enabled on the table
- analyticsService.recordProcessingEvent() is fire-and-forget (void return, no throw)
- analyticsService.deleteProcessingEventsOlderThan() returns deleted row count
- All unit tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/02-backend-services/02-01-SUMMARY.md`
</output>

176 .planning/phases/02-backend-services/02-02-PLAN.md Normal file
@@ -0,0 +1,176 @@
---
phase: 02-backend-services
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - backend/package.json
  - backend/src/services/healthProbeService.ts
  - backend/src/__tests__/unit/healthProbeService.test.ts
autonomous: true
requirements: [HLTH-02, HLTH-04]

must_haves:
  truths:
    - "Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken)"
    - "Each probe returns a structured ProbeResult with service_name, status, latency_ms, and optional error_message"
    - "Probe results are persisted to Supabase via HealthCheckModel.create()"
    - "A single probe failure does not prevent other probes from running"
    - "LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5"
    - "Supabase probe uses getPostgresPool().query('SELECT 1'), not PostgREST client"
  artifacts:
    - path: "backend/src/services/healthProbeService.ts"
      provides: "Health probe orchestrator with 4 individual probers"
      exports: ["healthProbeService", "ProbeResult"]
    - path: "backend/src/__tests__/unit/healthProbeService.test.ts"
      provides: "Unit tests for all probes and orchestrator"
      min_lines: 80
  key_links:
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/models/HealthCheckModel.ts"
      via: "HealthCheckModel.create() for persistence"
      pattern: "HealthCheckModel\\.create"
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/config/supabase.ts"
      via: "getPostgresPool() for Supabase probe"
      pattern: "getPostgresPool"
---

<objective>
Create the health probe service with four real API probers (Document AI, LLM, Supabase, Firebase Auth) and an orchestrator that runs all probes and persists results.

Purpose: HLTH-02 requires real authenticated API calls (not config checks), and HLTH-04 requires results to persist to Supabase. This plan builds the probe logic and persistence layer.

Output: healthProbeService.ts with 4 probers + runAllProbes orchestrator, and unit tests. Also installs nodemailer (needed by Plan 03).
</objective>

<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/01-data-foundation/01-01-SUMMARY.md
@backend/src/models/HealthCheckModel.ts
@backend/src/config/supabase.ts
@backend/src/services/documentAiProcessor.ts
@backend/src/services/llmService.ts
@backend/src/config/firebase.ts
</context>

<tasks>

<task type="auto">
<name>Task 1: Install nodemailer and create healthProbeService</name>
<files>
backend/package.json
backend/src/services/healthProbeService.ts
</files>
<action>
**Step 1: Install nodemailer** (needed by Plan 03; installing now avoids package.json conflicts during parallel execution):

```bash
cd backend && npm install nodemailer && npm install --save-dev @types/nodemailer
```

**Step 2: Create healthProbeService.ts** with the following structure:

Export a `ProbeResult` interface:

```typescript
export interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
  probe_details?: Record<string, unknown>;
}
```

Create 4 individual probe functions (all private/unexported):

1. **probeDocumentAI()**: Import `DocumentProcessorServiceClient` from `@google-cloud/documentai`. Call `client.listProcessors({ parent: ... })` using the project ID from config. Latency > 2000ms = 'degraded'. Caught errors = 'down' with error_message.

2. **probeLLM()**: Import `Anthropic` from `@anthropic-ai/sdk`. Create the client with `process.env.ANTHROPIC_API_KEY`. Call `client.messages.create({ model: 'claude-haiku-4-5', max_tokens: 5, messages: [{ role: 'user', content: 'Hi' }] })`. Use the cheapest model (PITFALL B prevention). Latency > 5000ms = 'degraded'. 429 errors = 'degraded' (rate limit, not down). Other errors = 'down'.

3. **probeSupabase()**: Import `getPostgresPool` from `'../config/supabase'`. Call `pool.query('SELECT 1')`. Use direct PostgreSQL, NOT PostgREST (PITFALL C prevention). Latency > 2000ms = 'degraded'. Errors = 'down'.

4. **probeFirebaseAuth()**: Import `admin` from `firebase-admin` (or use the existing firebase config). Call `admin.auth().verifyIdToken('invalid-token-probe-check')`. This ALWAYS throws. If the error message contains 'argument' or 'INVALID' = 'healthy' (the SDK is alive). Other errors = 'down'.

Create `runAllProbes()` as the orchestrator:
- Wrap each probe in an individual try/catch (PITFALL E: one probe failure must not stop the others)
- For each ProbeResult, call `HealthCheckModel.create({ service_name, status, latency_ms, error_message, probe_details, checked_at: new Date().toISOString() })`
- Return an array of all ProbeResults
- Log a summary via the Winston logger

Export as an object: `export const healthProbeService = { runAllProbes }`.

Use the Winston logger for all logging. Use the `getSupabaseServiceClient()` per-method pattern for any Supabase calls (though the Supabase probe uses `getPostgresPool()` directly).
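The per-probe shape (latency timing, degraded threshold, error capture) is the same for all four probers, so it can be sketched generically. The `runProbe` helper name and signature here are illustrative, not the plan's required internals:

```typescript
export interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
}

// Wraps one probe call: measures latency, maps "slow" -> 'degraded',
// and converts thrown errors into a 'down' result instead of propagating.
async function runProbe(
  serviceName: string,
  call: () => Promise<unknown>,
  degradedAboveMs: number,
): Promise<ProbeResult> {
  const start = Date.now();
  try {
    await call();
    const latency = Date.now() - start;
    return {
      service_name: serviceName,
      status: latency > degradedAboveMs ? 'degraded' : 'healthy',
      latency_ms: latency,
    };
  } catch (err) {
    return {
      service_name: serviceName,
      status: 'down',
      latency_ms: Date.now() - start,
      error_message: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Each real prober would pass its own API call as `call` (e.g. `() => pool.query('SELECT 1')`) and its own threshold (2000ms or 5000ms per the rules above); the LLM prober would additionally special-case 429 errors as 'degraded' in the catch branch.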
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify healthProbeService.ts exists with runAllProbes and ProbeResult exports</manual>
</verify>
<done>nodemailer installed. healthProbeService.ts exports ProbeResult interface and healthProbeService object with runAllProbes(). Four probes make real API calls. Each probe wrapped in try/catch. Results persisted via HealthCheckModel.create(). TypeScript compiles.</done>
</task>

<task type="auto">
<name>Task 2: Create healthProbeService unit tests</name>
<files>
backend/src/__tests__/unit/healthProbeService.test.ts
</files>
<action>
Create `backend/src/__tests__/unit/healthProbeService.test.ts` using the established Vitest mock pattern.

Mock all external dependencies:
- `vi.mock('../../models/HealthCheckModel')` — mock `create()` to resolve successfully
- `vi.mock('../../config/supabase')` — mock `getPostgresPool()` returning `{ query: vi.fn() }`
- `vi.mock('@google-cloud/documentai')` — mock `DocumentProcessorServiceClient` with `listProcessors` resolving
- `vi.mock('@anthropic-ai/sdk')` — mock the `Anthropic` constructor, with `messages.create` resolving
- `vi.mock('firebase-admin')` — mock `auth().verifyIdToken()` throwing the expected error
- `vi.mock('../../utils/logger')` — mock the logger

Test cases for `runAllProbes`:
1. **All probes healthy — returns 4 ProbeResults with status 'healthy'** — all mocks resolve quickly; verify 4 results returned with status 'healthy'
2. **Each result persisted via HealthCheckModel.create** — verify `HealthCheckModel.create` called 4 times with the correct service_name values: 'document_ai', 'llm_api', 'supabase', 'firebase_auth'
3. **One probe throws — others still run** — make the Document AI mock throw; verify the 3 other probes still complete and all 4 HealthCheckModel.create calls happen (the failed probe creates a 'down' result)
4. **LLM probe 429 error returns 'degraded' not 'down'** — make the Anthropic mock throw an error with '429' in the message; verify the result status is 'degraded'
5. **Supabase probe uses getPostgresPool not getSupabaseServiceClient** — verify `getPostgresPool` was called (not getSupabaseServiceClient) during the Supabase probe
6. **Firebase Auth probe — expected error = healthy** — mock verifyIdToken throwing 'Decoding Firebase ID token failed' (an argument error); verify status is 'healthy'
7. **Firebase Auth probe — unexpected error = down** — mock verifyIdToken throwing a network error; verify status is 'down'
8. **Latency measured correctly** — use `vi.useFakeTimers()` or verify `latency_ms` is a non-negative number

Use `beforeEach(() => vi.clearAllMocks())`.
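The isolation property behind test case 3 (one failing probe must not stop the rest) reduces to wrapping each probe in its own try/catch. A dependency-free sketch of that property, with illustrative names:

```typescript
type Probe = () => Promise<string>;

// Each probe gets its own try/catch, so a throw becomes a 'down' entry
// in the results rather than aborting the whole run.
async function runAllIsolated(probes: Record<string, Probe>): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const [name, probe] of Object.entries(probes)) {
    try {
      results[name] = await probe();
    } catch {
      results[name] = 'down';
    }
  }
  return results;
}
```

In the real suite, the equivalent assertion is that `HealthCheckModel.create` still fires four times even when one mocked dependency throws.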
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/healthProbeService.test.ts --reporter=verbose 2>&1</automated>
</verify>
<done>All healthProbeService tests pass. Probes verified as making real API calls (mocked). Orchestrator verified as fault-tolerant (one probe failure doesn't stop others). Results verified as persisted via HealthCheckModel.create(). Supabase probe uses getPostgresPool, not PostgREST.</done>
</task>

</tasks>

<verification>
1. `npm ls nodemailer` shows nodemailer installed
2. `npx tsc --noEmit` passes
3. `npx vitest run src/__tests__/unit/healthProbeService.test.ts` — all tests pass
4. healthProbeService.ts does NOT use getSupabaseServiceClient for the Supabase probe (uses getPostgresPool)
5. LLM probe uses 'claude-haiku-4-5', not an expensive model
</verification>

<success_criteria>
- nodemailer and @types/nodemailer installed in backend/package.json
- healthProbeService exports ProbeResult and healthProbeService.runAllProbes
- 4 probes: document_ai, llm_api, supabase, firebase_auth
- Each probe returns a structured ProbeResult with status/latency_ms/error_message
- Probe results persisted via HealthCheckModel.create()
- Individual probe failures isolated (other probes still run)
- All unit tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/02-backend-services/02-02-SUMMARY.md`
</output>

182 .planning/phases/02-backend-services/02-03-PLAN.md Normal file
@@ -0,0 +1,182 @@
---
phase: 02-backend-services
plan: 03
type: execute
wave: 2
depends_on: [02-02]
files_modified:
  - backend/src/services/alertService.ts
  - backend/src/__tests__/unit/alertService.test.ts
autonomous: true
requirements: [ALRT-01, ALRT-02, ALRT-04]

must_haves:
  truths:
    - "An alert email is sent when a probe returns 'degraded' or 'down'"
    - "A second probe failure within the cooldown period does NOT send a duplicate email"
    - "Alert recipient is read from process.env.EMAIL_WEEKLY_RECIPIENT, never hardcoded"
    - "Email failure does not throw or break the probe pipeline"
    - "Nodemailer transporter is created inside the function call, not at module level (Firebase Secret timing)"
    - "An alert_events row is created before sending the email"
  artifacts:
    - path: "backend/src/services/alertService.ts"
      provides: "Alert deduplication, email sending, and alert event creation"
      exports: ["alertService"]
    - path: "backend/src/__tests__/unit/alertService.test.ts"
      provides: "Unit tests for alert deduplication, email sending, recipient config"
      min_lines: 80
  key_links:
    - from: "backend/src/services/alertService.ts"
      to: "backend/src/models/AlertEventModel.ts"
      via: "findRecentByService() for deduplication, create() for alert row"
      pattern: "AlertEventModel\\.(findRecentByService|create)"
    - from: "backend/src/services/alertService.ts"
      to: "nodemailer"
      via: "createTransport + sendMail for email delivery"
      pattern: "nodemailer\\.createTransport"
    - from: "backend/src/services/alertService.ts"
      to: "process.env.EMAIL_WEEKLY_RECIPIENT"
      via: "Config-based recipient (ALRT-04)"
      pattern: "process\\.env\\.EMAIL_WEEKLY_RECIPIENT"
---

<objective>
Create the alert service with deduplication logic, SMTP email sending via nodemailer, and a config-based recipient.

Purpose: ALRT-01 requires email alerts on service degradation/failure. ALRT-02 requires deduplication with a cooldown. ALRT-04 requires the recipient to come from configuration, not hardcoded source code.

Output: alertService.ts with evaluateAndAlert() and sendAlertEmail(), and unit tests.
</objective>

<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/01-data-foundation/01-01-SUMMARY.md
@.planning/phases/02-backend-services/02-02-PLAN.md
@backend/src/models/AlertEventModel.ts
@backend/src/index.ts
</context>

<tasks>

<task type="auto">
<name>Task 1: Create alertService with deduplication and email</name>
<files>
backend/src/services/alertService.ts
</files>
<action>
Create `backend/src/services/alertService.ts` with the following structure:

**Import the ProbeResult type** from `'./healthProbeService'` (created in Plan 02).

**Constants:**
- `ALERT_COOLDOWN_MINUTES = parseInt(process.env.ALERT_COOLDOWN_MINUTES ?? '60', 10)` — configurable cooldown window

**Private function `createTransporter()`:**
Create the nodemailer transporter INSIDE function scope (not at module level — PITFALL A: Firebase Secrets are not available at module load). Read SMTP config from `process.env`:
- `host`: `process.env.EMAIL_HOST ?? 'smtp.gmail.com'`
- `port`: `parseInt(process.env.EMAIL_PORT ?? '587', 10)`
- `secure`: `process.env.EMAIL_SECURE === 'true'`
- `auth.user`: `process.env.EMAIL_USER`
- `auth.pass`: `process.env.EMAIL_PASS`

**Private function `sendAlertEmail(serviceName, alertType, message)`:**
- Read the recipient from `process.env.EMAIL_WEEKLY_RECIPIENT` (ALRT-04: NEVER hardcode the email address)
- If no recipient is configured, log a warning and return (do not throw)
- Call `createTransporter()` then `transporter.sendMail({ from, to, subject, text, html })`
- Subject format: `[CIM Summary] Alert: ${serviceName} — ${alertType}`
- Wrap in try/catch — an email failure logs an error but does NOT throw (email failure must not break the probe pipeline)

**Exported function `evaluateAndAlert(probeResults: ProbeResult[])`:**
For each ProbeResult where status is 'degraded' or 'down':
1. Map status to alert_type: 'down' -> 'service_down', 'degraded' -> 'service_degraded'
2. Call `AlertEventModel.findRecentByService(service_name, alert_type, ALERT_COOLDOWN_MINUTES)`
3. If a recent alert exists within the cooldown, log the suppression and skip BOTH row creation AND email (PITFALL 3: prevent alert storms)
4. If no recent alert, create an alert_events row via `AlertEventModel.create({ service_name, alert_type, message: error_message or status description })`
5. Then send the email via `sendAlertEmail()`

Export as: `export const alertService = { evaluateAndAlert }`.

Use the Winston logger for all logging. Use `import { AlertEventModel } from '../models/AlertEventModel'`.
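The cooldown gate in steps 2-5 can be sketched per probe result. The `AlertStore` interface and `evaluateOne` helper are hypothetical stand-ins for `AlertEventModel` and the real service internals, kept injectable so the example is self-contained:

```typescript
interface AlertStore {
  // Stand-in for AlertEventModel.findRecentByService / .create
  findRecent(service: string, type: string, cooldownMin: number): Promise<object | null>;
  create(row: { service_name: string; alert_type: string; message: string }): Promise<void>;
}

// A recent alert within the cooldown suppresses BOTH the new row and the
// email; otherwise the row is created before the email is sent, and an
// email failure is swallowed so it cannot break the probe pipeline.
async function evaluateOne(
  store: AlertStore,
  sendEmail: (service: string, type: string, msg: string) => Promise<void>,
  probe: { service_name: string; status: 'healthy' | 'degraded' | 'down'; error_message?: string },
  cooldownMin = 60,
): Promise<'skipped' | 'suppressed' | 'alerted'> {
  if (probe.status === 'healthy') return 'skipped';
  const alertType = probe.status === 'down' ? 'service_down' : 'service_degraded';
  if (await store.findRecent(probe.service_name, alertType, cooldownMin)) return 'suppressed';
  const message = probe.error_message ?? `status: ${probe.status}`;
  await store.create({ service_name: probe.service_name, alert_type: alertType, message });
  await sendEmail(probe.service_name, alertType, message).catch(() => { /* never throw */ });
  return 'alerted';
}
```

Note the ordering: the alert_events row is created first, so even a failed email leaves an audit trail, matching the "row before email" truth in the frontmatter.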
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify alertService.ts exports alertService.evaluateAndAlert. Verify no hardcoded email addresses in source.</manual>
</verify>
<done>alertService.ts exports evaluateAndAlert(). Deduplication checks AlertEventModel.findRecentByService() before creating rows or sending email. Recipient read from process.env.EMAIL_WEEKLY_RECIPIENT. Transporter created lazily. Email failures caught and logged. TypeScript compiles.</done>
</task>

<task type="auto">
<name>Task 2: Create alertService unit tests</name>
<files>
backend/src/__tests__/unit/alertService.test.ts
</files>
<action>
Create `backend/src/__tests__/unit/alertService.test.ts` using the Vitest mock pattern.

Mock dependencies:
- `vi.mock('../../models/AlertEventModel')` — mock `findRecentByService` and `create`
- `vi.mock('nodemailer')` — mock `createTransport` returning `{ sendMail: vi.fn().mockResolvedValue({}) }`
- `vi.mock('../../utils/logger')` — mock the logger

Create test ProbeResult fixtures:
- `healthyProbe: { service_name: 'supabase', status: 'healthy', latency_ms: 50 }`
- `downProbe: { service_name: 'document_ai', status: 'down', latency_ms: 0, error_message: 'Connection refused' }`
- `degradedProbe: { service_name: 'llm_api', status: 'degraded', latency_ms: 6000 }`

Test cases:

1. **Healthy probes — no alerts sent** — pass an array of healthy ProbeResults; verify AlertEventModel.findRecentByService NOT called, sendMail NOT called

2. **Down probe — creates alert_events row and sends email** — pass downProbe, mock findRecentByService returning null (no recent alert); verify AlertEventModel.create called with service_name='document_ai' and alert_type='service_down', verify sendMail called

3. **Degraded probe — creates alert with type 'service_degraded'** — pass degradedProbe, mock findRecentByService returning null; verify AlertEventModel.create called with alert_type='service_degraded'

4. **Deduplication — suppresses within cooldown** — pass downProbe, mock findRecentByService returning an existing alert object (non-null); verify AlertEventModel.create NOT called, sendMail NOT called, logger.info called with 'suppress' in the message

5. **Recipient from env — reads process.env.EMAIL_WEEKLY_RECIPIENT** — set `process.env.EMAIL_WEEKLY_RECIPIENT = 'test@example.com'`, pass downProbe with no recent alert; verify sendMail called with `to: 'test@example.com'`

6. **No recipient configured — skips email but still creates alert row** — delete process.env.EMAIL_WEEKLY_RECIPIENT, pass downProbe with no recent alert; verify AlertEventModel.create IS called, sendMail NOT called, logger.warn called

7. **Email failure — does not throw** — mock sendMail to reject; verify evaluateAndAlert does not throw, verify logger.error called

8. **Multiple probes — processes each independently** — pass [downProbe, degradedProbe, healthyProbe]; verify findRecentByService called twice (for down and degraded, not for healthy)

Use `beforeEach(() => { vi.clearAllMocks(); process.env.EMAIL_WEEKLY_RECIPIENT = 'admin@test.com'; })`.
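The `beforeEach` above resets `EMAIL_WEEKLY_RECIPIENT`, but test case 6 also deletes it; saving and restoring the original value keeps the suite from leaking env state into other test files. A framework-agnostic sketch (the `withEnv` helper is illustrative, not part of the plan):

```typescript
// Snapshot one env var, mutate it for the duration of fn, then restore it,
// even if fn throws. Passing undefined deletes the variable.
function withEnv<T>(key: string, value: string | undefined, fn: () => T): T {
  const saved = process.env[key];
  if (value === undefined) delete process.env[key];
  else process.env[key] = value;
  try {
    return fn();
  } finally {
    if (saved === undefined) delete process.env[key];
    else process.env[key] = saved;
  }
}
```

Test case 6 would then run its assertions inside `withEnv('EMAIL_WEEKLY_RECIPIENT', undefined, () => { ... })`.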
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/alertService.test.ts --reporter=verbose 2>&1</automated>
</verify>
<done>All alertService tests pass. Deduplication verified (suppresses within cooldown). Email sending verified with config-based recipient. Email failure verified as non-throwing. Multiple probe evaluation verified.</done>
</task>

</tasks>

<verification>
1. `npx tsc --noEmit` passes
2. `npx vitest run src/__tests__/unit/alertService.test.ts` — all tests pass
3. `grep -r 'jpressnell\|bluepoint' backend/src/services/alertService.ts` returns nothing (no hardcoded emails)
4. alertService reads recipient from `process.env.EMAIL_WEEKLY_RECIPIENT`
</verification>

<success_criteria>
- alertService exports evaluateAndAlert(probeResults)
- Deduplication uses AlertEventModel.findRecentByService with configurable cooldown
- Alert rows created via AlertEventModel.create before email send
- Suppressed alerts skip BOTH row creation AND email
- Recipient from process.env, never hardcoded
- Transporter created inside function, not at module level
- Email failures caught and logged, never thrown
- All unit tests pass
</success_criteria>

<output>
After completion, create `.planning/phases/02-backend-services/02-03-SUMMARY.md`
</output>
197
.planning/phases/02-backend-services/02-04-PLAN.md
Normal file
@@ -0,0 +1,197 @@
---
phase: 02-backend-services
plan: 04
type: execute
wave: 3
depends_on: [02-01, 02-02, 02-03]
files_modified:
- backend/src/index.ts
autonomous: true
requirements: [HLTH-03, INFR-03]
must_haves:
  truths:
    - "runHealthProbes Cloud Function export runs on 'every 5 minutes' schedule, completely separate from processDocumentJobs"
    - "runRetentionCleanup Cloud Function export runs on 'every monday 02:00' schedule"
    - "runHealthProbes calls healthProbeService.runAllProbes() and then alertService.evaluateAndAlert()"
    - "runRetentionCleanup deletes from service_health_checks, alert_events, and document_processing_events older than 30 days"
    - "Both exports list required Firebase secrets in their secrets array"
    - "Both exports use dynamic import() pattern (same as processDocumentJobs)"
  artifacts:
    - path: "backend/src/index.ts"
      provides: "Two new onSchedule Cloud Function exports"
      exports: ["runHealthProbes", "runRetentionCleanup"]
  key_links:
    - from: "backend/src/index.ts (runHealthProbes)"
      to: "backend/src/services/healthProbeService.ts"
      via: "dynamic import('./services/healthProbeService')"
      pattern: "import\\('./services/healthProbeService'\\)"
    - from: "backend/src/index.ts (runHealthProbes)"
      to: "backend/src/services/alertService.ts"
      via: "dynamic import('./services/alertService')"
      pattern: "import\\('./services/alertService'\\)"
    - from: "backend/src/index.ts (runRetentionCleanup)"
      to: "backend/src/models/HealthCheckModel.ts"
      via: "dynamic import for deleteOlderThan(30)"
      pattern: "HealthCheckModel\\.deleteOlderThan"
    - from: "backend/src/index.ts (runRetentionCleanup)"
      to: "backend/src/services/analyticsService.ts"
      via: "dynamic import for deleteProcessingEventsOlderThan(30)"
      pattern: "deleteProcessingEventsOlderThan"
---
<objective>
Add two new Firebase Cloud Function scheduled exports to index.ts: runHealthProbes (every 5 minutes) and runRetentionCleanup (weekly).

Purpose: HLTH-03 requires health probes to run on a schedule separate from document processing (PITFALL-2). INFR-03 requires 30-day rolling data retention cleanup on a schedule.

Output: Two new onSchedule exports in backend/src/index.ts.
</objective>

<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/02-backend-services/02-01-PLAN.md
@.planning/phases/02-backend-services/02-02-PLAN.md
@.planning/phases/02-backend-services/02-03-PLAN.md
@backend/src/index.ts
</context>
<tasks>

<task type="auto">
<name>Task 1: Add runHealthProbes scheduled Cloud Function export</name>
<files>
backend/src/index.ts
</files>
<action>
Add a new `onSchedule` export to `backend/src/index.ts` AFTER the existing `processDocumentJobs` export. Follow the exact same pattern as `processDocumentJobs`.
```typescript
// Health probe scheduler — separate from document processing (PITFALL-2, HLTH-03)
export const runHealthProbes = onSchedule({
  schedule: 'every 5 minutes',
  timeoutSeconds: 60,
  memory: '256MiB',
  retryCount: 0, // Probes should not retry — they run again in 5 minutes anyway
  secrets: [
    anthropicApiKey, // for LLM probe
    openaiApiKey, // for OpenAI probe fallback
    databaseUrl, // for Supabase probe
    supabaseServiceKey,
    supabaseAnonKey,
  ],
}, async (_event) => {
  const { healthProbeService } = await import('./services/healthProbeService');
  const { alertService } = await import('./services/alertService');

  const results = await healthProbeService.runAllProbes();
  await alertService.evaluateAndAlert(results);

  logger.info('runHealthProbes: complete', {
    probeCount: results.length,
    statuses: results.map(r => ({ service: r.service_name, status: r.status })),
  });
});
```
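The logging code above reads `service_name` and `status` off each probe result. For reference, a hypothetical minimal shape those fields imply — the real type is defined by healthProbeService in plan 02-02 and may carry more fields:

```typescript
// Hypothetical probe result shape inferred from the logging code above
// (assumption for illustration; the real type lives in healthProbeService).
interface ProbeResult {
  service_name: string;                    // e.g. 'anthropic', 'supabase'
  status: 'healthy' | 'degraded' | 'down'; // drives alert evaluation
}

const sample: ProbeResult = { service_name: 'supabase', status: 'healthy' };
console.log(`${sample.service_name}: ${sample.status}`);
```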
Key requirements:
- Use dynamic `import()` (not static import at top of file) — same pattern as processDocumentJobs
- List ALL secrets that probes need in the `secrets` array (Firebase Secrets must be explicitly listed per function)
- Use the existing `anthropicApiKey`, `openaiApiKey`, `databaseUrl`, `supabaseServiceKey`, `supabaseAnonKey` variables already defined via `defineSecret` at the top of index.ts
- Set `retryCount: 0` — probes run every 5 minutes, no need to retry failures
- First call `runAllProbes()` to measure and persist, then `evaluateAndAlert()` to check for alerts
- Log a summary with probe count and statuses
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify index.ts has `export const runHealthProbes` as a separate export from processDocumentJobs</manual>
</verify>
<done>runHealthProbes export added to index.ts. Runs every 5 minutes. Calls healthProbeService.runAllProbes() then alertService.evaluateAndAlert(). Uses dynamic imports. Lists all required secrets. TypeScript compiles.</done>
</task>

<task type="auto">
<name>Task 2: Add runRetentionCleanup scheduled Cloud Function export</name>
<files>
backend/src/index.ts
</files>
<action>
Add a second `onSchedule` export to `backend/src/index.ts` AFTER runHealthProbes.
```typescript
// Retention cleanup — weekly, separate from document processing (PITFALL-7, INFR-03)
export const runRetentionCleanup = onSchedule({
  schedule: 'every monday 02:00',
  timeoutSeconds: 120,
  memory: '256MiB',
  secrets: [databaseUrl, supabaseServiceKey, supabaseAnonKey],
}, async (_event) => {
  const { HealthCheckModel } = await import('./models/HealthCheckModel');
  const { AlertEventModel } = await import('./models/AlertEventModel');
  const { deleteProcessingEventsOlderThan } = await import('./services/analyticsService');

  const RETENTION_DAYS = 30;

  const [hcCount, alertCount, eventCount] = await Promise.all([
    HealthCheckModel.deleteOlderThan(RETENTION_DAYS),
    AlertEventModel.deleteOlderThan(RETENTION_DAYS),
    deleteProcessingEventsOlderThan(RETENTION_DAYS),
  ]);

  logger.info('runRetentionCleanup: complete', {
    retentionDays: RETENTION_DAYS,
    deletedHealthChecks: hcCount,
    deletedAlerts: alertCount,
    deletedProcessingEvents: eventCount,
  });
});
```
Key requirements:
- Use dynamic `import()` for all model and service imports
- Run all three deletes in parallel with `Promise.all()` (they touch different tables)
- Only include the secrets needed for Supabase access (no LLM keys needed for cleanup)
- Set `timeoutSeconds: 120` (cleanup may take longer than probes)
- The 30-day retention period is a constant, not configurable via env (matches INFR-03 spec)
- Only manage monitoring tables: service_health_checks, alert_events, document_processing_events. Do NOT delete from performance_metrics, session_events, or execution_events (those are agentic RAG tables, out of scope per research Open Question 4)
- Log the count of deleted rows from each table
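The `deleteOlderThan(days)` helpers the cleanup relies on all need the same cutoff timestamp. A minimal sketch of that computation, assuming the models filter on a `created_at` column (the helper name and column are assumptions from earlier plans):

```typescript
// Sketch: cutoff timestamp for a deleteOlderThan(days) helper.
// Rows with created_at older than the returned ISO string get deleted.
function retentionCutoffISO(days: number, now: Date = new Date()): string {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return cutoff.toISOString();
}

// Example: 30-day cutoff relative to a fixed Monday 02:00 run time
console.log(retentionCutoffISO(30, new Date('2026-03-02T02:00:00Z'))); // → 2026-01-31T02:00:00.000Z
```

Passing `now` as a parameter keeps the function deterministic for unit tests; the default covers production use.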
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify index.ts has `export const runRetentionCleanup` as a separate export. Verify it calls deleteOlderThan on all three tables.</manual>
</verify>
<done>runRetentionCleanup export added to index.ts. Runs weekly Monday 02:00. Deletes from service_health_checks, alert_events, and document_processing_events older than 30 days. Uses Promise.all for parallel execution. Logs deletion counts. TypeScript compiles.</done>
</task>

</tasks>
<verification>
1. `npx tsc --noEmit` passes
2. `grep 'export const runHealthProbes' backend/src/index.ts` returns a match
3. `grep 'export const runRetentionCleanup' backend/src/index.ts` returns a match
4. Both exports use `onSchedule` (not piggybacked on processDocumentJobs — PITFALL-2 compliance)
5. Both exports use dynamic `import()` pattern
6. Full test suite still passes: `npx vitest run --reporter=verbose`
</verification>

<success_criteria>
- runHealthProbes is a separate onSchedule export running every 5 minutes
- runRetentionCleanup is a separate onSchedule export running weekly Monday 02:00
- Both are completely decoupled from processDocumentJobs
- runHealthProbes calls runAllProbes() then evaluateAndAlert()
- runRetentionCleanup calls deleteOlderThan(30) on all three monitoring tables
- All required Firebase secrets listed in each function's secrets array
- TypeScript compiles with no errors
- Existing test suite passes with no regressions
</success_criteria>

<output>
After completion, create `.planning/phases/02-backend-services/02-04-SUMMARY.md`
</output>