---
phase: 02-backend-services
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- backend/package.json
- backend/src/services/healthProbeService.ts
- backend/src/__tests__/unit/healthProbeService.test.ts
autonomous: true
requirements: [HLTH-02, HLTH-04]
must_haves:
  truths:
    - "Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken)"
    - "Each probe returns a structured ProbeResult with service_name, status, latency_ms, and optional error_message"
    - "Probe results are persisted to Supabase via HealthCheckModel.create()"
    - "A single probe failure does not prevent other probes from running"
    - "LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5"
    - "Supabase probe uses getPostgresPool().query('SELECT 1'), not PostgREST client"
  artifacts:
    - path: "backend/src/services/healthProbeService.ts"
      provides: "Health probe orchestrator with 4 individual probers"
      exports: ["healthProbeService", "ProbeResult"]
    - path: "backend/src/__tests__/unit/healthProbeService.test.ts"
      provides: "Unit tests for all probes and orchestrator"
      min_lines: 80
  key_links:
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/models/HealthCheckModel.ts"
      via: "HealthCheckModel.create() for persistence"
      pattern: "HealthCheckModel\\.create"
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/config/supabase.ts"
      via: "getPostgresPool() for Supabase probe"
      pattern: "getPostgresPool"
---
<objective>
Create the health probe service with four real API probers (Document AI, LLM, Supabase, Firebase Auth) and an orchestrator that runs all probes and persists results.
Purpose: HLTH-02 requires real authenticated API calls (not config checks), and HLTH-04 requires results to persist to Supabase. This plan builds the probe logic and persistence layer.
Output: healthProbeService.ts with 4 probers + runAllProbes orchestrator, and unit tests. Also installs nodemailer (needed by Plan 03).
</objective>
<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/01-data-foundation/01-01-SUMMARY.md
@backend/src/models/HealthCheckModel.ts
@backend/src/config/supabase.ts
@backend/src/services/documentAiProcessor.ts
@backend/src/services/llmService.ts
@backend/src/config/firebase.ts
</context>
<tasks>
<task type="auto">
<name>Task 1: Install nodemailer and create healthProbeService</name>
<files>
backend/package.json
backend/src/services/healthProbeService.ts
</files>
<action>
**Step 1: Install nodemailer** (needed by Plan 03, installing now to avoid package.json conflicts in parallel execution):
```bash
cd backend && npm install nodemailer && npm install --save-dev @types/nodemailer
```
**Step 2: Create healthProbeService.ts** with the following structure:
Export a `ProbeResult` interface:
```typescript
export interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
  probe_details?: Record<string, unknown>;
}
```
Create 4 individual probe functions (all private/unexported):
1. **probeDocumentAI()**: Import `DocumentProcessorServiceClient` from `@google-cloud/documentai`. Call `client.listProcessors({ parent: ... })` using the project ID from config. Latency > 2000ms = 'degraded'. Catch errors = 'down' with error_message.
2. **probeLLM()**: Import `Anthropic` from `@anthropic-ai/sdk`. Create client with `process.env.ANTHROPIC_API_KEY`. Call `client.messages.create({ model: 'claude-haiku-4-5', max_tokens: 5, messages: [{ role: 'user', content: 'Hi' }] })`. Use cheapest model (PITFALL B prevention). Latency > 5000ms = 'degraded'. 429 errors = 'degraded' (rate limit, not down). Other errors = 'down'.
3. **probeSupabase()**: Import `getPostgresPool` from `'../config/supabase'`. Call `pool.query('SELECT 1')`. Use direct PostgreSQL, NOT PostgREST (PITFALL C prevention). Latency > 2000ms = 'degraded'. Errors = 'down'.
4. **probeFirebaseAuth()**: Import `admin` from `firebase-admin` (or use the existing firebase config). Call `admin.auth().verifyIdToken('invalid-token-probe-check')`. This ALWAYS throws. If error message contains 'argument' or 'INVALID' = 'healthy' (SDK is alive). Other errors = 'down'.
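The latency thresholds and error classification described above can be sketched as small pure helpers. This is illustrative only: the helper names (`statusFromLatency`, `classifyLlmError`) and the threshold constant are hypothetical, not required exports of the service.

```typescript
// Illustrative sketch of probe status classification. Names here are
// hypothetical; only ProbeResult and healthProbeService are required exports.

type ProbeStatus = 'healthy' | 'degraded' | 'down';

// LLM probe degraded threshold from the plan (5000ms; Document AI and
// Supabase probes would use 2000ms).
const LLM_DEGRADED_MS = 5000;

// Map a successful call's latency to a status.
function statusFromLatency(latencyMs: number, thresholdMs: number): ProbeStatus {
  return latencyMs > thresholdMs ? 'degraded' : 'healthy';
}

// Map an LLM probe error to a status: a 429 means the API is alive but
// rate-limited ('degraded'); any other failure is 'down'.
function classifyLlmError(err: unknown): ProbeStatus {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes('429') ? 'degraded' : 'down';
}

console.log(statusFromLatency(1200, LLM_DEGRADED_MS)); // healthy
console.log(statusFromLatency(6000, LLM_DEGRADED_MS)); // degraded
console.log(classifyLlmError(new Error('429 rate_limit_error'))); // degraded
console.log(classifyLlmError(new Error('ECONNREFUSED'))); // down
```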
Create `runAllProbes()` as the orchestrator:
- Wrap each probe in individual try/catch (PITFALL E: one probe failure must not stop others)
- For each ProbeResult, call `HealthCheckModel.create({ service_name, status, latency_ms, error_message, probe_details, checked_at: new Date().toISOString() })`
- Return array of all ProbeResults
- Log summary via Winston logger
Export as object: `export const healthProbeService = { runAllProbes }`.
Use Winston logger for all logging. For any other Supabase access, follow the established per-method `getSupabaseServiceClient()` pattern; the Supabase probe itself must call `getPostgresPool()` directly.
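The fault-isolation requirement (PITFALL E) can be sketched as follows. This is a minimal sketch under assumptions: the `Probe` type and the stubbed probe list are hypothetical, and persistence/logging are reduced to a comment where `HealthCheckModel.create()` would be called in the real service.

```typescript
// Sketch of the fault-tolerant orchestrator. Each probe gets its own
// try/catch so one failure cannot stop the rest; a thrown probe is
// converted into a 'down' ProbeResult instead of propagating.

interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
}

type Probe = { name: string; run: () => Promise<ProbeResult> };

async function runAllProbes(probes: Probe[]): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const probe of probes) {
    const started = Date.now();
    try {
      results.push(await probe.run());
    } catch (err) {
      results.push({
        service_name: probe.name,
        status: 'down',
        latency_ms: Date.now() - started,
        error_message: err instanceof Error ? err.message : String(err),
      });
    }
    // Real service: persist each result here via
    // HealthCheckModel.create({ ...result, checked_at: new Date().toISOString() }).
  }
  return results;
}

// Example: one probe throws, the other still completes.
const probes: Probe[] = [
  { name: 'document_ai', run: async () => { throw new Error('boom'); } },
  {
    name: 'supabase',
    run: async () => ({ service_name: 'supabase', status: 'healthy', latency_ms: 12 }),
  },
];

runAllProbes(probes).then((results) => {
  console.log(results.map((r) => `${r.service_name}:${r.status}`).join(' '));
  // document_ai:down supabase:healthy
});
```

A sequential loop keeps the sketch simple; the real orchestrator could equally run the four probes concurrently (e.g. `Promise.allSettled`) as long as each failure is still mapped to a 'down' result.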
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify healthProbeService.ts exists with runAllProbes and ProbeResult exports</manual>
</verify>
<done>nodemailer installed. healthProbeService.ts exports ProbeResult interface and healthProbeService object with runAllProbes(). Four probes make real API calls. Each probe wrapped in try/catch. Results persisted via HealthCheckModel.create(). TypeScript compiles.</done>
</task>
<task type="auto">
<name>Task 2: Create healthProbeService unit tests</name>
<files>
backend/src/__tests__/unit/healthProbeService.test.ts
</files>
<action>
Create `backend/src/__tests__/unit/healthProbeService.test.ts` using the established Vitest mock pattern.
Mock all external dependencies:
- `vi.mock('../../models/HealthCheckModel')` — mock `create()` to resolve successfully
- `vi.mock('../../config/supabase')` — mock `getPostgresPool()` returning `{ query: vi.fn() }`
- `vi.mock('@google-cloud/documentai')` — mock `DocumentProcessorServiceClient` with `listProcessors` resolving
- `vi.mock('@anthropic-ai/sdk')` — mock `Anthropic` constructor, `messages.create` resolving
- `vi.mock('firebase-admin')` — mock `auth().verifyIdToken()` throwing expected error
- `vi.mock('../../utils/logger')` — mock logger
Test cases for `runAllProbes`:
1. **All probes healthy — returns 4 ProbeResults with status 'healthy'** — all mocks resolve quickly, verify 4 results returned with status 'healthy'
2. **Each result persisted via HealthCheckModel.create** — verify `HealthCheckModel.create` called 4 times with correct service_name values: 'document_ai', 'llm_api', 'supabase', 'firebase_auth'
3. **One probe throws — others still run** — make Document AI mock throw, verify 3 other probes still complete and all 4 HealthCheckModel.create calls happen (the failed probe creates a 'down' result)
4. **LLM probe 429 error returns 'degraded' not 'down'** — make Anthropic mock throw error with '429' in message, verify result status is 'degraded'
5. **Supabase probe uses getPostgresPool not getSupabaseServiceClient** — verify `getPostgresPool` was called (not getSupabaseServiceClient) during Supabase probe
6. **Firebase Auth probe — expected error = healthy** — mock verifyIdToken throwing 'Decoding Firebase ID token failed' (argument error), verify status is 'healthy'
7. **Firebase Auth probe — unexpected error = down** — mock verifyIdToken throwing network error, verify status is 'down'
8. **Latency measured correctly** — use `vi.useFakeTimers()` or verify `latency_ms` is a non-negative number
Use `beforeEach(() => vi.clearAllMocks())`.
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/healthProbeService.test.ts --reporter=verbose 2>&1</automated>
</verify>
<done>All healthProbeService tests pass. Probes verified as making real API calls (mocked). Orchestrator verified as fault-tolerant (one probe failure doesn't stop others). Results verified as persisted via HealthCheckModel.create(). Supabase probe uses getPostgresPool, not PostgREST.</done>
</task>
</tasks>
<verification>
1. `npm ls nodemailer` shows nodemailer installed
2. `npx tsc --noEmit` passes
3. `npx vitest run src/__tests__/unit/healthProbeService.test.ts` — all tests pass
4. healthProbeService.ts does NOT use getSupabaseServiceClient for the Supabase probe (uses getPostgresPool)
5. LLM probe uses 'claude-haiku-4-5' not an expensive model
</verification>
<success_criteria>
- nodemailer and @types/nodemailer installed in backend/package.json
- healthProbeService exports ProbeResult and healthProbeService.runAllProbes
- 4 probes: document_ai, llm_api, supabase, firebase_auth
- Each probe returns structured ProbeResult with status/latency_ms/error_message
- Probe results persisted via HealthCheckModel.create()
- Individual probe failures isolated (other probes still run)
- All unit tests pass
</success_criteria>
<output>
After completion, create `.planning/phases/02-backend-services/02-02-SUMMARY.md`
</output>