---
phase: 02-backend-services
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- backend/package.json
- backend/src/services/healthProbeService.ts
- backend/src/__tests__/unit/healthProbeService.test.ts
autonomous: true
requirements: [HLTH-02, HLTH-04]
must_haves:
  truths:
    - "Each probe makes a real authenticated API call (Document AI list processors, Anthropic minimal message, Supabase SELECT 1 via pg pool, Firebase Auth verifyIdToken)"
    - "Each probe returns a structured ProbeResult with service_name, status, latency_ms, and optional error_message"
    - "Probe results are persisted to Supabase via HealthCheckModel.create()"
    - "A single probe failure does not prevent other probes from running"
    - "LLM probe uses cheapest model (claude-haiku-4-5) with max_tokens 5"
    - "Supabase probe uses getPostgresPool().query('SELECT 1'), not PostgREST client"
  artifacts:
    - path: "backend/src/services/healthProbeService.ts"
      provides: "Health probe orchestrator with 4 individual probers"
      exports: ["healthProbeService", "ProbeResult"]
    - path: "backend/src/__tests__/unit/healthProbeService.test.ts"
      provides: "Unit tests for all probes and orchestrator"
      min_lines: 80
  key_links:
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/models/HealthCheckModel.ts"
      via: "HealthCheckModel.create() for persistence"
      pattern: "HealthCheckModel\\.create"
    - from: "backend/src/services/healthProbeService.ts"
      to: "backend/src/config/supabase.ts"
      via: "getPostgresPool() for Supabase probe"
      pattern: "getPostgresPool"
---
<objective>
Create the health probe service with four real API probers (Document AI, LLM, Supabase, Firebase Auth) and an orchestrator that runs all probes and persists results.
Purpose: HLTH-02 requires real authenticated API calls (not config checks), and HLTH-04 requires results to persist to Supabase. This plan builds the probe logic and persistence layer.
Output: healthProbeService.ts with 4 probers + runAllProbes orchestrator, and unit tests. Also installs nodemailer (needed by Plan 03).
</objective>
<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/01-data-foundation/01-01-SUMMARY.md
@backend/src/models/HealthCheckModel.ts
@backend/src/config/supabase.ts
@backend/src/services/documentAiProcessor.ts
@backend/src/services/llmService.ts
@backend/src/config/firebase.ts
</context>
<tasks>
<task type="auto">
<name>Task 1: Install nodemailer and create healthProbeService</name>
<files>
backend/package.json
backend/src/services/healthProbeService.ts
</files>
<action>
**Step 1: Install nodemailer** (needed by Plan 03, installing now to avoid package.json conflicts in parallel execution):
```bash
cd backend && npm install nodemailer && npm install --save-dev @types/nodemailer
```
**Step 2: Create healthProbeService.ts** with the following structure:
Export a `ProbeResult` interface:
```typescript
export interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
  probe_details?: Record<string, unknown>;
}
```
Create 4 individual probe functions (all private/unexported):
1. **probeDocumentAI()**: Import `DocumentProcessorServiceClient` from `@google-cloud/documentai`. Call `client.listProcessors({ parent: ... })` using the project ID from config. Latency > 2000ms = 'degraded'. Catch errors = 'down' with error_message.
2. **probeLLM()**: Import `Anthropic` from `@anthropic-ai/sdk`. Create client with `process.env.ANTHROPIC_API_KEY`. Call `client.messages.create({ model: 'claude-haiku-4-5', max_tokens: 5, messages: [{ role: 'user', content: 'Hi' }] })`. Use cheapest model (PITFALL B prevention). Latency > 5000ms = 'degraded'. 429 errors = 'degraded' (rate limit, not down). Other errors = 'down'.
3. **probeSupabase()**: Import `getPostgresPool` from `'../config/supabase'`. Call `pool.query('SELECT 1')`. Use direct PostgreSQL, NOT PostgREST (PITFALL C prevention). Latency > 2000ms = 'degraded'. Errors = 'down'.
4. **probeFirebaseAuth()**: Import `admin` from `firebase-admin` (or use the existing firebase config). Call `admin.auth().verifyIdToken('invalid-token-probe-check')`. This ALWAYS throws. If error message contains 'argument' or 'INVALID' = 'healthy' (SDK is alive). Other errors = 'down'.
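The latency thresholds and error classification described above can be sketched as small pure helpers. This is illustrative only: the helper names (`statusFromLatency`, `classifyLlmError`) and the threshold constant are hypothetical, not required exports of the service.

```typescript
// Illustrative sketch of probe status classification. Names here are
// hypothetical; only ProbeResult and healthProbeService are required exports.

type ProbeStatus = 'healthy' | 'degraded' | 'down';

// LLM probe degraded threshold from the plan (5000ms; Document AI and
// Supabase probes would use 2000ms).
const LLM_DEGRADED_MS = 5000;

// Map a successful call's latency to a status.
function statusFromLatency(latencyMs: number, thresholdMs: number): ProbeStatus {
  return latencyMs > thresholdMs ? 'degraded' : 'healthy';
}

// Map an LLM probe error to a status: a 429 means the API is alive but
// rate-limited ('degraded'); any other failure is 'down'.
function classifyLlmError(err: unknown): ProbeStatus {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes('429') ? 'degraded' : 'down';
}

console.log(statusFromLatency(1200, LLM_DEGRADED_MS)); // healthy
console.log(statusFromLatency(6000, LLM_DEGRADED_MS)); // degraded
console.log(classifyLlmError(new Error('429 rate_limit_error'))); // degraded
console.log(classifyLlmError(new Error('ECONNREFUSED'))); // down
```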
Create `runAllProbes()` as the orchestrator:
- Wrap each probe in individual try/catch (PITFALL E: one probe failure must not stop others)
- For each ProbeResult, call `HealthCheckModel.create({ service_name, status, latency_ms, error_message, probe_details, checked_at: new Date().toISOString() })`
- Return array of all ProbeResults
- Log summary via Winston logger
Export as object: `export const healthProbeService = { runAllProbes }`.
Use Winston logger for all logging. For any other Supabase access, follow the established per-method `getSupabaseServiceClient()` pattern; the Supabase probe itself must call `getPostgresPool()` directly.
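The fault-isolation requirement (PITFALL E) can be sketched as follows. This is a minimal sketch under assumptions: the `Probe` type and the stubbed probe list are hypothetical, and persistence/logging are reduced to a comment where `HealthCheckModel.create()` would be called in the real service.

```typescript
// Sketch of the fault-tolerant orchestrator. Each probe gets its own
// try/catch so one failure cannot stop the rest; a thrown probe is
// converted into a 'down' ProbeResult instead of propagating.

interface ProbeResult {
  service_name: string;
  status: 'healthy' | 'degraded' | 'down';
  latency_ms: number;
  error_message?: string;
}

type Probe = { name: string; run: () => Promise<ProbeResult> };

async function runAllProbes(probes: Probe[]): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const probe of probes) {
    const started = Date.now();
    try {
      results.push(await probe.run());
    } catch (err) {
      results.push({
        service_name: probe.name,
        status: 'down',
        latency_ms: Date.now() - started,
        error_message: err instanceof Error ? err.message : String(err),
      });
    }
    // Real service: persist each result here via
    // HealthCheckModel.create({ ...result, checked_at: new Date().toISOString() }).
  }
  return results;
}

// Example: one probe throws, the other still completes.
const probes: Probe[] = [
  { name: 'document_ai', run: async () => { throw new Error('boom'); } },
  {
    name: 'supabase',
    run: async () => ({ service_name: 'supabase', status: 'healthy', latency_ms: 12 }),
  },
];

runAllProbes(probes).then((results) => {
  console.log(results.map((r) => `${r.service_name}:${r.status}`).join(' '));
  // document_ai:down supabase:healthy
});
```

A sequential loop keeps the sketch simple; the real orchestrator could equally run the four probes concurrently (e.g. `Promise.allSettled`) as long as each failure is still mapped to a 'down' result.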
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify healthProbeService.ts exists with runAllProbes and ProbeResult exports</manual>
</verify>
<done>nodemailer installed. healthProbeService.ts exports ProbeResult interface and healthProbeService object with runAllProbes(). Four probes make real API calls. Each probe wrapped in try/catch. Results persisted via HealthCheckModel.create(). TypeScript compiles.</done>
</task>
<task type="auto">
<name>Task 2: Create healthProbeService unit tests</name>
<files>
backend/src/__tests__/unit/healthProbeService.test.ts
</files>
<action>
Create `backend/src/__tests__/unit/healthProbeService.test.ts` using the established Vitest mock pattern.
Mock all external dependencies:
- `vi.mock('../../models/HealthCheckModel')` — mock `create()` to resolve successfully
- `vi.mock('../../config/supabase')` — mock `getPostgresPool()` returning `{ query: vi.fn() }`
- `vi.mock('@google-cloud/documentai')` — mock `DocumentProcessorServiceClient` with `listProcessors` resolving
- `vi.mock('@anthropic-ai/sdk')` — mock `Anthropic` constructor, `messages.create` resolving
- `vi.mock('firebase-admin')` — mock `auth().verifyIdToken()` throwing expected error
- `vi.mock('../../utils/logger')` — mock logger
Test cases for `runAllProbes`:
1. **All probes healthy — returns 4 ProbeResults with status 'healthy'** — all mocks resolve quickly, verify 4 results returned with status 'healthy'
2. **Each result persisted via HealthCheckModel.create** — verify `HealthCheckModel.create` called 4 times with correct service_name values: 'document_ai', 'llm_api', 'supabase', 'firebase_auth'
3. **One probe throws — others still run** — make Document AI mock throw, verify 3 other probes still complete and all 4 HealthCheckModel.create calls happen (the failed probe creates a 'down' result)
4. **LLM probe 429 error returns 'degraded' not 'down'** — make Anthropic mock throw error with '429' in message, verify result status is 'degraded'
5. **Supabase probe uses getPostgresPool not getSupabaseServiceClient** — verify `getPostgresPool` was called (not getSupabaseServiceClient) during Supabase probe
6. **Firebase Auth probe — expected error = healthy** — mock verifyIdToken throwing 'Decoding Firebase ID token failed' (argument error), verify status is 'healthy'
7. **Firebase Auth probe — unexpected error = down** — mock verifyIdToken throwing network error, verify status is 'down'
8. **Latency measured correctly** — use `vi.useFakeTimers()` or verify `latency_ms` is a non-negative number
Use `beforeEach(() => vi.clearAllMocks())`.
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx vitest run src/__tests__/unit/healthProbeService.test.ts --reporter=verbose 2>&1</automated>
</verify>
<done>All healthProbeService tests pass. Probes verified as making real API calls (mocked). Orchestrator verified as fault-tolerant (one probe failure doesn't stop others). Results verified as persisted via HealthCheckModel.create(). Supabase probe uses getPostgresPool, not PostgREST.</done>
</task>
</tasks>
<verification>
1. `npm ls nodemailer` shows nodemailer installed
2. `npx tsc --noEmit` passes
3. `npx vitest run src/__tests__/unit/healthProbeService.test.ts` — all tests pass
4. healthProbeService.ts does NOT use getSupabaseServiceClient for the Supabase probe (uses getPostgresPool)
5. LLM probe uses 'claude-haiku-4-5' not an expensive model
</verification>
<success_criteria>
- nodemailer and @types/nodemailer installed in backend/package.json
- healthProbeService exports ProbeResult and healthProbeService.runAllProbes
- 4 probes: document_ai, llm_api, supabase, firebase_auth
- Each probe returns structured ProbeResult with status/latency_ms/error_message
- Probe results persisted via HealthCheckModel.create()
- Individual probe failures isolated (other probes still run)
- All unit tests pass
</success_criteria>
<output>
After completion, create `.planning/phases/02-backend-services/02-02-SUMMARY.md`
</output>