chore: complete v1.0 Analytics & Monitoring milestone

Archive milestone artifacts (roadmap, requirements, audit, phase directories)
to .planning/milestones/. Evolve PROJECT.md with validated requirements and
decision outcomes. Create MILESTONES.md and RETROSPECTIVE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
admin
2026-02-25 10:34:18 -05:00
parent 8bad951d63
commit 38a0f0619d
39 changed files with 299 additions and 186 deletions

@@ -0,0 +1,197 @@
---
phase: 02-backend-services
plan: 04
type: execute
wave: 3
depends_on: [02-01, 02-02, 02-03]
files_modified:
- backend/src/index.ts
autonomous: true
requirements: [HLTH-03, INFR-03]
must_haves:
  truths:
    - "runHealthProbes Cloud Function export runs on 'every 5 minutes' schedule, completely separate from processDocumentJobs"
    - "runRetentionCleanup Cloud Function export runs on 'every monday 02:00' schedule"
    - "runHealthProbes calls healthProbeService.runAllProbes() and then alertService.evaluateAndAlert()"
    - "runRetentionCleanup deletes from service_health_checks, alert_events, and document_processing_events older than 30 days"
    - "Both exports list required Firebase secrets in their secrets array"
    - "Both exports use dynamic import() pattern (same as processDocumentJobs)"
  artifacts:
    - path: "backend/src/index.ts"
      provides: "Two new onSchedule Cloud Function exports"
      exports: ["runHealthProbes", "runRetentionCleanup"]
  key_links:
    - from: "backend/src/index.ts (runHealthProbes)"
      to: "backend/src/services/healthProbeService.ts"
      via: "dynamic import('./services/healthProbeService')"
      pattern: "import\\('./services/healthProbeService'\\)"
    - from: "backend/src/index.ts (runHealthProbes)"
      to: "backend/src/services/alertService.ts"
      via: "dynamic import('./services/alertService')"
      pattern: "import\\('./services/alertService'\\)"
    - from: "backend/src/index.ts (runRetentionCleanup)"
      to: "backend/src/models/HealthCheckModel.ts"
      via: "dynamic import for deleteOlderThan(30)"
      pattern: "HealthCheckModel\\.deleteOlderThan"
    - from: "backend/src/index.ts (runRetentionCleanup)"
      to: "backend/src/services/analyticsService.ts"
      via: "dynamic import for deleteProcessingEventsOlderThan(30)"
      pattern: "deleteProcessingEventsOlderThan"
---
<objective>
Add two new Firebase Cloud Function scheduled exports to index.ts: runHealthProbes (every 5 minutes) and runRetentionCleanup (weekly).
Purpose: HLTH-03 requires health probes to run on a schedule separate from document processing (PITFALL-2). INFR-03 requires 30-day rolling data retention cleanup on schedule.
Output: Two new onSchedule exports in backend/src/index.ts.
</objective>
<execution_context>
@/home/jonathan/.claude/get-shit-done/workflows/execute-plan.md
@/home/jonathan/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-backend-services/02-RESEARCH.md
@.planning/phases/02-backend-services/02-01-PLAN.md
@.planning/phases/02-backend-services/02-02-PLAN.md
@.planning/phases/02-backend-services/02-03-PLAN.md
@backend/src/index.ts
</context>
<tasks>
<task type="auto">
<name>Task 1: Add runHealthProbes scheduled Cloud Function export</name>
<files>
backend/src/index.ts
</files>
<action>
Add a new `onSchedule` export to `backend/src/index.ts` AFTER the existing `processDocumentJobs` export. Follow the exact same pattern as `processDocumentJobs`.
```typescript
// Health probe scheduler — separate from document processing (PITFALL-2, HLTH-03)
export const runHealthProbes = onSchedule({
  schedule: 'every 5 minutes',
  timeoutSeconds: 60,
  memory: '256MiB',
  retryCount: 0, // Probes should not retry — they run again in 5 minutes anyway
  secrets: [
    anthropicApiKey, // for LLM probe
    openaiApiKey, // for OpenAI probe fallback
    databaseUrl, // for Supabase probe
    supabaseServiceKey,
    supabaseAnonKey,
  ],
}, async (_event) => {
  // Dynamic imports keep module load out of the cold-start path (same as processDocumentJobs)
  const { healthProbeService } = await import('./services/healthProbeService');
  const { alertService } = await import('./services/alertService');

  // Measure and persist first, then evaluate alerts against the fresh results
  const results = await healthProbeService.runAllProbes();
  await alertService.evaluateAndAlert(results);

  logger.info('runHealthProbes: complete', {
    probeCount: results.length,
    statuses: results.map(r => ({ service: r.service_name, status: r.status })),
  });
});
```
Key requirements:
- Use dynamic `import()` (not static import at top of file) — same pattern as processDocumentJobs
- List ALL secrets that probes need in the `secrets` array (Firebase Secrets must be explicitly listed per function)
- Use the existing `anthropicApiKey`, `openaiApiKey`, `databaseUrl`, `supabaseServiceKey`, `supabaseAnonKey` variables already defined via `defineSecret` at the top of index.ts
- Set `retryCount: 0` — probes run every 5 minutes, no need to retry failures
- First call `runAllProbes()` to measure and persist, then `evaluateAndAlert()` to check for alerts
- Log a summary with probe count and statuses
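
The probe-then-alert ordering above can be sketched as a small function with injected dependencies, which makes the call order unit-testable without Firebase. This is a minimal sketch, not the existing implementation: `ProbeResult`, `ProbeDeps`, and `runProbeCycle` are hypothetical names, and the result shape is assumed from the `service_name` and `status` fields used in the log summary.

```typescript
// Sketch only: ProbeResult, ProbeDeps, and runProbeCycle are hypothetical names
// introduced for illustration; they are not part of the existing codebase.
interface ProbeResult {
  service_name: string;
  status: string; // actual status values are defined by healthProbeService
}

interface ProbeDeps {
  runAllProbes: () => Promise<ProbeResult[]>;
  evaluateAndAlert: (results: ProbeResult[]) => Promise<void>;
  logInfo: (message: string, meta: Record<string, unknown>) => void;
}

// Runs the probes first, then alert evaluation, then logs a summary.
async function runProbeCycle(deps: ProbeDeps): Promise<ProbeResult[]> {
  const results = await deps.runAllProbes();
  await deps.evaluateAndAlert(results);
  deps.logInfo('runHealthProbes: complete', {
    probeCount: results.length,
    statuses: results.map((r) => ({ service: r.service_name, status: r.status })),
  });
  return results;
}
```

If this shape were adopted, the `onSchedule` handler would only need to perform the dynamic imports and pass the two service methods plus `logger.info` into `runProbeCycle`.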
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify index.ts has `export const runHealthProbes` as a separate export from processDocumentJobs</manual>
</verify>
<done>runHealthProbes export added to index.ts. Runs every 5 minutes. Calls healthProbeService.runAllProbes() then alertService.evaluateAndAlert(). Uses dynamic imports. Lists all required secrets. TypeScript compiles.</done>
</task>
<task type="auto">
<name>Task 2: Add runRetentionCleanup scheduled Cloud Function export</name>
<files>
backend/src/index.ts
</files>
<action>
Add a second `onSchedule` export to `backend/src/index.ts` AFTER runHealthProbes.
```typescript
// Retention cleanup — weekly, separate from document processing (PITFALL-7, INFR-03)
export const runRetentionCleanup = onSchedule({
  schedule: 'every monday 02:00',
  timeoutSeconds: 120,
  memory: '256MiB',
  secrets: [databaseUrl, supabaseServiceKey, supabaseAnonKey],
}, async (_event) => {
  const { HealthCheckModel } = await import('./models/HealthCheckModel');
  const { AlertEventModel } = await import('./models/AlertEventModel');
  const { deleteProcessingEventsOlderThan } = await import('./services/analyticsService');

  const RETENTION_DAYS = 30;

  // The three deletes touch different tables, so they can run in parallel
  const [hcCount, alertCount, eventCount] = await Promise.all([
    HealthCheckModel.deleteOlderThan(RETENTION_DAYS),
    AlertEventModel.deleteOlderThan(RETENTION_DAYS),
    deleteProcessingEventsOlderThan(RETENTION_DAYS),
  ]);

  logger.info('runRetentionCleanup: complete', {
    retentionDays: RETENTION_DAYS,
    deletedHealthChecks: hcCount,
    deletedAlerts: alertCount,
    deletedProcessingEvents: eventCount,
  });
});
```
Key requirements:
- Use dynamic `import()` for all model and service imports
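- Run all three deletes in parallel with `Promise.all()` (they touch different tables). Note that `Promise.all()` rejects as soon as any one delete fails, so the summary log is skipped in that case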
- Only include the secrets needed for Supabase access (no LLM keys needed for cleanup)
- Set `timeoutSeconds: 120` (cleanup may take longer than probes)
- The 30-day retention period is a constant, not configurable via env (matches INFR-03 spec)
- Only manage monitoring tables: service_health_checks, alert_events, document_processing_events. Do NOT delete from performance_metrics, session_events, or execution_events (those are agentic RAG tables, out of scope per research Open Question 4)
- Log the count of deleted rows from each table
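
To make the 30-day rolling window concrete, the cutoff arithmetic can be sketched as below. This is an illustration under assumptions: `retentionCutoff` and `isExpired` are hypothetical helpers, each row is assumed to carry a creation timestamp, and the real `deleteOlderThan` implementations may compute the cutoff differently (for example, inside the SQL query).

```typescript
// Sketch only: retentionCutoff and isExpired are hypothetical helpers for
// illustration; the real deleteOlderThan implementations may differ.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the instant before which rows fall outside the retention window.
function retentionCutoff(now: Date, retentionDays: number): Date {
  if (!Number.isInteger(retentionDays) || retentionDays <= 0) {
    throw new RangeError('retentionDays must be a positive integer');
  }
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// A row is eligible for deletion only if it is strictly older than the cutoff.
function isExpired(createdAt: Date, now: Date, retentionDays: number): boolean {
  return createdAt.getTime() < retentionCutoff(now, retentionDays).getTime();
}
```

Working in UTC milliseconds sidesteps daylight-saving shifts; whether a row created exactly at the cutoff is kept or deleted is a design choice, and this sketch keeps it.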
</action>
<verify>
<automated>cd /home/jonathan/Coding/cim_summary/backend && npx tsc --noEmit --pretty 2>&1 | head -30</automated>
<manual>Verify index.ts has `export const runRetentionCleanup` as a separate export. Verify it calls `deleteOlderThan` on both models and `deleteProcessingEventsOlderThan`, so that all three tables are covered.</manual>
</verify>
<done>runRetentionCleanup export added to index.ts. Runs weekly Monday 02:00. Deletes from service_health_checks, alert_events, and document_processing_events older than 30 days. Uses Promise.all for parallel execution. Logs deletion counts. TypeScript compiles.</done>
</task>
</tasks>
<verification>
1. `npx tsc --noEmit` passes
2. `grep 'export const runHealthProbes' backend/src/index.ts` returns a match
3. `grep 'export const runRetentionCleanup' backend/src/index.ts` returns a match
4. Both exports use `onSchedule` (not piggybacked on processDocumentJobs — PITFALL-2 compliance)
5. Both exports use dynamic `import()` pattern
6. Full test suite still passes: `npx vitest run --reporter=verbose`
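
The grep-style checks above could also be expressed as a small source-text checker, for instance inside a unit test. This is a rough sketch: `checkScheduledExports` and `CheckResult` are hypothetical names, and the patterns are adapted from the `key_links` patterns in the front matter rather than taken from any existing tooling.

```typescript
// Sketch only: checkScheduledExports is a hypothetical helper for illustration.
interface CheckResult {
  name: string;
  passed: boolean;
}

// Runs each pattern against the text of index.ts and reports pass/fail per check.
function checkScheduledExports(source: string): CheckResult[] {
  const checks: Array<[string, RegExp]> = [
    ['runHealthProbes export', /export const runHealthProbes\s*=\s*onSchedule\(/],
    ['runRetentionCleanup export', /export const runRetentionCleanup\s*=\s*onSchedule\(/],
    ['healthProbeService dynamic import', /import\('\.\/services\/healthProbeService'\)/],
    ['alertService dynamic import', /import\('\.\/services\/alertService'\)/],
    ['HealthCheckModel.deleteOlderThan call', /HealthCheckModel\.deleteOlderThan/],
    ['deleteProcessingEventsOlderThan call', /deleteProcessingEventsOlderThan/],
  ];
  return checks.map(([name, pattern]) => ({ name, passed: pattern.test(source) }));
}
```

A text match only shows that the expected code is present; it does not replace the `tsc` and test-suite steps, which check that the code compiles and behaves.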
</verification>
<success_criteria>
- runHealthProbes is a separate onSchedule export running every 5 minutes
- runRetentionCleanup is a separate onSchedule export running weekly Monday 02:00
- Both are completely decoupled from processDocumentJobs
- runHealthProbes calls runAllProbes() then evaluateAndAlert()
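- runRetentionCleanup calls HealthCheckModel.deleteOlderThan(30), AlertEventModel.deleteOlderThan(30), and deleteProcessingEventsOlderThan(30), covering all three monitoring tables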
- All required Firebase secrets listed in each function's secrets array
- TypeScript compiles with no errors
- Existing test suite passes with no regressions
</success_criteria>
<output>
After completion, create `.planning/phases/02-backend-services/02-04-SUMMARY.md`
</output>