cim_summary/.planning/phases/03-api-layer/03-RESEARCH.md
2026-02-24 15:33:12 -05:00


Phase 3: API Layer - Research

Researched: 2026-02-24
Domain: Express.js admin route construction, Firebase Auth middleware, Supabase analytics queries
Confidence: HIGH


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Response shape & contracts

  • Analytics endpoint accepts a configurable time range via query param (e.g., ?range=24h, ?range=7d) with a sensible default
  • Field naming convention: match whatever the existing codebase already uses (camelCase or snake_case) — stay consistent

Auth & error behavior

  • Non-admin users receive 404 on admin endpoints — do not reveal that admin routes exist
  • Unauthenticated requests: Claude decides whether to return 401 or same 404 based on existing auth middleware patterns

Analytics instrumentation

  • Best-effort with logging: emit events asynchronously, log failures, but never let instrumentation errors propagate to processing
  • Key milestones only — upload started, processing complete, processing failed (not every pipeline stage)
  • Include duration/timing data per event — enables avg processing time metric in the analytics endpoint

Endpoint conventions

  • Route prefix: match existing Express app patterns
  • Acknowledge semantics: Claude decides (one-way, toggle, or with note — whatever fits best)

Claude's Discretion

  • Envelope pattern vs direct data for API responses
  • Health endpoint detail level (flat status vs nested with last-check times)
  • Admin role mechanism (Firebase custom claims vs Supabase role check vs other)
  • Unauthenticated request handling (401 vs 404)
  • Alert pagination strategy
  • Alert filtering support
  • Rate limiting on admin endpoints

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope </user_constraints>


<phase_requirements>

Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| INFR-02 | Admin API routes protected by Firebase Auth with admin email check | Firebase Auth verifyFirebaseToken middleware exists; need a requireAdmin layer that checks req.user.email against process.env.EMAIL_WEEKLY_RECIPIENT (already configured for alerts) or a dedicated ADMIN_EMAIL env var |
| HLTH-01 | Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth | HealthCheckModel.findLatestByService() already exists; need a query across all four service names or a loop; service names must match what healthProbeService writes |
| ANLY-02 | Admin can view processing summary: upload counts, success/failure rates, avg processing time | document_processing_events table exists with event_type, duration_ms, created_at; need a Supabase aggregation query grouped by event_type over a time window; recordProcessingEvent() must be called from jobProcessorService.processJob() (not yet called there) |
</phase_requirements>

Summary

Phase 3 is entirely additive — it exposes data from Phase 1 and Phase 2 via admin-protected HTTP endpoints, and instruments the existing jobProcessorService.processJob() method with fire-and-forget analytics calls. No database schema changes are needed; all tables and models exist.

The three technical sub-problems are: (1) a two-layer auth middleware — Firebase token verification (existing verifyFirebaseToken) plus an admin email check (new, 5-10 lines); (2) three new route handlers reading from HealthCheckModel, AlertEventModel, and a new getAnalyticsSummary() function in analyticsService; and (3) inserting recordProcessingEvent() calls at three points inside processJob() without altering success/failure semantics.

The codebase is well-factored and consistent: route files live in backend/src/routes/, middleware in backend/src/middleware/, service functions in backend/src/services/. The existing verifyFirebaseToken middleware plus a new requireAdminEmail middleware compose cleanly onto the new /admin router. The existing { success: true, data: ..., correlationId: ... } envelope is the established pattern and should be followed.

Primary recommendation: Add adminRoutes.ts to the existing routes directory, mount it at /admin in index.ts, compose verifyFirebaseToken + requireAdminEmail as router-level middleware, and wire three handlers to existing model/service methods. Instrument processJob() at job-start, completion, and failure using the existing recordProcessingEvent() signature.


Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| express | already in use | Router, Request/Response types | Project standard |
| firebase-admin | already in use | Token verification (verifyIdToken) | Existing auth layer |
| @supabase/supabase-js | already in use | Database reads via getSupabaseServiceClient() | Project data layer |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| (none new) | n/a | All needed libraries already present | No new npm installs required |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Email-based admin check | Firebase custom claims | Custom claims require a Firebase Admin SDK setCustomUserClaims() call — more setup; the email check works with zero additional config since EMAIL_WEEKLY_RECIPIENT is already defined |
| Email-based admin check | Supabase role column | Cross-system lookup adds latency and a new dependency; the email check is synchronous against the already-decoded token |

Installation: No new packages needed.


Architecture Patterns

backend/src/
├── routes/
│   ├── admin.ts           # NEW — /admin router with health, analytics, alerts endpoints
│   ├── documents.ts       # existing
│   ├── monitoring.ts      # existing
│   └── ...
├── middleware/
│   ├── firebaseAuth.ts    # existing — verifyFirebaseToken
│   ├── requireAdmin.ts    # NEW — requireAdminEmail middleware (10-15 lines)
│   └── ...
├── services/
│   ├── analyticsService.ts  # extend — add getAnalyticsSummary() query function
│   ├── jobProcessorService.ts  # modify — add recordProcessingEvent() calls
│   └── ...
└── index.ts               # modify — mount /admin routes

Pattern 1: Two-Layer Admin Auth Middleware

What: verifyFirebaseToken handles token signature + expiry; requireAdminEmail checks that req.user.email equals the configured admin email. Admin routes apply both in sequence.

When to use: All /admin/* routes.

Example:

// backend/src/middleware/requireAdmin.ts
import { Response, NextFunction } from 'express';
import { FirebaseAuthenticatedRequest } from './firebaseAuth';
import { logger } from '../utils/logger';

export function requireAdminEmail(
  req: FirebaseAuthenticatedRequest,
  res: Response,
  next: NextFunction
): void {
  // Read inside the function, not at module level — Firebase Secrets may not be
  // available at module load time (see Anti-Patterns below)
  const adminEmail = process.env['ADMIN_EMAIL'] ?? process.env['EMAIL_WEEKLY_RECIPIENT'];
  const userEmail = req.user?.email;

  // Fail closed if no admin email is configured
  if (!adminEmail || !userEmail || userEmail !== adminEmail) {
    // 404 — do not reveal admin routes exist (per locked decision)
    logger.warn('requireAdminEmail: access denied', {
      uid: req.user?.uid ?? 'unauthenticated',
      email: userEmail ?? 'none',
      path: req.path,
    });
    res.status(404).json({ success: false, error: 'Not found' });
    return;
  }

  next();
}

Unauthenticated handling: verifyFirebaseToken already returns 401 for missing/invalid tokens. Since it runs first, unauthenticated requests never reach requireAdminEmail. The 404 behavior (hiding admin routes) only applies to authenticated non-admin users — this is consistent with the existing middleware chain. No change needed to verifyFirebaseToken.
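The ordering argument above can be sketched without Express at all. This is a minimal simulation of the two-layer chain using stub types; `verifyToken` and `requireAdmin` are hypothetical stand-ins for the real verifyFirebaseToken / requireAdminEmail, and the admin email is an assumed placeholder:

```typescript
// Minimal middleware-chain simulation (no Express dependency).
type Req = { token?: string; user?: { email: string } };
type Res = { statusCode?: number };
type Middleware = (req: Req, res: Res, next: () => void) => void;

const ADMIN = 'admin@example.com'; // assumed admin email for the sketch

// Stand-in for verifyFirebaseToken: 401 on missing token, else decode and continue
const verifyToken: Middleware = (req, res, next) => {
  if (!req.token) { res.statusCode = 401; return; }
  req.user = { email: req.token }; // pretend the token decodes to an email
  next();
};

// Stand-in for requireAdminEmail: hide the route (404) from non-admins
const requireAdmin: Middleware = (req, res, next) => {
  if (req.user?.email !== ADMIN) { res.statusCode = 404; return; }
  next();
};

// Run a request through the chain; 200 means it reached the handler
function run(chain: Middleware[], req: Req): number {
  const res: Res = {};
  let i = 0;
  const next = () => {
    const mw = chain[i++];
    if (mw) mw(req, res, next);
    else res.statusCode = 200;
  };
  next();
  return res.statusCode ?? 0;
}

const chain = [verifyToken, requireAdmin];
console.log(run(chain, {}));                      // 401 — no token
console.log(run(chain, { token: 'user@x.com' })); // 404 — authenticated non-admin
console.log(run(chain, { token: ADMIN }));        // 200 — admin reaches handler
```

Because verifyToken short-circuits first, requireAdmin never sees an unauthenticated request — matching the production chain.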

Pattern 2: Admin Router Construction

What: A dedicated Express Router with both middleware applied at router level, then individual route handlers.

When to use: All admin endpoints.

Example:

// backend/src/routes/admin.ts
import { Router, Request, Response } from 'express';
import { verifyFirebaseToken } from '../middleware/firebaseAuth';
import { requireAdminEmail } from '../middleware/requireAdmin';
import { addCorrelationId } from '../middleware/validation';
import { HealthCheckModel } from '../models/HealthCheckModel';
import { AlertEventModel } from '../models/AlertEventModel';
import { getAnalyticsSummary } from '../services/analyticsService';
import { logger } from '../utils/logger';

const router = Router();

// Auth chain: verify Firebase token, then assert admin email
router.use(verifyFirebaseToken);
router.use(requireAdminEmail);
router.use(addCorrelationId);

const SERVICE_NAMES = ['document_ai', 'llm', 'supabase', 'firebase_auth'] as const;

router.get('/health', async (req: Request, res: Response): Promise<void> => {
  try {
    const results = await Promise.all(
      SERVICE_NAMES.map(name => HealthCheckModel.findLatestByService(name))
    );
    const health = SERVICE_NAMES.map((name, i) => ({
      service: name,
      status: results[i]?.status ?? 'unknown',
      checkedAt: results[i]?.checked_at ?? null,
      latencyMs: results[i]?.latency_ms ?? null,
      errorMessage: results[i]?.error_message ?? null,
    }));
    res.json({ success: true, data: health, correlationId: req.correlationId });
  } catch (error) {
    logger.error('GET /admin/health failed', { error, correlationId: req.correlationId });
    res.status(500).json({ success: false, error: 'Health query failed', correlationId: req.correlationId });
  }
});
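If the dashboard wants a single overall status alongside the per-service list, a small rollup helper could derive it. This is an optional sketch, not part of the locked requirements; the status strings assume what healthProbeService writes ('healthy' | 'degraded' | 'down'), with 'unknown' (no row found) treated conservatively as degraded:

```typescript
type ServiceStatus = 'healthy' | 'degraded' | 'down' | 'unknown';

// Worst status wins: any 'down' makes the system down; any 'degraded' or
// missing data ('unknown') makes it degraded; otherwise healthy.
function rollupStatus(statuses: ServiceStatus[]): ServiceStatus {
  if (statuses.some(s => s === 'down')) return 'down';
  if (statuses.some(s => s === 'degraded' || s === 'unknown')) return 'degraded';
  return 'healthy';
}

console.log(rollupStatus(['healthy', 'healthy', 'healthy', 'healthy'])); // healthy
console.log(rollupStatus(['healthy', 'unknown', 'healthy', 'healthy'])); // degraded
console.log(rollupStatus(['healthy', 'down', 'degraded', 'healthy']));   // down
```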

Pattern 3: Analytics Summary Query

What: A new getAnalyticsSummary(range: string) function in analyticsService.ts that queries document_processing_events aggregated over a time window. Supabase JS client does not support COUNT/AVG aggregations directly — use the Postgres pool (getPostgresPool().query()) for aggregate SQL, consistent with how runRetentionCleanup and the scheduled function's health check already use the pool.

When to use: GET /admin/analytics?range=24h

Range parsing: 24h24 hours, 7d7 days. Default: 24h.

Example:

// backend/src/services/analyticsService.ts (addition)
import { getPostgresPool } from '../config/supabase';

export interface AnalyticsSummary {
  range: string;
  totalUploads: number;
  succeeded: number;
  failed: number;
  successRate: number;
  avgProcessingMs: number | null;
  generatedAt: string;
}

export async function getAnalyticsSummary(range: string): Promise<AnalyticsSummary> {
  const interval = parseRange(range); // '24h' -> '24 hours', '7d' -> '7 days'
  const pool = getPostgresPool();

  const { rows } = await pool.query<{
    total_uploads: string;
    succeeded: string;
    failed: string;
    avg_processing_ms: string | null;
  }>(`
    SELECT
      COUNT(*) FILTER (WHERE event_type = 'upload_started')    AS total_uploads,
      COUNT(*) FILTER (WHERE event_type = 'completed')         AS succeeded,
      COUNT(*) FILTER (WHERE event_type = 'failed')            AS failed,
      AVG(duration_ms) FILTER (WHERE event_type = 'completed') AS avg_processing_ms
    FROM document_processing_events
    WHERE created_at >= NOW() - $1::interval  -- cast the parameter; a placeholder cannot follow the INTERVAL keyword
  `, [interval]);

  const row = rows[0]!;
  const total = parseInt(row.total_uploads, 10);
  const succeeded = parseInt(row.succeeded, 10);
  const failed = parseInt(row.failed, 10);

  return {
    range,
    totalUploads: total,
    succeeded,
    failed,
    successRate: total > 0 ? succeeded / total : 0,
    avgProcessingMs: row.avg_processing_ms ? parseFloat(row.avg_processing_ms) : null,
    generatedAt: new Date().toISOString(),
  };
}

function parseRange(range: string): string {
  if (/^\d+h$/.test(range)) return range.replace('h', ' hours');
  if (/^\d+d$/.test(range)) return range.replace('d', ' days');
  return '24 hours'; // fallback
}
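Note that the regex-based parseRange above accepts arbitrarily large values (e.g. ?range=999999h). An optional hardening sketch, not part of the current plan, is to whitelist the accepted ranges so nothing unexpected reaches SQL; the allowed keys here are assumptions:

```typescript
// Whitelist variant: only known ranges map to an interval; everything else
// falls back to the same default as parseRange.
const ALLOWED_RANGES: Record<string, string> = {
  '1h': '1 hours',
  '24h': '24 hours',
  '7d': '7 days',
  '30d': '30 days',
};

function parseRangeStrict(range: string): string {
  return ALLOWED_RANGES[range] ?? '24 hours';
}

console.log(parseRangeStrict('7d'));      // "7 days"
console.log(parseRangeStrict('999999h')); // "24 hours" — rejected, falls back
```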

Pattern 4: Analytics Instrumentation in jobProcessorService

What: Three recordProcessingEvent() calls in processJob() at existing lifecycle points. The function signature already matches — document_id, user_id, event_type, optional duration_ms and error_message. The return type is void (not Promise<void>), so there is nothing to await and callers cannot accidentally block on the insert.

Key instrumentation points:

  1. After ProcessingJobModel.markAsProcessing(jobId) — emit upload_started (no duration)
  2. After ProcessingJobModel.markAsCompleted(...) — emit completed with duration_ms = Date.now() - startTime
  3. In the catch block before ProcessingJobModel.markAsFailed(...) — emit failed with duration_ms and error_message

Example:

// In processJob(), after markAsProcessing:
recordProcessingEvent({
  document_id: job.document_id,
  user_id: job.user_id,
  event_type: 'upload_started',
});

// After markAsCompleted:
recordProcessingEvent({
  document_id: job.document_id,
  user_id: job.user_id,
  event_type: 'completed',
  duration_ms: Date.now() - startTime,
});

// In catch, before markAsFailed (job may be null here — see constraint below):
if (job) {
  recordProcessingEvent({
    document_id: job.document_id,
    user_id: job.user_id ?? '',
    event_type: 'failed',
    duration_ms: Date.now() - startTime,
    error_message: errorMessage,
  });
}

Constraint: job may be null in the catch block if findById failed. Guard with job?.document_id or skip instrumentation when job is null (it's already handled by the early return in that case).
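The locked decision says instrumentation errors must never propagate into processing. recordProcessingEvent already swallows async insert errors, but a synchronous throw (e.g. a misconfigured client) would still escape. A hedged sketch of a tiny guard covering that path — `safeRecord` and `logWarn` are hypothetical names, with logWarn standing in for the project's logger.warn:

```typescript
// Wrap an instrumentation call so a synchronous throw is logged, never thrown.
function safeRecord(emit: () => void, logWarn: (msg: string) => void): void {
  try {
    emit();
  } catch (err) {
    logWarn(
      `analytics instrumentation failed: ${err instanceof Error ? err.message : String(err)}`
    );
  }
}

// Usage: processing continues even when the emitter throws.
const warnings: string[] = [];
safeRecord(() => { throw new Error('supabase client not ready'); }, m => warnings.push(m));
console.log(warnings.length); // 1 — logged, not propagated
```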

Pattern 5: Alert Acknowledge Semantics

Decision: One-way acknowledge (active → acknowledged). AlertEventModel.acknowledge(id) already implements exactly this. No toggle, no note field. The endpoint returns the updated alert object.

router.post('/alerts/:id/acknowledge', async (req: Request, res: Response): Promise<void> => {
  const { id } = req.params;
  try {
    const updated = await AlertEventModel.acknowledge(id);
    res.json({ success: true, data: updated, correlationId: req.correlationId });
  } catch (error) {
    // AlertEventModel.acknowledge throws a specific error when id not found
    const msg = error instanceof Error ? error.message : String(error);
    if (msg.includes('not found')) {
      res.status(404).json({ success: false, error: 'Alert not found', correlationId: req.correlationId });
      return;
    }
    logger.error('POST /admin/alerts/:id/acknowledge failed', { id, error: msg });
    res.status(500).json({ success: false, error: 'Acknowledge failed', correlationId: req.correlationId });
  }
});

Anti-Patterns to Avoid

  • Awaiting recordProcessingEvent(): Its return type is void, not Promise<void>. Writing await recordProcessingEvent(...) has no effect (there is no promise to wait on), is flagged by lint rules such as @typescript-eslint/await-thenable, and signals a misunderstanding of the fire-and-forget contract.
  • Supabase JS .select() for aggregates: Supabase JS client does not support SQL aggregate functions (COUNT, AVG). Use getPostgresPool().query() for analytics queries.
  • Caching admin email at module level: Firebase Secrets are not available at module load time. Read process.env['ADMIN_EMAIL'] inside the middleware function, not at the top of the file — or use lazy evaluation. The alertService precedent (creating transporter inside function scope) demonstrates this pattern.
  • Revealing admin routes to non-admin users: Never return 403 on admin routes — always return 404 (per locked decision). Since verifyFirebaseToken runs first, unauthenticated callers get 401 (token verification precedes the admin check); authenticated non-admin callers get 404.
  • Mutating existing processJob() logic: Analytics calls go around existing markAsProcessing, markAsCompleted, markAsFailed calls — never replacing or wrapping them.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Token verification | Custom JWT validation | verifyFirebaseToken (already exists) | Handles expiry, revocation, recovery from session |
| Health data retrieval | Raw SQL or in-memory aggregation | HealthCheckModel.findLatestByService() (already exists) | Validated input, proper error handling, same pattern as Phase 2 |
| Alert CRUD | New Supabase queries | AlertEventModel.findActive(), AlertEventModel.acknowledge() (already exist) | Consistent error handling, deduplication-aware |
| Correlation IDs | Custom header logic | addCorrelationId middleware (already exists) | Applied at router level like other route files |

Key insight: Phase 3 is primarily composition, not construction. Nearly all data access is through existing models. The only new code is the admin router, the admin email middleware, the getAnalyticsSummary() function, and three recordProcessingEvent() call sites.


Common Pitfalls

Pitfall 1: Admin Email Source

What goes wrong: ADMIN_EMAIL is not defined, so the admin check either silently blocks all access or, if written carelessly (e.g. comparing against undefined), silently passes everyone.
Why it happens: The codebase uses EMAIL_WEEKLY_RECIPIENT for the alert recipient — there is no ADMIN_EMAIL variable yet. If neither is set, the comparison target is undefined and the behavior depends entirely on how the check is written.
How to avoid: Read ADMIN_EMAIL ?? EMAIL_WEEKLY_RECIPIENT as a fallback chain. If neither is set, fail closed (deny all admin access) and emit a logger.warn.
Warning signs: Admin endpoints return 404 even when authenticated with the correct email.
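A sketch of the fail-closed lookup for this pitfall. The env var names follow the document (ADMIN_EMAIL with EMAIL_WEEKLY_RECIPIENT fallback); resolveAdminEmail itself is a hypothetical helper, called inside the middleware rather than at module load:

```typescript
// Returns the configured admin email, or null when nothing usable is set.
// A null result means: deny all admin access (fail closed) and log a warning.
function resolveAdminEmail(env: Record<string, string | undefined>): string | null {
  const email = env['ADMIN_EMAIL'] ?? env['EMAIL_WEEKLY_RECIPIENT'] ?? null;
  return email && email.length > 0 ? email : null;
}

console.log(resolveAdminEmail({ ADMIN_EMAIL: 'admin@x.com' }));          // admin@x.com
console.log(resolveAdminEmail({ EMAIL_WEEKLY_RECIPIENT: 'ops@x.com' })); // ops@x.com (fallback)
console.log(resolveAdminEmail({}));                                      // null — fail closed
```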

Pitfall 2: Service Name Mismatch on Health Endpoint

What goes wrong: GET /admin/health returns status: null / checkedAt: null for all services because the service names in the query don't match what healthProbeService writes.
Why it happens: HealthCheckModel.findLatestByService(serviceName) does an exact string match. If the route handler uses 'document-ai' but the probe writes 'document_ai', the lookup finds nothing.
How to avoid: Read healthProbeService.ts to confirm the exact service name strings passed to HealthCheckModel.create(). Use those exact strings in the admin route.
Warning signs: Response data has status: 'unknown' for all services.

Pitfall 3: job.user_id Type in Analytics Instrumentation

What goes wrong: TypeScript error or runtime undefined when emitting recordProcessingEvent in the catch block.
Why it happens: job can be null if ProcessingJobModel.findById() threw before job was assigned. The catch block handles all errors, including that pre-assignment path.
How to avoid: Guard instrumentation with if (job) in the catch block. ProcessingEventData.user_id is typed as string, so pass job.user_id only when job is non-null.
Warning signs: TypeScript compile error on job.user_id in the catch block.

Pitfall 4: getPostgresPool() vs getSupabaseServiceClient() for Aggregates

What goes wrong: Using getSupabaseServiceClient().from('document_processing_events').select(...) for the analytics summary and getting back raw rows instead of aggregated counts.
Why it happens: The Supabase JS PostgREST client does not support SQL aggregate functions (COUNT, AVG) in its query builder.
How to avoid: Use getPostgresPool().query(sql, params) for the analytics aggregate query, consistent with how the processDocumentJobs scheduled function performs its DB health check and how cleanupOldData runs bulk deletes.
Warning signs: getAnalyticsSummary returns row-level data instead of aggregated counts.

Pitfall 5: Route Registration Order in index.ts

What goes wrong: Admin routes conflict with or shadow existing routes.
Why it happens: Express matches routes in registration order. Registering /admin before /documents is fine as long as there are no overlapping paths.
How to avoid: Add app.use('/admin', adminRoutes) alongside the existing route registrations. The /admin prefix is unique — no conflicts expected.
Warning signs: Existing document/monitoring routes stop working after adding admin routes.


Code Examples

Verified patterns from the existing codebase:

Existing Route File Pattern (from routes/monitoring.ts)

// Source: backend/src/routes/monitoring.ts
import { Router, Request, Response } from 'express';
import { addCorrelationId } from '../middleware/validation';
import { logger } from '../utils/logger';

const router = Router();
router.use(addCorrelationId);

router.get('/some-endpoint', async (req: Request, res: Response): Promise<void> => {
  try {
    // ... data access
    res.json({
      success: true,
      data: someData,
      correlationId: req.correlationId || undefined,
    });
  } catch (error) {
    logger.error('Failed', {
      category: 'monitoring',
      operation: 'some_op',
      error: error instanceof Error ? error.message : 'Unknown error',
      correlationId: req.correlationId || undefined,
    });
    res.status(500).json({
      success: false,
      error: 'Failed to retrieve data',
      correlationId: req.correlationId || undefined,
    });
  }
});

export default router;

Existing Middleware Pattern (from middleware/firebaseAuth.ts)

// Source: backend/src/middleware/firebaseAuth.ts
export interface FirebaseAuthenticatedRequest extends Request {
  user?: admin.auth.DecodedIdToken;
}

export const verifyFirebaseToken = async (
  req: FirebaseAuthenticatedRequest,
  res: Response,
  next: NextFunction
): Promise<void> => {
  // ... verifies token, sets req.user, calls next() or returns 401
};

Existing Model Pattern (from models/HealthCheckModel.ts)

// Source: backend/src/models/HealthCheckModel.ts
static async findLatestByService(serviceName: string): Promise<ServiceHealthCheck | null> {
  const supabase = getSupabaseServiceClient();
  const { data, error } = await supabase
    .from('service_health_checks')
    .select('*')
    .eq('service_name', serviceName)
    .order('checked_at', { ascending: false })
    .limit(1)
    .single();
  if (error?.code === 'PGRST116') return null;
  // ...
}

Existing Analytics Record Pattern (from services/analyticsService.ts)

// Source: backend/src/services/analyticsService.ts
// Return type is void (NOT Promise<void>) — prevents accidental await on critical path
export function recordProcessingEvent(data: ProcessingEventData): void {
  const supabase = getSupabaseServiceClient();
  void supabase
    .from('document_processing_events')
    .insert({ ... })
    .then(({ error }) => {
      if (error) logger.error('analyticsService: failed to insert processing event', { ... });
    });
}

Route Registration Pattern (from index.ts)

// Source: backend/src/index.ts
app.use('/documents', documentRoutes);
app.use('/vector', vectorRoutes);
app.use('/monitoring', monitoringRoutes);
app.use('/api/audit', auditRoutes);
// New:
app.use('/admin', adminRoutes);

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Legacy auth middleware (auth.ts) | Firebase Auth (firebaseAuth.ts) | Pre-Phase 3 | auth.ts is fully deprecated and returns 501 — do not use it |
| In-memory monitoring (uploadMonitoringService) | Supabase-persisted health checks and analytics | Phase 1-2 | Admin endpoints must read from Supabase, not in-memory state |
| Direct console.log | Winston logger (logger from utils/logger.ts) | Pre-Phase 3 | Always use logger.info/warn/error/debug |

Deprecated/outdated:

  • backend/src/middleware/auth.ts: All exports (authenticateToken, requireAdmin, requireRole) return 501. Do not import. Use firebaseAuth.ts.
  • uploadMonitoringService: In-memory service. Not suitable for admin health dashboard — data does not survive cold starts.

Open Questions

  1. Exact service name strings written by healthProbeService

    • What we know: The service names come from whatever healthProbeService.ts passes to HealthCheckModel.create({ service_name: ... })
    • What's unclear: The exact strings — likely 'document_ai', 'llm', 'supabase', 'firebase_auth' but must be verified before writing the health handler
    • Recommendation: Read healthProbeService.ts during plan/implementation to confirm exact strings before writing SERVICE_NAMES constant in the admin route
  2. job.user_id field type confirmation

    • What we know: ProcessingEventData.user_id is typed as string; ProcessingJob model has user_id field
    • What's unclear: Whether ProcessingJob.user_id can ever be undefined/nullable in practice
    • Recommendation: Check ProcessingJobModel type definition during implementation; add defensive ?? '' if nullable
  3. Alert pagination for GET /admin/alerts

    • What we know: AlertEventModel.findActive() returns all active alerts without limit; for a single-admin system this is unlikely to be an issue
    • What's unclear: Whether a limit/offset param is needed
    • Recommendation: Claude's discretion — default to returning all active alerts (no pagination) given single-admin use case; add ?limit=N support as optional param using .limit() on the Supabase query
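If the optional ?limit=N param is added, it should be parsed defensively so a bad value never reaches the Supabase query. A hypothetical sketch; the default and cap values are assumptions, not from the plan:

```typescript
// Parse an optional limit query param: invalid or missing values fall back to a
// default, and oversized values are clamped to a cap.
function clampLimit(raw: string | undefined, fallback = 100, max = 500): number {
  const n = Number.parseInt(raw ?? '', 10);
  if (Number.isNaN(n) || n < 1) return fallback;
  return Math.min(n, max);
}

console.log(clampLimit(undefined)); // 100 — default when param absent
console.log(clampLimit('25'));      // 25
console.log(clampLimit('9999'));    // 500 — clamped to cap
console.log(clampLimit('abc'));     // 100 — invalid input falls back
```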

Sources

Primary (HIGH confidence)

  • Codebase: backend/src/middleware/firebaseAuth.ts — verifyFirebaseToken implementation, FirebaseAuthenticatedRequest interface, 401 error responses
  • Codebase: backend/src/models/HealthCheckModel.ts — findLatestByService, findAll, deleteOlderThan patterns
  • Codebase: backend/src/models/AlertEventModel.ts — findActive, acknowledge, resolve, findRecentByService patterns
  • Codebase: backend/src/services/analyticsService.ts — recordProcessingEvent (void return), deleteProcessingEventsOlderThan (pool.query pattern)
  • Codebase: backend/src/services/jobProcessorService.ts — processJob lifecycle: startTime capture, markAsProcessing, markAsCompleted, markAsFailed, catch block structure
  • Codebase: backend/src/routes/monitoring.ts — route file pattern, envelope shape { success, data, correlationId }
  • Codebase: backend/src/index.ts — route registration, Express app structure, existing /health endpoint shape
  • Codebase: backend/src/models/migrations/012_create_monitoring_tables.sql — exact column names for service_health_checks, alert_events
  • Codebase: backend/src/models/migrations/013_create_processing_events_table.sql — exact column names for document_processing_events

Secondary (MEDIUM confidence)

  • Codebase: backend/src/services/alertService.ts — pattern for reading process.env['EMAIL_WEEKLY_RECIPIENT'] inside function (not at module level) to avoid Firebase Secrets timing issue

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all libraries already in use; no new dependencies
  • Architecture: HIGH — patterns derived from existing codebase, not assumptions
  • Pitfalls: HIGH — three of five pitfalls are directly observable from reading the existing code
  • Open questions: LOW confidence only on exact service name strings (requires reading one more file)

Research date: 2026-02-24
Valid until: 2026-03-24 (stable codebase; valid until significant refactoring)