Phase 3: API Layer - Research
Researched: 2026-02-24 Domain: Express.js admin route construction, Firebase Auth middleware, Supabase analytics queries Confidence: HIGH
<user_constraints>
User Constraints (from CONTEXT.md)
Locked Decisions
Response shape & contracts
- Analytics endpoint accepts a configurable time range via query param (e.g., ?range=24h, ?range=7d) with a sensible default
- Field naming convention: match whatever the existing codebase already uses (camelCase or snake_case) — stay consistent
Auth & error behavior
- Non-admin users receive 404 on admin endpoints — do not reveal that admin routes exist
- Unauthenticated requests: Claude decides whether to return 401 or the same 404, based on existing auth middleware patterns
Analytics instrumentation
- Best-effort with logging: emit events asynchronously, log failures, but never let instrumentation errors propagate to processing
- Key milestones only — upload started, processing complete, processing failed (not every pipeline stage)
- Include duration/timing data per event — enables avg processing time metric in the analytics endpoint
Endpoint conventions
- Route prefix: match existing Express app patterns
- Acknowledge semantics: Claude decides (one-way, toggle, or with note — whatever fits best)
Claude's Discretion
- Envelope pattern vs direct data for API responses
- Health endpoint detail level (flat status vs nested with last-check times)
- Admin role mechanism (Firebase custom claims vs Supabase role check vs other)
- Unauthenticated request handling (401 vs 404)
- Alert pagination strategy
- Alert filtering support
- Rate limiting on admin endpoints
Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope </user_constraints>
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| INFR-02 | Admin API routes protected by Firebase Auth with admin email check | Firebase Auth verifyFirebaseToken middleware exists; need requireAdmin layer that checks req.user.email against process.env.EMAIL_WEEKLY_RECIPIENT (already configured for alerts) or a dedicated ADMIN_EMAIL env var |
| HLTH-01 | Admin can view live health status (healthy/degraded/down) for Document AI, Claude/OpenAI, Supabase, and Firebase Auth | HealthCheckModel.findLatestByService() already exists; need a query across all four service names or a loop; service names must match what healthProbeService writes |
| ANLY-02 | Admin can view processing summary: upload counts, success/failure rates, avg processing time | document_processing_events table exists with event_type, duration_ms, created_at; need a Supabase aggregation query grouped by event_type over a time window; recordProcessingEvent() must be called from jobProcessorService.processJob() (not yet called there) |
</phase_requirements>
Summary
Phase 3 is entirely additive — it exposes data from Phase 1 and Phase 2 via admin-protected HTTP endpoints, and instruments the existing jobProcessorService.processJob() method with fire-and-forget analytics calls. No database schema changes are needed; all tables and models exist.
The three technical sub-problems are: (1) a two-layer auth middleware — Firebase token verification (existing verifyFirebaseToken) plus an admin email check (new, 5-10 lines); (2) three new route handlers reading from HealthCheckModel, AlertEventModel, and a new getAnalyticsSummary() function in analyticsService; and (3) inserting recordProcessingEvent() calls at three points inside processJob() without altering success/failure semantics.
The codebase is well-factored and consistent: route files live in backend/src/routes/, middleware in backend/src/middleware/, service functions in backend/src/services/. The existing verifyFirebaseToken middleware plus a new requireAdminEmail middleware compose cleanly onto the new /admin router. The existing { success: true, data: ..., correlationId: ... } envelope is the established pattern and should be followed.
Primary recommendation: Add adminRoutes.ts to the existing routes directory, mount it at /admin in index.ts, compose verifyFirebaseToken + requireAdminEmail as router-level middleware, and wire three handlers to existing model/service methods. Instrument processJob() at job-start, completion, and failure using the existing recordProcessingEvent() signature.
Standard Stack
Core
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| express | already in use | Router, Request/Response types | Project standard |
| firebase-admin | already in use | Token verification (verifyIdToken) | Existing auth layer |
| @supabase/supabase-js | already in use | Database reads via getSupabaseServiceClient() | Project data layer |
Supporting
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| (none new) | — | All needed libraries already present | No new npm installs required |
Alternatives Considered
| Instead of | Could Use | Tradeoff |
|---|---|---|
| Email-based admin check | Firebase custom claims | Custom claims require Firebase Admin SDK setCustomUserClaims() call — more setup; email check works with zero additional config since EMAIL_WEEKLY_RECIPIENT is already defined |
| Email-based admin check | Supabase role column | Cross-system lookup adds latency and a new dependency; email check is synchronous against the already-decoded token |
Installation: No new packages needed.
Architecture Patterns
Recommended Project Structure
backend/src/
├── routes/
│ ├── admin.ts # NEW — /admin router with health, analytics, alerts endpoints
│ ├── documents.ts # existing
│ ├── monitoring.ts # existing
│ └── ...
├── middleware/
│ ├── firebaseAuth.ts # existing — verifyFirebaseToken
│ ├── requireAdmin.ts # NEW — requireAdminEmail middleware (10-15 lines)
│ └── ...
├── services/
│ ├── analyticsService.ts # extend — add getAnalyticsSummary() query function
│ ├── jobProcessorService.ts # modify — add recordProcessingEvent() calls
│ └── ...
└── index.ts # modify — mount /admin routes
Pattern 1: Two-Layer Admin Auth Middleware
What: verifyFirebaseToken handles token signature + expiry; requireAdminEmail checks that req.user.email equals the configured admin email. Admin routes apply both in sequence.
When to use: All /admin/* routes.
Example:
// backend/src/middleware/requireAdmin.ts
import { Response, NextFunction } from 'express';
import { FirebaseAuthenticatedRequest } from './firebaseAuth';
import { logger } from '../utils/logger';
export function requireAdminEmail(
req: FirebaseAuthenticatedRequest,
res: Response,
next: NextFunction
): void {
// Read env at request time: Firebase Secrets are not available at module load
const adminEmail = process.env['ADMIN_EMAIL'] ?? process.env['EMAIL_WEEKLY_RECIPIENT'];
const userEmail = req.user?.email;
// Fail closed: deny when no admin email is configured at all
if (!adminEmail || !userEmail || userEmail !== adminEmail) {
// 404 — do not reveal admin routes exist (per locked decision)
logger.warn('requireAdminEmail: access denied', {
uid: req.user?.uid ?? 'unauthenticated',
email: userEmail ?? 'none',
path: req.path,
});
res.status(404).json({ error: 'Not found' });
return;
}
next();
}
Unauthenticated handling: verifyFirebaseToken already returns 401 for missing/invalid tokens. Since it runs first, unauthenticated requests never reach requireAdminEmail. The 404 behavior (hiding admin routes) only applies to authenticated non-admin users — this is consistent with the existing middleware chain. No change needed to verifyFirebaseToken.
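The 401-vs-404 split above can be sketched as a tiny dependency-free simulation. The FakeReq/FakeRes types, the runChain helper, and the admin address are illustrative stand-ins, not code from the codebase:

```typescript
// Simplified stand-ins for the real Express types and middleware bodies.
type FakeReq = { user?: { email: string } };
type FakeRes = { statusCode?: number };
type Middleware = (req: FakeReq, res: FakeRes, next: () => void) => void;

const ADMIN = 'admin@example.com'; // assumed configured admin email

// Stand-in for verifyFirebaseToken: 401 when there is no authenticated user.
const verifyToken: Middleware = (req, res, next) => {
  if (!req.user) { res.statusCode = 401; return; }
  next();
};

// Stand-in for requireAdminEmail: 404 (not 403) for authenticated non-admins.
const requireAdmin: Middleware = (req, res, next) => {
  if (req.user?.email !== ADMIN) { res.statusCode = 404; return; }
  next();
};

// Runs middleware in order like an Express router; 200 means all passed.
function runChain(req: FakeReq, chain: Middleware[]): number {
  const res: FakeRes = {};
  let i = 0;
  const next = () => { i += 1; };
  while (i < chain.length) {
    const before = i;
    chain[i](req, res, next);
    if (i === before) return res.statusCode ?? 500; // middleware ended the request
  }
  return 200;
}
```

Because verifyToken runs first, an unauthenticated caller never reaches the admin check, which is exactly why the 404 behavior only ever applies to authenticated non-admins.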
Pattern 2: Admin Router Construction
What: A dedicated Express Router with both middleware applied at router level, then individual route handlers.
When to use: All admin endpoints.
Example:
// backend/src/routes/admin.ts
import { Router, Request, Response } from 'express';
import { verifyFirebaseToken } from '../middleware/firebaseAuth';
import { requireAdminEmail } from '../middleware/requireAdmin';
import { addCorrelationId } from '../middleware/validation';
import { HealthCheckModel } from '../models/HealthCheckModel';
import { AlertEventModel } from '../models/AlertEventModel';
import { getAnalyticsSummary } from '../services/analyticsService';
import { logger } from '../utils/logger';
const router = Router();
// Auth chain: verify Firebase token, then assert admin email
router.use(verifyFirebaseToken);
router.use(requireAdminEmail);
router.use(addCorrelationId);
const SERVICE_NAMES = ['document_ai', 'llm', 'supabase', 'firebase_auth'] as const;
router.get('/health', async (req: Request, res: Response): Promise<void> => {
try {
const results = await Promise.all(
SERVICE_NAMES.map(name => HealthCheckModel.findLatestByService(name))
);
const health = SERVICE_NAMES.map((name, i) => ({
service: name,
status: results[i]?.status ?? 'unknown',
checkedAt: results[i]?.checked_at ?? null,
latencyMs: results[i]?.latency_ms ?? null,
errorMessage: results[i]?.error_message ?? null,
}));
res.json({ success: true, data: health, correlationId: req.correlationId });
} catch (error) {
logger.error('GET /admin/health failed', { error, correlationId: req.correlationId });
res.status(500).json({ success: false, error: 'Health query failed', correlationId: req.correlationId });
}
});
Pattern 3: Analytics Summary Query
What: A new getAnalyticsSummary(range: string) function in analyticsService.ts that queries document_processing_events aggregated over a time window. Supabase JS client does not support COUNT/AVG aggregations directly — use the Postgres pool (getPostgresPool().query()) for aggregate SQL, consistent with how runRetentionCleanup and the scheduled function's health check already use the pool.
When to use: GET /admin/analytics?range=24h
Range parsing: 24h → 24 hours, 7d → 7 days. Default: 24h.
Example:
// backend/src/services/analyticsService.ts (addition)
import { getPostgresPool } from '../config/supabase';
export interface AnalyticsSummary {
range: string;
totalUploads: number;
succeeded: number;
failed: number;
successRate: number;
avgProcessingMs: number | null;
generatedAt: string;
}
export async function getAnalyticsSummary(range: string): Promise<AnalyticsSummary> {
const interval = parseRange(range); // '24h' -> '24 hours', '7d' -> '7 days'
const pool = getPostgresPool();
const { rows } = await pool.query<{
total_uploads: string;
succeeded: string;
failed: string;
avg_processing_ms: string | null;
}>(`
SELECT
COUNT(*) FILTER (WHERE event_type = 'upload_started') AS total_uploads,
COUNT(*) FILTER (WHERE event_type = 'completed') AS succeeded,
COUNT(*) FILTER (WHERE event_type = 'failed') AS failed,
AVG(duration_ms) FILTER (WHERE event_type = 'completed') AS avg_processing_ms
FROM document_processing_events
WHERE created_at >= NOW() - $1::interval
`, [interval]);
const row = rows[0]!;
const total = parseInt(row.total_uploads, 10);
const succeeded = parseInt(row.succeeded, 10);
const failed = parseInt(row.failed, 10);
return {
range,
totalUploads: total,
succeeded,
failed,
successRate: total > 0 ? succeeded / total : 0,
avgProcessingMs: row.avg_processing_ms ? parseFloat(row.avg_processing_ms) : null,
generatedAt: new Date().toISOString(),
};
}
function parseRange(range: string): string {
if (/^\d+h$/.test(range)) return range.replace('h', ' hours');
if (/^\d+d$/.test(range)) return range.replace('d', ' days');
return '24 hours'; // fallback
}
Pattern 4: Analytics Instrumentation in jobProcessorService
What: Three recordProcessingEvent() calls in processJob() at existing lifecycle points. The function signature already matches — document_id, user_id, event_type, optional duration_ms and error_message. The return type is void (not Promise<void>), so there is nothing to await.
Key instrumentation points:
- After ProcessingJobModel.markAsProcessing(jobId) — emit upload_started (no duration)
- After ProcessingJobModel.markAsCompleted(...) — emit completed with duration_ms = Date.now() - startTime
- In the catch block before ProcessingJobModel.markAsFailed(...) — emit failed with duration_ms and error_message
Example:
// In processJob(), after markAsProcessing:
recordProcessingEvent({
document_id: job.document_id,
user_id: job.user_id,
event_type: 'upload_started',
});
// After markAsCompleted:
recordProcessingEvent({
document_id: job.document_id,
user_id: job.user_id,
event_type: 'completed',
duration_ms: Date.now() - startTime,
});
// In catch, before markAsFailed:
recordProcessingEvent({
document_id: job.document_id,
user_id: job.user_id ?? '',
event_type: 'failed',
duration_ms: Date.now() - startTime,
error_message: errorMessage,
});
Constraint: job may be null in the catch block if findById failed. Guard with job?.document_id or skip instrumentation when job is null (it's already handled by the early return in that case).
Pattern 5: Alert Acknowledge Semantics
Decision: One-way acknowledge (active → acknowledged). AlertEventModel.acknowledge(id) already implements exactly this. No toggle, no note field. The endpoint returns the updated alert object.
router.post('/alerts/:id/acknowledge', async (req: Request, res: Response): Promise<void> => {
const { id } = req.params;
try {
const updated = await AlertEventModel.acknowledge(id);
res.json({ success: true, data: updated, correlationId: req.correlationId });
} catch (error) {
// AlertEventModel.acknowledge throws a specific error when id not found
const msg = error instanceof Error ? error.message : String(error);
if (msg.includes('not found')) {
res.status(404).json({ success: false, error: 'Alert not found', correlationId: req.correlationId });
return;
}
logger.error('POST /admin/alerts/:id/acknowledge failed', { id, error: msg });
res.status(500).json({ success: false, error: 'Acknowledge failed', correlationId: req.correlationId });
}
});
Anti-Patterns to Avoid
- Awaiting recordProcessingEvent(): its return type is void, not Promise<void> — there is nothing to await, and treating it as awaitable would break the fire-and-forget guarantee.
- Supabase JS .select() for aggregates: the Supabase JS client does not support SQL aggregate functions (COUNT, AVG). Use getPostgresPool().query() for analytics queries.
- Caching admin email at module level: Firebase Secrets are not available at module load time. Read process.env['ADMIN_EMAIL'] inside the middleware function, not at the top of the file — or use lazy evaluation. The alertService precedent (creating the transporter inside function scope) demonstrates this pattern.
- Revealing admin routes to non-admin users: never return 403 on admin routes — always return 404 to non-admin callers (per locked decision). Since verifyFirebaseToken runs first and returns 401 for unauthenticated requests, unauthenticated callers get 401 (expected: token verification precedes the admin check). Authenticated non-admin callers get 404.
- Mutating existing processJob() logic: analytics calls go around the existing markAsProcessing, markAsCompleted, and markAsFailed calls — never replacing or wrapping them.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Token verification | Custom JWT validation | verifyFirebaseToken (already exists) | Handles expiry, revocation, recovery from session |
| Health data retrieval | Raw SQL or in-memory aggregation | HealthCheckModel.findLatestByService() (already exists) | Validated input, proper error handling, same pattern as Phase 2 |
| Alert CRUD | New Supabase queries | AlertEventModel.findActive(), AlertEventModel.acknowledge() (already exist) | Consistent error handling, deduplication-aware |
| Correlation IDs | Custom header logic | addCorrelationId middleware (already exists) | Applied at router level like other route files |
Key insight: Phase 3 is primarily composition, not construction. Nearly all data access is through existing models. The only new code is the admin router, the admin email middleware, the getAnalyticsSummary() function, and three recordProcessingEvent() call sites.
Common Pitfalls
Pitfall 1: Admin Email Source
What goes wrong: the ADMIN_EMAIL env var is not defined, so the admin check either silently passes (a naive email === adminEmail comparison matches when both sides are undefined) or silently blocks all admin access.
Why it happens: the codebase uses EMAIL_WEEKLY_RECIPIENT for the alert recipient — there is no ADMIN_EMAIL variable yet. If ADMIN_EMAIL is unset, the comparison runs against undefined: a token with no email claim would then match, while every real user would be rejected.
How to avoid: Read ADMIN_EMAIL ?? EMAIL_WEEKLY_RECIPIENT as fallback. Log a logger.warn at startup/first call if neither is defined. If neither is set, fail closed (deny all admin access) with a logged warning.
Warning signs: Admin endpoints return 404 even when authenticated with the correct email.
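A minimal sketch of the fail-closed resolution order. resolveAdminEmail is a hypothetical helper (the env object is passed in so the lookup happens at call time, not module load; the real middleware would call logger.warn where the comment indicates):

```typescript
// Resolve the configured admin email with ADMIN_EMAIL taking precedence over
// the existing EMAIL_WEEKLY_RECIPIENT fallback. Returns null when neither is
// set, so the caller can fail closed and deny all admin access.
function resolveAdminEmail(env: Record<string, string | undefined>): string | null {
  const email = env['ADMIN_EMAIL'] ?? env['EMAIL_WEEKLY_RECIPIENT'];
  if (!email) {
    // Real middleware: logger.warn('no admin email configured; denying all')
    return null;
  }
  return email;
}
```

The middleware would then treat a null result exactly like a non-admin email: log and return 404.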
Pitfall 2: Service Name Mismatch on Health Endpoint
What goes wrong: GET /admin/health returns status: null / checkedAt: null for all services because the service names in the query don't match what healthProbeService writes.
Why it happens: HealthCheckModel.findLatestByService(serviceName) does an exact string match. If the route handler uses 'document-ai' but the probe writes 'document_ai', the join finds nothing.
How to avoid: Read healthProbeService.ts to confirm the exact service name strings used in HealthCheckResult / passed to HealthCheckModel.create(). Use those exact strings in the admin route.
Warning signs: Response data has status: 'unknown' for all services.
Pitfall 3: job.user_id Type in Analytics Instrumentation
What goes wrong: TypeScript error or runtime undefined when emitting recordProcessingEvent in the catch block.
Why it happens: job can be null if ProcessingJobModel.findById() threw before job was assigned. The catch block handles all errors, including the pre-assignment path.
How to avoid: Guard instrumentation with if (job) in the catch block. ProcessingEventData.user_id is typed as string, so pass job.user_id only when job is non-null.
Warning signs: TypeScript compile error on job.user_id in catch block.
Pitfall 4: getPostgresPool() vs getSupabaseServiceClient() for Aggregates
What goes wrong: Using getSupabaseServiceClient().from('document_processing_events').select(...) for the analytics summary and getting back raw rows instead of aggregated counts.
Why it happens: Supabase JS PostgREST client does not support SQL aggregate functions in the query builder.
How to avoid: Use getPostgresPool().query(sql, params) for the analytics aggregate query, consistent with how processDocumentJobs scheduled function performs its DB health check and how cleanupOldData runs bulk deletes.
Warning signs: getAnalyticsSummary returns row-level data instead of aggregated counts.
Pitfall 5: Route Registration Order in index.ts
What goes wrong: Admin routes conflict with or shadow existing routes.
Why it happens: Express matches routes in registration order. Registering /admin before /documents is fine as long as there are no overlapping paths.
How to avoid: Add app.use('/admin', adminRoutes) alongside the existing route registrations. The /admin prefix is unique — no conflicts expected.
Warning signs: Existing document/monitoring routes stop working after adding admin routes.
Code Examples
Verified patterns from the existing codebase:
Existing Route File Pattern (from routes/monitoring.ts)
// Source: backend/src/routes/monitoring.ts
import { Router, Request, Response } from 'express';
import { addCorrelationId } from '../middleware/validation';
import { logger } from '../utils/logger';
const router = Router();
router.use(addCorrelationId);
router.get('/some-endpoint', async (req: Request, res: Response): Promise<void> => {
try {
// ... data access
res.json({
success: true,
data: someData,
correlationId: req.correlationId || undefined,
});
} catch (error) {
logger.error('Failed', {
category: 'monitoring',
operation: 'some_op',
error: error instanceof Error ? error.message : 'Unknown error',
correlationId: req.correlationId || undefined,
});
res.status(500).json({
success: false,
error: 'Failed to retrieve data',
correlationId: req.correlationId || undefined,
});
}
});
export default router;
Existing Middleware Pattern (from middleware/firebaseAuth.ts)
// Source: backend/src/middleware/firebaseAuth.ts
export interface FirebaseAuthenticatedRequest extends Request {
user?: admin.auth.DecodedIdToken;
}
export const verifyFirebaseToken = async (
req: FirebaseAuthenticatedRequest,
res: Response,
next: NextFunction
): Promise<void> => {
// ... verifies token, sets req.user, calls next() or returns 401
};
Existing Model Pattern (from models/HealthCheckModel.ts)
// Source: backend/src/models/HealthCheckModel.ts
static async findLatestByService(serviceName: string): Promise<ServiceHealthCheck | null> {
const supabase = getSupabaseServiceClient();
const { data, error } = await supabase
.from('service_health_checks')
.select('*')
.eq('service_name', serviceName)
.order('checked_at', { ascending: false })
.limit(1)
.single();
if (error?.code === 'PGRST116') return null;
// ...
}
Existing Analytics Record Pattern (from services/analyticsService.ts)
// Source: backend/src/services/analyticsService.ts
// Return type is void (NOT Promise<void>) — prevents accidental await on critical path
export function recordProcessingEvent(data: ProcessingEventData): void {
const supabase = getSupabaseServiceClient();
void supabase
.from('document_processing_events')
.insert({ ... })
.then(({ error }) => {
if (error) logger.error('analyticsService: failed to insert processing event', { ... });
});
}
Route Registration Pattern (from index.ts)
// Source: backend/src/index.ts
app.use('/documents', documentRoutes);
app.use('/vector', vectorRoutes);
app.use('/monitoring', monitoringRoutes);
app.use('/api/audit', auditRoutes);
// New:
app.use('/admin', adminRoutes);
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Legacy auth middleware (auth.ts) | Firebase Auth (firebaseAuth.ts) | Pre-Phase 3 | auth.ts is fully deprecated and returns 501 — do not use it |
| In-memory monitoring (uploadMonitoringService) | Supabase-persisted health checks and analytics | Phase 1-2 | Admin endpoints must read from Supabase, not in-memory state |
| Direct console.log | Winston logger (logger from utils/logger.ts) | Pre-Phase 3 | Always use logger.info/warn/error/debug |
Deprecated/outdated:
- backend/src/middleware/auth.ts: all exports (authenticateToken, requireAdmin, requireRole) return 501. Do not import. Use firebaseAuth.ts.
- uploadMonitoringService: in-memory service. Not suitable for the admin health dashboard — data does not survive cold starts.
Open Questions
- Exact service name strings written by healthProbeService
  - What we know: the service names come from whatever healthProbeService.ts passes to HealthCheckModel.create({ service_name: ... })
  - What's unclear: the exact strings — likely 'document_ai', 'llm', 'supabase', 'firebase_auth', but they must be verified before writing the health handler
  - Recommendation: read healthProbeService.ts during plan/implementation to confirm the exact strings before writing the SERVICE_NAMES constant in the admin route
- job.user_id field type confirmation
  - What we know: ProcessingEventData.user_id is typed as string; the ProcessingJob model has a user_id field
  - What's unclear: whether ProcessingJob.user_id can ever be undefined/nullable in practice
  - Recommendation: check the ProcessingJobModel type definition during implementation; add a defensive ?? '' if nullable
- Alert pagination for GET /admin/alerts
  - What we know: AlertEventModel.findActive() returns all active alerts without limit; for a single-admin system this is unlikely to be an issue
  - What's unclear: whether a limit/offset param is needed
  - Recommendation: Claude's discretion — default to returning all active alerts (no pagination) given the single-admin use case; add ?limit=N support as an optional param using .limit() on the Supabase query
Sources
Primary (HIGH confidence)
- Codebase: backend/src/middleware/firebaseAuth.ts — verifyFirebaseToken implementation, FirebaseAuthenticatedRequest interface, 401 error responses
- Codebase: backend/src/models/HealthCheckModel.ts — findLatestByService, findAll, deleteOlderThan patterns
- Codebase: backend/src/models/AlertEventModel.ts — findActive, acknowledge, resolve, findRecentByService patterns
- Codebase: backend/src/services/analyticsService.ts — recordProcessingEvent (void return), deleteProcessingEventsOlderThan (pool.query pattern)
- Codebase: backend/src/services/jobProcessorService.ts — processJob lifecycle: startTime capture, markAsProcessing, markAsCompleted, markAsFailed, catch block structure
- Codebase: backend/src/routes/monitoring.ts — route file pattern, envelope shape { success, data, correlationId }
- Codebase: backend/src/index.ts — route registration, Express app structure, existing /health endpoint shape
- Codebase: backend/src/models/migrations/012_create_monitoring_tables.sql — exact column names for service_health_checks, alert_events
- Codebase: backend/src/models/migrations/013_create_processing_events_table.sql — exact column names for document_processing_events
Secondary (MEDIUM confidence)
- Codebase: backend/src/services/alertService.ts — pattern for reading process.env['EMAIL_WEEKLY_RECIPIENT'] inside a function (not at module level) to avoid the Firebase Secrets timing issue
Metadata
Confidence breakdown:
- Standard stack: HIGH — all libraries already in use; no new dependencies
- Architecture: HIGH — patterns derived from existing codebase, not assumptions
- Pitfalls: HIGH — three of five pitfalls are directly observable from reading the existing code
- Open questions: LOW confidence only on exact service name strings (requires reading one more file)
Research date: 2026-02-24 Valid until: 2026-03-24 (stable codebase; valid until significant refactoring)