Cloudflare Pipelines
ETL streaming platform for ingesting, transforming, and loading data into R2 with SQL transformations.
Overview
Pipelines provides:
- Streams: Durable event buffers (HTTP/Workers ingestion)
- Pipelines: SQL-based transformations
- Sinks: R2 destinations (Iceberg tables or Parquet/JSON files)
Status: Open beta (Workers Paid plan)
Pricing: No charge beyond standard R2 storage/operations
Architecture
Data Sources → Streams → Pipelines (SQL) → Sinks → R2
↑ ↓ ↓
HTTP/Workers Transform Iceberg/Parquet
| Component | Purpose | Key Feature |
|---|---|---|
| Streams | Event ingestion | Structured (validated) or unstructured |
| Pipelines | Transform with SQL | Immutable after creation |
| Sinks | Write to R2 | Exactly-once delivery |
Quick Start
# Interactive setup (recommended)
npx wrangler pipelines setup
Minimal Worker example:
interface Env {
STREAM: Pipeline;
}
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
const event = { user_id: "123", event_type: "purchase", amount: 29.99 };
// Fire-and-forget pattern
ctx.waitUntil(env.STREAM.send([event]));
return new Response('OK');
}
} satisfies ExportedHandler<Env>;
Which Sink Type?
Need SQL queries on data?
→ R2 Data Catalog (Iceberg)
✅ ACID transactions, time-travel, schema evolution
❌ More setup complexity (namespace, table, catalog token)
Just file storage/archival?
→ R2 Storage (Parquet)
✅ Simple, direct file access
❌ No built-in SQL queries
Using external tools (Spark/Athena)?
→ R2 Storage (Parquet with partitioning)
✅ Standard format, partition pruning for performance
❌ Must manage schema compatibility yourself
Common Use Cases
- Analytics pipelines: Clickstream, telemetry, server logs
- Data warehousing: ETL into queryable Iceberg tables
- Event processing: Mobile/IoT with enrichment
- Ecommerce analytics: User events, purchases, views
Reading Order
New to Pipelines? Start here:
- configuration.md - Setup streams, sinks, pipelines
- api.md - Send events, TypeScript types, SQL functions
- patterns.md - Best practices, integrations, complete example
- gotchas.md - Critical warnings, troubleshooting
Task-based routing:
- Setup pipeline → configuration.md
- Send/query data → api.md
- Implement pattern → patterns.md
- Debug issue → gotchas.md
In This Reference
- configuration.md - wrangler.jsonc bindings, schema definition, sink options, CLI commands
- api.md - Pipeline binding interface, send() method, HTTP ingest, SQL function reference
- patterns.md - Fire-and-forget, schema validation with Zod, integrations, performance tuning
- gotchas.md - Silent validation failures, immutable pipelines, latency expectations, limits