Cloudflare R2 Data Catalog Skill Reference
Expert guidance for Cloudflare R2 Data Catalog - Apache Iceberg catalog built into R2 buckets.
Reading Order
New to R2 Data Catalog? Start here:
- Read "What is R2 Data Catalog?" and "When to Use" below
- configuration.md - Enable catalog, create tokens
- patterns.md - PyIceberg setup and common patterns
- api.md - REST API reference as needed
- gotchas.md - Troubleshooting when issues arise
Quick reference? Jump straight to the file list under "In This Reference" at the end of this document.
What is R2 Data Catalog?
R2 Data Catalog is a managed Apache Iceberg REST catalog built directly into R2 buckets. It provides:
- Apache Iceberg tables - ACID transactions, schema evolution, time-travel queries
- Zero-egress costs - Query from any cloud/region without data transfer fees
- Standard REST API - Works with Spark, PyIceberg, Snowflake, Trino, DuckDB
- No infrastructure - Fully managed, no catalog servers to run
- Public beta - Available to all R2 subscribers, no extra cost beyond R2 storage
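Because the catalog speaks the standard Iceberg REST protocol, connecting from PyIceberg is a few lines of configuration. The sketch below is illustrative: `account_id`, `bucket`, and `token` are placeholders you must supply, and the URI shape follows the example given under "Key concepts" later in this document.

```python
# Minimal sketch: connect PyIceberg to R2 Data Catalog over its REST API.
# The account ID, bucket name, and token are placeholders (assumptions).

def r2_catalog_config(account_id: str, bucket: str, token: str) -> dict:
    """Build PyIceberg REST catalog properties for an R2 bucket."""
    return {
        "type": "rest",
        # Catalog URI shape used elsewhere in this reference:
        "uri": f"https://{account_id}.r2.cloudflarestorage.com/iceberg/{bucket}",
        "warehouse": bucket,   # warehouse typically matches the bucket name
        "token": token,        # R2 API token with Data Catalog permissions
    }

def connect(account_id: str, bucket: str, token: str):
    # Imported here so the config helper above stays dependency-free.
    from pyiceberg.catalog import load_catalog  # pip install pyiceberg
    return load_catalog("r2", **r2_catalog_config(account_id, bucket, token))
```

Once connected, the catalog behaves like any Iceberg REST catalog: `catalog.list_namespaces()`, `catalog.load_table(...)`, and so on.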
What is Apache Iceberg?
Open table format for analytics datasets in object storage. Features:
- ACID transactions - Safe concurrent reads/writes
- Metadata optimization - Fast queries without full scans
- Schema evolution - Add/rename/delete columns without rewrites
- Time-travel - Query historical snapshots
- Partitioning - Organize data for efficient queries
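Time-travel, for example, works by resolving a requested timestamp to a table snapshot: the engine picks the most recent snapshot committed at or before that time. A small dependency-free sketch of that lookup (the snapshot history below is made up):

```python
# How "as of <timestamp>" time-travel resolves: choose the most recent
# snapshot committed at or before the requested time. In practice, engines
# read the (snapshot_id, committed_at) pairs from the table's metadata.

def snapshot_as_of(snapshots, ts_ms):
    """snapshots: iterable of (snapshot_id, committed_at_ms). Returns a
    snapshot_id, or None if the table did not exist yet at ts_ms."""
    eligible = [s for s in snapshots if s[1] <= ts_ms]
    return max(eligible, key=lambda s: s[1])[0] if eligible else None

history = [(101, 1_000), (102, 2_000), (103, 3_000)]  # illustrative history
assert snapshot_as_of(history, 2_500) == 102
assert snapshot_as_of(history, 500) is None   # table didn't exist yet
```

With PyIceberg, the resolved snapshot can then be read via `table.scan(snapshot_id=...)`.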
When to Use
Use R2 Data Catalog for:
- Log analytics - Store and query application/system logs
- Data lakes/warehouses - Analytical datasets queried by multiple engines
- BI pipelines - Aggregate data for dashboards and reports
- Multi-cloud analytics - Share data across clouds without egress fees
- Time-series data - Event streams, metrics, sensor data
Don't use for:
- Transactional workloads - Use D1 or an external database instead
- Sub-second latency - Iceberg optimized for batch/analytical queries
- Small datasets (<1GB) - Setup overhead not worth it
- Unstructured data - Store files directly in R2, not as Iceberg tables
Architecture
┌─────────────────────────────────────────────────┐
│ Query Engines │
│ (PyIceberg, Spark, Trino, Snowflake, DuckDB) │
└────────────────┬────────────────────────────────┘
│
│ REST API (OAuth2 token)
▼
┌─────────────────────────────────────────────────┐
│ R2 Data Catalog (Managed Iceberg REST Catalog)│
│ • Namespace/table metadata │
│ • Transaction coordination │
│ • Snapshot management │
└────────────────┬────────────────────────────────┘
│
│ Vended credentials
▼
┌─────────────────────────────────────────────────┐
│ R2 Bucket Storage │
│ • Parquet data files │
│ • Metadata files │
│ • Manifest files │
└─────────────────────────────────────────────────┘
Key concepts:
- Catalog URI - REST endpoint for catalog operations (e.g., https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>)
- Warehouse - Logical grouping of tables (typically same as bucket name)
- Namespace - Schema/database containing tables (e.g., logs, analytics)
- Table - Iceberg table with schema, data files, snapshots
- Vended credentials - Temporary S3 credentials the catalog provides for data access
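A sketch of how these concepts fit together: create a namespace, then a table inside it, through an already-connected PyIceberg catalog. The namespace, table, and column names are illustrative assumptions, not part of the product.

```python
# Illustrative only: namespace -> table creation through an Iceberg catalog.
# Assumes `catalog` was obtained via pyiceberg's load_catalog(...).

TABLE_IDENTIFIER = ("logs", "http_requests")  # (namespace, table) - made-up names

def create_logs_table(catalog):
    import pyarrow as pa  # pip install pyarrow

    catalog.create_namespace(TABLE_IDENTIFIER[0])  # the schema/database level
    schema = pa.schema([
        ("ts", pa.timestamp("us")),
        ("status", pa.int32()),
        ("path", pa.string()),
    ])
    # Data, metadata, and manifest files land in the R2 bucket via vended
    # credentials; no S3 keys are configured on the client.
    return catalog.create_table(TABLE_IDENTIFIER, schema=schema)
```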
Limits
| Resource | Limit | Notes |
|---|---|---|
| Namespaces per catalog | No hard limit | Organize tables logically |
| Tables per namespace | <10,000 recommended | Performance degrades beyond this |
| Files per table | <100,000 recommended | Run compaction regularly |
| Snapshots per table | Configurable retention | Expire snapshots older than ~7 days |
| Partitions per table | 100-1,000 optimal | Too many = slow metadata ops |
| Table size | Same as R2 bucket | 10GB-10TB+ common |
| API rate limits | Standard R2 API limits | Shared with R2 storage operations |
| Target file size | 128-512 MB | After compaction |
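The file-count and target-file-size rows above interact: file count is roughly table size divided by target file size, rounded up. A quick back-of-envelope check (the sizes below are examples, not limits):

```python
# Rough check of the "files per table" guidance: estimate how many data
# files a table of a given size produces at a given target file size.

def estimated_file_count(table_bytes: int, target_file_bytes: int) -> int:
    # Ceiling division: a partially filled last file still counts.
    return -(-table_bytes // target_file_bytes)

MiB, TiB = 1024**2, 1024**4

# A 10 TiB table compacted to 256 MiB files -> 40,960 files, comfortably
# under the ~100,000-file guidance; left at 8 MiB files it would balloon
# to 1,310,720 - which is why regular compaction matters.
assert estimated_file_count(10 * TiB, 256 * MiB) == 40_960
assert estimated_file_count(10 * TiB, 8 * MiB) == 1_310_720
```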
Current Status
Public Beta (as of Jan 2026)
- Available to all R2 subscribers
- No extra cost beyond standard R2 storage/operations
- Production-ready, but breaking changes possible
- Supports: namespaces, tables, snapshots, compaction, time-travel, table maintenance
Decision Tree: Is R2 Data Catalog Right For You?
Start → Need analytics on object storage data?
│
├─ No → Use R2 directly for object storage
│
└─ Yes → Dataset >1GB with structured schema?
│
├─ No → Too small, use R2 + ad-hoc queries
│
└─ Yes → Need ACID transactions or schema evolution?
│
├─ No → Consider simpler solutions (Parquet on R2)
│
└─ Yes → Need multi-cloud/multi-tool access?
│
├─ No → D1 or external DB may be simpler
│
└─ Yes → ✅ Use R2 Data Catalog
Quick check: If you answer "yes" to all:
- Dataset >1GB and growing
- Structured/tabular data (logs, events, metrics)
- Multiple query tools or cloud environments
- Need versioning, schema changes, or concurrent access
→ R2 Data Catalog is a good fit.
In This Reference
- configuration.md - Enable catalog, create API tokens, connect clients
- api.md - REST endpoints, operations, maintenance
- patterns.md - PyIceberg examples, common use cases
- gotchas.md - Troubleshooting, best practices, limitations