tinybird-best-practices/rules/append-data.md

# Append Data

The Tinybird CLI supports three ways to append data to an existing datasource: a local file, a remote URL, or an events payload.

## CLI: tb datasource append

```
tb datasource append [datasource_name] --file /path/to/local/file
```

```
tb datasource append [datasource_name] --url https://url_to_csv
```

```
tb datasource append [datasource_name] --events '{"a":"b", "c":"d"}'
```

Notes:

- The command appends to an existing datasource.
- Use `tb --cloud datasource append` to target Cloud; Local is the default.
- For ingesting data from Kafka, S3, or GCS, see: https://www.tinybird.co/docs/forward/get-data-in/connectors

You can also send POST requests to the `v0/events` (streaming) and `v0/datasources` (batch) endpoints.
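As a sketch, the streaming call can look like this with curl (`$TB_HOST`, `$TB_TOKEN`, and `my_datasource` are placeholders for your region host, an append-scoped token, and the target datasource):

```
curl \
  -X POST "https://$TB_HOST/v0/events?name=my_datasource" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"a":"b", "c":"d"}'
```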
tinybird-best-practices/rules/build-deploy.md

# Build & Deploy

Use this rule to keep Tinybird Local and Tinybird Cloud in sync with your project files. Build updates Tinybird Local for fast iteration; deploy updates Tinybird Cloud for production or shared environments.

## When to use build vs deploy

- Use `tb build` when you need Tinybird Local updated with your latest datafiles.
- Use `tb --cloud deploy` when you need to publish changes to Tinybird Cloud.
- If you are unsure whether a resource is synced in Cloud, run `tb --cloud deploy --check` to see the differences between the local project files and Tinybird Cloud.

## Build to Tinybird Local (tb build)

- Builds your project into Tinybird Local using the current project files.
- Use after editing datafiles to validate syntax and dependencies before testing locally.
- This does not deploy to Tinybird Cloud.

## Deploy to Tinybird Cloud (tb --cloud deploy)

- Deploys the current project files to Tinybird Cloud.
- Run a deploy check first with `tb --cloud deploy --check` to validate without deploying.
- Use only when the user explicitly requests a cloud deployment.
- Ask for confirmation before deploying.

## Deploy check without deploying (tb --cloud deploy --check)

- Validates that the project can be deployed without creating a deployment.
- Use before a real deploy to catch schema or dependency errors early.
- Also reports whether the local project differs from Tinybird Cloud.
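Put together, a typical check-then-deploy cycle looks like:

```
tb build                   # update Tinybird Local from project files
tb --cloud deploy --check  # validate against Cloud without deploying
tb --cloud deploy          # publish once the check passes and the user confirms
```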
## Destructive operations and flags

- Deleting datasources, pipes, or connections locally requires an explicit destructive deploy.
- Use `tb --cloud deploy --allow-destructive-operations` only when the user confirms that deletion or data loss is acceptable.
- If you see warnings about deletions, stop and ask for confirmation before re-running with the flag.

Example:
```
tb --cloud deploy --allow-destructive-operations
```

## Validation intent (why)

- Building keeps Tinybird Local aligned with local files for faster iteration.
- Deploy checks reduce failed deployments by validating changes before publishing.

## What not to do

- Do not deploy destructive changes without `--allow-destructive-operations` and explicit user confirmation.
- Do not assume Tinybird Cloud is updated after a local build; build and deploy are separate operations.
tinybird-best-practices/rules/cli-commands.md

# Tinybird CLI Commands

**⚠️ Never invent commands or flags.** If you are unsure whether a command or flag exists, run `tb <command> --help` to verify before using it. Only use commands and flags documented here or confirmed via `--help`.
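For example, to verify flags before using them:

```
tb deploy --help
tb datasource append --help
```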
## Global Options

- `tb --cloud <command>`: Run a command against Cloud (production)
- `tb --local <command>`: Run a command against Local (default)
- `tb --branch <branch_name> <command>`: Run a command against a specific branch
- `tb --debug <command>`: Print debug information

## Project & Development

- `tb create`: Initialize a new project
- `tb create --prompt "description"`: Create a project from an AI prompt
- `tb info`: Show project information and CLI context
- `tb build`: Validate and build the project
- `tb build --watch`: Build and watch for changes
- `tb dev`: Build and watch for changes with live reload
- `tb dev --ui`: Connect the local project to the Tinybird UI
- `tb open`: Open the workspace in the browser
- `tb fmt <file>`: Format a .datasource, .pipe, or .connection file
- `tb fmt <file> --diff`: Show the diff without modifying the file

## Deploy & Deployments

- `tb deploy`: Deploy the project
- `tb deploy --wait`: Wait for the deployment to finish
- `tb deploy --check`: Validate the deployment without creating it
- `tb deployment ls`: List all deployments
- `tb deployment create`: Create a staging deployment and validate it before promoting
- `tb deployment promote`: Promote a staging deployment to production
- `tb deployment discard`: Discard a pending deployment

## Data Sources

- `tb datasource ls`: List all data sources
- `tb datasource append <name> --file <path>`: Append data from a local file
- `tb datasource append <name> --url <url>`: Append data from a URL
- `tb datasource append <name> --events '<json>'`: Append JSON events
- `tb datasource replace <name> <file_or_url>`: Full replace of a data source
- `tb datasource replace <name> <file_or_url> --sql-condition "<condition>"`: Selective replace
- `tb datasource delete <name> --sql-condition "<condition>"`: Delete matching rows
- `tb datasource delete <name> --sql-condition "<condition>" --wait`: Delete and wait for completion
- `tb datasource truncate <name> --yes`: Delete all rows
- `tb datasource truncate <name> --cascade --yes`: Truncate including dependent MVs
- `tb datasource sync <name> --yes`: Sync from an S3/GCS connection
- `tb datasource export <name> --format csv`: Export data to a file

## Pipes & Endpoints

- `tb pipe ls`: List all pipes
- `tb endpoint ls`: List all endpoints
- `tb endpoint data <pipe_name>`: Get data from an endpoint
- `tb endpoint data <pipe_name> --param_name value`: Get data with parameters
- `tb endpoint stats <pipe_name>`: Show endpoint stats for the last 7 days
- `tb endpoint url <pipe_name>`: Print the endpoint URL
- `tb endpoint token <pipe_name>`: Get a token to read the endpoint

## SQL Queries

- `tb sql "<query>"`: Run a SQL query
- `tb sql "<query>" --stats`: Run a query and show stats
- `tb sql --pipe <path> --node <node_name>`: Run SQL from a specific pipe node

## Materializations & Copy Pipes

- `tb materialization ls`: List all materializations
- `tb copy ls`: List all copy pipes
- `tb copy run <pipe_name>`: Run a copy pipe manually
- `tb copy run <pipe_name> --param key=value`: Run with parameters

## Testing

- `tb test run`: Run the full test suite
- `tb test run <file_or_test>`: Run a specific test file or test
- `tb test create <pipe_name>`: Create a test for a pipe
- `tb test update <file_or_test>`: Update test expectations

## Mock Data

- `tb mock <datasource>`: Generate sample data for a data source
- `tb mock <datasource> --rows 100`: Generate a specific number of rows
- `tb mock <datasource> --prompt "extra context"`: Add context for generation

## Tokens & Secrets

- `tb token ls`: List all tokens
- `tb secret ls`: List all secrets
- `tb secret set <name> <value>`: Create or update a secret
- `tb secret rm <name>`: Delete a secret

## Connections & Sinks

- `tb connection ls`: List all connections
- `tb sink ls`: List all sinks

## Jobs

- `tb job ls`: List all jobs
- `tb job cancel <job_id>`: Cancel a running job

## Branches (Beta)

- `tb branch ls`: List all branches
- `tb branch create <name>`: Create a new branch
- `tb branch rm <name>`: Remove a branch

## Tinybird Local

- `tb local start`: Start the Tinybird Local container
- `tb local stop`: Stop Tinybird Local
- `tb local restart --yes`: Restart Tinybird Local
- `tb local status`: Check Tinybird Local status
- `tb local remove`: Remove Tinybird Local completely
- `tb local version`: Show the Tinybird Local version

## Workspace

- `tb workspace ls`: List all workspaces
- `tb workspace current`: Show the current workspace
- `tb workspace clear --yes`: Delete all resources (Local only)

## Authentication

- `tb login`: Authenticate via browser
- `tb logout`: Remove authentication
- `tb update`: Update the CLI to the latest version
tinybird-best-practices/rules/connection-files.md

# Connection Files

- Content cannot be empty.
- Connection names must be unique.
- No indentation for property names.
- Supported types: kafka, gcs, s3.
- If the user requests an unsupported type, report it and do not create the connection.

Kafka example:
```
TYPE kafka
KAFKA_BOOTSTRAP_SERVERS {{ tb_secret("PRODUCTION_KAFKA_SERVERS", "localhost:9092") }}
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY {{ tb_secret("PRODUCTION_KAFKA_USERNAME", "") }}
KAFKA_SECRET {{ tb_secret("PRODUCTION_KAFKA_PASSWORD", "") }}
```

S3 example:
```
TYPE s3
S3_REGION {{ tb_secret("PRODUCTION_S3_REGION", "") }}
S3_ARN {{ tb_secret("PRODUCTION_S3_ARN", "") }}
```

GCS service account example:
```
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("PRODUCTION_GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON", "") }}
```

GCS HMAC example:
```
TYPE gcs
GCS_HMAC_ACCESS_ID {{ tb_secret("gcs_hmac_access_id") }}
GCS_HMAC_SECRET {{ tb_secret("gcs_hmac_secret") }}
```
tinybird-best-practices/rules/copy-files.md

# Copy Pipe Files

- Do not create copy pipes unless explicitly requested.
- Create them under `/copies`.
- Do not include COPY_SCHEDULE unless explicitly requested.
- Use TYPE COPY and TARGET_DATASOURCE.

Example:

```
DESCRIPTION Copy Pipe that exports hourly sales every hour to the sales_hour_copy Data Source

NODE daily_sales
SQL >
    %
    SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales
    FROM teams
    WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now())
      AND country = {{ String(country, 'US') }}
    GROUP BY day, country

TYPE COPY
TARGET_DATASOURCE sales_hour_copy
COPY_SCHEDULE 0 * * * *
```
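Once built, a copy pipe can be run manually; assuming the example above is saved as `sales_copy.pipe`:

```
tb copy run sales_copy --param country=US
```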
tinybird-best-practices/rules/data-operations.md

# Data Operations (Replace & Delete)

Operations for updating and removing data from Data Sources.

## Delete Data Selectively

Delete rows matching a SQL condition:

```bash
tb datasource delete events --sql-condition "toDate(date) >= '2019-11-01' AND toDate(date) <= '2019-11-30'"
```

- Runs asynchronously (returns a job ID); use `--wait` to block until complete
- **Does not cascade** to downstream Materialized Views; delete from MVs separately
- Requires ADMIN token scope
- Safe to run while actively ingesting data

## Truncate Data Source

Delete all rows from a Data Source:

```bash
tb datasource truncate events
```

Use `--cascade` to also truncate dependent Data Sources attached via Materialized Views.

## Replace Data Selectively (Partial Replace)

Replace only the data matching a condition:

```bash
tb datasource replace events data.csv --sql-condition "toDate(date) >= '2019-11-01' AND toDate(date) <= '2019-11-30'"
```

**⚠️ Critical**: Never replace data in partitions where you are actively ingesting. You may lose data inserted during the operation.

**Rules**:
- **Always include the partition key** in the SQL condition
- The condition determines: (1) which partitions to operate on, (2) which rows from the new data to append
- **Cascades automatically** to downstream Materialized Views (all must have compatible partition keys)
- The schema of the new data must match the existing Data Source exactly

### Why Partition Key Matters

If your Data Source uses `ENGINE_PARTITION_KEY "country"` and you run:

```bash
tb datasource replace events data.csv --sql-condition "status='active'"
```

This will **not work as expected**; the replace process uses payload rows to identify partitions. Always match the partition key in the condition.
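A hedged sketch of a corrected call for that schema, with the condition referencing the partition key (`country`):

```bash
tb datasource replace events data.csv --sql-condition "country='US'"
```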
## Replace Data Completely (Full Replace)

Replace the entire Data Source contents (no `--sql-condition`):

```bash
tb datasource replace events data.csv
```

**⚠️ Critical**: Do not run this while actively ingesting; you may lose data.
tinybird-best-practices/rules/datasource-files.md

# Datasource Files

- Content cannot be empty.
- Datasource names must be unique.
- No indentation for property names (DESCRIPTION, SCHEMA, ENGINE, etc.).
- Use MergeTree by default.
- Use AggregatingMergeTree for materialized targets.
- Always use JSON paths in the schema (example: `user_id` String `json:$.user_id`).
- Array syntax: `items` Array(String) `json:$.items[:]`.
- DateTime64 requires a precision (use DateTime64(3)).
- Only include ENGINE_PARTITION_KEY and ENGINE_PRIMARY_KEY when explicitly requested.
- Import configuration:
  - S3/GCS: set IMPORT_CONNECTION_NAME, IMPORT_BUCKET_URI, IMPORT_SCHEDULE (GCS supports @on-demand only; S3 also supports @auto).
  - Kafka: set KAFKA_CONNECTION_NAME, KAFKA_TOPIC, KAFKA_GROUP_ID.
- For landing datasources created from a .ndjson file with no schema specified, use:
  - `SCHEMA >`
  - `` `data` String `json:$` ``

Example:

```
DESCRIPTION >
    Some meaningful description of the datasource

SCHEMA >
    `column_name_1` Type `json:$.column_name_1`,
    `column_name_2` Type `json:$.column_name_2`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "partition_key"
ENGINE_SORTING_KEY "sorting_key_1, sorting_key_2"
```

## Updating Datasource Schemas (Cloud)

If a schema change is incompatible with the deployed Cloud datasource, add a FORWARD_QUERY to transform existing data to the new schema. The query is a SELECT list only (no FROM/WHERE). Use accurateCastOrDefault for lossy conversions.

Example:

```
FORWARD_QUERY >
    SELECT timestamp, CAST(session_id, 'UUID') as session_id, action, version, payload
```
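For a lossy conversion, the same shape applies with accurateCastOrDefault; a sketch reusing the columns above (the `Int32` target type is illustrative):

```
FORWARD_QUERY >
    SELECT timestamp, session_id, action, accurateCastOrDefault(version, 'Int32') as version, payload
```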
## Sharing Datasources

```
SHARED_WITH >
    destination_workspace,
    other_destination_workspace
```

Limitations:
- Shared datasources are read-only.
- You cannot share a shared datasource.
- You cannot create a materialized view from a shared datasource.
tinybird-best-practices/rules/deduplication-patterns.md

# Deduplication and Lambda Architecture

Strategies for handling duplicates and combining batch with real-time processing.

## Deduplication Strategy Selection

| Strategy | When to use |
|----------|-------------|
| Query-time (`argMax`, `LIMIT BY`, subquery) | Prototyping or small datasets |
| ReplacingMergeTree | Large datasets, need latest row per key |
| Periodic snapshots (Copy Pipes) | Freshness not critical, need rollups or different sorting keys |
| Lambda architecture | Need freshness plus complex transformations that MVs can't handle |

For dimensional/small tables, a periodic full replace is usually best.

## Query-time Deduplication

```sql
-- argMax: get the latest value per key
SELECT post_id, argMax(views, updated_at) as views
FROM posts GROUP BY post_id

-- LIMIT BY
SELECT * FROM posts ORDER BY updated_at DESC LIMIT 1 BY post_id

-- Subquery
SELECT * FROM posts WHERE (post_id, updated_at) IN (
    SELECT post_id, max(updated_at) FROM posts GROUP BY post_id
)
```

## ReplacingMergeTree

```
ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY "unique_id"
ENGINE_VER "updated_at"
ENGINE_IS_DELETED "is_deleted" -- optional, UInt8: 1=deleted, 0=active
```

- Always query with `FINAL` or use an alternative deduplication method
- Deduplication happens during merges (asynchronous, not controllable)
- **Do not** build AggregatingMergeTree MVs on top of ReplacingMergeTree: MVs only see incoming blocks, not the merged state, so duplicates persist

```sql
SELECT * FROM posts FINAL WHERE post_id = {{ Int64(post_id) }}
```

## Snapshot-based Deduplication (Copy Pipes)

Use Copy Pipes when:
- ReplacingMergeTree + FINAL is too slow
- You need different sorting keys that change with updates
- You need downstream Materialized Views for rollups

Use `COPY_MODE replace` for a full refresh when the table is not massive and you have no control over when duplicates can occur; if you do have control, use `COPY_MODE append`.

```
NODE generate_snapshot
SQL >
    SELECT post_id, argMax(views, updated_at) as views, max(updated_at) as updated_at
    FROM posts_raw
    GROUP BY post_id

TYPE COPY
TARGET_DATASOURCE posts_snapshot
COPY_SCHEDULE 0 * * * *
COPY_MODE replace
```

## Lambda Architecture

Combine batch snapshots with real-time queries when:
- Aggregating over ReplacingMergeTree (MVs fail; they only see blocks, not the merged state)
- Window functions require full table scans
- CDC workloads
- `uniqState` performance is problematic
- Endpoints require JOINs at query time

### Pattern

1. **Batch layer**: a Copy Pipe creates periodic deduplicated snapshots or intermediate tables.
2. **Real-time layer**: query fresh data since the last snapshot.
3. **Serving layer**: UNION ALL combines both.

```sql
SELECT * FROM posts_snapshot
UNION ALL
SELECT post_id, argMax(views, updated_at) as views, max(updated_at) as updated_at
FROM posts_raw
WHERE updated_at > (SELECT max(updated_at) FROM posts_snapshot)
GROUP BY post_id
```

### Freshness vs Cost Trade-off

- More frequent Copy Pipe runs = fresher snapshots but higher cost
- Less frequent runs = a staler batch layer, but the real-time layer covers the gap
- Balance based on query patterns and data volume

## argMax with Null Values

**Warning**: `argMaxMerge` prefers non-null values over null, even with lower timestamps.

Workaround: convert nulls to the epoch before aggregation:

```sql
SELECT post_id,
    argMaxState(CASE WHEN flagged_at IS NULL THEN toDateTime('1970-01-01 00:00:00') ELSE flagged_at END, updated_at) as flagged_at
FROM posts
GROUP BY post_id
```

Handle the sentinel value in downstream queries.
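For example, the sentinel can be mapped back to NULL when reading (assuming a hypothetical `posts_agg` target datasource holding the aggregate states):

```sql
SELECT post_id,
    nullIf(argMaxMerge(flagged_at), toDateTime('1970-01-01 00:00:00')) as flagged_at
FROM posts_agg
GROUP BY post_id
```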
tinybird-best-practices/rules/endpoint-files.md

# Endpoint Files

Endpoint files are `.pipe` files with `TYPE endpoint` and should live under `/endpoints`.

- Follow all general pipe rules.
- Ensure the SQL follows Tinybird SQL rules (templating, SELECT-only, parameters).
- Include the output node in TYPE or in the last node.

Example:

```
DESCRIPTION >
    Some meaningful description of the endpoint

NODE endpoint_node
SQL >
    SELECT ...

TYPE endpoint
```

## Endpoint URLs

- Run `tb endpoint ls` to list all endpoints and their URLs.
- Include dynamic parameters when needed.
- Date formats:
  - DateTime64: `YYYY-MM-DD HH:MM:SS.MMM`
  - DateTime: `YYYY-MM-DD HH:MM:SS`
  - Date: `YYYYMMDD`
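For example, a call with a DateTime parameter (hypothetical pipe and parameter names; note the URL-encoded space in the timestamp):

```
curl "https://$TB_HOST/v0/pipes/my_endpoint.json?token=$TB_TOKEN&start_date=2024-01-01%2000:00:00"
```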
## OpenAPI definitions

- Fetch `<api_base_url>/v0/pipes/openapi.json?token=<token>` with curl to get the OpenAPI definition for all endpoints.
tinybird-best-practices/rules/endpoint-optimization.md

# Endpoint Optimization

Use this checklist when optimizing endpoints.

## Step 1: Identify Issues

- Check endpoint metrics (latency, read_bytes, write_bytes).
- Look for high latency or excessive scanning.

## Step 2: 5-Question Diagnostic

1) Aggregations at query time?
   - Fix: Move them to materialized views when possible, or to snapshots (copy pipes) or a lambda architecture if MVs do not fit.

2) Do filters match the sorting keys?
   - Fix: Include frequent filters in ENGINE_SORTING_KEY, ordered by selectivity.
   - Avoid timestamp as the first key in multi-tenant cases.

3) Are data types oversized?
   - Fix: Use smaller types, LowCardinality for low-cardinality strings, and defaults instead of Nullable.

4) Complex operations too early?
   - Fix: Filter first, then run joins/aggregations.

5) Heavy JOINs?
   - Fix: Replace them with subqueries or filtered joins in materialized views.

## Step 3: Implementation Actions

- Schema changes: update the datasource, sorting keys, and dependent pipes/endpoints.
- Query optimizations: materialize repeated aggregations; rewrite queries.
- JOIN optimizations: evaluate denormalization or filtered joins.

## Monitoring and Validation

- Track tinybird.pipe_stats_rt and tinybird.pipe_stats.
- Success metrics: lower latency, lower read_bytes, improved read_bytes/write_bytes ratio.
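For example, a sketch of a recent-usage query (assuming the standard `pipe_stats_rt` columns such as `pipe_name`, `start_datetime`, `duration`, and `read_bytes`):

```
tb sql "SELECT pipe_name, avg(duration) as avg_duration, sum(read_bytes) as total_read FROM tinybird.pipe_stats_rt WHERE start_datetime > now() - interval 1 hour GROUP BY pipe_name ORDER BY total_read DESC"
```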
## Query Explain

- For more detail, call the endpoint with the `explain=true` parameter to understand the query plan, e.g. `https://$TB_HOST/v0/pipes/endpoint_name?explain=true`.

## Templates

Materialized view:
```
NODE materialized_view_name
SQL >
    SELECT toDate(timestamp) as date, customer_id, countState(*) as event_count
    FROM source_table
    GROUP BY date, customer_id

TYPE materialized
DATASOURCE mv_datasource_name
ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "customer_id, date"
```

Optimized query:
```
NODE endpoint_query
SQL >
    %
    SELECT date, sum(amount) as daily_total
    FROM events
    WHERE customer_id = {{ String(customer_id) }}
        AND date >= {{ Date(start_date) }}
        AND date <= {{ Date(end_date) }}
    GROUP BY date
    ORDER BY date DESC
```
tinybird-best-practices/rules/local-development.md

# Tinybird Local Development

## Overview

- Tinybird Local runs as a Docker container managed by the Tinybird CLI.
- Local is the default execution target; use `--cloud` to operate on Cloud.
- Use Tinybird Local to develop and test projects before deploying to Cloud.

## Commands

- `tb local start`
  - Options: `--use-aws-creds`, `--volumes-path <path>`, `--skip-new-version`, `--user-token`, `--workspace-token`, `--daemon`.
- `tb local stop`
- `tb local restart`
  - Options: `--use-aws-creds`, `--volumes-path`, `--skip-new-version`, `--yes`.
- `tb local status`
- `tb local remove`
- `tb local version`
- `tb local generate-tokens`

Notes:
- If you remove the container without a persisted volume, local data is lost.
- Use `tb --cloud ...` for Cloud operations.

## Local-First Workflow

1) `tb local start`
2) Develop resources and run `tb build` as needed
3) Test endpoints/queries locally
4) Use `--cloud` for Cloud actions (deploy, etc.)

Use `--volumes-path` to persist data between restarts.
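The steps above as a command sequence (`./tb-data` is an example path and `my_endpoint` a hypothetical endpoint name):

```
tb local start --volumes-path ./tb-data   # start with persisted data
tb build                                  # load project files into Local
tb endpoint data my_endpoint              # test an endpoint locally
tb --cloud deploy                         # publish to Cloud when ready
```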
## Troubleshooting

- If status shows unhealthy, run `tb local restart` and re-check.
- If authentication is not ready, wait or restart the container.
- If memory warnings appear in the status output, increase Docker's memory allocation.
- If Local is not running, start it with `tb local start`.
tinybird-best-practices/rules/materialized-files.md

# Materialized Pipe Files

- Do not create materialized pipes unless explicitly requested.
- Create them under `/materializations`.
- Use TYPE MATERIALIZED and set DATASOURCE to the target datasource.
- Use State modifiers in the pipe; use AggregateFunction columns in the target datasource.
- Use Merge modifiers when reading AggregateFunction columns.
- Put all dimensions in ENGINE_SORTING_KEY, ordered from least to most cardinality.

Example:

```
NODE daily_sales
SQL >
    SELECT toStartOfDay(starting_date) day, country, sumState(sales) as total_sales
    FROM teams
    GROUP BY day, country

TYPE MATERIALIZED
DATASOURCE sales_by_hour
```

Target datasource example:

```
SCHEMA >
    `total_sales` AggregateFunction(sum, Float64),
    `sales_count` AggregateFunction(count, UInt64),
    `dimension_1` String,
    `dimension_2` String,
    `date` DateTime

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, dimension_1, dimension_2"
```
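Reading the target datasource requires the matching Merge modifiers, for example:

```
SELECT date, dimension_1, dimension_2, sumMerge(total_sales) as total_sales
FROM sales_by_hour
GROUP BY date, dimension_1, dimension_2
```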
## Usual gotchas

- Materialized Views work as insert triggers, so a delete or truncate operation on the original Data Source does not affect the related Materialized Views.

- Because transformation and ingestion in the Materialized View happen on each block of data inserted into the original Data Source, operations such as GROUP BY, ORDER BY, DISTINCT, and LIMIT may need a specific engine, such as AggregatingMergeTree or SummingMergeTree, that can handle data aggregations.

- The Data Source resulting from a Materialized View that uses a JOIN is updated only when a new operation is performed over the Data Source in the FROM clause.
tinybird-best-practices/rules/mock-data.md

# Mock Data Generation

Tinybird mock data flow (as implemented by the agent) for a datasource:

1) Build a SQL query that returns mock rows.
2) Execute it locally with a row limit and output format using the `tb --output=json|csv '<sql>' --rows-limit <rows>` command.
3) Preview the generated output.
4) Confirm creation of a fixture file under `fixtures/`.
5) Write the fixture file:
   - `fixtures/<datasource_name>.ndjson` or `fixtures/<datasource_name>.csv`
6) Confirm the append.
7) Append the fixture to the datasource in Tinybird Local.

## Example Mock Query

```
SELECT
    rand() % 1000 AS experience_gained,
    1 + rand() % 100 AS level,
    rand() % 500 AS monster_kills,
    concat('player_', toString(rand() % 10000)) AS player_id,
    rand() % 50 AS pvp_kills,
    rand() % 200 AS quest_completions,
    now() - rand() % 86400 AS timestamp
FROM numbers(ROWS)
```

Notes:
- The query must return exactly `ROWS` rows via `FROM numbers(ROWS)`.
- Do not add FORMAT or a trailing semicolon in the mock query itself.
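Appending the fixture (step 7) uses the standard append command, for example:

```
tb datasource append <datasource_name> --file fixtures/<datasource_name>.ndjson
```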
## Error Handling Notes

- If the datasource is in quarantine, query `<datasource_name>_quarantine` and surface the first 5 rows.
- If the append fails with "must be created first with 'mode=create'", rebuild the project and retry.
tinybird-best-practices/rules/pipe-files.md

# Pipe Files (General)

- Pipe names must be unique.
- Node names must differ from the pipe name and from any resource name.
- No indentation for property names (DESCRIPTION, NODE, SQL, TYPE, etc.).
- Allowed TYPE values: endpoint, copy, materialized, sink.
- Add the output node in the TYPE section or in the last node.

Example:

```
DESCRIPTION >
    Some meaningful description of the pipe

NODE node_1
SQL >
    SELECT ...

TYPE endpoint
```
tinybird-best-practices/rules/project-files.md

# Project Files

## Project Root

- By default, create a `tinybird/` folder at the project root and nest the Tinybird folders under it.
- Ensure the `.tinyb` credentials file is at the same level where the CLI commands are run.

## tb info

Use `tb info` to confirm the CLI context, especially for credentials issues.

It reports information about the Local and Cloud environments:
- Where the CLI is loading the `.tinyb` file from
- The currently logged-in workspace
- API URL
- UI URL
- ClickHouse HTTP interface URL

It can show values for both the Cloud and Local environments.

## File Locations

Default locations (use these unless the project uses a different structure):

- Endpoints: `/endpoints`
- Materialized pipes: `/materializations`
- Sink pipes: `/sinks`
- Copy pipes: `/copies`
- Connections: `/connections`
- Datasources: `/datasources`
- Fixtures: `/fixtures`
|
||||
## File-Specific Rules
|
||||
|
||||
See these rule files for detailed requirements:
|
||||
|
||||
- `rules/datasource-files.md`
|
||||
- `rules/pipe-files.md`
|
||||
- `rules/endpoint-files.md`
|
||||
- `rules/materialized-files.md`
|
||||
- `rules/sink-files.md`
|
||||
- `rules/copy-files.md`
|
||||
- `rules/connection-files.md`
|
||||
|
||||
After making changes in the project files, check `rules/build-deploy.md` for next steps.
|
||||
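Assuming the default locations above, a project might be laid out like this (all file names are illustrative):

```
tinybird/
├── datasources/
│   └── events.datasource
├── endpoints/
│   └── top_pages.pipe
├── materializations/
│   └── events_mv.pipe
├── copies/
├── sinks/
├── connections/
└── fixtures/
    └── events.ndjson
```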
27
tinybird-best-practices/rules/secrets.md
Normal file
@@ -0,0 +1,27 @@
# Secrets

## Usage in Files

- Secret syntax: `{{ tb_secret("SECRET_NAME", "DEFAULT_VALUE_OPTIONAL") }}`.
- Use secrets for credentials in connections and pipe SQL.
- Secrets in pipe files do not allow default values.
- Secrets in connection files may include default values.
- Do not replace secrets with dynamic parameters when secrets are required.

## CLI: tb secret

- List secrets:
  - `tb secret ls`
  - `tb secret ls --match _test`
- Set or update a secret:
  - `tb secret set SECRET_NAME SECRET_VALUE`
  - `tb secret set SECRET_NAME` (prompts securely)
  - `tb secret set SECRET_NAME --multiline` (opens an editor)
- Remove a secret:
  - `tb secret rm SECRET_NAME`

## Local Secrets

- If a `.env.local` file is present, its secrets are loaded automatically into Tinybird Local.
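For example, a Kafka connection file might reference secrets like this, with a default only for the local-friendly value (the connection and secret names are hypothetical):

```
TYPE kafka
KAFKA_BOOTSTRAP_SERVERS {{ tb_secret("KAFKA_SERVERS", "localhost:9092") }}
KAFKA_KEY {{ tb_secret("KAFKA_KEY") }}
KAFKA_SECRET {{ tb_secret("KAFKA_SECRET") }}
```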
32
tinybird-best-practices/rules/sink-files.md
Normal file
@@ -0,0 +1,32 @@
# Sink Pipe Files

- Do not create sink pipes by default; only when explicitly requested.
- Create them under `/sinks`.
- Valid external systems: Kafka, S3, GCS.
- Sink pipes depend on a connection; reuse existing connections when possible.
- Do not include EXPORT_SCHEDULE unless explicitly requested.
- Use `TYPE sink` and set EXPORT_CONNECTION_NAME.

Example:

```
DESCRIPTION Sink pipe that exports daily sales every hour using my_connection

NODE daily_sales
SQL >
    %
    SELECT toStartOfDay(starting_date) AS day, country, sum(sales) AS total_sales
    FROM teams
    WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now())
      AND country = {{ String(country, 'US') }}
    GROUP BY day, country

TYPE sink
EXPORT_CONNECTION_NAME "my_connection"
EXPORT_BUCKET_URI "s3://tinybird-sinks"
EXPORT_FILE_TEMPLATE "daily_prices"
EXPORT_SCHEDULE "*/5 * * * *"
EXPORT_FORMAT "csv"
EXPORT_COMPRESSION "gz"
EXPORT_STRATEGY "truncate"
```
66
tinybird-best-practices/rules/sql.md
Normal file
@@ -0,0 +1,66 @@
# SQL Rules

## Core Principles

1. Filter early and read as little data as possible.
2. Select only the columns you need.
3. Do complex work later in the pipeline.
4. Prefer ClickHouse functions; only supported functions are allowed.

## Query Requirements

- SQL must be valid ClickHouse SQL with Tinybird templating (Tornado-based).
- Only SELECT statements are allowed.
- Avoid CTEs; use nodes or subqueries instead.
- Do not query system tables (system.tables, system.datasources, information_schema.tables).
- Do not use CREATE/INSERT/DELETE/TRUNCATE or currentDatabase().
## Parameter and Templating Rules

- If parameters are used, the query must start with `%` on its own line.
- Parameter functions: String, DateTime, Date, Float32, Float64, Int, Integer, UInt8, UInt16, UInt32, UInt64, UInt128, UInt256, Int8, Int16, Int32, Int64, Int128, Int256.
- Parameter names must differ from column names.
- Default values must be hardcoded.
- Parameters are never quoted.
- In `defined()` checks, do not quote the parameter name.

Bad:
```
SELECT * FROM events WHERE session_id={{String(my_param, "default")}}
```

Good:
```
%
SELECT * FROM events WHERE session_id={{String(my_param, "default")}}
```
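A sketch of a `defined()` check with an unquoted parameter name (the `status` parameter and `events` table are hypothetical):

```
%
SELECT *
FROM events
WHERE 1 = 1
{% if defined(status) %}
  AND status = {{ String(status) }}
{% end %}
```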
## Join and Aggregation Rules

- Filter before JOINs and GROUP BY.
- Avoid joining tables with more than 1M rows without filtering first.
- Avoid nested aggregates; use subqueries instead.
- Read AggregateFunction columns with -Merge combinators.
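For instance, an AggregateFunction column written with a `-State` combinator is read back with the matching `-Merge` combinator (the table and column names are illustrative):

```
SELECT day, countMerge(visits_count) AS visits
FROM analytics_daily_mv
GROUP BY day
```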
## Operation Order

1. WHERE filters
2. Select needed columns
3. JOIN
4. GROUP BY / aggregates
5. ORDER BY
6. LIMIT
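The order above can be sketched as a single query that filters and prunes columns in a subquery before aggregating (all table and column names are hypothetical):

```
SELECT country, sum(sales) AS total_sales
FROM (
    SELECT country, sales
    FROM events
    WHERE starting_date >= now() - interval 7 day
)
GROUP BY country
ORDER BY total_sales DESC
LIMIT 10
```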
## External Tables

Iceberg:
```
FROM iceberg('s3://bucket/path/to/table', {{tb_secret('aws_access_key_id')}}, {{tb_secret('aws_secret_access_key')}})
```

Postgres:
```
FROM postgresql({{ tb_secret("db_host_port") }}, 'database', 'table', {{tb_secret('db_username')}}, {{tb_secret('db_password')}}, 'schema_optional')
```

Do not split the host and port into separate secrets.
21
tinybird-best-practices/rules/tests.md
Normal file
@@ -0,0 +1,21 @@
# Tests

- The test file name must match the pipe name.
- Scenario names must be unique within a test file.
- Parameters format: `param1=value1&param2=value2`.
- Preserve case and formatting when the user provides parameters.
- If there are no parameters, create a single test with empty parameters.
- Use fixture data for expected results; do not query endpoints or run SQL to infer data.
- Before creating tests, analyze the fixture files used by the endpoint's tables.
- `expected_result` should always be an empty string; the tool fills it in.
- Only create tests when explicitly requested (e.g. "Create tests for this endpoint").
- If asked to "test" or "call" an endpoint, call the endpoint instead of creating tests.

Test format:

```
- name: kpis_single_day
  description: Test hourly granularity for a single day
  parameters: date_from=2024-01-01&date_to=2024-01-01
  expected_result: ''
```
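The `parameters` field uses standard URL query encoding, so it can also be built programmatically; a minimal Python sketch (the parameter names are illustrative):

```python
from urllib.parse import urlencode

# Build the test's `parameters` string from a dict; insertion order is preserved.
params = {"date_from": "2024-01-01", "date_to": "2024-01-01"}
query = urlencode(params)
print(query)  # -> date_from=2024-01-01&date_to=2024-01-01
```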
42
tinybird-best-practices/rules/tokens.md
Normal file
@@ -0,0 +1,42 @@
# Tokens

- Resource-scoped tokens are defined in datafiles.
- Tinybird tracks and updates resource-scoped tokens from the datafile contents.

Scopes and usage:
- DATASOURCES:READ:datasource_name => `TOKEN <token_name> READ` in `.datasource` files
- DATASOURCES:APPEND:datasource_name => `TOKEN <token_name> APPEND` in `.datasource` files
- PIPES:READ:pipe_name => `TOKEN <token_name> READ` in `.pipe` files

Examples:
```
TOKEN app_read READ
TOKEN landing_append APPEND
```

For operational tokens (not tied to resources):
```
tb token create static new_admin_token --scope <scope>
```
Scopes: `TOKENS`, `ADMIN`, `ORG_DATASOURCES:READ`, `WORKSPACE:READ_ALL`.
## JWT Tokens

JWT tokens have a TTL and can only use the `PIPES:READ` or `DATASOURCES:READ` scopes. They are intended for end users calling endpoints or reading datasources without exposing a master API key.

Create a JWT token:
```
tb token create jwt my_jwt_token --ttl 1h --scope PIPES:READ --resource my_pipe
```

Datasource read with a row filter:
```
tb token create jwt my_jwt_token --ttl 1h --scope DATASOURCES:READ --resource my_datasource --filter "column = 'value'"
```

Multiple scopes and resources (the counts must match), with optional fixed params for PIPES:READ:
```
tb token create jwt my_jwt_token --ttl 1h \
  --scope PIPES:READ --resource my_pipe --fixed-params "k1=v1,k2=v2" \
  --scope DATASOURCES:READ --resource my_datasource --filter "column = 'value'"
```
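Once minted, a JWT is passed as a bearer token when calling the published endpoint; a minimal sketch, assuming a `my_pipe` endpoint on the default Cloud host (the token value, host, and pipe name are all hypothetical):

```python
import urllib.request

token = "<jwt_from_tb_token_create>"  # placeholder, not a real token
url = "https://api.tinybird.co/v0/pipes/my_pipe.json"

# Attach the JWT as a bearer token on the request.
req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

# The actual call is commented out to avoid a network dependency:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```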