refactor: route browser control via gateway/node

This commit is contained in:
Peter Steinberger
2026-01-27 03:23:42 +00:00
parent b151b8d196
commit e7fdccce39
91 changed files with 1909 additions and 1608 deletions

View File

@@ -1,5 +1,5 @@
---
summary: "Integrated browser control server + action commands"
summary: "Integrated browser control service + action commands"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
@@ -10,7 +10,7 @@ read_when:
Clawdbot can run a **dedicated Chrome/Brave/Edge/Chromium profile** that the agent controls.
It is isolated from your personal browser and is managed through a small local
control server.
control service inside the Gateway (loopback only).
Beginner view:
- Think of it as a **separate, agent-only browser**.
@@ -57,8 +57,7 @@ Browser settings live in `~/.clawdbot/clawdbot.json`.
{
browser: {
enabled: true, // default: true
controlUrl: "http://127.0.0.1:18791",
cdpUrl: "http://127.0.0.1:18792", // defaults to controlUrl + 1
// cdpUrl: "http://127.0.0.1:18792", // legacy single-profile override
remoteCdpTimeoutMs: 1500, // remote CDP HTTP timeout (ms)
remoteCdpHandshakeTimeoutMs: 3000, // remote CDP WebSocket handshake timeout (ms)
defaultProfile: "chrome",
@@ -77,10 +76,11 @@ Browser settings live in `~/.clawdbot/clawdbot.json`.
```
Notes:
- `controlUrl` defaults to `http://127.0.0.1:18791`.
- The browser control service binds to loopback on a port derived from `gateway.port`
(default: `18791`, which is gateway + 2). The relay uses the next port (`18792`).
- If you override the Gateway port (`gateway.port` or `CLAWDBOT_GATEWAY_PORT`),
the default browser ports shift to stay in the same “family” (control = gateway + 2).
- `cdpUrl` defaults to `controlUrl + 1` when unset.
the derived browser ports shift to stay in the same “family”.
- `cdpUrl` defaults to the relay port when unset.
- `remoteCdpTimeoutMs` applies to remote (non-loopback) CDP reachability checks.
- `remoteCdpHandshakeTimeoutMs` applies to remote CDP WebSocket reachability checks.
- `attachOnly: true` means “never launch a local browser; only attach if it is already running.”
@@ -126,38 +126,11 @@ clawdbot config set browser.executablePath "/usr/bin/google-chrome"
## Local vs remote control
- **Local control (default):** `controlUrl` is loopback (`127.0.0.1`/`localhost`).
The Gateway starts the control server and can launch a local browser.
- **Remote control:** `controlUrl` is non-loopback. The Gateway **does not** start
a local server; it assumes you are pointing at an existing server elsewhere.
- **Local control (default):** the Gateway starts the loopback control service and can launch a local browser.
- **Remote control (node host):** run a node host on the machine that has the browser; the Gateway proxies browser actions to it.
- **Remote CDP:** set `browser.profiles.<name>.cdpUrl` (or `browser.cdpUrl`) to
attach to a remote Chromium-based browser. In this case, Clawdbot will not launch a local browser.
## Remote browser (control server)
You can run the **browser control server** on another machine and point your
Gateway at it with a remote `controlUrl`. This lets the agent drive a browser
outside the host (lab box, VM, remote desktop, etc.).
Key points:
- The **control server** speaks to Chromium-based browsers (Chrome/Brave/Edge/Chromium) via **CDP**.
- The **Gateway** only needs the HTTP control URL.
- Profiles are resolved on the **control server** side.
Example:
```json5
{
browser: {
enabled: true,
controlUrl: "http://10.0.0.42:18791",
defaultProfile: "work"
}
}
```
Use `profiles.<name>.cdpUrl` for **remote CDP** if you want the Gateway to talk
directly to a Chromium-based browser instance without a remote control server.
Remote CDP URLs can include auth:
- Query tokens (e.g., `https://provider.example?token=<token>`)
- HTTP Basic auth (e.g., `https://user:pass@provider.example`)
@@ -166,11 +139,11 @@ Clawdbot preserves the auth when calling `/json/*` endpoints and when connecting
to the CDP WebSocket. Prefer environment variables or secrets managers for
tokens instead of committing them to config files.
### Node browser proxy (zero-config default)
## Node browser proxy (zero-config default)
If you run a **node host** on the machine that has your browser, Clawdbot can
auto-route browser tool calls to that node without any custom `controlUrl`
setup. This is the default path for remote gateways.
auto-route browser tool calls to that node without any extra browser config.
This is the default path for remote gateways.
Notes:
- The node host exposes its local browser control server via a **proxy command**.
@@ -179,7 +152,7 @@ Notes:
- On the node: `nodeHost.browserProxy.enabled=false`
- On the gateway: `gateway.nodes.browser.mode="off"`
### Browserless (hosted remote CDP)
## Browserless (hosted remote CDP)
[Browserless](https://browserless.io) is a hosted Chromium service that exposes
CDP endpoints over HTTPS. You can point a Clawdbot browser profile at a
@@ -207,94 +180,16 @@ Notes:
- Replace `<BROWSERLESS_API_KEY>` with your real Browserless token.
- Choose the region endpoint that matches your Browserless account (see their docs).
### Running the control server on the browser machine
Run a standalone browser control server (recommended when your Gateway is remote):
```bash
# on the machine that runs Chrome/Brave/Edge
clawdbot browser serve --bind <browser-host> --port 18791 --token <token>
```
Then point your Gateway at it:
```json5
{
browser: {
enabled: true,
controlUrl: "http://<browser-host>:18791",
// Option A (recommended): keep token in env on the Gateway
// (avoid writing secrets into config files)
// controlToken: "<token>"
}
}
```
And set the auth token in the Gateway environment:
```bash
export CLAWDBOT_BROWSER_CONTROL_TOKEN="<token>"
```
Option B: store the token in the Gateway config instead (same shared token):
```json5
{
browser: {
enabled: true,
controlUrl: "http://<browser-host>:18791",
controlToken: "<token>"
}
}
```
## Security
This section covers the **browser control server** (`browser.controlUrl`) used for agent browser automation.
Key ideas:
- Treat the browser control server like an admin API: **private network only**.
- Use **token auth** always when the server is reachable off-machine.
- Prefer **Tailnet-only** connectivity over LAN exposure.
- Browser control is loopback-only; access flows through the Gateways auth or node pairing.
- Keep the Gateway and any node hosts on a private network (Tailscale); avoid public exposure.
- Treat remote CDP URLs/tokens as secrets; prefer env vars or a secrets manager.
### Tokens (what is shared with what?)
- `browser.controlToken` / `CLAWDBOT_BROWSER_CONTROL_TOKEN` is **only** for authenticating browser control HTTP requests to `browser.controlUrl`.
- It is **not** the Gateway token (`gateway.auth.token`) and **not** a node pairing token.
- You *can* reuse the same string value, but its better to keep them separate to reduce blast radius.
### Binding (dont expose to your LAN by accident)
Recommended:
- Keep `clawdbot browser serve` bound to loopback (`127.0.0.1`) and publish it via Tailscale.
- Or bind to a Tailnet IP only (never `0.0.0.0`) and require a token.
Avoid:
- `--bind 0.0.0.0` (LAN-visible). Even with token auth, traffic is plain HTTP unless you also add TLS.
### TLS / HTTPS (recommended approach: terminate in front)
Best practice here: keep `clawdbot browser serve` on HTTP and terminate TLS in front.
If youre already using Tailscale, you have two good options:
1) **Tailnet-only, still HTTP** (transport is encrypted by Tailscale):
- Keep `controlUrl` as `http://…` but ensure its only reachable over your tailnet.
2) **Serve HTTPS via Tailscale** (nice UX: `https://…` URL):
```bash
# on the browser machine
clawdbot browser serve --bind 127.0.0.1 --port 18791 --token <token>
tailscale serve https / http://127.0.0.1:18791
```
Then set your Gateway config `browser.controlUrl` to the HTTPS URL (MagicDNS/ts.net) and keep using the same token.
Notes:
- Do **not** use Tailscale Funnel for this unless you explicitly want to make the endpoint public.
- For Tailnet setup/background, see [Gateway web surfaces](/web/index) and the [Gateway CLI](/cli/gateway).
Remote CDP tips:
- Prefer HTTPS endpoints and short-lived tokens where possible.
- Avoid embedding long-lived tokens directly in config files.
## Profiles (multi-browser)
@@ -318,13 +213,12 @@ Clawdbot can also drive **your existing Chrome tabs** (no separate “clawd” C
Full guide: [Chrome extension](/tools/chrome-extension)
Flow:
- You run a **browser control server** (Gateway on the same machine, or `clawdbot browser serve`).
- The Gateway runs locally (same machine) or a node host runs on the browser machine.
- A local **relay server** listens at a loopback `cdpUrl` (default: `http://127.0.0.1:18792`).
- You click the **Clawdbot Browser Relay** extension icon on a tab to attach (it does not auto-attach).
- The agent controls that tab via the normal `browser` tool, by selecting the right profile.
If the Gateway runs on the same machine as Chrome (default setup), you usually **do not** need `clawdbot browser serve`.
Use `browser serve` only when the Gateway runs elsewhere (remote mode).
If the Gateway runs elsewhere, run a node host on the browser machine so the Gateway can proxy browser actions.
### Sandboxed sessions
@@ -387,8 +281,7 @@ Platforms:
## Control API (optional)
If you want to integrate directly, the browser control server exposes a small
HTTP API:
For local integrations only, the Gateway exposes a small loopback HTTP API:
- Status/start/stop: `GET /`, `POST /start`, `POST /stop`
- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
@@ -613,7 +506,7 @@ These are useful for “make the site behave like X” workflows:
- The clawd browser profile may contain logged-in sessions; treat it as sensitive.
- For logins and anti-bot notes (X/Twitter, etc.), see [Browser login + X/Twitter posting](/tools/browser-login).
- Keep control URLs loopback-only unless you intentionally expose the server.
- Keep the Gateway/node host private (loopback or tailnet-only).
- Remote CDP endpoints are powerful; tunnel and protect them.
## Troubleshooting
@@ -631,12 +524,10 @@ How it maps:
- `browser act` uses the snapshot `ref` IDs to click/type/drag/select.
- `browser screenshot` captures pixels (full page or element).
- `browser` accepts:
- `profile` to choose a named browser profile (host or remote control server).
- `target` (`sandbox` | `host` | `custom`) to select where the browser lives.
- `controlUrl` sets `target: "custom"` implicitly (remote control server).
- `profile` to choose a named browser profile (clawd, chrome, or remote CDP).
- `target` (`sandbox` | `host` | `node`) to select where the browser lives.
- In sandboxed sessions, `target: "host"` requires `agents.defaults.sandbox.browser.allowHostControl=true`.
- If `target` is omitted: sandboxed sessions default to `sandbox`, non-sandbox sessions default to `host`.
- Sandbox allowlists can restrict `target: "custom"` to specific URLs/hosts/ports.
- Defaults: allowlists unset (no restriction), and sandbox host control is disabled.
- If a browser-capable node is connected, the tool may auto-route to it unless you pin `target="host"` or `target="node"`.
This keeps the agent deterministic and avoids brittle selectors.

View File

@@ -15,7 +15,7 @@ Attach/detach happens via a **single Chrome toolbar button**.
## What it is (concept)
There are three parts:
- **Browser control server** (HTTP): the API the agent/tool calls (`browser.controlUrl`)
- **Browser control service** (Gateway or node): the API the agent/tool calls (via the Gateway)
- **Local relay server** (loopback CDP): bridges between the control server and the extension (`http://127.0.0.1:18792` by default)
- **Chrome MV3 extension**: attaches to the active tab using `chrome.debugger` and pipes CDP messages to the relay
@@ -87,23 +87,22 @@ clawdbot browser create-profile \
- `!`: relay not reachable (most common: browser relay server isnt running on this machine).
If you see `!`:
- Make sure the Gateway is running locally (default setup), or run `clawdbot browser serve` on this machine (remote gateway setup).
- Make sure the Gateway is running locally (default setup), or run a node host on this machine if the Gateway runs elsewhere.
- Open the extension Options page; it shows whether the relay is reachable.
## Do I need `clawdbot browser serve`?
## Remote Gateway (use a node host)
### Local Gateway (same machine as Chrome) — usually **no**
### Local Gateway (same machine as Chrome) — usually **no extra steps**
If the Gateway is running on the same machine as Chrome and your `browser.controlUrl` is loopback (default),
you typically **do not** need `clawdbot browser serve`.
If the Gateway runs on the same machine as Chrome, it starts the browser control service on loopback
and auto-starts the relay server. The extension talks to the local relay; the CLI/tool calls go to the Gateway.
The Gateways built-in browser control server will start on `http://127.0.0.1:18791/` and Clawdbot will
auto-start the local relay server on `http://127.0.0.1:18792/`.
### Remote Gateway (Gateway runs elsewhere) — **run a node host**
### Remote Gateway (Gateway runs elsewhere) — **yes**
If your Gateway runs on another machine, start a node host on the machine that runs Chrome.
The Gateway will proxy browser actions to that node; the extension + relay stay local to the browser machine.
If your Gateway runs on another machine, run `clawdbot browser serve` on the machine that runs Chrome
(and publish it via Tailscale Serve / TLS). See the section below.
If multiple nodes are connected, pin one with `gateway.nodes.browser.node` or set `gateway.nodes.browser.mode`.
## Sandboxing (tool containers)
@@ -134,26 +133,10 @@ Then ensure the tool isnt denied by tool policy, and (if needed) call `browse
Debugging: `clawdbot sandbox explain`
## Remote Gateway (recommended: Tailscale Serve)
## Remote access tips
Goal: Gateway runs on one machine, but Chrome runs somewhere else.
On the **browser machine**:
```bash
clawdbot browser serve --bind 127.0.0.1 --port 18791 --token <token>
tailscale serve https / http://127.0.0.1:18791
```
On the **Gateway machine**:
- Set `browser.controlUrl` to the HTTPS Serve URL (MagicDNS/ts.net).
- Provide the token (prefer env):
```bash
export CLAWDBOT_BROWSER_CONTROL_TOKEN="<token>"
```
Then the agent can drive the browser by calling the remote `browser.controlUrl` API, while the extension + relay stay local on the browser machine.
- Keep the Gateway and node host on the same tailnet; avoid exposing relay ports to LAN or public Internet.
- Pair nodes intentionally; disable browser proxy routing if you dont want remote control (`gateway.nodes.browser.mode="off"`).
## How “extension path” works
@@ -176,8 +159,8 @@ This is powerful and risky. Treat it like giving the model “hands on your brow
Recommendations:
- Prefer a dedicated Chrome profile (separate from your personal browsing) for extension relay usage.
- Keep the browser control server tailnet-only (Tailscale) and require a token.
- Avoid exposing browser control over LAN (`0.0.0.0`) and avoid Funnel (public).
- Keep the Gateway and any node hosts tailnet-only; rely on Gateway auth + node pairing.
- Avoid exposing relay ports over LAN (`0.0.0.0`) and avoid Funnel (public).
Related:
- Browser tool overview: [Browser](/tools/browser)

View File

@@ -249,16 +249,17 @@ Profile management:
- `reset-profile` — kill orphan process on profile's port (local only)
Common parameters:
- `controlUrl` (defaults from config)
- `profile` (optional; defaults to `browser.defaultProfile`)
- `target` (`sandbox` | `host` | `node`)
- `node` (optional; picks a specific node id/name)
Notes:
- Requires `browser.enabled=true` (default is `true`; set `false` to disable).
- Uses `browser.controlUrl` unless `controlUrl` is passed explicitly.
- All actions accept optional `profile` parameter for multi-instance support.
- When `profile` is omitted, uses `browser.defaultProfile` (defaults to "chrome").
- Profile names: lowercase alphanumeric + hyphens only (max 64 chars).
- Port range: 18800-18899 (~100 profiles max).
- Remote profiles are attach-only (no start/stop/reset).
- If a browser-capable node is connected, the tool may auto-route to it (unless you pin `target`).
- `snapshot` defaults to `ai` when Playwright is installed; use `aria` for the accessibility tree.
- `snapshot` also supports role-snapshot options (`interactive`, `compact`, `depth`, `selector`) which return refs like `e12`.
- `act` requires `ref` from `snapshot` (numeric `12` from AI snapshots, or `e12` from role snapshots); use `evaluate` for rare CSS selector needs.
@@ -410,7 +411,9 @@ Gateway-backed tools (`canvas`, `nodes`, `cron`):
- `timeoutMs`
Browser tool:
- `controlUrl` (defaults from config)
- `profile` (optional; defaults to `browser.defaultProfile`)
- `target` (`sandbox` | `host` | `node`)
- `node` (optional; pin a specific node id/name)
## Recommended agent flows