Address review feedback: when port fallback occurs, maintain mapping from
original requested port to the relay server for proper cleanup and reuse.
- Add relayByOriginalPort map to track original port -> relay
- Update ensureChromeExtensionRelayServer to check both maps
- Update stopChromeExtensionRelayServer to clean up both mappings
- Stop function now uses the relay's actual bound port for auth cleanup
When the Chrome extension relay server fails to bind due to port
conflict (EADDRINUSE), automatically try alternative ports in the
dynamic range (49152-65535) instead of failing immediately.
This resolves issues where stale processes hold onto port 18792
after gateway restarts or crashes.
Fixes potential issues related to #8926, #13867, #17584
The supervisor's child adapter always spawned with `detached: true`,
which creates a new process group. On Windows Scheduled Tasks (headless,
no console), this prevents stdout/stderr pipes from properly connecting,
causing all exec tool output to silently disappear.
The old exec path (pre-supervisor refactor) never used `detached: true`.
The regression was introduced in cd44a0d01 (refactor process spawning).
Changes:
- child.ts: set `detached: false` on Windows, keep `detached: true` on
POSIX (where it's needed to survive parent exit). Skip the no-detach
fallback on Windows since it's already the default.
- child.test.ts: platform-aware assertions for detached behavior.
Fixes#18035Fixes#17806
Address review feedback - when --json mode is used, the drift warning
was completely suppressed. Now it's included in the warnings array
of the DaemonActionResponse so programmatic consumers can surface it.
When the gateway token in config differs from the token embedded in the
service plist/unit file, restart will not apply the new token. This can
cause silent auth failures after OAuth token switches.
Changes:
- Add checkTokenDrift() to service-audit.ts
- Call it in runServiceRestart() before restarting
- Warn user with suggestion to run 'openclaw gateway install --force'
Closes#18018
When a cron job fires and completes within the same wall-clock second it
was scheduled for, the next-run computation could return undefined or the
same second, causing the scheduler to re-trigger the job hundreds of
times in a tight loop.
Two-layer fix:
1. computeJobNextRunAtMs: When computeNextRunAtMs returns undefined for a
cron-kind schedule (edge case where floored nowSecondMs matches the
schedule), retry with the ceiling (next second) as reference time.
This ensures we always get the next valid occurrence.
2. applyJobResult: Add MIN_REFIRE_GAP_MS (2s) safety net for cron-kind
jobs. After a successful run, nextRunAtMs is guaranteed to be at
least 2s in the future. This breaks any remaining spin-loop edge
cases without affecting normal daily/hourly schedules (where the
natural next run is hours/days away).
Fixes#17821
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.
This commit adds four layers of defense:
1. **In-process lock watchdog** (session-write-lock.ts)
- Track acquiredAt timestamp on each held lock
- 60-second interval timer checks all held locks
- Auto-releases any lock held longer than maxHoldMs (default 5 min)
- Catches the hung-API-call case that try/finally cannot
2. **Gateway startup cleanup** (server-startup.ts)
- On boot, scan all agent session directories for *.jsonl.lock files
- Remove locks with dead PIDs or older than staleMs (30 min)
- Log each cleaned lock for diagnostics
3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
- New health check scans for .jsonl.lock files
- Reports PID status and age of each lock found
- In --fix mode, removes stale locks automatically
4. **Transcript error entry on API failure** (attempt.ts)
- When promptError is set, write an error marker to the session
transcript before releasing the lock
- Preserves conversation history even on model API failures
Closes#18060
Add support for Z.AI's native tool_stream parameter to enable real-time
visibility into model reasoning and tool call execution.
- Automatically inject tool_stream=true for zai/z-ai providers
- Allow disabling via params.tool_stream: false in model config
- Follows existing pattern of OpenRouter and OpenAI wrappers
This enables Z.AI API features described in:
https://docs.z.ai/api-reference#streaming
AI-assisted: Claude (OpenClaw agent) helped write this implementation.
Testing: lightly tested (code review + pattern matching existing wrappers)
Closes#18135
Synchronous hook that lets plugins inspect and optionally block messages
before they are written to the session JSONL file. Primary use case is
private mode... when enabled, the plugin returns { block: true } and the
message never gets persisted.
The hook runs on the hot path (synchronous, like tool_result_persist).
Handlers execute sequentially in priority order. If any handler returns
{ block: true }, the write is skipped immediately. Handlers can also
return a modified message to write instead of the original.
Changes:
- src/plugins/types.ts: add hook name, event/result types, handler map entry
- src/plugins/hooks.ts: add runBeforeMessageWrite() following tool_result_persist pattern
- src/agents/session-tool-result-guard.ts: invoke hook before every originalAppend() call
- src/agents/session-tool-result-guard-wrapper.ts: wire hook runner to the guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows, fs.promises.writeFile truncates the target file to 0 bytes
before writing. Since loadSessionStore reads the file synchronously
without holding the write lock, a concurrent read can observe the empty
file, fail to parse it, and fall through to an empty store — causing the
agent to lose its session context.
Changes:
- saveSessionStoreUnlocked (Windows path): write to a temp file first,
then rename it onto the target. If rename fails due to file locking,
retry 3 times with backoff, then fall back to copyFile (which
overwrites in-place without truncating to 0 bytes).
- loadSessionStore: on Windows, retry up to 3 times with 50ms
synchronous backoff (via Atomics.wait) when the file is empty or
unparseable, giving the writer time to finish. SharedArrayBuffer is
allocated once and reused across retry attempts.
Treat normal process exits (even with non-zero codes) as completed tool results.
This prevents standard exit codes (like grep exit 1) from being surfaced
as 'Tool Failure' warnings in the UI. The exit code is still appended
to the tool output for assistant awareness.
The gateway's system-presence.ts was not detecting the version when
OpenClaw is run as a launchd service, because the daemon-runtime.ts
sets OPENCLAW_SERVICE_VERSION but system-presence.ts only checked
OPENCLAW_VERSION and npm_package_version.
This caused 'openclaw status' to show 'unknown' for the version.
Issue: #18456🤖 AI-assisted (lightly tested)
Qwen 3 (and potentially other reasoning-capable models served via Ollama)
returns its final answer in a `reasoning` field with an empty `content`
field. This causes blank/empty responses since OpenClaw only reads `content`.
Changes:
- Add `reasoning?` to OllamaChatResponse message type
- Fall back to `reasoning` when `content` is empty in buildAssistantMessage
- Accumulate `reasoning` chunks during streaming when `content` is empty
This allows Qwen 3 to work correctly both with and without /no_think mode.
downgradeOpenAIReasoningBlocks was only called on model change, but
orphaned reasoning items (e.g. from an aborted stream) can exist without
a model switch and cause a 400 from the OpenAI Responses API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`models set` accepts any syntactically valid model ID without checking
the catalog, allowing typos to silently persist in config and fail at
runtime. It also unconditionally adds an empty `{}` entry to
`agents.defaults.models`, bypassing any provider routing constraints.
This commit:
- Validates the model ID against the catalog (skipped when catalog is
empty during initial setup)
- Warns when a new entry is added with empty config (no provider routing)
Closesopenclaw/openclaw#17183✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)
The gateway unconditionally scheduled a SIGUSR1 restart after every
update.run call, even when the update itself failed (broken deps,
build errors, etc.). This left the process restarting into a broken
state — corrupted node_modules, partial builds — causing a crash loop
that required manual intervention.
Three fixes:
1. Only restart on success: scheduleGatewaySigusr1Restart is now
gated on result.status === "ok". Failed or skipped updates still
write the restart sentinel (so the status can be reported back to
the user) but the running gateway stays alive.
2. Early bail on step failure: deps install, build, and ui:build now
check exit codes immediately (matching the preflight section) so a
failed deps install no longer cascades into a broken build and
ui:build.
3. Auto-repair config during update: the doctor step now runs with
--fix alongside --non-interactive, so unknown config keys left over
from schema changes between versions are stripped automatically
instead of causing a startup validation crash.
The global `agents.defaults.thinkingDefault` forces a single thinking
level for all models. Users running multiple models with different
reasoning capabilities (e.g. Claude with extended thinking, GPT-4o
without, Gemini Flash with lightweight reasoning) cannot optimise the
thinking level per model.
Add an optional `thinkingDefault` field to `AgentModelEntryConfig` so
each entry under `agents.defaults.models` can declare its own default.
Resolution priority: per-model → global → catalog auto-detect.
Example config:
"models": {
"anthropic/claude-sonnet-4-20250514": { "thinkingDefault": "high" },
"openai/gpt-4o": { "thinkingDefault": "off" }
}
Co-authored-by: Cursor <cursoragent@cursor.com>
Add automatic llms.txt awareness so agents check for /llms.txt or
/.well-known/llms.txt when exploring new domains.
Changes:
- System prompt: new 'llms.txt Discovery' section (full mode only,
when web_fetch is available) instructing agents to check for llms.txt
files when visiting new domains
- web_fetch tool: updated description to mention llms.txt discovery
llms.txt is an emerging standard (like robots.txt for AI) that helps
site owners describe how AI agents should interact with their content.
Making this a default behavior helps the ecosystem adopt agent-native
web experiences.
Ref: https://llmstxt.org
When a user sets `agents.defaults.model.primary: "ollama/gemma3:4b"`
but forgets to set OLLAMA_API_KEY, the error is a confusing
"unknown model: ollama/gemma3:4b". The Ollama provider requires any
dummy API key to register (the local server doesn't actually check it),
but this isn't obvious from the error.
Add `buildUnknownModelError()` that detects known local providers
(ollama, vllm) and appends an actionable hint with the env var name
and a link to the relevant docs page.
Before: Unknown model: ollama/gemma3:4b
After: Unknown model: ollama/gemma3:4b. Ollama requires authentication
to be registered as a provider. Set OLLAMA_API_KEY="ollama-local"
(any value works) or run "openclaw configure".
See: https://docs.openclaw.ai/providers/ollamaCloses#17328
When the gateway connection fails due to device token mismatch (e.g., after
re-pairing the device), clear the stored device-auth token so that
subsequent connection attempts can obtain a fresh token.
This fixes the cron tool failing with 'device token mismatch' error
after running 'openclaw configure' to re-pair the device.
Fixes#18175
When an agent config specifies `model: { primary: "..." }` without
an explicit `fallbacks` array, the existing code replaced the entire
model object from `agents.defaults`—discarding the default fallbacks.
This caused cron jobs (and agent sessions) to have only one model
candidate (the pinned model) plus the global primary as a final
fallback, skipping all intermediate fallback models.
The fix merges the agent model override into the existing defaults
model object using spread, so that keys like `fallbacks` survive
when the agent only overrides `primary`. Agents can still explicitly
override or clear fallbacks by providing their own `fallbacks` array.
Reproduction scenario:
- `agents.defaults.model = { primary: "codex", fallbacks: ["opus", "flash", "deepseek"] }`
- Agent config: `model: { primary: "codex" }`
- Cron job pins: `model: "flash"`
- Before fix: fallback candidates = [flash, codex] (3 models lost)
- After fix: fallback candidates = [flash, opus, deepseek, ..., codex]