Address review feedback: when port fallback occurs, maintain mapping from
original requested port to the relay server for proper cleanup and reuse.
- Add relayByOriginalPort map to track original port -> relay
- Update ensureChromeExtensionRelayServer to check both maps
- Update stopChromeExtensionRelayServer to clean up both mappings
- Stop function now uses the relay's actual bound port for auth cleanup
When the Chrome extension relay server fails to bind due to port
conflict (EADDRINUSE), automatically try alternative ports in the
dynamic range (49152-65535) instead of failing immediately.
This resolves issues where stale processes hold onto port 18792
after gateway restarts or crashes.
Fixes potential issues related to #8926, #13867, #17584
The supervisor's child adapter always spawned with `detached: true`,
which creates a new process group. On Windows Scheduled Tasks (headless,
no console), this prevents stdout/stderr pipes from properly connecting,
causing all exec tool output to silently disappear.
The old exec path (pre-supervisor refactor) never used `detached: true`.
The regression was introduced in cd44a0d01 (refactor process spawning).
Changes:
- child.ts: set `detached: false` on Windows, keep `detached: true` on
POSIX (where it's needed to survive parent exit). Skip the no-detach
fallback on Windows since it's already the default.
- child.test.ts: platform-aware assertions for detached behavior.
Fixes#18035Fixes#17806
Address review feedback - when --json mode is used, the drift warning
was completely suppressed. Now it's included in the warnings array
of the DaemonActionResponse so programmatic consumers can surface it.
When the gateway token in config differs from the token embedded in the
service plist/unit file, restart will not apply the new token. This can
cause silent auth failures after OAuth token switches.
Changes:
- Add checkTokenDrift() to service-audit.ts
- Call it in runServiceRestart() before restarting
- Warn user with suggestion to run 'openclaw gateway install --force'
Closes#18018
When a cron job fires and completes within the same wall-clock second it
was scheduled for, the next-run computation could return undefined or the
same second, causing the scheduler to re-trigger the job hundreds of
times in a tight loop.
Two-layer fix:
1. computeJobNextRunAtMs: When computeNextRunAtMs returns undefined for a
cron-kind schedule (edge case where floored nowSecondMs matches the
schedule), retry with the ceiling (next second) as reference time.
This ensures we always get the next valid occurrence.
2. applyJobResult: Add MIN_REFIRE_GAP_MS (2s) safety net for cron-kind
jobs. After a successful run, nextRunAtMs is guaranteed to be at
least 2s in the future. This breaks any remaining spin-loop edge
cases without affecting normal daily/hourly schedules (where the
natural next run is hours/days away).
Fixes#17821
The webchat UI rendered [[reply_to_current]], [[reply_to:<id>]], and
[[audio_as_voice]] tags as literal text because extractText() passed
assistant content through without stripping inline directives.
Add stripDirectiveTags() to the UI chat layer and apply it to all three
extractText code paths (string content, content array, .text property)
for assistant messages only. Regex mirrors src/utils/directive-tags.ts.
Fixes#18079
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.
This commit adds four layers of defense:
1. **In-process lock watchdog** (session-write-lock.ts)
- Track acquiredAt timestamp on each held lock
- 60-second interval timer checks all held locks
- Auto-releases any lock held longer than maxHoldMs (default 5 min)
- Catches the hung-API-call case that try/finally cannot
2. **Gateway startup cleanup** (server-startup.ts)
- On boot, scan all agent session directories for *.jsonl.lock files
- Remove locks with dead PIDs or older than staleMs (30 min)
- Log each cleaned lock for diagnostics
3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
- New health check scans for .jsonl.lock files
- Reports PID status and age of each lock found
- In --fix mode, removes stale locks automatically
4. **Transcript error entry on API failure** (attempt.ts)
- When promptError is set, write an error marker to the session
transcript before releasing the lock
- Preserves conversation history even on model API failures
Closes#18060
Add support for Z.AI's native tool_stream parameter to enable real-time
visibility into model reasoning and tool call execution.
- Automatically inject tool_stream=true for zai/z-ai providers
- Allow disabling via params.tool_stream: false in model config
- Follows existing pattern of OpenRouter and OpenAI wrappers
This enables Z.AI API features described in:
https://docs.z.ai/api-reference#streaming
AI-assisted: Claude (OpenClaw agent) helped write this implementation.
Testing: lightly tested (code review + pattern matching existing wrappers)
Closes#18135
Synchronous hook that lets plugins inspect and optionally block messages
before they are written to the session JSONL file. Primary use case is
private mode... when enabled, the plugin returns { block: true } and the
message never gets persisted.
The hook runs on the hot path (synchronous, like tool_result_persist).
Handlers execute sequentially in priority order. If any handler returns
{ block: true }, the write is skipped immediately. Handlers can also
return a modified message to write instead of the original.
Changes:
- src/plugins/types.ts: add hook name, event/result types, handler map entry
- src/plugins/hooks.ts: add runBeforeMessageWrite() following tool_result_persist pattern
- src/agents/session-tool-result-guard.ts: invoke hook before every originalAppend() call
- src/agents/session-tool-result-guard-wrapper.ts: wire hook runner to the guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows, fs.promises.writeFile truncates the target file to 0 bytes
before writing. Since loadSessionStore reads the file synchronously
without holding the write lock, a concurrent read can observe the empty
file, fail to parse it, and fall through to an empty store — causing the
agent to lose its session context.
Changes:
- saveSessionStoreUnlocked (Windows path): write to a temp file first,
then rename it onto the target. If rename fails due to file locking,
retry 3 times with backoff, then fall back to copyFile (which
overwrites in-place without truncating to 0 bytes).
- loadSessionStore: on Windows, retry up to 3 times with 50ms
synchronous backoff (via Atomics.wait) when the file is empty or
unparseable, giving the writer time to finish. SharedArrayBuffer is
allocated once and reused across retry attempts.
Treat normal process exits (even with non-zero codes) as completed tool results.
This prevents standard exit codes (like grep exit 1) from being surfaced
as 'Tool Failure' warnings in the UI. The exit code is still appended
to the tool output for assistant awareness.
The gateway's system-presence.ts was not detecting the version when
OpenClaw is run as a launchd service, because the daemon-runtime.ts
sets OPENCLAW_SERVICE_VERSION but system-presence.ts only checked
OPENCLAW_VERSION and npm_package_version.
This caused 'openclaw status' to show 'unknown' for the version.
Issue: #18456🤖 AI-assisted (lightly tested)
Qwen 3 (and potentially other reasoning-capable models served via Ollama)
returns its final answer in a `reasoning` field with an empty `content`
field. This causes blank/empty responses since OpenClaw only reads `content`.
Changes:
- Add `reasoning?` to OllamaChatResponse message type
- Fall back to `reasoning` when `content` is empty in buildAssistantMessage
- Accumulate `reasoning` chunks during streaming when `content` is empty
This allows Qwen 3 to work correctly both with and without /no_think mode.
downgradeOpenAIReasoningBlocks was only called on model change, but
orphaned reasoning items (e.g. from an aborted stream) can exist without
a model switch and cause a 400 from the OpenAI Responses API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>