* fix(browser): hot-reload profiles added after gateway start (#4841)
* style: format files with oxfmt
* Fix hot-reload stale config fields bug in forProfile
* Fix test order-dependency in hot-reload profiles test
* Fix mock reset order to prevent stale cfgProfiles
* Fix config cache blocking hot-reload by clearing cache before loadConfig
* test: improve hot-reload test to properly exercise config cache
- Add simulated cache behavior in mock
- Prime cache before mutating config
- Verify stale value without clearConfigCache
- Verify fresh value after hot-reload
Addresses review comment about test not exercising cache
* test: add hot-reload tests for browser profiles in server context.
* fix(browser): optimize profile hot-reload to avoid global cache clear
* fix(browser): remove unused loadConfig import
* fix(test): execute resetModules before test setup
* feat: implement browser server context with profile hot-reloading and tab management.
* fix(browser): harden profile hot-reload and shutdown cleanup
* test(browser): use toSorted in known-profile names test
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: ensure CLI exits after command completion
The CLI process would hang indefinitely after commands like
`openclaw gateway restart` completed successfully. Two root causes:
1. `runCli()` returned without calling `process.exit()` after
`program.parseAsync()` resolved, and Commander.js does not
force-exit the process.
2. `daemon-cli/register.ts` eagerly called `createDefaultDeps()`
which imported all messaging-provider modules, creating persistent
event-loop handles that prevented natural Node exit.
Changes:
- Add `flushAndExit()` helper that drains stdout/stderr before calling
`process.exit()`, preventing truncated piped output in CI/scripts.
- Call `flushAndExit()` after both `tryRouteCli()` and
`program.parseAsync()` resolve.
- Remove unnecessary `void createDefaultDeps()` from daemon-cli
registration — daemon lifecycle commands never use messaging deps.
- Make `serveAcpGateway()` return a promise that resolves on
intentional shutdown (SIGINT/SIGTERM), so `openclaw acp` blocks
`parseAsync` for the bridge lifetime and exits cleanly on signal.
- Handle the returned promise in the standalone main-module entry
point to avoid unhandled rejections.
Fixes#12904
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: refactor CLI lifecycle and lazy outbound deps (#12906) (thanks @DrCrinkle)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: defer gateway restart until all replies are sent
Fixes a race condition where gateway config changes (e.g., enabling
plugins via iMessage) trigger an immediate SIGUSR1 restart, killing the
iMessage RPC connection before replies are delivered.
Both restart paths (config watcher and RPC-triggered) now defer until
all queued operations, pending replies, and embedded agent runs complete
(polling every 500ms, 30s timeout). A shared emitGatewayRestart() guard
prevents double SIGUSR1 when both paths fire simultaneously.
Key changes:
- Dispatcher registry tracks active reply dispatchers globally
- markComplete() called in finally block for guaranteed cleanup
- Pre-restart deferral hook registered at gateway startup
- Centralized extractDeliveryInfo() for session key parsing
- Post-restart sentinel messages delivered directly (not via agent)
- config-patch distinguished from config-apply in sentinel kind
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: single-source gateway restart authorization
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Fixes#15692
The previous fix was too broad — it removed the relay for ALL isolated jobs.
This broke backwards compatibility for jobs without explicit delivery config.
The correct behavior is:
- If job.delivery exists → isolated runner handles it via runSubagentAnnounceFlow
- If only legacy payload.deliver fields → relay to main if requested (original behavior)
This addresses Greptile's review feedback about runIsolatedAgentJob being an
injected dependency that might not call runSubagentAnnounceFlow.
Uses resolveCronDeliveryPlan().source to distinguish between explicit delivery
config and legacy payload-only jobs.
When an isolated cron job delivers its output via deliverOutboundPayloads
or the subagent announce flow, the finish handler in executeJobCore
unconditionally posts a summary to the main agent session and wakes it
via requestHeartbeatNow. The main agent then generates a second response
that is also delivered to the target channel, resulting in duplicate
messages with different content.
Add a `delivered` flag to RunCronAgentTurnResult that is set to true
when the isolated run successfully delivers its output. In executeJobCore,
skip the enqueueSystemEvent + requestHeartbeatNow call when the flag is
set, preventing the main agent from waking up and double-posting.
Fixes#15692
* fix(gateway): normalize session key casing to prevent ghost sessions on Linux
On case-sensitive filesystems (Linux), mixed-case session keys like
agent:ops:MySession and agent:ops:mysession resolve to different store
entries, creating ghost duplicates that never converge.
Core changes in session-utils.ts:
- resolveSessionStoreKey: lowercase all session key components
- canonicalizeSpawnedByForAgent: accept cfg, resolve main-alias references
via canonicalizeMainSessionAlias after lowercasing
- loadSessionEntry: return legacyKey only when it differs from canonicalKey
- resolveGatewaySessionStoreTarget: scan store for case-insensitive matches;
add optional scanLegacyKeys param to skip disk reads for read-only callers
- Export findStoreKeysIgnoreCase for use by write-path consumers
- Compare global/unknown sentinels case-insensitively in all canonicalization
functions
sessions-resolve.ts:
- Make resolveSessionKeyFromResolveParams async for inline migration
- Check canonical key first (fast path), then fall back to legacy scan
- Delete ALL legacy case-variant keys in a single updateSessionStore pass
Fixes#12603
* fix(gateway): propagate canonical keys and clean up all case variants on write paths
- agent.ts: use canonicalizeSpawnedByForAgent (with cfg) instead of raw
toLowerCase; use findStoreKeysIgnoreCase to delete all legacy variants
on store write; pass canonicalKey to addChatRun, registerAgentRunContext,
resolveSendPolicy, and agentCommand
- sessions.ts: replace single-key migration with full case-variant cleanup
via findStoreKeysIgnoreCase in patch/reset/delete/compact handlers; add
case-insensitive fallback in preview (store already loaded); make
sessions.resolve handler async; pass scanLegacyKeys: false in preview
- server-node-events.ts: use findStoreKeysIgnoreCase to clean all legacy
variants on voice.transcript and agent.request write paths; pass
canonicalKey to addChatRun and agentCommand
* test(gateway): add session key case-normalization tests
Cover the case-insensitive session key canonicalization logic:
- resolveSessionStoreKey normalizes mixed-case bare and prefixed keys
- resolveSessionStoreKey resolves mixed-case main aliases (MAIN, Main)
- resolveGatewaySessionStoreTarget includes legacy mixed-case store keys
- resolveGatewaySessionStoreTarget collects all case-variant duplicates
- resolveGatewaySessionStoreTarget finds legacy main alias keys with
customized mainKey configuration
All 5 tests fail before the production changes, pass after.
* fix: clean legacy session alias cleanup gaps (openclaw#12846) thanks @mcaxtr
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(agents): wait for agent idle before flushing pending tool results
When pi-agent-core's auto-retry mechanism handles overloaded/rate-limit
errors, it resolves waitForRetry() on assistant message receipt — before
tool execution completes in the retried agent loop. This causes the
attempt's finally block to call flushPendingToolResults() while tools
are still executing, inserting synthetic 'missing tool result' errors
and causing silent agent failures.
The fix adds a waitForIdle() call before the flush to ensure the agent's
retry loop (including tool execution) has fully completed.
Evidence from real session: tool call and synthetic error were only 53ms
apart — the tool never had a chance to execute before being flushed.
Root cause is in pi-agent-core's _resolveRetry() firing on message_end
instead of agent_end, but this workaround in OpenClaw prevents the
symptom without requiring an upstream fix.
Fixes#8643Fixes#13351
Refs #6682, #12595
* test: add tests for tool result flush race condition
Validates that:
- Real tool results are not replaced by synthetic errors when they arrive in time
- Flush correctly inserts synthetic errors for genuinely orphaned tool calls
- Flush is a no-op after real tool results have already been received
Refs #8643, #13748
* fix(agents): add waitForIdle to all flushPendingToolResults call sites
The original fix only covered the main run finally block, but there are
two additional call sites that can trigger flushPendingToolResults while
tools are still executing:
1. The catch block in attempt.ts (session setup error handler)
2. The finally block in compact.ts (compaction teardown)
Both now await agent.waitForIdle() with a 30s timeout before flushing,
matching the pattern already applied to the main finally block.
Production testing on VPS with debug logging confirmed these additional
paths can fire during sub-agent runs, producing spurious synthetic
'missing tool result' errors.
* fix(agents): centralize idle-wait flush and clear timeout handle
---------
Co-authored-by: Renue Development <dev@renuebyscience.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
- Adds `activity`, `status`, `activityType`, and `activityUrl` to Discord provider config schema.
- Implements a `ReadyListener` in `DiscordProvider` to apply these settings on connection.
- Solves the issue where `@buape/carbon` ignores initial presence options in constructor.
- Validated manually and via existing test suite.
* Browser/Security: constrain trace and download output paths to temp roots
* Changelog: remove advisory ID from pre-public security note
* Browser/Security: constrain trace and download output paths to temp roots
* Changelog: remove advisory ID from pre-public security note
* test(bluebubbles): align timeout status expectation to 408
* test(discord): remove unused race-condition counter in threading test
* test(bluebubbles): align timeout status expectation to 408