* fix(gateway): drain active turns before restart to prevent message loss
On SIGUSR1 restart, the gateway now waits up to 30s for in-flight agent
turns to complete before tearing down the server. This prevents buffered
messages from being dropped when config.patch or update triggers a restart
while agents are mid-turn.
Changes:
- command-queue.ts: add getActiveTaskCount() and waitForActiveTasks()
helpers to track and wait on active lane tasks
- run-loop.ts: on restart signal, drain active tasks before server.close()
with a 30s timeout; extend force-exit timer accordingly
- command-queue.test.ts: update imports for new exports
Fixes #13883
* fix(queue): snapshot active tasks for restart drain
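A minimal sketch of the restart drain, assuming in-flight tasks are tracked as promises in a module-level set. Only getActiveTaskCount and waitForActiveTasks are named in the commit. runTracked, the `{ drained }` result shape and the set-based bookkeeping are hypothetical. waitForActiveTasks snapshots the tasks that are active when it is called, so work enqueued after the restart signal does not extend the wait.

```ts
// Hypothetical bookkeeping: every in-flight lane task is a promise in this set.
const activeTasks = new Set<Promise<unknown>>();

/** Start a task and track it until it settles (resolve or reject). */
export function runTracked<T>(task: () => Promise<T>): Promise<T> {
  const p = task();
  activeTasks.add(p);
  // Bookkeeping branch only; the caller still sees p's rejection.
  p.then(
    () => activeTasks.delete(p),
    () => activeTasks.delete(p),
  );
  return p;
}

export function getActiveTaskCount(): number {
  return activeTasks.size;
}

/**
 * Wait for a snapshot of the currently active tasks.
 * Resolves { drained: true } when they all settle, { drained: false } on timeout.
 */
export async function waitForActiveTasks(timeoutMs: number): Promise<{ drained: boolean }> {
  const snapshot = [...activeTasks];
  if (snapshot.length === 0) return { drained: true };
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ drained: boolean }>((resolve) => {
    timer = setTimeout(() => resolve({ drained: false }), timeoutMs);
  });
  const settled = Promise.allSettled(snapshot).then(() => ({ drained: true }));
  try {
    return await Promise.race([settled, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the process alive after a successful drain
  }
}
```

In the run loop, the restart path would then look roughly like `await waitForActiveTasks(30_000)` followed by `server.close()`, with the force-exit timer set longer than that 30s budget.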
---------
Co-authored-by: Elonito <0xRaini@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
When the gateway is installed as a macOS launch agent and no token is
configured, the service enters an infinite restart loop because launchd
does not inherit shell environment variables. Auto-generate a token
during `gateway install` when auth mode is `token` and no token exists,
matching the existing pattern in doctor.ts and configure.gateway.ts.
The token is persisted to the config file and embedded in the plist
EnvironmentVariables for belt-and-suspenders reliability.
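A sketch of the install-time check under simplified config shapes. ensureInstallToken and the OPENCLAW_GATEWAY_TOKEN variable name are hypothetical; the behavior (generate only when auth mode is `token` and no token exists, persist it, and mirror it into the plist EnvironmentVariables because launchd does not inherit the shell environment) follows the text above. The token comes from node:crypto randomBytes.

```ts
import { randomBytes } from "node:crypto";

// Hypothetical shapes; the real config and plist types differ.
interface GatewayAuthConfig {
  mode: "token" | "password" | "none";
  token?: string;
}

interface InstallPlan {
  auth: GatewayAuthConfig; // config to persist
  plistEnv: Record<string, string>; // EnvironmentVariables for the launch agent
  generated: boolean;
}

export function ensureInstallToken(
  auth: GatewayAuthConfig,
  envVarName = "OPENCLAW_GATEWAY_TOKEN", // assumed name
): InstallPlan {
  if (auth.mode !== "token") {
    return { auth, plistEnv: {}, generated: false };
  }
  const existing = auth.token?.trim();
  // 24 random bytes -> 48 hex characters.
  const token = existing || randomBytes(24).toString("hex");
  return {
    auth: { ...auth, token },
    plistEnv: { [envVarName]: token },
    generated: !existing,
  };
}
```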
Relates-to: #5103, #2433, #1690, #7749
* feat: add LiteLLM provider types, env var, credentials, and auth choice
Add litellm-api-key auth choice, LITELLM_API_KEY env var mapping,
setLitellmApiKey() credential storage, and LITELLM_DEFAULT_MODEL_REF.
* feat: add LiteLLM onboarding handler and provider config
Add applyLitellmProviderConfig which properly registers
models.providers.litellm with baseUrl, api type, and model definitions.
This fixes the critical bug from PR #6488 where the provider entry was
never created, causing model resolution to fail at runtime.
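A sketch of the guarantee applyLitellmProviderConfig has to provide: after onboarding, a models.providers.litellm entry actually exists. The model-definition shape, the `openai-completions` api value and the `http://localhost:4000` default base URL are assumptions for illustration, not values taken from the commit.

```ts
// Hypothetical config shapes around models.providers.litellm.
interface ModelDefinition { id: string; name: string }
interface ProviderConfig { baseUrl: string; api: string; apiKey?: string; models: ModelDefinition[] }
interface Config { models?: { providers?: Record<string, ProviderConfig> } }

export function applyLitellmProviderConfig(
  cfg: Config,
  opts: { baseUrl?: string; modelId: string },
): Config {
  const existing = cfg.models?.providers?.litellm;
  const models = existing?.models ?? [];
  const hasModel = models.some((m) => m.id === opts.modelId);
  const provider: ProviderConfig = {
    ...existing,
    baseUrl: opts.baseUrl ?? existing?.baseUrl ?? "http://localhost:4000", // assumed default
    api: existing?.api ?? "openai-completions", // assumed api identifier
    // Add the default model once; re-running onboarding must not duplicate it.
    models: hasModel ? models : [...models, { id: opts.modelId, name: opts.modelId }],
  };
  // Always write the provider entry, which is the step the earlier PR skipped.
  return {
    ...cfg,
    models: { ...cfg.models, providers: { ...cfg.models?.providers, litellm: provider } },
  };
}
```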
* docs: add LiteLLM provider documentation
Add setup guide covering onboarding, manual config, virtual keys,
model routing, and usage tracking. Link from provider index.
* docs: add LiteLLM to sidebar navigation in docs.json
Add providers/litellm to both English and Chinese provider page lists
so the docs page appears in the sidebar navigation.
* test: add LiteLLM non-interactive onboarding test
Wire up litellmApiKey flag inference and auth-choice handler for the
non-interactive onboarding path, and add an integration test covering
profile, model default, and credential storage.
* fix: register --litellm-api-key CLI flag and add preferred provider mapping
Wire up the missing Commander CLI option, action handler mapping, and
help text for --litellm-api-key. Add litellm-api-key to the preferred
provider map for consistency with other providers.
* fix: remove zh-CN sidebar entry for litellm (no localized page yet)
* style: format buildLitellmModelDefinition return type
* fix(onboarding): harden LiteLLM provider setup (#12823)
* refactor(onboarding): keep auth-choice provider dispatcher under size limit
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write
(configurable via session.maintenance.rotateBytes), rename it to
sessions.json.bak.{timestamp} and keep only the 3 most recent backups.
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
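The three mitigations can be sketched as pure functions over the parsed store. The SessionStore shape and the function names are hypothetical, and backup names are assumed to end in a numeric epoch-millisecond timestamp. The defaults mirror the text (30 days, 500 entries, 3 backups); the size check that triggers rotation is not shown.

```ts
// Hypothetical store shape: session key -> entry with an optional updatedAt (epoch ms).
interface SessionEntry {
  updatedAt?: number;
  [key: string]: unknown;
}
type SessionStore = Record<string, SessionEntry>;

const DAY_MS = 24 * 60 * 60 * 1000;

/** 1. Prune: drop entries whose updatedAt is older than pruneDays. */
export function pruneStaleEntries(store: SessionStore, pruneDays = 30, now = Date.now()): SessionStore {
  const cutoff = now - pruneDays * DAY_MS;
  // Entries without updatedAt are not treated as stale here; capping evicts them first.
  return Object.fromEntries(
    Object.entries(store).filter(([, e]) => e.updatedAt === undefined || e.updatedAt >= cutoff),
  );
}

/** 2. Cap: keep the maxEntries most recently updated entries. */
export function capEntryCount(store: SessionStore, maxEntries = 500): SessionStore {
  const entries = Object.entries(store);
  if (entries.length <= maxEntries) return store;
  // A missing updatedAt sorts as 0 (oldest), so those entries are dropped first.
  entries.sort(([, a], [, b]) => (b.updatedAt ?? 0) - (a.updatedAt ?? 0));
  return Object.fromEntries(entries.slice(0, maxEntries));
}

/** 3. Rotation: return backup names beyond the newest `keep`. */
export function backupsToDelete(names: string[], keep = 3): string[] {
  const stamp = (name: string) => Number(name.slice(name.lastIndexOf(".") + 1));
  return names
    .filter((name) => /^sessions\.json\.bak\.\d+$/.test(name))
    .sort((a, b) => stamp(b) - stamp(a))
    .slice(keep);
}
```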
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
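A sketch of the reaper's core decisions: parse the retention, throttle to one sweep per 5 minutes, and delete only keys shaped like `cron:<jobId>:run:<uuid>` whose updatedAt + retention < now. parseRetentionMs, the accepted unit suffixes and the module-level throttle variable are assumptions; in the real change the value is validated by Zod and the sweep is driven from the cron timer tick.

```ts
// Hypothetical session-store shape.
type SessionStore = Record<string, { updatedAt?: number }>;

const CRON_RUN_KEY = /^cron:[^:]+:run:[0-9a-f-]+$/i; // cron:<jobId>:run:<uuid>
const SWEEP_INTERVAL_MS = 5 * 60 * 1000;
const UNIT_MS = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 } as const;
type Unit = keyof typeof UNIT_MS;

/** false disables pruning; undefined falls back to the 24h default. */
export function parseRetentionMs(value: string | false | undefined): number | null {
  if (value === false) return null;
  const match = /^(\d+)\s*(ms|s|m|h|d)$/.exec((value ?? "24h").trim());
  if (!match) throw new Error(`invalid retention: ${value}`);
  return Number(match[1]) * UNIT_MS[match[2] as Unit];
}

let lastSweepAt = Number.NEGATIVE_INFINITY;

/** Delete expired cron run sessions in place and return the removed keys. */
export function sweepCronRunSessions(
  store: SessionStore,
  retention: string | false | undefined,
  now = Date.now(),
): string[] {
  const retentionMs = parseRetentionMs(retention);
  // Self-throttle: however often the timer ticks, sweep at most every 5 minutes.
  if (retentionMs === null || now - lastSweepAt < SWEEP_INTERVAL_MS) return [];
  lastSweepAt = now;
  const removed: string[] = [];
  for (const [key, entry] of Object.entries(store)) {
    if (CRON_RUN_KEY.test(key) && entry.updatedAt !== undefined && entry.updatedAt + retentionMs < now) {
      delete store[key];
      removed.push(key);
    }
  }
  return removed;
}
```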
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes #12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* refactor: consolidate duplicate utility functions
- Add escapeRegExp to src/utils.ts and remove 10 local duplicates
- Rename bash-tools clampNumber to clampWithDefault (different signature)
- Centralize formatError calls to use formatErrorMessage from infra/errors.ts
- Re-export formatErrorMessage from cli/cli-utils.ts to preserve API
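For reference, the conventional escapeRegExp body (the idiom MDN documents for escaping user input in a RegExp) is shown below; that src/utils.ts uses exactly this character class is an assumption.

```ts
/** Escape every character that has special meaning in a RegExp pattern. */
export function escapeRegExp(value: string): string {
  // "$&" re-inserts the matched character after a backslash.
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
```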
* refactor: consolidate remaining escapeRegExp duplicates
* refactor: consolidate sleep, stripAnsi, and clamp duplicates
* fix(paths): structurally resolve home dir to prevent Windows path bugs
Extract resolveRawHomeDir as a private function and gate the public
resolveEffectiveHomeDir through a single path.resolve() exit point.
This makes it structurally impossible for unresolved paths (missing
drive letter on Windows) to escape the function, regardless of how
many return paths exist in the raw lookup logic.
Simplify resolveRequiredHomeDir to only resolve the process.cwd()
fallback, since resolveEffectiveHomeDir now returns resolved values.
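A sketch of the single-exit-point structure, assuming a simplified lookup order (HOME, then USERPROFILE, then os.homedir()); the real resolveRawHomeDir may consult other sources. Because every successful result passes through path.resolve(), a relative or drive-relative value cannot leave resolveEffectiveHomeDir unresolved, no matter how many branches the raw lookup grows.

```ts
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Private raw lookup: may return anything, including relative paths.
function resolveRawHomeDir(env: Env): string | undefined {
  for (const candidate of [env.HOME, env.USERPROFILE, os.homedir()]) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}

/** The only public exit point: always absolute (or undefined). */
export function resolveEffectiveHomeDir(env: Env = process.env): string | undefined {
  const raw = resolveRawHomeDir(env);
  return raw ? path.resolve(raw) : undefined;
}

/** Only the cwd fallback still needs resolving here. */
export function resolveRequiredHomeDir(env: Env = process.env): string {
  return resolveEffectiveHomeDir(env) ?? path.resolve(process.cwd());
}
```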
Fix shortenMeta in tool-meta.ts: the colon-based split for file:line
patterns (e.g. file.txt:12) conflicts with Windows drive letters
(C:\...) because indexOf(":") matches the drive colon first.
shortenHomeInString already handles file:line patterns correctly via
split/join, so the colon split was both unnecessary and harmful.
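The drive-letter conflict is easy to reproduce. The commit removes the colon split entirely; the anchored splitTrailingLine below only illustrates why a first-colon split is wrong and is not what shortenMeta does now. Both helper names are hypothetical.

```ts
// First-colon split, as the removed code effectively did.
export function naiveSplit(meta: string): [string, string] {
  const i = meta.indexOf(":");
  return i === -1 ? [meta, ""] : [meta.slice(0, i), meta.slice(i + 1)];
}

// Anchored alternative: only a trailing ":<digits>" counts as a line number.
export function splitTrailingLine(meta: string): { file: string; line?: number } {
  const m = /^(.*):(\d+)$/.exec(meta);
  return m ? { file: m[1] ?? meta, line: Number(m[2]) } : { file: meta };
}
```

With `C:\Users\me\file.txt:12`, the naive split stops at the drive colon and yields just `C`.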
Update test assertions across all affected files to use path.resolve()
in expected values and input strings so they match the now-correct
resolved output on both Unix and Windows.
Fixes #12119
* fix(changelog): add paths Windows fix entry (#12125)
---------
Co-authored-by: Sebastian <19554889+sebslight@users.noreply.github.com>
* fix: use STATE_DIR instead of hardcoded ~/.openclaw for identity and canvas
device-identity.ts and canvas-host/server.ts used hardcoded
path.join(os.homedir(), '.openclaw', ...) ignoring OPENCLAW_STATE_DIR
env var and the resolveStateDir() logic from config/paths.ts.
This caused ~/.openclaw/identity and ~/.openclaw/canvas directories
to be created even when the state dir was overridden or located elsewhere.
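A sketch of the fix's shape: derive the identity and canvas directories from resolveStateDir() instead of hardcoding `~/.openclaw`. The resolveStateDir body here (honor OPENCLAW_STATE_DIR, else `~/.openclaw`) is a simplification of config/paths.ts, and resolveIdentityDir/resolveCanvasDir are hypothetical names.

```ts
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Simplified stand-in for resolveStateDir() from config/paths.ts.
export function resolveStateDir(env: Env = process.env, homedir: () => string = os.homedir): string {
  const override = env.OPENCLAW_STATE_DIR?.trim();
  return override ? path.resolve(override) : path.join(homedir(), ".openclaw");
}

// Before: path.join(os.homedir(), ".openclaw", "identity") ignored the override.
export function resolveIdentityDir(env: Env = process.env): string {
  return path.join(resolveStateDir(env), "identity");
}

export function resolveCanvasDir(env: Env = process.env): string {
  return path.join(resolveStateDir(env), "canvas");
}
```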
* fix: format and remove duplicate imports
* fix: scope state-dir patch + add regression tests (#4824) (thanks @kossoy)
* fix: align state-dir fallbacks in hooks and agent paths (#4824) (thanks @kossoy)
---------
Co-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com>
* fix(gateway): use LAN IP for WebSocket/probe URLs when bind=lan (#11329)
When gateway.bind=lan, the HTTP server correctly binds to 0.0.0.0
(all interfaces), but WebSocket connection URLs, probe targets, and
Control UI links were hardcoded to 127.0.0.1. This caused CLI commands
and status probes to show localhost-only URLs even in LAN mode, and
made onboarding display misleading connection info.
- Add pickPrimaryLanIPv4() to gateway/net.ts to detect the machine's
primary LAN IPv4 address (prefers en0/eth0, falls back to any
external interface)
- Update pickProbeHostForBind() to use LAN IP when bind=lan
- Update buildGatewayConnectionDetails() to use LAN IP and report
"local lan <ip>" as the URL source
- Update resolveControlUiLinks() to return LAN-accessible URLs
- Update probe note in status.gather.ts to reflect new behavior
- Add tests for pickPrimaryLanIPv4 and bind=lan URL resolution
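A sketch of the interface selection described above, with os.networkInterfaces() injectable for tests. The NetIf type, the `loopback`/`lan` bind values and the fallback to 127.0.0.1 when no LAN address exists are assumptions; only the en0/eth0 preference and the fall back to any external interface come from the commit.

```ts
import * as os from "node:os";

interface NetIf { address: string; family: string | number; internal: boolean }
type Interfaces = Record<string, NetIf[] | undefined>;

const PREFERRED = ["en0", "eth0"];

function isExternalIPv4(info: NetIf): boolean {
  // Some early Node 18 releases reported family as the number 4 instead of "IPv4".
  return !info.internal && (info.family === "IPv4" || info.family === 4);
}

export function pickPrimaryLanIPv4(ifaces: Interfaces = os.networkInterfaces()): string | undefined {
  for (const name of PREFERRED) {
    const hit = ifaces[name]?.find(isExternalIPv4);
    if (hit) return hit.address;
  }
  for (const list of Object.values(ifaces)) {
    const hit = list?.find(isExternalIPv4);
    if (hit) return hit.address;
  }
  return undefined;
}

export function pickProbeHostForBind(bind: "loopback" | "lan", ifaces?: Interfaces): string {
  if (bind === "lan") return pickPrimaryLanIPv4(ifaces) ?? "127.0.0.1";
  return "127.0.0.1";
}
```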
Closes #11329
Co-authored-by: Cursor <cursoragent@cursor.com>
* test: move vi.restoreAllMocks to afterEach in pickPrimaryLanIPv4
Per review feedback: avoid calling vi.restoreAllMocks() inside
individual tests as it restores all spies globally and can cause
ordering issues. Use afterEach in the describe block instead.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Changelog: note LAN bind URLs fix (#11448) (thanks @AnonO6)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* refactor: update cron job wake mode and run mode handling
- Changed default wake mode from 'next-heartbeat' to 'now' in CronJobEditor and related CLI commands.
- Updated cron-tool tests to reflect changes in run mode, introducing 'due' and 'force' options.
- Enhanced cron-tool logic to handle new run modes and ensure compatibility with existing job structures.
- Added new tests for delivery plan consistency and job execution behavior under various conditions.
- Improved normalization functions to handle wake mode and session target casing.
This refactor makes 'now' the default wake mode, adds explicit 'due' and 'force' run modes to the cron tool, and normalizes wake mode and session target casing.
* test: enhance cron job functionality and UI
- Added tests to ensure the isolated agent correctly announces the final payload text when delivering messages via Telegram.
- Implemented a new function to pick the last deliverable payload from a list of delivery payloads.
- Enhanced the cron service to maintain legacy "every" jobs while minute cron jobs recompute schedules.
- Updated the cron store migration tests to verify the addition of anchorMs to legacy every schedules.
- Improved the UI for displaying cron job details, including job state and delivery information, with new styles and layout adjustments.
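Together these changes make isolated runs announce their final payload text, keep legacy "every" jobs working while minute cron jobs recompute their schedules, and show more job state in the UI.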
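With an anchorMs on legacy "every" schedules, the next run can be computed arithmetically from the anchor, so the cadence does not drift across restarts. This is a hypothetical sketch: computeNextEveryRun, the fallback anchor (for example a job's creation time) and the "strictly after now" rule are assumptions.

```ts
// Fires at anchorMs, anchorMs + everyMs, anchorMs + 2 * everyMs, ...
export interface EverySchedule { kind: "every"; everyMs: number; anchorMs?: number }

export function computeNextEveryRun(s: EverySchedule, nowMs: number, fallbackAnchorMs: number): number {
  if (s.everyMs <= 0) throw new Error("everyMs must be positive");
  const anchor = s.anchorMs ?? fallbackAnchorMs;
  if (nowMs < anchor) return anchor; // first slot has not arrived yet
  // Number of whole intervals already elapsed, plus one for the next slot.
  const steps = Math.floor((nowMs - anchor) / s.everyMs) + 1;
  return anchor + steps * s.everyMs;
}
```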
* test: enhance sessions thinking level handling
- Added tests to verify that the correct thinking levels are applied during session spawning.
- Updated the sessions-spawn-tool to include a new parameter for overriding thinking levels.
- Enhanced the UI to support additional thinking levels, including "xhigh" and "full", and improved the handling of current options in dropdowns.
These changes let session spawning override the thinking level and expose the additional levels in the UI.
* feat: enhance session management and cron job functionality
- Introduced passthrough arguments in the test-parallel script to allow for flexible command-line options.
- Updated session handling to hide cron run alias session keys from the sessions list, improving clarity.
- Enhanced the cron service to accurately record job start times and durations, ensuring better tracking of job execution.
- Added tests to verify the correct behavior of the cron service under various conditions, including zero-delay timers.
These changes declutter the sessions list and make recorded cron start times and durations accurate.
* feat: implement job running state checks in cron service
- Added functionality to prevent manual job runs if a job is already in progress, enhancing job management.
- Updated the `isJobDue` function to include checks for running jobs, ensuring accurate scheduling.
- Enhanced the `run` function to return a specific reason when a job is already running.
- Introduced a new test case to verify the behavior of forced manual runs during active job execution.
These changes ensure a cron job does not start a second run while one is already in progress.
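A sketch of the running-state guard, assuming a runningAtMs marker on job state. The job shape, tryRun and the reason strings are hypothetical. isJobDue returns false for a running job, and this sketch also refuses forced manual runs while a run is active (an assumption about how the force case is handled).

```ts
interface CronJobState { nextRunAtMs?: number; runningAtMs?: number }
interface CronJob { id: string; enabled: boolean; state: CronJobState }

type RunMode = "due" | "force";
type RunResult = { ran: true } | { ran: false; reason: "already-running" | "not-due" | "disabled" };

export function isJobDue(job: CronJob, nowMs: number): boolean {
  if (!job.enabled || job.state.runningAtMs !== undefined) return false;
  return job.state.nextRunAtMs !== undefined && job.state.nextRunAtMs <= nowMs;
}

export function tryRun(job: CronJob, mode: RunMode, nowMs: number): RunResult {
  // Checked first, so "force" cannot overlap an active run either.
  if (job.state.runningAtMs !== undefined) return { ran: false, reason: "already-running" };
  if (mode === "due" && !isJobDue(job, nowMs)) {
    return { ran: false, reason: job.enabled ? "not-due" : "disabled" };
  }
  job.state.runningAtMs = nowMs; // mark as running; cleared when the run finishes
  return { ran: true };
}
```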
* feat: add session ID and key to CronRunLogEntry model
- Introduced `sessionid` and `sessionkey` properties to the `CronRunLogEntry` struct for enhanced tracking of session-related information.
- Updated the initializer and Codable conformance to accommodate the new properties, ensuring proper serialization and deserialization.
These changes let each cron run log entry be traced back to the session it ran in.
* fix: improve session display name resolution
- Updated the `resolveSessionDisplayName` function to ensure that both label and displayName are trimmed and default to an empty string if not present.
- Enhanced the logic to prevent returning the key if it matches the label or displayName, improving clarity in session naming.
These changes aim to enhance the accuracy and usability of session display names in the UI.
* perf: skip cron store persist when idle timer tick produces no changes
recomputeNextRuns now returns a boolean indicating whether any job
state was mutated. The idle path in onTimer only persists when the
return value is true, eliminating unnecessary file writes every 60s
for far-future or idle schedules.
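A sketch of the idle-tick optimization with a deliberately simplified schedule (a fixed everyMs). Only "recomputeNextRuns returns a boolean" and "onTimer persists only when it is true" come from the commit; the Job shape and onIdleTick are hypothetical.

```ts
interface Job { id: string; schedule: { everyMs: number }; state: { nextRunAtMs?: number } }

/** Returns true only if some job's state was actually mutated. */
export function recomputeNextRuns(jobs: Job[], nowMs: number): boolean {
  let changed = false;
  for (const job of jobs) {
    const current = job.state.nextRunAtMs;
    if (current !== undefined && current > nowMs) continue; // still scheduled in the future
    const next = nowMs + job.schedule.everyMs;
    if (next !== current) {
      job.state.nextRunAtMs = next;
      changed = true;
    }
  }
  return changed;
}

/** Idle path: skip the store write when nothing changed. */
export function onIdleTick(jobs: Job[], nowMs: number, persist: () => void): void {
  if (recomputeNextRuns(jobs, nowMs)) persist();
}
```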
* fix: prep for merge - explicit delivery mode migration, docs + changelog (#10776) (thanks @tyler6204)
* fix(cron): handle undefined sessionTarget in list output (#9649)
When sessionTarget is undefined, pad() would crash with "Cannot read
properties of undefined (reading 'trim')". Use '-' as the fallback value.
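A minimal reproduction of the fix, assuming pad() trims its input (which the error message implies). pad, formatRow and the column widths are hypothetical.

```ts
function pad(value: string, width: number): string {
  const trimmed = value.trim(); // throws if value is undefined at runtime
  return trimmed.length >= width ? trimmed : trimmed.padEnd(width);
}

interface CronListJob { name: string; sessionTarget?: string }

export function formatRow(job: CronListJob): string {
  // Fall back to "-" so pad() never receives undefined.
  return [pad(job.name, 12), pad(job.sessionTarget ?? "-", 10)].join(" ");
}
```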
* test(cron): add regression test for undefined sessionTarget (#9649)
Verifies that printCronList handles jobs with undefined sessionTarget
without crashing. Test fails on main branch, passes with the fix.
* fix: use correct CronSchedule format in tests (#9752) (thanks @lailoo)
Tests were using { kind: 'at', atMs: number } but the CronSchedule type
requires { kind: 'at', at: string } where 'at' is an ISO date string.
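The corrected test fixtures store an ISO 8601 string rather than epoch milliseconds. A sketch of converting between the two; only the `{ kind: 'at', at: string }` shape comes from the commit, and the helper names are hypothetical.

```ts
// Only the "at" variant of CronSchedule is modeled here.
type AtSchedule = { kind: "at"; at: string };

export function atScheduleFromMs(ms: number): AtSchedule {
  return { kind: "at", at: new Date(ms).toISOString() };
}

export function atScheduleMs(schedule: AtSchedule): number {
  const ms = Date.parse(schedule.at);
  if (Number.isNaN(ms)) throw new Error(`invalid ISO timestamp: ${schedule.at}`);
  return ms;
}
```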
---------
Co-authored-by: damaozi <1811866786@qq.com>
Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
- Replace inline completion logic with `checkShellCompletionStatus` and `ensureCompletionCacheExists`
- Auto-upgrade old slow dynamic patterns silently during update
- Auto-regenerate cache if profile exists but cache is missing
- Prompt to install if no completion is configured
- Export `resolveCompletionCachePath` and `completionCacheExists` for external use
- Update `installCompletion` to require cache existence (never use slow dynamic pattern)
- Add `usesSlowDynamicCompletion` to detect old `source <(...)` patterns
- Add `getShellProfilePath` helper for consistent profile path resolution
- Update `formatCompletionSourceLine` to always use cached file
- Add forceReload option to ensureLoaded to avoid stat I/O in normal
paths while still detecting cross-service writes in the timer path
- Post isolated job summary back to main session (restores the old
isolation.postToMainPrefix behavior via delivery model)
- Update legacy migration tests to check delivery.channel instead of
payload.channel (normalization now moves delivery fields to top-level)
- Remove legacy deliver/channel/to/bestEffortDeliver from payload schema
- Update protocol conformance test for delivery modes
- Regenerate GatewayModels.swift (isolation -> delivery)
- Enhanced the delivery configuration logic in CronJobEditor to explicitly set the bestEffort property based on job settings.
- Refactored the CLI command to streamline delivery object creation, ensuring proper handling of optional fields like channel and to.
- Improved code readability and maintainability by restructuring delivery assignment logic.
This keeps delivery settings consistent between the editor and the CLI.
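The shell-completion bullets at the start of the list above describe replacing slow `source <(...)` profile lines with a line that sources a cached script. A hypothetical sketch of that detection and rewrite: usesSlowDynamicCompletion and formatCompletionSourceLine are named in the list, but the signatures here, the binName parameter and upgradeSlowCompletion are assumptions.

```ts
function isSlowLine(line: string, binName: string): boolean {
  const t = line.trim();
  // e.g. `source <(mycli completion zsh)` re-runs the CLI every time a shell starts.
  return /^(source|\.)\s+<\(/.test(t) && t.includes(binName);
}

export function usesSlowDynamicCompletion(profileText: string, binName: string): boolean {
  return profileText.split("\n").some((line) => isSlowLine(line, binName));
}

export function formatCompletionSourceLine(cachePath: string): string {
  // Sourcing a pre-generated script avoids spawning the CLI at startup.
  return `source "${cachePath}"`;
}

export function upgradeSlowCompletion(profileText: string, binName: string, cachePath: string): string {
  return profileText
    .split("\n")
    .map((line) => (isSlowLine(line, binName) ? formatCompletionSourceLine(cachePath) : line))
    .join("\n");
}
```

In the flow the list describes, the cache file would be generated first (ensureCompletionCacheExists), since the rewritten line never falls back to the dynamic pattern.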
- Updated isolated cron jobs to support new delivery modes: `announce` and `none`, improving output management.
- Refactored job configuration to remove legacy fields and streamline delivery settings.
- Enhanced the `CronJobEditor` UI to reflect changes in delivery options, including a new segmented control for delivery mode selection.
- Updated documentation to clarify the new delivery configurations and their implications for job execution.
- Improved tests to validate the new delivery behavior and ensure backward compatibility with legacy settings.
Isolated jobs can now either announce their output or deliver nothing, while legacy delivery settings are still accepted.
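A sketch of the legacy-to-delivery normalization these bullets describe: legacy deliver/channel/to/bestEffortDeliver fields move off the payload into a top-level delivery object with mode `announce` or `none`, and optional fields are set only when present. The types, normalizeDelivery and the rule that an explicit `deliver` flag wins over inferred intent are assumptions.

```ts
interface LegacyPayload {
  kind: string;
  message?: string;
  deliver?: boolean;
  channel?: string;
  to?: string;
  bestEffortDeliver?: boolean;
}

interface Delivery { mode: "announce" | "none"; channel?: string; to?: string; bestEffort?: boolean }
interface NormalizedJob { payload: { kind: string; message?: string }; delivery: Delivery }

export function normalizeDelivery(payload: LegacyPayload): NormalizedJob {
  // Strip the legacy delivery fields from the payload.
  const { deliver, channel, to, bestEffortDeliver, ...rest } = payload;
  // An explicit deliver flag wins; otherwise a channel or recipient implies announce.
  const announce = deliver ?? (channel !== undefined || to !== undefined);
  const delivery: Delivery = { mode: announce ? "announce" : "none" };
  if (channel !== undefined) delivery.channel = channel;
  if (to !== undefined) delivery.to = to;
  if (bestEffortDeliver !== undefined) delivery.bestEffort = bestEffortDeliver;
  return { payload: rest, delivery };
}
```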