* feat(slack): populate thread session with existing thread history
When a new session is created for a Slack thread, fetch and inject
the full thread history as context. This preserves conversation
continuity so the bot knows what it previously said in the thread.
- Add resolveSlackThreadHistory() to fetch all thread messages
- Add ThreadHistoryBody to context payload
- Use thread history instead of just thread starter for new sessions
Fixes#4470
* chore: remove redundant comments
* fix: use threadContextNote in queue body
* fix(slack): address Greptile review feedback
- P0: Use thread session key (not base session key) for new-session check
This ensures thread history is injected when the thread session is new,
even if the base channel session already exists.
- P1: Fetch up to 200 messages and take the most recent N
Slack API returns messages in chronological order (oldest first).
Previously we took the first N, now we take the last N for relevant context.
- P1: Batch resolve user names with Promise.all
Avoid N sequential API calls when resolving user names in thread history.
- P2: Include file-only messages in thread history
Messages with attachments but no text are now included with a placeholder
like '[attached: image.png, document.pdf]'.
- P2: Add documentation about intentional 200-message fetch limit
Clarifies that we intentionally don't paginate; 200 covers most threads.
* style: add braces for curly lint rule
* feat(slack): add thread.initialHistoryLimit config option
Allow users to configure the maximum number of thread messages to fetch
when starting a new thread session. Defaults to 20. Set to 0 to disable
thread history fetching entirely.
This addresses the optional configuration request from #2608.
* chore: trigger CI
* fix(slack): ensure isNewSession=true on first thread turn
recordInboundSession() in prepare.ts creates the thread session entry
before session.ts reads the store, causing isNewSession to be false
on the very first user message in a thread. This prevented thread
context (history/starter) from being injected.
Add IsFirstThreadTurn flag to message context, set when
readSessionUpdatedAt() returns undefined for the thread session key.
session.ts uses this flag to force isNewSession=true.
* style: format prepare.ts for oxfmt
* fix: suppress InboundHistory/ThreadStarterBody when ThreadHistoryBody present (#13912)
When ThreadHistoryBody is fetched from the Slack API (conversations.replies),
it already contains pending messages and the thread starter. Passing both
InboundHistory and ThreadStarterBody alongside ThreadHistoryBody caused
duplicate content in the LLM context on new thread sessions.
Suppress InboundHistory and ThreadStarterBody when ThreadHistoryBody is
present, since it is a strict superset of both.
* remove verbose comment
* fix(slack): paginate thread history context fetch
* fix(slack): wire session file path options after main merge
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: replace file-based session store lock with in-process Promise chain mutex
Node.js is single-threaded, so file-based locking (open('wx') + polling +
stale eviction) is unnecessary and causes timeouts under heavy session load.
Replace with a simple per-storePath Promise chain that serializes access
without any filesystem overhead.
In a 1159-session environment over 3 hours:
- Lock timeouts: 25
- Stuck sessions: 157 (max 1031s, avg 388s)
- Slow listeners: 39 (max 265s, avg 70s)
Root cause: during sessions.json file I/O, await yields control and other
lock requests hit the 10s timeout waiting for the .lock file to be released.
* test: add comprehensive tests for Promise chain mutex lock
- Concurrent access serialization (10 parallel writers, counter integrity)
- Error resilience (single & multiple consecutive throws don't poison queue)
- Independent storePath parallelism (different paths run concurrently)
- LOCK_QUEUES cleanup after completion and after errors
- No .lock file created on disk
Also fix: store caught promise in LOCK_QUEUES to avoid unhandled rejection
warnings when queued fn() throws.
* fix: add timeout to Promise chain mutex to prevent infinite hangs on Windows
* fix(session-store): enforce strict queue timeout + cross-process lock
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
The newly added 'resolved' field contains secrets after ${ENV}
substitution. This commit ensures redactConfigSnapshot also redacts
the resolved field to prevent credential leaks in config.get responses.
The initial fix using snapshot.parsed broke configs with $include directives.
This commit adds a new 'resolved' field to ConfigFileSnapshot that contains
the config after $include and ${ENV} substitution but BEFORE runtime defaults
are applied. This is now used by config set/unset to avoid:
1. Breaking configs with $include directives
2. Leaking runtime defaults into the written config file
Also removes applyModelDefaults from writeConfigFile since runtime defaults
should only be applied when loading, not when writing.
* fix: exclude maxTokens and token-count fields from config redaction
The /token/i regex in SENSITIVE_KEY_PATTERNS falsely matched fields like
maxTokens, maxOutputTokens, maxCompletionTokens etc. These are numeric
config fields for token counts, not sensitive credentials.
Added a whitelist (SENSITIVE_KEY_WHITELIST) that explicitly excludes
known token-count field names from redaction. This prevents config
corruption when maxTokens gets replaced with __OPENCLAW_REDACTED__
during config round-trips.
Fixes#13236
* fix: honor deleteAfterRun for one-shot 'at' jobs with 'skipped' status
Previously, deleteAfterRun only triggered when result.status was 'ok'.
For one-shot 'at' jobs, a 'skipped' status (e.g. empty heartbeat file)
would leave the job in state but disabled, never getting cleaned up.
Now deleteAfterRun also triggers on 'skipped' status for 'at' jobs,
since a skipped one-shot job has no meaningful retry path.
Fixes#13249
* Cron: format timer.ts
---------
Co-authored-by: nice03 <niceyslee@gmail.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* Config: reload dotenv before env substitution on runtime loads
* Test: isolate config env var regression from host state env
* fix: keep dotenv vars resolvable on runtime config reloads (#12748) (thanks @rodrigouroz)
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* TypeScript: add extensions to tsconfig and fix type errors
- Add extensions/**/* to tsconfig.json includes
- Export ProviderAuthResult, AnyAgentTool from plugin-sdk
- Fix optional chaining for messageActions across channels
- Add missing type imports (MSTeamsConfig, GroupPolicy, etc.)
- Add type annotations for provider auth handlers
- Fix undici/fetch type compatibility in zalo proxy
- Correct ChannelAccountSnapshot property usage
- Add type casts for tool registrations
- Extract usage view styles and types to separate files
* TypeScript: fix optional debug calls and handleAction guards
Add xAI's Grok as a new web_search provider alongside Brave and Perplexity.
Uses the xAI /v1/responses API with tools: [{type: "web_search"}].
Configuration:
- tools.web.search.provider: "grok"
- tools.web.search.grok.apiKey or XAI_API_KEY env var
- tools.web.search.grok.model (default: grok-4-1-fast)
- tools.web.search.grok.inlineCitations (optional, embeds markdown links)
Returns AI-synthesized answers with citations similar to Perplexity.
* test: normalize paths in OPENCLAW_HOME tests for cross-platform support
* test: normalize paths in Nix integration tests for cross-platform support
* test: remove unnecessary Windows skip from pi-embedded-runner test
* test: fix nix integration tests for path.resolve behavior
* fix(paths): structurally resolve home dir to prevent Windows path bugs
Extract resolveRawHomeDir as a private function and gate the public
resolveEffectiveHomeDir through a single path.resolve() exit point.
This makes it structurally impossible for unresolved paths (missing
drive letter on Windows) to escape the function, regardless of how
many return paths exist in the raw lookup logic.
Simplify resolveRequiredHomeDir to only resolve the process.cwd()
fallback, since resolveEffectiveHomeDir now returns resolved values.
Fix shortenMeta in tool-meta.ts: the colon-based split for file:line
patterns (e.g. file.txt:12) conflicts with Windows drive letters
(C:\...) because indexOf(":") matches the drive colon first.
shortenHomeInString already handles file:line patterns correctly via
split/join, so the colon split was both unnecessary and harmful.
Update test assertions across all affected files to use path.resolve()
in expected values and input strings so they match the now-correct
resolved output on both Unix and Windows.
Fixes#12119
* fix(changelog): add paths Windows fix entry (#12125)
---------
Co-authored-by: Sebastian <19554889+sebslight@users.noreply.github.com>
* fix: use .js extension for ESM imports of RoutePeerKind
The imports incorrectly used .ts extension which doesn't resolve
with moduleResolution: NodeNext. Changed to .js and added 'type'
import modifier.
* fix tsconfig
* refactor: unify peer kind to ChatType, rename dm to direct
- Replace RoutePeerKind with ChatType throughout codebase
- Change 'dm' literal values to 'direct' in routing/session keys
- Keep backward compat: normalizeChatType accepts 'dm' -> 'direct'
- Add ChatType export to plugin-sdk, deprecate RoutePeerKind
- Update session key parsing to accept both 'dm' and 'direct' markers
- Update all channel monitors and extensions to use ChatType
BREAKING CHANGE: Session keys now use 'direct' instead of 'dm'.
Existing 'dm' keys still work via backward compat layer.
* fix tests
* test: update session key expectations for dmdirect migration
- Fix test expectations to expect :direct: in generated output
- Add explicit backward compat test for normalizeChatType('dm')
- Keep input test data with :dm: keys to verify backward compat
* fix: accept legacy 'dm' in session key parsing for backward compat
getDmHistoryLimitFromSessionKey now accepts both :dm: and :direct:
to ensure old session keys continue to work correctly.
* test: add explicit backward compat tests for dmdirect migration
- session-key.test.ts: verify both :dm: and :direct: keys are valid
- getDmHistoryLimitFromSessionKey: verify both formats work
* feat: backward compat for resetByType.dm config key
* test: skip unix-path Nix tests on Windows
Closes#5308
When users configure maxTokens larger than contextWindow (e.g., maxTokens: 40960
with contextWindow: 32768), the model may fail silently. This fix clamps
maxTokens to be at most contextWindow, preventing such invalid configurations.
* feat(bluebubbles): auto-strip markdown from outbound messages (#7402)
* fix(security): add timeout to webhook body reading (#6762)
Adds 30-second timeout to readBody() in voice-call, bluebubbles, and nostr
webhook handlers. Prevents Slow-Loris DoS (CWE-400, CVSS 7.5).
Merged with existing maxBytes protection in voice-call.
* fix(security): unify Error objects and lint fixes in webhook timeouts (#6762)
* fix: prevent plugins from auto-enabling without user consent (#3961)
Changes default plugin enabled state from true to false in enablePluginEntry().
Preserves existing enabled:true values. Fixes#3932.
* fix: apply hierarchical mediaMaxMb config to all channels (#8749)
Generalizes resolveAttachmentMaxBytes() to use account → channel → global
config resolution for all channels, not just BlueBubbles. Fixes#7847.
* fix(bluebubbles): sanitize attachment filenames against header injection (#10333)
Strip ", \r, \n, and \\ from filenames after path.basename() to prevent
multipart Content-Disposition header injection (CWE-93, CVSS 5.4).
Also adds sanitization to setGroupIconBlueBubbles which had zero filename
sanitization.
* fix(lint): exclude extensions/ from Oxlint preflight check (#9313)
Extensions use PluginRuntime|null patterns that trigger
no-redundant-type-constituents because PluginRuntime resolves to any.
Excluding extensions/ from Oxlint unblocks user upgrades.
Re-applies the approach from closed PR #10087.
* fix(bluebubbles): add tempGuid to createNewChatWithMessage payload (#7745)
Non-Private-API mode (AppleScript) requires tempGuid in send payloads.
The main sendMessageBlueBubbles already had it, but createNewChatWithMessage
was missing it, causing 400 errors for new chat creation without Private API.
* fix: send stop-typing signal when run ends with NO_REPLY (#8785)
Adds onCleanup callback to the typing controller that fires when the
controller is cleaned up while typing was active (e.g., after NO_REPLY).
Channels using createTypingCallbacks automatically get stop-typing on
cleanup. This prevents the typing indicator from lingering in group chats
when the agent decides not to reply.
* fix(telegram): deduplicate skill commands in multi-agent setup (#5717)
Two fixes:
1. Skip duplicate workspace dirs when listing skill commands across agents.
Multiple agents sharing the same workspace would produce duplicate commands
with _2, _3 suffixes.
2. Clear stale commands via deleteMyCommands before registering new ones.
Commands from deleted skills now get cleaned up on restart.
* fix: add size limits to unbounded in-memory caches (#4948)
Adds max-size caps with oldest-entry eviction to prevent OOM in
long-running deployments:
- BlueBubbles serverInfoCache: 64 entries (already has TTL)
- Google Chat authCache: 32 entries
- Matrix directRoomCache: 1024 entries
- Discord presenceCache: 5000 entries per account
* fix: address review concerns (#11093)
- Chain deleteMyCommands → setMyCommands to prevent race condition (#5717)
- Rename enablePluginEntry to registerPluginEntry (now sets enabled: false)
- Add Slow-Loris timeout test for readJsonBody (#6023)
* feat(memory): add native Voyage AI embedding support with batching
Cherry-picked from PR #2519, resolved conflict in memory-search.ts
(hasRemote -> hasRemoteConfig rename + added voyage provider)
* fix(memory): optimize voyage batch memory usage with streaming and deduplicate code
Cherry-picked from PR #2519. Fixed lint error: changed this.runWithConcurrency
to use imported runWithConcurrency function after extraction to internal.ts
* fix(ollama): add streaming config and fix OLLAMA_API_KEY env var support
Adds configurable streaming parameter to model configuration and sets streaming
to false by default for Ollama models. This addresses the corrupted response
issue caused by upstream SDK bug badlogic/pi-mono#1205 where interleaved
content/reasoning deltas in streaming responses cause garbled output.
Changes:
- Add streaming param to AgentModelEntryConfig type
- Set streaming: false default for Ollama models
- Add OLLAMA_API_KEY to envMap (was missing, preventing env var auth)
- Document streaming configuration in Ollama provider docs
- Add tests for Ollama model configuration
Users can now configure streaming per-model and Ollama authentication
via OLLAMA_API_KEY environment variable works correctly.
Fixes#8839
Related: badlogic/pi-mono#1205
* docs(ollama): use gpt-oss:20b as primary example
Updates documentation to use gpt-oss:20b as the primary example model
since it supports tool calling. The model examples now show:
- gpt-oss:20b as the primary recommended model (tool-capable)
- llama3.3 and qwen2.5-coder:32b as additional options
This provides users with a clear, working example that supports
OpenClaw's tool calling features.
* chore: remove unused vi import from ollama test
* security: add skill/plugin code safety scanner module
* security: integrate skill scanner into security audit
* security: add pre-install code safety scan for plugins
* style: fix curly brace lint errors in skill-scanner.ts
* docs: add changelog entry for skill code safety scanner
* security: redact credentials from config.get gateway responses
The config.get gateway method returned the full config snapshot
including channel credentials (Discord tokens, Slack botToken/appToken,
Telegram botToken, Feishu appSecret, etc.), model provider API keys,
and gateway auth tokens in plaintext.
Any WebSocket client—including the unauthenticated Control UI when
dangerouslyDisableDeviceAuth is set—could read every secret.
This adds redactConfigSnapshot() which:
- Deep-walks the config object and masks any field whose key matches
token, password, secret, or apiKey patterns
- Uses the existing redactSensitiveText() to scrub the raw JSON5 source
- Preserves the hash for change detection
- Includes 15 test cases covering all channel types
* security: make gateway config writes return redacted values
* test: disable control UI by default in gateway server tests
* fix: redact credentials in gateway config APIs (#9858) (thanks @abdelsfane)
---------
Co-authored-by: George Pickett <gpickett00@gmail.com>