Fixes duplicate message processing in Slack DMs where both message.im
and app_mention events fire for the same message, causing:
- 2x token/credit usage per message
- 2x API calls
- Duplicate agent invocations with same runId
Root cause: app_mention events should only fire for channel mentions,
not DMs. Added channel_type check to skip im/mpim in app_mention handler.
Evidence of bug (from production logs):
- Same runId firing twice within 200-300ms
- Example: runId 13cd482c... at 20:32:42.699Z and 20:32:42.954Z
After fix:
- One message = one runId = one processing run
- 50% reduction in duplicate processing
Slack's Events API includes the parent message's files array in every
thread reply event payload. This caused OpenClaw to re-download and
attach the parent's files to every text-only thread reply, creating
ghost media attachments.
The fix filters out files that belong to the thread starter by comparing
file IDs. The resolveSlackThreadStarter result is already cached, so
this adds no extra API calls.
Closes#32203
The session-store cache used only mtime for invalidation. In fast CI
runs (especially under bun), test writes to the session store can
complete within the same filesystem mtime granularity (~1s on HFS+/ext4),
so the cache returns stale data. This caused non-deterministic failures
in model precedence tests where a session override written to disk was
not observed by the next loadSessionStore() call.
Fix: add file size as a secondary cache invalidation signal. The cache
now checks both mtimeMs and sizeBytes — if either differs from the
cached values, it reloads from disk.
Changes:
- cache-utils.ts: add getFileSizeBytes() helper
- sessions/store.ts: extend SessionStoreCacheEntry with sizeBytes field,
check size in cache-hit path, populate size on cache writes
- sessions.cache.test.ts: add regression test for same-mtime rewrite
When echoTranscript is enabled in tools.media.audio config, the
transcription text is sent back to the originating chat immediately
after successful audio transcription — before the agent processes it.
This lets users verify what was heard from their voice note.
Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
(string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: sendTranscriptEcho() helper that
resolves channel/to from ctx, guards on isDeliverableMessageChannel,
and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: help text for both new fields
- config/schema.labels.ts: labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
covering disabled/enabled/custom-format/no-audio/failed-transcription/
non-deliverable-channel/missing-from/OriginatingTo/delivery-failure
Default echoFormat: '📝 "{transcript}"'
Closes#32102
Expose audio transcription through the PluginRuntime so external
plugins (e.g. marmot) can use openclaw's media-understanding provider
framework without importing unexported internal modules.
The new transcribeAudioFile() wraps runCapability({capability: "audio"})
and reads provider/model/apiKey from tools.media.audio in the config,
matching the pattern used by the Discord VC implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase test audio file sizes to meet MIN_AUDIO_FILE_BYTES (1024) threshold
introduced by the skip-empty-audio feature. Fix localPathRoots in skip-tiny-audio
tests so temp files pass path validation. Remove undefined loadApply() call
in apply.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a minimum file size guard (MIN_AUDIO_FILE_BYTES = 1024) before
sending audio to transcription APIs. Files below this threshold are
almost certainly empty or corrupt and would cause unhelpful errors
from Whisper/Deepgram/Groq providers.
Changes:
- Add 'tooSmall' skip reason to MediaUnderstandingSkipError
- Add MIN_AUDIO_FILE_BYTES constant (1024 bytes) to defaults
- Guard both provider and CLI audio paths in runner.ts
- Add comprehensive tests for tiny, empty, and valid audio files
- Update existing test fixtures to use audio files above threshold
runProviderEntry now calls resolveProxyFetchFromEnv() and passes the
result as fetchFn to transcribeAudio/describeVideo, so media provider
API calls respect HTTPS_PROXY/HTTP_PROXY behind corporate proxies.
Move makeProxyFetch to src/infra/net/proxy-fetch.ts and add
resolveProxyFetchFromEnv which reads standard proxy env vars
(HTTPS_PROXY, HTTP_PROXY, and lowercase variants) and returns a
proxy-aware fetch via undici's EnvHttpProxyAgent. Telegram re-exports
from the shared location to avoid duplication.
The openai provider implements transcribeAudio via
transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities
array only declared ["image"]. This caused the media-understanding
runner to skip the openai provider when processing inbound audio
messages, resulting in raw audio files being passed to agents
instead of transcribed text.
Fix: Add "audio" to the capabilities array so the runner correctly
selects the openai provider for audio transcription.
Co-authored-by: Cursor <cursoragent@cursor.com>