Commit Graph

453 Commits

Author SHA1 Message Date
Sid
8b8167d547 fix(agents): bypass pendingDescendantRuns guard for cron announce delivery (#35185)
* fix(agents): bypass pendingDescendantRuns guard for cron announce delivery

Standalone cron job completions were blocked from direct channel delivery
when the cron run had spawned subagents that were still registered as
pending. The pendingDescendantRuns guard exists for live orchestration
coordination and should not apply to fire-and-forget cron announce sends.

Thread the announceType through the delivery chain and skip both the
child-descendant and requester-descendant pending-run guards when the
announce originates from a cron job.

Closes #34966

* fix: ensure outbound session entry for cron announce with named agents (#32432)

Named agents may not have a session entry for their delivery target,
causing the announce flow to silently fail (delivered=false, no error).

Two fixes:
1. Call ensureOutboundSessionEntry when resolving the cron announce
   session key so downstream delivery can find channel metadata.
2. Fall back to direct outbound delivery when announce delivery fails
   to ensure cron output reaches the target channel.

Closes #32432

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard announce direct-delivery fallback against suppression leaks (#32432)

The `!delivered` fallback condition was too broad — it caught intentional
suppressions (active subagents, interim messages, SILENT_REPLY_TOKEN) in
addition to actual announce delivery failures.  Add an
`announceDeliveryWasAttempted` flag so the direct-delivery fallback only
fires when `runSubagentAnnounceFlow` was actually called and failed.

Also remove the redundant `if (route)` guard in
`resolveCronAnnounceSessionKey` since `resolved` being truthy guarantees
`route` is non-null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cron): harden announce synthesis follow-ups

---------

Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-04 21:31:33 -06:00
Madoka
63ce7c74bd fix(feishu): comprehensive reply mechanism — outbound replyToId forwarding + topic-aware reply targeting (#33789)
* fix(feishu): comprehensive reply mechanism fix — outbound replyToId forwarding + topic-aware reply targeting

- Forward replyToId from ChannelOutboundContext through sendText/sendMedia
  to sendMessageFeishu/sendMarkdownCardFeishu/sendMediaFeishu, enabling
  reply-to-message via the message tool.

- Fix group reply targeting: use ctx.messageId (triggering message) in
  normal groups to prevent silent topic thread creation (#32980). Preserve
  ctx.rootId targeting for topic-mode groups (group_topic/group_topic_sender)
  and groups with explicit replyInThread config.

- Add regression tests for both fixes.

Fixes #32980
Fixes #32958
Related #19784

* fix: normalize Feishu delivery.to before comparing with messaging tool targets

- Add normalizeDeliveryTarget helper to strip user:/chat: prefixes for Feishu
- Apply normalization in matchesMessagingToolDeliveryTarget before comparison
- This ensures cron duplicate suppression works when session uses prefixed targets
  (user:ou_xxx) but messaging tool extract uses normalized bare IDs (ou_xxx)

Fixes review comment on PR #32755

(cherry picked from commit fc20106f16ccc88a5f02e58922bb7b7999fe9dcd)

* fix(feishu): catch thrown SDK errors for withdrawn reply targets

The Feishu Lark SDK can throw exceptions (SDK errors with .code or
AxiosErrors with .response.data.code) for withdrawn/deleted reply
targets, in addition to returning error codes in the response object.

Wrap reply calls in sendMessageFeishu and sendCardFeishu with
try-catch to handle thrown withdrawn/not-found errors (230011,
231003) and fall back to client.im.message.create, matching the
existing response-level fallback behavior.

Also extract sendFallbackDirect helper to deduplicate the
direct-send fallback block across both functions.

Closes #33496

(cherry picked from commit ad0901aec103a2c52f186686cfaf5f8ba54b4a48)

* feishu: forward outbound reply target context

(cherry picked from commit c129a691fcf552a1cebe1e8a22ea8611ffc3b377)

* feishu extension: tighten reply target fallback semantics

(cherry picked from commit f85ec610f267020b66713c09e648ec004b2e26f1)

* fix(feishu): align synthesized fallback typing and changelog attribution

* test(feishu): cover group_topic_sender reply targeting

---------

Co-authored-by: Xu Zimo <xuzimojimmy@163.com>
Co-authored-by: Munem Hashmi <munem.hashmi@gmail.com>
Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-04 20:32:28 -06:00
Gustavo Madeira Santana
e4b4486a96 Agent: unify bootstrap truncation warning handling (#32769)
Merged via squash.

Prepared head SHA: 5d6d4ddfa620011e267d892b402751847d5ac0c3
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 16:28:38 -05:00
Peter Steinberger
5193189953 refactor(tests): dedupe cron store migration setup 2026-03-03 01:54:27 +00:00
Peter Steinberger
fbb88d5063 refactor(tests): dedupe isolated agent cron turn assertions 2026-03-03 01:54:27 +00:00
Peter Steinberger
61f29830bc fix(test): resolve upstream typing drift in feishu and cron suites 2026-03-03 01:44:21 +00:00
Peter Steinberger
47736e3432 refactor(test): extract cron issue-regression harness and frozen-time helper 2026-03-03 01:44:21 +00:00
David Rudduck
11e1363d2d feat(hooks): add trigger and channelId to plugin hook agent context (#28623)
* feat(hooks): add trigger and channelId to plugin hook agent context

Adds `trigger` and `channelId` fields to `PluginHookAgentContext` so
plugins can determine what initiated the agent run and which channel
it originated from, without session-key parsing or Redis bridging.

trigger values: "user", "heartbeat", "cron", "memory"
channelId values: "telegram", "discord", "whatsapp", etc.

Both fields are threaded through run.ts and attempt.ts hookCtx so all
hook phases receive them (before_model_resolve, before_prompt_build,
before_agent_start, llm_input, llm_output, agent_end).

channelId falls back from messageChannel to messageProvider when the
former is not set. followup-runner passes originatingChannel so queued
followup runs also carry channel context.

* docs(changelog): note hook context parity fix for #28623

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-02 17:39:20 -08:00
Peter Steinberger
0f5f20ee6b refactor(tests): dedupe cron delivered status assertions 2026-03-03 01:37:12 +00:00
HCL
586f057c24 fix(cron): let resolveOutboundTarget handle missing delivery target fallback
The cron delivery path short-circuits with an error when `toCandidate` is
falsy (line 151), before reaching `resolveOutboundTarget()` which provides
the `plugin.config.resolveDefaultTo()` fallback. The direct send path in
`targets.ts` already uses this fallback correctly.

Remove the early `!toCandidate` exit so that `resolveOutboundTarget()`
can attempt the plugin-provided default. Guard the WhatsApp allowFrom
override against falsy `toCandidate` to maintain existing behavior when
a target IS resolved.

Fixes #32355

Signed-off-by: HCL <chenglunhu@gmail.com>
2026-03-03 01:09:47 +00:00
Peter Steinberger
d7bafae387 perf(test): trim fixture and serialization overhead in integration suites 2026-03-03 01:09:07 +00:00
Peter Steinberger
282b107e99 test(perf): speed up cron, memory, and secrets hotspots 2026-03-03 00:43:01 +00:00
Peter Steinberger
6a42d09129 refactor: dedupe gateway config and infra flows 2026-03-03 00:15:14 +00:00
ningding97
4d19dc8671 test(cron): assert embedded model on last call to avoid bun ordering flake
Bun runs can trigger multiple embedded agent invocations in a single cron
turn (e.g. retries/fallbacks), making assertions against call[0] flaky.
Assert against the last invocation instead.
2026-03-02 23:36:13 +00:00
Peter Steinberger
f9cbcfca0d refactor: modularize slack/config/cron/daemon internals 2026-03-02 22:30:21 +00:00
Peter Steinberger
7253e91300 fix: strengthen cron heartbeat multi-payload suppression (#32131) (thanks @adhishthite) 2026-03-02 22:16:18 +00:00
Adhish
2330c71b63 fix(cron): suppress delivery when multi-payload response contains HEARTBEAT_OK
When a cron agent emits multiple text payloads (narration + tool
summaries) followed by a final HEARTBEAT_OK, the delivery suppression
check `isHeartbeatOnlyResponse` fails because it uses `.every()` —
requiring ALL payloads to be heartbeat tokens. In practice, agents
narrate their work before signaling nothing needs attention.

Fix: check if ANY payload contains HEARTBEAT_OK (`.some()`) while
preserving the media delivery exception (if any payload has media,
always deliver). This matches the semantic intent: HEARTBEAT_OK is
the agent's explicit signal that nothing needs user attention.

Real-world example: heartbeat agent returns 3 payloads:
1. "It's 12:49 AM — quiet hours. Let me run the checks quickly."
2. "Emails: Just 2 calendar invites. Not urgent."
3. "HEARTBEAT_OK"

Previously: all 3 delivered to Telegram. Now: correctly suppressed.

Related: #32013 (fixed a different HEARTBEAT_OK leak path via system
events in timer.ts)
2026-03-02 22:16:18 +00:00
Peter Steinberger
91dd89313a refactor(core): dedupe command, hook, and cron fixtures 2026-03-02 21:31:36 +00:00
Peter Steinberger
f7765bc151 perf(cron): cache schedule evaluators and stagger offsets 2026-03-02 20:19:10 +00:00
Peter Steinberger
b1c30f0ba9 refactor: dedupe cli config cron and install flows 2026-03-02 19:57:33 +00:00
bmendonca3
4cd04e4652 fix(cron): migrate legacy string schedule and command jobs 2026-03-02 19:55:19 +00:00
scoootscooob
a3c5d21b4d fix(cron): suppress HEARTBEAT_OK summary from leaking into main session (#32013)
When an isolated cron agent returns HEARTBEAT_OK (nothing to announce),
the direct delivery is correctly skipped, but the fallback path in
timer.ts still enqueues the summary as a system event to the main
session. Filter out heartbeat-only summaries using isCronSystemEvent
before enqueuing, so internal ack tokens never reach user conversations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 19:51:51 +00:00
Peter Steinberger
39afcee864 test(perf): trim cron and audit fixture overhead 2026-03-02 19:48:02 +00:00
Peter Steinberger
1616113170 perf(runtime): reduce cron persistence and logger overhead 2026-03-02 19:34:04 +00:00
Peter Steinberger
83ec545bed test(perf): trim repeated setup in cron memory and config suites 2026-03-02 19:16:46 +00:00
Peter Steinberger
99392f9868 chore: keep #31930 scoped to voice webhook path fix 2026-03-02 18:57:46 +00:00
Andrii Furmanets
662f389f45 Tests: isolate webhook path suite and reset cron auth state 2026-03-02 18:57:46 +00:00
scoootscooob
4030de6c73 fix(cron): move session reaper to finally block so it runs reliably (#31996)
* fix(cron): move session reaper to finally block so it runs reliably

The cron session reaper was placed inside the try block of onTimer(),
after job execution and state updates. If the locked persist section
threw, the reaper was skipped — causing isolated cron run sessions to
accumulate indefinitely in sessions.json.

Move the reaper into the finally block so it always executes after a
timer tick, regardless of whether job execution succeeded. The reaper
is already self-throttled (MIN_SWEEP_INTERVAL_MS = 5 min) so calling
it more reliably has no performance impact.

Closes #31946

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strengthen cron reaper failure-path coverage and changelog (#31996) (thanks @scoootscooob)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-03-02 18:38:59 +00:00
bboyyan
d94de5c4a1 fix(cron): normalize topic-qualified target.to in messaging tool suppress check (#29480)
* fix(cron): pass job.delivery.accountId through to delivery target resolution

* fix(cron): normalize topic-qualified target.to in messaging tool suppress check

When a cron job targets a Telegram forum topic (e.g. delivery.to =
"-1003597428309:topic:462"), delivery.to is stripped to the chatId
only by resolveOutboundTarget. However, the agent's message tool may
pass the full topic-qualified address as its target, causing
matchesMessagingToolDeliveryTarget to fail the equality check and not
suppress the tool send.

Strip the :topic:NNN suffix from target.to before comparing so the
suppress check works correctly for topic-bound cron deliveries.
Without this, the agent's message tool fires separately using the
announce session's accountId (often "default"), hitting 403 when
default bot is not in the multi-account target group.

* fix(cron): remove duplicate accountId keys after rebase

---------

Co-authored-by: jaxpkm <jaxpkm@jaxpkmdeMac-mini.local>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-02 10:32:06 -06:00
Glucksberg
09f49cd921 fix(cron): accept delivery mode "none" for sessionTarget="main" (#27431) (#28871) 2026-03-02 10:32:00 -06:00
Peter Steinberger
14fbd0e6b6 test(perf): reduce timer teardown overhead in cron issue regressions 2026-03-02 16:06:04 +00:00
Peter Steinberger
a3d2021eea test(cron): stabilize model precedence mocks in bun runs (#31594) 2026-03-02 15:47:21 +00:00
Peter Steinberger
998d477f5e test: stabilize cross-platform regression suites (#31594) 2026-03-02 15:47:21 +00:00
Evgeny Zislis
4b4ea5df8b feat(cron): add failure destination support to failed cron jobs (#31059)
* feat(cron): add failure destination support with webhook mode and bestEffort handling

Extends PR #24789 failure alerts with features from PR #29145:
- Add webhook delivery mode for failure alerts (mode: 'webhook')
- Add accountId support for multi-account channel configurations
- Add bestEffort handling to skip alerts when job has bestEffort=true
- Add separate failureDestination config (global + per-job in delivery)
- Add duplicate prevention (prevents sending to same as primary delivery)
- Add CLI flags: --failure-alert-mode, --failure-alert-account-id
- Add UI fields for new options in web cron editor

* fix(cron): merge failureAlert mode/accountId and preserve failureDestination on updates

- Fix mergeCronFailureAlert to merge mode and accountId fields
- Fix mergeCronDelivery to preserve failureDestination on updates
- Fix isSameDeliveryTarget to use 'announce' as default instead of 'none'
  to properly detect duplicates when delivery.mode is undefined

* fix(cron): validate webhook mode requires URL in resolveFailureDestination

When mode is 'webhook' but no 'to' URL is provided, return null
instead of creating an invalid plan that silently fails later.

* fix(cron): fail closed on webhook mode without URL and make failureDestination fields clearable

- sendCronFailureAlert: fail closed when mode is webhook but URL is missing
- mergeCronDelivery: use per-key presence checks so callers can clear
  nested failureDestination fields via cron.update

Note: protocol:check shows missing internalEvents in Swift models - this is
a pre-existing issue unrelated to these changes (upstream sync needed).

* fix(cron): use separate schema for failureDestination and fix type cast

- Create CronFailureDestinationSchema excluding after/cooldownMs fields
- Fix type cast in sendFailureNotificationAnnounce to use CronMessageChannel

* fix(cron): merge global failureDestination with partial job overrides

When job has partial failureDestination config, fall back to global
config for unset fields instead of treating it as a full override.

* fix(cron): avoid forcing announce mode and clear inherited to on mode change

- UI: only include mode in patch if explicitly set to non-default
- delivery.ts: clear inherited 'to' when job overrides mode, since URL
  semantics differ between announce and webhook modes

* fix(cron): preserve explicit to on mode override and always include mode in UI patches

- delivery.ts: preserve job-level explicit 'to' when overriding mode
- UI: always include mode in failureAlert patch so users can switch between announce/webhook

* fix(cron): allow clearing accountId and treat undefined global mode as announce

- UI: always include accountId in patch so users can clear it
- delivery.ts: treat undefined global mode as announce when comparing for clearing inherited 'to'

* Cron: harden failure destination routing and add regression coverage

* Cron: resolve failure destination review feedback

* Cron: drop unrelated timeout assertions from conflict resolution

* Cron: format cron CLI regression test

* Cron: align gateway cron test mock types

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-02 09:27:41 -06:00
Peter Steinberger
a905b6dabc test(perf): merge duplicate one-shot retry regression paths 2026-03-02 15:23:58 +00:00
Peter Steinberger
d212721df1 test(perf): merge forum-topic direct-delivery scenarios 2026-03-02 15:17:28 +00:00
Peter Steinberger
a469d00345 test(perf): reuse cron heartbeat delivery temp homes per suite 2026-03-02 15:14:17 +00:00
Peter Steinberger
3fb0ab7435 test(perf): tighten cron issue-regression timeout windows 2026-03-02 15:11:14 +00:00
Peter Steinberger
64ac790aa8 test(perf): reuse temp-home root in cron announce delivery suite 2026-03-02 15:08:35 +00:00
Peter Steinberger
c5f1cf3c3b test(perf): reuse isolated-agent temp home root per suite 2026-03-02 15:00:08 +00:00
Peter Steinberger
cd18472405 test(perf): trim redundant cron regression setup coverage 2026-03-02 14:25:49 +00:00
Peter Steinberger
2a2a9902d9 test(perf): merge isolated-agent model precedence cases 2026-03-02 14:24:32 +00:00
Peter Steinberger
5561a6b659 test(perf): dedupe isolated-agent delivery announce cases 2026-03-02 14:24:32 +00:00
Peter Steinberger
a05b8f47b1 test(perf): tighten cron regression timeout windows 2026-03-02 14:20:31 +00:00
Yasunori Morishima(盛島康徳)
be8930d6f9 fix: clear stale runningAtMs in cron.run() before already-running check (#17949)
Add recomputeNextRunsForMaintenance() call in run() so that stale
runningAtMs markers (from a crashed Phase-1 persist) are cleared by the
existing normalizeJobTickState logic before the already-running guard.

Without this, a manual cron.run() could be blocked for up to
STUCK_RUN_MS (2 hours) even though no job was actually running.

Fixes #17554

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:47:36 -06:00
Kate Chapman
6df8bd9741 fix(cron): wrap computeJobNextRunAtMs in try-catch inside applyJobResult (#30905)
* fix(cron): wrap computeJobNextRunAtMs in try-catch inside applyJobResult

Without this guard, if the croner library throws during schedule
computation (timezone/expression edge cases), the exception propagates
out of applyJobResult and the entire state update is lost — runningAtMs
never clears, lastRunAtMs never advances, nextRunAtMs never recomputes.
After STUCK_RUN_MS (2h), stuck detection clears runningAtMs and the job
re-fires, creating a ~2h repeat cycle instead of the intended schedule.

The sibling function recomputeJobNextRunAtMs in jobs.ts already wraps
computeJobNextRunAtMs in try-catch; this was an oversight in the
applyJobResult call sites.

Changes:
- Error-backoff path: catch and fall back to backoff-only schedule
- Success path: catch and fall through to the MIN_REFIRE_GAP_MS safety net
- applyOutcomeToStoredJob: log a warning when job not found after forceReload

* fix(cron): use recordScheduleComputeError in applyJobResult catch blocks

Address review feedback: the original catch blocks only logged a warning,
which meant a persistent computeJobNextRunAtMs throw would cause a
MIN_REFIRE_GAP_MS (2s) hot loop on cron-kind jobs.

Now both catch blocks call recordScheduleComputeError (exported from
jobs.ts), which tracks consecutive schedule errors and auto-disables the
job after 3 failures — matching the existing behavior in
recomputeJobNextRunAtMs.

* test(cron): cover applyJobResult schedule-throw fallback paths

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-02 07:38:33 -06:00
Andrey
21e19e42a3 fix(cron): skip isError payloads when picking summary/delivery content (#21454)
* fix(cron): skip isError payloads when picking summary/delivery content

buildEmbeddedRunPayloads appends isError warnings as the last payload.
Three functions in helpers.ts iterate last-to-first and pick the error
over real agent output. Use two-pass selection: prefer non-error payloads,
fall back to error-only when no real content exists.

Fixes: pickSummaryFromPayloads, pickLastNonEmptyTextFromPayloads,
pickLastDeliverablePayload — all now accept and filter isError.

* Changelog: note cron payload isError filtering (#21454)

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-02 07:38:05 -06:00
Peter Steinberger
2c192a3795 test(perf): reduce cron overlap timer advance slack 2026-03-02 13:37:26 +00:00
Yuzuru Suzuki
6513c42d2d fix(cron): treat announce delivery failure as ok when execution succeeded (#31082)
* cron: treat announce delivery failure as ok when agent execution succeeded

* fix: set delivered:false and error on announce delivery failure paths

* Changelog: note cron announce delivery status handling (#31082)

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-02 07:27:57 -06:00
Peter Steinberger
79cb5e2c9b test(perf): trim cron regression timeout windows 2026-03-02 13:25:49 +00:00