* feat(cron): add failure destination support with webhook mode and bestEffort handling
Extends PR #24789 failure alerts with features from PR #29145:
- Add webhook delivery mode for failure alerts (mode: 'webhook')
- Add accountId support for multi-account channel configurations
- Add bestEffort handling to skip alerts when job has bestEffort=true
- Add separate failureDestination config (global + per-job in delivery)
- Add duplicate prevention (prevents sending to same as primary delivery)
- Add CLI flags: --failure-alert-mode, --failure-alert-account-id
- Add UI fields for new options in web cron editor
* fix(cron): merge failureAlert mode/accountId and preserve failureDestination on updates
- Fix mergeCronFailureAlert to merge mode and accountId fields
- Fix mergeCronDelivery to preserve failureDestination on updates
- Fix isSameDeliveryTarget to use 'announce' as default instead of 'none'
to properly detect duplicates when delivery.mode is undefined
* fix(cron): validate webhook mode requires URL in resolveFailureDestination
When mode is 'webhook' but no 'to' URL is provided, return null
instead of creating an invalid plan that silently fails later.
* fix(cron): fail closed on webhook mode without URL and make failureDestination fields clearable
- sendCronFailureAlert: fail closed when mode is webhook but URL is missing
- mergeCronDelivery: use per-key presence checks so callers can clear
nested failureDestination fields via cron.update
Note: protocol:check shows missing internalEvents in Swift models - this is
a pre-existing issue unrelated to these changes (upstream sync needed).
* fix(cron): use separate schema for failureDestination and fix type cast
- Create CronFailureDestinationSchema excluding after/cooldownMs fields
- Fix type cast in sendFailureNotificationAnnounce to use CronMessageChannel
* fix(cron): merge global failureDestination with partial job overrides
When job has partial failureDestination config, fall back to global
config for unset fields instead of treating it as a full override.
* fix(cron): avoid forcing announce mode and clear inherited to on mode change
- UI: only include mode in patch if explicitly set to non-default
- delivery.ts: clear inherited 'to' when job overrides mode, since URL
semantics differ between announce and webhook modes
* fix(cron): preserve explicit to on mode override and always include mode in UI patches
- delivery.ts: preserve job-level explicit 'to' when overriding mode
- UI: always include mode in failureAlert patch so users can switch between announce/webhook
* fix(cron): allow clearing accountId and treat undefined global mode as announce
- UI: always include accountId in patch so users can clear it
- delivery.ts: treat undefined global mode as announce when comparing for clearing inherited 'to'
* Cron: harden failure destination routing and add regression coverage
* Cron: resolve failure destination review feedback
* Cron: drop unrelated timeout assertions from conflict resolution
* Cron: format cron CLI regression test
* Cron: align gateway cron test mock types
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Add recomputeNextRunsForMaintenance() call in run() so that stale
runningAtMs markers (from a crashed Phase-1 persist) are cleared by the
existing normalizeJobTickState logic before the already-running guard.
Without this, a manual cron.run() could be blocked for up to
STUCK_RUN_MS (2 hours) even though no job was actually running.
Fixes#17554
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cron): wrap computeJobNextRunAtMs in try-catch inside applyJobResult
Without this guard, if the croner library throws during schedule
computation (timezone/expression edge cases), the exception propagates
out of applyJobResult and the entire state update is lost — runningAtMs
never clears, lastRunAtMs never advances, nextRunAtMs never recomputes.
After STUCK_RUN_MS (2h), stuck detection clears runningAtMs and the job
re-fires, creating a ~2h repeat cycle instead of the intended schedule.
The sibling function recomputeJobNextRunAtMs in jobs.ts already wraps
computeJobNextRunAtMs in try-catch; this was an oversight in the
applyJobResult call sites.
Changes:
- Error-backoff path: catch and fall back to backoff-only schedule
- Success path: catch and fall through to the MIN_REFIRE_GAP_MS safety net
- applyOutcomeToStoredJob: log a warning when job not found after forceReload
* fix(cron): use recordScheduleComputeError in applyJobResult catch blocks
Address review feedback: the original catch blocks only logged a warning,
which meant a persistent computeJobNextRunAtMs throw would cause a
MIN_REFIRE_GAP_MS (2s) hot loop on cron-kind jobs.
Now both catch blocks call recordScheduleComputeError (exported from
jobs.ts), which tracks consecutive schedule errors and auto-disables the
job after 3 failures — matching the existing behavior in
recomputeJobNextRunAtMs.
* test(cron): cover applyJobResult schedule-throw fallback paths
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(cron): guard list sorting against malformed legacy jobs
Prevent list operations from crashing when old or corrupted cron entries are missing name/id fields by hardening sort comparators.
Closes#28862
* cron: format list sort guard test imports
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Backfill legacy jobs that still use schedule.cron and jobId so upgraded instances keep firing existing cron schedules instead of failing silently.
Closes#28861
* fix(cron): pass heartbeat target=last for main-session cron jobs
When a cron job with sessionTarget=main and wakeMode=now fires, it
triggers a heartbeat via runHeartbeatOnce. Since e2362d35 changed the
default heartbeat target from "last" to "none", these cron-triggered
heartbeats silently discard their responses instead of delivering them
to the last active channel (e.g. Telegram).
Fix: pass heartbeat: { target: "last" } from the cron timer to
runHeartbeatOnce for main-session jobs, and wire the override through
the gateway cron service builder. This restores delivery for
sessionTarget=main cron jobs without reverting the intentional default
change for regular heartbeats.
Regression introduced in: e2362d35 (2026-02-25)
Fixes#28508
* Cron: align server-cron wake routing expectations for main-target jobs
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Ignore persisted sessionKey overrides for sessionTarget=main jobs so cron system events consistently route to the agent main session after upgrades.
Closes#28770
* feat(cron): add --account flag for multi-account delivery routing
Add support for explicit delivery account routing in cron jobs across CLI, normalization, delivery planning, and isolated delivery target resolution.
Highlights:
- Add --account <id> to cron add and cron edit
- Add optional delivery.accountId to cron types and delivery plan
- Normalize and trim delivery.accountId in cron create/update normalization
- Prefer explicit accountId over session lastAccountId and bindings fallback
- Thread accountId through isolated cron run delivery resolution
- Preserve cron edit --best-effort-deliver/--no-best-effort-deliver behavior by keeping implicit announce mode
- Expand tests for account passthrough/merge/precedence and CLI account flows
* cron: resolve rebase duplicate accountId fields
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
- reject new lane enqueues once gateway drain begins
- always reset lane draining state and isolate onWait callback failures
- persist per-session abort cutoff and skip stale queued messages
- avoid false 600s agentTurn timeout in isolated cron jobs
Fixes#27407Fixes#27332Fixes#27427
Co-authored-by: Kevin Shenghui <shenghuikevin@github.com>
Co-authored-by: zjmy <zhangjunmengyang@gmail.com>
Co-authored-by: suko <miha.sukic@gmail.com>