Add recomputeNextRunsForMaintenance() call in run() so that stale
runningAtMs markers (from a crashed Phase-1 persist) are cleared by the
existing normalizeJobTickState logic before the already-running guard.
Without this, a manual cron.run() could be blocked for up to
STUCK_RUN_MS (2 hours) even though no job was actually running.
Fixes#17554
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cron): wrap computeJobNextRunAtMs in try-catch inside applyJobResult
Without this guard, if the croner library throws during schedule
computation (timezone/expression edge cases), the exception propagates
out of applyJobResult and the entire state update is lost — runningAtMs
never clears, lastRunAtMs never advances, nextRunAtMs never recomputes.
After STUCK_RUN_MS (2h), stuck detection clears runningAtMs and the job
re-fires, creating a ~2h repeat cycle instead of the intended schedule.
The sibling function recomputeJobNextRunAtMs in jobs.ts already wraps
computeJobNextRunAtMs in try-catch; this was an oversight in the
applyJobResult call sites.
Changes:
- Error-backoff path: catch and fall back to backoff-only schedule
- Success path: catch and fall through to the MIN_REFIRE_GAP_MS safety net
- applyOutcomeToStoredJob: log a warning when job not found after forceReload
* fix(cron): use recordScheduleComputeError in applyJobResult catch blocks
Address review feedback: the original catch blocks only logged a warning,
which meant a persistent computeJobNextRunAtMs throw would cause a
MIN_REFIRE_GAP_MS (2s) hot loop on cron-kind jobs.
Now both catch blocks call recordScheduleComputeError (exported from
jobs.ts), which tracks consecutive schedule errors and auto-disables the
job after 3 failures — matching the existing behavior in
recomputeJobNextRunAtMs.
* test(cron): cover applyJobResult schedule-throw fallback paths
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(cron): skip isError payloads when picking summary/delivery content
buildEmbeddedRunPayloads appends isError warnings as the last payload.
Three functions in helpers.ts iterate last-to-first and pick the error
over real agent output. Use two-pass selection: prefer non-error payloads,
fall back to error-only when no real content exists.
Fixes: pickSummaryFromPayloads, pickLastNonEmptyTextFromPayloads,
pickLastDeliverablePayload — all now accept and filter isError.
* Changelog: note cron payload isError filtering (#21454)
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* cron: treat announce delivery failure as ok when agent execution succeeded
* fix: set delivered:false and error on announce delivery failure paths
* Changelog: note cron announce delivery status handling (#31082)
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* test: fix typing and test fixture issues
* Fix type-test harness issues from session routing and mock typing
* Add routing regression test for session.mainKey precedence
* fix(cron): guard against year-rollback in croner nextRun
Croner can return a past-year timestamp for some timezone/date
combinations (e.g. Asia/Shanghai). When nextRun returns a value at or
before nowMs, retry from the next whole second and, if still stale,
from midnight-tomorrow UTC before giving up.
Closes#30351
* googlechat: guard API calls with SSRF-safe fetch
* test: fix hoisted plugin context mock setup
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(cron): guard list sorting against malformed legacy jobs
Prevent list operations from crashing when old or corrupted cron entries are missing name/id fields by hardening sort comparators.
Closes#28862
* cron: format list sort guard test imports
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Backfill legacy jobs that still use schedule.cron and jobId so upgraded instances keep firing existing cron schedules instead of failing silently.
Closes#28861