fix: recover from context overflow caused by oversized tool results (#11579)

* fix: gracefully handle oversized tool results causing context overflow

When a subagent reads a very large file or gets a huge tool result (e.g.,
gh pr diff on a massive PR), it can exceed the model's context window in
a single prompt. Auto-compaction can't help because there's no older
history to compact — just one giant tool result.

This adds two layers of defense:

1. Pre-emptive: Hard cap on tool result size (400K chars ≈ 100K tokens)
   applied in the session tool result guard before persistence. This
   prevents extremely large tool results from being stored in full,
   regardless of model context window size.

2. Recovery: When context overflow is detected and compaction fails,
   scan session messages for oversized tool results relative to the
   model's actual context window (30% max share). If found, truncate
   them in the session via branching (creating a new branch with
   truncated content) and retry the prompt.

The truncation preserves the beginning of the content (most useful for
understanding what was read) and appends a notice explaining the
truncation and suggesting offset/limit parameters for targeted reads.

Includes comprehensive tests for:
- Text truncation with newline-boundary awareness
- Context-window-proportional size calculation
- In-memory message truncation
- Oversized detection heuristics
- Guard-level size capping during persistence

* fix: prep fixes for tool result truncation PR (#11579) (thanks @tyler6204)
This commit is contained in:
Tyler Yust
2026-02-07 17:40:51 -08:00
committed by GitHub
parent b8c8130efe
commit 0deb8b0da1
6 changed files with 725 additions and 1 deletions

View File

@@ -27,6 +27,7 @@ Docs: https://docs.openclaw.ai
### Fixes
- Agents: recover from context overflow caused by oversized tool results (pre-emptive capping + fallback truncation). (#11579) Thanks @tyler6204.
- Cron: scheduler reliability (timer drift, restart catch-up, lock contention, stale running markers). (#10776) Thanks @tyler6204.
- Cron: store migration hardening (legacy field migration, parse error handling, explicit delivery mode persistence). (#10776) Thanks @tyler6204.
- Gateway/CLI: when `gateway.bind=lan`, use a LAN IP for probe URLs and Control UI links. (#11448) Thanks @AnonO6.