Skip to content

runtime: skip auto-compaction for non-token overflow#2819

Merged
dgageot merged 1 commit into
docker:trungutt/overflow-kind-classificationfrom
trungutt:trungutt/overflow-skip-compaction-on-wire
May 19, 2026
Merged

runtime: skip auto-compaction for non-token overflow#2819
dgageot merged 1 commit into
docker:trungutt/overflow-kind-classificationfrom
trungutt:trungutt/overflow-skip-compaction-on-wire

Conversation

@trungutt
Copy link
Copy Markdown
Contributor

@trungutt trungutt commented May 19, 2026

Stacked on #2818. Review after that one merges.

When a request is rejected because the whole request body is too big to send (e.g. HTTP 413, large paste, oversized attachment), the runtime today reacts the same way it reacts to a token-count overflow: it tries to recover by compacting the conversation.

Compaction is itself a model call. It has to send the same conversation history that just got rejected, including the oversized turn. So that call is rejected too. Then the runtime gives up — but only after burning a second provider call and several seconds of wall-clock latency, and the user sees a "context window exceeded — try /compact" message that is actively wrong (compaction is exactly what just failed, and starting a new session is the only thing that helps).

The same trap applies to media-size rejections: there is no media-stripping path during compaction yet, so the offending attachment gets resent and the compaction call fails the same way.

Today                                       After this change
─────                                       ─────────────────
User sends an oversized message             User sends an oversized message
   │                                           │
   ▼                                           ▼
Provider → wire-level rejection             Provider → wire-level rejection
   │                                           │
   ▼                                           ▼
"context overflow" → start compaction       Recognise: compaction cannot help
   │                                           │
   ▼                                           ▼
Compaction model call                       Skip compaction entirely
   │   (with same oversized history)           │
   ▼                                           ▼
Rejected again, same reason                 Emit a specific, actionable error
   │                                           │
   ▼                                           ▼
Give up; show "context window exceeded"     User sees the right message

What this change achieves

  • A wire-level overflow (request body too large) is surfaced immediately with the structured error code introduced in modelerrors: make overflow errors more specific #2818, instead of being routed through a compaction attempt that is guaranteed to fail.
  • A media-size overflow is treated the same way, because the current compaction path cannot remove the offending attachment.
  • Token-count overflow — the case compaction actually helps — is unchanged. Auto-compaction still fires there, with the same limits and the same retry semantics as before.

Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Comment thread pkg/modelerrors/modelerrors.go
Comment thread pkg/modelerrors/modelerrors.go Outdated
@trungutt trungutt changed the title refactor(runtime): skip auto-compaction for non-token overflow runtime: skip auto-compaction for non-token overflow May 19, 2026
Auto-compaction is only useful when the rejection is a token-count
overflow — summarising older turns reduces the input token count.

For wire-level overflow ([OverflowKindWire]) the request body itself
exceeds the provider's cap, and the latest turn alone is over the limit;
the compaction call would have to send the same oversized history and
would also be rejected.

For media overflow ([OverflowKindMedia]) we have no media-stripping
during compaction today, so a retry would resend the same attachment
and fail again.

In both cases the recovery attempt always fails, then we surface the
error anyway, while having spent an extra provider call and several
seconds of wall-clock latency. This change skips compaction for those
two kinds and surfaces the error directly. The token-overflow path is
unchanged.
@trungutt trungutt force-pushed the trungutt/overflow-skip-compaction-on-wire branch from a252f53 to dffb7bb Compare May 19, 2026 09:04
@trungutt trungutt marked this pull request as ready for review May 19, 2026 09:49
@trungutt trungutt requested a review from a team as a code owner May 19, 2026 09:49
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The implementation is correct and the logic is well-structured. Both overflow-kind classification and the compaction-skip decision are sound.

What was checked:

  • OverflowKind type and constants (tokens, wire, media) — correct
  • classifyOverflow two-tier classification (structured Tier 1 before substring Tier 2, media before wire before tokens) — correct priority ordering
  • OverflowKindOf fallback chain for legacy *ContextOverflowError wraps — correct; defaults to OverflowKindTokens as historical behaviour
  • handleStreamError skip-compaction branch: wire/media overflows return streamErrorFatalrunTurn returns turnExit → outer loop terminates immediately. No circuit-breaker bypass, no infinite retry.
  • FormatError switch: non-overflow errors (OverflowKindOf returns "") correctly fall through to err.Error(). All three overflow kinds have actionable messages.
  • classifyErrorCode switch covers all three new overflow kinds with distinct error codes.
  • Test coverage: TestHandleStreamError_WireOverflowSkipsCompaction and TestHandleStreamError_MediaOverflowSkipsCompaction directly exercise the new skip paths. TestClassifyOverflow covers Tier 1 and Tier 2 patterns including the 413-wins-over-token-prose case.

Hypotheses investigated and dismissed:

  1. Counter not incremented for wire/media = defeated circuit-breaker: Wire/media overflow always returns streamErrorFatal, which exits the loop via turnExit. The counter is irrelevant because there is no re-entry.
  2. "content length exceeds" misclassification: That entry is pre-existing, unchanged code (context line in diff, not a + line) — outside the scope of this review.

@trungutt trungutt changed the base branch from main to trungutt/overflow-kind-classification May 19, 2026 12:24
@dgageot dgageot merged commit 1340870 into docker:trungutt/overflow-kind-classification May 19, 2026
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants