runtime: recover oversized user message after wire/media overflow #2821
trungutt wants to merge 2 commits into
docker-agent left a comment
Assessment: 🟡 NEEDS ATTENTION
2 medium-severity issues found in the overflow recovery implementation.
a1d06ad to 18a703a
docker-agent left a comment
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity findings in the new overflow-recovery code. The core approach — scrubbing oversized messages in-place, persisting the rewrite, and emitting a Warning event — is sound and the two learned patterns (wire/media scrub independence from compaction config; top-level Content vs MultiContent text scrubbing) are both correctly implemented.
b22d0fb to c724506
docker-agent left a comment
Assessment: 🟢 APPROVE
Reviewed the overflow recovery changes across pkg/runtime/overflow_recovery.go, pkg/runtime/loop_steps.go, pkg/session/session.go, and the accompanying tests.
What was checked:
- Correctness of `findPersistedMessage` when sub-sessions are present: `GetAllMessages()` appends top-level messages before sub-session messages, so even in a hypothetical ID-collision scenario the first match is always the correctly-rewritten top-level message ✅
- Notification ordering (`emitScrubNotice` fires unconditionally after in-memory rewrite, regardless of persistence outcome): correctly implemented per design intent ✅
- `UpdateMessage` call site: called with the post-mutation message content; the `messageID == 0` guard handles un-persisted synthesised messages ✅
- Store-write failure path: non-fatal, logged with `slog.WarnContext`, and user is still notified via the scrub notice ✅
- Test coverage: happy path, persist-fails path, and no-user-message path all covered ✅
No confirmed or likely bugs found. The implementation correctly addresses the session-poisoning scenario described in the PR description.
Auto-compaction is only useful when the rejection is a token-count overflow — summarising older turns reduces the input token count. For wire-level overflow ([OverflowKindWire]) the request body itself exceeds the provider's cap, and the latest turn alone is over the limit; the compaction call would have to send the same oversized history and would also be rejected. For media overflow ([OverflowKindMedia]) we have no media-stripping during compaction today, so a retry would resend the same attachment and fail again. In both cases the recovery attempt always fails, then we surface the error anyway, while having spent an extra provider call and several seconds of wall-clock latency. This change skips compaction for those two kinds and surfaces the error directly. The token-overflow path is unchanged.
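The routing rule in the commit message above can be sketched as a small predicate. This is a minimal illustration, not the code in the PR: the constant names follow the commit message, while the `OverflowKind` type definition and the `shouldAutoCompact` helper are assumptions made for the example.

```go
package main

// OverflowKind classifies why a provider rejected a request.
// The constant names follow the commit message; the type itself is assumed.
type OverflowKind int

const (
	OverflowKindTokens OverflowKind = iota // accumulated history exceeds the context window
	OverflowKindWire                       // request body exceeds the provider's size cap
	OverflowKindMedia                      // a single attachment is too large
)

// shouldAutoCompact is a hypothetical helper that reports whether
// summarising older turns could plausibly make a retry succeed.
func shouldAutoCompact(kind OverflowKind) bool {
	switch kind {
	case OverflowKindTokens:
		// Summarising older turns lowers the input token count.
		return true
	case OverflowKindWire, OverflowKindMedia:
		// Compaction would resend the same oversized turn or attachment.
		return false
	default:
		// Unknown kinds are surfaced directly rather than retried.
		return false
	}
}
```

Under this sketch a caller would consult the predicate before spending a provider call on compaction, and surface the error directly when it returns false.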
c724506 to b334412
docker-agent left a comment
Assessment: 🔴 CRITICAL
1 high-severity and 1 low-severity finding in the new overflow recovery code.
b334412 to 354f333
docker-agent left a comment
Assessment: 🟢 APPROVE
Reviewed the overflow recovery implementation across pkg/runtime/overflow_recovery.go, pkg/runtime/loop_steps.go, pkg/session/session.go, and pkg/session/store.go.
Summary of analysis:
The implementation is correct and well-structured. Three hypotheses were investigated and all were dismissed after verification:
- Token overflow gap: The deliberate decision to only run `recoverFromOversizedTurn` for `OverflowKindWire`/`OverflowKindMedia` (not `OverflowKindTokens`) is correct by design. Token overflow is caused by accumulated conversation history, not a single oversized message; scrubbing only the latest user message would not recover it. Auto-compaction handles the token case differently and correctly.
- Sub-session ID collision in `findPersistedMessage`: SQLite auto-increment guarantees globally unique message IDs across the table. Additionally, `GetAllMessages` returns top-level messages before sub-session messages, so `findPersistedMessage` would always find the correct top-level match first even in the theoretical event of an ID collision.
- Misleading log fields for MultiContent scrubbing: `parts_replaced > 0` already communicates to operators that content was scrubbed when only MultiContent parts are oversized; the `text_replaced`/`original_text_bytes` fields are supplementary precision, not the primary scrub indicator.
Notable design positives:
- `RewriteLatestUserMessage` correctly holds `s.mu.Lock()` for the full rewrite, preventing torn state
- The `messageID == 0` guard correctly skips the store write for unpersisted messages
- `emitScrubNotice` is emitted unconditionally after a successful in-memory rewrite, even on persistence failure; this is the right user-observable behavior
- `scrubMessagePart` passes unknown part types through unchanged rather than silently dropping them
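The locking, ID guard, and non-fatal persistence points above might fit together roughly as follows. This is a sketch under stated assumptions: `Message`, `Session`, `Store`, `recoverTurn`, and `failingStore` are simplified stand-ins invented for illustration, and only the method name `RewriteLatestUserMessage` and the `UpdateMessage` call come from the review text. Their real signatures may differ.

```go
package main

import (
	"errors"
	"sync"
)

// Message, Session and Store are simplified stand-ins, not the real types.
type Message struct {
	ID      int64 // 0 means the message was never persisted
	Role    string
	Content string
}

type Store interface {
	UpdateMessage(id int64, content string) error
}

type Session struct {
	mu       sync.Mutex
	Messages []Message
}

// RewriteLatestUserMessage applies fn to the most recent user message while
// holding the lock for the whole rewrite, and returns a copy of the result.
func (s *Session) RewriteLatestUserMessage(fn func(*Message)) (Message, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i := len(s.Messages) - 1; i >= 0; i-- {
		if s.Messages[i].Role == "user" {
			fn(&s.Messages[i])
			return s.Messages[i], true
		}
	}
	return Message{}, false
}

// recoverTurn (hypothetical) rewrites in memory, then mirrors to the store.
// It reports whether the user should be notified, plus any persistence
// error, which a caller would log rather than treat as fatal.
func recoverTurn(s *Session, st Store, placeholder string) (notify bool, persistErr error) {
	msg, ok := s.RewriteLatestUserMessage(func(m *Message) { m.Content = placeholder })
	if !ok {
		return false, nil // no user message to rewrite
	}
	if msg.ID != 0 { // skip the store write for unpersisted messages
		persistErr = st.UpdateMessage(msg.ID, msg.Content)
	}
	return true, persistErr
}

// failingStore simulates a store whose writes always fail.
type failingStore struct{}

func (failingStore) UpdateMessage(int64, string) error {
	return errors.New("store unavailable")
}
```

With `failingStore`, `recoverTurn` still reports that a notice is due and leaves the in-memory message rewritten, which mirrors the behavior the review describes for the persist-fails path.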
When the provider rejects a request because the body itself is over the wire-size cap or contains an oversized attachment, the offending user message stays verbatim in the session. Every subsequent call reloads that message as part of the conversation history and trips the same limit. The session is effectively dead until the user starts over.

Add a hygiene step that runs on wire- and media-overflow rejections: walk back to the latest user message, replace each media part (image, file, document) with a text placeholder that records what was attached, and replace plain-text content over 1 MiB with a size-noting placeholder.

The rewrite is mirrored to the session store so the next session load reflects it; the in-memory mutation alone keeps the current process healthy even if the store write fails. A Warning event is emitted so the UI can tell the user that their previous message was rewritten in place.

The fatal ErrorEvent for the original rejection is still emitted; scrubbing is in addition to surfacing the error, not instead of it. Token-overflow is unchanged: it still goes through auto-compaction, which is the correct mechanism for that shape of failure.
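The scrub step described in this commit message could look something like the sketch below. It is an illustration only: `Part`, `Message`, `scrubText`, `scrubPart`, and `scrubLatestUserMessage` are hypothetical names and shapes, the placeholder wording is invented, and only the 1 MiB threshold and the media kinds (image, file, document) come from the commit message.

```go
package main

import "fmt"

const maxTextBytes = 1 << 20 // 1 MiB, the threshold named in the commit message

// Part and Message are simplified stand-ins, not the real session types.
type Part struct {
	Type string // "text", "image", "file", "document", or something unknown
	Text string
	Name string // attachment name, if any
}

type Message struct {
	Role         string
	Content      string
	MultiContent []Part
}

// scrubText replaces text over the limit with a size-noting placeholder.
func scrubText(s string) (string, bool) {
	if len(s) <= maxTextBytes {
		return s, false
	}
	return fmt.Sprintf("[text of %d bytes removed after overflow]", len(s)), true
}

// scrubPart rewrites media parts and oversized text parts.
// Unknown part types pass through unchanged.
func scrubPart(p Part) (Part, bool) {
	switch p.Type {
	case "image", "file", "document":
		note := fmt.Sprintf("[%s attachment %q removed after overflow]", p.Type, p.Name)
		return Part{Type: "text", Text: note}, true
	case "text":
		t, changed := scrubText(p.Text)
		p.Text = t
		return p, changed
	default:
		return p, false
	}
}

// scrubLatestUserMessage walks back to the most recent user message and
// scrubs it in place. It reports whether anything was rewritten.
func scrubLatestUserMessage(msgs []Message) bool {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		changed := false
		if t, ok := scrubText(msgs[i].Content); ok {
			msgs[i].Content, changed = t, true
		}
		for j, p := range msgs[i].MultiContent {
			if np, ok := scrubPart(p); ok {
				msgs[i].MultiContent[j], changed = np, true
			}
		}
		return changed
	}
	return false
}
```

The sketch handles top-level `Content` and `MultiContent` parts separately, since either one alone can carry the oversized payload.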
354f333 to 693a6c1
❌ PR Review Failed: The review agent encountered an error and could not complete the review.
Stacked on #2819. Review after that one merges.
The problem this step fixes
After #2818 + #2819, an oversized message produces the right error message and fails fast, but the same chat session cannot continue. The offending user message stays verbatim in `sess.Messages` for the rest of the process, and every retry resends the same oversized payload alongside the new (smaller) one. The user sees the same rejection every time they hit Send.

The original user-reported regression:
What this change does
After a wire- or media-overflow rejection, walk back to the most recent user message and rewrite it in place in memory:
The rewrite happens immediately after the failure event fires. The error itself is still emitted with the kind-specific code from #2818, and a Warning event explains to the user that the previous message was rewritten — so the hygiene action is observable rather than silent.
Before / after
Scope — what this PR is and is not
`sess.Messages` rewritten so the chat session continues immediately
Why the persistence side is a separate follow-up
The persistence side requires `Message.ID` to round-trip through `Store.AddMessage` (currently the returned ID is discarded by the `PersistenceObserver`) and through `loadSessionItems` (currently the `id` column is not selected on reload). That gap is independent of overflow handling: it would affect anything that needs `Store.UpdateMessage` against an in-memory message, including any future compaction-by-id work or message-editing features.

Folding that infrastructure fix into this PR doubled its size and conflated concerns. It now lands as its own focused change where it can be evaluated on its own merits (propagate the ID? position-based updates? new API?).
The scope of the persistence gap, for clarity:
For the user-reported regression (paste oversized → 413 → shortening keeps failing in the same chat), the same-process fix in this PR is sufficient.
What is preserved
`ErrorEvent` for the original rejection is still emitted: scrubbing is in addition to surfacing the error, not instead of it.
Files touched
5 files, ~850 lines (about half of that is tests).