fix: browser context management — dedup snapshots, consecutive loop guard, overflow resilience#359
Conversation
…uard, overflow resilience Loop guard was counting total lifetime calls per (tool, args) hash, so browser_snapshot (empty args) got permanently blocked after 7 uses even when other tools ran between each call. Now tracks consecutive identical calls — the counter resets whenever a different tool runs. Observation tools (browser_snapshot, browser_tab_list) also get the poll multiplier and are excluded from ping-pong detection since snapshot→click is normal browser workflow. Worker context overflow was killing the worker after 3 retries of force-compact. Now: dedup stale tool results before every LLM call (replaces all but the most recent browser_snapshot/browser_tab_list result with a one-liner), pre-prompt compaction check every segment boundary, and reduced segment size (25→15 turns) for more frequent checks. Overflow retry kept at 2 as a safety net since dedup+compaction should prevent hitting it. Also fixed the get_element_center error message to tell the model to run browser_snapshot instead of suggesting screenshots/scrolling.
|
Caution Review failedPull request was closed or merged during review WalkthroughIntroduces in-run tool-result deduplication and transient retry/backoff logic in the agent worker, tightens segment and overflow thresholds, expands retriable LLM error detection, and extends LoopGuard to track per-(tool,args) counts and observation-tool behavior; also tweaks a browser error message. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/agent/worker.rs (1)
894-900: Consider centralizing the observation tool list.
DEDUP_TOOL_RESULTScontains the same tools asOBSERVATION_TOOLSinsrc/hooks/loop_guard.rs. While they serve different purposes (deduplication vs. loop guard thresholds), having two independently maintained lists creates a synchronization risk if new browser tools are added.Consider extracting a shared constant or at least adding a comment cross-referencing the other location to help future maintainers keep them in sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS duplicates the same tool names as OBSERVATION_TOOLS, risking drift; move the shared list into a single public constant (e.g., OBSERVATION_TOOL_NAMES) in a common module (or a crate-level constants module) and replace both DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS to reference that new constant, or if you prefer a minimal change add a clear cross-reference comment above DEDUP_TOOL_RESULTS pointing to OBSERVATION_TOOLS; update uses in worker.rs (DEDUP_TOOL_RESULTS) and loop_guard.rs (OBSERVATION_TOOLS) to import the shared constant so future tool additions only need one update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/agent/worker.rs`:
- Around line 894-900: DEDUP_TOOL_RESULTS duplicates the same tool names as
OBSERVATION_TOOLS, risking drift; move the shared list into a single public
constant (e.g., OBSERVATION_TOOL_NAMES) in a common module (or a crate-level
constants module) and replace both DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS to
reference that new constant, or if you prefer a minimal change add a clear
cross-reference comment above DEDUP_TOOL_RESULTS pointing to OBSERVATION_TOOLS;
update uses in worker.rs (DEDUP_TOOL_RESULTS) and loop_guard.rs
(OBSERVATION_TOOLS) to import the shared constant so future tool additions only
need one update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7ab3229-5cd7-4dee-a7de-fae142bfa65b
📒 Files selected for processing (3)
src/agent/worker.rssrc/hooks/loop_guard.rssrc/tools/browser.rs
…mption, retain pingpong warnings - Fix dedup doc comment: transcript is not lossless since dedup mutates history in-place before persist_transcript runs. - Track (message_index, item_index) in dedup so multiple ToolResult entries in the same User message are handled correctly. - Use XOR in involves_observation_tool so two observation tools alternating (snapshot↔tab_list) is still caught as ping-pong. - Retain pingpong_ warning buckets when resetting consecutive counts so max_warnings_per_call escalation still works for alternating patterns.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/agent/worker.rs (1)
894-900: Avoid duplicating the browser observation-tool list.
DEDUP_TOOL_RESULTSnow has to stay in lockstep withsrc/hooks/loop_guard.rs::OBSERVATION_TOOLS. If one list changes without the other, a tool can be deduped without getting relaxed loop thresholds, or vice versa. Please move this classification into one shared constant/module and consume it from both places.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 894 - 900, DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS must be unified into a single shared constant so they stay in sync; create a new shared module (e.g. a constants or observation_tools module) that exports one constant (name it something descriptive like OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED with the same type &[&str]) and replace the local DEDUP_TOOL_RESULTS in src/agent/worker.rs and the OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs to import and use that shared constant; ensure the constant's type and contents match existing expectations and update any references to DEDUP_TOOL_RESULTS or OBSERVATION_TOOLS to the new symbol to avoid duplication or divergence.src/hooks/loop_guard.rs (1)
722-746: Add one regression test forbrowser_snapshot↔browser_tab_list.The new tests cover observation↔action and repeated observation calls, but the XOR branch that keeps observation↔observation alternation detectable is still unpinned. A small
snapshot → tab_list → snapshot → tab_list ...case would lock in the exact edge case this helper now depends on.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hooks/loop_guard.rs` around lines 722 - 746, Add a regression test named something like observation_tool_snapshot_tablist_alternation that uses LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot, tab_list, snapshot, ...), asserting that the alternating sequence is treated as observation↔observation instead of observation↔action: allow all calls up to the relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block containing "Warning") at the expected threshold; reference LoopGuard::check and LoopGuardVerdict in the test to pin the XOR alternation behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agent/worker.rs`:
- Around line 18-25: TURNS_PER_SEGMENT is currently reused by
default_max_turns(), unintentionally shrinking worker Rig budgets and causing
MaxTurnsError; separate the compaction cadence from worker turn caps by creating
a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use that
where compaction checks occur, then restore default_max_turns() to return the
intended worker cap (or stop using TURNS_PER_SEGMENT inside
default_max_turns()); also ensure all Rig agents explicitly set max_turns (call
max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for
channels) so the worker prompt budgets are fixed regardless of compaction
cadence.
---
Nitpick comments:
In `@src/agent/worker.rs`:
- Around line 894-900: DEDUP_TOOL_RESULTS and OBSERVATION_TOOLS must be unified
into a single shared constant so they stay in sync; create a new shared module
(e.g. a constants or observation_tools module) that exports one constant (name
it something descriptive like OBSERVATION_TOOL_NAMES or OBSERVATION_TOOLS_SHARED
with the same type &[&str]) and replace the local DEDUP_TOOL_RESULTS in
src/agent/worker.rs and the OBSERVATION_TOOLS usage in src/hooks/loop_guard.rs
to import and use that shared constant; ensure the constant's type and contents
match existing expectations and update any references to DEDUP_TOOL_RESULTS or
OBSERVATION_TOOLS to the new symbol to avoid duplication or divergence.
In `@src/hooks/loop_guard.rs`:
- Around line 722-746: Add a regression test named something like
observation_tool_snapshot_tablist_alternation that uses
LoopGuard::new(worker_config()) and repeatedly calls LoopGuard::check
alternating between "browser_snapshot" and "browser_tab_list" (e.g., snapshot,
tab_list, snapshot, ...), asserting that the alternating sequence is treated as
observation↔observation instead of observation↔action: allow all calls up to the
relaxed warn threshold and then assert a warning (LoopGuardVerdict::Block
containing "Warning") at the expected threshold; reference LoopGuard::check and
LoopGuardVerdict in the test to pin the XOR alternation behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd87e125-2231-46bb-a2a8-18ec84249409
📒 Files selected for processing (2)
src/agent/worker.rssrc/hooks/loop_guard.rs
| /// How many turns per segment before we check context and potentially compact. | ||
| const TURNS_PER_SEGMENT: usize = 25; | ||
| /// | ||
| /// Kept relatively low so compaction checks run frequently. Fast models can | ||
| /// burn through many tool-call turns quickly, and each turn may add large | ||
| /// tool results (browser snapshots, shell output). Checking every 15 turns | ||
| /// instead of 25 reduces the chance of blowing past the context window | ||
| /// within a single segment. | ||
| const TURNS_PER_SEGMENT: usize = 15; |
There was a problem hiding this comment.
Decouple segment cadence from the worker turn cap.
This constant still feeds default_max_turns() on Line 290, so dropping it to 15 does more than increase compaction frequency: it also reduces each worker prompt to a 15-turn Rig budget and makes MaxTurnsError much more likely. If the intent is “compact every 15 turns,” that needs a separate checkpoint from the worker’s actual max_turns. As per coding guidelines, "Set explicit max_turns on all Rig agents: max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agent/worker.rs` around lines 18 - 25, TURNS_PER_SEGMENT is currently
reused by default_max_turns(), unintentionally shrinking worker Rig budgets and
causing MaxTurnsError; separate the compaction cadence from worker turn caps by
creating a new constant (e.g., COMPACTION_TURNS or SEGMENT_CHECK_TURNS) and use
that where compaction checks occur, then restore default_max_turns() to return
the intended worker cap (or stop using TURNS_PER_SEGMENT inside
default_max_turns()); also ensure all Rig agents explicitly set max_turns (call
max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for
channels) so the worker prompt budgets are fixed regardless of compaction
cadence.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/agent/worker.rs (2)
18-25:⚠️ Potential issue | 🟠 MajorDon’t couple compaction cadence to the worker turn budget.
This change still drops the worker’s Rig budget to 15 turns because
default_max_turns(TURNS_PER_SEGMENT)now uses the new segment constant directly. That makesMaxTurnsErrormuch more likely instead of only increasing maintenance frequency. If you want 15-turn compaction checkpoints, they need to be separate from the worker’s explicit max-turn budget.As per coding guidelines, "Set explicit
max_turnson all Rig agents:max_turns(50)for workers,max_turns(10)for branches,max_turns(5)for channels".Also applies to: 288-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 18 - 25, The compaction cadence constant TURNS_PER_SEGMENT was incorrectly reused to set the worker Rig's turn budget via default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert the worker Rig's explicit max_turns to the required budget (use max_turns(50) when constructing the worker Rig or call default_max_turns(50)) and keep TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50) (also update the similar occurrence around the code referenced at the other occurrence near lines 288-290).
325-334:⚠️ Potential issue | 🟠 MajorThis makes persisted worker transcripts lossy.
dedup_tool_results()rewrites the canonicalhistory, andpersist_transcript()later serializes that mutated history. After the first maintenance pass, olderbrowser_snapshot/browser_tab_listpayloads inworker_runs.transcriptare placeholders, not the original results. That still conflicts with the PR goal of keeping full transcripts in worker run records.Also applies to: 467-469, 502-503, 667-675, 907-984
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/worker.rs` around lines 325 - 334, The maintenance step currently mutates the canonical history via dedup_tool_results() and maybe_compact_history(), causing persisted worker_runs.transcript entries (e.g., browser_snapshot and browser_tab_list payloads) to become placeholders instead of original results; instead, perform dedup/compaction on a copy used only for LLM context assembly and leave the canonical history intact for persist_transcript(). Concretely, change calls to dedup_tool_results(&mut history) and maybe_compact_history(&mut compacted_history, &mut history) so they operate on a cloned history (or return a compacted copy) used for model calls, or adjust dedup_tool_results to return a non-mutating compacted vector, and ensure persist_transcript() serializes the original, unmodified history. Also apply the same non-mutating approach at the other occurrences that call dedup_tool_results/maybe_compact_history so worker_runs.transcript keeps full original payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/agent/worker.rs`:
- Around line 18-25: The compaction cadence constant TURNS_PER_SEGMENT was
incorrectly reused to set the worker Rig's turn budget via
default_max_turns(TURNS_PER_SEGMENT), causing unintended MaxTurnsError; revert
the worker Rig's explicit max_turns to the required budget (use max_turns(50)
when constructing the worker Rig or call default_max_turns(50)) and keep
TURNS_PER_SEGMENT solely for compaction/checkpoint logic, ensuring compaction
checks still use TURNS_PER_SEGMENT while the Rig creation uses max_turns(50)
(also update the similar occurrence around the code referenced at the other
occurrence near lines 288-290).
- Around line 325-334: The maintenance step currently mutates the canonical
history via dedup_tool_results() and maybe_compact_history(), causing persisted
worker_runs.transcript entries (e.g., browser_snapshot and browser_tab_list
payloads) to become placeholders instead of original results; instead, perform
dedup/compaction on a copy used only for LLM context assembly and leave the
canonical history intact for persist_transcript(). Concretely, change calls to
dedup_tool_results(&mut history) and maybe_compact_history(&mut
compacted_history, &mut history) so they operate on a cloned history (or return
a compacted copy) used for model calls, or adjust dedup_tool_results to return a
non-mutating compacted vector, and ensure persist_transcript() serializes the
original, unmodified history. Also apply the same non-mutating approach at the
other occurrences that call dedup_tool_results/maybe_compact_history so
worker_runs.transcript keeps full original payloads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0b57f1ca-b5db-4bcc-b6c0-824308e84996
📒 Files selected for processing (1)
src/agent/worker.rs
Workers now catch retriable errors (upstream 500s, timeouts, rate limits that survived model-level retries) and back off with exponential delay before retrying, up to 5 attempts. Previously any error that wasn't a context overflow or cancellation killed the worker immediately. Also added missing patterns to is_retriable_error: generic server errors like 'The server had an error' and '500' status codes that OpenRouter wraps in various phrasings.
…ontext-management fix: browser context management — dedup snapshots, consecutive loop guard, overflow resilience
Summary
Loop guard now tracks consecutive identical calls, not lifetime totals.
browser_snapshot(always empty args) was getting permanently blocked after 7 total uses across a session. Now the counter resets when a different tool call runs in between. Observation tools (browser_snapshot,browser_tab_list) get the poll multiplier on thresholds and are excluded from ping-pong detection.Dedup stale tool results before every LLM call. All but the most recent
browser_snapshotandbrowser_tab_listresults are replaced with a one-liner ([browser_snapshot output superseded...]). The tool call/result structure stays intact but the multi-KB ARIA tree content is gone. Full transcript preserved in worker run records.Pre-prompt compaction and smaller segments. Context usage checked before every LLM call (not just at segment boundaries).
TURNS_PER_SEGMENTreduced from 25 → 15 so compaction checks run more frequently. Overflow retry reduced from 3 → 2 since dedup + pre-prompt compaction should prevent hitting it.Fixed
get_element_centererror message to tell the model to runbrowser_snapshotinstead of the useless "try scrolling or taking a screenshot."Test plan
All 460 lib tests pass including 3 new loop guard tests (
non_consecutive_identical_calls_allowed,consecutive_identical_calls_still_blocked,observation_tool_consecutive_gets_relaxed_threshold).Note
Technical changes: Three files modified (253 additions, 7 deletions). Core improvements: new
dedup_tool_results()function strips stale browser snapshots from history before LLM calls; loop guard refactored withlast_call_hashtracking to reset consecutive counters when non-identical tools run; observation tools categorized separately to get relaxed thresholds and bypass ping-pong detection. Context management now happens pre-prompt (not just post-max-turns), reducing overflow scenarios. Fully tested with new loop guard unit tests validating the consecutive-vs-interleaved behavior and observation tool multiplier.Written by Tembo for commit 18744a5. This will update automatically on new commits.