-
Notifications
You must be signed in to change notification settings - Fork 52
Description
Bug
The negotiated timeout observer in ProbeAgent.run() can grant extensions that push the effective ai_timeout past the external hard timeout (visor_timeout / maxOperationTimeout's caller-side Promise.race). When this happens:
- The extension suppresses the wind-down (observer granted more time → no
gracefulTimeoutState.triggered) - The external hard timeout fires → kills the agent instantly via
AbortController.abort()orPromise.racerejection - Result:
undefinedor generic timeout message instead of partial results
Observed in production
Trace: 2793eb7d734bcec534a080c0d7e4cf20
explore-codecheck:maxOperationTimeout= 25min, external hard timeout = 30min- At T=25min: observer fires, sees 1 active
searchtool (running 3.4min), grants +10min extension - New effective deadline: 35min — exceeds the 30min hard timeout
- Between T=25-30min: agent keeps working (extension suppressed wind-down)
- At T=30min: external hard timeout fires → output =
undefined - Wind-down steps never executed despite
timeout_behavior: negotiatedbeing configured
Root cause
In ProbeAgent.js around line 3681-3704, the observer grants extensions capped to min(requested, maxPerRequest, remainingBudget) but never checks against the external hard timeout:
const grantedMs = Math.min(requestedMs, maxPerRequest, remainingBudget);
// Missing: Math.min(..., externalHardTimeout - elapsed)The observer has no knowledge of the external hard timeout ceiling, so it can grant useless extensions that will never be utilized.
Additional issue: history_length: 0 for MCP bridge tool calls
The observer reads this.history.slice(-6) and activeTools to make its decision. When the agent dispatches tools via the MCP bridge (SSE custom tools server), both are empty:
this.historyis only populated afterstreamText()resolves, not during tool executionactiveToolstracks Vercel AI SDK-level tool calls, but pending MCP responses aren't tracked there
This causes the observer to see history_length=0 and active_tools_count=0 even when the agent is actively waiting for MCP tool responses. In trace 2793eb7d734bcec534a080c0d7e4cf20, the generate-response observer saw history_length=0 at both 30min and 50min checkpoints, despite multiple successful MCP tool completions.
Suggested fixes
-
Cap extensions to external timeout headroom: Accept an optional
externalHardTimeoutparameter in ProbeAgent options and capgrantedMstoMath.min(grantedMs, externalHardTimeout - elapsed). Decline extensions when headroom < minimum useful time (e.g., 60s). -
Track MCP bridge tool calls in
activeTools: When the agent dispatches a tool call that goes through the MCP bridge, record it in theactiveToolsmap so the observer can see pending MCP responses as "active tools."
Workaround (implemented in Visor)
We've added capping on the Visor side (ai-review-service.ts):
- Cap
negotiatedTimeoutBudgetandnegotiatedTimeoutMaxPerRequesttovisorTimeout - aiTimeoutbefore passing to ProbeAgent - Call
triggerGracefulWindDown()on the agent when the externalPromise.racetimeout fires
This prevents the overshoot but doesn't fix the observer's blindness to MCP activity (history_length=0).