Skip to content

Negotiated timeout observer grants extensions past hard timeout ceiling #522

@buger

Description

@buger

Bug

The negotiated timeout observer in ProbeAgent.run() can grant extensions that push the effective ai_timeout past the external hard timeout (visor_timeout / maxOperationTimeout's caller-side Promise.race). When this happens:

  1. The extension suppresses the wind-down (observer granted more time → no gracefulTimeoutState.triggered)
  2. The external hard timeout fires → kills the agent instantly via AbortController.abort() or Promise.race rejection
  3. Result: undefined or generic timeout message instead of partial results

Observed in production

Trace: 2793eb7d734bcec534a080c0d7e4cf20

  • explore-code check: maxOperationTimeout = 25min, external hard timeout = 30min
  • At T=25min: observer fires, sees 1 active search tool (running 3.4min), grants +10min extension
  • New effective deadline: 35min — exceeds the 30min hard timeout
  • Between T=25-30min: agent keeps working (extension suppressed wind-down)
  • At T=30min: external hard timeout fires → output = undefined
  • Wind-down steps never executed despite timeout_behavior: negotiated being configured

Root cause

In ProbeAgent.js around line 3681-3704, the observer grants extensions capped to min(requested, maxPerRequest, remainingBudget) but never checks against the external hard timeout:

const grantedMs = Math.min(requestedMs, maxPerRequest, remainingBudget);
// Missing: Math.min(..., externalHardTimeout - elapsed)

The observer has no knowledge of the external hard timeout ceiling, so it can grant useless extensions that will never be utilized.

Additional issue: history_length: 0 for MCP bridge tool calls

The observer reads this.history.slice(-6) and activeTools to make its decision. When the agent dispatches tools via the MCP bridge (SSE custom tools server), both are empty:

  • this.history is only populated after streamText() resolves, not during tool execution
  • activeTools tracks Vercel AI SDK-level tool calls, but pending MCP responses aren't tracked there

This causes the observer to see history_length=0 and active_tools_count=0 even when the agent is actively waiting for MCP tool responses. In trace 2793eb7d734bcec534a080c0d7e4cf20, the generate-response observer saw history_length=0 at both 30min and 50min checkpoints, despite multiple successful MCP tool completions.

Suggested fixes

  1. Cap extensions to external timeout headroom: Accept an optional externalHardTimeout parameter in ProbeAgent options and cap grantedMs to Math.min(grantedMs, externalHardTimeout - elapsed). Decline extensions when headroom < minimum useful time (e.g., 60s).

  2. Track MCP bridge tool calls in activeTools: When the agent dispatches a tool call that goes through the MCP bridge, record it in the activeTools map so the observer can see pending MCP responses as "active tools."

Workaround (implemented in Visor)

We've added capping on the Visor side (ai-review-service.ts):

  • Cap negotiatedTimeoutBudget and negotiatedTimeoutMaxPerRequest to visorTimeout - aiTimeout before passing to ProbeAgent
  • Call triggerGracefulWindDown() on the agent when the external Promise.race timeout fires

This prevents the overshoot but doesn't fix the observer's blindness to MCP activity (history_length=0).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingexternal

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions