Skip to content

Commit b4ed53d

Browse files
bugerclaude
andauthored
feat: negotiated timeout with independent observer LLM pattern (#518)
* feat: add negotiated timeout mode with independent observer LLM pattern When timeoutBehavior is set to 'negotiated', a separate LLM call (the "timeout observer") runs independently when the operation timeout fires. This works even when the main agent loop is blocked by long-running delegates or MCP tools. The observer evaluates in-flight tools and decides whether to grant more time or trigger graceful wind-down. Key features: - Independent observer pattern using generateText (not part of main loop) - In-flight tool tracking via toolCall event emitter with human-readable durations (e.g., "5m 23s") - Configurable budget, max requests, and per-request limits - Observer prompt includes stuck-loop detection guidance - On decline: aborts in-flight tools, makes dedicated summary LLM call with full conversation context - Schema-aware abort recovery (returns valid JSON, no markdown notice) - Task-aware abort recovery (acknowledges completed/incomplete tasks) - completionPrompt correctly skipped after abort summary - onStream callback receives abort summary text - Full OTEL tracing for all observer decisions and state transitions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add timeout modes documentation and SDK exports - Add timeout options to top-level index.d.ts (timeoutBehavior, gracefulTimeoutBonusSteps, negotiated timeout config, retry, fallback, enableDelegate, enableBash, enableTasks) - Export ENGINE_ACTIVITY_TIMEOUT_* constants from src/index.js - Add comprehensive Timeout Modes section to npm/README.md with examples, comparison table, and environment variable reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add dedicated TIMEOUT_MODES.md reference page Comprehensive guide covering all three timeout modes with flow diagrams, observer pattern details, in-flight tool tracking, stuck-loop detection, abort summary behavior, schema/task edge cases, configuration reference, telemetry events, and examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: move timeout-modes to docs/probe-agent/sdk/ Move from npm/docs/TIMEOUT_MODES.md to docs/probe-agent/sdk/timeout-modes.md to follow the project's documentation structure convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: trim README timeout section to brief summary with link Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: rewrite timeout docs as comprehensive architecture guide Replace the negotiated-timeout-only docs with a full timeout architecture reference covering all 6 layers: operation, request, delegate, MCP, bash, and engine activity timeouts. Includes ASCII layer diagram, complete env var reference, known limitations about cross-layer coordination, and practical guidance for delegate/MCP timeout sizing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: delegate timeout inheritance and parent budget awareness Delegates now inherit timeout settings (timeoutBehavior, requestTimeout, gracefulTimeoutBonusSteps) from the parent agent. The delegate timeout is capped to min(configured, remainingParentBudget * 0.9) so subagents cannot consume the parent's entire budget. Subagent's internal maxOperationTimeout is set to (externalTimeout - 15s) to allow its own graceful wind-down before being externally killed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: two-phase graceful stop with MCP graceful_stop support When the negotiated timeout observer declines, instead of immediately hard-aborting everything, we now do a two-phase shutdown: 1. Signal running delegates via triggerGracefulWindDown() so they can finish their current step and produce a summary 2. Call graceful_stop on MCP servers that expose it, allowing agent-type MCP servers to wrap up and return partial results 3. Hard abort only fires as a safety net after gracefulStopDeadline (45s) Also adds InProcessMcpServer test helper using InMemoryTransport for fast, deterministic MCP testing without spawning external processes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: migrate MCP tests from stdio mock server to in-process Replace stdio-based mockMcpServer.js with InMemoryTransport-based in-process servers across all 4 MCP test files: - mcpClientManager.test.js (70 tests, 0.3s down from 10s+) - probeAgentMcp.test.js (15 tests, 2.7s down from 20s) - mcpErrorHandling.test.js (18 tests, 3.4s down from 20s) - mcpRobustness.test.js (12 tests, 3s down from 20s) Tests are now deterministic (no connection failures), faster (no process spawning), and use createStandardMockServer() which replicates all 7 tools from the original mockMcpServer.js. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update timeout architecture for two-phase graceful stop Major updates to the timeout documentation: - Two-phase graceful stop flow (signal wind-down, then hard abort) - Delegate budget awareness (capped to parent's remaining time) - Delegate timeout inheritance (timeoutBehavior, requestTimeout, etc.) - MCP graceful_stop convention with implementation example - gracefulStopDeadline configuration (new setting) - Updated "How Timeouts Coordinate" section replacing old limitations - New multi-agent MCP collaboration example - Added gracefulStopDeadline to TypeScript declarations Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add OTEL log bridge and comprehensive observability instrumentation Add console-to-OTEL log bridge that patches console.log/info/warn/error to emit OpenTelemetry log records and append trace context. Add tracer events across graceful stop, delegation, subagent, and MCP lifecycles. Add observability documentation covering all spans, events, and integration patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle quoted file paths with spaces in extract/search targets (#519) Add splitQuotedString() parser that respects double/single-quoted strings when splitting targets on whitespace/comma. Quoted paths like "Customers/First American/Meeting Notes.md" are now preserved as single targets instead of being split on spaces. parseAndResolvePaths retains the existing space-splitting heuristic for unquoted strings (backwards compatible). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: search dedup includes path in key, per-key counters, context-aware messages (#520) - Include search path in dedup key so same query across different repos is allowed (was: query+exact only, blocking cross-repo searches) - Use per-key block counters instead of global counter to prevent cross-query pollution - Differentiate blocked messages: "found results" suggests extract, "no results" suggests different keywords - Detect ticket/issue ID patterns (e.g. TT-16546, JIRA-123) in no-results responses and suggest searching for technical concepts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve search dedup messaging, exact=true guidance, and max-iteration diagnostics (#520) - Enhance dedup blocked messages with exact=true/false toggle hints - Add prompt guidance for exact=true with punctuation/literal patterns - Expand "DUPLICATE SEARCH BLOCKED" advice with concrete alternatives - Add tool call tracking and descriptive failure summary when max iterations reached (helps parent agents understand what was tried) - Last-iteration warning now includes searches attempted summary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a2bd4df commit b4ed53d

28 files changed

+4825
-502
lines changed
Lines changed: 246 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,246 @@
1+
# Observability & Telemetry
2+
3+
Probe Agent provides comprehensive observability through three mechanisms:
4+
5+
1. **SimpleTelemetry** — lightweight file/console trace exporter (no heavy dependencies)
6+
2. **OTEL Log Bridge** — automatic forwarding of `console.log/info/warn/error` to OpenTelemetry Logs
7+
3. **Tracer Events** — structured telemetry events emitted throughout the agent lifecycle
8+
9+
## Quick Start
10+
11+
### Standalone CLI
12+
13+
```bash
14+
# File-based tracing
15+
probe agent "query" --trace-file ./traces.jsonl
16+
17+
# Console tracing
18+
probe agent "query" --trace-console
19+
```
20+
21+
### SDK
22+
23+
```javascript
24+
import { ProbeAgent, initializeSimpleTelemetryFromOptions } from '@anthropic-ai/probe';
25+
26+
const telemetry = initializeSimpleTelemetryFromOptions({
27+
traceFile: './traces.jsonl',
28+
traceConsole: false,
29+
});
30+
31+
const agent = new ProbeAgent({
32+
path: '.',
33+
telemetry,
34+
});
35+
```
36+
37+
## SimpleTelemetry
38+
39+
A zero-dependency tracer that writes span data as JSON Lines to a file or console. Each span contains:
40+
41+
- `traceId`, `spanId` — correlation identifiers
42+
- `name` — span name (e.g., `agent.session`, `ai.request`, `tool.call`)
43+
- `startTime`, `endTime`, `duration` — timing
44+
- `attributes` — structured key-value metadata
45+
- `events` — timestamped events within the span
46+
- `status``OK` or `ERROR`
47+
48+
### Span Types
49+
50+
| Span Name | Created By | Description |
51+
|-----------|------------|-------------|
52+
| `agent.session` | `createSessionSpan()` | Top-level session span |
53+
| `ai.request` | `createAISpan()` | Individual LLM API call |
54+
| `tool.call` | `createToolSpan()` | Tool execution |
55+
56+
## OTEL Log Bridge
57+
58+
When `initializeSimpleTelemetryFromOptions()` is called, it automatically patches `console.log`, `console.info`, `console.warn`, and `console.error` to:
59+
60+
1. **Emit OTEL Log Records** via `@opentelemetry/api-logs` (if installed)
61+
2. **Append trace context** `[trace_id=... span_id=...]` to console output (if `@opentelemetry/api` is installed)
62+
63+
This is a no-op if the OpenTelemetry packages are not installed — the bridge gracefully degrades.
64+
65+
### Severity Mapping
66+
67+
| Console Method | OTEL SeverityNumber | OTEL SeverityText |
68+
|---------------|--------------------:|-------------------|
69+
| `console.log` | 9 | INFO |
70+
| `console.info` | 9 | INFO |
71+
| `console.warn` | 13 | WARN |
72+
| `console.error` | 17 | ERROR |
73+
74+
### Integration with External OTEL Collectors
75+
76+
If your application configures an OpenTelemetry SDK with a `LoggerProvider` (e.g., via `@opentelemetry/sdk-logs`), all `console.*` calls from probe will automatically appear as log records in your collector. No additional configuration is needed beyond installing the packages:
77+
78+
```bash
79+
npm install @opentelemetry/api @opentelemetry/api-logs
80+
```
81+
82+
### Visor2 / External Tracer Integration
83+
84+
If you provide your own tracer adapter (e.g., visor2's `createProbeTracerAdapter()`), probe will emit events through your tracer. The OTEL log bridge only activates via `initializeSimpleTelemetryFromOptions()`, so external consumers that create `new SimpleTelemetry()` directly are not affected — they manage their own console instrumentation.
85+
86+
## Tracer Events Reference
87+
88+
All events are emitted via `SimpleAppTracer` methods. Each event includes `session.id` automatically.
89+
90+
### Agent Lifecycle
91+
92+
| Event | Method | Key Attributes |
93+
|-------|--------|----------------|
94+
| `iteration.step` | `recordIterationEvent('step')` | `iteration` |
95+
| `context.compacted` | `addEvent()` ||
96+
| `completion_prompt.started` | `recordEvent()` ||
97+
| `completion_prompt.completed` | `recordEvent()` ||
98+
99+
### AI Model
100+
101+
| Event | Method | Key Attributes |
102+
|-------|--------|----------------|
103+
| `ai.thinking` | `recordThinkingContent()` | `ai.thinking.content`, `ai.thinking.length`, `ai.thinking.hash` |
104+
| `ai.tool_decision` | `recordToolDecision()` | `ai.tool_decision.name`, `ai.tool_decision.params` |
105+
106+
### Tool Execution
107+
108+
| Event | Method | Key Attributes |
109+
|-------|--------|----------------|
110+
| `tool.result` | `recordToolResult()` | `tool.name`, `tool.result`, `tool.duration_ms`, `tool.success` |
111+
112+
### Token Usage
113+
114+
| Event | Method | Key Attributes |
115+
|-------|--------|----------------|
116+
| `tokens.turn` | `recordTokenTurn()` | `tokens.input`, `tokens.output`, `tokens.total`, `tokens.cache_read`, `tokens.cache_write`, `tokens.context_used`, `tokens.context_remaining` |
117+
118+
### Conversation
119+
120+
| Event | Method | Key Attributes |
121+
|-------|--------|----------------|
122+
| `conversation.turn.{role}` | `recordConversationTurn()` | `conversation.role`, `conversation.content`, `conversation.content.length`, `conversation.content.hash` |
123+
124+
### Errors
125+
126+
| Event | Method | Key Attributes |
127+
|-------|--------|----------------|
128+
| `error.{type}` | `recordErrorEvent()` | `error.type`, `error.message`, `error.stack`, `error.recoverable`, `error.context` |
129+
130+
Error types include: `wrapped_tool`, `unrecognized_tool`, `no_tool_call`, `circuit_breaker`.
131+
132+
### Delegation
133+
134+
| Event | Method | Key Attributes |
135+
|-------|--------|----------------|
136+
| `delegation.subagent_created` | `addEvent()` | `delegation.timeout_ms`, `delegation.parent_remaining_ms`, `delegation.timeout_behavior` |
137+
| `delegation.budget_capped` | `addEvent()` | `delegation.original_timeout_ms`, `delegation.capped_timeout_ms`, `delegation.parent_remaining_ms` |
138+
| `delegation.tool_started` | `recordDelegationEvent()` ||
139+
| `delegation.completed` | `recordDelegationEvent()` ||
140+
| `delegation.failed` | `recordDelegationEvent()` ||
141+
| `delegation.parent_abort_phase1` | `addEvent()` | `delegation.deadline_ms` |
142+
| `delegation.parent_abort_phase2` | `addEvent()` ||
143+
144+
### Graceful Stop (Two-Phase)
145+
146+
| Event | Method | Key Attributes |
147+
|-------|--------|----------------|
148+
| `graceful_stop.external_trigger` | `addEvent()` ||
149+
| `graceful_stop.initiated` | `addEvent()` | `graceful_stop.reason`, `graceful_stop.active_subagents`, `graceful_stop.has_mcp_bridge`, `graceful_stop.deadline_ms` |
150+
| `graceful_stop.signals_sent` | `addEvent()` | `graceful_stop.subagents_signalled`, `graceful_stop.subagent_errors`, `graceful_stop.mcp_servers_called`, `graceful_stop.mcp_servers_failed` |
151+
| `graceful_stop.deadline_expired` | `addEvent()` ||
152+
153+
### Graceful Timeout (Wind-Down)
154+
155+
| Event | Method | Key Attributes |
156+
|-------|--------|----------------|
157+
| `graceful_timeout.wind_down_started` | `addEvent()` ||
158+
| `graceful_timeout.wind_down_step` | `addEvent()` | `graceful_timeout.bonus_steps_remaining` |
159+
160+
### Negotiated Timeout (Observer)
161+
162+
| Event | Method | Key Attributes |
163+
|-------|--------|----------------|
164+
| `negotiated_timeout.observer_invoked` | `addEvent()` ||
165+
| `negotiated_timeout.observer_response` | `addEvent()` ||
166+
| `negotiated_timeout.observer_extended` | `addEvent()` | `negotiated_timeout.extension_ms` |
167+
| `negotiated_timeout.observer_declined` | `addEvent()` ||
168+
| `negotiated_timeout.observer_exhausted` | `addEvent()` ||
169+
| `negotiated_timeout.observer_error` | `addEvent()` ||
170+
| `negotiated_timeout.abort_summary_started` | `addEvent()` ||
171+
| `negotiated_timeout.abort_summary_completed` | `addEvent()` ||
172+
| `negotiated_timeout.abort_summary_error` | `addEvent()` ||
173+
174+
### Subagent Registry
175+
176+
| Event | Method | Key Attributes |
177+
|-------|--------|----------------|
178+
| `subagent.registered` | `addEvent()` | `subagent.session_id`, `subagent.active_count` |
179+
| `subagent.unregistered` | `addEvent()` | `subagent.session_id`, `subagent.active_count` |
180+
181+
### MCP (Model Context Protocol)
182+
183+
| Event | Method | Key Attributes |
184+
|-------|--------|----------------|
185+
| `mcp.initialization.started` | `recordMcpEvent()` ||
186+
| `mcp.initialization.completed` | `recordMcpEvent()` ||
187+
| `mcp.server.connecting` | `recordMcpEvent()` | server name |
188+
| `mcp.server.connected` | `recordMcpEvent()` | server name |
189+
| `mcp.server.connection_failed` | `recordMcpEvent()` | server name, error |
190+
| `mcp.server.disconnected` | `recordMcpEvent()` | server name |
191+
| `mcp.tools.discovered` | `recordMcpEvent()` | tool count |
192+
| `mcp.tools.filtered` | `recordMcpEvent()` | filter results |
193+
| `mcp.tool.start` | `recordMcpToolStart()` | `mcp.tool.name`, `mcp.tool.server`, `mcp.tool.params` |
194+
| `mcp.tool.end` | `recordMcpToolEnd()` | `mcp.tool.name`, `mcp.tool.server`, `mcp.tool.result`, `mcp.tool.duration_ms`, `mcp.tool.success` |
195+
| `mcp.graceful_stop.sweep_completed` | `recordMcpEvent()` | `servers_with_tool`, `servers_total`, `results` |
196+
| `mcp.disconnection.started` | `recordMcpEvent()` ||
197+
| `mcp.disconnection.completed` | `recordMcpEvent()` ||
198+
199+
### Bash Permissions
200+
201+
| Event | Method | Key Attributes |
202+
|-------|--------|----------------|
203+
| `bash.permissions.initialized` | `recordBashEvent()` ||
204+
| `bash.permission.allowed` | `recordBashEvent()` | command |
205+
| `bash.permission.denied` | `recordBashEvent()` | command |
206+
207+
### Task Management
208+
209+
| Event | Method | Key Attributes |
210+
|-------|--------|----------------|
211+
| `task.session_started` | `recordTaskEvent()` ||
212+
| `task.batch_created` | `recordTaskEvent()` ||
213+
| `task.validation_error` | `recordTaskEvent()` ||
214+
215+
### Validation
216+
217+
| Event | Method | Key Attributes |
218+
|-------|--------|----------------|
219+
| `json_validation.started` | `recordJsonValidationEvent()` ||
220+
| `json_validation.completed` | `recordJsonValidationEvent()` ||
221+
| `mermaid_validation.started` | `recordMermaidValidationEvent()` ||
222+
| `mermaid_validation.completed` | `recordMermaidValidationEvent()` ||
223+
| `mermaid_validation.maid_fix_completed` | `recordMermaidValidationEvent()` ||
224+
225+
## Debug Logging
226+
227+
All significant decisions and state changes are logged via `console.log("[DEBUG] ...")`. When the OTEL log bridge is active, these are automatically forwarded to the OTEL Logs pipeline.
228+
229+
Key debug log prefixes:
230+
- `[DEBUG] [GracefulStop]` — graceful stop lifecycle
231+
- `[DEBUG] [GracefulTimeout]` — graceful timeout wind-down
232+
- `[DEBUG] [NegotiatedTimeout]` — negotiated timeout observer
233+
- `[DEBUG] [Subagent]` — subagent registration/unregistration
234+
- `[DEBUG] [MCP]` — MCP server operations
235+
- `[DEBUG] [Delegation]` — delegate tool operations
236+
- `[SimpleTelemetry]` — telemetry system status
237+
238+
## Trace File Format
239+
240+
When using `--trace-file`, spans are written as JSON Lines (one JSON object per line):
241+
242+
```json
243+
{"traceId":"abc123","spanId":"def456","name":"tool.call","startTime":1710000000000,"endTime":1710000001000,"duration":1000,"attributes":{"tool.name":"search","session.id":"xyz"},"events":[],"status":"OK","timestamp":"2024-03-10T00:00:01.000Z"}
244+
```
245+
246+
Each line is a complete JSON object that can be parsed independently, making it easy to stream, tail, or ingest into log aggregation systems.

0 commit comments

Comments
 (0)