fix: make timeout opt-in instead of silently imposing 10-min wall-clock cap#41
Merged
Conversation
…ck cap
The TypeScript SDK (amplifier-agent-ts) silently imposed a hard 10-minute
wall-clock timeout on every agent turn due to this logic in session.ts:
const timeoutMs = this.params.timeoutMs ?? DEFAULT_TIMEOUT_MS;
// DEFAULT_TIMEOUT_MS = 10 * 60 * 1000
Callers passing timeoutMs: undefined ("no timeout") got a silent 10-min
cap that synthesized an 'engine_hung' error and SIGTERM/SIGKILL'd the
subprocess. This killed long-running agent turns.
Real-world impact: Amplifier agent tasks in Paperclip failed at exactly
10 minutes ("Engine subprocess hung past 600000ms timeout").
The Amplifier CLI itself imposes no per-turn timeout, so the wrapper SDK
should not silently impose one either.
This fix makes the timeout opt-in:
- timeoutMs: undefined/null/<= 0 → NO timer armed (no engine_hung from timeout)
- timeoutMs: > 0 → timer armed exactly as before
- DEFAULT_TIMEOUT_MS is now exported for callers that want to opt in
Verification:
- All wrapper tests pass (101/101 including 3 new long-window integration tests)
- Build and typecheck clean
- Long-window integration test proves timeoutMs:0 and timeoutMs:undefined
both run a full ~12s subprocess to completion with zero engine_hung error
- Positive control confirms timeout still fires when explicitly opt in (>0)
Changes:
- src/session.ts: timeout logic (only arm when timeoutMs > 0)
- src/index.ts: export DEFAULT_TIMEOUT_MS as opt-in constant
- test/session-subprocess.test.ts: 2 new test cases
- test/timeout-longwindow-integration.test.ts: real 12s subprocess integration tests
- dist/*: updated generated files (build output)
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
This is an unrelated pre-existing lint error in the test suite, not a defect in the timeout fix. The error was blocking CI on PR #41. Fixed via ruff auto-fix and included in this PR to unblock the merge. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
`SessionHandleParams.timeoutMs` is typed `number | undefined`, so the runtime check `timeoutMs !== null` was unreachable in typed callers and misleadingly suggested `null` was a supported sentinel. Simplify to `timeoutMs !== undefined && timeoutMs > 0` and update the file docblock plus the `SessionHandleParams.timeoutMs` JSDoc to drop the `null` mention. Behaviour is unchanged: `null`, `undefined`, `0`, and negative values all still disable the timer (now via the type-correct path: `undefined` short- circuits the guard; `0` and negatives fail `> 0`). dist/ regenerated via `npm run build`. Tests: 101/101 passing under `bun run test`; typecheck clean. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…meoutMs is disabled Document the design gap introduced by PR #41: with the wall-clock timer opt-in, callers that pass `timeoutMs: 0` or `undefined` (including the amplifier-app-paperclip adapters per microsoft/amplifier-app-paperclip#13) get no wrapper-side hang detection. The 2s activity ticker still emits heartbeats but never escalates — it is feedback, not recovery. The new entry follows the ISSUE-001 template (status, summary, why-it- matters, design question, 5-step wire-up plan, reproducer, references). It frames the next iteration around progress-based detection (NDJSON event flow, tool/started without matching tool/finished, output-byte deltas) rather than re-introducing a wall-clock cap, so genuine long deep-work spans continue to work uninterrupted. References: #41, microsoft/amplifier-app-paperclip#13. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
manojp99
pushed a commit
to microsoft/amplifier-app-paperclip
that referenced
this pull request
Jun 8, 2026
Satisfies the merge gate documented in this PR's body. With amplifier-agent-ts@0.6.2 now published to npm, the explicit `timeoutSecToMs(0)` → `0` passthrough this PR adds is safe: 0 means "no timer" rather than firing immediately as it did against 0.6.1. Lockfile refreshed; diff scoped to amplifier-agent-ts only. Refs: microsoft/amplifier-agent#41 (the wrapper-side fix published as 0.6.2). 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
manojp99
pushed a commit
to microsoft/amplifier-app-paperclip
that referenced
this pull request
Jun 8, 2026
…agent#41) (#13) * fix(adapters): explicit no-timeout passthrough instead of undefined The amplifier-local and acpx-local adapters map timeoutSec:0 ("no timeout") to the wrapper's timeoutMs parameter. Previously, they passed undefined, which the amplifier-agent-ts wrapper silently converted to a 10-minute hard wall-clock cap via DEFAULT_TIMEOUT_MS. This commit: - Adds a tested timeoutSecToMs(timeoutSec) helper that converts 0/negative values to explicit 0 instead of undefined, making the "no timeout" intent unambiguous at the wrapper boundary. - Applies the same helper to both amplifier-local (execute.ts:695) and acpx-local (execute.ts:1414, 1588) to fix the latent identical pattern. - The wrapper (amplifier-agent-ts >=0.7) now treats timeoutMs <= 0 as "no timeout" (optional arm timer), so explicit 0 prevents the silent DEFAULT_TIMEOUT_MS fallback. This fixes the "engine_hung" failures that truncated long-running agent tasks at exactly 10 minutes, confirmed in Paperclip "Stark Enterprises" test runs. Tests: amplifier-local 53/53 and acpx-local 40/40 passing; typecheck clean. Depends-On: microsoft/amplifier-agent PR #41 (wrapper timeout opt-in fix must merge and publish before this can ship). Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * chore(adapters): bump amplifier-agent-ts pin to ^0.6.2 Satisfies the merge gate documented in this PR's body. With amplifier-agent-ts@0.6.2 now published to npm, the explicit `timeoutSecToMs(0)` → `0` passthrough this PR adds is safe: 0 means "no timer" rather than firing immediately as it did against 0.6.1. Lockfile refreshed; diff scoped to amplifier-agent-ts only. Refs: microsoft/amplifier-agent#41 (the wrapper-side fix published as 0.6.2). 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> --------- Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> Co-authored-by: Manoj Prabhakar Paidiparthy <mpaidiparthy@microsoft.com>
manojp99
added a commit
that referenced
this pull request
Jun 11, 2026
…g, prepare script (#48) * chore(wrapper-ts): release 0.6.2 — timeout opt-in Bumps amplifier-agent-ts from 0.6.1 to 0.6.2 to ship the timeout opt-in fix landed in #41. Tag wrapper-v0.6.2 will trigger the publish-wrapper.yml workflow to release this version to npm. Headline change: SessionHandle.submit() no longer silently imposes a 10-minute wall-clock cap when timeoutMs is undefined. The timer is now opt-in (timeoutMs > 0 to arm). DEFAULT_TIMEOUT_MS is now exported for callers that want the legacy cap. Refs: #41 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * chore(wrapper-ts): add prepare script for git-dep installs Enables consumers to install the wrapper directly from a git ref by auto-running 'npm run build' (which chains prebuild: gen-types -> tsc) during install. This produces dist/ that the published package would have shipped. Standard 'pnpm install' / 'npm install' from the published tarball is unaffected -- prepare only runs for git refs and 'npm publish'. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * fix(cli): surface enriched usage fields in CLI display (#45 follow-up) The streaming hook (fa3b237) enriched the wire `usage` notification with cost, model, provider, llmDurationMs, cacheReadTokens, cacheWriteTokens, sessionCostTotal, and (for delegated sub-agents) agentName. The CLI display formatter (defaults_cli.py:_summarize) only read inputTokens / outputTokens, so the stderr human-readable log line stayed minimal: [usage] in=4202 out=467 After this patch the line includes every enriched field the engine supplies (each guarded individually so terse usage events still render cleanly for older engines or providers that don't enrich): [usage] in=4202 out=467 cost=$0.067644 cache_read=9339 cache_write=9339 dur=6411ms model=claude-opus-4-5 provider=anthropic Downstream impact: hosts that capture amplifier-agent stderr (e.g. paperclip's amplifier-local adapter, which persists raw stderr to heartbeat-run NDJSON logs and renders it as the run transcript) will now see the enriched fields in their transcript views without any host-side changes. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * feat(cli): add JsonDisplaySystem + `--display ndjson` for structured host consumption Closes the contract gap between the streaming-hook enrichment (#45) and host integrations that consume the wire-event stream. Today, CliDisplaySystem is hardcoded as the only DisplaySystem, writing `[type] summary` human-readable lines to stderr. Hosts using amplifier-agent-ts (e.g. paperclip's amplifier-local adapter) wire `parseNdjsonStream` onto child stderr, expecting one JSON-RPC notification per line -- but never get any, because no display ever emits JSON. The wrapper's `onJson` callback never fires; structured fields (cost, model, provider, llm duration, cache token counts, session cost total, delegated agentName) added by #45 sit in Python dicts and never reach disk. This adds a second DisplaySystem implementation alongside CliDisplay and a CLI flag to choose between them: amplifier-agent run --display text|ndjson (default: text) `text` keeps the existing human-readable behavior verbatim. `ndjson` swaps in JsonDisplaySystem, which emits one JSON object per event shaped as: {"method": "<event-type>", "params": <rest of event>} This matches the JSON-RPC notification shape the wrapper-ts session parser expects (session.ts:380-395), so host adapters can switch on `event.method` and receive the enriched fields directly on `event.params`. Backward-compatible by design: - Default is `text`; existing wrappers that don't pass `--display` continue to receive the human-text format. - The new flag is additive; no breaking change to argv contract. - JsonDisplaySystem ignores verbosity flags -- hosts filter their own consumption (the structured stream is the canonical contract). Contract notes for future maintainers: - The NDJSON stream is now part of the engine's external interface. Fields are additive-only; never rename a field without a versioning plan. Hosts should ignore unknown `params` keys. - Stdout discipline preserved: JsonDisplaySystem writes only to the injected stream (typically sys.stderr). The §4.1 envelope on stdout is unchanged. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * feat(wrapper-ts): forward `displayMode` opt to engine via `--display` Pairs with the engine-side `JsonDisplaySystem` + `--display` flag added in commit 00d535f. Hosts can now request structured stderr emission (one JSON-RPC notification per line) by passing `displayMode: "ndjson"` to `spawnAgent()`. Without this, the wrapper's `parseNdjsonStream` consumer on `child.stderr` sees only human-readable text from `CliDisplaySystem` and never invokes `display.onEvent` for the engine's wire-event stream -- so hosts wanting cost/model/cache/duration enrichment from #45 never see it. Wiring: - `SpawnAgentParams.displayMode?: "text" | "ndjson"` (new public field). - Forwarded through `SessionHandleParams.displayMode` to `assembleArgv()`, which emits `--display <mode>` when set. - When omitted, the wrapper emits no `--display` flag, so older engines (pre-#45-followup) keep working with this wrapper. Backward-compatible by design: - The new field is optional everywhere on the path. - Existing callers (no displayMode) keep their current behavior. - Engine defaults to `text` if no `--display` flag is emitted. Engine compatibility note added to the public docstring: setting `displayMode` requires an engine that accepts `--display` (older engines fail with click "no such option"). Hosts using link: or paired releases will be in sync; hosts mixing wrapper@new + engine@old should omit `displayMode`. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * chore(wrapper-ts): regenerate dist/ for displayMode plumbing Built from the src/ changes in 82605c7. dist/ is tracked in this repo so consumers installing via git refs get the built artifacts without a separate build step. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * fix(wrapper-ts): also push NDJSON notifications onto the iterator queue Previously the parseNdjsonStream onJson handler in SessionHandle.submit() dispatched parsed notifications ONLY to params.display?.onEvent (the push callback). Iterator consumers (`for await (const ev of handle.submit(...))`) never saw them -- the iterator queue only received init, activity ticks, and the final terminal event. This is the third bug in the chain that prevented paperclip's amplifier-local adapter from recording cost data: 1. engine had no JSON display mode (fixed 00d535f: JsonDisplaySystem) 2. wrapper didn't forward --display flag (fixed 82605c7: displayMode) 3. wrapper delivered notifications to callback but not iterator (this fix) Paperclip's execute.ts iterates handle.submit() and switches on event.method for usage/result/tool/* events. With this fix, the existing `case "notification":` branch finally receives data and the adapter populates AdapterExecutionResult.{costUsd, usage, model, provider}. cost_events table starts getting rows. Hosts that subscribe via both display.onEvent AND the iterator will receive each notification twice -- acceptable trade-off; subscribe to one or the other. Documented inline. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * feat(wrapper-ts): forward `workspace` opt to engine via `--workspace` Hosts that manage multiple agents per process (paperclip's amplifier-local adapter being the immediate case — runs CEO + CTO + sub-agents per company) need each agent's session state to land in its own engine workspace directory. Without this, every spawn shares a cwd-derived slug (e.g. "default-9e80f0e7") and all transcripts mingle under one workspaces/.../sessions/ tree, making debugging and history navigation painful. The engine already accepts `--workspace <name>` (validated against `[a-z0-9][a-z0-9-]{0,63}`). This plumbs it through: SpawnAgentParams.workspace → SessionHandleParams.workspace → AssembleArgvInput.workspace → argv: --workspace <slug> When omitted, the wrapper emits no `--workspace` flag — the engine falls back to cwd-derived auto-slug (existing behavior preserved). Backward-compatible by design: - Field is optional throughout. - Existing hosts (no workspace) keep their auto-derived slug. - Older engines that accept --workspace just receive what was already valid argv. Engine compatibility note: `--workspace` has been a click option on `amplifier-agent run` for a while (single_turn.py), so this doesn't gate behind a new engine version. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> --------- Co-authored-by: Manoj Prabhakar Paidiparthy <mpaidiparthy@microsoft.com> Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The TypeScript SDK () silently imposed a hard 10-minute wall-clock timeout on every agent turn. Root cause: in , the logic was:
Callers passing
timeoutMs: undefined("no timeout") got a silent 10-min cap that synthesized anengine_hungerror and SIGTERM/SIGKILL'd the subprocess. This killed long-running agent turns.Real-world impact: Amplifier agent tasks in Paperclip failed at exactly 10 minutes with "Engine subprocess hung past 600000ms timeout".
The Amplifier CLI itself imposes no per-turn timeout, so the wrapper SDK should not silently impose one either.
The Fix
Timeout is now opt-in:
timeoutMs: undefined/null/<= 0→ NO timer armed (no engine_hung from timeout)timeoutMs: > 0→ timer armed exactly as beforeDEFAULT_TIMEOUT_MSis now exported for callers that want to opt in to the 10-min capChanges
src/session.ts: timeout logic now only arms when
timeoutMs > 0clearTimeoutsites guardedSessionHandleParams.timeoutMssrc/index.ts:
DEFAULT_TIMEOUT_MSnowexport constSpawnAgentParams.timeoutMstest/session-subprocess.test.ts:
(k) timeoutMs: 0and(l) timeoutMs: undefinedboth confirm no engine_hung within 300mstest/timeout-longwindow-integration.test.ts (NEW):
spawnAgent()→submit()APItimeoutMs: 0→ engine runs ~12s, completes, NO engine_hungtimeoutMs: undefined→ engine runs ~12s, completes, NO engine_hungtimeoutMs: 500→ engine_hung fires ~500ms, subprocess cancelleddist/*: Updated generated TypeScript declarations
Verification
npm run buildsucceedsnpm run typechecksucceedsImpact & Next Steps
This PR fixes the wrapper behavior. The integration with amplifier-app-paperclip (and any other consumer) requires:
microsoft/amplifier-agent^0.6.1→^0.6.2inpackages/adapters/amplifier-local/package.jsonThe wrapper fix is backward compatible — existing callers passing explicit positive timeouts work unchanged; callers passing
undefinedor0now get the behavior they intended (no timeout).Testing Notes
Pre-existing test flake:
test/transport.test.ts > terminate()times out (unrelated to this change; test failure existed before). The critical tests for this fix all pass green.