feat(streaming-hook): enrich wire events with cost, LLM timing, model, thinking, and sub-agent attribution #45
Merged
Adds NotRequired slots for cost (str, Decimal-as-string), llmDurationMs, model, provider, cacheReadTokens, cacheWriteTokens, sessionCostTotal, agentName. Changes existing cost slot from float to str to preserve monetary precision (matching kernel's Decimal-as-str wire serialization).
Phase 1 Task 1 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
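As a rough illustration of the slots described above, the sketch below declares a stand-in TypedDict and shows why a Decimal-as-string `cost` round-trips exactly where a float would not. Only `cost: str` and `agentName: str` are stated in the commit messages; the `int` and `str` types chosen for the other slots, and the class name `UsageNotificationSketch`, are assumptions made for illustration, and the pre-existing token fields are omitted.

```python
from decimal import Decimal

try:  # NotRequired lives in typing from Python 3.11 onwards
    from typing import NotRequired, TypedDict
except ImportError:  # older interpreters need the typing_extensions backport
    from typing_extensions import NotRequired, TypedDict


class UsageNotificationSketch(TypedDict):
    """Illustrative stand-in; NOT the real UsageNotification definition."""

    cost: NotRequired[str]              # Decimal serialized as a string
    llmDurationMs: NotRequired[int]     # assumed int
    model: NotRequired[str]             # assumed str
    provider: NotRequired[str]          # assumed str
    cacheReadTokens: NotRequired[int]   # assumed int
    cacheWriteTokens: NotRequired[int]  # assumed int
    sessionCostTotal: NotRequired[str]  # assumed Decimal-as-string, like cost
    agentName: NotRequired[str]


# A Decimal survives the trip through its string form unchanged.
event: UsageNotificationSketch = {"cost": str(Decimal("0.0324015"))}
round_tripped = Decimal(event["cost"])

# Binary floats cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
```

Every slot is `NotRequired`, so an event carrying only `cost` still satisfies the type.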
Phase 1 Task 2: aligns existing roundtrip test with the cost: str type introduced in Task 1.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Adds NotRequired[str] agentName slot to ToolStartedNotification and ToolCompletedNotification so Phase 2 can attribute tool actions to sub-agents via session_id parsing.
Phase 1 Task 3 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Regenerates the JSON Schemas for UsageNotification, ToolStartedNotification, and ToolCompletedNotification after the TypedDict changes in Tasks 1-3. spec.md unchanged (no taxonomy-level addition; only field additions).
Phase 1 Task 4 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Extracts sub-agent name from delegated session ids of the form
{parent}-{child}_{agent_name}. Root sessions (no underscore) return None.
Phase 2 Task 1 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
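The session-id parsing described in this commit could look roughly like the sketch below. It assumes the agent name is everything after the first underscore and that the `{parent}-{child}` prefix contains no underscore; the function name `parse_agent_name_sketch` is hypothetical, and the real `_parse_agent_name` may validate the prefix differently.

```python
from typing import Optional


def parse_agent_name_sketch(session_id: str) -> Optional[str]:
    """Return the agent-name suffix of a delegated session id, else None.

    Assumption: the id has the form {parent}-{child}_{agent_name} and the
    first underscore separates the ids from the agent name.
    """
    _, separator, agent_name = session_id.partition("_")
    if not separator or not agent_name:
        return None  # root session, or a trailing underscore with no name
    return agent_name


delegated = parse_agent_name_sketch("0000000000000000-6563bf04ec894842_explorer")
root = parse_agent_name_sketch("6563bf04ec894842")
```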
…l/cache
Extracts duration_ms, model, provider, cache_read_tokens, cache_write_tokens, and cost_usd from llm:response event data and attaches them as optional fields on the usage wire event. Cost preserved as Decimal-as-string for monetary precision. agentName attached for sub-agent sessions via _parse_agent_name. Absent kernel fields are omitted (not emitted as None) to respect the schema's additionalProperties:false.
Phase 2 Task 2 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
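The "omit rather than emit None" rule can be sketched as follows. The flat shape of the `llm:response` data, the helper name `build_usage_event`, and the assumption that `cost_usd` arrives as a `Decimal` or string are all illustrative guesses; only the kernel and wire field names come from the commit message.

```python
from decimal import Decimal
from typing import Any, Dict, Mapping

# Kernel key -> wire key, paired from the field names in the commit message.
_FIELD_MAP = {
    "duration_ms": "llmDurationMs",
    "model": "model",
    "provider": "provider",
    "cache_read_tokens": "cacheReadTokens",
    "cache_write_tokens": "cacheWriteTokens",
}


def build_usage_event(data: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy only the fields the kernel actually supplied (hypothetical helper)."""
    event: Dict[str, Any] = {"type": "usage"}
    for kernel_key, wire_key in _FIELD_MAP.items():
        value = data.get(kernel_key)
        if value is not None:  # absent or None fields never reach the wire
            event[wire_key] = value
    cost = data.get("cost_usd")
    if cost is not None:
        event["cost"] = str(cost)  # Decimal-as-string keeps full precision
    return event


sample = build_usage_event(
    {"model": "claude-opus-4-5", "provider": None, "cost_usd": Decimal("0.029421")}
)
```

Here `provider` is dropped entirely instead of being serialized as `null`, so the event stays free of keys whose value would be `None`.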
…eted
Parses session_id ({parent}-{child}_{agent_name} format) and attaches
agentName to tool/started and tool/completed wire events when the session
is delegated (sub-agent). Root sessions emit no agentName field.
Phase 2 Task 3 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Subscribes to kernel thinking:delta and thinking:final events (extended reasoning blocks) and emits them as the already-defined thinking/delta and thinking/final canonical wire types. on_thinking_final falls back to data['block']['text'] when top-level text is absent, mirroring the content_block:end handling pattern. Extends CANONICAL_WIRE_EVENTS to include the two thinking types.
Phase 2 Task 4 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
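A minimal sketch of the fallback described for `on_thinking_final`, assuming the event data is a plain dict; the helper name `extract_thinking_text` is hypothetical and the real handler may treat empty strings or other block shapes differently.

```python
from typing import Any, Mapping, Optional


def extract_thinking_text(data: Mapping[str, Any]) -> Optional[str]:
    """Prefer top-level text; fall back to data['block']['text'] if absent."""
    text = data.get("text")
    if text is None:
        block = data.get("block")
        if isinstance(block, dict):  # tolerate a missing or malformed block
            text = block.get("text")
    return text


top_level = extract_thinking_text({"text": "direct"})
from_block = extract_thinking_text({"block": {"text": "nested"}})
missing = extract_thinking_text({})
```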
Adds on_orchestrator_complete handler that aggregates per-call session.cost contributions via coordinator.collect_contributions and emits a session-total usage event with sessionCostTotal. Inline _sum_cost_usd helper preserves Decimal precision (no foundation import). Guarded against coordinators lacking collect_contributions.
Phase 2 Task 5 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Updates mount() to register 10 hook handlers (was 7), adding thinking:delta, thinking:final, and orchestrator:complete. Docstring updated to reflect the new event list. Renames test_mount_registers_seven_handlers -> ten.
Phase 2 Task 6 of streaming-hook-enrichment plan. Phase 2 complete.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
End-to-end test driving a realistic delegated turn (tool call, content block, thinking, enriched llm:response, orchestrator:complete) and asserting every new field reaches the wire: cost, llmDurationMs, model, provider, cacheReadTokens, cacheWriteTokens, agentName, thinking text, and sessionCostTotal.
Phase 3 Task 1 of streaming-hook-enrichment plan.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…5 tasks)
Captures the design decisions and per-task TDD sequence for the streaming hook enrichment work (Phases 1-3).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…r handler
The real amplifier-core kernel exposes collect_contributions as an async coroutine, not a sync method. The previous code called it synchronously, passed the un-awaited coroutine into _sum_cost_usd, which then raised TypeError: 'coroutine' object is not iterable during iteration. This crashed on_orchestrator_complete on every turn against the real kernel.
Caught by DTU integration testing — the unit test mock had a sync collect_contributions, masking the async/sync mismatch. This commit updates the mock to async def to match the real kernel contract so the test suite would have caught the bug.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
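The failure mode is easy to reproduce in isolation. In the sketch below, `FakeCoordinator` and the list-of-dicts return value are hypothetical stand-ins for the kernel coordinator; the point is only that iterating an un-awaited coroutine raises `TypeError`, while awaiting it yields the data.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class FakeCoordinator:
    """Hypothetical stand-in whose method is async, like the real kernel's."""

    async def collect_contributions(self, channel: str) -> List[Dict[str, Any]]:
        return [{"cost_usd": "0.01"}]  # assumed result shape, for illustration


async def demo() -> Tuple[Optional[str], List[Dict[str, Any]]]:
    coordinator = FakeCoordinator()

    # Buggy pattern: calling without await hands back a coroutine object.
    pending = coordinator.collect_contributions("session.cost")
    error: Optional[str] = None
    try:
        for _ in pending:  # a coroutine is not iterable
            pass
    except TypeError as exc:
        error = str(exc)
    finally:
        pending.close()  # suppress the "never awaited" warning

    # Fixed pattern: await the coroutine before iterating the result.
    results = await coordinator.collect_contributions("session.cost")
    return error, results


error, results = asyncio.run(demo())
```

A mock declared with plain `def` would have returned the list directly and hidden the bug, which is why matching the mock to the `async def` contract matters.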
Decimal('NaN') and Decimal('Infinity') are valid Decimal constructs that
do NOT raise InvalidOperation, so the existing try/except in _sum_cost_usd
did not catch them. A single NaN entering the accumulator would poison
the sum, emitting 'sessionCostTotal': 'NaN' on the wire and silently
breaking the budget-enforcement consumers this field exists to enable.
Adds an is_finite() guard immediately after Decimal construction, plus
test coverage for NaN-only, Infinity-only, and mixed inputs.
Also documents (in on_orchestrator_complete's docstring) that
sessionCostTotal reflects what collect_contributions returns, which may
differ from summing per-call cost fields due to how the kernel
accumulates contributions across the coordinator hierarchy. This is a
kernel concern (bridge_child_cost semantics in foundation), not a bug
in this hook.
Surfaced by code review on PR #45.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
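The guard described in this commit can be sketched as below. The contribution shape (dicts with a `cost_usd` key) and the function name `sum_cost_usd_sketch` are assumptions for illustration; the real `_sum_cost_usd` may read its inputs differently.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Iterable, Mapping


def sum_cost_usd_sketch(contributions: Iterable[Mapping[str, Any]]) -> Decimal:
    """Sum finite cost values, skipping unparseable and non-finite entries."""
    total = Decimal("0")
    for item in contributions:
        try:
            value = Decimal(str(item.get("cost_usd")))
        except InvalidOperation:
            continue  # e.g. "abc", or a missing key (str(None) == "None")
        if not value.is_finite():
            continue  # NaN and Infinity construct fine but would poison the sum
        total += value
    return total


# Without the guard, one NaN turns the whole total into NaN.
poisoned = Decimal("0.01") + Decimal("NaN")

total = sum_cost_usd_sketch(
    [
        {"cost_usd": "0.01"},
        {"cost_usd": "NaN"},
        {"cost_usd": "Infinity"},
        {"cost_usd": "abc"},
        {},
        {"cost_usd": "0.02"},
    ]
)
```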
…chment
Regenerates wrappers/typescript/{src/types.ts,dist/types.d.ts} from the
updated JSON Schemas committed in 9402101. Auto-generated via:
cd wrappers/typescript
pnpm install
pnpm run gen:types
pnpm run build
New TypeScript shapes mirror the Python TypedDicts:
- UsageNotification: cost type is now string (Decimal precision), plus
llmDurationMs, model, provider, cacheReadTokens, cacheWriteTokens,
sessionCostTotal, and agentName all added as optional fields.
- ToolStartedNotification + ToolCompletedNotification: agentName added
as optional field for sub-agent attribution.
All additions are TypeScript optional (?), matching the Python
NotRequired contract. Backward-compatible at the wire — consumers
that don't read the new fields keep working unchanged.
Verified end-to-end against the real kernel via DTU; see prior commits
for details.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
manojp99 added a commit that referenced this pull request Jun 11, 2026
…g, prepare script (#48)

* chore(wrapper-ts): release 0.6.2 — timeout opt-in

Bumps amplifier-agent-ts from 0.6.1 to 0.6.2 to ship the timeout opt-in fix landed in #41. Tag wrapper-v0.6.2 will trigger the publish-wrapper.yml workflow to release this version to npm.

Headline change: SessionHandle.submit() no longer silently imposes a 10-minute wall-clock cap when timeoutMs is undefined. The timer is now opt-in (timeoutMs > 0 to arm). DEFAULT_TIMEOUT_MS is now exported for callers that want the legacy cap.

Refs: #41

Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* chore(wrapper-ts): add prepare script for git-dep installs

Enables consumers to install the wrapper directly from a git ref by auto-running 'npm run build' (which chains prebuild: gen-types -> tsc) during install. This produces dist/ that the published package would have shipped.

Standard 'pnpm install' / 'npm install' from the published tarball is unaffected -- prepare only runs for git refs and 'npm publish'.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* fix(cli): surface enriched usage fields in CLI display (#45 follow-up)

The streaming hook (fa3b237) enriched the wire `usage` notification with cost, model, provider, llmDurationMs, cacheReadTokens, cacheWriteTokens, sessionCostTotal, and (for delegated sub-agents) agentName.

The CLI display formatter (defaults_cli.py:_summarize) only read inputTokens / outputTokens, so the stderr human-readable log line stayed minimal:

    [usage] in=4202 out=467

After this patch the line includes every enriched field the engine supplies (each guarded individually so terse usage events still render cleanly for older engines or providers that don't enrich):

    [usage] in=4202 out=467 cost=$0.067644 cache_read=9339 cache_write=9339 dur=6411ms model=claude-opus-4-5 provider=anthropic

Downstream impact: hosts that capture amplifier-agent stderr (e.g. paperclip's amplifier-local adapter, which persists raw stderr to heartbeat-run NDJSON logs and renders it as the run transcript) will now see the enriched fields in their transcript views without any host-side changes.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat(cli): add JsonDisplaySystem + `--display ndjson` for structured host consumption

Closes the contract gap between the streaming-hook enrichment (#45) and host integrations that consume the wire-event stream.

Today, CliDisplaySystem is hardcoded as the only DisplaySystem, writing `[type] summary` human-readable lines to stderr. Hosts using amplifier-agent-ts (e.g. paperclip's amplifier-local adapter) wire `parseNdjsonStream` onto child stderr, expecting one JSON-RPC notification per line -- but never get any, because no display ever emits JSON. The wrapper's `onJson` callback never fires; structured fields (cost, model, provider, llm duration, cache token counts, session cost total, delegated agentName) added by #45 sit in Python dicts and never reach disk.

This adds a second DisplaySystem implementation alongside CliDisplay and a CLI flag to choose between them:

    amplifier-agent run --display text|ndjson (default: text)

`text` keeps the existing human-readable behavior verbatim.

`ndjson` swaps in JsonDisplaySystem, which emits one JSON object per event shaped as:

    {"method": "<event-type>", "params": <rest of event>}

This matches the JSON-RPC notification shape the wrapper-ts session parser expects (session.ts:380-395), so host adapters can switch on `event.method` and receive the enriched fields directly on `event.params`.

Backward-compatible by design:
- Default is `text`; existing wrappers that don't pass `--display` continue to receive the human-text format.
- The new flag is additive; no breaking change to argv contract.
- JsonDisplaySystem ignores verbosity flags -- hosts filter their own consumption (the structured stream is the canonical contract).

Contract notes for future maintainers:
- The NDJSON stream is now part of the engine's external interface. Fields are additive-only; never rename a field without a versioning plan. Hosts should ignore unknown `params` keys.
- Stdout discipline preserved: JsonDisplaySystem writes only to the injected stream (typically sys.stderr). The §4.1 envelope on stdout is unchanged.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat(wrapper-ts): forward `displayMode` opt to engine via `--display`

Pairs with the engine-side `JsonDisplaySystem` + `--display` flag added in commit 00d535f. Hosts can now request structured stderr emission (one JSON-RPC notification per line) by passing `displayMode: "ndjson"` to `spawnAgent()`.

Without this, the wrapper's `parseNdjsonStream` consumer on `child.stderr` sees only human-readable text from `CliDisplaySystem` and never invokes `display.onEvent` for the engine's wire-event stream -- so hosts wanting cost/model/cache/duration enrichment from #45 never see it.

Wiring:
- `SpawnAgentParams.displayMode?: "text" | "ndjson"` (new public field).
- Forwarded through `SessionHandleParams.displayMode` to `assembleArgv()`, which emits `--display <mode>` when set.
- When omitted, the wrapper emits no `--display` flag, so older engines (pre-#45-followup) keep working with this wrapper.

Backward-compatible by design:
- The new field is optional everywhere on the path.
- Existing callers (no displayMode) keep their current behavior.
- Engine defaults to `text` if no `--display` flag is emitted.

Engine compatibility note added to the public docstring: setting `displayMode` requires an engine that accepts `--display` (older engines fail with click "no such option"). Hosts using link: or paired releases will be in sync; hosts mixing wrapper@new + engine@old should omit `displayMode`.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* chore(wrapper-ts): regenerate dist/ for displayMode plumbing

Built from the src/ changes in 82605c7. dist/ is tracked in this repo so consumers installing via git refs get the built artifacts without a separate build step.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* fix(wrapper-ts): also push NDJSON notifications onto the iterator queue

Previously the parseNdjsonStream onJson handler in SessionHandle.submit() dispatched parsed notifications ONLY to params.display?.onEvent (the push callback). Iterator consumers (`for await (const ev of handle.submit(...))`) never saw them -- the iterator queue only received init, activity ticks, and the final terminal event.

This is the third bug in the chain that prevented paperclip's amplifier-local adapter from recording cost data:
1. engine had no JSON display mode (fixed 00d535f: JsonDisplaySystem)
2. wrapper didn't forward --display flag (fixed 82605c7: displayMode)
3. wrapper delivered notifications to callback but not iterator (this fix)

Paperclip's execute.ts iterates handle.submit() and switches on event.method for usage/result/tool/* events. With this fix, the existing `case "notification":` branch finally receives data and the adapter populates AdapterExecutionResult.{costUsd, usage, model, provider}. cost_events table starts getting rows.

Hosts that subscribe via both display.onEvent AND the iterator will receive each notification twice -- acceptable trade-off; subscribe to one or the other. Documented inline.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat(wrapper-ts): forward `workspace` opt to engine via `--workspace`

Hosts that manage multiple agents per process (paperclip's amplifier-local adapter being the immediate case — runs CEO + CTO + sub-agents per company) need each agent's session state to land in its own engine workspace directory. Without this, every spawn shares a cwd-derived slug (e.g. "default-9e80f0e7") and all transcripts mingle under one workspaces/.../sessions/ tree, making debugging and history navigation painful.

The engine already accepts `--workspace <name>` (validated against `[a-z0-9][a-z0-9-]{0,63}`). This plumbs it through:

    SpawnAgentParams.workspace
      → SessionHandleParams.workspace
      → AssembleArgvInput.workspace
      → argv: --workspace <slug>

When omitted, the wrapper emits no `--workspace` flag — the engine falls back to cwd-derived auto-slug (existing behavior preserved).

Backward-compatible by design:
- Field is optional throughout.
- Existing hosts (no workspace) keep their auto-derived slug.
- Older engines that accept --workspace just receive what was already valid argv.

Engine compatibility note: `--workspace` has been a click option on `amplifier-agent run` for a while (single_turn.py), so this doesn't gate behind a new engine version.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

---------

Co-authored-by: Manoj Prabhakar Paidiparthy <mpaidiparthy@microsoft.com>
Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
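The `{"method": ..., "params": ...}` shape described for the `ndjson` display mode can be sketched as follows. This assumes each wire event is a dict whose event type sits under a `type` key (as in the sample wire output in this PR); the function name `emit_ndjson` is hypothetical and is not the real `JsonDisplaySystem` API.

```python
import io
import json
from typing import Any, Mapping, TextIO


def emit_ndjson(event: Mapping[str, Any], stream: TextIO) -> None:
    """Write one event as a single JSON line: method = type, params = the rest."""
    params = {key: value for key, value in event.items() if key != "type"}
    line = json.dumps({"method": event["type"], "params": params})
    stream.write(line + "\n")  # one object per line keeps the stream NDJSON


# An in-memory buffer stands in for the injected stream (typically sys.stderr).
buffer = io.StringIO()
emit_ndjson({"type": "usage", "cost": "0.029421", "model": "claude-opus-4-5"}, buffer)
emit_ndjson({"type": "tool/started", "name": "bash", "agentName": "explorer"}, buffer)

# A host can split on newlines and switch on the method of each notification.
parsed = [json.loads(raw) for raw in buffer.getvalue().splitlines()]
```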
Summary
Enhanced the streaming hook (`hook_streaming.py`) to enrich wire events with cost tracking, LLM timing visibility, model/provider attribution, thinking event streaming, and sub-agent identification. This directly addresses reviewer feedback requesting budget safety nets and per-step visibility into what work is being performed and how long it takes.
Key additions:
- `UsageNotification` now includes `cost`, `llmDurationMs`, `model`, `provider`, `cacheReadTokens`, `cacheWriteTokens`, `sessionCostTotal`, `agentName`
- Tool events (`tool/started`, `tool/completed`) now carry `agentName` for delegated sessions
- Subscribes to `thinking:delta` and `thinking:final` events (reasoning now visible on the wire)
- Session-total cost emitted via the `orchestrator:complete` handler

Problem
A user code review identified two critical gaps:
1. Cost visibility: "Plumbing cost data, once we have that we can allow users to set budgets for spend, giving a safety net (we will need visibility on cost for all harnesses that support wiring it in)"
2. Per-step visibility: "Increase visibility on 'what' is being done on each Amplifier step/sub-step, include time per step ... the current duration appears to be only for the tool calls and not the llm calls (the valuable one)"
Root cause: The streaming hook extracted only token counts from LLM responses and discarded the cost, duration, model, provider, and cache data already available from the kernel. It did not subscribe to thinking events at all, and it did not track session-total cost.

Approach
Executed under TDD in three phases with full commit history:
Phase 1 — Protocol Extension (6 tasks)
- Extended the `UsageNotification` TypedDict: added `cost: NotRequired[str]` (Decimal precision), `llmDurationMs`, `model`, `provider`, `cacheReadTokens`, `cacheWriteTokens`, `sessionCostTotal`, `agentName`
- Added `agentName: NotRequired[str]` to `ToolStartedNotification` and `ToolCompletedNotification`
- Regenerated the JSON Schemas: `uv run python -m amplifier_agent_lib.protocol._gen --output-dir src/amplifier_agent_lib/protocol`
Phase 2 — Hook Enrichment (6 tasks)
- `_parse_agent_name(session_id)` helper: parses the `{parent}-{child}_{agent_name}` session ID format
- `_sum_cost_usd(results)`: Decimal aggregator, replicated from foundation to keep the hook free of unnecessary coupling
- `on_llm_response`: extracts `duration_ms`, `model`, `provider`, `cost_usd`, `cache_read_tokens`, `cache_write_tokens`; attaches optional fields only when present
- `on_thinking_delta` and `on_thinking_final` handlers: emit canonical `thinking/delta` and `thinking/final` wire events
- `on_orchestrator_complete` handler: calls `coordinator.collect_contributions("session.cost")`, aggregates via `_sum_cost_usd`, emits a session-total usage event
- Attaches `agentName` to tool and usage events when the session is delegated
- Extended the `CANONICAL_WIRE_EVENTS` tuple (5 → 7 types)
- `mount()` registers 10 handlers (was 7)
Phase 3 — Verification (3 tasks)

Testing & Validation
Unit & Integration Tests
Environmental Note
11 `test_conformance_parity.py` tests fail due to missing `tsx` (TypeScript runner) on the dev box. These are pre-existing integration-environment gaps, marked `@pytest.mark.integration`, excluded from CI runs, and not regressions introduced by this work.
Digital Twin Universe (End-to-End) Validation
Built a DTU environment from the local working tree (mirrored to Gitea, installed via `uv tool install` from the Gitea mirror) and ran real delegated turns against Anthropic's claude-opus-4-5.
Integration bug surfaced: The original `on_orchestrator_complete` handler called `collect_contributions` synchronously, but the kernel exposes it as an async coroutine. Fixed in commit `0d734d3` (the mock was also tightened to `async def` so future tests would catch this mismatch).
Real wire output from a delegated turn:
    {"type": "usage", "cost": "0.029421", "llmDurationMs": 8788, "model": "claude-opus-4-5", "provider": "anthropic", "inputTokens": 4202, "outputTokens": 442, "cacheReadTokens": 4192, "cacheWriteTokens": 2596}
    {"type": "tool/started", "name": "delegate", "args": {"agent": "explorer", ...}}
    {"type": "usage", "cost": "0.0324015", "llmDurationMs": 9968, "agentName": "explorer", "sessionId": "0000000000000000-6563bf04ec894842_explorer", ...}
    {"type": "tool/started", "name": "bash", "agentName": "explorer", "args": {"command": "curl -sL https://docs.python.org/3.13/whatsnew/3.13.html"}}
    {"type": "usage", "sessionCostTotal": "0.092882", ...}
Every requested field reaches the wire in realistic multi-step delegated turns.
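On the consuming side, a host might read these events roughly as in the sketch below. The `wire_lines` list is an abbreviated copy of the sample above with the elided fields dropped, and `track_spend` is a hypothetical helper, not part of this PR. It keeps the per-call `cost` sum separate from `sessionCostTotal`, since the two may legitimately differ depending on how the kernel accumulates contributions.

```python
import json
from decimal import Decimal
from typing import Iterable, Optional, Tuple

# Abbreviated copies of the sample events; elided fields are simply left out.
wire_lines = [
    '{"type": "usage", "cost": "0.029421", "llmDurationMs": 8788}',
    '{"type": "tool/started", "name": "delegate"}',
    '{"type": "usage", "cost": "0.0324015", "llmDurationMs": 9968, "agentName": "explorer"}',
    '{"type": "usage", "sessionCostTotal": "0.092882"}',
]


def track_spend(lines: Iterable[str]) -> Tuple[Decimal, Optional[Decimal]]:
    """Return (sum of per-call cost, last sessionCostTotal seen or None)."""
    per_call = Decimal("0")
    session_total: Optional[Decimal] = None
    for raw in lines:
        event = json.loads(raw)
        if event.get("type") != "usage":
            continue
        if "cost" in event:
            per_call += Decimal(event["cost"])  # parse the string, never float()
        if "sessionCostTotal" in event:
            session_total = Decimal(event["sessionCostTotal"])
    return per_call, session_total


per_call_sum, session_total = track_spend(wire_lines)
```

A budget check would compare one of these values against a host-defined limit; which of the two is the right basis depends on the kernel semantics noted in the `on_orchestrator_complete` docstring.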

Wire Taxonomy
Additive only, no breaking changes:
- … `CANONICAL_DISPLAY_EVENTS`)
- All new fields are `NotRequired` (optional)
- `UsageNotification.cost` (`NotRequired[float]` → `NotRequired[str]`); this slot existed but was never populated, so no live consumer was affected

Files Changed
- `src/amplifier_agent_lib/bundle/hook_streaming.py`
- `src/amplifier_agent_lib/protocol/notifications.py`
- `src/amplifier_agent_lib/protocol/schemas/UsageNotification.schema.json`
- `src/amplifier_agent_lib/protocol/schemas/ToolStartedNotification.schema.json`
- `src/amplifier_agent_lib/protocol/schemas/ToolCompletedNotification.schema.json`
- `tests/test_bundle_hook_streaming.py`
- `tests/test_protocol_notifications.py`
- `docs/plans/2026-06-09-streaming-hook-enrichment-*.md`

Related Issues / PRs
Closes: (reviewer feedback from recent PR review)
Generated with Amplifier