feat(phase-2-0c): engine gap fixes + streaming hook + admin verb split #5
Merged
… (locked)
Locks the Layer 3 boundary as one coherent shipping unit: TS + Py wrappers, generated wire spec (Python TypedDicts authoritative), cross-language conformance with shared YAML fixtures, engine gap fixes (a-e), and a vendored streaming hook. Amends design checkpoint Section 4: lifecycle is 'one-shot' only in v1 ('burst' reserved). Grounded empirically in Paperclip + NanoClaw surveys (2026-05-20).
Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Per design Assumption A0 (§10): Mode B (stdio JSON-RPC loop) has no consumers. Both Paperclip and NanoClaw spawn fresh subprocesses per query. Mode B was ~600 LOC of routing + lifecycle infrastructure with no consumer. Closes F11 (mid-burst death) and CR-4 (silent turn/cancel consumption) by structural construction: the code no longer exists.
Files removed:
- src/amplifier_agent_lib/protocol_points/defaults_stdio.py
- src/amplifier_agent_cli/modes/stdio_loop.py
- tests/test_defaults_stdio.py
- tests/cli/test_stdio_loop_dispatch.py
- tests/cli/test_stdio_loop_handshake.py
- tests/cli/test_stdio_loop_subprocess.py
- tests/test_l14_synthesis.py (tested Mode B L14 safety-net exclusively)
Files modified:
- src/amplifier_agent_cli/modes/single_turn.py: strip --stdio flag, stdio:bool param, --idle-timeout flag, and entire if stdio: block (~90 LOC)
- src/amplifier_agent_cli/__main__.py: remove Mode B docstring reference
- tests/cli/test_end_to_end.py: replace test_run_stdio_exits_0_on_stdin_close with test_stdio_flag_removed (verifies --stdio returns exit 2)
- tests/cli/test_single_turn.py: remove test_run_stdio_delegates_to_event_loop
- tests/test_smoke.py: add deletion-detector tests for both removed modules
All 280 tests pass.
- Add cwd and providerOverride conditionally to Engine.boot() init_params
in single_turn._execute_turn. Both are NotRequired in InitializeParams
TypedDict so existing call sites that omit them remain valid.
- tests/test_engine.py: add test_boot_propagates_session_id_and_resume_and_cwd
verifying Engine.boot handles all four init_params fields (sessionId,
resume, cwd, providerOverride) without crashing, and correctly echoes
sessionId/resumed in sessionState.
- tests/cli/test_single_turn_init_params.py: two new tests
* test_run_passes_cwd_to_make_turn_handler — asserts cwd='/tmp' reaches
make_turn_handler AND is present in init_params sent to Engine.boot
* test_run_passes_provider_override_to_detect_provider — asserts
override='anthropic' reaches detect_provider AND appears as
providerOverride in init_params
Closes gap (c) from design §4.7.
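A minimal sketch of the conditional propagation described above, assuming a simplified `InitializeParams` in which `sessionId` and `resume` are required. The helper name `build_init_params` is hypothetical and is not taken from the engine source; the real TypedDict may carry more fields.

```python
from typing import TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import NotRequired


class InitializeParams(TypedDict):
    """Illustrative stand-in for the wire TypedDict (assumed shape)."""

    sessionId: str
    resume: bool
    cwd: NotRequired[str]
    providerOverride: NotRequired[str]


def build_init_params(session_id, resume, cwd=None, provider_override=None):
    """Hypothetical helper: include optional keys only when a value was given."""
    params: InitializeParams = {"sessionId": session_id, "resume": resume}
    if cwd is not None:
        params["cwd"] = cwd
    if provider_override is not None:
        params["providerOverride"] = provider_override
    return params
```

Because the optional keys are simply absent when unset, call sites that never pass `cwd` or `providerOverride` keep producing the same dictionary as before.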
…nostic
Per D8 admin verb split (Task 7):
- add src/amplifier_agent_cli/admin/prepare.py: Click command that runs
asyncio.run(load_and_prepare_cached(aaa_version=__version__)), exits 1
on exception (prints error + traceback), exits 0 with '[ OK ] bundle
cache primed' on success. Primes the cache at install time so the first
runtime invocation never pays manifest-resolution + clone + pip-install.
- add src/amplifier_agent_cli/admin/verify.py: Click command with
--check-hooks flag. _MINIMUM_SET defines the five required canonical
wire events ('result/delta', 'result/final', 'tool/started',
'tool/completed', 'usage'). _check_hooks() loads the bundle, locates
the streaming hook in mount_plan, imports
amplifier_agent_lib.bundle.hook_streaming, checks CANONICAL_WIRE_EVENTS
for minimum-set coverage; exits 0/[OK] or 1/[FAIL].
- modify __main__.py: register _prepare_command and _verify_command.
- doctor.py: no change — check_cache_state() already reports primed state
without calling load_and_prepare_cached; confirmed by new test.
Tests:
- tests/cli/test_admin_prepare.py: TDD RED/GREEN for prepare command
- tests/cli/test_admin_verify.py: TDD RED/GREEN for verify --check-hooks
- tests/cli/test_doctor.py: add test_doctor_does_not_call_load_and_prepare_cached
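The exit-code contract of the prepare command can be sketched without Click. This is an illustration under stated assumptions: `run_prepare` is a hypothetical name, the loader is passed in as a stand-in for `load_and_prepare_cached`, and errors are written to stderr.

```python
import asyncio
import sys
import traceback


def run_prepare(load_and_prepare, version):
    """Hypothetical wrapper: return 0 when the cache was primed, 1 on any failure."""
    try:
        asyncio.run(load_and_prepare(aaa_version=version))
    except Exception as exc:
        # Keep stdout clean: the error message and traceback go to stderr.
        print(f"error: {exc}", file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        return 1
    print("[ OK ] bundle cache primed")
    return 0
```

An installer could call this once and abort on a non-zero return, so the first runtime invocation does not pay the cold-prepare cost.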
- Add sessionId: str to TurnSubmitResult TypedDict (methods.py)
- Engine.submit_turn now returns sessionId from params in TurnSubmitResult
- Add test_submit_turn_result_includes_session_id verifying the envelope
Closes SC-6: every final-reply envelope must carry sessionId.
- Add hook_streaming.py with CANONICAL_WIRE_EVENTS constant
- Add StreamingEmitter class with _emit(), on_tool_pre() stubs
- Implement on_tool_pre: tool:pre -> tool/started with defensive
data.get('tool') or data.get('tool_name') + arguments/tool_input
- Add mount() registering 7 handlers on coordinator.hooks
- Add test_bundle_hook_streaming.py with 11A tests (4 pass)
- Implement on_tool_post: tool:post -> tool/completed with
durationMs=int(data.get('duration_ms', 0)) and result=data.get('result')
- 5 tests pass
- Implement on_content_block_start: init _delta_seen[block_id]=False, _block_text[block_id]=''
- Implement on_content_block_delta: mark seen, emit result/delta
- Implement on_content_block_end: emit fallback result/delta only when no delta fired (SC-1: loop-streaming fallback); cleanup both dicts
- 9 tests pass
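The delta-or-fallback behaviour in this commit can be sketched as follows. This is a synchronous simplification, not the vendored hook: the class name `BlockStreamer`, the `emit(name, payload)` callback, and the `block_id` / `text` payload keys are assumptions for illustration.

```python
class BlockStreamer:
    """Illustrative sketch of the per-block delta tracking (assumed payload keys)."""

    def __init__(self, emit):
        self._emit = emit        # callable(name, payload); stand-in for the display sink
        self._delta_seen = {}    # block_id -> True once any delta has fired
        self._block_text = {}    # initialised per block, mirroring the commit; unused here

    def on_content_block_start(self, data):
        block_id = data["block_id"]
        self._delta_seen[block_id] = False
        self._block_text[block_id] = ""

    def on_content_block_delta(self, data):
        self._delta_seen[data["block_id"]] = True
        self._emit("result/delta", {"text": data.get("text", "")})

    def on_content_block_end(self, data):
        block_id = data["block_id"]
        # Clean up both dicts regardless of which path is taken.
        seen = self._delta_seen.pop(block_id, False)
        self._block_text.pop(block_id, None)
        text = data.get("text", "")
        if not seen and text:
            # Fallback: no delta fired for this block, so deliver the text once.
            self._emit("result/delta", {"text": text})
```

The fallback means a consumer always receives at least one result/delta per non-empty block, whether or not the loop streams incrementally.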
- Implement on_llm_response: emits usage event when in_tok or out_tok are non-zero, emits result/final when text is non-empty
- 12 tests pass
- Implement on_tool_error: tool:error -> error with recoverable=True, default code='tool_failed'
- 14 tests pass
- Add amplifier_agent_lib.bundle.hook_streaming to hooks: block in bundle.md with source: local (vendored Python module, no git URL needed)
- Add test_prepared_bundle_mounts_hook_streaming to test_bundle_loader.py verifying mount_plan hooks contains the streaming hook entry
The source: local marker signals that the module lives inside this wheel rather than at a git-hosted URL. Foundation's prepare() handles source: local gracefully (skips git installation), so no loader.py changes are required.
amplifier-agent verify --check-hooks now exits 0 with minimum-set OK message.
Add tests/test_resume_continuity.py with a two-turn end-to-end test that validates context is preserved across CLI invocations sharing the same --session-id when the second invocation uses --resume.
Test strategy:
- Turn 1 (--fresh): plant 'My favorite color is purple. Please remember it.'
- Turn 2 (--resume): ask 'What is my favorite color?'
- Assert 'purple' appears in the JSON reply from turn 2.
- Skips if ANTHROPIC_API_KEY is not set.
BUNDLE.MD SWAP (A7 contingency): context-simple → context-persistent
SC-4 confirmed that context-simple does NOT replay transcripts when is_resumed=True: the turn-2 reply had no knowledge of the prior turn. Per design A7 contingency, context-simple was swapped to context-persistent (git+https://github.com/microsoft/amplifier-module-context-persistent@main). After the swap, the test passes: context-persistent correctly replays prior transcript state on --resume, enabling cross-turn memory.
Also updated tests/test_bundle_loader.py: renamed test_prepared_bundle_declares_context_simple → test_prepared_bundle_declares_context_persistent to reflect the new module.
…val per design doc 2026-05-20
Exit gate satisfied:
- amplifier-agent verify --check-hooks exits 0 (minimum-set events confirmed)
- amplifier-agent run emits >=1 result/delta + exactly 1 result/final with real provider
Changes:
1. tests/cli/test_end_to_end.py: Add Phase 2.0c exit gate tests
   - test_phase_2_0c_exit_gate_verify_check_hooks_exits_0
   - test_phase_2_0c_exit_gate_real_turn_emits_result_events
2. pyproject.toml: Register streaming hook as amplifier.modules entry point
   Entry point name: amplifier_agent_lib.bundle.hook_streaming
   Enables kernel loader fallback (_load_direct) to find the hook when source: local URI activation fails (foundation resolver lacks local handler)
3. src/amplifier_agent_lib/bundle/hook_streaming.py: Fix event schema mismatch
   Kernel amplifier-core >=1.5 emits different event field names:
   - content_block:delta is NOT fired; text arrives in content_block:end
   - content_block:end uses block dict (not text key) + block_index (not block_id)
   - llm:response has usage sub-dict (not top-level input/output_tokens), no text
   Fixes:
   - _block_id() helper for block_index/block_id compat
   - on_content_block_end: extract text from block dict, skip thinking blocks
   - on_llm_response: nested usage dict + always emit result/final as completion signal
   - _emit: guard against None capability (no-op instead of TypeError)
4. tests/test_bundle_hook_streaming.py: Update test for new result/final behavior
   result/final is now always emitted from llm:response as turn-completion signal (text was already delivered via content_block:end in current kernel schema)
8 V1/NC failure modes designed out: F5, F7, F11, F13a, F15, NC-L14, NC-L16, CR-4
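The schema-compat fixes listed under change 3 can be approximated with three small helpers. This is a sketch under assumptions: the function names are hypothetical, and the inner keys (`type` and `text` on the block dict, `input_tokens` and `output_tokens` inside `usage`) are guesses at the kernel payload rather than confirmed field names.

```python
def block_key(data):
    """Prefer block_index (newer kernel payloads), fall back to block_id."""
    if "block_index" in data:
        return data["block_index"]
    return data.get("block_id")


def text_from_block_end(data):
    """Pull text out of the block dict; thinking blocks yield None."""
    block = data.get("block") or {}
    if block.get("type") == "thinking":  # assumed discriminator key
        return None
    return block.get("text") or None


def usage_from_llm_response(data):
    """Read token counts from the nested usage dict, defaulting to zero."""
    usage = data.get("usage") or {}
    return (
        int(usage.get("input_tokens", 0) or 0),
        int(usage.get("output_tokens", 0) or 0),
    )
```

Checking for the key with `in` rather than truthiness matters for `block_key`, since a first block with index 0 would otherwise fall through to the legacy field.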
…port
Code-quality-reviewer findings from Phase 2.0c final review:
- prepare.py:26: err=False routes errors to stdout (violates project
stdout discipline). Changed to err=True.
- engine.py:150: inline __import__('os') broke import-style consistency.
Added top-level import os and replaced inline call.
Both findings: mechanical one-line fixes, no architectural impact.
Full test suite still green: 311 passed.
…l workaround)
The bundle.md entry `source: local` had no foundation handler, so it was
silently relying on the kernel's _load_entry_point fallback declared in
pyproject.toml — which logs a recoverable BundleNotFoundError ("Failed to
lazy-activate: No handler for URI: local") before the fallback fires.
Switch to the canonical pattern used by amplifier-app-cli's trace_collector
(main.py:2551): import the vendored hook module and call mount() programmatically
from _runtime.py after capability registration. Also patches spawn.py for child
sessions, since they previously also relied on the entry-point fallback.
- bundle.md: drop the `source: local` hook entry (sha256 changes; cache will
cold-prepare on next launch)
- pyproject.toml: drop the entry-point fallback (no longer needed)
- _runtime.py: import + await mount_streaming_hook(coordinator, {})
- spawn.py: propagate display.emit + mount for child sessions via
get_capability("display.emit") / register_capability() pattern
- tests: replace mount_plan-based regression test with a live-coordinator test
(using amplifier_core.create_test_coordinator) that fires tool:pre and
asserts tool/started reaches the display; fix _FakeHooks.register() stub
Co-author: Amplifier (Microsoft)
After the bug fix in da9889e moved the streaming hook from bundle.md (source: local) to programmatic mount in _runtime.py, the `verify --check-hooks` command's mount-plan inspection was stale: the hook is no longer declared at the bundle level. The hook IS correctly mounted at session-creation time (verified by tests/test_runtime_hook_mount.py).
The verify command now checks only what the install-time gate actually needs:
- hook_streaming is importable
- CANONICAL_WIRE_EVENTS covers the minimum set
- mount is a callable
Live-coordinator mounting verification stays in the test suite, where it belongs.
Co-author: Amplifier (Microsoft)
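The three install-time checks can be sketched as a single function that returns a list of problems. The name `check_hook_module` and the message wording are hypothetical; only the attribute names `CANONICAL_WIRE_EVENTS` and `mount` come from the commit messages.

```python
import importlib


def check_hook_module(module_name, minimum_set):
    """Return a list of problems; an empty list means the gate passes."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return [f"cannot import {module_name}: {exc}"]

    problems = []
    events = getattr(module, "CANONICAL_WIRE_EVENTS", ())
    missing = sorted(set(minimum_set) - set(events))
    if missing:
        problems.append("missing wire events: " + ", ".join(missing))
    if not callable(getattr(module, "mount", None)):
        problems.append("mount is not callable")
    return problems
```

A command built on this would exit 0 when the list is empty and 1 otherwise, with no bundle load or coordinator needed.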
manojp99
pushed a commit
that referenced
this pull request
Jun 3, 2026
Adds named re-exports from the package entry point so consumers can
import internal helpers without reaching into private deep paths:
assembleArgv, AssembleArgvInput
resolveMcpConfigPath, cleanupSpillFile, McpSpillResult
buildEnv, resolveBinaryPath, probeEngineVersion,
DEFAULT_ALLOWLIST, BLOCKED_ENV_KEYS,
ResolveBinaryPathOptions, BuildEnvOptions
Transport, TransportOptions, ExitInfo
checkProtocolVersion, VersionCheckResult, VersionCheckOk,
VersionCheckFail, CheckProtocolVersionOptions
parseRunOutput, STDERR_TAIL_BYTES, SubprocessOutcome
makeApprovalHandler, ApprovalAdapter, ApprovalRequest,
ApprovalHandler
Each export is annotated @public.
Closes #5.
manojp99
pushed a commit
that referenced
this pull request
Jun 3, 2026
Wrapper hardening release closing 8 consumer-reported gaps at 0.5.0:
- #1 configPath surface
- #2 stderr NDJSON parsing
- #3 runChildProcess injection
- #4 display.onEvent dispatch
- #5 public re-exports
- #6 Transport dead code (root cause of #2/#4)
- #7 getEngineInfo() implementation
- #9 checkProtocolVersion() wired into init path
- #10 approval API mapped to engine -y/-n + approval.mode
Issue #8 in the consumer report was a misread: InitializeParams.mcpConfigPath is intentionally retained in protocol-0.3.0. No type change needed; the schema is canonical and correct.
This is a minor bump per 0.x convention even though some changes are BREAKING: the wrapper hasn't shipped a 1.0 yet, so breaking changes ride minor bumps. See CHANGELOG for the BREAKING list.
Engine compatibility: requires amplifier-agent >= 0.4.0. Pinned protocol: 0.3.0.
manojp99
added a commit
that referenced
this pull request
Jun 3, 2026
…, approval, getEngineInfo, +5 more) (#36)

* feat(wrapper-ts): re-export internal helpers from index.ts (#5)
Adds named re-exports from the package entry point so consumers can import internal helpers without reaching into private deep paths:
- assembleArgv, AssembleArgvInput
- resolveMcpConfigPath, cleanupSpillFile, McpSpillResult
- buildEnv, resolveBinaryPath, probeEngineVersion, DEFAULT_ALLOWLIST, BLOCKED_ENV_KEYS, ResolveBinaryPathOptions, BuildEnvOptions
- Transport, TransportOptions, ExitInfo
- checkProtocolVersion, VersionCheckResult, VersionCheckOk, VersionCheckFail, CheckProtocolVersionOptions
- parseRunOutput, STDERR_TAIL_BYTES, SubprocessOutcome
- makeApprovalHandler, ApprovalAdapter, ApprovalRequest, ApprovalHandler
Each export is annotated @public.
Closes #5.

* feat(wrapper-ts): wire checkProtocolVersion() into init path (#9)
spawnAgent() now probes the engine's protocol version once during initialization (via amplifier-agent version --json) and runs checkProtocolVersion() against PROTOCOL_VERSION_REQUIRED_BY_WRAPPER BEFORE constructing a SessionHandle. Mismatch fails fast wrapper-side with AaaError(protocol_version_mismatch), saving a full subprocess roundtrip later.
Adds two new SpawnAgentParams fields:
- allowProtocolSkew?: boolean (bypasses the check; mirrors engine's host_config.allowProtocolSkew)
- _engineVersionProbe?: () => Promise<EngineVersionPayload> (test-only injection point for the probe)
Also bumps PROTOCOL_VERSION_REQUIRED_BY_WRAPPER from "0.2.0" to "0.3.0" to match the engine's current wire protocol (amplifier_agent_lib.protocol.methods.PROTOCOL_VERSION). The wrapper was shipping with a stale pin; the new check would have surfaced this at startup.
Closes #9.

* feat(wrapper-ts): add runChildProcess injection point (#3)
Adds SpawnAgentParams.runChildProcess?: ChildProcessFactory, a public seam to substitute the subprocess factory used inside SessionHandle.
When set, the wrapper invokes the factory in place of child_process.spawn, preserving the same options shape (detached, stdio, env, optional cwd).
Useful for:
- Sandboxing (e.g. wrapping the child in a container or namespace)
- Test doubles (e.g. EventEmitter fakes that drive scripted outputs)
- Harness wrapping (e.g. observing the subprocess from outside)
ChildProcessFactory is exported as a @public type from index.ts.
Closes #3.

* feat(wrapper-ts)!: wire Transport NDJSON pipeline + dispatch to display.onEvent (#2, #4, #6)
The engine emits one JSON object per line on the child subprocess's stderr stream for each wire-protocol notification (progress, result/delta, result/final, thinking/delta, thinking/final, tool/started, tool/completed, approval/request, approval/timeout, plus wire-level error). Before this change the wrapper buffered stderr as raw text and silently dropped every event; the existing Transport class implemented NDJSON parsing but was never wired anywhere (dead code).
This change:
- Adds parseNdjsonStream(stream, {onJson, onNonJson?}), a standalone helper extracted from the parsing logic Transport already had. Resolves when the stream emits 'close'. Exported @public.
- Wires parseNdjsonStream onto child.stderr inside SessionHandle.makeIterable(). JSON lines are parsed into 'notification' DisplayEvents and dispatched to params.display?.onEvent. Non-JSON lines (and JSON lines, for completeness) are still accumulated into stderrBuf so the stderrTail surface on parseRunOutput remains diagnostically useful.
- Extends the DisplayEvent discriminated union with a new {type: 'notification', method: string, params: unknown} variant. **BREAKING**: existing exhaustive switch statements on event.type will no longer be exhaustive without a notification branch.
- Threads SpawnAgentParams.display through to SessionHandle so the callback that was previously silently dropped is now actually fired (Issue #4).
Closes #2, #4, #6.
BREAKING CHANGE: display.onEvent callbacks are now actually invoked with wire-event notifications. Callers that registered onEvent expecting it to be a no-op may observe new event flow. The DisplayEvent union has a new 'notification' variant; exhaustive switch statements need a corresponding branch.

* feat(wrapper-ts): surface --config flag via SpawnAgentParams.configPath (#1)
Engine PR #27 / v0.4.0 added the --config <path> flag and the host_config layer (approval mode, MCP servers, provider defaults, allowProtocolSkew, etc.). The wrapper had no surface to forward this, so callers had to fall back to AMPLIFIER_AGENT_CONFIG in env.extra.
This change:
- Adds SpawnAgentParams.configPath?: string (public, @public TSDoc).
- Adds AssembleArgvInput.configPath?: string.
- assembleArgv emits --config <path> when configPath is set.
- Threads configPath through SessionHandleParams to the per-submit argv assembly.
Also drive-by adds approvalMode field to AssembleArgvInput (used by #10's commit). The argv-builder now reads input.approvalMode and emits -y / -n / nothing accordingly. Default remains -y for backward compat with callers that haven't opted into the approval API.
Closes #1.

* feat(wrapper-ts)!: wire approval API to engine -y/-n + approval.mode (#10)
Previously, SpawnAgentParams.approval threw AaaError(approval_not_supported_in_v1) whenever set because it required the mid-turn onRequest callback that v1 doesn't support.
This change extends SpawnAgentParams.approval to also accept the static-policy shape { mode: 'yes' | 'no' | 'prompt' }, which maps to engine argv:
- 'yes' -> -y (auto-allow every tool call)
- 'no' -> -n (auto-deny every tool call)
- 'prompt' -> emit no flag; engine falls back to host_config.approval.mode or the bundle's TTY-based default. This is how a host hands policy resolution back to the engine.
The legacy { onRequest, timeoutMs } form still throws approval_not_supported_in_v1: the Mode A wire has no mid-turn channel.
Mid-turn callbacks will return when WG-4 lands.
Engine compatibility: { mode: 'prompt' } requires amplifier-agent >= 0.4.0 (PR #34 added host_config.approval.mode).
Closes #10.
BREAKING CHANGE: SpawnAgentParams.approval is now a union shape; callers passing { mode } no longer hit approval_not_supported_in_v1. Callers that defensively catch that error need to remove the try/catch when migrating to the mode shape.

* feat(wrapper-ts): implement getEngineInfo(): engineVersion + bundleDigest (#7)
Closes the Task-9 TODO: getEngineInfo() now returns the values captured during the engine version probe that spawnAgent() runs at init (Issue #9). Previously both fields were hardcoded empty strings.
- engineVersion populated from `amplifier-agent version --json` payload's `version` field.
- bundleDigest populated from the probe payload's optional `bundleDigest` field. The engine's current `version --json` output (from admin/version_info.py) only emits {version, protocolVersion}; bundleDigest will be empty string until a future engine release exposes it. Forward-compatible: when the engine adds it, the wrapper picks it up automatically with no further changes.
DONE_WITH_CONCERNS for the bundleDigest follow-up: filed as an engine-side ask for a future PR. The wrapper does what it can with the data the engine surface exposes today; the contract is wired so the field will populate the moment the engine emits it.
Closes #7.

* chore(wrapper-ts): rebuild dist after hardening release changes
Mirrors PR #29 / #31 pattern: dist/ is tracked so consumers installing from the git tarball get the compiled artifacts without a build step. Regenerated from npm run build after issues #1, #2, #3, #4, #5, #6, #7, #9, #10 landed.
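The approval-mode mapping from the #10 commit above can be sketched in Python for comparison with the engine side. The wrapper itself is TypeScript; the function name `approval_flags` is hypothetical, and only the 'yes' / 'no' / 'prompt' values and the -y / -n flags come from the commit message.

```python
def approval_flags(mode=None):
    """Map a static approval policy to engine argv flags (illustrative only)."""
    if mode is None or mode == "yes":
        # The default stays -y for callers that have not opted into the approval API.
        return ["-y"]
    if mode == "no":
        return ["-n"]
    if mode == "prompt":
        # No flag: the engine resolves the policy from its own configuration.
        return []
    raise ValueError(f"unknown approval mode: {mode!r}")
```

Returning an empty list for 'prompt' is the design point: the host hands the decision back to the engine instead of forcing one.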
* chore(release): bump amplifier-agent-ts to 0.6.0 + CHANGELOG
Wrapper hardening release closing 8 consumer-reported gaps at 0.5.0:
- #1 configPath surface
- #2 stderr NDJSON parsing
- #3 runChildProcess injection
- #4 display.onEvent dispatch
- #5 public re-exports
- #6 Transport dead code (root cause of #2/#4)
- #7 getEngineInfo() implementation
- #9 checkProtocolVersion() wired into init path
- #10 approval API mapped to engine -y/-n + approval.mode
Issue #8 in the consumer report was a misread: InitializeParams.mcpConfigPath is intentionally retained in protocol-0.3.0. No type change needed; the schema is canonical and correct.
This is a minor bump per 0.x convention even though some changes are BREAKING: the wrapper hasn't shipped a 1.0 yet, so breaking changes ride minor bumps. See CHANGELOG for the BREAKING list.
Engine compatibility: requires amplifier-agent >= 0.4.0. Pinned protocol: 0.3.0.
---------
Co-authored-by: Manoj Prabhakar Paidiparthy <mpaidiparthy@microsoft.com>
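The stderr NDJSON handling described in the #2 / #4 / #6 commit above can be illustrated with a small Python analogue. It is not the wrapper's parseNdjsonStream: the function name is hypothetical, the `method` / `params` keys on each wire object are an assumption based on the 'notification' DisplayEvent shape, and unlike the wrapper it does not also keep JSON lines in the raw stderr buffer.

```python
import json


def split_ndjson_lines(lines):
    """Separate wire notifications from ordinary stderr text (illustrative only)."""
    notifications, non_json = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            non_json.append(line)
            continue
        if isinstance(obj, dict) and isinstance(obj.get("method"), str):
            notifications.append(
                {"type": "notification", "method": obj["method"], "params": obj.get("params")}
            )
        else:
            # Valid JSON that is not a notification object is treated as plain text.
            non_json.append(line)
    return notifications, non_json
```

Keeping the non-JSON lines is what lets a stderr tail stay useful for diagnostics when the engine prints a traceback between events.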
This was referenced Jun 15, 2026
Summary
Phase 2.0c of the AaA v2 wrapper-layer + wire-protocol boundary, per the locked design doc at docs/designs/2026-05-20-aaa-v2-wrapper-and-wire.md. This phase delivers the engine-side prerequisites the TypeScript and Python wrappers (Phases 2.2 / 2.3) will sit on top of.
What this PR ships:
- `DisplaySystem.emit` is now async + single-arg `DisplayEvent`. `ApprovalSystem.request` signature locked. `agent/initialize` honors `sessionId` / `resume` / `cwd` / `providerOverride`. `turn/cancel` removed from the wire (cancel = SIGTERM).
- `stdio_loop.py`, `defaults_stdio.py`, and their tests removed (~600 LOC). Concurrency-race classes F5, F7, F11, F15, CR-4 eliminated by construction.
- `src/amplifier_agent_lib/bundle/hook_streaming.py` subscribes to foundation kernel events (`tool:pre`, `tool:post`, `content_block:start/end/delta`, `llm:response`, `tool:error`) and emits the canonical 9-event wire taxonomy via `ctx.display.emit`.
- `_runtime.py` bridge. Threads `ctx.display` and `ctx.approval` into the foundation session as coordinator capabilities; mounts the streaming hook programmatically (matches the canonical `amplifier-app-cli` `trace_collector` pattern).
- `doctor` (diagnostics only), `prepare` (cache priming for install-time use), `verify --check-hooks` (hook coverage gate). Mitigates the "everything binary" anti-pattern.
- `protocolVersion` skew with high-fidelity remediation message and `--allow-protocol-skew` override.
- `--session-id X --resume` now actually persists transcripts (swapped `context-simple` → `context-persistent` after a real test revealed the previous bundle didn't replay).
- `sessionId`.
Empirical validation: Confirmed against the Paperclip codebase (Codex + Claude Code both use one-shot per turn) and the NanoClaw codebase (Codex provider explicitly spawn-per-query; Claude SDK burst-style mid-turn push isn't reachable through our locked §4 API anyway). One-shot is the right v1 lifecycle.
Three post-execution bug fixes included in this PR:
- `b14181c`: code-quality-reviewer findings (stdout discipline + import style)
- `da9889e`: streaming hook `source: local` had no foundation URI handler; switched to programmatic mount
- `227dd42`: `verify --check-hooks` was inspecting the stale mount_plan; now checks module shape only
Test Plan
- `uv run python -m pytest tests/ -q`
- `All checks passed!`
- `0 errors, 0 warnings, 0 informations`
- `amplifier-agent verify --check-hooks` exits 0 with `[ OK ] hook coverage passes`
- `amplifier-agent run "Say hi in three words." --verbose` produces wire events (`[result/delta]`, `[result/final]`, `[usage]`) with no `Failed to lazy-activate` errors
- `tool/started` events for child sessions
Out of scope (separate PRs)
- `protocol/_gen.py` + generated `spec.md` + JSON schemas + shared YAML wire-sequence fixtures + parity lint
- `amplifier-agent-client-ts`
- `amplifier-agent-client-py`
References
- docs/designs/2026-05-20-aaa-v2-wrapper-and-wire.md
- docs/plans/2026-05-20-phase-2-0c-engine-gap-fixes.md
- docs/designs/2026-05-19-baked-in-bundle-{decision,revisit}.md
Generated with Amplifier