feat: Week 9 — 0.1.0 release readiness#14

Merged: ThomasK33 merged 20 commits into main from agent-terminal-v12b on Mar 27, 2026
@ThomasK33
Member

Week 9 — 0.1.0 release readiness

This PR implements the full Week 9 milestone, making the repository ready for a 0.1.0 release focused on reliable isolated terminal automation, TUI dogfooding, and reviewer-verifiable proof artifacts.

What changed

Lane 1 — Renderer isolation fix (src/renderer/browserPath.ts)

  • Explicitly resolves PLAYWRIGHT_BROWSERS_PATH from the original host HOME before any Playwright calls
  • Screenshots and WebM export now work in isolated sessions without manual env hacks
  • 4 unit tests (Linux, macOS, Windows fallback, env precedence)
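As a rough illustration of the resolution described above, the following sketch derives the browser cache directory from the original host HOME and lets an explicit PLAYWRIGHT_BROWSERS_PATH take precedence. The function name, signature, and the `Platform` type are hypothetical and are not taken from browserPath.ts; the per-platform directories are Playwright's documented default cache locations.

```typescript
import * as path from "node:path";

// Hypothetical helper for illustration; the real browserPath.ts may differ.
type Platform = "linux" | "darwin" | "win32";

function resolveBrowsersPath(
  platform: Platform,
  hostHome: string,
  env: Record<string, string | undefined> = {},
): string {
  // An explicit setting always wins (env precedence).
  const explicit = env.PLAYWRIGHT_BROWSERS_PATH;
  if (explicit !== undefined && explicit !== "") return explicit;

  // Otherwise fall back to Playwright's default cache location,
  // anchored at the ORIGINAL host HOME rather than the isolated session HOME.
  switch (platform) {
    case "darwin":
      return path.posix.join(hostHome, "Library", "Caches", "ms-playwright");
    case "win32":
      return path.win32.join(hostHome, "AppData", "Local", "ms-playwright");
    default:
      return path.posix.join(hostHome, ".cache", "ms-playwright");
  }
}
```

The key design point is that the host HOME is passed in explicitly, so the result does not change when the session later overrides HOME.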

Lane 2 — run command (src/cli/commands/run.ts, src/host/hostMain.ts, protocol)

  • New first-class run <session-id> [command] command for robust in-session command execution
  • Paste-mode injection + split-marker printf completion detection via waitForRender
  • --timeout, --no-wait, --file, --json flags
  • Full protocol plumbing: RunParamsSchema, RunResultSchema, input_run event type, replay support
  • 11 unit + 10 integration tests + golden envelope lock
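The split-marker idea can be sketched as follows. This is a minimal illustration under assumptions, not the code in run.ts or hostMain.ts: the marker format, `buildRunPayload`, and `isComplete` are all hypothetical names. The point is that the pasted text contains the marker only as two separate printf arguments, so the terminal's echo of the input never contains the joined marker; only the output of printf does.

```typescript
// Hypothetical sketch of split-marker completion detection.
function buildRunPayload(
  command: string,
  runId: string,
): { payload: string; marker: string } {
  // Restrict the id so it is always safe inside single quotes.
  if (!/^[A-Za-z0-9]+$/.test(runId)) {
    throw new Error("runId must be alphanumeric");
  }
  const prefix = "__AT_RUN_DONE_";
  const suffix = `${runId}__`;
  const marker = prefix + suffix;
  // printf joins the two halves at execution time; the pasted input itself
  // only ever shows them as separate quoted arguments.
  const payload = `${command}\nprintf '%s%s\\n' '${prefix}' '${suffix}'\n`;
  return { payload, marker };
}

// Completion is detected by looking for the joined marker in rendered text.
function isComplete(screenText: string, marker: string): boolean {
  return screenText.includes(marker);
}
```

With this shape, a screen that shows only the echoed input does not count as complete, which is the failure mode the P0 fix in this PR addresses.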

Lane 3 — Doctor isolation diagnostics (src/cli/commands/doctor.ts)

  • home_isolation check: reports whether the environment is isolated
  • browser_cache_accessible check: verifies Playwright browser cache before attempting launch
  • Updated golden-envelope tests (9 env checks, 5 renderer checks)
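A minimal sketch of what the two new checks might look like is shown below. Only the check names home_isolation and browser_cache_accessible come from this PR; the `DoctorCheck` shape, the status values, and the messages are assumptions made for illustration. The filesystem probe is injected so the logic can be tested without touching disk.

```typescript
// Hypothetical result shape; the real doctor --json schema may differ.
type CheckStatus = "ok" | "warn" | "fail";

interface DoctorCheck {
  name: string;
  status: CheckStatus;
  message: string;
}

// Reports whether the session HOME differs from the host HOME.
function checkHomeIsolation(hostHome: string, sessionHome: string): DoctorCheck {
  const isolated = hostHome !== sessionHome;
  return {
    name: "home_isolation",
    status: "ok",
    message: isolated
      ? `HOME is isolated (${sessionHome}); host HOME is ${hostHome}`
      : "HOME is not isolated; session uses the host HOME",
  };
}

// Verifies the browser cache directory before any launch is attempted.
function checkBrowserCache(
  cachePath: string,
  exists: (p: string) => boolean,
): DoctorCheck {
  const found = exists(cachePath);
  return {
    name: "browser_cache_accessible",
    status: found ? "ok" : "fail",
    message: found
      ? `browser cache found at ${cachePath}`
      : `browser cache not found at ${cachePath}`,
  };
}
```

In a real check, `exists` could simply be `fs.existsSync`.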

Lane 4 — Week 9 proof bundle (dogfood/20260326-week9-release-readiness/)

  • End-to-end scenario: doctor → create → run × 2 → wait → screenshot → snapshot → asciicast → WebM → inspect → destroy
  • Screenshot, WebM recording, asciicast, JSON outputs, event-log evidence

Lane 5 — Release docs

  • RELEASE.md: explicit 0.1.0 contract (delivers, non-goals, known limitations)
  • README.md: new TUI Workflow, Isolation, and Run Command sections
  • Design doc updated for Week 9 completion status

Review hardening (found and fixed during deep review + dogfood)

  • P0 fix: the run completion marker was being matched in the terminal's echo of the injected input, which signaled completion too early; fixed with split-marker printf + explicit Enter after paste
  • P2 fixes: RunResult type dedup, marker quoting, timeout validation, schema superRefine, session-exit detection during wait
  • P3 fixes: JSDoc for browserPath.ts, doctor message disambiguation, macOS/Windows test coverage, doc references

Behavior changes

  • New CLI command: agent-terminal run <session-id> [command]
  • doctor --json now includes home_isolation and browser_cache_accessible checks
  • Renderer automatically resolves Playwright browser cache from the original host HOME
  • No breaking changes to existing commands

Validation

npm run typecheck   ✅
npm run lint        ✅ (0 warnings)
npm run format:check ✅
npm run build       ✅
npm test            ✅ 602 tests, 56 files — all pass

Hands-on dogfooding verified that run with sleep 3 waits about 3.2s before reporting completion, and also exercised multiline commands, --no-wait, timeout validation, error envelopes, screenshots, and exports.

Known limitation (documented, acceptable for 0.1.0)

Commands with unmatched single quotes cause the printf marker line to be consumed by bash's quote continuation, so the run times out instead of reporting completion. This is a pre-existing limitation of the paste-injection design (properly quoted commands work fine). It is documented as a post-0.1.0 hardening target.
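To make the failure mode concrete, here is a naive sketch of a quote-balance check that a caller could run before injecting a command. This is not something the PR implements, and `hasUnterminatedQuote` is a hypothetical name. It handles only plain single quotes, double quotes, and backslash escapes; it deliberately ignores here-docs, comments, and `$'...'` quoting, so treat it as a heuristic rather than a bash parser.

```typescript
// Hypothetical heuristic: true when the command ends inside an open quote,
// which would make bash swallow the following printf marker line.
function hasUnterminatedQuote(command: string): boolean {
  let state: "none" | "single" | "double" = "none";
  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    if (state === "single") {
      // Inside single quotes nothing is escaped; only ' closes the quote.
      if (ch === "'") state = "none";
    } else if (state === "double") {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') state = "none";
    } else {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === "'") state = "single";
      else if (ch === '"') state = "double";
    }
  }
  return state !== "none";
}
```

For example, `echo it's` leaves a single quote open, whereas `echo "it's"` does not, because the apostrophe sits inside double quotes.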

Proof artifacts

  • dogfood/20260326-week9-release-readiness/ — full release-readiness bundle
  • dogfood/run-command/ — dedicated run command proof
  • Screenshots and .webm recordings included in both bundles

📋 Implementation Plan

Week 9 / pre-0.1.0 release-readiness implementation plan

Objective

Execute the Week 9 release-readiness milestone for agent-terminal so the repository is ready to ship a credible 0.1.0 focused on reliable terminal automation, TUI dogfooding, and reviewer-verifiable proof artifacts.

This plan turns the Week 9 design doc into an execution plan for a small team of agents working in parallel with minimal merge-conflict risk.

Source context and verified inputs

This plan is based on the currently checked-in design/docs and repo structure, especially:

  • design/20260319_agent-terminal-v1/17-week-9-plan.md
  • design/20260319_agent-terminal-v1.md
  • src/cli/main.ts
  • src/cli/commands/doctor.ts
  • src/cli/commands/*.ts
  • src/host/hostMain.ts
  • src/host/renderer.ts
  • src/renderer/ghosttyWeb/backend.ts
  • src/protocol/schemas.ts
  • src/protocol/messages.ts
  • dogfood/
  • test/unit/, test/integration/, and test/e2e/

Repo investigation also identified the cleanest execution lanes as:

  1. isolated renderer/bootstrap reliability,
  2. a new high-level in-session command primitive,
  3. TUI-focused doctor / diagnostics work,
  4. release-grade Neovim/LazyVim/Claude dogfood proof,
  5. release docs / contract closeout.

Release thesis

The 0.1.0 bar is not “solve all future TUI automation.”

The 0.1.0 bar is:

  • isolated sessions work reliably,
  • renderer-backed inspection and artifacts work in the recommended isolated workflow,
  • users/agents can robustly prepare real TUI environments without fragile multiline typing hacks,
  • doctor and docs explain the environment clearly,
  • and at least one realistic TUI scenario is preserved as reviewer-facing proof.

Anything beyond that — native backends, mouse input, remote sessions, MCP wrapping, full semantic TUI automation — remains post-0.1.0 scope.

Team shape and lane split

Use 5 agent lanes plus one integration/release lane. If the team is smaller, merge lanes in this order:

  • merge Lane 5 into Lane 3,
  • then merge Lane 4 into the integration lane,
  • but keep Lane 1 and Lane 2 separate because they touch the riskiest implementation areas.

Lane 1 — Renderer isolation and bootstrap reliability

Mission: Make screenshot / snapshot / WebM flows work in isolated session environments without manual environment surgery.

Primary files/subsystems:

  • src/renderer/ghosttyWeb/backend.ts
  • src/host/renderer.ts
  • src/host/hostMain.ts
  • src/renderer/capabilities.ts
  • src/renderer/profiles.ts
  • renderer-focused tests under test/unit/renderer/, test/integration/, and test/e2e/

Expected outputs:

  • implementation changes for browser/bootstrap path handling under isolated HOME,
  • tests proving screenshots and WebM export from isolated sessions,
  • structured failure reporting when prerequisites are missing,
  • docs notes for any intentionally retained assumptions.

Lane 2 — High-level in-session command primitive

Mission: Add a first-class command for robust shell/script execution inside a running session.

Primary files/subsystems:

  • src/cli/main.ts
  • new src/cli/commands/run.ts or exec.ts
  • src/host/hostMain.ts
  • src/protocol/schemas.ts
  • src/protocol/messages.ts
  • relevant command tests in test/unit/, test/integration/, optionally test/e2e/

Expected outputs:

  • a stable CLI surface for in-session command execution,
  • JSON envelope/schema/test coverage,
  • implementation that composes with existing wait/snapshot/type behavior,
  • docs describing when to use this new primitive instead of type / paste / send-keys.

Lane 3 — TUI diagnostics and doctor improvements

Mission: Make doctor and related diagnostics explain renderer/TUI readiness clearly, especially under isolation.

Primary files/subsystems:

  • src/cli/commands/doctor.ts
  • src/protocol/schemas.ts
  • src/protocol/messages.ts
  • storage/session helpers under src/storage/ if needed
  • tests for doctor and diagnostics

Expected outputs:

  • richer doctor --json fields for renderer/bootstrap/TUI readiness,
  • clearer structured diagnostics for isolated-session cases,
  • stable tests/golden coverage for the new fields,
  • updated docs for interpreting doctor results.

Lane 4 — Release-grade Neovim/LazyVim/Claude proof bundle

Mission: Capture the Week 9 proof bar as a reviewer-facing dogfood bundle.

Primary files/subsystems:

  • dogfood/
  • optional fixture/test helpers if the scenario is partly automated
  • review bundle generation/validation scripts already in repo

Expected outputs:

  • a dedicated dogfood bundle for the isolated Neovim/LazyVim/Claude scenario,
  • milestone screenshots,
  • at least one .webm recording,
  • generated review page,
  • short notes summarizing expected vs observed behavior,
  • validation output from bundle tooling.

Lane 5 — Release docs and 0.1.0 contract statement

Mission: Describe the 0.1.0 bar clearly and keep the docs synchronized with the Week 9 implementation.

Primary files/subsystems:

  • design/20260319_agent-terminal-v1.md
  • design/20260319_agent-terminal-v1/17-week-9-plan.md
  • follow-up status doc if Week 9 completes during execution
  • README.md
  • any release/checklist/changelog doc the repo uses

Expected outputs:

  • updated release-readiness narrative,
  • explicit 0.1.0 contract/limitations text,
  • docs for the new in-session command primitive,
  • docs for TUI workflows and proof expectations,
  • links to Week 9 proof bundles and validation commands.

Integration lane — Merge, validate, and release readiness

Mission: Coordinate sequencing, merge work, run full validation, and determine whether the repo satisfies the 0.1.0 bar.

Primary responsibilities:

  • resolve design ambiguities before parallel coding starts,
  • keep lane boundaries clean,
  • run npm run verify / mise run ci plus targeted tests,
  • run dogfood bundle validation and review generation,
  • ensure docs and proof are in sync with code,
  • produce a final release-go/no-go checklist result.

Execution order and dependency graph

Use this sequence:

Phase 0 — Alignment and design decisions

Complete before substantial coding begins.

Required decisions

  1. Command primitive architecture

    • Decide whether the new command primitive is named run, exec, or another name.
    • Decide whether its implementation composes existing session primitives or introduces a more direct RPC execution path.
    • Decide the JSON contract: accepted/queued only vs boundary markers vs captured output summary.
  2. Renderer/bootstrap environment contract

    • Decide how browser assets are found under isolated HOME.
    • Prefer an implementation that works automatically over one that requires end users to set PLAYWRIGHT_BROWSERS_PATH manually.
  3. Doctor/TUI readiness surface

    • Decide which checks belong in doctor --json versus docs-only guidance.
    • Decide whether glyph/Nerd Font checks are explicit diagnostics, separate validation, or dogfood-only proof.
  4. Release-grade dogfood scope

    • Decide whether the Neovim/LazyVim scenario remains dogfood-only or partially graduates into automated tests.
    • For 0.1.0, dogfood-only proof is acceptable if it is reproducible and reviewable.

Phase 0 deliverable

A short recorded decision log in the implementation PR or follow-up docs, covering the four decisions above.

Phase 1 — Foundation implementation in parallel

Start these lanes in parallel after Phase 0 decisions are locked.

  • Lane 1: renderer/bootstrap reliability
  • Lane 2: command primitive
  • Lane 3: doctor improvements
  • Lane 5: docs skeletons and placeholders aligned to the planned code

Lane 4 should prepare the dogfood harness and scripts during this phase, but should not finalize proof artifacts until Lanes 1–3 have landed enough stability.

Phase 2 — Proof and integration

After Lanes 1–3 are code-complete and green on lane-local validation:

  • merge or rebase them into one integration branch,
  • rerun shared tests,
  • complete Lane 4 dogfood capture using the near-final behavior,
  • finalize Lane 5 docs based on the actual shipped surfaces,
  • generate bundle review pages and validator outputs.

Phase 3 — Release readiness decision

Only after code, docs, and proof are all present:

  • run full repo validation,
  • review the release checklist,
  • classify any remaining issues as blocker or explicit post-0.1.0 scope,
  • and only then decide whether the repo is ready to tag 0.1.0.

Detailed workstream plan

Workstream A — isolated renderer/bootstrap reliability

Goals

  • make screenshots work in isolated sessions without manual browser-cache path hacks,
  • make WebM export follow the same environment story,
  • preserve structured diagnostics when bootstrap still fails,
  • and test-lock the intended environment contract.

Concrete tasks

  1. Audit how the renderer resolves browser binaries, fonts, and related assets when session HOME / XDG dirs differ from the host environment.
  2. Trace the code path from CLI command -> host RPC -> renderer manager -> Ghostty/Playwright backend.
  3. Choose and implement the canonical strategy for browser asset resolution under isolation.
  4. Ensure the same strategy applies to both screenshot and record export --format webm.
  5. Add regression tests that create:
    • isolated AGENT_TERMINAL_HOME,
    • isolated session HOME,
    • isolated XDG dirs,
    • and still produce a screenshot successfully.
  6. Add at least one failing-path test proving structured diagnostics when renderer/bootstrap prerequisites are absent.
  7. Update docs or inline comments where environment propagation is intentionally subtle.
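The isolated environment in task 5 could be set up in a test fixture roughly as follows. This is a sketch under assumptions: `makeIsolatedEnv` and the directory layout are hypothetical, AGENT_TERMINAL_HOME is the variable named in the plan, and the XDG variable names are the standard ones from the XDG Base Directory specification.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical test fixture: creates a throwaway directory tree and returns
// environment overrides that point every relevant location inside it.
function makeIsolatedEnv(): { root: string; env: Record<string, string> } {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "agent-terminal-test-"));
  const env: Record<string, string> = {
    AGENT_TERMINAL_HOME: path.join(root, "agent-terminal"),
    HOME: path.join(root, "home"),
    XDG_CONFIG_HOME: path.join(root, "xdg", "config"),
    XDG_CACHE_HOME: path.join(root, "xdg", "cache"),
    XDG_DATA_HOME: path.join(root, "xdg", "data"),
    XDG_STATE_HOME: path.join(root, "xdg", "state"),
  };
  for (const dir of Object.values(env)) {
    fs.mkdirSync(dir, { recursive: true });
  }
  return { root, env };
}
```

A regression test would pass `env` to the session under test, request a screenshot, and remove `root` afterwards with `fs.rmSync(root, { recursive: true, force: true })`.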

Acceptance criteria

  • a clean isolated session can produce a screenshot,
  • the same environment can export WebM when otherwise supported,
  • the behavior is test-covered,
  • and the implementation no longer relies on undocumented ambient machine state.

Risks to manage

  • accidentally hard-coding one machine’s browser cache layout,
  • making screenshot and WebM follow different environment rules,
  • fixing runtime behavior without reflecting it in doctor or docs.

Workstream B — first-class in-session command primitive

Goals

  • give users/agents one robust way to run setup commands inside a session,
  • eliminate here-doc / long-keystroke fragility for common setup flows,
  • and define a clear public contract for that workflow.

Concrete tasks

  1. Name the command and define the user story:
    • inline one-liner,
    • multiline script from file,
    • optional stdin if needed,
    • clear JSON output.
  2. Decide the minimum viable semantics for 0.1.0:
    • accepted/queued confirmation,
    • optional completion boundary,
    • output capture expectations,
    • timeout behavior.
  3. Wire the command into src/cli/main.ts.
  4. Implement the command handler in src/cli/commands/.
  5. Add or extend host RPC support in src/host/hostMain.ts.
  6. Update protocol schemas/messages so the surface is explicit and testable.
  7. Add unit tests for CLI parsing and envelopes.
  8. Add integration tests for:
    • successful one-line command,
    • multiline script,
    • non-zero exit or failure case,
    • timeout behavior,
    • shell-state-sensitive flows where relevant.
  9. Add at least one dogfood scenario that uses the new primitive instead of low-level typing for environment setup.
  10. Document when to use the new primitive versus type, paste, and send-keys.
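One way to keep the contract in task 2 small is to model the possible outcomes as a discriminated union and to validate the timeout up front. The sketch below is illustrative only: `RunOutcome`, its field names, and `parseTimeoutMs` are hypothetical, and treating the timeout as whole milliseconds is an assumption rather than something this plan specifies.

```typescript
// Hypothetical outcome shapes for the in-session command primitive.
type RunOutcome =
  | { kind: "queued" } // injected, completion not awaited
  | { kind: "completed"; elapsedMs: number } // completion boundary observed
  | { kind: "timed_out"; timeoutMs: number }; // boundary not seen in time

function describeOutcome(outcome: RunOutcome): string {
  switch (outcome.kind) {
    case "queued":
      return "command injected; completion not awaited";
    case "completed":
      return `completion boundary seen after ${outcome.elapsedMs} ms`;
    case "timed_out":
      return `no completion boundary within ${outcome.timeoutMs} ms`;
  }
}

// Rejects anything that is not a positive whole number of milliseconds.
function parseTimeoutMs(raw: string): number {
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`invalid timeout: ${JSON.stringify(raw)}`);
  }
  return value;
}
```

Keeping output capture out of the union reflects the risk noted below about promising semantics the PTY model cannot guarantee.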

Acceptance criteria

  • routine shell setup can be done through the new primitive reliably,
  • the JSON contract is schema-backed and test-covered,
  • docs/examples adopt the new primitive where appropriate,
  • and the command reduces the need for brittle keystroke simulation in dogfood scripts.

Risks to manage

  • over-designing the primitive into a full remote execution subsystem,
  • promising stderr separation or shell semantics the PTY model cannot actually guarantee,
  • introducing a public contract that is too ambiguous to stabilize for 0.1.0.

Workstream C — TUI-focused diagnostics and doctor

Goals

  • make renderer/TUI readiness understandable before users hit failures,
  • explain isolated-session caveats clearly,
  • and reduce trial-and-error when preparing real TUI workflows.

Concrete tasks

  1. Review current doctor --json coverage and identify which Week 9 pain points are not surfaced yet.
  2. Add fields or checks for:
    • renderer/bootstrap readiness,
    • isolated-environment implications,
    • browser asset availability,
    • any other prerequisites directly relevant to the supported TUI workflow.
  3. Decide whether glyph/Nerd Font confidence belongs in doctor, a helper, or the proof bundle only.
  4. Add tests for any newly emitted doctor fields.
  5. Update docs to explain how to interpret the diagnostics and what to do when a check fails.
  6. Ensure wording stays aligned with capability reporting and Week 9 docs.

Acceptance criteria

  • doctor --json makes it obvious whether the environment is expected to support renderer-backed TUI workflows,
  • the new fields are stable and tested,
  • and the docs explain the difference between unavailable prerequisites, failures, and acceptable 0.1.0 limitations.

Risks to manage

  • making doctor noisy without making it more actionable,
  • adding checks that are too machine-specific to be reliable,
  • drifting terminology between doctor, version, inspect, and the design docs.

Workstream D — release-grade Neovim/LazyVim/Claude proof bundle

Goals

  • preserve a realistic TUI workflow as the concrete 0.1.0 proof bar,
  • demonstrate isolated setup, glyph rendering, and keymap-driven interaction,
  • and leave behind reviewer-verifiable evidence.

Concrete tasks

  1. Create a new Week 9 dogfood bundle directory with the repo’s standard proof layout.
  2. Script the scenario using the preferred command surface after Workstream B lands.
  3. The scenario must show, at minimum:
    • isolated session creation,
    • setup of a Neovim/LazyVim config,
    • Claude CLI installation or verification,
    • compatible Neovim runtime if required by the scenario,
    • LazyVim launch,
    • a glyph-heavy screen proving Nerd Font rendering,
    • a keybind-driven Claude pane launch,
    • screenshots after the key milestones,
    • and at least one .webm recording of the workflow.
  4. Capture JSON outputs for the commands involved.
  5. Run bundle validation and generate the review page.
  6. Write short notes describing what was proven and any acceptable residual rough edges.

Acceptance criteria

  • a reviewer can inspect the bundle and understand the workflow without reproducing it locally,
  • the proof includes screenshots and video,
  • the bundle validates with repo tooling,
  • and the workflow matches the documented 0.1.0 TUI success case.

Risks to manage

  • capturing proof before the implementation stabilizes,
  • relying on manual local steps that are not written down in commands/notes,
  • allowing the bundle to become a one-off artifact instead of a reproducible scenario.

Workstream E — release docs and 0.1.0 contract statement

Goals

  • make the 0.1.0 promise explicit,
  • define what is reliable today and what is intentionally deferred,
  • and keep design/docs synchronized with the actual implementation.

Concrete tasks

  1. Update the main design index/status doc to position Week 9 as the 0.1.0 closeout milestone.
  2. Update README or equivalent user-facing docs to explain:
    • the supported TUI workflow,
    • the reference renderer model,
    • the new in-session command primitive,
    • the role of type / paste / send-keys,
    • known acceptable limitations.
  3. Add or refresh a concise release-readiness checklist for maintainers.
  4. Link to the Week 9 proof bundle(s) and validation commands.
  5. Ensure docs do not imply capabilities that are still future-scope.
  6. If a release/changelog doc exists or is appropriate, prepare the 0.1.0 summary there too.

Acceptance criteria

  • one clear doc set explains the 0.1.0 bar,
  • users can tell what is stable and what is intentionally out of scope,
  • and the docs are synchronized with the implementation that actually lands.

Risks to manage

  • documenting an aspirational workflow rather than the shipped one,
  • leaving the new command or diagnostics undocumented,
  • failing to distinguish 0.1.0 blockers from post-release enhancements.

Cross-lane merge strategy

Low-conflict ownership map

  • Lane 1 owns src/renderer/* and renderer-focused tests.
  • Lane 2 owns the new command file, protocol additions for it, and src/cli/main.ts registration.
  • Lane 3 owns doctor.ts and doctor-specific protocol/test changes.
  • Lane 4 owns dogfood/ artifacts and proof notes.
  • Lane 5 owns docs.
  • Integration lane handles any unavoidable shared-file conflicts in:
    • src/protocol/schemas.ts
    • src/protocol/messages.ts
    • src/cli/main.ts
    • README/design index docs.

Merge order

  1. Merge Lane 1 and Lane 2 first once they are independently green.
  2. Merge Lane 3 next once diagnostics can reflect Lane 1 behavior.
  3. Rebase Lane 4 on the integrated behavior and capture final proof.
  4. Finalize Lane 5 after the proof and command/doctor surfaces are stable.
  5. Run integration validation only after all five lanes are present.

Validation and quality gates

Every lane must pass its own focused validation before merge.

Lane-local validation template

  • relevant unit tests,
  • relevant integration/e2e tests,
  • npm run format:check,
  • npm run lint,
  • npm run typecheck,
  • npm run build if the touched area warrants it,
  • any dogfood or bundle validation required for that lane.

Full-integration validation

Before declaring Week 9 complete:

mise run ci

If mise is unavailable:

npm run verify

Also run the targeted proof commands for Week 9, including:

npm run validate-bundle -- <week9-bundle-dir>
npm run review-bundle -- <week9-bundle-dir>

If renderer-heavy proof is involved, also run the most relevant screenshot/WebM and renderer integration tests explicitly, even if they are already indirectly covered.

Dogfooding requirements

Week 9 must preserve the repo’s proof-first standard.

Required proof for implementation-changing work

For any change touching renderer/bootstrap behavior, session command execution, or TUI diagnostics, capture:

  • machine-readable JSON outputs,
  • screenshots,
  • generated review output,
  • and .webm recordings where the scenario is interaction-heavy.

Required Week 9 milestone proof

At a minimum, the final Week 9 deliverables should include proof for:

  1. isolated screenshot/render success,
  2. successful use of the new in-session command primitive,
  3. a TUI diagnostics/doctor scenario,
  4. a Neovim/LazyVim/Claude workflow with Nerd Font rendering and keybind activation.

Screenshot/video bar

For the release-grade Neovim scenario, capture all of the following:

  • setup screenshot,
  • screenshot showing compatible Neovim and/or environment readiness,
  • screenshot showing glyph-heavy LazyVim UI,
  • screenshot showing Claude launched from the keybind,
  • one .webm recording spanning the meaningful interaction flow.

Blockers vs non-blockers

Blockers for 0.1.0

Treat these as must-fix unless the team explicitly reclassifies them in docs:

  • isolated-session screenshots still require undocumented manual env hacks,
  • no robust high-level command surface exists for in-session setup,
  • doctor still leaves the main renderer/TUI prerequisites ambiguous,
  • no release-grade proof bundle exists for a realistic TUI workflow,
  • docs still over-promise relative to actual behavior.

Non-blockers for 0.1.0

Do not expand scope into these unless they naturally fall out of the work:

  • native renderer backends,
  • mouse input,
  • remote/network sessions,
  • MCP wrapping,
  • full semantic TUI automation,
  • cross-terminal pixel parity,
  • deep editor-specific product features beyond proof and setup.

Deliverables checklist for the integration lane

By the end of execution, the integration lane should be able to point to:

  • code changes implementing Workstreams A–C,
  • tests covering the new behavior,
  • at least one validated Week 9 proof bundle,
  • updated docs covering Workstream E,
  • a concise release-readiness checklist result,
  • and a clear statement of any remaining post-0.1.0 scope.

Suggested kickoff brief for the team

Use this as the implementation kickoff summary:

Week 9 is the 0.1.0 closeout milestone. Focus on reliability, ergonomics, and proof. Do not broaden scope into new feature families. Make isolated renderer-backed TUI workflows work cleanly, add a robust in-session command primitive, improve doctor for TUI readiness, capture a release-grade Neovim/LazyVim/Claude bundle, and update docs so the shipped 0.1.0 promise is explicit and honest.

Final go/no-go checklist

The repo is ready for 0.1.0 only if the answer to all of these is “yes”:

  • Can a clean isolated session produce screenshots without undocumented env workarounds?
  • Can the recommended command surface run real setup scripts robustly inside a session?
  • Does doctor explain renderer/TUI readiness clearly enough for a first-time user?
  • Is there a reviewer-ready Neovim/LazyVim/Claude proof bundle with screenshots and video?
  • Do docs explain both the strengths and acceptable limits of the release?
  • Are remaining major asks clearly post-0.1.0 work rather than hidden blockers?

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh

@ThomasK33 ThomasK33 merged commit 974a558 into main Mar 27, 2026
2 checks passed
@ThomasK33 ThomasK33 deleted the agent-terminal-v12b branch March 27, 2026 09:20