Merged
37 commits
c47e485
Add week 2 design plan
ThomasK33 Mar 20, 2026
9a08517
Add renderer contracts and profiles
ThomasK33 Mar 20, 2026
325b397
Add renderer replay protocol schemas
ThomasK33 Mar 20, 2026
fba226f
Add host renderer manager
ThomasK33 Mar 20, 2026
86d53b2
Add GhosttyWeb renderer backend
ThomasK33 Mar 20, 2026
338de22
Wire host snapshot and screenshot RPC handlers
ThomasK33 Mar 20, 2026
d1c16e4
Add renderer integration tests
ThomasK33 Mar 20, 2026
b4f72fe
Add snapshot and screenshot CLI commands
ThomasK33 Mar 20, 2026
1310b4d
Add waitForRender host handler
ThomasK33 Mar 20, 2026
80a4772
Add render-backed wait CLI modes
ThomasK33 Mar 20, 2026
383c222
test: add wait render integration coverage
ThomasK33 Mar 20, 2026
4d74252
Increase wait-render hook timeouts
ThomasK33 Mar 20, 2026
2c443f8
Track snapshot and screenshot artifacts
ThomasK33 Mar 20, 2026
10f1fbd
Expand doctor renderer smoke checks
ThomasK33 Mar 20, 2026
d8215f4
Fix doctor Playwright screenshot type
ThomasK33 Mar 20, 2026
3c10f64
test: add renderer slice e2e coverage
ThomasK33 Mar 20, 2026
5b569cd
Add renderer proof bundle and CI smoke coverage
ThomasK33 Mar 20, 2026
8aa9689
Format Week 2 files with Prettier
ThomasK33 Mar 20, 2026
8cadd23
Unify renderer replay and screenshot types
ThomasK33 Mar 20, 2026
422f69e
Add in-memory EventLog buffer
ThomasK33 Mar 20, 2026
8de54b8
Add replay event log size guard
ThomasK33 Mar 20, 2026
d021263
Batch Ghostty replay output writes
ThomasK33 Mar 20, 2026
277ddef
Use buffered events for replay input
ThomasK33 Mar 20, 2026
00b6ecb
Format test files with Prettier
ThomasK33 Mar 20, 2026
a5ed582
Harden renderer wait and screenshot safety
ThomasK33 Mar 20, 2026
54ef581
test: cover wait validation errors
ThomasK33 Mar 20, 2026
bc8bfc3
Fix lint regressions from review patches
ThomasK33 Mar 20, 2026
b24b813
Harden wait regex nested quantifier checks
ThomasK33 Mar 20, 2026
56e122d
Harden event log and renderer guards
ThomasK33 Mar 20, 2026
885420d
Add renderer safety regression tests
ThomasK33 Mar 20, 2026
e1918e3
Validate CLI RPC responses
ThomasK33 Mar 20, 2026
3ca0a74
Add event log and render polling guards
ThomasK33 Mar 20, 2026
77d1423
test: cover combined wait and concurrent renderer ops
ThomasK33 Mar 20, 2026
a933f55
Fix lint matcher typing in command tests
ThomasK33 Mar 20, 2026
794a04d
Harden inspect validation and add regression tests
ThomasK33 Mar 21, 2026
7a1bde7
Update Week 2 design docs with shipped status
ThomasK33 Mar 21, 2026
a76c07b
Refresh design docs and add post-hardening dogfood bundle
ThomasK33 Mar 21, 2026
5 changes: 4 additions & 1 deletion .github/workflows/ci.yml
@@ -22,7 +22,7 @@ permissions:
 jobs:
   quality-gates:
     runs-on: ubuntu-latest
-    timeout-minutes: 15
+    timeout-minutes: 20
     steps:
       - name: Check out repository
         uses: actions/checkout@v6
@@ -33,6 +33,9 @@ jobs:
       - name: Install CI dependencies
         run: mise run bootstrap-ci

+      - name: Install Playwright Chromium
+        run: npx playwright install chromium
+
       - name: Check formatting
         run: mise run format-check

2 changes: 1 addition & 1 deletion README.md
@@ -12,5 +12,5 @@ Node/TypeScript CLI scaffold.

- GitHub Actions uses `mise` as the canonical entrypoint for tool setup and quality gates.
- The committed workflow in `.github/workflows/ci.yml` is hand-curated. `mise generate github-action` is useful as a scaffold, but the checked-in file is the maintained source of truth because it includes repo-specific triggers, bootstrap behavior, and step-level logs.
-- CI uses `mise run bootstrap-ci` so pull requests get deterministic installs via `npm ci` without the extra Chromium download used by the local `bootstrap` task.
+- CI uses `mise run bootstrap-ci` for deterministic `npm ci` installs, then explicitly runs `npx playwright install chromium` so renderer smoke coverage is exercised on GitHub Actions.
- For v1, CI intentionally follows the major-version tool pins declared in `mise.toml` (`node = "24"`, `python = "3"`). This repo does not commit a `mise.lock` yet.
29 changes: 29 additions & 0 deletions WEEK2-GAPS.md
@@ -0,0 +1,29 @@
# Week 2 remaining gaps

The Week 2 renderer-backed inspection slice is complete, but the following work is still intentionally out of scope or not yet delivered:

## Export and packaging

- **Asciicast export** is not implemented yet.
- **WebM video export** is not implemented yet.
- **MCP wrapper** is not implemented yet.

## Renderer backends and platform coverage

- **Native renderer adapters** are not implemented yet; the current slice is centered on the reference `ghostty-web` path.
- **Cross-platform rendering parity** is not guaranteed yet.

## Input and topology

- **Mouse input support** is not implemented yet.
- **Remote/network sessions** are not implemented yet.

## Fidelity and determinism

- **Screenshot pixel-perfect determinism** is not guaranteed; font rendering can still vary by environment.
- **Scrollback in snapshots** is not implemented; snapshots currently report the visible viewport only.
- **Cursor blink animation in screenshots** is not captured; screenshots represent a static frame.

## Security and isolation

- **Renderer CSP** currently allows `unsafe-inline`/`unsafe-eval` for the ghostty-web harness as a trade-off: the loopback-only renderer still needs inline bootstrap code and WASM eval support in current browsers.
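
To make the trade-off concrete, the relaxed policy might be assembled roughly as follows. This is a sketch under assumptions: the `buildHarnessCsp` helper and the exact directive set are hypothetical, and the policy the harness actually sends is not shown in this document.

```ts
// Hypothetical helper: builds a Content-Security-Policy value that permits
// the inline bootstrap script and eval-style WASM compilation named above.
function buildHarnessCsp(): string {
  const directives: Record<string, string[]> = {
    "default-src": ["'self'"],
    // The two relaxations this gap entry describes.
    "script-src": ["'self'", "'unsafe-inline'", "'unsafe-eval'"],
  };
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}
```

Keeping the relaxed sources in one place like this would make it easier to tighten the policy later if the inline bootstrap or the eval requirement goes away.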
22 changes: 18 additions & 4 deletions design/20260319_agent-terminal-v1.md
@@ -19,6 +19,19 @@ It is designed to let an agent:

This design intentionally describes a **general product**, not a Mux-specific implementation. A future Mux integration should consume `agent-terminal` as an external CLI/runtime rather than baking Mux-specific assumptions into the design.

## Current shipped status (2026-03-21)

The repository now ships the first renderer-backed vertical slice of this design:

- long-lived session hosts,
- PTY control and append-only event logs,
- renderer-backed `snapshot` and `wait`,
- deterministic `screenshot`,
- artifact manifests,
- and proof bundles under `dogfood/`.

Replay export artifacts such as asciicast and video remain part of the design direction, but they are still future work rather than shipped functionality.

## Executive summary

The recommended v1 shape is:
@@ -165,10 +178,10 @@ V1 is successful when an AI agent can:
4. wait until the screen reaches a target state,
5. fetch a semantic snapshot of the screen,
6. capture a PNG screenshot,
-7. export an asciicast,
-8. export a replay video,
-9. destroy the session,
-10. and leave behind an artifact bundle that a human reviewer can inspect.
+7. destroy the session,
+8. and leave behind an artifact bundle that a human reviewer can inspect.
+
+Asciicast and replay-video export remain intended follow-on capabilities rather than current success criteria for the shipped slice.

## Deliverables in this design set

@@ -180,6 +193,7 @@ This design file is the entry point. Detailed supporting docs live in `design/20
- [04-implementation-plan.md](./20260319_agent-terminal-v1/04-implementation-plan.md)
- [05-dogfooding-and-validation.md](./20260319_agent-terminal-v1/05-dogfooding-and-validation.md)
- [06-roadmap-and-week-1-plan.md](./20260319_agent-terminal-v1/06-roadmap-and-week-1-plan.md)
- [07-week-2-plan.md](./20260319_agent-terminal-v1/07-week-2-plan.md)

## High-level architecture

72 changes: 63 additions & 9 deletions design/20260319_agent-terminal-v1/03-rendering-and-artifacts.md
@@ -35,8 +35,25 @@ V1 should support four artifact classes.
| ----------------- | ---------------------------------------------------- | -------------- |
| Semantic snapshot | Structured screen state for reasoning and assertions | Yes |
| Screenshot PNG | Visual verification of layout, color, and wrapping | Yes |
-| Asciicast | Portable terminal replay artifact | Yes |
-| Replay video | Reviewer-friendly visual playback | Yes |
+| Asciicast | Portable terminal replay artifact | Not yet shipped |
+| Replay video | Reviewer-friendly visual playback | Not yet shipped |

## Current implementation status (2026-03-21)

The current Week 2 implementation ships the first two artifact classes from this design:

- semantic snapshots,
- and screenshot PNGs.

It does **not** yet ship asciicast export or replay video export; those remain deferred and are tracked in `WEEK2-GAPS.md`.

The current renderer path is:

- host-prepared replay input,
- lazy `ghostty-web` boot in headless Chromium,
- viewport-scoped semantic extraction,
- deterministic screenshot capture,
- and manifest-backed artifact storage under `artifacts/`.

## 4. Canonical replay model

@@ -50,13 +67,7 @@ Everything visual should be reproducible from:
### 4.1 Replay input

```ts
-export interface ReplayInput {
-  sessionId: string;
-  events: ReplayEvent[];
-  rows: number;
-  cols: number;
-  renderProfile: ResolvedRenderProfile;
-}
+const replayInput = ReplayInputSchema.parse(rawReplayInput);
```
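
The parse call above implies a runtime schema whose definition is not part of this hunk. As a rough illustration only, assuming the schema still mirrors the fields of the earlier `ReplayInput` interface, a dependency-free validator could look like the following. `parseReplayInput` and `ReplayInputSketch` are hypothetical names, event and profile contents are left opaque, and the real `ReplayInputSchema` may differ.

```ts
// Hypothetical, dependency-free stand-in for ReplayInputSchema.parse.
// Field names come from the earlier ReplayInput interface; event and
// profile contents stay opaque because their shapes are not shown here.
interface ReplayInputSketch {
  sessionId: string;
  events: unknown[];
  rows: number;
  cols: number;
  renderProfile: object;
}

function isPositiveInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

function parseReplayInput(raw: unknown): ReplayInputSketch {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("replay input must be an object");
  }
  const { sessionId, events, rows, cols, renderProfile } = raw as Record<string, unknown>;
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new Error("sessionId must be a non-empty string");
  }
  if (!Array.isArray(events)) {
    throw new Error("events must be an array");
  }
  if (!isPositiveInt(rows) || !isPositiveInt(cols)) {
    throw new Error("rows and cols must be positive integers");
  }
  if (typeof renderProfile !== "object" || renderProfile === null || Array.isArray(renderProfile)) {
    throw new Error("renderProfile must be an object");
  }
  return { sessionId, events, rows, cols, renderProfile };
}
```

Validating at the boundary means the renderer only ever sees replay input with usable dimensions, instead of failing later inside the browser harness.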

### 4.2 Replay rules
@@ -112,6 +123,20 @@ export interface RenderProfile {
}
```

### 5.2.1 Current Week 2 profile shape

The shipped Week 2 profile shape is intentionally smaller than the fully elaborated interface above. Today it pins:

- profile name,
- light/dark theme mode,
- font family,
- font size,
- cursor style,
- foreground color,
- and background color.

That smaller shape was enough to make screenshot output stable for the reference renderer while leaving room to add richer font/padding/palette metadata later.
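
As a sketch of what that reduced shape could look like in code: the property names, value types, and the `profileKey` helper below are assumptions derived from the list above, not the repository's actual types.

```ts
// Sketch of the smaller Week 2 profile. Property names and value types are
// assumptions derived from the list above, not the shipped types.
interface Week2RenderProfileSketch {
  name: string;
  theme: "light" | "dark";
  fontFamily: string;
  fontSize: number; // unit (px vs pt) is not stated in the source
  cursorStyle: string; // allowed values are not stated in the source
  foreground: string;
  background: string;
}

// Illustrative values only.
const exampleDarkProfile: Week2RenderProfileSketch = {
  name: "reference-dark",
  theme: "dark",
  fontFamily: "monospace",
  fontSize: 14,
  cursorStyle: "block",
  foreground: "#d0d0d0",
  background: "#101010",
};

// Joins every pinned field into one key, so two captures can be compared
// for "same profile" without a deep-equality helper.
function profileKey(profile: Week2RenderProfileSketch): string {
  return [
    profile.name,
    profile.theme,
    profile.fontFamily,
    String(profile.fontSize),
    profile.cursorStyle,
    profile.foreground,
    profile.background,
  ].join("|");
}
```

Because every field that affects rendering is part of the key, two screenshots taken with equal keys should differ only for reasons outside the profile, such as environment-dependent font rasterization.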

### 5.3 Determinism rules

To keep screenshots reproducible, v1 should:
@@ -282,6 +307,21 @@ For agent reasoning speed, `snapshot --format text` should return only:

That avoids forcing every reasoning step to parse full cell objects.

### 9.4 Current Week 2 snapshot scope

The shipped Week 2 snapshot shape is intentionally viewport-scoped.

It currently records:

- session ID,
- capture sequence,
- rows/cols,
- cursor row/col,
- alt-screen state,
- and visible lines.

It does not yet include per-cell styling or scrollback export. Those remain good future extensions, but the lighter snapshot is already sufficient for agent reasoning and renderer-backed waits.
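
The fields listed above map naturally onto a small record, and a renderer-backed wait reduces to a predicate over its visible lines. The sketch below is illustrative: `Week2SnapshotSketch`, its property names, and `screenContains` are hypothetical and may not match the shipped schema.

```ts
// Hypothetical viewport-scoped snapshot; names are assumptions based on the
// field list above.
interface Week2SnapshotSketch {
  sessionId: string;
  captureSeq: number;
  rows: number;
  cols: number;
  cursor: { row: number; col: number };
  altScreen: boolean;
  lines: string[]; // visible viewport only, no scrollback
}

// A wait condition can be expressed as a check over the visible lines.
function screenContains(snapshot: Week2SnapshotSketch, needle: string): boolean {
  return snapshot.lines.some((line) => line.includes(needle));
}

// Illustrative data only.
const exampleSnapshot: Week2SnapshotSketch = {
  sessionId: "example-session",
  captureSeq: 3,
  rows: 2,
  cols: 20,
  cursor: { row: 1, col: 2 },
  altScreen: false,
  lines: ["$ echo ready", "ready"],
};
```

For instance, `screenContains(exampleSnapshot, "ready")` holds for the sample data, which is the kind of check an agent needs before sending the next input.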

## 10. Asciicast export

### 10.1 Why asciicast is mandatory
@@ -371,6 +411,20 @@ export interface ArtifactEntry {
- artifacts missing from disk are flagged during `inspect` and `doctor`,
- manifests never point at temp files.

### 12.3 Current Week 2 manifest and layout

The shipped Week 2 implementation currently writes artifacts under:

```text
artifacts/
manifest.json
snapshot-<seq>-structured.json
snapshot-<seq>-text.json
screenshot-<seq>-<profile>.png
```

That is simpler than the broader naming scheme above, but it already preserves the two most important debugging dimensions: capture sequence and render profile.
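
A minimal sketch of how names in this layout could be produced and parsed follows. The helper names are hypothetical, and whether the shipped code zero-pads `<seq>` or restricts the characters allowed in `<profile>` is not stated here, so neither is assumed.

```ts
type SnapshotFormat = "structured" | "text";

// Rejects sequences that would not round-trip through the file name.
function assertSeq(seq: number): void {
  if (!Number.isInteger(seq) || seq < 0) {
    throw new Error("capture sequence must be a non-negative integer");
  }
}

function snapshotFileName(seq: number, format: SnapshotFormat): string {
  assertSeq(seq);
  return `snapshot-${seq}-${format}.json`;
}

function screenshotFileName(seq: number, profile: string): string {
  assertSeq(seq);
  return `screenshot-${seq}-${profile}.png`;
}

// Recovers the two debugging dimensions from a screenshot file name.
function parseScreenshotFileName(
  fileName: string,
): { seq: number; profile: string } | undefined {
  const match = /^screenshot-(\d+)-(.+)\.png$/.exec(fileName);
  if (match === null) {
    return undefined;
  }
  return { seq: Number(match[1]), profile: match[2] };
}
```

Being able to parse the name back means a reviewer or a `doctor`-style check can group artifacts by capture sequence without opening `manifest.json`.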

## 13. Future native renderer adapter contract

The reference renderer should not lock out native backends.
20 changes: 20 additions & 0 deletions design/20260319_agent-terminal-v1/05-dogfooding-and-validation.md
@@ -6,6 +6,26 @@ It is intentionally prescriptive.

A follow-up AI coding agent should treat this file as the minimum review protocol, not optional guidance.

## Current shipped state (2026-03-21)

This document still describes the *target* dogfooding protocol, but the current shipped product only supports a subset of the artifact expectations below.

Shipped today:

- JSON command outputs,
- semantic snapshots,
- PNG screenshots,
- artifact manifests,
- and notes / proof bundles under `dogfood/`.

Not yet shipped:

- `.cast` export,
- replay video export,
- and some of the richer fixture scenarios listed below.

Read the remainder of this file as the broader validation target, not a claim that every artifact class is already implemented.

## 1. Dogfooding goals

Dogfooding must prove that an agent can:
41 changes: 29 additions & 12 deletions design/20260319_agent-terminal-v1/06-roadmap-and-week-1-plan.md
@@ -9,6 +9,23 @@ It is intentionally biased toward:
- proof-heavy validation,
- and getting to a usable dogfood loop early.

## Status update (2026-03-21)

Week 1 is complete and has been superseded by a shipped Week 2 renderer-backed slice.

What shipped from the Week 1 plan:

- real session creation, inspection, listing, and teardown,
- a background host process per session,
- PTY spawn and output capture,
- input, paste, key, resize, and signal control,
- append-only event logging,
- `wait --exit` and `wait --idle-ms`,
- deterministic fixture coverage,
- and terminal-only proof bundles.

Week 2 then added renderer-backed snapshots, waits, screenshots, artifact manifests, and browser smoke checks. The Week 1 plan below is preserved as the original execution record, but its outcome and sign-off checklists should now be read as **completed history** rather than future work.

## 1. Current baseline in this repository

As of this draft, the repository already contains a narrow Phase 0 scaffold:
@@ -213,15 +230,15 @@ A coding agent working from this section should treat every unchecked item below

### Week 1 outcome checklist

-- [ ] Real session creation and teardown exist.
-- [ ] A background host process exists and is used for sessions.
-- [ ] PTY spawn and output capture work.
-- [ ] `create`, `list`, `inspect`, and `destroy` are implemented.
-- [ ] `type`, `paste`, `send-keys`, `resize`, and `signal` are implemented.
-- [ ] Append-only event logging exists.
-- [ ] `wait --exit` and `wait --idle-ms` are implemented.
-- [ ] One or two deterministic fixture apps exist.
-- [ ] A terminal-only proof bundle shows that the control plane works.
+- [x] Real session creation and teardown exist.
+- [x] A background host process exists and is used for sessions.
+- [x] PTY spawn and output capture work.
+- [x] `create`, `list`, `inspect`, and `destroy` are implemented.
+- [x] `type`, `paste`, `send-keys`, `resize`, and `signal` are implemented.
+- [x] Append-only event logging exists.
+- [x] `wait --exit` and `wait --idle-ms` are implemented.
+- [x] One or two deterministic fixture apps exist.
+- [x] A terminal-only proof bundle shows that the control plane works.

Renderer work is a stretch goal for week 1, not the baseline commitment.

@@ -301,10 +318,10 @@ Renderer work is a stretch goal for week 1, not the baseline commitment.

### Week 1 sign-off checklist

-- [ ] All required implementation and checkpoint checkboxes above are complete.
-- [ ] Relevant tests for the implemented week 1 scope pass.
+- [x] All required implementation and checkpoint checkboxes above are complete.
+- [x] Relevant tests for the implemented week 1 scope pass.
 - [ ] The dogfood bundle contains screenshots and a screen recording.
-- [ ] Remaining gaps are documented explicitly rather than implied.
+- [x] Remaining gaps are documented explicitly rather than implied.

### Week 1 stretch goals
