Refactor run completion coordination out of hostMain by ThomasK33 · Pull Request #70 · coder/agent-tty

ThomasK33 · 2026-04-29T13:06:11Z

Summary

Extract waited-run completion bookkeeping from hostMain into a dedicated RunCompletionCoordinator.
Keep hostMain focused on PTY/RPC/session/rendering orchestration while delegating sentinel, postamble, waiter, timeout, exit, and run_complete coordination.
Add focused coordinator unit tests for completion ordering, multiple active runs, timeout preservation, exit resolution, append failures, and final flush behavior.
Update domain glossary wording for Waited Run / Run Completion.

Fixes #62.

User-facing / automation-facing behavior

No public CLI JSON, protocol schema, message shape, or event-log shape changes are intended. Waited run completion, timeout, and session-exit semantics remain the same.

Validation

npm run test -- test/unit/host/runCompletionSentinel.test.ts test/unit/host/runCompletionCoordinator.test.ts
npm run test -- test/integration/run.test.ts
npm run typecheck
npm run lint
npm run format:check

Dogfood proof

Used an isolated AGENT_TTY_HOME, then destroyed the session and removed the isolated home after copying proof artifacts.

Proof bundle:

/tmp/agent-tty-run-completion-proof-nAOV64

Artifacts:

/tmp/agent-tty-run-completion-proof-nAOV64/screenshot.png
/tmp/agent-tty-run-completion-proof-nAOV64/session.cast
/tmp/agent-tty-run-completion-proof-nAOV64/session.webm
/tmp/agent-tty-run-completion-proof-nAOV64/events.jsonl
/tmp/agent-tty-run-completion-proof-nAOV64/summary.json

Verified the result flags for the completed and timed-out waited runs, late output arriving after the timeout, artifact cleanliness, and the input_run / run_complete.inputRunSeq relationship.

📋 Implementation Plan

Plan: Issue #62 — Move run completion coordination out of `hostMain`

Goal

Refactor waited-run completion bookkeeping out of src/host/hostMain.ts into a dedicated host-side coordinator module while preserving all public CLI JSON, protocol/event-log shapes, PTY ingestion ordering, timeout behavior, session-exit behavior, replay/export semantics, and artifact cleanliness.

Evidence and constraints

GitHub issue #62 ("Move run completion coordination out of the session host") asks to move marker/sentinel creation, postamble echo registration, waiter resolution, timeout behavior, exit resolution, and run_complete appends out of hostMain, while preserving the existing sentinel scanner module.
Current implementation locations inspected:
- src/host/hostMain.ts
  - run-completion helper types/functions near the top (ActiveRunCompletion, waiter/result types, signal sentinel generation, postamble construction)
  - run-completion state and waiter logic near host initialization
  - PTY data ingestion through sentinelScanner.feed(...) and appendSentinelPieces(...)
  - PTY exit flushing through sentinelScanner.flush() and postamble sanitizer flush
  - waited run RPC path that appends input_run, registers completion tracking, writes the command/postamble, waits, then best-effort catches renderer up
- src/host/runCompletionSentinel.ts
  - keep this as the existing low-level scanner/sanitizer implementation
- src/protocol/messages.ts and src/protocol/schemas.ts
  - do not change RunParams, RunResult, input_run, or run_complete schemas
- test/unit/host/runCompletionSentinel.test.ts and test/integration/run.test.ts
  - preserve existing sentinel/postamble and run-command behavior
Domain language resolved during grilling:
- Waited Run: caller asks agent-tty to wait until the command's completion signal is observed.
- Run Completion: host-observed endpoint for a Waited Run, distinct from Session exit and caller timeout.
- A caller timeout does not cancel the underlying Run Completion; completion bytes that arrive later must still be recognized and hidden.
- After Session exit, an unobserved Run Completion can no longer arrive.
No ADR is needed: this is an internal refactor boundary, not a hard-to-reverse architecture decision.

Agreed design

New module boundary

Create src/host/runCompletionCoordinator.ts with a RunCompletionCoordinator that owns waited-run completion coordination only.

The coordinator owns:

marker creation
short random signal-sentinel generation and uniqueness checks
postamble construction
active completion registration
scanner/sanitizer instances
waiter subscription/resolution/rejection
timeout behavior
final exit waiter resolution and active-state cleanup
sequential handling of sentinel scanner pieces
deciding when to append output and run_complete via injected append callbacks

hostMain continues to own:

RPC command registration and parameter normalization
no-wait run behavior
PTY lifecycle and pty.write(...)
the serialized ptyIngestionQueue
session status transitions and manifest writes
renderer catch-up after completed waited runs
public RunResult construction

Dependency shape

Inject narrow append callbacks rather than the full EventLog or host state:

interface RunCompletionEventAppender {
  appendOutput(data: string): Promise<void>;
  appendRunComplete(payload: {
    marker: string;
    inputRunSeq?: number;
  }): Promise<number>;
}

hostMain should call coordinator ingestion methods from inside the existing ptyIngestionQueue so event ordering remains canonical.
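A minimal sketch of why the narrow appender plus a serialized queue keeps event order canonical. `RecordingAppender`, `SerialQueue`, and `sleep` are hypothetical names for illustration; the real ptyIngestionQueue and eventLog.append(...) may be structured differently. Chaining each task onto the previous promise means a slow handler for an earlier PTY chunk still appends before a faster handler for a later chunk.

```typescript
// Same shape as the RunCompletionEventAppender interface in the plan.
interface RunCompletionEventAppender {
  appendOutput(data: string): Promise<void>;
  appendRunComplete(payload: { marker: string; inputRunSeq?: number }): Promise<number>;
}

// Hypothetical fake appender: records events in append order and returns their index as seq.
class RecordingAppender implements RunCompletionEventAppender {
  readonly events: Array<{ seq: number; type: "output" | "run_complete"; payload: unknown }> = [];

  async appendOutput(data: string): Promise<void> {
    this.events.push({ seq: this.events.length, type: "output", payload: { data } });
  }

  async appendRunComplete(payload: { marker: string; inputRunSeq?: number }): Promise<number> {
    const seq = this.events.length;
    this.events.push({ seq, type: "run_complete", payload });
    return seq;
  }
}

// Hypothetical serialized queue: each task starts only after the previous task settles.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.tail.then(task);
    // A failed task must not wedge the queue; the caller still sees the rejection via `run`.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Usage: the first chunk's handler is slower, yet its event is still appended first.
const appender = new RecordingAppender();
const queue = new SerialQueue();
const done = Promise.all([
  queue.enqueue(async () => {
    await sleep(20);
    await appender.appendOutput("first chunk");
  }),
  queue.enqueue(() => appender.appendOutput("second chunk")),
]);
```

Because the coordinator only ever sees these two callbacks, unit tests can substitute a recording fake instead of a real EventLog file.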

Public contract preservation

Do not change:

RunParamsSchema
RunResultSchema
InputRunEventPayloadSchema
RunCompleteEventPayloadSchema
event record shapes
replay/export behavior
waited/no-wait CLI JSON result semantics

In particular, session exit for a waited run remains internally distinct but publicly implicit:

{
  accepted: true,
  completed: false,
  timedOut: false,
  seq,
  durationMs,
  marker,
}

Do not add an exited public flag in this issue.
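The internal-to-public mapping can be sketched as one pure function. `toRunResult` and `WaitedRunResult` are hypothetical names; only the field names come from the result shown above. Which `seq` hostMain reports is left as an input here rather than guessed, so the completed outcome's own `seq` is deliberately not used.

```typescript
type RunCompletionWaitOutcome =
  | { kind: "completed"; seq: number }
  | { kind: "timeout" }
  | { kind: "exited" };

// Hypothetical name for the public result fields listed above.
interface WaitedRunResult {
  accepted: true;
  completed: boolean;
  timedOut: boolean;
  seq: number;
  durationMs: number;
  marker: string;
}

function toRunResult(
  outcome: RunCompletionWaitOutcome,
  base: { seq: number; durationMs: number; marker: string },
): WaitedRunResult {
  switch (outcome.kind) {
    case "completed":
      return { accepted: true, completed: true, timedOut: false, ...base };
    case "timeout":
      return { accepted: true, completed: false, timedOut: true, ...base };
    case "exited":
      // Session exit stays publicly implicit: neither completed nor timed out,
      // and no separate `exited` flag is exposed.
      return { accepted: true, completed: false, timedOut: false, ...base };
  }
}
```

Keeping `exited` as a distinct internal kind lets the coordinator reason about it precisely while the public shape stays unchanged.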

Implementation phases

Phase 1 — Extract coordinator module

Add src/host/runCompletionCoordinator.ts.
Move run-completion-specific helpers from hostMain into the new module:
- shellOctalEscapedBytes(...)
- signal sentinel generation around buildRunCompleteSignalSentinel(...)
- buildRunCompletePostamble(...)
- run-completion waiter/result types
- active completion bookkeeping
Keep RunCompletionSentinelScanner and RunCompletionPostambleEchoSanitizer in src/host/runCompletionSentinel.ts.
Preserve existing defensive invariants, including:
- generated marker matches RUN_MARKER_PATTERN
- sentinel is non-empty and unique among active completions
- inputRunSeq is a non-negative integer
- active marker registration is unique
- wait timeout is a positive integer
- scanner-emitted run_complete maps to an active completion
- appendRunComplete(...) returns a non-negative integer sequence
Prefer behavior-based tests over exposing internal active-state getters.
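The invariants above are simple enough to express as small assertion helpers. This is a sketch under assumptions: the regex below is derived from the marker format given later in this plan, and the real RUN_MARKER_PATTERN may differ. `assertRunMarker` borrows the name of the helper the review mentions, but its body here is illustrative; `invariant`, `assertNonNegativeInteger`, and `assertPositiveInteger` are hypothetical.

```typescript
// Assumed pattern: "__AT_MARKER_" + a dash-free lowercase UUID + "__".
const RUN_MARKER_PATTERN = /^__AT_MARKER_[0-9a-f]{32}__$/;

// Throws with a descriptive message; used only for programmer-error invariants.
function invariant(condition: boolean, message: string): asserts condition {
  if (!condition) throw new Error(`run completion invariant violated: ${message}`);
}

function assertRunMarker(marker: string): void {
  invariant(RUN_MARKER_PATTERN.test(marker), `malformed marker ${JSON.stringify(marker)}`);
}

// e.g. inputRunSeq, or the sequence returned by appendRunComplete(...)
function assertNonNegativeInteger(value: number, name: string): void {
  invariant(Number.isInteger(value) && value >= 0, `${name} must be a non-negative integer, got ${value}`);
}

// e.g. the wait timeout
function assertPositiveInteger(value: number, name: string): void {
  invariant(Number.isInteger(value) && value > 0, `${name} must be a positive integer, got ${value}`);
}
```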

Proposed public/internal coordinator API:

declare class RunCompletionCoordinator {
  constructor(appender: RunCompletionEventAppender);

  prepareWaitedRun(): PreparedWaitedRun;

  registerWaitedRun(params: {
    marker: string;
    inputRunSeq: number;
  }): RegisteredWaitedRunCompletion;

  ingestPtyData(data: string): Promise<void>;

  flushPtyDataOnExit(): Promise<void>;

  resolvePendingWaitersForExit(): void;
}

interface PreparedWaitedRun {
  marker: string;
}

interface RegisteredWaitedRunCompletion {
  postamble: string;
  sentinel: string;
  wait(timeoutMs: number): Promise<RunCompletionWaitOutcome>;
}

type RunCompletionWaitOutcome =
  | { kind: 'completed'; seq: number }
  | { kind: 'timeout' }
  | { kind: 'exited' };

Notes:

prepareWaitedRun() should be pure marker preparation. It must not mutate scanner/sanitizer/active-completion state.
registerWaitedRun(...) should happen only after input_run appends successfully, and should finalize a unique short sentinel plus postamble before the PTY write.
Registration must not introduce a new failure mode where input_run is committed and then registration throws because a prepared sentinel collided with another active run. Generate or regenerate the sentinel/postamble during registration so sentinel uniqueness is guaranteed before returning postamble; only impossible/programmer-error invariants may throw. Marker collision remains the same effectively-impossible UUID risk as today.
prepareWaitedRun() should not receive command; command validation and PTY writing stay in hostMain.
wait(timeoutMs) must clear timeout timers on completed/exited/rejected outcomes, and the timeout callback must no-op after the wait has resolved.
Keep marker format exactly __AT_MARKER_${crypto.randomUUID().replace(/-/g, '')}__.
Keep production waited runs using short random signal sentinels, not full marker-containing APC sentinels.
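A sketch of the two-step preparation and registration described in these notes. The marker format is taken verbatim from the plan; everything else is an assumption: `SentinelRegistry` is a hypothetical name, and 4 random bytes hex-encoded plus a retry bound of 16 stand in for whatever buildRunCompleteSignalSentinel(...) actually produces. The key property is that the uniqueness check, any regeneration, and the insert all run synchronously, so registration cannot fail merely because a random sentinel collided.

```typescript
import { randomBytes, randomUUID } from "node:crypto";

// Pure marker preparation: touches no scanner, sanitizer, or active-completion state.
function createRunMarker(): string {
  return `__AT_MARKER_${randomUUID().replace(/-/g, "")}__`;
}

// Hypothetical registry of active marker -> sentinel pairs.
class SentinelRegistry {
  private readonly activeByMarker = new Map<string, string>();
  private readonly randomSentinel: () => string;

  // Assumed sentinel source: 8 hex characters. Injectable so tests can force collisions.
  constructor(randomSentinel: () => string = () => randomBytes(4).toString("hex")) {
    this.randomSentinel = randomSentinel;
  }

  // Synchronous on purpose: no await can interleave between the check and the insert.
  register(marker: string): string {
    if (this.activeByMarker.has(marker)) {
      throw new Error(`marker already registered: ${marker}`);
    }
    const inUse = new Set(Array.from(this.activeByMarker.values()));
    for (let attempt = 0; attempt < 16; attempt++) {
      const sentinel = this.randomSentinel();
      if (sentinel.length > 0 && !inUse.has(sentinel)) {
        this.activeByMarker.set(marker, sentinel);
        return sentinel;
      }
    }
    // Only reachable with a broken random source: a programmer-error invariant.
    throw new Error("could not generate a unique sentinel");
  }

  release(marker: string): void {
    this.activeByMarker.delete(marker);
  }
}
```

In this sketch a collision simply triggers another draw, which matches the requirement that only impossible or programmer-error cases may throw after input_run has been committed.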

Phase 1 quality gate

New module typechecks locally with no public schema/protocol edits.
Existing sentinel scanner tests still pass after any import/export adjustments.

Phase 2 — Wire `hostMain` to coordinator

Replace local run-completion state in hostMain with one RunCompletionCoordinator instance.
Provide append callbacks that delegate to the existing eventLog.append(...) calls:
- appendOutput(data) → eventLog.append('output', { data })
- appendRunComplete(payload) → eventLog.append('run_complete', payload)
Preserve the no-wait run branch in hostMain unchanged except for removed local helper references.
Preserve waited-run order:
1. validate command/timeout in hostMain
2. const prepared = runCompletion.prepareWaitedRun()
3. append input_run with command, prepared.marker, and noWait
4. const completion = runCompletion.registerWaitedRun({ marker: prepared.marker, inputRunSeq: seq }), finalizing a unique sentinel/postamble before PTY write
5. write ${command}\n${completion.postamble} to PTY
6. wait with explicit effective timeout via completion.wait(effectiveTimeoutMs)
7. if completed, best-effort replayRendererThroughSeq(seq) in hostMain
8. return the same RunResult shape as today
Keep durationMs and the default 30_000 timeout in hostMain.
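The waited-run order above can be sketched with every collaborator injected. `WaitedRunDeps`, `runWaited`, `appendInputRun`, `ptyWrite`, and `DEFAULT_RUN_TIMEOUT_MS` are hypothetical names; command/timeout validation and final RunResult construction are elided. Passing the completed outcome's `seq` to replayRendererThroughSeq(...) is an assumption about which sequence step 7 refers to.

```typescript
type RunCompletionWaitOutcome =
  | { kind: "completed"; seq: number }
  | { kind: "timeout" }
  | { kind: "exited" };

// Hypothetical, trimmed-down dependency surface mirroring the plan's steps.
interface WaitedRunDeps {
  prepareWaitedRun(): { marker: string };
  appendInputRun(payload: { command: string; marker: string; noWait: boolean }): Promise<number>;
  registerWaitedRun(params: { marker: string; inputRunSeq: number }): {
    postamble: string;
    wait(timeoutMs: number): Promise<RunCompletionWaitOutcome>;
  };
  ptyWrite(data: string): void;
  replayRendererThroughSeq(seq: number): Promise<void>;
}

const DEFAULT_RUN_TIMEOUT_MS = 30_000;

async function runWaited(deps: WaitedRunDeps, command: string, timeoutMs?: number) {
  const startedAt = Date.now();
  // Step 1: validation elided; only the default timeout is applied here.
  const effectiveTimeoutMs = timeoutMs ?? DEFAULT_RUN_TIMEOUT_MS;
  // Step 2: pure marker preparation, no coordinator state mutated yet.
  const prepared = deps.prepareWaitedRun();
  // Step 3: commit input_run first.
  const seq = await deps.appendInputRun({ command, marker: prepared.marker, noWait: false });
  // Step 4: register only after input_run succeeded; sentinel/postamble are final here.
  const completion = deps.registerWaitedRun({ marker: prepared.marker, inputRunSeq: seq });
  // Step 5: command and postamble go to the PTY together.
  deps.ptyWrite(`${command}\n${completion.postamble}`);
  // Step 6: wait with an explicit effective timeout.
  const outcome = await completion.wait(effectiveTimeoutMs);
  if (outcome.kind === "completed") {
    // Step 7: best-effort renderer catch-up; a renderer failure must not fail the run.
    await deps.replayRendererThroughSeq(outcome.seq).catch(() => undefined);
  }
  // Step 8: hostMain would build the unchanged RunResult from these values.
  return { outcome, seq, marker: prepared.marker, durationMs: Date.now() - startedAt };
}
```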
Keep PTY ingestion serialized by calling:
- runCompletion.ingestPtyData(data) from the existing pty.onData queue callback
- runCompletion.flushPtyDataOnExit() from the existing pty.onExit queue callback before handlePtyExit(...)
Keep exit ordering:
1. final ingestion flush gets the last chance to observe a sentinel
2. attempt to append exit
3. call runCompletion.resolvePendingWaitersForExit() in the same finally position where waiters are resolved today, so pending waiters resolve after the attempted exit append even if that append throws
4. write manifest and initiate shutdown
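The exit ordering maps naturally onto a try/finally around the exit append. `ExitSteps` and `onPtyExitSketch` are hypothetical names standing in for the real pty.onExit queue callback and handlePtyExit(...). Whether the real host still writes the manifest after a failed exit append is not specified above, so this sketch simply lets that failure propagate once waiters have resolved.

```typescript
// Hypothetical stand-ins for the real hostMain steps.
interface ExitSteps {
  flushPtyDataOnExit(): Promise<void>;
  appendExit(): Promise<void>;
  resolvePendingWaitersForExit(): void;
  writeManifestAndShutdown(): Promise<void>;
}

async function onPtyExitSketch(steps: ExitSteps): Promise<void> {
  // 1. The final flush gets the last chance to observe a completion sentinel.
  await steps.flushPtyDataOnExit();
  try {
    // 2. Attempt to append the exit event.
    await steps.appendExit();
  } finally {
    // 3. Pending waiters resolve as exited even if the exit append threw.
    steps.resolvePendingWaitersForExit();
  }
  // 4. In this sketch, reached only when the exit append succeeded.
  await steps.writeManifestAndShutdown();
}
```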
resolvePendingWaitersForExit() should be idempotent and clear remaining active completion bookkeeping after final flush.
Prefer clearing coordinator state by replacing its internal scanner/sanitizer instances after final flush instead of adding broader clear() APIs to runCompletionSentinel.ts, unless implementation proves awkward.
Do not add a closed state to the coordinator; if data is accidentally ingested after cleanup, it can pass through as ordinary output because no active sentinels/echoes remain.
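A sketch of waiter bookkeeping that satisfies both the idempotent exit resolution above and the earlier requirement that wait(timeoutMs) clears its timer and that a late timeout callback is a no-op. `WaiterRegistry` and `pendingCount` are hypothetical. Only waiters are tracked here; in the design above, sentinel and postamble registrations live separately and survive a timeout.

```typescript
type RunCompletionWaitOutcome =
  | { kind: "completed"; seq: number }
  | { kind: "timeout" }
  | { kind: "exited" };

// Hypothetical registry: each wait settles at most once and always clears its timer.
class WaiterRegistry {
  private readonly waiters = new Map<string, (outcome: RunCompletionWaitOutcome) => void>();

  wait(marker: string, timeoutMs: number): Promise<RunCompletionWaitOutcome> {
    return new Promise<RunCompletionWaitOutcome>((resolve) => {
      let settled = false;
      const settle = (outcome: RunCompletionWaitOutcome) => {
        if (settled) return; // late timer or duplicate resolution: no-op
        settled = true;
        clearTimeout(timer);
        this.waiters.delete(marker);
        resolve(outcome);
      };
      const timer = setTimeout(() => settle({ kind: "timeout" }), timeoutMs);
      this.waiters.set(marker, settle);
    });
  }

  complete(marker: string, seq: number): void {
    this.waiters.get(marker)?.({ kind: "completed", seq });
  }

  // Idempotent: a second call finds no waiters left.
  resolvePendingWaitersForExit(): void {
    for (const settle of Array.from(this.waiters.values())) settle({ kind: "exited" });
    this.waiters.clear();
  }

  get pendingCount(): number {
    return this.waiters.size;
  }
}
```

Because settle() clears the timer, an exited wait with a long timeout does not keep the process alive.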

Phase 2 quality gate

Review the diff to confirm hostMain still owns PTY/RPC/session/rendering orchestration and only delegates run-completion-specific bookkeeping.
Confirm no changes to protocol schemas/messages/event record shapes.

Phase 3 — Add focused coordinator tests

Add test/unit/host/runCompletionCoordinator.test.ts using fake appenders rather than real EventLog files.

Cover at least:

Completion append path and ordering
- register a waited run
- feed before + sentinel + after
- assert event order is output(before), then run_complete, then output(after)
- assert run_complete carries the correct marker and inputRunSeq
- assert waiter resolves { kind: 'completed', seq }
- if practical, include trailing postamble sanitizer output ordering without duplicating scanner tests
Multiple active waited runs
- register two waited runs
- complete them out of order
- assert each waiter resolves with the correct marker/sequence and each run_complete.inputRunSeq points to the right input_run sequence
Timeout preserves hidden-byte registrations
- use fake timers for deterministic timeout behavior
- wait with a short timeout
- assert waiter resolves { kind: 'timeout' }
- feed the sentinel later
- assert sentinel/postamble bytes do not append as output
- assert run_complete still appends
Timeout followed by append failure
- time out a waiter
- make appendRunComplete(...) throw when the later sentinel arrives
- assert ingestion rejects/fails loudly even though the original waiter is gone
Session exit waiter resolution
- register a waited run
- final flush without sentinel
- call resolvePendingWaitersForExit()
- assert waiter resolves { kind: 'exited' }
- assert no run_complete is appended by exit alone
Exit before timeout timer fires
- use fake timers
- resolve exit before timeout
- assert wait resolves { kind: 'exited' }
- advance timers and assert timeout cannot mutate state afterward
Appender failure propagation
- make appendRunComplete(...) throw for an active, non-timed-out waiter
- feed sentinel
- assert ingestion rejects and the relevant waiter rejects
- also cover (or explicitly preserve) today's behavior where a failure to append trailing echo output during sentinel handling rejects the relevant waiter
- separately note that ordinary output append failure before any sentinel should propagate as ingestion failure without inventing new waiter semantics
Final flush preserves non-sentinel pending output
- if practical without duplicating scanner tests, verify partial non-sentinel tails flush as output.
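The ordering test in the first bullet can be illustrated with a deliberately simplified scanner and a fake appender. `SimpleSentinelScanner`, `ingestPieces`, `ActiveRun`, and `Appender` are hypothetical; this is not RunCompletionSentinelScanner, which also handles postamble echoes and more. It shows the general technique: hold back any buffer tail that could be the start of a sentinel split across PTY chunks, then handle the resulting pieces strictly in order.

```typescript
type ScannerPiece = { kind: "output"; data: string } | { kind: "sentinel"; sentinel: string };

interface ActiveRun {
  marker: string;
  inputRunSeq: number;
}

interface Appender {
  appendOutput(data: string): Promise<void>;
  appendRunComplete(payload: { marker: string; inputRunSeq?: number }): Promise<number>;
}

class SimpleSentinelScanner {
  private buffer = "";
  private readonly sentinels = new Set<string>();

  watch(sentinel: string): void {
    if (sentinel.length === 0) throw new Error("sentinel must be non-empty");
    this.sentinels.add(sentinel);
  }

  feed(data: string): ScannerPiece[] {
    this.buffer += data;
    const pieces: ScannerPiece[] = [];
    for (;;) {
      // Find the earliest complete sentinel in the buffer.
      let best: { index: number; sentinel: string } | undefined;
      for (const sentinel of Array.from(this.sentinels)) {
        const index = this.buffer.indexOf(sentinel);
        if (index !== -1 && (best === undefined || index < best.index)) best = { index, sentinel };
      }
      if (best === undefined) break;
      if (best.index > 0) pieces.push({ kind: "output", data: this.buffer.slice(0, best.index) });
      pieces.push({ kind: "sentinel", sentinel: best.sentinel });
      this.buffer = this.buffer.slice(best.index + best.sentinel.length);
    }
    // Emit everything except a suffix that could still grow into a sentinel.
    const keep = this.partialSentinelTailLength();
    if (this.buffer.length > keep) {
      pieces.push({ kind: "output", data: this.buffer.slice(0, this.buffer.length - keep) });
      this.buffer = this.buffer.slice(this.buffer.length - keep);
    }
    return pieces;
  }

  // On exit, a held-back partial sentinel is just ordinary output.
  flush(): ScannerPiece[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest.length > 0 ? [{ kind: "output", data: rest }] : [];
  }

  // Length of the longest buffer suffix that is a proper prefix of a watched sentinel.
  private partialSentinelTailLength(): number {
    let longest = 0;
    for (const sentinel of Array.from(this.sentinels)) {
      for (let len = Math.min(sentinel.length - 1, this.buffer.length); len > longest; len--) {
        if (this.buffer.endsWith(sentinel.slice(0, len))) {
          longest = len;
          break;
        }
      }
    }
    return longest;
  }
}

// Handles pieces strictly in order, so the event log mirrors the PTY byte order.
async function ingestPieces(pieces: ScannerPiece[], appender: Appender, runs: Map<string, ActiveRun>): Promise<void> {
  for (const piece of pieces) {
    if (piece.kind === "output") {
      await appender.appendOutput(piece.data);
      continue;
    }
    const run = runs.get(piece.sentinel);
    if (run === undefined) throw new Error("scanner emitted a sentinel with no active completion");
    await appender.appendRunComplete({ marker: run.marker, inputRunSeq: run.inputRunSeq });
  }
}
```

Feeding `"beforeS"` and then `"ENafter"` with the sentinel `"SEN"` yields output(before), run_complete, output(after), even though the sentinel straddles the chunk boundary.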

Keep test/unit/host/runCompletionSentinel.test.ts focused on scanner/sanitizer algorithms; do not rewrite those algorithms.

Phase 3 quality gate

Coordinator unit tests prove the extracted state machine independently.
Existing sentinel tests remain the low-level scanner/sanitizer safety net.

Phase 4 — Focused validation

Run the narrowest useful validation first:

npm run test -- test/unit/host/runCompletionSentinel.test.ts test/unit/host/runCompletionCoordinator.test.ts
npm run test -- test/integration/run.test.ts
npm run typecheck
npm run lint

If the repo/environment has mise available and this is release-sensitive, also run:

mise run ci

If any validation cannot run, report the exact command, failure/blocker, and next-best check.

Phase 5 — CLI dogfooding and proof bundle

Use an isolated absolute home; never mutate the real ~/.agent-tty:

export AGENT_TTY_HOME="$(mktemp -d)"
npx tsx src/cli/main.ts start --json -- bash

Then, using the returned session id:

npx tsx src/cli/main.ts run <session-id> --json -- 'printf hello'
npx tsx src/cli/main.ts run <session-id> --json --timeout-ms 50 -- 'sleep 0.2; printf done'
# Poll/wait long enough for the timed-out command to actually finish before artifact checks.
# One simple option is to issue a subsequent waited no-op command, which queues after the prior shell work:
npx tsx src/cli/main.ts run <session-id> --json --timeout-ms 1000 -- 'true'
npx tsx src/cli/main.ts snapshot <session-id> --json --format text
npx tsx src/cli/main.ts screenshot <session-id> --json
npx tsx src/cli/main.ts record export <session-id> --format asciicast --json

If WebM export is available and inexpensive in the local environment, also capture a video proof:

npx tsx src/cli/main.ts record export <session-id> --format webm --json

Verify and document:

completed waited run reports completed: true, timedOut: false
timed-out run reports completed: false, timedOut: true
later command output appears after timeout when it eventually arrives
sentinel text, marker text, and postamble echo text are absent from snapshot, screenshot-rendered output, asciicast, and WebM if captured
events.jsonl contains matching input_run and later run_complete records with run_complete.inputRunSeq pointing to the input_run sequence
attach or list the screenshot path and recording/export paths in the final implementation report or PR body
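The events.jsonl checks can be scripted. This sketch assumes each line is a JSON object with `seq`, `type`, and `payload` fields, which may not match the real record shape. `checkRunCompletionLinks` and `findLeaks` are hypothetical helpers: the first confirms that each run_complete.inputRunSeq points at an earlier input_run with the same marker, and the second is a trivial cleanliness check for snapshot text against known marker and sentinel strings.

```typescript
// Assumed event-record shape; the real events.jsonl fields may be named differently.
interface EventRecord {
  seq: number;
  type: string;
  payload?: { marker?: string; inputRunSeq?: number };
}

function checkRunCompletionLinks(jsonl: string): string[] {
  const records: EventRecord[] = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as EventRecord);
  const inputRuns = new Map<number, EventRecord>();
  for (const record of records) if (record.type === "input_run") inputRuns.set(record.seq, record);

  const problems: string[] = [];
  for (const record of records) {
    if (record.type !== "run_complete") continue;
    const inputRunSeq = record.payload?.inputRunSeq;
    const inputRun = inputRunSeq === undefined ? undefined : inputRuns.get(inputRunSeq);
    if (inputRun === undefined) {
      problems.push(`run_complete seq ${record.seq} has no matching input_run`);
    } else if (inputRun.payload?.marker !== record.payload?.marker) {
      problems.push(`run_complete seq ${record.seq} marker differs from input_run seq ${inputRunSeq}`);
    } else if (record.seq <= inputRun.seq) {
      problems.push(`run_complete seq ${record.seq} does not come after its input_run`);
    }
  }
  return problems;
}

// Artifact cleanliness: none of the internal strings may appear in rendered text.
function findLeaks(renderedText: string, internalStrings: string[]): string[] {
  return internalStrings.filter((s) => s.length > 0 && renderedText.includes(s));
}
```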

Because this is a backend refactor, automated tests and event-log/artifact assertions are the release gate. Screenshot/video proof should still be captured for reviewer confidence when the renderer/export environment supports it; if renderer proof cannot run, report the blocker and rely on the targeted integration tests plus snapshot/asciicast checks.

Best-effort cleanup after dogfooding:

npx tsx src/cli/main.ts destroy <session-id> --json
rm -rf "$AGENT_TTY_HOME"

If cleanup commands differ in the current CLI, use the documented session stop/destroy command and remove only the isolated temp home created for this dogfood run.

Acceptance criteria

hostMain delegates waited-run completion coordination to RunCompletionCoordinator while retaining PTY lifecycle, RPC wiring, command dispatch, session state, and renderer catch-up ownership.
No-wait run behavior remains unchanged.
Waited run result fields and semantics remain unchanged for completed, timed-out, and exited sessions.
Timeout removes only the waiter; active sentinel/postamble registrations remain so eventual internal bytes are hidden and run_complete can still append.
Session exit resolves pending run waiters as exited, does not append run_complete by itself, and clears no-longer-possible active completions after final flush.
run_complete events are appended in the same event-log position as today when an active completion sentinel is observed, with the correct marker and inputRunSeq relationship, including output-before-sentinel and output-after-sentinel ordering.
Existing sentinel unit tests and run integration tests pass, especially artifact-cleanliness cases.
New coordinator unit tests cover completion, timeout, exit, append failure, and final flush behavior.
Typecheck and lint pass.
Dogfooding uses isolated AGENT_TTY_HOME and captures reviewer-facing artifact proof when supported.

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh

ThomasK33 · 2026-04-29T13:06:27Z

/coder-agents-review

coder-agents-review

Three P3s, four nits, clean on mechanics and correctness. This is a well-executed extraction that preserves the state machine semantics, simplifies hostMain, and comes with good unit test coverage. The coordinator's two-step prepare/register API models the actual ordering constraint nicely, and the callback interface keeps the dependency boundary narrow.

Hisoka spotted that the refactor silently fixes a TOCTOU window in the old sentinel-uniqueness check: the old code yielded between the uniqueness check and registration, so two concurrent RPCs could both pass the check with identical sentinels. The new code performs both in the synchronous body of registerWaitedRun(). The collision probability was ~2^-32 per pair, so this was dormant, but it is a real improvement.
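The race described here can be reproduced in a few lines. Both functions are illustrative, not the old or new agent-tty code: `pickSentinel` always returns the same value to force the collision that the ~2^-32 odds otherwise hide, and the `await Promise.resolve()` stands in for whatever the old code awaited between check and registration.

```typescript
// Deliberately collision-prone stand-in for random sentinel generation.
const pickSentinel = (): string => "same";

// Racy: the await between check and insert lets a second RPC pass the same check.
async function registerRacy(active: Set<string>): Promise<string> {
  const sentinel = pickSentinel();
  if (active.has(sentinel)) throw new Error("sentinel collision");
  await Promise.resolve(); // any await here opens the window
  active.add(sentinel);
  return sentinel;
}

// Safe: check and insert happen in one synchronous step, so nothing can interleave.
// Real code would regenerate the sentinel instead of throwing.
function registerAtomic(active: Set<string>): string {
  const sentinel = pickSentinel();
  if (active.has(sentinel)) throw new Error("sentinel collision");
  active.add(sentinel);
  return sentinel;
}
```

Running two registerRacy calls concurrently lets both succeed with the same sentinel; with registerAtomic, the second call always sees the first registration.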

Takumi traced all concurrency paths (timeout races, exit ordering, microtask sequencing, timer cleanup, pre-wait sentinel arrival) and found no issues. Mafuuu modeled all four lifecycle modes and confirmed behavioral equivalence. Pariston tried to argue against the extraction and could not. The 2 integration test failures (snapshot exit 9) are pre-existing on the base SHA.

"I tried to build a case against this refactor and could not." (Pariston)

Severity summary: 0 P0, 0 P1, 0 P2, 3 P3, 4 Nit, 3 Note.

🤖 This review was automatically generated with Coder Agents.

ThomasK33 · 2026-04-29T14:01:35Z

/coder-agents-review

coder-agents-review

All seven R1 findings verified as fixed. One new P3, two nits. This is ready to merge after the test gap below.

The fix commit (0ca44b0) is clean and scoped: doc comments added, assertRunMarker deduplicated, postamble sanitization test added, types renamed, inputRunSeq tightened, invariant messages improved, method renamed to resetForExit. No regressions introduced. All 10 reviewers confirmed the fixes.

Bisky found one remaining test gap: the sanitizer-buffered postamble bytes flushed on exit. The existing "flushes pending non-completed sentinel bytes" test covers only the scanner flush, not the sanitizer flush. These are separate code paths in flushPtyDataOnExit().

Process note: the fix commit subject (review: address run completion feedback) uses a non-standard type prefix. Consider refactor: or fix: per project convention.

Severity summary: 0 P0, 0 P1, 0 P2, 1 P3, 2 Nit.

ThomasK33 · 2026-04-29T14:40:16Z

/coder-agents-review

coder-agents-review

All R2 fixes verified. No new findings from any reviewer across round 3. This PR is clean.

13 findings across 3 rounds: 4 P3, 6 Nit, 3 Note. All P3s and Nits fixed. Notes are informational.

The extraction is faithful, the test suite is thorough (12 coordinator tests covering all state machine transitions), and the coordinator boundary is well-designed. The two-step prepare/register protocol, the narrow appender interface, and the private-field encapsulation are all improvements over the old closure-based approach.

"I tried to build a case against this refactoring and could not." (Pariston, R3)

ThomasK33 · 2026-04-29T15:13:14Z

/coder-agents-review

coder-agents-review

Post-rebase verification. PR rebased onto new base (includes merged #68 and #69). The PR's own files are byte-identical to R3. CI is green (11 passed). All prior fixes hold.

No new findings across 4 rounds. 13 total findings (4 P3, 6 Nit, 3 Note), all resolved. Ship it.

coder-agents-review Bot reviewed Apr 29, 2026

coder-agents-review Bot approved these changes Apr 29, 2026

coder-agents-review Bot approved these changes Apr 29, 2026

ThomasK33 added 3 commits April 29, 2026 15:06

- f6666be refactor: extract run completion coordinator
- 389a9f6 review: address run completion feedback
- 38b64a4 test: cover postamble flush on exit

ThomasK33 force-pushed the grill-docs-wyjm branch from 60e8299 to 38b64a4 April 29, 2026 15:08

coder-agents-review Bot approved these changes Apr 29, 2026

ThomasK33 merged commit 4237a84 into main Apr 29, 2026
11 checks passed

ThomasK33 deleted the grill-docs-wyjm branch April 29, 2026 16:22
