Refactor run completion coordination out of hostMain #70

Merged
ThomasK33 merged 3 commits into main from grill-docs-wyjm on Apr 29, 2026
@ThomasK33
Member

Summary

  • Extract waited-run completion bookkeeping from hostMain into a dedicated RunCompletionCoordinator.
  • Keep hostMain focused on PTY/RPC/session/rendering orchestration while delegating sentinel, postamble, waiter, timeout, exit, and run_complete coordination.
  • Add focused coordinator unit tests for completion ordering, multiple active runs, timeout preservation, exit resolution, append failures, and final flush behavior.
  • Update domain glossary wording for Waited Run / Run Completion.

Fixes #62.

User-facing / automation-facing behavior

No public CLI JSON, protocol schema, message shape, or event-log shape changes are intended. Waited run completion, timeout, and session-exit semantics remain the same.

Validation

npm run test -- test/unit/host/runCompletionSentinel.test.ts test/unit/host/runCompletionCoordinator.test.ts
npm run test -- test/integration/run.test.ts
npm run typecheck
npm run lint
npm run format:check

Dogfood proof

Used an isolated AGENT_TTY_HOME, then destroyed the session and removed the isolated home after copying proof artifacts.

Proof bundle:

/tmp/agent-tty-run-completion-proof-nAOV64

Artifacts:

/tmp/agent-tty-run-completion-proof-nAOV64/screenshot.png
/tmp/agent-tty-run-completion-proof-nAOV64/session.cast
/tmp/agent-tty-run-completion-proof-nAOV64/session.webm
/tmp/agent-tty-run-completion-proof-nAOV64/events.jsonl
/tmp/agent-tty-run-completion-proof-nAOV64/summary.json

Verified completed waited run flags, timed-out run flags, late output after timeout, artifact cleanliness, and input_run / run_complete.inputRunSeq relationships.


📋 Implementation Plan

Plan: Issue #62 — Move run completion coordination out of hostMain

Goal

Refactor waited-run completion bookkeeping out of src/host/hostMain.ts into a dedicated host-side coordinator module while preserving all public CLI JSON, protocol/event-log shapes, PTY ingestion ordering, timeout behavior, session-exit behavior, replay/export semantics, and artifact cleanliness.

Evidence and constraints

  • GitHub issue #62 (Move run completion coordination out of the session host) asks to move marker/sentinel creation, postamble echo registration, waiter resolution, timeout behavior, exit resolution, and run_complete appends out of hostMain, while preserving the existing sentinel scanner module.
  • Current implementation locations inspected:
    • src/host/hostMain.ts
      • run-completion helper types/functions near the top (ActiveRunCompletion, waiter/result types, signal sentinel generation, postamble construction)
      • run-completion state and waiter logic near host initialization
      • PTY data ingestion through sentinelScanner.feed(...) and appendSentinelPieces(...)
      • PTY exit flushing through sentinelScanner.flush() and postamble sanitizer flush
      • waited run RPC path that appends input_run, registers completion tracking, writes the command/postamble, waits, then best-effort catches renderer up
    • src/host/runCompletionSentinel.ts
      • keep this as the existing low-level scanner/sanitizer implementation
    • src/protocol/messages.ts and src/protocol/schemas.ts
      • do not change RunParams, RunResult, input_run, or run_complete schemas
    • test/unit/host/runCompletionSentinel.test.ts and test/integration/run.test.ts
      • preserve existing sentinel/postamble and run-command behavior
  • Domain language resolved during grilling:
    • Waited Run: caller asks agent-tty to wait until the command's completion signal is observed.
    • Run Completion: host-observed endpoint for a Waited Run, distinct from Session exit and caller timeout.
    • Caller timeout does not cancel underlying Run Completion; later internal completion bytes must still be recognized/hidden.
    • After Session exit, an unobserved Run Completion can no longer arrive.
  • No ADR is needed: this is an internal refactor boundary, not a hard-to-reverse architecture decision.

Agreed design

New module boundary

Create src/host/runCompletionCoordinator.ts with a RunCompletionCoordinator that owns waited-run completion coordination only.

The coordinator owns:

  • marker creation
  • short random signal-sentinel generation and uniqueness checks
  • postamble construction
  • active completion registration
  • scanner/sanitizer instances
  • waiter subscription/resolution/rejection
  • timeout behavior
  • final exit waiter resolution and active-state cleanup
  • sequential handling of sentinel scanner pieces
  • deciding when to append output and run_complete via injected append callbacks

hostMain continues to own:

  • RPC command registration and parameter normalization
  • no-wait run behavior
  • PTY lifecycle and pty.write(...)
  • the serialized ptyIngestionQueue
  • session status transitions and manifest writes
  • renderer catch-up after completed waited runs
  • public RunResult construction

Dependency shape

Inject narrow append callbacks rather than the full EventLog or host state:

interface RunCompletionEventAppender {
  appendOutput(data: string): Promise<void>;
  appendRunComplete(payload: {
    marker: string;
    inputRunSeq?: number;
  }): Promise<number>;
}
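
A minimal sketch of a test double for this interface, assuming sequence numbers are simply assigned in append order. RecordingAppender and RecordedEvent are hypothetical names used for illustration only; they are not taken from the PR.

```typescript
// Interface as proposed in the plan above.
interface RunCompletionEventAppender {
  appendOutput(data: string): Promise<void>;
  appendRunComplete(payload: {
    marker: string;
    inputRunSeq?: number;
  }): Promise<number>;
}

// Hypothetical recorded-event shape, used only by this sketch.
type RecordedEvent =
  | { type: 'output'; seq: number; data: string }
  | { type: 'run_complete'; seq: number; marker: string; inputRunSeq?: number };

// Hypothetical in-memory appender: records events in call order and hands
// out increasing sequence numbers, so a test can assert on ordering.
class RecordingAppender implements RunCompletionEventAppender {
  readonly events: RecordedEvent[] = [];
  private nextSeq = 0;

  async appendOutput(data: string): Promise<void> {
    this.events.push({ type: 'output', seq: this.nextSeq++, data });
  }

  async appendRunComplete(payload: {
    marker: string;
    inputRunSeq?: number;
  }): Promise<number> {
    const seq = this.nextSeq++;
    this.events.push({ type: 'run_complete', seq, ...payload });
    return seq;
  }
}
```

Because the double depends on nothing but the two callbacks, a coordinator test written against it never needs a real EventLog file.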

hostMain should call coordinator ingestion methods from inside the existing ptyIngestionQueue so event ordering remains canonical.
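
To illustrate why calling the coordinator from inside a serialized queue keeps event ordering canonical, here is a small promise-chain queue. SerialQueue is a hypothetical stand-in; the real ptyIngestionQueue may be implemented differently, and whether it keeps running after a failed task is an assumption of this sketch.

```typescript
// Hypothetical promise-chain queue: each task starts only after the
// previous one has settled, so appends happen in enqueue order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.tail.then(task);
    // ASSUMPTION: a failed task does not block later tasks. The caller still
    // sees the rejection through the returned promise.
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

With a queue like this, a PTY data callback would enqueue the ingestion call rather than invoking it directly, so a slow append for one chunk cannot be overtaken by the next chunk.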

Public contract preservation

Do not change:

  • RunParamsSchema
  • RunResultSchema
  • InputRunEventPayloadSchema
  • RunCompleteEventPayloadSchema
  • event record shapes
  • replay/export behavior
  • waited/no-wait CLI JSON result semantics

In particular, session exit for a waited run remains internally distinct (the { kind: 'exited' } wait outcome) but publicly implicit, reported as neither completed nor timed out:

{
  accepted: true,
  completed: false,
  timedOut: false,
  seq,
  durationMs,
  marker,
}

Do not add an exited public flag in this issue.
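
A sketch of how the three internal wait outcomes could collapse onto the unchanged public flags. toRunResult and RunResultSketch are hypothetical names; the field names follow the example above, and which sequence number is reported as seq is left to the caller because the plan does not spell it out here.

```typescript
type RunCompletionWaitOutcome =
  | { kind: 'completed'; seq: number }
  | { kind: 'timeout' }
  | { kind: 'exited' };

// Hypothetical stand-in for the public result shape shown above.
interface RunResultSketch {
  accepted: true;
  completed: boolean;
  timedOut: boolean;
  seq: number;
  durationMs: number;
  marker: string;
}

// Maps the internal outcome to public flags. An 'exited' outcome yields
// completed: false and timedOut: false, so no new public flag is needed.
function toRunResult(
  outcome: RunCompletionWaitOutcome,
  base: { seq: number; durationMs: number; marker: string },
): RunResultSketch {
  return {
    accepted: true,
    completed: outcome.kind === 'completed',
    timedOut: outcome.kind === 'timeout',
    ...base,
  };
}
```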

Implementation phases

Phase 1 — Extract coordinator module

  1. Add src/host/runCompletionCoordinator.ts.
  2. Move run-completion-specific helpers from hostMain into the new module:
    • shellOctalEscapedBytes(...)
    • signal sentinel generation around buildRunCompleteSignalSentinel(...)
    • buildRunCompletePostamble(...)
    • run-completion waiter/result types
    • active completion bookkeeping
  3. Keep RunCompletionSentinelScanner and RunCompletionPostambleEchoSanitizer in src/host/runCompletionSentinel.ts.
  4. Preserve existing defensive invariants, including:
    • generated marker matches RUN_MARKER_PATTERN
    • sentinel is non-empty and unique among active completions
    • inputRunSeq is a non-negative integer
    • active marker registration is unique
    • wait timeout is a positive integer
    • scanner-emitted run_complete maps to an active completion
    • appendRunComplete(...) returns a non-negative integer sequence
  5. Prefer behavior-based tests over exposing internal active-state getters.
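
The sentinel invariants in step 4 (non-empty, unique among active completions) can be sketched as a synchronous check-and-register. ActiveSentinels, its methods, and the maxAttempts limit are hypothetical; the sentinel generator is injected because the actual sentinel format is not described here.

```typescript
// Hypothetical registry of sentinels for active completions.
class ActiveSentinels {
  private readonly markerBySentinel = new Map<string, string>();

  // Synchronous on purpose: there is no await between the uniqueness check
  // and the insertion, so two callers cannot both claim the same sentinel.
  register(marker: string, generateSentinel: () => string, maxAttempts = 16): string {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const sentinel = generateSentinel();
      if (sentinel.length === 0) {
        throw new Error('invariant: sentinel must be non-empty');
      }
      if (this.markerBySentinel.has(sentinel)) continue; // collision: regenerate
      this.markerBySentinel.set(sentinel, marker);
      return sentinel;
    }
    throw new Error('invariant: could not generate a unique sentinel');
  }

  // Frees a sentinel once its completion has been observed or cleared.
  release(sentinel: string): void {
    this.markerBySentinel.delete(sentinel);
  }
}
```

Regenerating on collision, rather than throwing, keeps a collision from surfacing as a failure to the caller; only the exhausted-attempts case is treated as a programmer-error invariant.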

Proposed public/internal coordinator API:

class RunCompletionCoordinator {
  constructor(appender: RunCompletionEventAppender);

  prepareWaitedRun(): PreparedWaitedRun;

  registerWaitedRun(params: {
    marker: string;
    inputRunSeq: number;
  }): RegisteredWaitedRunCompletion;

  ingestPtyData(data: string): Promise<void>;

  flushPtyDataOnExit(): Promise<void>;

  resolvePendingWaitersForExit(): void;
}

interface PreparedWaitedRun {
  marker: string;
}

interface RegisteredWaitedRunCompletion {
  postamble: string;
  sentinel: string;
  wait(timeoutMs: number): Promise<RunCompletionWaitOutcome>;
}

type RunCompletionWaitOutcome =
  | { kind: 'completed'; seq: number }
  | { kind: 'timeout' }
  | { kind: 'exited' };

Notes:

  • prepareWaitedRun() should be pure marker preparation. It must not mutate scanner/sanitizer/active-completion state.
  • registerWaitedRun(...) should happen only after input_run appends successfully, and should finalize a unique short sentinel plus postamble before the PTY write.
  • Registration must not introduce a new failure mode where input_run is committed and then registration throws because a prepared sentinel collided with another active run. Generate or regenerate the sentinel/postamble during registration so sentinel uniqueness is guaranteed before returning postamble; only impossible/programmer-error invariants may throw. Marker collision remains the same effectively-impossible UUID risk as today.
  • prepareWaitedRun() should not receive command; command validation and PTY writing stay in hostMain.
  • wait(timeoutMs) must clear timeout timers on completed/exited/rejected outcomes, and the timeout callback must no-op after the wait has resolved.
  • Keep marker format exactly __AT_MARKER_${crypto.randomUUID().replace(/-/g, '')}__.
  • Keep production waited runs using short random signal sentinels, not full marker-containing APC sentinels.
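
The timer rule in the notes above (clear the timer on every non-timeout outcome, and make a late timeout callback a no-op) can be sketched with a settle-once waiter. createWaiter and its method names are hypothetical and are not the coordinator's actual API.

```typescript
type RunCompletionWaitOutcome =
  | { kind: 'completed'; seq: number }
  | { kind: 'timeout' }
  | { kind: 'exited' };

// Hypothetical helper: a single waiter that settles at most once.
function createWaiter(timeoutMs: number) {
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error('invariant: wait timeout must be a positive integer');
  }
  let settled = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let resolveFn!: (outcome: RunCompletionWaitOutcome) => void;
  let rejectFn!: (error: unknown) => void;

  const promise = new Promise<RunCompletionWaitOutcome>((resolve, reject) => {
    resolveFn = resolve;
    rejectFn = reject;
  });

  // Runs the settle action once and always clears the pending timer.
  const settleOnce = (action: () => void): void => {
    if (settled) return; // late timeout or duplicate signal: no-op
    settled = true;
    if (timer !== undefined) clearTimeout(timer);
    action();
  };

  timer = setTimeout(() => settleOnce(() => resolveFn({ kind: 'timeout' })), timeoutMs);

  return {
    promise,
    complete: (seq: number) => settleOnce(() => resolveFn({ kind: 'completed', seq })),
    exit: () => settleOnce(() => resolveFn({ kind: 'exited' })),
    fail: (error: unknown) => settleOnce(() => rejectFn(error)),
  };
}
```

This sketch settles only the waiter. Keeping the sentinel and postamble registrations alive after a timeout, so late bytes stay hidden, is a separate concern that the plan assigns to the coordinator's active-completion state.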

Phase 1 quality gate

  • New module typechecks locally with no public schema/protocol edits.
  • Existing sentinel scanner tests still pass after any import/export adjustments.

Phase 2 — Wire hostMain to coordinator

  1. Replace local run-completion state in hostMain with one RunCompletionCoordinator instance.
  2. Provide append callbacks that delegate to the existing eventLog.append(...) calls:
    • appendOutput(data) → eventLog.append('output', { data })
    • appendRunComplete(payload) → eventLog.append('run_complete', payload)
  3. Preserve the no-wait run branch in hostMain unchanged except for removed local helper references.
  4. Preserve waited-run order:
    1. validate command/timeout in hostMain
    2. const prepared = runCompletion.prepareWaitedRun()
    3. append input_run with command, prepared.marker, and noWait
    4. const completion = runCompletion.registerWaitedRun({ marker: prepared.marker, inputRunSeq: seq }), finalizing a unique sentinel/postamble before PTY write
    5. write ${command}\n${completion.postamble} to PTY
    6. wait with explicit effective timeout via completion.wait(effectiveTimeoutMs)
    7. if completed, best-effort replayRendererThroughSeq(seq) in hostMain
    8. return the same RunResult shape as today
  5. Keep durationMs and the default 30_000 timeout in hostMain.
  6. Keep PTY ingestion serialized by calling:
    • runCompletion.ingestPtyData(data) from the existing pty.onData queue callback
    • runCompletion.flushPtyDataOnExit() from the existing pty.onExit queue callback before handlePtyExit(...)
  7. Keep exit ordering:
    1. final ingestion flush gets the last chance to observe a sentinel
    2. attempt to append exit
    3. call runCompletion.resolvePendingWaitersForExit() in the same finally position where waiters are resolved today, so pending waiters resolve after the attempted exit append even if that append throws
    4. write manifest and initiate shutdown
  8. resolvePendingWaitersForExit() should be idempotent and clear remaining active completion bookkeeping after final flush.
  9. Prefer clearing coordinator state by replacing its internal scanner/sanitizer instances after final flush instead of adding broader clear() APIs to runCompletionSentinel.ts, unless implementation proves awkward.
  10. Do not add a closed state to the coordinator; if data is accidentally ingested after cleanup, it can pass through as ordinary output because no active sentinels/echoes remain.
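
The exit ordering in step 7 can be sketched with try/finally. handleExitSketch, ExitCoordinator, appendExit, and writeManifestAndShutdown are hypothetical names; the plan does not say whether the manifest write still runs when the exit append throws, so this sketch simply lets that error propagate.

```typescript
// Hypothetical minimal view of the coordinator, limited to exit handling.
interface ExitCoordinator {
  flushPtyDataOnExit(): Promise<void>;
  resolvePendingWaitersForExit(): void;
}

async function handleExitSketch(
  coordinator: ExitCoordinator,
  appendExit: () => Promise<void>,
  writeManifestAndShutdown: () => Promise<void>,
): Promise<void> {
  // 1. The final flush gets the last chance to observe a sentinel.
  await coordinator.flushPtyDataOnExit();
  try {
    // 2. Attempt to append the exit event; this may throw.
    await appendExit();
  } finally {
    // 3. Pending waiters resolve after the attempted append, even on failure.
    coordinator.resolvePendingWaitersForExit();
  }
  // 4. Reached only when the exit append succeeded (an assumption, see above).
  await writeManifestAndShutdown();
}
```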

Phase 2 quality gate

  • Review the diff to confirm hostMain still owns PTY/RPC/session/rendering orchestration and only delegates run-completion-specific bookkeeping.
  • Confirm no changes to protocol schemas/messages/event record shapes.

Phase 3 — Add focused coordinator tests

Add test/unit/host/runCompletionCoordinator.test.ts using fake appenders rather than real EventLog files.

Cover at least:

  1. Completion append path and ordering
    • register a waited run
    • feed before + sentinel + after
    • assert event order is output(before), then run_complete, then output(after)
    • assert run_complete carries the correct marker and inputRunSeq
    • assert waiter resolves { kind: 'completed', seq }
    • if practical, include trailing postamble sanitizer output ordering without duplicating scanner tests
  2. Multiple active waited runs
    • register two waited runs
    • complete them out of order
    • assert each waiter resolves with the correct marker/sequence and each run_complete.inputRunSeq points to the right input_run sequence
  3. Timeout preserves hidden-byte registrations
    • use fake timers for deterministic timeout behavior
    • wait with a short timeout
    • assert waiter resolves { kind: 'timeout' }
    • feed the sentinel later
    • assert sentinel/postamble bytes do not append as output
    • assert run_complete still appends
  4. Timeout followed by append failure
    • time out a waiter
    • make appendRunComplete(...) throw when the later sentinel arrives
    • assert ingestion rejects/fails loudly even though the original waiter is gone
  5. Session exit waiter resolution
    • register a waited run
    • final flush without sentinel
    • call resolvePendingWaitersForExit()
    • assert waiter resolves { kind: 'exited' }
    • assert no run_complete is appended by exit alone
  6. Exit before timeout timer fires
    • use fake timers
    • resolve exit before timeout
    • assert wait resolves { kind: 'exited' }
    • advance timers and assert timeout cannot mutate state afterward
  7. Appender failure propagation
    • make appendRunComplete(...) throw for an active, non-timed-out waiter
    • feed sentinel
    • assert ingestion rejects and the relevant waiter rejects
    • also cover or explicitly preserve that trailing echo output append failure during sentinel handling rejects the relevant waiter, matching today
    • separately note that ordinary output append failure before any sentinel should propagate as ingestion failure without inventing new waiter semantics
  8. Final flush preserves non-sentinel pending output
    • if practical without duplicating scanner tests, verify partial non-sentinel tails flush as output.
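
The ordering asserted in the first case (output before, then run_complete, then output after) can be illustrated with a toy splitter. splitAroundSentinel is hypothetical and is not the real RunCompletionSentinelScanner: it assumes the whole sentinel arrives within a single chunk and does no postamble sanitizing.

```typescript
type Emitted =
  | { type: 'output'; data: string }
  | { type: 'run_complete'; marker: string };

// Toy illustration only: splits one chunk around the first occurrence of
// the sentinel and hides the sentinel bytes from the emitted output.
function splitAroundSentinel(chunk: string, sentinel: string, marker: string): Emitted[] {
  const index = chunk.indexOf(sentinel);
  if (index === -1) {
    return chunk === '' ? [] : [{ type: 'output', data: chunk }];
  }
  const before = chunk.slice(0, index);
  const after = chunk.slice(index + sentinel.length);
  const events: Emitted[] = [];
  if (before !== '') events.push({ type: 'output', data: before });
  events.push({ type: 'run_complete', marker });
  if (after !== '') events.push({ type: 'output', data: after });
  return events;
}
```

A coordinator test would make the same three assertions against the fake appender's recorded events instead of a returned array.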

Keep test/unit/host/runCompletionSentinel.test.ts focused on scanner/sanitizer algorithms; do not rewrite those algorithms.

Phase 3 quality gate

  • Coordinator unit tests prove the extracted state machine independently.
  • Existing sentinel tests remain the low-level scanner/sanitizer safety net.

Phase 4 — Focused validation

Run the narrowest useful validation first:

npm run test -- test/unit/host/runCompletionSentinel.test.ts test/unit/host/runCompletionCoordinator.test.ts
npm run test -- test/integration/run.test.ts
npm run typecheck
npm run lint

If the repo/environment has mise available and this is release-sensitive, also run:

mise run ci

If any validation cannot run, report the exact command, failure/blocker, and next-best check.

Phase 5 — CLI dogfooding and proof bundle

Use an isolated absolute home; never mutate the real ~/.agent-tty:

export AGENT_TTY_HOME="$(mktemp -d)"
npx tsx src/cli/main.ts start --json -- bash

Then, using the returned session id:

npx tsx src/cli/main.ts run <session-id> --json -- 'printf hello'
npx tsx src/cli/main.ts run <session-id> --json --timeout-ms 50 -- 'sleep 0.2; printf done'
# Poll/wait long enough for the timed-out command to actually finish before artifact checks.
# One simple option is to issue a subsequent waited no-op command, which queues after the prior shell work:
npx tsx src/cli/main.ts run <session-id> --json --timeout-ms 1000 -- 'true'
npx tsx src/cli/main.ts snapshot <session-id> --json --format text
npx tsx src/cli/main.ts screenshot <session-id> --json
npx tsx src/cli/main.ts record export <session-id> --format asciicast --json

If WebM export is available and inexpensive in the local environment, also capture a video proof:

npx tsx src/cli/main.ts record export <session-id> --format webm --json

Verify and document:

  • completed waited run reports completed: true, timedOut: false
  • timed-out run reports completed: false, timedOut: true
  • later command output appears after timeout when it eventually arrives
  • sentinel text, marker text, and postamble echo text are absent from snapshot, screenshot-rendered output, asciicast, and WebM if captured
  • events.jsonl contains matching input_run and later run_complete records with run_complete.inputRunSeq pointing to the input_run sequence
  • attach or list the screenshot path and recording/export paths in the final implementation report or PR body
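
The events.jsonl check above could be automated along these lines. The record shape is an assumption: this sketch supposes each line is a JSON object with seq, type, and payload fields and that input_run payloads carry the marker, which should be confirmed against the real event log before relying on it. checkRunCompleteLinks is a hypothetical helper.

```typescript
// ASSUMPTION: one JSON object per line with `seq`, `type`, and `payload`.
interface EventRecordSketch {
  seq: number;
  type: string;
  payload?: { marker?: string; inputRunSeq?: number };
}

// Returns a list of human-readable problems; an empty list means every
// run_complete points at an earlier input_run with the same marker.
function checkRunCompleteLinks(jsonl: string): string[] {
  const problems: string[] = [];
  const markerByInputRunSeq = new Map<number, string | undefined>();
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    const record = JSON.parse(line) as EventRecordSketch;
    if (record.type === 'input_run') {
      markerByInputRunSeq.set(record.seq, record.payload?.marker);
    } else if (record.type === 'run_complete') {
      const inputRunSeq = record.payload?.inputRunSeq;
      if (inputRunSeq === undefined || !markerByInputRunSeq.has(inputRunSeq)) {
        problems.push(`run_complete seq ${record.seq} has no earlier input_run`);
      } else if (markerByInputRunSeq.get(inputRunSeq) !== record.payload?.marker) {
        problems.push(`run_complete seq ${record.seq} marker mismatch`);
      }
    }
  }
  return problems;
}
```

Reading the file contents and passing them to the function is left to the caller so the check itself stays free of filesystem access.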

Because this is a backend refactor, automated tests and event-log/artifact assertions are the release gate. Screenshot/video proof should still be captured for reviewer confidence when the renderer/export environment supports it; if renderer proof cannot run, report the blocker and rely on the targeted integration tests plus snapshot/asciicast checks.

Best-effort cleanup after dogfooding:

npx tsx src/cli/main.ts destroy <session-id> --json
rm -rf "$AGENT_TTY_HOME"

If cleanup commands differ in the current CLI, use the documented session stop/destroy command and remove only the isolated temp home created for this dogfood run.

Acceptance criteria

  • hostMain delegates waited-run completion coordination to RunCompletionCoordinator while retaining PTY lifecycle, RPC wiring, command dispatch, session state, and renderer catch-up ownership.
  • No-wait run behavior remains unchanged.
  • Waited run result fields and semantics remain unchanged for completed, timed-out, and exited sessions.
  • Timeout removes only the waiter; active sentinel/postamble registrations remain so eventual internal bytes are hidden and run_complete can still append.
  • Session exit resolves pending run waiters as exited, does not append run_complete by itself, and clears no-longer-possible active completions after final flush.
  • run_complete events are appended in the same event-log position as today when an active completion sentinel is observed, with the correct marker and inputRunSeq relationship, including output-before-sentinel and output-after-sentinel ordering.
  • Existing sentinel unit tests and run integration tests pass, especially artifact-cleanliness cases.
  • New coordinator unit tests cover completion, timeout, exit, append failure, and final flush behavior.
  • Typecheck and lint pass.
  • Dogfooding uses isolated AGENT_TTY_HOME and captures reviewer-facing artifact proof when supported.

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh

@ThomasK33
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Three P3s, four nits, clean on mechanics and correctness. This is a well-executed extraction that preserves the state machine semantics, simplifies hostMain, and comes with good unit test coverage. The coordinator's two-step prepare/register API models the actual ordering constraint nicely, and the callback interface keeps the dependency boundary narrow.

Hisoka spotted that the refactor silently fixes a TOCTOU window in the old sentinel-uniqueness check: old code yielded between the uniqueness check and registration, letting two concurrent RPCs pass identical sentinels. New code moves both into the synchronous body of registerWaitedRun(). The collision probability was ~2^-32 per pair, so this was dormant, but it is a real improvement.

Takumi traced all concurrency paths (timeout races, exit ordering, microtask sequencing, timer cleanup, pre-wait sentinel arrival) and found no issues. Mafuuu modeled all four lifecycle modes and confirmed behavioral equivalence. Pariston tried to argue against the extraction and could not. The 2 integration test failures (snapshot exit 9) are pre-existing on the base SHA.

"I tried to build a case against this refactor and could not." (Pariston)

Severity summary: 0 P0, 0 P1, 0 P2, 3 P3, 4 Nit, 3 Note.

🤖 This review was automatically generated with Coder Agents.

Comment thread src/host/runCompletionCoordinator.ts
Comment thread src/host/runCompletionCoordinator.ts Outdated
Comment thread test/unit/host/runCompletionCoordinator.test.ts
Comment thread src/host/runCompletionCoordinator.ts
Comment thread src/host/runCompletionCoordinator.ts
Comment thread src/host/runCompletionCoordinator.ts Outdated
Comment thread src/host/runCompletionCoordinator.ts Outdated
Comment thread src/host/runCompletionCoordinator.ts
@ThomasK33
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

All seven R1 findings verified as fixed. One new P3, two nits. This is ready to merge after the test gap below.

The fix commit (0ca44b0) is clean and scoped: doc comments added, assertRunMarker deduplicated, postamble sanitization test added, types renamed, inputRunSeq tightened, invariant messages improved, method renamed to resetForExit. No regressions introduced. All 10 reviewers confirmed the fixes.

Bisky found one remaining test gap: the sanitizer-buffered postamble bytes flushed on exit. The existing "flushes pending non-completed sentinel bytes" test covers only the scanner flush, not the sanitizer flush. These are separate code paths in flushPtyDataOnExit().

Process note: the fix commit subject (review: address run completion feedback) uses a non-standard type prefix. Consider refactor: or fix: per project convention.

Severity summary: 0 P0, 0 P1, 0 P2, 1 P3, 2 Nit.

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/host/runCompletionCoordinator.test.ts
Comment thread src/host/runCompletionCoordinator.ts Outdated
Comment thread src/host/runCompletionCoordinator.ts Outdated
@ThomasK33
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

All R2 fixes verified. No new findings from any reviewer across round 3. This PR is clean.

13 findings across 3 rounds: 4 P3, 6 Nit, 3 Note. All P3s and Nits fixed. Notes are informational.

The extraction is faithful, the test suite is thorough (12 coordinator tests covering all state machine transitions), and the coordinator boundary is well-designed. The two-step prepare/register protocol, the narrow appender interface, and the private-field encapsulation are all improvements over the old closure-based approach.

"I tried to build a case against this refactoring and could not." (Pariston, R3)

🤖 This review was automatically generated with Coder Agents.

@ThomasK33
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Post-rebase verification. PR rebased onto new base (includes merged #68 and #69). The PR's own files are byte-identical to R3. CI is green (11 passed). All prior fixes hold.

No new findings across 4 rounds. 13 total findings (4 P3, 6 Nit, 3 Note), all resolved. Ship it.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33 ThomasK33 merged commit 4237a84 into main Apr 29, 2026
11 checks passed
@ThomasK33 ThomasK33 deleted the grill-docs-wyjm branch April 29, 2026 16:22
