fix(pilot): surface failure reasons in terminal + break single-slot cascade + cross-worktree discover #125
Merged
…ascade + cross-worktree discover
When a task fails in 'pilot build', make the reason visible on stderr and
in the run summary — no more 'task.failed T1 in 306s' followed by 10 bare
'task.blocked' lines with no signal about what actually broke.
- startStreamingLogger prints ' → <phase>: <reason>' beneath each
task.failed event. Blocked cascade de-noised to one summary line on
run.finished ('blocked: N task(s) waiting on failed dependency (...)').
- printSummary accepts the DB handle and renders a per-failed-task
block (phase, reason, session, worktree, elapsed, attempts) between
the counts line and the follow-up commands. Successful runs render
identically to before.
- WorktreePool retires preserved-on-failure slots into a separate
retiredSlots list on next acquire() and mints a fresh stub with a
'-<counter>'-suffixed path. One failure no longer cascade-blocks the
whole run.
- Every task.failed event payload from runOneTask now carries a 'reason'
string matching tasks.last_error. Stall payloads additionally carry
eventCount + lastEventTs. Per-task session.jsonl is written to
<runDir>/tasks/<taskId>/session.jsonl (the path AGENTS.md already
documents) for post-mortem.
- discoverRun tries the cwd-scoped path first, then falls back to
scanning '<base>/<repoFolder>/pilot/runs/<runId>/state.db' across
every repo folder. 'pilot status --run <id>' / 'pilot logs --run <id>'
now work from any worktree.
…tests
Bun's bundled SQLite on macOS CI rejects `1_700_000_000_000` as an unrecognized token when it appears inside the VALUES list of an INSERT statement. My local Bun tolerated it but GitHub Actions macos-latest does not, so the printSummary failure-block tests failed with "SQLiteError: unrecognized token" while Ubuntu (and my machine) passed.
Numeric separators in JS parameter-binding arrays are fine (JS parser owns those); they only break when baked into the SQL string. Replaced all three in-SQL occurrences with the plain numeric literals.
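The distinction in this commit message can be illustrated without a database. The sketch below is an illustration only: the `events` table and `ts` column are hypothetical names, and no SQLite call is made. It shows where the underscores survive (inside SQL text) and where they do not (JS number values).

```typescript
// Hypothetical table/column names; only the handling of the literal matters.
const startedAt = 1_700_000_000_000; // the JS parser strips the separators

// Risky: the separators are part of the SQL text, so the SQLite tokenizer
// sees them and may reject the statement.
const bakedIn = "INSERT INTO events (ts) VALUES (1_700_000_000_000)";

// Safe: the SQL text only holds a placeholder; the value is a JS number.
const bound = {
  sql: "INSERT INTO events (ts) VALUES (?)",
  params: [startedAt],
};

// Also safe: interpolating the number renders plain digits.
const interpolated = `INSERT INTO events (ts) VALUES (${startedAt})`;

console.log(bakedIn.includes("_"));      // true: underscores reach SQLite
console.log(bound.sql.includes("_"));    // false
console.log(interpolated.includes("_")); // false
```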
Pilot terminal failure visibility
Goal
Make `pilot build` failures diagnosable from the terminal output alone.

Today when a task fails, stderr says `task.failed T1 in 306s` and prints 10 `task.blocked` lines: no failure phase, no reason string, no sessionID, no worktree path. The operator has to `sqlite3` into a per-repo `state.db` that `pilot status --run <id>` itself can't find from another worktree. Three compounding bugs make unattended runs effectively undebuggable: (1) `task.failed` event payloads carry `{phase, reason}` but the streaming logger and summary discard both; (2) the single-slot `WorktreePool` poisons the next task on any failure, cascading every downstream task to `blocked`; (3) `discoverRun` resolves `state.db` relative to `cwd`, breaking cross-worktree lookups.

After this plan lands: failure phase + reason are visible inline, the summary carries a per-failed-task detail block with session/worktree, blocked-cascade is de-noised to one summary line, a pool preserve doesn't poison the rest of the run, per-task SSE logs are captured to `runs/<id>/tasks/<task>/session.jsonl` for post-mortem, and `pilot status --run <id>` works from any worktree.

The underlying opencode stall that triggered the 300s silence on T1-AUDIT-DOC is explicitly out of scope: that's opencode-layer, not pilot-layer. Per-task forensics make the NEXT such stall diagnosable.
Constraints
- `~/.glorious/opencode/<repo>/pilot/` (root AGENTS.md rule 10). No writes outside.
- `runs` table: cross-repo discover is done via filesystem scan, not by persisting origin repo on the run row.
- work edits the pilot infrastructure itself, not the agents it invokes.
- symbol names pin the contract surface.
- `patch` bump with a user-facing CHANGELOG line.
- `printSummary` style). Scannability comes from indentation + blank-line separation.
Acceptance criteria
Forensics + pool (do first — data source for the rest)
- `src/pilot/paths.ts` exports `getTaskJsonlPath(cwd, runId, taskId)` returning `<runDir>/tasks/<taskId>/session.jsonl`, creating the parent `tasks/<taskId>/` directory on first access. Rejects unsafe taskIds via the same character-class guard as `isSafeRunId`.
- Per-task session events are written to a JSONL file (one JSON object per line). Subscription is task-scoped (not per-attempt), teed from the existing `deps.bus.on(sessionId, ...)` hook.
- Every `appendEvent({kind: "task.failed", payload})` in `src/pilot/worker/worker.ts` carries a `reason` string matching the one stored in `tasks.last_error`. No bare `{phase}`-only payloads remain in any failure branch.
- The stall-path `task.failed` payload additionally carries `eventCount` (events observed during the wait) and `lastEventTs` (epoch ms of the last event, or `null` if zero events).
- `WorktreePool.acquire()` after `preserveOnFailure(slot)` returns a NEW slot (`preserved: false`, `prepared: false`, fresh path after `prepare`). The preserved slot is tracked in a separate `retiredSlots` list, still cleaned up by `shutdown({ keepPreserved: false })`, still listed by `inspect()`.
- `WorktreePool.prepare()` on a retried slot appends `-<counter>` to the path from `worktreeDirOf(n)` when `retryCounter.get(n) > 0`.
- Worker count invariant (v0.1 = 1 concurrent) is preserved.
Terminal visibility (consumes the enriched data)
- `startStreamingLogger` in `src/pilot/cli/build.ts` prints, after the existing `task.failed <id> in <s>s` line, a continuation line of the form `→ <phase>: <reason>` (two-space indent, arrow prefix). Tolerates payloads missing `phase`/`reason` by omitting the `→` line. `reason` truncated to 200 chars with `…` if longer.
- `startStreamingLogger` no longer prints one line per `task.blocked` event. Instead it counts blocked events and emits a single summary line on `run.finished`: `blocked: N task(s) waiting on failed dependency (<firstReason>)`.
- `printSummary` accepts the open DB handle and, when the run has failed/aborted tasks, prints a per-task detail block between the counts line and the follow-up commands:
- `task.failed` event for that `(runId, taskId)`; reason falls back to `last_error` if the event payload is missing it.
- `Failed tasks (0):` heading.
Cross-worktree discover
- `resolveBaseDir` in `src/pilot/paths.ts` is exported (minimal change: add `export`).
- `discoverRun({cwd, runId})` tries `getStateDbPath(cwd, runId)` first (fast path), then falls back to scanning direct children of `<base>` for `<child>/pilot/runs/<runId>/state.db`. Returns the first hit. On miss, throws listing every path tried.
- `pilot status` / `pilot logs` / `pilot cost` inherit the fix because they all route through `discoverRun`.
Release hygiene
- `.changeset/pilot-terminal-failure-visibility.md` with `"@glrs-dev/harness-opencode": patch` frontmatter and a 1-2 sentence user-facing body.
Verification
- `bun run typecheck` green.
- `bun test` green (all suites, including existing pilot tests).
- `bun run build` green (dist emits without error).
File-level changes
src/pilot/paths.ts
- Export `resolveBaseDir` (add `export` keyword, no body change). Add `getTaskJsonlPath(cwd, runId, taskId)` and matching `isSafeTaskId` guard (same character class as `isSafeRunId`).
- needs base-dir access for cross-repo scans.
- `resolveBaseDir` visibility change is trivial).
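A minimal sketch of the path helper described above, under stated assumptions: the character class behind `isSafeRunId` is not shown in this plan, so the `SAFE_ID` pattern here is a guess, and the helper takes `runDir` directly instead of the planned `(cwd, runId, taskId)` signature because the run-directory resolution is not part of this section.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed character class; the real isSafeRunId guard may differ.
const SAFE_ID = /^[A-Za-z0-9][A-Za-z0-9_-]*$/;

function isSafeTaskId(taskId: string): boolean {
  return SAFE_ID.test(taskId);
}

// Hypothetical variant that receives runDir instead of (cwd, runId).
function taskJsonlPathSketch(runDir: string, taskId: string): string {
  if (!isSafeTaskId(taskId)) {
    throw new Error(`unsafe taskId: ${JSON.stringify(taskId)}`);
  }
  const dir = path.join(runDir, "tasks", taskId);
  fs.mkdirSync(dir, { recursive: true }); // create tasks/<taskId>/ on first access
  return path.join(dir, "session.jsonl");
}

// Usage against a throwaway directory.
const runDir = fs.mkdtempSync(path.join(os.tmpdir(), "pilot-run-"));
const p = taskJsonlPathSketch(runDir, "T1-AUDIT-DOC");
console.log(p.endsWith(path.join("tasks", "T1-AUDIT-DOC", "session.jsonl"))); // true
```

Rejecting ids before any `path.join` keeps `../` and dot-prefixed names from ever reaching the filesystem call.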
src/pilot/worktree/pool.ts
- Add `retiredSlots: WorktreeSlot[]` and `retryCounter: Map<number, number>` to `WorktreePool`. Rework `acquire()` to retire a preserved slot into `retiredSlots`, bump the counter, and mint a fresh stub. Rework `prepare()` so retried slots resolve to `<worktreeDirOf(n)>-<counter>`. Update `shutdown()` and `inspect()` to cover both live and retired slots.
- slot blocks all downstream tasks.
- existing `workerCount=1` invariant, the existing `WorktreePool.prepare` "preserved" throw (as defence in depth), and the existing `"preserved slots are not reusable"` test (still passes).
src/pilot/worker/worker.ts
- `session.create` in `runOneTask`, open an append-mode JSONL writer at `getTaskJsonlPath(cwd, runId, taskId)`. Register a task-scoped bus subscription that (a) appends each event as a JSON line and (b) tracks `lastEventTs` + `eventCount` counters. Ensure the subscription unsubscribes on every return path. Add `reason: <string>` to every `appendEvent({kind: "task.failed", payload})` call: the `reason` local variable is already computed for `markFailedSafe`; just carry it into the event. Stall path additionally carries `eventCount` + `lastEventTs`.
- and summary; persists forensic state for post-mortem of silent stalls.
- existing event fields removed or renamed. New JSONL writes are best-effort (try/catch; failure doesn't fail the task).
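One way the task-scoped tee could look, as a sketch. `BusSketch` is a stand-in for the real bus (only an `on(sessionId, fn)` that returns an unsubscribe function is assumed), and the `phase` and `reason` values in the stall payload are invented for illustration.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Listener = (event: unknown) => void;

// Minimal stand-in for the real event bus.
class BusSketch {
  private listeners = new Map<string, Set<Listener>>();
  on(sessionId: string, fn: Listener): () => void {
    const set = this.listeners.get(sessionId) ?? new Set<Listener>();
    set.add(fn);
    this.listeners.set(sessionId, set);
    return () => { set.delete(fn); };
  }
  emit(sessionId: string, event: unknown): void {
    for (const fn of this.listeners.get(sessionId) ?? []) fn(event);
  }
}

interface TeeSketch {
  stats: { eventCount: number; lastEventTs: number | null };
  stop: () => void;
}

function teeSessionEvents(bus: BusSketch, sessionId: string, file: string): TeeSketch {
  const stats: TeeSketch["stats"] = { eventCount: 0, lastEventTs: null };
  const stop = bus.on(sessionId, (event) => {
    stats.eventCount += 1;
    stats.lastEventTs = Date.now(); // epoch ms of the last event
    try {
      fs.appendFileSync(file, JSON.stringify(event) + "\n");
    } catch {
      // Best-effort: a failed forensic write must not fail the task.
    }
  });
  return { stats, stop };
}

const jsonlPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "pilot-tee-")), "session.jsonl");
const bus = new BusSketch();
const tee = teeSessionEvents(bus, "sess-1", jsonlPath);
try {
  bus.emit("sess-1", { type: "message", text: "hello" });
  bus.emit("sess-1", { type: "tool", name: "read" });
} finally {
  tee.stop(); // unsubscribe on every return path
}
bus.emit("sess-1", { type: "late" }); // ignored after stop()

// Hypothetical stall payload: phase and reason strings are made up here.
const stallPayload = { phase: "stall", reason: "no events for 300s", ...tee.stats };
console.log(stallPayload.eventCount); // 2
```

Wrapping the task body in `try`/`finally` is one simple way to guarantee the unsubscribe runs on every return path.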
src/pilot/cli/build.ts
- `startStreamingLogger`: on `task.failed`, emit a continuation `→ <phase>: <reason>` line when payload carries both fields. On `task.blocked`, count instead of print. On `run.finished`, emit the blocked-cascade summary line if any occurred. Extend `printSummary`: accept `db: Database` in args; when `counts.failed + counts.aborted > 0`, render the per-task failure block between the counts line and the follow-up commands. Update `executeRun`'s call to `printSummary` to pass the db handle.
- of this plan.
- fail" so success-path output is byte-identical to today.
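The streaming-logger rules above can be sketched as a small event handler. The event shape (`taskId`, `elapsedSec`) is hypothetical, `<firstReason>` is assumed to mean the reason of the first failed task, the `unknown` fallback is an invention of this sketch, and truncation is assumed to keep 200 characters and then append `…`.

```typescript
interface PilotEventSketch {
  kind: string;
  taskId?: string;
  elapsedSec?: number;
  payload?: { phase?: string; reason?: string };
}

const MAX_REASON = 200;

function truncateReason(reason: string): string {
  return reason.length > MAX_REASON ? reason.slice(0, MAX_REASON) + "…" : reason;
}

function createLoggerSketch(write: (line: string) => void) {
  let blocked = 0;
  let firstReason: string | undefined;

  return (event: PilotEventSketch): void => {
    if (event.kind === "task.failed") {
      write(`task.failed ${event.taskId} in ${event.elapsedSec}s`);
      const { phase, reason } = event.payload ?? {};
      if (phase && reason) {
        if (firstReason === undefined) firstReason = reason;
        write(`  → ${phase}: ${truncateReason(reason)}`);
      }
      // Payloads missing phase or reason simply get no continuation line.
    } else if (event.kind === "task.blocked") {
      blocked += 1; // count instead of printing one line per event
    } else if (event.kind === "run.finished" && blocked > 0) {
      write(`blocked: ${blocked} task(s) waiting on failed dependency (${firstReason ?? "unknown"})`);
    }
  };
}

const lines: string[] = [];
const log = createLoggerSketch((l) => { lines.push(l); });
log({ kind: "task.failed", taskId: "T1", elapsedSec: 306, payload: { phase: "stall", reason: "no events for 300s" } });
log({ kind: "task.blocked", taskId: "T2" });
log({ kind: "task.blocked", taskId: "T3" });
log({ kind: "run.finished" });
console.log(lines.join("\n"));
// task.failed T1 in 306s
//   → stall: no events for 300s
// blocked: 2 task(s) waiting on failed dependency (no events for 300s)
```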
src/pilot/cli/discover.ts
- `runId !== undefined` branch. Fast-path `getStateDbPath(cwd, runId)` unchanged. On miss, compute `resolveBaseDir()`, `fs.readdir` its direct children, and check `<base>/<child>/pilot/runs/<runId>/state.db` for each. Return the first hit. On total miss, throw with all paths tried.
- `pilot status --run <id>` invoked from a different worktree (or different repo) than the one that created the run currently fails with "no state.db"; this is the bug that prevented me from investigating the failing run in the first place.
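A sketch of the fast-path-then-scan lookup, assuming the directory layout quoted above. `discoverStateDbSketch` and `stateDbUnder` are hypothetical helpers; the real code would derive `base` from `resolveBaseDir()` and the fast path from `getStateDbPath(cwd, runId)` instead of receiving them as arguments.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function stateDbUnder(repoDir: string, runId: string): string {
  return path.join(repoDir, "pilot", "runs", runId, "state.db");
}

// `base` and `cwdRepoFolder` are passed in for the sketch.
function discoverStateDbSketch(base: string, cwdRepoFolder: string, runId: string): string {
  const tried: string[] = [];

  // Fast path: the repo folder that the current working directory maps to.
  const fast = stateDbUnder(path.join(base, cwdRepoFolder), runId);
  tried.push(fast);
  if (fs.existsSync(fast)) return fast;

  // Fallback: scan the direct children of <base>.
  for (const entry of fs.readdirSync(base, { withFileTypes: true })) {
    if (!entry.isDirectory() || entry.name === cwdRepoFolder) continue;
    const candidate = stateDbUnder(path.join(base, entry.name), runId);
    tried.push(candidate);
    if (fs.existsSync(candidate)) return candidate;
  }

  throw new Error(`no state.db for run ${runId}; tried:\n${tried.join("\n")}`);
}

// Usage: two repo folders, the run lives in the one we are NOT standing in.
const base = fs.mkdtempSync(path.join(os.tmpdir(), "pilot-base-"));
const runId = "RUN123"; // hypothetical id
fs.mkdirSync(path.join(base, "repo-a"), { recursive: true });
const target = stateDbUnder(path.join(base, "repo-b"), runId);
fs.mkdirSync(path.dirname(target), { recursive: true });
fs.writeFileSync(target, "");

const found = discoverStateDbSketch(base, "repo-a", runId);
console.log(found === target); // true
```

Listing every path tried in the error message keeps a total miss debuggable from the terminal, in line with the goal of this plan.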
test/pilot-paths.test.ts
- `getTaskJsonlPath`: creates directory, returns expected path, rejects unsafe taskIds (`../`, leading dot).
test/pilot-worker.test.ts
- stalls, the resulting `task.failed` event payload has `eventCount: 2`, a non-null `lastEventTs`, and the JSONL file contains exactly two parseable JSON lines.
test/pilot-worktree-pool.test.ts
- `describe("WorktreePool — retires preserved slot on re-acquire")` with test `"retires preserved slot and mints fresh path"`: acquire → prepare T1 → preserve → acquire again → expect fresh stub → prepare T2 → expect success with `-1`-suffixed path → `inspect()` shows two slots. Existing `"preserved slots are not reusable"` test must still pass (prepare on the preserved slot itself still throws).
test/pilot-cli-build.test.ts
- `describe("startStreamingLogger", ...)` block: `"prints phase and reason on task.failed"`, `"de-noises blocked cascade"`, `"tolerates task.failed without phase/reason"`. Add new `describe("printSummary — failure block", ...)` that uses `openStateDb(":memory:")`, seeds a run with one failed + one aborted task, and asserts the rendered block matches the expected format.
test/pilot-cli-status.test.ts
- `"resolves run from a different worktree of the same repo"`: uses `GLORIOUS_PILOT_DIR` env override to build a `<base>` with two repo-folder subdirs, drops a state.db in one, chdirs into a dir whose `getRepoFolder` resolves to the other, expects `discoverRun` to find it.
.changeset/pilot-terminal-failure-visibility.md
- `"@glrs-dev/harness-opencode": patch` frontmatter and a 1-2 sentence body describing the user-visible fix (pilot build now prints failure phase/reason, summary carries per-task detail, blocked cascade de-noised, `pilot status --run <id>` works from any worktree).
Test plan
- `bun test test/pilot-paths.test.ts`: covers `getTaskJsonlPath`.
- `bun test test/pilot-worktree-pool.test.ts`: covers the retire-on-preserve behaviour; existing tests must still pass.
- `bun test test/pilot-worker.test.ts`: covers enriched `task.failed` payload + JSONL capture.
- `bun test test/pilot-cli-build.test.ts`: covers streaming-logger failure detail + blocked de-noise + printSummary failure block.
- `bun test test/pilot-cli-status.test.ts`: covers cross-worktree discover.
- `bun test` (full suite): final gate.
- `bun run typecheck`: after every surface touched.
- `bun run build`: before shipping; ensures `dist/` emits cleanly.
Out of scope
- Worktree preserved at `/Users/austinhess/.glorious/opencode/kn-eng/pilot/worktrees/01KQ5GD4EZ0EWZT5W7YJCDA4A5/00/` for separate investigation; this plan makes the NEXT such stall diagnosable, doesn't chase this one.
- `--repo <path>` override flag on `pilot status` / `pilot logs`. User picked cross-repo scan (option A); this path stays open for future work if scan performance becomes an issue (it won't: ULIDs are cheap).
- scan is cheaper and ULIDs prevent collisions.
- `pilot validate` (they're static globs, not race warnings).
Open questions