fix(local): recover persona writer output from disk when stdout truncates#112
Conversation
…ates claude --print buffers a full response then emits it. For large workflow artifacts the LLM's max_output_tokens cap kicks in and the JSON gets cut off mid-emit — observed in production as ~38KB of valid JSON ending at `"language": "typescript",` with no closing braces and no `artifact.content`. The parser then throws "Workforce persona response must be structured JSON or include fenced TypeScript artifact and JSON metadata blocks" and Ricky reports a parse-error, even though the writer typically *did* succeed at writing the workflow source to disk via the Write tool (reliable now that bypass-permissions ships through the resolver — see fix/headless-claude-bypass-permissions). This commit adds a freshness-gated disk fallback so the parser can recover from a truncated stdout without trusting stale on-disk leftovers: - `PersonaResponseParseOptions.writerInvokedAtMs`: caller pins a `Date.now()`-style timestamp *before* spawning the writer subprocess. - `PersonaResponseParseOptions.statFile`: stat-by-path override for deterministic tests; defaults to `fs.statSync` (ENOENT → undefined). - `recoverArtifactFromTruncatedOutput`: only fires after every other parse path has rejected the stdout. It reads `expectedPath` from disk ONLY when (a) `writerInvokedAtMs` is set AND (b) the file's `mtimeMs` is strictly greater than that timestamp — proving the file is the current writer's output, not a stale artifact from a prior attempt. - The recovered content still has to pass `validateArtifactContent` (the existing `workflow(...)` builder check). A junk file on disk cannot smuggle itself past the structural validation. - `writeWorkflowWithWorkforcePersona` captures `writerInvokedAtMs` before its `sendMessage` call and threads it into the parser, opting in. Other callers of `parsePersonaWorkflowResponse` that don't pass the timestamp keep their prior behavior — the fallback never fires. Coverage: 5 new tests in `workforce-persona-writer.test.ts`: - recovers from truncated stdout when the file on disk is fresh - does NOT fall back to a stale file (mtime older than writerInvokedAtMs) - does NOT fall back when writerInvokedAtMs is absent - does NOT fall back when no file exists at expectedPath - does NOT fall back when the on-disk file fails structural validation 151 of 151 generation-suite tests pass. Pre-existing typecheck errors in `scheduled-agent.ts` are unrelated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR extends ChangesPersona Writer Truncated Stdout Disk Fallback
Possibly related PRs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/product/generation/workforce-persona-writer.test.ts (2)
297-309: ⚡ Quick winAssert the recovery
reasonmetadata to lock the contract.Line 308 only checks
recoveredFromDisk; it doesn’t verify the fallback reason string described by this PR. Adding it prevents silent regressions in downstream telemetry/UX expectations.Suggested test assertion update
- expect(parsed.metadata).toMatchObject({ recoveredFromDisk: true }); + expect(parsed.metadata).toMatchObject({ + recoveredFromDisk: true, + reason: 'stdout-truncated-or-unparseable', + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/product/generation/workforce-persona-writer.test.ts` around lines 297 - 309, The test currently only asserts parsed.metadata.recoveredFromDisk; add a strict assertion on the fallback reason string produced by parsePersonaWorkflowResponse to lock the contract: after creating parsed, add expect(parsed.metadata.reason).toBe('truncated-stdout') (or the exact fallback reason string introduced by this PR) so the test validates both recoveredFromDisk and the specific metadata.reason returned by parsePersonaWorkflowResponse.
311-359: ⚡ Quick winAdd an explicit equality-boundary test for mtime freshness gating.
The suite checks older/newer files, but not the strict boundary
mtimeMs === writerInvokedAtMs. Since behavior is “strictly greater than,” this edge case should be pinned.Suggested boundary test
+ it('does NOT fall back when mtime equals writerInvokedAtMs (strict > gate)', () => { + const artifactPath = 'workflows/generated/persona.ts'; + const writerInvokedAtMs = 1_000; + expect(() => + parsePersonaWorkflowResponse(truncatedJsonOutput, artifactPath, { + repoRoot: '/tmp/repo', + writerInvokedAtMs, + statFile: () => ({ mtimeMs: writerInvokedAtMs }), + readFileText: () => workflowSource(), + }), + ).toThrow(/structured JSON or include fenced TypeScript artifact/); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/product/generation/workforce-persona-writer.test.ts` around lines 311 - 359, Add a test asserting the strict boundary for mtime freshness: call parsePersonaWorkflowResponse with writerInvokedAtMs set (e.g., 1_000), have statFile return { mtimeMs: writerInvokedAtMs } (equal, not greater), use the existing truncatedJsonOutput and workflowSource helpers, and expect it to throw the same "structured JSON or include fenced TypeScript artifact" error; this pins the “strictly greater than” behavior in parsePersonaWorkflowResponse.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 232-235: The timestamp writerInvokedAtMs is captured too early
(before async resolver/task setup) allowing unrelated writes to outputPath to
appear newer; move the Date.now() assignment so writerInvokedAtMs is set
immediately before the spawn/sendMessage call (i.e., right before the
sendMessage(...) invocation around where the spawn occurs) so the mtime check
reliably proves this run wrote the file; update any references to
writerInvokedAtMs accordingly and ensure no other code path reads it earlier.
---
Nitpick comments:
In `@src/product/generation/workforce-persona-writer.test.ts`:
- Around line 297-309: The test currently only asserts
parsed.metadata.recoveredFromDisk; add a strict assertion on the fallback reason
string produced by parsePersonaWorkflowResponse to lock the contract: after
creating parsed, add expect(parsed.metadata.reason).toBe('truncated-stdout') (or
the exact fallback reason string introduced by this PR) so the test validates
both recoveredFromDisk and the specific metadata.reason returned by
parsePersonaWorkflowResponse.
- Around line 311-359: Add a test asserting the strict boundary for mtime
freshness: call parsePersonaWorkflowResponse with writerInvokedAtMs set (e.g.,
1_000), have statFile return { mtimeMs: writerInvokedAtMs } (equal, not
greater), use the existing truncatedJsonOutput and workflowSource helpers, and
expect it to throw the same "structured JSON or include fenced TypeScript
artifact" error; this pins the “strictly greater than” behavior in
parsePersonaWorkflowResponse.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d42ca3bb-4cc1-4535-8086-cd7bb756391d
📒 Files selected for processing (2)
src/product/generation/workforce-persona-writer.test.tssrc/product/generation/workforce-persona-writer.ts
There was a problem hiding this comment.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/product/generation/workforce-persona-writer.ts">
<violation number="1" location="src/product/generation/workforce-persona-writer.ts:235">
P2: Capture `writerInvokedAtMs` immediately before spawning `sendMessage`; taking it earlier weakens the freshness check and can recover the wrong on-disk artifact.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Re-trigger cubic
Both coderabbit and cubic-dev-ai flagged the same issue: pinning `writerInvokedAtMs = Date.now()` at the top of `writeWorkflowWithWorkforcePersona` runs *before* the `await resolver(...)` and `buildWorkflowPersonaTask(...)` calls. The resolver loads personas (filesystem + bundle merge) and the task builder reads spec files, so the window between the original capture and the actual `sendMessage` spawn can stretch into multiple seconds. A stale leftover file at `outputPath` whose mtime lands inside that window would falsely satisfy the freshness check and impersonate the current writer's output. Moves the capture to the line immediately above `resolved.context. sendMessage(task, ...)` — after all preceding awaits, before any code that could plausibly write to outputPath — so the timestamp now strictly precedes the spawn. The comment is rewritten to be explicit about why "immediately before" matters. All 55 writer-suite tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed the coderabbit + cubic-dev-ai finding on Moved the 55 of 55 writer-suite tests still pass. Branch updated at 4595f61. |
Ricky Eval ReviewRun: Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0 Human Review CasesThese cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
|
Summary
Stop losing 22+ minutes of writer work to LLM output truncation. When the writer subprocess's stdout is cut off mid-JSON (because the LLM hit its
max_output_tokenscap before finishing the artifact-content string), the parser now falls back to reading the workflow file from disk — but only when the file's mtime proves it was actually written by THIS writer call, not a stale leftover.The bug this fixes
Driving 13 per-track sub-specs through
ricky --mode local --spec-file <spec>.md --run --workforce-personaproduced this failure mode after the bypass-permissions fix (#110) made the writer reliable enough to actually emit real content:claude
--printbuffers a full response then emits it. For large workflow artifacts (JSON-escaping a 20KB TypeScript source blows out to 50KB+ of output tokens), the LLM'smax_output_tokenscap kicks in and the JSON gets cut off mid-contentfield. There's no claude CLI flag for raising the cap.But the writer almost always did succeed at writing the workflow to disk via the Write tool — that part runs through the tool harness, not stdout. The parser was just throwing away a valid on-disk artifact because it had no parseable stdout wrapper.
What changed
PersonaResponseParseOptions.writerInvokedAtMs— caller pins aDate.now()-style timestamp before spawning the writer subprocess.PersonaResponseParseOptions.statFile— stat-by-path override for deterministic tests; defaults tofs.statSync(ENOENT →undefined).recoverArtifactFromTruncatedOutput— new private helper. Fires only after every other parse path has rejected the stdout. ReadsexpectedPathfrom disk ONLY when (a)writerInvokedAtMsis set AND (b) the file'smtimeMsis strictly greater than that timestamp — proving the file is the current writer's output, not a stale artifact from a prior attempt.validateArtifactContent(the existingworkflow(...)builder check). A junk file on disk cannot smuggle itself past the structural validation. If structural validation fails, the original truncation error is re-raised so the caller sees the real symptom.writeWorkflowWithWorkforcePersonacaptureswriterInvokedAtMsimmediately before itssendMessagecall and threads it into the parser. Other callers ofparsePersonaWorkflowResponsethat don't pass the timestamp keep their prior behavior — the fallback never fires for them.Why the freshness check matters
A naive "if a file exists at expectedPath, use it" fallback would let a stale leftover from a crashed prior attempt impersonate the current writer's output. By gating on
mtime > writerInvokedAtMs, the fallback proves the file was written during thissendMessagewindow. Plus the resulting metadata carries{ recoveredFromDisk: true, reason: 'stdout-truncated-or-unparseable' }so downstream callers and the auto-fix repair contract (#109) can see exactly why this artifact survived.Test plan
Five new tests in
workforce-persona-writer.test.ts:recoveredFromDisk: truemetadata.statFilereturns an mtime older thanwriterInvokedAtMs→ parser refuses, re-raises the original truncation error.writerInvokedAtMs→ fallback never fires, prior behavior preserved.statFilereturnsundefined(ENOENT) → fallback skipped, original error re-raised.mtime > writerInvokedAtMsbut content isexport const broken = 1;(noworkflow(call) →validateArtifactContentrejects, original error re-raised.151 of 151 generation-suite tests pass (was 146 + 5 new). Pre-existing
scheduled-agent.tstypecheck errors are unrelated.Stacks with
npm install -g.bypassPermissions(the prerequisite for this PR's fallback to be useful — without bypass-permissions the writer can't reliably use the Write tool, so there's no file on disk to fall back to).🤖 Generated with Claude Code