fix: [ENG-2126] address 4 dream spec-violation gaps post-merge#434
fix: [ENG-2126] address 4 dream spec-violation gaps post-merge#434danhdoan merged 2 commits intoproj/dreamingfrom
Conversation
Fixes four implementation gaps in proj/dreaming where shipped code violated the spec at notes/byterover-dream/. Filed together because they share the same surface and are all "spec says X, code does Y" bugs. Issue 1 — Counter race in incrementCurationCount With AGENT_MAX_CONCURRENT_TASKS=5, parallel curates raced on the read-modify-write of dream-state.json, dropping increments (empirically 7 parallel curates → counter at 3, 4 lost). Activity gate could stay locked silently, blocking auto-dream at agent-idle. Fix: serialize all writers via a generic update(updater) on DreamStateService, backed by a per-file AsyncMutex. incrementCurationCount, dream-executor's step-7 reset, and consolidate's pendingMerges clear all route through update() so no two writers can interleave their RMW windows. Issue 2 — pendingMerges never consumed by consolidate prune.ts wrote SUGGEST_MERGE entries to dreamState.pendingMerges for the next dream's Consolidate to pick up, but the consumer was never wired — entries sat forever, MERGE_INTO suggestions never landed. Fix: per notes/byterover-dream/6-dream-undo-and-cross-cycle.md:89-143, add loadAndClearPendingMerges() at the start of consolidate(): read pendingMerges, parallel-stat each sourceFile, add existing ones to changedFiles, surface mergeTarget+reason as non-binding LLM hints, clear pendingMerges unconditionally regardless of LLM outcome. Stale sourceFiles (deleted between dreams) are silently skipped. Issue 3 — Gate 3 (queue) bypassed on CLI dispatch path DreamTrigger was constructed with getQueueLength: () => 0 at the agent side. The idle-trigger path pre-checked gates 1-3 before dispatch, but the CLI path went straight to the agent with gate 3 effectively disabled, so brv dream (no --force) ran even when other tasks were queued. Fix: add a preDispatchCheck hook to TaskRouter; daemon wires the same getQueueLength resolver as the idle path; only dream tasks without --force are pre-checked. Pre-check errors fail open (agent gates 1+2 stay as defense-in-depth). Issue 4 — Partial dreams lost operations and broke undo allOperations was assembled only after all three ops succeeded; on timeout/error the catch block hardcoded operations:[] and zero summary. A consolidate that committed real file merges then a synthesize timeout would leave the context tree modified with no audit trail and no undo path (dream-undo reads operations from the log). Fix: hoist allOperations above the try block, push results incrementally after each step, use the accumulator in both partial and error log entries. Also run createReviewEntries for review-flagged ops that did complete on partial runs (gated on non-empty allOperations to preserve the "no log → no review entries" invariant on early failure). Tests - 28 new unit tests covering each fix's failure mode (counter race regression, pendingMerges round-trip, gate 3 short-circuit, partial-log preservation, review-entry creation from partial) - All 308 dream-related unit tests pass Verified interactively - Issue 1: 5 brv curate --detach + brv dream --force in parallel — all 5 increments accounted for in dream-state.json (4 consumed, 1 preserved) - Issue 2: seeded pendingMerge — first CONSOLIDATE op was the sourceFile→mergeTarget pair, pendingMerges cleared after the run - Issue 3: 8 parallel curates → CLI dream skipped with "Queue not empty (3 tasks pending)" - Issue 4: 3 regression tests force budget abort + non-abort error, asserting partial log + review entry creation
|
Claude finished @RyanNg1403's task in 3m 58s —— View job PR Review — fix: [ENG-2126] 4 dream spec-violation gaps
Overall Score: 4.5 / 5Verdict: Approve with minor notes. All four fixes are correct, well-scoped, and well-tested. The surgical framing holds — no surrounding code was unnecessarily touched. A handful of minor issues found, none blocking. Issue-by-Issue AssessmentIssue 1 — Counter race (
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cec2dd944d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Addresses 10 review items from RyanNg1403 (Claude Code) and ChatGPT Codex on PR #434. Two real bugs (1× P1, 1× P2) found by Codex; the rest are documentation / test-assertion improvements from RyanNg. P1 (Codex) — task-router pre-dispatch skip polluted AgentPool counter preDispatchCheck's ineligible branch routed through handleTaskCompleted, which unconditionally calls agentPool.notifyTaskCompleted(projectPath). But the task was never submitted to the pool, so decrementing activeTasks under-represents real load and can let drainQueue dispatch an extra queued task beyond per-agent concurrency. Fix: extracted handleTaskSkippedByPreCheck — emits task:completed to client + project room and calls moveToCompleted, but skips notifyTaskCompleted AND lifecycle hooks (also resolves RyanNg #5 which flagged the lifecycle-hook side of the same call site). Three new regression tests assert the pool counter and onTaskCompleted hooks are not touched on skip, while the project broadcast still fires. P2 (Codex) — duplicate review entries on late dream failures dream-executor's success path runs createReviewEntries (step 6b), then dreamStateService.update (step 7). If step 7 throws, control jumps to catch which re-invokes createReviewEntries on the same allOperations, producing duplicate pending review entries. Fix: reviewEntriesWritten boolean set after the success-path call; catch-path call gated on !reviewEntriesWritten in addition to the existing allOperations.length > 0 invariant. One new regression test forces step 7 to throw after step 6b succeeds and asserts createReviewEntries is invoked exactly once. Doc / test improvements - dream-state-service.ts: comment on stateMutexes' monotonic growth bound (#1); JSDoc on write() cross-referencing update() to warn against bypassing the mutex (#2). - consolidate.ts: JSDoc on loadAndClearPendingMerges explaining the intentional two-phase access pattern (unguarded read for hints + mutex-guarded update for clear) and the fail-open behavior when the initial read throws (#3, #4). - consolidate.test.ts: assert update.calledOnce instead of write.calledOnce — surfaces intent (the contract) rather than the stub's implementation detail (#8). Verification - Lint: 0 errors (perfectionist/sort-classes flagged class-member order during audit; resolved by repositioning handleTaskSkippedByPreCheck alphabetically between handleTaskError and handleTaskStarted). - Typecheck: clean. - Unit tests: 6238 passing, 16 pre-existing pending, 0 failing. - /brv-code-quality skill: passed all 6 audit gates.
|
Claude finished @RyanNg1403's task in 2m 7s —— View job PR Review — fix: [ENG-2126] 4 dream spec-violation gaps (post-feedback update)
|

Summary
proj/dreamingwhere the shipped code violated the spec atnotes/byterover-dream/. All filed together because they share the same project surface and are all "spec says X, code does Y" bugs: counter race in the curation gate,pendingMergesnever consumed by the next consolidate, queue gate (Gate 3) bypassed on the CLI dispatch path, and partial/error dreams losing the operations audit trail (which also brokebrv dream --undofor partial runs).MERGE_INTOsuggestions silently lingered forever;brv dreamcompeted with queued tasks instead of yielding; and a dream that committed real merges before timing out left the context tree modified with no audit trail and no recovery path.update(updater)onDreamStateServicebacked by a per-file mutex, with all dream-state RMW callers (incrementCurationCount, executor's step-7 reset, consolidate'spendingMergesclear) routed through it;loadAndClearPendingMerges()consumer at the start of consolidate pernotes/byterover-dream/6-dream-undo-and-cross-cycle.md:89-143;preDispatchCheckhook onTaskRouterso the daemon pre-checks gate 3 on the CLI path (matching the existing idle-trigger pre-check);allOperationsaccumulator hoisted above the try block so partial/error log entries carry the ops that did complete, withcreateReviewEntriesinvoked for partial dreams.Type of change
Scope (select all touched areas)
Linked issues
ENG-2126(link); stacks onproj/dreamingafter enhance: [ENG-2128] inline file contents in operation prompts #432 (ENG-2128)Root cause (bug fixes only, otherwise write
N/A)incrementCurationCountdid unguarded read-modify-write ondream-state.json. Comment atdream-state-service.ts:27-28claimed "single-writer assumption" butAGENT_MAX_CONCURRENT_TASKS=5(constants.ts:94) allows up to 5 parallel curates per project — increments were lost to last-writer-wins. Two adjacent RMW sites (executor step-7 reset, consolidate pendingMerges clear) had the same race shape.prune.ts:417-424wroteSUGGEST_MERGEdecisions intodreamState.pendingMergesfor the next dream's Consolidate to pick up — the consumer was specified innotes/byterover-dream/6-dream-undo-and-cross-cycle.md:89-143but never wired inconsolidate.ts. Entries sat in state forever.agent-process.ts:513constructedDreamTriggerwithgetQueueLength: () => 0and a comment "this task is the only active one" — false underAGENT_MAX_CONCURRENT_TASKS=5. The idle-trigger path pre-checked gates 1-3 inbrv-server.ts:260before dispatch, but the CLI path went straight through with the agent-side gate 3 effectively disabled.dream-executor.ts:149declaredallOperationsafter the three op awaits; the catch handler had nothing to reference and hardcodedoperations:[]andsummary:zeroes. A consolidate that committed real merges followed by a synthesize timeout left files modified with no log record andbrv dream --undoas a no-op.incrementCurationCountagainst itself — flagged by PE review on this branch and addressed in the second-pass commit.notes/byterover-dream/6-...mdbut the consumer-side work was lost in the parent ticket split: ENG-2064 (parent) was marked Duplicate after splitting into ENG-2069 (Consolidate) and ENG-2070 (Undo); the cross-cycle consumer fell through the cracks.Test plan
test/unit/infra/dream/dream-state-service.test.ts— concurrent-increment +update()race regressionstest/unit/infra/dream/operations/consolidate.test.ts—pendingMerges consumptiondescribe block (6 tests)test/unit/infra/executor/dream-executor.test.ts— partial-log preservation on budget abort + non-abort error + review entries from partialtest/unit/infra/process/task-router.test.ts— eligible dispatch, ineligible short-circuit with skip reason, pre-check fail-open, no-op when absentDreamStateServiceinstances + interleavedupdate()↔incrementCurationCountcalls — both preserve all writeschangedFiles; stale sourceFile silently skipped; pendingMerges cleared after the run regardless of LLM outcome;mergeTarget/reasonsurfaced in prompt; no-op when service absent or list emptydreamshort-circuits withDream skipped: Queue not empty (N tasks pending);--forcebypasses; daemon pre-check errors fail open (agent gates 1+2 still apply)computeSummary(...); review entries created for review-flagged ops that did complete on partial runsUser-visible changes
None for normal
brv dreamusage — CLI surface, flags, output format, and dream-log schema are unchanged. Behavior changes visible only in edge cases:brv dream(no--force) printsDream skipped: Queue not empty (N tasks pending)when the agent has queued work, instead of running on top of it.brv dream --undonow works on partial dreams (previously a no-op when the dream timed out mid-pipeline).dream-log/*.jsonfor partial/error dreams now contains the operations that did complete (previouslyoperations: []).Evidence
dream-state-service,consolidate,synthesize,prune,dream-executor,dream-trigger,dream-undo,task-router).local-auto-test/dreaming/run.sh): 24/24 scenarios passing on this branch. Notable:force-merge: produces 1CONSOLIDATE/MERGEop as expectedforce-synthesize: produces 1SYNTHESIZE/CREATEop (confidence 1.0, 3 sources from 3 fixture domains)force-archive: producesPRUNE/ARCHIVEop, file removed and stub created in_archived/brv curate --detach+brv dream --forcein parallel → all 5 increments accounted for indream-state.json(4 consumed by the dream, 1 preserved post-step-7-reset)pendingMergeindream-state.json, ranbrv dream --force— firstCONSOLIDATEop was thesourceFile → mergeTargetpair,pendingMergescleared after the runbrv dream→Dream skipped: Queue not empty (3 tasks pending). Once queue drained, the samebrv dreamcompleted normallyChecklist
npm test)npm run lint— 0 errors; 186 pre-existing warnings unrelated to this PR)npm run typecheck)npm run build)main— branch is rebased on the latestproj/dreaming(which is itself ahead ofmainvia the dreaming feature work)Risks and mitigations
preDispatchCheckhook fail-open posture for gate 3 (Issue 3). If the daemon's gate-3 check throws, the dream proceeds anyway.AsyncMutexadds a small amount of serialization overhead to dream-state writes (Issue 1).DreamStateServiceinstances pointing at the same project share ordering without affecting other projects.brv review pendingthat the user didn't expect to see (Issue 4).notes/byterover-dream/6...md:181): "always write log and update state (even on partial/error)" implies review wiring is preserved. Without this, a partial dream's MERGE that wrote real files would never appear in review. Gated onallOperationsbeing non-empty so early failures (zero ops completed) still produce no review entries — the "no log → no review entries" invariant is preserved.