enhance: [ENG-2128] inline file contents in operation prompts by danhdoan · Pull Request #432 · campfirein/byterover-cli

danhdoan · 2026-04-16T08:51:15Z

Summary

Problem: Consolidate and synthesize operations in the dream feature produce zero output on clear-cut cases. Files with identical titles don't merge. Cross-domain patterns don't synthesize. Unit tests pass because they mock the LLM — production LLM calls return empty actions silently.
Why it matters: The dreaming feature can't ship with two of three operations broken in practice. Users running brv dream see "completed" with zero operations and no errors — no way to know the feature isn't working.
What changed: Inline file contents directly in LLM prompts. Remove the sandbox-variable + code_exec indirection that the LLM often skipped.
Update unit tests to assert on prompt content.
What did NOT change (scope boundary): Dream executor orchestration, gate logic, undo flow, dream log schema, operation schemas, review wiring. The silent catch {} blocks remain as a separate follow-up.

Type of change

Scope (select all touched areas)

Linked issues

Closes #
Related # (dreaming feature branch proj/dreaming)

Root cause (bug fixes only)

Root cause: Consolidate and synthesize passed file contents via agent.setSandboxVariableOnSession(sessionId, '__dream_*_files', payload) and instructed the LLM to invoke code_exec to read the variable before classifying. The LLM frequently skipped code_exec, evaluated paths without content, and returned an empty actions array. Prune already inlined content so it was unaffected.
Why this was not caught earlier: Unit tests mock the agent with canned responses, so the operation code path runs correctly in tests regardless of whether the live LLM would actually read the sandbox variable. The gap is between "code handles the response correctly" (tested) and "LLM produces that response on real input" (not tested). Added a pre-release auto-test suite at local-auto-test/dreaming/ to close this gap with real LLM calls against designed-to-force-operations fixtures.

Test plan

Coverage added:
- Unit test (updated existing to assert prompt content)
- Integration test
- Manual verification (auto-test suite)
Test file(s):
- test/unit/infra/dream/operations/consolidate.test.ts
- test/unit/infra/dream/operations/synthesize.test.ts
- test/unit/infra/dream/operations/prune.test.ts
Key scenario(s) covered:
- Consolidate prompt includes PATH markers and sibling file contents
- Synthesize prompt includes DOMAIN markers for each domain's _index.md
- Prune candidate blocks use marker format with 1000-char preview
- Auto-test force-merge: near-duplicate files → MERGE operation
- Auto-test force-synthesize: 3 domains sharing FlushCoordinator → SYNTHESIZE
- Auto-test force-archive: stale low-importance draft → ARCHIVE (unchanged)

User-visible changes

None for users of brv dream. The feature now produces operations it previously didn't produce, but the CLI surface, flags, and output format are unchanged. Dream log schema unchanged.

Evidence

Auto-test results before vs after the fix:

Before (unstashed code): 19/21 passed

force-merge: ✗ 0 operations (expected MERGE)
force-synthesize: ✗ 0 operations (expected SYNTHESIZE)
force-archive: ✓ ARCHIVE produced

After (this PR): 24/24 passed

force-merge: ✓ MERGE produced (1 file merged, other deleted)
force-synthesize: ✓ 2 SYNTHESIZE operations (files written with type: synthesis frontmatter)
force-archive: ✓ ARCHIVE (unchanged)

Unit tests: 6169 passing, 0 failing. Build clean. Lint clean (warnings pre-existing).

Checklist

Tests added or updated and passing (npm test)
Lint passes (npm run lint)
Type check passes (npm run build with tsc -b)
Build succeeds (npm run build)
Commits follow Conventional Commits format
Documentation updated (N/A — no user-facing change)
No breaking changes
Branch is up to date with proj/dreaming

Risks and mitigations

Risk: Inlining full file contents in the prompt could exceed model context limits on very large contexts.
- Mitigation (consolidate): capPayloadSize() enforces MAX_PAYLOAD_CHARS = 200_000 with changed-files-first eviction.
- Mitigation (prune): Implicit cap — max 20 candidates × 1000-char preview per candidate (~20KB ceiling).
- Gap (synthesize): No explicit cap on inlined _index.md content. Summaries are typically small, but a vault with many verbose domain summaries could hit model limits. Tracked as a follow-up task to add a cap similar to consolidate's.
Risk: Prompt wording shifted from "when in doubt, SKIP" to less conservative framing. Could produce false-positive merges.
- Mitigation: All MERGE and TEMPORAL_UPDATE operations already set needsReview: true. Operations flagged for review are queued in brv review pending before affecting the context tree permanently.
  brv dream --undo can also revert the last dream wholesale.
Risk: Follow-up needed for silent error handling.
- Mitigation: Not addressed in this PR to keep scope tight. Tracked as a separate item — summary.errors should reflect caught failures in the dream log so users can see when LLM calls fail.

github-actions · 2026-04-16T08:51:29Z

Claude finished @danhdoan's task in 2m 52s —— View job

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

Gather context and understand the request
Read implementation files (consolidate, synthesize, prune)
Read test files
Review diff vs base branch
Analyze for correctness, security, and code quality
Post review with Conventional Comments and score

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling. All three operations have been updated consistently, and the test suite now validates prompt content rather than just call counts.

What's good

Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching unrelated orchestration or schema code.
capPayloadSize already existed and is correctly reused for consolidate. The 200K char cap protects against context overflow.
Atomic writes (randomUUID temp file + rename) are preserved throughout—no regressions on write safety.
Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact for LLM-supplied placement values.
Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected, not executed.
Test assertions upgraded—new tests now verify that PATH: markers and file body content appear in the actual prompt, which is the right level of test for this kind of behavioral fix.
Scope is tight. Orchestration, gate logic, dream log schema, and undo flow are untouched per the stated boundary.

Issues found (inline comments posted)

#	File	Severity	Summary
1	`consolidate.ts:306`	nit	`marker` string recreated on every `map` iteration — hoist it outside
2	`prune.ts:290`	nit	`.slice(0, 1000)` on `contentPreview` is redundant (already capped in `buildCandidatePayload`)
3	`consolidate.ts:311` `synthesize.ts:301`	suggestion	Missing explicit `"do NOT use code_exec"` instruction — `prune.ts` has it; consolidate/synthesize don't. The root cause of this PR was exactly the LLM calling `code_exec`; worth being explicit in all three.
4	`synthesize.ts:68`	issue (tracked)	No size cap on inlined `_index.md` content, unlike consolidate's `capPayloadSize`. Acknowledged in PR risks — worth a TODO comment so it's not lost.

Security notes

Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. Mitigation is the same as always: LLM output is schema-validated before any filesystem action is taken.
No new attack surface introduced. Removing the sandbox-variable path actually reduces complexity.

Minor observations

setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. That's fine—the interface requires the method and other operations may still use it—but the stubs silently pass even if the implementation accidentally re-adds the call. Consider adding a expect(agent.setSandboxVariableOnSession.called).to.be.false assertion in the prompt-content tests to lock the removal in place.

github-actions · 2026-04-16T09:17:02Z

Claude finished @danhdoan's task in 2m 48s —— View job

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

Gather context and understand the request
Read changed files (consolidate, synthesize, prune + tests)
Review diff vs base branch (origin/proj/dreaming)
Analyze for correctness, security, and code quality
Post inline comments and summary review

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling (19/21 → 24/24). All three operations updated consistently. Tests now validate prompt content rather than sandbox-variable call counts, which is the right level of test for a behavioral prompt change.

What's good

Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching orchestration, gate logic, schema, or undo flow.
capPayloadSize correctly reused for consolidate. The 200K char cap protects against context overflow.
Atomic writes (randomUUID temp file + rename) preserved throughout — no regressions on write safety.
Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact.
Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected before any filesystem action.
Test assertions upgraded — the new tests verify PATH: / DOMAIN: markers and file body content appear in the actual prompt, locking the behavioral fix in place.
Scope is tight. Orchestration, gate logic, dream log schema, and undo flow untouched as stated.

Issues found (inline comments posted)

#	File	Severity	Summary
1	`consolidate.ts:306`	nit	`marker` recreated on every `map` iteration — hoist it outside (synthesize/prune already do this)
2	`prune.ts:290`	nit	`.slice(0, 1000)` is redundant — `buildCandidatePayload` already caps `contentPreview` at 1000 chars
3	`consolidate.ts:312` `synthesize.ts:302`	suggestion	Missing `"Do NOT use code_exec."` instruction — `prune.ts` has it; the root cause of this PR was exactly the LLM calling `code_exec` in these two files
4	`synthesize.ts:68`	issue (tracked)	No size cap on inlined `_index.md` content. Acknowledged in PR risks — worth a `TODO` comment so it isn't lost

Security notes

Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. LLM output is schema-validated before any filesystem action — same mitigation as always.
Removing the sandbox-variable path actually reduces complexity. No new attack surface introduced.

Minor observations

setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. The interface requires the method so the stubs are fine, but they silently pass even if the implementation accidentally re-adds the call. Consider adding expect(agent.setSandboxVariableOnSession.called).to.be.false in the prompt-content tests (consolidate includes path siblings test, synthesize creates session test, prune dedup between signals test) to lock the removal in place across all three operations.

github-actions · 2026-04-16T09:21:37Z

Claude finished @danhdoan's task in 3m 1s —— View job

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

Gather context and understand the request
Read changed files (consolidate, synthesize, prune + tests)
Review diff vs base branch (origin/proj/dreaming)
Analyze for correctness, security, and code quality
Post inline comments and summary review

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling (19/21 → 24/24). All three operations updated consistently, and the test suite now validates prompt content rather than sandbox-variable call counts, which is the right level of test for a behavioral prompt change.

What's good

Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching orchestration, gate logic, schema, or undo flow.
capPayloadSize correctly reused for consolidate. The 200K char cap protects against context overflow.
Atomic writes (randomUUID temp file + rename) preserved throughout — no regressions on write safety.
Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact.
Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected before any filesystem action.
Test assertions upgraded — the new tests verify PATH: / DOMAIN: markers and file body content appear in the actual prompt, locking the behavioral fix in place.
Prompt wording tightened across all three operations: consolidate now says "Default to MERGE… SKIP only when genuinely unrelated" and "Returning all SKIP when duplicates exist is a failure, not caution." The prune rules are now specific criteria rather than "be conservative." These changes are intentional and correct given the root cause.
Scope is tight. Orchestration, gate logic, dream log schema, and undo flow untouched as stated.

Issues found (inline comments posted)

#	File	Severity	Summary
1	`consolidate.ts:304–309`	nit	`marker` recreated on every `map` iteration — hoist it outside (synthesize/prune already do this)
2	`prune.ts:290`	nit	`.slice(0, 1000)` redundant — `buildCandidatePayload` already caps `contentPreview` at 1000 chars
3	`consolidate.ts:312` `synthesize.ts:302`	suggestion	Missing `"Do NOT use code_exec."` instruction — `prune.ts` has it; this PR's root cause was exactly the LLM making `code_exec` calls in these two files
4	`synthesize.ts:68`	issue (tracked)	No size cap on inlined `_index.md` content. Acknowledged in PR risks — worth a `TODO` comment so it isn't lost

Security notes

Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. LLM output is schema-validated before any filesystem action — same mitigation as always.
Removing the sandbox-variable path actually reduces complexity. No new attack surface introduced.

Minor observations

setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. The interface requires the method so the stubs are necessary, but they silently pass even if the implementation accidentally re-adds the call. Consider adding expect(agent.setSandboxVariableOnSession.called).to.be.false in the prompt-content tests (consolidate includes path siblings test, synthesize creates session test, prune dedup between signals test) to lock the removal in place across all three operations.

Consolidate and synthesize injected file content via setSandboxVariableOnSession and told the LLM to call code_exec to read it. The LLM often skipped code_exec, saw only paths without content, and returned {"actions": []} — producing zero operations on clear-cut cases. The outer catch {} swallowed the silence, so dream logs reported status: completed with no errors. Inline file contents directly in the prompt as marked blocks. Remove the sandbox-variable indirection. Tighten rule wording to nudge the LLM away from the SKIP-everything default. Prune already inlined content; also bump its preview size and rewrite conservative wording. Unit tests updated to assert on prompt content instead of sandbox-variable calls. Auto-test suite (local-auto-test/dreaming) now passes 24/24 with a working LLM provider: force-merge, force-synthesize, and force-archive scenarios all produce their expected operations end-to-end.

github-actions · 2026-04-16T09:26:23Z

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

Gather context and understand the request
Read changed files (consolidate, synthesize, prune + tests)
Review diff vs base branch (origin/proj/dreaming)
Analyze for correctness, security, and code quality
Post inline comments and summary review

I'll analyze this and get back to you.

View job run

github-actions · 2026-04-16T09:27:23Z

Claude finished @danhdoan's task in 1m 50s —— View job

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

Gather context and understand the request
Read changed files (consolidate, synthesize, prune + tests)
Review diff vs base branch (origin/proj/dreaming)
Analyze for correctness, security, and code quality
Post inline comments and summary review

Summary

Score: 4.5 / 5

Previous review feedback has been addressed cleanly. The three issues flagged as suggestions/nits in the prior reviews are all resolved in this sync. Two minor gaps remain open.

Previous review items — status

#	File	Previous finding	Status
1	`consolidate.ts:304`	`marker` recreated on every `map` iteration	✅ Fixed — `marker` hoisted outside `map`
2	`prune.ts:290`	Redundant `.slice(0, 1000)` in `buildPrompt`	✅ Fixed — `buildPrompt` now uses full `c.contentPreview` without an extra slice
3	`consolidate.ts:310`	Missing `"Do NOT use code_exec."` instruction	✅ Fixed — opening instruction now says `"Do NOT use code_exec."`
4	`synthesize.ts:302`	Missing `"Do NOT use code_exec."` instruction	✅ Fixed — same addition in synthesize
5	`synthesize.ts:68`	No size cap on inlined `_index.md` content	❌ Open — acknowledged as follow-up in PR description, but no `TODO` comment added yet

Remaining gaps

1. synthesize.ts:68 — Missing TODO comment for uncapped domain content

The PR description explicitly acknowledges the missing size cap as a follow-up. Without a TODO in the code, this is easy to forget when the file is next touched. Add one line:

// TODO: add a size cap for domain summaries (similar to consolidate's capPayloadSize)
// to guard against vaults with many verbose _index.md files hitting model context limits.
const prompt = buildPrompt(domains, existingSyntheses)

Fix this →

2. Tests — missing negative assertion for setSandboxVariableOnSession

All three test fixtures stub setSandboxVariableOnSession but never assert it is not called. If the implementation accidentally re-introduces a sandbox call, the tests still pass silently. Consider adding to each prompt-content test:

expect(agent.setSandboxVariableOnSession.called).to.be.false

consolidate.test.ts — 'includes path siblings as related files' test
synthesize.test.ts — 'creates session and passes domain summaries to LLM' test
prune.test.ts — 'caps candidates at 20' and 'dedup between signals' tests

Fix this →

What's good

All three previous nits/suggestions resolved cleanly in this sync.
prune.ts preview is now 1000 chars (up from 500) and inlined as-is — consistent with the same treatment in consolidate and synthesize.
Prompt wording improvements retained: the stronger "Returning all SKIP when duplicates exist is a failure, not caution" and the sharper ARCHIVE criteria in prune remain untouched.
Test assertions correctly pivot from setSandboxVariableOnSession call counts to prompt content inspection — that is the right level of test for a behavioral prompt change.
No regressions: atomic writes, path traversal guard, and Zod schema validation all intact.

Security notes

No new concerns since the prior review. Inline file content is a standard RAG pattern, user-owned data, and all LLM output is schema-validated before any filesystem action.

danhdoan requested review from bao-byterover and leehpham as code owners April 16, 2026 08:51