Skip to content

enhance: [ENG-2128] inline file contents in operation prompts#432

Merged
danhdoan merged 2 commits intoproj/dreamingfrom
enhance/ENG-2128
Apr 16, 2026
Merged

enhance: [ENG-2128] inline file contents in operation prompts#432
danhdoan merged 2 commits intoproj/dreamingfrom
enhance/ENG-2128

Conversation

@danhdoan
Copy link
Copy Markdown
Collaborator

Summary

  • Problem: Consolidate and synthesize operations in the dream feature produce zero output on clear-cut cases. Files with identical titles don't merge. Cross-domain patterns don't synthesize. Unit tests pass because they mock the LLM — production LLM calls return empty actions silently.
  • Why it matters: The dreaming feature can't ship with two of three operations broken in practice. Users running brv dream see "completed" with zero operations and no errors — no way to know the feature isn't working.
  • What changed: Inline file contents directly in LLM prompts. Remove the sandbox-variable + code_exec indirection that the LLM often skipped.
    Update unit tests to assert on prompt content.
  • What did NOT change (scope boundary): Dream executor orchestration, gate logic, undo flow, dream log schema, operation schemas, review wiring. The silent catch {} blocks remain as a separate follow-up.

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes #
  • Related # (dreaming feature branch proj/dreaming)

Root cause (bug fixes only)

  • Root cause: Consolidate and synthesize passed file contents via agent.setSandboxVariableOnSession(sessionId, '__dream_*_files', payload) and instructed the LLM to invoke code_exec to read the variable before classifying. The LLM frequently skipped code_exec, evaluated paths without content, and returned an empty actions array. Prune already inlined content so it was unaffected.
  • Why this was not caught earlier: Unit tests mock the agent with canned responses, so the operation code path runs correctly in tests regardless of whether the live LLM would actually read the sandbox variable. The gap is between "code handles the response correctly" (tested) and "LLM produces that response on real input" (not tested). Added a pre-release auto-test suite at local-auto-test/dreaming/ to close this gap with real LLM calls against designed-to-force-operations fixtures.

Test plan

  • Coverage added:
    • Unit test (updated existing to assert prompt content)
    • Integration test
    • Manual verification (auto-test suite)
  • Test file(s):
    • test/unit/infra/dream/operations/consolidate.test.ts
    • test/unit/infra/dream/operations/synthesize.test.ts
    • test/unit/infra/dream/operations/prune.test.ts
  • Key scenario(s) covered:
    • Consolidate prompt includes PATH markers and sibling file contents
    • Synthesize prompt includes DOMAIN markers for each domain's _index.md
    • Prune candidate blocks use marker format with 1000-char preview
    • Auto-test force-merge: near-duplicate files → MERGE operation
    • Auto-test force-synthesize: 3 domains sharing FlushCoordinator → SYNTHESIZE
    • Auto-test force-archive: stale low-importance draft → ARCHIVE (unchanged)

User-visible changes

None for users of brv dream. The feature now produces operations it previously didn't produce, but the CLI surface, flags, and output format are unchanged. Dream log schema unchanged.

Evidence

Auto-test results before vs after the fix:

Before (unstashed code): 19/21 passed

  • force-merge: ✗ 0 operations (expected MERGE)
  • force-synthesize: ✗ 0 operations (expected SYNTHESIZE)
  • force-archive: ✓ ARCHIVE produced

After (this PR): 24/24 passed

  • force-merge: ✓ MERGE produced (1 file merged, other deleted)
  • force-synthesize: ✓ 2 SYNTHESIZE operations (files written with type: synthesis frontmatter)
  • force-archive: ✓ ARCHIVE (unchanged)

Unit tests: 6169 passing, 0 failing. Build clean. Lint clean (warnings pre-existing).

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint)
  • Type check passes (npm run build with tsc -b)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (N/A — no user-facing change)
  • No breaking changes
  • Branch is up to date with proj/dreaming

Risks and mitigations

  • Risk: Inlining full file contents in the prompt could exceed model context limits on very large contexts.
    • Mitigation (consolidate): capPayloadSize() enforces MAX_PAYLOAD_CHARS = 200_000 with changed-files-first eviction.
    • Mitigation (prune): Implicit cap — max 20 candidates × 1000-char preview per candidate (~20KB ceiling).
    • Gap (synthesize): No explicit cap on inlined _index.md content. Summaries are typically small, but a vault with many verbose domain summaries could hit model limits. Tracked as a follow-up task to add a cap similar to consolidate's.
  • Risk: Prompt wording shifted from "when in doubt, SKIP" to less conservative framing. Could produce false-positive merges.
    • Mitigation: All MERGE and TEMPORAL_UPDATE operations already set needsReview: true. Operations flagged for review are queued in brv review pending before affecting the context tree permanently.
      brv dream --undo can also revert the last dream wholesale.
  • Risk: Follow-up needed for silent error handling.
    • Mitigation: Not addressed in this PR to keep scope tight. Tracked as a separate item — summary.errors should reflect caught failures in the dream log so users can see when LLM calls fail.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @danhdoan's task in 2m 52s —— View job


PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

  • Gather context and understand the request
  • Read implementation files (consolidate, synthesize, prune)
  • Read test files
  • Review diff vs base branch
  • Analyze for correctness, security, and code quality
  • Post review with Conventional Comments and score

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling. All three operations have been updated consistently, and the test suite now validates prompt content rather than just call counts.


What's good

  • Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching unrelated orchestration or schema code.
  • capPayloadSize already existed and is correctly reused for consolidate. The 200K char cap protects against context overflow.
  • Atomic writes (randomUUID temp file + rename) are preserved throughout—no regressions on write safety.
  • Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact for LLM-supplied placement values.
  • Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected, not executed.
  • Test assertions upgraded—new tests now verify that PATH: markers and file body content appear in the actual prompt, which is the right level of test for this kind of behavioral fix.
  • Scope is tight. Orchestration, gate logic, dream log schema, and undo flow are untouched per the stated boundary.

Issues found (inline comments posted)

# File Severity Summary
1 consolidate.ts:306 nit marker string recreated on every map iteration — hoist it outside
2 prune.ts:290 nit .slice(0, 1000) on contentPreview is redundant (already capped in buildCandidatePayload)
3 consolidate.ts:311 synthesize.ts:301 suggestion Missing explicit "do NOT use code_exec" instruction — prune.ts has it; consolidate/synthesize don't. The root cause of this PR was exactly the LLM calling code_exec; worth being explicit in all three.
4 synthesize.ts:68 issue (tracked) No size cap on inlined _index.md content, unlike consolidate's capPayloadSize. Acknowledged in PR risks — worth a TODO comment so it's not lost.

Security notes

  • Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. Mitigation is the same as always: LLM output is schema-validated before any filesystem action is taken.
  • No new attack surface introduced. Removing the sandbox-variable path actually reduces complexity.

Minor observations

  • setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. That's fine—the interface requires the method and other operations may still use it—but the stubs silently pass even if the implementation accidentally re-adds the call. Consider adding a expect(agent.setSandboxVariableOnSession.called).to.be.false assertion in the prompt-content tests to lock the removal in place.

Comment thread src/server/infra/dream/operations/consolidate.ts Outdated
Comment thread src/server/infra/dream/operations/prune.ts Outdated
Comment thread src/server/infra/dream/operations/consolidate.ts
Comment thread src/server/infra/dream/operations/synthesize.ts
Comment thread src/server/infra/dream/operations/synthesize.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @danhdoan's task in 2m 48s —— View job


PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

  • Gather context and understand the request
  • Read changed files (consolidate, synthesize, prune + tests)
  • Review diff vs base branch (origin/proj/dreaming)
  • Analyze for correctness, security, and code quality
  • Post inline comments and summary review

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling (19/21 → 24/24). All three operations updated consistently. Tests now validate prompt content rather than sandbox-variable call counts, which is the right level of test for a behavioral prompt change.


What's good

  • Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching orchestration, gate logic, schema, or undo flow.
  • capPayloadSize correctly reused for consolidate. The 200K char cap protects against context overflow.
  • Atomic writes (randomUUID temp file + rename) preserved throughout — no regressions on write safety.
  • Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact.
  • Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected before any filesystem action.
  • Test assertions upgraded — the new tests verify PATH: / DOMAIN: markers and file body content appear in the actual prompt, locking the behavioral fix in place.
  • Scope is tight. Orchestration, gate logic, dream log schema, and undo flow untouched as stated.

Issues found (inline comments posted)

# File Severity Summary
1 consolidate.ts:306 nit marker recreated on every map iteration — hoist it outside (synthesize/prune already do this)
2 prune.ts:290 nit .slice(0, 1000) is redundant — buildCandidatePayload already caps contentPreview at 1000 chars
3 consolidate.ts:312 synthesize.ts:302 suggestion Missing "Do NOT use code_exec." instruction — prune.ts has it; the root cause of this PR was exactly the LLM calling code_exec in these two files
4 synthesize.ts:68 issue (tracked) No size cap on inlined _index.md content. Acknowledged in PR risks — worth a TODO comment so it isn't lost

Security notes

  • Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. LLM output is schema-validated before any filesystem action — same mitigation as always.
  • Removing the sandbox-variable path actually reduces complexity. No new attack surface introduced.

Minor observations

  • setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. The interface requires the method so the stubs are fine, but they silently pass even if the implementation accidentally re-adds the call. Consider adding expect(agent.setSandboxVariableOnSession.called).to.be.false in the prompt-content tests (consolidate includes path siblings test, synthesize creates session test, prune dedup between signals test) to lock the removal in place across all three operations.

Comment thread src/server/infra/dream/operations/consolidate.ts Outdated
Comment thread src/server/infra/dream/operations/consolidate.ts Outdated
Comment thread src/server/infra/dream/operations/prune.ts Outdated
Comment thread src/server/infra/dream/operations/synthesize.ts Outdated
Comment thread src/server/infra/dream/operations/synthesize.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @danhdoan's task in 3m 1s —— View job


PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

  • Gather context and understand the request
  • Read changed files (consolidate, synthesize, prune + tests)
  • Review diff vs base branch (origin/proj/dreaming)
  • Analyze for correctness, security, and code quality
  • Post inline comments and summary review

Summary

Score: 4 / 5

Correct diagnosis, clean fix, strong evidence. The root cause (setSandboxVariableOnSession → LLM silently skipping code_exec) is well-explained and the solution—inlining content directly in the prompt—is the right call. The auto-test before/after numbers are compelling (19/21 → 24/24). All three operations updated consistently, and the test suite now validates prompt content rather than sandbox-variable call counts, which is the right level of test for a behavioral prompt change.


What's good

  • Root-cause fix is precise. Removing the sandbox-variable indirection eliminates the failure mode without touching orchestration, gate logic, schema, or undo flow.
  • capPayloadSize correctly reused for consolidate. The 200K char cap protects against context overflow.
  • Atomic writes (randomUUID temp file + rename) preserved throughout — no regressions on write safety.
  • Path traversal guard in writeSynthesisFile (absPath.startsWith(contextTreeDir + '/')) remains intact.
  • Zod schema validation of all LLM responses means malformed or prompt-injected output is rejected before any filesystem action.
  • Test assertions upgraded — the new tests verify PATH: / DOMAIN: markers and file body content appear in the actual prompt, locking the behavioral fix in place.
  • Prompt wording tightened across all three operations: consolidate now says "Default to MERGE… SKIP only when genuinely unrelated" and "Returning all SKIP when duplicates exist is a failure, not caution." The prune rules are now specific criteria rather than "be conservative." These changes are intentional and correct given the root cause.
  • Scope is tight. Orchestration, gate logic, dream log schema, and undo flow untouched as stated.

Issues found (inline comments posted)

# File Severity Summary
1 consolidate.ts:304–309 nit marker recreated on every map iteration — hoist it outside (synthesize/prune already do this)
2 prune.ts:290 nit .slice(0, 1000) redundant — buildCandidatePayload already caps contentPreview at 1000 chars
3 consolidate.ts:312 synthesize.ts:302 suggestion Missing "Do NOT use code_exec." instruction — prune.ts has it; this PR's root cause was exactly the LLM making code_exec calls in these two files
4 synthesize.ts:68 issue (tracked) No size cap on inlined _index.md content. Acknowledged in PR risks — worth a TODO comment so it isn't lost

Security notes

  • Prompt injection: Inlining file content directly in the prompt is a standard RAG pattern. The context tree is user-owned data, so risk is low. LLM output is schema-validated before any filesystem action — same mitigation as always.
  • Removing the sandbox-variable path actually reduces complexity. No new attack surface introduced.

Minor observations

  • setSandboxVariableOnSession stubs remain in all three test fixtures but are never asserted on. The interface requires the method so the stubs are necessary, but they silently pass even if the implementation accidentally re-adds the call. Consider adding expect(agent.setSandboxVariableOnSession.called).to.be.false in the prompt-content tests (consolidate includes path siblings test, synthesize creates session test, prune dedup between signals test) to lock the removal in place across all three operations.

Comment thread src/server/infra/dream/operations/consolidate.ts
Comment thread src/server/infra/dream/operations/prune.ts Outdated
Comment thread src/server/infra/dream/operations/consolidate.ts Outdated
Comment thread src/server/infra/dream/operations/synthesize.ts Outdated
Comment thread src/server/infra/dream/operations/synthesize.ts
Consolidate and synthesize injected file content via
setSandboxVariableOnSession and told the LLM to call code_exec to read
it.
The LLM often skipped code_exec, saw only paths without content, and
returned {"actions": []} — producing zero operations on clear-cut cases.
The outer catch {} swallowed the silence, so dream logs reported
status: completed with no errors.

Inline file contents directly in the prompt as marked blocks. Remove the
sandbox-variable indirection. Tighten rule wording to nudge the LLM away
from the SKIP-everything default. Prune already inlined content; also
bump its preview size and rewrite conservative wording.

Unit tests updated to assert on prompt content instead of
sandbox-variable
calls. Auto-test suite (local-auto-test/dreaming) now passes 24/24 with
a
working LLM provider: force-merge, force-synthesize, and force-archive
scenarios all produce their expected operations end-to-end.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

  • Gather context and understand the request
  • Read changed files (consolidate, synthesize, prune + tests)
  • Review diff vs base branch (origin/proj/dreaming)
  • Analyze for correctness, security, and code quality
  • Post inline comments and summary review

I'll analyze this and get back to you.

View job run

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @danhdoan's task in 1m 50s —— View job


PR Review: enhance/ENG-2128 — Inline file contents in operation prompts

Tasks

  • Gather context and understand the request
  • Read changed files (consolidate, synthesize, prune + tests)
  • Review diff vs base branch (origin/proj/dreaming)
  • Analyze for correctness, security, and code quality
  • Post inline comments and summary review

Summary

Score: 4.5 / 5

Previous review feedback has been addressed cleanly. The three issues flagged as suggestions/nits in the prior reviews are all resolved in this sync. Two minor gaps remain open.


Previous review items — status

# File Previous finding Status
1 consolidate.ts:304 marker recreated on every map iteration Fixedmarker hoisted outside map
2 prune.ts:290 Redundant .slice(0, 1000) in buildPrompt FixedbuildPrompt now uses full c.contentPreview without an extra slice
3 consolidate.ts:310 Missing "Do NOT use code_exec." instruction Fixed — opening instruction now says "Do NOT use code_exec."
4 synthesize.ts:302 Missing "Do NOT use code_exec." instruction Fixed — same addition in synthesize
5 synthesize.ts:68 No size cap on inlined _index.md content Open — acknowledged as follow-up in PR description, but no TODO comment added yet

Remaining gaps

1. synthesize.ts:68 — Missing TODO comment for uncapped domain content

The PR description explicitly acknowledges the missing size cap as a follow-up. Without a TODO in the code, this is easy to forget when the file is next touched. Add one line:

// TODO: add a size cap for domain summaries (similar to consolidate's capPayloadSize)
// to guard against vaults with many verbose _index.md files hitting model context limits.
const prompt = buildPrompt(domains, existingSyntheses)

Fix this →

2. Tests — missing negative assertion for setSandboxVariableOnSession

All three test fixtures stub setSandboxVariableOnSession but never assert it is not called. If the implementation accidentally re-introduces a sandbox call, the tests still pass silently. Consider adding to each prompt-content test:

expect(agent.setSandboxVariableOnSession.called).to.be.false
  • consolidate.test.ts'includes path siblings as related files' test
  • synthesize.test.ts'creates session and passes domain summaries to LLM' test
  • prune.test.ts'caps candidates at 20' and 'dedup between signals' tests

Fix this →


What's good

  • All three previous nits/suggestions resolved cleanly in this sync.
  • prune.ts preview is now 1000 chars (up from 500) and inlined as-is — consistent with the same treatment in consolidate and synthesize.
  • Prompt wording improvements retained: the stronger "Returning all SKIP when duplicates exist is a failure, not caution" and the sharper ARCHIVE criteria in prune remain untouched.
  • Test assertions correctly pivot from setSandboxVariableOnSession call counts to prompt content inspection — that is the right level of test for a behavioral prompt change.
  • No regressions: atomic writes, path traversal guard, and Zod schema validation all intact.

Security notes

No new concerns since the prior review. Inline file content is a standard RAG pattern, user-owned data, and all LLM output is schema-validated before any filesystem action.

@danhdoan danhdoan merged commit 8f364c1 into proj/dreaming Apr 16, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants