
feat: [ENG-2061] Synthesize operation — cross-domain pattern detection#395

Merged
danhdoan merged 3 commits into proj/dreaming from feat/ENG-2061
Apr 14, 2026

@RyanNg1403
Collaborator

Summary

  • Problem: No mechanism to detect and persist cross-domain patterns in the context tree
  • Why it matters: Recurring cross-cutting insights become searchable without LLM calls, which makes retrieval cheaper
  • What changed: Added synthesize.ts pipeline (domain summary collection, LLM analysis, BM25 dedup, file creation), wired into executor after consolidate
  • What did NOT change (scope boundary): No new task types, no schema changes, no CLI changes

Type of change

  • New feature

Scope (select all touched areas)

  • Server / Daemon

Linked issues

  • Closes ENG-2061
  • Related ENG-2060

Root cause (bug fixes only, otherwise write N/A)

N/A (includes minor fix: manifest rebuild path in dream-undo was doubled)

Test plan

  • Coverage added:
    • Unit test
    • Manual verification only
  • Test file(s): test/unit/infra/dream/operations/synthesize.test.ts
  • Key scenario(s) covered:
    • Preconditions: < 2 domains, empty tree, skip _ and . dirs
    • LLM interaction: session lifecycle, domain summaries passed, empty response, LLM failure
    • File creation: correct placement, frontmatter (type/maturity/confidence/sources/synthesized_at), evidence section
    • Dedup: BM25 > 0.8 skip, 0.5-0.8 skip, < 0.5 create
    • Existing synthesis listed in prompt, name collision skip
    • Multiple candidates, slugify with special chars, needsReview gating
    • Abort signal respected
    • Interactive: 15+ curate commands, dream with consolidate+synthesize, JSON output confirms both sandbox variables read by LLM

User-visible changes

  • brv dream now runs synthesize after consolidate (transparent to user)
  • New synthesis files appear as regular draft .md files with type: synthesis frontmatter
  • brv search returns synthesis files for relevant queries (no additional wiring needed)

Evidence

Interactive testing confirmed:

  • JSON output shows __dream_synthesize_domains sandbox variable read via code_exec
  • Both consolidate and synthesize execute in sequence within a single dream
  • LLM correctly returns empty syntheses when no meaningful patterns found (quality gate working)
  • Dream undo works after combined consolidate+synthesize dream

Checklist

  • Tests added or updated and passing (npm test — 5725 passing)
  • Lint passes (npm run lint — 0 errors)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • No breaking changes

Risks and mitigations

  • Risk: Synthesize adds LLM cost to every dream even with 0 candidates
    • Mitigation: Single LLM call; < 2 domains short-circuits before any LLM interaction. Plan notes "can be disabled by returning [] early" if low-value.
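
The short-circuit described in the mitigation can be sketched roughly as follows. This is a minimal illustration, not the code in synthesize.ts: `selectDomains`, `shouldCallLlm`, and `MIN_DOMAINS` are hypothetical names, and the only behavior taken from the PR is "skip `_` and `.` directories" and "fewer than 2 domains means no LLM interaction".

```typescript
// Hypothetical helpers; the real synthesize.ts may structure this differently.
const MIN_DOMAINS = 2;

/** Keep only directory names that count as domains (skip `_` and `.` prefixes). */
function selectDomains(dirNames: string[]): string[] {
  return dirNames.filter((name) => !name.startsWith("_") && !name.startsWith("."));
}

/** True only when there are enough domains to justify the single LLM call. */
function shouldCallLlm(dirNames: string[]): boolean {
  return selectDomains(dirNames).length >= MIN_DOMAINS;
}
```

Because the check runs on directory names alone, it costs nothing beyond a directory listing when the tree has zero or one domain.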

Add synthesize pipeline: collect domain _index.md summaries, LLM
cross-domain analysis (single call), BM25 dedup (>0.5 threshold),
write as regular draft context files with type:synthesis frontmatter.

Also fixes manifest rebuild path in dream-undo (was passing
contextTreeDir instead of letting the service use its configured
baseDirectory, causing doubled path).
@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Claude finished @RyanNg1403's task in 3m 38s


PR Review: feat: [ENG-2061] Synthesize operation

Tasks

  • Gather context and understand the request
  • Read changed files
  • Analyze code for correctness, security, and quality
  • Post inline comments
  • Post overall summary

Overall Score: 3 / 5

Good pipeline design — clear flow, atomic writes, proper session lifecycle, and thorough test coverage. One security issue must be addressed before merge; a few code quality items are worth fixing.


Security (must fix)

LLM path traversal via placement: candidate.placement comes from LLM JSON output, is validated only as z.string() (no domain-list constraint in SynthesisCandidateSchema), and is passed directly into join(contextTreeDir, relativePath). Node's join() resolves .. segments, so a malicious or hallucinated response like { "placement": "../../etc" } would write a file outside the context tree. The access() name-collision check does not guard against this.

Two-layer fix needed:

  1. Schema layer (dream-response-schemas.ts): can't constrain at schema definition time (domains are runtime values), so add a post-parse domain-list check in synthesize.ts before calling writeSynthesisFile.
  2. Path layer (writeSynthesisFile): add a startsWith(contextTreeDir + '/') guard, mirroring the existing safePath() pattern in dream-undo.ts:128–133.

See inline comment at synthesize.ts:219.
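
A minimal sketch of the path-layer guard, assuming a helper shaped like the one below. `safeSynthesisPath` is a hypothetical name (the review only says dream-undo.ts has a similar safePath()), and it uses `path.sep` where the review writes `'/'`.

```typescript
import { resolve, sep } from "node:path";

// Hypothetical helper, not the project's safePath().
function safeSynthesisPath(contextTreeDir: string, placement: string, fileName: string): string {
  const root = resolve(contextTreeDir);
  // resolve() collapses ".." segments into an absolute path before the prefix check.
  const target = resolve(root, placement, fileName);
  if (!target.startsWith(root + sep)) {
    throw new Error(`placement escapes the context tree: ${placement}`);
  }
  return target;
}
```

Appending the separator to the root before comparing matters: without it, a sibling directory such as `/tmp/tree-other` would pass a bare `startsWith("/tmp/tree")` check.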


Correctness

  • Manual YAML serialization is fragile (synthesize.ts:230–248): domain names are written unescaped into a hand-rolled YAML block. A domain directory named auth:v2 or api#v1 (valid on all OSes) would corrupt the frontmatter. js-yaml is already imported — use dump() for generation too. (Inline comment at line 230.)
  • as cast in parseFrontmatterType (synthesize.ts:188): violates the project's "avoid as Type assertions" guideline. The narrowing check two lines later makes the cast unnecessary. (Inline comment at line 188.)
  • dream-undo.ts fix (line 82, buildManifest() → no-arg): correct — the manifest service's signature has dir?: string optional, and the undo caller should let the service resolve its own base directory. Good catch.
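
For the `as` cast item, one way to narrow without an assertion is sketched below. The signature is an assumption (the PR does not show how parseFrontmatterType receives its input); the sketch treats the parsed frontmatter as `unknown` and relies on `in`-operator narrowing, which requires TypeScript 4.9 or later.

```typescript
/** Returns the frontmatter `type` when it is a string, otherwise undefined. */
function parseFrontmatterType(raw: unknown): string | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  // `in` narrows `raw` to an object that has a `type` property of unknown type;
  // the typeof check then narrows that property to string, so no cast is needed.
  if ("type" in raw && typeof raw.type === "string") {
    return raw.type;
  }
  return undefined;
}
```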

Performance / Cost

  • Synthesize runs unconditionally every dream cycle (dream-executor.ts:117): with ≥ 2 domains (steady state), every dream cycle fires an LLM call, even when changedFiles is empty. Gating synthesize() on changedFiles.size > 0 would skip the LLM call entirely when nothing changed. (Inline comment at line 117.)
  • Abort signal not propagated to LLM call (synthesize.ts:74): the signal is checked at entry but not forwarded to executeOnSession. A timeout firing mid-call won't interrupt the HTTP request, delaying lock release. (Inline comment at line 74.)
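
The difference between checking the signal at entry and forwarding it can be illustrated with a small sketch. `fakeLlmCall` and `synthesizeSketch` are hypothetical stand-ins; the real executeOnSession signature is not shown in this review.

```typescript
// Hypothetical stand-in for an LLM request that honours an AbortSignal.
function fakeLlmCall(prompt: string, signal?: AbortSignal): Promise<string> {
  return new Promise((resolvePromise, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => resolvePromise(`echo: ${prompt}`), 50);
    signal?.addEventListener(
      "abort",
      () => {
        clearTimeout(timer); // cancel the in-flight work immediately
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });
}

// The caller forwards its signal instead of only checking it at entry.
async function synthesizeSketch(prompt: string, signal?: AbortSignal): Promise<string | undefined> {
  if (signal?.aborted) return undefined;
  try {
    return await fakeLlmCall(prompt, signal);
  } catch {
    return undefined; // aborted mid-call: return promptly so the lock can be released
  }
}
```

If the signal were only checked at entry, an abort fired after the call started would have no effect until the request finished on its own.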

Design Observations

  • BM25 dedup searches all file types (synthesize.ts:204): a regular knowledge file with high lexical overlap will suppress a synthesis candidate, even though an individual documented fact is not the same thing as a cross-domain synthesis. If the goal is "don't duplicate existing synthesis files", scoping the search to type: synthesis documents would be more precise. (Inline comment at line 204.)
  • SYNTHESIZE/UPDATE undo branch (dream-undo.ts:225): the current synthesize() only emits action: 'CREATE' operations, making the UPDATE handler dead code. It's defensively forward-looking, but if UPDATE is never planned, remove it; if it is planned, the undo handler should also capture previousTexts.

Tests

Coverage is strong — precondition guards, LLM lifecycle, file creation, frontmatter fields, dedup thresholds, name collision, abort signal, needsReview gating, special-char slugification, and multi-candidate output are all tested. Missing:

  • Test for placement path traversal (e.g. placement: '../../etc') — especially important given the security finding above.
  • Test that verifies synthesize is skipped when changedFiles is empty (once that gate is added).


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1770730b1

BM25 search against the full tree always returned score 1.0 because
synthesis derives from domain summaries already in the tree. Fixed to
only match against existing type:synthesis files, which is the correct
semantic — avoid recreating an existing synthesis, not avoid
synthesizing concepts the tree already covers.

Also fixes manifest rebuild path in dream-undo (buildManifest() not
buildManifest(contextTreeDir) — service already has baseDirectory).
@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Claude finished @RyanNg1403's task in 2m 40s


PR Review: feat: [ENG-2061] Synthesize operation

Tasks

  • Gather context and understand the request
  • Read changed files
  • Analyze code for correctness, security, and quality
  • Post overall summary

Overall Score: 3 / 5

The second commit (027b49a) correctly scopes BM25 dedup to synthesis files only — that design fix is appreciated and improves quality. However, the critical security issue from the previous review remains unaddressed, along with several correctness and performance items. The pipeline design, session lifecycle, atomic writes, and test coverage (15+ scenarios) are all solid.


What changed since last review

  • Fixed: BM25 dedup now only matches against pre-collected synthesis files (existingSyntheses), not the whole context tree. isDuplicateCandidate filters results.results.find((r) => existingSyntheses.includes(r.path)) — correct approach.
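
The scoped dedup check might look roughly like this. It is a sketch under assumptions: the `score` field and the `SearchHit` shape are hypothetical (the review only mentions `r.path`), and the 0.5 cut-off is taken from the PR description, which leaves the behavior at exactly 0.5 ambiguous.

```typescript
// Assumed result shape: only `path` appears in the review; `score` is hypothetical.
interface SearchHit {
  path: string;
  score: number;
}

// From the PR description: scores below 0.5 create a file, 0.5 and above skip.
const DUPLICATE_THRESHOLD = 0.5;

function isDuplicateCandidate(hits: SearchHit[], existingSyntheses: string[]): boolean {
  const known = new Set(existingSyntheses);
  // Hits on non-synthesis files are ignored, however high they score.
  return hits.some((hit) => known.has(hit.path) && hit.score >= DUPLICATE_THRESHOLD);
}
```

With this scoping, a domain _index.md that scores 1.0 no longer suppresses a candidate, which is the failure the second commit message describes.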

Security (must fix before merge)

Path traversal via placement — still unresolved from the previous review. candidate.placement is LLM output validated only as z.string(). join(contextTreeDir, candidate.placement + '/' + slug) will silently resolve .. segments. A hallucinated or adversarially injected placement: "../../etc" would write files outside the context tree, and the access() collision check does not guard against this.

The fix is two lines; dream-undo.ts already has safePath() doing exactly this. See inline comment at synthesize.ts:224.

Correctness

  • Write failures silently drop accumulated results (synthesize.ts:94–101): if any writeSynthesisFile call throws mid-loop, the outer catch returns [], discarding operations already written to disk. Those files now exist but won't appear in the dream log, making undo incomplete. Handle per-candidate with individual try/catch, mirroring how consolidate works. See inline comment at line 94.
  • Manual YAML serialization is fragile (synthesize.ts:237–255): domain names and facts are written unescaped. A directory named auth:v2 corrupts the frontmatter. js-yaml is already imported, so use dump() for generation too. See inline comment at line 237.
  • as cast in parseFrontmatterType (synthesize.ts:189): violates the project's "avoid as Type" guideline. The guard two lines later narrows correctly; the cast is redundant and unsafe. See inline comment at line 189.
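
The per-candidate handling suggested for the write loop can be sketched as below. The types and `writeAll` are hypothetical, `write` stands in for writeSynthesisFile (whose real signature is not shown here), and the sketch is synchronous for brevity where the real loop is presumably async.

```typescript
interface Candidate {
  title: string;
}

interface Operation {
  action: "CREATE";
  title: string;
}

// `write` stands in for writeSynthesisFile; its real signature is an assumption.
function writeAll(candidates: Candidate[], write: (c: Candidate) => void): Operation[] {
  const operations: Operation[] = [];
  for (const candidate of candidates) {
    try {
      write(candidate);
      operations.push({ action: "CREATE", title: candidate.title });
    } catch {
      // Skip this candidate only. Operations recorded so far stay in the result,
      // so the dream log and undo still know about files already on disk.
    }
  }
  return operations;
}
```

With a single outer try/catch, a failure on the second of three candidates would return an empty list even though the first file had already been written.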

Performance

  • Abort signal not forwarded to LLM call (synthesize.ts:74): deps.signal is checked at entry but never passed to executeOnSession. A mid-call timeout won't cancel the HTTP request, delaying lock release. See inline comment at line 74.
  • Synthesize fires unconditionally (dream-executor.ts:117): with ≥ 2 domains, every dream cycle triggers an LLM call even when changedFiles is empty. Gating on changedFiles.size > 0 avoids unnecessary cost. See inline comment at line 117.

Tests

Coverage is strong for the scenarios tested. Still missing (as noted in the previous review):

  • A test for placement: '../../etc' path traversal — especially important given the security finding.
  • A test that synthesize skips the LLM call when changedFiles is empty (once that gate is added).

- Path traversal guard on LLM-controlled placement (reuses safePath pattern)
- Replace as-assertion with in-operator narrowing in parseFrontmatterType
- Use js-yaml dump() for frontmatter serialization (handles special chars)
- Propagate abort signal through executeOnSession interface to LLM calls
- Gate synthesize on changedFiles.size > 0 to skip no-op dream cycles
- Per-candidate error handling in write loop to preserve partial results
@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Claude finished @RyanNg1403's task in 1m 46s


PR Review: feat: [ENG-2061] Synthesize operation

Tasks

  • Gather context and understand the request
  • Read changed files (latest commit)
  • Analyze code for correctness, security, and quality
  • Post overall summary

Overall Score: 4.5 / 5

Excellent follow-through — every issue raised in the two previous review passes is now resolved. The commit message accurately lists all six items addressed. The pipeline design, atomic writes, session lifecycle, and test coverage are all solid.


What changed in this commit (a8e07bc)

| Issue | Status |
|---|---|
| Path traversal guard on placement | ✅ Fixed: resolve() + startsWith(contextTreeDir + '/') guard |
| as-cast in parseFrontmatterType | ✅ Fixed: replaced with 'type' in raw + typeof raw.type === 'string' narrowing chain |
| Manual YAML serialization | ✅ Fixed: yamlDump(frontmatter, {lineWidth: -1, sortKeys: true}) |
| Abort signal not propagated to LLM | ✅ Fixed: signal?: AbortSignal added to ICipherAgent.executeOnSession interface and wired through CipherAgent, synthesize, and consolidate |
| Synthesize fires every cycle | ✅ Fixed: gated on changedFiles.size > 0 in dream-executor.ts:117 |
| Write failures dropped partial results | ✅ Fixed: per-candidate try/catch in write loop |
| Tests: path traversal case | ✅ Added: 'rejects candidate with path-traversal placement' |
| Tests: partial write failure | ✅ Added: 'preserves successful results when a later candidate fails to write' |
| Tests: signal forwarded to executeOnSession | ✅ Added: 'passes abort signal to executeOnSession' |

Remaining Minor Items

1. Dead code: SYNTHESIZE/UPDATE branch in dream-undo.ts:219–231

undoSynthesize guards against UPDATE with a throw, but synthesize() only ever emits action: 'CREATE'. The branch is unreachable with the current implementation. The previous review noted this — if UPDATE is never planned, remove it; if it is planned, add a comment saying previousTexts capture will be needed before it can be undone. As-is, it's defensively forward-looking but dead code.

2. No test for changedFiles.size > 0 gate in dream-executor.test.ts

The gate was added (correctly), but dream-executor.test.ts has no test asserting that synthesize() is skipped when changedFiles is empty. Low-priority given the gate is trivial — a one-liner ternary — but consistent with the project's coverage standards.
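
A test for the gate could be as small as the sketch below. Only the `changedFiles.size > 0` condition comes from the PR; `runSynthesizeStep` and the counting stub are hypothetical and do not reflect the actual structure of dream-executor.ts or its test file.

```typescript
type Synthesize = () => string[];

// Hypothetical slice of the executor: only the gate itself comes from the PR.
function runSynthesizeStep(changedFiles: Set<string>, synthesize: Synthesize): string[] {
  return changedFiles.size > 0 ? synthesize() : [];
}

// Counting stub that records how often synthesize would have been invoked.
let synthesizeCalls = 0;
const synthesizeStub: Synthesize = () => {
  synthesizeCalls += 1;
  return ["auth/retry-pattern.md"];
};
```

The assertion of interest is that the stub's call count stays at zero when `changedFiles` is empty and increments once when it is not.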


What's solid

  • Path traversal fix is correct: using resolve() (not join()) means ../../etc is fully normalized to an absolute path before the startsWith check. This is exactly the right approach and mirrors safePath() in dream-undo.ts.
  • parseFrontmatterType narrowing is clean: the 'type' in raw + typeof raw.type === 'string' chain means TypeScript narrows raw.type to string without any cast.
  • yamlDump with sortKeys: true produces deterministic output, which is a bonus for diffs.
  • Signal propagation was done end-to-end — interface, implementation, callers (synthesize and consolidate) all updated in one commit.
  • Test file now covers path traversal, partial write failure, and abort-signal forwarding to executeOnSession, for 19 scenarios total.

Collaborator

@danhdoan danhdoan left a comment


lgtm

@danhdoan danhdoan merged commit 98f7384 into proj/dreaming Apr 14, 2026
6 checks passed
@RyanNg1403 RyanNg1403 deleted the feat/ENG-2061 branch April 14, 2026 16:05