Skip to content

Fix brain_expand for manual chunk ids#78

Merged
EtanHey merged 1 commit into
mainfrom
brainclaude/t1-expand-manual-chunks
Mar 13, 2026
Merged

Fix brain_expand for manual chunk ids#78
EtanHey merged 1 commit into
mainfrom
brainclaude/t1-expand-manual-chunks

Conversation

@EtanHey
Copy link
Copy Markdown
Owner

@EtanHey EtanHey commented Mar 13, 2026

Summary

  • add a regression test that expands a deterministic manual-* chunk through brain_expand
  • treat standalone/manual chunks as single-target context in get_context() instead of surfacing them as missing
  • preserve the existing missing-chunk error for truly unknown IDs

Verification

  • pytest -q tests/test_smart_search_entity_dedup.py -k manual_chunk_id_returns_target_content
  • pytest -q tests/test_smart_search_entity_dedup.py tests/test_search_chunk_id.py
  • ruff check src/brainlayer/search_repo.py tests/test_smart_search_entity_dedup.py

Summary by CodeRabbit

  • Bug Fixes

    • Fixed context retrieval to properly handle standalone chunks without conversation history, now returning the chunk information with content type details instead of error messages.
  • Tests

    • Added test coverage to verify chunk expansion and retrieval functionality works correctly with standalone chunks and manual entries.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 54f996d8-ee98-4502-b23d-4450cc371f5d

📥 Commits

Reviewing files that changed from the base of the PR and between 45069fa and dd6c55c.

📒 Files selected for processing (2)
  • src/brainlayer/search_repo.py
  • tests/test_smart_search_entity_dedup.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (1)
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Python package structure should follow the layout: src/brainlayer/ for package code, with separate modules for vector_store.py, embeddings.py, daemon.py, dashboard/, and mcp/ for different concerns
Use paths.py:get_db_path() for all database path resolution instead of hardcoding paths; support environment variable overrides and canonical path fallback (~/.local/share/brainlayer/brainlayer.db)
Lint and format Python code using ruff check src/ and ruff format src/
Preserve verbatim content for ai_code, stack_trace, and user_message message types during classification and chunking; skip noise content entirely; summarize build_log content; extract structure-only for dir_listing
Use AST-aware chunking with tree-sitter; never split stack traces; mask large tool output during chunking
Handle SQLite concurrency by implementing retry logic on SQLITE_BUSY errors; ensure each worker uses its own database connection
Prioritize MLX (Qwen2.5-Coder-14B-Instruct-4bit) on Apple Silicon (port 8080) as the enrichment backend; fall back to Ollama (glm-4.7-flash on port 11434) after 3 consecutive MLX failures; support backend override via BRAINLAYER_ENRICH_BACKEND environment variable
Brain graph API must expose endpoints: /brain/graph, /brain/node/{node_id} (FastAPI)
Backlog API must support endpoints: /backlog/items with GET, POST, PATCH, DELETE operations (FastAPI)
Provide brainlayer brain-export command to export brain graph as JSON for dashboard consumption
Provide brainlayer export-obsidian command to export as Markdown vault with backlinks and tags
For bulk database operations: stop enrichment workers first, checkpoint WAL before and after operations, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks with checkpoint every 3 batches, never delete from chunks while FTS trigger is active

Files:

  • src/brainlayer/search_repo.py
🧠 Learnings (1)
📚 Learning: 2026-03-12T14:22:54.798Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-12T14:22:54.798Z
Learning: Applies to src/brainlayer/mcp/**/*.py : MCP tools must implement: `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person` with legacy `brainlayer_*` aliases for backward compatibility

Applied to files:

  • tests/test_smart_search_entity_dedup.py
🔇 Additional comments (5)
src/brainlayer/search_repo.py (3)

584-584: LGTM: Adding content_type to the target chunk query.

This aligns with the surrounding context query on line 617 and ensures content type metadata is available for standalone chunks.


594-594: LGTM: Unpacking matches the updated SELECT.

The destructuring correctly handles the new content_type column.


596-611: Good fix for standalone/manual chunks.

The new behavior correctly returns a valid context structure instead of an error for chunks without conversation context. This ensures:

  • brain_expand works for manual-* chunks created via brain_store
  • The MCP handler renders the single-item context (per context snippet 1, neither error nor empty-context guards trigger)
  • Enrichment correctly gets an empty context_chunks list after filtering out the target (per context snippet 2)

One minor observation: the target dict (line 601) lacks content_type while the context entry includes it. This is likely fine since callers appear to read content_type from context entries, but worth noting for consistency if target is accessed directly elsewhere.

tests/test_smart_search_entity_dedup.py (2)

8-9: LGTM: Appropriate imports for test mocking.

SimpleNamespace provides a clean way to mock uuid4 return value with a .hex attribute, and patch enables the necessary mocking.


206-232: Good regression test for manual chunk expansion.

The test properly validates the fix:

  1. Deterministic UUID mocking ensures reproducible chunk ID
  2. Patches the correct vector store accessor for the MCP handler
  3. Asserts both the absence of the error message and presence of expected content

The patch target brainlayer.store.uuid.uuid4 is correct—uuid is imported as import uuid in brainlayer/store.py, so the patching approach properly mocks the uuid4 function at the correct module level.


📝 Walkthrough

Walkthrough

This change modifies the search repository's context retrieval to include content_type information and improves handling of chunks without conversation context, returning a properly formatted context entry instead of an error. A test is added to verify manual chunk ID expansion functionality.

Changes

Cohort / File(s) Summary
Context retrieval enhancement
src/brainlayer/search_repo.py
Updated get_context SELECT query to retrieve content_type field. Modified standalone target path to return a single-item context containing the target chunk with id, content, position, content_type, and is_target flag instead of returning an error with empty context.
Test coverage
tests/test_smart_search_entity_dedup.py
Added new test method to verify manual chunk ID expansion via brain_expand. Test mocks uuid4, creates a manual chunk, retrieves it, and asserts the returned content is accurate without "Unknown chunk_id" errors. Imports SimpleNamespace and patch for test utilities.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A context now complete with type so bright,
No more errors lurking in the dark of night,
Chunks stand alone with fields so true,
Manual IDs expand to show their view! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: fixing the brain_expand function to handle manual chunk IDs correctly.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch brainclaude/t1-expand-manual-chunks
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@EtanHey EtanHey merged commit 50d637d into main Mar 13, 2026
5 checks passed
@EtanHey EtanHey deleted the brainclaude/t1-expand-manual-chunks branch March 13, 2026 01:52
EtanHey added a commit that referenced this pull request Mar 14, 2026


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
EtanHey added a commit that referenced this pull request Mar 14, 2026
)

* Harden search validation and backfill coverage

* docs: update test count to 929 and add CHANGELOG entries for PRs #65-#78

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant