feat: tag PreCompact checkpoints by EtanHey · Pull Request #285 · EtanHey/brainlayer

EtanHey · 2026-05-15T21:53:25Z

Summary

- add chunk_origin tagging for PreCompact checkpoints without repurposing the overloaded source column
- backfill existing checkpoint-shaped rows during VectorStore schema init and log the migration count in schema_migrations
- exclude checkpoint chunks from default search, add include_checkpoints, and expose brain_resume(session_id=None, lookback_days=7) for explicit recovery

Verification

- pytest tests/test_precompact_chunk_origin.py tests/test_search_filter_params.py tests/test_think_recall_integration.py::TestMCPToolCount -q → 29 passed
- pytest tests/test_precompact_chunk_origin.py tests/test_search_filter_params.py tests/test_hybrid_search.py tests/test_watcher_bridge.py tests/test_chunk_lifecycle.py tests/test_3tool_aliases.py -q → 135 passed
- ./scripts/run_tests.sh → 1860 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; isolated pytest 32 passed; bun 1 pass; regression shell passed
- pre-push hook reran ./scripts/run_tests.sh with the same passing counts
- snapshot backfill on /tmp/brainlayer-fm6-precompact-snapshot.db → 51 checkpoint rows tagged, 0 manual checkpoint-storage note false positives
- touched-file ruff check and ruff format --check passed

Notes

- Whole-repo ruff check / ruff format --check still report pre-existing script lint/format debt outside this PR; touched files are clean.
- cr review --plain could not start locally because the CLI is not connected to the repository's CodeRabbit organization; requesting CodeRabbit on the PR instead.

Note

Medium Risk
Introduces a schema migration/backfill (chunks.chunk_origin) and changes default search/KG retrieval behavior to exclude checkpoint content, which could affect existing workflows and query results; adds additional retry/overfetch logic around SQLite reads that may impact performance under load.

Overview
Adds first-class chunk_origin classification (new chunks.chunk_origin column + chunk_origin.py detection helpers) and tags incoming data across ingestion paths (store, watcher/queue drain, upserts, chunk updates), including a one-time backfill migration recorded in schema_migrations.

Changes default retrieval to exclude precompact_checkpoint chunks across hybrid/text/vector search, exact chunk-id lookups, and KG fact queries unless include_checkpoints=true, with KNN overfetch/caching tweaks to avoid starving results when checkpoints are filtered.

Exposes an explicit brain_resume MCP tool to fetch recent PreCompact checkpoints (optionally filtered by session_id, with lock-resilient retries) and wires the new parameter/tool through MCP schemas and tests.

Reviewed by Cursor Bugbot for commit 8c758a1. Bugbot is set up for automated code reviews on this repo.

Note

Add PreCompact checkpoint tagging, filtering, and `brain_resume` tool to VectorStore search

- Introduces a chunk_origin column to the chunks table with a backfill migration that detects and tags existing PreCompact checkpoint content using content-marker heuristics in chunk_origin.py.
- All write paths (store.py, vector_store.py, drain.py, watcher_bridge.py) now compute and persist chunk_origin on insert or update.
- Vector, binary, and hybrid KNN searches exclude precompact_checkpoint chunks by default, overfetching to compensate and truncating to the requested result count; an include_checkpoints flag opts back in.
- Adds a new brain_resume MCP tool in search_handler.py that queries recent PreCompact checkpoints filtered by lookback_days and optional session_id, with retry logic on SQLite busy/locked errors.
- KG fact queries and exact chunk-id lookups also respect the checkpoint exclusion by default.
- Risk: existing databases undergo a one-time backfill migration on first open; checkpoint count caching (keyed on PRAGMA data_version) adds reads on each KNN query when no cached value exists.
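
A minimal sketch of the add-column-then-backfill pattern summarized above. It uses the stdlib `sqlite3` module so it runs anywhere, whereas the PR itself goes through APSW. The marker prefix, the two-column `schema_migrations` layout, and the function name `migrate_chunk_origin` are assumptions for illustration; the real heuristics live in `chunk_origin.py`.

```python
import sqlite3

MIGRATION_ID = "2026_05_16_fm6_chunk_origin"  # migration name quoted in the walkthrough
# Hypothetical marker; the PR's actual content heuristics are not shown in this thread.
ASSUMED_PREFIX = "[PreCompact checkpoint]"


def migrate_chunk_origin(conn: sqlite3.Connection) -> int:
    """Add chunk_origin if missing, backfill once, and return the rows tagged."""
    # Assumed layout: one row per applied migration plus its affected-row count.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations "
        "(id TEXT PRIMARY KEY, rows_affected INTEGER)"
    )
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chunks)")}
    if "chunk_origin" not in columns:
        conn.execute("ALTER TABLE chunks ADD COLUMN chunk_origin TEXT DEFAULT 'unknown'")
    already_applied = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE id = ?", (MIGRATION_ID,)
    ).fetchone()
    if already_applied:
        return 0
    # Only rows still classified as unknown are candidates for tagging.
    cursor = conn.execute(
        "UPDATE chunks SET chunk_origin = 'precompact_checkpoint' "
        "WHERE COALESCE(chunk_origin, 'unknown') = 'unknown' "
        "AND substr(content, 1, ?) = ?",
        (len(ASSUMED_PREFIX), ASSUMED_PREFIX),
    )
    tagged = cursor.rowcount
    conn.execute("INSERT INTO schema_migrations VALUES (?, ?)", (MIGRATION_ID, tagged))
    conn.commit()
    return tagged
```

Recording the migration id makes a second open a no-op, which is what keeps the backfill one-time.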

Macroscope summarized 8c758a1.

Summary by CodeRabbit

New Features
- Automatic chunk-origin tagging and persistence, plus a one-time migration/backfill to populate origin data
- Search controls: new include_checkpoints flag (defaults to exclude) and overfetch logic so exclusions don’t starve results
- New brain_resume tool to return recent session checkpoint content
- Queue and ingestion now propagate chunk-origin metadata
Tests
- Extensive end-to-end coverage for detection, storage, migration, search filtering/overfetch, resume, and edge cases

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

coderabbitai · 2026-05-15T21:53:33Z

📝 Walkthrough

Walkthrough

This PR introduces chunk-origin classification to identify and filter PreCompact checkpoint content throughout the brainlayer system. New constants and detection logic classify chunks by origin; schema migrations backfill existing data; storage and event pipelines propagate origin metadata; search operations exclude checkpoints by default while allowing explicit inclusion; and a new brain_resume MCP tool retrieves recent checkpoints for session recovery.
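
A rough sketch of the classification order the walkthrough describes: a valid explicit origin wins, then PreCompact detection, then `unknown`. The origin strings and the two function names come from the walkthrough; the `explicit_origin` parameter and both marker patterns are assumptions, since the real patterns in `chunk_origin.py` are not quoted in this thread.

```python
import re
from typing import Optional

ORIGINS = {"user_explicit", "agent_explicit", "auto_summary", "precompact_checkpoint", "unknown"}

# Hypothetical markers standing in for the PR's prefix and wrapped-marker patterns.
_ASSUMED_PREFIX = re.compile(r"^\s*\[PreCompact checkpoint\]", re.IGNORECASE)
_ASSUMED_WRAPPED = re.compile(r"<precompact-checkpoint>.*?</precompact-checkpoint>", re.DOTALL)


def is_precompact_checkpoint_content(content: str) -> bool:
    """True when the text starts with the prefix or contains a wrapped marker."""
    return bool(_ASSUMED_PREFIX.search(content) or _ASSUMED_WRAPPED.search(content))


def detect_chunk_origin(content: str, explicit_origin: Optional[str] = None) -> str:
    """Return a valid explicit origin, else a content-based classification."""
    if explicit_origin in ORIGINS and explicit_origin != "unknown":
        return explicit_origin
    if is_precompact_checkpoint_content(content or ""):
        return "precompact_checkpoint"
    return "unknown"
```

Unrecognized explicit values fall through to content detection rather than being stored verbatim, so the column only ever holds one of the five known strings.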

Changes

PreCompact Checkpoint Origin System

| Layer / File(s) | Summary |
|---|---|
| Chunk origin classification and detection `src/brainlayer/chunk_origin.py` | New module defines origin string constants (`user_explicit`, `agent_explicit`, `auto_summary`, `precompact_checkpoint`, `unknown`), pattern matchers for PreCompact content, `is_precompact_checkpoint_content()` helper checking prefixes and wrapped markers, and `detect_chunk_origin()` returning valid explicit origins, precompact classification, or `unknown`. |
| SQLite busy helper `src/brainlayer/_helpers.py` | Adds APSW import and `_is_sqlite_busy_error()` to classify busy/locked errors for retry logic. |
| Vector store schema initialization and migration `src/brainlayer/vector_store.py` | Schema startup adds `chunk_origin TEXT DEFAULT 'unknown'` to `chunks`, conditionally ALTERs existing databases, performs one-time backfill migration (`2026_05_16_fm6_chunk_origin`) classifying legacy chunks via content heuristics, detects column presence in readonly mode, and creates `idx_chunks_chunk_origin`. |
| Chunk insertion and upsert with origin `src/brainlayer/store.py`, `src/brainlayer/vector_store.py` | `store_memory()` extends `INSERT` to include `chunk_origin` computed via `detect_chunk_origin(content)`. `VectorStore.upsert_chunks` includes origin in conflict-update `CASE` logic, preserving existing non-`unknown` origins and recomputing when content changes. `update_chunk` recomputes origin on content change. `get_chunk` returns origin when available and code invalidates checkpoint-count cache on relevant writes. |
| Event pipeline chunk origin propagation `src/brainlayer/drain.py`, `src/brainlayer/watcher_bridge.py`, `src/brainlayer/queue_io.py` | `drain` computes origin in `_apply_store`, `_apply_watcher`, `_apply_hook`. `watcher_bridge` derives origin from cleaned content and wires it into both arbitrated queue (`enqueue_watcher_chunk`) and direct SQLite insert paths. `enqueue_watcher_chunk` accepts optional `chunk_origin` and includes it in the queue payload. |
| Search repository filtering and overfetch logic `src/brainlayer/search_repo.py` | Adds `include_checkpoints: bool = False` to `SearchMixin.search`, `_binary_search`, and `hybrid_search`. Implements checkpoint-exclusion SQL clause, cached checkpoint-count lookup keyed by `PRAGMA data_version` with retry-on-busy, and `_effective_knn_k()` to increase KNN overfetch when checkpoints are excluded. Appends checkpoint-exclusion clauses across embedding KNN, text LIKE, binary KNN, and hybrid FTS5 paths, truncates KNN results to `n_results`, and threads flag into hybrid cache-key. |
| KG search checkpoint exclusion `src/brainlayer/kg_repo.py` | `KGMixin.kg_search` gains `include_checkpoints` and, when excluding checkpoints and `_has_chunk_origin` is true, LEFT JOINs the source chunk and filters out KG facts sourced from precompact checkpoints; `kg_hybrid_search` forwards the flag. |
| MCP tool registration and schema `src/brainlayer/mcp/__init__.py` | Imports `_brain_resume`, adds `include_checkpoints` boolean (default `False`) to `brain_search` and `brain_recall` (search-mode) input schemas, registers new read-only idempotent `brain_resume` tool with `session_id` and `lookback_days` inputs, and routes calls forwarding `include_checkpoints`. |
| Search handler checkpoint filtering and resume `src/brainlayer/mcp/search_handler.py` | Adds `_utcnow_iso()`. `_exact_chunk_lookup_result`, `_brain_search`, `_brain_recall`, and `_search` gain `include_checkpoints` parameter; exact lookup rejects precompact chunks when `include_checkpoints=False`. Implements `_escape_like_pattern`, a resume-query retry wrapper that uses `_is_sqlite_busy_error`, and `_brain_resume(session_id, lookback_days)` which fetches recent precompact checkpoint chunks (optionally scoped to a session), formats a markdown-like response, returns “No PreCompact checkpoints found.” when empty, and surfaces errors in structured outputs. |
| Comprehensive test coverage `tests/test_precompact_chunk_origin.py`, `tests/test_search_filter_params.py`, `tests/test_think_recall_integration.py`, `tests/test_tool_annotations.py` | New tests validate detection across formats, migration/backfill correctness, upsert/update origin recomputation, watcher/drain propagation, default-exclude search with optional inclusion, vector/binary overfetch behavior when filtering discards neighbors, cache invalidation, readonly legacy DB handling without `chunk_origin`, and `brain_resume` behaviors including retry, wildcard escaping, and error embedding. `brain_search` schema tests updated and MCP tool-count/annotations updated to include `brain_resume`. |
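
The summary mentions `_escape_like_pattern` and wildcard-escaping tests without showing them. Below is a small sketch of how LIKE wildcards are commonly escaped in SQLite, assuming a backslash escape character and a prefix match on a `session_id` column; both of those choices, and the helper `find_by_session_prefix`, are hypothetical.

```python
import sqlite3


def escape_like_pattern(value: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    # Escape the escape character first, otherwise later replacements get doubled.
    return (
        value.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def find_by_session_prefix(conn: sqlite3.Connection, session_id: str):
    """Return chunk ids whose session_id starts with the literal session_id."""
    pattern = escape_like_pattern(session_id) + "%"
    return conn.execute(
        "SELECT id FROM chunks WHERE session_id LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    ).fetchall()
```

Without the escaping step, a session id such as `a_b` would also match `axb...`, because `_` matches any single character in LIKE.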

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

EtanHey/brainlayer#79: Overlaps hybrid-search caching/keying and filter adjustments in search_repo.py.
EtanHey/brainlayer#263: Modifies MCP search flow and exact lookup routing, adjacent to _brain_search/_search changes.
EtanHey/brainlayer#214: Changes MCP→search plumbing and shared handler signatures similar to this PR's include_checkpoints threading.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.83% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: tag PreCompact checkpoints' accurately describes the main change: introducing a chunk_origin tagging system for PreCompact checkpoint detection and filtering. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

EtanHey · 2026-05-15T21:53:35Z

@coderabbitai review

EtanHey · 2026-05-15T21:53:35Z

@cursor @BugBot review

EtanHey · 2026-05-15T21:53:35Z

@codex review

cursor · 2026-05-15T21:53:38Z

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

coderabbitai · 2026-05-15T21:53:40Z

✅ Actions performed

Review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c615e4f76

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T21:57:40Z

+            if not include_checkpoints:
+                where_clauses.append("COALESCE(c.chunk_origin, 'unknown') != 'precompact_checkpoint'")


Over-fetch KNN results before dropping checkpoints

When checkpoint chunks are among the nearest k neighbors, this new predicate can discard the entire sqlite-vec candidate set because k is still only n_results unless one of the older filters triggers over-fetch. I reproduced this with 20 precompact_checkpoint vectors closer than 3 normal vectors: search(query_embedding=..., n_results=5) returned no rows even though normal chunks matched just outside the initial KNN window; the same issue applies to the binary KNN path below. Please treat the default checkpoint exclusion like the other post-KNN filters and increase effective_k before applying it.

Fixed in c0529bb: default checkpoint exclusion now triggers KNN over-fetch for both float and binary vector search paths. Added regressions for checkpoint-nearest-neighbor windows.
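
To make the failure mode concrete, here is a pure-Python sketch of the reported scenario: 20 checkpoint vectors sit closer to the query than 3 normal ones. `knn` is a stand-in for the sqlite-vec query, and the `overfetch` toggle exists only to show before and after; neither is the PR's actual code.

```python
from typing import List, Tuple

Row = Tuple[str, float, str]  # (chunk_id, distance, chunk_origin)
CHECKPOINT = "precompact_checkpoint"


def knn(rows: List[Row], k: int) -> List[Row]:
    """Stand-in for the sqlite-vec KNN query: the k nearest rows by distance."""
    return sorted(rows, key=lambda row: row[1])[:k]


def search(rows: List[Row], n_results: int, include_checkpoints: bool = False,
           overfetch: bool = True) -> List[Row]:
    """Filter checkpoints after KNN, optionally widening k to compensate."""
    k = n_results
    if not include_checkpoints and overfetch:
        # Widen the window by the number of rows the filter could discard.
        k += sum(1 for row in rows if row[2] == CHECKPOINT)
    candidates = knn(rows, k)
    if not include_checkpoints:
        candidates = [row for row in candidates if row[2] != CHECKPOINT]
    return candidates[:n_results]


rows = [(f"cp{i}", 0.01 * i, CHECKPOINT) for i in range(20)]
rows += [(f"n{i}", 1.0 + i, "unknown") for i in range(3)]
print(search(rows, 5, overfetch=False))  # empty: the window held only checkpoints
print([row[0] for row in search(rows, 5)])  # the three normal chunks
```

The post-filter is only safe when `k` covers every row it might drop, which is why the fix ties the widening to the checkpoint exclusion itself.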

EtanHey · 2026-05-15T22:12:23Z

@coderabbitai review

EtanHey · 2026-05-15T22:12:23Z

@codex review

EtanHey · 2026-05-15T22:12:23Z

@cursor @BugBot re-review

coderabbitai · 2026-05-15T22:12:28Z

✅ Actions performed

Review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0529bba23

chatgpt-codex-connector · 2026-05-15T22:16:01Z

+                        WHEN excluded.chunk_origin IS NOT NULL AND excluded.chunk_origin != 'unknown'
+                            THEN excluded.chunk_origin
+                        WHEN chunks.chunk_origin IS NULL
+                            THEN COALESCE(excluded.chunk_origin, 'unknown')
+                        ELSE chunks.chunk_origin


Recompute chunk_origin when upserted content changes

When an indexed source is reprocessed with the same stable chunk id (for example the source_file:index ids produced by index_new), this conflict handler updates content but preserves an existing precompact_checkpoint origin whenever the new content classifies as unknown. If a previous checkpoint-shaped chunk is replaced by normal content, the row remains tagged as a checkpoint and is silently filtered out of default search even though its content is no longer a checkpoint; the origin needs to track the newly upserted content in this case.

Fixed in 97033ec: upsert_chunks now recomputes chunk_origin when a stable chunk_id receives different content, so a checkpoint row can become unknown/normal content instead of staying checkpoint-tagged. Added regression: test_upsert_chunks_recomputes_origin_when_stable_chunk_content_changes.
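
One way to express "recompute when content changes, otherwise preserve a known origin" in a single upsert, sketched with stdlib `sqlite3`. The `CASE` ordering, the `classify` helper, and its marker prefix are assumptions; this is not the statement from commit 97033ec.

```python
import sqlite3

CHECKPOINT = "precompact_checkpoint"
ASSUMED_PREFIX = "[PreCompact checkpoint]"  # hypothetical marker


def classify(content: str) -> str:
    """Toy classifier standing in for detect_chunk_origin()."""
    return CHECKPOINT if content.startswith(ASSUMED_PREFIX) else "unknown"


# SQLite evaluates every SET expression against the pre-update row, so the
# CASE below still sees the old content even though content is also assigned.
UPSERT_SQL = """
INSERT INTO chunks (id, content, chunk_origin) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    chunk_origin = CASE
        WHEN excluded.content != chunks.content THEN excluded.chunk_origin
        WHEN excluded.chunk_origin != 'unknown' THEN excluded.chunk_origin
        ELSE COALESCE(chunks.chunk_origin, 'unknown')
    END,
    content = excluded.content
"""


def upsert_chunk(conn: sqlite3.Connection, chunk_id: str, content: str) -> None:
    """Insert or update a chunk, letting the origin follow new content."""
    conn.execute(UPSERT_SQL, (chunk_id, content, classify(content)))
    conn.commit()
```

With this ordering, a stable id whose checkpoint-shaped content is replaced by ordinary text drops back to `unknown`, while re-upserting identical content keeps whatever origin was already stored.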

EtanHey · 2026-05-15T22:29:26Z

@cursor @BugBot re-review

EtanHey · 2026-05-15T22:29:26Z

@codex review

EtanHey · 2026-05-15T22:29:26Z

@coderabbitai review

coderabbitai · 2026-05-15T22:29:31Z

✅ Actions performed

Review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97033ec905

chatgpt-codex-connector · 2026-05-15T22:33:15Z

+            checkpoint_count = 0
+        if checkpoint_count <= 0:
+            return n_results
+        return min(n_results + checkpoint_count, 1000)


Avoid starving normal results when checkpoints exceed the cap

When a DB has more than 1000 checkpoint vectors that rank ahead of a normal chunk for a query, this cap still asks sqlite-vec for only the first 1000 neighbors and the later chunk_origin != 'precompact_checkpoint' filter removes all of them, so default vector/hybrid search can return no normal results even though matching non-checkpoint chunks exist. I reproduced this with 1001 precompact_checkpoint rows at the query vector plus one nearby unknown row; search(..., n_results=1) returned an empty result. Since these checkpoints are expected to accumulate over time, the overfetch needs to avoid a fixed cap or apply the exclusion before/inside candidate selection.

Fixed in 05d215f: checkpoint-aware KNN widening no longer has the fixed 1000 cap for checkpoint filtering, so a DB with 1001+ nearer checkpoint vectors still fetches far enough to return normal chunks. Added regression: test_vector_search_does_not_starve_normal_results_after_many_checkpoints.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce112cc42e

chatgpt-codex-connector · 2026-05-15T23:20:22Z

                n_results=num_results,
                entity_name=entity_name,
                project_filter=normalized_project,
+                include_checkpoints=include_checkpoints,


Filter checkpoint-derived KG facts

When an entity is detected, include_checkpoints is only forwarded to the chunk search leg of kg_hybrid_search; both the SQL fact path above and kg_hybrid_search() still read KG relations without joining source_chunk_id back to chunks.chunk_origin. In the default include_checkpoints=False case, any facts extracted from PreCompact checkpoint chunks can still be returned for entity searches even though checkpoint chunks themselves are hidden, so checkpoint pollution remains visible through facts.
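
A sketch of the join this comment asks for, assuming a `kg_relations` table with `fact` and `source_chunk_id` columns (the table name and column layout are guesses; only `source_chunk_id` and `chunks.chunk_origin` are named in the thread).

```python
import sqlite3

# LEFT JOIN keeps facts whose source chunk is missing; COALESCE treats them as unknown.
FACTS_SQL = """
SELECT r.fact
FROM kg_relations AS r
LEFT JOIN chunks AS c ON c.id = r.source_chunk_id
WHERE (? OR COALESCE(c.chunk_origin, 'unknown') != 'precompact_checkpoint')
ORDER BY r.fact
"""


def kg_facts(conn: sqlite3.Connection, include_checkpoints: bool = False):
    """Return facts, hiding checkpoint-derived ones unless explicitly included."""
    return [row[0] for row in conn.execute(FACTS_SQL, (int(include_checkpoints),))]
```

An inner join would silently drop facts with no surviving source chunk, which is a different behavior change from the one requested here.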

chatgpt-codex-connector · 2026-05-15T23:20:22Z

+        where_clauses = [
+            "COALESCE(chunk_origin, 'unknown') = ?",


Guard brain_resume on legacy readonly databases

brain_resume unconditionally references chunk_origin, but VectorStore._init_readonly_db() explicitly supports opening existing databases without running migrations and records _has_chunk_origin=False when that column is absent. In that readonly/pre-migration context this query fails with no such column: chunk_origin and the new resume tool returns an error instead of falling back to content-pattern detection or an empty result.
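
A minimal sketch of the guard, using stdlib `sqlite3` and returning an empty result when the column is absent (the comment also allows a content-pattern fallback, which is not shown). The `created_at` column and both function names are assumptions.

```python
import sqlite3


def has_chunk_origin(conn: sqlite3.Connection) -> bool:
    """Check column presence without running any migration."""
    return any(row[1] == "chunk_origin" for row in conn.execute("PRAGMA table_info(chunks)"))


def recent_checkpoints(conn: sqlite3.Connection, limit: int = 5):
    """Return recent checkpoint ids, or nothing on a pre-migration database."""
    if not has_chunk_origin(conn):
        return []  # legacy schema: avoid "no such column: chunk_origin"
    return conn.execute(
        "SELECT id FROM chunks WHERE COALESCE(chunk_origin, 'unknown') = ? "
        "ORDER BY created_at DESC LIMIT ?",
        ("precompact_checkpoint", limit),
    ).fetchall()
```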

EtanHey · 2026-05-15T23:37:48Z

@coderabbitai review
@codex review
@cursor @BugBot re-review

coderabbitai · 2026-05-15T23:37:58Z

@EtanHey Sure, I'll kick off a review of PR #285!

✅ Actions performed

Review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5e58df8df

chatgpt-codex-connector · 2026-05-15T23:41:26Z

            content[:200],  # summary = first 200 chars
            json.dumps(tags) if tags else None,
            float(importance) if importance is not None else None,
+            detect_chunk_origin(content),


Invalidate checkpoint-count cache after storing checkpoints

When a long-lived VectorStore has already populated _checkpoint_count_cache (for example after any prior default vector search), this path can now insert a PreCompact checkpoint but only clears the hybrid cache; it never calls _invalidate_checkpoint_count_cache(). If the new checkpoint has an embedding (or is embedded before another invalidation), the next default KNN search still uses the stale lower k, sqlite-vec can return the checkpoint as the top candidate, and the checkpoint filter then drops it, yielding zero or too few normal memories until another code path happens to invalidate the count.

EtanHey · 2026-05-15T23:51:42Z

@coderabbitai review
@codex review
@cursor @BugBot re-review

coderabbitai · 2026-05-15T23:51:55Z

@EtanHey Sure, kicking off another review of PR #285! I'll go through all the changes in detail.

✅ Actions performed

Full review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0199ae3565

chatgpt-codex-connector · 2026-05-15T23:56:07Z

+        cached_count = getattr(self, "_checkpoint_count_cache", None)
+        if cached_count is not None:
+            checkpoint_count = cached_count


Refresh checkpoint counts after external writes

When a long-lived MCP VectorStore has already cached 0 checkpoints, checkpoints inserted by another process (for example the watcher/drain writer) never invalidate this in-memory value, so subsequent vector searches keep using k = n_results while the SQL filter removes the new nearest checkpoint rows. In that common multi-process path, normal memories just behind the checkpoint rows can disappear from default search until the server restarts; the count needs a TTL/DB-change check or should be recomputed instead of being cached indefinitely.
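
The summaries at the top of this thread say the count cache ended up keyed on `PRAGMA data_version`. A sketch of that idea with stdlib `sqlite3` follows; the class name and its shape are hypothetical. One caveat worth keeping in mind: `data_version` changes only for commits made by other connections, so writes on the same connection still need an explicit invalidation.

```python
import sqlite3


class CheckpointCountCache:
    """Cache the checkpoint count until another connection commits a change."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._version = None
        self._count = None

    def get(self) -> int:
        # data_version differs between two reads if another connection committed.
        version = self._conn.execute("PRAGMA data_version").fetchone()[0]
        if self._count is None or version != self._version:
            self._count = self._conn.execute(
                "SELECT COUNT(*) FROM chunks WHERE chunk_origin = 'precompact_checkpoint'"
            ).fetchone()[0]
            self._version = version
        return self._count

    def invalidate(self) -> None:
        """Call after writes on this same connection, which data_version misses."""
        self._count = None
```

This costs one cheap pragma read per search instead of a restart or a TTL, at the price of recounting after any external commit, including ones that never touched checkpoints.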

EtanHey · 2026-05-16T00:06:35Z

@coderabbitai review
@codex review
@cursor @BugBot re-review

coderabbitai · 2026-05-16T00:06:46Z

@EtanHey Sure, I'll kick off another review of PR #285!

✅ Actions performed

Review triggered.

chatgpt-codex-connector · 2026-05-16T00:10:57Z

Codex Review: Didn't find any major issues. 🚀

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/_helpers.py`:
- Around line 70-76: The current _is_sqlite_busy_error only checks
apsw.BusyError and two message strings, which can miss SQLITE_LOCKED paths;
update _is_sqlite_busy_error to also treat apsw.LockedError, and inspect the
APSW/SQLite result code on the exception (e.g., check getattr(exc, "result",
None) or any APSW result_code attribute) for SQLITE_BUSY and SQLITE_LOCKED (use
the apsw constants) and finally fall back to the existing casefold message
checks; reference the function _is_sqlite_busy_error and the apsw exception
classes (apsw.BusyError, apsw.LockedError) and the SQLite result codes
(SQLITE_BUSY, SQLITE_LOCKED) so retries trigger correctly under write
contention.

In `@src/brainlayer/mcp/search_handler.py`:
- Around line 184-185: The check for checkpoint chunks currently returns None
which allows the caller to continue into the generic search path; instead,
short-circuit by returning the same empty-result value the explicit chunk_id=
path uses (i.e., an empty list/collection of matches) so a checkpoint chunk-id
resolves as a clean miss; update the branch in search_handler.py that checks
include_checkpoints and CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT (the
chunk.get("chunk_origin") check) to return that empty-result type rather than
None.
- Around line 770-772: The except block currently builds error_result =
_error_result(f"Resume error: {exc}") and returns (error_result.content, {})
which strips the isError flag; change the handler to return the
_error_result(...) directly (i.e., return _error_result(f"Resume error: {exc}"))
so the MCP caller receives the full error object; update the except in
search_handler (the block that references error_result and _error_result) to
mirror other handlers in this file that return _error_result(...) directly.

---

Outside diff comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 331-375: The _kg_facts_sql function currently performs one-shot
cursor.execute calls and swallows all exceptions, causing intermittent loss on
SQLITE_BUSY; update _kg_facts_sql to retry the DB queries on transient SQLite
lock errors (sqlite3.OperationalError with "database is locked" or error code
SQLITE_BUSY) using the same pattern as _execute_resume_query_with_retry (or the
retry/backoff logic in _search): wrap the cursor.execute blocks that fetch the
entity id and the facts_raw query in a retry loop with a small exponential
backoff and a capped max attempts, only re-raising or returning [] after retries
are exhausted, and keep existing behavior for non-lock exceptions. Ensure you
reference and update the calls that use cursor.execute in _kg_facts_sql so both
the id lookup and the SELECT for facts are retried.
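
The first and last findings above both come down to classifying busy/locked errors by result code and retrying with backoff. The sketch below shows that pattern with the stdlib `sqlite3` module so it runs without APSW; the project code would check `apsw.BusyError` and `apsw.LockedError` instead. The function names, attempt count, and delays are illustrative assumptions.

```python
import sqlite3
import time

SQLITE_BUSY = 5    # primary result code for "database is locked"
SQLITE_LOCKED = 6  # primary result code for "database table is locked"


def is_busy_or_locked(exc: Exception) -> bool:
    """Classify transient lock errors by result code, then by message."""
    if not isinstance(exc, sqlite3.OperationalError):
        return False
    code = getattr(exc, "sqlite_errorcode", None)  # set by sqlite3 on Python 3.11+
    if code is not None:
        # Extended result codes keep the primary code in the low byte.
        return (code & 0xFF) in (SQLITE_BUSY, SQLITE_LOCKED)
    message = str(exc).casefold()
    return "database is locked" in message or "database table is locked" in message


def with_retry(fn, attempts: int = 5, base_delay: float = 0.05, sleep=time.sleep):
    """Run fn, retrying lock errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Non-lock errors and the final attempt propagate unchanged.
            if not is_busy_or_locked(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.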

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7d73754-e91c-461a-8008-0c8bfe53c87a

📥 Commits

Reviewing files that changed from the base of the PR and between cf6b5f7 and 46778ae.

📒 Files selected for processing (15)

src/brainlayer/_helpers.py
src/brainlayer/chunk_origin.py
src/brainlayer/drain.py
src/brainlayer/kg_repo.py
src/brainlayer/mcp/__init__.py
src/brainlayer/mcp/search_handler.py
src/brainlayer/queue_io.py
src/brainlayer/search_repo.py
src/brainlayer/store.py
src/brainlayer/vector_store.py
src/brainlayer/watcher_bridge.py
tests/test_precompact_chunk_origin.py
tests/test_search_filter_params.py
tests/test_think_recall_integration.py
tests/test_tool_annotations.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Cursor Bugbot
GitHub Check: test (3.12)
GitHub Check: test (3.13)
GitHub Check: test (3.11)
GitHub Check: Macroscope - Correctness Check

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Files:

src/brainlayer/_helpers.py
tests/test_think_recall_integration.py
src/brainlayer/queue_io.py
src/brainlayer/chunk_origin.py
tests/test_tool_annotations.py
src/brainlayer/store.py
src/brainlayer/watcher_bridge.py
tests/test_search_filter_params.py
src/brainlayer/drain.py
src/brainlayer/kg_repo.py
src/brainlayer/mcp/__init__.py
src/brainlayer/vector_store.py
src/brainlayer/search_repo.py
src/brainlayer/mcp/search_handler.py
tests/test_precompact_chunk_origin.py

src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Files:

src/brainlayer/_helpers.py
src/brainlayer/queue_io.py
src/brainlayer/chunk_origin.py
src/brainlayer/store.py
src/brainlayer/watcher_bridge.py
src/brainlayer/drain.py
src/brainlayer/kg_repo.py
src/brainlayer/mcp/__init__.py
src/brainlayer/vector_store.py
src/brainlayer/search_repo.py
src/brainlayer/mcp/search_handler.py

🪛 OpenGrep (1.20.0)

src/brainlayer/vector_store.py

[ERROR] 233-233: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

EtanHey · 2026-05-16T00:26:00Z

@coderabbitai review
@codex review
@cursor @BugBot re-review

Updated PR-A FM6 head to 8c758a1 after addressing latest review findings: APSW LockedError handling, exact checkpoint chunk-id no-fallback behavior, brain_resume MCP error preservation, and migration/runtime wrapped-marker parity.

coderabbitai · 2026-05-16T00:26:16Z

@EtanHey Sure! I'll kick off a fresh review of PR #285 against the updated head (8c758a1dd6ed5517a52983b9ef2f868dea66c332). The fixes you've described—APSW LockedError handling, exact checkpoint chunk-id no-fallback behavior, brain_resume MCP error preservation, and migration/runtime wrapped-marker parity—all sound like solid improvements. I'll cover those areas closely in the review.

✅ Actions performed

Review triggered.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c758a1dd6

chatgpt-codex-connector · 2026-05-16T00:30:35Z

+            cursor.execute(
+                """
+                UPDATE chunks
+                SET chunk_origin = ?
+                WHERE COALESCE(chunk_origin, ?) != ?


Handle LockedError during checkpoint backfill

When two BrainLayer processes open the same database for the first time after this change, this new backfill can hit an APSW LockedError/database table is locked on chunks; _init_db_with_retry() only retries apsw.BusyError, so a transient table lock here bypasses the startup backoff and fails MCP/store initialization instead of retrying. Please route the migration/update path through the same busy/locked classification used for reads, or catch LockedError in the init retry loop.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 8c758a1.

cursor · 2026-05-16T00:34:32Z

+        if not getattr(self, "_has_chunk_origin", True):
+            return None
+        column = f"{table_alias}.chunk_origin" if table_alias else "chunk_origin"
+        return f"COALESCE({column}, 'unknown') != 'precompact_checkpoint'"


Hardcoded magic string instead of imported constant in exclusion clause

Low Severity

_checkpoint_exclusion_clause hardcodes the string 'precompact_checkpoint' in the SQL fragment even though CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT is already imported in the same file and used correctly in _checkpoint_filtered_knn_k. Similarly, kg_repo.py uses the literal "precompact_checkpoint" in checkpoint_params.append(...) without importing or referencing the constant. If the constant value ever changes, these hardcoded strings will silently diverge.

Additional Locations (1)

src/brainlayer/kg_repo.py#L1004-L1005
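
A possible shape for the fix is to bind the constant as a query parameter rather than embedding the literal in the SQL text. This is a sketch under assumptions: the constant's value is the one quoted in the comment above, the function returns a `(fragment, params)` pair, which differs from the original helper's single-string return, the `"manual"` origin value is invented for the demo, and the stdlib `sqlite3` module stands in for the project's APSW connection.

```python
import sqlite3

# Value as quoted in the review comment; defined locally so the sketch is self-contained.
CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT = "precompact_checkpoint"


def checkpoint_exclusion_clause(table_alias=None):
    """Return (sql_fragment, params) that filters out checkpoint chunks."""
    column = f"{table_alias}.chunk_origin" if table_alias else "chunk_origin"
    # The origin value travels as a bound parameter, so it cannot drift from the constant.
    return f"COALESCE({column}, 'unknown') != ?", [CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, chunk_origin TEXT)")
conn.executemany(
    "INSERT INTO chunks (id, chunk_origin) VALUES (?, ?)",
    [(1, None), (2, CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT), (3, "manual")],
)

clause, params = checkpoint_exclusion_clause("c")
rows = conn.execute(
    f"SELECT c.id FROM chunks c WHERE {clause} ORDER BY c.id", params
).fetchall()
visible_ids = [row[0] for row in rows]  # untagged (NULL) and non-checkpoint rows remain
```

Rows with a `NULL` origin still pass the filter because `COALESCE` maps them to `'unknown'`, matching the behavior of the original fragment.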

cursor · 2026-05-16T00:34:32Z

+
+        if checkpoint_count <= 0:
+            return n_results
+        return n_results + checkpoint_count


Unbounded KNN k growth with checkpoint count

Medium Severity

_checkpoint_filtered_knn_k returns n_results + checkpoint_count without any upper bound. For databases accumulating many checkpoint chunks over time, every default search (which excludes checkpoints) will set the KNN k parameter to n_results + total_checkpoints, causing sqlite-vec to scan and rank an unbounded number of vectors. A database with thousands of checkpoints would make every simple 5-result search perform a multi-thousand-vector KNN scan.
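
A bounded variant could clamp the overfetch to a fixed ceiling. The sketch below is illustrative only: `capped_knn_k` and the `max_k=200` default are hypothetical names and values, not taken from the PR.

```python
def capped_knn_k(n_results, checkpoint_count, max_k=200):
    """Overfetch to compensate for filtered checkpoints, but never beyond max_k."""
    if checkpoint_count <= 0:
        return n_results
    # Never request fewer than n_results, even when the caller asks for more than max_k.
    return max(n_results, min(n_results + checkpoint_count, max_k))


small_db = capped_knn_k(5, 10)      # few checkpoints: full compensation
large_db = capped_knn_k(5, 5000)    # many checkpoints: clamped to the ceiling
wide_query = capped_knn_k(500, 10)  # request already above the ceiling
```

The trade-off is that a capped `k` can return fewer than `n_results` rows when checkpoints dominate the nearest neighbors, so a caller might need a second, wider fetch in that case.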

greptile-apps Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/mcp/search_handler.py Outdated

Comment thread src/brainlayer/vector_store.py

chatgpt-codex-connector Bot reviewed May 15, 2026

macroscopeapp Bot reviewed May 15, 2026

Comment thread src/brainlayer/mcp/search_handler.py

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from 3c615e4 to c0529bb Compare May 15, 2026 22:11

greptile-apps Bot reviewed May 15, 2026

chatgpt-codex-connector Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/search_repo.py Outdated

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from c0529bb to 97033ec Compare May 15, 2026 22:28

greptile-apps Bot reviewed May 15, 2026

chatgpt-codex-connector Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/mcp/search_handler.py

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from 97033ec to 05d215f Compare May 15, 2026 22:42

greptile-apps Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/mcp/search_handler.py Outdated

Comment thread src/brainlayer/mcp/search_handler.py

chatgpt-codex-connector Bot reviewed May 15, 2026

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from ce112cc to f5e58df Compare May 15, 2026 23:37

greptile-apps Bot reviewed May 15, 2026

chatgpt-codex-connector Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/store.py Outdated

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from f5e58df to 0199ae3 Compare May 15, 2026 23:51

greptile-apps Bot reviewed May 15, 2026

chatgpt-codex-connector Bot reviewed May 15, 2026

cursor Bot reviewed May 15, 2026

Comment thread src/brainlayer/search_repo.py Outdated

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from 0199ae3 to 46778ae Compare May 16, 2026 00:06

greptile-apps Bot reviewed May 16, 2026

cursor Bot reviewed May 16, 2026

Comment thread src/brainlayer/vector_store.py Outdated

coderabbitai Bot reviewed May 16, 2026

Comment thread src/brainlayer/_helpers.py

Comment thread src/brainlayer/mcp/search_handler.py Outdated

Comment thread src/brainlayer/mcp/search_handler.py Outdated

feat: tag precompact checkpoints

8c758a1

EtanHey force-pushed the feat/phase-1-fm6-precompact-tagging branch from 46778ae to 8c758a1 Compare May 16, 2026 00:25

greptile-apps Bot reviewed May 16, 2026

chatgpt-codex-connector Bot reviewed May 16, 2026

cursor Bot reviewed May 16, 2026

EtanHey merged commit fc9e6de into main May 16, 2026
7 checks passed

coderabbitai Bot mentioned this pull request May 18, 2026

fix: route BrainBar search through Python hybrid helper #293

Merged

		if not include_checkpoints:
			where_clauses.append("COALESCE(c.chunk_origin, 'unknown') != 'precompact_checkpoint'")

Add PreCompact checkpoint tagging, filtering, and brain_resume tool to VectorStore search
