feat: Phase 5 — search/store hardening by EtanHey · Pull Request #26 · EtanHey/brainlayer

EtanHey · 2026-02-23T17:13:55Z

Summary

- Project scoping: auto-detect the project from CWD via ~/.config/brainlayer/scopes.yaml; brain_search auto-scopes when the project param is None
- Auto-importance: keyword-based heuristic scoring in brain_store (baseline 3; architectural +3, prohibition +2, length +1, file ref +1; capped at 10)
- Decision tracking: confidence_score, outcome, reversibility, files_changed fields in brain_store for type=decision
- phase_commits table: SQLite table for commit history with phase/session/project linkage
- Post-commit hook: hooks/post-commit.py auto-stores git commits into BrainLayer
- Store queue buffer: on a DB lock, queue to pending-stores.jsonl; auto-flush on the next successful store
- CLI: brainlayer hooks install (symlinks the hook), brainlayer flush (manual queue drain)
- Scripts: scope-config.py (generate scopes.yaml from git repos), phase-boundaries.py (phase commit report)
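The auto-importance rule can be sketched as follows. This is a minimal illustration, not the code in brain_store: the PR states only the weights (baseline 3, architectural +3, prohibition +2, length +1, file ref +1, cap 10), so the keyword lists, the 500-character length threshold and the file-reference regex below are assumptions.

```python
import re

# Hypothetical keyword lists: the PR names the categories, not the actual words.
ARCHITECTURAL_KEYWORDS = ("architecture", "design", "schema", "migration", "refactor")
PROHIBITION_KEYWORDS = ("never", "don't", "do not", "must not", "avoid")
LONG_CONTENT_CHARS = 500  # assumed threshold for the "length +1" bonus
FILE_REF = re.compile(r"\b[\w./-]+\.(?:py|ts|js|md|yaml|toml|json)\b")  # assumed pattern

BASELINE = 3
CAP = 10


def auto_importance(content: str) -> int:
    """Score 1-10 from simple content heuristics (sketch, not BrainLayer's code)."""
    text = content.lower()
    score = BASELINE
    if any(word in text for word in ARCHITECTURAL_KEYWORDS):
        score += 3
    if any(word in text for word in PROHIBITION_KEYWORDS):
        score += 2
    if len(content) > LONG_CONTENT_CHARS:
        score += 1
    if FILE_REF.search(content):
        score += 1
    return min(score, CAP)


if __name__ == "__main__":
    print(auto_importance("Never edit schema in src/brainlayer/store.py"))
```

With these weights the bonuses add up to exactly 10, so the `min()` cap only matters if more bonuses are added later. Plain substring matching also means "design" fires inside "designated"; a word-boundary regex would be stricter.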

Test plan

- 25 new tests in tests/test_phase5.py, all passing
- 258 total tests passing (0 failures, 2 skipped)
- Ruff lint clean on all modified files
- No regressions in the existing test suite

🤖 Generated with Claude Code

Note

Medium Risk
Touches core MCP search/store paths and adds new persistence/queueing behavior around SQLite locking; failures are mostly best-effort, but regressions could affect write reliability and project scoping.

Overview
Hardens BrainLayer search/store behavior and adds phase/decision tracking. brain_search now auto-scopes project from CWD via a new scoping.py + scopes.yaml mapping, while brain_store gains keyword-based auto-importance when importance is omitted.

Extends stored-memory metadata and improves write reliability. store_memory and the MCP brain_store schema now accept decision-tracking fields (confidence_score, outcome, reversibility, files_changed), and stores that hit SQLite “locked/busy” errors are buffered to pending-stores.jsonl and auto-flushed on the next successful write (with a new brainlayer flush CLI command).
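A minimal sketch of the lock-tolerant write path, assuming the store call raises `sqlite3.OperationalError` with a message containing "locked" or "busy" (SQLite reports SQLITE_BUSY as "database is locked"). The names `store_or_queue` and `queue_store` are hypothetical stand-ins for BrainLayer's internal helpers. Other operational errors are re-raised rather than queued, so a schema bug is not silently buffered.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Callable


def is_lock_error(exc: sqlite3.OperationalError) -> bool:
    """True for SQLITE_BUSY/SQLITE_LOCKED-style messages."""
    message = str(exc).lower()
    return "locked" in message or "busy" in message


def queue_store(queue_path: Path, item: dict[str, Any]) -> None:
    """Append one pending store as a JSON line (hypothetical helper)."""
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(item) + "\n")


def store_or_queue(store: Callable[..., Any], queue_path: Path, **item: Any) -> dict[str, Any]:
    """Try the real store; on a lock error, buffer the request instead of failing."""
    try:
        result = store(**item)
    except sqlite3.OperationalError as exc:
        if not is_lock_error(exc):
            raise  # e.g. "no such table": a real bug, not contention
        queue_store(queue_path, item)
        return {"queued": True}
    return {"queued": False, "result": result}


if __name__ == "__main__":
    def locked_store(**_: Any) -> None:
        raise sqlite3.OperationalError("database is locked")

    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "pending-stores.jsonl"
        print(store_or_queue(locked_store, queue, content="hi", project="brainlayer"))
        print(queue.read_text(encoding="utf-8"), end="")
```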

Adds commit/phase tooling. Introduces a phase_commits table in the DB, a hooks/post-commit.py hook (installable via brainlayer hooks install) to store commit summaries into BrainLayer, and helper scripts to generate scopes.yaml and report phase boundaries; adds Phase 5 tests covering these behaviors.
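One plausible shape for the phase_commits table, shown against an in-memory SQLite database. The walkthrough's Changes table names only the indexed columns project and phase_name; the remaining columns, the index names and the summary query are assumptions chosen to match the fields phase-boundaries.py is described as reporting (commit message, phase, outcome, average confidence).

```python
import sqlite3

# Hypothetical schema: only project and phase_name (and their indexes) come from the PR.
PHASE_COMMITS_DDL = """
CREATE TABLE IF NOT EXISTS phase_commits (
    commit_hash      TEXT PRIMARY KEY,
    commit_message   TEXT,
    phase_name       TEXT,
    session_id       TEXT,
    project          TEXT,
    confidence_score REAL,
    outcome          TEXT,
    created_at       TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_phase_commits_project ON phase_commits(project);
CREATE INDEX IF NOT EXISTS idx_phase_commits_phase ON phase_commits(phase_name);
"""


def phase_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Commits and average confidence per phase; AVG() is NULL when no scores exist."""
    return conn.execute(
        "SELECT phase_name, COUNT(*), AVG(confidence_score) "
        "FROM phase_commits GROUP BY phase_name ORDER BY phase_name"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHASE_COMMITS_DDL)
    conn.executemany(
        "INSERT INTO phase_commits (commit_hash, commit_message, phase_name, project, confidence_score) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("a1", "init", "phase-4", "brainlayer", None),
            ("b1", "scoping", "phase-5", "brainlayer", 0.25),
            ("b2", "store queue", "phase-5", "brainlayer", 0.75),
        ],
    )
    print(phase_summary(conn))
```

The phase-4 row returns a NULL (None) average, which is why a report over this table has to distinguish "no scores" from a real 0.0 average.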

Written by Cursor Bugbot for commit 5917a1d.

Summary by CodeRabbit

Release Notes

New Features
- Automatic Git commit tracking via post-commit hook integration
- CLI commands to install hooks and flush pending operations
- Decision and phase tracking with confidence scores and outcome metadata
- Automatic project detection based on repository scopes
- Auto-importance scoring for stored items
- Improved database resilience with queuing for locked state scenarios

…ks, store queue)

- Project scoping: auto-detect project from CWD via ~/.config/brainlayer/scopes.yaml
- Auto-importance: keyword-based heuristic scoring (arch +3, prohibition +2, length +1, file ref +1)
- Decision tracking: confidence_score, outcome, reversibility, files_changed fields in brain_store
- phase_commits table: commit history with phase/session/project linkage
- Post-commit hook: auto-store git commits into BrainLayer (hooks/post-commit.py)
- Store queue buffer: DB lock → queue to pending-stores.jsonl, flush on next success
- CLI: brainlayer hooks install, brainlayer flush
- Scripts: scope-config.py (generate scopes.yaml), phase-boundaries.py (report)
- 25 new tests (all pass), 258 total passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor · 2026-02-23T17:23:06Z

+    if remaining:
+        path.write_text("\n".join(remaining) + "\n")
+    else:
+        path.unlink(missing_ok=True)


Flush can lose concurrently queued store entries

Low Severity

_flush_pending_stores reads the entire JSONL queue file, processes entries, then rewrites or deletes it. Between the read at line 1813 and the write at line 1844, a concurrent _queue_store call (from another process hitting a DB lock) can append new entries. The subsequent write_text or unlink overwrites those entries, silently losing queued stores.
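The interleaving described above can be reproduced deterministically in a single process. The sketch below is a hypothetical stand-in for `_flush_pending_stores` that follows the same read, process, rewrite-or-unlink pattern, with a hook that lets a simulated `_queue_store` append between the read and the write.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def naive_flush(
    path: Path,
    store: Callable[[dict[str, Any]], Any],
    between_read_and_write: Callable[[], None] = lambda: None,
) -> None:
    """Read-process-rewrite flush, mirroring the flagged pattern (hypothetical)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    between_read_and_write()  # window in which another process may append
    remaining = []
    for line in lines:
        try:
            store(json.loads(line))
        except Exception:
            remaining.append(line)
    if remaining:
        path.write_text("\n".join(remaining) + "\n", encoding="utf-8")
    else:
        path.unlink(missing_ok=True)  # also deletes anything appended in the window


def demonstrate_lost_entry() -> tuple[list[dict[str, Any]], bool]:
    """Queue one entry, append a second mid-flush, and report what survived."""
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "pending-stores.jsonl"
        queue.write_text(json.dumps({"content": "first"}) + "\n", encoding="utf-8")
        stored: list[dict[str, Any]] = []

        def concurrent_append() -> None:
            # Stands in for _queue_store running in another process.
            with queue.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps({"content": "second"}) + "\n")

        naive_flush(queue, stored.append, concurrent_append)
        return stored, queue.exists()


if __name__ == "__main__":
    print(demonstrate_lost_entry())
```

The flush stores only "first" and then unlinks the file, so "second" is gone even though its append succeeded.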


coderabbitai · 2026-02-23T20:29:12Z

📝 Walkthrough


This PR introduces Phase 5 features for BrainLayer, adding commit tracking via post-commit hooks, project scoping from YAML configuration, auto-importance scoring for stored items, DB-lock resilience through request queueing, enhanced decision tracking metadata, phase boundary analysis tooling, and comprehensive test coverage across new and existing functionality.

Changes

Cohort / File(s)	Summary
Commit Tracking `hooks/post-commit.py`	New Git post-commit hook that automatically captures commit metadata (hash, message, changed files) and stores it in BrainLayer with journal type and commit/git tags, with graceful error handling for unavailable tools.
Analysis & Configuration Scripts `scripts/phase-boundaries.py`, `scripts/scope-config.py`	Two new utility scripts: phase-boundaries queries and reports phase_commits table statistics; scope-config scans for Git repos and generates scopes.yaml mapping with support for dry-run and custom output paths.
Project Scoping Module `src/brainlayer/scoping.py`	New module implementing project scope resolution from ~/.config/brainlayer/scopes.yaml with PyYAML support, fallback simple parser, longest-prefix matching, and CWD-based heuristics for automatic project detection.
Core Storage & Metadata Enhancement `src/brainlayer/store.py`, `src/brainlayer/vector_store.py`	Extended the store_memory signature with optional confidence_score, outcome, reversibility, and files_changed parameters; added a phase_commits table for commit history, with indexes on project and phase_name.
MCP Enhancements `src/brainlayer/mcp/__init__.py`	Auto-importance scoring from keywords/length/file refs, auto-scoping in brain_search, propagation of the new metadata fields through the storage pipeline, and DB-lock resilience via a JSONL-backed queue that is flushed on the next successful store.
CLI Commands `src/brainlayer/cli/__init__.py`	Two new commands: hooks (installs post-commit hook with symlink/backup support) and flush (processes pending-stores.jsonl queue with embedding-based flushing and status reporting).
Phase 5 Tests `tests/test_phase5.py`	Comprehensive test module covering auto-importance scoring, auto-type detection, scoping resolution with config/caching, phase\_commits schema, decision metadata persistence, and integration validation across new features.

Sequence Diagram

sequenceDiagram
    participant Client as Client/MCP
    participant Store as Brain Store
    participant DB as VectorStore/DB
    participant Queue as Pending Queue<br/>(JSONL)
    participant Embed as Embedding Service

    Client->>Store: brain_store request<br/>(content, metadata)
    Store->>Store: _auto_importance()<br/>score if not provided
    Store->>DB: Attempt insert to chunks<br/>with metadata
    
    alt DB Lock/Timeout
        DB-->>Store: Lock Error
        Store->>Queue: _queue_store()<br/>(request to JSONL)
        Store-->>Client: return<br/>queued=true
    else Success
        DB-->>Store: Insert OK
        Store->>Embed: Send to embedding model<br/>(with metadata fields)
        Embed-->>Store: Embedding complete
        Store->>Queue: _flush_pending_stores()<br/>process queued items
        Queue-->>DB: Retry queued inserts
        Store-->>Client: return<br/>queued=false,<br/>flushed count
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

test: Phase 4 QA — comprehensive tests for Phase 3 core fixes #4: Overlaps in project-name normalization and scoping behavior implementation, with shared concerns around resolving/detecting project scope context.
fix: Phase 8b code quality — dead code, MCP hardening, doc cleanup #17: Touches overlapping core systems—mcp internals, store metadata handling, and CLI command infrastructure—with code-level dependencies across these modules.

Poem

🐰 Commits now hop into the store,
Auto-scores keep track of more,
Projects scope with perfect aim,
DB locks? We queue the same!
Phase boundaries bloom so clear,
Quality's the vibe this year! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.04% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Phase 5 — search/store hardening' partially relates to the changeset. While the PR includes search/store enhancements (auto-scoping, auto-importance, decision tracking), it encompasses much broader functionality: git hooks, CLI commands, database schema changes, queueing/reliability mechanisms, and utility scripts. The title focuses narrowly on search/store aspects rather than capturing the main point of the overall change.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/brainlayer/store.py (1)

41-98: ⚠️ Potential issue | 🟠 Major

Validate decision metadata before persisting

The new decision fields are documented with ranges/enums, but store_memory accepts any values. That can pollute analytics and break expectations when data is ingested from non‑MCP callers. Add lightweight validation (and optional type checks for files_changed).

🛠 Suggested fix

     if memory_type not in VALID_MEMORY_TYPES:
         raise ValueError(f"type must be one of: {', '.join(VALID_MEMORY_TYPES)}")
+
+    if confidence_score is not None and (
+        not isinstance(confidence_score, (int, float)) or not 0.0 <= confidence_score <= 1.0
+    ):
+        raise ValueError("confidence_score must be a number between 0 and 1")
+    if outcome is not None and outcome not in {"pending", "validated", "reversed"}:
+        raise ValueError("outcome must be one of: pending, validated, reversed")
+    if reversibility is not None and reversibility not in {"easy", "hard", "destructive"}:
+        raise ValueError("reversibility must be one of: easy, hard, destructive")
+    if files_changed is not None and (
+        not isinstance(files_changed, list) or not all(isinstance(f, str) for f in files_changed)
+    ):
+        raise ValueError("files_changed must be a list of file path strings")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/store.py` around lines 41 - 98, In store_memory, validate the
new decision metadata before adding it to meta: check confidence_score is a
number between 0 and 1 (raise ValueError otherwise), ensure outcome is one of
("pending","validated","reversed"), ensure reversibility is one of
("easy","hard","destructive"), and ensure files_changed is either None or a list
of strings (raise ValueError if any element is not str); perform these checks
after clamping importance and before building the meta dict (refer to function
store_memory, variables confidence_score, outcome, reversibility, files_changed)
and return clear ValueError messages on invalid input so bad values never get
persisted.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/post-commit.py`:
- Around line 59-66: The hook calls an executable named "brainlayer-store" that
isn't packaged; change the subprocess command construction in
hooks/post-commit.py (the cmd list used with subprocess.run) to use the
canonical CLI form ["brainlayer", "store", ...] (i.e. replace the single-word
entrypoint with the "brainlayer" binary plus "store" subcommand and keep the
same flags/arguments), or alternatively add a "brainlayer-store" console_script
entry to pyproject.toml so the original cmd resolves; adjust whichever path you
choose and keep the existing subprocess.run usage and error handling.

In `@scripts/phase-boundaries.py`:
- Around line 61-80: The recent-commits block and phase-summary formatting treat
nullable fields and zero-confidence incorrectly: explicitly check for None
rather than using truthiness. In the rows loop (variable rows, avg_conf) change
the conf_display logic to render "N/A" only when avg_conf is None and render
"0.00" when avg_conf == 0.0; in the recent loop (variable recent,
commit_message/msg) guard against msg being None before slicing (e.g., use msg
if msg is not None else "" or assign msg = msg or "" before msg[:50]) and
similarly ensure phase/outcome formatting uses explicit None checks so
slicing/formatting never receives None.
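A minimal sketch of the explicit-None formatting this comment asks for. The helper names and column widths are hypothetical; the point is that only a missing value becomes a placeholder, while 0.0 still renders as a number.

```python
from typing import Optional


def format_confidence(avg_conf: Optional[float]) -> str:
    """Only a missing average is "N/A"; 0.0 is a real value and prints as 0.00."""
    return "N/A" if avg_conf is None else f"{avg_conf:.2f}"


def format_recent(
    commit_hash: str, msg: Optional[str], phase: Optional[str], outcome: Optional[str]
) -> str:
    """One report line; explicit None checks so slicing never sees None."""
    msg = msg if msg is not None else ""
    phase_display = phase if phase is not None else "-"
    outcome_display = outcome if outcome is not None else "-"
    return f"{commit_hash[:8]}  {phase_display:<12} {outcome_display:<10} {msg[:50]}"


if __name__ == "__main__":
    print(format_confidence(0.0), format_confidence(None))
    print(format_recent("5917a1d0", None, "phase-5", None))
```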

In `@scripts/scope-config.py`:
- Around line 52-55: The YAML key (display_path) is unquoted which can produce
invalid YAML when the path contains ":" or special chars; in the loop that
builds lines (variables: repos, display_path, lines) change the list append to
quote the key and escape any internal backslashes and quotes so keys become valid YAML, e.g.
compute escaped_display_path = display_path.replace("\\", "\\\\").replace('"', '\\"') and use
lines.append(f'  "{escaped_display_path}": "{name}"') (or alternatively use a
YAML dumper like yaml.safe_dump to emit the mapping) so the generated YAML keys
are always quoted.
- Around line 14-15: Remove the unused imports by deleting the top-level import
statements for os and subprocess in scope-config.py; the file already uses
pathlib.Path for path handling so you should keep the pathlib import (if
present) and any other used imports but remove references to the symbols os and
subprocess to eliminate the unused-import warnings.
- Around line 94-96: The current write step uses args.output.write_text(content)
without an encoding and will silently overwrite existing files; update the logic
around args.output.write_text to (1) explicitly pass encoding="utf-8" when
writing, and (2) detect if args.output.exists() before writing and warn the user
and preserve the prior file (e.g., copy/rename the existing path to a
timestamped backup or .bak) or require an explicit --force flag before
overwriting; adjust the messages printed (the print(f"Wrote {args.output}
({len(repos)} repos)") call) to reflect backup/overwrite behavior and reference
args.output.parent.mkdir, args.output.exists, args.output.write_text, and repos
when implementing these changes.
- Around line 84-86: The script currently prints "No git repos found..." then
returns which yields exit code 0; modify the block under the if not repos: check
to call sys.exit(1) (or raise SystemExit(1)) after printing so the process exits
with a non-zero code; ensure the module imports sys at top if not already
present and update the if not repos: branch in scripts/scope-config.py
accordingly.
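A sketch of the scope-config.py output step under two assumptions: the generated file nests path-to-project pairs under a top-level key (written here as `scopes:`, a placeholder), and an existing file should be backed up rather than silently overwritten. It quotes with `json.dumps` because, for ordinary paths, a JSON string literal is also a valid YAML double-quoted scalar; that handles backslashes, quotes and control characters in one call with no hand-written escape chain.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path
from typing import Mapping, Optional


def yaml_quote(value: str) -> str:
    """Quote a string for YAML: a JSON string literal is a valid YAML double-quoted scalar."""
    return json.dumps(value)


def render_scopes(repos: Mapping[str, str]) -> str:
    """Render path -> project pairs; the top-level "scopes:" key is a placeholder."""
    lines = ["scopes:"]
    for display_path, name in sorted(repos.items()):
        lines.append(f"  {yaml_quote(display_path)}: {yaml_quote(name)}")
    return "\n".join(lines) + "\n"


def write_with_backup(path: Path, content: str) -> Optional[Path]:
    """Write UTF-8 content, copying any existing file to a timestamped .bak first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    backup = None
    if path.exists():
        backup = path.with_name(f"{path.name}.{datetime.now():%Y%m%d-%H%M%S}.bak")
        shutil.copy2(path, backup)
    path.write_text(content, encoding="utf-8")
    return backup


if __name__ == "__main__":
    print(render_scopes({"~/code/my:app": "my-app"}), end="")
```

A caller could pass `Path("~/.config/brainlayer/scopes.yaml").expanduser()` to `write_with_backup` and print the returned backup path next to the "Wrote …" message; a `--force` flag that skips the backup would be an equally valid design.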

In `@src/brainlayer/cli/__init__.py`:
- Around line 1592-1594: Wrap the call to hook_source.chmod(0o755) in a
try/except that catches PermissionError and logs a warning (using rprint or the
existing logger) instead of letting the exception abort the installation; ensure
the symlink creation (target.symlink_to(hook_source.resolve())) remains
unchanged and that rprint still reports success (e.g., keep the
rprint(f"[green]Installed post-commit hook: {target} →
{hook_source.resolve()}[/]") call) so installs in read‑only locations succeed
with a warning about permissions.
- Around line 1544-1594: The hooks command currently hardcodes repo_root /
".git" / "hooks" which breaks for worktrees/submodules; update the hooks()
implementation to call git (subprocess.check_output) with ["git", "rev-parse",
"--git-path", "hooks"] to obtain the correct hooks directory path (fall back to
repo_root / ".git" / "hooks" only if that command fails), use that value for
hooks_dir, and handle errors similarly to the existing git check; also stop
calling hook_source.chmod(0o755) on the package source file—instead either
remove that chmod call or apply permissions to the created symlink
(target.chmod(...)) if you need the hook to be executable.
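A sketch of the hooks-directory lookup recommended above, using `git rev-parse --git-path hooks` and falling back to `<repo>/.git/hooks` when git is missing or the directory is not a repository. The function name is hypothetical. One caveat on the chmod suggestion: `Path.chmod` follows symlinks by default, so calling it on the installed symlink still changes the packaged source file rather than avoiding it.

```python
import subprocess
from pathlib import Path


def resolve_hooks_dir(repo_root: Path) -> Path:
    """Return the hooks directory git reports for repo_root (hypothetical helper)."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--git-path", "hooks"],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git not installed, repo_root missing, or not a repository.
        return repo_root / ".git" / "hooks"
    # The output may be relative to cwd (e.g. ".git/hooks"); an absolute path,
    # as for linked worktrees, replaces repo_root when joined.
    return (repo_root / out).resolve()


if __name__ == "__main__":
    print(resolve_hooks_dir(Path.cwd()))
```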

In `@src/brainlayer/mcp/__init__.py`:
- Around line 1921-1932: The queued item uses the raw project variable instead
of the normalized_project, causing inconsistent project names after flush;
update the call to _queue_store in the DB-locked handler to pass
normalized_project (the same value used on success) for the "project" field,
ensuring normalized_project is available in the scope where _queue_store is
invoked.
- Around line 1787-1848: _flush_pending_stores currently reads and rewrites the
live pending-stores.jsonl which can lose concurrent _queue_store appends; change
to perform an atomic swap: compute the path from _get_pending_store_path(),
atomically rename/move the original file to a temp file (e.g.,
pending-stores.jsonl.processing using os.rename/pathlib.Path.rename) so new
writers via _queue_store continue appending to a fresh file, read/process the
temp file lines calling store_memory from _flush_pending_stores, collect failed
lines, then append any failed lines to the current live file (open original in
"a" and write failures + newline) and finally remove the temp file (or unlink
with missing_ok); ensure all rename/read/append operations handle missing file
races and exceptions so no writes are lost.
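A minimal sketch of the rename-then-process flush, with a hypothetical `flush_pending` in place of `_flush_pending_stores`. It assumes only one process flushes at a time. A writer that opened the old file just before the rename can still append to the `.processing` file after it has been read, so this narrows the race rather than closing it; holding a lock such as `fcntl.flock` around both the append and the swap would close it fully on POSIX systems.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable


def flush_pending(queue_path: Path, store: Callable[[dict[str, Any]], Any]) -> int:
    """Drain the queue via rename-then-process; returns the number of stored entries."""
    processing = queue_path.with_name(queue_path.name + ".processing")
    # Reuse a leftover .processing file from a crashed flush instead of clobbering it.
    if not processing.exists():
        try:
            os.replace(queue_path, processing)  # atomic on the same filesystem
        except FileNotFoundError:
            return 0  # nothing queued
    failed: list[str] = []
    flushed = 0
    for line in processing.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            store(json.loads(line))
            flushed += 1
        except Exception:
            failed.append(line)
    if failed:
        # Re-queue failures on the live file, after anything appended meanwhile.
        with queue_path.open("a", encoding="utf-8") as fh:
            fh.write("\n".join(failed) + "\n")
    processing.unlink(missing_ok=True)
    return flushed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "pending-stores.jsonl"
        queue.write_text('{"content": "a"}\n{"content": "b"}\n', encoding="utf-8")
        print(flush_pending(queue, print), queue.exists())
```

New `_queue_store` calls made after the rename land in a fresh pending-stores.jsonl, which the next flush picks up.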

In `@src/brainlayer/scoping.py`:
- Around line 95-105: The current prefix check using cwd.startswith(expanded)
can produce false positives; in the loop that builds matches (variables
scope_map, prefix, project, expanded, cwd, matches) replace the startswith logic
with a path-aware check: normalize/resolve both expanded and cwd
(Path(expanded).expanduser().resolve() and Path(cwd).resolve()) and use
resolved_cwd.is_relative_to(resolved_expanded) (or, if compatibility is required, compare
os.path.commonpath([resolved_cwd, resolved_expanded]) == str(resolved_expanded))
so only true directory/subpath relationships are matched before appending to
matches and returning the longest match.
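A minimal sketch of the path-aware, longest-match lookup, assuming scopes.yaml has already been loaded into a plain dict mapping directory to project name (the function name `resolve_project` is hypothetical). `Path.is_relative_to` needs Python 3.9+, which the 3.11 to 3.13 CI matrix listed under Review details satisfies.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_project(cwd: str, scope_map: Mapping[str, str]) -> Optional[str]:
    """Return the project whose scope directory most specifically contains cwd."""
    resolved_cwd = Path(cwd).expanduser().resolve()
    best_depth, best_project = -1, None
    for prefix, project in scope_map.items():
        root = Path(prefix).expanduser().resolve()
        # Component-wise containment: /work/brainlayer is NOT inside /work/brain.
        if resolved_cwd.is_relative_to(root) and len(root.parts) > best_depth:
            best_depth, best_project = len(root.parts), project
    return best_project


if __name__ == "__main__":
    scopes = {"/work/brain": "brain", "/work/brainlayer": "brainlayer"}
    print(resolve_project("/work/brainlayer/src", scopes))
```

With a plain `startswith`, `/work/brainlayer` would match a `/work/brain` scope; here it does not, and the deepest containing scope wins.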

---

Outside diff comments:
In `@src/brainlayer/store.py`:
- Around line 41-98: In store_memory, validate the new decision metadata before
adding it to meta: check confidence_score is a number between 0 and 1 (raise
ValueError otherwise), ensure outcome is one of
("pending","validated","reversed"), ensure reversibility is one of
("easy","hard","destructive"), and ensure files_changed is either None or a list
of strings (raise ValueError if any element is not str); perform these checks
after clamping importance and before building the meta dict (refer to function
store_memory, variables confidence_score, outcome, reversibility, files_changed)
and return clear ValueError messages on invalid input so bad values never get
persisted.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 851b17d and efc0cbf.

📒 Files selected for processing (9)

hooks/post-commit.py
scripts/phase-boundaries.py
scripts/scope-config.py
src/brainlayer/cli/__init__.py
src/brainlayer/mcp/__init__.py
src/brainlayer/scoping.py
src/brainlayer/store.py
src/brainlayer/vector_store.py
tests/test_phase5.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. (4)

GitHub Check: test (3.11)
GitHub Check: test (3.12)
GitHub Check: test (3.13)
GitHub Check: Cursor Bugbot

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py