fix(writer): single-writer pidfile mutex + retry/timeout/throttle hardening by EtanHey · Pull Request #309 · EtanHey/brainlayer

EtanHey · 2026-05-21T18:39:13Z

Summary

Add a pidfile-guarded single-writer mutex before writable VectorStore DB init; readonly opens do not touch the writer lock.
Harden lock ownership for basename collisions, symlink aliases, forked/inherited refs, close/open races, PID reuse, relative pidfile dirs, and empty pidfile write races.
Increase init retry budget, raise drain busy timeout, and raise enrichment LaunchAgent throttle.
Keep prompt/live retrieval read-only under the new mutex and harden recency fallback filter parity/retry behavior.

Writer Mutex Contract

Default pidfile path: /tmp/brainlayer-writer-${SHA256_RESOLVED_DB_PATH_16}-${RESOLVED_DB_BASENAME}.pid.
Override: BRAINLAYER_WRITER_PIDFILE_DIR=/path/to/dir; relative override values are normalized under /tmp.
RW init uses O_CREAT | O_EXCL, takes an exclusive flock before writing, and records PID plus process start-time metadata.
Existing live owner raises WriterInUseError: another writer is using $DB_PATH (pid $PID).
Dead, invalid, or PID-reused owners are reclaimed after verifying the opened pidfile is still the path target.
Same-process writer opens are ref-counted; release keeps ownership until unlink completes and always runs from close()/atexit.
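
The naming and creation steps in the contract above can be sketched as follows. This is a minimal illustration, not the PR's code: `writer_pidfile_path` and `try_acquire` are hypothetical names, the 16-character digest is assumed to be the first 16 hex characters of the SHA-256 of the resolved path, and the start-time metadata and stale-owner reclaim are left out.

```python
import fcntl
import hashlib
import os
from pathlib import Path


def writer_pidfile_path(db_path: str, pidfile_dir: str = "/tmp") -> Path:
    """Hypothetical helper: derive the pidfile name from the resolved DB path."""
    resolved = Path(db_path).resolve()  # follows symlinks, so aliases share one name
    # Assumption: the hash component is the first 16 hex chars of SHA-256.
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()[:16]
    return Path(pidfile_dir) / f"brainlayer-writer-{digest}-{resolved.name}.pid"


def try_acquire(pidfile: Path) -> bool:
    """Create the pidfile exclusively, lock it, then record our PID.

    Returns False when the file already exists (another owner, live or stale).
    """
    try:
        fd = os.open(pidfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # take the lock before writing the PID
        os.write(fd, f"{os.getpid()}\n".encode("ascii"))
    finally:
        os.close(fd)  # closing the descriptor also drops the flock
    return True
```

Hashing the resolved path is what keeps two databases with the same basename in different directories from colliding, while two symlinked spellings of one database still map to the same pidfile.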

Retry / Timeout / Throttle Math

_INIT_MAX_RETRIES: 5 -> 10, override BRAINLAYER_INIT_MAX_RETRIES, clamped to at least one attempt.
Backoff sleeps: 0.5 + 1 + 2 + 4 + 8 + 16 + 30 + 30 + 30 + 30 = 151.5s sleep budget.
With APSW busy timeout at 30s per attempt, worst-case wall time remains under the 600s cap.
Drain busy timeout: 200ms -> 30000ms, override BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS; invalid, non-positive, or APSW-overflowing env falls back to 30000.
Enrichment LaunchAgent ThrottleInterval: 30 -> 60 in scripts/launchd/com.brainlayer.enrichment.plist; installed user plist was also updated locally to 60.
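
The budget arithmetic above can be checked with a short sketch. It assumes a doubling backoff from 0.5s capped at 30s (which reproduces the listed sleeps) and that each attempt blocks for at most one full 30s busy timeout; `backoff_delays` is a hypothetical name, not the PR's function.

```python
def backoff_delays(max_retries: int = 10, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Capped exponential backoff: 0.5, 1, 2, ... up to `cap` seconds."""
    attempts = max(1, max_retries)  # clamp to at least one attempt
    return [min(base * (2 ** i), cap) for i in range(attempts)]


delays = backoff_delays()
sleep_budget = sum(delays)  # total time spent sleeping between attempts

# Assumption: every attempt waits out one full 30s busy timeout.
BUSY_TIMEOUT_S = 30.0
worst_case_wall_time = sleep_budget + len(delays) * BUSY_TIMEOUT_S

print(delays)
print(sleep_budget, worst_case_wall_time)
```

Under those assumptions the worst case is 151.5s of sleep plus 300s of busy waiting, 451.5s in total, which leaves headroom under the 600s cap.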

RED -> GREEN

Initial required RED: pytest tests/test_writer_pidfile_mutex.py collected 7 required tests and all 7 failed before implementation.
Initial plist RED: test_enrichment_plist_throttle_interval_at_least_60s failed against ThrottleInterval=30 before restoring the template change.
Review-fix REDs were added and watched fail before fixes for basename collisions, symlink aliases, stale unlink races, same-process atexit reuse, release ownership race, close failure release, inherited refs, locked-before-write creation, PID-reuse owner metadata, relative pidfile dir normalization, hook readonly opens, recency filter parity/date handling/busy retry, retry budget clamp, and drain timeout env validation.
Latest focused GREEN: pytest tests/test_writer_pidfile_mutex.py tests/test_adaptive_injection.py tests/test_hybrid_search.py tests/test_db_lock_resilience.py -> 68 passed.

Enrich-Rate Bench

Baseline supplied by PR-alpha prompt:

Pre-PR-alpha post-T0, 2026-05-21T16:22:12Z: 213 enrichments / 77 min = ~166/hr.

Fresh >=30 min live sample command:

PYTHONPATH=src python3 - <<'PY'
from datetime import datetime, timezone, timedelta
from brainlayer.paths import get_db_path
from brainlayer.vector_store import VectorStore
start = datetime.now(timezone.utc) - timedelta(minutes=30)
end = datetime.now(timezone.utc)
store = VectorStore(get_db_path(), readonly=True)
count = 0
try:
    for _, enriched_at in store.conn.cursor().execute(
        "SELECT id, enriched_at FROM chunks WHERE enriched_at IS NOT NULL AND enriched_at NOT LIKE 'skipped:%'"
    ):
        if not enriched_at:
            continue
        try:
            dt = datetime.fromisoformat(str(enriched_at).replace('Z', '+00:00'))
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        if start <= dt <= end:
            count += 1
finally:
    store.close()
minutes = (end - start).total_seconds() / 60
print(start.isoformat(), end.isoformat(), count, f"{count / minutes * 60:.1f}/hr")
PY

Fresh sample observed during PR-alpha implementation:

Window: 2026-05-21T18:03:20.907164+00:00 to 2026-05-21T18:33:20.907301+00:00.
Count: 207 enrichments / 30.0 min = 414.0/hr.

Test Plan

pytest tests/test_writer_pidfile_mutex.py tests/test_adaptive_injection.py tests/test_hybrid_search.py tests/test_db_lock_resilience.py -> 68 passed.
ruff check src/brainlayer/vector_store.py hooks/brainlayer-prompt-search.py src/brainlayer/search_repo.py tests/test_writer_pidfile_mutex.py tests/test_adaptive_injection.py -> passed.
ruff format --check src/brainlayer/vector_store.py hooks/brainlayer-prompt-search.py src/brainlayer/search_repo.py tests/test_writer_pidfile_mutex.py tests/test_adaptive_injection.py -> 5 files already formatted.
pytest tests/test_eval_baselines.py -m live -> 34 passed, 5 xfailed, 2 xpassed.
Latest pre-push gate -> 2090 passed, 9 skipped, 75 deselected, 1 xfailed, 102 warnings; MCP registration 3 passed; isolated eval/hook routing 32 passed; Bun 1 passed; FTS determinism passed.
cr review --plain passed earlier on staged diff; a later local CLI attempt hung and was stopped, with hosted CodeRabbit running on the pushed commits.

Notes

Plain local pytest under system Python 3.13 was blocked earlier by optional dependency drift (deepchecks missing, numba incompatible with NumPy 2.4). The repo pre-push hook uses its Python 3.12 env and passed the configured gate.

Note

Add single-writer pidfile mutex and recency-intent search fallback to `VectorStore` and `hybrid_search`

Read-write VectorStore instances now acquire an exclusive pidfile lock on init; a second writer raises WriterInUseError. Same-process reuse is ref-counted and the lock is released on close even if connection teardown fails.
hybrid_search detects recency intent in query text and supplements FTS results with the most-recent matching chunks via a SQL fallback, retrying up to 3 times on SQLite busy errors with exponential backoff. Chunks ≤7 days old receive an extra 2.0× score boost when recency intent is detected.
The drain connection busy timeout is now configurable via BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS (default 30 s, was 200 ms).
Single-token entity matches for types technology and tool are skipped during entity span matching in the prompt-search hook.
The launchd enrichment job ThrottleInterval is raised from 30 s to 60 s.
Risk: any process that previously opened multiple read-write VectorStore instances concurrently will now fail with WriterInUseError unless re-entrant acquisition is detected correctly.
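
The busy-retry behavior described in this summary can be sketched as below. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for APSW, detects the busy condition from the error message, and `retry_on_busy` with its default delays is a hypothetical helper rather than the PR's implementation.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_busy(run: Callable[[], T], attempts: int = 3, base_delay: float = 0.05) -> T:
    """Call run(), retrying with exponential backoff while SQLite is busy/locked."""
    for attempt in range(attempts):
        try:
            return run()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            is_busy = "locked" in message or "busy" in message
            if not is_busy or attempt == attempts - 1:
                raise  # not a contention error, or retries exhausted
            time.sleep(base_delay * (2 ** attempt))  # delay doubles each retry
    raise ValueError("attempts must be at least 1")
```

A caller would wrap only the query execution, for example `retry_on_busy(lambda: list(cursor.execute(sql, params)))`, so that non-contention errors still surface immediately.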

Macroscope summarized c668da8.

Note

Medium Risk
Introduces a new cross-process single-writer gate for writable VectorStore opens and changes search fallback/ranking behavior; misconfiguration or unexpected multi-writer usage could now raise WriterInUseError or alter result ordering.

Overview
Adds a single-writer mutex for writable DB access. Writable VectorStore initialization now acquires a pidfile-based lock (with PID/start-time validation, symlink-safe path hashing, ref-counting, and guaranteed release on close()/atexit) and raises WriterInUseError when another live writer owns the database.

Hardens read paths and retrieval behavior under contention. Hybrid prompt search and multiple integration tests now open VectorStore(..., readonly=True); VectorStore init retry budget/backoff is increased and made configurable; drain’s APSW busy timeout is increased to 30s (env-configurable); the enrichment LaunchAgent throttle is raised to 60s.
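
A minimal sketch of the env-configurable drain timeout, assuming the fallback rules stated in the PR description (invalid, non-positive, or overflowing values fall back to 30000 ms). The function name and the overflow bound are assumptions: SQLite's busy timeout takes a C `int`, so the sketch rejects anything above 2**31 - 1.

```python
import os
from typing import Mapping, Optional

DEFAULT_DRAIN_BUSY_TIMEOUT_MS = 30_000
MAX_C_INT = 2**31 - 1  # assumption: upper bound accepted by the busy-timeout call


def drain_busy_timeout_ms(env: Optional[Mapping[str, str]] = None) -> int:
    """Read BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS")
    if raw is None:
        return DEFAULT_DRAIN_BUSY_TIMEOUT_MS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_DRAIN_BUSY_TIMEOUT_MS  # not an integer
    if value <= 0 or value > MAX_C_INT:
        return DEFAULT_DRAIN_BUSY_TIMEOUT_MS  # non-positive or would overflow
    return value
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.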

Improves search quality edge cases. hybrid_search gains a “recency intent” fallback that supplements lexical candidates with the most recent 7 days (respecting existing filters and retrying on busy), plus an extra boost for ≤7-day results when recency words are present; prompt entity injection skips ambiguous single-token matches for technology/tool entity types.

Reviewed by Cursor Bugbot for commit c668da8. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

Release Notes

New Features
- Automatic recency prioritization: queries containing "current," "latest," "recent," "today," or "this week" now automatically retrieve results from the last 7 days with enhanced ranking when no explicit date range is specified.
Improvements
- Enhanced disambiguation for single-token entity type matching in search
- Improved database concurrency protection and configurable connection timeout behavior

coderabbitai · 2026-05-21T18:39:21Z

📝 Walkthrough

This PR adds pidfile-based exclusive writer coordination to VectorStore, detects recency intent to fetch and boost 7-day recent chunks, makes SQLite busy-timeout and init-retry budget environment-configurable, skips ambiguous single-token KG entity matches, and updates tests/fixtures to open VectorStore in readonly mode.

Changes

VectorStore Writer Coordination and Search Enhancements

| Layer / File(s) | Summary |
|---|---|
| Entity span filtering & hook readonly open: `hooks/brainlayer-prompt-search.py` | Adds `AMBIGUOUS_SINGLE_TOKEN_ENTITY_TYPES` and skips single-token KG matches for those types; constructs `VectorStore(..., readonly=True)` in hybrid search hook. |
| Recency-intent detection & recent-chunk fallback: `src/brainlayer/search_repo.py`, `tests/test_hybrid_search.py` | Adds regex helpers and `_has_recency_intent()`, retrieves last-7-days chunks when recency intent is detected without `date_from`, merges them into pre-RRF ranks/keyword data, and applies extra 2.0x boost to results with `age_days <= 7`; includes new tests and helper adjustments. |
| VectorStore writer pidfile mutex and init/retry behavior: `src/brainlayer/vector_store.py`, `tests/test_writer_pidfile_mutex.py` | Introduces `WriterInUseError`, pidfile acquisition/release with `fcntl.flock`, deterministic pidfile naming, PID liveness/start-time checks, same-process ref-counting, readonly init skip, and env-driven init retry/backoff tuning. Comprehensive pidfile tests added. |
| Drain busy-timeout, retry caps, and launchd throttle: `src/brainlayer/drain.py`, `scripts/launchd/com.brainlayer.enrichment.plist`, `tests/test_writer_pidfile_mutex.py` | Adds `_drain_busy_timeout_ms()` reading `BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS` (fallback 30000 ms) and applies it via `conn.setbusytimeout()`. Makes init retry budget environment-configurable with capped exponential backoff. Updates launchd `ThrottleInterval` to 60s and adds tests asserting behavior. |
| Test fixtures & readonly openings: `tests/test_engine.py`, `tests/test_eval_baselines.py`, `tests/test_think_recall_integration.py`, `tests/test_vector_store.py`, `tests/test_adaptive_injection.py` | Updates multiple fixtures and adds a unit test to assert `prompt_search.run_hybrid_search` opens `VectorStore` with `readonly=True`, converting integration fixtures to non-mutating readonly DB opens. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

EtanHey/brainlayer#213: Related changes to KG entity span matching in the prompt-search hook.
EtanHey/brainlayer#242: Overlaps with hybrid_search ranking changes (MMR/overfetch vs recency fallback).
EtanHey/brainlayer#304: Related readonly/open-path and VectorStore search wiring changes.

Poem

🐰 I nibble pidfiles neat and small,

I hop through recent results and call,
I slow the drain and tune the sleep,
Quiet tests now safely peep—
A rabbit hums: locks kept, searches leap.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly summarizes the main changes: implementation of a single-writer pidfile mutex, plus improvements to retry/timeout/throttle behavior. It aligns with the primary objectives in this comprehensive PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

EtanHey · 2026-05-21T18:39:47Z

@greptileai review

EtanHey · 2026-05-21T18:39:47Z

@cursor @BugBot review

EtanHey · 2026-05-21T18:39:47Z

@codex review

EtanHey · 2026-05-21T18:39:47Z

@coderabbitai review

cursor · 2026-05-21T18:39:50Z

You need to increase your spend limit or enable usage-based billing to run background agents.

coderabbitai · 2026-05-21T18:39:54Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17e33ea01b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T18:43:23Z

+                    FROM chunks
+                    WHERE created_at >= datetime('now', '-7 days') {" ".join(recent_extra)}
+                    ORDER BY created_at DESC


Preserve entity scoping in recency-intent fallback

This fallback query pulls recent rows directly from chunks without applying entity_id, even though hybrid_search documents entity_id as a hard per-entity scope. Because these rows are injected into fts_ranks/keyword_data and the later FTS-only post-filter does not re-check entity membership, queries containing recency terms (e.g. “recent”, “latest”) can return chunks from other entities.

Fixed in acbf191. The recency-intent fallback now preserves entity_id and related active filters before injecting recent rows into ranking data; covered by a regression test.

chatgpt-codex-connector · 2026-05-21T18:43:23Z

+                try:
+                    pidfile.unlink()
+                except FileNotFoundError:


Make stale pidfile cleanup atomic before unlink

The stale-lock path unlinks based on a prior PID read/liveness check, but that check and unlink() are not atomic. If two writers race while recovering a stale pidfile, one process can create a fresh pidfile after the other has read the old PID, and the second process can then unlink that fresh lock and proceed, allowing concurrent writers to initialize the same DB.

Fixed in acbf191 and further hardened in 4378c17. Stale cleanup now opens and locks the pidfile, verifies the opened fd still matches the path with samestat before unlink, and ignores FileNotFoundError races.
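
The open, lock, verify, unlink sequence described in this reply can be sketched as follows. It is a simplified illustration, not the code from acbf191 or 4378c17: `unlink_if_same_file` is a hypothetical name, the PID liveness and start-time checks that decide whether the owner is stale are omitted, and the sketch assumes every participant takes the same `flock`.

```python
import fcntl
import os
from pathlib import Path


def unlink_if_same_file(pidfile: Path) -> bool:
    """Unlink pidfile only if the path still names the file we opened and locked."""
    try:
        fd = os.open(pidfile, os.O_RDONLY)
    except FileNotFoundError:
        return False  # another process already cleaned it up
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            on_disk = os.stat(pidfile)
        except FileNotFoundError:
            return False
        if not os.path.samestat(os.fstat(fd), on_disk):
            return False  # the path now points at a fresh pidfile; leave it alone
        try:
            pidfile.unlink()
        except FileNotFoundError:
            return False  # lost the race to another cleaner
        return True
    except BlockingIOError:
        return False  # the lock is held elsewhere, so the file is not ours to remove
    finally:
        os.close(fd)
```

Comparing `os.fstat(fd)` with `os.stat(pidfile)` is what catches the case where the path was unlinked and recreated between the read and the cleanup: the open descriptor then refers to a different inode than the path does.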

EtanHey · 2026-05-21T18:48:35Z

PR-alpha CI is green and PR-beta #306 remains blocked on this landing first. Etan/admin: please merge with rebase only, no squash, when acceptable so beta can rebase onto the pidfile mutex on origin/main.

chatgpt-codex-connector

💡 Codex Review

brainlayer/src/brainlayer/vector_store.py

Line 314 in acbf191

raise last_err # type: ignore[misc]

Guard zero retry budget before re-raising init error

_INIT_MAX_RETRIES is now env-configurable, but when BRAINLAYER_INIT_MAX_RETRIES is set to 0 (or any negative value), the retry loop never executes and last_err stays None; this line then raises TypeError: exceptions must derive from BaseException instead of a meaningful DB-init failure. That turns a recoverable configuration mistake into a hard startup crash with misleading diagnostics.

chatgpt-codex-connector · 2026-05-21T18:58:59Z

        _ingest_keyword_rows(fts_results, fts_ranks)
        _ingest_keyword_rows(trigram_fts_results, trigram_ranks)

+        recency_intent = any(term in query_text.lower() for term in _RECENCY_QUERY_TERMS)


Match recency intent on whole terms only

recency_intent is computed with substring checks (term in query_text.lower()), so the term "current" also matches unrelated words like "concurrent"/"recurrent". In those cases hybrid search incorrectly enters the recency fallback path and applies the extra ≤7-day score multiplier, which can reorder results toward recent chunks even when the user did not ask for recency.

Fixed in 1ece80a. This shares the same whole-term recency-intent helper and regression coverage as the Cursor finding.
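
Whole-term matching of this kind can be sketched with word boundaries. The term list is taken from the release notes earlier in this thread; the names `RECENCY_TERMS` and `has_recency_intent` are illustrative and may differ from the helper added in 1ece80a.

```python
import re

# Terms as listed in the release notes; the real constant may differ.
RECENCY_TERMS = ("current", "latest", "recent", "today", "this week")

_RECENCY_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in RECENCY_TERMS) + r")\b",
    re.IGNORECASE,
)


def has_recency_intent(query_text: str) -> bool:
    """True only when a recency term appears as a whole word or phrase."""
    return _RECENCY_RE.search(query_text) is not None
```

With `\b` on both sides, "concurrent" and "recurrent" no longer match "current", because the preceding letter is a word character and no boundary exists there.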

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ece80ab63

chatgpt-codex-connector · 2026-05-21T19:04:56Z


    # Retry settings for DB init under contention (multiple MCP instances + enrichment)
-    _INIT_MAX_RETRIES = 5
+    _INIT_MAX_RETRIES = _int_env("BRAINLAYER_INIT_MAX_RETRIES", 10)


Clamp init retry env to at least one attempt

BRAINLAYER_INIT_MAX_RETRIES is read directly into _INIT_MAX_RETRIES without bounds checking, so setting it to 0 or a negative value makes _init_db_with_retry() run zero iterations and then execute raise last_err while last_err is still None, which raises TypeError and prevents writer startup entirely. This turns a retry-tuning env var into a hard boot failure under misconfiguration.

Fixed in c661a16. The retry budget is clamped to at least one attempt at env-read time and inside _init_db_with_retry, with a regression test proving a zero runtime budget re-raises the original BusyError.
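
The two-level clamp described in this reply might look like the sketch below. The signature of `int_env` is an assumption (the diff only shows `_int_env(name, default)`), `RuntimeError` stands in for `apsw.BusyError`, and the backoff sleeps are left out to keep the example short.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def int_env(name: str, default: int, minimum: int = 1,
            env: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer env var, falling back on bad input and clamping to `minimum`."""
    env = os.environ if env is None else env
    try:
        value = int(env.get(name, default))
    except (TypeError, ValueError):
        value = default
    return max(minimum, value)


def init_with_retry(open_db: Callable[[], T], max_retries: int) -> T:
    """Try open_db() at least once; re-raise the last busy error when all attempts fail."""
    attempts = max(1, max_retries)  # second clamp, in case the attribute is patched at runtime
    last_err: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return open_db()
        except RuntimeError as exc:  # stand-in for apsw.BusyError
            last_err = exc
    assert last_err is not None  # guaranteed because attempts >= 1
    raise last_err
```

Because at least one attempt always runs, `last_err` is always set before the final `raise`, so a zero or negative budget surfaces the original error instead of a `TypeError`.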

chatgpt-codex-connector · 2026-05-21T19:04:56Z

+                           summary, tags, importance, intent,
+                           created_at, source, sender, language, decay_score
+                    FROM chunks
+                    WHERE created_at >= datetime('now', '-7 days') {" ".join(recent_extra)}


Compare timestamps as datetimes in recency fallback

The recency fallback filters with created_at >= datetime('now', '-7 days'), but created_at values are ISO strings like YYYY-MM-DDTHH:MM:SSZ while datetime() returns YYYY-MM-DD HH:MM:SS; SQLite then performs a lexical TEXT comparison, so same-day boundary rows (e.g. ...T00:00:00Z) are treated as newer than the cutoff even when they are actually older than 7 days. This widens the fallback window and can pull stale chunks into “latest/recent” results.

Fixed in 4378c17. The recency fallback now compares datetime(created_at) against datetime('now', '-7 days'), with a boundary regression test.
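
The boundary problem is easy to reproduce with the standard-library `sqlite3` module. The sketch below uses a fixed cutoff string in the `YYYY-MM-DD HH:MM:SS` shape that `datetime('now', '-7 days')` produces, so the result does not depend on the current date; the table layout is reduced to the two columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("old", "2026-05-14T00:00:00Z"),  # 12 hours before the cutoff
        ("new", "2026-05-20T09:00:00Z"),  # well inside the window
    ],
)

cutoff = "2026-05-14 12:00:00"  # same format as datetime('now', '-7 days')

# TEXT comparison: 'T' sorts after ' ', so the "old" row slips through.
lexical = [row[0] for row in conn.execute(
    "SELECT id FROM chunks WHERE created_at >= ? ORDER BY id", (cutoff,)
)]

# Normalizing both sides with datetime() compares actual instants.
parsed = [row[0] for row in conn.execute(
    "SELECT id FROM chunks WHERE datetime(created_at) >= datetime(?) ORDER BY id",
    (cutoff,),
)]

print(lexical)  # ['new', 'old']
print(parsed)   # ['new']
```

Wrapping the column in `datetime()` does prevent an index on `created_at` from being used for this predicate, which is a trade-off to keep in mind for large tables.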

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4378c1718c

chatgpt-codex-connector · 2026-05-21T19:13:41Z

+        with self._PIDFILE_REFS_LOCK:
+            refs = self._PIDFILE_REFS.get(pidfile, 0)
+            if refs <= 1:
+                self._PIDFILE_REFS.pop(pidfile, None)
+                should_unlink = True


Keep pidfile ownership until release unlink finishes

Clearing _PIDFILE_REFS before the unlink step opens a race where one thread can call close() while another thread in the same process opens a new writer for the same DB: the new open goes through _handle_existing_writer_pidfile() (same PID) and marks itself acquired, then the closing thread still unlinks the pidfile, leaving an active writer with no lockfile. At that point a third process can create a fresh pidfile and start writing concurrently, violating the single-writer contract under concurrent open/close in one process.

Fixed in e00434a. Added a RED regression for the same-process close/open interleaving, then kept pidfile ownership under _PIDFILE_REFS_LOCK until the release path finishes the owned-file unlink and ref removal. Focused test and pre-push gate are green.

Fixed in e00434a and hardened further in d5e694c. Added regression coverage that close/open same-process races keep pidfile ownership through unlink and that mismatched same-process refs are preserved instead of cleared.
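
The ordering that closes this race can be sketched with a small in-process registry. `PidfileRefs` is a hypothetical class written for illustration; the PR keeps its counts in `_PIDFILE_REFS` guarded by `_PIDFILE_REFS_LOCK`, and the real acquire path does an exclusive create rather than the placeholder write used here.

```python
import threading
from pathlib import Path


class PidfileRefs:
    """Hypothetical in-process ref-count registry, not the PR's actual class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._refs: dict[Path, int] = {}

    def acquire(self, pidfile: Path) -> int:
        with self._lock:
            count = self._refs.get(pidfile, 0) + 1
            if count == 1:
                pidfile.write_text("owner\n")  # placeholder for the exclusive create
            self._refs[pidfile] = count
            return count

    def release(self, pidfile: Path) -> int:
        with self._lock:
            count = self._refs.get(pidfile, 0)
            if count > 1:
                self._refs[pidfile] = count - 1
                return count - 1
            # Last reference: unlink first, then drop the entry, all under the lock,
            # so a concurrent acquire() cannot slip in between the two steps.
            pidfile.unlink(missing_ok=True)
            self._refs.pop(pidfile, None)
            return 0
```

Because `acquire()` needs the same lock, a thread opening a new writer either sees the old reference still counted or sees a clean slate with the pidfile already gone, never the half-released state described in the finding.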

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/brainlayer/vector_store.py (1)

2027-2038: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Always release the writer pidfile even if DB close fails.

If self.conn.close() raises, _release_writer_pidfile() is skipped and the process can leave a stale writer lock behind.

Suggested fix

 def close(self) -> None:
     """Close database connections."""
     # Close thread-local read connection if it exists
     if hasattr(self, "_local"):
         read_conn = getattr(self._local, "read_conn", None)
         if read_conn is not None:
             read_conn.close()
             self._local.read_conn = None
-    if hasattr(self, "conn"):
-        self.conn.close()
-    self._release_writer_pidfile()
+    try:
+        if hasattr(self, "conn"):
+            self.conn.close()
+    finally:
+        self._release_writer_pidfile()

As per coding guidelines **/*.py: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior" and "Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work".
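
One way to make the release unconditional for both connections, not only `self.conn`, is `contextlib.ExitStack`. The sketch below is a variation on the suggested fix, not the code that was merged; `SketchStore` is a hypothetical stand-in for `VectorStore` and only models the ordering inside `close()`.

```python
from contextlib import ExitStack


class SketchStore:
    """Hypothetical stand-in for VectorStore; only the close() ordering matters here."""

    def __init__(self, conn, read_conn=None):
        self.conn = conn
        self.read_conn = read_conn
        self.released = False

    def _release_writer_pidfile(self) -> None:
        self.released = True  # the real method would drop the pidfile reference

    def close(self) -> None:
        with ExitStack() as stack:
            # Callbacks run in reverse registration order, and each one runs
            # even if an earlier one raised, so the release always happens.
            stack.callback(self._release_writer_pidfile)
            if self.conn is not None:
                stack.callback(self.conn.close)
            if self.read_conn is not None:
                stack.callback(self.read_conn.close)
```

Any exception from a connection close still propagates to the caller after the release has run, so failures are not hidden.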

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/brainlayer/vector_store.py` around lines 2027 - 2038, The close method
can raise while closing DB connections and skip calling _release_writer_pidfile,
leaving a stale writer lock; update close(self) to ensure
_release_writer_pidfile() is always executed by wrapping connection-close logic
in a try/finally (or similar) so that even if getattr(self, "conn").close() or
closing self._local.read_conn raises, _release_writer_pidfile() runs in the
finally block; handle/propagate exceptions as appropriate but do not allow them
to prevent _release_writer_pidfile() from executing (reference: close,
_release_writer_pidfile, conn, and _local.read_conn).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/search_repo.py`:
- Around line 1527-1541: The new recency query that builds recent_rows (the
cursor.execute block selecting from chunks and using candidate_fetch_count) must
be wrapped in SQLITE_BUSY retry/backoff logic: catch
sqlite3.OperationalError/SQLITE_BUSY around the cursor.execute call, retry with
exponential backoff (short sleeps between attempts) for a limited number of
attempts, and only raise after retries are exhausted; ensure this uses the
worker's own DB connection/cursor used elsewhere in this module (i.e., the same
cursor variable) to respect the guideline that each worker uses its own
connection. Replace the direct list(cursor.execute(...)) call with a retry loop
that logs or sleeps on SQLITE_BUSY and then executes the same query and
parameters (including recent_extra and recent_params) on success.
- Around line 1473-1539: The recency fallback rebuilds filters but forgot to
apply content_type_filter and date_to, so add checks for content_type_filter
(append "AND content_type = ?" and push content_type_filter into recent_params)
and for date_to (append a created_at upper-bound clause like "AND
datetime(created_at) <= datetime(?)" and push date_to into recent_params) before
executing the cursor query in the method that builds recent_extra (the block
referencing recent_extra, recent_params and the cursor.execute SELECT). Ensure
the new clauses are appended in the same manner/order as the other filters so
the final parameter list [*recent_params, min(candidate_fetch_count, 25)]
remains aligned.

In `@src/brainlayer/vector_store.py`:
- Around line 177-203: The race occurs between _handle_existing_writer_pidfile
and _release_writer_pidfile where refs and pidfile unlink can interleave, so
modify both functions to serialize ref checks/updates and unlink decisions using
the existing _PIDFILE_REFS_LOCK: in _handle_existing_writer_pidfile (after
reading other_pid and before incrementing self._PIDFILE_REFS for pidfile)
acquire _PIDFILE_REFS_LOCK to re-check the current ref count and the pidfile
existence and then increment and set
_writer_pidfile_acquired/_writer_pidfile_path_value and register atexit
atomically; likewise, when deciding to unlink a stale pidfile (the
os.path.samestat branch), acquire _PIDFILE_REFS_LOCK first and re-check that no
new refs were added and the on-disk pid still matches before calling
pidfile.unlink(); ensure _release_writer_pidfile also uses _PIDFILE_REFS_LOCK to
decrement refs and only unlink if the ref count reaches zero and the pidfile
still refers to our pid to prevent removing a pidfile re-created by the same
process.

In `@tests/test_hybrid_search.py`:
- Around line 434-521: The test suite adds recency fallback checks for
entity/sentiment and boundary behavior but omits parity tests for
content_type_filter and date_to; add two new tests mirroring the existing
patterns (e.g., follow test_recency_intent_fallback_preserves_entity_filter and
test_recency_intent_fallback_compares_created_at_as_datetime) that: (1) insert
recent chunks with different content_type values, call store.hybrid_search with
content_type_filter set and assert the non-matching content_type chunk is
excluded, and (2) insert chunks around a date_to boundary, monkeypatch
store.search same as
test_recency_intent_fallback_compares_created_at_as_datetime, call hybrid_search
with date_to set and assert chunks after date_to are excluded; reference test
function names and store.hybrid_search/search/_insert_chunk to locate where to
add them.

In `@tests/test_writer_pidfile_mutex.py`:
- Around line 235-242: The test test_init_retries_10_with_extended_backoff is
fragile because VectorStore._INIT_MAX_RETRIES can be preseeded by the
environment; make it deterministic by explicitly setting
VectorStore._INIT_MAX_RETRIES to 10 at the start of the test (or use monkeypatch
to set that attribute) before computing delays and asserting values, then
restore the original value at the end or use a fixture to isolate the change so
external BRAINLAYER_INIT_MAX_RETRIES/env settings cannot affect the assertion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef9f0b9f-6f95-469d-90a4-8f125fca111d

📥 Commits

Reviewing files that changed from the base of the PR and between be11d44 and aeef3d2.

📒 Files selected for processing (11)

hooks/brainlayer-prompt-search.py
scripts/launchd/com.brainlayer.enrichment.plist
src/brainlayer/drain.py
src/brainlayer/search_repo.py
src/brainlayer/vector_store.py
tests/test_engine.py
tests/test_eval_baselines.py
tests/test_hybrid_search.py
tests/test_think_recall_integration.py
tests/test_vector_store.py
tests/test_writer_pidfile_mutex.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (3.11)
GitHub Check: test (3.12)
GitHub Check: test (3.13)
GitHub Check: Cursor Bugbot
GitHub Check: Macroscope - Correctness Check

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

tests/test_think_recall_integration.py
tests/test_vector_store.py
src/brainlayer/drain.py
hooks/brainlayer-prompt-search.py
tests/test_engine.py
tests/test_hybrid_search.py
src/brainlayer/vector_store.py
src/brainlayer/search_repo.py
tests/test_writer_pidfile_mutex.py
tests/test_eval_baselines.py

src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

src/brainlayer/drain.py
src/brainlayer/vector_store.py
src/brainlayer/search_repo.py

🪛 OpenGrep (1.21.0)

src/brainlayer/search_repo.py

[ERROR] 1528-1540: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🔇 Additional comments (8)

hooks/brainlayer-prompt-search.py (1)

44-46: LGTM!

Also applies to: 544-545

src/brainlayer/search_repo.py (1)

67-69: LGTM!

Also applies to: 84-86, 1680-1681

src/brainlayer/drain.py (1)

37-41: LGTM!

Also applies to: 71-74

scripts/launchd/com.brainlayer.enrichment.plist (1)

55-56: LGTM!

tests/test_engine.py (1)

348-350: LGTM!

Also applies to: 393-395

tests/test_eval_baselines.py (1)

68-70: LGTM!

Also applies to: 689-691

tests/test_think_recall_integration.py (1)

21-23: LGTM!

tests/test_vector_store.py (1)

14-16: LGTM!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeef3d2f7e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T19:20:09Z

+        if recency_intent and not date_from:
+            recent_extra = []


Respect date_to in recency-intent fallback

Gate the recency fallback on both date bounds or apply date_to inside the fallback query. Right now hybrid_search enters the fallback whenever recency_intent is true and date_from is unset, so requests that set only date_to (for example, “latest notes before 2024-01-01”) still inject last-7-days chunks from chunks; those injected rows are treated as FTS-only candidates and the later post-filter does not enforce date_to, so results can violate the caller’s explicit upper date bound.

Fixed in d858e34. The recency fallback applies datetime(created_at) <= datetime(?) for date_to, with a regression test.
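
A minimal sketch of how a recency fallback query can honor an explicit upper bound, assuming a `chunks` table with `id` and `created_at` columns. The function name `recent_chunk_ids`, the `now` argument, and the 7-day window parameter are illustrative assumptions, not the code from d858e34; the stdlib `sqlite3` module stands in for APSW.

```python
import sqlite3
from typing import List, Optional


def recent_chunk_ids(
    conn: sqlite3.Connection,
    now: str,
    date_to: Optional[str] = None,
    window_days: int = 7,
    limit: int = 20,
) -> List[str]:
    """Return ids of recent chunks, never newer than date_to when it is set."""
    # Lower bound: the recency window ending at `now`.
    clauses = ["datetime(created_at) >= datetime(?, ?)"]
    params: list = [now, f"-{window_days} days"]
    if date_to is not None:
        # Upper bound requested by the caller; same comparison shape as the fix note.
        clauses.append("datetime(created_at) <= datetime(?)")
        params.append(date_to)
    # Only fixed clause strings are joined; every value is bound as a parameter.
    sql = (
        "SELECT id FROM chunks WHERE "
        + " AND ".join(clauses)
        + " ORDER BY datetime(created_at) DESC LIMIT ?"
    )
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]
```

With `date_to` set before the recency window, the two bounds cannot both hold, so the fallback contributes no rows instead of injecting chunks that violate the caller's bound.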

EtanHey · 2026-05-21T20:05:54Z

@codex review

EtanHey · 2026-05-21T20:05:54Z

@cursor @BugBot re-review

cursor · 2026-05-21T20:05:58Z

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

coderabbitai · 2026-05-21T20:06:03Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

EtanHey · 2026-05-21T20:14:52Z

@coderabbitai review

EtanHey · 2026-05-21T20:14:54Z

@cursor @BugBot re-review

EtanHey · 2026-05-21T20:14:56Z

@codex review

cursor · 2026-05-21T20:14:57Z

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

coderabbitai · 2026-05-21T20:14:59Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36e9cc2168

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T20:18:14Z

+        if not pidfile_dir.is_absolute():
+            pidfile_dir = Path("/tmp") / pidfile_dir
+        pidfile_dir = pidfile_dir.resolve()


Keep relative pidfile overrides under /tmp

Relative BRAINLAYER_WRITER_PIDFILE_DIR values can escape /tmp because the code appends the raw relative path and then calls resolve(). For example, ../locks becomes /locks, so writer startup may fail with permissions (or write lockfiles outside the intended sandbox) instead of honoring the documented “relative paths live under /tmp” contract.
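
One way to enforce the containment contract is sketched below. It is an illustration only: `resolve_pidfile_dir` is a hypothetical helper, and falling back to the base directory when a relative value escapes it is an assumed policy (raising an error would be an equally reasonable choice).

```python
from pathlib import Path


def resolve_pidfile_dir(raw: str, base: Path = Path("/tmp")) -> Path:
    """Resolve a pidfile dir override, keeping relative values under `base`."""
    base = base.resolve()  # e.g. /tmp is a symlink to /private/tmp on macOS
    candidate = Path(raw)
    if candidate.is_absolute():
        return candidate.resolve()
    resolved = (base / candidate).resolve()
    # After resolve(), ".." segments and symlinks are collapsed, so a simple
    # ancestry check tells us whether the result is still inside `base`.
    if resolved != base and base not in resolved.parents:
        return base  # assumed policy: ignore an escaping relative override
    return resolved
```

The check runs after `resolve()` rather than before it, because inspecting the raw string for `..` would miss escapes that go through symlinks.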

macroscopeapp · 2026-05-21T20:22:11Z

+        with self._PIDFILE_REFS_LOCK:
+            if self._PIDFILE_REFS.get(pidfile, 0) > 0:
+                ref_pid = self._PIDFILE_REF_PIDS.get(pidfile)
+                if ref_pid is not None and ref_pid != pid:
+                    self._PIDFILE_REFS.pop(pidfile, None)
+                    self._PIDFILE_REF_PIDS.pop(pidfile, None)
+                else:
+                    owner_pid, owner_start_time = self._read_writer_pidfile_owner(pidfile)
+                    if owner_pid != pid or not self._pidfile_owner_matches(owner_pid, owner_start_time):
+                        raise WriterInUseError(
+                            f"pidfile ref mismatch for {self.db_path}; refusing to clear active refs"
+                        )
+                    self._PIDFILE_REFS[pidfile] += 1


🟢 Low brainlayer/vector_store.py:155

When _read_writer_pidfile_owner() returns (None, None) due to a corrupt or empty pidfile, the condition owner_pid != pid evaluates to True because None != some_int, causing WriterInUseError to be raised even though the in-memory _PIDFILE_REFS state correctly reflects that this process owns the lock. Consider handling owner_pid is None as a valid case where the current process can reclaim ownership.

                     owner_pid, owner_start_time = self._read_writer_pidfile_owner(pidfile)
-                    if owner_pid != pid or not self._pidfile_owner_matches(owner_pid, owner_start_time):
+                    if owner_pid is None:
+                        # pidfile corrupted but in-memory refs valid; reclaim ownership
+                        pass
+                    elif owner_pid != pid or not self._pidfile_owner_matches(owner_pid, owner_start_time):
                         raise WriterInUseError(
                             f"pidfile ref mismatch for {self.db_path}; refusing to clear active refs"
                         )

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file src/brainlayer/vector_store.py around lines 155-167: When `_read_writer_pidfile_owner()` returns `(None, None)` due to a corrupt or empty pidfile, the condition `owner_pid != pid` evaluates to `True` because `None != some_int`, causing `WriterInUseError` to be raised even though the in-memory `_PIDFILE_REFS` state correctly reflects that this process owns the lock. Consider handling `owner_pid is None` as a valid case where the current process can reclaim ownership. Evidence trail: src/brainlayer/vector_store.py lines 150-172 (REVIEWED_COMMIT): `_acquire_writer_pidfile` method with the condition at line 163. src/brainlayer/vector_store.py lines 231-244 (REVIEWED_COMMIT): `_read_writer_pidfile_owner` returning `(None, None)` for empty files (line 235) and for OSError/ValueError (line 244).

I am leaving this unchanged intentionally. An unreadable or corrupt pidfile while in-memory writer refs exist is a mutex-integrity violation, not a safe reclaim path: treating owner_pid=None as valid could drop protection while a writer is still active. The current behavior raises and preserves the active refs instead of clearing them. Regression coverage: test_pidfile_ref_mismatch_does_not_clear_existing_refs.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/brainlayer/search_repo.py (1)

1532-1554: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Append recency-fallback ranks after existing FTS hits.

Line 1553 assigns recent-only rows rank i, restarting at 0 even when fts_ranks already has exact lexical matches. That lets fallback rows contribute the same RRF weight as the top FTS hit, and the later 2x recency boost can push non-matching recent chunks above genuinely matching results.

Suggested fix

+            recent_rank_offset = len(fts_ranks)
             for i, row in enumerate(recent_rows):
                 chunk_id = row[0]
-                fts_ranks.setdefault(chunk_id, i)
+                fts_ranks.setdefault(chunk_id, recent_rank_offset + i)
                 keyword_data.setdefault(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/brainlayer/search_repo.py` around lines 1532 - 1554, The recency fallback
loop resets ranks by using enumerate(recent_rows) starting at 0, allowing
recent-only rows to get the same rank as top FTS hits; change the enumeration to
start after existing FTS ranks (e.g., use enumerate(recent_rows,
start=len(fts_ranks)) or start=max(fts_ranks.values(), default=0)+1) so chunk_id
assignments in the loop that sets fts_ranks (the block iterating recent_rows and
using fts_ranks.setdefault) append ranks rather than colliding with existing FTS
ranks; ensure you do not overwrite existing fts_ranks and add a short comment
explaining the choice.
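
The rank-offset idea can be sketched in isolation as follows. `append_fallback_ranks` and `rrf_score` are hypothetical helpers written for this illustration, and `k=60` is a commonly used RRF constant assumed here, not a value taken from this repository.

```python
from typing import Dict, List


def append_fallback_ranks(fts_ranks: Dict[str, int], recent_ids: List[str]) -> Dict[str, int]:
    """Give recency-only rows ranks that start after every existing FTS rank."""
    merged = dict(fts_ranks)  # never mutate or overwrite real FTS ranks
    offset = max(merged.values(), default=-1) + 1
    for i, chunk_id in enumerate(recent_ids):
        # setdefault keeps the lexical rank for chunks that FTS already matched.
        merged.setdefault(chunk_id, offset + i)
    return merged


def rrf_score(rank: int, k: int = 60) -> float:
    """Reciprocal rank fusion contribution for a 0-based rank."""
    return 1.0 / (k + rank + 1)
```

With `fts_ranks = {"a": 0, "b": 1}` and `recent_ids = ["b", "c", "d"]`, the merged ranks are `{"a": 0, "b": 1, "c": 3, "d": 4}`, so a recency-only chunk always contributes less RRF weight than any genuine FTS hit.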

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/vector_store.py`:
- Around line 2112-2122: The current finally block always calls
self._release_writer_pidfile(), which can drop the writer pidfile even if
read_conn.close() or self.conn.close() fails; instead, remove the unconditional
finally call and only invoke self._release_writer_pidfile() after both
connection closes complete successfully (i.e., call it after read_conn.close()
and self.conn.close() without wrapping those in a finally that always releases
the pidfile); if any close raises, do not release the pidfile and re-raise the
exception so the single-writer guard remains held; update the failing close test
to expect the pidfile to be retained rather than removed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1ab4c20e-a711-4c76-a8fd-983278b6d842

📥 Commits

Reviewing files that changed from the base of the PR and between aeef3d2 and 36e9cc2.

📒 Files selected for processing (7)

hooks/brainlayer-prompt-search.py
src/brainlayer/drain.py
src/brainlayer/search_repo.py
src/brainlayer/vector_store.py
tests/test_adaptive_injection.py
tests/test_hybrid_search.py
tests/test_writer_pidfile_mutex.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: test (3.11)
GitHub Check: test (3.12)
GitHub Check: test (3.13)
GitHub Check: Macroscope - Correctness Check

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

Files:

tests/test_adaptive_injection.py
src/brainlayer/drain.py
hooks/brainlayer-prompt-search.py
src/brainlayer/search_repo.py
tests/test_hybrid_search.py
src/brainlayer/vector_store.py
tests/test_writer_pidfile_mutex.py

src/brainlayer/**/*.py

Files:

src/brainlayer/drain.py
src/brainlayer/search_repo.py
src/brainlayer/vector_store.py

🔇 Additional comments (3)

hooks/brainlayer-prompt-search.py (1)

46-46: LGTM!

Also applies to: 776-776

tests/test_adaptive_injection.py (1)

158-196: LGTM!

tests/test_writer_pidfile_mutex.py (1)

29-521: Please run the full pytest suite before merging this mutex change.

This adds good focused coverage, but the repo rule for risky DB/concurrency work is stricter than the targeted tests called out in the PR description. I would not treat the behavior change as safely validated until the full 929-test suite passes.

As per coding guidelines **/*.py: "Run pytest before claiming behavior changed safely; current test suite has 929 tests".

coderabbitai · 2026-05-21T20:23:40Z

+        try:
+            # Close thread-local read connection if it exists
+            if hasattr(self, "_local"):
+                read_conn = getattr(self._local, "read_conn", None)
+                if read_conn is not None:
+                    read_conn.close()
+                    self._local.read_conn = None
+            if hasattr(self, "conn"):
+                self.conn.close()
+        finally:
+            self._release_writer_pidfile()


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not release the writer pidfile when writer shutdown fails.

Lines 2112-2122 drop the pidfile in finally, so a failing read_conn.close() or self.conn.close() can release the single-writer guard while the writable connection is still alive. That breaks the mutex guarantee this PR is adding. Keep the pidfile until the writer connection has actually closed, and update the companion close-failure test to expect retention instead of removal.

Suggested fix

     def close(self) -> None:
         """Close database connections."""
-        try:
-            # Close thread-local read connection if it exists
-            if hasattr(self, "_local"):
-                read_conn = getattr(self._local, "read_conn", None)
-                if read_conn is not None:
-                    read_conn.close()
-                    self._local.read_conn = None
-            if hasattr(self, "conn"):
-                self.conn.close()
-        finally:
-            self._release_writer_pidfile()
+        if hasattr(self, "_local"):
+            read_conn = getattr(self._local, "read_conn", None)
+            if read_conn is not None:
+                read_conn.close()
+                self._local.read_conn = None
+        if hasattr(self, "conn"):
+            self.conn.close()
+        self._release_writer_pidfile()

As per coding guidelines `**/*.py`: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior" and "Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/brainlayer/vector_store.py` around lines 2112 - 2122, The current finally block always calls self._release_writer_pidfile(), which can drop the writer pidfile even if read_conn.close() or self.conn.close() fails; instead, remove the unconditional finally call and only invoke self._release_writer_pidfile() after both connection closes complete successfully (i.e., call it after read_conn.close() and self.conn.close() without wrapping those in a finally that always releases the pidfile); if any close raises, do not release the pidfile and re-raise the exception so the single-writer guard remains held; update the failing close test to expect the pidfile to be retained rather than removed.

I investigated this and am not applying the suggestion. PR-alpha explicitly requires pidfile unlink on close/atexit, and this PR already fixed the stale-lock failure mode where close exceptions skipped _release_writer_pidfile(). If APSW close raises while the process is unwinding, retaining the pidfile can permanently block future writers. _release_writer_pidfile() is idempotent and only unlinks when the pidfile still names this PID. Regression coverage: test_close_releases_pidfile_when_connection_close_raises.

@EtanHey, understood — thank you for the thorough explanation. The PID-checked idempotency of _release_writer_pidfile() and the risk of a permanently stranded pidfile from a dying process tip the balance in favour of the finally block here. I was treating this like a generic lock where retention-on-failure is always safer, but the process-scoped, PID-validated pidfile design means a failed-close process that hangs onto its pidfile is strictly worse than releasing it. The existing test test_close_releases_pidfile_when_connection_close_raises confirms the behaviour is intentional and covered.

I'll withdraw the suggestion.

✏️ Learnings added

Learnt from: EtanHey
Repo: EtanHey/brainlayer
PR: 309
File: src/brainlayer/vector_store.py:2114-2124
Timestamp: 2026-05-21T20:41:49.909Z
Learning: In `src/brainlayer/vector_store.py`, `VectorStore.close()` intentionally releases the writer pidfile in a `finally` block (always, even if `read_conn.close()` or `self.conn.close()` raises). This is by design: the pidfile is PID-checked and idempotent, so releasing it during process unwind is safe, and retaining it after a failed close would permanently block future writers. Do not suggest moving `_release_writer_pidfile()` outside the `finally` block. Covered by `test_close_releases_pidfile_when_connection_close_raises`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
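
The design argued for in this thread can be illustrated with a simplified sketch. `release_pidfile` and `SketchWriter` are hypothetical stand-ins, not the repository's `_release_writer_pidfile()` or `VectorStore`; the sketch checks only the PID and omits the start-time metadata and `flock` handling described in the PR summary.

```python
import os
from pathlib import Path


def release_pidfile(pidfile: Path, pid: int) -> bool:
    """Unlink the pidfile only if its first token is `pid`; True if removed."""
    try:
        owner = int(pidfile.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return False  # missing, unreadable, or empty pidfile: nothing to release
    if owner != pid:
        return False  # another process owns it; leave it alone
    try:
        pidfile.unlink()
    except FileNotFoundError:
        return False  # already released by an earlier call
    return True


class SketchWriter:
    """Toy writer whose close() always attempts the PID-checked release."""

    def __init__(self, pidfile: Path, conn) -> None:
        self.pidfile = pidfile
        self.conn = conn

    def close(self) -> None:
        try:
            self.conn.close()
        finally:
            # Runs even when conn.close() raises, so a dying process does not
            # strand a pidfile that would block every future writer.
            release_pidfile(self.pidfile, os.getpid())
```

Because the release is a no-op unless the pidfile still names the calling PID, calling it from both `close()` and an `atexit` hook is harmless in this sketch.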

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8afb4387a6

chatgpt-codex-connector · 2026-05-21T20:32:53Z

+        timeout_ms = int(os.environ.get("BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS", "30000"))
+    except (TypeError, ValueError):
+        return 30000
+    return timeout_ms if timeout_ms > 0 else 30000


Clamp busy-timeout override to 32-bit APSW range

_drain_busy_timeout_ms() accepts any positive integer, but apsw.Connection.setbusytimeout() expects a C int; values above 2_147_483_647 raise OverflowError and abort drain startup. A misconfigured BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS (for example 3000000000) therefore turns a tuning knob into a hard failure path instead of falling back safely.

Fixed in c668da8. Added a RED regression for BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS=3000000000 causing APSW OverflowError, then clamped invalid, non-positive, and >C-int values back to the 30000ms default. Verified with the focused drain timeout tests, the affected 68-test PR-alpha slice, and the full pre-push gate.
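
The fallback rule described above can be sketched as a small pure function. This is an illustration consistent with the fix note, not the code from c668da8: the `environ` parameter and the constant names are assumptions added so the behavior can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_BUSY_TIMEOUT_MS = 30000
C_INT_MAX = 2_147_483_647  # largest value a signed 32-bit C int can hold


def drain_busy_timeout_ms(environ: Optional[Mapping[str, str]] = None) -> int:
    """Read the busy-timeout override, falling back to 30000 ms when unusable."""
    env = os.environ if environ is None else environ
    raw = env.get("BRAINLAYER_DRAIN_BUSY_TIMEOUT_MS", str(DEFAULT_BUSY_TIMEOUT_MS))
    try:
        timeout_ms = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_BUSY_TIMEOUT_MS  # not an integer at all
    if timeout_ms <= 0 or timeout_ms > C_INT_MAX:
        return DEFAULT_BUSY_TIMEOUT_MS  # non-positive or would overflow a C int
    return timeout_ms
```

Falling back to the default, rather than clamping an oversized value down to `C_INT_MAX`, matches the thread's description and avoids silently configuring a timeout of roughly 24 days.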

EtanHey · 2026-05-21T20:41:38Z

Review update:

Fixed CodeRabbit outside-diff recency rank collision in e035747: added RED regression test_recency_intent_fallback_appends_ranks_after_fts_hits, then appended fallback ranks after existing FTS ranks.
Fixed drain busy-timeout overflow in c668da8 with RED regression test_drain_busy_timeout_overflow_env_falls_back_to_30s.
Latest local gate after c668da8: 2090 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; isolated eval/hook routing 32 passed; Bun 1 passed; FTS determinism passed.

@cursor @BugBot re-review latest head c668da8.

EtanHey · 2026-05-21T20:42:08Z

@codex review latest head c668da8.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c668da8c0e

chatgpt-codex-connector · 2026-05-21T20:47:06Z

    query_embedding = model.embed_query(prompt)

-    store = VectorStore(Path(db_path))
+    store = VectorStore(Path(db_path), readonly=True)


Preserve writable fallback when opening prompt search store

Opening VectorStore in read-only mode here causes prompt search to hard-fail when the DB file or its parent directory does not exist yet (fresh install, cleaned state, or custom path). In that scenario VectorStore(..., readonly=True) raises immediately, and the fallback path also opens SQLite read-only, so the hook returns an error instead of gracefully returning no memories. Before this change, the writable open path created the DB directory/file and allowed the hook to continue.
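
One possible way to keep the hook read-only while still degrading gracefully on a missing database is sketched below. It uses the stdlib `sqlite3` module as a stand-in for the project's APSW-backed `VectorStore`, and `open_readonly_or_none` is a hypothetical helper; whether the hook should behave this way is a judgment call the thread does not settle.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def open_readonly_or_none(db_path: Path) -> Optional[sqlite3.Connection]:
    """Open the DB read-only, or return None when there is nothing to read yet."""
    if not db_path.exists():
        return None  # fresh install or cleaned state: caller returns no memories
    uri = db_path.resolve().as_uri() + "?mode=ro"
    try:
        return sqlite3.connect(uri, uri=True)
    except sqlite3.OperationalError:
        # The file vanished or became unreadable between the check and the open.
        return None
```

A caller would treat `None` as "no memories available" rather than an error, which keeps the hook off the writer lock without creating the database as a side effect.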

EtanHey · 2026-05-21T20:49:30Z

CI/review gates are green on head c668da8.

Attempted required merge command:
gh pr merge 309 --rebase

GitHub blocked it with base branch policy and suggested --admin. Per PR-alpha instructions and no-squash/no-admin-self-merge rule, requesting Etan admin-merge with rebase merge only.

@EtanHey please admin-merge PR #309 with rebase merge (no squash).

PR #309 added a pidfile mutex on VectorStore RW init to gate multiple enrich processes.

Unintended consequence: the FastAPI daemon's lifespan also opens VectorStore RW at startup (daemon.py:83). If the enrich supervisor is alive (which it normally is), the daemon fails to start entirely with WriterInUseError. That breaks:

- brainlayer search CLI (DaemonClient hangs trying to start daemon)
- brainlayer stats CLI (same)
- any dashboard frontend on localhost:3000 etc.

Empirically observed: enrich supervisor PID 14909 holds the pidfile, brainlayer-daemon refuses to bootstrap, /tmp/brainlayer.sock never appears, all CLI operations time out at the DAEMON_STARTUP_TIMEOUT.

Fix: catch WriterInUseError on the lifespan VectorStore() call and fall back to readonly mode. In readonly mode:

- Search endpoints work (the common case)
- Stats/context/dashboard endpoints work
- Write endpoints (/digest, /store, /entity PATCH, /backlog/items POST/PATCH/DELETE) will return 500 on the SQL layer

Operator can restart the daemon (with no live writer) to regain RW mode for write endpoints. Or eventually move write endpoints to the arbitrated queue path (post-Phase-4 work).

Empirically verified by smoke-test:

- Direct lifespan invocation against current production state (enrich supervisor PID 14909 holds pidfile)
- Pre-fix: WriterInUseError raised, lifespan fails to enter, daemon exits
- Post-fix: warning logged, lifespan enters readonly, "LIFESPAN OK"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
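
The fallback described in this commit message can be sketched roughly as follows. `WriterInUseError` here is a local stand-in for the project's exception, and `open_store_with_readonly_fallback` plus the `store_factory(db_path, readonly=...)` callable are hypothetical names used only for illustration; the real lifespan code in daemon.py is not shown in this thread.

```python
import logging

logger = logging.getLogger("brainlayer.daemon.sketch")


class WriterInUseError(RuntimeError):
    """Stand-in for the error raised when another process holds the writer pidfile."""


def open_store_with_readonly_fallback(store_factory, db_path):
    """Try a read-write open; on WriterInUseError, retry readonly.

    Returns (store, is_readonly) so callers can reject write endpoints up front.
    """
    try:
        return store_factory(db_path, readonly=False), False
    except WriterInUseError as exc:
        logger.warning("writer pidfile held (%s); opening %s readonly", exc, db_path)
        return store_factory(db_path, readonly=True), True
```

Returning the readonly flag alongside the store is a design choice of this sketch: it would let write endpoints answer with an explicit "daemon is readonly" error instead of failing at the SQL layer.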

…ience

Etan-mandate 2026-05-22: "IMPROVE BRAINBAR AND MAKE IT SHOW GRAPH AND INJESTIONS ALREADY, WITHOUT DEGRATION!" + addendum: "AUTO MERGE AFTER /pr-loop, not sloppy."

PR #314 wired the runtime database + InjectionStore so the "Warming memory…" placeholder cleared and BrainBarGraphTab + BrainBarInjectionTab get non-nil dependencies. But two failure modes remained:

1. When KG queries failed (typically transient ReadOnly / busy / locked from writer-pidfile contention with the Python enrich-supervisor + drain; see PR #309), `KGViewModel.loadGraph` silently set nodes=[]/edges=[] and returned false. The UI had no way to distinguish "empty graph" from "queries failing," so the canvas blanked.
2. Same pattern in `InjectionStore.refresh`: failures only `NSLog`'d and the panel held stale or empty events with no user-visible signal.

This change adds a small `DegradationState` enum (`.healthy` / `.degraded(reason:)`) published on both view-models. `loadGraph` retries once on a 200ms backoff before reporting degraded, then keeps last-known-good nodes/edges so the user sees prior data rather than a blank canvas. `InjectionStore.refresh` flips state on failure and clears it on the next successful poll cycle.

A small `DegradationBadge` (orange capsule with an exclamation-triangle icon) overlays the top-right of each tab when its state is degraded, with the underlying error in the SwiftUI .help tooltip. Per Etan's mandate: degraded ≠ hidden.

Tests cover the enum surface, KG load-failure → degraded transition, KG success → healthy transition, and InjectionStore initial-healthy state. 387/387 tests pass, swift build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(writer): add single-writer pidfile mutex

17e33ea

greptile-apps Bot reviewed May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

macroscopeapp Bot reviewed May 21, 2026

Comment thread src/brainlayer/vector_store.py Outdated

Comment thread src/brainlayer/drain.py

Comment thread src/brainlayer/search_repo.py

cursor Bot reviewed May 21, 2026

Comment thread src/brainlayer/search_repo.py Outdated

fix(writer): address mutex review edge cases

acbf191

greptile-apps Bot reviewed May 21, 2026

fix(search): bound recency intent terms

1ece80a

chatgpt-codex-connector Bot reviewed May 21, 2026

macroscopeapp Bot reviewed May 21, 2026

Comment thread src/brainlayer/vector_store.py Outdated

Comment thread src/brainlayer/vector_store.py

Comment thread src/brainlayer/vector_store.py Outdated

greptile-apps Bot reviewed May 21, 2026

fix(writer): guard zero init retry budget

c661a16

chatgpt-codex-connector Bot reviewed May 21, 2026

greptile-apps Bot reviewed May 21, 2026

fix(writer): harden pidfile and recency edge cases

4378c17

greptile-apps Bot reviewed May 21, 2026

fix(writer): register reused pidfile cleanup

aeef3d2

chatgpt-codex-connector Bot reviewed May 21, 2026

greptile-apps Bot reviewed May 21, 2026

coderabbitai Bot reviewed May 21, 2026

Comment thread src/brainlayer/search_repo.py Outdated

Comment thread src/brainlayer/search_repo.py Outdated

Comment thread src/brainlayer/vector_store.py

Comment thread tests/test_hybrid_search.py

Comment thread tests/test_writer_pidfile_mutex.py Outdated

chatgpt-codex-connector Bot reviewed May 21, 2026

macroscopeapp Bot reviewed May 21, 2026

Comment thread src/brainlayer/vector_store.py

fix(writer): reuse pidfile owner matcher

36e9cc2

greptile-apps Bot reviewed May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

macroscopeapp Bot reviewed May 21, 2026

fix(writer): retry transient pidfile create races

8afb438

coderabbitai Bot reviewed May 21, 2026

greptile-apps Bot reviewed May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

fix(search): append recency fallback ranks

e035747

greptile-apps Bot reviewed May 21, 2026

fix(drain): clamp busy timeout override

c668da8

greptile-apps Bot reviewed May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

EtanHey merged commit 634c222 into main May 21, 2026
7 checks passed

EtanHey mentioned this pull request May 22, 2026

fix(daemon): fall back to readonly when writer pidfile is held #311

Merged

4 tasks

Conversation

EtanHey commented May 21, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Writer Mutex Contract

Retry / Timeout / Throttle Math

RED -> GREEN

Enrich-Rate Bench

Test Plan

Notes

Add single-writer pidfile mutex and recency-intent search fallback to VectorStore and hybrid_search

Summary by CodeRabbit

Release Notes

coderabbitai Bot commented May 21, 2026 • edited

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

greptile-apps Bot left a comment

EtanHey commented May 21, 2026

EtanHey commented May 21, 2026

EtanHey commented May 21, 2026

EtanHey commented May 21, 2026

cursor Bot commented May 21, 2026

coderabbitai Bot commented May 21, 2026

greptile-apps Bot left a comment

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 21, 2026

chatgpt-codex-connector Bot May 21, 2026

EtanHey commented May 21, 2026

greptile-apps Bot left a comment

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 21, 2026

greptile-apps Bot left a comment

chatgpt-codex-connector Bot left a comment

EtanHey commented May 21, 2026 • edited by macroscopeapp Bot

Add single-writer pidfile mutex and recency-intent search fallback to `VectorStore` and `hybrid_search`

coderabbitai Bot commented May 21, 2026 • edited