Skip to content

fix: consolidate DB paths — single canonical brainlayer.db#77

Merged
EtanHey merged 2 commits into
mainfrom
fix/db-consolidation
Mar 12, 2026
Merged

fix: consolidate DB paths — single canonical brainlayer.db#77
EtanHey merged 2 commits into
mainfrom
fix/db-consolidation

Conversation

@EtanHey
Copy link
Copy Markdown
Owner

@EtanHey EtanHey commented Mar 12, 2026

Summary

  • Removed dual-path (zikaron/brainlayer) logic — all 27 files now use paths.py:get_db_path() with single canonical path ~/.local/share/brainlayer/brainlayer.db
  • Fixed server.json env var mismatchBRAINLAYER_DB_PATHBRAINLAYER_DB to match what paths.py actually reads
  • Configured launchd enrichment for Groq — added BRAINLAYER_ENRICH_BACKEND=groq and GROQ_API_KEY placeholder with 1Password auto-resolution in install.sh

What changed

Area Files Change
Core paths.py Removed _LEGACY_DB_PATH, simplified to env var or canonical
CLI cli/__init__.py (7 commands) Path.home()/...get_db_path()
Scripts 8 files Added from brainlayer.paths import get_db_path
Hooks 2 files Replaced dual-path DB_PATHS list with single canonical
Config server.json Fixed env var name
LaunchD enrich plist + install.sh Groq backend + API key placeholder
Tests 3 files Removed legacy path test cases
Docs 3 files Updated path references

Pre-merge steps (already done)

  • Killed all brainlayer processes (indexer, enrichment, daemon, 5 MCP)
  • WAL checkpoint (390 pages)
  • WAL-safe backup (7.9G at zikaron-backup-2026-03-12.db)
  • Moved zikaron.dbbrainlayer.db (verified: 297,296 chunks)

Test plan

  • pytest tests/test_paths.py tests/test_phase3_qa.py — 30/30 pass
  • pytest tests/ (broad) — 779/779 pass
  • brainlayer stats — shows 297,296 chunks from canonical path
  • brainlayer search — works correctly
  • Post-merge: run ./scripts/launchd/install.sh to install agents
  • Post-merge: verify brainlayer enrich --stats works with Groq

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Environment variable override for database location (BRAINLAYER_DB)
    • Groq API backend configuration (GROQ_API_KEY)
  • Improvements

    • Consolidated data into a single canonical database location (~8GB)
    • Centralized and simplified database path resolution across tools and scripts
    • Removed legacy zikaron path fallback
  • Documentation

    • Updated data-location and configuration docs with storage stats and archival workflow
  • Tests

    • Updated path-resolution tests and test data to reflect consolidation changes

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 12, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Centralizes DB path resolution: removes legacy zikaron fallbacks and hardcoded paths, makes all scripts/CLI/hooks use get_db_path() (env var BRAINLAYER_DB or canonical ~/.local/share/brainlayer/brainlayer.db); docs and tests updated to reflect the canonical-only resolution. (≤50 words)

Changes

Cohort / File(s) Summary
Core Path Resolution
src/brainlayer/paths.py
Removed legacy _LEGACY_DB_PATH and multi-path fallback; get_db_path() now checks BRAINLAYER_DB then always uses canonical _CANONICAL_DB_PATH; added DEFAULT_DB_PATH.
Scripts using centralized path
scripts/backfill-context.py, scripts/backfill-metadata.py, scripts/classify-all.py, scripts/cloud_backfill.py, scripts/cloud_stream.py, scripts/pre-label.py, scripts/reembed_bge_m3.py, scripts/verify_hebrew_search.py
Replaced hardcoded home-based DB paths with from brainlayer.paths import get_db_path() and calls to get_db_path().
Hook scripts
hooks/brainlayer-prompt-search.py, hooks/session-cleanup.py, hooks/wal_checkpoint.py
Removed DB_PATHS loop and legacy-path checks; introduced single canonical _CANONICAL_DB and simplified get_db_path logic.
Scripts: fallback/error behavior
scripts/generate_style_card.py
Removed legacy fallback; if provided DB path missing, script now prints path and exits rather than trying legacy recovery.
CLI & Library modules
src/brainlayer/cli/__init__.py, src/brainlayer/cli/wizard.py, src/brainlayer/cli_new.py, src/brainlayer/clustering.py, src/brainlayer/migrate.py
Replaced hardcoded user-local DB path usages with centralized get_db_path() imports and references.
Launchd & Installer
scripts/launchd/com.brainlayer.enrich.plist, scripts/launchd/install.sh
Added env vars BRAINLAYER_ENRICH_BACKEND and GROQ_API_KEY; install script substitutes __GROQ_API_KEY__ from 1Password when available.
Runtime config
server.json
Renamed environment variable from BRAINLAYER_DB_PATH to BRAINLAYER_DB.
Documentation
CLAUDE.md, docs/configuration.md, docs/data-locations.md
Removed legacy zikaron references and legacy-path rationale; updated canonical path, sizes/counts, and session-archiver notes.
Tests & Fixtures
tests/test_paths.py, tests/test_phase3_qa.py, tests/eval_baselines.json
Removed legacy-path tests and expectations; updated tests to assert canonical-path behavior and normalized eval fixtures to reference brainlayer.db.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

🐰 I hopped through code and cleared the trail,

One env, one path, no legacy veil.
brainlayer.db tucked snug at home,
No more searches far to roam.
Crunch, commit, and hop — the changes sail! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: consolidating database path handling to use a single canonical brainlayer.db location instead of maintaining legacy dual-path logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/db-consolidation
📝 Coding Plan for PR comments
  • Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
scripts/backfill-context.py (1)

21-99: 🧹 Nitpick | 🔵 Trivial

Consider adding WAL checkpoint before and after bulk updates.

This script performs bulk UPDATE operations on chunks table. As per coding guidelines, bulk database operations should checkpoint WAL before and after to ensure durability and prevent WAL bloat.

♻️ Suggested enhancement
 def backfill():
     if not DB_PATH.exists():
         print(f"Database not found: {DB_PATH}")
         sys.exit(1)

     store = VectorStore(DB_PATH)
     cursor = store.conn.cursor()

+    # Checkpoint WAL before bulk operation
+    cursor.execute("PRAGMA wal_checkpoint(FULL)")
+
     # Check current state
     total = list(cursor.execute("SELECT COUNT(*) FROM chunks"))[0][0]
     # ... rest of function ...
     
     print("Done.", flush=True)
+
+    # Checkpoint WAL after bulk operation
+    cursor.execute("PRAGMA wal_checkpoint(FULL)")

     store.close()

As per coding guidelines: "Checkpoint WAL with PRAGMA wal_checkpoint(FULL) before and after bulk database operations"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/backfill-context.py` around lines 21 - 99, In backfill(), before
performing the bulk UPDATEs (the conversation_id UPDATE block and the
per-source_file position updates that use cursor.execute(... "UPDATE chunks
...")), execute a WAL checkpoint via the DB connection (e.g., call
store.conn.execute("PRAGMA wal_checkpoint(FULL)") or cursor.execute(...)) to
checkpoint WAL, and then again after finishing all bulk updates (before the
final verification queries and before store.close()) to ensure durability and
prevent WAL bloat; place these checkpoint calls immediately before the first
UPDATE and immediately after the loop that updates positions.
src/brainlayer/paths.py (1)

27-28: 🛠️ Refactor suggestion | 🟠 Major

Line 28 caches BRAINLAYER_DB too early.

DEFAULT_DB_PATH = get_db_path() freezes the env-resolved path the first time brainlayer.paths is imported. After this PR, that leaves mixed behavior between callers that use get_db_path() directly and callers that still import DEFAULT_DB_PATH. Prefer resolving overrides at call time and keep any exported constant non-env-sensitive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/paths.py` around lines 27 - 28, DEFAULT_DB_PATH is being set
by calling get_db_path() at import time which freezes environment-derived
overrides; change DEFAULT_DB_PATH to a non-env-sensitive literal (e.g.,
"brainlayer.db" or other static default) instead of calling get_db_path() and
ensure any code that needs the env-resolved path calls get_db_path() at runtime
(references: DEFAULT_DB_PATH, get_db_path()).
scripts/cloud_stream.py (2)

161-173: 🧹 Nitpick | 🔵 Trivial

Duplicated update_enrichment call — consider extracting helper.

The store.update_enrichment(...) block with all 10 parameters is repeated verbatim in the retry path (lines 198-210). This violates DRY and makes future changes error-prone.

♻️ Proposed refactor to extract helper
+def _apply_enrichment(store: VectorStore, chunk_id: str, enrichment: dict) -> None:
+    """Apply parsed enrichment to the store."""
+    store.update_enrichment(
+        chunk_id=chunk_id,
+        summary=enrichment.get("summary"),
+        tags=enrichment.get("tags"),
+        importance=enrichment.get("importance"),
+        intent=enrichment.get("intent"),
+        primary_symbols=enrichment.get("primary_symbols"),
+        resolved_query=enrichment.get("resolved_query"),
+        epistemic_level=enrichment.get("epistemic_level"),
+        version_scope=enrichment.get("version_scope"),
+        debt_impact=enrichment.get("debt_impact"),
+        external_deps=enrichment.get("external_deps"),
+    )

Then replace both occurrences with:

_apply_enrichment(store, chunk_id, enrichment)

Also applies to: 198-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/cloud_stream.py` around lines 161 - 173, The update_enrichment call
is duplicated; extract a small helper (e.g., _apply_enrichment(store, chunk_id,
enrichment)) that calls store.update_enrichment with the ten fields (summary,
tags, importance, intent, primary_symbols, resolved_query, epistemic_level,
version_scope, debt_impact, external_deps) pulled from the enrichment dict, then
replace both inlined blocks with a single call to _apply_enrichment(store,
chunk_id, enrichment). Ensure the helper is defined near related functions in
scripts/cloud_stream.py and use the same parameter names (store, chunk_id,
enrichment) so retry-path and original path both call it.

231-231: 🧹 Nitpick | 🔵 Trivial

Consider periodic WAL checkpointing for bulk update operations.

This script can process hundreds of thousands of UPDATE operations. Without explicit checkpointing, the WAL file may grow significantly during long runs. As per coding guidelines: "Checkpoint WAL with PRAGMA wal_checkpoint(FULL) before and after bulk database operations."

♻️ Proposed checkpoint addition
     store = VectorStore(db_path)
+    store.conn.execute("PRAGMA wal_checkpoint(FULL)")

     # Load all chunks from JSONL files

And optionally checkpoint periodically in the batch loop:

         await asyncio.gather(*tasks)
+        
+        # Periodic WAL checkpoint every few batches
+        if (batch_start // batch_size) % 10 == 0:
+            store.conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")

And at the end before close:

+    store.conn.execute("PRAGMA wal_checkpoint(FULL)")
     store.close()

Also applies to: 289-299

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/cloud_stream.py` at line 231, Add explicit WAL checkpointing around
the bulk update work to prevent unbounded WAL growth: before starting the large
update batch call the SQLite pragma "PRAGMA wal_checkpoint(FULL)" on the same DB
connection used by VectorStore (reference the instantiated store =
VectorStore(db_path) and use store.conn.execute(...) or store.execute(...)
depending on the class API), periodically invoke the same pragma inside the
batch loop (e.g., every N batches) to flush the WAL, and call it once more after
finishing the bulk updates and before closing the DB; wrap each checkpoint call
in a try/except and log failures with the existing logger so checkpoint errors
don't crash the run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/brainlayer-prompt-search.py`:
- Around line 71-80: get_db_path currently only returns BRAINLAYER_DB when the
file exists and returns None otherwise, diverging from the shared resolver;
update get_db_path to mirror the shared semantics by (1) returning the
BRAINLAYER_DB env value if set (do not require os.path.exists on it), (2)
otherwise returning _CANONICAL_DB if it exists, and (3) if neither branch
matches return _CANONICAL_DB as the final fallback (i.e., don't return None).
Change the logic in get_db_path and reference _CANONICAL_DB and the
BRAINLAYER_DB env lookup accordingly.

In `@hooks/session-cleanup.py`:
- Around line 23-32: The get_db_path function duplicates resolver logic; replace
the local implementation by reusing the consolidated resolver from
brainlayer.paths instead of maintaining duplicate behavior. Import the canonical
resolver (e.g., from brainlayer.paths import get_db_path) and call that function
from this module (remove or rename _CANONICAL_DB and the duplicated logic) so
BRAINLAYER_DB and canonical-path handling are resolved consistently via
brainlayer.paths.get_db_path().

In `@scripts/launchd/com.brainlayer.enrich.plist`:
- Around line 34-37: The plist contains an unreplaced GROQ_API_KEY placeholder
which can pass the existing check_backend_health() bool(GROQ_API_KEY) test but
cause silent auth failures; update install.sh to validate the GROQ_API_KEY after
the 1Password lookup (the code that sets GROQ_API_KEY) and abort with a clear
error (or at minimum print a conspicuous warning and exit non‑zero) if
GROQ_API_KEY is empty or still equals the placeholder "__GROQ_API_KEY__"; ensure
the installer does not write an empty or placeholder value into the plist and
surface a user-facing message explaining that the Groq API key is required.

In `@scripts/launchd/install.sh`:
- Line 18: The install script currently embeds GROQ_API_KEY directly into the
com.brainlayer.enrich.plist via the GROQ_API_KEY substitution, which stores the
secret in plain text; instead, stop writing the raw GROQ_API_KEY into the plist
in install.sh and change the install flow to either (a) store the secret in the
macOS Keychain and write only a key identifier to the plist, or (b) write a
wrapper/launcher (referenced in ProgramArguments) that retrieves the secret at
startup using security find-generic-password, or (c) integrate a runtime secrets
manager call from the service startup; update the sed substitution and any lines
that write to the plist so they place a placeholder or keychain identifier (not
the secret) and ensure the service binary (or wrapper script) uses security
find-generic-password to fetch GROQ_API_KEY at runtime.
- Line 18: When resolving GROQ_API_KEY in install.sh (the
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields
credential 2>/dev/null || echo "")}" fallback), detect when the final value is
empty and emit a clear warning message to the installer indicating that the
enrich plist will be installed without a GROQ_API_KEY and the enrichment daemon
will fail (reference the GROQ_API_KEY variable and the enrich plist install
path). Update install.sh to check the resolved GROQ_API_KEY after assignment and
print a visible warning (e.g., to stderr) with actionable guidance (set
GROQ_API_KEY env var or provide it via 1Password) before proceeding with
installing the enrich plist.
- Line 18: The GROQ_API_KEY retrieval assumes a 1Password item named
GROQ_API_KEY of type "API Credential" with a field named "credential" which can
fail silently for other item types (e.g., "Login" uses "password"); update the
GROQ_API_KEY assignment (the line using GROQ_API_KEY="${GROQ_API_KEY:-$(op item
get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}") to
include a comment documenting the required item type and field, and add clearer
fallback/error guidance (log or echo a hint) when the op command returns empty;
alternatively replace the op item get invocation with op read usage and document
the expected secret reference syntax so users can choose the correct item
type/field.

In `@scripts/wal_checkpoint.py`:
- Around line 19-28: Refactor get_db_path to reuse the path-resolution from
brainlayer.paths instead of duplicating _CANONICAL_DB logic: import the
appropriate resolver (e.g., a function or constant such as get_db_path or
CANONICAL_DB) from brainlayer.paths and call it here, then apply the
checkpoint-specific existence check (keep the current env existence requirement)
— i.e., use the upstream resolver to obtain the candidate path and only return
it from this script if os.path.exists(candidate_path); reference the existing
symbols get_db_path and _CANONICAL_DB in this file and the resolver in
brainlayer.paths so the behavior stays consistent while preserving the script’s
extra existence check.

In `@src/brainlayer/paths.py`:
- Around line 14-24: get_db_path currently skips checking the old legacy DB
location and returns a fresh canonical DB (and also returns BRAINLAYER_DB
without ensuring its parent exists), causing installs with only the legacy DB to
get a new empty DB and custom paths to fail; update get_db_path to first check
for the legacy path (e.g., the old "~/.local/share/zikaron/zikaron.db") and if
it exists return that Path as an upgrade guard before falling back to the
canonical path, and when returning the env-provided path (env from
os.environ.get("BRAINLAYER_DB")) ensure its parent directory is created
(mkdir(parents=True, exist_ok=True)) before returning; reference get_db_path,
_CANONICAL_DB_PATH, and the env variable handling to locate where to add these
checks and the mkdir call.

---

Outside diff comments:
In `@scripts/backfill-context.py`:
- Around line 21-99: In backfill(), before performing the bulk UPDATEs (the
conversation_id UPDATE block and the per-source_file position updates that use
cursor.execute(... "UPDATE chunks ...")), execute a WAL checkpoint via the DB
connection (e.g., call store.conn.execute("PRAGMA wal_checkpoint(FULL)") or
cursor.execute(...)) to checkpoint WAL, and then again after finishing all bulk
updates (before the final verification queries and before store.close()) to
ensure durability and prevent WAL bloat; place these checkpoint calls
immediately before the first UPDATE and immediately after the loop that updates
positions.

In `@scripts/cloud_stream.py`:
- Around line 161-173: The update_enrichment call is duplicated; extract a small
helper (e.g., _apply_enrichment(store, chunk_id, enrichment)) that calls
store.update_enrichment with the ten fields (summary, tags, importance, intent,
primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact,
external_deps) pulled from the enrichment dict, then replace both inlined blocks
with a single call to _apply_enrichment(store, chunk_id, enrichment). Ensure the
helper is defined near related functions in scripts/cloud_stream.py and use the
same parameter names (store, chunk_id, enrichment) so retry-path and original
path both call it.
- Line 231: Add explicit WAL checkpointing around the bulk update work to
prevent unbounded WAL growth: before starting the large update batch call the
SQLite pragma "PRAGMA wal_checkpoint(FULL)" on the same DB connection used by
VectorStore (reference the instantiated store = VectorStore(db_path) and use
store.conn.execute(...) or store.execute(...) depending on the class API),
periodically invoke the same pragma inside the batch loop (e.g., every N
batches) to flush the WAL, and call it once more after finishing the bulk
updates and before closing the DB; wrap each checkpoint call in a try/except and
log failures with the existing logger so checkpoint errors don't crash the run.

In `@src/brainlayer/paths.py`:
- Around line 27-28: DEFAULT_DB_PATH is being set by calling get_db_path() at
import time which freezes environment-derived overrides; change DEFAULT_DB_PATH
to a non-env-sensitive literal (e.g., "brainlayer.db" or other static default)
instead of calling get_db_path() and ensure any code that needs the env-resolved
path calls get_db_path() at runtime (references: DEFAULT_DB_PATH,
get_db_path()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ad74adf4-7b57-4997-a8f2-60b14162da48

📥 Commits

Reviewing files that changed from the base of the PR and between 7089c06 and 433226b.

📒 Files selected for processing (27)
  • CLAUDE.md
  • docs/configuration.md
  • docs/data-locations.md
  • hooks/brainlayer-prompt-search.py
  • hooks/session-cleanup.py
  • scripts/backfill-context.py
  • scripts/backfill-metadata.py
  • scripts/classify-all.py
  • scripts/cloud_backfill.py
  • scripts/cloud_stream.py
  • scripts/generate_style_card.py
  • scripts/launchd/com.brainlayer.enrich.plist
  • scripts/launchd/install.sh
  • scripts/pre-label.py
  • scripts/reembed_bge_m3.py
  • scripts/verify_hebrew_search.py
  • scripts/wal_checkpoint.py
  • server.json
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
  • src/brainlayer/cli_new.py
  • src/brainlayer/clustering.py
  • src/brainlayer/migrate.py
  • src/brainlayer/paths.py
  • tests/eval_baselines.json
  • tests/test_paths.py
  • tests/test_phase3_qa.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (6)
scripts/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

scripts/**/*.py: Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation
Checkpoint WAL with PRAGMA wal_checkpoint(FULL) before and after bulk database operations
Drop FTS triggers (especially chunks_fts_delete) before bulk deletes and recreate after — FTS triggers cause massive performance degradation on large datasets
Batch delete operations in 5-10K chunks with checkpoint every 3 batches
Never delete from chunks table while FTS trigger is active on large datasets

Files:

  • scripts/generate_style_card.py
  • scripts/wal_checkpoint.py
  • scripts/verify_hebrew_search.py
  • scripts/backfill-metadata.py
  • scripts/cloud_stream.py
  • scripts/reembed_bge_m3.py
  • scripts/pre-label.py
  • scripts/backfill-context.py
  • scripts/classify-all.py
  • scripts/cloud_backfill.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Python package structure with Typer CLI located in src/brainlayer/

Files:

  • src/brainlayer/migrate.py
  • src/brainlayer/paths.py
  • src/brainlayer/cli_new.py
  • src/brainlayer/clustering.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Lint and format Python code using ruff check src/ and ruff format src/

Files:

  • src/brainlayer/migrate.py
  • src/brainlayer/paths.py
  • src/brainlayer/cli_new.py
  • src/brainlayer/clustering.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
**/classify*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Preserve verbatim content for ai_code, stack_trace, and user_message during classification

Files:

  • scripts/classify-all.py
**/*test*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Run tests with pytest

Files:

  • tests/test_paths.py
  • tests/test_phase3_qa.py
**/paths.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/paths.py: Resolve DB path using paths.py:get_db_path() — prefers zikaron path (~/.local/share/zikaron/zikaron.db) if it exists, falls through to brainlayer path (~/.local/share/brainlayer/brainlayer.db)
Store database at ~/.local/share/brainlayer/brainlayer.db, prompts cache at ~/.local/share/brainlayer/prompts/, socket at /tmp/brainlayer.sock, and enrichment lock at /tmp/brainlayer-enrichment.lock

Files:

  • src/brainlayer/paths.py
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Fix 8 scripts in `scripts/` that hardcode `brainlayer.db` path to use dynamic path resolution during DB consolidation
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Resolve DB path using `paths.py:get_db_path()` — prefers zikaron path (`~/.local/share/zikaron/zikaron.db`) if it exists, falls through to brainlayer path (`~/.local/share/brainlayer/brainlayer.db`)
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Store database at `~/.local/share/brainlayer/brainlayer.db`, prompts cache at `~/.local/share/brainlayer/prompts/`, socket at `/tmp/brainlayer.sock`, and enrichment lock at `/tmp/brainlayer-enrichment.lock`
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Fix 8 scripts in `scripts/` that hardcode `brainlayer.db` path to use dynamic path resolution during DB consolidation

Applied to files:

  • CLAUDE.md
  • scripts/wal_checkpoint.py
  • src/brainlayer/migrate.py
  • scripts/verify_hebrew_search.py
  • scripts/backfill-metadata.py
  • scripts/cloud_stream.py
  • scripts/reembed_bge_m3.py
  • scripts/pre-label.py
  • docs/configuration.md
  • scripts/backfill-context.py
  • hooks/brainlayer-prompt-search.py
  • tests/eval_baselines.json
  • scripts/classify-all.py
  • tests/test_paths.py
  • src/brainlayer/paths.py
  • hooks/session-cleanup.py
  • scripts/cloud_backfill.py
  • docs/data-locations.md
  • src/brainlayer/cli_new.py
  • tests/test_phase3_qa.py
  • src/brainlayer/clustering.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Store database at `~/.local/share/brainlayer/brainlayer.db`, prompts cache at `~/.local/share/brainlayer/prompts/`, socket at `/tmp/brainlayer.sock`, and enrichment lock at `/tmp/brainlayer-enrichment.lock`

Applied to files:

  • CLAUDE.md
  • scripts/generate_style_card.py
  • server.json
  • scripts/wal_checkpoint.py
  • src/brainlayer/migrate.py
  • scripts/launchd/install.sh
  • scripts/verify_hebrew_search.py
  • scripts/backfill-metadata.py
  • scripts/cloud_stream.py
  • scripts/reembed_bge_m3.py
  • scripts/pre-label.py
  • docs/configuration.md
  • scripts/backfill-context.py
  • hooks/brainlayer-prompt-search.py
  • scripts/classify-all.py
  • tests/test_paths.py
  • src/brainlayer/paths.py
  • hooks/session-cleanup.py
  • scripts/cloud_backfill.py
  • docs/data-locations.md
  • src/brainlayer/cli_new.py
  • tests/test_phase3_qa.py
  • src/brainlayer/clustering.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Resolve DB path using `paths.py:get_db_path()` — prefers zikaron path (`~/.local/share/zikaron/zikaron.db`) if it exists, falls through to brainlayer path (`~/.local/share/brainlayer/brainlayer.db`)

Applied to files:

  • CLAUDE.md
  • scripts/generate_style_card.py
  • scripts/wal_checkpoint.py
  • src/brainlayer/migrate.py
  • scripts/verify_hebrew_search.py
  • scripts/backfill-metadata.py
  • scripts/cloud_stream.py
  • scripts/reembed_bge_m3.py
  • scripts/pre-label.py
  • docs/configuration.md
  • scripts/backfill-context.py
  • hooks/brainlayer-prompt-search.py
  • scripts/classify-all.py
  • tests/test_paths.py
  • src/brainlayer/paths.py
  • hooks/session-cleanup.py
  • scripts/cloud_backfill.py
  • docs/data-locations.md
  • src/brainlayer/cli_new.py
  • tests/test_phase3_qa.py
  • src/brainlayer/clustering.py
  • src/brainlayer/cli/__init__.py
  • src/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Extract, classify, chunk, embed, and index session data following the BrainLayer pipeline

Applied to files:

  • CLAUDE.md
  • docs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Checkpoint WAL with `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations

Applied to files:

  • scripts/wal_checkpoint.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/enrichment*.py : Use MLX (`Qwen2.5-Coder-14B-Instruct-4bit`) as primary enrichment backend on Apple Silicon (port 8080) with Ollama (`glm-4.7-flash`) fallback on port 11434

Applied to files:

  • scripts/launchd/com.brainlayer.enrich.plist
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to golems/scripts/enrichment-lazy.sh : Use enrichment worker script `golems/scripts/enrichment-lazy.sh` with launchd scheduling at nice=20 priority, processing in batches of 50

Applied to files:

  • scripts/launchd/com.brainlayer.enrich.plist
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/enrichment*.py : Auto-switch enrichment backend from MLX to Ollama after 3 consecutive MLX failures; allow override with `BRAINLAYER_ENRICH_BACKEND` environment variable

Applied to files:

  • scripts/launchd/com.brainlayer.enrich.plist
  • scripts/cloud_stream.py
  • scripts/cloud_backfill.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/mcp/**/*.py : Implement MCP tools: `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person` with legacy `brainlayer_*` aliases

Applied to files:

  • hooks/brainlayer-prompt-search.py
  • docs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Source data comes from JSONL format in `~/.claude/projects/`

Applied to files:

  • docs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/export*.py : Export knowledge base as brain graph JSON via `brainlayer brain-export` for dashboard visualization

Applied to files:

  • docs/data-locations.md
  • src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/export*.py : Export knowledge base to Obsidian Markdown vault with backlinks and tags via `brainlayer export-obsidian`

Applied to files:

  • docs/data-locations.md
  • src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/vector_store.py : Use APSW with sqlite-vec for vector storage (in `vector_store.py`)

Applied to files:

  • src/brainlayer/cli/__init__.py
🪛 markdownlint-cli2 (0.21.0)
CLAUDE.md

[warning] 34-34: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (18)
server.json (1)

24-28: LGTM!

The environment variable name change from BRAINLAYER_DB_PATH to BRAINLAYER_DB correctly aligns with the canonical path resolution in src/brainlayer/paths.py and other scripts.

src/brainlayer/cli_new.py (2)

9-9: LGTM!

Correctly imports get_db_path from the centralized paths module.


115-122: LGTM!

Migration command now uses the centralized path resolver and properly checks existence before prompting for overwrite.

docs/data-locations.md (2)

9-9: LGTM!

Database stats updated to reflect the consolidated state (297K+ chunks at ~8GB).


17-18: LGTM!

Path resolution documentation correctly reflects the simplified two-step resolution (env var → canonical path), matching src/brainlayer/paths.py.

scripts/backfill-context.py (1)

15-18: LGTM!

Correctly imports and uses the centralized get_db_path() function, aligning with the DB consolidation goals.

src/brainlayer/migrate.py (1)

20-27: LGTM!

Migration script correctly imports and uses the centralized get_db_path() function, replacing the previous hardcoded path.

tests/eval_baselines.json (1)

4-4: LGTM!

Test baseline correctly updated to use the canonical brainlayer.db path, consistent with the DB consolidation.

scripts/classify-all.py (1)

17-20: LGTM. This switches the script to the shared DB resolver instead of another hard-coded path.

Based on learnings: Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation.

scripts/pre-label.py (1)

17-20: LGTM. This is the right consolidation point for path lookup in this script.

Based on learnings: Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation.

scripts/reembed_bge_m3.py (1)

21-37: LGTM. Using the shared resolver for DEFAULT_DB keeps this maintenance script aligned with the canonical DB path.

Based on learnings: Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation.

docs/configuration.md (1)

34-37: LGTM. This matches the consolidated path resolution and removes ambiguity about the default DB location.

Based on learnings: Store database at ~/.local/share/brainlayer/brainlayer.db, prompts cache at ~/.local/share/brainlayer/prompts/, socket at /tmp/brainlayer.sock, and enrichment lock at /tmp/brainlayer-enrichment.lock.

scripts/verify_hebrew_search.py (1)

12-21: LGTM. This removes another standalone DB-path assumption and keeps the verification script on the same resolver as the app.

Based on learnings: Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation.

tests/test_phase3_qa.py (1)

35-41: LGTM. This test now exercises the fresh-install canonical path behavior the rest of the PR is standardizing on.

tests/test_paths.py (2)

20-30: LGTM — Test correctly reflects canonical-only path resolution.

The updated docstring and test logic align with the PR's goal of removing legacy dual-path behavior. The test properly verifies that:

  1. The canonical path is returned when no env override exists
  2. The parent directory is created if missing

32-37: Integration test looks good.

The local import of DEFAULT_DB_PATH inside the test function is appropriate for an integration test that depends on production state.

scripts/cloud_stream.py (2)

41-46: Path resolution correctly updated to use dynamic get_db_path().

This change properly implements the PR objective of consolidating DB paths by importing and using the centralized path resolver instead of hardcoding the path. Based on learnings: "Fix 8 scripts in scripts/ that hardcode brainlayer.db path to use dynamic path resolution during DB consolidation."


368-408: CLI implementation looks correct.

The --db argument correctly falls back to DEFAULT_DB_PATH (now using get_db_path()), and the file selection logic is reasonable.

Comment on lines +71 to 80
_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")


def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
if env and os.path.exists(env):
return env
for p in DB_PATHS:
if os.path.exists(p):
return p
if os.path.exists(_CANONICAL_DB):
return _CANONICAL_DB
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Match the shared DB resolver semantics here. This hook now has its own get_db_path() again, and it already diverges from src/brainlayer/paths.py: it only honors BRAINLAYER_DB if the file exists and otherwise returns None instead of the canonical path. That reintroduces resolver drift into the consolidation PR.

♻️ Minimal fix
 def get_db_path():
     env = os.environ.get("BRAINLAYER_DB")
-    if env and os.path.exists(env):
+    if env:
         return env
-    if os.path.exists(_CANONICAL_DB):
-        return _CANONICAL_DB
-    return None
+    os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True)
+    return _CANONICAL_DB
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")
def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
if env and os.path.exists(env):
return env
for p in DB_PATHS:
if os.path.exists(p):
return p
if os.path.exists(_CANONICAL_DB):
return _CANONICAL_DB
return None
_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")
def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
if env:
return env
os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True)
return _CANONICAL_DB
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/brainlayer-prompt-search.py` around lines 71 - 80, get_db_path
currently only returns BRAINLAYER_DB when the file exists and returns None
otherwise, diverging from the shared resolver; update get_db_path to mirror the
shared semantics by (1) returning the BRAINLAYER_DB env value if set (do not
require os.path.exists on it), (2) otherwise returning _CANONICAL_DB if it
exists, and (3) if neither branch matches return _CANONICAL_DB as the final
fallback (i.e., don't return None). Change the logic in get_db_path and
reference _CANONICAL_DB and the BRAINLAYER_DB env lookup accordingly.

Comment thread hooks/session-cleanup.py
Comment on lines +23 to 32
_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")


def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
if env and os.path.exists(env):
return env
for p in DB_PATHS:
if os.path.exists(p):
return p
if os.path.exists(_CANONICAL_DB):
return _CANONICAL_DB
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Reuse the consolidated resolver behavior here too. The local helper no longer matches brainlayer.paths.get_db_path(), so BRAINLAYER_DB overrides and canonical-path handling now have to be maintained in two places again.

♻️ Minimal fix
 def get_db_path():
     env = os.environ.get("BRAINLAYER_DB")
-    if env and os.path.exists(env):
+    if env:
         return env
-    if os.path.exists(_CANONICAL_DB):
-        return _CANONICAL_DB
-    return None
+    os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True)
+    return _CANONICAL_DB
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/session-cleanup.py` around lines 23 - 32, The get_db_path function
duplicates resolver logic; replace the local implementation by reusing the
consolidated resolver from brainlayer.paths instead of maintaining duplicate
behavior. Import the canonical resolver (e.g., from brainlayer.paths import
get_db_path) and call that function from this module (remove or rename
_CANONICAL_DB and the duplicated logic) so BRAINLAYER_DB and canonical-path
handling are resolved consistently via brainlayer.paths.get_db_path().

Comment on lines +34 to +37
<key>BRAINLAYER_ENRICH_BACKEND</key>
<string>groq</string>
<key>GROQ_API_KEY</key>
<string>__GROQ_API_KEY__</string>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if install.sh validates GROQ_API_KEY before sed replacement

rg -n 'GROQ_API_KEY' scripts/launchd/install.sh -A3 -B3

Repository: EtanHey/brainlayer

Length of output: 804


🏁 Script executed:

#!/bin/bash
# Check full install.sh for any validation logic around GROQ_API_KEY

cat -n scripts/launchd/install.sh | head -60

Repository: EtanHey/brainlayer

Length of output: 2652


Unreplaced placeholder and missing API key validation cause silent auth failures.

If install.sh is not run, the literal __GROQ_API_KEY__ placeholder remains in the installed plist. This passes the bool(GROQ_API_KEY) check in check_backend_health() (since it's a non-empty string) but fails silently with authentication errors when calling the Groq API.

Additionally, if install.sh runs but the 1Password lookup fails (line 18), GROQ_API_KEY is set to an empty string, which then gets substituted into the plist—again causing silent auth failures without warning the user.

Add validation in install.sh to abort if GROQ_API_KEY is empty after the lookup attempt, or warn the user clearly that the Groq API key is missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/launchd/com.brainlayer.enrich.plist` around lines 34 - 37, The plist
contains an unreplaced GROQ_API_KEY placeholder which can pass the existing
check_backend_health() bool(GROQ_API_KEY) test but cause silent auth failures;
update install.sh to validate the GROQ_API_KEY after the 1Password lookup (the
code that sets GROQ_API_KEY) and abort with a clear error (or at minimum print a
conspicuous warning and exit non‑zero) if GROQ_API_KEY is empty or still equals
the placeholder "__GROQ_API_KEY__"; ensure the installer does not write an empty
or placeholder value into the plist and surface a user-facing message explaining
that the Groq API key is required.

BRAINLAYER_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
BRAINLAYER_BIN="${BRAINLAYER_BIN:-$(which brainlayer 2>/dev/null || echo "$HOME/.local/bin/brainlayer")}"
PYTHON3="${PYTHON3:-$(which python3 2>/dev/null || echo "/usr/bin/python3")}"
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

API key will be stored in plain text in the plist file.

The sed substitution embeds $GROQ_API_KEY directly into ~/Library/LaunchAgents/com.brainlayer.enrich.plist. While this is standard practice for launchd environment variables, be aware:

  • The plist is readable by any process running as your user
  • The key persists on disk in plain text

If this is a concern, alternatives include using macOS Keychain with security find-generic-password at daemon startup, or a secrets manager integration at runtime rather than install time.

Also applies to: 45-45

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/launchd/install.sh` at line 18, The install script currently embeds
GROQ_API_KEY directly into the com.brainlayer.enrich.plist via the GROQ_API_KEY
substitution, which stores the secret in plain text; instead, stop writing the
raw GROQ_API_KEY into the plist in install.sh and change the install flow to
either (a) store the secret in the macOS Keychain and write only a key
identifier to the plist, or (b) write a wrapper/launcher (referenced in
ProgramArguments) that retrieves the secret at startup using security
find-generic-password, or (c) integrate a runtime secrets manager call from the
service startup; update the sed substitution and any lines that write to the
plist so they place a placeholder or keychain identifier (not the secret) and
ensure the service binary (or wrapper script) uses security
find-generic-password to fetch GROQ_API_KEY at runtime.

🧹 Nitpick | 🔵 Trivial

Consider warning when GROQ_API_KEY is empty for the enrich plist.

The fallback chain (env var → 1Password CLI → empty string) silently succeeds even when the key is missing. When installing the enrich plist with an empty key, the enrichment daemon will fail at runtime with "Groq error: GROQ_API_KEY not set" (per enrichment.py:537-540). Users may not realize the root cause.

Consider adding a warning when installing the enrich plist specifically:

💡 Suggested improvement
 install_plist() {
     local name="$1"
     local src="$SCRIPT_DIR/com.brainlayer.${name}.plist"
     local dst="$LAUNCH_DIR/com.brainlayer.${name}.plist"

     if [ ! -f "$src" ]; then
         echo "ERROR: $src not found"
         return 1
     fi

+    # Warn if installing enrich without GROQ_API_KEY
+    if [ "$name" = "enrich" ] && [ -z "$GROQ_API_KEY" ]; then
+        echo "WARNING: GROQ_API_KEY is not set. Enrichment will fail at runtime."
+        echo "  Set GROQ_API_KEY env var or configure 1Password CLI with item 'GROQ_API_KEY'"
+    fi
+
     # Replace placeholders
     sed \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/launchd/install.sh` at line 18, When resolving GROQ_API_KEY in
install.sh (the GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY
--reveal --fields credential 2>/dev/null || echo "")}" fallback), detect when
the final value is empty and emit a clear warning message to the installer
indicating that the enrich plist will be installed without a GROQ_API_KEY and
the enrichment daemon will fail (reference the GROQ_API_KEY variable and the
enrich plist install path). Update install.sh to check the resolved GROQ_API_KEY
after assignment and print a visible warning (e.g., to stderr) with actionable
guidance (set GROQ_API_KEY env var or provide it via 1Password) before
proceeding with installing the enrich plist.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

1Password CLI op item get fields syntax

💡 Result:

In 1Password CLI 2, op item get supports selecting specific item fields with --fields as a comma-separated list, where each entry is a selector like label=<field label> (use quotes if the label has spaces). Example:

op item get "work email" --fields label=username,label=password

[1]

Example with labels that contain spaces:

op item get "<item>" --format json --fields "label=access key","label=secret access key"

[2]

Note: CLI 2 also added the ability to “retrieve fields by field type” when getting items, but the exact selector form depends on your CLI version—check op item get --help for your installed build. [3]


🌐 Web query:

1Password CLI 1.x vs 2.x "op item get" --fields syntax differences

💡 Result:

1Password CLI 1.x (op get item) --fields

  • Command shape: op get item <item> --fields <field1>,<field2>,...
  • Field names: you pass field labels/names directly (comma-separated), e.g. password,token. No label= prefix. Example from 1Password’s CLI 1.0 announcement: op get item GitLab --fields password,token. [1]
  • Output behavior: when one field is requested, CLI 1 returns it as a simple string; for multiple fields it returns a structured output (JSON or CSV depending on --format). [2]

1Password CLI 2.x (op item get) --fields

  • Command shape: op item get <item> --fields label=<field1>,label=<field2>,...
  • Field names: you must use a selector prefix; for labels it’s label= before each field, still comma-separated. The CLI 2 reference explicitly documents this label= requirement. [3]
  • Note: CLI 2 docs also recommend using op read when you want “the contents of a specific field” (instead of relying on op item get). [4]

Quick “same request” example (username + password)

  • CLI 1.x: op get item "work email" --fields username,password [1]
  • CLI 2.x: op item get "work email" --fields label=username,label=password [3]

Sources:
[1] 1Password blog (CLI 1.0): example op get item ... --fields password,token https://blog.1password.com/cli-1-0-release/
[2] 1Password CLI 1 release notes (v1.0.0): single-field string vs multi-field JSON/CSV https://app-updates.agilebits.com/product_history/CLI
[3] 1Password Developer “CLI reference”: --fields requires label= prefixes in CLI 2 https://developer.1password.com/docs/cli/reference/
[4] 1Password Developer “op item get” reference: “To retrieve the contents of a specific field, use op read instead.” https://developer.1password.com/docs/cli/reference/management-commands/item/


🌐 Web query:

1Password CLI "op item get" credential field API

💡 Result:

To retrieve the credential value from an API Credential item in 1Password:

  • Recommended (single field): use op read with a secret reference (this is what 1Password suggests for “contents of a specific field”).

    op read "op://<vault>/<item>/credential"

    op item get is for item details; for a specific field’s contents, use op read. [1][2]

  • Alternative: use op item get --fields (and reveal concealed fields)

    op item get "<item>" --vault "<vault>" --fields credential --reveal

    (API Credential items commonly have fields like username, credential, and hostname.) [3]

Sources
[1] https://developer.1password.com/docs/cli/reference/management-commands/item/
[2] https://developer.1password.com/docs/cli/secret-reference-syntax/
[3] https://support.1password.com/item-categories/


🏁 Script executed:

git ls-files | head -20

Repository: EtanHey/brainlayer

Length of output: 606


🏁 Script executed:

# Check if there's any documentation about 1Password CLI version requirements
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "1password\|op " 2>/dev/null | head -10

Repository: EtanHey/brainlayer

Length of output: 422


🏁 Script executed:

# Look for other uses of 'op' command in scripts to determine CLI version pattern
rg "op (item|get)" --type sh -A 2 -B 2

Repository: EtanHey/brainlayer

Length of output: 529


🏁 Script executed:

# Check if there's any documentation about the GROQ_API_KEY setup
cat scripts/launchd/install.sh

Repository: EtanHey/brainlayer

Length of output: 2984


🏁 Script executed:

# Look for any README or documentation mentioning 1Password setup
rg -i "groq|1password|op " --type md

Repository: EtanHey/brainlayer

Length of output: 28486


Document the 1Password item type requirement and add error guidance.

The command assumes a 1Password item named GROQ_API_KEY of type API Credential with a field named credential. This will fail silently if the item is a different type (e.g., Login items use password instead). Add a comment documenting the expected item type, or improve error handling to guide users on the required setup:

# Item must be type "API Credential" with field "credential"
# If using "Login" type, use field "password" instead
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}"

Alternatively, use op read which provides better secret reference syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/launchd/install.sh` at line 18, The GROQ_API_KEY retrieval assumes a
1Password item named GROQ_API_KEY of type "API Credential" with a field named
"credential" which can fail silently for other item types (e.g., "Login" uses
"password"); update the GROQ_API_KEY assignment (the line using
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields
credential 2>/dev/null || echo "")}") to include a comment documenting the
required item type and field, and add clearer fallback/error guidance (log or
echo a hint) when the op command returns empty; alternatively replace the op
item get invocation with op read usage and document the expected secret
reference syntax so users can choose the correct item type/field.

Comment thread scripts/wal_checkpoint.py
Comment on lines +19 to 28
_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")


def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
if env and os.path.exists(env):
return env
for p in DB_PATHS:
if os.path.exists(p):
return p
if os.path.exists(_CANONICAL_DB):
return _CANONICAL_DB
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider importing from brainlayer.paths to avoid duplication.

This script duplicates the path resolution logic from src/brainlayer/paths.py with a subtle behavioral difference: it requires the env-specified path to exist (os.path.exists(env)), while paths.py trusts the env var unconditionally.

For a checkpoint script, requiring existence makes sense, but the duplication could lead to drift. Consider importing and extending:

♻️ Suggested refactor
-_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")
-
-
-def get_db_path():
-    env = os.environ.get("BRAINLAYER_DB")
-    if env and os.path.exists(env):
-        return env
-    if os.path.exists(_CANONICAL_DB):
-        return _CANONICAL_DB
-    return None
+import sys
+from pathlib import Path
+
+# Add src to path for standalone script usage
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+
+from brainlayer.paths import get_db_path as _get_db_path
+
+
+def get_db_path():
+    """Get DB path, but only if it exists (checkpoint needs existing DB)."""
+    path = _get_db_path()
+    return str(path) if path.exists() else None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/wal_checkpoint.py` around lines 19 - 28, Refactor get_db_path to
reuse the path-resolution from brainlayer.paths instead of duplicating
_CANONICAL_DB logic: import the appropriate resolver (e.g., a function or
constant such as get_db_path or CANONICAL_DB) from brainlayer.paths and call it
here, then apply the checkpoint-specific existence check (keep the current env
existence requirement) — i.e., use the upstream resolver to obtain the candidate
path and only return it from this script if os.path.exists(candidate_path);
reference the existing symbols get_db_path and _CANONICAL_DB in this file and
the resolver in brainlayer.paths so the behavior stays consistent while
preserving the script’s extra existence check.

Comment thread src/brainlayer/paths.py
Comment on lines 14 to 24
def get_db_path() -> Path:
"""Resolve the BrainLayer database path.

Checks BRAINLAYER_DB env var first, then falls back to whichever
known path exists (preferring the legacy zikaron path if both exist,
since that's where the real data is).
Checks BRAINLAYER_DB env var first, then uses the canonical path.
"""
env = os.environ.get("BRAINLAYER_DB")
if env:
return Path(env)

if _LEGACY_DB_PATH.exists():
return _LEGACY_DB_PATH

# Ensure parent dir exists for fresh installs
_CANONICAL_DB_PATH.parent.mkdir(parents=True, exist_ok=True)
return _CANONICAL_DB_PATH
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add an upgrade guard before falling through to a fresh canonical DB.

Removing the legacy-path check here means an existing install that still only has ~/.local/share/zikaron/zikaron.db will now boot against a brand-new empty database. The same branch also returns BRAINLAYER_DB without creating its parent directory, so a first-time custom path can fail on open.

Proposed fix
 def get_db_path() -> Path:
     """Resolve the BrainLayer database path.
 
     Checks BRAINLAYER_DB env var first, then uses the canonical path.
     """
     env = os.environ.get("BRAINLAYER_DB")
     if env:
-        return Path(env)
+        path = Path(env).expanduser()
+        path.parent.mkdir(parents=True, exist_ok=True)
+        return path
+
+    legacy_db = Path.home() / ".local" / "share" / "zikaron" / "zikaron.db"
+    if legacy_db.exists() and not _CANONICAL_DB_PATH.exists():
+        raise RuntimeError(
+            f"Legacy database detected at {legacy_db}. Move it to {_CANONICAL_DB_PATH} before running BrainLayer."
+        )
 
     _CANONICAL_DB_PATH.parent.mkdir(parents=True, exist_ok=True)
     return _CANONICAL_DB_PATH
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/paths.py` around lines 14 - 24, get_db_path currently skips
checking the old legacy DB location and returns a fresh canonical DB (and also
returns BRAINLAYER_DB without ensuring its parent exists), causing installs with
only the legacy DB to get a new empty DB and custom paths to fail; update
get_db_path to first check for the legacy path (e.g., the old
"~/.local/share/zikaron/zikaron.db") and if it exists return that Path as an
upgrade guard before falling back to the canonical path, and when returning the
env-provided path (env from os.environ.get("BRAINLAYER_DB")) ensure its parent
directory is created (mkdir(parents=True, exist_ok=True)) before returning;
reference get_db_path, _CANONICAL_DB_PATH, and the env variable handling to
locate where to add these checks and the mkdir call.

EtanHey and others added 2 commits March 12, 2026 16:12
All 27 files now use a single canonical path (~/.local/share/brainlayer/brainlayer.db)
resolved through paths.py:get_db_path(). Removed legacy _LEGACY_DB_PATH and dual-path
fallback logic from paths.py, hooks, scripts, CLI, and docs.

Changes:
- paths.py: single canonical path, env var override only
- CLI (7 commands): import get_db_path() instead of hardcoding
- Scripts (8 files): import from brainlayer.paths
- Hooks (2 files): canonical path only, no dual-path fallback
- server.json: fix env var name BRAINLAYER_DB_PATH → BRAINLAYER_DB
- Launchd enrich plist: add BRAINLAYER_ENRICH_BACKEND=groq + GROQ_API_KEY placeholder
- install.sh: resolve GROQ_API_KEY from env/1Password at install time
- Tests: remove legacy path test cases, update eval baselines db path
- Docs: update data-locations.md, configuration.md, CLAUDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move `from ..paths import get_db_path` to first-party import section
in cli/__init__.py and cli/wizard.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EtanHey EtanHey force-pushed the fix/db-consolidation branch from f5d446a to fba575c Compare March 12, 2026 14:12
@EtanHey EtanHey merged commit 45069fa into main Mar 12, 2026
4 of 5 checks passed
@EtanHey EtanHey deleted the fix/db-consolidation branch March 12, 2026 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant