fix: consolidate DB paths — single canonical brainlayer.db#77
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughCentralizes DB path resolution: removes legacy zikaron fallbacks and hardcoded paths, makes all scripts/CLI/hooks use Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan for PR comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
scripts/backfill-context.py (1)
21-99: 🧹 Nitpick | 🔵 TrivialConsider adding WAL checkpoint before and after bulk updates.
This script performs bulk UPDATE operations on
chunkstable. As per coding guidelines, bulk database operations should checkpoint WAL before and after to ensure durability and prevent WAL bloat.♻️ Suggested enhancement
def backfill(): if not DB_PATH.exists(): print(f"Database not found: {DB_PATH}") sys.exit(1) store = VectorStore(DB_PATH) cursor = store.conn.cursor() + # Checkpoint WAL before bulk operation + cursor.execute("PRAGMA wal_checkpoint(FULL)") + # Check current state total = list(cursor.execute("SELECT COUNT(*) FROM chunks"))[0][0] # ... rest of function ... print("Done.", flush=True) + + # Checkpoint WAL after bulk operation + cursor.execute("PRAGMA wal_checkpoint(FULL)") store.close()As per coding guidelines: "Checkpoint WAL with
PRAGMA wal_checkpoint(FULL)before and after bulk database operations"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/backfill-context.py` around lines 21 - 99, In backfill(), before performing the bulk UPDATEs (the conversation_id UPDATE block and the per-source_file position updates that use cursor.execute(... "UPDATE chunks ...")), execute a WAL checkpoint via the DB connection (e.g., call store.conn.execute("PRAGMA wal_checkpoint(FULL)") or cursor.execute(...)) to checkpoint WAL, and then again after finishing all bulk updates (before the final verification queries and before store.close()) to ensure durability and prevent WAL bloat; place these checkpoint calls immediately before the first UPDATE and immediately after the loop that updates positions.src/brainlayer/paths.py (1)
27-28: 🛠️ Refactor suggestion | 🟠 MajorLine 28 caches
BRAINLAYER_DBtoo early.
DEFAULT_DB_PATH = get_db_path()freezes the env-resolved path the first timebrainlayer.pathsis imported. After this PR, that leaves mixed behavior between callers that useget_db_path()directly and callers that still importDEFAULT_DB_PATH. Prefer resolving overrides at call time and keep any exported constant non-env-sensitive.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/brainlayer/paths.py` around lines 27 - 28, DEFAULT_DB_PATH is being set by calling get_db_path() at import time which freezes environment-derived overrides; change DEFAULT_DB_PATH to a non-env-sensitive literal (e.g., "brainlayer.db" or other static default) instead of calling get_db_path() and ensure any code that needs the env-resolved path calls get_db_path() at runtime (references: DEFAULT_DB_PATH, get_db_path()).scripts/cloud_stream.py (2)
161-173: 🧹 Nitpick | 🔵 TrivialDuplicated
update_enrichmentcall — consider extracting helper.The
store.update_enrichment(...)block with all 10 parameters is repeated verbatim in the retry path (lines 198-210). This violates DRY and makes future changes error-prone.♻️ Proposed refactor to extract helper
+def _apply_enrichment(store: VectorStore, chunk_id: str, enrichment: dict) -> None: + """Apply parsed enrichment to the store.""" + store.update_enrichment( + chunk_id=chunk_id, + summary=enrichment.get("summary"), + tags=enrichment.get("tags"), + importance=enrichment.get("importance"), + intent=enrichment.get("intent"), + primary_symbols=enrichment.get("primary_symbols"), + resolved_query=enrichment.get("resolved_query"), + epistemic_level=enrichment.get("epistemic_level"), + version_scope=enrichment.get("version_scope"), + debt_impact=enrichment.get("debt_impact"), + external_deps=enrichment.get("external_deps"), + )Then replace both occurrences with:
_apply_enrichment(store, chunk_id, enrichment)Also applies to: 198-210
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/cloud_stream.py` around lines 161 - 173, The update_enrichment call is duplicated; extract a small helper (e.g., _apply_enrichment(store, chunk_id, enrichment)) that calls store.update_enrichment with the ten fields (summary, tags, importance, intent, primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact, external_deps) pulled from the enrichment dict, then replace both inlined blocks with a single call to _apply_enrichment(store, chunk_id, enrichment). Ensure the helper is defined near related functions in scripts/cloud_stream.py and use the same parameter names (store, chunk_id, enrichment) so retry-path and original path both call it.
231-231: 🧹 Nitpick | 🔵 TrivialConsider periodic WAL checkpointing for bulk update operations.
This script can process hundreds of thousands of UPDATE operations. Without explicit checkpointing, the WAL file may grow significantly during long runs. As per coding guidelines: "Checkpoint WAL with
PRAGMA wal_checkpoint(FULL)before and after bulk database operations."♻️ Proposed checkpoint addition
store = VectorStore(db_path) + store.conn.execute("PRAGMA wal_checkpoint(FULL)") # Load all chunks from JSONL filesAnd optionally checkpoint periodically in the batch loop:
await asyncio.gather(*tasks) + + # Periodic WAL checkpoint every few batches + if (batch_start // batch_size) % 10 == 0: + store.conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")And at the end before close:
+ store.conn.execute("PRAGMA wal_checkpoint(FULL)") store.close()Also applies to: 289-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/cloud_stream.py` at line 231, Add explicit WAL checkpointing around the bulk update work to prevent unbounded WAL growth: before starting the large update batch call the SQLite pragma "PRAGMA wal_checkpoint(FULL)" on the same DB connection used by VectorStore (reference the instantiated store = VectorStore(db_path) and use store.conn.execute(...) or store.execute(...) depending on the class API), periodically invoke the same pragma inside the batch loop (e.g., every N batches) to flush the WAL, and call it once more after finishing the bulk updates and before closing the DB; wrap each checkpoint call in a try/except and log failures with the existing logger so checkpoint errors don't crash the run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hooks/brainlayer-prompt-search.py`:
- Around line 71-80: get_db_path currently only returns BRAINLAYER_DB when the
file exists and returns None otherwise, diverging from the shared resolver;
update get_db_path to mirror the shared semantics by (1) returning the
BRAINLAYER_DB env value if set (do not require os.path.exists on it), (2)
otherwise returning _CANONICAL_DB if it exists, and (3) if neither branch
matches return _CANONICAL_DB as the final fallback (i.e., don't return None).
Change the logic in get_db_path and reference _CANONICAL_DB and the
BRAINLAYER_DB env lookup accordingly.
In `@hooks/session-cleanup.py`:
- Around line 23-32: The get_db_path function duplicates resolver logic; replace
the local implementation by reusing the consolidated resolver from
brainlayer.paths instead of maintaining duplicate behavior. Import the canonical
resolver (e.g., from brainlayer.paths import get_db_path) and call that function
from this module (remove or rename _CANONICAL_DB and the duplicated logic) so
BRAINLAYER_DB and canonical-path handling are resolved consistently via
brainlayer.paths.get_db_path().
In `@scripts/launchd/com.brainlayer.enrich.plist`:
- Around line 34-37: The plist contains an unreplaced GROQ_API_KEY placeholder
which can pass the existing check_backend_health() bool(GROQ_API_KEY) test but
cause silent auth failures; update install.sh to validate the GROQ_API_KEY after
the 1Password lookup (the code that sets GROQ_API_KEY) and abort with a clear
error (or at minimum print a conspicuous warning and exit non‑zero) if
GROQ_API_KEY is empty or still equals the placeholder "__GROQ_API_KEY__"; ensure
the installer does not write an empty or placeholder value into the plist and
surface a user-facing message explaining that the Groq API key is required.
In `@scripts/launchd/install.sh`:
- Line 18: The install script currently embeds GROQ_API_KEY directly into the
com.brainlayer.enrich.plist via the GROQ_API_KEY substitution, which stores the
secret in plain text; instead, stop writing the raw GROQ_API_KEY into the plist
in install.sh and change the install flow to either (a) store the secret in the
macOS Keychain and write only a key identifier to the plist, or (b) write a
wrapper/launcher (referenced in ProgramArguments) that retrieves the secret at
startup using security find-generic-password, or (c) integrate a runtime secrets
manager call from the service startup; update the sed substitution and any lines
that write to the plist so they place a placeholder or keychain identifier (not
the secret) and ensure the service binary (or wrapper script) uses security
find-generic-password to fetch GROQ_API_KEY at runtime.
- Line 18: When resolving GROQ_API_KEY in install.sh (the
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields
credential 2>/dev/null || echo "")}" fallback), detect when the final value is
empty and emit a clear warning message to the installer indicating that the
enrich plist will be installed without a GROQ_API_KEY and the enrichment daemon
will fail (reference the GROQ_API_KEY variable and the enrich plist install
path). Update install.sh to check the resolved GROQ_API_KEY after assignment and
print a visible warning (e.g., to stderr) with actionable guidance (set
GROQ_API_KEY env var or provide it via 1Password) before proceeding with
installing the enrich plist.
- Line 18: The GROQ_API_KEY retrieval assumes a 1Password item named
GROQ_API_KEY of type "API Credential" with a field named "credential" which can
fail silently for other item types (e.g., "Login" uses "password"); update the
GROQ_API_KEY assignment (the line using GROQ_API_KEY="${GROQ_API_KEY:-$(op item
get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}") to
include a comment documenting the required item type and field, and add clearer
fallback/error guidance (log or echo a hint) when the op command returns empty;
alternatively replace the op item get invocation with op read usage and document
the expected secret reference syntax so users can choose the correct item
type/field.
In `@scripts/wal_checkpoint.py`:
- Around line 19-28: Refactor get_db_path to reuse the path-resolution from
brainlayer.paths instead of duplicating _CANONICAL_DB logic: import the
appropriate resolver (e.g., a function or constant such as get_db_path or
CANONICAL_DB) from brainlayer.paths and call it here, then apply the
checkpoint-specific existence check (keep the current env existence requirement)
— i.e., use the upstream resolver to obtain the candidate path and only return
it from this script if os.path.exists(candidate_path); reference the existing
symbols get_db_path and _CANONICAL_DB in this file and the resolver in
brainlayer.paths so the behavior stays consistent while preserving the script’s
extra existence check.
In `@src/brainlayer/paths.py`:
- Around line 14-24: get_db_path currently skips checking the old legacy DB
location and returns a fresh canonical DB (and also returns BRAINLAYER_DB
without ensuring its parent exists), causing installs with only the legacy DB to
get a new empty DB and custom paths to fail; update get_db_path to first check
for the legacy path (e.g., the old "~/.local/share/zikaron/zikaron.db") and if
it exists return that Path as an upgrade guard before falling back to the
canonical path, and when returning the env-provided path (env from
os.environ.get("BRAINLAYER_DB")) ensure its parent directory is created
(mkdir(parents=True, exist_ok=True)) before returning; reference get_db_path,
_CANONICAL_DB_PATH, and the env variable handling to locate where to add these
checks and the mkdir call.
---
Outside diff comments:
In `@scripts/backfill-context.py`:
- Around line 21-99: In backfill(), before performing the bulk UPDATEs (the
conversation_id UPDATE block and the per-source_file position updates that use
cursor.execute(... "UPDATE chunks ...")), execute a WAL checkpoint via the DB
connection (e.g., call store.conn.execute("PRAGMA wal_checkpoint(FULL)") or
cursor.execute(...)) to checkpoint WAL, and then again after finishing all bulk
updates (before the final verification queries and before store.close()) to
ensure durability and prevent WAL bloat; place these checkpoint calls
immediately before the first UPDATE and immediately after the loop that updates
positions.
In `@scripts/cloud_stream.py`:
- Around line 161-173: The update_enrichment call is duplicated; extract a small
helper (e.g., _apply_enrichment(store, chunk_id, enrichment)) that calls
store.update_enrichment with the ten fields (summary, tags, importance, intent,
primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact,
external_deps) pulled from the enrichment dict, then replace both inlined blocks
with a single call to _apply_enrichment(store, chunk_id, enrichment). Ensure the
helper is defined near related functions in scripts/cloud_stream.py and use the
same parameter names (store, chunk_id, enrichment) so retry-path and original
path both call it.
- Line 231: Add explicit WAL checkpointing around the bulk update work to
prevent unbounded WAL growth: before starting the large update batch call the
SQLite pragma "PRAGMA wal_checkpoint(FULL)" on the same DB connection used by
VectorStore (reference the instantiated store = VectorStore(db_path) and use
store.conn.execute(...) or store.execute(...) depending on the class API),
periodically invoke the same pragma inside the batch loop (e.g., every N
batches) to flush the WAL, and call it once more after finishing the bulk
updates and before closing the DB; wrap each checkpoint call in a try/except and
log failures with the existing logger so checkpoint errors don't crash the run.
In `@src/brainlayer/paths.py`:
- Around line 27-28: DEFAULT_DB_PATH is being set by calling get_db_path() at
import time which freezes environment-derived overrides; change DEFAULT_DB_PATH
to a non-env-sensitive literal (e.g., "brainlayer.db" or other static default)
instead of calling get_db_path() and ensure any code that needs the env-resolved
path calls get_db_path() at runtime (references: DEFAULT_DB_PATH,
get_db_path()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ad74adf4-7b57-4997-a8f2-60b14162da48
📒 Files selected for processing (27)
CLAUDE.mddocs/configuration.mddocs/data-locations.mdhooks/brainlayer-prompt-search.pyhooks/session-cleanup.pyscripts/backfill-context.pyscripts/backfill-metadata.pyscripts/classify-all.pyscripts/cloud_backfill.pyscripts/cloud_stream.pyscripts/generate_style_card.pyscripts/launchd/com.brainlayer.enrich.plistscripts/launchd/install.shscripts/pre-label.pyscripts/reembed_bge_m3.pyscripts/verify_hebrew_search.pyscripts/wal_checkpoint.pyserver.jsonsrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.pysrc/brainlayer/cli_new.pysrc/brainlayer/clustering.pysrc/brainlayer/migrate.pysrc/brainlayer/paths.pytests/eval_baselines.jsontests/test_paths.pytests/test_phase3_qa.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (6)
scripts/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
scripts/**/*.py: Fix 8 scripts inscripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation
Checkpoint WAL withPRAGMA wal_checkpoint(FULL)before and after bulk database operations
Drop FTS triggers (especiallychunks_fts_delete) before bulk deletes and recreate after — FTS triggers cause massive performance degradation on large datasets
Batch delete operations in 5-10K chunks with checkpoint every 3 batches
Never delete fromchunkstable while FTS trigger is active on large datasets
Files:
scripts/generate_style_card.pyscripts/wal_checkpoint.pyscripts/verify_hebrew_search.pyscripts/backfill-metadata.pyscripts/cloud_stream.pyscripts/reembed_bge_m3.pyscripts/pre-label.pyscripts/backfill-context.pyscripts/classify-all.pyscripts/cloud_backfill.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use Python package structure with Typer CLI located in
src/brainlayer/
Files:
src/brainlayer/migrate.pysrc/brainlayer/paths.pysrc/brainlayer/cli_new.pysrc/brainlayer/clustering.pysrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Lint and format Python code using
ruff check src/andruff format src/
Files:
src/brainlayer/migrate.pysrc/brainlayer/paths.pysrc/brainlayer/cli_new.pysrc/brainlayer/clustering.pysrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.py
**/classify*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Preserve verbatim content for
ai_code,stack_trace, anduser_messageduring classification
Files:
scripts/classify-all.py
**/*test*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Run tests with
pytest
Files:
tests/test_paths.pytests/test_phase3_qa.py
**/paths.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/paths.py: Resolve DB path usingpaths.py:get_db_path()— prefers zikaron path (~/.local/share/zikaron/zikaron.db) if it exists, falls through to brainlayer path (~/.local/share/brainlayer/brainlayer.db)
Store database at~/.local/share/brainlayer/brainlayer.db, prompts cache at~/.local/share/brainlayer/prompts/, socket at/tmp/brainlayer.sock, and enrichment lock at/tmp/brainlayer-enrichment.lock
Files:
src/brainlayer/paths.py
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Fix 8 scripts in `scripts/` that hardcode `brainlayer.db` path to use dynamic path resolution during DB consolidation
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Resolve DB path using `paths.py:get_db_path()` — prefers zikaron path (`~/.local/share/zikaron/zikaron.db`) if it exists, falls through to brainlayer path (`~/.local/share/brainlayer/brainlayer.db`)
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Store database at `~/.local/share/brainlayer/brainlayer.db`, prompts cache at `~/.local/share/brainlayer/prompts/`, socket at `/tmp/brainlayer.sock`, and enrichment lock at `/tmp/brainlayer-enrichment.lock`
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Fix 8 scripts in `scripts/` that hardcode `brainlayer.db` path to use dynamic path resolution during DB consolidation
Applied to files:
CLAUDE.mdscripts/wal_checkpoint.pysrc/brainlayer/migrate.pyscripts/verify_hebrew_search.pyscripts/backfill-metadata.pyscripts/cloud_stream.pyscripts/reembed_bge_m3.pyscripts/pre-label.pydocs/configuration.mdscripts/backfill-context.pyhooks/brainlayer-prompt-search.pytests/eval_baselines.jsonscripts/classify-all.pytests/test_paths.pysrc/brainlayer/paths.pyhooks/session-cleanup.pyscripts/cloud_backfill.pydocs/data-locations.mdsrc/brainlayer/cli_new.pytests/test_phase3_qa.pysrc/brainlayer/clustering.pysrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Store database at `~/.local/share/brainlayer/brainlayer.db`, prompts cache at `~/.local/share/brainlayer/prompts/`, socket at `/tmp/brainlayer.sock`, and enrichment lock at `/tmp/brainlayer-enrichment.lock`
Applied to files:
CLAUDE.mdscripts/generate_style_card.pyserver.jsonscripts/wal_checkpoint.pysrc/brainlayer/migrate.pyscripts/launchd/install.shscripts/verify_hebrew_search.pyscripts/backfill-metadata.pyscripts/cloud_stream.pyscripts/reembed_bge_m3.pyscripts/pre-label.pydocs/configuration.mdscripts/backfill-context.pyhooks/brainlayer-prompt-search.pyscripts/classify-all.pytests/test_paths.pysrc/brainlayer/paths.pyhooks/session-cleanup.pyscripts/cloud_backfill.pydocs/data-locations.mdsrc/brainlayer/cli_new.pytests/test_phase3_qa.pysrc/brainlayer/clustering.pysrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/paths.py : Resolve DB path using `paths.py:get_db_path()` — prefers zikaron path (`~/.local/share/zikaron/zikaron.db`) if it exists, falls through to brainlayer path (`~/.local/share/brainlayer/brainlayer.db`)
Applied to files:
CLAUDE.mdscripts/generate_style_card.pyscripts/wal_checkpoint.pysrc/brainlayer/migrate.pyscripts/verify_hebrew_search.pyscripts/backfill-metadata.pyscripts/cloud_stream.pyscripts/reembed_bge_m3.pyscripts/pre-label.pydocs/configuration.mdscripts/backfill-context.pyhooks/brainlayer-prompt-search.pyscripts/classify-all.pytests/test_paths.pysrc/brainlayer/paths.pyhooks/session-cleanup.pyscripts/cloud_backfill.pydocs/data-locations.mdsrc/brainlayer/cli_new.pytests/test_phase3_qa.pysrc/brainlayer/clustering.pysrc/brainlayer/cli/__init__.pysrc/brainlayer/cli/wizard.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Extract, classify, chunk, embed, and index session data following the BrainLayer pipeline
Applied to files:
CLAUDE.mddocs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to scripts/**/*.py : Checkpoint WAL with `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations
Applied to files:
scripts/wal_checkpoint.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/enrichment*.py : Use MLX (`Qwen2.5-Coder-14B-Instruct-4bit`) as primary enrichment backend on Apple Silicon (port 8080) with Ollama (`glm-4.7-flash`) fallback on port 11434
Applied to files:
scripts/launchd/com.brainlayer.enrich.plist
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to golems/scripts/enrichment-lazy.sh : Use enrichment worker script `golems/scripts/enrichment-lazy.sh` with launchd scheduling at nice=20 priority, processing in batches of 50
Applied to files:
scripts/launchd/com.brainlayer.enrich.plist
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/enrichment*.py : Auto-switch enrichment backend from MLX to Ollama after 3 consecutive MLX failures; allow override with `BRAINLAYER_ENRICH_BACKEND` environment variable
Applied to files:
scripts/launchd/com.brainlayer.enrich.plistscripts/cloud_stream.pyscripts/cloud_backfill.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/mcp/**/*.py : Implement MCP tools: `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person` with legacy `brainlayer_*` aliases
Applied to files:
hooks/brainlayer-prompt-search.pydocs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Source data comes from JSONL format in `~/.claude/projects/`
Applied to files:
docs/data-locations.md
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/export*.py : Export knowledge base as brain graph JSON via `brainlayer brain-export` for dashboard visualization
Applied to files:
docs/data-locations.mdsrc/brainlayer/cli/__init__.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/export*.py : Export knowledge base to Obsidian Markdown vault with backlinks and tags via `brainlayer export-obsidian`
Applied to files:
docs/data-locations.mdsrc/brainlayer/cli/__init__.py
📚 Learning: 2026-03-02T21:14:04.731Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T21:14:04.731Z
Learning: Applies to **/vector_store.py : Use APSW with sqlite-vec for vector storage (in `vector_store.py`)
Applied to files:
src/brainlayer/cli/__init__.py
🪛 markdownlint-cli2 (0.21.0)
CLAUDE.md
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (18)
server.json (1)
24-28: LGTM!The environment variable name change from
BRAINLAYER_DB_PATHtoBRAINLAYER_DBcorrectly aligns with the canonical path resolution insrc/brainlayer/paths.pyand other scripts.src/brainlayer/cli_new.py (2)
9-9: LGTM!Correctly imports
get_db_pathfrom the centralized paths module.
115-122: LGTM!Migration command now uses the centralized path resolver and properly checks existence before prompting for overwrite.
docs/data-locations.md (2)
9-9: LGTM!Database stats updated to reflect the consolidated state (297K+ chunks at ~8GB).
17-18: LGTM!Path resolution documentation correctly reflects the simplified two-step resolution (env var → canonical path), matching
src/brainlayer/paths.py.scripts/backfill-context.py (1)
15-18: LGTM!Correctly imports and uses the centralized
get_db_path()function, aligning with the DB consolidation goals.src/brainlayer/migrate.py (1)
20-27: LGTM!Migration script correctly imports and uses the centralized
get_db_path()function, replacing the previous hardcoded path.tests/eval_baselines.json (1)
4-4: LGTM!Test baseline correctly updated to use the canonical
brainlayer.dbpath, consistent with the DB consolidation.scripts/classify-all.py (1)
17-20: LGTM. This switches the script to the shared DB resolver instead of another hard-coded path.Based on learnings: Fix 8 scripts in
scripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation.scripts/pre-label.py (1)
17-20: LGTM. This is the right consolidation point for path lookup in this script.Based on learnings: Fix 8 scripts in
scripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation.scripts/reembed_bge_m3.py (1)
21-37: LGTM. Using the shared resolver forDEFAULT_DBkeeps this maintenance script aligned with the canonical DB path.Based on learnings: Fix 8 scripts in
scripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation.docs/configuration.md (1)
34-37: LGTM. This matches the consolidated path resolution and removes ambiguity about the default DB location.Based on learnings: Store database at
~/.local/share/brainlayer/brainlayer.db, prompts cache at~/.local/share/brainlayer/prompts/, socket at/tmp/brainlayer.sock, and enrichment lock at/tmp/brainlayer-enrichment.lock.scripts/verify_hebrew_search.py (1)
12-21: LGTM. This removes another standalone DB-path assumption and keeps the verification script on the same resolver as the app.Based on learnings: Fix 8 scripts in
scripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation.tests/test_phase3_qa.py (1)
35-41: LGTM. This test now exercises the fresh-install canonical path behavior the rest of the PR is standardizing on.tests/test_paths.py (2)
20-30: LGTM — Test correctly reflects canonical-only path resolution.The updated docstring and test logic align with the PR's goal of removing legacy dual-path behavior. The test properly verifies that:
- The canonical path is returned when no env override exists
- The parent directory is created if missing
32-37: Integration test looks good.The local import of
DEFAULT_DB_PATHinside the test function is appropriate for an integration test that depends on production state.scripts/cloud_stream.py (2)
41-46: Path resolution correctly updated to use dynamicget_db_path().This change properly implements the PR objective of consolidating DB paths by importing and using the centralized path resolver instead of hardcoding the path. Based on learnings: "Fix 8 scripts in
scripts/that hardcodebrainlayer.dbpath to use dynamic path resolution during DB consolidation."
368-408: CLI implementation looks correct.The
--dbargument correctly falls back toDEFAULT_DB_PATH(now usingget_db_path()), and the file selection logic is reasonable.
| _CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db") | ||
|
|
||
|
|
||
| def get_db_path(): | ||
| env = os.environ.get("BRAINLAYER_DB") | ||
| if env and os.path.exists(env): | ||
| return env | ||
| for p in DB_PATHS: | ||
| if os.path.exists(p): | ||
| return p | ||
| if os.path.exists(_CANONICAL_DB): | ||
| return _CANONICAL_DB | ||
| return None |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Match the shared DB resolver semantics here. This hook now has its own get_db_path() again, and it already diverges from src/brainlayer/paths.py: it only honors BRAINLAYER_DB if the file exists and otherwise returns None instead of the canonical path. That reintroduces resolver drift into the consolidation PR.
♻️ Minimal fix
def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
- if env and os.path.exists(env):
+ if env:
return env
- if os.path.exists(_CANONICAL_DB):
- return _CANONICAL_DB
- return None
+ os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True)
+ return _CANONICAL_DB📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db") | |
| def get_db_path(): | |
| env = os.environ.get("BRAINLAYER_DB") | |
| if env and os.path.exists(env): | |
| return env | |
| for p in DB_PATHS: | |
| if os.path.exists(p): | |
| return p | |
| if os.path.exists(_CANONICAL_DB): | |
| return _CANONICAL_DB | |
| return None | |
| _CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db") | |
| def get_db_path(): | |
| env = os.environ.get("BRAINLAYER_DB") | |
| if env: | |
| return env | |
| os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True) | |
| return _CANONICAL_DB |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hooks/brainlayer-prompt-search.py` around lines 71 - 80, get_db_path
currently only returns BRAINLAYER_DB when the file exists and returns None
otherwise, diverging from the shared resolver; update get_db_path to mirror the
shared semantics by (1) returning the BRAINLAYER_DB env value if set (do not
require os.path.exists on it), (2) otherwise returning _CANONICAL_DB if it
exists, and (3) if neither branch matches return _CANONICAL_DB as the final
fallback (i.e., don't return None). Change the logic in get_db_path and
reference _CANONICAL_DB and the BRAINLAYER_DB env lookup accordingly.
| _CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db") | ||
|
|
||
|
|
||
| def get_db_path(): | ||
| env = os.environ.get("BRAINLAYER_DB") | ||
| if env and os.path.exists(env): | ||
| return env | ||
| for p in DB_PATHS: | ||
| if os.path.exists(p): | ||
| return p | ||
| if os.path.exists(_CANONICAL_DB): | ||
| return _CANONICAL_DB | ||
| return None |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Reuse the consolidated resolver behavior here too. The local helper no longer matches brainlayer.paths.get_db_path(), so BRAINLAYER_DB overrides and canonical-path handling now have to be maintained in two places again.
♻️ Minimal fix
def get_db_path():
env = os.environ.get("BRAINLAYER_DB")
- if env and os.path.exists(env):
+ if env:
return env
- if os.path.exists(_CANONICAL_DB):
- return _CANONICAL_DB
- return None
+ os.makedirs(os.path.dirname(_CANONICAL_DB), exist_ok=True)
+ return _CANONICAL_DB🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hooks/session-cleanup.py` around lines 23 - 32, The get_db_path function
duplicates resolver logic; replace the local implementation by reusing the
consolidated resolver from brainlayer.paths instead of maintaining duplicate
behavior. Import the canonical resolver (e.g., from brainlayer.paths import
get_db_path) and call that function from this module (remove or rename
_CANONICAL_DB and the duplicated logic) so BRAINLAYER_DB and canonical-path
handling are resolved consistently via brainlayer.paths.get_db_path().
| <key>BRAINLAYER_ENRICH_BACKEND</key> | ||
| <string>groq</string> | ||
| <key>GROQ_API_KEY</key> | ||
| <string>__GROQ_API_KEY__</string> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if install.sh validates GROQ_API_KEY before sed replacement
rg -n 'GROQ_API_KEY' scripts/launchd/install.sh -A3 -B3Repository: EtanHey/brainlayer
Length of output: 804
🏁 Script executed:
#!/bin/bash
# Check full install.sh for any validation logic around GROQ_API_KEY
cat -n scripts/launchd/install.sh | head -60Repository: EtanHey/brainlayer
Length of output: 2652
Unreplaced placeholder and missing API key validation cause silent auth failures.
If install.sh is not run, the literal __GROQ_API_KEY__ placeholder remains in the installed plist. This passes the bool(GROQ_API_KEY) check in check_backend_health() (since it's a non-empty string) but fails silently with authentication errors when calling the Groq API.
Additionally, if install.sh runs but the 1Password lookup fails (line 18), GROQ_API_KEY is set to an empty string, which then gets substituted into the plist—again causing silent auth failures without warning the user.
Add validation in install.sh to abort if GROQ_API_KEY is empty after the lookup attempt, or warn the user clearly that the Groq API key is missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/launchd/com.brainlayer.enrich.plist` around lines 34 - 37, The plist
contains an unreplaced GROQ_API_KEY placeholder which can pass the existing
check_backend_health() bool(GROQ_API_KEY) test but cause silent auth failures;
update install.sh to validate the GROQ_API_KEY after the 1Password lookup (the
code that sets GROQ_API_KEY) and abort with a clear error (or at minimum print a
conspicuous warning and exit non‑zero) if GROQ_API_KEY is empty or still equals
the placeholder "__GROQ_API_KEY__"; ensure the installer does not write an empty
or placeholder value into the plist and surface a user-facing message explaining
that the Groq API key is required.
| BRAINLAYER_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)" | ||
| BRAINLAYER_BIN="${BRAINLAYER_BIN:-$(which brainlayer 2>/dev/null || echo "$HOME/.local/bin/brainlayer")}" | ||
| PYTHON3="${PYTHON3:-$(which python3 2>/dev/null || echo "/usr/bin/python3")}" | ||
| GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
API key will be stored in plain text in the plist file.
The sed substitution embeds $GROQ_API_KEY directly into ~/Library/LaunchAgents/com.brainlayer.enrich.plist. While this is standard practice for launchd environment variables, be aware:
- The plist is readable by any process running as your user
- The key persists on disk in plain text
If this is a concern, alternatives include using macOS Keychain with security find-generic-password at daemon startup, or a secrets manager integration at runtime rather than install time.
Also applies to: 45-45
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/launchd/install.sh` at line 18, The install script currently embeds
GROQ_API_KEY directly into the com.brainlayer.enrich.plist via the GROQ_API_KEY
substitution, which stores the secret in plain text; instead, stop writing the
raw GROQ_API_KEY into the plist in install.sh and change the install flow to
either (a) store the secret in the macOS Keychain and write only a key
identifier to the plist, or (b) write a wrapper/launcher (referenced in
ProgramArguments) that retrieves the secret at startup using security
find-generic-password, or (c) integrate a runtime secrets manager call from the
service startup; update the sed substitution and any lines that write to the
plist so they place a placeholder or keychain identifier (not the secret) and
ensure the service binary (or wrapper script) uses security
find-generic-password to fetch GROQ_API_KEY at runtime.
🧹 Nitpick | 🔵 Trivial
Consider warning when GROQ_API_KEY is empty for the enrich plist.
The fallback chain (env var → 1Password CLI → empty string) silently succeeds even when the key is missing. When installing the enrich plist with an empty key, the enrichment daemon will fail at runtime with "Groq error: GROQ_API_KEY not set" (per enrichment.py:537-540). Users may not realize the root cause.
Consider adding a warning when installing the enrich plist specifically:
💡 Suggested improvement
install_plist() {
local name="$1"
local src="$SCRIPT_DIR/com.brainlayer.${name}.plist"
local dst="$LAUNCH_DIR/com.brainlayer.${name}.plist"
if [ ! -f "$src" ]; then
echo "ERROR: $src not found"
return 1
fi
+ # Warn if installing enrich without GROQ_API_KEY
+ if [ "$name" = "enrich" ] && [ -z "$GROQ_API_KEY" ]; then
+ echo "WARNING: GROQ_API_KEY is not set. Enrichment will fail at runtime."
+ echo " Set GROQ_API_KEY env var or configure 1Password CLI with item 'GROQ_API_KEY'"
+ fi
+
# Replace placeholders
sed \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/launchd/install.sh` at line 18, When resolving GROQ_API_KEY in
install.sh (the GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY
--reveal --fields credential 2>/dev/null || echo "")}" fallback), detect when
the final value is empty and emit a clear warning message to the installer
indicating that the enrich plist will be installed without a GROQ_API_KEY and
the enrichment daemon will fail (reference the GROQ_API_KEY variable and the
enrich plist install path). Update install.sh to check the resolved GROQ_API_KEY
after assignment and print a visible warning (e.g., to stderr) with actionable
guidance (set GROQ_API_KEY env var or provide it via 1Password) before
proceeding with installing the enrich plist.
🧩 Analysis chain
🌐 Web query:
1Password CLI op item get fields syntax
💡 Result:
In 1Password CLI 2, op item get supports selecting specific item fields with --fields as a comma-separated list, where each entry is a selector like label=<field label> (use quotes if the label has spaces). Example:
op item get "work email" --fields label=username,label=password[1]
Example with labels that contain spaces:
op item get "<item>" --format json --fields "label=access key","label=secret access key"[2]
Note: CLI 2 also added the ability to “retrieve fields by field type” when getting items, but the exact selector form depends on your CLI version—check op item get --help for your installed build. [3]
🌐 Web query:
1Password CLI 1.x vs 2.x "op item get" --fields syntax differences
💡 Result:
1Password CLI 1.x (op get item) --fields
- Command shape:
op get item <item> --fields <field1>,<field2>,... - Field names: you pass field labels/names directly (comma-separated), e.g.
password,token. Nolabel=prefix. Example from 1Password’s CLI 1.0 announcement:op get item GitLab --fields password,token. [1] - Output behavior: when one field is requested, CLI 1 returns it as a simple string; for multiple fields it returns a structured output (JSON or CSV depending on
--format). [2]
1Password CLI 2.x (op item get) --fields
- Command shape:
op item get <item> --fields label=<field1>,label=<field2>,... - Field names: you must use a selector prefix; for labels it’s
label=before each field, still comma-separated. The CLI 2 reference explicitly documents thislabel=requirement. [3] - Note: CLI 2 docs also recommend using
op readwhen you want “the contents of a specific field” (instead of relying onop item get). [4]
Quick “same request” example (username + password)
- CLI 1.x:
op get item "work email" --fields username,password[1] - CLI 2.x:
op item get "work email" --fields label=username,label=password[3]
Sources:
[1] 1Password blog (CLI 1.0): example op get item ... --fields password,token https://blog.1password.com/cli-1-0-release/
[2] 1Password CLI 1 release notes (v1.0.0): single-field string vs multi-field JSON/CSV https://app-updates.agilebits.com/product_history/CLI
[3] 1Password Developer “CLI reference”: --fields requires label= prefixes in CLI 2 https://developer.1password.com/docs/cli/reference/
[4] 1Password Developer “op item get” reference: “To retrieve the contents of a specific field, use op read instead.” https://developer.1password.com/docs/cli/reference/management-commands/item/
🌐 Web query:
1Password CLI "op item get" credential field API
💡 Result:
To retrieve the credential value from an API Credential item in 1Password:
-
Recommended (single field): use
op readwith a secret reference (this is what 1Password suggests for “contents of a specific field”).op read "op://<vault>/<item>/credential"
op item getis for item details; for a specific field’s contents, useop read. [1][2] -
Alternative: use
op item get --fields(and reveal concealed fields)op item get "<item>" --vault "<vault>" --fields credential --reveal
(API Credential items commonly have fields like
username,credential, andhostname.) [3]
Sources
[1] https://developer.1password.com/docs/cli/reference/management-commands/item/
[2] https://developer.1password.com/docs/cli/secret-reference-syntax/
[3] https://support.1password.com/item-categories/
🏁 Script executed:
git ls-files | head -20Repository: EtanHey/brainlayer
Length of output: 606
🏁 Script executed:
# Check if there's any documentation about 1Password CLI version requirements
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "1password\|op " 2>/dev/null | head -10Repository: EtanHey/brainlayer
Length of output: 422
🏁 Script executed:
# Look for other uses of 'op' command in scripts to determine CLI version pattern
rg "op (item|get)" --type sh -A 2 -B 2Repository: EtanHey/brainlayer
Length of output: 529
🏁 Script executed:
# Check if there's any documentation about the GROQ_API_KEY setup
cat scripts/launchd/install.shRepository: EtanHey/brainlayer
Length of output: 2984
🏁 Script executed:
# Look for any README or documentation mentioning 1Password setup
rg -i "groq|1password|op " --type mdRepository: EtanHey/brainlayer
Length of output: 28486
Document the 1Password item type requirement and add error guidance.
The command assumes a 1Password item named GROQ_API_KEY of type API Credential with a field named credential. This will fail silently if the item is a different type (e.g., Login items use password instead). Add a comment documenting the expected item type, or improve error handling to guide users on the required setup:
# Item must be type "API Credential" with field "credential"
# If using "Login" type, use field "password" instead
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields credential 2>/dev/null || echo "")}"Alternatively, use op read which provides better secret reference syntax.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/launchd/install.sh` at line 18, The GROQ_API_KEY retrieval assumes a
1Password item named GROQ_API_KEY of type "API Credential" with a field named
"credential" which can fail silently for other item types (e.g., "Login" uses
"password"); update the GROQ_API_KEY assignment (the line using
GROQ_API_KEY="${GROQ_API_KEY:-$(op item get GROQ_API_KEY --reveal --fields
credential 2>/dev/null || echo "")}") to include a comment documenting the
required item type and field, and add clearer fallback/error guidance (log or
echo a hint) when the op command returns empty; alternatively replace the op
item get invocation with op read usage and document the expected secret
reference syntax so users can choose the correct item type/field.
| _CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db") | ||
|
|
||
|
|
||
| def get_db_path(): | ||
| env = os.environ.get("BRAINLAYER_DB") | ||
| if env and os.path.exists(env): | ||
| return env | ||
| for p in DB_PATHS: | ||
| if os.path.exists(p): | ||
| return p | ||
| if os.path.exists(_CANONICAL_DB): | ||
| return _CANONICAL_DB | ||
| return None |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider importing from brainlayer.paths to avoid duplication.
This script duplicates the path resolution logic from src/brainlayer/paths.py with a subtle behavioral difference: it requires the env-specified path to exist (os.path.exists(env)), while paths.py trusts the env var unconditionally.
For a checkpoint script, requiring existence makes sense, but the duplication could lead to drift. Consider importing and extending:
♻️ Suggested refactor
-_CANONICAL_DB = os.path.expanduser("~/.local/share/brainlayer/brainlayer.db")
-
-
-def get_db_path():
- env = os.environ.get("BRAINLAYER_DB")
- if env and os.path.exists(env):
- return env
- if os.path.exists(_CANONICAL_DB):
- return _CANONICAL_DB
- return None
+import sys
+from pathlib import Path
+
+# Add src to path for standalone script usage
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+
+from brainlayer.paths import get_db_path as _get_db_path
+
+
+def get_db_path():
+ """Get DB path, but only if it exists (checkpoint needs existing DB)."""
+ path = _get_db_path()
+ return str(path) if path.exists() else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/wal_checkpoint.py` around lines 19 - 28, Refactor get_db_path to
reuse the path-resolution from brainlayer.paths instead of duplicating
_CANONICAL_DB logic: import the appropriate resolver (e.g., a function or
constant such as get_db_path or CANONICAL_DB) from brainlayer.paths and call it
here, then apply the checkpoint-specific existence check (keep the current env
existence requirement) — i.e., use the upstream resolver to obtain the candidate
path and only return it from this script if os.path.exists(candidate_path);
reference the existing symbols get_db_path and _CANONICAL_DB in this file and
the resolver in brainlayer.paths so the behavior stays consistent while
preserving the script’s extra existence check.
| def get_db_path() -> Path: | ||
| """Resolve the BrainLayer database path. | ||
|
|
||
| Checks BRAINLAYER_DB env var first, then falls back to whichever | ||
| known path exists (preferring the legacy zikaron path if both exist, | ||
| since that's where the real data is). | ||
| Checks BRAINLAYER_DB env var first, then uses the canonical path. | ||
| """ | ||
| env = os.environ.get("BRAINLAYER_DB") | ||
| if env: | ||
| return Path(env) | ||
|
|
||
| if _LEGACY_DB_PATH.exists(): | ||
| return _LEGACY_DB_PATH | ||
|
|
||
| # Ensure parent dir exists for fresh installs | ||
| _CANONICAL_DB_PATH.parent.mkdir(parents=True, exist_ok=True) | ||
| return _CANONICAL_DB_PATH |
There was a problem hiding this comment.
Add an upgrade guard before falling through to a fresh canonical DB.
Removing the legacy-path check here means an existing install that still only has ~/.local/share/zikaron/zikaron.db will now boot against a brand-new empty database. The same branch also returns BRAINLAYER_DB without creating its parent directory, so a first-time custom path can fail on open.
Proposed fix
def get_db_path() -> Path:
"""Resolve the BrainLayer database path.
Checks BRAINLAYER_DB env var first, then uses the canonical path.
"""
env = os.environ.get("BRAINLAYER_DB")
if env:
- return Path(env)
+ path = Path(env).expanduser()
+ path.parent.mkdir(parents=True, exist_ok=True)
+ return path
+
+ legacy_db = Path.home() / ".local" / "share" / "zikaron" / "zikaron.db"
+ if legacy_db.exists() and not _CANONICAL_DB_PATH.exists():
+ raise RuntimeError(
+ f"Legacy database detected at {legacy_db}. Move it to {_CANONICAL_DB_PATH} before running BrainLayer."
+ )
_CANONICAL_DB_PATH.parent.mkdir(parents=True, exist_ok=True)
return _CANONICAL_DB_PATH🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/paths.py` around lines 14 - 24, get_db_path currently skips
checking the old legacy DB location and returns a fresh canonical DB (and also
returns BRAINLAYER_DB without ensuring its parent exists), causing installs with
only the legacy DB to get a new empty DB and custom paths to fail; update
get_db_path to first check for the legacy path (e.g., the old
"~/.local/share/zikaron/zikaron.db") and if it exists return that Path as an
upgrade guard before falling back to the canonical path, and when returning the
env-provided path (env from os.environ.get("BRAINLAYER_DB")) ensure its parent
directory is created (mkdir(parents=True, exist_ok=True)) before returning;
reference get_db_path, _CANONICAL_DB_PATH, and the env variable handling to
locate where to add these checks and the mkdir call.
All 27 files now use a single canonical path (~/.local/share/brainlayer/brainlayer.db) resolved through paths.py:get_db_path(). Removed legacy _LEGACY_DB_PATH and dual-path fallback logic from paths.py, hooks, scripts, CLI, and docs. Changes: - paths.py: single canonical path, env var override only - CLI (7 commands): import get_db_path() instead of hardcoding - Scripts (8 files): import from brainlayer.paths - Hooks (2 files): canonical path only, no dual-path fallback - server.json: fix env var name BRAINLAYER_DB_PATH → BRAINLAYER_DB - Launchd enrich plist: add BRAINLAYER_ENRICH_BACKEND=groq + GROQ_API_KEY placeholder - install.sh: resolve GROQ_API_KEY from env/1Password at install time - Tests: remove legacy path test cases, update eval baselines db path - Docs: update data-locations.md, configuration.md, CLAUDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move `from ..paths import get_db_path` to first-party import section in cli/__init__.py and cli/wizard.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f5d446a to
fba575c
Compare
Summary
paths.py:get_db_path()with single canonical path~/.local/share/brainlayer/brainlayer.dbBRAINLAYER_DB_PATH→BRAINLAYER_DBto match whatpaths.pyactually readsBRAINLAYER_ENRICH_BACKEND=groqandGROQ_API_KEYplaceholder with 1Password auto-resolution ininstall.shWhat changed
paths.py_LEGACY_DB_PATH, simplified to env var or canonicalcli/__init__.py(7 commands)Path.home()/...→get_db_path()from brainlayer.paths import get_db_pathDB_PATHSlist with single canonicalserver.jsonPre-merge steps (already done)
zikaron-backup-2026-03-12.db)zikaron.db→brainlayer.db(verified: 297,296 chunks)Test plan
pytest tests/test_paths.py tests/test_phase3_qa.py— 30/30 passpytest tests/(broad) — 779/779 passbrainlayer stats— shows 297,296 chunks from canonical pathbrainlayer search— works correctly./scripts/launchd/install.shto install agentsbrainlayer enrich --statsworks with Groq🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests