Skip to content

fix: restore hook keyword extraction and safer readonly access#198

Merged
EtanHey merged 2 commits into
mainfrom
fix/hook-keyword-extraction
Apr 4, 2026
Merged

fix: restore hook keyword extraction and safer readonly access#198
EtanHey merged 2 commits into
mainfrom
fix/hook-keyword-extraction

Conversation

@EtanHey
Copy link
Copy Markdown
Owner

@EtanHey EtanHey commented Apr 4, 2026

Summary

  • restore prompt hook keyword extraction for short/high-signal terms like T3, AI, PR, and keep code searchable
  • widen automatic entity injection to project/technology/tool, add injection event writes, and add the search-before-assume reminder
  • remove readonly WAL pragmas from both hooks, add Hebrew style guidance for session start, and cover the new behavior with hook tests

Test plan

  • python3 -m pytest tests/test_conditional_hooks.py -v
  • manual keyword extraction snippet from the spec
  • python3 -m pytest (blocked by existing tests/test_engine.py::TestSessionsIntegration::test_sessions_returns_list -> apsw.BusyError: database is locked after 455 passing tests)

Note

Fix keyword extraction and safer readonly SQLite access in prompt-search and session-start hooks

  • Restores extraction of two-character tokens (e.g. ai, pr) in extract_keywords by changing the length filter from > 2 to >= 2, and removes code from stop words.
  • Adds ASSUME_TRIGGERS detection in brainlayer-prompt-search.py: prompts containing assumption-prone keywords now append a SEARCH-BEFORE-ASSUME reminder line to output.
  • Expands entity injection in detect_entities_in_prompt to include project, technology, and tool types, and adds record_injection_event to persist injection events into a SQLite injection_events table when a session_id is present.
  • Hardens readonly SQLite connections in both hooks: connection timeout reduced to 1s, journal_mode=WAL removed, and PRAGMA busy_timeout=1000 + PRAGMA query_only=true added instead.
  • Adds tests covering short-token retention, new entity types, injection event persistence, assume-warning triggering, and updated PRAGMA execution.

Macroscope summarized 7a90ea1.

Summary by CodeRabbit

  • New Features

    • Added "Search Before Assume" reminder messaging when assumption-prone language is detected
    • Extended automatic entity context injection to recognize projects, technologies, and tools
    • Added conditional Hebrew language guidance for specific workflows
    • Improved keyword extraction to capture shorter meaningful terms in searches
  • Tests

    • Added comprehensive test coverage for new detection and injection features

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Apr 4, 2026

@codex review

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Apr 4, 2026

@cursor @BugBot review

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Apr 4, 2026

@coderabbitai review

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 4, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 4, 2026

📝 Walkthrough

Walkthrough

Two hook scripts were enhanced with improved injection handling and database connectivity: brainlayer-prompt-search.py gained assumption-trigger detection, expanded entity-context injection, relaxed keyword filtering, and event recording to SQLite; brainlayer-session-start.py received conditional Hebrew-style context injection and adjusted readonly database connection settings. Comprehensive tests were added to verify these new features.

Changes

Cohort / File(s) Summary
Hook Script Enhancements
hooks/brainlayer-prompt-search.py, hooks/brainlayer-session-start.py
Modified readonly SQLite connection setup (reduced timeout from 2→1 second, added PRAGMA busy_timeout=1000, removed unconditional journal_mode=WAL). Added assumption-trigger detection (ASSUME_TRIGGERS) with "SEARCH-BEFORE-ASSUME" reminder appending. Relaxed keyword extraction filtering (len(w) >= 2 instead of > 2). Expanded entity-context injection to include project, technology, and tool types. Introduced record_injection_event() function for persisting injection metadata to SQLite.
Test Coverage
tests/test_conditional_hooks.py
Added SQLite test infrastructure with FakeCursor and FakeConn classes. Implemented four new tests: keyword extraction with short tokens, entity detection for project/tool types, injection event persistence, and assumption-trigger warning output validation. All tests verify SQLite pragma execution and correct message formatting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #103: Modifies hooks/brainlayer-prompt-search.py main control flow and injection bookkeeping with ASSUME_TRIGGERS and SQLite-backed event recording, overlapping with the prompt-search enhancements in this PR.
  • PR #114: Modifies the same hook scripts (hooks/brainlayer-prompt-search.py and hooks/brainlayer-session-start.py) with main control flow and test updates around hook activation behavior.

Poem

🐰 Whiskers twitch at triggers found,
"Search first, assume less!" resound,*
Hebrew wisdom gently flows,
Shorter keywords—onward goes,
Events logged with SQLite sound!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: restore hook keyword extraction and safer readonly access' directly reflects the main changes: keyword extraction threshold changes and readonly database connection modifications across both hook files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/hook-keyword-extraction

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread hooks/brainlayer-prompt-search.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/brainlayer-prompt-search.py`:
- Line 169: INJECT_TYPES currently includes "technology" which doesn't match the
entity contract; update the INJECT_TYPES set in
hooks/brainlayer-prompt-search.py (the INJECT_TYPES symbol) to mirror the
contract's allowed types by replacing "technology" with "concept" (or add
"concept" and remove "technology") so the branch will actually match entities
defined in contracts/entity-types.yaml.
- Around line 317-321: The connection sets PRAGMA busy_timeout=1000 which
exceeds the search hook DEADLINE_MS (450) and can block past the hook deadline;
change the busy_timeout to be safely below DEADLINE_MS (e.g., compute
busy_timeout = max(50, DEADLINE_MS - X) or a fixed value <= 400), update the
conn.execute("PRAGMA busy_timeout=...") call in the same block where conn and
db_path are used, and add an explicit fast-fail/log (using the existing logger)
if the computed busy_timeout would still exceed DEADLINE_MS so the hook fails
fast instead of silently missing its latency budget; keep PRAGMA query_only=true
as-is.
- Around line 403-409: The assumption-warning currently uses substring checks
(assume_detected = any(trigger in prompt_lower for trigger in ASSUME_TRIGGERS))
which causes false positives; change the detection to use whole-word matching
against prompt_lower (e.g., regex with word-boundaries or tokenized word set) so
triggers like "ram", "age", "tax" don't match inside other words—update the
logic that sets assume_detected (referencing ASSUME_TRIGGERS, prompt_lower, and
the assume_detected variable) to perform r'\b'+re.escape(trigger)+r'\b' searches
(or build a combined word-boundary regex) before calling
lines.append(...brain_search() reminder...), ensuring case-insensitive matching.
- Around line 241-259: record_injection_event() currently does a synchronous
sqlite3.connect with a 2s busy timeout and swallows sqlite3.Error, which blocks
UserPromptSubmit and can exceed the <500ms hook target; change it to perform a
non-blocking, best-effort write off the critical path (e.g., enqueue the event
to a background worker/thread or use asyncio.create_task) and reduce/remove the
connection busy timeout so it never stalls the hook; also stop silencing
errors—log exceptions from sqlite3 (including error details) so failures are
visible, and ensure you include or respect the timestamp column behavior
consistent with the production schema (either provide timestamp or rely on DB
DEFAULT) when preparing the INSERT in record_injection_event.

In `@hooks/brainlayer-session-start.py`:
- Around line 225-229: The DB open uses sqlite3.connect(..., timeout=1) and
PRAGMA busy_timeout=1000 which can block ~1s and exceed DEADLINE_MS (450);
change the connect timeout and PRAGMA busy_timeout to values well below
DEADLINE_MS (e.g., total wait <<450ms) so session-start cannot block past its
SLA, e.g., set timeout and busy_timeout to a small fraction of DEADLINE_MS
(and/or use immediate fail semantics), update the conn creation site and the
PRAGMA lines (the sqlite3.connect call and the conn.execute("PRAGMA
busy_timeout=...") calls) and add an explicit comment flagging this concurrency
risk as required by guidelines so reviewers see the intentional trade-off.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4a3f6d21-fab6-4a95-a487-ed3fa14903bb

📥 Commits

Reviewing files that changed from the base of the PR and between 58ea11c and 239366f.

📒 Files selected for processing (3)
  • hooks/brainlayer-prompt-search.py
  • hooks/brainlayer-session-start.py
  • tests/test_conditional_hooks.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

Files:

  • hooks/brainlayer-session-start.py
  • hooks/brainlayer-prompt-search.py
  • tests/test_conditional_hooks.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest for testing

Files:

  • tests/test_conditional_hooks.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/**/*.py : Implement session dedup coordination via `/tmp/brainlayer_session_{id}.json`, with SessionStart writing injected chunk_ids and UserPromptSubmit skipping already-injected
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/**/*.py : Implement session dedup coordination via `/tmp/brainlayer_session_{id}.json`, with SessionStart writing injected chunk_ids and UserPromptSubmit skipping already-injected

Applied to files:

  • hooks/brainlayer-session-start.py
  • hooks/brainlayer-prompt-search.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts

Applied to files:

  • hooks/brainlayer-session-start.py
  • hooks/brainlayer-prompt-search.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection

Applied to files:

  • hooks/brainlayer-session-start.py
📚 Learning: 2026-03-14T02:20:54.656Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Applies to **/*.py : Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior

Applied to files:

  • hooks/brainlayer-session-start.py
📚 Learning: 2026-03-29T23:19:50.743Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/**/*.py : All database connections must retry on `SQLITE_BUSY`; each worker uses its own connection

Applied to files:

  • hooks/brainlayer-session-start.py
  • hooks/brainlayer-prompt-search.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Use `hooks/dedup_coordination.py` module for session dedup coordination; detect handoff sessions (prompts with 'handoff' or 'session-handoff') and skip auto-search

Applied to files:

  • hooks/brainlayer-prompt-search.py
🔇 Additional comments (3)
hooks/brainlayer-session-start.py (1)

291-299: Nice scoped Hebrew-style guidance.

Gating this on the resolved project/search term keeps the reminder targeted instead of adding global style noise to every session.

hooks/brainlayer-prompt-search.py (1)

139-143: Good call keeping 2-character keywords.

This restores high-signal terms like AI, PR, and T3 without reopening single-character query noise.

tests/test_conditional_hooks.py (1)

131-153: Helpful regression coverage on the changed hook behavior.

These cases pin the readonly PRAGMAs, short-token extraction, project/tool entity injection, and the new reminder-emission paths introduced here.

Also applies to: 178-204, 234-255

"""
# Entity types that warrant automatic context injection
INJECT_TYPES = {"person", "company", "agent"}
INJECT_TYPES = {"person", "company", "agent", "project", "technology", "tool"}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

technology will not match the current entity contract.

The provided contracts/entity-types.yaml only defines agent, person, tool, project, and concept. Unless the contract/schema changes in the same PR, this new branch silently matches nothing and the "technology" part of the feature is not actually delivered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/brainlayer-prompt-search.py` at line 169, INJECT_TYPES currently
includes "technology" which doesn't match the entity contract; update the
INJECT_TYPES set in hooks/brainlayer-prompt-search.py (the INJECT_TYPES symbol)
to mirror the contract's allowed types by replacing "technology" with "concept"
(or add "concept" and remove "technology") so the branch will actually match
entities defined in contracts/entity-types.yaml.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai This hook reads entity_type from existing rows in kg_entities — it doesn't create entities. The DB already stores technology, project, tool as entity_type values (72, 24, and 1 entities respectively). The entity contracts file defines the write-side schema; the hook is read-only. No action needed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EtanHey, thanks for the clarification — that's a clear and important distinction. The contracts/entity-types.yaml governs the write-side schema (entity creation), while this hook only reads entity_type values from existing kg_entities rows. Since the DB already contains technology (72), project (24), and tool (1) entities, the expanded INJECT_TYPES set will match them correctly. My concern was misplaced — no action needed.


✏️ Learnings added
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:36.497Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/**/*.py : Implement session dedup coordination via `/tmp/brainlayer_session_{id}.json`, with SessionStart writing injected chunk_ids and UserPromptSubmit skipping already-injected

Comment on lines +241 to +259
def record_injection_event(db_path, session_id, prompt, chunk_ids, token_count):
"""Best-effort write of an injection event for BrainBar's live viewer."""
if not db_path or not session_id or not chunk_ids:
return

try:
conn = sqlite3.connect(db_path, timeout=2)
conn.execute("PRAGMA busy_timeout=2000")
conn.execute(
"""
INSERT INTO injection_events (session_id, query, chunk_ids, token_count)
VALUES (?, ?, ?, ?)
""",
(session_id, prompt[:1000], json.dumps(chunk_ids), token_count),
)
conn.commit()
conn.close()
except sqlite3.Error:
pass
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -C2 '\binjection_events\b' .

Repository: EtanHey/brainlayer

Length of output: 4993


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the full schema in BrainDatabase.swift
sed -n '186,202p' ./brain-bar/Sources/BrainBar/BrainDatabase.swift

# Check lines 430-436 in the hook file
sed -n '430,436p' ./hooks/brainlayer-prompt-search.py

Repository: EtanHey/brainlayer

Length of output: 888


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if there are comments about the 500ms target in the hook file
rg -n "500|timeout|target" ./hooks/brainlayer-prompt-search.py | head -20

# Check the schema in test vs production to understand intent
echo "=== Test schema ==="
sed -n '208,217p' ./tests/test_conditional_hooks.py

echo -e "\n=== Python INSERT statement ==="
sed -n '248,255p' ./hooks/brainlayer-prompt-search.py

Repository: EtanHey/brainlayer

Length of output: 942


Synchronous DB write with 2s timeout violates the hook's <500ms performance target and silently swallows all errors.

record_injection_event() runs synchronously from UserPromptSubmit (lines 241-259 and 430-436), setting a 2s timeout on sqlite3.connect() plus PRAGMA busy_timeout=2000. When the main database is locked, this blocks hook execution for up to 2 seconds, far exceeding the documented <500ms total target at line 12. All sqlite3.Error exceptions are silently caught and discarded, making failures invisible and hindering diagnosis.

The injection_events schema does exist in brain-bar/Sources/BrainBar/BrainDatabase.swift:186-202 with a production-ready CREATE TABLE IF NOT EXISTS. However, the Python code omits the timestamp column present in the real schema; this works only because the Swift schema provides a DEFAULT, but indicates schema drift between implementations.

Per coding guidelines: "brain_digest is write-heavy and must not run in parallel with other MCP work." This synchronous write from within the hook violates that concurrency constraint. Consider making event recording asynchronous or removing it from the critical path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/brainlayer-prompt-search.py` around lines 241 - 259,
record_injection_event() currently does a synchronous sqlite3.connect with a 2s
busy timeout and swallows sqlite3.Error, which blocks UserPromptSubmit and can
exceed the <500ms hook target; change it to perform a non-blocking, best-effort
write off the critical path (e.g., enqueue the event to a background
worker/thread or use asyncio.create_task) and reduce/remove the connection busy
timeout so it never stalls the hook; also stop silencing errors—log exceptions
from sqlite3 (including error details) so failures are visible, and ensure you
include or respect the timestamp column behavior consistent with the production
schema (either provide timestamp or rely on DB DEFAULT) when preparing the
INSERT in record_injection_event.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Two points here:

  1. injection_events table: if it doesn't exist, the INSERT fails and the except sqlite3.Error: pass catches it silently. This function is explicitly best-effort — see the comment. No action needed for table existence.

  2. busy_timeout=1000: This is how long SQLite waits for a BUSY lock, not total connection time. When no other writer holds a lock (normal case), connection is instant (<1ms). The 450ms DEADLINE_MS guards the search phase separately. However, we are fixing the conn.close() leak with try/finally — that's a real improvement.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EtanHey, both points are well-taken:

  1. Silent failure — Agreed. For a best-effort telemetry write, silently swallowing sqlite3.Error (including "no such table") is the right design. No action needed.

  2. busy_timeout vs. connection timeout — Correct. sqlite3.connect(timeout=2) only applies to the initial file-open on WAL contention; once the file is open, PRAGMA busy_timeout=1000 governs lock-wait on individual statements. In the no-contention case this is effectively free. The DEADLINE_MS guard on the search phase handles the latency-sensitive path independently.

The one actionable item is the conn.close() leak you already flagged. A with statement (context manager) is the cleanest fix:

def record_injection_event(db_path, session_id, prompt, chunk_ids, token_count):
    """Best-effort write of an injection event for BrainBar's live viewer."""
    if not db_path or not session_id or not chunk_ids:
        return

    try:
        with sqlite3.connect(db_path, timeout=2) as conn:
            conn.execute("PRAGMA busy_timeout=2000")
            conn.execute(
                """
                INSERT INTO injection_events (session_id, query, chunk_ids, token_count)
                VALUES (?, ?, ?, ?)
                """,
                (session_id, prompt[:1000], json.dumps(chunk_ids), token_count),
            )
            # conn.commit() is called automatically by the context manager on success
    except sqlite3.Error:
        pass

Note: sqlite3.connect used as a context manager auto-commits on success and auto-rolls-back on exception, so the explicit conn.commit() can also be dropped. The connection is always closed when the with block exits.


✏️ Learnings added
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:241-259
Timestamp: 2026-04-04T15:21:59.853Z
Learning: In `hooks/brainlayer-prompt-search.py` (Python), `record_injection_event()` is explicitly best-effort telemetry: silent `except sqlite3.Error: pass` is intentional — table non-existence or lock failures are acceptable silent failures. `sqlite3.connect(timeout=2)` is the file-open timeout; `PRAGMA busy_timeout` governs per-statement lock-wait. The `DEADLINE_MS` (450ms) guard applies only to the FTS search phase, not to this side-channel write.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/**/*.py : Implement session dedup coordination via `/tmp/brainlayer_session_{id}.json`, with SessionStart writing injected chunk_ids and UserPromptSubmit skipping already-injected

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Applies to **/*.py : Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/**/*.py : All database connections must retry on `SQLITE_BUSY`; each worker uses its own connection

Comment on lines +317 to 321
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=1)
# AIDEV-NOTE: Do NOT set journal_mode=WAL on readonly — it requires a write lock.
# WAL is already set by the Python writer. Just set query_only for safety.
conn.execute("PRAGMA busy_timeout=1000")
conn.execute("PRAGMA query_only=true")
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The readonly wait is longer than the search hook's whole deadline.

DEADLINE_MS is 450, but this connection can wait 1000ms before the search phase even starts. On a busy DB, the hook misses its own latency budget and then exits silently with no context.

As per coding guidelines: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/brainlayer-prompt-search.py` around lines 317 - 321, The connection
sets PRAGMA busy_timeout=1000 which exceeds the search hook DEADLINE_MS (450)
and can block past the hook deadline; change the busy_timeout to be safely below
DEADLINE_MS (e.g., compute busy_timeout = max(50, DEADLINE_MS - X) or a fixed
value <= 400), update the conn.execute("PRAGMA busy_timeout=...") call in the
same block where conn and db_path are used, and add an explicit fast-fail/log
(using the existing logger) if the computed busy_timeout would still exceed
DEADLINE_MS so the hook fails fast instead of silently missing its latency
budget; keep PRAGMA query_only=true as-is.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Same as above — busy_timeout=1000 is lock-wait time, not total. Normal case (no concurrent writer) connects in <1ms. The DEADLINE_MS=450 guards the search phase separately via elapsed_ms(start) < DEADLINE_MS checks.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EtanHey — fair point, and you're right on the distinction. busy_timeout is a lock-contention fallback; in WAL mode a mode=ro connection never contends with writers, so the 1000ms ceiling is essentially unreachable on the normal path. The elapsed_ms(start) < DEADLINE_MS guards independently protect each search phase from a slow normal execution.

My concern was valid to flag per the concurrency guideline, but your explanation resolves it. Acknowledged.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Applies to **/*.py : Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:50.743Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/hooks/**/*.py : Implement session dedup coordination via `/tmp/brainlayer_session_{id}.json`, with SessionStart writing injected chunk_ids and UserPromptSubmit skipping already-injected

Comment thread hooks/brainlayer-prompt-search.py
Comment on lines +225 to 229
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=1)
# AIDEV-NOTE: WAL already set by Python writer. Don't set on readonly — needs write lock.
conn.execute("PRAGMA busy_timeout=1000")
conn.execute("PRAGMA query_only=true")
except sqlite3.Error:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The DB wait here is longer than the hook's whole deadline.

DEADLINE_MS is 450, but timeout=1 / busy_timeout=1000 lets this path block for about 1s before any query runs. On a contended DB, session-start now misses its own SLA by design and then falls back to no injected context.

As per coding guidelines: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hooks/brainlayer-session-start.py` around lines 225 - 229, The DB open uses
sqlite3.connect(..., timeout=1) and PRAGMA busy_timeout=1000 which can block ~1s
and exceed DEADLINE_MS (450); change the connect timeout and PRAGMA busy_timeout
to values well below DEADLINE_MS (e.g., total wait <<450ms) so session-start
cannot block past its SLA, e.g., set timeout and busy_timeout to a small
fraction of DEADLINE_MS (and/or use immediate fail semantics), update the conn
creation site and the PRAGMA lines (the sqlite3.connect call and the
conn.execute("PRAGMA busy_timeout=...") calls) and add an explicit comment
flagging this concurrency risk as required by guidelines so reviewers see the
intentional trade-off.

@EtanHey
Copy link
Copy Markdown
Owner Author

EtanHey commented Apr 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant