Optimize hook injection fallback and filtering #232

Merged
EtanHey merged 1 commit into main from feat/hook-injection-optimization
Apr 10, 2026
Conversation

@EtanHey
Owner

@EtanHey EtanHey commented Apr 10, 2026

Summary

  • filter entity-card kg_entity_chunks reads to exclude co_occurs_with links when the column exists
  • truncate injected snippets at the last sentence boundary before the character limit instead of hard-cutting mid-sentence
  • inject a low-confidence fallback message when the hook would otherwise add no memories

Test plan

  • Pre-change: pytest tests/test_hook_slim.py tests/test_follow_up_rewrite.py -q
  • Red phase: pytest tests/test_hook_slim.py tests/test_adaptive_injection.py tests/test_prompt_classification.py -q
  • Post-change: pytest tests/test_hook_slim.py tests/test_adaptive_injection.py tests/test_prompt_classification.py tests/test_follow_up_rewrite.py -q

Summary by CodeRabbit

  • New Features

    • Low-confidence search results now display a helpful fallback message when relevance scores fall below threshold.
    • Improved snippet truncation at sentence boundaries for better readability.
    • Enhanced result filtering to exclude irrelevant relation types.
  • Tests

    • Added comprehensive test coverage for low-confidence fallback scenarios, sentence-boundary truncation, and noise filtering.

Note

Optimize hook injection fallback and filter co_occurs_with relations from entity chunk retrieval

  • Adds a low-confidence fallback message (guidance to use brain_search()) when search results are empty or all scores fall below a 0.30 threshold via build_low_confidence_fallback in brainlayer-prompt-search.py
  • Filters out co_occurs_with relation type rows from get_entity_chunks queries when the kg_entity_chunks table schema includes a relation_type column; schema detection is cached per-connection
  • Improves snippet truncation to cut at the last sentence boundary ([.!?]) within the limit rather than splitting at an arbitrary space
  • Behavioral Change: entity chunk results now exclude co_occurs_with-linked chunks by default when the DB schema supports it

Macroscope summarized 450f654.


@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Walkthrough

This PR enhances the prompt search mechanism with three features: low-confidence fallback messaging when search relevance scores fall below a threshold, improved snippet truncation at sentence boundaries using regex detection, and schema-aware filtering of KG entity chunks to exclude co-occurrence relations. It also adds corresponding test coverage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Production Feature Implementation: hooks/brainlayer-prompt-search.py | Added low-confidence fallback support with threshold and message builder; introduced fallback row capture during search; replaced separator-based truncation with sentence-end regex detection ([.!?]); implemented schema-aware KG entity chunk filtering to conditionally exclude relation_type='co_occurs_with' rows; switched to f-string SQL construction for optional relation filtering. |
| Test Coverage - Low Confidence Fallback: tests/test_adaptive_injection.py | Added three unit tests for build_low_confidence_fallback() covering empty rows, sub-threshold relevance scores (0.29), and at/above-threshold scores (0.30). |
| Test Coverage - Sentence Truncation: tests/test_hook_slim.py | Added test verifying truncate() with max_chars=55 truncates at the last sentence boundary before the limit, producing expected output ending with four dots. |
| Test Fixtures & Helpers: tests/test_prompt_classification.py | Updated make_hook_db() to add relation_type column to kg_entity_chunks and insert co-occurrence test data; patched get_db_path in command/casual-chat tests to bypass DB access; tightened entity-route assertion to verify co-occurrence noise is filtered from output. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SearchEngine
    participant Database
    participant RelationFilter
    participant Truncator
    participant FallbackHandler
    
    Client->>SearchEngine: Initiate search
    SearchEngine->>Database: Retrieve candidate rows with RRF scores
    Database-->>SearchEngine: Return filtered rows
    SearchEngine->>RelationFilter: Check KG entity chunks for relation_type
    RelationFilter->>Database: Query schema for relation_type column
    Database-->>RelationFilter: Schema info
    RelationFilter-->>SearchEngine: Exclude co_occurs_with relations
    SearchEngine->>Truncator: Build snippets with truncation
    Truncator->>Truncator: Find last sentence boundary [.!?]
    Truncator-->>SearchEngine: Return truncated snippets
    SearchEngine->>FallbackHandler: Check if context lines produced
    alt No context lines & low confidence
        FallbackHandler->>FallbackHandler: Check RRF score vs 0.30 threshold
        FallbackHandler-->>SearchEngine: Return fallback message
    else Score above threshold or lines produced
        FallbackHandler-->>SearchEngine: Return None
    end
    SearchEngine-->>Client: Return final results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Searching through the knowledge warren,
We trimmed the tales at sentence's end,
Filtered noise with relation's warden,
And built a fallback-confidence blend! 🔍✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Optimize hook injection fallback and filtering' accurately reflects the three main changes: low-confidence fallback support, schema-aware KG entity chunk filtering, and improved snippet truncation at sentence boundaries. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Owner Author

EtanHey commented Apr 10, 2026

@codex review
@cursor @BugBot review
@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 10, 2026

@EtanHey, I'll review the changes in this PR right away!


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:16:13.883Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/brainlayer-prompt-search.py`:
- Around line 1244-1248: The current logic unconditionally calls
build_low_confidence_fallback(fallback_rows) when lines is empty, but
fallback_rows is only populated for the FTS paths (knowledge_question,
follow_up, hebrew_query) and not for entity_lookup; change the condition so the
low-confidence fallback is only built/added when fallback_rows is non-empty or
when the classification is one of the FTS branches (e.g., knowledge_question,
follow_up, hebrew_query), and for entity_lookup either skip building that
fallback or implement a separate entity-specific fallback; update the block
referencing fallback_rows, build_low_confidence_fallback, and the entity_lookup
classification to guard the fallback call accordingly.
- Around line 737-753: The fallback_rows creation currently hardcodes
rrf_score=0.0 which makes build_low_confidence_fallback() always treat the top
row as low-confidence; either compute a real relevance/rrf score for those
synthetic rows or stop setting rrf_score at all so
build_low_confidence_fallback() won't see a numeric score and won't auto-trigger
fallback. Update the code that builds fallback_rows (the block that sets
"rrf_score: 0.0") to either calculate a proper relevance value or omit/set
rrf_score to None, and keep
build_low_confidence_fallback(top_row.get("relevance") /
top_row.get("rrf_score")) logic unchanged.
- Around line 434-448: The function _kg_entity_chunks_has_relation_type
currently defensively checks for a future relation_type column and caches the
result in _KG_ENTITY_CHUNKS_RELATION_TYPE_CACHE; update the codebase by either
(A) adding a brief docstring above _kg_entity_chunks_has_relation_type
explaining this is forward-compatible support for an expected relation_type
column (and note where/when it will be added, e.g., vector_store.py schema
changes), or (B) if relation_type is not planned, remove the check and related
test usage and update test_prompt_classification to match production schema;
reference the function name _kg_entity_chunks_has_relation_type and the cache
symbol _KG_ENTITY_CHUNKS_RELATION_TYPE_CACHE when making the change so reviewers
can locate and verify the update.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 45d2285c-7134-4dd0-a008-dafa5a534392

📥 Commits

Reviewing files that changed from the base of the PR and between 1b5d970 and 450f654.

📒 Files selected for processing (4)
  • hooks/brainlayer-prompt-search.py
  • tests/test_adaptive_injection.py
  • tests/test_hook_slim.py
  • tests/test_prompt_classification.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_adaptive_injection.py
  • tests/test_hook_slim.py
  • tests/test_prompt_classification.py
  • hooks/brainlayer-prompt-search.py
🧠 Learnings (3)
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Applied to files:

  • tests/test_prompt_classification.py
📚 Learning: 2026-04-04T15:22:02.740Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:241-259
Timestamp: 2026-04-04T15:22:02.740Z
Learning: In `hooks/brainlayer-prompt-search.py` (Python), `record_injection_event()` is explicitly best-effort telemetry: silent `except sqlite3.Error: pass` is intentional — table non-existence or lock failures are acceptable silent failures. `sqlite3.connect(timeout=2)` is the file-open timeout; `PRAGMA busy_timeout` governs per-statement lock-wait. The `DEADLINE_MS` (450ms) guard applies only to the FTS search phase, not to this side-channel write.

Applied to files:

  • hooks/brainlayer-prompt-search.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history

Applied to files:

  • hooks/brainlayer-prompt-search.py
🔇 Additional comments (9)
hooks/brainlayer-prompt-search.py (3)

43-47: LGTM: Well-defined constants for fallback behavior.

The threshold of 0.30 for low-confidence fallback is clearly documented, and the message appropriately guides users to use brain_search() for deeper retrieval. The module-level cache for relation_type column existence is a reasonable optimization.


402-411: LGTM: Improved sentence-boundary truncation.

The regex-based approach [.!?](?=(?:\s|$)) correctly identifies sentence-ending punctuation followed by whitespace or end-of-string. Using finditer to find the last match before the limit ensures clean truncation at natural boundaries. Falls back to word-boundary split if no sentence end is found.
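The truncation behavior described here can be sketched as follows. This is a minimal illustration, not the hook's actual code: the `truncate(text, max_chars)` signature, the `"..."` suffix, and the single-space word-boundary fallback are assumptions inferred from the review comments; only the regex is quoted from the thread.

```python
import re

# Regex quoted in the review: sentence-ending punctuation followed by
# whitespace or end of string.
SENTENCE_END = re.compile(r"[.!?](?=(?:\s|$))")


def truncate(text: str, max_chars: int) -> str:
    """Hypothetical re-implementation for illustration only."""
    if len(text) <= max_chars:
        return text

    # Remember the end offset of the last sentence that fits within the limit.
    last_end = None
    for match in SENTENCE_END.finditer(text):
        if match.end() > max_chars:
            break
        last_end = match.end()

    if last_end is not None:
        cut = text[:last_end]
    else:
        # No sentence end within the limit: fall back to the last word boundary.
        cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```

Under these assumptions, truncating "First sentence. Second sentence. Third sentence runs on far too long." at 40 characters keeps the first two sentences and appends the ellipsis, so the result ends in four dots, the same shape the new test in tests/test_hook_slim.py is described as expecting.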


640-663: LGTM: Safe conditional SQL construction.

The f-string usage here is acceptable since relation_filter is a hardcoded constant, not user input. The COALESCE(ec.relation_type, '') != 'co_occurs_with' correctly handles NULL values, ensuring rows without a relation_type are included.
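A small in-memory sketch of the conditional filter discussed above. The table layout mirrors the test schema described in this review (entity_id, chunk_id, relevance, plus relation_type); the helper name `get_entity_chunk_ids`, its `has_relation_type` parameter, and the sample rows are hypothetical and only serve to show how the `COALESCE` clause behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kg_entity_chunks "
    "(entity_id TEXT, chunk_id TEXT, relevance REAL, relation_type TEXT)"
)
conn.executemany(
    "INSERT INTO kg_entity_chunks VALUES (?, ?, ?, ?)",
    [
        ("e1", "chunk-real", 0.90, "depends_on"),
        ("e1", "chunk-noise", 0.99, "co_occurs_with"),
        ("e1", "chunk-untyped", 0.50, None),  # NULL relation_type
    ],
)


def get_entity_chunk_ids(conn, entity_id, has_relation_type):
    # The filter is a hardcoded constant, never user input, so interpolating
    # it with an f-string does not open an injection path. The entity id is
    # still passed as a bound parameter.
    relation_filter = (
        "AND COALESCE(ec.relation_type, '') != 'co_occurs_with'"
        if has_relation_type
        else ""
    )
    sql = f"""
        SELECT ec.chunk_id
        FROM kg_entity_chunks AS ec
        WHERE ec.entity_id = ?
        {relation_filter}
        ORDER BY ec.relevance DESC
    """
    return [row[0] for row in conn.execute(sql, (entity_id,))]


filtered_ids = get_entity_chunk_ids(conn, "e1", has_relation_type=True)
unfiltered_ids = get_entity_chunk_ids(conn, "e1", has_relation_type=False)
```

With the filter on, the noise chunk is dropped while the row with a NULL relation_type is kept. A plain `relation_type != 'co_occurs_with'` comparison would also drop the NULL row, because a comparison against NULL evaluates to NULL in SQL, which is why the `COALESCE` wrapper matters.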

tests/test_adaptive_injection.py (1)

157-175: LGTM: Comprehensive tests for fallback behavior.

The three test cases properly cover:

  1. Empty input → fallback message
  2. Below-threshold score (0.29) → fallback message
  3. At-threshold score (0.30) → no fallback (returns None)

The tests correctly exercise the boundary at 0.30 and validate the return type contract.

tests/test_prompt_classification.py (4)

38-47: LGTM: Test schema includes relation_type to validate filtering.

The test database schema intentionally includes the relation_type column that doesn't exist in the current production schema. This correctly tests the forward-compatible filtering behavior introduced in get_entity_chunks().


94-111: Effective test for co-occurrence filtering.

The test data is well-designed: chunk-theo-noise has higher relevance (0.99) than the legitimate chunk (0.9), ensuring the test validates that filtering by relation_type='co_occurs_with' works correctly rather than just relying on ordering.


165-179: LGTM: Skip tests correctly isolate DB access.

Patching get_db_path to return None ensures finalize_and_exit(mode="skip") is called before any database connection attempt, correctly validating that command and casual prompts bypass retrieval entirely.


197-199: LGTM: Assertion validates co-occurrence filtering in entity route.

The added assertion assert "co-occurrence noise" not in output directly validates that the co_occurs_with filtering excludes the noise chunk despite its higher relevance score.

tests/test_hook_slim.py (1)

58-66: LGTM: Test validates sentence-boundary truncation.

This test correctly validates the new truncation behavior:

  • Input has three sentences with the third extending past the 55-char limit
  • The function correctly truncates at the last sentence boundary (position 42) before the limit
  • Output includes the trailing "..." resulting in the expected "...sentence...." (4 dots total)

Comment on lines +434 to +448
def _kg_entity_chunks_has_relation_type(conn):
    cache_key = _get_connection_cache_key(conn)
    cached = _KG_ENTITY_CHUNKS_RELATION_TYPE_CACHE.get(cache_key)
    if cached is not None:
        return cached

    try:
        columns = {row[1] for row in conn.execute("PRAGMA table_info(kg_entity_chunks)").fetchall()}
    except sqlite3.Error:
        columns = set()

    has_column = "relation_type" in columns
    _KG_ENTITY_CHUNKS_RELATION_TYPE_CACHE[cache_key] = has_column
    return has_column


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if relation_type column exists in kg_entity_chunks schema definition
rg -n "kg_entity_chunks" --type py -A 10 | head -60

# Check if there's a migration adding relation_type to kg_entity_chunks
rg -n "relation_type" --type py -C 3 | grep -i "kg_entity_chunks" | head -20

Repository: EtanHey/brainlayer

Length of output: 4918


🏁 Script executed:

# Find and examine the actual kg_entity_chunks schema definition in vector_store.py
rg -n "CREATE TABLE kg_entity_chunks" --type py -A 20

# Also check for any ALTER TABLE statements that might add relation_type
rg -n "ALTER TABLE kg_entity_chunks" --type py -A 5

Repository: EtanHey/brainlayer

Length of output: 4124


🏁 Script executed:

# Find the initial CREATE TABLE for kg_entity_chunks in vector_store.py
rg -n "CREATE TABLE kg_entity_chunks" src/brainlayer/vector_store.py -A 10

# Also search broadly for relation_type in production code (not tests)
rg -n "relation_type" src/brainlayer/ --type py

Repository: EtanHey/brainlayer

Length of output: 8273


🏁 Script executed:

# Look around the ALTER TABLE statements to find initial CREATE TABLE
sed -n '600,900p' src/brainlayer/vector_store.py | cat -n | grep -E "(CREATE TABLE|kg_entity_chunks|ALTER TABLE)" -A 3 -B 3

# Also try direct search with broader context
rg -n "CREATE TABLE" src/brainlayer/vector_store.py | grep -i entity

Repository: EtanHey/brainlayer

Length of output: 6993


🏁 Script executed:

# Check how _kg_entity_chunks_has_relation_type is used in the hook
rg -n "_kg_entity_chunks_has_relation_type" hooks/brainlayer-prompt-search.py -B 2 -A 8

# Look at line 644 context mentioned in initial results
sed -n '640,660p' hooks/brainlayer-prompt-search.py

Repository: EtanHey/brainlayer

Length of output: 1691


🏁 Script executed:

# Search for TODOs, FIXMEs, or comments about relation_type and kg_entity_chunks
rg -n "TODO|FIXME|XXX" hooks/brainlayer-prompt-search.py | head -20

# Check if there are any schema migration files or pending work
find . -name "*migrat*" -type f 2>/dev/null | head -10

# Search for any comments mentioning this forward compatibility
rg -n "relation_type.*kg_entity_chunks|kg_entity_chunks.*relation_type" --type py -B 3 -A 3

Repository: EtanHey/brainlayer

Length of output: 2170


🏁 Script executed:

# Check the migrate.py file for relation_type migration
cat src/brainlayer/migrate.py | head -100

# Also search for any migration or TODO comments about this
rg -n "relation_type" src/brainlayer/migrate.py -B 3 -A 3

Repository: EtanHey/brainlayer

Length of output: 3299


🏁 Script executed:

# Search for any schema evolution or chunk lifecycle plans
rg -n "relation_type|lifecycle|archived_at" src/brainlayer/ --type py | grep -E "(mention_type|archived|superseded)" | head -20

# Check if relation_type filtering is actually needed by looking at where co_occurs_with is used
rg -n "co_occurs_with" src/brainlayer/ --type py | head -15

Repository: EtanHey/brainlayer

Length of output: 2885


Document forward-compatible schema detection or clarify the intended timeline for relation_type in kg_entity_chunks.

The defensive check is sound—the relation_type column doesn't exist in kg_entity_chunks in production (created at vector_store.py:81-89 with only entity_id, chunk_id, relevance), and co_occurs_with filtering happens at the kg_relations level, not here. The function gracefully handles this by returning an empty filter string when the column is absent, so no filtering occurs in production.

However, the test schema in test_prompt_classification.py:40-45 includes relation_type in kg_entity_chunks, suggesting this may be planned. Either document this as intentional forward-compatibility or clarify whether and when relation_type will be added to the production schema.


Comment on lines +737 to +753
def build_low_confidence_fallback(rows):
    if not rows:
        return LOW_CONFIDENCE_FALLBACK_MESSAGE

    top_row = rows[0]
    if isinstance(top_row, dict):
        relevance = top_row.get("relevance")
        if relevance is None:
            relevance = top_row.get("rrf_score")
    else:
        relevance = None

    if relevance is not None and relevance < LOW_CONFIDENCE_FALLBACK_THRESHOLD:
        return LOW_CONFIDENCE_FALLBACK_MESSAGE

    return None


⚠️ Potential issue | 🟠 Major

Bug: fallback_rows always have rrf_score=0.0, causing fallback to always trigger.

The fallback_rows constructed at lines 1218-1229 hardcode rrf_score: 0.0 for all rows. When build_low_confidence_fallback() checks the top row's relevance/rrf_score, it will always find 0.0 < 0.30, causing the fallback message to always be emitted when not lines is true.

If the intent is to use the actual FTS rank or relevance score, you need to compute a meaningful score. If the intent is to always show the fallback when no search results were injected, the threshold check is unnecessary.

🔧 Possible fix if intent is to always emit fallback when no lines
 def build_low_confidence_fallback(rows):
     if not rows:
         return LOW_CONFIDENCE_FALLBACK_MESSAGE
-
-    top_row = rows[0]
-    if isinstance(top_row, dict):
-        relevance = top_row.get("relevance")
-        if relevance is None:
-            relevance = top_row.get("rrf_score")
-    else:
-        relevance = None
-
-    if relevance is not None and relevance < LOW_CONFIDENCE_FALLBACK_THRESHOLD:
-        return LOW_CONFIDENCE_FALLBACK_MESSAGE
-
-    return None
+    # Always return fallback when called (caller determines when to call)
+    return LOW_CONFIDENCE_FALLBACK_MESSAGE

Alternatively, compute actual relevance scores in fallback_rows instead of hardcoding 0.0.
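The reported behavior can be reproduced in isolation. The sketch below copies the function logic quoted in this comment; the message constant is a placeholder (the real wording is not shown in this thread), and the row dictionaries are illustrative apart from the `rrf_score` key.

```python
LOW_CONFIDENCE_FALLBACK_THRESHOLD = 0.30  # threshold stated in the PR summary
LOW_CONFIDENCE_FALLBACK_MESSAGE = "<fallback message>"  # placeholder text


def build_low_confidence_fallback(rows):
    if not rows:
        return LOW_CONFIDENCE_FALLBACK_MESSAGE

    top_row = rows[0]
    relevance = None
    if isinstance(top_row, dict):
        relevance = top_row.get("relevance")
        if relevance is None:
            relevance = top_row.get("rrf_score")

    if relevance is not None and relevance < LOW_CONFIDENCE_FALLBACK_THRESHOLD:
        return LOW_CONFIDENCE_FALLBACK_MESSAGE
    return None


# Rows shaped like the synthetic fallback_rows the review describes.
hardcoded = [{"chunk_id": "c1", "rrf_score": 0.0}]   # score hardcoded to 0.0
unscored = [{"chunk_id": "c1", "rrf_score": None}]   # score omitted
at_threshold = [{"chunk_id": "c1", "rrf_score": 0.30}]

results = {
    "hardcoded": build_low_confidence_fallback(hardcoded),
    "unscored": build_low_confidence_fallback(unscored),
    "at_threshold": build_low_confidence_fallback(at_threshold),
}
```

Only the hardcoded row triggers the fallback: 0.0 is always below 0.30, whereas a missing score or a score at the threshold returns None. That is why hardcoding 0.0 makes the threshold check a no-op for those rows.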

Also applies to: 1218-1229


Comment on lines +1244 to +1248
if not lines:
    fallback = build_low_confidence_fallback(fallback_rows)
    if fallback:
        lines.append(fallback)


🧹 Nitpick | 🔵 Trivial

Logic concern: Fallback only applies to FTS-based paths, not entity lookup.

The fallback_rows variable is only populated in the knowledge_question, follow_up, and hebrew_query branches. For entity_lookup classification, fallback_rows remains empty, so build_low_confidence_fallback([]) would return the fallback message even if entity cards were successfully injected but lines was empty for other reasons.

Consider whether this is the intended behavior or if entity lookup should have its own fallback handling.
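One way to express the suggested guard, as a hedged sketch rather than the hook's code: the classification names come from this comment, while `maybe_append_fallback`, its parameters, and the stub builder are hypothetical names introduced for illustration.

```python
# Classification names taken from the review comment.
FTS_CLASSIFICATIONS = {"knowledge_question", "follow_up", "hebrew_query"}


def maybe_append_fallback(lines, fallback_rows, classification, build_fallback):
    """Append the fallback only on FTS routes that produced no lines."""
    if lines or classification not in FTS_CLASSIFICATIONS:
        return lines
    fallback = build_fallback(fallback_rows)
    if fallback:
        lines.append(fallback)
    return lines


def stub_fallback(rows):
    # Stand-in for build_low_confidence_fallback: message for empty rows only.
    return "<fallback message>" if not rows else None


entity_lines = maybe_append_fallback([], [], "entity_lookup", stub_fallback)
fts_lines = maybe_append_fallback([], [], "knowledge_question", stub_fallback)
```

Here the entity_lookup route stays empty while the knowledge_question route receives the message. Whether entity lookup should instead get its own fallback is the open design question raised above; this sketch only shows the skip variant.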



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 450f654bd1


Comment on lines +1244 to +1245
if not lines:
    fallback = build_low_confidence_fallback(fallback_rows)

P2: Base fallback emission on retrieval state, not rendered lines

In main(), the low-confidence message is gated by if not lines, but lines is a presentation buffer rather than a retrieval signal. After dedup, filtered_rows can be empty even when the original query returned relevant chunks that were already injected, so this branch emits “No high-confidence memories found” for repeated/follow-up prompts where memories do exist. That is misleading and can spam fallback guidance in active sessions; the condition should be tied to actual search outcome/new-memory availability rather than whether any text was appended to lines.
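A possible shape for a retrieval-based condition, offered as a sketch under assumptions: `raw_rows` (search results before dedup) and the `should_emit_fallback` helper are hypothetical names, `filtered_rows` is the post-dedup list mentioned in this comment, and the 0.30 default mirrors the threshold stated in the PR summary.

```python
def should_emit_fallback(raw_rows, filtered_rows, threshold=0.30):
    """Decide on the fallback from search outcome, not from rendered lines."""
    if filtered_rows:
        # New memories remain after dedup and will be injected.
        return False
    if not raw_rows:
        # The search itself found nothing.
        return True
    # Everything was deduplicated away: only fall back if the best hit was weak.
    top_score = raw_rows[0].get("rrf_score")
    return top_score is not None and top_score < threshold


# Relevant chunk found but already injected earlier in the session.
already_injected = should_emit_fallback([{"rrf_score": 0.8}], [])
# Search returned nothing at all.
nothing_found = should_emit_fallback([], [])
```

In this sketch the already-injected case does not emit the message, which addresses the misleading "no memories" output for repeated prompts, while an empty search still does.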


Owner Author

EtanHey commented Apr 10, 2026

@coderabbitai Learning: co_occurs_with is filtered at SQL-time because it represents 85.5% of KG relations and is same-chunk co-occurrence noise, not a semantic relationship. Entity cards should surface real edges (DEPENDS_ON, WORKS_AT, MERGED_BY, etc.), not the bag-of-words baseline. Please learn this pattern for future reviews.

@EtanHey EtanHey merged commit f3a4bee into main Apr 10, 2026
6 checks passed
@EtanHey EtanHey deleted the feat/hook-injection-optimization branch April 10, 2026 23:27
@coderabbitai

coderabbitai Bot commented Apr 10, 2026


@EtanHey, understood — I've recorded this for future reviews. The key principle is that co_occurs_with is intentionally excluded at SQL-time in get_entity_chunks() because it constitutes ~85.5% of KG relations and represents same-chunk co-occurrence noise (a bag-of-words baseline), not a genuine semantic relationship. Entity card injection should only surface meaningful edges like DEPENDS_ON, WORKS_AT, MERGED_BY, etc.

I'll flag any future code that inadvertently re-introduces co_occurs_with rows into entity card results.


✏️ Learnings added
Learnt from: EtanHey
URL: https://github.com/EtanHey/brainlayer/pull/232

Timestamp: 2026-04-10T23:28:04.186Z
Learning: In `hooks/brainlayer-prompt-search.py` (EtanHey/brainlayer repo), `co_occurs_with` relation type is filtered out at SQL-time in `get_entity_chunks()` using a `relation_filter` when the `relation_type` column exists on `kg_entity_chunks`. This is intentional: `co_occurs_with` represents ~85.5% of KG relations and is same-chunk co-occurrence noise (bag-of-words baseline), not a semantic relationship. Entity cards should only surface real semantic edges such as DEPENDS_ON, WORKS_AT, MERGED_BY, etc. Do not flag this SQL exclusion as a bug or suggest removing it in future reviews.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:16:13.883Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Session dedup coordination: SessionStart hook writes injected chunk_ids to `/tmp/brainlayer_session_{id}.json`; UserPromptSubmit hook skips already-injected chunks; skip auto-search on handoff prompts

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/hooks/dedup_coordination.py : Use `/tmp/brainlayer_session_{id}.json` for SessionStart and UserPromptSubmit hook coordination; SessionStart writes injected chunk_ids, UserPromptSubmit skips already-injected; detect handoffs with 'handoff' or 'session-handoff' keywords
