feat: eval suite + entity injection in prompt hook #72
Phase 0 — Baselines:
- tests/test_eval_baselines.py: 23-case eval suite across 8 domains
(entity routing, tag filter, recency, Hebrew FTS, cross-project,
decision retrieval, memory, mined real queries)
- tests/eval_baselines.json: recorded baseline results
- scripts/run_evals.py: CLI runner for before/after comparison
- tests/conftest.py: register `live` pytest mark
Phase A — Entity routing in prompt hook:
- hooks/brainlayer-prompt-search.py: detect known entity names
(person, company, agent) in user prompt → inject [Entity: Name — type]
section + linked chunks before FTS results
- Possessive stripping ("Simon's" → "Simon") for bigram matching
- Filter: only person/company/agent types (skip technology/concept noise)
Before/After scores (run: python tests/test_eval_baselines.py):
- brain_search quality: 94.7% (18/19) — unchanged (already good)
- hook entity injection: 25% → 100% (3 tests now pass)
- combined: 82.6% → 95.7% (+13.1pp)
Known gaps (xfail):
- Hebrew semantic accuracy (query returns unrelated Hebrew content)
- "today" temporal awareness in raw hybrid_search
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
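The before/after scores in the commit message can be cross-checked with a little arithmetic. The sketch below is not part of the PR. It assumes the hook suite has 4 cases (inferred from "25% → 100% (3 tests now pass)"), which together with the 19 brain_search cases gives the stated 23.

```python
def pct(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Figures stated in the commit message.
SEARCH_PASSED, SEARCH_TOTAL = 18, 19

# Assumption: "25% -> 100% (3 tests now pass)" implies 4 hook cases,
# with 1 passing before and 4 passing after.
HOOK_TOTAL, HOOK_BEFORE, HOOK_AFTER = 4, 1, 4

TOTAL = SEARCH_TOTAL + HOOK_TOTAL  # 23, matching the "23-case eval suite"

before = pct(SEARCH_PASSED + HOOK_BEFORE, TOTAL)
after = pct(SEARCH_PASSED + HOOK_AFTER, TOTAL)
exact_gain = 100 * (HOOK_AFTER - HOOK_BEFORE) / TOTAL

print(f"brain_search: {pct(SEARCH_PASSED, SEARCH_TOTAL)}%")  # brain_search: 94.7%
print(f"combined: {before}% -> {after}%")                    # combined: 82.6% -> 95.7%
print(f"gain: {after - before:.1f}pp from rounded scores, {exact_gain:.2f}pp unrounded")
```

Under that assumption the reported +13.1pp is the difference of the two rounded percentages; the unrounded gain (3 of 23 cases) is about 13.04pp.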
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursorbot review this PR please

@cursor-bugbot review
🐛 Bug Review Complete

I've analyzed all 5 changed files and found 8 issues (2 critical, 2 medium, 4 minor). Here's the breakdown:

🔴 CRITICAL: Must Fix Before Merge

Bug #1: Possessive Name Truncation

Names ending with "s" lose their final letter when the possessive is stripped:

    # Current behavior:
    "Simon's" → "Simon"  ✓ correct
    "James'" → "Jame"    ✗ WRONG
    "Jones'" → "Jone"    ✗ WRONG

Root cause: the possessive regex also matches a bare trailing apostrophe (the "s" is optional), and the last character is then dropped.

Fix:

    # Line 147: change regex to only match 's, not just '
    if re.search(r"'s$", w):  # Remove the ? quantifier
        cleaned = cleaned[:-1]

Bug #2: FTS5 Query Injection via Unescaped Quotes

Keywords containing a double quote break the FTS5 query syntax:

    # Example that crashes:
    keywords = ['test', 'with"quote']
    fts_query = " OR ".join(f'"{kw}"' for kw in keywords)
    # Result: "test" OR "with"quote"  ← syntax error

Fix option 1 (safer: strip quotes during extraction):

    # Line 102: modify keyword extraction regex
    words = re.findall(r"[a-zA-Z0-9][\w-]*", text.lower())
    # Add after line 110:
    keywords = [w.replace('"', '') for w in keywords if w not in STOP_WORDS and len(w) > 2 and w not in seen]

Fix option 2 (escape quotes in FTS query):

    # Line 248
    fts_query = " OR ".join(f'"{kw.replace(chr(34), chr(34)*2)}"' for kw in keywords)

🟡 MEDIUM: Should Fix

Bug #3: Timeout Budget Not Enforced

Two independent deadline checks can each pass while the combined time exceeds the target:

    # Entity detection: 250ms ✓ (under 450ms)
    # FTS search: 300ms ✓ (under 450ms)
    # Total: 550ms ✗ (exceeds <500ms target)

Fix: Calculate remaining budget:

    # After line 265
    if elapsed_ms(start) < DEADLINE_MS:
        remaining = DEADLINE_MS - elapsed_ms(start)
        if remaining < 100:  # Need at least 100ms for FTS
            conn.close()
            if lines:
                print("\n".join(lines))
            sys.exit(0)

Bug #4: Hardcoded Date in Recency Test

Test checks for a hardcoded date.

Fix:

    # Replace lines 188-189
    from datetime import date, timedelta
    today = date.today()
    yesterday = today - timedelta(days=1)
    if today.isoformat() in date_str or yesterday.isoformat() in date_str:
        recent_found = True

🟢 MINOR: Nice to Fix

Issue #5: Linter Violations

    ruff check scripts/run_evals.py
    # F541: f-string without any placeholders (lines 59, 72)

Fix: Run …

Issue #6: Capitalization Filter Too Restrictive

Current filter misses 3-char entities and includes sentence starters:

    len(w) >= 4 and w[0].isupper() and not w.isupper()
    # Misses: "Avi", "IBM", "AWS" (len < 4)
    # Includes: "What", "Tell", "This" (sentence starters)

Recommendation: Lower to 3 chars and check against a capitalized stop-word list.

Issue #7: Silent Error Swallowing

Recommendation: Add optional debug logging:

    # After imports
    DEBUG = os.environ.get("BRAINLAYER_HOOK_DEBUG")
    # In except blocks
    except sqlite3.Error as e:
        if DEBUG:
            sys.stderr.write(f"Entity detection failed: {e}\n")
        pass

Issue #8: Fragile Negative Assertion

Complex assertion could pass for wrong reasons:

    assert _passes(docs, ["auth", "JWT"]) and not _passes(docs, ["hybrid_search", "VectorStore"])

If BrainLayer code mentions auth (e.g., "VectorStore auth methods"), test fails incorrectly. Already marked …

✅ Verified Safe

📋 Action Items

Priority order:

Would you like me to apply these fixes?
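For Bug #1, one way to avoid truncation is to remove only the matched possessive suffix rather than slicing off a fixed number of characters. This is a minimal sketch, not the hook's actual code: `strip_possessive` is a hypothetical helper name, and the handling of curly apostrophes is an added assumption.

```python
import re

# Matches a trailing "'s" or a bare trailing apostrophe.
_POSSESSIVE_SUFFIX = re.compile(r"'s?$")


def strip_possessive(word: str) -> str:
    """Return the word without a trailing possessive marker.

    Hypothetical helper: only the matched suffix is removed, so names
    that end in "s" keep their final letter.
    """
    # Assumption: prompts may contain typographic apostrophes (U+2019).
    normalized = word.replace("\u2019", "'")
    return _POSSESSIVE_SUFFIX.sub("", normalized)


print(strip_possessive("Simon's"))  # Simon
print(strip_possessive("James'"))   # James
print(strip_possessive("Jones'"))   # Jones
```

Because the substitution removes exactly what the pattern matched, it handles both possessive forms, unlike a fixed-length slice.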

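Fix option 2 for Bug #2 relies on FTS5's string syntax, in which a double quote inside a quoted string is escaped by doubling it. The sketch below shows only the query-building step; `build_fts_query` is a hypothetical name, not a function from the PR.

```python
def build_fts_query(keywords: list[str]) -> str:
    """Join keywords into an FTS5 OR query, quoting each one as a string.

    Embedded double quotes are doubled so that a keyword such as
    with"quote cannot terminate its own string early.
    """
    quoted = ['"' + kw.replace('"', '""') + '"' for kw in keywords if kw]
    return " OR ".join(quoted)


print(build_fts_query(["test", 'with"quote']))
# "test" OR "with""quote"
```

The result would still be passed to SQLite as a bound parameter of the `MATCH` expression. The escaping here addresses FTS5 query syntax, not SQL injection.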

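The idea behind the Bug #3 fix is to give both stages one shared budget, so the second stage runs only if enough time is left. The following is a rough sketch under assumptions: `DEADLINE_MS = 450` is inferred from the "under 450ms" comments in the review, `MIN_FTS_MS = 100` comes from the suggested fix, and the `now` parameter is added here only to make the helpers testable.

```python
import time
from typing import Optional

DEADLINE_MS = 450.0  # assumption: inferred from the review's "under 450ms"
MIN_FTS_MS = 100.0   # from the suggested fix: FTS needs at least 100ms


def elapsed_ms(start: float, now: Optional[float] = None) -> float:
    """Milliseconds since `start`, where `start` is a time.monotonic() value."""
    current = time.monotonic() if now is None else now
    return (current - start) * 1000.0


def remaining_ms(start: float, now: Optional[float] = None) -> float:
    """Budget left before the shared deadline, never negative."""
    return max(0.0, DEADLINE_MS - elapsed_ms(start, now))


def should_run_fts(start: float, now: Optional[float] = None) -> bool:
    """Run the FTS stage only if the minimum it needs still fits the budget."""
    return remaining_ms(start, now) >= MIN_FTS_MS


start = time.monotonic()
# ... entity detection would run here ...
if should_run_fts(start):
    pass  # run the FTS search, ideally bounded by remaining_ms(start)
```

With a single deadline measured from one `start`, a 250ms entity stage leaves 200ms for FTS, and a stage that uses 400ms causes FTS to be skipped instead of overrunning the target.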
Summary

`brainlayer-prompt-search.py` now detects known entity names (person, company, agent types) in the user's prompt and injects an `[Entity: Name — type]` section + linked chunks before FTS results.

Before/After

What Changed

- `tests/test_eval_baselines.py`: 23 pytest test cases, marked `@pytest.mark.live`, that test the real production DB. Known gaps are marked `@pytest.mark.xfail`. Run with `pytest -m live` or `python tests/test_eval_baselines.py`.
- `scripts/run_evals.py`: CLI runner. `python scripts/run_evals.py --diff` compares to the saved baseline.
- `hooks/brainlayer-prompt-search.py`: Entity detection additions:
  - `detect_entities_in_prompt()`: checks bigrams + single words (4+ chars, capitalized) against `kg_entities` WHERE `entity_type IN (person, company, agent)`
  - `kg_entity_chunks`
- `tests/eval_baselines.json`: Committed baseline. Future runs compare against this.

Test Plan

- `pytest tests/test_eval_baselines.py -q`: 22 pass, 2 xfailed, 1 xpassed
- `python tests/test_eval_baselines.py`: prints before/after scores
- `[Entity: Avi Simon — person]` in first line

🤖 Generated with Claude Code
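The entity detection described under "What Changed" can be sketched roughly as follows. This is an illustration under assumptions, not the hook's implementation: the `name` column, the helper names, and the case-insensitive match are hypothetical, while the `kg_entities` table, the `entity_type` filter, the candidate rules, and the output format come from the PR text.

```python
import re
import sqlite3

ALLOWED_TYPES = ("person", "company", "agent")


def candidate_names(prompt: str) -> list[str]:
    """Bigrams first, then single capitalized words of 4+ characters."""
    raw = re.findall(r"[A-Za-z][A-Za-z'-]*", prompt)
    # Strip a trailing possessive ("Simon's" -> "Simon", "James'" -> "James").
    words = [re.sub(r"'s?$", "", w) for w in raw]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    singles = [w for w in words if len(w) >= 4 and w[0].isupper() and not w.isupper()]
    return bigrams + singles


def detect_entities(conn: sqlite3.Connection, prompt: str) -> list[str]:
    """Return one "[Entity: Name — type]" line per matched entity."""
    lines, seen = [], set()
    for cand in candidate_names(prompt):
        row = conn.execute(
            # Assumption: kg_entities has a `name` column.
            "SELECT name, entity_type FROM kg_entities "
            "WHERE name = ? COLLATE NOCASE AND entity_type IN (?, ?, ?)",
            (cand, *ALLOWED_TYPES),
        ).fetchone()
        if row and row[0] not in seen:
            seen.add(row[0])
            lines.append(f"[Entity: {row[0]} — {row[1]}]")
    return lines


# Tiny in-memory stand-in for the production DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kg_entities (name TEXT, entity_type TEXT)")
conn.executemany(
    "INSERT INTO kg_entities VALUES (?, ?)",
    [("Avi Simon", "person"), ("Python", "technology")],
)
print(detect_entities(conn, "What did Avi Simon's team decide about Python?"))
# ['[Entity: Avi Simon — person]']
```

In this example "Python" is skipped because its type is outside the allowed set, which mirrors the PR's filter against technology/concept noise.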