Fix FTS5 recall for long multi-keyword searches #152

Merged
EtanHey merged 1 commit into main from fix/fts5-long-query-recall
Mar 30, 2026

@EtanHey
Owner

@EtanHey EtanHey commented Mar 30, 2026

Summary

  • switch FTS5 auto query escaping to use implicit AND for 1-3 terms and OR for 4+ terms
  • add a regression test proving hybrid_search() returns a result for the query "owner profile career work history years experience" when semantic search contributes nothing
  • update helper expectations so long multi-word searches no longer collapse to zero lexical matches
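
The joiner rule in the first bullet can be sketched as follows. This is a minimal illustration, not the code in src/brainlayer/_helpers.py: the function name `escape_fts5_query_sketch` is hypothetical, and splitting on whitespace and quoting every term are assumptions about how terms are prepared.

```python
def escape_fts5_query_sketch(query: str) -> str:
    """Hypothetical stand-in for the auto-mode joiner rule described above."""
    # Quote each whitespace-separated term. Embedded double quotes are
    # doubled, which is how FTS5 string literals escape them.
    terms = ['"' + term.replace('"', '""') + '"' for term in query.split()]
    # 1-3 terms: space-separated, which FTS5 treats as implicit AND.
    # 4+ terms: explicit OR, so a document matching any term qualifies.
    joiner = " OR " if len(terms) >= 4 else " "
    return joiner.join(terms)


short_query = escape_fts5_query_sketch("career work history")
long_query = escape_fts5_query_sketch(
    "owner profile career work history years experience"
)
print(short_query)
print(long_query)
```

The three-term query stays space-separated, while the seven-term query from the regression test is joined with OR.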

Test plan

  • pytest tests/test_search_gaps.py
  • pytest tests/test_search_gaps.py -k 'fts5_long_query_uses_or or hybrid_search_long_query_uses_fts_or_and_returns_results'
  • pytest tests/test_phase6_critical.py -q (fails in unrelated compact-format test)
  • pytest tests/test_enrichment_controller.py -q (fails in unrelated enrichment-controller tests)
  • pytest tests/test_vector_store.py -q (errors due to apsw.BusyError: database is locked on live DB fixture)
  • pytest tests/ (repo-wide suite has many pre-existing unrelated failures/errors outside this change)

Notes

  • Full repo tests are not currently green in this environment, so this PR is scoped to the FTS fix plus regression coverage.
  • Unrelated local worktree changes in brain-bar/build-app.sh and 2026-03-29-174250-read-collab-gitsorchestratorcollabphase5-har.txt were intentionally left out.

Note

Fix FTS5 recall for long multi-keyword searches by switching to OR matching for 4+ terms

  • In auto mode, _escape_fts5_query now joins quoted terms with OR when there are 4 or more terms, instead of implicit AND (space-separated). Queries with 3 or fewer terms retain AND behavior.
  • Behavioral Change: auto mode results for long queries will broaden — documents matching any term are returned instead of only documents matching all terms.
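
To make the behavioral change concrete, the sketch below runs the same seven terms against a small in-memory FTS5 table, once joined with spaces (implicit AND) and once joined with OR. It uses Python's standard sqlite3 module and assumes the local SQLite build includes FTS5. The table name `docs` and the sample rows are invented for illustration and are not taken from the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Requires an SQLite build with the FTS5 extension enabled.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("owner profile and career summary",),
        ("work history covering ten years",),
        ("unrelated note about build scripts",),
    ],
)

terms = ["owner", "profile", "career", "work", "history", "years", "experience"]
quoted = ['"' + term + '"' for term in terms]


def count_matches(match_expr: str) -> int:
    """Count rows in the illustrative docs table matching an FTS5 expression."""
    row = conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (match_expr,)
    ).fetchone()
    return row[0]


and_hits = count_matches(" ".join(quoted))     # implicit AND: every term required
or_hits = count_matches(" OR ".join(quoted))   # OR: any single term is enough
print(and_hits, or_hits)
```

No sample row contains all seven terms, so the AND form matches nothing, while the OR form matches the two rows that each contain some of the terms. That is the recall gain described above, and also the reason long queries now return broader result sets.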

Macroscope summarized 5f951a4.

Summary by CodeRabbit

  • Bug Fixes
    • Improved search recall for long multi-word queries (4+ terms) to help you find more relevant results.

@EtanHey
Owner Author

EtanHey commented Mar 30, 2026

@codex review

@EtanHey
Owner Author

EtanHey commented Mar 30, 2026

@greptileai review

@EtanHey
Owner Author

EtanHey commented Mar 30, 2026

@cursor @BugBot review

@EtanHey
Owner Author

EtanHey commented Mar 30, 2026

@coderabbitai review

@cursor

cursor Bot commented Mar 30, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented Mar 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 803cb9ba-e31d-41a0-97ef-39022d4f66a2

📥 Commits

Reviewing files that changed from the base of the PR and between 679236a and 5f951a4.

📒 Files selected for processing (2)
  • src/brainlayer/_helpers.py
  • tests/test_search_gaps.py

Walkthrough

The PR modifies FTS5 query escaping logic to switch from implicit AND to OR-based term joining when "auto" mode processes 4+ terms, while preserving implicit AND behavior for shorter queries. Corresponding test expectations are updated to verify the new behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| FTS5 Auto Mode Logic (src/brainlayer/_helpers.py) | Added conditional branch in _escape_fts5_query to join escaped FTS5 terms with " OR " for 4+ terms in "auto" mode; updated docstring and inline comments to reflect the new joiner selection logic. |
| Test Updates (tests/test_search_gaps.py) | Renamed test class to TestFts5AutoMode and updated docstring. Modified long-query assertions to expect OR-joined terms instead of implicit AND. Added new test test_hybrid_search_long_query_uses_fts_or_and_returns_results to verify hybrid search behavior with long multi-keyword queries. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Long queries now leap with OR instead of AND,
Four terms or more, a scattered brand!
Short hops still shuffle through the space,
FTS5's search finds its new pace! ✨

@EtanHey EtanHey merged commit 326b5fb into main Mar 30, 2026
2 of 6 checks passed