Skip to content

fix(seekdb): fix ann_search returning no rows in embedded mode#67

Merged
whhe merged 6 commits into
oceanbase:mainfrom
whhe:fix/seekdb-ann-search-cursor
Jun 4, 2026
Merged

fix(seekdb): fix ann_search returning no rows in embedded mode#67
whhe merged 6 commits into
oceanbase:mainfrom
whhe:fix/seekdb-ann-search-cursor

Conversation

@whhe

@whhe whhe commented Jun 4, 2026

Copy link
Copy Markdown
Member

Summary

Fixes two bugs that caused ann_search to always fail when using embedded SeekDB
(ObVecClient(path=...) / SeekdbRemoteClient(path=...)).

Also fixes two broken test assertions introduced by a Copilot suggestion on PR #65.

Changes

  • seekdb_engine.py: Set cursor._description = [] (not None) when a SELECT/SHOW/DESCRIBE returns 0 rows, so SQLAlchemy creates a row-returning CursorResult instead of _NoResultMetaData — which raised ResourceClosedError on fetchall().
  • seekdb_engine.py: Store the embedded server reference via engine.update_execution_options(seekdb_server=server) (public API) so the flush helper can reach it without touching private Engine attributes.
  • ob_client.py: Add _flush_seekdb_index() and call it at the end of insert(). HNSW index builds in seekdb are async; without an explicit flush, ann_search queries the index before it is built and returns 0 rows. The helper is a no-op for non-seekdb engines and for seekdb versions < 1.3.0. Refresh failures are demoted to a logger.warning so a flush error never aborts a successful write.
  • ob_vec_client.py: Buffer results from ann_search, post_ann_search, and precise_search using Result.freeze()() (SQLAlchemy's built-in buffering) instead of a bespoke _BufferedResult wrapper. This preserves the full SQLAlchemy Result interface (.first(), .scalars(), .mappings(), etc.) for callers.
  • tests/test_fts_index.py: Fix two assertions in FtsAnalyzerCompilationTest that expected PARSER_PROPERTIES = (...) with a space after =; the actual DDL output and reflection regex both use PARSER_PROPERTIES=(...) without a space.

Motivation

With embedded SeekDB, every call to ann_search raised sqlalchemy.exc.ResourceClosedError. The root causes were:

  1. Connection lifetime: ann_search returned conn.execute() directly from inside a with engine.connect() block. Once the block exited the connection closed, leaving callers with a dead CursorResult.
  2. Empty-SELECT cursor metadata: When _execute_via_pyseekdb returns [] (e.g., HNSW index not yet built), _SeekdbCursor set _description = None, which caused SQLAlchemy to mark the result as non-row-returning.
  3. Async HNSW build: seekdb builds HNSW indexes asynchronously. Any ann_search immediately after insert finds an empty index until a CALL dbms_index_manager.refresh() flush is issued (supported from seekdb ≥ 1.3.0).

@whhe whhe changed the title fix(client): buffer search results before connection closes fix(seekdb): fix ann_search returning no rows in embedded mode Jun 4, 2026
whhe and others added 3 commits June 4, 2026 23:23
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
… cursor

fix:
- Auto-flush seekdb async HNSW index build after insert so ann_search returns results
- Set cursor._description=[] (not None) for empty SELECT to prevent ResourceClosedError

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@whhe whhe force-pushed the fix/seekdb-ann-search-cursor branch from c2a9811 to 9f8eebe Compare June 4, 2026 15:40
@whhe whhe requested a review from Copilot June 4, 2026 16:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes embedded SeekDB ann_search correctness by addressing result-lifecycle and cursor-metadata edge cases, and by ensuring async HNSW index builds are flushed before querying. It also aligns FTS analyzer test assertions with the actual emitted DDL formatting.

Changes:

  • Ensure empty row-returning statements in embedded SeekDB produce a row-returning SQLAlchemy result (avoids ResourceClosedError on fetchall()).
  • Add an embedded SeekDB index flush after inserts to avoid querying before the async HNSW build completes.
  • Buffer/vector-search query results so they remain consumable after leaving the connection context; update FTS test assertions for parser properties formatting.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.

File Description
tests/test_fts_index.py Fixes analyzer DDL assertion strings to match emitted PARSER_PROPERTIES formatting.
pyobvector/client/seekdb_engine.py Adjusts empty-query cursor metadata handling; stores embedded server reference on the engine for later index refresh.
pyobvector/client/ob_client.py Adds a post-insert() embedded SeekDB index flush helper.
pyobvector/client/ob_vec_client.py Buffers results returned from vector search methods to avoid dead results after connection closure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyobvector/client/seekdb_engine.py Outdated
Comment thread pyobvector/client/ob_client.py Outdated
Comment thread pyobvector/client/ob_vec_client.py Outdated
Comment thread pyobvector/client/ob_vec_client.py Outdated
Comment thread pyobvector/client/ob_vec_client.py Outdated
Comment thread pyobvector/client/ob_vec_client.py Outdated
Comment thread pyobvector/client/ob_vec_client.py Outdated
- Replace engine._seekdb_server with engine.update_execution_options()
  and engine.get_execution_options() to avoid writing to private Engine
  attributes (addresses Copilot review comment)
- Replace _BufferedResult/_MappingsResult with Result.freeze()() so
  callers receive a full SQLAlchemy Result with all standard APIs
  (.first(), .scalars(), .mappings(), etc.)
- Wrap server.refresh_index() in try/except so a flush failure after
  insert is demoted to a warning instead of aborting the write

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

Comment thread pyobvector/client/seekdb_engine.py Outdated
Comment thread pyobvector/client/ob_client.py
whhe and others added 2 commits June 5, 2026 02:11
…docstring

- Add _description_from_select() using sqlglot to extract column names
  when a SELECT returns 0 rows, so cursor.description carries real names
  instead of an empty list
- Add module-level logger to seekdb_engine and log parse failures at DEBUG
- Fix _flush_seekdb_index docstring to reflect hasattr gate rather than
  a version check

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated no new comments.

@whhe whhe merged commit 920de97 into oceanbase:main Jun 4, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants