fix(seekdb): fix ann_search returning no rows in embedded mode #67
Merged
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
… cursor fix:
- Auto-flush seekdb async HNSW index build after insert so ann_search returns results
- Set cursor._description=[] (not None) for empty SELECT to prevent ResourceClosedError
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Pull request overview
This PR fixes embedded SeekDB ann_search correctness by addressing result-lifecycle and cursor-metadata edge cases, and by ensuring async HNSW index builds are flushed before querying. It also aligns FTS analyzer test assertions with the actual emitted DDL formatting.
Changes:
- Ensure empty row-returning statements in embedded SeekDB produce a row-returning SQLAlchemy result (avoids `ResourceClosedError` on `fetchall()`).
- Add an embedded SeekDB index flush after inserts to avoid querying before the async HNSW build completes.
- Buffer vector-search query results so they remain consumable after leaving the connection context; update FTS test assertions for parser properties formatting.
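The first change depends on how a DB-API cursor signals that a statement returns rows. The sketch below illustrates that idea under stated assumptions: `SketchCursor` and `is_row_returning` are hypothetical stand-ins, not pyobvector's `_SeekdbCursor` or SQLAlchemy's internals. It assumes, as the PR describes, that a consumer treats `description is None` as "no result set" and anything else, including an empty list, as row-returning.

```python
from typing import Optional, Sequence


class SketchCursor:
    """Hypothetical DB-API-style cursor; not the real _SeekdbCursor."""

    def __init__(self) -> None:
        self.description: Optional[list] = None
        self._rows: list = []

    def load(self, rows: Sequence[tuple], returns_rows: bool) -> None:
        """Store rows and set description the way the fix describes."""
        self._rows = list(rows)
        if not returns_rows:
            # INSERT/UPDATE/DDL: there is no result set at all.
            self.description = None
        elif rows:
            # Placeholder column names; a real driver reports actual ones.
            width = len(rows[0])
            self.description = [
                (f"col{i}", None, None, None, None, None, None)
                for i in range(width)
            ]
        else:
            # SELECT with 0 rows: an empty list, deliberately not None.
            self.description = []

    def fetchall(self) -> list:
        rows, self._rows = self._rows, []
        return rows


def is_row_returning(cursor: SketchCursor) -> bool:
    """Simplified stand-in for the check a result wrapper performs."""
    return cursor.description is not None


if __name__ == "__main__":
    cur = SketchCursor()
    cur.load([], returns_rows=True)
    # An empty SELECT still counts as row-returning, so fetchall() is safe.
    print(is_row_returning(cur), cur.fetchall())
```

The point of the distinction is that "zero rows" and "not a query" are different states; collapsing both to `None` is what made an empty SELECT look like a statement that cannot be fetched from.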
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/test_fts_index.py | Fixes analyzer DDL assertion strings to match emitted PARSER_PROPERTIES formatting. |
| pyobvector/client/seekdb_engine.py | Adjusts empty-query cursor metadata handling; stores embedded server reference on the engine for later index refresh. |
| pyobvector/client/ob_client.py | Adds a post-insert() embedded SeekDB index flush helper. |
| pyobvector/client/ob_vec_client.py | Buffers results returned from vector search methods to avoid dead results after connection closure. |
- Replace engine._seekdb_server with engine.update_execution_options() and engine.get_execution_options() to avoid writing to private Engine attributes (addresses Copilot review comment)
- Replace _BufferedResult/_MappingsResult with Result.freeze()() so callers receive a full SQLAlchemy Result with all standard APIs (.first(), .scalars(), .mappings(), etc.)
- Wrap server.refresh_index() in try/except so a flush failure after insert is demoted to a warning instead of aborting the write
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
…docstring
- Add _description_from_select() using sqlglot to extract column names when a SELECT returns 0 rows, so cursor.description carries real names instead of an empty list
- Add module-level logger to seekdb_engine and log parse failures at DEBUG
- Fix _flush_seekdb_index docstring to reflect hasattr gate rather than a version check
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Summary
Fixes two bugs that caused `ann_search` to always fail when using embedded SeekDB (`ObVecClient(path=...)` / `SeekdbRemoteClient(path=...)`).
Also fixes two broken test assertions introduced by a Copilot suggestion on PR #65.
Changes
- `seekdb_engine.py`: Set `cursor._description = []` (not `None`) when a SELECT/SHOW/DESCRIBE returns 0 rows, so SQLAlchemy creates a row-returning `CursorResult` instead of `_NoResultMetaData`, which raised `ResourceClosedError` on `fetchall()`.
- `seekdb_engine.py`: Store the embedded server reference via `engine.update_execution_options(seekdb_server=server)` (public API) so the flush helper can reach it without touching private Engine attributes.
- `ob_client.py`: Add `_flush_seekdb_index()` and call it at the end of `insert()`. HNSW index builds in seekdb are async; without an explicit flush, `ann_search` queries the index before it is built and returns 0 rows. The helper is a no-op for non-seekdb engines and for seekdb versions < 1.3.0. Refresh failures are demoted to a `logger.warning` so a flush error never aborts a successful write.
- `ob_vec_client.py`: Buffer results from `ann_search`, `post_ann_search`, and `precise_search` using `Result.freeze()()` (SQLAlchemy's built-in buffering) instead of a bespoke `_BufferedResult` wrapper. This preserves the full SQLAlchemy `Result` interface (`.first()`, `.scalars()`, `.mappings()`, etc.) for callers.
- `tests/test_fts_index.py`: Fix two assertions in `FtsAnalyzerCompilationTest` that expected `PARSER_PROPERTIES = (...)` with a space after `=`; the actual DDL output and reflection regex both use `PARSER_PROPERTIES=(...)` without a space.
Motivation
With embedded SeekDB, every call to `ann_search` raised `sqlalchemy.exc.ResourceClosedError`. The root causes were:
- `ann_search` returned `conn.execute()` directly from inside a `with engine.connect()` block. Once the block exited, the connection closed, leaving callers with a dead `CursorResult`.
- When `_execute_via_pyseekdb` returns `[]` (e.g., HNSW index not yet built), `_SeekdbCursor` set `_description = None`, which caused SQLAlchemy to mark the result as non-row-returning.
- `ann_search` immediately after `insert` finds an empty index until a `CALL dbms_index_manager.refresh()` flush is issued (supported from seekdb ≥ 1.3.0).
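The post-insert flush described above can be sketched as a best-effort helper. This is an illustration under assumptions, not the code in `ob_client.py`: `flush_index`, `FakeServer`, and the plain options mapping are hypothetical, and the only names taken from the PR are the `seekdb_server` execution option and a `refresh_index()` method on the server object (its real signature may differ).

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("seekdb_flush_sketch")


def flush_index(execution_options: Mapping[str, Any]) -> bool:
    """Best-effort index flush; True only if a refresh ran without error."""
    server = execution_options.get("seekdb_server")
    if server is None:
        # Not an embedded seekdb engine: nothing to flush.
        return False
    if not hasattr(server, "refresh_index"):
        # Server object without the refresh hook: skip silently.
        return False
    try:
        server.refresh_index()
    except Exception as exc:
        # A failed flush must not turn a successful insert into an error.
        logger.warning("index refresh failed after insert: %s", exc)
        return False
    return True


class FakeServer:
    """Hypothetical stand-in for the embedded server object."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.calls = 0

    def refresh_index(self) -> None:
        if self.fail:
            raise RuntimeError("simulated refresh failure")
        self.calls += 1


if __name__ == "__main__":
    server = FakeServer()
    print(flush_index({"seekdb_server": server}), server.calls)
```

Gating on `hasattr` instead of a version comparison keeps the helper tolerant of server objects that lack the method, and catching the exception keeps the write path independent of the flush outcome.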