fix: restore hybrid search and patch MCP/embedding bugs#2
Conversation
- storage: detect sparse vectors via collection info instead of always returning False - search: pass av.SparseVector with using="keywords" so sparse RRF actually fires - mcp/server: wrap sync tool handlers with asyncio.to_thread to unblock event loop - mcp/tools_browse: try/finally around _handle_browse to stop leaking gRPC clients - embeddings: hash full text in cache key — 500-char prefix collided on long inputs - embeddings/ops: replace hardcoded 384/768 dims with TEXT_EMBED_DIM/CODE_EMBED_DIM
📝 WalkthroughWalkthroughConfiguration environment variables are added and Git-tracked; ruff becomes a standard dependency. Embedding dimension constants replace hardcoded values across modules. Async thread execution is applied to tool invocation to prevent event loop blocking. Resource cleanup via try/finally ensures storage connections close. Sparse vector search and sparse capability detection are refactored. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer's GuideRestores true hybrid search and fixes several runtime issues by correctly detecting sparse support in storage, using the provider’s sparse vector API in search, offloading synchronous MCP tool execution to background threads, tightening resource handling in the browse tool, fixing embedding cache-key collisions, and replacing hardcoded embedding dimensions with configuration constants (plus adding Ruff as a dev dependency). Sequence diagram for MCP call_tool execution offloaded to threadssequenceDiagram
actor User
participant MCPClient
participant MCPServer
participant ExtraTools as ExtraToolsModule
participant CoreTools as ToolsModule
User->>MCPClient: invoke tool(name, arguments)
MCPClient->>MCPServer: call_tool(name, arguments)
MCPServer->>MCPServer: enter try block
MCPServer->>ExtraTools: asyncio.to_thread(call_extra_tool, name, arguments)
activate ExtraTools
ExtraTools-->>MCPServer: result or None
deactivate ExtraTools
alt extra_tool_handled
MCPServer-->>MCPClient: TextContent list
MCPClient-->>User: render tool result
else fallback_to_core_tools
MCPServer->>CoreTools: asyncio.to_thread(tools_module.call_tool, name, arguments)
activate CoreTools
CoreTools-->>MCPServer: TextContent list
deactivate CoreTools
MCPServer-->>MCPClient: TextContent list
MCPClient-->>User: render tool result
end
opt error_during_tool_execution
MCPServer->>MCPServer: log error
MCPServer-->>MCPClient: TextContent("Context8 error: ...")
MCPClient-->>User: show error message
end
Sequence diagram for sparse search using provider SparseVector APIsequenceDiagram
participant SearchEngine
participant StorageService
participant VectorClient as VectorAIClient
SearchEngine->>SearchEngine: _search_sparse(query)
SearchEngine->>SearchEngine: compute indices, values
SearchEngine->>SearchEngine: sparse_vec = av.SparseVector(indices, values)
SearchEngine->>StorageService: storage.client
StorageService-->>SearchEngine: VectorClient
SearchEngine->>VectorClient: points.search(COLLECTION_NAME, vector=sparse_vec, using=keywords, filter=search_filter, limit=limit, with_payload=True)
VectorClient-->>SearchEngine: sparse_search_results
SearchEngine-->>SearchEngine: combine in RRF with dense results
SearchEngine-->>Caller: hybrid ranked results
Updated class diagram for storage, search, and embeddings servicesclassDiagram
class StorageService {
- client
- _sparse_supported
+ initialize() bool
+ sparse_supported() bool
- _discover_sparse_vectors(info) bool
}
class SearchEngine {
- storage StorageService
+ search(query, limit) list
- _search_sparse(query, limit, search_filter) list
- _search_dense(query, limit, search_filter) list
}
class EmbeddingsService {
- _cache dict
- _use_code_model bool
- _text_model
- _code_model
+ text_model() Any
+ code_model() Any
- _cache_key(text str, model_tag str) str
- _get_cached(text str, model_tag str) list~float~ | None
- _set_cached(text str, model_tag str, vector list~float~) None
+ embed_text(text str) list~float~
+ embed_code(code str) list~float~
}
class Config {
+ TEXT_MODEL
+ CODE_MODEL
+ TEXT_EMBED_DIM
+ CODE_EMBED_DIM
}
StorageService --> SearchEngine : used_by
EmbeddingsService --> SearchEngine : used_by
Config ..> EmbeddingsService : defines_dimensions
note for EmbeddingsService "embed_text returns [0.0] * TEXT_EMBED_DIM for blank text; embed_code returns [0.0] * CODE_EMBED_DIM or TEXT_EMBED_DIM depending on _use_code_model"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The embedding cache key now hashes the full text string, which can be very large; consider hashing the raw bytes (e.g.,
hashlib.md5(model_tag.encode() + text.encode()).hexdigest()) without building a combined f-string to avoid extra copies and potential memory overhead for long inputs. - In
_handle_browse, you now manageStorageServicewith an explicittry/finally; if this pattern appears elsewhere, it might be worth givingStorageServicea context manager interface (or a small helper) to reduce repetition and make resource lifecycle clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The embedding cache key now hashes the full text string, which can be very large; consider hashing the raw bytes (e.g., `hashlib.md5(model_tag.encode() + text.encode()).hexdigest()`) without building a combined f-string to avoid extra copies and potential memory overhead for long inputs.
- In `_handle_browse`, you now manage `StorageService` with an explicit `try/finally`; if this pattern appears elsewhere, it might be worth giving `StorageService` a context manager interface (or a small helper) to reduce repetition and make resource lifecycle clearer.
## Individual Comments
### Comment 1
<location path="pyproject.toml" line_range="26" />
<code_context>
"mcp>=1.0.0",
"click>=8.0.0",
"rich>=13.0.0",
+ "ruff>=0.15.11",
]
</code_context>
<issue_to_address>
**suggestion:** Ruff might be better scoped as a dev-only dependency instead of a runtime dependency.
Including `ruff` in `dependencies` will install it in all production environments, even though it’s only needed for linting. Unless there’s a runtime requirement, please move it to a dev/extra group (e.g., `dev` or `lint`) to avoid bloating production installs.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| "mcp>=1.0.0", | ||
| "click>=8.0.0", | ||
| "rich>=13.0.0", | ||
| "ruff>=0.15.11", |
There was a problem hiding this comment.
suggestion: Ruff might be better scoped as a dev-only dependency instead of a runtime dependency.
Including ruff in dependencies will install it in all production environments, even though it’s only needed for linting. Unless there’s a runtime requirement, please move it to a dev/extra group (e.g., dev or lint) to avoid bloating production installs.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
21-37: 🛠️ Refactor suggestion | 🟠 MajorMove
ruffback todevextras — it shouldn't be a runtime dependency.The PR description states ruff is being added as a dev dependency, but line 26 puts it in
[project].dependencies, which makes every end-user ofcontext8(including pure consumers of the MCP server / CLI) pull in ruff. Ruff is a linter and isn't imported anywhere at runtime.Also,
ruff>=0.4.0is still listed under thedevextra (line 36), so the version requirement is now contradictory and duplicated.♻️ Proposed fix
dependencies = [ "sentence-transformers>=2.2.0", "mcp>=1.0.0", "click>=8.0.0", "rich>=13.0.0", - "ruff>=0.15.11", ] [project.optional-dependencies] code = [ "transformers>=4.30.0", ] dev = [ "pytest>=7.0.0", "pytest-asyncio>=0.21.0", - "ruff>=0.4.0", + "ruff>=0.15.11", ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 21 - 37, The pyproject currently lists "ruff" under the main dependencies array and also under [project.optional-dependencies] dev, causing duplication and making ruff a runtime dependency; remove the "ruff>=0.15.11" entry from the top-level dependencies list and keep only the dev entry "ruff>=0.4.0" (or unify to the desired dev version) under the dev optional-dependencies so ruff is installed only for development.
🧹 Nitpick comments (2)
.env.example (1)
1-3: Optional: alphabetize keys to satisfy dotenv-linter.
dotenv-linterflagsUnorderedKeyon lines 2–3 (CONTEXT8_DB_HOST/CONTEXT8_DB_PORTshould come beforeHF_TOKEN). Not functional, but worth fixing to keep the lint clean.📝 Proposed reorder
-HF_TOKEN= -CONTEXT8_DB_HOST=localhost -CONTEXT8_DB_PORT=50051 +CONTEXT8_DB_HOST=localhost +CONTEXT8_DB_PORT=50051 +HF_TOKEN=🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.env.example around lines 1 - 3, Reorder the environment variable entries in .env.example so they are alphabetized to satisfy dotenv-linter: move CONTEXT8_DB_HOST and CONTEXT8_DB_PORT before HF_TOKEN (i.e., ensure the keys appear in alphabetical order: CONTEXT8_DB_HOST, CONTEXT8_DB_PORT, HF_TOKEN) while preserving the existing values/formatting.src/context8/embeddings/service.py (1)
82-86: Minor: blank-code dim trustsCONTEXT8_CODE_EMBED_DIMeven when the code model isn't actually loaded with that dim.Selecting
CODE_EMBED_DIMwhen_use_code_model=Trueis correct as long asCONTEXT8_CODE_EMBED_DIMmatches the code model's real output (e.g. 768 for CodeBERT). If a user enables the code model but forgets to set the env var (default 384), blank-code zero vectors won't match populated code-context vectors and may break upserts/searches. Consider deriving the dim from the actually-loaded model (self.code_model.get_sentence_embedding_dimension()) once it's been initialized, or asserting the two agree on first load.Not a regression from this PR — flagging for awareness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/context8/embeddings/service.py` around lines 82 - 86, embed_code currently returns zeros using the constant CODE_EMBED_DIM when self._use_code_model is true, which can mismatch the real model output dim; update embed_code to derive the zero-vector length from the initialized model instead: when self._use_code_model and self.code_model is available call self.code_model.get_sentence_embedding_dimension() to determine dim (fall back to CODE_EMBED_DIM only if model isn't loaded), and/or add an assertion on model initialization that CODE_EMBED_DIM equals self.code_model.get_sentence_embedding_dimension() to fail fast; reference symbols: embed_code, _use_code_model, CODE_EMBED_DIM, TEXT_EMBED_DIM, and self.code_model.get_sentence_embedding_dimension().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/context8/mcp/tools_browse.py`:
- Around line 99-133: The sibling function _handle_ecosystem has the same gRPC
client leak because StorageService() is created and storage.close() is only
called on the happy path; wrap the lifecycle of storage in a try/finally (like
in the other handler) so that storage.close() is always executed even if
browse() or the formatting loop throws; specifically, in _handle_ecosystem
ensure the StorageService() instantiation and the browse(...) call that returns
all_records are inside a try block and call storage.close() in the finally,
leaving any further formatting of all_records outside the try if desired.
---
Outside diff comments:
In `@pyproject.toml`:
- Around line 21-37: The pyproject currently lists "ruff" under the main
dependencies array and also under [project.optional-dependencies] dev, causing
duplication and making ruff a runtime dependency; remove the "ruff>=0.15.11"
entry from the top-level dependencies list and keep only the dev entry
"ruff>=0.4.0" (or unify to the desired dev version) under the dev
optional-dependencies so ruff is installed only for development.
---
Nitpick comments:
In @.env.example:
- Around line 1-3: Reorder the environment variable entries in .env.example so
they are alphabetized to satisfy dotenv-linter: move CONTEXT8_DB_HOST and
CONTEXT8_DB_PORT before HF_TOKEN (i.e., ensure the keys appear in alphabetical
order: CONTEXT8_DB_HOST, CONTEXT8_DB_PORT, HF_TOKEN) while preserving the
existing values/formatting.
In `@src/context8/embeddings/service.py`:
- Around line 82-86: embed_code currently returns zeros using the constant
CODE_EMBED_DIM when self._use_code_model is true, which can mismatch the real
model output dim; update embed_code to derive the zero-vector length from the
initialized model instead: when self._use_code_model and self.code_model is
available call self.code_model.get_sentence_embedding_dimension() to determine
dim (fall back to CODE_EMBED_DIM only if model isn't loaded), and/or add an
assertion on model initialization that CODE_EMBED_DIM equals
self.code_model.get_sentence_embedding_dimension() to fail fast; reference
symbols: embed_code, _use_code_model, CODE_EMBED_DIM, TEXT_EMBED_DIM, and
self.code_model.get_sentence_embedding_dimension().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df4f3356-647c-4df6-9303-5e806b6b75ca
📒 Files selected for processing (9)
.env.example.gitignorepyproject.tomlsrc/context8/cli/commands/ops.pysrc/context8/embeddings/service.pysrc/context8/mcp/server.pysrc/context8/mcp/tools_browse.pysrc/context8/search/engine.pysrc/context8/storage.py
| storage = StorageService() | ||
| try: | ||
| records = browse( | ||
| storage, | ||
| tag=args.get("tag"), | ||
| language=args.get("language"), | ||
| framework=args.get("framework"), | ||
| error_type=args.get("error_type"), | ||
| limit=args.get("limit", 20), | ||
| ) | ||
|
|
||
| if not records: | ||
| return [TextContent(type="text", text="No records match those filters.")] | ||
|
|
||
| lines = [f"Found {len(records)} record(s):\n"] | ||
| for i, r in enumerate(records, 1): | ||
| meta = [] | ||
| if r.language: | ||
| meta.append(r.language) | ||
| if r.framework: | ||
| meta.append(r.framework) | ||
| if r.error_type: | ||
| meta.append(r.error_type) | ||
| meta_str = f" ({', '.join(meta)})" if meta else "" | ||
|
|
||
| lines.append(f"[{i}] {r.problem_text[:120]}{meta_str}") | ||
| lines.append(f" Fix: {r.solution_text[:150]}") | ||
| if r.tags: | ||
| lines.append(f" Tags: {', '.join(r.tags[:5])}") | ||
| lines.append(f" ID: {r.id} Confidence: {r.confidence:.0%}") | ||
| lines.append("") | ||
|
|
||
| records = browse( | ||
| storage, | ||
| tag=args.get("tag"), | ||
| language=args.get("language"), | ||
| framework=args.get("framework"), | ||
| error_type=args.get("error_type"), | ||
| limit=args.get("limit", 20), | ||
| ) | ||
|
|
||
| if not records: | ||
| return [TextContent(type="text", text="No records match those filters.")] | ||
|
|
||
| lines = [f"Found {len(records)} record(s):\n"] | ||
| for i, r in enumerate(records, 1): | ||
| meta = [] | ||
| if r.language: | ||
| meta.append(r.language) | ||
| if r.framework: | ||
| meta.append(r.framework) | ||
| if r.error_type: | ||
| meta.append(r.error_type) | ||
| meta_str = f" ({', '.join(meta)})" if meta else "" | ||
|
|
||
| lines.append(f"[{i}] {r.problem_text[:120]}{meta_str}") | ||
| lines.append(f" Fix: {r.solution_text[:150]}") | ||
| if r.tags: | ||
| lines.append(f" Tags: {', '.join(r.tags[:5])}") | ||
| lines.append(f" ID: {r.id} Confidence: {r.confidence:.0%}") | ||
| lines.append("") | ||
|
|
||
| storage.close() | ||
| return [TextContent(type="text", text="\n".join(lines))] | ||
| return [TextContent(type="text", text="\n".join(lines))] | ||
| finally: | ||
| storage.close() |
There was a problem hiding this comment.
Same gRPC leak still present in _handle_ecosystem.
The try/finally fix here is correct and covers both the empty-records early return and any exception during browse()/formatting. However, the sibling _handle_ecosystem (lines 140-164) has the exact same leak class this PR is fixing: storage.close() is only invoked on the happy path at line 164, so any exception raised by either browse() call or in the loop body will leak the gRPC client — the same defect that prompted this PR. Worth applying the same try/finally pattern there for consistency.
🛡️ Proposed fix for `_handle_ecosystem`
def _handle_ecosystem(args: dict) -> list[TextContent]:
from ..browse import browse
from ..storage import StorageService
storage = StorageService()
+ try:
+ languages = args.get("languages", [])
+ frameworks = args.get("frameworks", [])
+ limit = args.get("limit", 10)
- languages = args.get("languages", [])
- frameworks = args.get("frameworks", [])
- limit = args.get("limit", 10)
+ all_records = []
+ seen_ids: set[str] = set()
- all_records = []
- seen_ids: set[str] = set()
+ for lang in languages:
+ records = browse(storage, language=lang, limit=limit)
+ for r in records:
+ if r.id not in seen_ids:
+ all_records.append(r)
+ seen_ids.add(r.id)
- # Collect records across all specified languages and frameworks
- for lang in languages:
- records = browse(storage, language=lang, limit=limit)
- for r in records:
- if r.id not in seen_ids:
- all_records.append(r)
- seen_ids.add(r.id)
-
- for fw in frameworks:
- records = browse(storage, framework=fw, limit=limit)
- for r in records:
- if r.id not in seen_ids:
- all_records.append(r)
- seen_ids.add(r.id)
-
- storage.close()
+ for fw in frameworks:
+ records = browse(storage, framework=fw, limit=limit)
+ for r in records:
+ if r.id not in seen_ids:
+ all_records.append(r)
+ seen_ids.add(r.id)
+ finally:
+ storage.close()(The remainder of the function, which only formats all_records, can stay outside the try since it no longer touches storage.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/context8/mcp/tools_browse.py` around lines 99 - 133, The sibling function
_handle_ecosystem has the same gRPC client leak because StorageService() is
created and storage.close() is only called on the happy path; wrap the lifecycle
of storage in a try/finally (like in the other handler) so that storage.close()
is always executed even if browse() or the formatting loop throws; specifically,
in _handle_ecosystem ensure the StorageService() instantiation and the
browse(...) call that returns all_records are inside a try block and call
storage.close() in the finally, leaving any further formatting of all_records
outside the try if desired.
Major release with contributions from @pathfindermilan: PR #2: Fix hybrid search (sparse was silently disabled), async MCP, embedding cache collision, browse resource leak, configurable dims PR #3: Docker+Podman auto-detection, self-bootstrapping serve command, --no-bootstrap flag, sparse search fix Plus: changelog in README, logo assets, demo video script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes six runtime bugs across the MCP server, search engine, storage,
and embedding pipeline. The two highest-impact fixes restore actual
hybrid search — sparse retrieval was silently disabled in production.
sparse_supportedalways returnedFalseafter thecollection already existed (every server restart). Now detects
sparse vectors via
_discover_sparse_vectors(collection_info)._search_sparseusedvector_name=+sparse_indices=kwargs and a raw values list. Server rejected it; the error was
swallowed and sparse never contributed to RRF. Now passes
av.SparseVector(indices=, values=)withusing="keywords".call_toolinvoked sync DB/embedding codedirectly, blocking the event loop. Wrapped both dispatch paths
with
asyncio.to_thread._handle_browseleaked a gRPC client on theno-results early return. Wrapped in
try/finally.text[:500], so two longinputs sharing a 500-char prefix returned the same vector. Now
hashes the full text.
384/768dims withTEXT_EMBED_DIM/CODE_EMBED_DIMsoCONTEXT8_CODE_EMBED_DIMactually flows through.
Test plan
context8 initon a fresh DB →context8 statsshowsSparse: enabledandHybrid ready: yescontext8_searchresults includekeywords@N(...)in theattribution line, proving sparse votes in RRF
context8_browsecalls with no results don't exhaustgRPC connections
Summary by Sourcery
Fix multiple runtime issues across browsing tools, MCP server, storage, search, and embeddings to restore proper hybrid search behavior and correct embedding configuration.
Bug Fixes:
Enhancements:
Build:
Summary by CodeRabbit
Chores
.gitignoreto exclude local environment files.Bug Fixes
Improvements