refactor: container management and bootstrap logic #3
This commit introduces significant changes to how Context8 manages its containerized Actian VectorAI DB and its bootstrapping process. Key changes include:

- **Generic Container Runtime Support:** The code now detects and supports both Docker and Podman, moving away from hardcoded Docker assumptions. This is reflected in user-facing messages and internal commands.
- **Improved Bootstrap (`serve` command):** The `serve` command now includes an idempotent bootstrap mechanism that ensures the database container is running, the collection is initialized, and embedding models are cached *before* the MCP server starts. This allows `context8 serve` to work even on a cold machine without prior manual setup.
- **Detached `serve` Option:** A `--no-bootstrap` flag is added to the `serve` command, allowing users to skip the auto-bootstrap if they prefer to manage these aspects manually.
- **Simplified `docker.py`:** The `docker.py` module has been refactored to be more robust and flexible, handling runtime detection and compose command variations.
- **Removed `confidence` from seed data:** The `confidence` field was removed from the `SEED_DATA` as it was not being used or populated.
- **VectorAI Search Update:** The `SearchEngine` now correctly passes sparse vector data to the `VectorAIClient.points.search` method.
Caution: Review failed. Pull request was closed or merged during review.

📝 Walkthrough

The changes introduce dynamic Docker/Podman container runtime detection, add a bootstrap step to the serve command for pre-initialization tasks, update the MCP server startup to use the CLI entry point, and generalize container runtime references throughout the codebase from Docker-specific to runtime-agnostic language.
Sequence Diagram(s)

sequenceDiagram
participant User as User/CLI
participant Serve as serve<br/>Command
participant Bootstrap as _bootstrap()<br/>Process
participant Docker as Container<br/>Runtime<br/>(Detect)
participant DB as DB<br/>Container
participant Models as Embedding<br/>Models
participant MCP as MCP<br/>Server
User->>Serve: context8 serve [--no-bootstrap]
Serve->>Serve: Check --no-bootstrap flag
alt Bootstrap enabled (default)
Serve->>Bootstrap: Run _bootstrap()
Bootstrap->>Docker: detect_runtime()
Docker-->>Bootstrap: "docker" or "podman"
Bootstrap->>DB: Start/ensure DB container running
DB-->>Bootstrap: Container running
Bootstrap->>Bootstrap: Initialize storage & create collection
Bootstrap->>Models: Attempt to download embedding models
alt Model download succeeds
Models-->>Bootstrap: Models ready
else Model download fails
Models-->>Bootstrap: Download error
Bootstrap->>Bootstrap: Log warning to stderr
end
Bootstrap-->>Serve: Bootstrap complete
else Bootstrap disabled (--no-bootstrap)
Serve->>Serve: Skip bootstrap
end
Serve->>MCP: Start MCP server
MCP-->>User: Server running
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Reviewer's Guide

Refactors container management to support Docker and Podman, adds an idempotent bootstrap flow to …

Sequence diagram for the new serve bootstrap and MCP startup flow

sequenceDiagram
actor Agent
participant CLI_serve as CLI_serve
participant DockerModule as docker
participant StorageService as StorageService
participant EmbeddingService as EmbeddingService
participant MCPServer as MCP_server
participant Runtime as container_runtime
participant VectorAIDB as VectorAI_DB_container
Agent->>CLI_serve: invoke context8 serve
activate CLI_serve
CLI_serve->>CLI_serve: check no_bootstrap flag
alt bootstrap_enabled
CLI_serve->>CLI_serve: _bootstrap()
activate CLI_serve
CLI_serve->>DockerModule: is_container_running()
activate DockerModule
DockerModule->>DockerModule: detect_runtime()
DockerModule->>Runtime: probe runtime info
Runtime-->>DockerModule: status
DockerModule-->>CLI_serve: running_state
deactivate DockerModule
alt container_not_running
CLI_serve->>DockerModule: ensure_running(timeout_secs)
activate DockerModule
DockerModule->>DockerModule: detect_runtime()
DockerModule->>DockerModule: _compose_cmd()
DockerModule->>Runtime: compose up -d
Runtime-->>DockerModule: result
DockerModule-->>CLI_serve: (ok, msg)
deactivate DockerModule
CLI_serve->>VectorAIDB: wait until ready
VectorAIDB-->>CLI_serve: ready
else container_running
CLI_serve->>CLI_serve: skip start
end
CLI_serve->>StorageService: initialize()
activate StorageService
StorageService-->>CLI_serve: created_flag
StorageService->>StorageService: close()
deactivate StorageService
CLI_serve->>EmbeddingService: ensure_models_downloaded()
activate EmbeddingService
EmbeddingService-->>CLI_serve: models_cached_or_lazy
deactivate EmbeddingService
else no_bootstrap
CLI_serve->>CLI_serve: skip _bootstrap
end
CLI_serve->>MCPServer: run_server()
activate MCPServer
MCPServer-->>Agent: MCP stdio session
deactivate MCPServer
deactivate CLI_serve
Class diagram for updated container, bootstrap, config, and search logic

classDiagram
class DockerModule {
<<module>>
+str CONTEXT8_DIR
+str COMPOSE_TEMPLATE
+str CONTAINER_NAME
+str _runtime_cache
+list~str~ _compose_cache
+Path _compose_dir()
+Path _compose_path()
+Path _ensure_compose_file()
+bool _probe(cmd)
+str detect_runtime()
+list~str~ _compose_cmd()
+CompletedProcess run_compose(args)
+bool is_container_running()
+tuple~bool, str~ ensure_running(timeout_secs)
}
class ServeCommand {
<<module>>
+serve(no_bootstrap)
-_log(msg)
-_bootstrap()
}
class ConfigModule {
<<module>>
+list~str~ get_server_command()
}
class DoctorCommand {
<<module>>
+doctor()
}
class LifecycleCommands {
<<module>>
+start(detach)
+init(seed, github, force)
}
class SearchEngine {
<<class>>
+StorageService storage
+_search_sparse(query, indices, values, search_filter, limit)
}
class VectorAIClient {
<<external>>
+points.search(collection_name, vector, using, sparse_indices, filter, limit, with_payload)
}
class StorageService {
<<class>>
+initialize()
+close()
}
class EmbeddingService {
<<class>>
+ensure_models_downloaded()
}
ServeCommand --> DockerModule : uses
ServeCommand --> StorageService : uses
ServeCommand --> EmbeddingService : uses
DoctorCommand --> DockerModule : detect_runtime(), is_container_running()
LifecycleCommands --> DockerModule : ensure_running()
ConfigModule --> ServeCommand : routes to serve
SearchEngine --> VectorAIClient : uses points.search
SearchEngine --> StorageService : uses client
DockerModule ..> VectorAIClient : DB connection via container
`get_server_command()` now returns `context8 serve` instead of `python -m context8.mcp.server`. Test assertion updated to check structure rather than exact command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hey - I've found 5 security issues, 3 other issues, and left some high level feedback:
Security issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'CompletedProcess' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- In `_bootstrap`, catching a broad `Exception` around `EmbeddingService.ensure_models_downloaded()` risks hiding real bugs in model setup; consider narrowing this to expected exceptions (e.g. network or model-availability errors) so unexpected failures still surface.
- The `_runtime_cache` and `_compose_cache` values in `docker.py` are never invalidated, so a process that starts before a runtime/daemon is available will continue to report no runtime/compose even if the user later starts Docker/Podman; if this module is used in any long-lived process, consider adding a way to force a re-probe.
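One way to picture the forced re-probe from the second comment is a cached detector that takes a `force` flag. This is only a sketch under assumptions: the `force` and `which` parameters and the `_probed` flag are hypothetical, and the check here only looks for a binary on `PATH`, whereas the real module probes the runtime itself.

```python
import shutil
from typing import Callable, Optional

# Hypothetical module-level cache; _probed distinguishes "not probed yet"
# from "probed and found nothing", so no empty-string sentinel is needed.
_runtime_cache: Optional[str] = None
_probed = False


def detect_runtime(
    force: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "docker" or "podman" if found on PATH, else None.

    The result is cached; pass force=True to discard the cache and probe again.
    """
    global _runtime_cache, _probed
    if _probed and not force:
        return _runtime_cache
    # Assumption: presence on PATH stands in for the real `info` probe.
    _runtime_cache = next(
        (name for name in ("docker", "podman") if which(name)), None
    )
    _probed = True
    return _runtime_cache
```

A long-lived process could call `detect_runtime(force=True)` after a failed start attempt, so a runtime installed or started later is picked up without restarting the process.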
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_bootstrap`, catching a broad `Exception` around `EmbeddingService.ensure_models_downloaded()` risks hiding real bugs in model setup; consider narrowing this to expected exceptions (e.g. network or model-availability errors) so unexpected failures still surface.
- The `_runtime_cache` and `_compose_cache` values in `docker.py` are never invalidated, so a process that starts before a runtime/daemon is available will continue to report no runtime/compose even if the user later starts Docker/Podman; if this module is used in any long-lived process, consider adding a way to force a re-probe.
## Individual Comments
### Comment 1
<location path="src/context8/cli/commands/serve.py" line_range="30-36" />
<code_context>
+    print(f"[context8] {msg}", file=sys.stderr, flush=True)
+
+
+def _bootstrap() -> None:
+    """Idempotent bootstrap: container up, collection ready, models cached.
+
+    Safe to run on every `serve` invocation — each step is a no-op when already
+    satisfied. All output goes to stderr so the MCP stdio protocol stays clean.
+    """
+    from ...docker import ensure_running, is_container_running
+
+    if not is_container_running():
+        _log("starting DB container...")
+        ok, msg = ensure_running(timeout_secs=30)
+        if not ok:
+            _log(f"FATAL: container failed to start: {msg}")
+            raise SystemExit(1)
+        _log(f"container ready ({msg})")
+
+    from ...storage import StorageService
+
+    storage = StorageService()
+    created = storage.initialize()
+    if created:
+        _log("collection created")
+    storage.close()
+
+    try:
</code_context>
<issue_to_address>
**suggestion:** Consider handling storage initialization failures to produce a clearer fatal message.
If `StorageService()` or `storage.initialize()` fails (e.g. DB unreachable, bad config), the exception will currently bubble up and crash without the clear stderr message you use for the container bootstrap. Consider wrapping storage initialization in a try/except, logging a `FATAL` via `_log`, and exiting with `SystemExit(1)` so failures are reported consistently and predictably.
```suggestion
    from ...storage import StorageService

    storage = None
    try:
        storage = StorageService()
        created = storage.initialize()
        if created:
            _log("collection created")
    except Exception as e:
        _log(f"FATAL: storage initialization failed: {e}")
        raise SystemExit(1)
    finally:
        if storage is not None:
            try:
                storage.close()
            except Exception as e:
                _log(f"warning: failed to close storage cleanly ({e})")
```
</issue_to_address>
### Comment 2
<location path="tests/test_agents.py" line_range="60-62" />
<code_context>
         data = json.loads(config_path.read_text())
         assert "context8" in data["mcpServers"]
         entry = data["mcpServers"]["context8"]
-        assert entry["command"] == "python"
-        assert entry["args"] == ["-m", "context8.mcp.server"]
+        assert "command" in entry
+        assert isinstance(entry["args"], list)

     def test_idempotent(self, tmp_path):
</code_context>
<issue_to_address>
**issue (testing):** The updated assertions are very loose and no longer guarantee that the MCP server command is wired to the new `serve` entrypoint.
The previous test verified that Claude launched `python -m context8.mcp.server`. Now it only checks that a command exists and `args` is a list, which would still pass even if the config pointed to the wrong binary or arguments.
To keep this test useful with `get_server_command`, I suggest either:
- asserting that `entry['command']`/`entry['args']` exactly match `context8.config.get_server_command()` (import and compare the full list), or
- asserting that `'serve'` appears in the args and that the command is either the `context8` script or the current interpreter.
This will ensure the test still proves agents are routed through the `serve` CLI rather than just checking for any command.
</issue_to_address>
### Comment 3
<location path="src/context8/docker.py" line_range="37" />
<code_context>
CONTAINER_NAME = "context8_db"
+_runtime_cache: str | None = None
+_compose_cache: list[str] | None = None
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying runtime and compose detection by replacing the multiple helper functions and global caches with a single linear resolver that returns both the compose command and runtime.
You can drop most of the indirection by collapsing runtime/compose detection into a single linear resolver and removing the global caches/sentinels. That keeps Docker/Podman support but makes the control flow much easier to follow.
For example, you can replace `_runtime_cache`, `_compose_cache`, `_probe`, `detect_runtime` and `_compose_cmd` with something like:
```python
def _iter_compose_candidates() -> list[tuple[list[str], str]]:
    # command, inferred runtime
    return [
        (["docker", "compose"], "docker"),
        (["docker-compose"], "docker"),
        (["podman", "compose"], "podman"),
        (["podman-compose"], "podman"),
    ]


def _resolve_compose() -> tuple[list[str], str] | None:
    """Return (compose_cmd, runtime) or None if nothing is usable."""
    for cmd, runtime in _iter_compose_candidates():
        try:
            subprocess.run(
                cmd + ["version"],
                capture_output=True,
                check=True,
                timeout=5,
            )
            return cmd, runtime
        except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired):
            continue
    return None
```
Then `run_compose`, `is_container_running`, and `ensure_running` become simpler and don’t need to know about probing or caches:
```python
def run_compose(args: list[str]) -> subprocess.CompletedProcess:
    """Run a compose command against the Context8 compose file."""
    _ensure_compose_file()
    resolved = _resolve_compose()
    if resolved is None:
        return subprocess.CompletedProcess(
            args=args,
            returncode=1,
            stdout="",
            stderr=(
                "no compose tool found — install one of: "
                "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
            ),
        )
    cmd, _runtime = resolved
    return subprocess.run(
        cmd + args,
        cwd=str(_compose_dir()),
        capture_output=True,
        text=True,
    )
```
```python
def is_container_running() -> bool:
    """Check if the context8_db container is running under docker or podman."""
    resolved = _resolve_compose()
    if resolved is None:
        return False
    _cmd, runtime = resolved
    try:
        result = subprocess.run(
            [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
            capture_output=True,
            text=True,
            timeout=5,
        )
        return bool(result.stdout.strip()) and "Up" in result.stdout
    except Exception:
        return False
```
```python
def ensure_running(timeout_secs: int = 30) -> tuple[bool, str]:
    """Start the container if not running, wait for it to be healthy."""
    if is_container_running():
        return True, "already running"
    resolved = _resolve_compose()
    runtime = resolved[1] if resolved is not None else "container runtime"
    logger.info(f"Starting Actian VectorAI DB container via {runtime}...")
    result = run_compose(["up", "-d"])
    if result.returncode != 0:
        return False, f"compose up failed: {result.stderr.strip()}"
    ...
```
This keeps all existing functionality (Docker + Podman, the specific error message and `CompletedProcess` return from `run_compose`) but removes:
- global `_runtime_cache` / `_compose_cache` and the `""`/`[]` sentinels
- the two-phase `info`/`--version` probing split across `detect_runtime` and `_compose_cmd`
- the generic `_probe()` helper used in subtly different ways
The behavior becomes: *“try a small ordered set of compose commands, pick the first working one, infer runtime from that”*, which matches the reviewer’s suggested mental model and is easier to reason about and test.
</issue_to_address>
### Comment 4
<location path="src/context8/cli/commands/ops.py" line_range="76" />
<code_context>
result = subprocess.run([runtime, "info"], capture_output=True, text=True, timeout=5)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 5
<location path="src/context8/docker.py" line_range="63" />
<code_context>
subprocess.run(cmd, capture_output=True, check=True, timeout=5)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 6
<location path="src/context8/docker.py" line_range="136-144" />
<code_context>
        return subprocess.CompletedProcess(
            args=args,
            returncode=1,
            stdout="",
            stderr=(
                "no compose tool found — install one of: "
                "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
            ),
        )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'CompletedProcess' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 7
<location path="src/context8/docker.py" line_range="145-149" />
<code_context>
    return subprocess.run(
        cmd + args,
        cwd=str(_compose_dir()),
        capture_output=True,
        text=True,
    )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Comment 8
<location path="src/context8/docker.py" line_range="159-163" />
<code_context>
        result = subprocess.run(
            [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
            capture_output=True,
            text=True,
            timeout=5,
        )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
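The subprocess findings above flag calls whose argv is built at runtime. As a rough illustration of how such a call can be audited, the sketch below restricts the runtime name to an allow-list and passes arguments as a list, so no shell ever parses them. `ALLOWED_RUNTIMES`, `build_ps_command`, and `echo_argument` are hypothetical names for this example, not part of the PR.

```python
import subprocess
import sys

# Hypothetical allow-list: only these binaries may be used as the runtime.
ALLOWED_RUNTIMES = ("docker", "podman")


def build_ps_command(runtime: str, container_name: str) -> list[str]:
    """Build the `ps` argv, rejecting any runtime outside the allow-list."""
    if runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"unsupported runtime: {runtime!r}")
    return [
        runtime, "ps",
        "--filter", f"name={container_name}",
        "--format", "{{.Status}}",
    ]


def echo_argument(value: str) -> str:
    """Pass `value` to a child Python process as a single argv entry.

    With a list argv and no shell=True, shell metacharacters in `value`
    are delivered literally instead of being interpreted.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout.strip()
```

Because the list form bypasses the shell, quoting helpers are only needed when a command string is actually handed to a shell.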
Pull request overview
This PR refactors Context8’s container/runtime management and MCP server startup so context8 serve can self-bootstrap the Actian VectorAI DB (Docker or Podman) before starting the stdio MCP loop.
Changes:
- Add Docker/Podman runtime detection plus compose-command resolution, and update user-facing lifecycle/doctor messaging accordingly.
- Route agent MCP launch config through `context8 serve` (instead of `python -m context8.mcp.server`) and add a `--no-bootstrap` escape hatch.
- Fix sparse search invocation to pass sparse indices/values in the shape expected by the VectorAI client, and remove `confidence` fields from seed data.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/context8/docker.py` | Adds runtime + compose probing/caching and updates container start/check logic to be runtime-agnostic. |
| `src/context8/cli/commands/serve.py` | Adds idempotent bootstrap before starting MCP server; adds `--no-bootstrap`. |
| `src/context8/config.py` | Updates MCP server command generation to invoke `context8 serve`. |
| `src/context8/cli/commands/ops.py` | Updates `doctor` to check a generic container runtime rather than Docker-only. |
| `src/context8/cli/commands/lifecycle.py` | Updates `init`/`start` messaging to refer to Docker or Podman. |
| `src/context8/search/engine.py` | Updates sparse-vector search parameter passing. |
| `src/context8/ingest/seed.py` | Removes per-record `confidence` fields from `SEED_DATA`. |
| `tests/test_agents.py` | Loosens assertions around generated agent MCP server command/args. |
        assert "command" in entry
        assert isinstance(entry["args"], list)

The assertions here became very loose (`"command" in entry` and `args` is a list), which can let regressions slip through (e.g., missing the new `serve` subcommand). Consider making this deterministic by patching `shutil.which("context8")` and then asserting the generated `{command, args}` matches one of the expected shapes (installed script + `serve`, or `sys.executable -m context8 serve`).
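A deterministic version of that test might look like the sketch below. The `get_server_command` defined here is a hypothetical stand-in written for this example (the real function lives in `context8.config` and may differ); only the two expected shapes are taken from the comment above.

```python
import shutil
import sys
from unittest import mock


def get_server_command() -> list[str]:
    """Hypothetical stand-in for context8.config.get_server_command()."""
    script = shutil.which("context8")
    if script:
        # Installed console script is available: call it directly.
        return [script, "serve"]
    # Fall back to the current interpreter.
    return [sys.executable, "-m", "context8", "serve"]


def test_server_command_shapes() -> None:
    # Case 1: the script is on PATH.
    with mock.patch("shutil.which", return_value="/usr/local/bin/context8"):
        assert get_server_command() == ["/usr/local/bin/context8", "serve"]
    # Case 2: the script is missing, so the interpreter form is used.
    with mock.patch("shutil.which", return_value=None):
        assert get_server_command() == [
            sys.executable, "-m", "context8", "serve",
        ]


test_server_command_shapes()
```

Patching `shutil.which` removes the dependency on whatever happens to be installed on the test machine, so both branches are exercised on every run.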
| if not is_container_running(): | ||
| _log("starting DB container...") | ||
| ok, msg = ensure_running(timeout_secs=30) | ||
| if not ok: | ||
| _log(f"FATAL: container failed to start: {msg}") | ||
| raise SystemExit(1) | ||
| _log(f"container ready ({msg})") | ||
|
|
There was a problem hiding this comment.
_bootstrap() only calls ensure_running() when is_container_running() is false. If the container exists but the DB is still starting (or unhealthy), the bootstrap will skip the health-check/wait and may fail immediately when StorageService() connects. Consider calling ensure_running() unconditionally here (and letting it be idempotent) or otherwise explicitly waiting for VectorAIClient.health_check() even when the container is already "Up".
| - if not is_container_running(): | |
| -     _log("starting DB container...") | |
| -     ok, msg = ensure_running(timeout_secs=30) | |
| -     if not ok: | |
| -         _log(f"FATAL: container failed to start: {msg}") | |
| -         raise SystemExit(1) | |
| -     _log(f"container ready ({msg})") | |
| + was_running = is_container_running() | |
| + if was_running: | |
| +     _log("verifying DB container readiness...") | |
| + else: | |
| +     _log("starting DB container...") | |
| + ok, msg = ensure_running(timeout_secs=30) | |
| + if not ok: | |
| +     _log(f"FATAL: container failed to start: {msg}") | |
| +     raise SystemExit(1) | |
| + _log(f"container ready ({msg})") |
| created = storage.initialize() | ||
| if created: | ||
| _log("collection created") | ||
| storage.close() |
storage.close() won’t run if storage.initialize() raises (e.g., DB connection issue). Wrap the StorageService lifecycle in a try/finally (or context manager) so the client connection is always closed during bootstrap failures.
| - created = storage.initialize() | |
| - if created: | |
| -     _log("collection created") | |
| - storage.close() | |
| + try: | |
| +     created = storage.initialize() | |
| +     if created: | |
| +         _log("collection created") | |
| + finally: | |
| +     storage.close() |
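The context-manager alternative mentioned in the comment could be sketched with `contextlib.closing`, which calls `close()` on exit even when the body raises. `FakeStorage` and `init_collection` are hypothetical names used only for illustration; the real `StorageService` is assumed to expose `initialize()` and `close()` as in the diff.

```python
from contextlib import closing


class FakeStorage:
    """Hypothetical stand-in for StorageService; only initialize() and close() are modelled."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False

    def initialize(self) -> bool:
        if self.fail:
            raise ConnectionError("DB not reachable")
        return True  # True means the collection was created

    def close(self) -> None:
        self.closed = True


def init_collection(storage) -> bool:
    # closing() calls storage.close() on exit, whether or not initialize() raised.
    with closing(storage):
        return storage.initialize()
```

The behavior matches the `try`/`finally` suggestion; the `with` form simply keeps the cleanup next to the acquisition.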
| if is_container_running(): | ||
| return True, "already running" |
ensure_running() returns (True, "already running") purely based on docker/podman ps output, without verifying the DB is actually accepting connections. Since callers use this as a readiness gate, consider still running the VectorAIClient.health_check() loop when the container is already running (and only return success once the DB responds).
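One way to make the "already running" path a real readiness gate is a small polling helper like the sketch below. `wait_until_healthy` is a hypothetical name, and the health check is passed in as a plain callable because the exact return value of `VectorAIClient.health_check()` is not shown here; the sketch assumes it returns something truthy once the DB responds.

```python
import time
from typing import Callable, Tuple


def wait_until_healthy(
    health_check: Callable[[], bool],
    timeout_secs: float = 30.0,
    interval_secs: float = 0.5,
) -> Tuple[bool, str]:
    """Poll health_check until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout_secs
    while True:
        try:
            if health_check():
                return True, "healthy"
        except Exception:
            # Errors raised while the DB is still starting count as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False, f"not healthy after {timeout_secs:.0f}s"
        time.sleep(interval_secs)
```

Under these assumptions, `ensure_running()` could call the helper on both branches, so `(True, ...)` is returned only after the DB has actually answered.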
| "error_type": "ModuleNotFoundError", | ||
| "language": "python", | ||
| "tags": ["virtual-env", "opencv", "import"], | ||
| "confidence": 0.95, | ||
| }, |
These seed records no longer include a confidence field, but seed ingestion still applies confidence=data.get("confidence", 0.9) (so all seed records now get the same default). If confidence is meant to influence ranking/UX, consider keeping per-record values; if it’s truly unused, consider removing/ignoring confidence during seed ingestion to avoid an unintentional behavior change.
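The behavior change can be seen in a few lines. The records below are trimmed-down illustrations, and `resolve_confidence` is a hypothetical helper that mirrors the `data.get("confidence", 0.9)` call quoted above.

```python
DEFAULT_CONFIDENCE = 0.9  # fallback quoted in the review comment


def resolve_confidence(data: dict) -> float:
    # Mirrors data.get("confidence", 0.9) from the seed ingestion path.
    return data.get("confidence", DEFAULT_CONFIDENCE)


before = {"error_type": "ModuleNotFoundError", "confidence": 0.95}
after = {"error_type": "ModuleNotFoundError"}  # field removed in this PR

print(resolve_confidence(before))  # 0.95
print(resolve_confidence(after))   # 0.9
```

Once the field is gone, every seed record resolves to the same default, so any per-record differences are lost.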
Major release with contributions from @pathfindermilan:
PR #2: Fix hybrid search (sparse was silently disabled), async MCP, embedding cache collision, browse resource leak, configurable dims
PR #3: Docker+Podman auto-detection, self-bootstrapping serve command, --no-bootstrap flag, sparse search fix
Plus: changelog in README, logo assets, demo video script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit introduces significant changes to how Context8 manages its containerized Actian VectorAI DB and its bootstrapping process.
Key changes include:
- **Generic Container Runtime Support:** The code now detects and supports both Docker and Podman, moving away from hardcoded Docker assumptions. This is reflected in user-facing messages and internal commands.
- **Improved Bootstrap (`serve` command):** The `serve` command now includes an idempotent bootstrap mechanism that ensures the database container is running, the collection is initialized, and embedding models are cached before the MCP server starts. This allows `context8 serve` to work even on a cold machine without prior manual setup.
- **Detached `serve` Option:** A `--no-bootstrap` flag is added to the `serve` command, allowing users to skip the auto-bootstrap if they prefer to manage these aspects manually.
- **Simplified `docker.py`:** The `docker.py` module has been refactored to be more robust and flexible, handling runtime detection and compose command variations.
- **Removed `confidence` from seed data:** The `confidence` field was removed from the `SEED_DATA` as it was not being used or populated.
- **VectorAI Search Update:** The `SearchEngine` now correctly passes sparse vector data to the `VectorAIClient.points.search` method.
Summary by Sourcery
Refactor container management and server bootstrap to support multiple runtimes and ensure the MCP server can start from a cold environment.
New Features:
- […] `serve` command that auto-starts the DB container, initializes storage, and pre-downloads embedding models by default.
- […] `--no-bootstrap` option to the `serve` CLI command to allow manual control over initialization steps.
Bug Fixes:
- […] `VectorAIClient.points.search`.
- […] `serve` entry point instead of the internal module path, keeping stdio bootstrap-safe.
Enhancements:
Documentation:
Tests:
Chores:
- […] `confidence` attribute from ingestion seed data entries.
Summary by CodeRabbit
New Features
- […] `--no-bootstrap` flag to serve command for skipping initialization steps
Improvements