refactor: container management and bootstrap logic by pathfindermilan · Pull Request #3 · hallelx2/context8

pathfindermilan · 2026-04-25T15:45:57Z

This commit introduces significant changes to how Context8 manages its containerized Actian VectorAI DB and its bootstrapping process.

Key changes include:

- **Generic Container Runtime Support:** The code now detects and supports both Docker and Podman, moving away from hardcoded Docker assumptions. This is reflected in user-facing messages and internal commands.
- **Improved Bootstrap (`serve` command):** The `serve` command now includes an idempotent bootstrap mechanism that ensures the database container is running, the collection is initialized, and embedding models are cached *before* the MCP server starts. This allows `context8 serve` to work even on a cold machine without prior manual setup.
- **Detached `serve` Option:** A `--no-bootstrap` flag is added to the `serve` command, allowing users to skip the auto-bootstrap if they prefer to manage these aspects manually.
- **Simplified `docker.py`:** The `docker.py` module has been refactored to be more robust and flexible, handling runtime detection and compose command variations.
- **Removed `confidence` from seed data:** The `confidence` field was removed from `SEED_DATA` because it was not being used or populated.
- **VectorAI Search Update:** The `SearchEngine` now correctly passes sparse vector data to the `VectorAIClient.points.search` method.
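The runtime detection described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from `docker.py`: the name `pick_runtime`, the Docker-first preference order, and the injectable `which` and `responds` parameters are assumptions, added so the logic can be exercised on a machine with no container runtime installed.

```python
import shutil
import subprocess
from typing import Callable, Optional

# Preference order is an assumption; the PR only says both runtimes are supported.
RUNTIMES = ("docker", "podman")


def _daemon_responds(runtime: str) -> bool:
    """Return True if `<runtime> info` exits successfully within 5 seconds."""
    try:
        subprocess.run([runtime, "info"], capture_output=True, check=True, timeout=5)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # OSError covers a missing or non-executable binary.
        return False


def pick_runtime(
    which: Callable[[str], Optional[str]] = shutil.which,
    responds: Callable[[str], bool] = _daemon_responds,
) -> Optional[str]:
    """Return the first runtime that is on PATH and answers `info`, else None."""
    for runtime in RUNTIMES:
        if which(runtime) and responds(runtime):
            return runtime
    return None
```

Passing fake `which` and `responds` callables lets a test cover the "Podman only" and "nothing installed" cases without touching the host.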

Summary by Sourcery

Refactor container management and server bootstrap to support multiple runtimes and ensure the MCP server can start from a cold environment.

New Features:

Add generic container runtime detection to support both Docker and Podman for compose and container operations.
Introduce an idempotent bootstrap phase in the serve command that auto-starts the DB container, initializes storage, and pre-downloads embedding models by default.
Add a --no-bootstrap option to the serve CLI command to allow manual control over initialization steps.
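How the flag gates the bootstrap phase can be sketched as below. The thread does not say which CLI framework Context8 uses, so `argparse` is a stand-in, and `build_parser` plus the `bootstrap` and `run_server` callables are hypothetical names used only for illustration.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a `serve` subcommand and a --no-bootstrap flag."""
    parser = argparse.ArgumentParser(prog="context8")
    sub = parser.add_subparsers(dest="command", required=True)
    serve_cmd = sub.add_parser("serve", help="run the MCP server")
    serve_cmd.add_argument(
        "--no-bootstrap",
        action="store_true",
        help="skip container, collection, and model setup",
    )
    return parser


def serve(
    no_bootstrap: bool,
    bootstrap: Callable[[], None],
    run_server: Callable[[], None],
) -> None:
    """Run the bootstrap step unless it was disabled, then start the server."""
    if not no_bootstrap:
        bootstrap()
    run_server()
```

Bootstrap is the default here, matching the summary; the flag only ever removes work, so a plain `context8 serve` keeps working on a cold machine.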

Bug Fixes:

Correct VectorAI sparse search calls to pass values and sparse indices directly to VectorAIClient.points.search.
Update the MCP server command configuration so agents launch the serve entry point instead of the internal module path, keeping stdio bootstrap-safe.

Enhancements:

Improve container lifecycle messaging and doctor checks to be runtime-agnostic and more informative when daemons are unavailable.
Refine the docker/compose utility module to probe available runtimes and compose frontends robustly and cache detection results.

Documentation:

Clarify user-facing help text and docstrings around initialization, container usage, and the recommended MCP server command.

Tests:

Relax agent configuration tests to match the new, more flexible MCP server command and argument configuration.

Chores:

Remove the unused confidence attribute from ingestion seed data entries.

Summary by CodeRabbit

New Features
- Added --no-bootstrap flag to serve command for skipping initialization steps
- Container runtime detection now supports both Docker and Podman automatically
Improvements
- Generalized messaging and error reporting from Docker-specific to "container runtime" terminology
- Enhanced health checks and diagnostics for runtime availability detection


coderabbitai · 2026-04-25T15:46:03Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The changes introduce dynamic Docker/Podman container runtime detection, add a bootstrap step to the serve command for pre-initialization tasks, update the MCP server startup to use the CLI entry point, and generalize container runtime references throughout the codebase from Docker-specific to runtime-agnostic language.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Container Runtime Detection & Generalization: `src/context8/docker.py`, `src/context8/cli/commands/lifecycle.py`, `src/context8/cli/commands/ops.py` | Introduces `detect_runtime()` to probe for Docker or Podman availability; updates `run_compose()`, `is_container_running()`, and health check logic to use the detected runtime dynamically; generalizes messaging from "Docker" to generic container runtime references. |
| Serve Command Bootstrap Feature: `src/context8/cli/commands/serve.py` | Adds `--no-bootstrap` flag and new `_bootstrap()` step that starts the DB container, initializes storage, and pre-downloads embedding models; failures during embedding download are logged as warnings without blocking server startup. |
| MCP Server Launch Refactor: `src/context8/config.py` | Changes `get_server_command()` from invoking `python -m context8.mcp.server` to using the installed `context8 serve` CLI entry point; attempts to locate the `context8` script on PATH with an interpreter-based fallback. |
| Search & Data Updates: `src/context8/search/engine.py`, `src/context8/ingest/seed.py` | Refactors sparse keyword search to pass sparse values and indices as separate parameters; removes per-record confidence fields from seed data, allowing default fallback values. |
| Test Relaxation: `tests/test_agents.py` | Reduces coupling in the agent configuration test by validating field presence and types rather than exact command/args values. |
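The "script on PATH with interpreter-based fallback" resolution in the table can be sketched as follows. This is an assumption-laden illustration, not the body of `get_server_command()`: the function name `resolve_server_command` is hypothetical, and the exact fallback arguments are not shown in the thread, so `-m context8 serve` is a guess at one plausible form.

```python
import shutil
import sys
from typing import Callable, List, Optional


def resolve_server_command(
    which: Callable[[str], Optional[str]] = shutil.which,
    interpreter: str = sys.executable,
) -> List[str]:
    """Prefer the installed `context8` script; otherwise use the interpreter."""
    script = which("context8")
    if script:
        return [script, "serve"]
    # Assumed fallback form; the real arguments are not stated in the PR.
    return [interpreter, "-m", "context8", "serve"]
```

Either branch ends in `serve`, which is the property that keeps agents on the bootstrap-aware entry point.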

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Serve as serve<br/>Command
    participant Bootstrap as _bootstrap()<br/>Process
    participant Docker as Container<br/>Runtime<br/>(Detect)
    participant DB as DB<br/>Container
    participant Models as Embedding<br/>Models
    participant MCP as MCP<br/>Server
    
    User->>Serve: context8 serve [--no-bootstrap]
    Serve->>Serve: Check --no-bootstrap flag
    alt Bootstrap enabled (default)
        Serve->>Bootstrap: Run _bootstrap()
        Bootstrap->>Docker: detect_runtime()
        Docker-->>Bootstrap: "docker" or "podman"
        Bootstrap->>DB: Start/ensure DB container running
        DB-->>Bootstrap: Container running
        Bootstrap->>Bootstrap: Initialize storage & create collection
        Bootstrap->>Models: Attempt to download embedding models
        alt Model download succeeds
            Models-->>Bootstrap: Models ready
        else Model download fails
            Models-->>Bootstrap: Download error
            Bootstrap->>Bootstrap: Log warning to stderr
        end
        Bootstrap-->>Serve: Bootstrap complete
    else Bootstrap disabled (--no-bootstrap)
        Serve->>Serve: Skip bootstrap
    end
    Serve->>MCP: Start MCP server
    MCP-->>User: Server running
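The failure policy in the diagram (container failure is fatal, model download failure is only a warning) can be sketched like this. The three step callables and the choice of `OSError` as the "expected" download failure are assumptions; the thread does not state which exceptions the real download raises.

```python
import sys
from typing import Callable, Tuple


def log(msg: str) -> None:
    """Write to stderr so stdout stays free for the MCP stdio protocol."""
    print(f"[context8] {msg}", file=sys.stderr, flush=True)


def bootstrap(
    start_container: Callable[[], Tuple[bool, str]],
    init_storage: Callable[[], bool],
    download_models: Callable[[], None],
) -> None:
    """Run the three bootstrap steps with the fatal/warning split from the diagram."""
    ok, msg = start_container()
    if not ok:
        log(f"FATAL: container failed to start: {msg}")
        raise SystemExit(1)

    if init_storage():
        log("collection created")

    try:
        download_models()
    except OSError as exc:  # assumed failure type (network, disk)
        log(f"warning: model download failed, continuing ({exc})")
```

Because every message goes through `log`, nothing from the bootstrap phase reaches stdout before the server starts.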

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly Related PRs

PR #2: Modifies sparse-vector search construction in src/context8/search/engine.py — related to the sparse indices/values parameter refactoring.
PR #1: Updates the same CLI command modules and Docker/runtime detection logic in docker.py and config.py — overlapping edits to container runtime selection and health checks.

Poem

🐰 Hop-hop, no more Docker-only ways,
Podman joins the merry phase!
Bootstrap boots the DB awake,
Models downloaded for embedding's sake,
Runtime-agnostic, we prance and play! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main changes: refactoring container management (Docker/Podman support) and adding bootstrap logic to the serve command.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


sourcery-ai · 2026-04-25T15:46:40Z

Reviewer's Guide

Refactors container management to support Docker and Podman, adds an idempotent bootstrap flow to context8 serve, routes agent startup through the CLI, and cleans up seed/search behavior and user-facing messaging.

Sequence diagram for the new serve bootstrap and MCP startup flow

sequenceDiagram
    actor Agent
    participant CLI_serve as CLI_serve
    participant DockerModule as docker
    participant StorageService as StorageService
    participant EmbeddingService as EmbeddingService
    participant MCPServer as MCP_server
    participant Runtime as container_runtime
    participant VectorAIDB as VectorAI_DB_container

    Agent->>CLI_serve: invoke context8 serve
    activate CLI_serve
    CLI_serve->>CLI_serve: check no_bootstrap flag
    alt bootstrap_enabled
        CLI_serve->>CLI_serve: _bootstrap()
        activate CLI_serve
        CLI_serve->>DockerModule: is_container_running()
        activate DockerModule
        DockerModule->>DockerModule: detect_runtime()
        DockerModule->>Runtime: probe runtime info
        Runtime-->>DockerModule: status
        DockerModule-->>CLI_serve: running_state
        deactivate DockerModule
        alt container_not_running
            CLI_serve->>DockerModule: ensure_running(timeout_secs)
            activate DockerModule
            DockerModule->>DockerModule: detect_runtime()
            DockerModule->>DockerModule: _compose_cmd()
            DockerModule->>Runtime: compose up -d
            Runtime-->>DockerModule: result
            DockerModule-->>CLI_serve: (ok, msg)
            deactivate DockerModule
            CLI_serve->>VectorAIDB: wait until ready
            VectorAIDB-->>CLI_serve: ready
        else container_running
            CLI_serve->>CLI_serve: skip start
        end

        CLI_serve->>StorageService: initialize()
        activate StorageService
        StorageService-->>CLI_serve: created_flag
        StorageService->>StorageService: close()
        deactivate StorageService

        CLI_serve->>EmbeddingService: ensure_models_downloaded()
        activate EmbeddingService
        EmbeddingService-->>CLI_serve: models_cached_or_lazy
        deactivate EmbeddingService
    else no_bootstrap
        CLI_serve->>CLI_serve: skip _bootstrap
    end

    CLI_serve->>MCPServer: run_server()
    activate MCPServer
    MCPServer-->>Agent: MCP stdio session
    deactivate MCPServer
    deactivate CLI_serve

Class diagram for updated container, bootstrap, config, and search logic

classDiagram
    class DockerModule {
        <<module>>
        +str CONTEXT8_DIR
        +str COMPOSE_TEMPLATE
        +str CONTAINER_NAME
        +str _runtime_cache
        +list~str~ _compose_cache
        +Path _compose_dir()
        +Path _compose_path()
        +Path _ensure_compose_file()
        +bool _probe(cmd)
        +str detect_runtime()
        +list~str~ _compose_cmd()
        +CompletedProcess run_compose(args)
        +bool is_container_running()
        +tuple~bool, str~ ensure_running(timeout_secs)
    }

    class ServeCommand {
        <<module>>
        +serve(no_bootstrap)
        -_log(msg)
        -_bootstrap()
    }

    class ConfigModule {
        <<module>>
        +list~str~ get_server_command()
    }

    class DoctorCommand {
        <<module>>
        +doctor()
    }

    class LifecycleCommands {
        <<module>>
        +start(detach)
        +init(seed, github, force)
    }

    class SearchEngine {
        <<class>>
        +StorageService storage
        +_search_sparse(query, indices, values, search_filter, limit)
    }

    class VectorAIClient {
        <<external>>
        +points.search(collection_name, vector, using, sparse_indices, filter, limit, with_payload)
    }

    class StorageService {
        <<class>>
        +initialize()
        +close()
    }

    class EmbeddingService {
        <<class>>
        +ensure_models_downloaded()
    }

    ServeCommand --> DockerModule : uses
    ServeCommand --> StorageService : uses
    ServeCommand --> EmbeddingService : uses

    DoctorCommand --> DockerModule : detect_runtime(), is_container_running()

    LifecycleCommands --> DockerModule : ensure_running()

    ConfigModule --> ServeCommand : routes to serve

    SearchEngine --> VectorAIClient : uses points.search
    SearchEngine --> StorageService : uses client

    DockerModule ..> VectorAIClient : DB connection via container

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Generalize container runtime and compose handling to support both Docker and Podman with cached detection and improved error reporting. | Introduce runtime and compose command probing helpers that detect usable Docker or Podman installations and cache results. Update compose command resolution to try multiple docker/podman compose variants and return a structured failure when none are available. Make container status checks and startup logic runtime-agnostic and update log messages to refer to a generic container runtime. | `src/context8/docker.py`, `src/context8/cli/commands/lifecycle.py`, `src/context8/cli/commands/ops.py` |
| Make `context8 serve` self-bootstrapping while allowing opt-out and keeping MCP stdio output clean. | Add a bootstrap routine that ensures the DB container is running, the collection is initialized, and embedding models are downloaded before starting the MCP server. Ensure bootstrap logging goes only to stderr and add a `--no-bootstrap` flag to allow manual container/collection management. Adjust the serve command implementation to invoke bootstrap before running the MCP server event loop. | `src/context8/cli/commands/serve.py` |
| Route MCP server startup through the `serve` CLI entry point for better portability and bootstrap behavior. | Change the server command configuration to prefer the installed `context8` script with the `serve` subcommand, falling back to the current Python interpreter when necessary. Relax agent configuration tests to assert shape rather than exact command/args values to accommodate the new resolution logic. | `src/context8/config.py`, `tests/test_agents.py` |
| Align seed data and search behavior with actual usage and VectorAI expectations. | Remove the unused `confidence` field from all seed records to match ingestion behavior. Fix sparse search by sending values as the main vector and passing indices via the `sparse_indices` argument to `VectorAIClient.points.search`. | `src/context8/ingest/seed.py`, `src/context8/search/engine.py` |
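The sparse search fix in the last row (values as the main vector, indices via `sparse_indices`) can be illustrated by building the call arguments without touching the real client. The keyword names come from the class diagram earlier in this guide; the helper names, the `"sparse"` value for `using`, and the dict-based input format are assumptions.

```python
from typing import Any, Dict, List, Tuple


def split_sparse(weights: Dict[int, float]) -> Tuple[List[int], List[float]]:
    """Split an index-to-weight mapping into parallel, index-sorted lists."""
    indices = sorted(weights)
    values = [weights[i] for i in indices]
    return indices, values


def sparse_search_kwargs(
    collection_name: str,
    weights: Dict[int, float],
    limit: int = 10,
) -> Dict[str, Any]:
    """Build keyword arguments shaped like the points.search call in the diagram."""
    indices, values = split_sparse(weights)
    return {
        "collection_name": collection_name,
        "vector": values,           # sparse values go in as the main vector
        "using": "sparse",          # hypothetical vector name
        "sparse_indices": indices,  # positions travel separately
        "limit": limit,
        "with_payload": True,
    }
```

Sorting the indices keeps the two lists aligned and deterministic, so position `k` in `vector` always pairs with position `k` in `sparse_indices`.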


`get_server_command()` now returns `context8 serve` instead of `python -m context8.mcp.server`. The test assertion was updated to check structure rather than the exact command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sourcery-ai

Hey - I've found 5 security issues, 3 other issues, and left some high level feedback:

Security issues:

Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
Detected subprocess function 'CompletedProcess' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
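One way to audit the flagged calls is to confirm that the only variable element, the runtime name, comes from a fixed allowlist and that arguments are passed as a list rather than a shell string. The sketch below assumes hypothetical names (`ALLOWED_RUNTIMES`, `build_ps_command`). Note that the scanner message mentions `shlex.escape()`, but the standard-library helper is `shlex.quote()`, and it is only needed when composing a shell string, which the list form avoids.

```python
from typing import List

# Only these executables may ever be launched (assumed policy).
ALLOWED_RUNTIMES = frozenset({"docker", "podman"})


def build_ps_command(runtime: str, container_name: str) -> List[str]:
    """Return an argv list for `<runtime> ps`, rejecting unknown runtimes."""
    if runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"unsupported container runtime: {runtime!r}")
    # Each element is passed to the process verbatim; no shell parses it.
    return [
        runtime,
        "ps",
        "--filter",
        f"name={container_name}",
        "--format",
        "{{.Status}}",
    ]
```

The returned list can be handed to `subprocess.run(...)` without `shell=True`, so metacharacters in any element are never interpreted by a shell.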

General comments:

In _bootstrap, catching a broad Exception around EmbeddingService.ensure_models_downloaded() risks hiding real bugs in model setup; consider narrowing this to expected exceptions (e.g. network or model-availability errors) so unexpected failures still surface.
The _runtime_cache and _compose_cache values in docker.py are never invalidated, so a process that starts before a runtime/daemon is available will continue to report no runtime/compose even if the user later starts Docker/Podman; if this module is used in any long-lived process, consider adding a way to force a re-probe.
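The second point could be addressed along these lines. This is a sketch under assumptions, not a patch for `docker.py`: the `RuntimeCache` class and its `force` parameter are hypothetical, and it chooses to cache only successful probes so that a runtime started later is picked up on the next call.

```python
from typing import Callable, Optional


class RuntimeCache:
    """Remember a successful runtime probe; retry after failures or on demand."""

    def __init__(self, probe: Callable[[], Optional[str]]) -> None:
        self._probe = probe
        self._value: Optional[str] = None

    def get(self, force: bool = False) -> Optional[str]:
        """Return the cached runtime, re-probing if empty or if force is set."""
        if force or self._value is None:
            self._value = self._probe()
        return self._value
```

A negative result is never remembered, so a long-lived process keeps probing until a runtime appears, while `force=True` covers the case where the user switches runtimes.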

Prompt for AI Agents

Please address the comments from this code review:


## Individual Comments

### Comment 1
<location path="src/context8/cli/commands/serve.py" line_range="30-36" />
<code_context>
+    print(f"[context8] {msg}", file=sys.stderr, flush=True)
+
+
+def _bootstrap() -> None:
+    """Idempotent bootstrap: container up, collection ready, models cached.
+
+    Safe to run on every `serve` invocation — each step is a no-op when already
+    satisfied. All output goes to stderr so the MCP stdio protocol stays clean.
+    """
+    from ...docker import ensure_running, is_container_running
+
+    if not is_container_running():
+        _log("starting DB container...")
+        ok, msg = ensure_running(timeout_secs=30)
+        if not ok:
+            _log(f"FATAL: container failed to start: {msg}")
+            raise SystemExit(1)
+        _log(f"container ready ({msg})")
+
+    from ...storage import StorageService
+
+    storage = StorageService()
+    created = storage.initialize()
+    if created:
+        _log("collection created")
+    storage.close()
+
+    try:
</code_context>
<issue_to_address>
**suggestion:** Consider handling storage initialization failures to produce a clearer fatal message.

If `StorageService()` or `storage.initialize()` fails (e.g. DB unreachable, bad config), the exception will currently bubble up and crash without the clear stderr message you use for the container bootstrap. Consider wrapping storage initialization in a try/except, logging a `FATAL` via `_log`, and exiting with `SystemExit(1)` so failures are reported consistently and predictably.

```suggestion
    from ...storage import StorageService

    storage = None
    try:
        storage = StorageService()
        created = storage.initialize()
        if created:
            _log("collection created")
    except Exception as e:
        _log(f"FATAL: storage initialization failed: {e}")
        raise SystemExit(1)
    finally:
        if storage is not None:
            try:
                storage.close()
            except Exception as e:
                _log(f"warning: failed to close storage cleanly ({e})")
```
</issue_to_address>

### Comment 2
<location path="tests/test_agents.py" line_range="60-62" />
<code_context>
             data = json.loads(config_path.read_text())
             assert "context8" in data["mcpServers"]
             entry = data["mcpServers"]["context8"]
-            assert entry["command"] == "python"
-            assert entry["args"] == ["-m", "context8.mcp.server"]
+            assert "command" in entry
+            assert isinstance(entry["args"], list)

     def test_idempotent(self, tmp_path):
</code_context>
<issue_to_address>
**issue (testing):** The updated assertions are very loose and no longer guarantee that the MCP server command is wired to the new `serve` entrypoint.

The previous test verified that Claude launched `python -m context8.mcp.server`. Now it only checks that a command exists and `args` is a list, which would still pass even if the config pointed to the wrong binary or arguments.

To keep this test useful with `get_server_command`, I suggest either:
- asserting that `entry['command']`/`entry['args']` exactly match `context8.config.get_server_command()` (import and compare the full list), or
- asserting that `'serve'` appears in the args and that the command is either the `context8` script or the current interpreter.

This will ensure the test still proves agents are routed through the `serve` CLI rather than just checking for any command.
</issue_to_address>
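The second option from Comment 2 could look roughly like the helper below. It is a sketch: `routes_through_serve` is a hypothetical name, and it assumes the agent config stores the executable under `command` and the remaining arguments under `args`, as the test excerpt suggests.

```python
import os
import sys
from typing import Any, Dict


def routes_through_serve(entry: Dict[str, Any]) -> bool:
    """Check that an MCP server entry launches the `serve` entry point."""
    command = entry.get("command", "")
    args = entry.get("args")
    if not isinstance(args, list) or "serve" not in args:
        return False
    # Accept the installed script (context8, context8.exe) or this interpreter.
    is_script = os.path.basename(command).startswith("context8")
    is_interpreter = command == sys.executable
    return is_script or is_interpreter
```

In the test, `assert routes_through_serve(entry)` would replace the two shape-only assertions while still tolerating either resolution path.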

### Comment 3
<location path="src/context8/docker.py" line_range="37" />
<code_context>

 CONTAINER_NAME = "context8_db"

+_runtime_cache: str | None = None
+_compose_cache: list[str] | None = None
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying runtime and compose detection by replacing the multiple helper functions and global caches with a single linear resolver that returns both the compose command and runtime.

You can drop most of the indirection by collapsing runtime/compose detection into a single linear resolver and removing the global caches/sentinels. That keeps Docker/Podman support but makes the control flow much easier to follow.

For example, you can replace `_runtime_cache`, `_compose_cache`, `_probe`, `detect_runtime` and `_compose_cmd` with something like:

```python
def _iter_compose_candidates() -> list[tuple[list[str], str]]:
    # command, inferred runtime
    return [
        (["docker", "compose"], "docker"),
        (["docker-compose"], "docker"),
        (["podman", "compose"], "podman"),
        (["podman-compose"], "podman"),
    ]


def _resolve_compose() -> tuple[list[str], str] | None:
    """Return (compose_cmd, runtime) or None if nothing is usable."""
    for cmd, runtime in _iter_compose_candidates():
        try:
            subprocess.run(
                cmd + ["version"],
                capture_output=True,
                check=True,
                timeout=5,
            )
            return cmd, runtime
        except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired):
            continue
    return None
```

Then `run_compose`, `is_container_running`, and `ensure_running` become simpler and don’t need to know about probing or caches:

```python
def run_compose(args: list[str]) -> subprocess.CompletedProcess:
    """Run a compose command against the Context8 compose file."""
    _ensure_compose_file()
    resolved = _resolve_compose()
    if resolved is None:
        return subprocess.CompletedProcess(
            args=args,
            returncode=1,
            stdout="",
            stderr=(
                "no compose tool found — install one of: "
                "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
            ),
        )

    cmd, _runtime = resolved
    return subprocess.run(
        cmd + args,
        cwd=str(_compose_dir()),
        capture_output=True,
        text=True,
    )
```

```python
def is_container_running() -> bool:
    """Check if the context8_db container is running under docker or podman."""
    resolved = _resolve_compose()
    if resolved is None:
        return False

    _cmd, runtime = resolved
    try:
        result = subprocess.run(
            [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
            capture_output=True,
            text=True,
            timeout=5,
        )
        return bool(result.stdout.strip()) and "Up" in result.stdout
    except Exception:
        return False
```

```python
def ensure_running(timeout_secs: int = 30) -> tuple[bool, str]:
    """Start the container if not running, wait for it to be healthy."""
    if is_container_running():
        return True, "already running"

    resolved = _resolve_compose()
    runtime = resolved[1] if resolved is not None else "container runtime"
    logger.info(f"Starting Actian VectorAI DB container via {runtime}...")

    result = run_compose(["up", "-d"])
    if result.returncode != 0:
        return False, f"compose up failed: {result.stderr.strip()}"
    ...
```

This keeps all existing functionality (Docker + Podman, the specific error message and `CompletedProcess` return from `run_compose`) but removes:

- global `_runtime_cache` / `_compose_cache` and the `""`/`[]` sentinels
- the two-phase `info`/`--version` probing split across `detect_runtime` and `_compose_cmd`
- the generic `_probe()` helper used in subtly different ways

The behavior becomes: *“try a small ordered set of compose commands, pick the first working one, infer runtime from that”*, which matches the reviewer’s suggested mental model and is easier to reason about and test.
</issue_to_address>

### Comment 4
<location path="src/context8/cli/commands/ops.py" line_range="76" />
<code_context>
            result = subprocess.run([runtime, "info"], capture_output=True, text=True, timeout=5)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Comment 5
<location path="src/context8/docker.py" line_range="63" />
<code_context>
        subprocess.run(cmd, capture_output=True, check=True, timeout=5)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Comment 6
<location path="src/context8/docker.py" line_range="136-144" />
<code_context>
        return subprocess.CompletedProcess(
            args=args,
            returncode=1,
            stdout="",
            stderr=(
                "no compose tool found — install one of: "
                "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
            ),
        )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'CompletedProcess' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Comment 7
<location path="src/context8/docker.py" line_range="145-149" />
<code_context>
    return subprocess.run(
        cmd + args,
        cwd=str(_compose_dir()),
        capture_output=True,
        text=True,
    )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Comment 8
<location path="src/context8/docker.py" line_range="159-163" />
<code_context>
        result = subprocess.run(
            [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
            capture_output=True,
            text=True,
            timeout=5,
        )
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>


sourcery-ai · 2026-04-25T15:50:51Z

+    from ...storage import StorageService
+
+    storage = StorageService()
+    created = storage.initialize()
+    if created:
+        _log("collection created")
+    storage.close()


suggestion: Consider handling storage initialization failures to produce a clearer fatal message.

If StorageService() or storage.initialize() fails (e.g. DB unreachable, bad config), the exception will currently bubble up and crash without the clear stderr message you use for the container bootstrap. Consider wrapping storage initialization in a try/except, logging a FATAL via _log, and exiting with SystemExit(1) so failures are reported consistently and predictably.

Suggested change

-    from ...storage import StorageService
-
-    storage = StorageService()
-    created = storage.initialize()
-    if created:
-        _log("collection created")
-    storage.close()
+    from ...storage import StorageService
+
+    storage = None
+    try:
+        storage = StorageService()
+        created = storage.initialize()
+        if created:
+            _log("collection created")
+    except Exception as e:
+        _log(f"FATAL: storage initialization failed: {e}")
+        raise SystemExit(1)
+    finally:
+        if storage is not None:
+            try:
+                storage.close()
+            except Exception as e:
+                _log(f"warning: failed to close storage cleanly ({e})")

sourcery-ai · 2026-04-25T15:50:51Z

            entry = data["mcpServers"]["context8"]
-            assert entry["command"] == "python"
-            assert entry["args"] == ["-m", "context8.mcp.server"]
+            assert "command" in entry
+            assert isinstance(entry["args"], list)


issue (testing): The updated assertions are very loose and no longer guarantee that the MCP server command is wired to the new serve entrypoint.

The previous test verified that Claude launched python -m context8.mcp.server. Now it only checks that a command exists and args is a list, which would still pass even if the config pointed to the wrong binary or arguments.

To keep this test useful with get_server_command, I suggest either:

asserting that entry['command']/entry['args'] exactly match context8.config.get_server_command() (import and compare the full list), or

asserting that 'serve' appears in the args and that the command is either the context8 script or the current interpreter.

This will ensure the test still proves agents are routed through the serve CLI rather than just checking for any command.

sourcery-ai · 2026-04-25T15:50:51Z


 CONTAINER_NAME = "context8_db"

+_runtime_cache: str | None = None


issue (complexity): Consider simplifying runtime and compose detection by replacing the multiple helper functions and global caches with a single linear resolver that returns both the compose command and runtime.

You can drop most of the indirection by collapsing runtime/compose detection into a single linear resolver and removing the global caches/sentinels. That keeps Docker/Podman support but makes the control flow much easier to follow.

For example, you can replace _runtime_cache, _compose_cache, _probe, detect_runtime and _compose_cmd with something like:

    def _iter_compose_candidates() -> list[tuple[list[str], str]]:
        # command, inferred runtime
        return [
            (["docker", "compose"], "docker"),
            (["docker-compose"], "docker"),
            (["podman", "compose"], "podman"),
            (["podman-compose"], "podman"),
        ]


    def _resolve_compose() -> tuple[list[str], str] | None:
        """Return (compose_cmd, runtime) or None if nothing is usable."""
        for cmd, runtime in _iter_compose_candidates():
            try:
                subprocess.run(
                    cmd + ["version"],
                    capture_output=True,
                    check=True,
                    timeout=5,
                )
                return cmd, runtime
            except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired):
                continue
        return None

Then run_compose, is_container_running, and ensure_running become simpler and don’t need to know about probing or caches:

    def run_compose(args: list[str]) -> subprocess.CompletedProcess:
        """Run a compose command against the Context8 compose file."""
        _ensure_compose_file()
        resolved = _resolve_compose()
        if resolved is None:
            return subprocess.CompletedProcess(
                args=args,
                returncode=1,
                stdout="",
                stderr=(
                    "no compose tool found — install one of: "
                    "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
                ),
            )
        cmd, _runtime = resolved
        return subprocess.run(
            cmd + args,
            cwd=str(_compose_dir()),
            capture_output=True,
            text=True,
        )

    def is_container_running() -> bool:
        """Check if the context8_db container is running under docker or podman."""
        resolved = _resolve_compose()
        if resolved is None:
            return False
        _cmd, runtime = resolved
        try:
            result = subprocess.run(
                [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
                capture_output=True,
                text=True,
                timeout=5,
            )
            return bool(result.stdout.strip()) and "Up" in result.stdout
        except Exception:
            return False

    def ensure_running(timeout_secs: int = 30) -> tuple[bool, str]:
        """Start the container if not running, wait for it to be healthy."""
        if is_container_running():
            return True, "already running"
        resolved = _resolve_compose()
        runtime = resolved[1] if resolved is not None else "container runtime"
        logger.info(f"Starting Actian VectorAI DB container via {runtime}...")
        result = run_compose(["up", "-d"])
        if result.returncode != 0:
            return False, f"compose up failed: {result.stderr.strip()}"
        ...

This keeps all existing functionality (Docker + Podman, the specific error message and CompletedProcess return from run_compose) but removes:

global _runtime_cache / _compose_cache and the ""/[] sentinels

the two-phase info/--version probing split across detect_runtime and _compose_cmd

the generic _probe() helper used in subtly different ways

The behavior becomes: “try a small ordered set of compose commands, pick the first working one, infer runtime from that”, which matches the reviewer’s suggested mental model and is easier to reason about and test.

sourcery-ai · 2026-04-25T15:50:51Z

+        )
+    else:
+        try:
+            result = subprocess.run([runtime, "info"], capture_output=True, text=True, timeout=5)


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep

sourcery-ai · 2026-04-25T15:50:51Z

-        return ["docker", "compose"]
-    except (subprocess.CalledProcessError, FileNotFoundError):
-        return ["docker-compose"]
+        subprocess.run(cmd, capture_output=True, check=True, timeout=5)


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep

sourcery-ai · 2026-04-25T15:50:51Z

+        return subprocess.CompletedProcess(
+            args=args,
+            returncode=1,
+            stdout="",
+            stderr=(
+                "no compose tool found — install one of: "
+                "`docker compose`, `podman compose`, `podman-compose`, `docker-compose`"
+            ),
+        )


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'CompletedProcess' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep

sourcery-ai · 2026-04-25T15:50:51Z

    return subprocess.run(
-        cmd,
+        cmd + args,
        cwd=str(_compose_dir()),
        capture_output=True,
        text=True,


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep

sourcery-ai · 2026-04-25T15:50:51Z

        result = subprocess.run(
-            ["docker", "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{{{.Status}}}}"],
+            [runtime, "ps", "--filter", f"name={CONTAINER_NAME}", "--format", "{{.Status}}"],
            capture_output=True,
            text=True,
            timeout=5,


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep
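
A note for anyone auditing the findings above: the flagged `subprocess.run` calls pass an argument list rather than a shell string, so no shell parses the arguments. The standard library function for shell quoting is `shlex.quote()` (there is no `shlex.escape()`), and it only matters when a command is handed to a shell as a single string. The sketch below illustrates both points; it uses the current Python interpreter as a stand-in for the container runtime, and the hostile value is hypothetical (in `docker.py` the interpolated name is a constant). Whether `args` passed into `run_compose` can be externally influenced still has to be checked by reading the callers.

```python
import shlex
import subprocess
import sys

# Hypothetical hostile value, used only to show how argv lists behave.
hostile = "context8_db; echo INJECTED"

# List-form argv: no shell is involved, so the value arrives as one argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    timeout=30,
)
received = result.stdout.strip()

# shlex.quote() is only relevant when building one string for a POSIX shell.
quoted = shlex.quote(hostile)
```

Here `received` equals the original string unchanged, which shows that the `; echo INJECTED` part was never interpreted as a second command.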

Copilot

Pull request overview

This PR refactors Context8’s container/runtime management and MCP server startup so context8 serve can self-bootstrap the Actian VectorAI DB (Docker or Podman) before starting the stdio MCP loop.

Changes:

Add Docker/Podman runtime detection plus compose-command resolution, and update user-facing lifecycle/doctor messaging accordingly.
Route agent MCP launch config through context8 serve (instead of python -m context8.mcp.server) and add a --no-bootstrap escape hatch.
Fix sparse search invocation to pass sparse indices/values in the shape expected by the VectorAI client, and remove confidence fields from seed data.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `src/context8/docker.py` | Adds runtime + compose probing/caching and updates container start/check logic to be runtime-agnostic. |
| `src/context8/cli/commands/serve.py` | Adds idempotent bootstrap before starting MCP server; adds `--no-bootstrap`. |
| `src/context8/config.py` | Updates MCP server command generation to invoke `context8 serve`. |
| `src/context8/cli/commands/ops.py` | Updates `doctor` to check a generic container runtime rather than Docker-only. |
| `src/context8/cli/commands/lifecycle.py` | Updates init/start messaging to refer to Docker or Podman. |
| `src/context8/search/engine.py` | Updates sparse-vector search parameter passing. |
| `src/context8/ingest/seed.py` | Removes per-record `confidence` fields from `SEED_DATA`. |
| `tests/test_agents.py` | Loosens assertions around generated agent MCP server command/args. |


Copilot · 2026-04-25T15:54:01Z

+            assert "command" in entry
+            assert isinstance(entry["args"], list)


The assertions here became very loose ("command" in entry and args is a list), which can let regressions slip through (e.g., missing the new serve subcommand). Consider making this deterministic by patching shutil.which("context8") and then asserting the generated {command, args} matches one of the expected shapes (installed script + serve, or sys.executable -m context8 serve).
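
A minimal sketch of the deterministic test described above. The `get_server_command` defined here is a hypothetical stand-in, not the real `context8.config.get_server_command()`; its dict return shape and the `/usr/bin/context8` path are assumptions made so the example runs on its own. The point is only the pattern: patch `shutil.which`, then assert the exact command and args for each branch.

```python
import shutil
import sys
from unittest import mock


def get_server_command() -> dict:
    """Hypothetical stand-in for context8.config.get_server_command()."""
    script = shutil.which("context8")
    if script:
        # Installed console script: run it with the serve subcommand.
        return {"command": script, "args": ["serve"]}
    # Fallback: run the package through the current interpreter.
    return {"command": sys.executable, "args": ["-m", "context8", "serve"]}


def check_installed_script_shape() -> None:
    # Pretend the context8 script is on PATH at a fixed location.
    with mock.patch("shutil.which", return_value="/usr/bin/context8"):
        entry = get_server_command()
    assert entry == {"command": "/usr/bin/context8", "args": ["serve"]}


def check_interpreter_fallback_shape() -> None:
    # Pretend the script is missing so the fallback branch is taken.
    with mock.patch("shutil.which", return_value=None):
        entry = get_server_command()
    assert entry["command"] == sys.executable
    assert entry["args"] == ["-m", "context8", "serve"]
```

With both branches pinned, a regression that drops `serve` from the args would fail the test instead of passing silently.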

Copilot · 2026-04-25T15:54:01Z

+    if not is_container_running():
+        _log("starting DB container...")
+        ok, msg = ensure_running(timeout_secs=30)
+        if not ok:
+            _log(f"FATAL: container failed to start: {msg}")
+            raise SystemExit(1)
+        _log(f"container ready ({msg})")
+


_bootstrap() only calls ensure_running() when is_container_running() is false. If the container exists but the DB is still starting (or unhealthy), the bootstrap will skip the health-check/wait and may fail immediately when StorageService() connects. Consider calling ensure_running() unconditionally here (and letting it be idempotent) or otherwise explicitly waiting for VectorAIClient.health_check() even when the container is already "Up".

Suggested change

-    if not is_container_running():
-        _log("starting DB container...")
-        ok, msg = ensure_running(timeout_secs=30)
-        if not ok:
-            _log(f"FATAL: container failed to start: {msg}")
-            raise SystemExit(1)
-        _log(f"container ready ({msg})")
+    was_running = is_container_running()
+    if was_running:
+        _log("verifying DB container readiness...")
+    else:
+        _log("starting DB container...")
+    ok, msg = ensure_running(timeout_secs=30)
+    if not ok:
+        _log(f"FATAL: container failed to start: {msg}")
+        raise SystemExit(1)
+    _log(f"container ready ({msg})")

Copilot · 2026-04-25T15:54:02Z

+    created = storage.initialize()
+    if created:
+        _log("collection created")
+    storage.close()


storage.close() won’t run if storage.initialize() raises (e.g., DB connection issue). Wrap the StorageService lifecycle in a try/finally (or context manager) so the client connection is always closed during bootstrap failures.

Suggested change

-    created = storage.initialize()
-    if created:
-        _log("collection created")
-    storage.close()
+    try:
+        created = storage.initialize()
+        if created:
+            _log("collection created")
+    finally:
+        storage.close()
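
The context-manager alternative mentioned in this comment could look roughly like the following, using `contextlib.closing` from the standard library. `FakeStorage` is a hypothetical stand-in for `StorageService`; the only assumption about the real class is that it exposes `close()`, as the diff shows.

```python
from contextlib import closing


class FakeStorage:
    """Hypothetical stand-in for StorageService; only close() is required."""

    def __init__(self) -> None:
        self.closed = False

    def initialize(self) -> bool:
        # Simulate the DB being unreachable during bootstrap.
        raise ConnectionError("DB unreachable")

    def close(self) -> None:
        self.closed = True


storage = FakeStorage()
try:
    # closing() calls storage.close() on exit, even when the body raises.
    with closing(storage) as s:
        s.initialize()
except ConnectionError:
    pass  # the failure still propagates; close() has already run
```

After this runs, `storage.closed` is `True` even though `initialize()` raised.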

Copilot · 2026-04-25T15:54:02Z

    if is_container_running():
        return True, "already running"


ensure_running() returns (True, "already running") purely based on docker/podman ps output, without verifying the DB is actually accepting connections. Since callers use this as a readiness gate, consider still running the VectorAIClient.health_check() loop when the container is already running (and only return success once the DB responds).
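
One way to express the readiness loop suggested here is a small polling helper that takes the health check as a callable. This is a sketch under assumptions: `wait_until_healthy`, its parameters, and its messages are hypothetical names, and nothing is assumed about the real `VectorAIClient.health_check()` beyond it being callable and returning something truthy when the DB responds.

```python
import time
from typing import Callable


def wait_until_healthy(
    health_check: Callable[[], bool],
    timeout_secs: float = 30.0,
    interval_secs: float = 0.5,
) -> tuple[bool, str]:
    """Poll health_check until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_secs
    while True:
        try:
            if health_check():
                return True, "healthy"
        except Exception:
            # Treat connection errors as "not ready yet" and keep polling.
            pass
        if time.monotonic() >= deadline:
            return False, f"not healthy after {timeout_secs}s"
        time.sleep(interval_secs)


# Usage with a simulated check that succeeds on the third attempt.
attempts = {"n": 0}


def fake_check() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


ok, msg = wait_until_healthy(fake_check, timeout_secs=5, interval_secs=0.01)
```

Calling a helper like this on both paths (container freshly started, or already "Up") would make `ensure_running()` return success only once the DB actually answers.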

Copilot · 2026-04-25T15:54:02Z

        "error_type": "ModuleNotFoundError",
        "language": "python",
        "tags": ["virtual-env", "opencv", "import"],
-        "confidence": 0.95,
    },


These seed records no longer include a confidence field, but seed ingestion still applies confidence=data.get("confidence", 0.9) (so all seed records now get the same default). If confidence is meant to influence ranking/UX, consider keeping per-record values; if it’s truly unused, consider removing/ignoring confidence during seed ingestion to avoid an unintentional behavior change.
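
To make the behavior change concrete, here is a small illustration built around the `data.get("confidence", 0.9)` expression quoted above. The `ingest_confidence` wrapper and the trimmed seed dicts are hypothetical; only the default expression comes from the comment.

```python
# Trimmed seed records: before and after the field was removed.
seed_before = {"error_type": "ModuleNotFoundError", "confidence": 0.95}
seed_after = {"error_type": "ModuleNotFoundError"}


def ingest_confidence(data: dict) -> float:
    # Mirrors the default quoted in the review comment.
    return data.get("confidence", 0.9)


before = ingest_confidence(seed_before)
after = ingest_confidence(seed_after)
```

The first record keeps its own 0.95, while the second falls back to 0.9, so every seed record ends up with the same confidence once the field is removed.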

@pathfindermilan

Major release with contributions from @pathfindermilan:

PR #2: Fix hybrid search (sparse was silently disabled), async MCP, embedding cache collision, browse resource leak, configurable dims
PR #3: Docker+Podman auto-detection, self-bootstrapping serve command, --no-bootstrap flag, sparse search fix

Plus: changelog in README, logo assets, demo video script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pathfindermilan and others added 2 commits April 25, 2026 17:47

refactor: reformat subprocess using ruff

7c7e4ee

fix: update agent test for new serve-based server command

0b11119

get_server_command() now returns context8 serve instead of python -m context8.mcp.server. Test assertion updated to check structure rather than exact command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hallelx2 marked this pull request as ready for review April 25, 2026 15:49

Copilot AI review requested due to automatic review settings April 25, 2026 15:49

Copilot started reviewing on behalf of hallelx2 April 25, 2026 15:49

hallelx2 merged commit 78f58b0 into main Apr 25, 2026
7 of 9 checks passed

sourcery-ai Bot reviewed Apr 25, 2026

Copilot AI reviewed Apr 25, 2026

coderabbitai Bot commented Apr 25, 2026 • edited

Review failed

❌ Failed checks (1 warning)
