
feat(mcp): port MCP server with search/find_related tools from semble#16

Merged
amondnet merged 2 commits into main from feat/unit-14-mcp on May 28, 2026


amondnet commented May 28, 2026

Contributor

Summary

Ports `src/semble/mcp.py` to `src/mcp/server.ts` (Unit 14 of the parallel semble→csp port).

What's exported

  • `IndexCache` — LRU(10) cache of `CspIndex` instances. Promise-dedups concurrent requests, awaits a one-time model pre-load, and exposes `startWatcher(path)` / `stopWatcher()` for live re-indexing.
  • `createServer(cache, defaultSource?)` — registers `search` and `find_related` MCP tools with the verbatim agent-facing descriptions from semble. Lazy-imports `@modelcontextprotocol/sdk` and `zod` so the module is usable before Unit 0 lands.
  • `serve(path?, options?)` — starts an MCP stdio server, pre-warms the model + (optionally) pre-indexes `path` in parallel with server startup, and blocks until stdin EOF (mirrors semble's `run_stdio_async`).
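The caching behavior described for `IndexCache` (LRU of 10, shared in-flight builds, cleanup of failed builds) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the PR's code: `LruPromiseCache` and its `build` callback are hypothetical names, and recency is assumed to be tracked through `Map` insertion order.

```typescript
// Minimal sketch of an LRU cache that stores in-flight Promises so concurrent
// get() calls for the same key share one build. `LruPromiseCache` and `build`
// are hypothetical names, not the PR's actual API.
export class LruPromiseCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly build: (key: string) => Promise<T>;
  private readonly maxSize: number;

  constructor(build: (key: string) => Promise<T>, maxSize = 10) {
    this.build = build;
    this.maxSize = maxSize;
  }

  get(key: string): Promise<T> {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      // Re-insert so Map insertion order reflects most-recent use.
      this.entries.delete(key);
      this.entries.set(key, existing);
      return existing;
    }
    if (this.entries.size >= this.maxSize) {
      // The first key in insertion order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    const pending = this.build(key);
    this.entries.set(key, pending);
    // Drop a failed build so the next get() retries instead of replaying the error.
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    return pending;
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

Caching the Promise rather than the resolved index is what lets two concurrent requests for the same source share a single build; an 11th distinct key pushes out the least recently used one.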

Safety preserved from upstream

  • `ssh://`, `git://`, `file://` and scp-style URLs rejected with the verbatim "Only https://, http://, ..." message.
  • "No repo specified and no default index" error when both `repo` and `defaultSource` are absent.
  • Underlying build errors are wrapped: `Failed to index "": `.
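A rough sketch of how these checks could be ordered, assuming a hypothetical `resolveSource` helper that throws on bad input. The missing-source message reuses the phrase quoted above; the blocked-URL message is a placeholder because the upstream wording is only partially quoted, and the scp-style regex is an assumption rather than the PR's pattern.

```typescript
// Hypothetical sketch of the source-safety layer; names and the blocked-URL
// message are illustrative placeholders, not the PR's exact code or wording.
export type Source = { kind: "git"; url: string } | { kind: "local"; path: string };

const BLOCKED_SCHEMES = ["ssh://", "git://", "file://"];
// Assumed pattern for scp-style remotes such as git@github.com:owner/repo.git
const SCP_STYLE = /^[\w.-]+@[\w.-]+:/;

export function resolveSource(
  repo: string | undefined,
  defaultSource: string | undefined,
): Source {
  const source = repo ?? defaultSource;
  if (source === undefined) {
    throw new Error("No repo specified and no default index");
  }
  const lower = source.toLowerCase();
  if (BLOCKED_SCHEMES.some((scheme) => lower.startsWith(scheme)) || SCP_STYLE.test(source)) {
    // Placeholder text; upstream uses its own "Only https://, http://, ..." message.
    throw new Error(`Unsupported repo URL: ${source}`);
  }
  if (lower.startsWith("https://") || lower.startsWith("http://")) {
    return { kind: "git", url: source }; // remote: cloned/fetched
  }
  return { kind: "local", path: source }; // anything else: local path
}
```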

Code-review fixes applied

A self-review (extra-high effort) caught the following issues, which I fixed:

  1. `evict()` was fire-and-forget async — made it awaitable so the watcher's evict-then-rebuild ordering is deterministic.
  2. `serve()` closed the stdio transport in `finally` immediately after `await prewarm`, killing the server before it served any requests — now blocks on stdin EOF.
  3. `startWatcher()` overwrote `watcherClose` without closing the previous watcher (file-handle leak) — now stops the existing watcher first.
  4. `inputSchema` passed plain JSON descriptors where the SDK expects a Zod raw shape — now lazy-imports zod (transitive dep of the SDK) and builds proper schemas.
  5. chokidar's `ignored` filter (`/(^|[/\\])\../`) silently dropped events for projects rooted inside dotfile dirs (`~/.config/proj`) — removed to match semble's unfiltered watcher.
  6. Replaced cast-bypass with conditional spread for `exactOptionalPropertyTypes`-safe `ref` forwarding.
  7. Hoisted `node:fs/promises` and `node:path` to top-level imports (was re-importing per evict).
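Fix 6 relies on how `exactOptionalPropertyTypes` treats optional properties: under that flag, `ref?: string` does not accept an explicit `undefined`, so `{ ref: maybeRef }` fails to type-check. A minimal sketch of the conditional-spread pattern, using hypothetical `IndexOptions` and `buildIndexOptions` names:

```typescript
// Hypothetical option shape; with "exactOptionalPropertyTypes": true,
// `ref` may be absent but may not be explicitly `undefined`.
interface IndexOptions {
  modelPath: string;
  ref?: string;
}

export function buildIndexOptions(modelPath: string, ref: string | undefined): IndexOptions {
  return {
    modelPath,
    // Add the key only when a value exists, so no cast is needed.
    ...(ref !== undefined ? { ref } : {}),
  };
}
```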

Stubs included

This unit imports from `../indexing/index.ts`, `../types.ts`, and `../utils.ts` per the unit spec. Since those sibling units may not have merged yet, I included minimal stubs so this branch type-checks in isolation. Sibling unit PRs will replace them.

Tests

`bun test src/mcp/` — 19 pass / 0 fail, covering:

  • `IndexCache`: caching, concurrent dedup, await-able evict, LRU 11th-source eviction, git vs local routing, failed-build cache cleanup, multi-watcher leak.
  • `getIndex` safety: rejects ssh / git / file URLs, requires source, accepts https, wraps errors.
  • `createServer`: tool registration, no-results JSON, safety errors as plain strings, missing-chunk message.

E2E (spawning the real MCP stdio server) is skipped per the unit spec since `@modelcontextprotocol/sdk` isn't in main yet.

Out of scope

  • Real indexing (depends on Unit 7 chunking, dense/sparse units, etc.).
  • Wiring the `mcp` CLI subcommand into `src/cli.ts` (Unit 12 / CLI work).

🤖 Generated with Claude Code


Summary by cubic

Ports the semble MCP server to TypeScript with search and find_related tools, an LRU index cache, and a stdio serve() that pre-warms the model, can pre-index a repo, and blocks until stdin closes.

  • New Features

    • MCP server exposes search and find_related with upstream descriptions; returns JSON; includes clear agent instructions.
    • Index cache (LRU 10) with Promise dedup, awaitable evict, model pre-warm, and a chokidar-based watcher for local paths.
    • createServer() lazy-loads @modelcontextprotocol/sdk and zod; returns a placeholder when SDK is missing so the module still loads.
    • serve() starts an MCP stdio server, pre-warms the model, optionally pre-indexes a default path, starts a watcher for local paths, and blocks until stdin EOF.
    • Safety parity with upstream: rejects ssh://, git://, file:// and scp-style URLs; requires repo or a default; wraps index errors.
  • Bug Fixes

    • Server no longer shuts down early; it now blocks on stdin EOF before closing the transport.
    • File watcher no longer leaks handles; starting a watcher closes any existing watcher first and removes over-eager ignores to match upstream.
    • Eviction is awaitable to ensure evict-then-rebuild ordering.
    • Correctly builds zod input schemas for tool registration.
    • Hardened error handling: silences unhandled model-load rejection and wraps search/find_related handlers in try/catch to return plain-string errors.
    • Serve stability: shares a single stdin end/close cleanup to avoid listener leaks, always stops the watcher when the SDK is unavailable or on shutdown, and moves server.connect() inside try for reliable teardown.
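Blocking until stdin EOF with one shared cleanup for both `end` and `close` might look like the sketch below. `waitForEof` is a hypothetical helper; it assumes a Node-style `EventEmitter` (which `process.stdin` is) and would be awaited before the transport is closed.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: resolves on the first of 'end' or 'close' and removes
// both listeners, so repeated calls do not accumulate listeners on the stream.
export function waitForEof(stream: EventEmitter): Promise<void> {
  return new Promise<void>((resolve) => {
    const cleanup = (): void => {
      stream.off("end", cleanup);
      stream.off("close", cleanup);
      resolve(); // resolving twice is a no-op
    };
    stream.on("end", cleanup);
    stream.on("close", cleanup);
  });
}
```

One caveat of this sketch: if the stream has already ended before the call, neither event fires again, so a fuller version may want to check the stream's state first.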

Written for commit a7cc156. Summary will update on new commits.

gemini-code-assist (bot) left a comment


Code Review

This pull request implements a TypeScript port of the semble MCP server, introducing the core server logic, an LRU-bounded index cache with file watching, search and find-related tools, and corresponding unit tests. The code review is highly constructive, pointing out critical robustness improvements. These include preventing unhandled promise rejections during model loading, wrapping tool handlers in try-catch blocks to catch unexpected runtime errors, cleaning up stdin event listeners to avoid memory leaks, and ensuring file watchers are properly disposed of if the server connection fails.


cubic-dev-ai (bot) left a comment


3 issues found across 5 files

Architecture diagram
sequenceDiagram
    participant Client as MCP Client (IDE)
    participant Server as MCP Server (serve())
    participant IndexCache as IndexCache (LRU 10)
    participant Chokidar as chokidar Watcher
    participant CspIndex as CspIndex (indexing)
    participant Model as Model Loader (loadModel)
    participant FS as File System / Git

    Note over Client,FS: Server Startup

    Client->>Server: spawn (stdio)
    Server->>Server: prewarm model + pre-index path (parallel)
    Server->>IndexCache: ensureModelLoading()
    IndexCache->>Model: loadModel()
    Model-->>IndexCache: modelPath
    opt pre-index path provided
        Server->>IndexCache: get(path)
        IndexCache->>CspIndex: fromPath() or fromGit()
        CspIndex->>FS: read files / clone
        FS-->>CspIndex: data
        CspIndex-->>IndexCache: index instance
        IndexCache-->>Server: ready
    end
    Server->>Server: block on stdin EOF

    Note over Client,FS: Search Tool

    Client->>Server: call_tool search({query, repo?})
    Server->>Server: getIndex(repo, defaultSource)
    alt repo is ssh/git/file URL
        Server-->>Client: "Only https://, http://..." (error string)
    else repo and defaultSource both missing
        Server-->>Client: "No repo specified..." (error string)
    else valid source
        Server->>IndexCache: get(source)
        alt cache miss (first call or evicted)
            IndexCache->>IndexCache: await model (if not ready)
            opt LRU full (≥10 entries)
                IndexCache->>IndexCache: evict oldest entry
            end
            IndexCache->>CspIndex: fromPath/git(source, modelPath)
            alt fromGit (https/http URL)
                CspIndex->>FS: clone/fetch
            else fromPath (local)
                CspIndex->>FS: walk filesystem
            end
            FS-->>CspIndex: files
            CspIndex-->>IndexCache: index
        end
        IndexCache-->>Server: index
        Server->>CspIndex: search(query)
        CspIndex-->>Server: results (empty if none)
        Server->>Server: formatResults()
        Server-->>Client: JSON results or {"error":"No results found."}
    end

    Note over Client,FS: Find Related Tool

    Client->>Server: call_tool find_related({file_path, line, repo?})
    Server->>Server: getIndex(repo, defaultSource)
    Note over Server,IndexCache: same safety checks & cache path
    Server->>IndexCache: get(source)
    IndexCache-->>Server: index
    Server->>CspIndex: chunks
    Server->>Server: resolveChunk(chunks, file_path, line)
    alt chunk not found
        Server-->>Client: "No chunk found at ..." (error string)
    else chunk found
        Server->>CspIndex: findRelated(chunk)
        CspIndex-->>Server: related results
        Server-->>Client: JSON results
    end

    Note over Client,FS: File Watcher (local paths)

    Server->>Server: serve(path) with a local path
    Server->>IndexCache: startWatcher(path)
    IndexCache->>IndexCache: stopWatcher (close previous)
    IndexCache->>Chokidar: watch(path)
    Chokidar-->>IndexCache: watcher instance
    loop on file change (debounced 250ms)
        Chokidar->>Chokidar: onChange (add/change/unlink)
        Chokidar->>IndexCache: evict(path) then get(path)
        IndexCache->>IndexCache: evict (awaitable)
        IndexCache->>CspIndex: rebuild index
        alt rebuild error
            IndexCache->>IndexCache: swallow error (next explicit get() surfaces)
        end
    end

    Note over Client,Server: Shutdown

    Client->>Server: stdin EOF
    Server->>Server: close transport
    Server->>IndexCache: stopWatcher()
    IndexCache->>Chokidar: close()
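The watcher loop in the diagram (debounce, awaitable evict, rebuild, swallow errors) can be sketched independently of chokidar. `RebuildTarget` and `createDebouncedRebuild` are hypothetical names; the 250 ms default is taken from the diagram, and the file-watcher wiring itself is left out.

```typescript
// Hypothetical cache surface: an awaitable evict() and a get() that rebuilds.
export interface RebuildTarget {
  evict(path: string): Promise<void>;
  get(path: string): Promise<unknown>;
}

export function createDebouncedRebuild(
  cache: RebuildTarget,
  path: string,
  delayMs = 250,
): { onChange: () => void; cancel: () => void } {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const rebuild = async (): Promise<void> => {
    try {
      await cache.evict(path); // finish eviction before rebuilding
      await cache.get(path);
    } catch {
      // Swallow rebuild errors; the next explicit get() surfaces them.
    }
  };

  return {
    onChange: () => {
      // Each event restarts the timer, so a burst of changes yields one rebuild.
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = undefined;
        void rebuild();
      }, delayMs);
    },
    cancel: () => {
      if (timer !== undefined) clearTimeout(timer);
      timer = undefined;
    },
  };
}
```

In a real watcher, `onChange` would be registered for add/change/unlink events and `cancel` called when the watcher is stopped.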

amondnet left a comment

Contributor Author


Applied 7, deferred 1.

Applied (gemini-code-assist):

  • src/mcp/server.ts L76 — silence unhandled rejection on modelReady.promise
  • src/mcp/server.ts L375 — wrap search handler body in single try/catch
  • src/mcp/server.ts L417 — wrap find_related handler body in single try/catch
  • src/mcp/server.ts L619 — share cleanup between stdin end + close to avoid listener leak
  • src/mcp/server.ts L635 — move server.connect(transport) inside try block
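The first three applied fixes follow common promise-hardening patterns. A minimal sketch with hypothetical names `startPreload` and `runTool`: the first attaches a no-op `catch` to an eagerly started load so an early failure is not reported as an unhandled rejection, while later awaiters still see the error; the second wraps a whole handler body so any thrown error becomes a plain-string result.

```typescript
// Hypothetical: start a load eagerly without risking an unhandled rejection.
export function startPreload<T>(load: () => Promise<T>): Promise<T> {
  const promise = load();
  // Marks the rejection as handled; awaiting `promise` later still throws.
  promise.catch(() => {});
  return promise;
}

// Hypothetical: run a tool handler body and turn any error into a plain string.
export async function runTool(body: () => Promise<string>): Promise<string> {
  try {
    return await body();
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}
```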

Applied (cubic-dev-ai):

  • src/mcp/server.ts L622 — same connect-inside-try fix as above
  • src/mcp/server.ts L611 — stop watcher on early SDK-unavailable return path

Deferred (cubic-dev-ai):

  • src/mcp/server.ts L591: `ref` is not propagated to tool lookups. This matches semble's behavior exactly (upstream tool handlers also drop `ref`), and per CLAUDE.md the MCP surface preserves parity with semble. This is better raised upstream first or addressed in a follow-up.

All 19 tests in src/mcp/server.test.ts still pass.

amondnet self-assigned this May 28, 2026
amondnet added 2 commits May 29, 2026 01:10
Port src/semble/mcp.py to src/mcp/server.ts. Adds:

- IndexCache: LRU(10), Promise-dedup, async evict(), file watcher with
  pre-evict ordering, parallel model pre-warm.
- createServer(): registers MCP tools 'search' and 'find_related' with
  verbatim agent-facing descriptions from semble. Lazy-imports the SDK
  and zod so the module loads even when @modelcontextprotocol/sdk
  isn't installed yet (Unit 0).
- serve(): pre-warms model + optional pre-index in parallel with
  starting the stdio server. Blocks on stdin EOF (matches semble's
  run_stdio_async) rather than closing the transport prematurely.

Safety: rejects ssh://, git://, file:// URLs; requires repo or
defaultSource; wraps build errors with context.

Includes minimal stubs for ../types.ts, ../utils.ts, ../indexing/index.ts
so this branch type-checks in isolation; sibling unit branches will
replace them with the real implementations.

Tests cover IndexCache caching/dedup/evict/LRU, the safety layer in
_get_index, and the tool handler edge cases. E2E spawning is skipped
per the unit spec (no real MCP stdio integration).
- IndexCache: silence unhandled rejection on modelReady.promise so a
  model load failure before any get() call doesn't crash the process
- search/find_related handlers: wrap the full body in try/catch so
  errors from search()/findRelated()/formatResults() surface as plain
  strings instead of unhandled rejections
- serve: share a single cleanup callback between stdin 'end' and 'close'
  so repeated serve() calls don't leak listeners on process.stdin
- serve: move server.connect(transport) inside the try block and stop
  the watcher on the early SDK-unavailable return path so the watcher
  is always torn down
amondnet force-pushed the feat/unit-14-mcp branch from 9be72ba to a7cc156 May 28, 2026 16:10
amondnet merged commit 18110e6 into main May 28, 2026
amondnet deleted the feat/unit-14-mcp branch May 28, 2026 16:10
This was referenced Jun 18, 2026