feat(mcp): port MCP server with search/find_related tools from semble#16
Code Review
This pull request implements a TypeScript port of the semble MCP server, introducing the core server logic, an LRU-bounded index cache with file watching, search and find-related tools, and corresponding unit tests. The code review is highly constructive, pointing out critical robustness improvements. These include preventing unhandled promise rejections during model loading, wrapping tool handlers in try-catch blocks to catch unexpected runtime errors, cleaning up stdin event listeners to avoid memory leaks, and ensuring file watchers are properly disposed of if the server connection fails.
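A minimal sketch of the handler-wrapping recommendation, assuming each tool body is an async function that returns a string. `withErrorText`, `ToolResult`, and the `Error:` prefix are hypothetical names chosen for illustration. `ToolResult` only mirrors the text-content shape of an MCP tool result and is not imported from the SDK.

```typescript
// Hypothetical local type mirroring an MCP text-content tool result.
type ToolResult = { content: { type: "text"; text: string }[] };

// Wrap the *entire* handler body so any throw or rejection, whether from the
// index lookup, the search itself, or result formatting, becomes a plain-string
// result instead of an unhandled rejection.
function withErrorText<A>(
  body: (args: A) => Promise<string>,
): (args: A) => Promise<ToolResult> {
  return async (args: A) => {
    try {
      const text = await body(args);
      return { content: [{ type: "text", text }] };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { content: [{ type: "text", text: `Error: ${message}` }] };
    }
  };
}

// Example: a hypothetical search body that can fail at runtime.
const search = withErrorText(async ({ query }: { query: string }) => {
  if (query.trim() === "") throw new Error("empty query");
  return JSON.stringify([{ query }]);
});
```

Wrapping once at registration keeps every handler's error path identical, which is the property the review asks for.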
3 issues found across 5 files
Architecture diagram
```mermaid
sequenceDiagram
    participant Client as MCP Client (IDE)
    participant Server as MCP Server (serve())
    participant IndexCache as IndexCache (LRU 10)
    participant Chokidar as chokidar Watcher
    participant CspIndex as CspIndex (indexing)
    participant Model as Model Loader (loadModel)
    participant FS as File System / Git

    Note over Client,FS: Server Startup
    Client->>Server: spawn (stdio)
    Server->>Server: prewarm model + pre-index path (parallel)
    Server->>IndexCache: ensureModelLoading()
    IndexCache->>Model: loadModel()
    Model-->>IndexCache: modelPath
    opt pre-index path provided
        Server->>IndexCache: get(path)
        IndexCache->>CspIndex: fromPath() or fromGit()
        CspIndex->>FS: read files / clone
        FS-->>CspIndex: data
        CspIndex-->>IndexCache: index instance
        IndexCache-->>Server: ready
    end
    Server->>Server: block on stdin EOF

    Note over Client,FS: Search Tool
    Client->>Server: call_tool search({query, repo?})
    Server->>Server: getIndex(repo, defaultSource)
    alt repo is ssh/git/file URL
        Server-->>Client: "Only https://, http://..." (error string)
    else repo and defaultSource both missing
        Server-->>Client: "No repo specified..." (error string)
    else valid source
        Server->>IndexCache: get(source)
        alt cache miss (first call or evicted)
            IndexCache->>IndexCache: await model (if not ready)
            opt LRU full (≥10 entries)
                IndexCache->>IndexCache: evict oldest entry
            end
            IndexCache->>CspIndex: fromPath/git(source, modelPath)
            alt fromGit (https/http URL)
                CspIndex->>FS: clone/fetch
            else fromPath (local)
                CspIndex->>FS: walk filesystem
            end
            FS-->>CspIndex: files
            CspIndex-->>IndexCache: index
        end
        IndexCache-->>Server: index
        Server->>CspIndex: search(query)
        CspIndex-->>Server: results (empty if none)
        Server->>Server: formatResults()
        Server-->>Client: JSON results or {"error":"No results found."}
    end

    Note over Client,FS: Find Related Tool
    Client->>Server: call_tool find_related({file_path, line, repo?})
    Server->>Server: getIndex(repo, defaultSource)
    Note over Server,IndexCache: same safety checks & cache path
    Server->>IndexCache: get(source)
    IndexCache-->>Server: index
    Server->>CspIndex: chunks
    Server->>Server: resolveChunk(chunks, file_path, line)
    alt chunk not found
        Server-->>Client: "No chunk found at ..." (error string)
    else chunk found
        Server->>CspIndex: findRelated(chunk)
        CspIndex-->>Server: related results
        Server-->>Client: JSON results
    end

    Note over Client,FS: File Watcher (local paths)
    Client->>Server: startWatcher(path)
    Server->>IndexCache: startWatcher(path)
    IndexCache->>IndexCache: stopWatcher (close previous)
    IndexCache->>Chokidar: watch(path)
    Chokidar-->>IndexCache: watcher instance
    loop on file change (debounced 250ms)
        Chokidar->>Chokidar: onChange (add/change/unlink)
        Chokidar->>IndexCache: evict(path) then get(path)
        IndexCache->>IndexCache: evict (awaitable)
        IndexCache->>CspIndex: rebuild index
        alt rebuild error
            IndexCache->>IndexCache: swallow error (next explicit get() surfaces)
        end
    end

    Note over Client,Server: Shutdown
    Client->>Server: stdin EOF
    Server->>Server: close transport
    Server->>IndexCache: stopWatcher()
    IndexCache->>Chokidar: close()
```
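The safety branches in the diagram (reject non-HTTP(S) remotes, require `repo` or a default source) can be sketched as a pure function. This is an illustration under assumptions, not the PR's `getIndex`: `resolveSource`, the `Resolved` union, and the error messages are hypothetical, and the scp-style check (`user@host:path`) is a simple regex heuristic.

```typescript
// Hypothetical outcome of deciding what to index.
type Resolved =
  | { kind: "git"; source: string }
  | { kind: "path"; source: string }
  | { kind: "error"; message: string };

// Heuristic for scp-style remotes such as "git@github.com:owner/repo.git".
const SCP_STYLE = /^[\w.-]+@[\w.-]+:/;

function resolveSource(repo?: string, defaultSource?: string): Resolved {
  // An explicit repo wins; otherwise fall back to the server's default.
  const source = repo ?? defaultSource;
  if (!source) {
    return { kind: "error", message: "no repo specified and no default source" };
  }
  const lower = source.toLowerCase();
  if (lower.startsWith("https://") || lower.startsWith("http://")) {
    return { kind: "git", source };
  }
  // Refuse ssh, git-protocol, and file URLs, plus scp-style shorthand
  // (which git treats as ssh).
  if (/^(ssh|git|file):\/\//.test(lower) || SCP_STYLE.test(source)) {
    return { kind: "error", message: "only https:// and http:// URLs are supported" };
  }
  // Anything else is treated as a local filesystem path.
  return { kind: "path", source };
}
```

Returning a tagged union rather than throwing lets a tool handler turn a refusal into a plain error string without a try/catch.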
amondnet left a comment
Applied 7, deferred 1.
Applied (gemini-code-assist):
- src/mcp/server.ts L76: silence unhandled rejection on `modelReady.promise`
- src/mcp/server.ts L375: wrap `search` handler body in a single try/catch
- src/mcp/server.ts L417: wrap `find_related` handler body in a single try/catch
- src/mcp/server.ts L619: share cleanup between stdin `end` + `close` to avoid a listener leak
- src/mcp/server.ts L635: move `server.connect(transport)` inside the try block
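The listener-leak fix (one cleanup shared between stdin `end` and `close`) can be sketched as follows. `waitForEof` and `EofSource` are hypothetical; `EofSource` is a minimal structural type that a Node readable such as `process.stdin` satisfies.

```typescript
// Minimal structural type: the two EventEmitter methods this sketch needs.
interface EofSource {
  once(event: "end" | "close", listener: () => void): unknown;
  off(event: "end" | "close", listener: () => void): unknown;
}

// Resolve once the stream reports EOF via either "end" or "close".
// A single shared cleanup removes BOTH listeners, so whichever event fires
// first does not leave the other one attached across repeated calls.
function waitForEof(stream: EofSource, onShutdown: () => void): Promise<void> {
  return new Promise<void>((resolve) => {
    let done = false;
    const cleanup = () => {
      if (done) return; // guard against both events firing
      done = true;
      stream.off("end", cleanup);
      stream.off("close", cleanup);
      onShutdown();
      resolve();
    };
    stream.once("end", cleanup);
    stream.once("close", cleanup);
  });
}
```

In Node, `off()` also removes a listener registered with `once()` when given the original function, so the same reference works for both registration and removal.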
Applied (cubic-dev-ai):
- src/mcp/server.ts L622: same connect-inside-try fix as above
- src/mcp/server.ts L611: stop watcher on early SDK-unavailable return path
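The two teardown fixes (connect inside `try`, stop the watcher on the SDK-unavailable return) both route every exit through one cleanup path. A sketch using `try`/`finally`, which is one way to get that guarantee and not necessarily how the PR structures it; `runStdio` and `ServeDeps` are hypothetical.

```typescript
// Hypothetical dependencies of the stdio entry point, injected for illustration.
interface ServeDeps {
  sdkAvailable: boolean;
  connect: () => Promise<void>; // stands in for server.connect(transport)
  waitForEof: () => Promise<void>; // resolves when stdin ends or closes
  stopWatcher: () => Promise<void>;
}

// Every exit path (SDK missing, connect() throwing, normal EOF) runs the
// watcher teardown, because all of them leave through the same `finally`.
async function runStdio(deps: ServeDeps): Promise<void> {
  try {
    if (!deps.sdkAvailable) return; // early return still reaches `finally`
    await deps.connect(); // inside `try`, so a connect failure is torn down too
    await deps.waitForEof();
  } finally {
    await deps.stopWatcher();
  }
}
```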
Deferred (cubic-dev-ai):
- src/mcp/server.ts L591: `ref` not propagated to tool lookups. This matches semble's exact behavior (upstream tool handlers also drop `ref`); per CLAUDE.md the MCP surface preserves parity with semble. Better raised upstream first, or addressed as a follow-up.
All 19 tests in src/mcp/server.test.ts still pass.
Port `src/semble/mcp.py` to `src/mcp/server.ts`. Adds:
- `IndexCache`: LRU(10), Promise-dedup, async `evict()`, file watcher with pre-evict ordering, parallel model pre-warm.
- `createServer()`: registers MCP tools `search` and `find_related` with verbatim agent-facing descriptions from semble. Lazy-imports the SDK and zod so the module loads even when `@modelcontextprotocol/sdk` isn't installed yet (Unit 0).
- `serve()`: pre-warms the model and an optional pre-index in parallel with starting the stdio server. Blocks on stdin EOF (matching semble's `run_stdio_async`) rather than closing the transport prematurely.

Safety: rejects `ssh://`, `git://`, and `file://` URLs; requires `repo` or `defaultSource`; wraps build errors with context.

Includes minimal stubs for `../types.ts`, `../utils.ts`, and `../indexing/index.ts` so this branch type-checks in isolation; sibling unit branches will replace them with the real implementations.

Tests cover `IndexCache` caching/dedup/evict/LRU, the safety layer in `_get_index`, and the tool handler edge cases. E2E spawning is skipped per the unit spec (no real MCP stdio integration).
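The `IndexCache` behavior listed above (LRU of 10, Promise dedup, async `evict()`) can be illustrated with a small generic class. This is a sketch under assumptions rather than the PR's implementation: `PromiseLruCache`, its `build` callback (standing in for `fromPath()`/`fromGit()`), and the choice to drop failed builds from the cache are all illustrative.

```typescript
// Illustrative LRU cache that stores the in-flight Promise per key, so
// concurrent get() calls for the same source share one build.
class PromiseLruCache<V> {
  private readonly entries = new Map<string, Promise<V>>();
  private readonly capacity: number;
  private readonly build: (key: string) => Promise<V>;

  constructor(capacity: number, build: (key: string) => Promise<V>) {
    this.capacity = capacity;
    this.build = build;
  }

  get(key: string): Promise<V> {
    const hit = this.entries.get(key);
    if (hit) {
      // Refresh recency: Map iterates in insertion order, so re-insert at the end.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    const pending = this.build(key);
    // Assumption: failures are not cached, so the next get() retries the build.
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    this.entries.set(key, pending);
    return pending;
  }

  // Remove the entry now, then wait for any in-flight build to settle.
  async evict(key: string): Promise<void> {
    const pending = this.entries.get(key);
    this.entries.delete(key);
    await pending?.catch(() => undefined);
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

A plain `Map` is enough for LRU ordering because it iterates keys in insertion order; deleting and re-inserting a key on each hit moves it to the most-recent end.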
- `IndexCache`: silence unhandled rejection on `modelReady.promise` so a model load failure before any `get()` call doesn't crash the process.
- `search`/`find_related` handlers: wrap the full body in try/catch so errors from `search()`/`findRelated()`/`formatResults()` surface as plain strings instead of unhandled rejections.
- `serve`: share a single cleanup callback between stdin `'end'` and `'close'` so repeated `serve()` calls don't leak listeners on `process.stdin`.
- `serve`: move `server.connect(transport)` inside the try block and stop the watcher on the early SDK-unavailable return path so the watcher is always torn down.
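The model pre-warm fix relies on a standard JavaScript pattern: attaching a no-op `.catch()` marks a promise as handled without swallowing the error for code that awaits the original promise later. A minimal sketch, where `prewarm` and `loadModel` are hypothetical names:

```typescript
// Start loading eagerly, but attach a no-op rejection handler so a failure
// before anyone awaits the result is not reported as an unhandled rejection.
// The returned promise is the original one, so later awaiters still see the error.
function prewarm<T>(loadModel: () => Promise<T>): Promise<T> {
  const ready = loadModel();
  ready.catch(() => {
    // Intentionally empty: the error surfaces at the next `await ready`.
  });
  return ready;
}
```

Returning `ready` rather than the promise produced by `.catch()` matters: the latter would resolve to `undefined` and hide the failure from every later `get()`.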
Branch updated from 9be72ba to a7cc156.
Summary
Ports `src/semble/mcp.py` to `src/mcp/server.ts` (Unit 14 of the parallel semble→csp port).
What's exported
Safety preserved from upstream
Code-review fixes applied
Self-review (extra-high effort) caught the following, which I fixed:
Stubs included
This unit imports from `../indexing/index.ts`, `../types.ts`, and `../utils.ts` per the unit spec. Since those sibling units may not have merged yet, I included minimal stubs so this branch type-checks in isolation. Sibling unit PRs will replace them.
Tests
`bun test src/mcp/` — 19 pass / 0 fail, covering:
E2E (spawning the real MCP stdio server) is skipped per the unit spec since `@modelcontextprotocol/sdk` isn't in main yet.
Out of scope
🤖 Generated with Claude Code
Summary by cubic
Ports the semble MCP server to TypeScript with `search` and `find_related` tools, an LRU index cache, and a stdio `serve()` that pre-warms the model, can pre-index a repo, and blocks until stdin closes.

New Features
- `search` and `find_related` with upstream descriptions; returns JSON; includes clear agent instructions.
- `chokidar`-based watcher for local paths.
- `createServer()` lazy-loads `@modelcontextprotocol/sdk` and `zod`; returns a placeholder when the SDK is missing so the module still loads.
- `serve()` starts an MCP stdio server, pre-warms the model, optionally pre-indexes a default path, starts a watcher for local paths, and blocks until stdin EOF.
- Rejects `ssh://`, `git://`, `file://` and scp-style URLs; requires `repo` or a default; wraps index errors.

Bug Fixes
- `zod` input schemas for tool registration.
- Wraps `search`/`find_related` handlers in try/catch to return plain-string errors.
- Shares `end`/`close` cleanup to avoid listener leaks, always stops the watcher when the SDK is unavailable or on shutdown, and moves `server.connect()` inside `try` for reliable teardown.

Written for commit a7cc156. Summary will update on new commits.
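The watcher's debounced rebuild from the architecture diagram (250ms debounce, `evict(path)` then `get(path)`, rebuild errors swallowed until the next explicit `get()`) can be sketched independently of chokidar. `debouncedRebuild` and its callback parameters are hypothetical; with chokidar, the returned function could be attached as `watcher.on("all", onChange)`.

```typescript
// Collapse bursts of add/change/unlink events into one rebuild after `delayMs`
// of quiet. Evicting before re-reading means the next get() cannot return the
// stale index. Rebuild errors are swallowed here; an explicit get() surfaces them.
function debouncedRebuild(
  delayMs: number,
  evict: () => Promise<void>,
  get: () => Promise<unknown>,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    // Each new event restarts the quiet period.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      evict()
        .then(() => get())
        .catch(() => {
          // Swallowed on purpose: the next explicit get() reports the failure.
        });
    }, delayMs);
  };
}
```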