Rewrite csp in Rust
Track: rust-rewrite-20260618
Type: refactor (language rewrite / migration)
Origin decision: ADR-0003
Overview
@pleaseai/csp currently exists as a complete TypeScript/Bun implementation (~5,900 LOC) ported from MinishLab/semble. Per ADR-0003, the project is being rewritten in Rust to gain single-binary distribution, better indexing/embedding performance and memory footprint, and a more natural fit with the native Rust ecosystem (model2vec-rs, tree-sitter, ignore, rmcp).
This track covers Phases 1–7 of the ADR-0003 roadmap. Phase 0 (Cargo workspace scaffold, clap CLI stubs, Rust CI, pinned toolchain) is already committed on branch feat/rust-rewrite. The defining constraint is behavioral equivalence: the Rust build must reproduce the existing implementation's observable behavior (tokenization, ranking order, chunk boundaries, search results, CLI/MCP contracts), verified by reusing the TypeScript test suite as language-neutral golden fixtures. The TypeScript src/ remains the source of truth until the Rust line reaches parity, then is retired.
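As a rough illustration of what a parity check might look like, the sketch below pairs a simplified identifier splitter with a golden-fixture comparison. Everything here is an assumption made for illustration: the names split_identifier and check_golden, the tab-separated fixture format, the token order (parts first, compound last), the ASCII-only handling, and the acronym rule are hypothetical. The actual rules and fixture format would be taken from the TypeScript implementation and its test suite.

```rust
/// Split one identifier into lowercased parts plus the lowercased compound.
/// Assumption: parts come first and the compound last; the real order must
/// be taken from the TypeScript implementation.
pub fn split_identifier(ident: &str) -> Vec<String> {
    let chars: Vec<char> = ident.chars().collect();
    let mut parts: Vec<String> = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_ascii_alphanumeric() {
            // Separators such as '_' close the current part.
            if !current.is_empty() {
                parts.push(std::mem::take(&mut current));
            }
            continue;
        }
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();
        // camelCase boundary: lowercase letter or digit followed by uppercase.
        let lower_to_upper = c.is_ascii_uppercase()
            && prev.map_or(false, |p| p.is_ascii_lowercase() || p.is_ascii_digit());
        // Acronym boundary: "HTTPServer" splits before the 'S'.
        let acronym_end = c.is_ascii_uppercase()
            && prev.map_or(false, |p| p.is_ascii_uppercase())
            && next.map_or(false, |n| n.is_ascii_lowercase());
        if (lower_to_upper || acronym_end) && !current.is_empty() {
            parts.push(std::mem::take(&mut current));
        }
        current.push(c.to_ascii_lowercase());
    }
    if !current.is_empty() {
        parts.push(current);
    }
    // Only multi-part identifiers get an extra compound token.
    if parts.len() > 1 {
        let compound: String = parts.concat();
        parts.push(compound);
    }
    parts
}

/// Compare `split_identifier` against golden lines of the hypothetical form
/// `input<TAB>space-separated expected tokens`.
/// Returns one message per mismatch; an empty vector means parity.
pub fn check_golden(fixtures: &str) -> Vec<String> {
    let mut failures = Vec::new();
    for line in fixtures.lines().filter(|l| !l.trim().is_empty()) {
        let Some((input, expected)) = line.split_once('\t') else {
            failures.push(format!("malformed fixture line: {line:?}"));
            continue;
        };
        let expected: Vec<&str> = expected.split_whitespace().collect();
        let actual = split_identifier(input);
        if actual != expected {
            failures.push(format!("{input}: expected {expected:?}, got {actual:?}"));
        }
    }
    failures
}
```

Under these assumptions, a phase gate would run check_golden over fixtures exported from the TypeScript tests and fail the merge when the returned list is non-empty.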
Scope
The rewrite is delivered in dependency-ordered phases (leaf-first, each verifiable against golden fixtures):
- P1 (Pure core): identifier-aware tokenization (camelCase/PascalCase/snake_case split + lowercased compound), ranking (weighting, boosting, penalties), and BM25 scoring math, plus RRF fusion (k=60) and adaptive alpha (0.3 symbol / 0.5 NL).
- P2 (Chunking): tree-sitter AST chunking with line-fallback (1500-char target, MIN_CHUNK_SIZE=50, RECURSION_DEPTH=500), and the extension→language map.
- P3 (Indexing): dense embeddings via model2vec-rs, file walking via the ignore crate (.gitignore + .cspignore, default-ignore dirs), BM25 sparse index, and the content-hash cache in the global ~/.csp/index/ (per ADR-0002).
- P4 (Search): the hybrid pipeline (semantic + BM25 → RRF → multi-chunk file boost → query-type boost → top-k rerank with path penalties + file-saturation decay 0.5) and the CspIndex-equivalent core API (fromPath/fromGit/search/findRelated/save/load).
- P5 (CLI): the csp binary subcommands (search/index/find-related/mcp/init/savings/clear) with flags (--top-k/--content/--index/--agent), plus ~/.csp/savings.jsonl telemetry.
- P6 (MCP): the MCP server via rmcp, exposing the search and find_related tools, launched by csp mcp.
- P7 (Distribution): Biome-style multi-channel distribution: cross-compiled release binaries (GitHub Releases), an npm wrapper package preserving the bunx @pleaseai/csp entrypoint, and the Homebrew tap; plus README/README.ko updates.
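The RRF step named in P1 and P4 can be sketched as below. This is standard Reciprocal Rank Fusion with the k=60 constant from the list above. The function name rrf_fuse, the 1-based rank convention, the unweighted sum, and the tie-break by id are assumptions for illustration; how the existing implementation combines RRF with the adaptive alpha is not shown here and would need to be checked against the TypeScript source.

```rust
use std::collections::HashMap;

/// RRF constant taken from the Scope list (k=60).
pub const RRF_K: f64 = 60.0;

/// Fuse ranked lists of chunk ids (best first) with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per id, with rank starting at 1
/// (an assumption; some implementations start at 0).
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by descending score; break ties by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a semantic ranking [a, b, c] with a BM25 ranking [c, a] places a first (it is near the top of both lists), then c, then b. A deterministic tie-break matters here because the parity gate compares ranking order, and HashMap iteration order is not stable.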
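Success Criteria
- […] bunx @pleaseai/csp … with no change to the documented commands.
- […] (fmt + clippy -D warnings + test) passes on every phase's merge.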
Constraints
- No behavioral change relative to semble / the TypeScript port — observable outputs must match (this is a rewrite, not a redesign).
- Public CLI + MCP surface is preserved: subcommand names, flags, MCP tool names, the bunx @pleaseai/csp entrypoint, the ~/.csp/ paths, and the global index-cache model (ADR-0002) carry over unchanged.
- Phased, parity-gated delivery: each phase merges only when its golden-fixture equivalence checks pass; the TypeScript implementation stays authoritative until full parity.
- GitHub Actions third-party actions remain SHA-pinned; the Rust toolchain is pinned via rust-toolchain.toml.
Out of Scope
- The JS-importable library API (import { CspIndex }), which is deferred behind a future napi-rs seam; the csp core crate is designed as that seam (ADR-0003).
- Any new search/ranking features or behavior improvements beyond what the TypeScript implementation already does.
- Removal/retirement of the TypeScript src/, which happens in a separate cleanup once parity is confirmed, not within this track.
- New language grammars or embedding models beyond those the current implementation supports.