feat: wire up CspIndex orchestrator + global index cache (#18) #21
…../tokens.ts
- T001: drop local Chunk/SearchResult/tokenize duplicates; import from ./types.ts and ./tokens.ts
- attach toDict to every SearchResult creation point (search/_searchSemantic/_searchBm25) via makeResult/chunkToDict so results satisfy the ../types.ts SearchResult contract consumed by utils.formatResults
- re-export Chunk/SearchResult to preserve the module's public surface
- Tests: bun test src/search.test.ts → 24 pass (4 new toDict tests); full suite 320 pass, no new failures
[/please:implement]
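A minimal TypeScript sketch of the makeResult pattern described above: one factory attaches `toDict` to every result, so a formatter that calls `r.toDict()` can never receive a result without it. The trimmed Chunk/SearchResult shapes, the `score` key and the `location` string format are assumptions; only the snake_case keys `file_path`/`start_line`/`end_line`/`location` are named elsewhere in this PR (T0A(e)).

```typescript
// Hypothetical, trimmed-down shapes; the real Chunk and SearchResult in
// ../types.ts carry more fields than this sketch.
interface Chunk {
  filePath: string;
  startLine: number;
  endLine: number;
}

interface SearchResult {
  chunk: Chunk;
  score: number;
  toDict(): Record<string, unknown>;
}

// Single factory used at every creation point, so a SearchResult can never be
// built without toDict.
function makeResult(chunk: Chunk, score: number): SearchResult {
  return {
    chunk,
    score,
    toDict: () => ({
      // snake_case wire keys; the `location` format and the `score` key are assumptions
      file_path: chunk.filePath,
      start_line: chunk.startLine,
      end_line: chunk.endLine,
      location: `${chunk.filePath}:${chunk.startLine}-${chunk.endLine}`,
      score,
    }),
  };
}

// Stand-in for utils.formatResults, which calls r.toDict() on every result.
function formatResults(results: SearchResult[]): string {
  return JSON.stringify(results.map((r) => r.toDict()), null, 2);
}

const out = formatResults([makeResult({ filePath: "src/a.ts", startLine: 1, endLine: 4 }, 0.9)]);
console.log(out);
```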
…ce discovery [/please:implement]
[/please:implement]
- T002: Bm25Index.build(...) replaces private-ctor `new Bm25Index()` + .index()
- T002: drop invalid 2nd ctor arg `model.dim` (SelectableBasicBackend ctor is (vectors, BasicArgs); dim is derived)
- T002: ContentType.CODE replaces non-existent ContentType.Code
- Tests: full suite unchanged at baseline (320 pass / 5 fail / 3 errors); no new errors
- typecheck: 3 targeted create.ts errors removed; 4 pre-existing async/Chunk-type errors remain (T003 scope)
[/please:implement]
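The first bullet swaps `new Bm25Index()` + `.index()` for a static factory. The sketch below shows that pattern on a hypothetical toy class (`Bm25IndexSketch`, with an assumed tokenizer), not the real sparse.ts: with a private constructor an un-indexed instance can never exist, and the compiler rejects any leftover `new` call.

```typescript
// Toy stand-in for Bm25Index: the constructor is private, so the only way to
// get an instance is the static build() factory, which returns it already
// indexed. The tokenizer and field names are assumptions, not sparse.ts.
class Bm25IndexSketch {
  readonly #docs: string[][];
  readonly #docFreq: Map<string, number>;

  private constructor(docs: string[][], docFreq: Map<string, number>) {
    this.#docs = docs;
    this.#docFreq = docFreq;
  }

  static build(texts: string[]): Bm25IndexSketch {
    const docs = texts.map((t) => t.toLowerCase().split(/\W+/).filter(Boolean));
    const docFreq = new Map<string, number>();
    for (const tokens of docs) {
      // Count each term once per document (document frequency).
      for (const term of new Set(tokens)) {
        docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
      }
    }
    return new Bm25IndexSketch(docs, docFreq);
  }

  get size(): number {
    return this.#docs.length;
  }

  docFrequency(term: string): number {
    return this.#docFreq.get(term) ?? 0;
  }
}

// `new Bm25IndexSketch(...)` outside the class is a compile-time error.
const idx = Bm25IndexSketch.build(["async fetch data", "fetch the index"]);
```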
[/please:implement]
- T002: walkFiles is an async generator, so iterate it with for await; await chunkSource before spreading; pass detectLanguage's string|undefined as language ?? null
- Remove the local Chunk interface from dense.ts/sparse.ts and consolidate on the Chunk from ../types.ts. Keep a re-export for compatibility with existing importers (dense.test.ts, sparse.test.ts)
- Tests: src/indexing 81 pass / 2 fail / 2 errors (same as baseline, 0 new failures); the remaining failures are out-of-scope tests that depend on makeStubModel/DEFAULT_CONTENT not being exported
[/please:implement]
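A sketch of the three call-site fixes in this commit, with hypothetical stand-ins for walkFiles, detectLanguage and chunkSource (the real ones read files and return the ../types.ts Chunk). Only the calling pattern is meant to match.

```typescript
// Hypothetical stand-ins; only the calling pattern mirrors the commit.
interface ChunkLike {
  filePath: string;
  text: string;
  language: string | null;
}

async function* walkFiles(paths: string[]): AsyncGenerator<string> {
  for (const p of paths) yield p; // the real walker reads the filesystem
}

function detectLanguage(path: string): string | undefined {
  if (path.endsWith(".ts")) return "typescript";
  if (path.endsWith(".py")) return "python";
  return undefined;
}

async function chunkSource(path: string, language: string | null): Promise<ChunkLike[]> {
  return [{ filePath: path, text: `contents of ${path}`, language }];
}

async function collectChunks(paths: string[]): Promise<ChunkLike[]> {
  const chunks: ChunkLike[] = [];
  // An async generator has no Symbol.iterator, so a plain for...of would throw.
  for await (const path of walkFiles(paths)) {
    // Normalize undefined to the null the Chunk type expects.
    const language = detectLanguage(path) ?? null;
    // Await first: spreading a Promise is a type error (and a TypeError at runtime).
    chunks.push(...(await chunkSource(path, language)));
  }
  return chunks;
}
```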
[/please:implement]
- T003: fromPath loads model + createIndexFromPath into a populated
CspIndex; constructor takes {model, semanticIndex, bm25Index, chunks,
modelPath, root, content}; add stats getter and DEFAULT_CONTENT export.
- save/loadFromDisk are throwing stubs (T006/T007) so cli.ts type-checks
in Phase A — removes the 2 pre-existing CspIndex.save / loadFromDisk
type errors at cli.ts:288/415.
- loadModel re-exported as [model, modelPath] tuple (mcp destructures
[, modelPath]); search/findRelated stay sync stubs ([]) for T004.
- Typecheck: no new non-TS5097 errors in index.ts; mcp/server.ts + cli.ts
error set unchanged vs baseline (minus the 2 stub errors now fixed).
- Tests: full suite unchanged at baseline (320 pass / 5 fail / 3 errors).
[/please:implement]
[/please:implement]
- T004: search() guards blank query / topK<=0 / empty index / empty filter selector, then delegates to search.ts (sync, no await, for MCP parity).
- T004: findRelated() re-embeds the seed and queries the semantic backend, excluding the seed chunk.
- Align index.test.ts setup to real module APIs (makeStubModel(4), Bm25Index.build, SelectableBasicBackend(vecs), ContentType.CODE); behavioral assertions unchanged.
- Export makeStubModel from dense.ts.
- Tests: passed (src/indexing/index.test.ts T004 cases green)
[/please:implement]
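A minimal sketch of the guard order in T004, assuming each guard short-circuits to an empty array (the T0A(a) notes confirm that an empty index yields `search() === []`; the other three return values are an assumption). `rankByTermOverlap` is a hypothetical toy scorer standing in for the real BM25 + dense pipeline, and `contentTypes` is a hypothetical name for the filter selector.

```typescript
interface Hit {
  id: number;
  score: number;
}

interface SearchOptions {
  topK?: number;
  // Hypothetical content filter; an explicitly empty selector matches nothing.
  contentTypes?: string[];
}

// Toy ranker standing in for the real BM25 + dense pipeline in search.ts.
function rankByTermOverlap(query: string, docs: string[], topK: number): Hit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return docs
    .map((doc, id) => ({ id, score: terms.filter((t) => doc.toLowerCase().includes(t)).length }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Synchronous on purpose: the MCP server calls search() without await.
function search(docs: string[], query: string, opts: SearchOptions = {}): Hit[] {
  const topK = opts.topK ?? 10;
  if (query.trim() === "") return []; // blank query
  if (topK <= 0) return []; // nothing requested
  if (docs.length === 0) return []; // empty index
  if (opts.contentTypes !== undefined && opts.contentTypes.length === 0) return []; // empty filter selector
  return rankByTermOverlap(query, docs, topK);
}
```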
[/please:implement]
Only findRelated reads seed.chunk, so relax the seed parameter to Chunk | { chunk: Chunk }.
This resolves the type error (index.test.ts:127) that forced a full SearchResult (toDict required).
Implement the path validation the JSDoc promised: a missing path throws 'Path does not exist' and a file throws 'Path is not a directory' (index.test.ts fromPath error cases).
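A sketch of both changes, assuming a trimmed Chunk shape: the `Seed` union lets callers pass a bare Chunk or any wrapper with a `chunk` field, and `assertDirectory` (a hypothetical helper name) throws the two messages quoted in the commit. Mapping only `ENOENT` to 'Path does not exist' is a choice of this sketch, so permission errors are not misreported as missing paths.

```typescript
import { stat } from "node:fs/promises";

// Hypothetical minimal Chunk; the real one lives in ../types.ts.
interface Chunk {
  filePath: string;
  startLine: number;
  endLine: number;
}

// findRelated only reads the chunk, so accept a bare Chunk or anything that
// wraps one (such as a SearchResult) without demanding toDict.
type Seed = Chunk | { chunk: Chunk };

function seedChunk(seed: Seed): Chunk {
  return "chunk" in seed ? seed.chunk : seed;
}

// Path validation with the two error messages named in the commit.
async function assertDirectory(path: string): Promise<void> {
  const info = await stat(path).catch((err: { code?: string }) => {
    if (err.code === "ENOENT") throw new Error("Path does not exist");
    throw err; // permission problems etc. propagate unchanged
  });
  if (!info.isDirectory()) throw new Error("Path is not a directory");
}
```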
- T0A(b): add chunkToDict/chunkFromDict/chunkLocation/searchResultToDict and ChunkDictInput, a camelCase round-trip layer for disk persistence, distinct from search.ts's snake_case wire-format toDict
- T0A(c): align types.test.ts/index.test.ts to uppercase enum keys (ContentType.CODE/DOCS/CONFIG, CallType.SEARCH/FIND_RELATED) per the CLAUDE.md contract; string-value assertions unchanged
- Tests: src/types.test.ts 13 pass, src/index.test.ts 4 pass
[/please:implement]
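The point of T0A(b) is that two serializers coexist. The sketch below (hypothetical field set; the real Chunk has more) shows a camelCase persistence pair that must round-trip losslessly through JSON next to an output-only snake_case wire form. ChunkDictInput and chunkLocation are left out because their shapes are not described here.

```typescript
// Hypothetical Chunk shape; the real one in ../types.ts may have more fields.
interface Chunk {
  content: string;
  filePath: string;
  startLine: number;
  endLine: number;
  language: string | null;
}

// Persistence layer (chunks.json): camelCase keys, read back by chunkFromDict.
type ChunkDict = {
  content: string;
  filePath: string;
  startLine: number;
  endLine: number;
  language: string | null;
};

function chunkToDict(c: Chunk): ChunkDict {
  return { content: c.content, filePath: c.filePath, startLine: c.startLine, endLine: c.endLine, language: c.language };
}

function chunkFromDict(d: ChunkDict): Chunk {
  return { content: d.content, filePath: d.filePath, startLine: d.startLine, endLine: d.endLine, language: d.language };
}

// Wire layer (search.ts toDict): snake_case, output only, never parsed back.
function toWireDict(c: Chunk): Record<string, unknown> {
  return { file_path: c.filePath, start_line: c.startLine, end_line: c.endLine };
}

const original: Chunk = { content: "let x = 1;", filePath: "a.ts", startLine: 3, endLine: 9, language: "typescript" };
const restored = chunkFromDict(JSON.parse(JSON.stringify(chunkToDict(original))));
```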
- T0A(d): add a `documents` getter returning per-document token counts (one entry per indexed doc) so callers can assert corpus size without reaching into private #state; satisfies create.test.ts's bm25Index.documents.length === chunks.length
- Tests: src/indexing/create.test.ts 5 pass, src/indexing/sparse.test.ts 17 pass
[/please:implement]
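A sketch of the `documents` getter idea on a hypothetical toy index: state sits in an ECMAScript `#private` field that tests cannot reach, and the getter returns a freshly derived array with one token count per document.

```typescript
// Everything except the `documents` name is hypothetical.
class SparseIndexSketch {
  #state: { docTokens: string[][] } = { docTokens: [] };

  add(text: string): void {
    this.#state.docTokens.push(text.split(/\s+/).filter(Boolean));
  }

  // One entry per document, so tests can check documents.length === chunks.length
  // without touching #state. A fresh array is returned each time, so callers
  // cannot mutate internal state through it.
  get documents(): readonly number[] {
    return this.#state.docTokens.map((tokens) => tokens.length);
  }
}

const sparse = new SparseIndexSketch();
sparse.add("export const a = 1");
sparse.add("return a");
```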
- T0A(a): replace process-wide `mock.module('../indexing/index.ts')`
(which Bun applies irreversibly at module-load and leaks the stub into
../indexing/index.test.ts) with static-method reassignment on the real
CspIndex class, restored in afterAll. Stub fromPath/fromGit return real
empty CspIndex instances so `instanceof CspIndex` and empty-index
`search() === []` still hold
- Tests: server.test.ts 19 pass; indexing/index.test.ts no longer
regresses in full-suite (stats + fromPath recovered, both orderings)
[/please:implement]
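A sketch of the DI seam that replaces `mock.module`, using a hypothetical `CspIndexSketch` class in place of CspIndex. In the real test file the restore happens in `afterAll`; here a `try/finally` wrapper (`withStubbedFromPath`, also hypothetical) plays that role so the example runs without bun:test.

```typescript
class CspIndexSketch {
  readonly chunks: string[];

  constructor(chunks: string[] = []) {
    this.chunks = chunks;
  }

  static async fromPath(path: string): Promise<CspIndexSketch> {
    throw new Error(`real indexing of ${path} is too slow for unit tests`);
  }

  search(query: string): string[] {
    return this.chunks.filter((c) => c.includes(query));
  }
}

// Swap the static factory on the real class, then always put it back, so the
// stub cannot leak into other test files the way a module mock can.
async function withStubbedFromPath<T>(run: () => Promise<T>): Promise<T> {
  const original = CspIndexSketch.fromPath;
  // The stub returns a real (empty) instance, so `instanceof` and the
  // empty-index search() === [] behavior still hold.
  CspIndexSketch.fromPath = async () => new CspIndexSketch();
  try {
    return await run();
  } finally {
    CspIndexSketch.fromPath = original; // what afterAll does in the test file
  }
}
```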
- T0A(e): the fake index result lacked toDict, so utils.formatResults (r.toDict()) threw and "csp search formats non-empty results as JSON" failed. Add a toDict matching search.ts's snake_case wire format (file_path/start_line/end_line/location)
- Tests: src/cli.test.ts 43 pass
[/please:implement]
- Progress: T0A done (351 pass / 3 fail / 0 error; baseline 330/12/1)
- Decision Log: two separate chunk serialization layers; server.test.ts DI seam over mock.module
- Surprises: Bun 1.3.10 mock.module is irreversible across files
[/please:implement]
- T005: shallow-clone url into a 0700 temp dir, reuse the fromPath pipeline, clean up in finally
- Tests: passed
[/please:implement]
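A sketch of the temp-dir lifecycle behind fromGit, assuming a hypothetical `withPrivateTempDir` helper and a `csp-clone-` prefix: the directory is created private, the work (clone, then the fromPath pipeline) runs inside it, and `finally` removes it on both success and failure.

```typescript
import { chmod, mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create a private temp dir, run `fn` inside it, and
// remove the dir in `finally` whether `fn` succeeds or throws.
async function withPrivateTempDir<T>(fn: (dir: string) => Promise<T>): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "csp-clone-"));
  try {
    // mkdtemp normally creates the dir as 0700 on POSIX already; chmod makes
    // the requirement explicit instead of relying on that.
    await chmod(dir, 0o700);
    return await fn(dir); // e.g. `git clone` into dir, then the fromPath pipeline
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```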
[/please:implement]
…ense)
- T006: save(dir) writes manifest.json (schemaVersion/contentHash/sourceId/content/modelId) + chunks.json (chunkToDict camelCase) + bm25.json (Bm25Index.save) + vectors.bin/args.json (SelectableBasicBackend.save)
- File names verified distinct (no collision); dense roundtrip bit-stable (no float drift, NFR-002); both STOP conditions checked, neither triggered
- Tests: save artifacts/manifest/chunks-format/determinism green; T007 loadFromDisk roundtrip still pending
[/please:implement]
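A sketch of the JSON half of save(dir), assuming the manifest field types shown (only the field names come from the commit). Writing through `JSON.stringify` with fixed indentation is what makes a byte-for-byte determinism check possible; bm25.json, vectors.bin and args.json are left to the real save methods.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Field names come from the commit; the value types are assumptions.
interface IndexManifest {
  schemaVersion: number;
  contentHash: string;
  sourceId: string;
  content: string[];
  modelId: string;
}

// Hypothetical helper covering manifest.json and chunks.json only.
async function saveJsonArtifacts(dir: string, manifest: IndexManifest, chunkDicts: object[]): Promise<void> {
  await mkdir(dir, { recursive: true });
  // JSON.stringify keeps insertion order, so the same input objects give
  // byte-identical files, which is what a determinism test compares.
  await writeFile(join(dir, "manifest.json"), JSON.stringify(manifest, null, 2) + "\n");
  await writeFile(join(dir, "chunks.json"), JSON.stringify(chunkDicts, null, 2) + "\n");
}
```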
…isk cache)
- T012: IndexCache in-memory miss now routes through a loadOrBuild seam (default loadOrBuildIndex), so mcp shares the ~/.csp/index/<key> disk cache and keys it identically to cli (omit ref/modelPath when absent).
- Watcher stays in-memory-evict only; disk content-hash invalidation owns disk reuse-vs-rebuild → single rebuild, no conflicting cache view.
- Add loadOrBuild DI seam to IndexCacheOptions; tests inject a stub to stay off the real ~/.csp home and the network.
- Tests: passed (396 pass / 0 fail)
[/please:implement]
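A sketch of the IndexCache miss path and the key rule described above. The `cacheKey` format, the `IndexCacheSketch` name and the retry-on-failure behavior are assumptions; what it mirrors from the commit is that misses go through an injectable `loadOrBuild`, that absent ref/modelPath are omitted from the key, and that the watcher only evicts from memory.

```typescript
interface CacheKeyParts {
  source: string;
  ref?: string;
  modelPath?: string;
}

// Absent parts are omitted (never written as "undefined"), so CLI and MCP
// derive the same key from the same inputs.
function cacheKey(p: CacheKeyParts): string {
  const parts = [p.source];
  if (p.ref !== undefined) parts.push(`ref=${p.ref}`);
  if (p.modelPath !== undefined) parts.push(`model=${p.modelPath}`);
  return parts.join("|");
}

interface IndexCacheOptions<I> {
  // In the PR this defaults to loadOrBuildIndex; tests inject a stub.
  loadOrBuild: (parts: CacheKeyParts) => Promise<I>;
}

class IndexCacheSketch<I> {
  readonly #entries = new Map<string, Promise<I>>();
  readonly #loadOrBuild: (parts: CacheKeyParts) => Promise<I>;

  constructor(opts: IndexCacheOptions<I>) {
    this.#loadOrBuild = opts.loadOrBuild;
  }

  get(parts: CacheKeyParts): Promise<I> {
    const key = cacheKey(parts);
    const cached = this.#entries.get(key);
    if (cached) return cached;
    // In-memory miss: go through the shared disk-cache seam.
    const pending = this.#loadOrBuild(parts);
    this.#entries.set(key, pending);
    // Forget failed loads so the next call can retry.
    pending.catch(() => {
      if (this.#entries.get(key) === pending) this.#entries.delete(key);
    });
    return pending;
  }

  // Watcher hook: evict from memory only. The next get() goes back through
  // loadOrBuild, whose content-hash check decides reuse vs rebuild.
  evict(parts: CacheKeyParts): void {
    this.#entries.delete(cacheKey(parts));
  }
}
```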
[/please:implement]
Confirmed that the upstream baseline (eacbe43) has no cache.py (in-memory LRU only), then documented the rationale for adopting the global ~/.csp/index content-hash cache and recorded the divergence from CLAUDE.md's repo-local .csp/ (T013)
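A minimal sketch of one way to compute such a content hash: SHA-256 over sorted relative paths and file bytes, with length prefixes so that different path/content splits cannot produce the same byte stream. `computeContentHashSketch` is hypothetical; the real computeContentHash may include other inputs (the model, for instance).

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// The digest changes when a file is added, removed, renamed or edited, and
// is independent of the order in which paths are passed in.
async function computeContentHashSketch(root: string, relPaths: string[]): Promise<string> {
  const hash = createHash("sha256");
  for (const rel of [...relPaths].sort()) {
    const bytes = await readFile(join(root, rel));
    // Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    hash.update(`${Buffer.byteLength(rel)}:${rel}`);
    hash.update(`${bytes.length}:`);
    hash.update(bytes);
  }
  return hash.digest("hex");
}
```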
- T014: clear index/all removes the global ~/.csp/index cache root only; clear all runs index removal + clearSavings() as two independent actions, never an rmtree of ~/.csp. Adds resolveIndexRoot/clearIndexCache helpers with AC-015 safety guards (target must end with `index`, not be the home).
- Tests: passed
[/please:implement]
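A sketch of the AC-015 guards with the later review hardening folded in (realpath first, then a direct-child check against the resolved home). `home` stands in for the resolved `~/.csp`; the function name and boolean return value are hypothetical. Nothing here touches `savings.jsonl`: clear all is meant to call a separate savings clear.

```typescript
import { realpath, rm } from "node:fs/promises";
import { basename, dirname, join } from "node:path";

// Only <home>/index may be deleted. realpath runs before the checks so a
// symlinked `index` cannot redirect the delete outside the cache tree.
async function clearIndexCacheSketch(home: string): Promise<boolean> {
  const realHome = await realpath(home);
  let realTarget: string;
  try {
    realTarget = await realpath(join(realHome, "index"));
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return false; // nothing cached
    throw err;
  }
  if (realTarget === realHome) throw new Error("refusing to delete the csp home itself");
  if (basename(realTarget) !== "index") throw new Error(`refusing to delete ${realTarget}`);
  if (dirname(realTarget) !== realHome) {
    throw new Error(`refusing to delete ${realTarget}: not a direct child of ${realHome}`);
  }
  await rm(realTarget, { recursive: true, force: true });
  return true;
}
```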
…etion) [/please:implement]
- T015: align README (EN/KO) + CLAUDE.md to ADR 0002 global ~/.csp/index/ auto-cache
- clear index/all now describe real disk-cache deletion; drop stale "not wired up" note
- document search/find-related auto-cache + --index override; .load → loadFromDisk
- Tests: 404 pass / 0 fail (no code change)
[/please:implement]
Up to standards ✅ 🟢 Issues

| Metric | Results |
|---|---|
| Complexity | 289 |
| Duplication | 28 |
Code Review
This pull request successfully implements the CspIndex orchestrator, enabling hybrid search, index persistence, and a global on-disk content-hash cache under ~/.csp/index/. It also integrates this caching mechanism into both the CLI and MCP server, updates documentation (including ADR 0002), and aligns the test suite. The review feedback correctly identifies several performance concerns where synchronous I/O and process execution (such as spawnSync, statSync, readFileSync, and writeFileSync) are used inside asynchronous functions, recommending a transition to asynchronous APIs to prevent blocking the event loop. However, one comment incorrectly assumed a function was already asynchronous and has been filtered out.
…e perf, manifest validation)
- cloneShallow: reject git ref starting with '-' (CWE-88 arg injection)
- clearIndexCache: realpathSync before guard so a symlinked index can't redirect the delete outside the cache tree (CWE-61)
- tryReuse: compare manifest contentHash before the full loadFromDisk on the cache-miss path (skip loading an index we discard)
- loadFromDisk: parseManifest() runtime-validates content/sourceId/modelId, not just schemaVersion (on-disk trust boundary)
- regression tests for all four

Review: PR #21 (security/performance/types aspects)
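A sketch of the last two bullets: parse manifest.json as `unknown`, validate every field the loader depends on, and compare `contentHash` before paying for a full load. The `SCHEMA_VERSION` value, the field types, the `canReuse` name and the choice to treat a corrupt manifest as "rebuild" are all assumptions.

```typescript
const SCHEMA_VERSION = 1; // assumed value

interface IndexManifest {
  schemaVersion: number;
  contentHash: string;
  sourceId: string;
  content: string[];
  modelId: string;
}

// On-disk trust boundary: nothing from the file is trusted until checked.
function parseManifest(text: string): IndexManifest {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) throw new Error("manifest: not an object");
  const m = raw as Record<string, unknown>;
  if (m.schemaVersion !== SCHEMA_VERSION) {
    throw new Error(`manifest: unsupported schemaVersion ${String(m.schemaVersion)}`);
  }
  for (const key of ["contentHash", "sourceId", "modelId"] as const) {
    if (typeof m[key] !== "string") throw new Error(`manifest: ${key} must be a string`);
  }
  if (!Array.isArray(m.content) || !m.content.every((c) => typeof c === "string")) {
    throw new Error("manifest: content must be a string array");
  }
  return m as unknown as IndexManifest;
}

// Cheap reuse check: read only the manifest and compare hashes before a full
// loadFromDisk of an index that would be discarded anyway.
function canReuse(manifestText: string, currentHash: string): boolean {
  try {
    return parseManifest(manifestText).contentHash === currentHash;
  } catch {
    return false; // corrupt or foreign manifest: rebuild
  }
}
```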
3 issues found and verified against the latest diff
… fixes)

gemini-code-assist (event-loop blocking):
- cloneShallow: spawnSync → execFile/promisify (async git clone)
- fromPath/save/loadFromDisk/collectSourceFiles/tryReuse: node:fs → node:fs/promises

cubic-dev-ai:
- clearIndexCache: enforce realIndexRoot is the DIRECT child of resolved home (symlink to another .../index dir is now refused) [P1]
- fromGit: re-root the index at the git URL (not the deleted temp checkout) so persisted manifest sourceId is stable [P2]
- agent-memory doc: mark IndexManifest as now validated by parseManifest [P2]
- regression tests for direct-child symlink + fromGit root

PR #21 review (gemini-code-assist, cubic-dev-ai)
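A sketch of the async, injection-safe clone, combining the `-` ref check from the previous review round with `execFile` under `promisify`. The `cloneShallowSketch` and `assertSafeRef` names and the exact flag set are assumptions; `--depth 1`, `--branch` and the `--` separator are standard `git clone` options.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// A ref starting with "-" would be parsed by git as an option (CWE-88), so
// reject it before it ever reaches argv.
function assertSafeRef(ref: string): void {
  if (ref.startsWith("-")) throw new Error(`invalid git ref: ${ref}`);
}

// Non-blocking shallow clone: execFile takes an argument array (no shell), and
// awaiting it keeps the event loop free, unlike spawnSync.
async function cloneShallowSketch(url: string, dest: string, ref?: string): Promise<void> {
  const args = ["clone", "--depth", "1"];
  if (ref !== undefined) {
    assertSafeRef(ref);
    args.push("--branch", ref);
  }
  // "--" ends option parsing, so url and dest are never read as flags either.
  args.push("--", url, dest);
  await execFileAsync("git", args);
}
```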
Summary
Wires up the `CspIndex` orchestrator and implements csp's index persistence / caching model. Closes #18.

Before this PR, `CspIndex.fromPath`/`fromGit` threw `not yet implemented` and `save()`/`loadFromDisk()` did not exist: `csp search`/`index` did not run end-to-end and there was no index cache. This PR makes the full pipeline work and decides the storage model (ADR 0002).

What changed (4 phases, 15 tasks + T0A)
Phase A: orchestrator wiring
- `search.ts` unified onto `../types.ts`/`../tokens.ts`; create/dense/sparse type + async fixes
- `CspIndex.fromPath` (path validation) + `fromGit` (shallow clone into a `0700` temp dir, always cleaned up)
- `search`/`findRelated` wired to the ranking pipeline (kept synchronous: the MCP server calls them without `await`)

Phase B: explicit-path persistence
- `save(dir)` → `manifest.json` + `chunks.json` + `bm25.json` + `vectors.bin` + `args.json`
- `loadFromDisk(dir)` → lossless roundtrip (dense save→load verified bit-stable, `maxDiff=0`), schema-version + missing-artifact validation
- `index -o` / `search|find-related --index` respect explicit paths

Phase C: global content-hash auto-cache
- `cache.ts`: `resolveCacheDir`, `computeContentHash`, `loadOrBuildIndex`, `0700` hardening
- `csp search`/`find-related` without `--index` auto-cache into `~/.csp/index/<key>`; content-hash invalidation
- `IndexCache` routes builds through the same `loadOrBuildIndex` (CLI↔MCP share one disk cache; watcher = in-memory evict only, no double rebuild)
- `~/.csp/index/` chosen; documents that the upstream baseline (eacbe43) has no `cache.py` (in-memory LRU only), so this is a csp-original design

Phase D: clear + docs
- `csp clear index` deletes only `~/.csp/index/` (guarded: refuses any path that isn't the `index` child of the home; the `~/.csp/` root and `savings.jsonl` are structurally protected); `clear all` = index + savings as two independent actions

T0A (pre-existing test-suite debt, separately approved)
- Fixed the `server.test.ts` global `mock.module` leak, added canonical camelCase serialization helpers to `types.ts`, aligned scaffold tests to the documented `CODE`/`DOCS`/`CONFIG` contract

Verification
- `bun test`: 404 pass / 0 fail / 0 error (was 316/5/3 at branch base)
- `TS5097` errors (the project-wide `.ts`-extension baseline is unchanged); the 33 remaining are pre-existing scaffold/MCP-SDK typings
- `clear index` safety asserted by a real-temp-home test: home dir + `savings.jsonl` survive

Notes
- `jiti` missing, a pre-existing infra gap)
- `.please/docs/decisions/0002-index-storage-cache-model.md`

Summary by cubic
Wires up the `CspIndex` orchestrator with on-disk persistence and a global `~/.csp/index` content-hash cache. Adds async I/O for non-blocking ops and tighter cache-deletion guards; `csp index`/`search`/`find-related` run end to end and reuse indexes across the CLI and MCP server.

New Features
- `CspIndex.fromPath` (path validation) and `fromGit` (shallow clone into a `0700` temp dir; rejects refs starting with `-`; manifest `sourceId` is the git URL, not the temp checkout).
- `search` and `findRelated` to the ranking pipeline (sync), excluding the seed for related queries.
- `save(dir)` and `loadFromDisk(dir)` with manifest/chunks/BM25/vectors artifacts and validation; dense round-trips are bit-stable; manifest parsing validates `schemaVersion` and key fields.
- `~/.csp/index/<key>` with `resolveCacheDir`, `computeContentHash`, and `loadOrBuildIndex`; `csp search`/`find-related` auto-cache when `--index` is omitted; MCP shares the same cache. Cache reuse now checks the manifest content hash before load, and async I/O is used across clone/save/load/hash to avoid event-loop blocking.
- `csp clear index` deletes only `~/.csp/index/` with a strengthened guard: `realpath` and “direct child of `~/.csp`” enforcement to block symlink escapes; `clear all` runs index and savings clears separately.

Migration
- `CspIndex.loadFromDisk()` (replaces `.load()` in docs).
- `csp index` requires `-o <dir>` for explicit persistence; `--index <path>` on `search`/`find-related` loads that path and bypasses auto-cache.
- `ContentType.CODE | DOCS | CONFIG` (string values unchanged).

Written for commit 3f5a5b2. Summary will update on new commits.
Verification Checklist
- `bun test` passes: 407 pass / 0 fail / 0 error
- `csp index -o` → `csp search --index` path wiring + test (cli.test.ts roundtrip)
- After `csp clear index`, `~/.csp/index` is deleted and `savings.jsonl` is preserved (AC-015 guard + real temp-home test)