feat: wire up CspIndex orchestrator + global index cache (#18) #21

Merged
amondnet merged 70 commits into main from amondnet/wire-up-cspindex-orchestrator-decide-index-persi
Jun 18, 2026

@amondnet amondnet commented Jun 17, 2026

Summary

Wires up the CspIndex orchestrator and implements csp's index persistence / caching model. Closes #18.

Before this PR, CspIndex.fromPath/fromGit threw not yet implemented and save()/loadFromDisk() did not exist — csp search/index did not run end-to-end and there was no index cache. This PR makes the full pipeline work and decides the storage model (ADR 0002).

What changed (4 phases, 15 tasks + T0A)

Phase A — orchestrator wiring

  • search.ts unified onto ../types.ts/../tokens.ts; create/dense/sparse type + async fixes
  • CspIndex.fromPath (path validation) + fromGit (shallow clone into 0700 temp dir, always cleaned up)
  • search/findRelated wired to the ranking pipeline (kept synchronous — MCP server calls without await)

Phase B — explicit-path persistence

  • save(dir) → manifest.json + chunks.json + bm25.json + vectors.bin + args.json
  • loadFromDisk(dir) → lossless roundtrip (dense save→load verified bit-stable, maxDiff=0), schema-version + missing-artifact validation
  • CLI index -o / search|find-related --index respect explicit paths
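
The manifest validation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `parseManifest`: the field names come from the PR text, but the field types, the supported schema version value, and the error messages are assumptions.

```typescript
// Hypothetical manifest shape: field names follow the PR text, types are assumed.
interface IndexManifest {
  schemaVersion: number;
  contentHash: string;
  sourceId: string;
  content: string[];
  modelId: string;
}

const SUPPORTED_SCHEMA_VERSION = 1; // assumed value, not taken from the PR

// Validate untrusted JSON read from disk before treating it as a manifest.
function parseManifestSketch(raw: unknown): IndexManifest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("manifest must be a JSON object");
  }
  const m = raw as Record<string, unknown>;
  if (m.schemaVersion !== SUPPORTED_SCHEMA_VERSION) {
    throw new Error(`unsupported schemaVersion: ${String(m.schemaVersion)}`);
  }
  for (const key of ["contentHash", "sourceId", "modelId"] as const) {
    if (typeof m[key] !== "string" || m[key] === "") {
      throw new Error(`manifest.${key} must be a non-empty string`);
    }
  }
  if (!Array.isArray(m.content) || !m.content.every((c) => typeof c === "string")) {
    throw new Error("manifest.content must be an array of strings");
  }
  return {
    schemaVersion: SUPPORTED_SCHEMA_VERSION,
    contentHash: m.contentHash as string,
    sourceId: m.sourceId as string,
    content: m.content as string[],
    modelId: m.modelId as string,
  };
}
```

Rejecting a bad manifest up front keeps a stale or hand-edited cache directory from being loaded as if it were a valid index.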

Phase C — global content-hash auto-cache

  • New cache.ts: resolveCacheDir, computeContentHash, loadOrBuildIndex, 0700 hardening
  • csp search/find-related without --index auto-cache into ~/.csp/index/<key>; content-hash invalidation
  • MCP IndexCache routes builds through the same loadOrBuildIndex (CLI↔MCP share one disk cache; watcher = in-memory evict only, no double rebuild)
  • ADR 0002 — global ~/.csp/index/ chosen; documents that upstream baseline (eacbe43) has no cache.py (in-memory LRU only), so this is a csp-original design
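
Content-hash invalidation, as described above, can be illustrated with a small sketch. It assumes the hash is SHA-256 over path-sorted file contents; the PR does not state the algorithm or the inputs of `computeContentHash`, so `SourceFile` and `contentHashSketch` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory view of a source file (the real code reads from disk).
interface SourceFile {
  path: string;
  content: string;
}

// Hash files in sorted path order so the result does not depend on walk order.
function contentHashSketch(files: SourceFile[]): string {
  const hash = createHash("sha256");
  const sorted = [...files].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const file of sorted) {
    // Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    hash.update(`${file.path.length}:${file.path}`);
    hash.update(`${file.content.length}:${file.content}`);
  }
  return hash.digest("hex");
}
```

Any edit to a file changes the digest, so a cached index whose manifest carries the old digest is treated as stale and rebuilt.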

Phase D — clear + docs

  • csp clear index deletes only ~/.csp/index/ (guarded: refuses any path that isn't the index child of the home — ~/.csp/ root and savings.jsonl are structurally protected); clear all = index + savings as two independent actions
  • README.md / README.ko.md / CLAUDE.md synced to the wired behavior
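
The deletion guard for `csp clear index` can be sketched as a pure path check. This is an illustration under assumptions: `isDeletableIndexRoot` is a hypothetical name, and both paths are assumed to have already been resolved through realpath, which the PR's later commits do before applying the guard.

```typescript
import * as path from "node:path";

// Returns true only when `target` is exactly the `index` child of the csp home.
// Assumes both paths were already passed through realpath by the caller.
function isDeletableIndexRoot(cspHome: string, target: string): boolean {
  const home = path.resolve(cspHome);
  const resolved = path.resolve(target);
  return (
    resolved !== home &&
    path.basename(resolved) === "index" &&
    path.dirname(resolved) === home
  );
}
```

Checking the parent directory as well as the basename means a directory merely named `index` elsewhere on disk is refused, and the home directory itself can never be the delete target.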

T0A (pre-existing test-suite debt, separately approved)

  • Fixed server.test.ts global mock.module leak, added canonical camelCase serialization helpers to types.ts, aligned scaffold tests to the documented CODE/DOCS/CONFIG contract

Verification

  • bun test: 404 pass / 0 fail / 0 error (was 316/5/3 at branch base)
  • Typecheck: no new non-TS5097 errors (the project-wide .ts-extension baseline is unchanged); the 33 remaining are pre-existing scaffold/MCP-SDK typings
  • clear index safety asserted by a real-temp-home test: home dir + savings.jsonl survive

Notes

  • ESLint not run in-worktree (jiti missing — pre-existing infra gap)
  • Storage decision recorded in .please/docs/decisions/0002-index-storage-cache-model.md

Summary by cubic

Wires up the CspIndex orchestrator with on‑disk persistence and a global ~/.csp/index content‑hash cache. Adds async I/O for non‑blocking ops and tighter cache‑deletion guards; csp index/search/find-related run end to end and reuse indexes across the CLI and MCP server.

  • New Features

    • Implemented CspIndex.fromPath (path validation) and fromGit (shallow clone into a 0700 temp dir; rejects refs starting with -; manifest sourceId is the git URL, not the temp checkout).
    • Wired search and findRelated to the ranking pipeline (sync), excluding the seed for related queries.
    • Added save(dir) and loadFromDisk(dir) with manifest/chunks/BM25/vectors artifacts and validation; dense round‑trips are bit‑stable; manifest parsing validates schemaVersion and key fields.
    • Introduced disk cache: ~/.csp/index/<key> with resolveCacheDir, computeContentHash, and loadOrBuildIndex; csp search/find-related auto‑cache when --index is omitted; MCP shares the same cache. Cache reuse now checks the manifest content hash before load, and async I/O is used across clone/save/load/hash to avoid event‑loop blocking.
    • csp clear index deletes only ~/.csp/index/ with a strengthened guard: realpath and “direct child of ~/.csp” enforcement to block symlink escapes; clear all runs index and savings clears separately.
    • Recorded the storage decision in ADR 0002; docs/specs updated.
  • Migration

    • Use CspIndex.loadFromDisk() (replaces .load() in docs).
    • csp index requires -o <dir> for explicit persistence; --index <path> on search/find-related loads that path and bypasses auto‑cache.
    • Enum keys are now ContentType.CODE | DOCS | CONFIG (string values unchanged).

Written for commit 3f5a5b2. Summary will update on new commits.

Verification Checklist

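  • Each task went through RED-GREEN-REFACTOR and bun test passes: 407 pass / 0 fail / 0 error
  • Typecheck gate (0 new errors in changed modules): the 33 non-TS5097/80007 errors are unchanged (all pre-existing); T001/T002 reduce the pre-existing errors in create.ts/search.ts
  • End-to-end SC-001a: csp index -o → csp search --index path wired up, with a test (cli.test.ts roundtrip)
  • Auto-cache SC-002/003: the second run skips indexing (asserts the build seam is not called on a cache hit); a file change invalidates the content hash and triggers a rebuild (cache.test.ts)
  • SC-004: csp clear index → deletes ~/.csp/index and preserves savings.jsonl (AC-015 guard + real temp-home test)
  • SC-005: the README (EN/KO) descriptions of clear, index, and savings match the actual behavior
  • Code review (spec/security/perf/types/code/errors/tests) passed: 4 findings fixed with regression tests; iteration 2 re-verification was clean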

amondnet added 30 commits June 17, 2026 22:58
CspIndex orchestrator wiring and index persistence/caching model (#18)
spec.md + plan.md (15 tasks, 4 phases) + initialize the workspace tracks directory

Refs #18
…../tokens.ts

- T001: drop local Chunk/SearchResult/tokenize duplicates; import from ./types.ts and ./tokens.ts
- attach toDict to every SearchResult creation point (search/_searchSemantic/_searchBm25)
  via makeResult/chunkToDict so results satisfy the ../types.ts SearchResult contract
  consumed by utils.formatResults
- re-export Chunk/SearchResult to preserve the module's public surface
- Tests: bun test src/search.test.ts → 24 pass (4 new toDict tests); full suite 320 pass, no new failures

[/please:implement]
[/please:implement]
- T002: Bm25Index.build(...) replaces private-ctor `new Bm25Index()` + .index()
- T002: drop invalid 2nd ctor arg `model.dim` (SelectableBasicBackend ctor is (vectors, BasicArgs); dim is derived)
- T002: ContentType.CODE replaces non-existent ContentType.Code
- Tests: full suite unchanged at baseline (320 pass / 5 fail / 3 errors); no new errors
- typecheck: 3 targeted create.ts errors removed; 4 pre-existing async/Chunk-type errors remain (T003 scope)

[/please:implement]
- T002: walkFiles is an async generator, so iterate it with for await; await
  chunkSource before spreading; pass detectLanguage's string|undefined as language ?? null
- Remove the local Chunk interface from dense.ts/sparse.ts and unify on Chunk from
  ../types.ts. Keep the re-export for compatibility with existing importers (dense.test.ts, sparse.test.ts)
- Tests: src/indexing 81 pass / 2 fail / 2 errors (same as baseline, 0 new failures)
  · The remaining failures are out-of-scope tests that depend on the unexported makeStubModel/DEFAULT_CONTENT

[/please:implement]
- T003: fromPath loads model + createIndexFromPath into a populated
  CspIndex; constructor takes {model, semanticIndex, bm25Index, chunks,
  modelPath, root, content}; add stats getter and DEFAULT_CONTENT export.
- save/loadFromDisk are throwing stubs (T006/T007) so cli.ts type-checks
  in Phase A — removes the 2 pre-existing CspIndex.save / loadFromDisk
  type errors at cli.ts:288/415.
- loadModel re-exported as [model, modelPath] tuple (mcp destructures
  [, modelPath]); search/findRelated stay sync stubs ([]) for T004.
- Typecheck: no new non-TS5097 errors in index.ts; mcp/server.ts + cli.ts
  error set unchanged vs baseline (minus the 2 stub errors now fixed).
- Tests: full suite unchanged at baseline (320 pass / 5 fail / 3 errors).

[/please:implement]
- T004: search() guards blank query / topK<=0 / empty index / empty filter
  selector, then delegates to search.ts (sync, no await — MCP parity).
- T004: findRelated() re-embeds the seed and queries the semantic backend,
  excluding the seed chunk.
- Align index.test.ts setup to real module APIs (makeStubModel(4),
  Bm25Index.build, SelectableBasicBackend(vecs), ContentType.CODE);
  behavioral assertions unchanged.
- Export makeStubModel from dense.ts.
- Tests: passed (src/indexing/index.test.ts T004 cases green)

[/please:implement]
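
The T004 guards can be pictured with a short sketch. It is a simplified stand-in, not the PR's `search()`: `Hit`, `guardedSearch`, and the injected `rank` callback are hypothetical, and the empty filter selector guard is omitted.

```typescript
// Hypothetical result shape for the sketch.
interface Hit {
  id: number;
  score: number;
}

// Return early for inputs that cannot produce results, then delegate
// synchronously to the ranking function (no await, matching the sync contract).
function guardedSearch(
  query: string,
  topK: number,
  indexSize: number,
  rank: (query: string, topK: number) => Hit[],
): Hit[] {
  if (query.trim() === "" || topK <= 0 || indexSize === 0) {
    return [];
  }
  return rank(query, topK);
}
```

Keeping the function synchronous matters here because, per the PR summary, the MCP server calls it without `await`.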
Only findRelated reads seed.chunk, so relax the seed parameter to Chunk | { chunk: Chunk }.
Resolves the type error (index.test.ts:127) that forced a SearchResult (which requires toDict).
Implement the path validation promised by the JSDoc: a nonexistent path throws 'Path does not exist',
and a file throws 'Path is not a directory' (index.test.ts fromPath error cases).
- T0A(b): add chunkToDict/chunkFromDict/chunkLocation/searchResultToDict
  and ChunkDictInput — camelCase round-trip layer for disk persistence,
  distinct from search.ts's snake_case wire-format toDict
- T0A(c): align types.test.ts/index.test.ts to uppercase enum keys
  (ContentType.CODE/DOCS/CONFIG, CallType.SEARCH/FIND_RELATED) per the
  CLAUDE.md contract; string-value assertions unchanged
- Tests: src/types.test.ts 13 pass, src/index.test.ts 4 pass

[/please:implement]
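
The two serialization layers from T0A(b) can be sketched side by side. The `Chunk` fields and the `location` format below are assumptions for illustration; only the camelCase versus snake_case split and the key names file_path/start_line/end_line/location come from the commit messages.

```typescript
// Assumed minimal Chunk shape; the real type in types.ts may carry more fields.
interface Chunk {
  content: string;
  filePath: string;
  startLine: number;
  endLine: number;
}

// Disk layer: camelCase keys, designed to round-trip losslessly.
function chunkToDictSketch(chunk: Chunk): Record<string, unknown> {
  return {
    content: chunk.content,
    filePath: chunk.filePath,
    startLine: chunk.startLine,
    endLine: chunk.endLine,
  };
}

function chunkFromDictSketch(dict: Record<string, unknown>): Chunk {
  const { content, filePath, startLine, endLine } = dict;
  if (
    typeof content !== "string" ||
    typeof filePath !== "string" ||
    typeof startLine !== "number" ||
    typeof endLine !== "number"
  ) {
    throw new Error("malformed chunk record");
  }
  return { content, filePath, startLine, endLine };
}

// Wire layer: snake_case keys for search output; the location format is assumed.
function toWireDictSketch(chunk: Chunk, score: number): Record<string, unknown> {
  return {
    file_path: chunk.filePath,
    start_line: chunk.startLine,
    end_line: chunk.endLine,
    location: `${chunk.filePath}:${chunk.startLine}-${chunk.endLine}`,
    score,
  };
}
```

Keeping the layers separate lets the on-disk format evolve under a schema version without changing what search consumers see.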
- T0A(d): add `documents` getter returning per-document token counts
  (one entry per indexed doc) so callers can assert corpus size without
  reaching into private #state; satisfies create.test.ts's
  bm25Index.documents.length === chunks.length
- Tests: src/indexing/create.test.ts 5 pass, src/indexing/sparse.test.ts 17 pass

[/please:implement]
- T0A(a): replace process-wide `mock.module('../indexing/index.ts')`
  (which Bun applies irreversibly at module-load and leaks the stub into
  ../indexing/index.test.ts) with static-method reassignment on the real
  CspIndex class, restored in afterAll. Stub fromPath/fromGit return real
  empty CspIndex instances so `instanceof CspIndex` and empty-index
  `search() === []` still hold
- Tests: server.test.ts 19 pass; indexing/index.test.ts no longer
  regresses in full-suite (stats + fromPath recovered, both orderings)

[/please:implement]
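
The stubbing approach from T0A(a) can be sketched without any test framework. This is an illustration only: `IndexLike` stands in for the real CspIndex class, and try/finally plays the role that `afterAll` plays in the actual test file.

```typescript
// Stand-in for the real class; the static factory is what the tests replace.
class IndexLike {
  static fromPath(path: string): IndexLike {
    throw new Error(`would build a real index for ${path}`);
  }

  search(): string[] {
    return [];
  }
}

// Swap the static method for the duration of `run`, then always restore it,
// so the stub cannot leak into other test files the way a module mock can.
function withStubbedFromPath<T>(run: () => T): T {
  const original = IndexLike.fromPath;
  IndexLike.fromPath = () => new IndexLike();
  try {
    return run();
  } finally {
    IndexLike.fromPath = original;
  }
}
```

Because the stub returns a real instance of the class, `instanceof` checks and empty-index behavior still hold while the stub is active.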
- T0A(e): the fake index result lacked toDict, so utils.formatResults
  (r.toDict()) threw and "csp search formats non-empty results as JSON"
  failed. Add a toDict matching search.ts's snake_case wire format
  (file_path/start_line/end_line/location)
- Tests: src/cli.test.ts 43 pass

[/please:implement]
- Progress: T0A done (351 pass / 3 fail / 0 error; baseline 330/12/1)
- Decision Log: two separate chunk serialization layers; server.test.ts
  DI seam over mock.module
- Surprises: Bun 1.3.10 mock.module is irreversible across files

[/please:implement]
- T005: shallow-clone url into a 0700 temp dir, reuse fromPath pipeline, cleanup in finally
- Tests: passed

[/please:implement]
…ense)

- T006: save(dir) writes manifest.json (schemaVersion/contentHash/sourceId/
  content/modelId) + chunks.json (chunkToDict camelCase) + bm25.json
  (Bm25Index.save) + vectors.bin/args.json (SelectableBasicBackend.save)
- File names verified distinct (no collision); dense roundtrip bit-stable
  (no float drift, NFR-002) — both STOP conditions checked, not triggered
- Tests: save artifacts/manifest/chunks-format/determinism green; T007
  loadFromDisk roundtrip still pending

[/please:implement]
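
The bit-stable dense roundtrip can be illustrated with typed arrays alone. This sketch assumes vectors are stored as raw float32 bytes; the actual layout written by `SelectableBasicBackend.save` is not shown in the PR, and the helper names are hypothetical.

```typescript
// Copy the exact underlying bytes; nothing is formatted as text, so no rounding occurs.
function vectorsToBytes(vectors: Float32Array): Uint8Array {
  return new Uint8Array(vectors.buffer, vectors.byteOffset, vectors.byteLength).slice();
}

// Rebuild the float32 view over a fresh, correctly aligned copy of the bytes.
// Assumes the byte length is a multiple of 4 and the same host endianness.
function bytesToVectors(bytes: Uint8Array): Float32Array {
  const copy = bytes.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Largest absolute element-wise difference; 0 means the roundtrip was lossless.
function maxAbsDiff(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("length mismatch");
  }
  let max = 0;
  for (let i = 0; i < a.length; i += 1) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}
```

Serializing the same values through JSON text would also round-trip in practice, but a raw byte copy makes the no-drift property hold by construction rather than by number formatting rules.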
amondnet added 15 commits June 18, 2026 03:19
…isk cache)

- T012: IndexCache in-memory miss now routes through a loadOrBuild seam
  (default loadOrBuildIndex), so mcp shares the ~/.csp/index/<key> disk
  cache and keys it identically to cli (omit ref/modelPath when absent).
- Watcher stays in-memory-evict only; disk content-hash invalidation owns
  disk reuse-vs-rebuild → single rebuild, no conflicting cache view.
- Add loadOrBuild DI seam to IndexCacheOptions; tests inject a stub to
  stay off the real ~/.csp home and the network.
- Tests: passed (396 pass / 0 fail)

[/please:implement]
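
The reuse-or-rebuild decision behind the shared cache can be sketched with an in-memory map standing in for ~/.csp/index/<key>. Everything here is hypothetical (`CachedIndex`, `loadOrBuildSketch`, the synchronous signature); the real `loadOrBuildIndex` works against the file system and is asynchronous.

```typescript
// Stand-in for a persisted index: the manifest hash plus some payload.
interface CachedIndex {
  contentHash: string;
  payload: string;
}

// Reuse the cached entry only when its recorded hash matches the current
// content hash; otherwise build once and overwrite the entry.
function loadOrBuildSketch(
  key: string,
  currentHash: string,
  disk: Map<string, CachedIndex>,
  build: () => string,
): string {
  const cached = disk.get(key);
  if (cached !== undefined && cached.contentHash === currentHash) {
    return cached.payload;
  }
  const payload = build();
  disk.set(key, { contentHash: currentHash, payload });
  return payload;
}
```

Passing `build` in as a parameter mirrors the DI seam described above: tests can inject a counting stub and assert that a cache hit never calls it.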
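After confirming that the upstream baseline (eacbe43) has no cache.py (in-memory LRU only),
document the rationale for adopting the global ~/.csp/index content-hash cache and record the
divergence from the repo-local .csp/ described in CLAUDE.md (T013)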
- T014: clear index/all removes the global ~/.csp/index cache root only;
  clear all runs index removal + clearSavings() as two independent actions,
  never an rmtree of ~/.csp. Adds resolveIndexRoot/clearIndexCache helpers
  with AC-015 safety guards (target must end with `index`, not be the home).
- Tests: passed

[/please:implement]
- T015: align README (EN/KO) + CLAUDE.md to ADR 0002 global ~/.csp/index/ auto-cache
- clear index/all now describe real disk-cache deletion; drop stale "not wired up" note
- document search/find-related auto-cache + --index override; .load → loadFromDisk
- Tests: 404 pass / 0 fail (no code change)

[/please:implement]
@codacy-production

codacy-production Bot commented Jun 17, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 289 complexity · 28 duplication

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully implements the CspIndex orchestrator, enabling hybrid search, index persistence, and a global on-disk content-hash cache under ~/.csp/index/. It also integrates this caching mechanism into both the CLI and MCP server, updates documentation (including ADR 0002), and aligns the test suite. The review feedback correctly identifies several performance concerns where synchronous I/O and process execution (such as spawnSync, statSync, readFileSync, and writeFileSync) are used inside asynchronous functions, recommending a transition to asynchronous APIs to prevent blocking the event loop. However, one comment incorrectly assumed a function was already asynchronous and has been filtered out.

Comment thread src/indexing/index.ts Outdated
Comment thread src/indexing/index.ts Outdated
Comment thread src/indexing/index.ts
Comment thread src/indexing/index.ts
Comment thread src/indexing/index.ts
Comment thread src/indexing/cache.ts Outdated
Comment thread src/indexing/cache.ts
Comment thread src/indexing/cache.ts
amondnet added 2 commits June 18, 2026 04:10
…e perf, manifest validation)

- cloneShallow: reject git ref starting with '-' (CWE-88 arg injection)
- clearIndexCache: realpathSync before guard so a symlinked index can't
  redirect the delete outside the cache tree (CWE-61)
- tryReuse: compare manifest contentHash before the full loadFromDisk on the
  cache-miss path (skip loading an index we discard)
- loadFromDisk: parseManifest() runtime-validates content/sourceId/modelId,
  not just schemaVersion (on-disk trust boundary)
- regression tests for all four

Review: PR #21 (security/performance/types aspects)
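
The ref check from the cloneShallow fix can be sketched as an argument builder. This is an assumption-laden illustration: `buildCloneArgs` is a hypothetical helper, and the exact flags the PR passes to git are not shown; `--depth`, `--branch`, and the `--` separator are standard git clone options.

```typescript
// Build the argv for a shallow clone, refusing refs that git would parse as options.
function buildCloneArgs(url: string, dest: string, ref?: string): string[] {
  if (ref !== undefined && ref.startsWith("-")) {
    throw new Error(`invalid git ref: ${ref}`);
  }
  const args = ["clone", "--depth", "1"];
  if (ref !== undefined) {
    args.push("--branch", ref);
  }
  // "--" ends option parsing, so url and dest are always treated as positionals.
  args.push("--", url, dest);
  return args;
}
```

The resulting array is meant to be passed to an exec-style API that takes arguments as a list (such as `execFile`), so no shell ever interprets the values.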
@amondnet amondnet marked this pull request as ready for review June 17, 2026 19:13

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found and verified against the latest diff

Comment thread src/indexing/cache.ts Outdated
Comment thread src/indexing/index.ts Outdated
… fixes)

gemini-code-assist (event-loop blocking):
- cloneShallow: spawnSync → execFile/promisify (async git clone)
- fromPath/save/loadFromDisk/collectSourceFiles/tryReuse: node:fs → node:fs/promises
cubic-dev-ai:
- clearIndexCache: enforce realIndexRoot is the DIRECT child of resolved home
  (symlink to another .../index dir is now refused) [P1]
- fromGit: re-root the index at the git URL (not the deleted temp checkout) so
  persisted manifest sourceId is stable [P2]
- agent-memory doc: mark IndexManifest as now validated by parseManifest [P2]
- regression tests for direct-child symlink + fromGit root

PR #21 review (gemini-code-assist, cubic-dev-ai)
@amondnet amondnet self-assigned this Jun 18, 2026
@amondnet amondnet merged commit da945a5 into main Jun 18, 2026
2 checks passed
@amondnet amondnet deleted the amondnet/wire-up-cspindex-orchestrator-decide-index-persi branch June 18, 2026 00:13

Development

Successfully merging this pull request may close these issues.

Wire up CspIndex orchestrator + decide index persistence/caching model