fix(chunking): enable AST chunking by wiring real ALL_LANGUAGES (#28) #31
ALL_LANGUAGES in src/chunking/core.ts was an empty stub Set, so
isSupportedLanguage() always returned false and chunk-source.ts never
took the tree-sitter AST path -- every file silently fell back to line
chunking.
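A minimal sketch of the failure mode, assuming only what the description states: `isSupportedLanguage` is a membership test on `ALL_LANGUAGES`. The `selectChunker` helper is hypothetical and merely stands in for the gate in chunk-source.ts.

```typescript
// The stub as it behaved before the fix: an empty Set.
const ALL_LANGUAGES: ReadonlySet<string> = new Set<string>();

// Per the PR, isSupportedLanguage is a plain membership check.
function isSupportedLanguage(language: string): boolean {
  return ALL_LANGUAGES.has(language);
}

// Hypothetical stand-in for the gate in chunk-source.ts.
function selectChunker(language: string): 'ast' | 'lines' {
  return isSupportedLanguage(language) ? 'ast' : 'lines';
}

// With an empty Set, every language (known or not) falls back to line chunking.
for (const lang of ['typescript', 'python', 'not-a-real-language']) {
  console.log(lang, selectChunker(lang)); // always "lines"
}
```

Nothing throws and nothing logs a warning, which is why the fallback went unnoticed.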
Replace the stub with an import of the real, populated set from
src/indexing/files.ts. The dependency direction is one-way
(indexing -> chunking; files.ts imports nothing), so no cycle is
introduced -- confirmed by passing typecheck, the full test suite, and
runtime CLI index/search.
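The populated set can be sketched as follows, assuming it is built from the values of `EXTENSION_TO_LANGUAGE` as the PR body says. The three map entries are an illustrative subset (not the project's real table), and the two module boundaries are indicated only by comments so the snippet stays self-contained.

```typescript
// --- leaf data module (files.ts, later languages.ts): no internal imports ---
// Illustrative subset only; the project's real table is larger.
export const EXTENSION_TO_LANGUAGE: Readonly<Record<string, string>> = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.py': 'python',
};

// Deduplicated set of language names derived from the map's values.
export const ALL_LANGUAGES: ReadonlySet<string> = new Set(
  Object.values(EXTENSION_TO_LANGUAGE),
);

// --- chunking/core.ts: the unchanged membership check, now against real data ---
export function isSupportedLanguage(language: string): boolean {
  return ALL_LANGUAGES.has(language);
}

// Expectations mirroring the corrected core.test.ts.
console.log(isSupportedLanguage('typescript')); // true
console.log(isSupportedLanguage('python')); // true
console.log(isSupportedLanguage('not-a-real-language')); // false
```

Building the set from the map's values keeps a single source of truth, and the `Set` collapses duplicates such as `.ts` and `.tsx` both mapping to `typescript`.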
Also correct core.test.ts, which asserted the broken stub behavior
(isSupportedLanguage('typescript') === false). It now expects true for
known languages (typescript, python) and false for unknown ones.
Closes #28
CodeRabbit: No actionable comments were generated in the recent review. 🎉
Files selected for processing: 5 (2 skipped as trivial changes, 1 skipped as similar to previous changes).
Walkthrough: Language utilities are consolidated into a new centralized …
Changes: Language utilities consolidation and AST chunking enablement.
Estimated code review effort: 2 (Simple), ~8 minutes.
Pre-merge checks: 5 passed.
Warning: There were issues while running some tools (ESLint install failed). For unrecoverable errors, disable the tool in the CodeRabbit configuration.
Codacy: Up to standards ✅ (Issues: 🟢)
| Metric | Results |
|---|---|
| Complexity | 29 |
| Duplication | 0 |
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Code Review
This pull request replaces the stub for ALL_LANGUAGES in src/chunking/core.ts with an import from ../indexing/files.ts and updates the corresponding tests in src/chunking/core.test.ts to assert that 'typescript' and 'python' are supported languages. The review feedback points out that importing from ../indexing/files.ts introduces a package-level circular dependency between the chunking and indexing modules, and suggests extracting the shared language definitions into a separate, lower-level module to maintain clean architectural boundaries.
No issues found across 2 files
Architecture diagram
sequenceDiagram
participant Index as Indexing Pipeline
participant ChunkSource as chunk-source.ts
participant Core as core.ts
participant Files as src/indexing/files.ts
participant TreeSitter as Tree-sitter (optional)
participant ChunkLines as chunkLines (fallback)
Note over Index,ChunkLines: AST Chunking Flow
Index->>ChunkSource: chunkDocument(document, language)
ChunkSource->>Core: isSupportedLanguage(language)
Core->>Files: import ALL_LANGUAGES
Files-->>Core: populated Set of known languages (e.g., typescript, python)
Core-->>ChunkSource: true (known) or false (unknown)
alt isSupportedLanguage returns true
ChunkSource->>TreeSitter: chunk(document, language)
alt Tree-sitter parser available
TreeSitter-->>ChunkSource: AST chunks
ChunkSource-->>Index: return AST chunks
else Tree-sitter not installed
TreeSitter-->>ChunkSource: null
ChunkSource->>ChunkLines: chunkLines(document)
ChunkLines-->>ChunkSource: line-based chunks
ChunkSource-->>Index: return line chunks
end
else isSupportedLanguage returns false
ChunkSource->>ChunkLines: chunkLines(document)
ChunkLines-->>ChunkSource: line-based chunks
ChunkSource-->>Index: return line chunks
end
Note over Core,Files: No import cycle: indexing uses chunking, files is leaf module
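The flow in the diagram can be sketched in TypeScript as follows. This is illustrative only: the `Chunk` shape, the parameter lists, the window size, and the stubbed `chunk()` (which always returns `null`, simulating a missing tree-sitter parser) are assumptions, not the project's actual code.

```typescript
// Hypothetical chunk shape; the real project's type may differ.
interface Chunk {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  text: string;
}

const ALL_LANGUAGES: ReadonlySet<string> = new Set(['typescript', 'python']);

function isSupportedLanguage(language: string): boolean {
  return ALL_LANGUAGES.has(language);
}

// Stub for the AST chunker: returns null, as the PR says chunk() does
// when the tree-sitter parser is unavailable.
function chunk(_document: string, _language: string): Chunk[] | null {
  return null;
}

// Fallback: fixed-size windows of lines (the window size is an assumption).
function chunkLines(document: string, windowSize = 2): Chunk[] {
  const lines = document.split('\n');
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += windowSize) {
    const slice = lines.slice(i, i + windowSize);
    chunks.push({ startLine: i + 1, endLine: i + slice.length, text: slice.join('\n') });
  }
  return chunks;
}

// Gate on language support, try AST chunking, then fall through to line chunking.
function chunkDocument(document: string, language: string): Chunk[] {
  if (isSupportedLanguage(language)) {
    const astChunks = chunk(document, language);
    if (astChunks !== null) return astChunks;
  }
  return chunkLines(document);
}

const doc = 'const a = 1;\nconst b = 2;\nconst c = 3;';
console.log(chunkDocument(doc, 'typescript').length); // 2 (parser stubbed out)
```

Both `false` from the gate and `null` from the parser lead to the same line-chunking result, which is what makes the change safe without tree-sitter installed.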
…chunking↔indexing cycle (#28)
Move src/indexing/files.ts → src/languages.ts (a dependency-free leaf) and repoint chunking/core.ts, indexing/cache.ts, and indexing/create.ts at it. Previously, chunking/core.ts imported ALL_LANGUAGES from ../indexing/files.ts while indexing/create.ts imported ../chunking/chunk-source.ts, forming a package-level circular dependency (a violation of the Acyclic Dependencies Principle, ADP, flagged by gemini-code-assist on #31). languages.ts has no internal imports, so chunking and indexing now both depend on a lower-level leaf and the directory cycle is gone. No behavior change: pure module relocation plus import-path updates.
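To make the package-level cycle concrete, here is a small sketch that collapses file-level imports into directory-level edges and checks for a cycle with a depth-first search. The edge lists come from the commit message above; treating the first path segment under `src/` as the "package" and labelling top-level files `'(root)'` are assumptions made for this illustration.

```typescript
type Edge = [from: string, to: string];

// Map "chunking/core.ts" to "chunking"; top-level files like "languages.ts" become "(root)".
function packageOf(path: string): string {
  const slash = path.indexOf('/');
  return slash === -1 ? '(root)' : path.slice(0, slash);
}

// Collapse file-level imports into package-level edges, skipping self-edges.
function packageGraph(edges: Edge[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  for (const [from, to] of edges) {
    const a = packageOf(from);
    const b = packageOf(to);
    if (a === b) continue;
    const targets = graph.get(a) ?? new Set<string>();
    targets.add(b);
    graph.set(a, targets);
  }
  return graph;
}

// Three-state DFS (unvisited / visiting / done): reaching a "visiting" node means a cycle.
function hasCycle(graph: Map<string, Set<string>>): boolean {
  const state = new Map<string, 'visiting' | 'done'>();
  const visit = (node: string): boolean => {
    const s = state.get(node);
    if (s === 'visiting') return true;
    if (s === 'done') return false;
    state.set(node, 'visiting');
    for (const next of graph.get(node) ?? []) {
      if (visit(next)) return true;
    }
    state.set(node, 'done');
    return false;
  };
  return [...graph.keys()].some((node) => visit(node));
}

// Before: chunking/core.ts reached into indexing for ALL_LANGUAGES.
const before: Edge[] = [
  ['chunking/core.ts', 'indexing/files.ts'],
  ['indexing/create.ts', 'chunking/chunk-source.ts'],
];

// After: both packages depend on the dependency-free leaf languages.ts.
const after: Edge[] = [
  ['chunking/core.ts', 'languages.ts'],
  ['indexing/cache.ts', 'languages.ts'],
  ['indexing/create.ts', 'languages.ts'],
  ['indexing/create.ts', 'chunking/chunk-source.ts'],
];

console.log(hasCycle(packageGraph(before))); // true
console.log(hasCycle(packageGraph(after))); // false
```

At file granularity the original change had no cycle, since files.ts imported nothing; the cycle only appears once imports are grouped by directory, which is the level the review was concerned with.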
Summary
Fixes #28. AST-based chunking was silently disabled:
`ALL_LANGUAGES` in `src/chunking/core.ts` was an empty stub `Set`, so `isSupportedLanguage()` always returned `false` and `src/chunking/chunk-source.ts` never took the tree-sitter AST path; every file fell back to line chunking.
Change
- `src/chunking/core.ts`: removed the empty stub `const ALL_LANGUAGES` (plus its 3-line stub comment) and replaced it with `import { ALL_LANGUAGES } from '../indexing/files.ts'`, the real, populated set derived from `EXTENSION_TO_LANGUAGE`. `isSupportedLanguage` is unchanged (still `ALL_LANGUAGES.has(language)`).
- `src/chunking/core.test.ts`: the `isSupportedLanguage` test asserted the broken stub behavior (`'typescript'` → `false`). It now expects `true` for known languages (`typescript`, `python`) and `false` for `'not-a-real-language'`. This corrects a test that locked in broken stub behavior; it is not a weakening.
No import cycle
Dependency direction is one-way:
`indexing → chunking` (`indexing/create.ts` imports `chunking/chunk-source.ts`). `src/indexing/files.ts` is a pure leaf data module: it imports nothing. Confirmed safe by a passing typecheck, the full test suite, and the runtime CLI.
Graceful degradation
`chunk-source.ts` gates on `isSupportedLanguage`, then calls `chunk()`, which returns `null` when the tree-sitter parser is unavailable, falling through to `chunkLines`. Enabling AST chunking is therefore safe even without the tree-sitter dependency installed.
Verification
- `bun run typecheck`: pass
- `bun test`: 408 pass / 0 fail
- `bun run lint`: clean
- `csp index src` then `csp search "search" -k 3`: exit 0, non-empty results
Summary by cubic
Enables AST-based chunking by wiring the real `ALL_LANGUAGES` and moves the language tables to `src/languages.ts` to remove the chunking↔indexing cycle. Restores correct language support and closes #28.
- Replaced the stub `ALL_LANGUAGES` with the populated import from `src/languages.ts`, so `isSupportedLanguage()` recognizes real languages.
- Tests now expect `true` for known languages (`typescript`, `python`) and `false` for unknown ones.
Written for commit 9610883.
Summary by CodeRabbit