fix(chunking): enable AST chunking by wiring real ALL_LANGUAGES (#28)#31

Merged
amondnet merged 2 commits into main from fix/enable-ast-chunking-28
Jun 18, 2026

Conversation

@amondnet

@amondnet amondnet commented Jun 18, 2026

Contributor

Summary

Fixes #28. AST-based chunking was silently disabled: ALL_LANGUAGES in src/chunking/core.ts was an empty stub Set, so isSupportedLanguage() always returned false and src/chunking/chunk-source.ts never took the tree-sitter AST path — every file fell back to line chunking.

Change

  • src/chunking/core.ts: removed the empty stub const ALL_LANGUAGES (+ its 3-line stub comment) and replaced it with import { ALL_LANGUAGES } from '../indexing/files.ts' — the real, populated set derived from EXTENSION_TO_LANGUAGE. isSupportedLanguage is unchanged (still ALL_LANGUAGES.has(language)).
  • src/chunking/core.test.ts: the isSupportedLanguage test asserted the broken stub behavior ('typescript' → false). Corrected to expect true for known languages (typescript, python) and false for 'not-a-real-language'. This is a correction of a test that locked in broken stub behavior, not a weakening.
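
The difference between the stub and the real set can be sketched as follows. This is a minimal illustration, not the repository's code: the names `ALL_LANGUAGES`, `EXTENSION_TO_LANGUAGE`, and `isSupportedLanguage` come from this PR, but the table contents, `STUB_LANGUAGES`, and the helper `makeIsSupportedLanguage` are assumptions made up for the example.

```typescript
// Hypothetical subset of the extension table; the real table lives in the repository.
const EXTENSION_TO_LANGUAGE: Readonly<Record<string, string>> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".py": "python",
};

// Populated set derived from the table's values (duplicates collapse in the Set).
const ALL_LANGUAGES: ReadonlySet<string> = new Set(Object.values(EXTENSION_TO_LANGUAGE));

// Stand-in for the old stub: an empty set, so every lookup fails.
const STUB_LANGUAGES: ReadonlySet<string> = new Set<string>();

// Hypothetical helper so both behaviors can be compared side by side.
function makeIsSupportedLanguage(languages: ReadonlySet<string>) {
  return (language: string): boolean => languages.has(language);
}

const isSupportedLanguageStub = makeIsSupportedLanguage(STUB_LANGUAGES);
const isSupportedLanguage = makeIsSupportedLanguage(ALL_LANGUAGES);

console.log(isSupportedLanguageStub("typescript")); // false (the bug)
console.log(isSupportedLanguage("typescript")); // true
console.log(isSupportedLanguage("python")); // true
console.log(isSupportedLanguage("not-a-real-language")); // false
```

The function body is identical in both cases; only the set it closes over changes, which is why the fix is an import swap rather than a logic change.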

No import cycle

Dependency direction is one-way: indexing → chunking (indexing/create.ts imports chunking/chunk-source.ts). src/indexing/files.ts is a pure leaf data module — it imports nothing. Confirmed safe by passing typecheck, full test suite, and runtime CLI.

Graceful degradation

chunk-source.ts gates on isSupportedLanguage then calls chunk(), which returns null when the tree-sitter parser is unavailable, falling through to chunkLines. So enabling AST chunking is safe even without the tree-sitter dependency installed.
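
The gate-then-fallback flow described above could look roughly like this. It is a sketch under stated assumptions: the `Chunk` shape, the `Parser` type, the `parsers` map standing in for tree-sitter availability, and the signatures of `chunk`, `chunkLines`, and `chunkSource` are all hypothetical; only the function names `isSupportedLanguage`, `chunk`, and `chunkLines` appear in this PR.

```typescript
// Hypothetical chunk shape; the real type is not shown in this PR.
interface Chunk {
  text: string;
  startLine: number;
  endLine: number;
  kind: "ast" | "lines";
}

// Hypothetical stand-in for a tree-sitter parser for one language.
type Parser = (source: string) => Chunk[];

const ALL_LANGUAGES: ReadonlySet<string> = new Set(["typescript", "python"]);

function isSupportedLanguage(language: string): boolean {
  return ALL_LANGUAGES.has(language);
}

// AST path: returns null when no parser is available for the language.
function chunk(source: string, language: string, parsers: ReadonlyMap<string, Parser>): Chunk[] | null {
  const parser = parsers.get(language);
  return parser ? parser(source) : null;
}

// Fallback path: fixed-size windows of lines.
function chunkLines(source: string, maxLines = 50): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += maxLines) {
    const slice = lines.slice(i, i + maxLines);
    chunks.push({ text: slice.join("\n"), startLine: i + 1, endLine: i + slice.length, kind: "lines" });
  }
  return chunks;
}

// Gate on language support, try the AST path, fall through to line chunking on null.
function chunkSource(source: string, language: string, parsers: ReadonlyMap<string, Parser>): Chunk[] {
  if (isSupportedLanguage(language)) {
    const astChunks = chunk(source, language, parsers);
    if (astChunks !== null) return astChunks;
  }
  return chunkLines(source);
}

const source = "const a = 1\nconst b = 2";
const noParsers = new Map<string, Parser>();
const fakeParsers = new Map<string, Parser>([
  ["typescript", (s) => [{ text: s, startLine: 1, endLine: 2, kind: "ast" }]],
]);

const kinds = (chunks: Chunk[]): string => chunks.map((c) => c.kind).join(",");
console.log(kinds(chunkSource(source, "typescript", noParsers))); // lines
console.log(kinds(chunkSource(source, "typescript", fakeParsers))); // ast
console.log(kinds(chunkSource(source, "cobol", fakeParsers))); // lines
```

Because a missing parser yields `null` rather than an exception, widening the language set only changes behavior when a parser is actually present.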

Verification

  • bun run typecheck — pass
  • bun test — 408 pass / 0 fail
  • bun run lint — clean
  • E2E CLI smoke: csp index src then csp search "search" -k 3 — exit 0, non-empty results

Summary by cubic

Enables AST-based chunking by wiring the real ALL_LANGUAGES and moves language tables to src/languages.ts to remove the chunking↔indexing cycle. Restores correct language support and closes #28.

  • Bug Fixes
    • Replaced stubbed ALL_LANGUAGES with the populated import from src/languages.ts, so isSupportedLanguage() recognizes real languages.
    • Updated tests to expect true for known languages (typescript, python) and false for unknown ones.
    • Kept safe fallback: if a parser isn’t available, we still fall back to line chunking.

Written for commit 9610883. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed language support detection to correctly recognize TypeScript and Python as supported.
  • Tests
    • Updated language support detection tests to validate against the real shared set of supported languages.
  • Refactor
    • Aligned language-related imports across indexing and test code to use the shared language definitions.

ALL_LANGUAGES in src/chunking/core.ts was an empty stub Set, so
isSupportedLanguage() always returned false and chunk-source.ts never
took the tree-sitter AST path -- every file silently fell back to line
chunking.

Replace the stub with an import of the real, populated set from
src/indexing/files.ts. The dependency direction is one-way
(indexing -> chunking; files.ts imports nothing), so no cycle is
introduced -- confirmed by passing typecheck, the full test suite, and
runtime CLI index/search.

Also correct core.test.ts, which asserted the broken stub behavior
(isSupportedLanguage('typescript') === false). It now expects true for
known languages (typescript, python) and false for unknown ones.

Closes #28
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 69df42a6-69db-4735-8a10-7d6ef52cdbdb

📥 Commits

Reviewing files that changed from the base of the PR and between 5ca6a05 and 9610883.

📒 Files selected for processing (5)
  • src/chunking/core.ts
  • src/indexing/cache.ts
  • src/indexing/create.ts
  • src/languages.test.ts
  • src/languages.ts
✅ Files skipped from review due to trivial changes (2)
  • src/indexing/cache.ts
  • src/languages.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/chunking/core.ts

Walkthrough

Language utilities are consolidated into a new centralized languages.ts module. All existing references across indexing and testing modules redirect their imports to this new home. The core.ts stub for ALL_LANGUAGES is replaced with an import from languages.ts, enabling isSupportedLanguage() to recognize supported languages and activate AST chunking. Tests are updated to reflect the corrected behavior.

Changes

Language utilities consolidation and AST chunking enablement

| Layer | File(s) | Summary |
|---|---|---|
| Redirect language utility imports to languages.ts | src/indexing/cache.ts, src/indexing/create.ts, src/languages.test.ts | cache.ts and create.ts update getExtensions (and detectLanguage in create.ts) imports to come from ../languages.ts instead of ./files.ts. languages.test.ts updates its import source similarly. |
| Wire real ALL_LANGUAGES and update isSupportedLanguage tests | src/chunking/core.ts, src/chunking/core.test.ts | core.ts removes the empty ReadonlySet stub and imports ALL_LANGUAGES from ../languages.ts, enabling isSupportedLanguage() to return accurate results. core.test.ts updates assertions to expect typescript and python to return true. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 A stub once sat empty, cold, and forlorn,
Blocking the AST chunking light at dawn.
I gathered the languages, gave them a home,
In languages.ts, no longer to roam.
TypeScript and Python now parse with tree-sitter's might! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: enabling AST chunking by wiring the real ALL_LANGUAGES constant and fixing the root cause of the disabled functionality. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #28 are met: ALL_LANGUAGES is imported from the real source [#28], no import cycle introduced through modular restructuring [#28], test updated to expect true for supported languages [#28], and E2E verification confirms AST chunking functionality [#28]. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: test updates validate language support, import path consolidation to languages.ts eliminates circular dependencies, and no unrelated refactoring is present. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration.


@codacy-production

codacy-production Bot commented Jun 18, 2026


Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 29 complexity · 0 duplication

Metric Results
Complexity 29
Duplication 0

View in Codacy


@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the stub for ALL_LANGUAGES in src/chunking/core.ts with an import from ../indexing/files.ts and updates the corresponding tests in src/chunking/core.test.ts to assert that 'typescript' and 'python' are supported languages. The review feedback points out that importing from ../indexing/files.ts introduces a package-level circular dependency between the chunking and indexing modules, and suggests extracting the shared language definitions into a separate, lower-level module to maintain clean architectural boundaries.

Comment thread src/chunking/core.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant Index as Indexing Pipeline
    participant ChunkSource as chunk-source.ts
    participant Core as core.ts
    participant Files as src/indexing/files.ts
    participant TreeSitter as Tree-sitter (optional)
    participant ChunkLines as chunkLines (fallback)

    Note over Index,ChunkLines: AST Chunking Flow

    Index->>ChunkSource: chunkDocument(document, language)
    ChunkSource->>Core: isSupportedLanguage(language)
    Core->>Files: import ALL_LANGUAGES
    Files-->>Core: populated Set of known languages (e.g., typescript, python)
    Core-->>ChunkSource: true (known) or false (unknown)

    alt isSupportedLanguage returns true
        ChunkSource->>TreeSitter: chunk(document, language)
        alt Tree-sitter parser available
            TreeSitter-->>ChunkSource: AST chunks
            ChunkSource-->>Index: return AST chunks
        else Tree-sitter not installed
            TreeSitter-->>ChunkSource: null
            ChunkSource->>ChunkLines: chunkLines(document)
            ChunkLines-->>ChunkSource: line-based chunks
            ChunkSource-->>Index: return line chunks
        end
    else isSupportedLanguage returns false
        ChunkSource->>ChunkLines: chunkLines(document)
        ChunkLines-->>ChunkSource: line-based chunks
        ChunkSource-->>Index: return line chunks
    end

    Note over Core,Files: No import cycle: indexing uses chunking, files is leaf module

@amondnet amondnet self-assigned this Jun 18, 2026
…chunking↔indexing cycle (#28)

Move src/indexing/files.ts → src/languages.ts (a dependency-free leaf) and
repoint chunking/core.ts, indexing/cache.ts, indexing/create.ts at it.

Previously chunking/core.ts imported ALL_LANGUAGES from ../indexing/files.ts
while indexing/create.ts imports ../chunking/chunk-source.ts, forming a
package-level circular dependency (ADP violation, flagged by gemini-code-assist
on #31). languages.ts has no internal imports, so both chunking and indexing
now depend on a lower-level leaf and the directory cycle is gone.

No behavior change: pure module relocation + import-path updates.
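
The distinction between a module-level and a package-level cycle can be checked mechanically. The sketch below is illustrative only: the import graphs are a simplified reading of the paths named in this PR (the `chunk-source.ts` → `core.ts` edge in particular is an assumption), and `packageOf`, `packageEdges`, and `hasPackageCycle` are hypothetical helpers, not project code. It groups modules by their top-level directory under `src/` and runs a depth-first search for a cycle between those groups.

```typescript
type ImportGraph = Readonly<Record<string, readonly string[]>>;

// Top-level directory under src/, or the file path itself for root-level modules.
function packageOf(modulePath: string): string {
  const rel = modulePath.replace(/^src\//, "");
  const slash = rel.indexOf("/");
  return slash === -1 ? modulePath : rel.slice(0, slash);
}

// Collapse module-level imports into edges between packages, ignoring intra-package imports.
function packageEdges(graph: ImportGraph): Map<string, Set<string>> {
  const edges = new Map<string, Set<string>>();
  for (const [from, imports] of Object.entries(graph)) {
    const fromPkg = packageOf(from);
    const targets = edges.get(fromPkg) ?? new Set<string>();
    edges.set(fromPkg, targets);
    for (const to of imports) {
      const toPkg = packageOf(to);
      if (toPkg !== fromPkg) targets.add(toPkg);
    }
  }
  return edges;
}

// Depth-first search with a "visiting" mark to detect a back edge.
function hasPackageCycle(graph: ImportGraph): boolean {
  const edges = packageEdges(graph);
  const state = new Map<string, "visiting" | "done">();
  const visit = (pkg: string): boolean => {
    if (state.get(pkg) === "visiting") return true;
    if (state.get(pkg) === "done") return false;
    state.set(pkg, "visiting");
    for (const next of Array.from(edges.get(pkg) ?? [])) {
      if (visit(next)) return true;
    }
    state.set(pkg, "done");
    return false;
  };
  return Array.from(edges.keys()).some((pkg) => visit(pkg));
}

// First commit: chunking imports from indexing while indexing imports from chunking.
const before: ImportGraph = {
  "src/chunking/core.ts": ["src/indexing/files.ts"],
  "src/chunking/chunk-source.ts": ["src/chunking/core.ts"],
  "src/indexing/create.ts": ["src/chunking/chunk-source.ts", "src/indexing/files.ts"],
  "src/indexing/files.ts": [],
};

// Second commit: both packages depend on the leaf src/languages.ts.
const after: ImportGraph = {
  "src/chunking/core.ts": ["src/languages.ts"],
  "src/chunking/chunk-source.ts": ["src/chunking/core.ts"],
  "src/indexing/create.ts": ["src/chunking/chunk-source.ts", "src/languages.ts"],
  "src/indexing/cache.ts": ["src/languages.ts"],
  "src/languages.ts": [],
};

console.log(hasPackageCycle(before)); // true
console.log(hasPackageCycle(after)); // false
```

In the `before` graph no individual file participates in a cycle, since files.ts imports nothing, which is consistent with typecheck and tests passing on the first commit; the cycle only appears once files are grouped by directory.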
@amondnet amondnet merged commit 900de35 into main Jun 18, 2026
5 checks passed
@amondnet amondnet deleted the fix/enable-ast-chunking-28 branch June 18, 2026 08:13
@pleaseai-bot pleaseai-bot Bot mentioned this pull request Jun 18, 2026
@pleaseai-bot pleaseai-bot Bot mentioned this pull request Jun 20, 2026
Development

Successfully merging this pull request may close these issues.

AST chunking never runs — core.ts ALL_LANGUAGES is an empty stub set

1 participant