Rust port: expand tree-sitter grammar coverage to match upstream language pack

## Context

PR #37 closed two upstream-parity gaps in the Rust port (`crates/csp`): the ranking pipeline is now wired (TD-002) and the chunk length matches upstream (750). The remaining open gap tracked in [`semble.md` §6.2](.please/docs/references/semble.md) is **tree-sitter grammar coverage**.

## Problem

`language_for` in `crates/csp/src/chunking/core.rs` statically links only **14 grammars**:

> rust, python, javascript, typescript, tsx, go, java, c, cpp, ruby, json, bash, html, css

Upstream semble uses [`tree_sitter_language_pack`](https://github.com/Goldziher/tree-sitter-language-pack), which bundles grammars for nearly all languages. Meanwhile, `EXTENSION_TO_LANGUAGE` (`crates/csp/src/indexing/files.rs`, ~350 entries) recognizes far more languages than the curated grammar set covers.

**Effect**: in the Rust port, a file in a recognized-but-uncurated language (e.g. kotlin, swift, php, scala, lua, …) is still walked and indexed, but falls through to **line-based chunking** instead of AST chunking. The resulting chunk boundaries are coarser and less semantically aligned than the ones upstream produces. This is a real behavioral narrowing vs upstream, not just missing recognition.
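
The fall-through can be pictured with a minimal sketch. It assumes `language_for` takes a language name and returns an `Option`; the real signature in `core.rs` may differ. `Grammar`, `ChunkStrategy`, `CURATED`, and `strategy_for` are hypothetical names (`Grammar` stands in for `tree_sitter::Language`) so the example builds without any grammar crates.

```rust
/// Hypothetical stand-in for `tree_sitter::Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Grammar(&'static str);

/// Hypothetical: which chunker a file ends up with.
#[derive(Debug, PartialEq, Eq)]
enum ChunkStrategy {
    Ast(Grammar),
    Lines,
}

/// The grammars listed in the Problem section.
const CURATED: [&str; 14] = [
    "rust", "python", "javascript", "typescript", "tsx", "go", "java",
    "c", "cpp", "ruby", "json", "bash", "html", "css",
];

/// Assumed shape of `language_for`: `None` for anything outside the curated set.
fn language_for(name: &str) -> Option<Grammar> {
    CURATED.iter().copied().find(|&c| c == name).map(Grammar)
}

/// A recognized language without a grammar silently takes the line-based path.
fn strategy_for(name: &str) -> ChunkStrategy {
    match language_for(name) {
        Some(grammar) => ChunkStrategy::Ast(grammar),
        None => ChunkStrategy::Lines,
    }
}
```

Under these assumptions, `strategy_for("kotlin")` yields `ChunkStrategy::Lines`, while `strategy_for("rust")` yields `ChunkStrategy::Ast(..)`.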

## Proposed work

1. Decide the target set: full language-pack parity vs an expanded curated set (weigh binary size / build time of pulling in many `tree-sitter-*` crates).
2. Add the chosen grammar crates as dependencies, extend the `language_for` match arms, and keep `is_supported_language` in sync.
3. Add chunking tests for the newly-AST-supported languages (mirror the existing `core.rs` grammar tests).
4. Update `semble.md` §4.3 / §6.2 and the constants/coverage notes when the gap closes.
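
For step 2, one way to keep `is_supported_language` from drifting is to derive it from `language_for`, so that a new match arm updates both. A minimal sketch, assuming both functions take a language name (the real signatures may differ). `Grammar` is a hypothetical stand-in for `tree_sitter::Language`, and the `kotlin` arm is only an illustration, not a decision about the target set.

```rust
/// Hypothetical stand-in for `tree_sitter::Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Grammar(&'static str);

/// Assumed shape of `language_for`; a real arm would return the
/// language handle exported by the corresponding grammar crate.
fn language_for(name: &str) -> Option<Grammar> {
    match name {
        "rust" => Some(Grammar("rust")),
        "python" => Some(Grammar("python")),
        // Remaining existing arms are left out of this sketch.
        "kotlin" => Some(Grammar("kotlin")), // illustrative new arm
        _ => None,
    }
}

/// Derived from `language_for`, so the two cannot disagree.
fn is_supported_language(name: &str) -> bool {
    language_for(name).is_some()
}
```

If the existing `is_supported_language` has to stay a separate list for other reasons, a test that compares it against `language_for` for every known language name would give the same guarantee.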

## Acceptance criteria

- [ ] Target grammar set decided and documented (with rationale on binary-size trade-off).
- [ ] The selected languages return `Some(Language)` from `language_for` and are AST-chunked (verified by tests) rather than falling back to line-based chunking.
- [ ] Quality gate green: `cargo fmt --all && cargo clippy --all-targets --all-features -- -D warnings && cargo test --workspace`.
- [ ] `semble.md` §6.2 item removed / updated.
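
The second criterion could be checked with a table-driven helper that reports every language in the agreed target set that still has no grammar. This is a sketch only: `TARGET`, `missing_grammars`, and `Grammar` are hypothetical names, the entries in `TARGET` are placeholders until the set is decided, and `language_for` is assumed to take a language name.

```rust
/// Hypothetical stand-in for `tree_sitter::Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Grammar(&'static str);

/// Placeholder target set; the real one comes out of the decision step.
const TARGET: [&str; 4] = ["rust", "python", "kotlin", "swift"];

/// Assumed shape of `language_for`.
fn language_for(name: &str) -> Option<Grammar> {
    match name {
        "rust" => Some(Grammar("rust")),
        "python" => Some(Grammar("python")),
        "kotlin" => Some(Grammar("kotlin")),
        // "swift" is deliberately missing to show a failing entry.
        _ => None,
    }
}

/// Returns the target languages that would still fall back to line chunking.
fn missing_grammars(target: &[&'static str]) -> Vec<&'static str> {
    target
        .iter()
        .copied()
        .filter(|name| language_for(name).is_none())
        .collect()
}
```

A real test would assert that the returned list is empty. It would also need to chunk a small sample per language and confirm the AST path was taken, since a resolving grammar alone does not prove that.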

## References

- `.please/docs/references/semble.md` §4.3 (chunking) and §6.2 (open gaps)
- [ADR-0001 — native tree-sitter](.please/docs/decisions/0001-native-tree-sitter.md)
- Source of truth: Python upstream `MinishLab/semble` (`src/semble/chunking/`)

