feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON #57

Merged
flupkede merged 29 commits into master from features/support_additional_tree_sitter
May 20, 2026

Conversation

@flupkede

Owner

This PR adds five tree-sitter grammars (Bash, Ruby, PHP, YAML, and JSON) so that files in these languages can be parsed with tree-sitter.

Changes

  • Added tree-sitter-bash (0.25.1)
  • Added tree-sitter-ruby (0.23.1)
  • Added tree-sitter-php (0.24.2)
  • Added tree-sitter-yaml (0.7.2)
  • Added tree-sitter-json (0.24.8)

Files Modified

  • Cargo.toml - Added 5 new dependencies
  • Cargo.lock - Updated with new dependency entries
  • src/chunker/grammar.rs - Added grammar loading and tests for 5 new languages
  • src/file/language.rs - Added languages to is_supported_for_tree_sitter() and updated test
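
As a rough illustration of the `src/file/language.rs` change, the sketch below shows how five new languages might be wired into `is_supported_for_tree_sitter()`. The `Language` enum, its variants, the `from_extension` helper, and the extension mapping are assumptions made for this example; only the method name comes from the PR.

```rust
/// Hypothetical language enum; the real type in src/file/language.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust,
    Bash,
    Ruby,
    Php,
    Yaml,
    Json,
    PlainText,
}

impl Language {
    /// Map a file extension (without the dot) to a language.
    /// The extension list is an assumption for this sketch.
    pub fn from_extension(ext: &str) -> Language {
        match ext.to_ascii_lowercase().as_str() {
            "rs" => Language::Rust,
            "sh" | "bash" => Language::Bash,
            "rb" => Language::Ruby,
            "php" => Language::Php,
            "yml" | "yaml" => Language::Yaml,
            "json" => Language::Json,
            _ => Language::PlainText,
        }
    }

    /// True when a tree-sitter grammar is assumed to be available.
    pub fn is_supported_for_tree_sitter(self) -> bool {
        matches!(
            self,
            Language::Rust
                | Language::Bash
                | Language::Ruby
                | Language::Php
                | Language::Yaml
                | Language::Json
        )
    }
}
```

Listing the supported variants explicitly, rather than negating `PlainText`, means a newly added variant stays unsupported until someone opts it in.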

Validation

  • cargo fmt --check passed
  • cargo check passed
  • cargo clippy passed
  • cargo test --lib passed (394 passed, 12 ignored)

flupkede and others added 29 commits May 15, 2026 18:14
feat: add bash equivalents of QC and bump-version scripts
Replace `index_quiet()` background task (which opens its own LMDB handle) with
inline `SharedStores::new()` + background `force_reindex_with_stores()`. This
prevents the race where `get_or_open_stores()` tries to open a second LMDB
handle on the same path while `index_quiet` holds the first one.

Key changes:
- Open SharedStores inline (fast, just LMDB+Tantivy handle creation)
- Store as RepoState::Write immediately (prevents double-open from other handlers)
- Spawn force_reindex_with_stores + vector build + restart_fsw in background
- Add active_reindexes guard for concurrency protection
- Clean up config entry on failure
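
The `active_reindexes` guard mentioned above could look roughly like the following. This is a minimal sketch using only the standard library: the `ActiveReindexes` and `ReindexGuard` types, the `try_begin` method, and the RAII cleanup on drop are assumptions for illustration, not the PR's actual implementation.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the `active_reindexes` set.
#[derive(Clone, Default)]
pub struct ActiveReindexes {
    inner: Arc<Mutex<HashSet<String>>>,
}

/// Held for the duration of one reindex; releases the alias when dropped.
pub struct ReindexGuard {
    owner: ActiveReindexes,
    alias: String,
}

impl ActiveReindexes {
    /// Returns a guard if no reindex is running for `alias`, otherwise None.
    pub fn try_begin(&self, alias: &str) -> Option<ReindexGuard> {
        let mut set = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        if set.insert(alias.to_string()) {
            Some(ReindexGuard {
                owner: self.clone(),
                alias: alias.to_string(),
            })
        } else {
            None
        }
    }

    /// True while a guard for `alias` is alive.
    pub fn is_active(&self, alias: &str) -> bool {
        let set = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        set.contains(alias)
    }
}

impl Drop for ReindexGuard {
    fn drop(&mut self) {
        // Release the alias even if the lock was poisoned by a panicking task.
        let mut set = self.owner.inner.lock().unwrap_or_else(|e| e.into_inner());
        set.remove(&self.alias);
    }
}
```

A handler would call `try_begin(alias)` before spawning the background work and move the returned guard into that task, so the alias is released whether the reindex succeeds or fails.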
In serve mode, with_vector_store_read_for() no longer falls back to opening a standalone VectorStore when shared_stores read fails. This prevents "environment already opened" errors from LMDB.
- stop_fsw now returns None for Readonly repos instead of passing
  readonly stores into the force-reindex path. Callers fall through
  to try_open_stores(create_if_missing=true) which fails clearly
  when the write lock is held.
- touch_access doc: warmup_repo now calls touch_access after
  successful warmup; update doc to reflect current callers.
fix: centralize DB open/create logic, fix force-reindex on missing DB
When a project's repo name equals a Python package subdirectory name
(e.g. project 'aprimo_mcp', package at aprimo_mcp/config.py),
strip_alias_prefix was too aggressive: it turned 'aprimo_mcp/config.py'
into 'config.py', which then resolved to the wrong absolute path and
yielded no chunks.

Fix: extract outline_items_for_normalized helper and add a second lookup
in file_outline -- if the stripped path returns empty AND stripping changed
the path, retry with the original (un-stripped) relative path. This is
harmless for the normal case (alias != subdir name) and restores correct
behaviour for the aliased-subdir case.
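
The retry described above can be sketched as follows. The function names `strip_alias_prefix`, `outline_items_for_normalized`, and `file_outline` come from the commit message, but their signatures, the in-memory `Index` type, and the `my_pkg` example paths are assumptions for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical index: relative path -> outline item names.
pub type Index = HashMap<String, Vec<String>>;

/// Strip a leading "<alias>/" from a relative path, if present.
pub fn strip_alias_prefix<'a>(alias: &str, path: &'a str) -> &'a str {
    path.strip_prefix(alias)
        .and_then(|rest| rest.strip_prefix('/'))
        .unwrap_or(path)
}

/// Look up outline items for an already-normalized path.
pub fn outline_items_for_normalized(index: &Index, path: &str) -> Vec<String> {
    index.get(path).cloned().unwrap_or_default()
}

/// Try the stripped path first; if that finds nothing and stripping
/// actually changed the path, retry with the original path.
pub fn file_outline(index: &Index, alias: &str, path: &str) -> Vec<String> {
    let stripped = strip_alias_prefix(alias, path);
    let items = outline_items_for_normalized(index, stripped);
    if items.is_empty() && stripped != path {
        return outline_items_for_normalized(index, path);
    }
    items
}
```

With an index that contains `my_pkg/config.py` and an alias of `my_pkg`, the first lookup for `config.py` comes back empty and the second lookup with the un-stripped path finds the file.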

Tested: cargo check + cargo clippy clean.
- Improve doc comment: clarify that multi-store mode never returns Err
  (per-store I/O failures are logged+skipped), only single-store does.
- Replace unwrap_or_default() on fallback path with explicit match that
  emits a warn! log when the fallback itself fails, so operators can see
  the failure instead of silently getting an empty result.
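
A minimal sketch of the second point, assuming a fallback closure that returns `io::Result<Vec<String>>`. The `load_with_fallback` name and signature are hypothetical, and `eprintln!` stands in for the `warn!` macro so the example needs no external crate.

```rust
use std::io;

/// Run the fallback lookup and log, rather than silently swallow, a failure.
/// Before the change this would have been `fallback().unwrap_or_default()`.
pub fn load_with_fallback<F>(fallback: F) -> Vec<String>
where
    F: FnOnce() -> io::Result<Vec<String>>,
{
    match fallback() {
        Ok(items) => items,
        Err(err) => {
            // Stand-in for `warn!(...)`: make the failure visible to operators.
            eprintln!("warn: fallback lookup failed: {err}");
            Vec::new()
        }
    }
}
```

The caller still receives an empty result on failure, but the failure now leaves a trace in the logs.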
…ndexing status

Phase 2: pre-mark all queued candidates as CSharpIndexStatus::Indexing immediately
after evaluation (before semaphore acquisition), so repos waiting in the queue
already show "C#…" in the TUI instead of "C#·" or nothing.

Phase 3: set csharp_index_status = Indexing before batch-find-refs starts and
restore to Ready on completion (success or failure). Intentionally does NOT touch
active_reindexes — pre-warm should not block HTTP /reindex and should not override
the repo label (Warm/Open). This makes the 30-60s batch-find-refs operation visible
in the TUI as "C#…" instead of the silent "◐ warm C#·" that was shown before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enterprise C# repos with thousands of symbols + Phase-3 ref_cache can
exceed the old 64 MB LMDB limit, causing MDB_MAP_FULL errors when
find_refs_for_canonical_key writes to scip_ref_cache. This made
find_impact fail for all cached-ref writes on affected repos.

The 64 MB limit was listed as a known follow-up issue. Fix:
- Default map_size → 512 MB (virtual address space, no RAM cost)
- Add CODESEARCH_SCIP_LMDB_MAP_MB env-var override for tuning
- Constants SCIP_LMDB_DEFAULT_MAP_SIZE_MB / SCIP_LMDB_MAP_SIZE_MB_ENV
  follow the same pattern as the embed-cache and vector-DB size vars
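
One way the default and the env-var override might fit together is sketched below. The two constant names and the variable name `CODESEARCH_SCIP_LMDB_MAP_MB` come from the commit message; the constant types, the helper functions, and the choice to fall back to the default on an unparsable or zero value are assumptions.

```rust
/// Default SCIP LMDB map size in megabytes (512 MB per the commit message).
pub const SCIP_LMDB_DEFAULT_MAP_SIZE_MB: usize = 512;
/// Name of the env var that overrides the default.
pub const SCIP_LMDB_MAP_SIZE_MB_ENV: &str = "CODESEARCH_SCIP_LMDB_MAP_MB";

/// Resolve the map size in MB from a raw override value.
/// Missing, unparsable, or zero values fall back to the default (assumed policy).
pub fn map_size_mb_from(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&mb| mb > 0)
        .unwrap_or(SCIP_LMDB_DEFAULT_MAP_SIZE_MB)
}

/// Read the override from the environment and convert to bytes.
pub fn scip_map_size_bytes() -> usize {
    let raw = std::env::var(SCIP_LMDB_MAP_SIZE_MB_ENV).ok();
    map_size_mb_from(raw.as_deref()) * 1024 * 1024
}
```

Keeping the parsing in a function that takes `Option<&str>` lets tests cover the override logic without mutating process-wide environment state.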

Also update AGENTS.md: mark map_size as resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rride

Guards against accidental regression of SCIP_LMDB_DEFAULT_MAP_SIZE_MB (must
stay at 512 MB — the old 64 MB limit caused MDB_MAP_FULL on enterprise repos
once Phase-3 ref_cache writes were introduced).

Also verifies that the CODESEARCH_SCIP_LMDB_MAP_MB env-var override reaches
open_scip_env() without panicking: a 16 MB override on an empty DB must
return Ok(empty) from find_references.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_size

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…code removal

- add_repo_handler conflict branch: also unregister alias from repos.json when
  active_reindexes.insert returns false; previously the config entry was left
  orphaned (no open stores) until server restart

- try_open_stores: rename `create_if_missing` param to `allow_create` for
  clarity — the old name implied write-failure would fall back to readonly,
  which it does not when allow_create=true

- evaluate_csharp_rebuild: replace (bool, &'static str) return type with a
  proper RebuildDecision enum; removes the fragile string equality check
  `reason == "fresh, last_scip>=last_changed"` at the call site

- trigger_symbol_rebuild: document the known TUI-flash race when FSW-SCIP
  rebuild and HTTP /reindex fire concurrently for the same alias (DashSet
  idempotency means no data corruption; cosmetic-only)

- lmdb_registry: remove dead-code is_registered() and tracked_count() fns
  that were never called in production or tests

- AddRepoRequest: remove dead `global: bool` field (global indexing not
  implemented; serde will silently ignore the field if clients send it)

- Remove accidentally committed CTempbase_mcp.rs temp file
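
To illustrate the `evaluate_csharp_rebuild` item in the list above, here is a sketch of replacing a `(bool, &'static str)` return with an enum. Only the `RebuildDecision` and `evaluate_csharp_rebuild` names and the `last_scip >= last_changed` freshness condition come from the commit message; the variants, the `RebuildReason` type, and the timestamp parameters are hypothetical.

```rust
/// Why a rebuild is needed (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RebuildReason {
    NeverIndexed,
    SourceChanged,
}

/// Typed result replacing the old `(bool, &'static str)` pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RebuildDecision {
    /// The index is at least as new as the last source change.
    Fresh,
    Rebuild(RebuildReason),
}

/// Decide from two timestamps (assumed to be seconds since some epoch).
pub fn evaluate_csharp_rebuild(last_scip: Option<u64>, last_changed: u64) -> RebuildDecision {
    match last_scip {
        None => RebuildDecision::Rebuild(RebuildReason::NeverIndexed),
        Some(ts) if ts >= last_changed => RebuildDecision::Fresh,
        Some(_) => RebuildDecision::Rebuild(RebuildReason::SourceChanged),
    }
}
```

The call site can then match on `RebuildDecision::Fresh` instead of comparing against a reason string, so rewording a log message can no longer change control flow.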

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All three add_repo_handler cleanup paths that call config.save() after
rolling back a failed registration were silently discarding I/O errors.
Replace `let _ = config.save()` with a `tracing::warn!` so disk failures
during error-path cleanup are visible in logs — consistent with the
remove_repo_handler which already logs save failures.

Affects three sites: open-DB failure, concurrent-reindex conflict, and
background-task index failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace "aprimo_mcp" example in mcp/mod.rs comment with a generic
"my_pkg" placeholder. Remove client name from .claude/CLAUDE.md
todo item.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@flupkede flupkede merged commit 0646c44 into master May 20, 2026
1 of 3 checks passed
@flupkede flupkede deleted the features/support_additional_tree_sitter branch June 10, 2026 11:07

1 participant