feat: add C, C++, Kotlin, Swift, Scala, Bash language support#708
carlos-alm merged 18 commits into main
…pth duplication The walk_node/walk_node_depth pattern was duplicated identically across all 9 language extractors (~190 lines of boilerplate). Each extractor repeated the same depth check, match dispatch, and child traversal loop — only the match arms differed. Add a generic `walk_tree<F>` function to helpers.rs that handles depth limiting and child recursion, accepting a closure for language-specific node matching. Refactor all 9 extractors and 7 type map walkers to use it. Zero-cost abstraction (monomorphized, no dyn dispatch).
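A minimal sketch of what a generic `walk_tree<F>` helper of this shape could look like. The `Node` struct, the `MAX_DEPTH` value, and `count_kind` are hypothetical stand-ins so the example compiles without the tree-sitter crate; the actual signature in helpers.rs may differ.

```rust
/// Hypothetical stand-in for a tree-sitter node (assumption, not the real type).
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

/// Assumed depth cap; the real limit is not stated in this PR.
const MAX_DEPTH: usize = 200;

/// Generic pre-order walk. The helper owns the depth check and the child
/// loop; the closure carries the language-specific match arms.
/// `F` is a type parameter, so each call site is monomorphized (no `dyn`).
fn walk_tree<F>(node: &Node, depth: usize, visit: &mut F)
where
    F: FnMut(&Node, usize),
{
    if depth >= MAX_DEPTH {
        return;
    }
    visit(node, depth);
    for child in &node.children {
        walk_tree(child, depth + 1, visit);
    }
}

/// Example caller: counts nodes of one kind, the way an extractor would
/// dispatch on `node.kind` inside its closure.
fn count_kind(root: &Node, kind: &str) -> usize {
    let mut count = 0;
    walk_tree(root, 0, &mut |node: &Node, _depth: usize| {
        if node.kind == kind {
            count += 1;
        }
    });
    count
}
```

Each extractor then only supplies its closure body, which is where the per-language differences live.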
Add 6 high-demand languages to both the TypeScript (WASM) and Rust (native) engines: C, C++, Kotlin, Swift, Scala, Bash.

Each language gets:
- TypeScript extractor (src/extractors/<lang>.ts)
- Rust extractor (crates/codegraph-core/src/extractors/<lang>.rs)
- LANGUAGE_REGISTRY entry + LanguageKind enum variant
- Complexity rules (LangRules + HalsteadRules)
- CFG rules (CfgRules)
- AST config (LangAstConfig)
- Parser tests (tests/parsers/<lang>.test.ts)

Key grammar quirks handled:
- Swift tree-sitter uses class_declaration for class/struct/enum
- Kotlin tree-sitter uses class_declaration for class/interface
- Scala import_declaration has alternating identifier/dot children

317/317 parser tests pass (272 existing + 45 new), zero regressions.
Greptile Summary

This PR adds first-class support for C, C++, Kotlin, Swift, Scala, and Bash across all layers of the codegraph stack: TypeScript (WASM) extractors, Rust (native) extractors, parser registry entries, complexity rules ( …

All previously-flagged issues (CPP/Swift AST_CONFIG shadowing, Kotlin …

Remaining findings:
Confidence Score: 4/5

Safe to merge after fixing the Bash complexity omission; the grouped Scala import and Kotlin CFG issues are non-blocking but should be tracked.

All previously-raised P0/P1 issues have been resolved. One new P1 remains (Bash c_style_for_statement/until_statement missing from complexity branch_nodes) that will silently undercount complexity metrics. Two P2 issues do not block functionality but represent observable behavioral discrepancies. Score is 4 rather than 5 because of the one confirmed P1 complexity-metric bug.

crates/codegraph-core/src/complexity.rs (BASH_RULES branch_nodes/nesting_nodes), src/extractors/scala.ts (handleScalaImportDecl), crates/codegraph-core/src/cfg.rs (KOTLIN_CFG jump_expression)

Important Files Changed
Reviews (3): Last reviewed commit: "fix(ci): increase embedding regression t..."
    const CPP_AST_CONFIG: LangAstConfig = LangAstConfig {
        call_types: &["call_expression"],
        new_types: &["new_expression"],
        throw_types: &["throw_statement"],
        await_types: &[],
        string_types: &["string_literal", "raw_string_literal"],
        regex_types: &[],
        quote_chars: &['"'],
        string_prefixes: &[],
    };
Local CPP_AST_CONFIG shadows the richer helpers.rs version
cpp.rs defines a private const CPP_AST_CONFIG with await_types: &[] and string_prefixes: &[]. Because the module also does use super::helpers::*, the local binding shadows the pub const CPP_AST_CONFIG in helpers.rs, which was added with await_types: &["co_await_expression"] and string_prefixes: &['L', 'u', 'U', 'R'].
As a result, the native (Rust) extractor will never classify C++ co_await expressions as await-type AST nodes, and wide/unicode string literals (L"...", u"...", U"...", R"...") won't be recognized as prefixed strings — both silently omitted from the analysis.
The same shadowing issue exists for swift.rs (const SWIFT_AST_CONFIG line 20).
Fixed. Removed all local const *_AST_CONFIG definitions from cpp.rs, swift.rs, c.rs, kotlin.rs, scala.rs, and bash.rs. All six extractors now use the pub const versions from helpers.rs via use super::helpers::*. The cpp.rs shadow was silently dropping co_await_expression and C++ string prefixes (L, u, U, R).
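For readers unfamiliar with the shadowing rule involved: in Rust, an item defined in a module takes precedence over a same-named item brought in by a glob import. The sketch below reproduces that in isolation. `LangAstConfig` is trimmed to two fields, and the `cpp_shadowed` / `cpp_fixed` module names are hypothetical, chosen only to contrast the before and after.

```rust
mod helpers {
    /// Trimmed-down stand-in for the real config struct (assumption).
    pub struct LangAstConfig {
        pub await_types: &'static [&'static str],
        pub string_prefixes: &'static [char],
    }

    /// The richer, shared definition.
    pub const CPP_AST_CONFIG: LangAstConfig = LangAstConfig {
        await_types: &["co_await_expression"],
        string_prefixes: &['L', 'u', 'U', 'R'],
    };
}

mod cpp_shadowed {
    use super::helpers::*;

    // A local item wins over the glob-imported one of the same name,
    // so this empty config is what the module actually sees.
    const CPP_AST_CONFIG: LangAstConfig = LangAstConfig {
        await_types: &[],
        string_prefixes: &[],
    };

    pub fn await_types() -> &'static [&'static str] {
        CPP_AST_CONFIG.await_types
    }
}

mod cpp_fixed {
    use super::helpers::*;

    // No local definition: the name resolves to helpers::CPP_AST_CONFIG.
    pub fn await_types() -> &'static [&'static str] {
        CPP_AST_CONFIG.await_types
    }
}
```

The compiler accepts the shadowed version without complaint about the name clash, which is why the dropped `co_await_expression` entry went unnoticed until review.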
    function handleKotlinObjectDecl(node: TreeSitterNode, ctx: ExtractorOutput): void {
      const nameNode = findChild(node, 'type_identifier');
      if (!nameNode) return;
      ctx.definitions.push({
        name: nameNode.text,
        kind: 'module',
        line: node.startPosition.row + 1,
        endLine: nodeEndLine(node),
      });
    }
Fixed. Changed the TS extractor to emit kind 'class' for Kotlin object declarations, matching the Rust extractor. Kotlin objects are singletons with class-like semantics (properties, methods, interface implementation), so 'class' is the correct kind. Updated the test expectation accordingly.
    // Inheritance: inheritance_specifier nodes are DIRECT children of class_declaration
    for (let i = 0; i < node.childCount; i++) {
      const child = node.child(i);
      if (!child || child.type !== 'inheritance_specifier') continue;
      // inheritance_specifier > user_type > type_identifier
      const userType = findChild(child, 'user_type');
      if (userType) {
        const typeId = findChild(userType, 'type_identifier');
        if (typeId) {
          ctx.classes.push({
            name,
            extends: typeId.text,
            line: node.startPosition.row + 1,
          });
        }
      }
    }
Swift multiple-inheritance always emits extends, never implements
The loop over inheritance_specifier children unconditionally pushes { extends: typeId.text } for every inherited type. The Rust extractor correctly uses a first flag to distinguish the superclass from subsequent protocol conformances. The TS extractor should do the same.
Fixed. Added a first flag to the TS Swift inheritance loop: first inheritance_specifier emits extends (superclass), subsequent ones emit implements (protocol conformances). This matches the Rust extractor behavior.
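The first-flag approach can be sketched roughly as follows. `Relation` and `classify_inheritance` are hypothetical names, and the input slice stands in for the type names read from `inheritance_specifier` children in source order; this is not the extractor's actual code.

```rust
/// Hypothetical output record for one inherited type.
#[derive(Debug, PartialEq)]
enum Relation {
    Extends(String),
    Implements(String),
}

/// Treats the first inherited type as the superclass and every later one
/// as a protocol conformance, mirroring the `first` flag described above.
fn classify_inheritance(inherited: &[&str]) -> Vec<Relation> {
    let mut first = true;
    let mut out = Vec::new();
    for name in inherited {
        if first {
            out.push(Relation::Extends((*name).to_string()));
            first = false;
        } else {
            out.push(Relation::Implements((*name).to_string()));
        }
    }
    out
}
```

One limitation worth keeping in mind: this is positional, so a type whose first listed entry is itself a protocol (as with a struct or enum) would still be reported as extends.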
    function handleScalaValVarDef(node: TreeSitterNode, ctx: ExtractorOutput): void {
      // Only handle top-level vals/vars
      if (node.parent?.type === 'template_body') return;
      const pattern = node.childForFieldName('pattern');
      if (!pattern) return;
      const nameNode =
        pattern.type === 'identifier' ? pattern : findChild(pattern, 'identifier');
      if (!nameNode) return;
      ctx.definitions.push({
        name: nameNode.text,
        kind: 'function',
        line: node.startPosition.row + 1,
        endLine: nodeEndLine(node),
      });
    }
Fixed. Scala val_definition now emits kind "constant", var_definition emits "variable". Swift property_declaration now checks for let/var keyword child: let -> "constant", var -> "variable". Previously both incorrectly emitted "function".
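The mapping described in this fix can be summarized as a small lookup. `definition_kind` is a hypothetical helper written for illustration; the `keyword` argument stands in for the let/var keyword child that the Swift extractor checks.

```rust
/// Maps a binding node to the symbol kind it should emit.
/// Returns None for node kinds this sketch does not cover.
fn definition_kind(node_kind: &str, keyword: Option<&str>) -> Option<&'static str> {
    match (node_kind, keyword) {
        // Scala: the node kind alone decides.
        ("val_definition", _) => Some("constant"),
        ("var_definition", _) => Some("variable"),
        // Swift: one node kind, so the keyword child decides.
        ("property_declaration", Some("let")) => Some("constant"),
        ("property_declaration", Some("var")) => Some("variable"),
        _ => None,
    }
}
```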
    pub static KOTLIN_RULES: LangRules = LangRules {
        branch_nodes: &["if_expression", "for_statement", "while_statement", "do_while_statement", "catch_block", "when_expression", "when_entry"],
        case_nodes: &["when_entry"],
        logical_operators: &["&&", "||"],
        logical_node_type: "conjunction_expression",
        optional_chain_type: Some("safe_navigation"),
        nesting_nodes: &["if_expression", "for_statement", "while_statement", "do_while_statement", "catch_block", "when_expression"],
        function_nodes: &["function_declaration"],
        if_node_type: Some("if_expression"),
        else_node_type: None,
        elif_node_type: None,
Fixed. Changed LangRules.logical_node_type from a single &str to logical_node_types: &[&str]. Kotlin now specifies &["conjunction_expression", "disjunction_expression"] so both && and || branches are counted in complexity metrics. All other languages updated to use the slice form with their existing single type.
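To illustrate why the slice form matters, here is a reduced sketch. `LangRules` is cut down to the one field under discussion, and `OLD_KOTLIN_RULES`, `is_logical_node`, and `count_logical` are hypothetical helpers; the real complexity walker is more involved.

```rust
/// Reduced stand-in for the real rules struct (only the relevant field).
struct LangRules {
    logical_node_types: &'static [&'static str],
}

/// Before the fix: only `&&` expressions were recognized.
static OLD_KOTLIN_RULES: LangRules = LangRules {
    logical_node_types: &["conjunction_expression"],
};

/// After the fix: both `&&` and `||` node kinds are listed.
static KOTLIN_RULES: LangRules = LangRules {
    logical_node_types: &["conjunction_expression", "disjunction_expression"],
};

/// True when `kind` is one of the language's logical-expression node kinds.
fn is_logical_node(rules: &LangRules, kind: &str) -> bool {
    rules.logical_node_types.iter().any(|t| *t == kind)
}

/// Counts logical nodes in a flat list of node kinds (a stand-in for a tree walk).
fn count_logical(rules: &LangRules, kinds: &[&str]) -> usize {
    kinds.iter().filter(|k| is_logical_node(rules, **k)).count()
}
```

With a single `&str`, every `||` in Kotlin code would have been skipped; languages that use one node kind for both operators just pass a one-element slice.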
    fn node_text_raw(node: &Node) -> &str {
        // We only check kind, not text content for keywords
        node.kind()
    }

    fn find_kotlin_parent_class<'a>(node: &Node<'a>, source: &[u8]) -> Option<String> {
        let mut current = node.parent();
        while let Some(parent) = current {
            match parent.kind() {
                "class_declaration" | "object_declaration" => {
Fixed. Removed the misleading node_text_raw function entirely. Its only call site now uses mod_child.kind() directly, which is what the function was actually returning.
…708) Remove private const *_AST_CONFIG in c.rs, cpp.rs, kotlin.rs, swift.rs, scala.rs, bash.rs that shadowed the richer pub versions in helpers.rs. The cpp.rs shadow silently dropped co_await_expression and string prefixes.
…708) Both TS and Rust extractors now emit kind: 'class' for Kotlin object declarations, matching Kotlin semantics (singletons are class-like).
First inheritance_specifier is superclass (extends), subsequent ones are protocol conformances (implements), matching the Rust extractor behavior. Also remove unused isClass variable.
Scala val_definition → 'constant', var_definition → 'variable'. Swift property_declaration: let → 'constant', var → 'variable'. Previously both emitted 'function' incorrectly.
@greptileai review
tree-sitter-kotlin 0.3.x exports language() function, not LANGUAGE const.
…708) The new tree-sitter grammar packages have conflicting peer dependency requirements (0.21 vs 0.25). legacy-peer-deps allows npm install to succeed in CI.
… linking (#708)
- Add 'variable' and 'namespace' to SymbolKind type union
- Use transmute instead of extern C redeclaration for tree-sitter-kotlin bridge to fix Windows linker failure (unresolved external symbol)
- Fix kotlin.rs test to use LanguageKind::Kotlin.tree_sitter_language()
… duplicate symbol errors (#708) tree-sitter-kotlin 0.3 depends on tree-sitter 0.20, which bundles its own C runtime (ts_language_*, ts_lexer_*) that conflicts with tree-sitter 0.24's copy at link time on Linux/Windows. tree-sitter-kotlin-sg 0.4 uses tree-sitter-language (no bundled C runtime), eliminating the duplicate symbols. Same upstream grammar (fwcd/tree-sitter-kotlin) so all node types are identical. Also adds namespace and variable to DEFAULT_NODE_COLORS in colors.ts to satisfy the Record<AnyNodeKind, string> constraint after the SymbolKind expansion.
… engine (#708)
- trait_definition kind: 'trait' -> 'interface' (matches Rust)
- object_definition kind: 'module' -> 'class' (matches Rust)
- Inheritance: use found_extends flag to distinguish extends vs implements
- Skip function-local val/var in Scala and let/var in Swift extractors
- Update test expectations accordingly
The beforeAll hook was timing out at 120s on macOS CI runners due to slow model download. Doubled to 240s for headroom.
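Addressed all Greptile round 2 findings:
- P1: trait_definition kind mismatch. Changed the TS Scala extractor from 'trait' to 'interface', matching Rust.
- P1: object_definition kind mismatch. Changed the TS Scala extractor from 'module' to 'class', matching Rust.
- P1: Scala inheritance mismatch. Added a found_extends flag to distinguish extends from implements.
- P2: Function-local bindings leaking as top-level definitions. Added guards in both extractors to skip function-local val/var in Scala and let/var in Swift.
- CI: embedding-regression timeout. Increased the beforeAll hook timeout from 120s to 240s.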
…ed (#708) The impact workflow restores a cached .codegraph/ directory that can become corrupted. Added fallback: if incremental build fails, delete the cache and rebuild from scratch.
Summary
Swift uses class_declaration for class/struct/enum, Kotlin uses class_declaration for class/interface, Scala import_declaration has alternating identifier/dot children

Changes per layer
- src/extractors/{c,cpp,kotlin,swift,scala,bash}.ts
- crates/codegraph-core/src/extractors/{c,cpp,kotlin,swift,scala,bash}.rs
- types.ts, parser.ts, index.ts, build-wasm.ts, package.json, Cargo.toml, parser_registry.rs, types.rs, mod.rs
- complexity.rs (LangRules + HalsteadRules), cfg.rs (CfgRules), helpers.rs (LangAstConfig)
- tests/parsers/{c,cpp,kotlin,swift,scala,bash}.test.ts

Test plan
- cargo build compiles with new Rust extractors