Skip to content

Latest commit

 

History

History
106 lines (79 loc) · 4.56 KB

File metadata and controls

106 lines (79 loc) · 4.56 KB

Language Support

Supported Languages

Language Extensions Parser Symbol Types Decorators Docstrings Notes / Limitations
Python .py tree-sitter-python function, class, method, constant, type @decorator Triple-quoted strings Type aliases require Python 3.12+ syntax for full fidelity
JavaScript .js, .jsx tree-sitter-javascript function, class, method, constant // and /** */ comments Anonymous arrow functions without assigned names are not indexed
TypeScript .ts, .tsx tree-sitter-typescript function, class, method, constant, type @decorator // and /** */ comments Decorator extraction depends on Stage-3 decorator syntax
Go .go tree-sitter-go function, method, type, constant // comments No class hierarchy (language limitation)
Rust .rs tree-sitter-rust function, type (struct/enum/trait), impl, constant #[attr] /// and //! comments Macro-generated symbols are not visible to the parser
Java .java tree-sitter-java method, class, type (interface/enum), constant @Annotation /** */ Javadoc Deep inner-class nesting may be flattened
PHP .php tree-sitter-php function, class, method, type (interface/trait/enum), constant #[Attribute] /** */ PHPDoc PHP 8+ attributes supported; language-file <?php tag required
Dart .dart tree-sitter-dart function, class (class/mixin/extension), method, type (enum/typedef) @annotation /// doc comments Constructors and top-level constants are not indexed
C# .cs tree-sitter-csharp class (class/record), method (method/constructor), type (interface/enum/struct/delegate) [Attribute] /// <summary> XML doc comments Properties and const fields not indexed

Parser Engine

All language parsing is powered by tree-sitter via the tree-sitter-language-pack Python package, providing:

  • Incremental, error-tolerant parsing
  • Uniform AST representation across languages
  • Pre-compiled grammars for supported languages

Dependency: tree-sitter-language-pack>=0.7.0 (pinned in pyproject.toml)


Adding a New Language

  1. Define a LanguageSpec in src/jcodemunch_mcp/parser/languages.py:
NEW_LANG_SPEC = LanguageSpec(
    ts_language="new_language",
    symbol_node_types={
        "function_definition": "function",
        "class_definition": "class",
    },
    name_fields={
        "function_definition": "name",
        "class_definition": "name",
    },
    param_fields={
        "function_definition": "parameters",
    },
    return_type_fields={},
    docstring_strategy="preceding_comment",
    decorator_node_type=None,
    container_node_types=["class_definition"],
    constant_patterns=[],
    type_patterns=[],
)
  1. Register the language:
LANGUAGE_REGISTRY["new_language"] = NEW_LANG_SPEC
  1. Map file extensions:
LANGUAGE_EXTENSIONS[".ext"] = "new_language"
  1. Verify parser availability:
from tree_sitter_language_pack import get_parser
get_parser("new_language")  # Must not raise
  1. Add parser tests:
def test_parse_new_language():
    source = "..."
    symbols = parse_file(source, "test.ext", "new_language")
    assert len(symbols) >= 2

Inspecting AST Node Types

To inspect the node types produced by tree-sitter for a source file:

from tree_sitter_language_pack import get_parser

parser = get_parser("python")
tree = parser.parse(b"def foo(): pass")

def print_tree(node, indent=0):
    print(" " * indent + f"{node.type} [{node.start_point}-{node.end_point}]")
    for child in node.children:
        print_tree(child, indent + 2)

print_tree(tree.root_node)

This inspection process helps identify the correct symbol_node_types, name_fields, and extraction rules when adding support for a new language.