13 changes: 8 additions & 5 deletions README.md
@@ -211,7 +211,7 @@ graph TD

1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.

**Supported Languages (Tree-sitter semantic parsing)**: TypeScript, JavaScript, Python, Rust, Go, Java, C#, Ruby, PHP, Apex, Bash, C, C++, JSON, TOML, YAML
**Supported Languages (Tree-sitter semantic parsing)**: TypeScript, JavaScript, Python, Rust, Go, Java, C#, Ruby, PHP, Apex, Bash, C, C++, JSON, TOML, YAML, Zig

**Additional Supported Formats (line-based chunking)**: TXT, HTML, HTM, Markdown, Shell scripts

@@ -223,6 +223,7 @@ graph TD
**/*.{sql,graphql,proto} **/*.{yaml,yml,toml}
**/*.{md,mdx} **/*.{sh,bash,zsh}
**/*.{txt,html,htm} **/*.{cls,trigger}
**/*.zig
```

Use `include` to replace defaults, or `additionalInclude` to extend (e.g. `"**/*.pdf"`, `"**/*.csv"`).
@@ -314,7 +315,7 @@ The plugin exposes these tools to the OpenCode agent:
```
[1] function "validatePayment" at src/billing.ts:45-67 (score: 0.92)
[2] class "PaymentProcessor" at src/processor.ts:12-89 (score: 0.87)

Use Read tool to examine specific files.
```
- **Workflow**: `codebase_peek` → find locations → `Read` specific files
@@ -351,7 +352,9 @@ Returns recent debug logs with optional filtering.
- **Parameters**: `category` (optional: `search`, `embedding`, `cache`, `gc`, `branch`), `level` (optional: `error`, `warn`, `info`, `debug`), `limit` (default: 50).

### `call_graph`
Query the call graph to find callers or callees of a function/method. Automatically built during indexing for TypeScript, JavaScript, Python, Go, and Rust.

Query the call graph to find callers or callees of a function/method. Automatically built during indexing for TypeScript, JavaScript, Python, Go, Rust, PHP, and Zig.

- **Use for**: Understanding code flow, tracing dependencies, impact analysis.
- **Parameters**: `name` (function name), `direction` (`callers` or `callees`), `symbolId` (required for `callees`, returned by previous queries).
- **Example**: Find who calls `validateToken` → `call_graph(name="validateToken", direction="callers")`
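
The `callers`/`callees` split described above can be pictured as two lookups over a list of call edges. The sketch below is only an illustration: the `CallEdge` shape and the `findCallers`/`findCallees` helpers are hypothetical names, not the plugin's storage schema or API. It shows one plausible reason `callees` needs a `symbolId`: a bare name may match several definitions, while an ID pins down a single one.

```typescript
// Hypothetical edge shape; the plugin's real schema is not shown in this diff.
interface CallEdge {
  callerSymbolId: string; // unique ID of the calling symbol
  callerName: string;     // human-readable name of the calling symbol
  calleeName: string;     // name of the function or method being called
}

// "callers": match on the callee name, return distinct caller names in order.
function findCallers(edges: CallEdge[], name: string): string[] {
  const callers = new Set<string>();
  for (const edge of edges) {
    if (edge.calleeName === name) {
      callers.add(edge.callerName);
    }
  }
  return Array.from(callers);
}

// "callees": match on the caller's symbol ID, return distinct callee names.
function findCallees(edges: CallEdge[], symbolId: string): string[] {
  const callees = new Set<string>();
  for (const edge of edges) {
    if (edge.callerSymbolId === symbolId) {
      callees.add(edge.calleeName);
    }
  }
  return Array.from(callees);
}
```

In this sketch a `callers` query starts from a name, and the symbol IDs it surfaces are what a follow-up `callees` query would take as input.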
@@ -937,7 +940,7 @@ Be aware of these characteristics:
]
}
```

This loads directly from your source directory, so changes take effect after rebuilding.

## 🤝 Contributing
@@ -1000,7 +1003,7 @@ The Rust native module handles performance-critical operations:
- **usearch**: High-performance vector similarity search with F16 quantization
- **SQLite**: Persistent storage for embeddings, chunks, branch catalog, symbols, and call edges
- **BM25 inverted index**: Fast keyword search for hybrid retrieval
- **Call graph extraction**: Tree-sitter query-based extraction of function calls, method calls, constructors, and imports (TypeScript/JavaScript, Python, Go, Rust)
- **Call graph extraction**: Tree-sitter query-based extraction of function calls, method calls, constructors, and imports (TypeScript/JavaScript, Python, Go, Rust, PHP, Zig)
- **xxhash**: Fast content hashing for change detection

Rebuild with: `npm run build:native` (requires Rust toolchain)
11 changes: 11 additions & 0 deletions native/Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions native/Cargo.toml
@@ -33,6 +33,7 @@ tree-sitter-toml-ng = "0.7"
tree-sitter-yaml = "0.7"
tree-sitter-php = "0.23"
tree-sitter-sfapex = "3.0"
tree-sitter-zig = "1"
tree-sitter-language = "0.1"
rusqlite = { version = "0.31", features = ["bundled"] }
xxhash-rust = { version = "0.8", features = ["xxh3"] }
17 changes: 17 additions & 0 deletions native/queries/zig-calls.scm
@@ -0,0 +1,17 @@
; Direct function calls: foo(), bar(1, 2)
(call_expression
function: (identifier) @callee.name) @call

; Method/field calls: std.debug.print(...) → MethodCall
(call_expression
function: (field_expression
member: (identifier) @callee.name)) @call @method.call

; Builtin calls: @This(), @sizeOf(), @import("std")
(builtin_function
(builtin_identifier) @callee.name) @call

; @import builtins: capture module path as import edge
(builtin_function
(arguments
(string) @import.name)) @import
2 changes: 2 additions & 0 deletions native/src/call_extractor.rs
@@ -30,6 +30,7 @@ pub fn extract_calls(content: &str, language_name: &str) -> Result<Vec<CallSite>
Language::Rust => tree_sitter_rust::LANGUAGE.into(),
Language::Go => tree_sitter_go::LANGUAGE.into(),
Language::Php => tree_sitter_php::LANGUAGE_PHP.into(),
Language::Zig => tree_sitter_zig::LANGUAGE.into(),
Language::Apex => tree_sitter_sfapex::apex::LANGUAGE.into(),
_ => return Ok(vec![]),
};
@@ -54,6 +55,7 @@ pub fn extract_calls(content: &str, language_name: &str) -> Result<Vec<CallSite>
Language::Rust => include_str!("../queries/rust-calls.scm"),
Language::Go => include_str!("../queries/go-calls.scm"),
Language::Php => include_str!("../queries/php-calls.scm"),
Language::Zig => include_str!("../queries/zig-calls.scm"),
Language::Apex => include_str!("../queries/apex-calls.scm"),
_ => return Ok(vec![]),
};
20 changes: 17 additions & 3 deletions native/src/parser.rs
@@ -46,6 +46,7 @@ pub fn parse_file_internal(file_path: &str, content: &str) -> Result<Vec<CodeChu
Language::Toml => tree_sitter_toml_ng::LANGUAGE.into(),
Language::Yaml => tree_sitter_yaml::LANGUAGE.into(),
Language::Php => tree_sitter_php::LANGUAGE_PHP.into(),
Language::Zig => tree_sitter_zig::LANGUAGE.into(),
Language::Apex => tree_sitter_sfapex::apex::LANGUAGE.into(),
_ => return Ok(chunk_by_lines(content, &language)),
};
@@ -256,6 +257,7 @@ fn is_comment_node(node_type: &str, language: &Language) -> bool {
Language::Toml => matches!(node_type, "comment"),
Language::Yaml => matches!(node_type, "comment"),
Language::Php => matches!(node_type, "comment"),
Language::Zig => matches!(node_type, "comment"),
Language::Apex => matches!(node_type, "line_comment" | "block_comment"),
_ => false,
}
@@ -447,6 +449,17 @@ lazy_static! {
set.insert("enum_declaration");
set
};
static ref ZIG_SEMANTIC_NODES: HashSet<&'static str> = {
let mut set = HashSet::new();
set.insert("function_declaration");
set.insert("test_declaration");
set.insert("struct_declaration");
set.insert("enum_declaration");
set.insert("union_declaration");
set.insert("opaque_declaration");
set.insert("error_set_declaration");
set
};
// Apex grammar (tree-sitter-sfapex) is Java-derived: the declaration node
// kinds match Java exactly, plus `trigger_declaration` which is unique to
// Apex (Salesforce database triggers). Verified against tree-sitter-sfapex
@@ -488,6 +501,7 @@ fn is_semantic_node(node_type: &str, language: &Language) -> bool {
Language::Toml => TOML_SEMANTIC_NODES.contains(node_type),
Language::Yaml => YAML_SEMANTIC_NODES.contains(node_type),
Language::Php => PHP_SEMANTIC_NODES.contains(node_type),
Language::Zig => ZIG_SEMANTIC_NODES.contains(node_type),
Language::Apex => APEX_SEMANTIC_NODES.contains(node_type),
_ => false,
};
@@ -732,11 +746,11 @@ function greet(name: string): string {

class Greeter {
private name: string;

constructor(name: string) {
this.name = name;
}

greet(): string {
return `Hello, ${this.name}!`;
}
@@ -756,7 +770,7 @@ def greet(name: str) -> str:
class Greeter:
def __init__(self, name: str):
self.name = name

def greet(self) -> str:
return f"Hello, {self.name}!"
"#;
4 changes: 4 additions & 0 deletions native/src/types.rs
@@ -33,6 +33,7 @@ pub enum Language {
Html,
Php,
Apex,
Zig,
Text,
}

@@ -59,6 +60,7 @@ impl Language {
"html" | "htm" => Language::Html,
"txt" => Language::Text,
"php" | "inc" => Language::Php,
"zig" => Language::Zig,
"cls" | "trigger" => Language::Apex,
_ => Language::Text,
}
@@ -85,6 +87,7 @@ impl Language {
Language::Markdown => "markdown",
Language::Html => "html",
Language::Php => "php",
Language::Zig => "zig",
Language::Apex => "apex",
Language::Text => "text",
}
@@ -112,6 +115,7 @@ impl Language {
"html" | "htm" => Language::Html,
"text" | "txt" => Language::Text,
"php" => Language::Php,
"zig" => Language::Zig,
"apex" => Language::Apex,
_ => Language::Text,
}
1 change: 1 addition & 0 deletions src/config/constants.ts
@@ -11,6 +11,7 @@ export const DEFAULT_INCLUDE = [
"**/*.{md,mdx}",
"**/*.{sh,bash,zsh}",
"**/*.{txt,html,htm}",
"**/*.zig",
];

export const DEFAULT_EXCLUDE = [
46 changes: 40 additions & 6 deletions src/indexer/index.ts
@@ -36,7 +36,7 @@ import type { SymbolData, CallEdgeData } from "../native/index.js";
import { getBranchOrDefault, getBaseBranch, isGitRepo } from "../git/index.js";
import { resolveProjectIndexPath } from "../config/paths.js";

export const CALL_GRAPH_LANGUAGES = new Set(["typescript", "tsx", "javascript", "jsx", "python", "go", "rust", "php", "apex"]);
export const CALL_GRAPH_LANGUAGES = new Set(["typescript", "tsx", "javascript", "jsx", "python", "go", "rust", "php", "apex", "zig"]);
// Languages whose identifiers are case-insensitive at the language level.
// The Rust call_extractor lowercases callee names for these languages (except
// constructors and imports), so same-file resolution in this file must use
@@ -66,6 +66,9 @@ export const CALL_GRAPH_SYMBOL_CHUNK_TYPES = new Set([
"mod_item",
"trait_declaration",
"trigger_declaration",
"test_declaration",
"struct_declaration",
"union_declaration",
]);

function float32ArrayToBuffer(arr: number[]): Buffer {
@@ -276,6 +279,7 @@ interface IndexCompatibility {
const INDEX_METADATA_VERSION = "1";
const EMBEDDING_STRATEGY_VERSION = "2";
const RANKING_TOKEN_CACHE_LIMIT = 4096;
const RANK_HYBRID_CACHE_LIMIT = 256;

function createPendingChunkStorageText(texts: PendingChunk["texts"]): string {
const primaryText = texts[0]?.text ?? "";
@@ -492,6 +496,7 @@ const rankingQueryTokenCache = new Map<string, Set<string>>();
const rankingNameTokenCache = new Map<string, Set<string>>();
const rankingPathTokenCache = new Map<string, Set<string>>();
const rankingTextTokenCache = new Map<string, Set<string>>();
const rankHybridResultsCache = new WeakMap<RankedCandidate[], WeakMap<RankedCandidate[], Map<string, RankedCandidate[]>>>();

const STOPWORDS = new Set([
"the", "and", "for", "with", "from", "that", "this", "into", "using", "where",
@@ -1064,9 +1069,8 @@ export function rerankResults(
}

const queryTokenList = Array.from(queryTokens);
const intent = classifyQueryIntentRaw(query);
const docIntent = classifyDocIntent(queryTokenList);
const preferSourcePaths = options?.prioritizeSourcePaths ?? intent === "source";
const preferSourcePaths = options?.prioritizeSourcePaths ?? classifyQueryIntentRaw(query) === "source";
const identifierHints = extractIdentifierHints(query);

const head = candidates.slice(0, topN).map((candidate, idx) => {
@@ -1268,16 +1272,46 @@ export function rankHybridResults(
keywordResults: RankedCandidate[],
options: HybridRankOptions & { prioritizeSourcePaths?: boolean }
): RankedCandidate[] {
const prioritizeSourcePaths = options.prioritizeSourcePaths ?? classifyQueryIntentRaw(query) === "source";
const cacheKey = `${query}\u0001${options.fusionStrategy}|${options.rrfK}|${options.hybridWeight}|${options.rerankTopN}|${options.limit}|${prioritizeSourcePaths ? 1 : 0}`;

let byKeyword = rankHybridResultsCache.get(semanticResults);
if (!byKeyword) {
byKeyword = new WeakMap<RankedCandidate[], Map<string, RankedCandidate[]>>();
rankHybridResultsCache.set(semanticResults, byKeyword);
}

let bucket = byKeyword.get(keywordResults);
if (!bucket) {
bucket = new Map<string, RankedCandidate[]>();
byKeyword.set(keywordResults, bucket);
} else {
const cached = bucket.get(cacheKey);
if (cached) {
return cached;
}
}

const overfetchLimit = Math.max(options.limit * 4, options.limit);
const fused = options.fusionStrategy === "rrf"
? fuseResultsRrf(semanticResults, keywordResults, options.rrfK, overfetchLimit)
: fuseResultsWeighted(semanticResults, keywordResults, options.hybridWeight, overfetchLimit);

const rerankPoolLimit = Math.max(overfetchLimit, options.rerankTopN * 3, options.limit * 6);
const rerankPool = fused.slice(0, rerankPoolLimit);
return rerankResults(query, rerankPool, options.rerankTopN, {
prioritizeSourcePaths: options.prioritizeSourcePaths ?? classifyQueryIntentRaw(query) === "source",
const ranked = rerankResults(query, rerankPool, options.rerankTopN, {
prioritizeSourcePaths,
});

if (bucket.size >= RANK_HYBRID_CACHE_LIMIT) {
const oldest = bucket.keys().next().value;
if (oldest !== undefined) {
bucket.delete(oldest);
}
}
bucket.set(cacheKey, ranked);

return ranked;
}
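
The caching added to `rankHybridResults` above combines two ideas: nested `WeakMap`s keyed by the identity of the two input arrays, and a size-bounded `Map` per array pair that evicts its oldest key first. The following is a minimal, self-contained sketch of that pattern, not the PR's code: `Candidate`, `cachedRank`, `CACHE_LIMIT`, and `computeCount` are hypothetical names, and the cache key is reduced to the query string alone, whereas the PR also folds the ranking options into the key.

```typescript
// Hypothetical stand-in for the PR's RankedCandidate type.
interface Candidate {
  id: string;
  score: number;
}

type Bucket = Map<string, Candidate[]>;
type ByKeyword = WeakMap<Candidate[], Bucket>;

// The PR uses a limit of 256; a tiny value makes eviction easy to observe.
const CACHE_LIMIT = 2;

// WeakMap keys are compared by identity, and an entry becomes collectable
// once its key array is no longer referenced anywhere else.
const cache = new WeakMap<Candidate[], ByKeyword>();

// Counts cache misses so the behaviour can be checked from outside.
let computeCount = 0;

function cachedRank(
  query: string,
  semantic: Candidate[],
  keyword: Candidate[],
  compute: () => Candidate[],
): Candidate[] {
  let byKeyword = cache.get(semantic);
  if (!byKeyword) {
    byKeyword = new WeakMap<Candidate[], Bucket>();
    cache.set(semantic, byKeyword);
  }

  let bucket = byKeyword.get(keyword);
  if (!bucket) {
    bucket = new Map<string, Candidate[]>();
    byKeyword.set(keyword, bucket);
  }

  const hit = bucket.get(query);
  if (hit) {
    return hit; // same array instance as the first call returned
  }

  computeCount += 1;
  const ranked = compute();

  // A Map iterates in insertion order, so its first key is the oldest entry.
  if (bucket.size >= CACHE_LIMIT) {
    const oldest = bucket.keys().next().value;
    if (oldest !== undefined) {
      bucket.delete(oldest);
    }
  }
  bucket.set(query, ranked);
  return ranked;
}
```

Two properties of this sketch are worth keeping in mind when reading the hunk above. A hit requires the very same array instances, so passing a copy of identical results is a miss. A hit also returns the cached array by reference, so a caller that mutates the result would change what later callers receive.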

export function rankSemanticOnlyResults(
@@ -4495,7 +4529,7 @@ export class Indexer {
}
): Promise<SearchResult[]> {
const { store, provider, database } = await this.ensureInitialized();

const compatibility = this.checkCompatibility();
if (!compatibility.compatible) {
throw new Error(
45 changes: 45 additions & 0 deletions tests/call-graph.test.ts
@@ -508,6 +508,51 @@ describe("call-graph", () => {
});
});

describe("zig call extraction", () => {
it("should extract direct function calls", () => {
const content = `
const std = @import("std");

pub fn greet(name: []const u8) void {
std.debug.print("Hello, {s}\\n", .{name});
}

pub fn main() void {
greet("world");
}
`;
const calls = extractCalls(content, "zig");
const callNames = calls.map((c) => c.calleeName);
expect(callNames).toContain("greet");
});

it("should classify field-access calls as MethodCall", () => {
const content = `
const std = @import("std");

pub fn greet(name: []const u8) void {
std.debug.print("Hello, {s}\\n", .{name});
}
`;
const calls = extractCalls(content, "zig");
const printCall = calls.find((c) => c.calleeName === "print");
expect(printCall).toBeDefined();
expect(printCall!.callType).toBe("MethodCall");
});

it("should extract @import builtins as import edges", () => {
const content = `
const std = @import("std");
const math = @import("math.zig");
`;
const calls = extractCalls(content, "zig");
const importCalls = calls.filter((c) => c.callType === "Import");
expect(importCalls.length).toBeGreaterThanOrEqual(2);
expect(importCalls.some((c) => c.calleeName.includes("std"))).toBe(true);
expect(importCalls.some((c) => c.calleeName.includes("math.zig"))).toBe(true);
});
});

describe("call graph storage", () => {
it("should store symbols in database", () => {
const db = openDb();