fix(parity): restore call AST node extraction in WASM engine #705

Merged
carlos-alm merged 17 commits into main from refactor/parser-abstraction-layer on Mar 30, 2026
@carlos-alm
Contributor

Summary

Fixes #697. PR #686 incorrectly removed call-AST extraction from the WASM engine and filtered call nodes from native output — documenting a parity gap as expected behavior rather than fixing it. This restores full engine parity for ast_nodes.

  • Restore call_expression: 'call' in astTypes map (rules/javascript.ts)
  • Restore call extraction in ast-store-visitor.ts with extractCallName, extractCallReceiver, argument-only recursion (walkCallArguments), and node dedup tracking (matched set)
  • Remove three call-kind filters that stripped native call nodes before DB insertion (engine.ts, ast.ts bulk path, ast.ts JS fallback path)
  • Fix stale engineOpts.nativeDb reference: the pipeline closed the NativeDatabase but left a dangling reference in engineOpts, causing bulkInsertAstNodes to throw on a closed connection
  • Update test assertion: ast-nodes.test.ts now asserts calls ARE stored (was incorrectly asserting 0 calls)
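
The stale-reference bullet can be illustrated with a minimal sketch. `FakeNativeDb`, `Ctx`, `EngineOpts`, `closeNativeDb`, and `insertRows` are hypothetical stand-ins invented for this example; they are not the project's actual types or functions, and the real pipeline may differ in detail.

```typescript
// Hypothetical stand-in for a native DB handle; not the project's NativeDatabase.
class FakeNativeDb {
  private open = true;

  close(): void {
    this.open = false;
  }

  bulkInsertAstNodes(rows: unknown[]): number {
    if (!this.open) throw new Error('connection is closed');
    return rows.length;
  }
}

interface EngineOpts {
  nativeDb: FakeNativeDb | undefined;
}

interface Ctx {
  nativeDb: FakeNativeDb | undefined;
  engineOpts: EngineOpts;
}

// Close the handle and clear every reference to it, so later stages
// take another path instead of calling into a closed connection.
function closeNativeDb(ctx: Ctx): void {
  ctx.nativeDb?.close();
  ctx.nativeDb = undefined;
  ctx.engineOpts.nativeDb = undefined; // the reference that would otherwise dangle
}

// A stage that uses the native path only when a handle is present.
function insertRows(opts: EngineOpts, rows: unknown[]): string {
  if (opts.nativeDb) return `native:${opts.nativeDb.bulkInsertAstNodes(rows)}`;
  return `fallback:${rows.length}`;
}
```

If only `ctx.nativeDb` were cleared, `insertRows` would still see a handle in `engineOpts` and throw on the closed connection, which is the failure mode the bullet describes.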

Test plan

  • build-parity.test.ts — all 4 tests pass (nodes, edges, roles, ast_nodes)
  • ast-nodes.test.ts — all 19 tests pass (WASM + native)
  • Full test suite — 2131 tests pass, 0 failures
  • Lint clean (biome)

Extract shared patterns from 9 language extractors into 4 reusable
helpers in helpers.ts, reducing per-language boilerplate by ~30 lines:

- findParentNode: replaces 6 findParent*/findCurrentImpl functions
- extractBodyMembers: replaces 5 body-iteration patterns for enums/structs
- stripQuotes: replaces inline .replace(/"/g,'') across 3 extractors
- lastPathSegment: replaces inline .split('.').pop() across 6 extractors

Net: +77 helper lines, -159 extractor lines = -82 lines total.

Add clarifying comment to empty self_parameter if-block in rust.ts
and document anchored-vs-global semantics in stripQuotes JSDoc.

PR #686 incorrectly removed call_expression extraction from the WASM
ast-store-visitor and added filters to strip call nodes from native
output — documenting a parity gap as expected behavior rather than
fixing it. This restores full parity:

- Restore call_expression in JS astTypes map
- Restore call extraction logic in ast-store-visitor with receiver
  extraction, argument-only recursion, and dedup tracking
- Remove three call-kind filters (engine.ts, ast.ts bulk path, ast.ts
  JS fallback path) that stripped native call nodes
- Clear stale engineOpts.nativeDb reference after pipeline closes the
  NativeDatabase to prevent stages from using a closed connection
- Update ast-nodes test to assert calls ARE stored
@claude

claude bot commented Mar 30, 2026

Claude finished @carlos-alm's task


Comprehensive PR Review Complete ✅

Reviewing PR #705: "fix(parity): restore call AST node extraction in WASM engine"

Root Cause Analysis

Problem Being Solved: PR #686 incorrectly removed call AST node extraction from WASM engine while filtering native call nodes, creating an engine parity gap. The previous PR documented this gap as "expected behavior" rather than fixing the underlying issue.

Why This Approach: This PR restores true engine parity by ensuring both native and WASM engines extract identical call AST nodes. The approach mirrors native engine strategy exactly (argument-only recursion, receiver extraction, deduplication).
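
The strategy named above (receiver extraction plus argument-only recursion) can be sketched on a mock syntax tree. This is only an illustration under stated assumptions: `MockNode` and `collectCalls` are hypothetical, and the bodies of `extractCallName` and `extractCallReceiver` are guesses at the idea, not the code in ast-store-visitor.ts.

```typescript
// Minimal mock of a syntax node; real tree-sitter nodes expose a richer API.
interface MockNode {
  type: string;
  text: string;
  fields?: Record<string, MockNode>;
  children?: MockNode[];
}

interface CallRow {
  name: string;
  receiver: string | null;
}

// Assumed behavior: the call name is the full text of the `function` field.
function extractCallName(call: MockNode): string {
  return call.fields?.function?.text ?? '';
}

// Assumed behavior: the receiver is the object of a member expression, else null.
function extractCallReceiver(call: MockNode): string | null {
  const fn = call.fields?.function;
  if (!fn || fn.type !== 'member_expression') return null;
  return fn.fields?.object?.text ?? null;
}

function collectCalls(node: MockNode, out: CallRow[] = []): CallRow[] {
  if (node.type === 'call_expression') {
    out.push({ name: extractCallName(node), receiver: extractCallReceiver(node) });
    // Argument-only recursion: the `function` field is not walked, so the
    // inner call of a chain such as a().b() is not recorded a second time.
    for (const arg of node.fields?.arguments?.children ?? []) collectCalls(arg, out);
    return out;
  }
  for (const child of node.children ?? []) collectCalls(child, out);
  return out;
}
```

For a tree shaped like `console.log(eval(x))`, this records `console.log` with receiver `console` and `eval` with no receiver.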

Risk Assessment: Low risk - this is a genuine bug fix restoring intended behavior, not a workaround.

Backlog Compliance

  • Zero-dep: ✅ — No new runtime dependencies, builds on existing tree-sitter/SQLite infrastructure
  • Foundation-aligned: ✅ — Fully compliant with all principles:
    • P1 (always-current): Improves accuracy of local analysis
    • P4 (zero-cost core): Pure local computation, no cloud APIs
    • P5 (embeddable-first): Maintains programmatic API contract
    • P8 (we are not an agent): Pure analysis tool, no code generation
  • Problem-fit: 5/5 — Directly addresses the core problem: "AI coding assistants waste tokens re-orienting in large codebases, hallucinate dependencies, and miss blast radius." Call AST nodes are essential for accurate blast radius analysis and preventing hallucinated dependencies.
  • Breaking: No — Purely additive restoration of previously working behavior
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a clean, well-implemented fix that:

  • Restores engine parity (required by build-parity.test.ts hard gate)
  • Fixes a real crash bug (stale nativeDb reference)
  • Strengthens test assertions (from accepting 0 calls to requiring ≥1)
  • Uses defensive coding practices throughout

Particularly Strong Points:

  • Exact native engine mirroring: WASM call extraction now matches native strategy precisely
  • Test strengthening: Changed from documenting broken behavior to asserting correct behavior
  • Comprehensive fix: Addresses both WASM extraction AND native filter removal
  • Crash prevention: Fixes the stale database reference that caused runtime errors

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5) — Exceptional quality
  • Action: APPROVE
  • Reasoning: This is a textbook example of fixing engine parity correctly. Rather than documenting the gap as acceptable, it restores the intended behavior. The implementation is robust, well-tested, and follows all established patterns. The stale database reference fix prevents real crashes. All changes strengthen the codebase without introducing risk.

This PR should be merged immediately.


@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR restores full WASM/native engine parity for ast_nodes by undoing the incorrect changes from PR #686, which had stripped call AST node extraction from the WASM visitor and added filters to silently drop kind: 'call' nodes from native output. It also fixes a stale engineOpts.nativeDb reference that caused bulkInsertAstNodes to throw on a closed connection.

Key changes:

  • rules/javascript.ts: Restores call_expression: 'call' in the astTypes map
  • ast-store-visitor.ts: Re-adds full call extraction (extractCallName, extractCallReceiver, walkCallArguments) with argument-only recursion to mirror the native engine's strategy of preventing double-counting chained calls, plus a matched dedup set with the tree-sitter node.id reuse invariant correctly documented
  • engine.ts / ast.ts: Removes three call-kind filters that discarded call nodes before DB insertion
  • pipeline.ts: Fixes the dangling engineOpts.nativeDb reference by clearing it alongside ctx.nativeDb when the NativeDatabase is closed before pipeline stages run
  • Stage import style (build-edges.ts, build-structure.ts, collect-files.ts): Normalises #-alias imports to relative paths, matching the convention already used by the other 6 stage files
  • Tests updated to assert 3 kind:call nodes (was incorrectly asserting 0), with parity diagnostics added to build-parity.test.ts

Confidence Score: 5/5

Safe to merge — the parity fix is well-tested (2131 passing tests including 4 parity tests), and the only remaining observation is a minor P2 inconsistency in how a null receiver is normalised between the bulk and JS fallback paths.

All P0/P1 concerns from prior review rounds are addressed. The single remaining finding is P2: null ?? '' in the bulk path stores an empty string for calls with no receiver, while the JS fallback stores null. Both values semantically mean no receiver and the parity test passes, so this does not block merging.

src/features/ast.ts — the receiver: n.receiver ?? '' normalisation differs from the JS fallback path's n.receiver || null
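
The difference between the two normalisations comes down to how `??` and `||` treat values. A small sketch, assuming a simplified row shape; `AstRow`, `receiverForBulk`, and `receiverForFallback` are hypothetical names used only for illustration:

```typescript
interface AstRow {
  kind: string;
  name: string;
  receiver?: string | null;
}

// Bulk-path style: only null/undefined are replaced, and the result is always a string.
function receiverForBulk(row: AstRow): string {
  return row.receiver ?? '';
}

// JS-fallback style: every falsy value, including '', collapses to null.
function receiverForFallback(row: AstRow): string | null {
  return row.receiver || null;
}
```

For a call with no receiver, the first function yields `''` and the second yields `null`, so the same logical row can be stored with two different representations depending on which path inserted it.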

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/ast-analysis/visitors/ast-store-visitor.ts | Core change: restores call extraction with extractCallName, extractCallReceiver, walkCallArguments, and a matched dedup set. The enterNode guard (no skipChildren on matched nodes) is correctly documented given the tree-sitter node.id reuse caveat. Logic for chaining (arguments-only recursion) mirrors native engine. |
| src/features/ast.ts | Removes two call-kind filters (bulk path + JS fallback path) and changes receiver: n.receiver to n.receiver ?? ''. The ?? introduces a minor inconsistency with the JS fallback path, which normalises to null instead of empty string. |
| src/domain/graph/builder/pipeline.ts | Fixes stale engineOpts.nativeDb reference: clears it alongside ctx.nativeDb when the NativeDatabase is closed before pipeline stages, preventing bulkInsertAstNodes from being called on a closed connection. |
| src/ast-analysis/engine.ts | Removes the incorrect pre-filter that stripped kind === 'call' nodes from native output before DB insertion. Straightforward deletion. |
| src/ast-analysis/rules/javascript.ts | Restores call_expression: 'call' in the astTypes map so the WASM visitor recognises call nodes again. |
| tests/parsers/ast-nodes.test.ts | Updates the call-kind test from asserting 0 to asserting 3 calls (eval, result.set, console.log), with a note that await fetch() is captured under 'await' not 'call'. |
| tests/integration/build-parity.test.ts | Adds diagnostic console.error logging for mismatched AST node counts before the equality assertion, helping debug CI-only parity failures. |
| src/domain/graph/builder/stages/build-edges.ts | Converts #-alias imports to relative paths, matching the convention used by the other 6 stage files. |
| src/domain/graph/builder/stages/build-structure.ts | Same alias-to-relative-path normalisation as build-edges.ts. |
| src/domain/graph/builder/stages/collect-files.ts | Same alias-to-relative-path normalisation as build-edges.ts. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runAnalyses] --> B{Native bulk path\nnativeDb available?}
    B -- Yes --> C[walkWithVisitors WASM\nenterNode for all kinds incl. call]
    B -- No --> D[JS fallback\nwalkAst via createAstStoreVisitor]
    C --> E[symbols.astNodes set\nwith call nodes included]
    D --> E2[rows collected\nwith call nodes included]
    E --> F{buildAstNodes\nnativeDb bulk path?}
    E2 --> G[JS INSERT\nreceiver → null if falsy]
    F -- Yes --> H[bulkInsertAstNodes\nreceiver ?? ''\nall kinds incl. call]
    F -- No --> G
    H --> I[(ast_nodes table)]
    G --> I
    subgraph pipeline.ts fix
        J[ctx.nativeDb.close] --> K[ctx.nativeDb = undefined]
        K --> L[engineOpts.nativeDb = undefined\nPREVENTS stale ref throw]
        L --> M[Reopen nativeDb\nbefore runAnalyses]
        M --> N[engineOpts.nativeDb = new ref]
    end

Reviews (3): Last reviewed commit: "fix: remove parity-diag console.error lo..."

Comment on lines 213 to +214
enterNode(node: TreeSitterNode, _context: VisitorContext): EnterNodeResult | undefined {
if (matched.has(node.id)) return;
Contributor

P2 enterNode returns undefined for matched nodes — walker still descends children

When matched.has(node.id) is true, enterNode returns undefined without setting skipChildren. The walkWithVisitors walker only halts descent when result?.skipChildren is truthy; an undefined return means the walker will recurse into the node's children via the normal DFS path.

In the current logic this is not a bug: every node that can be in matched before enterNode is called on it sits inside a call_expression argument subtree, whose parent call already returned { skipChildren: true }, so the walker never actually reaches those pre-collected nodes. However the safety assumption is subtle and not documented, and a future change to walkCallArguments (e.g. walking the function field as well) could break it silently.

Consider making the guard explicit:

Suggested change
- enterNode(node: TreeSitterNode, _context: VisitorContext): EnterNodeResult | undefined {
- if (matched.has(node.id)) return;
+ enterNode(node: TreeSitterNode, _context: VisitorContext): EnterNodeResult | undefined {
+ if (matched.has(node.id)) return { skipChildren: true };

This makes the invariant hold unconditionally without relying on the parent's skipChildren.
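
The walker semantics this comment relies on can be sketched as follows. This is a simplified model, assuming only what the comment states (descent halts when `result?.skipChildren` is truthy); `TreeNode`, `Visitor`, and `walk` are hypothetical and are not the project's walkWithVisitors.

```typescript
interface TreeNode {
  id: number;
  type: string;
  children: TreeNode[];
}

type EnterResult = { skipChildren?: boolean } | undefined;
type Visitor = (node: TreeNode) => EnterResult;

// Depth-first walk that records every node type it enters.
// Descent stops only when the visitor returns skipChildren: true;
// an undefined return means the children are still walked.
function walk(node: TreeNode, enter: Visitor, visited: string[] = []): string[] {
  visited.push(node.type);
  const result = enter(node);
  if (result?.skipChildren) return visited;
  for (const child of node.children) walk(child, enter, visited);
  return visited;
}
```

With a visitor that always returns `undefined`, every node is entered; returning `{ skipChildren: true }` for a node cuts off everything below it.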

Contributor Author

Fixed: enterNode now returns { skipChildren: true } for already-matched nodes instead of undefined. This makes the invariant hold unconditionally without relying on the parent's skipChildren. See commit 173ab12.

Contributor Author

Update: reverted the skipChildren guard. Tree-sitter node.id is a memory address that can be reused across different nodes in the same parse tree. Returning { skipChildren: true } on an ID collision suppresses an unrelated subtree, which broke WASM ast_node extraction in the build-parity test on CI (native v3.5.0 produced 19 ast_nodes, WASM produced 0). The original undefined return is the correct behavior: it prevents re-collection without suppressing children, and the parent call's skipChildren handles the intended dedup case. Added a comment documenting this invariant. See commit 0f4259e.
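
The failure mode described in this reply can be modelled with hand-assigned duplicate IDs. This sketch takes the ID-reuse claim from the comment as given and does not reproduce tree-sitter itself; `IdNode` and `collect` are hypothetical, and the traversal is simplified (it walks all children of a call rather than only its arguments).

```typescript
interface IdNode {
  id: number;
  type: string;
  children: IdNode[];
}

// Counts call nodes. `strictGuard` models the reverted behavior
// (skipChildren on a matched ID); false models the plain `return`.
function collect(root: IdNode, matched: Set<number>, strictGuard: boolean): number {
  let calls = 0;
  const visit = (node: IdNode): void => {
    if (matched.has(node.id)) {
      if (strictGuard) return; // drops the whole subtree under the colliding ID
      // lenient guard: do not re-collect this node, but keep walking its children
    } else if (node.type === 'call_expression') {
      calls += 1;
      matched.add(node.id);
    }
    for (const child of node.children) visit(child);
  };
  visit(root);
  return calls;
}
```

When an unrelated block happens to carry an ID that is already in `matched`, the strict guard hides every call beneath it, while the lenient guard still finds them.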

Comment on lines +9 to +11
import { getNodeId } from '../../../../db/index.js';
import { debug } from '../../../../infrastructure/logger.js';
import { loadNative } from '../../../../infrastructure/native.js';
Contributor

P2 Undocumented conversion from #-alias imports to relative paths

build-edges.ts, build-structure.ts, and collect-files.ts all have their #infrastructure/…, #db/…, #shared/…, and #types imports converted to ../../../../… relative paths. This change isn't mentioned in the PR description and isn't consistent with how the rest of the codebase uses these path aliases (e.g. pipeline.ts and other stage files still use the #-aliases).

If this was done to fix a broken alias in a specific build or test context, a brief comment or follow-up ticket would help. If it was accidental, it should be reverted so import style stays uniform.

Contributor Author

Investigated — the #-alias to relative-path conversion is actually consistent with the rest of the stage files. On main, finalize.ts, detect-changes.ts, resolve-imports.ts, insert-nodes.ts, parse-files.ts, and run-analyses.ts already use relative paths. Only build-edges.ts, build-structure.ts, and collect-files.ts still had the #-aliases. This PR normalizes the remaining 3 files to match the convention used by the other 6 stage files. No revert needed.

#705)

When enterNode encountered an already-matched node it returned undefined,
relying on the parent call's skipChildren to prevent redundant descent.
This was correct but fragile — a future change to walkCallArguments could
silently break the invariant. Now returns { skipChildren: true } explicitly.

Replace loose toBeGreaterThanOrEqual(1) with exact toBe(3) and verify
each expected call name (eval, result.set, console.log). The fixture
produces exactly 3 call nodes — fetch is captured as kind:await instead.
Pinning the count catches future parity regressions immediately.
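
The idea of pinning an exact count and checking each name can be sketched without a test framework. `AstNodeRow` and `assertExactCalls` are hypothetical helpers written for this illustration; the actual test uses toBe(3) and toContain as described above.

```typescript
interface AstNodeRow {
  kind: string;
  name: string;
}

// Throws unless the rows contain exactly the expected number of call
// nodes and every expected call name is present.
function assertExactCalls(rows: AstNodeRow[], expected: string[]): void {
  const names = rows.filter((r) => r.kind === 'call').map((r) => r.name);
  if (names.length !== expected.length) {
    throw new Error(`expected ${expected.length} call nodes, got ${names.length}`);
  }
  for (const name of expected) {
    if (names.indexOf(name) === -1) throw new Error(`missing call node: ${name}`);
  }
}
```

A lower-bound check such as "at least 1" would keep passing if two of the three calls silently disappeared; the exact count fails as soon as the number changes in either direction.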
@carlos-alm
Contributor Author

Addressed Greptile review feedback:

  • enterNode guard (P2): Fixed — now returns { skipChildren: true } for matched nodes unconditionally (commit 173ab12)
  • Loose test bound (P2): Fixed — pinned exact call count to toBe(3) and added individual toContain checks for each expected call name: eval, result.set, console.log (commit 56ef0ca)
  • Import alias conversion (P2): No change needed — the conversion is consistent with the other 6 stage files that already use relative paths on main. This normalizes the last 3 files.

@carlos-alm
Contributor Author

@greptileai

… skips (#705)

Reverts the { skipChildren: true } guard for matched nodes. Tree-sitter
node.id is a memory address that can be reused across different nodes in
the same tree. Returning skipChildren on an ID collision suppresses an
unrelated subtree, breaking WASM ast_node extraction in the build-parity
test. The original undefined return is correct: it prevents re-collection
without suppressing children, and the parent call's skipChildren handles
the intended dedup case.

When native and WASM ast_node counts diverge, log both counts and the
distinct kinds present in each. This will reveal whether the CI-only
failure is caused by native extracting calls that WASM misses, or a
deeper issue with the WASM walker path.

The prebuilt native binary (v3.5.0) expects `receiver` as `String`
(not `Option<String>`), so passing JS `null` from WASM-extracted
ast_nodes crashes the bulk insert with:
  Failed to convert JavaScript value `Null` into rust type `String`

Coerce `null`/`undefined` receiver to `""` before passing to the
native `bulkInsertAstNodes` path. This fixes the build-parity test
(native 19 vs WASM 0 ast_nodes) and the typed-method-call test.
Remove temporary diagnostic logging added to trace the WASM ast_nodes
CI failure. The root cause (null receiver in NAPI bulk insert) is now
fixed, so these stderr diagnostics are no longer needed.
@carlos-alm
Contributor Author

Merged main into the branch and fixed the CI failure caused by the merge.

Root cause: PR #696 (merged to main) reopens nativeDb before runAnalyses, so the WASM engine's ast_nodes now flow through the native bulkInsertAstNodes Rust path. The prebuilt binary (v3.5.0) has receiver: String (not Option<String>), so passing JS null for non-call nodes crashes with: Failed to convert JavaScript value 'Null' into rust type 'String' on AstInsertNode.receiver.

Fixes applied:

  1. fix: coerce null receiver to empty string for NAPI compat (receiver: n.receiver ?? '' in the native bulk-insert batch mapping)
  2. fix: remove parity-diag console.error logging — cleaned up the 11 temporary console.error diagnostics added to trace this failure, plus fixed the resulting unused-variable lint warning

All 2131 tests pass locally, lint clean (1 pre-existing warning).

@greptileai

@carlos-alm carlos-alm merged commit 8140b89 into main Mar 30, 2026
13 of 15 checks passed
@carlos-alm carlos-alm deleted the refactor/parser-abstraction-layer branch March 30, 2026 13:37
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2026
Development

Successfully merging this pull request may close these issues.

bug(parity): native engine produces ast_nodes, WASM engine does not

1 participant