
perf: native Rust build-glue queries (detect-changes, finalize, incremental) #735

Merged
carlos-alm merged 3 commits into main from perf/native-build-glue-queries
Apr 1, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Add 7 batched NativeDatabase Rust methods that replace ~30 individual JS↔SQLite round-trips with single napi calls on the native engine path
  • detect-changes: getFileHashData, checkPendingAnalysis, healFileMetadata, findReverseDependencies
  • finalize: getFinalizeCounts, runAdvisoryChecks
  • collect-files: getCollectFilesData
  • WASM fallback paths preserved unchanged — all methods are optional on the TS interface
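The round-trip reduction can be sketched in miniature. This is an illustrative shape only: Db and its fields are hypothetical stand-ins, not the actual rusqlite/napi binding.

```rust
// Illustrative sketch of the batching idea: several related lookups
// answered by one call instead of one JS<->SQLite round-trip per query.
// `Db`, `FinalizeCounts`, and the fields here are hypothetical stand-ins.
struct FinalizeCounts {
    node_count: i64,
    edge_count: i64,
}

struct Db {
    nodes: Vec<&'static str>,
    edges: Vec<(&'static str, &'static str)>,
}

impl Db {
    // One entry point returns both counts, replacing two separate calls.
    fn finalize_counts(&self) -> FinalizeCounts {
        FinalizeCounts {
            node_count: self.nodes.len() as i64,
            edge_count: self.edges.len() as i64,
        }
    }
}
```

On the real native path the analogous method crosses the napi boundary once per stage instead of once per query.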

Test plan

  • tsc --noEmit — TypeScript compiles cleanly
  • cargo check — Rust compiles (no code errors; linker needs MSVC on CI)
  • biome check — lint clean on all changed files
  • 277 integration/graph tests pass (build, incremental, queries)
  • 1049 parser/search/unit tests pass
  • CI: full native engine test suite
  • Manual: codegraph build --engine native + incremental rebuild

Closes #694

Add 7 batched NativeDatabase methods that replace ~30 individual
JS↔SQLite round-trips with single napi calls on the native engine path:

- getFileHashData: table check + all rows + max mtime (detect-changes)
- checkPendingAnalysis: cfg_blocks + dataflow counts (detect-changes)
- healFileMetadata: batch upsert file_hashes (detect-changes)
- findReverseDependencies: reverse-dep query (detect-changes)
- getFinalizeCounts: node + edge counts (finalize)
- runAdvisoryChecks: orphaned/stale embeddings + unused exports (finalize)
- getCollectFilesData: file_hashes count + paths (collect-files)

WASM fallback paths are preserved unchanged.

Closes #694
@claude

claude bot commented Apr 1, 2026

Claude finished @carlos-alm's task


PR Review: Native Rust Build-Glue Queries

Review Progress

  • Phase 1: Root cause analysis and problem validation
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality and security review
  • Phase 5: Backlog compliance assessment
  • Final assessment and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses a legitimate performance bottleneck where individual JS↔SQLite round-trips during build operations create excessive overhead. The solution batches multiple related database operations into single native calls.

Why This Approach: The implementation is well-justified:

  • Replaces ~30 individual round-trips with 7 batched methods
  • Targets specific bottlenecks in detect-changes, finalize, and collect-files stages
  • Maintains WASM fallback paths (no breaking changes)
  • Uses optional methods on TypeScript interface (backward compatible)

Risk Assessment: Low risk - this is an additive optimization that preserves existing behavior while improving performance on the native engine path.

Phase 2: Configuration Changes ✅

No concerning configuration changes detected:

  • No biome/linting rule modifications
  • No test timeouts or coverage threshold changes
  • No TypeScript strict mode relaxations
  • All changes are implementation-focused

Phase 3: Test Integrity ✅

No red flags found:

  • No skipped tests or weakened assertions
  • No .only or conditional test logic introduced
  • Existing test files for builder stages remain intact
  • Test plan shows comprehensive validation (tsc, cargo check, biome check, 1326 total tests)

Phase 4: Code Quality & Security ✅

Rust Code Quality (native_db.rs:1032-1269):

  • Error Handling: Excellent - proper map_err chains, no unwraps/panics
  • SQL Security: All queries use parameterized statements (no injection risk)
  • Memory Safety: Proper rusqlite patterns, transaction handling
  • Performance: Efficient - single transactions, prepared statements with caching

TypeScript Integration (detect-changes.ts, finalize.ts, collect-files.ts):

  • Backward Compatibility: Optional methods with WASM fallbacks preserved
  • Error Handling: Graceful degradation when native methods unavailable
  • Code Structure: Clean separation between native and WASM paths
  • No Anti-patterns: No eslint-disables, proper error propagation

Specific Strengths:

  • getFileHashData(): Batches table check + all rows + max mtime calculation
  • checkPendingAnalysis(): Efficient table existence + count checks
  • findReverseDependencies(): Optimized with HashSet deduplication
  • runAdvisoryChecks(): Three separate warning queries in one call
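The HashSet deduplication praised for findReverseDependencies follows a common pattern; a self-contained sketch of it (function name and inputs are hypothetical, not the actual native_db.rs code):

```rust
use std::collections::HashSet;

// Sketch of dedup-then-sort for reverse dependencies: drop files that are
// already in the changed set, deduplicate the rest, and sort for
// deterministic output. `changed` and `deps` are hypothetical inputs.
fn dedup_reverse_deps(changed: &[&str], deps: Vec<String>) -> Vec<String> {
    let changed_set: HashSet<&str> = changed.iter().copied().collect();
    let mut result_set = HashSet::new();
    for dep in deps {
        // Skip files already scheduled for re-parse; HashSet dedups the rest.
        if !changed_set.contains(dep.as_str()) {
            result_set.insert(dep);
        }
    }
    let mut result: Vec<String> = result_set.into_iter().collect();
    result.sort_unstable(); // deterministic ordering across runs
    result
}
```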

Phase 5: Backlog Compliance ✅

Zero-dep: ✓ — No new runtime dependencies added. Uses existing rusqlite/napi-rs infrastructure.

Foundation-aligned: ✓ — Fully compliant with core principles:

  • P1 (always-current): Improves local-first performance
  • P4 (zero-cost core): Optimizes existing functionality without new costs
  • P5 (embeddable-first): Maintains backward compatibility
  • P8 (not an agent): Pure performance optimization, no behavioral changes

Problem-fit: 5/5 — Directly addresses the core problem. This optimization significantly reduces round-trip overhead that affects AI agents during graph builds, improving response times for code navigation queries.

Breaking: No — Purely additive. Optional methods with full WASM fallback preservation.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed performance optimization.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent implementation of a legitimate performance optimization. Clean code, proper error handling, backward compatibility preserved, comprehensive test plan. Only missing the 5th star due to incomplete CI validation in test plan.

Minor Notes:

  • Consider running full native engine test suite in CI before merge
  • Manual testing of codegraph build --engine native + incremental rebuild recommended

@greptile-apps
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR introduces 7 batched NativeDatabase Rust methods that collapse ~30 individual JS↔SQLite round-trips into single napi calls across the detect-changes, finalize, and collect-files pipeline stages. The WASM fallback paths are preserved unchanged and all new methods are declared optional on the NativeDatabase interface, making native-only improvements non-breaking for older engine builds.

Key changes:

  • get_file_hash_data — replaces table-check + full-scan + MAX(mtime) with a single call; max_mtime computed in Rust and forwarded to tryJournalTier via precomputedMaxMtime
  • check_pending_analysis — replaces two IIFE try/catch COUNT blocks; uses -1 to distinguish missing tables from empty ones
  • heal_file_metadata — batch upsert inside a single transaction; per-row errors now propagate and roll back the transaction (addresses prior feedback)
  • find_reverse_dependencies — per-file edge query loop with in-Rust deduplication and sort_unstable() for deterministic ordering (addresses prior feedback)
  • get_finalize_counts — two COUNT queries collapsed into one call
  • run_advisory_checks — all three advisory queries (orphaned embeddings, stale embeddings, unused exports) collapsed into one call
  • get_collect_files_data — single-query implementation with count derived from files.len() rather than a separate COUNT(*) (addresses prior feedback)
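The -1 sentinel described for check_pending_analysis can be illustrated with a tiny helper (hypothetical name; the real method reads the table probe and COUNT(*) results from SQLite):

```rust
// Sketch of the sentinel convention: -1 marks a missing table, which the
// caller can tell apart from a table that exists but is empty (0).
fn count_or_sentinel(table_exists: bool, row_count: i64) -> i64 {
    if table_exists { row_count } else { -1 }
}
```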

Remaining minor issue:

  • get_collect_files_data uses filter_map(|r| r.ok()) to collect file rows, silently discarding any row-level error. This is inconsistent with the explicit error propagation used by every other new method in this batch (see inline comment). In practice, TEXT → String deserialization cannot fail, but the divergence is worth aligning.
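The two collection styles being contrasted can be shown with plain Result values (hypothetical String error type, not the rusqlite row API):

```rust
// filter_map(|r| r.ok()) silently drops any Err row, so errors vanish.
fn collect_lossy(rows: Vec<Result<String, String>>) -> Vec<String> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

// Collecting into a Result stops at the first Err and propagates it,
// matching the explicit error handling used by the other batched methods.
fn collect_strict(rows: Vec<Result<String, String>>) -> Result<Vec<String>, String> {
    rows.into_iter().collect()
}
```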

Confidence Score: 5/5

Safe to merge — all three prior P1 concerns are resolved, and the one remaining finding is a minor style inconsistency with no practical runtime impact.

All previously flagged issues (partial heal commits, non-deterministic reverse-dep ordering, redundant COUNT in collect-files) are properly addressed in the final SHA. The single remaining comment is a P2 style observation about filter_map vs. explicit error propagation in get_collect_files_data — a TEXT-column deserialization failure is practically impossible in SQLite, so there is no correctness or data-integrity risk. The native/WASM semantic parity across all 7 new methods is solid.

crates/codegraph-core/src/native_db.rs — minor row-error handling inconsistency in get_collect_files_data (P2 only).

Important Files Changed

Filename Overview
crates/codegraph-core/src/native_db.rs Adds 7 batched napi methods for build-glue queries; all three previously flagged issues resolved; one minor row-error inconsistency in get_collect_files_data remains (P2).
src/domain/graph/builder/stages/detect-changes.ts Rewires getChangedFiles, runPendingAnalysis, healMetadata, and findReverseDependencies to use native batch APIs when available; WASM fallback preserved cleanly.
src/domain/graph/builder/stages/finalize.ts runAdvisoryChecks and finalize count queries batched into single native calls; native and WASM paths are semantically equivalent including null/error handling.
src/domain/graph/builder/stages/collect-files.ts tryFastCollect patched to use getCollectFilesData on native engine; count is now derived from files.len() in Rust so the two values are always consistent.
src/types.ts Adds 7 optional method signatures to NativeDatabase; all marked optional preserving backward compatibility with older native engine builds.

Sequence Diagram

sequenceDiagram
    participant TS as TypeScript Stage
    participant ND as NativeDatabase (Rust/napi)
    participant WD as BetterSQLite3 (WASM)
    participant DB as SQLite File

    Note over TS,DB: detect-changes — getChangedFiles
    alt native engine (getFileHashData present)
        TS->>ND: getFileHashData()
        ND->>DB: SELECT file, hash, mtime, size FROM file_hashes
        DB-->>ND: all rows + max_mtime computed in Rust
        ND-->>TS: FileHashData { exists, rows, maxMtime }
    else WASM / fallback
        TS->>WD: SELECT 1 FROM file_hashes LIMIT 1
        WD->>DB: (table check)
        TS->>WD: SELECT file, hash, mtime, size FROM file_hashes
        WD->>DB: (all rows)
        TS->>WD: SELECT MAX(mtime) FROM file_hashes
        WD->>DB: (max mtime)
    end

    Note over TS,DB: finalize — counts + advisory checks
    alt native (getFinalizeCounts + runAdvisoryChecks present)
        TS->>ND: getFinalizeCounts()
        ND->>DB: COUNT nodes + COUNT edges
        ND-->>TS: FinalizeCounts
        TS->>ND: runAdvisoryChecks(hasEmbeddings)
        ND->>DB: orphaned embeddings + embed_built_at + unused exports
        ND-->>TS: AdvisoryCheckResult
    else WASM
        TS->>WD: SELECT COUNT(*) FROM nodes
        TS->>WD: SELECT COUNT(*) FROM edges
        TS->>WD: orphaned / stale / unused queries (3 separate calls)
    end

Reviews (2): Last reviewed commit: "fix: address Greptile review feedback fo..."

Comment on lines +1246 to +1268
                files: vec![],
            });
        }
        let count = conn
            .query_row("SELECT COUNT(*) FROM file_hashes", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        if count == 0 {
            return Ok(CollectFilesData {
                count: 0,
                files: vec![],
            });
        }
        let mut stmt = conn
            .prepare_cached("SELECT file FROM file_hashes")
            .map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
        let rows = stmt
            .query_map([], |row| row.get::<_, String>(0))
            .map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?;
        let files: Vec<String> = rows.filter_map(|r| r.ok()).collect();
        Ok(CollectFilesData {
            count,
            files,
        })
Contributor

P2 get_collect_files_data — two unguarded queries can return inconsistent count and files

The COUNT(*) query and the subsequent SELECT file FROM file_hashes execute as separate statements without a wrapping transaction. If any write occurs between them (unlikely under normal SQLite serialization, but possible in WAL mode with concurrent readers), the returned count could be stale relative to files. In collect-files.ts, data.count is used only as an early-exit guard (if (dbFileCount === 0) return null), so a divergence doesn't cause incorrect results — but it's architecturally cleaner to either wrap both in a BEGIN/COMMIT, or simply remove the separate COUNT(*) query entirely and derive the count from files.len():

// Option: single query, derive count from result
let mut stmt = conn
    .prepare_cached("SELECT file FROM file_hashes")
    .map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
let files: Vec<String> = stmt
    .query_map([], |row| row.get::<_, String>(0))
    .map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?
    .filter_map(|r| r.ok())
    .collect();
let count = files.len() as i64;
Ok(CollectFilesData { count, files })

This eliminates the redundant COUNT(*) round-trip, makes count always consistent with files, and preserves the early-exit behaviour on the TS side unchanged.

Contributor Author

Fixed in 685d40e. Removed the separate COUNT(*) query entirely — now derives count from files.len() as i64 after the single SELECT file FROM file_hashes query, eliminating the redundant round-trip and ensuring count is always consistent with the returned file list.

Comment on lines +1120 to +1127
        for entry in &entries {
            if stmt
                .execute(params![entry.file, entry.hash, entry.mtime, entry.size])
                .is_ok()
            {
                count += 1;
            }
        }
Contributor

P2 heal_file_metadata — per-entry failures silently committed as partial results

Individual stmt.execute(...) errors are swallowed with .is_ok(), but tx.commit() is called unconditionally afterward. This means a partial set of rows can be committed even when some entries failed — silently and with a return value that no longer reflects what actually landed. The TypeScript WASM fallback wraps the same loop in a db.transaction() that rolls back entirely on any run() error.

While the healMetadata caller in detect-changes.ts already swallows the whole operation in try { … } catch { /* ignore heal errors */ }, propagating per-entry failures instead of silently continuing keeps the native path semantically aligned with the fallback:

for entry in &entries {
    stmt.execute(params![entry.file, entry.hash, entry.mtime, entry.size])
        .map_err(|e| napi::Error::from_reason(format!("heal row failed: {e}")))?;
    count += 1;
}

A propagated error bubbles up to the TS catch block anyway, so the user-visible behaviour is identical — but the commit then reflects only fully successful batches, matching the WASM rollback semantics.

Contributor Author

Fixed in 685d40e. Per-row errors now propagate via the ? operator instead of being silently swallowed with .is_ok(). A failed row causes the transaction to roll back (the transaction is dropped without commit()), matching the WASM fallback's rollback-on-error semantics. The TS caller's try/catch still handles the propagated error gracefully.

Comment on lines +1153 to +1185
            .map_err(|e| napi::Error::from_reason(format!("reverseDeps prepare failed: {e}")))?;

        for file in &changed_files {
            let rows = stmt
                .query_map(params![file], |row| row.get::<_, String>(0))
                .map_err(|e| {
                    napi::Error::from_reason(format!("reverseDeps query failed: {e}"))
                })?;
            for row in rows {
                if let Ok(dep_file) = row {
                    if !changed_set.contains(dep_file.as_str()) {
                        result_set.insert(dep_file);
                    }
                }
            }
        }
        Ok(result_set.into_iter().collect())
    }

    /// Get node and edge counts in a single napi call.
    #[napi]
    pub fn get_finalize_counts(&self) -> napi::Result<FinalizeCounts> {
        let conn = self.conn()?;
        let node_count = conn
            .query_row("SELECT COUNT(*) FROM nodes", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        let edge_count = conn
            .query_row("SELECT COUNT(*) FROM edges", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        Ok(FinalizeCounts {
            node_count,
            edge_count,
        })
Contributor

P2 find_reverse_dependencies — non-deterministic result ordering via HashSet

result_set.into_iter().collect() iterates a std::collections::HashSet, which provides no ordering guarantees and can produce different orderings across invocations (or Rust versions). The caller in detect-changes.ts builds a Set<string> from the result (so deduplication is unaffected), but the non-deterministic order propagates into ctx.parseChanges.push(...) entries, making the parse order vary between runs.

Consider using a BTreeSet or sorting before returning if deterministic builds matter:

let mut result_vec: Vec<String> = result_set.into_iter().collect();
result_vec.sort_unstable();
Ok(result_vec)

Contributor Author

Fixed in 685d40e. Added sort_unstable() before returning the result vector, ensuring deterministic ordering across runs regardless of HashSet iteration order.

…735)

- Remove redundant COUNT(*) query in get_collect_files_data; derive
  count from files.len() to guarantee consistency
- Propagate per-row errors in heal_file_metadata instead of silently
  swallowing with .is_ok(), matching WASM rollback semantics
- Sort find_reverse_dependencies output for deterministic ordering
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit f1540c2 into main Apr 1, 2026
18 checks passed
@carlos-alm carlos-alm deleted the perf/native-build-glue-queries branch April 1, 2026 10:28
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2026