
perf: native Rust build-glue queries (detect-changes, finalize, incremental) #735

Merged
carlos-alm merged 3 commits into main from perf/native-build-glue-queries
Apr 1, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Add 7 batched NativeDatabase Rust methods that replace ~30 individual JS↔SQLite round-trips with single napi calls on the native engine path
  • detect-changes: getFileHashData, checkPendingAnalysis, healFileMetadata, findReverseDependencies
  • finalize: getFinalizeCounts, runAdvisoryChecks
  • collect-files: getCollectFilesData
  • WASM fallback paths preserved unchanged — all methods are optional on the TS interface
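The round-trip reduction can be sketched in miniature. This is an illustrative shape only: Db and its fields are hypothetical stand-ins, not the actual rusqlite/napi binding.

```rust
// Illustrative sketch of the batching idea: several related lookups
// answered by one call instead of one JS<->SQLite round-trip per query.
// `Db`, `FinalizeCounts`, and the fields here are hypothetical stand-ins.
struct FinalizeCounts {
    node_count: i64,
    edge_count: i64,
}

struct Db {
    nodes: Vec<&'static str>,
    edges: Vec<(&'static str, &'static str)>,
}

impl Db {
    // One entry point returns both counts, replacing two separate calls.
    fn finalize_counts(&self) -> FinalizeCounts {
        FinalizeCounts {
            node_count: self.nodes.len() as i64,
            edge_count: self.edges.len() as i64,
        }
    }
}
```

On the real native path the analogous method crosses the napi boundary once per stage instead of once per query.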

Test plan

  • tsc --noEmit — TypeScript compiles cleanly
  • cargo check — Rust compiles (no code errors; linker needs MSVC on CI)
  • biome check — lint clean on all changed files
  • 277 integration/graph tests pass (build, incremental, queries)
  • 1049 parser/search/unit tests pass
  • CI: full native engine test suite
  • Manual: codegraph build --engine native + incremental rebuild

Closes #694

Add 7 batched NativeDatabase methods that replace ~30 individual
JS↔SQLite round-trips with single napi calls on the native engine path:

- getFileHashData: table check + all rows + max mtime (detect-changes)
- checkPendingAnalysis: cfg_blocks + dataflow counts (detect-changes)
- healFileMetadata: batch upsert file_hashes (detect-changes)
- findReverseDependencies: reverse-dep query (detect-changes)
- getFinalizeCounts: node + edge counts (finalize)
- runAdvisoryChecks: orphaned/stale embeddings + unused exports (finalize)
- getCollectFilesData: file_hashes count + paths (collect-files)

WASM fallback paths are preserved unchanged.

Closes #694
@claude

claude bot commented Apr 1, 2026

Claude finished @carlos-alm's task


PR Review: Native Rust Build-Glue Queries

Review Progress

  • Phase 1: Root cause analysis and problem validation
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality and security review
  • Phase 5: Backlog compliance assessment
  • Final assessment and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses a legitimate performance bottleneck where individual JS↔SQLite round-trips during build operations create excessive overhead. The solution batches multiple related database operations into single native calls.

Why This Approach: The implementation is well-justified:

  • Replaces ~30 individual round-trips with 7 batched methods
  • Targets specific bottlenecks in detect-changes, finalize, and collect-files stages
  • Maintains WASM fallback paths (no breaking changes)
  • Uses optional methods on TypeScript interface (backward compatible)

Risk Assessment: Low risk - this is an additive optimization that preserves existing behavior while improving performance on the native engine path.

Phase 2: Configuration Changes ✅

No concerning configuration changes detected:

  • No biome/linting rule modifications
  • No test timeouts or coverage threshold changes
  • No TypeScript strict mode relaxations
  • All changes are implementation-focused

Phase 3: Test Integrity ✅

No red flags found:

  • No skipped tests or weakened assertions
  • No .only or conditional test logic introduced
  • Existing test files for builder stages remain intact
  • Test plan shows comprehensive validation (tsc, cargo check, biome check, 1326 total tests)

Phase 4: Code Quality & Security ✅

Rust Code Quality (native_db.rs:1032-1269):

  • Error Handling: Excellent - proper map_err chains, no unwraps/panics
  • SQL Security: All queries use parameterized statements (no injection risk)
  • Memory Safety: Proper rusqlite patterns, transaction handling
  • Performance: Efficient - single transactions, prepared statements with caching

TypeScript Integration (detect-changes.ts, finalize.ts, collect-files.ts):

  • Backward Compatibility: Optional methods with WASM fallbacks preserved
  • Error Handling: Graceful degradation when native methods unavailable
  • Code Structure: Clean separation between native and WASM paths
  • No Anti-patterns: No eslint-disables, proper error propagation

Specific Strengths:

  • getFileHashData(): Batches table check + all rows + max mtime calculation
  • checkPendingAnalysis(): Efficient table existence + count checks
  • findReverseDependencies(): Optimized with HashSet deduplication
  • runAdvisoryChecks(): Three separate warning queries in one call
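The HashSet deduplication praised for findReverseDependencies follows a common pattern; a self-contained sketch of it (function name and inputs are hypothetical, not the actual native_db.rs code):

```rust
use std::collections::HashSet;

// Sketch of dedup-then-sort for reverse dependencies: drop files that are
// already in the changed set, deduplicate the rest, and sort for
// deterministic output. `changed` and `deps` are hypothetical inputs.
fn dedup_reverse_deps(changed: &[&str], deps: Vec<String>) -> Vec<String> {
    let changed_set: HashSet<&str> = changed.iter().copied().collect();
    let mut result_set = HashSet::new();
    for dep in deps {
        // Skip files already scheduled for re-parse; HashSet dedups the rest.
        if !changed_set.contains(dep.as_str()) {
            result_set.insert(dep);
        }
    }
    let mut result: Vec<String> = result_set.into_iter().collect();
    result.sort_unstable(); // deterministic ordering across runs
    result
}
```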

Phase 5: Backlog Compliance ✅

Zero-dep: ✓ — No new runtime dependencies added. Uses existing rusqlite/napi-rs infrastructure.

Foundation-aligned: ✓ — Fully compliant with core principles:

  • P1 (always-current): Improves local-first performance
  • P4 (zero-cost core): Optimizes existing functionality without new costs
  • P5 (embeddable-first): Maintains backward compatibility
  • P8 (not an agent): Pure performance optimization, no behavioral changes

Problem-fit: 5/5 — Directly addresses the core problem. This optimization significantly reduces round-trip overhead that affects AI agents during graph builds, improving response times for code navigation queries.

Breaking: No — Purely additive. Optional methods with full WASM fallback preservation.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed performance optimization.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent implementation of a legitimate performance optimization. Clean code, proper error handling, backward compatibility preserved, comprehensive test plan. Only missing the 5th star due to incomplete CI validation in test plan.

Minor Notes:

  • Consider running full native engine test suite in CI before merge
  • Manual testing of codegraph build --engine native + incremental rebuild recommended

@greptile-apps
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR introduces 7 batched NativeDatabase Rust methods that collapse ~30 individual JS↔SQLite round-trips into single napi calls across the detect-changes, finalize, and collect-files pipeline stages. The WASM fallback paths are preserved unchanged and all new methods are declared optional on the NativeDatabase interface, making native-only improvements non-breaking for older engine builds.

Key changes:

  • get_file_hash_data — replaces table-check + full-scan + MAX(mtime) with a single call; max_mtime computed in Rust and forwarded to tryJournalTier via precomputedMaxMtime
  • check_pending_analysis — replaces two IIFE try/catch COUNT blocks; uses -1 to distinguish missing tables from empty ones
  • heal_file_metadata — batch upsert inside a single transaction; per-row errors now propagate and roll back the transaction (addresses prior feedback)
  • find_reverse_dependencies — per-file edge query loop with in-Rust deduplication and sort_unstable() for deterministic ordering (addresses prior feedback)
  • get_finalize_counts — two COUNT queries collapsed into one call
  • run_advisory_checks — all three advisory queries (orphaned embeddings, stale embeddings, unused exports) collapsed into one call
  • get_collect_files_data — single-query implementation with count derived from files.len() rather than a separate COUNT(*) (addresses prior feedback)
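The -1 sentinel described for check_pending_analysis can be illustrated with a tiny helper (hypothetical name; the real method reads the table probe and COUNT(*) results from SQLite):

```rust
// Sketch of the sentinel convention: -1 marks a missing table, which the
// caller can tell apart from a table that exists but is empty (0).
fn count_or_sentinel(table_exists: bool, row_count: i64) -> i64 {
    if table_exists { row_count } else { -1 }
}
```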

Remaining minor issue:

  • get_collect_files_data uses filter_map(|r| r.ok()) to collect file rows, silently discarding any row-level error. This is inconsistent with the explicit error propagation used by every other new method in this batch (see inline comment). In practice, TEXT → String deserialization cannot fail, but the divergence is worth aligning.
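The two collection styles being contrasted can be shown with plain Result values (hypothetical String error type, not the rusqlite row API):

```rust
// filter_map(|r| r.ok()) silently drops any Err row, so errors vanish.
fn collect_lossy(rows: Vec<Result<String, String>>) -> Vec<String> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

// Collecting into a Result stops at the first Err and propagates it,
// matching the explicit error handling used by the other batched methods.
fn collect_strict(rows: Vec<Result<String, String>>) -> Result<Vec<String>, String> {
    rows.into_iter().collect()
}
```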

Confidence Score: 5/5

Safe to merge — all three prior P1 concerns are resolved, and the one remaining finding is a minor style inconsistency with no practical runtime impact.

All previously flagged issues (partial heal commits, non-deterministic reverse-dep ordering, redundant COUNT in collect-files) are properly addressed in the final SHA. The single remaining comment is a P2 style observation about filter_map vs. explicit error propagation in get_collect_files_data — a TEXT-column deserialization failure is practically impossible in SQLite, so there is no correctness or data-integrity risk. The native/WASM semantic parity across all 7 new methods is solid.

crates/codegraph-core/src/native_db.rs — minor row-error handling inconsistency in get_collect_files_data (P2 only).

Important Files Changed

Filename Overview
crates/codegraph-core/src/native_db.rs Adds 7 batched napi methods for build-glue queries; all three previously flagged issues resolved; one minor row-error inconsistency in get_collect_files_data remains (P2).
src/domain/graph/builder/stages/detect-changes.ts Rewires getChangedFiles, runPendingAnalysis, healMetadata, and findReverseDependencies to use native batch APIs when available; WASM fallback preserved cleanly.
src/domain/graph/builder/stages/finalize.ts runAdvisoryChecks and finalize count queries batched into single native calls; native and WASM paths are semantically equivalent including null/error handling.
src/domain/graph/builder/stages/collect-files.ts tryFastCollect patched to use getCollectFilesData on native engine; count is now derived from files.len() in Rust so the two values are always consistent.
src/types.ts Adds 7 optional method signatures to NativeDatabase; all marked optional preserving backward compatibility with older native engine builds.

Sequence Diagram

sequenceDiagram
    participant TS as TypeScript Stage
    participant ND as NativeDatabase (Rust/napi)
    participant WD as BetterSQLite3 (WASM)
    participant DB as SQLite File

    Note over TS,DB: detect-changes — getChangedFiles
    alt native engine (getFileHashData present)
        TS->>ND: getFileHashData()
        ND->>DB: SELECT file, hash, mtime, size FROM file_hashes
        DB-->>ND: all rows + max_mtime computed in Rust
        ND-->>TS: FileHashData { exists, rows, maxMtime }
    else WASM / fallback
        TS->>WD: SELECT 1 FROM file_hashes LIMIT 1
        WD->>DB: (table check)
        TS->>WD: SELECT file, hash, mtime, size FROM file_hashes
        WD->>DB: (all rows)
        TS->>WD: SELECT MAX(mtime) FROM file_hashes
        WD->>DB: (max mtime)
    end

    Note over TS,DB: finalize — counts + advisory checks
    alt native (getFinalizeCounts + runAdvisoryChecks present)
        TS->>ND: getFinalizeCounts()
        ND->>DB: COUNT nodes + COUNT edges
        ND-->>TS: FinalizeCounts
        TS->>ND: runAdvisoryChecks(hasEmbeddings)
        ND->>DB: orphaned embeddings + embed_built_at + unused exports
        ND-->>TS: AdvisoryCheckResult
    else WASM
        TS->>WD: SELECT COUNT(*) FROM nodes
        TS->>WD: SELECT COUNT(*) FROM edges
        TS->>WD: orphaned / stale / unused queries (3 separate calls)
    end

Reviews (2): Last reviewed commit: "fix: address Greptile review feedback fo..."

Comment on lines +1246 to +1268
                files: vec![],
            });
        }
        let count = conn
            .query_row("SELECT COUNT(*) FROM file_hashes", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        if count == 0 {
            return Ok(CollectFilesData {
                count: 0,
                files: vec![],
            });
        }
        let mut stmt = conn
            .prepare_cached("SELECT file FROM file_hashes")
            .map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
        let rows = stmt
            .query_map([], |row| row.get::<_, String>(0))
            .map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?;
        let files: Vec<String> = rows.filter_map(|r| r.ok()).collect();
        Ok(CollectFilesData {
            count,
            files,
        })
Contributor

P2 get_collect_files_data — two unguarded queries can return inconsistent count and files

The COUNT(*) query and the subsequent SELECT file FROM file_hashes execute as separate statements without a wrapping transaction. If any write occurs between them (unlikely under normal SQLite serialization, but possible in WAL mode with concurrent readers), the returned count could be stale relative to files. In collect-files.ts, data.count is used only as an early-exit guard (if (dbFileCount === 0) return null), so a divergence doesn't cause incorrect results — but it's architecturally cleaner to either wrap both in a BEGIN/COMMIT, or simply remove the separate COUNT(*) query entirely and derive the count from files.len():

// Option: single query, derive count from result
let mut stmt = conn
    .prepare_cached("SELECT file FROM file_hashes")
    .map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
let files: Vec<String> = stmt
    .query_map([], |row| row.get::<_, String>(0))
    .map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?
    .filter_map(|r| r.ok())
    .collect();
let count = files.len() as i64;
Ok(CollectFilesData { count, files })

This eliminates the redundant COUNT(*) round-trip, makes count always consistent with files, and preserves the early-exit behaviour on the TS side unchanged.

Contributor Author

Fixed in 685d40e. Removed the separate COUNT(*) query entirely — now derives count from files.len() as i64 after the single SELECT file FROM file_hashes query, eliminating the redundant round-trip and ensuring count is always consistent with the returned file list.

Comment on lines +1120 to +1127
        for entry in &entries {
            if stmt
                .execute(params![entry.file, entry.hash, entry.mtime, entry.size])
                .is_ok()
            {
                count += 1;
            }
        }
Contributor

P2 heal_file_metadata — per-entry failures silently committed as partial results

Individual stmt.execute(...) errors are swallowed with .is_ok(), but tx.commit() is called unconditionally afterward. This means a partial set of rows can be committed even when some entries failed — silently and with a return value that no longer reflects what actually landed. The TypeScript WASM fallback wraps the same loop in a db.transaction() that rolls back entirely on any run() error.

While the healMetadata caller in detect-changes.ts already swallows the whole operation in try { … } catch { /* ignore heal errors */ }, propagating per-entry failures instead of silently continuing keeps the native path semantically aligned with the fallback:

for entry in &entries {
    stmt.execute(params![entry.file, entry.hash, entry.mtime, entry.size])
        .map_err(|e| napi::Error::from_reason(format!("heal row failed: {e}")))?;
    count += 1;
}

A propagated error bubbles up to the TS catch block anyway, so the user-visible behaviour is identical — but the commit then reflects only fully successful batches, matching the WASM rollback semantics.

Contributor Author

Fixed in 685d40e. Per-row errors now propagate via the ? operator instead of being silently swallowed with .is_ok(). A failed row causes the transaction to roll back (the transaction is dropped without commit()), matching the WASM fallback's rollback-on-error semantics. The TS caller's try/catch still handles the propagated error gracefully.

Comment on lines +1153 to +1185
            .map_err(|e| napi::Error::from_reason(format!("reverseDeps prepare failed: {e}")))?;

        for file in &changed_files {
            let rows = stmt
                .query_map(params![file], |row| row.get::<_, String>(0))
                .map_err(|e| {
                    napi::Error::from_reason(format!("reverseDeps query failed: {e}"))
                })?;
            for row in rows {
                if let Ok(dep_file) = row {
                    if !changed_set.contains(dep_file.as_str()) {
                        result_set.insert(dep_file);
                    }
                }
            }
        }
        Ok(result_set.into_iter().collect())
    }

    /// Get node and edge counts in a single napi call.
    #[napi]
    pub fn get_finalize_counts(&self) -> napi::Result<FinalizeCounts> {
        let conn = self.conn()?;
        let node_count = conn
            .query_row("SELECT COUNT(*) FROM nodes", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        let edge_count = conn
            .query_row("SELECT COUNT(*) FROM edges", [], |r| r.get::<_, i64>(0))
            .unwrap_or(0);
        Ok(FinalizeCounts {
            node_count,
            edge_count,
        })
Contributor

P2 find_reverse_dependencies — non-deterministic result ordering via HashSet

result_set.into_iter().collect() iterates a std::collections::HashSet, which provides no ordering guarantees and can produce different orderings across invocations (or Rust versions). The caller in detect-changes.ts builds a Set<string> from the result (so deduplication is unaffected), but the non-deterministic order propagates into ctx.parseChanges.push(...) entries, making the parse order vary between runs.

Consider using a BTreeSet or sorting before returning if deterministic builds matter:

let mut result_vec: Vec<String> = result_set.into_iter().collect();
result_vec.sort_unstable();
Ok(result_vec)

Contributor Author

Fixed in 685d40e. Added sort_unstable() before returning the result vector, ensuring deterministic ordering across runs regardless of HashSet iteration order.

…735)

- Remove redundant COUNT(*) query in get_collect_files_data; derive
  count from files.len() to guarantee consistency
- Propagate per-row errors in heal_file_metadata instead of silently
  swallowing with .is_ok(), matching WASM rollback semantics
- Sort find_reverse_dependencies output for deterministic ordering
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit f1540c2 into main Apr 1, 2026
18 checks passed
@carlos-alm carlos-alm deleted the perf/native-build-glue-queries branch April 1, 2026 10:28
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2026