perf: native Rust build-glue queries (detect-changes, finalize, incremental) #735

carlos-alm merged 3 commits into main from
Conversation
Add 7 batched NativeDatabase methods that replace ~30 individual JS↔SQLite round-trips with single napi calls on the native engine path:

- getFileHashData: table check + all rows + max mtime (detect-changes)
- checkPendingAnalysis: cfg_blocks + dataflow counts (detect-changes)
- healFileMetadata: batch upsert file_hashes (detect-changes)
- findReverseDependencies: reverse-dep query (detect-changes)
- getFinalizeCounts: node + edge counts (finalize)
- runAdvisoryChecks: orphaned/stale embeddings + unused exports (finalize)
- getCollectFilesData: file_hashes count + paths (collect-files)

WASM fallback paths are preserved unchanged. Closes #694
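The batching idea in the description can be sketched std-only: instead of three separate lookups (table check, all rows, max mtime), one method returns everything detect-changes needs in a single struct. All names here (`ToyDb`, `FileHashData` fields) are hypothetical stand-ins; the real code runs over rusqlite and napi, not a HashMap.

```rust
use std::collections::HashMap;

/// Toy stand-in for the file_hashes table (hypothetical; the real code uses SQLite).
struct ToyDb {
    file_hashes: HashMap<String, (String, i64)>, // file -> (hash, mtime)
}

/// Mirrors the shape of a batched getFileHashData-style result.
struct FileHashData {
    exists: bool,
    rows: Vec<(String, String, i64)>, // (file, hash, mtime)
    max_mtime: i64,
}

impl ToyDb {
    /// One call returns the table check, all rows, and the max mtime together,
    /// where a fallback path would issue three separate queries.
    fn get_file_hash_data(&self) -> FileHashData {
        let rows: Vec<(String, String, i64)> = self
            .file_hashes
            .iter()
            .map(|(f, (h, m))| (f.clone(), h.clone(), *m))
            .collect();
        let max_mtime = rows.iter().map(|r| r.2).max().unwrap_or(0);
        FileHashData {
            exists: !rows.is_empty(),
            rows,
            max_mtime,
        }
    }
}
```

The point of the batch is that the three derived values are computed on the native side in one pass, so the JS boundary is crossed once per stage rather than once per query.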
Claude finished @carlos-alm's task — PR Review: Native Rust Build-Glue Queries

Review Progress
Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses a legitimate performance bottleneck where individual JS↔SQLite round-trips during build operations create excessive overhead. The solution batches multiple related database operations into single native calls.

Why This Approach: The implementation is well-justified:
Risk Assessment: Low risk — this is an additive optimization that preserves existing behavior while improving performance on the native engine path.

Phase 2: Configuration Changes ✅

No concerning configuration changes detected:
Phase 3: Test Integrity ✅

No red flags found:
Phase 4: Code Quality & Security ✅

Rust Code Quality (
TypeScript Integration (
Specific Strengths:
Phase 5: Backlog Compliance ✅

Zero-dep: ✓ — No new runtime dependencies added; uses existing rusqlite/napi-rs infrastructure.

Foundation-aligned: ✓ — Fully compliant with core principles:
Problem-fit: 5/5 — Directly addresses the core problem. This optimization significantly reduces round-trip overhead that affects AI agents during graph builds, improving response times for code navigation queries.

Breaking: No — Purely additive. Optional methods with full WASM fallback preservation.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed performance optimization.

Final Recommendation
Minor Notes:
Greptile Summary

This PR introduces 7 batched NativeDatabase methods. Key changes:
Remaining minor issue:
Confidence Score: 5/5

Safe to merge — all three prior P1 concerns are resolved, and the one remaining finding is a minor style inconsistency with no practical runtime impact. All previously flagged issues (partial heal commits, non-deterministic reverse-dep ordering, redundant COUNT in collect-files) are properly addressed in the final SHA. The single remaining comment is a P2 style observation about filter_map vs. explicit error propagation in get_collect_files_data — a TEXT-column deserialization failure is practically impossible in SQLite, so there is no correctness or data-integrity risk. The native/WASM semantic parity across all 7 new methods is solid.

crates/codegraph-core/src/native_db.rs — minor row-error handling inconsistency in get_collect_files_data (P2 only).

Important Files Changed
Sequence Diagram

sequenceDiagram
participant TS as TypeScript Stage
participant ND as NativeDatabase (Rust/napi)
participant WD as BetterSQLite3 (WASM)
participant DB as SQLite File
Note over TS,DB: detect-changes — getChangedFiles
alt native engine (getFileHashData present)
TS->>ND: getFileHashData()
ND->>DB: SELECT file, hash, mtime, size FROM file_hashes
DB-->>ND: all rows + max_mtime computed in Rust
ND-->>TS: FileHashData { exists, rows, maxMtime }
else WASM / fallback
TS->>WD: SELECT 1 FROM file_hashes LIMIT 1
WD->>DB: (table check)
TS->>WD: SELECT file, hash, mtime, size FROM file_hashes
WD->>DB: (all rows)
TS->>WD: SELECT MAX(mtime) FROM file_hashes
WD->>DB: (max mtime)
end
Note over TS,DB: finalize — counts + advisory checks
alt native (getFinalizeCounts + runAdvisoryChecks present)
TS->>ND: getFinalizeCounts()
ND->>DB: COUNT nodes + COUNT edges
ND-->>TS: FinalizeCounts
TS->>ND: runAdvisoryChecks(hasEmbeddings)
ND->>DB: orphaned embeddings + embed_built_at + unused exports
ND-->>TS: AdvisoryCheckResult
else WASM
TS->>WD: SELECT COUNT(*) FROM nodes
TS->>WD: SELECT COUNT(*) FROM edges
TS->>WD: orphaned / stale / unused queries (3 separate calls)
end
Reviews (2): Last reviewed commit: "fix: address Greptile review feedback fo..."
        files: vec![],
    });
}
let count = conn
    .query_row("SELECT COUNT(*) FROM file_hashes", [], |r| r.get::<_, i64>(0))
    .unwrap_or(0);
if count == 0 {
    return Ok(CollectFilesData {
        count: 0,
        files: vec![],
    });
}
let mut stmt = conn
    .prepare_cached("SELECT file FROM file_hashes")
    .map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
let rows = stmt
    .query_map([], |row| row.get::<_, String>(0))
    .map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?;
let files: Vec<String> = rows.filter_map(|r| r.ok()).collect();
Ok(CollectFilesData {
    count,
    files,
})
get_collect_files_data — two unguarded queries can return inconsistent count and files
The COUNT(*) query and the subsequent SELECT file FROM file_hashes execute as separate statements without a wrapping transaction. If any write occurs between them (unlikely under normal SQLite serialization, but possible in WAL mode with concurrent readers), the returned count could be stale relative to files. In collect-files.ts, data.count is used only as an early-exit guard (if (dbFileCount === 0) return null), so a divergence doesn't cause incorrect results — but it's architecturally cleaner to either wrap both in a BEGIN/COMMIT, or simply remove the separate COUNT(*) query entirely and derive the count from files.len():
// Option: single query, derive count from result
let mut stmt = conn
.prepare_cached("SELECT file FROM file_hashes")
.map_err(|e| napi::Error::from_reason(format!("collectFiles prepare failed: {e}")))?;
let files: Vec<String> = stmt
.query_map([], |row| row.get::<_, String>(0))
.map_err(|e| napi::Error::from_reason(format!("collectFiles query failed: {e}")))?
.filter_map(|r| r.ok())
.collect();
let count = files.len() as i64;
Ok(CollectFilesData { count, files })

This eliminates the redundant COUNT(*) round-trip, makes count always consistent with files, and preserves the early-exit behaviour on the TS side unchanged.
Fixed in 685d40e. Removed the separate COUNT(*) query entirely — now derives count from files.len() as i64 after the single SELECT file FROM file_hashes query, eliminating the redundant round-trip and ensuring count is always consistent with the returned file list.
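The remaining P2 note (filter_map still drops row errors silently) comes down to a std-only contrast, sketched here with toy Result values and hypothetical function names, no SQLite involved:

```rust
/// filter_map silently drops failed rows, so the returned list
/// (and any count derived from it) can understate what the source held.
fn collect_lossy(rows: Vec<Result<String, String>>) -> Vec<String> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

/// Collecting into Result<Vec<_>, _> propagates the first row error
/// to the caller instead of swallowing it.
fn collect_strict(rows: Vec<Result<String, String>>) -> Result<Vec<String>, String> {
    rows.into_iter().collect()
}
```

For a TEXT column the error arm is practically unreachable, which is why the review left this as a style observation rather than a blocker.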
for entry in &entries {
    if stmt
        .execute(params![entry.file, entry.hash, entry.mtime, entry.size])
        .is_ok()
    {
        count += 1;
    }
}
heal_file_metadata — per-entry failures silently committed as partial results
Individual stmt.execute(...) errors are swallowed with .is_ok(), but tx.commit() is called unconditionally afterward. This means a partial set of rows can be committed even when some entries failed — silently and with a return value that no longer reflects what actually landed. The TypeScript WASM fallback wraps the same loop in a db.transaction() that rolls back entirely on any run() error.
While the healMetadata caller in detect-changes.ts already swallows the whole operation in try { … } catch { /* ignore heal errors */ }, propagating per-entry failures instead of silently continuing keeps the native path semantically aligned with the fallback:
for entry in &entries {
stmt.execute(params![entry.file, entry.hash, entry.mtime, entry.size])
.map_err(|e| napi::Error::from_reason(format!("heal row failed: {e}")))?;
count += 1;
}

A propagated error bubbles up to the TS catch block anyway, so the user-visible behaviour is identical — but the commit then reflects only fully successful batches, matching the WASM rollback semantics.
Fixed in 685d40e. Per-row errors now propagate via map_err(...)? instead of being silently swallowed with .is_ok(). A failed row will cause the transaction to roll back (since tx is dropped without commit()), matching the WASM fallback's rollback-on-error semantics. The TS caller's try/catch still handles the propagated error gracefully.
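The rollback-on-drop behaviour this fix relies on can be modelled std-only with a Drop impl. ToyTx and heal below are hypothetical toys, not rusqlite's real Transaction API, though rusqlite transactions likewise roll back by default when dropped uncommitted:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy transaction: staged rows reach the store only if commit() runs;
/// dropping without commit discards them (rollback).
struct ToyTx {
    store: Rc<RefCell<Vec<String>>>,
    staged: Vec<String>,
    committed: bool,
}

impl ToyTx {
    fn new(store: Rc<RefCell<Vec<String>>>) -> Self {
        ToyTx { store, staged: Vec::new(), committed: false }
    }

    fn execute(&mut self, row: Result<String, String>) -> Result<(), String> {
        self.staged.push(row?); // `?` propagates; the tx stays uncommitted
        Ok(())
    }

    fn commit(mut self) {
        self.store.borrow_mut().append(&mut self.staged);
        self.committed = true;
    }
}

impl Drop for ToyTx {
    fn drop(&mut self) {
        if !self.committed {
            self.staged.clear(); // rollback: staged rows never reach the store
        }
    }
}

/// Propagating the first row error returns early, so the tx is dropped
/// uncommitted and the store is untouched — all-or-nothing, as in the fix.
fn heal(store: Rc<RefCell<Vec<String>>>, rows: Vec<Result<String, String>>) -> Result<(), String> {
    let mut tx = ToyTx::new(store);
    for row in rows {
        tx.execute(row)?;
    }
    tx.commit();
    Ok(())
}
```

Contrast with the pre-fix shape: counting `.is_ok()` successes and committing unconditionally would have landed the partial batch.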
    .map_err(|e| napi::Error::from_reason(format!("reverseDeps prepare failed: {e}")))?;

for file in &changed_files {
    let rows = stmt
        .query_map(params![file], |row| row.get::<_, String>(0))
        .map_err(|e| {
            napi::Error::from_reason(format!("reverseDeps query failed: {e}"))
        })?;
    for row in rows {
        if let Ok(dep_file) = row {
            if !changed_set.contains(dep_file.as_str()) {
                result_set.insert(dep_file);
            }
        }
    }
}
Ok(result_set.into_iter().collect())
}

/// Get node and edge counts in a single napi call.
#[napi]
pub fn get_finalize_counts(&self) -> napi::Result<FinalizeCounts> {
    let conn = self.conn()?;
    let node_count = conn
        .query_row("SELECT COUNT(*) FROM nodes", [], |r| r.get::<_, i64>(0))
        .unwrap_or(0);
    let edge_count = conn
        .query_row("SELECT COUNT(*) FROM edges", [], |r| r.get::<_, i64>(0))
        .unwrap_or(0);
    Ok(FinalizeCounts {
        node_count,
        edge_count,
    })
find_reverse_dependencies — non-deterministic result ordering via HashSet
result_set.into_iter().collect() iterates a std::collections::HashSet, which provides no ordering guarantees and can produce different orderings across invocations (or Rust versions). The caller in detect-changes.ts builds a Set<string> from the result (so deduplication is unaffected), but the non-deterministic order propagates into ctx.parseChanges.push(...) entries, making the parse order vary between runs.
Consider using a BTreeSet or sorting before returning if deterministic builds matter:
let mut result_vec: Vec<String> = result_set.into_iter().collect();
result_vec.sort_unstable();
Ok(result_vec)
Fixed in 685d40e. Added sort_unstable() before returning the result vector, ensuring deterministic ordering across runs regardless of HashSet iteration order.
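The determinism point generalizes beyond this PR and can be shown std-only. The helper below is hypothetical (no database involved): it deduplicates via HashSet, then sorts, so the output order no longer depends on HashSet's unspecified iteration order.

```rust
use std::collections::HashSet;

/// Deduplicate reverse-dep hits, excluding already-changed files,
/// then sort so the returned order is stable across runs.
fn reverse_deps_sorted(raw: &[&str], changed: &HashSet<&str>) -> Vec<String> {
    let unique: HashSet<&str> = raw
        .iter()
        .copied()
        .filter(|f| !changed.contains(f))
        .collect();
    let mut out: Vec<String> = unique.into_iter().map(String::from).collect();
    out.sort_unstable(); // HashSet iteration order is unspecified; sorting fixes it
    out
}
```

sort_unstable suffices here because the set guarantees no duplicates, so stable vs. unstable sorting is indistinguishable.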
…735)

- Remove redundant COUNT(*) query in get_collect_files_data; derive count from files.len() to guarantee consistency
- Propagate per-row errors in heal_file_metadata instead of silently swallowing with .is_ok(), matching WASM rollback semantics
- Sort find_reverse_dependencies output for deterministic ordering
Summary
Add 7 NativeDatabase Rust methods that replace ~30 individual JS↔SQLite round-trips with single napi calls on the native engine path:

- detect-changes: getFileHashData, checkPendingAnalysis, healFileMetadata, findReverseDependencies
- finalize: getFinalizeCounts, runAdvisoryChecks
- collect-files: getCollectFilesData

Test plan
- tsc --noEmit — TypeScript compiles cleanly
- cargo check — Rust compiles (no code errors; linker needs MSVC on CI)
- biome check — lint clean on all changed files
- codegraph build --engine native + incremental rebuild

Closes #694