perf(db): add compound indexes to fix query regression from TS migration #632
carlos-alm merged 5 commits into main
…tion

Add migration v16 with idx_edges_kind_target and idx_edges_kind_source indexes. The fan-in/fan-out GROUP BY subqueries used by fnDeps, roles, and triage need kind-first indexes to avoid full-table scans on the edges table. Without them, a 32% growth in graph density (from the TypeScript migration) caused a disproportionate 136% query slowdown.

Also optimize classifyNodeRoles (both full and incremental paths) to skip parameter and property nodes from the expensive fan-in/fan-out JOINs. These ~4,100 nodes are always classified as dead-leaf by definition and can never have callers or callees.

Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms.
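As a rough illustration of the migration described above, the sketch below shows what a v16 entry with kind-first compound indexes might look like. Only the version number, the index names, the `edges` table, and the use of `IF NOT EXISTS` come from this PR; the `Migration` shape, the `pendingStatements` helper, and the column names `target_id` and `source_id` are assumptions made for the example.

```typescript
// Hypothetical migration shape; the project's real migration runner is not shown in this PR.
interface Migration {
  version: number;
  statements: string[];
}

// Index names and IF NOT EXISTS come from the PR; the column names
// (kind, target_id, source_id) are assumed for illustration.
const migrationV16: Migration = {
  version: 16,
  statements: [
    // kind leads so a filter on kind can narrow the index before grouping by target
    "CREATE INDEX IF NOT EXISTS idx_edges_kind_target ON edges(kind, target_id)",
    "CREATE INDEX IF NOT EXISTS idx_edges_kind_source ON edges(kind, source_id)",
  ],
};

// Collect statements for migrations newer than the stored schema version, in order.
function pendingStatements(migrations: Migration[], currentVersion: number): string[] {
  const pending = migrations
    .filter((m) => m.version > currentVersion)
    .sort((a, b) => a.version - b.version);
  const out: string[] = [];
  for (const m of pending) {
    for (const sql of m.statements) out.push(sql);
  }
  return out;
}
```

Because each statement uses `IF NOT EXISTS`, re-running the list against a database that already has the indexes should be harmless, which fits an append-only migration history.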
Claude finished @carlos-alm's task

Reviewing PR: Performance indexes for query regression fix

Todo List:
Phase 1: Root Cause Analysis ✓
Problem: TypeScript migration caused 32% graph growth → 136% fnDeps slowdown due to missing compound indexes
Phase 2: Configuration Change Detection ✓
HIGH-RISK CHANGES: ✅ NONE - No config/linting/build changes
Phase 3: Test Integrity ✓
RED FLAGS: ✅ NONE - No test modifications, author reports all 2,128 tests pass
Phase 4: Code Quality Review ✓
MIGRATION v16: Proper compound index ordering, uses …
Phase 5: Backlog Compliance Assessment ✓

Root Cause Analysis
Problem Being Solved: TypeScript migration regression - 32% graph growth caused disproportionate 136% query slowdown due to missing compound indexes for fan-in/fan-out subqueries.

Backlog Compliance
Final Recommendation
Excellent work - this is exactly how performance regressions should be addressed: measure the bottleneck, fix the underlying issue with proper database design, and add algorithmic optimizations based on domain knowledge.
Greptile Summary

This PR addresses a query regression caused by graph growth from the TypeScript migration by adding two compound SQLite indexes and skipping ~4,100 leaf nodes from expensive fan-in/fan-out computations.

- Migration v16 adds …

Confidence Score: 5/5

Safe to merge: targeted performance fix with correct index design, no logic regressions, and all 2,128 tests passing.

Both changes are well-scoped: the migration uses IF NOT EXISTS and is append-only, and the classifyNodeRoles refactor is structurally sound (correct map accumulation, correct early-exit guard, correct incremental scoping to allAffectedFiles). The prior review concern about unused columns in leafRows was already resolved. No blocking issues found. No files require special attention.

Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[classifyNodeRoles] --> B{changedFiles?}
B -- Yes --> C[classifyNodeRolesIncremental]
B -- No --> D[classifyNodeRolesFull]
D --> D1["leafRows: SELECT id WHERE kind IN ('parameter','property')"]
D --> D2["rows: LEFT JOIN fan-in/fan-out\nWHERE kind NOT IN (...,'parameter','property')\n uses idx_edges_kind_target/source"]
D1 --> D3["idsByRole.set('dead-leaf', leafIds)"]
D2 --> D4["classifyRoles → roleMap"]
D4 --> D5["rows loop: append to idsByRole\n(dead-leaf entry already exists → push)"]
D3 --> D6["db.transaction: UPDATE nodes SET role=NULL\nthen batch UPDATE per role chunk"]
D5 --> D6
C --> C1["leafRows: SELECT id WHERE kind IN ('parameter','property') AND file IN (...)"]
C --> C2["rows: correlated subqueries fan-in/fan-out\nWHERE kind NOT IN (...) AND file IN (...)\n uses idx_edges_kind_target/source"]
C1 --> C3["idsByRole.set('dead-leaf', leafIds)"]
C2 --> C4["classifyRoles → roleMap"]
C4 --> C5["rows loop: append to idsByRole"]
C3 --> C6["db.transaction: UPDATE affected files only\nthen batch UPDATE per role chunk"]
C5 --> C6
Reviews (3): Last reviewed commit: "fix: replace corrupted UTF-8 character i..."
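The map accumulation and batched update steps in the flowchart can be sketched in memory as follows. This is only an approximation of the flow, not the project's code: the real implementation splits leaf and non-leaf nodes with two SQL queries, whereas this sketch partitions an array; the `Classifier` type stands in for `classifyRoles`, and `chunkIds` with its chunk size is a hypothetical helper.

```typescript
type RoleNodeRow = { id: number; kind: string };

// Kinds the PR treats as dead-leaf by definition.
const LEAF_NODE_KINDS = ["parameter", "property"];

// Stand-in for the project's classifyRoles; the real classifier is not shown here.
type Classifier = (rows: RoleNodeRow[]) => Map<number, string>;

function groupIdsByRole(nodes: RoleNodeRow[], classify: Classifier): Map<string, number[]> {
  const leafIds: number[] = [];
  const rows: RoleNodeRow[] = [];
  for (const n of nodes) {
    if (LEAF_NODE_KINDS.indexOf(n.kind) !== -1) leafIds.push(n.id);
    else rows.push(n); // only these go through the expensive classification
  }

  const idsByRole = new Map<string, number[]>();
  // Seed the dead-leaf bucket first (skipped when empty: a choice made for this sketch).
  if (leafIds.length > 0) idsByRole.set("dead-leaf", leafIds);

  const roleMap = classify(rows);
  for (const row of rows) {
    const role = roleMap.get(row.id);
    if (role === undefined) continue;
    const ids = idsByRole.get(role);
    if (ids) ids.push(row.id); // bucket already exists (e.g. dead-leaf): append
    else idsByRole.set(role, [row.id]);
  }
  return idsByRole;
}

// Split ids into fixed-size chunks so each batch UPDATE stays within a bounded parameter count.
function chunkIds<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("chunk size must be positive");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

Seeding the `dead-leaf` bucket before the loop is what lets classified rows that also end up as `dead-leaf` append to the same list rather than overwrite it.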
src/features/structure.ts
Outdated
const leafRows = db
  .prepare(
    `SELECT n.id, n.name, n.kind, n.file
     FROM nodes n
     WHERE n.kind IN ('parameter', 'property')`,
  )
  .all() as { id: number; name: string; kind: string; file: string }[];
Unused columns in full-path leafRows query
name, kind, and file are fetched here but only id is ever accessed in the processing loop below (for (const row of leafRows) leafIds.push(row.id)). The incremental path correctly fetches only SELECT n.id — the full path can match that.
- const leafRows = db
-   .prepare(
-     `SELECT n.id, n.name, n.kind, n.file
-      FROM nodes n
-      WHERE n.kind IN ('parameter', 'property')`,
-   )
-   .all() as { id: number; name: string; kind: string; file: string }[];
+ const leafRows = db
+   .prepare(
+     `SELECT n.id
+      FROM nodes n
+      WHERE n.kind IN ('parameter', 'property')`,
+   )
+   .all() as { id: number }[];
Fixed in 6c8cc17. The full-path leafRows query now selects only n.id, matching the lean form already used by the incremental path.
The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.
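To make the lean form concrete, here is a self-contained sketch of the id-only query wrapped in a helper. The `DbLike` and `StatementLike` interfaces are minimal stand-ins assumed for this example (they mirror only the `prepare(...).all()` calls visible in the diff), and `selectLeafIds` is a hypothetical name, not a function from the PR.

```typescript
// Minimal structural stand-ins for the database handle; only the calls used below.
interface StatementLike {
  all(): unknown[];
}
interface DbLike {
  prepare(sql: string): StatementLike;
}

// Fetch only n.id, since that is the only column the processing loop reads.
const LEAF_ID_SQL = `SELECT n.id
FROM nodes n
WHERE n.kind IN ('parameter', 'property')`;

function selectLeafIds(db: DbLike): number[] {
  const leafRows = db.prepare(LEAF_ID_SQL).all() as { id: number }[];
  const leafIds: number[] = [];
  for (const row of leafRows) leafIds.push(row.id);
  return leafIds;
}
```

Narrowing the row type to `{ id: number }` also means an accidental read of `row.name` or `row.file` would fail type checking instead of silently depending on columns the query no longer returns.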
Fixed the corrupted UTF-8 replacement character in the incremental classifier comment (line 600 of …
Summary
- Add idx_edges_kind_target and idx_edges_kind_source compound indexes on the edges table. The fan-in/fan-out GROUP BY subqueries (used by fnDeps, roles, triage) need kind-first indexes to avoid full-table scans. Without them, the 32% graph growth from the TypeScript migration caused a disproportionate 136% fnDeps slowdown.
- Optimize classifyNodeRoles (both full and incremental paths) to skip ~4,100 parameter/property nodes from the expensive fan-in/fan-out JOINs. These leaf nodes are always dead-leaf by definition and can never have callers or callees.

Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms, fnDeps path 8.5ms → ~1ms.
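The figures above can be turned into rough ratios with a few lines of arithmetic. The sketch below assumes that "136% slowdown" means the query took 2.36 times as long, treats the approximate timings (~8ms, ~1ms) as exact, and fits a power law through just two data points, so the exponent is an illustration of "disproportionate" rather than a measured complexity.

```typescript
// Speedup factor: how many times faster "after" is than "before".
function speedupFactor(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

// If time scales like size^k, then k = ln(timeRatio) / ln(sizeRatio).
function impliedExponent(sizeRatio: number, timeRatio: number): number {
  return Math.log(timeRatio) / Math.log(sizeRatio);
}

// Timings quoted in the summary (local measurements, some approximate).
const timingSamples = [
  { label: "fan-in subquery", beforeMs: 8, afterMs: 0.43 },
  { label: "roles query", beforeMs: 7, afterMs: 5.3 },
  { label: "fnDeps path", beforeMs: 8.5, afterMs: 1 },
];

const speedupRows = timingSamples.map((s) => ({
  label: s.label,
  factor: speedupFactor(s.beforeMs, s.afterMs),
}));

// 32% growth -> size ratio 1.32; 136% slowdown -> assumed time ratio 2.36.
const growthExponent = impliedExponent(1.32, 2.36);
```

Under those assumptions the fan-in subquery comes out roughly 18.6 times faster, the roles query about 1.3 times, and the fnDeps path about 8.5 times, while the implied exponent is a little above 3, far from the linear scaling an index lookup would be expected to give.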
Addresses regressions reported in #625, #626, #627.
Test plan
- codegraph build --no-incremental succeeds, new indexes created
- codegraph stats shows unchanged node/edge counts