perf(db): add compound indexes to fix query regression from TS migration by carlos-alm · Pull Request #632 · optave/ops-codegraph-tool

carlos-alm · 2026-03-26T08:26:09Z

Summary

Add migration v16 with idx_edges_kind_target and idx_edges_kind_source compound indexes on the edges table. The fan-in/fan-out GROUP BY subqueries (used by fnDeps, roles, triage) need kind-first indexes to avoid full-table scans. Without them, the 32% graph growth from the TypeScript migration caused a disproportionate 136% fnDeps slowdown.
Optimize classifyNodeRoles (both full and incremental paths) to skip ~4,100 parameter/property nodes from the expensive fan-in/fan-out JOINs. These leaf nodes are always dead-leaf by definition and can never have callers or callees.

Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms, fnDeps path 8.5ms → ~1ms.

Addresses regressions reported in #625, #626, #627.

Test plan

Full test suite passes (2,128 tests, 0 failures)
codegraph build --no-incremental succeeds, new indexes created
codegraph stats shows unchanged node/edge counts
Local benchmark confirms query latency improvement

…tion Add migration v16 with idx_edges_kind_target and idx_edges_kind_source indexes. The fan-in/fan-out GROUP BY subqueries used by fnDeps, roles, and triage need kind-first indexes to avoid full-table scans on the edges table. Without them, a 32% growth in graph density (from the TypeScript migration) caused a disproportionate 136% query slowdown. Also optimize classifyNodeRoles (both full and incremental paths) to skip parameter and property nodes from the expensive fan-in/fan-out JOINs. These ~4,100 nodes are always classified as dead-leaf by definition and can never have callers or callees. Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms.

claude · 2026-03-26T08:26:24Z

Claude finished @carlos-alm's task —— View job

Reviewing PR: Performance indexes for query regression fix

Todo List:

Read and analyze changed files
Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity
Phase 4: Code Quality Review
Phase 5: Backlog Compliance Assessment
Provide structured final assessment

Phase 1: Root Cause Analysis ✓

Problem: TypeScript migration caused 32% graph growth → 136% fnDeps slowdown due to missing compound indexes
Solution: Add idx_edges_kind_target/idx_edges_kind_source indexes + skip 4,100 leaf nodes from expensive JOINs
Risk: Low - purely additive, preserves correctness

Phase 2: Configuration Change Detection ✓

HIGH-RISK CHANGES: ✅ NONE - No config/linting/build changes

Phase 3: Test Integrity ✓

RED FLAGS: ✅ NONE - No test modifications, author reports all 2,128 tests pass

Phase 4: Code Quality Review ✓

MIGRATION v16: Proper compound index ordering, uses IF NOT EXISTS, follows patterns
OPTIMIZATION: Correctly identifies parameter/property as structural leaves, safe 4,100-node skip
TECHNICAL: Index order matches access patterns, incremental path preserved, no behavior changes

Phase 5: Backlog Compliance Assessment ✓

Root Cause Analysis

Problem Being Solved: TypeScript migration regression - 32% graph growth caused disproportionate 136% query slowdown due to missing compound indexes for fan-in/fan-out subqueries.
Why This Approach: Database-first solution targeting the actual bottleneck (index scans) plus algorithmic optimization (skip leaf nodes that can't participate in call relationships).
Risk Assessment: Minimal - purely additive schema changes + logically sound optimization that preserves all semantics.

Backlog Compliance

Zero-dep: ✅ - No new runtime dependencies, builds on existing SQLite/tree-sitter infrastructure
Foundation-aligned: ✅ - Pure performance optimization, no cloud APIs, maintains offline-first principle
Problem-fit: 5 - Directly addresses core issue: prevents AI agents from experiencing slow codegraph queries that waste tokens during re-orientation in large codebases
Breaking: No - Schema migrations are backward compatible, no API changes, preserves all existing behavior
Tier: 1 (zero-dep ✅ + foundation-aligned ✅)

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Textbook performance fix addressing a real regression with proper root cause analysis. Database indexes are correctly ordered for access patterns, optimization correctly identifies structural properties of parameter/property nodes, no correctness risks, comprehensive test validation, and directly improves AI agent experience by preventing query slowdowns.

Excellent work - this is exactly how performance regressions should be addressed: measure the bottleneck, fix the underlying issue with proper database design, and add algorithmic optimizations based on domain knowledge.

greptile-apps · 2026-03-26T08:30:50Z

Greptile Summary

This PR addresses a query regression caused by graph growth from the TypeScript migration by adding two compound SQLite indexes and skipping ~4,100 leaf nodes from expensive fan-in/fan-out computations.\n\n- Migration v16 adds idx_edges_kind_target ON edges(kind, target_id) and idx_edges_kind_source ON edges(kind, source_id). The column order is intentional and correct: kind is the equality filter (WHERE kind = 'calls'), so placing it first allows SQLite to narrow the scan to only 'calls' edges before resolving target_id/source_id. Both GROUP BY subqueries in the full path and correlated subqueries in the incremental path benefit directly.\n- classifyNodeRolesFull and classifyNodeRolesIncremental now pre-classify parameter and property nodes as dead-leaf with a cheap scan, then exclude them from the heavy JOIN/subquery path. The idsByRole map is correctly pre-seeded before the main rows loop — callable nodes that the classifier also resolves to dead-leaf push into the same backing array rather than overwriting it. The early-return guard is widened to rows.length === 0 && leafRows.length === 0, preventing a silent no-op when a file contains only leaf-kind nodes.\n- All logic paths (full, incremental, zero-rows edge case) look correct. The previously noted over-fetching of name/kind/file in leafRows was already resolved in 6c8cc17 per the prior thread.

Confidence Score: 5/5

Safe to merge — targeted performance fix with correct index design, no logic regressions, and all 2,128 tests passing.

Both changes are well-scoped: the migration uses IF NOT EXISTS and is append-only, and the classifyNodeRoles refactor is structurally sound (correct map accumulation, correct early-exit guard, correct incremental scoping to allAffectedFiles). The prior review concern about unused columns in leafRows was already resolved. No blocking issues found.

No files require special attention.

Important Files Changed

Filename	Overview
src/db/migrations.ts	Adds migration v16 with two compound indexes on the edges table — (kind, target_id) and (kind, source_id) — ordered correctly (equality filter kind first) so SQLite can skip-scan 'calls' edges without full table scans.
src/features/structure.ts	Both full and incremental classification paths now short-circuit parameter/property nodes as dead-leaf before the expensive fan-in/fan-out JOINs/subqueries; the idsByRole map correctly accumulates callable nodes that also resolve to dead-leaf into the same array, and the early-exit guard is correctly widened to rows.length === 0 && leafRows.length === 0.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[classifyNodeRoles] --> B{changedFiles?}
    B -- Yes --> C[classifyNodeRolesIncremental]
    B -- No --> D[classifyNodeRolesFull]

    D --> D1["leafRows: SELECT id WHERE kind IN ('parameter','property')"]
    D --> D2["rows: LEFT JOIN fan-in/fan-out\nWHERE kind NOT IN (...,'parameter','property')\n uses idx_edges_kind_target/source"]
    D1 --> D3["idsByRole.set('dead-leaf', leafIds)"]
    D2 --> D4["classifyRoles → roleMap"]
    D4 --> D5["rows loop: append to idsByRole\n(dead-leaf entry already exists → push)"]
    D3 --> D6["db.transaction: UPDATE nodes SET role=NULL\nthen batch UPDATE per role chunk"]
    D5 --> D6

    C --> C1["leafRows: SELECT id WHERE kind IN ('parameter','property') AND file IN (...)"]
    C --> C2["rows: correlated subqueries fan-in/fan-out\nWHERE kind NOT IN (...) AND file IN (...)\n uses idx_edges_kind_target/source"]
    C1 --> C3["idsByRole.set('dead-leaf', leafIds)"]
    C2 --> C4["classifyRoles → roleMap"]
    C4 --> C5["rows loop: append to idsByRole"]
    C3 --> C6["db.transaction: UPDATE affected files only\nthen batch UPDATE per role chunk"]
    C5 --> C6

_{Reviews (3): Last reviewed commit: "fix: replace corrupted UTF-8 character i..." | Re-trigger Greptile}

greptile-apps · 2026-03-26T08:30:53Z

src/features/structure.ts

+  const leafRows = db
+    .prepare(
+      `SELECT n.id, n.name, n.kind, n.file
+      FROM nodes n
+      WHERE n.kind IN ('parameter', 'property')`,
+    )
+    .all() as { id: number; name: string; kind: string; file: string }[];


Unused columns in full-path leafRows query

name, kind, and file are fetched here but only id is ever accessed in the processing loop below (for (const row of leafRows) leafIds.push(row.id)). The incremental path correctly fetches only SELECT n.id — the full path can match that.

Suggested change

const leafRows = db

.prepare(

`SELECT n.id, n.name, n.kind, n.file

FROM nodes n

WHERE n.kind IN ('parameter', 'property')`,

)

.all() as { id: number; name: string; kind: string; file: string }[];

const leafRows = db

.prepare(

`SELECT n.id

FROM nodes n

WHERE n.kind IN ('parameter', 'property')`,

)

.all() as { id: number }[];

Fixed in 6c8cc17. The full-path leafRows query now selects only n.id, matching the lean form already used by the incremental path.

The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.

carlos-alm · 2026-03-26T08:47:00Z

@greptileai

…ent (#632)

carlos-alm · 2026-03-26T20:27:41Z

Fixed the corrupted UTF-8 replacement character in the incremental classifier comment (line 600 of src/features/structure.ts) noted in the Greptile summary. Replaced � with a proper em-dash —.

carlos-alm · 2026-03-26T20:28:21Z

@greptileai

greptile-apps bot reviewed Mar 26, 2026

View reviewed changes

carlos-alm added 2 commits March 26, 2026 02:37

Merge branch 'main' into perf/benchmark-regression-indexes

9080b1a

fix: remove unused columns from full-path leafRows query (#632)

6c8cc17

The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.

fix: replace corrupted UTF-8 character in incremental classifier comm…

ca2f324

…ent (#632)

Merge branch 'main' into perf/benchmark-regression-indexes

58a82cc

carlos-alm merged commit c61e825 into main Mar 26, 2026
12 checks passed

carlos-alm deleted the perf/benchmark-regression-indexes branch March 26, 2026 21:36

github-actions bot locked and limited conversation to collaborators Mar 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(db): add compound indexes to fix query regression from TS migration#632

perf(db): add compound indexes to fix query regression from TS migration#632
carlos-alm merged 5 commits intomainfrom
perf/benchmark-regression-indexes

carlos-alm commented Mar 26, 2026

Uh oh!

claude bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 26, 2026

Uh oh!

carlos-alm Mar 26, 2026

Uh oh!

carlos-alm Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 26, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing PR: Performance indexes for query regression fix

Phase 1: Root Cause Analysis ✓

Phase 2: Configuration Change Detection ✓

Phase 3: Test Integrity ✓

Phase 4: Code Quality Review ✓

Phase 5: Backlog Compliance Assessment ✓

Root Cause Analysis

Backlog Compliance

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 26, 2026 •

edited

Loading

greptile-apps bot commented Mar 26, 2026 •

edited

Loading