diff --git a/docs/roadmap/ROADMAP.md b/docs/roadmap/ROADMAP.md index 4ca9cf9d..ecc16437 100644 --- a/docs/roadmap/ROADMAP.md +++ b/docs/roadmap/ROADMAP.md @@ -2,7 +2,7 @@ > **Current version:** 3.1.4 | **Status:** Active development | **Updated:** March 2026 -Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across ten phases -- closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. +Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across eleven phases -- closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. **LLM strategy:** All LLM-powered features are **optional enhancements**. Everything works without an API key. When configured (OpenAI, Anthropic, Ollama, or any OpenAI-compatible endpoint), users unlock richer semantic search and natural language queries. @@ -17,13 +17,14 @@ Codegraph is a strong local-first code graph CLI. 
This roadmap describes planned | [**2.5**](#phase-25--analysis-expansion) | Analysis Expansion | Complexity metrics, community detection, flow tracing, co-change, manifesto, boundary rules, check, triage, audit, batch, hybrid search | **Complete** (v2.6.0) | | [**2.7**](#phase-27--deep-analysis--graph-enrichment) | Deep Analysis & Graph Enrichment | Dataflow analysis, intraprocedural CFG, AST node storage, expanded node/edge types, extractors refactoring, CLI consolidation, interactive viewer, exports command, normalizeSymbol | **Complete** (v3.0.0) | | [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring (Vertical Slice) | Unified AST analysis framework, command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, builder pipeline, presentation layer, domain grouping, curated API, unified graph model, qualified names, CLI composability | **In Progress** (v3.1.4) | -| [**4**](#phase-4--typescript-migration) | TypeScript Migration | Project setup, core type definitions, leaf -> core -> orchestration module migration, test migration, supply-chain security, CI coverage gates | Planned | -| [**5**](#phase-5--runtime--extensibility) | Runtime & Extensibility | Event-driven pipeline, unified engine strategy, subgraph export filtering, transitive confidence, query caching, configuration profiles, pagination, plugin system, DX & onboarding | Planned | -| [**6**](#phase-6--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned | -| [**7**](#phase-7--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | -| [**8**](#phase-8--expanded-language-support) | Expanded Language Support | 8 new languages (11 -> 19), parser utilities | Planned | -| [**9**](#phase-9--github-integration--ci) | GitHub Integration & 
CI | Reusable GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | -| [**10**](#phase-10--interactive-visualization--advanced-features) | Visualization & Advanced | Web UI, dead code detection, monorepo, agentic search, refactoring analysis | Planned | +| [**4**](#phase-4--native-analysis-acceleration) | Native Analysis Acceleration | Move JS-only build phases (AST nodes, CFG, dataflow, insert nodes, structure, roles, complexity) to Rust; fix incremental rebuild data loss on native; sub-100ms 1-file rebuilds | Planned | +| [**5**](#phase-5--typescript-migration) | TypeScript Migration | Project setup, core type definitions, leaf -> core -> orchestration module migration, test migration, supply-chain security, CI coverage gates | Planned | +| [**6**](#phase-6--runtime--extensibility) | Runtime & Extensibility | Event-driven pipeline, unified engine strategy, subgraph export filtering, transitive confidence, query caching, configuration profiles, pagination, plugin system, DX & onboarding | Planned | +| [**7**](#phase-7--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned | +| [**8**](#phase-8--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | +| [**9**](#phase-9--expanded-language-support) | Expanded Language Support | 8 new languages (11 -> 19), parser utilities | Planned | +| [**10**](#phase-10--github-integration--ci) | GitHub Integration & CI | Reusable GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | +| [**11**](#phase-11--interactive-visualization--advanced-features) | Visualization & Advanced | Web UI, dead code detection, monorepo, agentic search, refactoring analysis | Planned | ### Dependency graph @@ -33,12 +34,13 @@ Phase 1 (Rust Core) |--> Phase 2.5 (Analysis Expansion) |--> Phase 
2.7 (Deep Analysis & Graph Enrichment) |--> Phase 3 (Architectural Refactoring) - |--> Phase 4 (TypeScript Migration) - |--> Phase 5 (Runtime & Extensibility) - |--> Phase 6 (Embeddings + Metadata) --> Phase 7 (NL Queries + Narration) - |--> Phase 8 (Languages) - |--> Phase 9 (GitHub/CI) <-- Phase 6 (risk_score, side_effects) -Phases 1-7 --> Phase 10 (Visualization + Refactoring Analysis) + |--> Phase 4 (Native Analysis Acceleration) + |--> Phase 5 (TypeScript Migration) + |--> Phase 6 (Runtime & Extensibility) + |--> Phase 7 (Embeddings + Metadata) --> Phase 8 (NL Queries + Narration) + |--> Phase 9 (Languages) + |--> Phase 10 (GitHub/CI) <-- Phase 7 (risk_score, side_effects) +Phases 1-8 --> Phase 11 (Visualization + Refactoring Analysis) ``` --- @@ -991,13 +993,126 @@ Practical cleanup to make the CLI surface match the internal composability that --- -## Phase 4 -- TypeScript Migration +## Phase 4 -- Native Analysis Acceleration + +**Goal:** Move the remaining JS-only build phases to Rust so that `--engine native` eliminates all redundant WASM visitor walks. Today only 3 of 10 build phases (parse, resolve imports, build edges) run in Rust — the other 7 execute identical JavaScript regardless of engine, leaving ~50% of native build time on the table. + +**Why its own phase:** This is a substantial Rust engineering effort — porting 6 JS visitors to `crates/codegraph-core/`, fixing a data loss bug in incremental rebuilds, and optimizing the 1-file rebuild path. Doing this before the TS migration avoids rewriting the same visitor code twice (once to TS, once to Rust). The Phase 3 module boundaries make each phase a self-contained target. 
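The shared shape of 4.1–4.5 can be sketched as a per-phase dispatch in the analysis stage: use pre-computed native data when the parser supplies it, otherwise fall back to the JS visitor walk. Names here (`astNodes`, `walkWithJsVisitor`) are illustrative, not the actual `FileSymbols` API.

```javascript
// Sketch: per-phase dispatch between native pre-computed data and the JS visitor.
// Field and function names are illustrative, not codegraph's real API.
function collectAstNodes(file, walkWithJsVisitor) {
  // Native engine (post-4.1): the parser already returned complete AST node data.
  if (Array.isArray(file.astNodes)) return file.astNodes;
  // WASM engine (or older native builds): fall back to the JS visitor walk.
  return walkWithJsVisitor(file);
}

const nativeFile = { astNodes: [{ kind: 'call', name: 'fetch' }] };
const wasmFile = { _tree: {} };
const jsWalk = () => [{ kind: 'call', name: 'fetch' }];

collectAstNodes(nativeFile, jsWalk); // uses pre-computed data, no tree walk
collectAstNodes(wasmFile, jsWalk); // walks the tree as today
```

The same guard applies per phase (AST nodes, CFG, dataflow), which is what lets a fully native build skip every redundant visitor.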
+ +**Evidence (v3.1.4 benchmarks on 398 files):** + +| Phase | Native | WASM | Ratio | Status | +|-------|-------:|-----:|------:|--------| +| Parse | 468ms | 1483ms | 3.2x faster | Already Rust | +| Build edges | 88ms | 152ms | 1.7x faster | Already Rust | +| Resolve imports | 8ms | 9ms | ~1x | Already Rust | +| **AST nodes** | **361ms** | **347ms** | **~1x** | JS visitor — biggest win | +| **CFG** | **126ms** | **125ms** | **~1x** | JS visitor | +| **Dataflow** | **100ms** | **98ms** | **~1x** | JS visitor | +| **Insert nodes** | **143ms** | **148ms** | **~1x** | Pure SQLite batching | +| **Roles** | **29ms** | **32ms** | **~1x** | JS classification | +| **Structure** | **13ms** | **17ms** | **~1x** | JS directory tree | +| Complexity | 16ms | 77ms | 5x faster | Partly pre-computed | + +**Target:** Reduce native full-build time from ~1,400ms to ~700ms (2x improvement) by eliminating ~690ms of redundant JS visitor work. + +### 4.1 -- AST Node Extraction in Rust + +The largest single opportunity. Currently the native parser returns partial AST node data, so the JS `buildAstNodes()` visitor re-walks all WASM trees anyway (~361ms). + +- Extend `crates/codegraph-core/` to extract all AST node types (`call`, `new`, `string`, `regex`, `throw`, `await`) during the native parse phase +- Return complete AST node data in the `FileSymbols` result so `run-analyses.js` can skip the WASM walker entirely +- Validate parity: ensure native extraction produces identical node counts to the WASM visitor (benchmark already tracks this via `nodes/file`) + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/ast.js`, `src/domain/graph/builder/stages/run-analyses.js` + +### 4.2 -- CFG Construction in Rust + +The intraprocedural control-flow graph visitor runs in JS even on native builds (~126ms). 
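Before the JS CFG visitor is removed, the port can be guarded by a parity harness along these lines (the per-function result shape is assumed, not the actual `FileSymbols` layout):

```javascript
// Sketch of the parity gate for the CFG port: native and WASM must agree on
// block and edge counts per function. The result shape here is an assumption.
function cfgParityDiff(nativeCfgs, wasmCfgs) {
  const diffs = [];
  for (const [fn, wasm] of Object.entries(wasmCfgs)) {
    const native = nativeCfgs[fn];
    if (!native || native.blocks !== wasm.blocks || native.edges !== wasm.edges) {
      diffs.push(fn); // record every function where the engines disagree
    }
  }
  return diffs; // empty array => parity holds
}

cfgParityDiff(
  { parse: { blocks: 4, edges: 5 } },
  { parse: { blocks: 4, edges: 5 } },
); // => []
```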
+ +- Port `createCfgVisitor()` logic to Rust: basic block detection, branch/loop edges, entry/exit nodes +- Return CFG block data per function in `FileSymbols` so the JS visitor is fully bypassed +- Validate parity: CFG block counts and edge counts must match the WASM visitor output + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/cfg.js`, `src/ast-analysis/visitors/cfg-visitor.js` + +### 4.3 -- Dataflow Analysis in Rust + +Dataflow edges are computed by a JS visitor that walks WASM trees (~100ms on native builds). + +- Port `createDataflowVisitor()` to Rust: variable definitions, assignments, reads, def-use chains +- Return dataflow edges in `FileSymbols` +- Validate parity against WASM visitor output + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/dataflow.js`, `src/ast-analysis/visitors/dataflow-visitor.js` + +### 4.4 -- Batch SQLite Inserts via Rust + +`insertNodes` is pure SQLite work (~143ms) but runs row-by-row from JS. Batching in Rust can reduce JS↔native boundary crossings. + +- Expose a `batchInsertNodes(nodes[])` function from Rust that uses a single prepared statement in a transaction +- Alternatively, generate the SQL batch on the JS side and execute as a single `better-sqlite3` call (may be sufficient without Rust) +- Benchmark both approaches; pick whichever is faster + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/db/index.js`, `src/domain/graph/builder/stages/insert-nodes.js` + +### 4.5 -- Role Classification & Structure in Rust + +Smaller wins (~42ms combined) but complete the picture of a fully native build pipeline. 
+ +- Port `classifyNodeRoles()` to Rust: hub/leaf/bridge/utility classification based on in/out degree and betweenness +- Port directory structure building and metrics aggregation +- Return role assignments and structure data alongside parse results + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/structure.js`, `src/domain/graph/builder/stages/build-structure.js` + +### 4.6 -- Complete Complexity Pre-computation + +Complexity is partly pre-computed by native (~16ms vs 77ms WASM) but not all functions are covered. + +- Ensure native parse computes cognitive, cyclomatic, Halstead, and MI metrics for every function, not just a subset +- Eliminate the WASM fallback path in `buildComplexityMetrics()` when running native + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/complexity.js` + +### 4.7 -- Fix Incremental Rebuild Data Loss on Native Engine + +**Bug:** On native 1-file rebuilds, complexity, CFG, and dataflow data for the changed file is **silently lost**. `purgeFilesFromGraph` removes the old data, but the analysis phases never re-compute it because: + +1. The native parser does not produce a `_tree` (WASM tree-sitter tree) +2. The unified walker at `src/ast-analysis/engine.js:108-109` skips files without `_tree` +3. The `buildXxx` functions check for pre-computed fields (`d.complexity`, `d.cfg?.blocks`) which the native parser does not provide for these analyses +4. Result: 0.1ms no-op — the phases run but do nothing + +This is confirmed by the v3.1.4 1-file rebuild data: complexity (0.1ms), CFG (0.1ms), dataflow (0.2ms) on native — these are just module import overhead, not actual computation. Contrast with v3.1.3 where the numbers were higher (1.3ms, 8.7ms, 4ms) because earlier versions triggered a WASM fallback tree via `ensureWasmTrees`. 
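The data loss above suggests a regression assertion along these lines; the `graph` object is a stand-in for the real repository API, not its actual shape:

```javascript
// Sketch of the regression check: after a native 1-file rebuild, the changed
// file must still have complexity/CFG/dataflow rows. `graph` is a stand-in
// for the real repository, keyed analysis-kind -> file -> rows.
function assertAnalysesPresent(graph, file) {
  const missing = ['complexity', 'cfg', 'dataflow'].filter(
    (kind) => (graph[kind]?.[file] ?? []).length === 0,
  );
  if (missing.length > 0) {
    throw new Error(`rebuild of ${file} lost: ${missing.join(', ')}`);
  }
}

// Healthy rebuild: every analysis repopulated for the changed file.
assertAnalysesPresent(
  { complexity: { 'a.js': [10] }, cfg: { 'a.js': [1] }, dataflow: { 'a.js': [1] } },
  'a.js',
);
```

The buggy behavior described above (purge succeeds, re-compute silently no-ops) would make this check throw, which is exactly what the integration test should catch.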
+ +**Fix (prerequisite: 4.1–4.3):** Once the native parser returns complete AST nodes, CFG blocks, and dataflow edges in `FileSymbols`, the `run-analyses` stage can store them directly without needing a WASM tree. The incremental path must: + +- Ensure `parseFilesAuto()` returns pre-computed analysis data for the single changed file +- Have `run-analyses.js` store that data (currently it only stores if `_tree` exists or if pre-computed fields are present — the latter path needs to work reliably) +- Add an integration test: rebuild 1 file on native engine, then query its complexity/CFG/dataflow and assert non-empty results + +**Affected files:** `src/ast-analysis/engine.js`, `src/domain/graph/builder/stages/run-analyses.js`, `src/domain/parser.js`, `tests/integration/` + +### 4.8 -- Incremental Rebuild Performance + +With analysis data loss fixed, optimize the 1-file rebuild path end-to-end. Current native 1-file rebuild is 265ms — dominated by parse (51ms), structure (13ms), roles (27ms), edges (13ms), insert (12ms), and finalize (12ms). + +- **Skip unchanged phases:** Structure and roles are graph-wide computations. On a 1-file change, only the changed file's nodes/edges need updating — skip full reclassification unless the file's degree changed significantly +- **Incremental edge rebuild:** Only rebuild edges involving the changed file's symbols, not the full edge set +- **Benchmark target:** Sub-100ms native 1-file rebuilds (from current 265ms) + +**Affected files:** `src/domain/graph/builder/stages/build-structure.js`, `src/domain/graph/builder/stages/build-edges.js`, `src/domain/graph/builder/pipeline.js` + +--- + +## Phase 5 -- TypeScript Migration **Goal:** Migrate the codebase from plain JavaScript to TypeScript, leveraging the clean module boundaries established in Phase 3. Incremental module-by-module migration starting from leaf modules inward. 
-**Why after Phase 3:** The architectural refactoring creates small, well-bounded modules with explicit interfaces (Repository, Engine, BaseExtractor, Pipeline stages, Command objects). These are natural type boundaries -- typing monolithic 2,000-line files that are about to be split would be double work. +**Why after Phase 4:** The architectural refactoring (Phase 3) creates small, well-bounded modules with explicit interfaces. Phase 4 moves the remaining hot-path visitor code to Rust — doing TS migration first would mean rewriting those visitors to TypeScript only to delete them when porting to Rust. With both phases complete, the JS layer is purely orchestration and presentation, which is the ideal surface for TypeScript. -### 4.1 -- Project Setup +### 5.1 -- Project Setup - Add `typescript` as a devDependency - Create `tsconfig.json` with strict mode, ES module output, path aliases matching the Phase 3 module structure @@ -1008,7 +1123,7 @@ Practical cleanup to make the CLI surface match the internal composability that **Affected files:** `package.json`, `biome.json`, new `tsconfig.json` -### 4.2 -- Core Type Definitions +### 5.2 -- Core Type Definitions Define TypeScript interfaces for all abstractions introduced in Phase 3: @@ -1036,7 +1151,7 @@ These interfaces serve as the migration contract -- each module is migrated to s **New file:** `src/types.ts` -### 4.3 -- Leaf Module Migration +### 5.3 -- Leaf Module Migration Migrate modules with no internal dependencies first: @@ -1053,7 +1168,7 @@ Migrate modules with no internal dependencies first: Allow `.js` and `.ts` to coexist during migration (`allowJs: true` in tsconfig). 
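A starting point for this coexistence setup might look like the following sketch -- the target, module settings, and include globs are placeholders, not a committed configuration:

```json
{
  "compilerOptions": {
    "strict": true,
    "allowJs": true,
    "checkJs": false,
    "module": "nodenext",
    "moduleResolution": "nodenext",
    "target": "es2022",
    "outDir": "dist"
  },
  "include": ["src/**/*", "tests/**/*"]
}
```

With `allowJs: true` and `checkJs: false`, untouched `.js` modules compile without being type-checked, so each module opts into strictness only when it is renamed to `.ts`.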
-### 4.4 -- Core Module Migration +### 5.4 -- Core Module Migration Migrate modules that implement Phase 3 interfaces: @@ -1068,7 +1183,7 @@ Migrate modules that implement Phase 3 interfaces: | `src/analysis/*.ts` | Typed analysis results (impact scores, call chains) | | `src/resolve.ts` | Import resolution with confidence types | -### 4.5 -- Orchestration & Public API Migration +### 5.5 -- Orchestration & Public API Migration Migrate top-level orchestration and entry points: @@ -1081,7 +1196,7 @@ Migrate top-level orchestration and entry points: | `src/cli/*.ts` | Command objects with typed options | | `src/index.ts` | Curated public API with proper export types | -### 4.6 -- Test Migration +### 5.6 -- Test Migration - Migrate test files from `.js` to `.ts` - Add type-safe test utilities and fixture builders @@ -1092,7 +1207,7 @@ Migrate top-level orchestration and entry points: **Affected files:** All `src/**/*.js` -> `src/**/*.ts`, all `tests/**/*.js` -> `tests/**/*.ts`, `package.json`, `biome.json` -### 4.7 -- Supply-Chain Security & Audit +### 5.7 -- Supply-Chain Security & Audit **Gap:** No `npm audit` in CI pipeline. No supply-chain attestation (SLSA/SBOM). No formal security audit history. @@ -1105,33 +1220,33 @@ Migrate top-level orchestration and entry points: **Affected files:** `.github/workflows/ci.yml`, `.github/workflows/publish.yml`, `docs/security/` -### 4.8 -- CI Test Quality & Coverage Gates +### 5.8 -- CI Test Quality & Coverage Gates **Gaps:** - No coverage thresholds enforced in CI (coverage report runs locally only) - Embedding tests in separate workflow requiring HuggingFace token - 312 `setTimeout`/`sleep` instances in tests — potential flakiness under load -- No dependency audit step in CI (see also [4.7](#47----supply-chain-security--audit)) +- No dependency audit step in CI (see also [5.7](#57----supply-chain-security--audit)) **Deliverables:** 1. **Coverage gate** -- add `vitest --coverage` to CI with minimum threshold (e.g. 
80% lines/branches); fail the pipeline when coverage drops below the threshold 2. **Unified test workflow** -- merge embedding tests into the main CI workflow using a securely stored `HF_TOKEN` secret; eliminate the separate workflow 3. **Timer cleanup** -- audit and reduce `setTimeout`/`sleep` usage in tests; replace with deterministic waits (event-based, polling with backoff, or `vi.useFakeTimers()`) to reduce flakiness -4. > _Dependency audit step is covered by [4.7](#47----supply-chain-security--audit) deliverable 1._ +4. > _Dependency audit step is covered by [5.7](#57----supply-chain-security--audit) deliverable 1._ **Affected files:** `.github/workflows/ci.yml`, `vitest.config.js`, `tests/` --- -## Phase 5 -- Runtime & Extensibility +## Phase 6 -- Runtime & Extensibility -**Goal:** Harden the runtime for large codebases and open the platform to external contributors. These items were deferred from Phase 3 -- they depend on the clean module boundaries and domain layering established there, and benefit from TypeScript's type safety (Phase 4) for safe refactoring of cross-cutting concerns like caching, streaming, and plugin contracts. +**Goal:** Harden the runtime for large codebases and open the platform to external contributors. These items were deferred from Phase 3 -- they depend on the clean module boundaries and domain layering established there, and benefit from TypeScript's type safety (Phase 5) for safe refactoring of cross-cutting concerns like caching, streaming, and plugin contracts. **Why after TypeScript Migration:** Several of these items introduce new internal contracts (plugin API, cache interface, streaming protocol, engine strategy). Defining those contracts in TypeScript from the start avoids a second migration pass and gives contributors type-checked extension points. -### 5.1 -- Event-Driven Pipeline +### 6.1 -- Event-Driven Pipeline Replace the synchronous build/analysis pipeline with an event/streaming architecture. 
Enables progress reporting, cancellation tokens, and bounded memory usage on large repositories (10K+ files). @@ -1143,7 +1258,7 @@ Replace the synchronous build/analysis pipeline with an event/streaming architec **Affected files:** `src/domain/graph/builder.js`, `src/cli/`, `src/mcp/` -### 5.2 -- Unified Engine Interface (Strategy Pattern) +### 6.2 -- Unified Engine Interface (Strategy Pattern) Replace scattered `engine.name === 'native'` / `engine === 'wasm'` branching throughout the codebase with a formal Strategy pattern. Each engine implements a common `ParsingEngine` interface with methods like `parse(file)`, `batchParse(files)`, `supports(language)`, and `capabilities()`. @@ -1155,7 +1270,7 @@ Replace scattered `engine.name === 'native'` / `engine === 'wasm'` branching thr **Affected files:** `src/infrastructure/native.js`, `src/domain/parser.js`, `src/domain/graph/builder.js` -### 5.3 -- Subgraph Export Filtering +### 6.3 -- Subgraph Export Filtering Add focus and depth controls to `codegraph export` so users can produce usable visualizations of specific subsystems rather than the entire graph. @@ -1172,7 +1287,7 @@ codegraph export --focus "buildGraph" --depth 3 --format dot **Affected files:** `src/features/export.js`, `src/presentation/export.js` -### 5.4 -- Transitive Import-Aware Confidence +### 6.4 -- Transitive Import-Aware Confidence Improve import resolution accuracy by walking the import graph before falling back to proximity heuristics. Currently the 6-level priority system uses directory proximity as a strong signal, but this can mis-resolve when a symbol is re-exported through an index file several directories away. @@ -1183,7 +1298,7 @@ Improve import resolution accuracy by walking the import graph before falling ba **Affected files:** `src/domain/graph/resolve.js` -### 5.5 -- Query Result Caching +### 6.5 -- Query Result Caching Add an LRU/TTL cache layer between the analysis/query functions and the SQLite repository. 
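A minimal sketch of such a layer, assuming entry limits and TTLs that are placeholders rather than decided values:

```javascript
// Minimal LRU + TTL cache sketch. A Map preserves insertion order, so the
// first key is always the least recently used entry.
class QueryCache {
  constructor({ maxEntries = 500, ttlMs = 30_000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  get(key) {
    const hit = this.map.get(key);
    if (!hit) return undefined;
    if (Date.now() - hit.at > this.ttlMs) {
      this.map.delete(key); // expired
      return undefined;
    }
    this.map.delete(key); // re-insert to refresh recency
    this.map.set(key, hit);
    return hit.value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, at: Date.now() });
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict LRU
    }
  }
}

const cache = new QueryCache({ maxEntries: 2 });
cache.set('impact:foo', [1, 2]);
cache.get('impact:foo'); // [1, 2]
```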
With 34+ MCP tools that often run overlapping queries within a session, caching eliminates redundant DB round-trips. @@ -1196,7 +1311,7 @@ Add an LRU/TTL cache layer between the analysis/query functions and the SQLite r **Affected files:** `src/domain/analysis/`, `src/db/index.js` -### 5.6 -- Configuration Profiles +### 6.6 -- Configuration Profiles Support named configuration profiles for monorepos and multi-service projects where different parts of the codebase need different settings. @@ -1217,7 +1332,7 @@ Support named configuration profiles for monorepos and multi-service projects wh **Affected files:** `src/infrastructure/config.js`, `src/cli/` -### 5.7 -- Pagination Standardization +### 6.7 -- Pagination Standardization Standardize SQL-level `LIMIT`/`OFFSET` pagination across all repository queries and surface it consistently through the CLI and MCP. @@ -1229,7 +1344,7 @@ Standardize SQL-level `LIMIT`/`OFFSET` pagination across all repository queries **Affected files:** `src/shared/paginate.js`, `src/db/index.js`, `src/domain/analysis/`, `src/mcp/` -### 5.8 -- Plugin System for Custom Commands +### 6.8 -- Plugin System for Custom Commands Allow users to extend codegraph with custom commands by dropping a JS/TS module into `~/.codegraph/plugins/` (global) or `.codegraph/plugins/` (project-local). @@ -1271,13 +1386,13 @@ Lower the barrier to first successful use. Today codegraph requires manual insta --- -## Phase 6 -- Intelligent Embeddings +## Phase 7 -- Intelligent Embeddings **Goal:** Dramatically improve semantic search quality by embedding natural-language descriptions instead of raw code. -> **Phase 6.3 (Hybrid Search) was completed early** during Phase 2.5 -- FTS5 BM25 + semantic search with RRF fusion is already shipped in v2.6.0. +> **Phase 7.3 (Hybrid Search) was completed early** during Phase 2.5 -- FTS5 BM25 + semantic search with RRF fusion is already shipped in v2.6.0. 
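The RRF fusion mentioned above combines ranked result lists by reciprocal rank. A generic sketch (k = 60 is the conventional constant from the RRF literature, not necessarily the value v2.6.0 uses):

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// Documents ranked well by either list rise; agreement between lists wins.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

rrfFuse([
  ['parseFile', 'buildGraph', 'resolveImport'], // keyword (BM25) ranking
  ['buildGraph', 'parseFile', 'embedNode'], // semantic ranking
]); // 'parseFile' and 'buildGraph' rise to the top
```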
-### 6.1 -- LLM Description Generator +### 7.1 -- LLM Description Generator For each function/method/class node, generate a concise natural-language description: @@ -1305,7 +1420,7 @@ For each function/method/class node, generate a concise natural-language descrip **New file:** `src/describer.js` -### 6.2 -- Enhanced Embedding Pipeline +### 7.2 -- Enhanced Embedding Pipeline - When descriptions exist, embed the description text instead of raw code - Keep raw code as fallback when no description is available @@ -1316,11 +1431,11 @@ For each function/method/class node, generate a concise natural-language descrip **Affected files:** `src/embedder.js` -### ~~6.3 -- Hybrid Search~~ ✅ Completed in Phase 2.5 +### ~~7.3 -- Hybrid Search~~ ✅ Completed in Phase 2.5 Shipped in v2.6.0. FTS5 BM25 keyword search + semantic vector search with RRF fusion. Three search modes: `hybrid` (default), `semantic`, `keyword`. -### 6.4 -- Build-time Semantic Metadata +### 7.4 -- Build-time Semantic Metadata Enrich nodes with LLM-generated metadata beyond descriptions. Computed incrementally at build time (only for changed nodes), stored as columns on the `nodes` table. @@ -1333,9 +1448,9 @@ Enrich nodes with LLM-generated metadata beyond descriptions. Computed increment - MCP tool: `assess ` -- returns complexity rating + specific concerns - Cascade invalidation: when a node changes, mark dependents for re-enrichment -**Depends on:** 6.1 (LLM provider abstraction) +**Depends on:** 7.1 (LLM provider abstraction) -### 6.5 -- Module Summaries +### 7.5 -- Module Summaries Aggregate function descriptions + dependency direction into file-level narratives. 
@@ -1343,17 +1458,17 @@ Aggregate function descriptions + dependency direction into file-level narrative - MCP tool: `explain_module ` -- returns module purpose, key exports, role in the system - `naming_conventions` metadata per module -- detected patterns (camelCase, snake_case, verb-first), flag outliers -**Depends on:** 6.1 (function-level descriptions must exist first) +**Depends on:** 7.1 (function-level descriptions must exist first) > **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure table, and prompt design. --- -## Phase 7 -- Natural Language Queries +## Phase 8 -- Natural Language Queries **Goal:** Allow developers to ask questions about their codebase in plain English. -### 7.1 -- Query Engine +### 8.1 -- Query Engine ```bash codegraph ask "How does the authentication flow work?" @@ -1379,7 +1494,7 @@ codegraph ask "How does the authentication flow work?" **New file:** `src/nlquery.js` -### 7.2 -- Conversational Sessions +### 8.2 -- Conversational Sessions Multi-turn conversations with session memory. @@ -1393,7 +1508,7 @@ codegraph sessions clear - Store conversation history in SQLite table `sessions` - Include prior Q&A pairs in subsequent prompts -### 7.3 -- MCP Integration +### 8.3 -- MCP Integration New MCP tool: `ask_codebase` -- natural language query via MCP. @@ -1401,7 +1516,7 @@ Enables AI coding agents (Claude Code, Cursor, etc.) to ask codegraph questions **Affected files:** `src/mcp.js` -### 7.4 -- LLM-Narrated Graph Queries +### 8.4 -- LLM-Narrated Graph Queries Graph traversal + LLM narration for questions that require both structural data and natural-language explanation. Each query walks the graph first, then sends the structural result to the LLM for narration. 
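The walk-then-narrate split can be sketched as follows; the adjacency shape and prompt format are illustrative, not codegraph's actual internals:

```javascript
// Sketch: structural traversal first, LLM narration second. The graph is a
// plain caller -> callees adjacency map; the prompt format is a placeholder.
function buildNarrationContext(edges, entry, maxDepth = 3) {
  const chain = [];
  const seen = new Set([entry]);
  let frontier = [entry];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const fn of frontier) {
      for (const callee of edges[fn] ?? []) {
        chain.push(`${fn} -> ${callee}`);
        if (!seen.has(callee)) {
          seen.add(callee);
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  return `Explain this call flow:\n${chain.join('\n')}`;
}

buildNarrationContext({ login: ['verifyToken'], verifyToken: ['fetchUser'] }, 'login');
// a prompt listing login -> verifyToken -> fetchUser, ready for narration
```

Only the string on the final line is sent to the LLM; the traversal itself stays deterministic and cacheable, which is what makes the pre-computed `flow_narratives` table feasible.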
@@ -1414,9 +1529,9 @@ Graph traversal + LLM narration for questions that require both structural data Pre-computed `flow_narratives` table caches results for key entry points at build time, invalidated when any node in the chain changes. -**Depends on:** 6.4 (`side_effects` metadata), 6.1 (descriptions for narration context) +**Depends on:** 7.4 (`side_effects` metadata), 7.1 (descriptions for narration context) -### 7.5 -- Onboarding & Navigation Tools +### 8.5 -- Onboarding & Navigation Tools Help new contributors and AI agents orient in an unfamiliar codebase. @@ -1425,15 +1540,15 @@ Help new contributors and AI agents orient in an unfamiliar codebase. - MCP tool: `get_started` -- returns ordered list: "start here, then read this, then this" - `change_plan ` -- LLM reads description, graph identifies relevant modules, returns touch points and test coverage gaps -**Depends on:** 6.5 (module summaries for context), 7.1 (query engine) +**Depends on:** 7.5 (module summaries for context), 8.1 (query engine) --- -## Phase 8 -- Expanded Language Support +## Phase 9 -- Expanded Language Support **Goal:** Go from 11 -> 19 supported languages. -### 8.1 -- Batch 1: High Demand +### 9.1 -- Batch 1: High Demand | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -1442,7 +1557,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. | Kotlin | `.kt`, `.kts` | `tree-sitter-kotlin` | Low | | Swift | `.swift` | `tree-sitter-swift` | Medium | -### 8.2 -- Batch 2: Growing Ecosystems +### 9.2 -- Batch 2: Growing Ecosystems | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -1451,7 +1566,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. 
| Lua | `.lua` | `tree-sitter-lua` | Low | | Zig | `.zig` | `tree-sitter-zig` | Low | -### 8.3 -- Parser Abstraction Layer +### 9.3 -- Parser Abstraction Layer Extract shared patterns from existing extractors into reusable helpers. @@ -1467,13 +1582,13 @@ Extract shared patterns from existing extractors into reusable helpers. --- -## Phase 9 -- GitHub Integration & CI +## Phase 10 -- GitHub Integration & CI **Goal:** Bring codegraph's analysis into pull request workflows. > **Note:** Phase 2.5 delivered `codegraph check` (CI validation predicates with exit code 0/1), which provides the foundation for GitHub Action integration. The boundary violation, blast radius, and cycle detection predicates are already available. -### 9.1 -- Reusable GitHub Action +### 10.1 -- Reusable GitHub Action A reusable GitHub Action that runs on PRs: @@ -1496,7 +1611,7 @@ A reusable GitHub Action that runs on PRs: **New file:** `.github/actions/codegraph-ci/action.yml` -### 9.2 -- PR Review Integration +### 10.2 -- PR Review Integration ```bash codegraph review --pr @@ -1519,7 +1634,7 @@ Requires `gh` CLI. For each changed function: **New file:** `src/github.js` -### 9.3 -- Visual Impact Graphs for PRs +### 10.3 -- Visual Impact Graphs for PRs Extend the existing `diff-impact --format mermaid` foundation with CI automation and LLM annotations. @@ -1540,9 +1655,9 @@ Extend the existing `diff-impact --format mermaid` foundation with CI automation - Highlight fragile nodes: high churn + high fan-in = high breakage risk - Track blast radius trends: "this PR's blast radius is 2x larger than your average" -**Depends on:** 9.1 (GitHub Action), 6.4 (`risk_score`, `side_effects`) +**Depends on:** 10.1 (GitHub Action), 7.4 (`risk_score`, `side_effects`) -### 9.4 -- SARIF Output +### 10.4 -- SARIF Output Add SARIF output format for cycle detection. SARIF integrates with GitHub Code Scanning, showing issues inline in the PR. 
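A minimal shape of the SARIF 2.1.0 payload for a detected cycle might look like this sketch -- the rule id, level, and message text are placeholders, not a final schema:

```javascript
// Build a minimal SARIF 2.1.0 log for detected import cycles.
// Rule id and messages are placeholders, not codegraph's final schema.
function cyclesToSarif(cycles) {
  return {
    version: '2.1.0',
    $schema: 'https://json.schemastore.org/sarif-2.1.0.json',
    runs: [{
      tool: { driver: { name: 'codegraph', rules: [{ id: 'CG0001', name: 'import-cycle' }] } },
      results: cycles.map((cycle) => ({
        ruleId: 'CG0001',
        level: 'warning',
        message: { text: `Import cycle: ${cycle.files.join(' -> ')}` },
        locations: [{
          physicalLocation: {
            artifactLocation: { uri: cycle.files[0] },
            region: { startLine: cycle.line ?? 1 },
          },
        }],
      })),
    }],
  };
}

cyclesToSarif([{ files: ['a.js', 'b.js', 'a.js'], line: 3 }]);
```

GitHub Code Scanning consumes exactly this `runs[].results[]` structure, so each cycle surfaces as an inline annotation on the first file in the cycle.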
@@ -1561,9 +1676,9 @@ LLM-generated docstrings aware of callers, callees, and types. Diff-aware: only

---

-## Phase 10 -- Interactive Visualization & Advanced Features
+## Phase 11 -- Interactive Visualization & Advanced Features

-### 10.1 -- Interactive Web Visualization (Partially Complete)
+### 11.1 -- Interactive Web Visualization (Partially Complete)

> **Phase 2.7 progress:** `codegraph plot` (Phase 2.7.8) ships a self-contained HTML viewer with vis-network. It supports layout switching, color/size/cluster overlays, drill-down, community detection, and a detail panel. The remaining work is the server-based experience below.

@@ -1584,7 +1699,7 @@ Opens a local web UI at `localhost:3000` extending the static HTML viewer with:

**New file:** `src/visualizer.js`

-### 10.2 -- Dead Code Detection
+### 11.2 -- Dead Code Detection

```bash
codegraph dead
```

Find functions/methods/classes with zero incoming edges (never called). Filters

**Affected files:** `src/queries.js`

@@ -1597,7 +1712,7 @@

-### 10.3 -- Cross-Repository Support (Monorepo)
+### 11.3 -- Cross-Repository Support (Monorepo)

Support multi-package monorepos with cross-package edges.

@@ -1607,7 +1722,7 @@ Support multi-package monorepos with cross-package edges.

- `codegraph build --workspace` to scan all packages
- Impact analysis across package boundaries

-### 10.4 -- Agentic Search
+### 11.4 -- Agentic Search

Recursive reference-following search that traces connections.

@@ -1629,7 +1744,7 @@ codegraph agent-search "payment processing"

**New file:** `src/agentic-search.js`

-### 10.5 -- Refactoring Analysis
+### 11.5 -- Refactoring Analysis

LLM-powered structural analysis that identifies refactoring opportunities. The graph provides the structural data; the LLM interprets it.

@@ -1644,7 +1759,7 @@ LLM-powered structural analysis that identifies refactoring opportunities. The g
> **Note:** `hotspots` and `boundary_analysis` already have data foundations from Phase 2.5 (structure.js hotspots, boundaries.js evaluation). This phase adds LLM interpretation on top.

-**Depends on:** 6.4 (`risk_score`, `complexity_notes`), 6.5 (module summaries)
+**Depends on:** 7.4 (`risk_score`, `complexity_notes`), 7.5 (module summaries)

-### 10.6 -- Auto-generated Docstrings
+### 11.6 -- Auto-generated Docstrings

@@ -1655,7 +1770,7 @@ codegraph annotate --changed-only

LLM-generated docstrings aware of callers, callees, and types. Diff-aware: only regenerate for functions whose code or dependencies changed. Stores in `docstrings` column on nodes table -- does not modify source files unless explicitly requested.

-**Depends on:** 6.1 (LLM provider abstraction), 6.4 (side effects context)
+**Depends on:** 7.1 (LLM provider abstraction), 7.4 (side effects context)

> **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure tables, and prompt design for all LLM-powered features.
diff --git a/src/db/connection.js b/src/db/connection.js
index acf87547..b16828a0 100644
--- a/src/db/connection.js
+++ b/src/db/connection.js
@@ -3,6 +3,8 @@ import path from 'node:path';
 import Database from 'better-sqlite3';
 import { warn } from '../infrastructure/logger.js';
 import { DbError } from '../shared/errors.js';
+import { Repository } from './repository/base.js';
+import { SqliteRepository } from './repository/sqlite-repository.js';
 
 function isProcessAlive(pid) {
   try {
@@ -86,3 +88,32 @@ export function openReadonlyOrFail(customPath) {
   }
   return new Database(dbPath, { readonly: true });
 }
+
+/**
+ * Open a Repository from either an injected instance or a DB path.
+ *
+ * When `opts.repo` is a Repository instance, returns it directly (no DB opened).
+ * Otherwise opens a readonly SQLite DB and wraps it in SqliteRepository.
+ *
+ * @param {string} [customDbPath] - Path to graph.db (ignored when opts.repo is set)
+ * @param {object} [opts]
+ * @param {Repository} [opts.repo] - Pre-built Repository to use instead of SQLite
+ * @returns {{ repo: Repository, close(): void }}
+ */
+export function openRepo(customDbPath, opts = {}) {
+  if (opts.repo != null) {
+    if (!(opts.repo instanceof Repository)) {
+      throw new TypeError(
+        `openRepo: opts.repo must be a Repository instance, got ${Object.prototype.toString.call(opts.repo)}`,
+      );
+    }
+    return { repo: opts.repo, close() {} };
+  }
+  const db = openReadonlyOrFail(customDbPath);
+  return {
+    repo: new SqliteRepository(db),
+    close() {
+      db.close();
+    },
+  };
+}
diff --git a/src/db/index.js b/src/db/index.js
index fc947b23..59a42808 100644
--- a/src/db/index.js
+++ b/src/db/index.js
@@ -1,5 +1,5 @@
 // Barrel re-export — keeps all existing `import { ... } from '…/db/index.js'` working.
-export { closeDb, findDbPath, openDb, openReadonlyOrFail } from './connection.js';
+export { closeDb, findDbPath, openDb, openReadonlyOrFail, openRepo } from './connection.js';
 export { getBuildMeta, initSchema, MIGRATIONS, setBuildMeta } from './migrations.js';
 export {
   fanInJoinSQL,
diff --git a/src/domain/analysis/symbol-lookup.js b/src/domain/analysis/symbol-lookup.js
index 47a7d403..b272004a 100644
--- a/src/domain/analysis/symbol-lookup.js
+++ b/src/domain/analysis/symbol-lookup.js
@@ -12,6 +12,7 @@ import {
   findNodesWithFanIn,
   listFunctionNodes,
   openReadonlyOrFail,
+  Repository,
 } from '../../db/index.js';
 import { isTestFile } from '../../infrastructure/test-filter.js';
 import { ALL_SYMBOL_KINDS } from '../../shared/kinds.js';
@@ -23,11 +24,16 @@ const FUNCTION_KINDS = ['function', 'method', 'class'];
 
 /**
  * Find nodes matching a name query, ranked by relevance.
  * Scoring: exact=100, prefix=60, word-boundary=40, substring=10, plus fan-in tiebreaker.
+ *
+ * @param {object} dbOrRepo - A better-sqlite3 Database or a Repository instance
  */
-export function findMatchingNodes(db, name, opts = {}) {
+export function findMatchingNodes(dbOrRepo, name, opts = {}) {
   const kinds = opts.kind ? [opts.kind] : opts.kinds?.length ? opts.kinds : FUNCTION_KINDS;
-  const rows = findNodesWithFanIn(db, `%${name}%`, { kinds, file: opts.file });
+  const isRepo = dbOrRepo instanceof Repository;
+  const rows = isRepo
+    ? dbOrRepo.findNodesWithFanIn(`%${name}%`, { kinds, file: opts.file })
+    : findNodesWithFanIn(dbOrRepo, `%${name}%`, { kinds, file: opts.file });
 
   const nodes = opts.noTests ? rows.filter((n) => !isTestFile(n.file)) : rows;
diff --git a/src/features/communities.js b/src/features/communities.js
index cf46fa39..062a89b5 100644
--- a/src/features/communities.js
+++ b/src/features/communities.js
@@ -1,5 +1,5 @@
 import path from 'node:path';
-import { openReadonlyOrFail } from '../db/index.js';
+import { openRepo } from '../db/index.js';
 import { louvainCommunities } from '../graph/algorithms/louvain.js';
 import { buildDependencyGraph } from '../graph/builders/dependency.js';
 import { paginateResult } from '../shared/paginate.js';
@@ -26,15 +26,15 @@ function getDirectory(filePath) {
  * @returns {{ communities: object[], modularity: number, drift: object, summary: object }}
  */
 export function communitiesData(customDbPath, opts = {}) {
-  const db = openReadonlyOrFail(customDbPath);
+  const { repo, close } = openRepo(customDbPath, opts);
   let graph;
   try {
-    graph = buildDependencyGraph(db, {
+    graph = buildDependencyGraph(repo, {
       fileLevel: !opts.functions,
       noTests: opts.noTests,
     });
   } finally {
-    db.close();
+    close();
   }
 
   // Handle empty or trivial graphs
diff --git a/src/features/sequence.js b/src/features/sequence.js
index 0edeba87..271d2ea2 100644
--- a/src/features/sequence.js
+++ b/src/features/sequence.js
@@ -6,7 +6,8 @@
  * sequence-diagram conventions.
 */
-import { findCallees, openReadonlyOrFail } from '../db/index.js';
+import { openRepo } from '../db/index.js';
+import { SqliteRepository } from '../db/repository/sqlite-repository.js';
 import { findMatchingNodes } from '../domain/queries.js';
 import { isTestFile } from '../infrastructure/test-filter.js';
 import { paginateResult } from '../shared/paginate.js';
@@ -85,19 +86,19 @@ function buildAliases(files) {
  * @returns {{ entry, participants, messages, depth, totalMessages, truncated }}
  */
 export function sequenceData(name, dbPath, opts = {}) {
-  const db = openReadonlyOrFail(dbPath);
+  const { repo, close } = openRepo(dbPath, opts);
   try {
     const maxDepth = opts.depth || 10;
     const noTests = opts.noTests || false;
     const withDataflow = opts.dataflow || false;
 
     // Phase 1: Direct LIKE match
-    let matchNode = findMatchingNodes(db, name, opts)[0] ?? null;
+    let matchNode = findMatchingNodes(repo, name, opts)[0] ?? null;
 
     // Phase 2: Prefix-stripped matching
     if (!matchNode) {
       for (const prefix of FRAMEWORK_ENTRY_PREFIXES) {
-        matchNode = findMatchingNodes(db, `${prefix}${name}`, opts)[0] ?? null;
+        matchNode = findMatchingNodes(repo, `${prefix}${name}`, opts)[0] ?? null;
         if (matchNode) break;
       }
     }
@@ -133,7 +134,7 @@ export function sequenceData(name, dbPath, opts = {}) {
       const nextFrontier = [];
 
       for (const fid of frontier) {
-        const callees = findCallees(db, fid);
+        const callees = repo.findCallees(fid);
         const caller = idToNode.get(fid);
@@ -163,18 +164,17 @@
     if (d === maxDepth && frontier.length > 0) {
       // Only mark truncated if at least one frontier node has further callees
-      const hasMoreCalls = frontier.some((fid) => findCallees(db, fid).length > 0);
+      const hasMoreCalls = frontier.some((fid) => repo.findCallees(fid).length > 0);
       if (hasMoreCalls) truncated = true;
     }
   }
 
   // Dataflow annotations: add return arrows
   if (withDataflow && messages.length > 0) {
-    const hasTable = db
-      .prepare("SELECT name FROM sqlite_master WHERE type='table' AND name='dataflow'")
-      .get();
+    const hasTable = repo.hasDataflowTable();
 
-    if (hasTable) {
+    if (hasTable && repo instanceof SqliteRepository) {
+      const db = repo.db;
       // Build name|file lookup for O(1) target node access
       const nodeByNameFile = new Map();
       for (const n of idToNode.values()) {
@@ -281,7 +281,7 @@ export function sequenceData(name, dbPath, opts = {}) {
     }
     return result;
   } finally {
-    db.close();
+    close();
   }
 }
diff --git a/src/features/triage.js b/src/features/triage.js
index 32257f3f..00b35ccd 100644
--- a/src/features/triage.js
+++ b/src/features/triage.js
@@ -1,4 +1,4 @@
-import { findNodesForTriage, openReadonlyOrFail } from '../db/index.js';
+import { openRepo } from '../db/index.js';
 import { DEFAULT_WEIGHTS, scoreRisk } from '../graph/classifiers/risk.js';
 import { warn } from '../infrastructure/logger.js';
 import { isTestFile } from '../infrastructure/test-filter.js';
@@ -14,7 +14,7 @@ import { paginateResult } from '../shared/paginate.js';
  * @returns {{ items: object[], summary: object, _pagination?: object }}
  */
 export function triageData(customDbPath, opts = {}) {
-  const db = openReadonlyOrFail(customDbPath);
+  const { repo, close } = openRepo(customDbPath, opts);
   try {
     const noTests = opts.noTests || false;
     const fileFilter = opts.file || null;
@@ -26,7 +26,7 @@ export function triageData(customDbPath, opts = {}) {
 
     let rows;
     try {
-      rows = findNodesForTriage(db, {
+      rows = repo.findNodesForTriage({
         noTests,
         file: fileFilter,
         kind: kindFilter,
@@ -115,7 +115,7 @@ export function triageData(customDbPath, opts = {}) {
       offset: opts.offset,
     });
   } finally {
-    db.close();
+    close();
   }
 }
diff --git a/src/graph/builders/dependency.js b/src/graph/builders/dependency.js
index a494ef11..633b4147 100644
--- a/src/graph/builders/dependency.js
+++ b/src/graph/builders/dependency.js
@@ -3,32 +3,39 @@
  * Replaces inline graph construction in cycles.js, communities.js, viewer.js, export.js.
  */
 
-import { getCallableNodes, getCallEdges, getFileNodesAll, getImportEdges } from '../../db/index.js';
+import {
+  getCallableNodes,
+  getCallEdges,
+  getFileNodesAll,
+  getImportEdges,
+  Repository,
+} from '../../db/index.js';
 import { isTestFile } from '../../infrastructure/test-filter.js';
 import { CodeGraph } from '../model.js';
 
 /**
- * @param {object} db - Open better-sqlite3 database (readonly)
+ * @param {object} dbOrRepo - Open better-sqlite3 database (readonly) or a Repository instance
  * @param {object} [opts]
  * @param {boolean} [opts.fileLevel=true] - File-level (imports) or function-level (calls)
  * @param {boolean} [opts.noTests=false] - Exclude test files
  * @param {number} [opts.minConfidence] - Minimum edge confidence (function-level only)
  * @returns {CodeGraph}
  */
-export function buildDependencyGraph(db, opts = {}) {
+export function buildDependencyGraph(dbOrRepo, opts = {}) {
   const fileLevel = opts.fileLevel !== false;
   const noTests = opts.noTests || false;
 
   if (fileLevel) {
-    return buildFileLevelGraph(db, noTests);
+    return buildFileLevelGraph(dbOrRepo, noTests);
   }
-  return buildFunctionLevelGraph(db, noTests, opts.minConfidence);
+  return buildFunctionLevelGraph(dbOrRepo, noTests, opts.minConfidence);
 }
 
-function buildFileLevelGraph(db, noTests) {
+function buildFileLevelGraph(dbOrRepo, noTests) {
   const graph = new CodeGraph();
+  const isRepo = dbOrRepo instanceof Repository;
 
-  let nodes = getFileNodesAll(db);
+  let nodes = isRepo ? dbOrRepo.getFileNodesAll() : getFileNodesAll(dbOrRepo);
   if (noTests) nodes = nodes.filter((n) => !isTestFile(n.file));
 
   const nodeIds = new Set();
@@ -37,7 +44,7 @@ function buildFileLevelGraph(db, noTests) {
     nodeIds.add(n.id);
   }
 
-  const edges = getImportEdges(db);
+  const edges = isRepo ? dbOrRepo.getImportEdges() : getImportEdges(dbOrRepo);
   for (const e of edges) {
     if (!nodeIds.has(e.source_id) || !nodeIds.has(e.target_id)) continue;
     const src = String(e.source_id);
@@ -51,10 +58,11 @@ function buildFileLevelGraph(db, noTests) {
   return graph;
 }
 
-function buildFunctionLevelGraph(db, noTests, minConfidence) {
+function buildFunctionLevelGraph(dbOrRepo, noTests, minConfidence) {
   const graph = new CodeGraph();
+  const isRepo = dbOrRepo instanceof Repository;
 
-  let nodes = getCallableNodes(db);
+  let nodes = isRepo ? dbOrRepo.getCallableNodes() : getCallableNodes(dbOrRepo);
   if (noTests) nodes = nodes.filter((n) => !isTestFile(n.file));
 
   const nodeIds = new Set();
@@ -70,11 +78,16 @@ function buildFunctionLevelGraph(db, noTests, minConfidence) {
 
   let edges;
   if (minConfidence != null) {
-    edges = db
-      .prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
-      .all(minConfidence);
+    if (isRepo) {
+      // minConfidence filtering not supported by Repository — fall back to getCallEdges
+      edges = dbOrRepo.getCallEdges();
+    } else {
+      edges = dbOrRepo
+        .prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
+        .all(minConfidence);
+    }
   } else {
-    edges = getCallEdges(db);
+    edges = isRepo ? dbOrRepo.getCallEdges() : getCallEdges(dbOrRepo);
   }
 
   for (const e of edges) {
diff --git a/tests/integration/communities.test.js b/tests/integration/communities.test.js
index 3981beb2..1cee942c 100644
--- a/tests/integration/communities.test.js
+++ b/tests/integration/communities.test.js
@@ -1,129 +1,83 @@
 /**
  * Integration tests for community detection (Louvain).
  *
- * Uses a hand-crafted in-file DB with multi-directory structure:
+ * Uses InMemoryRepository via createTestRepo() for fast, SQLite-free testing.
  *
+ * Graph topology:
  * src/auth/login.js + src/auth/session.js → tight auth cluster
  * src/data/db.js + src/data/cache.js → tight data cluster
  * src/api/handler.js → imports from both clusters (bridge)
  * lib/format.js → depends on data modules (drift signal)
  */
 
-import fs from 'node:fs';
-import os from 'node:os';
-import path from 'node:path';
-import Database from 'better-sqlite3';
-import { afterAll, beforeAll, describe, expect, test } from 'vitest';
-import { initSchema } from '../../src/db/index.js';
+import { beforeAll, describe, expect, test } from 'vitest';
 import { communitiesData, communitySummaryForStats } from '../../src/features/communities.js';
+import { createTestRepo } from '../helpers/fixtures.js';
 
-// ─── Helpers ───────────────────────────────────────────────────────────
+// ─── Fixture ──────────────────────────────────────────────────────────
 
-function insertNode(db, name, kind, file, line) {
-  return db
-    .prepare('INSERT INTO nodes (name, kind, file, line) VALUES (?, ?, ?, ?)')
-    .run(name, kind, file, line).lastInsertRowid;
-}
-
-function insertEdge(db, sourceId, targetId, kind, confidence = 1.0) {
-  db.prepare(
-    'INSERT INTO edges (source_id, target_id, kind, confidence, dynamic) VALUES (?, ?, ?, ?, 0)',
-  ).run(sourceId, targetId, kind, confidence);
-}
-
-// ─── Fixture DB ────────────────────────────────────────────────────────
-
-let tmpDir, dbPath;
+let repo;
 
 beforeAll(() => {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-communities-'));
-  fs.mkdirSync(path.join(tmpDir, '.codegraph'));
-  dbPath = path.join(tmpDir, '.codegraph', 'graph.db');
-
-  const db = new Database(dbPath);
-  db.pragma('journal_mode = WAL');
-  initSchema(db);
-
-  // ── File nodes (multi-directory) ──
-  const fAuthLogin = insertNode(db, 'src/auth/login.js', 'file', 'src/auth/login.js', 0);
-  const fAuthSession = insertNode(db, 'src/auth/session.js', 'file', 'src/auth/session.js', 0);
-  const fDataDb = insertNode(db, 'src/data/db.js', 'file', 'src/data/db.js', 0);
-  const fDataCache = insertNode(db, 'src/data/cache.js', 'file', 'src/data/cache.js', 0);
-  const fApiHandler = insertNode(db, 'src/api/handler.js', 'file', 'src/api/handler.js', 0);
-  const fLibFormat = insertNode(db, 'lib/format.js', 'file', 'lib/format.js', 0);
-  const fTestAuth = insertNode(db, 'tests/auth.test.js', 'file', 'tests/auth.test.js', 0);
-
-  // ── Function nodes ──
-  const fnLogin = insertNode(db, 'login', 'function', 'src/auth/login.js', 5);
-  const fnCreateSession = insertNode(db, 'createSession', 'function', 'src/auth/session.js', 5);
-  const fnValidateSession = insertNode(
-    db,
-    'validateSession',
-    'function',
-    'src/auth/session.js',
-    20,
-  );
-  const fnQuery = insertNode(db, 'query', 'function', 'src/data/db.js', 5);
-  const fnGetCache = insertNode(db, 'getCache', 'function', 'src/data/cache.js', 5);
-  const fnSetCache = insertNode(db, 'setCache', 'function', 'src/data/cache.js', 15);
-  const fnHandleRequest = insertNode(db, 'handleRequest', 'function', 'src/api/handler.js', 5);
-  const fnFormatOutput = insertNode(db, 'formatOutput', 'function', 'lib/format.js', 5);
-  const fnTestLogin = insertNode(db, 'testLogin', 'function', 'tests/auth.test.js', 5);
-
-  // ── File-level import edges ──
-  // Auth cluster: login <-> session
-  insertEdge(db, fAuthLogin, fAuthSession, 'imports');
-  insertEdge(db, fAuthSession, fAuthLogin, 'imports');
-
-  // Data cluster: db <-> cache
-  insertEdge(db, fDataDb, fDataCache, 'imports');
-  insertEdge(db, fDataCache, fDataDb, 'imports');
-
-  // Bridge: api/handler imports from both clusters
-  insertEdge(db, fApiHandler, fAuthLogin, 'imports');
-  insertEdge(db, fApiHandler, fDataDb, 'imports');
-
-  // Drift signal: lib/format depends on data modules
-  insertEdge(db, fLibFormat, fDataDb, 'imports');
-  insertEdge(db, fLibFormat, fDataCache, 'imports');
-
-  // Test file imports
-  insertEdge(db, fTestAuth, fAuthLogin, 'imports');
-
-  // ── Function-level call edges ──
-  // Auth cluster calls
-  insertEdge(db, fnLogin, fnCreateSession, 'calls');
-  insertEdge(db, fnLogin, fnValidateSession, 'calls');
-  insertEdge(db, fnCreateSession, fnValidateSession, 'calls');
-
-  // Data cluster calls
-  insertEdge(db, fnQuery, fnGetCache, 'calls');
-  insertEdge(db, fnQuery, fnSetCache, 'calls');
-  insertEdge(db, fnGetCache, fnSetCache, 'calls');
-
-  // Bridge: handleRequest calls across clusters
-  insertEdge(db, fnHandleRequest, fnLogin, 'calls');
-  insertEdge(db, fnHandleRequest, fnQuery, 'calls');
-  insertEdge(db, fnHandleRequest, fnFormatOutput, 'calls');
-
-  // lib/format calls data
-  insertEdge(db, fnFormatOutput, fnGetCache, 'calls');
-
-  // Test calls
-  insertEdge(db, fnTestLogin, fnLogin, 'calls');
-
-  db.close();
-});
-
-afterAll(() => {
-  if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true });
+  ({ repo } = createTestRepo()
+    // ── File nodes (multi-directory) ──
+    .file('src/auth/login.js')
+    .file('src/auth/session.js')
+    .file('src/data/db.js')
+    .file('src/data/cache.js')
+    .file('src/api/handler.js')
+    .file('lib/format.js')
+    .file('tests/auth.test.js')
+    // ── Function nodes ──
+    .fn('login', 'src/auth/login.js', 5)
+    .fn('createSession', 'src/auth/session.js', 5)
+    .fn('validateSession', 'src/auth/session.js', 20)
+    .fn('query', 'src/data/db.js', 5)
+    .fn('getCache', 'src/data/cache.js', 5)
+    .fn('setCache', 'src/data/cache.js', 15)
+    .fn('handleRequest', 'src/api/handler.js', 5)
+    .fn('formatOutput', 'lib/format.js', 5)
+    .fn('testLogin', 'tests/auth.test.js', 5)
+    // ── File-level import edges ──
+    // Auth cluster: login <-> session
+    .imports('src/auth/login.js', 'src/auth/session.js')
+    .imports('src/auth/session.js', 'src/auth/login.js')
+    // Data cluster: db <-> cache
+    .imports('src/data/db.js', 'src/data/cache.js')
+    .imports('src/data/cache.js', 'src/data/db.js')
+    // Bridge: api/handler imports from both clusters
+    .imports('src/api/handler.js', 'src/auth/login.js')
+    .imports('src/api/handler.js', 'src/data/db.js')
+    // Drift signal: lib/format depends on data modules
+    .imports('lib/format.js', 'src/data/db.js')
+    .imports('lib/format.js', 'src/data/cache.js')
+    // Test file imports
+    .imports('tests/auth.test.js', 'src/auth/login.js')
+    // ── Function-level call edges ──
+    // Auth cluster calls
+    .calls('login', 'createSession')
+    .calls('login', 'validateSession')
+    .calls('createSession', 'validateSession')
+    // Data cluster calls
+    .calls('query', 'getCache')
+    .calls('query', 'setCache')
+    .calls('getCache', 'setCache')
+    // Bridge: handleRequest calls across clusters
+    .calls('handleRequest', 'login')
+    .calls('handleRequest', 'query')
+    .calls('handleRequest', 'formatOutput')
+    // lib/format calls data
+    .calls('formatOutput', 'getCache')
+    // Test calls
+    .calls('testLogin', 'login')
+    .build());
 });
 
 // ─── File-Level Tests ──────────────────────────────────────────────────
 
 describe('communitiesData (file-level)', () => {
   test('returns valid community structure', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     expect(data.communities).toBeInstanceOf(Array);
     expect(data.communities.length).toBeGreaterThan(0);
     for (const c of data.communities) {
@@ -137,38 +91,38 @@ describe('communitiesData (file-level)', () => {
   });
 
   test('detects 2+ communities from distinct clusters', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     expect(data.summary.communityCount).toBeGreaterThanOrEqual(2);
   });
   test('modularity is between 0 and 1', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     expect(data.modularity).toBeGreaterThanOrEqual(0);
     expect(data.modularity).toBeLessThanOrEqual(1);
   });
 
   test('drift analysis finds split candidates', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     // At minimum, lib/format.js groups with data but lives in a different dir
     expect(data.drift).toHaveProperty('splitCandidates');
     expect(data.drift.splitCandidates).toBeInstanceOf(Array);
   });
 
   test('drift analysis finds merge candidates', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     expect(data.drift).toHaveProperty('mergeCandidates');
     expect(data.drift.mergeCandidates).toBeInstanceOf(Array);
   });
 
   test('drift score is 0-100', () => {
-    const data = communitiesData(dbPath);
+    const data = communitiesData(null, { repo });
     expect(data.summary.driftScore).toBeGreaterThanOrEqual(0);
     expect(data.summary.driftScore).toBeLessThanOrEqual(100);
   });
 
   test('noTests excludes test files', () => {
-    const withTests = communitiesData(dbPath);
-    const withoutTests = communitiesData(dbPath, { noTests: true });
+    const withTests = communitiesData(null, { repo });
+    const withoutTests = communitiesData(null, { repo, noTests: true });
 
     const allMembers = withTests.communities.flatMap((c) => c.members.map((m) => m.file));
     const filteredMembers = withoutTests.communities.flatMap((c) => c.members.map((m) => m.file));
@@ -178,8 +132,8 @@ describe('communitiesData (file-level)', () => {
   });
 
   test('higher resolution produces >= same number of communities', () => {
-    const low = communitiesData(dbPath, { resolution: 0.5 });
-    const high = communitiesData(dbPath, { resolution: 2.0 });
+    const low = communitiesData(null, { repo, resolution: 0.5 });
+    const high = communitiesData(null, { repo, resolution: 2.0 });
 
     expect(high.summary.communityCount).toBeGreaterThanOrEqual(low.summary.communityCount);
   });
 });
@@ -188,7 +142,7 @@ describe('communitiesData (file-level)', () => {
 
 describe('communitiesData (function-level)', () => {
   test('returns function-level results with kind field', () => {
-    const data = communitiesData(dbPath, { functions: true });
+    const data = communitiesData(null, { repo, functions: true });
     expect(data.communities.length).toBeGreaterThan(0);
     for (const c of data.communities) {
       for (const m of c.members) {
@@ -199,7 +153,7 @@ describe('communitiesData (function-level)', () => {
   });
 
   test('function-level detects 2+ communities', () => {
-    const data = communitiesData(dbPath, { functions: true });
+    const data = communitiesData(null, { repo, functions: true });
     expect(data.summary.communityCount).toBeGreaterThanOrEqual(2);
   });
 });
@@ -208,7 +162,7 @@ describe('communitiesData (function-level)', () => {
 
 describe('drift-only mode', () => {
   test('drift: true returns empty communities array', () => {
-    const data = communitiesData(dbPath, { drift: true });
+    const data = communitiesData(null, { repo, drift: true });
     expect(data.communities).toEqual([]);
     expect(data.drift.splitCandidates).toBeInstanceOf(Array);
     expect(data.drift.mergeCandidates).toBeInstanceOf(Array);
@@ -220,7 +174,7 @@ describe('drift-only mode', () => {
 
 describe('communitySummaryForStats', () => {
   test('returns lightweight summary with expected fields', () => {
-    const summary = communitySummaryForStats(dbPath);
+    const summary = communitySummaryForStats(null, { repo });
     expect(summary).toHaveProperty('communityCount');
     expect(summary).toHaveProperty('modularity');
     expect(summary).toHaveProperty('driftScore');
@@ -234,25 +188,9 @@ describe('communitySummaryForStats', () => {
 
 // ─── Empty Graph ──────────────────────────────────────────────────────
 
 describe('empty graph', () => {
-  let emptyTmpDir, emptyDbPath;
-
-  beforeAll(() => {
-    emptyTmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-communities-empty-'));
-    fs.mkdirSync(path.join(emptyTmpDir, '.codegraph'));
-    emptyDbPath = path.join(emptyTmpDir, '.codegraph', 'graph.db');
-
-    const db = new Database(emptyDbPath);
-    db.pragma('journal_mode = WAL');
-    initSchema(db);
-    db.close();
-  });
-
-  afterAll(() => {
-    if (emptyTmpDir) fs.rmSync(emptyTmpDir, { recursive: true, force: true });
-  });
-
   test('empty graph returns zero communities', () => {
-    const data = communitiesData(emptyDbPath);
+    const { repo: emptyRepo } = createTestRepo().build();
+    const data = communitiesData(null, { repo: emptyRepo });
     expect(data.communities).toEqual([]);
     expect(data.summary.communityCount).toBe(0);
     expect(data.summary.modularity).toBe(0);
diff --git a/tests/integration/sequence.test.js b/tests/integration/sequence.test.js
index fade9cca..e8705820 100644
--- a/tests/integration/sequence.test.js
+++ b/tests/integration/sequence.test.js
@@ -1,8 +1,10 @@
 /**
  * Integration tests for sequence diagram generation.
  *
- * Uses a hand-crafted in-memory DB with known graph topology:
+ * Main tests use InMemoryRepository via createTestRepo() for fast, SQLite-free testing.
+ * Dataflow annotation tests still use SQLite (InMemoryRepository has no dataflow table).
 *
+ * Graph topology:
  * buildGraph() → parseFiles() [src/builder.js → src/parser.js]
  *              → resolveImports() [src/builder.js → src/resolve.js]
  * parseFiles() → extractSymbols() [src/parser.js → src/parser.js, same-file]
@@ -21,8 +23,9 @@ import Database from 'better-sqlite3';
 import { afterAll, beforeAll, describe, expect, test } from 'vitest';
 import { initSchema } from '../../src/db/index.js';
 import { sequenceData, sequenceToMermaid } from '../../src/features/sequence.js';
+import { createTestRepo } from '../helpers/fixtures.js';
 
-// ─── Helpers ───────────────────────────────────────────────────────────
+// ─── Helpers (for dataflow SQLite tests only) ─────────────────────────
 
 function insertNode(db, name, kind, file, line) {
   return db
@@ -36,52 +39,37 @@ function insertEdge(db, sourceId, targetId, kind, confidence = 1.0) {
   ).run(sourceId, targetId, kind, confidence);
 }
 
-// ─── Fixture DB ────────────────────────────────────────────────────────
+// ─── InMemory Fixture ─────────────────────────────────────────────────
 
-let tmpDir, dbPath;
+let repo;
 
 beforeAll(() => {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-sequence-'));
-  fs.mkdirSync(path.join(tmpDir, '.codegraph'));
-  dbPath = path.join(tmpDir, '.codegraph', 'graph.db');
-
-  const db = new Database(dbPath);
-  db.pragma('journal_mode = WAL');
-  initSchema(db);
-
-  // Core nodes
-  const buildGraph = insertNode(db, 'buildGraph', 'function', 'src/builder.js', 10);
-  const parseFiles = insertNode(db, 'parseFiles', 'function', 'src/parser.js', 5);
-  const extractSymbols = insertNode(db, 'extractSymbols', 'function', 'src/parser.js', 20);
-  const resolveImports = insertNode(db, 'resolveImports', 'function', 'src/resolve.js', 1);
-
-  // Call edges
-  insertEdge(db, buildGraph, parseFiles, 'calls');
-  insertEdge(db, buildGraph, resolveImports, 'calls');
-  insertEdge(db, parseFiles, extractSymbols, 'calls');
-
-  // Alias collision nodes (two different helper.js files)
-  const helperA = insertNode(db, 'helperA', 'function', 'src/utils/helper.js', 1);
-  const helperB = insertNode(db, 'helperB', 'function', 'lib/utils/helper.js', 1);
-  insertEdge(db, buildGraph, helperA, 'calls');
-  insertEdge(db, helperA, helperB, 'calls');
-
-  // Test file node (for noTests filtering)
-  const testFn = insertNode(db, 'testBuild', 'function', 'tests/builder.test.js', 1);
-  insertEdge(db, buildGraph, testFn, 'calls');
-
-  db.close();
-});
-
-afterAll(() => {
-  if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true });
+  ({ repo } = createTestRepo()
+    // Core nodes
+    .fn('buildGraph', 'src/builder.js', 10)
+    .fn('parseFiles', 'src/parser.js', 5)
+    .fn('extractSymbols', 'src/parser.js', 20)
+    .fn('resolveImports', 'src/resolve.js', 1)
+    // Alias collision nodes (two different helper.js files)
+    .fn('helperA', 'src/utils/helper.js', 1)
+    .fn('helperB', 'lib/utils/helper.js', 1)
+    // Test file node (for noTests filtering)
+    .fn('testBuild', 'tests/builder.test.js', 1)
+    // Call edges
+    .calls('buildGraph', 'parseFiles')
+    .calls('buildGraph', 'resolveImports')
+    .calls('parseFiles', 'extractSymbols')
+    .calls('buildGraph', 'helperA')
+    .calls('helperA', 'helperB')
+    .calls('buildGraph', 'testBuild')
+    .build());
 });
 
 // ─── sequenceData ──────────────────────────────────────────────────────
 
 describe('sequenceData', () => {
   test('basic sequence — correct participants and messages in BFS order', () => {
-    const data = sequenceData('buildGraph', dbPath, { noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, noTests: true });
     expect(data.entry).not.toBeNull();
     expect(data.entry.name).toBe('buildGraph');
@@ -98,7 +86,7 @@ describe('sequenceData', () => {
   });
 
   test('self-call — same-file call appears as self-message', () => {
-    const data = sequenceData('parseFiles', dbPath, { noTests: true });
+    const data = sequenceData('parseFiles', null, { repo, noTests: true });
     expect(data.entry).not.toBeNull();
 
     // parseFiles → extractSymbols are both in src/parser.js
@@ -108,7 +96,7 @@ describe('sequenceData', () => {
   });
 
   test('depth limiting — depth:1 truncates', () => {
-    const data = sequenceData('buildGraph', dbPath, { depth: 1, noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, depth: 1, noTests: true });
 
     expect(data.truncated).toBe(true);
     expect(data.depth).toBe(1);
@@ -118,14 +106,14 @@ describe('sequenceData', () => {
   });
 
   test('unknown name — entry is null', () => {
-    const data = sequenceData('nonExistentFunction', dbPath);
+    const data = sequenceData('nonExistentFunction', null, { repo });
     expect(data.entry).toBeNull();
     expect(data.participants).toHaveLength(0);
     expect(data.messages).toHaveLength(0);
   });
 
   test('leaf entry — entry exists, zero messages', () => {
-    const data = sequenceData('extractSymbols', dbPath);
+    const data = sequenceData('extractSymbols', null, { repo });
     expect(data.entry).not.toBeNull();
     expect(data.entry.name).toBe('extractSymbols');
     expect(data.messages).toHaveLength(0);
@@ -134,7 +122,7 @@ describe('sequenceData', () => {
   });
 
   test('participant alias collision — two helper.js files get distinct IDs', () => {
-    const data = sequenceData('buildGraph', dbPath, { noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, noTests: true });
 
     const helperParticipants = data.participants.filter((p) => p.label === 'helper.js');
     expect(helperParticipants.length).toBe(2);
@@ -149,8 +137,8 @@ describe('sequenceData', () => {
   });
 
   test('noTests filtering — test file nodes excluded', () => {
-    const withTests = sequenceData('buildGraph', dbPath, { noTests: false });
-    const withoutTests = sequenceData('buildGraph', dbPath, { noTests: true });
+    const withTests = sequenceData('buildGraph', null, { repo, noTests: false });
+    const withoutTests = sequenceData('buildGraph', null, { repo, noTests: true });
 
     // With tests should have more messages (includes testBuild)
     expect(withTests.totalMessages).toBeGreaterThan(withoutTests.totalMessages);
@@ -165,7 +153,7 @@ describe('sequenceData', () => {
 
 describe('sequenceToMermaid', () => {
   test('starts with sequenceDiagram and has participant lines', () => {
-    const data = sequenceData('buildGraph', dbPath, { noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, noTests: true });
     const mermaid = sequenceToMermaid(data);
 
     expect(mermaid).toMatch(/^sequenceDiagram/);
@@ -173,13 +161,13 @@ describe('sequenceToMermaid', () => {
   });
 
   test('has ->> arrows for calls', () => {
-    const data = sequenceData('buildGraph', dbPath, { noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, noTests: true });
     const mermaid = sequenceToMermaid(data);
     expect(mermaid).toContain('->>');
   });
 
   test('truncation note when truncated', () => {
-    const data = sequenceData('buildGraph', dbPath, { depth: 1, noTests: true });
+    const data = sequenceData('buildGraph', null, { repo, depth: 1, noTests: true });
     const mermaid = sequenceToMermaid(data);
     expect(mermaid).toContain('Truncated at depth');
   });
@@ -208,7 +196,7 @@ describe('sequenceToMermaid', () => {
   });
 });
 
-// ─── Dataflow annotations ───────────────────────────────────────────────
+// ─── Dataflow annotations (SQLite — requires dataflow table) ──────────
 
 describe('dataflow annotations', () => {
   let dfTmpDir, dfDbPath;
diff --git a/tests/integration/triage.test.js b/tests/integration/triage.test.js
index 47192cf3..32a2b692 100644
--- a/tests/integration/triage.test.js
+++ b/tests/integration/triage.test.js
@@ -1,118 +1,73 @@
 /**
  * Integration tests for triage — composite risk audit queue.
  *
- * Uses a hand-crafted fixture DB with known nodes, edges,
- * function_complexity, and file_commit_counts rows.
*/ -import fs from 'node:fs'; -import os from 'node:os'; -import path from 'node:path'; -import Database from 'better-sqlite3'; -import { afterAll, beforeAll, describe, expect, test } from 'vitest'; +import { beforeAll, describe, expect, test } from 'vitest'; import { triageData } from '../../src/features/triage.js'; +import { createTestRepo } from '../helpers/fixtures.js'; + +// ─── Fixture ────────────────────────────────────────────────────────── + +let repo; + +beforeAll(() => { + const builder = createTestRepo() + // High-risk: core role, high fan-in, high complexity + .fn('processRequest', 'src/handler.js', 10, { role: 'core' }) + // Medium-risk: utility role, moderate signals + .fn('formatOutput', 'src/formatter.js', 1, { role: 'utility' }) + // Low-risk: leaf role, minimal signals + .fn('add', 'src/math.js', 1, { role: 'leaf' }) + // Test file: should be excluded with noTests + .fn('testHelper', 'tests/helper.test.js', 1, { role: 'utility' }) + // Class node + .cls('Router', 'src/router.js', 1, { role: 'entry' }) + // Callers for fan-in + .fn('caller1', 'src/a.js', 1) + .fn('caller2', 'src/b.js', 1) + .fn('caller3', 'src/c.js', 1) + // Edges: processRequest has fan_in=3, formatOutput=1, add=0 + .calls('caller1', 'processRequest') + .calls('caller2', 'processRequest') + .calls('caller3', 'processRequest') + .calls('caller1', 'formatOutput') + // Complexity + .complexity('processRequest', { + cognitive: 30, + cyclomatic: 15, + max_nesting: 5, + maintainability_index: 20, + }) + .complexity('formatOutput', { + cognitive: 10, + cyclomatic: 5, + max_nesting: 2, + maintainability_index: 60, + }) + .complexity('add', { cognitive: 1, cyclomatic: 1, max_nesting: 0, maintainability_index: 90 }) + .complexity('testHelper', { + cognitive: 5, + cyclomatic: 3, + max_nesting: 1, + maintainability_index: 70, + }) + .complexity('Router', { + cognitive: 15, + cyclomatic: 8, + max_nesting: 3, + maintainability_index: 40, + }); -// ─── Helpers 
─────────────────────────────────────────────────────────── - -function insertNode(db, name, kind, file, line, { endLine = null, role = null } = {}) { - const stmt = db.prepare( - 'INSERT INTO nodes (name, kind, file, line, end_line, role) VALUES (?, ?, ?, ?, ?, ?)', - ); - return stmt.run(name, kind, file, line, endLine, role).lastInsertRowid; -} - -function insertEdge(db, sourceId, targetId, kind = 'calls') { - db.prepare('INSERT INTO edges (source_id, target_id, kind) VALUES (?, ?, ?)').run( - sourceId, - targetId, - kind, - ); -} - -function insertComplexity(db, nodeId, cognitive, cyclomatic, maxNesting, mi = 60) { - db.prepare( - `INSERT INTO function_complexity - (node_id, cognitive, cyclomatic, max_nesting, - loc, sloc, comment_lines, - halstead_n1, halstead_n2, halstead_big_n1, halstead_big_n2, - halstead_vocabulary, halstead_length, halstead_volume, - halstead_difficulty, halstead_effort, halstead_bugs, - maintainability_index) - VALUES (?, ?, ?, ?, 10, 8, 1, 10, 15, 30, 40, 25, 70, 100, 5, 500, 0.03, ?)`, - ).run(nodeId, cognitive, cyclomatic, maxNesting, mi); -} - -function insertChurn(db, file, commitCount) { - db.prepare('INSERT OR REPLACE INTO file_commit_counts (file, commit_count) VALUES (?, ?)').run( - file, - commitCount, - ); -} - -// ─── Fixture DB ──────────────────────────────────────────────────────── - -let tmpDir, dbPath; - -// Node IDs -let fnHigh, fnMed, fnLow, fnTest, fnClass; - -beforeAll(async () => { - const { initSchema } = await import('../../src/db/index.js'); - - tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-triage-')); - fs.mkdirSync(path.join(tmpDir, '.codegraph')); - dbPath = path.join(tmpDir, '.codegraph', 'graph.db'); - - const db = new Database(dbPath); - db.pragma('journal_mode = WAL'); - initSchema(db); - - // High-risk: core role, high fan-in, high complexity, high churn, low MI - fnHigh = insertNode(db, 'processRequest', 'function', 'src/handler.js', 10, { role: 'core' }); - // Medium-risk: utility role, 
moderate signals - fnMed = insertNode(db, 'formatOutput', 'function', 'src/formatter.js', 1, { role: 'utility' }); - // Low-risk: leaf role, minimal signals - fnLow = insertNode(db, 'add', 'function', 'src/math.js', 1, { role: 'leaf' }); - // Test file: should be excluded with noTests - fnTest = insertNode(db, 'testHelper', 'function', 'tests/helper.test.js', 1, { role: 'utility' }); - // Class node - fnClass = insertNode(db, 'Router', 'class', 'src/router.js', 1, { role: 'entry' }); - - // Edges: processRequest has fan_in=3, formatOutput=1, add=0 - const caller1 = insertNode(db, 'caller1', 'function', 'src/a.js', 1); - const caller2 = insertNode(db, 'caller2', 'function', 'src/b.js', 1); - const caller3 = insertNode(db, 'caller3', 'function', 'src/c.js', 1); - insertEdge(db, caller1, fnHigh); - insertEdge(db, caller2, fnHigh); - insertEdge(db, caller3, fnHigh); - insertEdge(db, caller1, fnMed); - - // Complexity - insertComplexity(db, fnHigh, 30, 15, 5, 20); // high cognitive, low MI - insertComplexity(db, fnMed, 10, 5, 2, 60); - insertComplexity(db, fnLow, 1, 1, 0, 90); // simple, high MI - insertComplexity(db, fnTest, 5, 3, 1, 70); - insertComplexity(db, fnClass, 15, 8, 3, 40); - - // Churn (file-level) - insertChurn(db, 'src/handler.js', 50); - insertChurn(db, 'src/formatter.js', 20); - insertChurn(db, 'src/math.js', 2); - insertChurn(db, 'tests/helper.test.js', 10); - insertChurn(db, 'src/router.js', 30); - - db.close(); -}); - -afterAll(() => { - if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true }); + ({ repo } = builder.build()); }); // ─── Tests ───────────────────────────────────────────────────────────── describe('triage', () => { test('ranks symbols by composite risk score (default sort)', () => { - const result = triageData(dbPath, { limit: 100 }); + const result = triageData(null, { repo, limit: 100 }); expect(result.items.length).toBeGreaterThanOrEqual(3); // processRequest should be highest risk @@ -125,14 +80,14 @@ describe('triage', () 
=> { }); test('scores are in descending order by default', () => { - const result = triageData(dbPath, { limit: 100 }); + const result = triageData(null, { repo, limit: 100 }); for (let i = 1; i < result.items.length; i++) { expect(result.items[i - 1].riskScore).toBeGreaterThanOrEqual(result.items[i].riskScore); } }); test('normalization: max fan_in → normFanIn=1.0', () => { - const result = triageData(dbPath, { limit: 100 }); + const result = triageData(null, { repo, limit: 100 }); const high = result.items.find((it) => it.name === 'processRequest'); expect(high.normFanIn).toBe(1); }); @@ -140,7 +95,7 @@ describe('triage', () => { test('normalization: min cognitive → normComplexity=0.0', () => { // callers have cognitive=0 (no complexity row), so add (cognitive=1) is not the min. // Filter to only nodes with complexity data to test properly. - const result = triageData(dbPath, { file: 'src/math', limit: 100 }); + const result = triageData(null, { repo, file: 'src/math', limit: 100 }); const low = result.items.find((it) => it.name === 'add'); // Single item → all norms are 0 expect(low.normComplexity).toBe(0); @@ -148,7 +103,8 @@ describe('triage', () => { test('custom weights override ranking', () => { // Pure fan-in ranking: only fan_in matters - const result = triageData(dbPath, { + const result = triageData(null, { + repo, limit: 100, weights: { fanIn: 1, complexity: 0, churn: 0, role: 0, mi: 0 }, }); @@ -159,27 +115,27 @@ describe('triage', () => { }); test('filters by file', () => { - const result = triageData(dbPath, { file: 'handler', limit: 100 }); + const result = triageData(null, { repo, file: 'handler', limit: 100 }); expect(result.items.length).toBe(1); expect(result.items[0].name).toBe('processRequest'); }); test('filters by kind', () => { - const result = triageData(dbPath, { kind: 'class', limit: 100 }); + const result = triageData(null, { repo, kind: 'class', limit: 100 }); expect(result.items.length).toBe(1); 
expect(result.items[0].name).toBe('Router'); }); test('filters by role', () => { - const result = triageData(dbPath, { role: 'core', limit: 100 }); + const result = triageData(null, { repo, role: 'core', limit: 100 }); expect(result.items.length).toBe(1); expect(result.items[0].name).toBe('processRequest'); }); test('filters by minScore', () => { - const all = triageData(dbPath, { limit: 100 }); + const all = triageData(null, { repo, limit: 100 }); const maxScore = all.items[0].riskScore; - const result = triageData(dbPath, { minScore: maxScore, limit: 100 }); + const result = triageData(null, { repo, minScore: maxScore, limit: 100 }); // Only the highest-scoring item(s) should pass expect(result.items.length).toBeGreaterThanOrEqual(1); for (const item of result.items) { @@ -188,8 +144,8 @@ describe('triage', () => { }); test('noTests excludes test files', () => { - const withTests = triageData(dbPath, { limit: 100 }); - const withoutTests = triageData(dbPath, { noTests: true, limit: 100 }); + const withTests = triageData(null, { repo, limit: 100 }); + const withoutTests = triageData(null, { repo, noTests: true, limit: 100 }); const testItem = withTests.items.find((it) => it.file.includes('.test.')); const testItemFiltered = withoutTests.items.find((it) => it.file.includes('.test.')); expect(testItem).toBeDefined(); @@ -197,28 +153,29 @@ describe('triage', () => { }); test('sort by complexity', () => { - const result = triageData(dbPath, { sort: 'complexity', limit: 100 }); + const result = triageData(null, { repo, sort: 'complexity', limit: 100 }); for (let i = 1; i < result.items.length; i++) { expect(result.items[i - 1].cognitive).toBeGreaterThanOrEqual(result.items[i].cognitive); } }); test('sort by churn', () => { - const result = triageData(dbPath, { sort: 'churn', limit: 100 }); + const result = triageData(null, { repo, sort: 'churn', limit: 100 }); + // InMemoryRepository returns churn=0 for all — verify no errors for (let i = 1; i < result.items.length; 
i++) { expect(result.items[i - 1].churn).toBeGreaterThanOrEqual(result.items[i].churn); } }); test('sort by fan-in', () => { - const result = triageData(dbPath, { sort: 'fan-in', limit: 100 }); + const result = triageData(null, { repo, sort: 'fan-in', limit: 100 }); for (let i = 1; i < result.items.length; i++) { expect(result.items[i - 1].fanIn).toBeGreaterThanOrEqual(result.items[i].fanIn); } }); test('sort by mi (ascending — lower MI = riskier)', () => { - const result = triageData(dbPath, { sort: 'mi', limit: 100 }); + const result = triageData(null, { repo, sort: 'mi', limit: 100 }); for (let i = 1; i < result.items.length; i++) { expect(result.items[i - 1].maintainabilityIndex).toBeLessThanOrEqual( result.items[i].maintainabilityIndex, @@ -227,7 +184,7 @@ describe('triage', () => { }); test('pagination with _pagination metadata', () => { - const result = triageData(dbPath, { limit: 2, offset: 0 }); + const result = triageData(null, { repo, limit: 2, offset: 0 }); expect(result.items.length).toBeLessThanOrEqual(2); expect(result._pagination).toBeDefined(); expect(result._pagination.limit).toBe(2); @@ -237,13 +194,13 @@ describe('triage', () => { }); test('pagination offset skips items', () => { - const page1 = triageData(dbPath, { limit: 2, offset: 0 }); - const page2 = triageData(dbPath, { limit: 2, offset: 2 }); + const page1 = triageData(null, { repo, limit: 2, offset: 0 }); + const page2 = triageData(null, { repo, limit: 2, offset: 2 }); expect(page1.items[0].name).not.toBe(page2.items[0].name); }); test('summary contains expected fields', () => { - const result = triageData(dbPath, { limit: 100 }); + const result = triageData(null, { repo, limit: 100 }); const s = result.summary; expect(s.total).toBeGreaterThan(0); expect(s.analyzed).toBeGreaterThan(0); @@ -261,7 +218,7 @@ describe('triage', () => { }); test('items include all expected fields', () => { - const result = triageData(dbPath, { limit: 1 }); + const result = triageData(null, { repo, limit: 1 
}); const item = result.items[0]; expect(item).toHaveProperty('name'); expect(item).toHaveProperty('kind'); @@ -280,31 +237,22 @@ describe('triage', () => { expect(item).toHaveProperty('riskScore'); }); - test('graceful with missing complexity/churn data', async () => { - // Create a DB with a node but no complexity or churn rows - const sparseDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-triage-sparse-')); - fs.mkdirSync(path.join(sparseDir, '.codegraph')); - const sparseDbPath = path.join(sparseDir, '.codegraph', 'graph.db'); + test('graceful with missing complexity/churn data', () => { + // Create a repo with a node but no complexity + const { repo: sparseRepo } = createTestRepo() + .fn('lonely', 'src/lonely.js', 1, { role: 'leaf' }) + .build(); - const { initSchema } = await import('../../src/db/index.js'); - const db = new Database(sparseDbPath); - db.pragma('journal_mode = WAL'); - initSchema(db); - insertNode(db, 'lonely', 'function', 'src/lonely.js', 1, { role: 'leaf' }); - db.close(); - - // Should not throw - const result = triageData(sparseDbPath, { limit: 100 }); + const result = triageData(null, { repo: sparseRepo, limit: 100 }); expect(result.items.length).toBe(1); expect(result.items[0].cognitive).toBe(0); expect(result.items[0].churn).toBe(0); expect(result.items[0].fanIn).toBe(0); - - fs.rmSync(sparseDir, { recursive: true, force: true }); }); test('role weights applied correctly', () => { - const result = triageData(dbPath, { + const result = triageData(null, { + repo, limit: 100, // Only role matters weights: { fanIn: 0, complexity: 0, churn: 0, role: 1, mi: 0 },