
isink17/codegraph


codegraph



codegraph is a local-first code context engine and MCP server that builds a persistent knowledge graph of your source repositories in SQLite. It gives AI coding assistants deep structural awareness — symbols, call graphs, dependencies, and semantic search — without sending a single byte to the cloud.

Single binary. Zero config. No external databases. No API keys.


Why codegraph?

AI coding assistants are powerful, but they spend most of their token budget discovering what to change — grepping files, reading code, reconstructing call graphs from partial evidence.

codegraph shifts that cost. One call to context_for_task returns the exact files, symbols, and relationships an agent needs. One call to agentic_query gets a synthesized answer backed by graph traversal and semantic search.

❌ Without codegraph
   AI reads files one by one → greps for patterns → burns tokens on context-gathering

✅ With codegraph
   AI calls context_for_task("add retry logic to HTTP client")
   → instantly gets relevant files, functions, callers, callees, and tests

How It Works

Your Code ──▶ tree-sitter AST ──▶ SQLite Graph ──▶ MCP Tools ──▶ AI Assistant
                    │                   │                │
               12 languages       symbols, edges      29 tools
               framework detect   embeddings          agentic reasoning
               import resolution  session memory      hybrid search

codegraph index . walks your repo, parses every file with tree-sitter, resolves imports using four strategies (exact, name, suffix, method-receiver), and writes a fully-linked symbol graph into a local codegraph.sqlite. The MCP server then exposes that graph to any compatible AI assistant via 29 structured tools — no cloud, no Docker, no API keys.
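Here is a minimal Go sketch of the four import-resolution strategies. The Symbol and Ref types, the strategy order, and the rule that a strategy wins only when it yields exactly one candidate are assumptions for illustration. This README does not show codegraph's real resolver or types.

```go
package main

import (
	"fmt"
	"strings"
)

// Symbol is a hypothetical, simplified stand-in for an indexed symbol;
// codegraph's real graph.Symbol type is not documented in this README.
type Symbol struct {
	QualifiedName string // e.g. "net/http.Client.Do"
	Name          string // bare name, e.g. "Do"
	Receiver      string // receiver type for methods, e.g. "Client"; "" otherwise
}

// Ref is a hypothetical unresolved reference found at a call site.
type Ref struct {
	Text     string // as written in source, e.g. "Client.Do" or "Do"
	Receiver string // receiver type inferred at the call site, if known
}

// resolve tries the four strategies in order and accepts the first one that
// yields exactly one candidate; ambiguous strategies fall through.
func resolve(ref Ref, syms []Symbol) (strategy string, sym Symbol, ok bool) {
	last := ref.Text[strings.LastIndex(ref.Text, ".")+1:] // final dotted segment
	strategies := []struct {
		name  string
		match func(Symbol) bool
	}{
		{"exact", func(s Symbol) bool { return s.QualifiedName == ref.Text }},
		{"name", func(s Symbol) bool { return s.Name == ref.Text }},
		{"suffix", func(s Symbol) bool { return strings.HasSuffix(s.QualifiedName, "."+ref.Text) }},
		{"method-receiver", func(s Symbol) bool {
			return ref.Receiver != "" && s.Receiver == ref.Receiver && s.Name == last
		}},
	}
	for _, st := range strategies {
		var hits []Symbol
		for _, s := range syms {
			if st.match(s) {
				hits = append(hits, s)
			}
		}
		if len(hits) == 1 {
			return st.name, hits[0], true
		}
	}
	return "", Symbol{}, false
}

func main() {
	syms := []Symbol{
		{"net/http.Client.Do", "Do", "Client"},
		{"db.Tx.Do", "Do", "Tx"},
		{"util.Retry", "Retry", ""},
	}
	for _, r := range []Ref{{"util.Retry", ""}, {"Retry", ""}, {"Client.Do", ""}, {"Do", "Client"}} {
		st, s, ok := resolve(r, syms)
		fmt.Printf("%-10s -> %-15s %s (resolved=%v)\n", r.Text, st, s.QualifiedName, ok)
	}
}
```

With this ordering, an ambiguous bare name such as Do falls through to method-receiver matching. That strategy uses the receiver type inferred at the call site to pick a single candidate.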


Features

🔍 Parsing & Indexing

  • Tree-sitter parsing for all 12 supported languages — robust AST extraction, not regex
  • 4-strategy import resolution — exact, name, suffix, and method-receiver matching
  • Cross-language linking — connects symbols across language boundaries
  • Incremental updates — only re-indexes changed files; fast on large repos
  • Framework detection — recognizes 20+ frameworks (Express, Django, gin, React, Spring, Laravel, …)

🔎 Search & Query

  • Hybrid search — vector similarity (Ollama embeddings) + FTS5, fused with Reciprocal Rank Fusion
  • Semantic search — find code by meaning, not just text
  • Call graph traversal — callers, callees, transitive dependency chains
  • Impact analysis — know what breaks before you change it
  • Dead code detection — find symbols with zero references
  • Architecture overview — language breakdown, entry points, hub symbols, coupling metrics

🤖 AI Integration

  • 29 MCP tools — comprehensive API for AI coding assistants
  • Agentic reasoning — ReAct loop over a local Ollama LLM that chains tools and synthesizes answers
  • Context building — one tool call returns everything an agent needs for a task
  • Session memory — persist reads, edits, decisions, and facts across sessions
  • Token benchmarking — measure savings vs. naive file reading

📊 Graph Analytics

  • PageRank — find the most important symbols in your codebase
  • Coupling metrics — identify tightly coupled file pairs
  • Cycle detection — find circular dependencies at the file level
  • Interactive visualization — D3.js force-directed graph with search and zoom

🛠 Developer Experience

  • Single Go binary — no runtime dependencies, cross-platform
  • Zero-config SQLite — no Docker, no external databases
  • codegraph install — auto-detects and configures Claude Code, Cursor, Windsurf, Gemini CLI
  • File watching — automatic re-indexing on changes
  • 100% local — no data leaves your machine

Supported Languages

All languages use tree-sitter for AST parsing:

Language Extensions
Go .go
Python .py
TypeScript .ts, .tsx
JavaScript .js, .jsx, .mjs
Java .java
Kotlin .kt, .kts
Rust .rs
C# .cs
Ruby .rb
Swift .swift
PHP .php
C / C++ .c, .h, .cpp, .hpp, .cc

Node.js repos are supported; full tree-sitter support for Node.js projects is still in progress.


Quick Start

1. Install

go install github.com/isink17/codegraph/cmd/codegraph@latest

Requires Go 1.23+ and a C compiler (for tree-sitter CGo bindings).

Build from source

git clone https://github.com/isink17/codegraph
cd codegraph
go build ./cmd/codegraph
go test ./...

Requires Go 1.23+ and a C compiler (for tree-sitter CGo bindings).

Clean rebuild

cd your-project
codegraph index . --rebuild

Use this after parser or indexer changes when you need a true full reindex from scratch. codegraph index . --rebuild needs exclusive access to the repo database. If rebuild fails because the DB is in use, stop codegraph serve or other codegraph processes and retry. Use codegraph clean . for database maintenance tasks like WAL checkpointing, VACUUM, FTS optimize, ANALYZE, and incremental vacuum.

Version

codegraph --version
codegraph version

Prints the installed local version only. It does not contact GitHub.

2. Auto-configure your AI tool

codegraph install

Detects Claude Code, Cursor, Windsurf, and Gemini CLI and writes the MCP config automatically.

3. Index your project

cd your-project
codegraph index .

4. Start the MCP server

codegraph serve

Repo root is auto-detected from git (or falls back to your current working directory).

That's it. Your AI assistant now has deep structural code understanding.


MCP Setup

Auto-configure (recommended)

codegraph install

Manual Setup

Claude Code — add to .mcp.json
{
  "mcpServers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["serve"]
    }
  }
}
Cursor / Windsurf — add to mcp.json
{
  "mcpServers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["serve"]
    }
  }
}
Codex — add to config.toml
[mcp_servers.codegraph]
command = "codegraph"
args = ["serve"]
startup_timeout_sec = 60

See the examples/ directory for more configuration samples.
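Under the hood, an MCP client talks to codegraph serve with JSON-RPC 2.0 messages over stdio and invokes a tool with a tools/call request. The Go sketch below only builds such a request. A real client must first complete the MCP initialize handshake. The argument name task for context_for_task is a hypothetical placeholder; the server's tools/list response reports each tool's actual input schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request, the message format MCP uses.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// toolCallParams is the params object of an MCP "tools/call" request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// buildToolCall encodes one tools/call request; over the stdio transport each
// message is written as a single line of JSON.
func buildToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	// "task" is a hypothetical argument name used only for illustration.
	b, err := buildToolCall(2, "context_for_task", map[string]any{"task": "add retry logic to HTTP client"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```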


MCP Tools (29)

Code Intelligence

Tool Description
find_symbol Find symbols by exact or fuzzy query
search_symbols Search symbol names, signatures, and docs (FTS5)
search_semantic Hybrid semantic search (vector + FTS when embeddings enabled)
find_callers Find what calls a given function
find_callees Find what a given function calls
get_impact_radius Estimate affected symbols and files around a change
trace_dependencies Trace transitive dependency chains (upstream/downstream)
find_related_tests Find tests for a symbol, file, or set of changed files
find_dead_code Find symbols with no callers or references
context_for_task Build a focused context bundle for a natural-language task

Architecture & Analysis

Tool Description
architecture_overview Language breakdown, directories, entry points, hub symbols
graph_analytics PageRank, coupling metrics, or cycle detection
detect_frameworks Detect frameworks and libraries used in the repo
cross_language_links Find and create cross-language symbol references
benchmark_tokens Estimate token savings vs. reading raw files

Repository Management

Tool Description
index_repo Index a repository into the local code graph
update_graph Update only changed files
list_files List indexed files with optional path filter
graph_stats Repository graph statistics
supported_languages List supported languages and extensions
list_repos List known repositories
list_scans List recent scans
latest_scan_errors List indexer errors from the last scan

Session Memory

Tool Description
session_log Log a session event (read, edit, decision, task, fact)
session_history Get session event history
session_hot_files Get most frequently accessed files
session_context Get aggregated session context for pre-loading

Agentic

Tool Description
agentic_query Ask a question answered by a local AI agent that reasons over the code graph (requires Ollama)

CLI Reference

# Setup
codegraph install                         # Auto-configure AI tools
codegraph doctor                          # Check installation health
codegraph config show                     # Show current config
codegraph --version                       # Print current version
codegraph version                         # Print current version

# Indexing
codegraph index <path>                    # Full index
codegraph update <path>                   # Incremental update
codegraph watch <path>                    # Watch and auto-reindex
codegraph clean <path>                    # Clean database

# MCP Server
codegraph serve [--repo-root <path>]      # Start MCP server (auto-detects repo root)

# Query
codegraph stats <path>                    # Graph statistics
codegraph find-symbol <path> <query>      # Find symbols
codegraph search <path> <query>           # Full-text symbol search
codegraph callers <path> --symbol <name>  # Find callers
codegraph callees <path> --symbol <name>  # Find callees
codegraph impact <path> --symbol <name>   # Impact analysis

# Testing
codegraph affected-tests [--stdin] <files>  # Tests affected by changed files

# Visualization & Export
codegraph visualize [--repo-root <path>]    # Interactive D3.js graph (auto-detects repo root)
codegraph graph export <path> --format dot  # Export as Graphviz DOT
codegraph graph export <path> --format json # Export as JSON

# Benchmarking
codegraph benchmark                         # Token savings benchmark

Affected Tests with Git

# Find tests affected by uncommitted changes
git diff --name-only | codegraph affected-tests --stdin

# CI integration
TESTS=$(git diff --name-only HEAD~1 | codegraph affected-tests --stdin)
go test $TESTS

For commands that accept --repo-root (such as codegraph serve and codegraph visualize), you can omit the flag when you are already inside the repo you want to inspect.


Optional: Embeddings & Agentic Mode

codegraph works fully without any external services. Enable these for enhanced capabilities:

Vector Embeddings (via Ollama)

Enables hybrid semantic search (vector + FTS):

# Install Ollama (https://ollama.com), then pull the embedding model
ollama pull nomic-embed-text

# Initialize repo config
codegraph config init --repo .

Edit .codegraph/config.json:

{
  "embedding": {
    "enabled": true,
    "model": "nomic-embed-text"
  }
}

Then re-index to generate embeddings:

codegraph index . --force

Agentic Reasoning (via Ollama)

The agentic_query tool uses a local LLM to reason over the graph with a ReAct loop:

ollama pull llama3.2

Edit .codegraph/config.json:

{
  "agent": {
    "enabled": true,
    "model": "llama3.2"
  }
}
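The agentic_query loop follows the ReAct pattern. The model alternates a thought and a tool call, then observes the tool's result, and repeats until it can answer. The Go sketch below shows only that control flow. Everything in it is hypothetical: the Step and Model types, the scripted stub standing in for a llama3.2 call through Ollama, and the canned find_callers result. This README does not document codegraph's real prompts or tool wiring.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Step is one model turn: a tool call (Action/Input) or a Final answer.
type Step struct {
	Thought string
	Action  string
	Input   string
	Final   string
}

// Model stands in for a local LLM call; it sees the transcript so far and
// decides the next step.
type Model func(transcript []string) Step

// react alternates model steps and tool observations until the model returns
// a final answer or the step budget runs out.
func react(model Model, tools map[string]func(string) string, question string, maxSteps int) (string, error) {
	transcript := []string{"Question: " + question}
	for i := 0; i < maxSteps; i++ {
		step := model(transcript)
		transcript = append(transcript, "Thought: "+step.Thought)
		if step.Final != "" {
			return step.Final, nil
		}
		obs := "unknown tool " + step.Action
		if tool, ok := tools[step.Action]; ok {
			obs = tool(step.Input)
		}
		transcript = append(transcript,
			"Action: "+step.Action+"("+step.Input+")",
			"Observation: "+obs)
	}
	return "", errors.New("no final answer within step budget")
}

// scriptedModel is a hypothetical stub: it asks for callers once, then
// answers from the latest observation.
func scriptedModel(transcript []string) Step {
	last := transcript[len(transcript)-1]
	if obs, ok := strings.CutPrefix(last, "Observation: "); ok {
		return Step{Thought: "I have the callers.", Final: "Do is called by " + obs}
	}
	return Step{Thought: "Look up callers of Do.", Action: "find_callers", Input: "Do"}
}

// fakeTools returns canned results in place of real graph queries.
func fakeTools() map[string]func(string) string {
	return map[string]func(string) string{
		"find_callers": func(string) string { return "retry.Wrap, api.Fetch" },
	}
}

func main() {
	answer, err := react(scriptedModel, fakeTools(), "Who calls Do?", 5)
	fmt.Println(answer, err)
}
```

The step budget keeps a model that never commits to an answer from looping forever.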

Configuration

Global config

Platform Path
macOS ~/Library/Application Support/codegraph/config.json
Linux ${XDG_CONFIG_HOME:-~/.config}/codegraph/config.json
Windows %AppData%\codegraph\config.json

Repo config

Created with codegraph config init --repo . at .codegraph/config.json:

{
  "include": [],
  "exclude": ["vendor/**", "node_modules/**"],
  "languages": [],
  "embedding": {
    "enabled": false,
    "model": "nomic-embed-text"
  },
  "agent": {
    "enabled": false,
    "model": "llama3.2"
  }
}

Repo Root Resolution

codegraph resolves the repo root from the first of these sources that yields a path:

  1. Per-call repo_root MCP tool parameter
  2. --repo-root CLI flag (process-level default)
  3. git rev-parse --show-toplevel from the current working directory
  4. os.Getwd() (current working directory)
  5. Return error
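The precedence list above maps onto a short Go function. This is a sketch: resolveRepoRoot is a hypothetical name, and it shells out to the git CLI rather than assuming any particular git library.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// resolveRepoRoot applies the precedence: per-call parameter, process-level
// flag, git top-level, working directory, else an error.
func resolveRepoRoot(param, flag string) (string, error) {
	if param != "" {
		return param, nil
	}
	if flag != "" {
		return flag, nil
	}
	out, err := exec.Command("git", "rev-parse", "--show-toplevel").Output()
	if err == nil {
		if root := strings.TrimSpace(string(out)); root != "" {
			return root, nil
		}
	}
	// Not in a git repo (or git is not installed): fall back to the cwd.
	wd, err := os.Getwd()
	if err != nil {
		return "", fmt.Errorf("cannot resolve repo root: %w", err)
	}
	return wd, nil
}

func main() {
	root, err := resolveRepoRoot("", "")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("repo root:", root)
}
```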

Ignore file

Create .codegraphignore in the repo root (same syntax as .gitignore):

build/
dist/
*.generated.go

Note: Common generated directories (node_modules, .next, .nuxt, etc.) are always skipped and cannot be un-ignored.
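A simplified matcher shows how the ignore rules combine. The Go sketch below supports only a subset of .gitignore syntax, with no ! negation and no **. Its alwaysSkipped set holds just the three directories named above, because the full built-in list is not given. Checking that set before any pattern mirrors the rule that these directories cannot be un-ignored.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// alwaysSkipped is a partial, illustrative list of built-in skipped dirs.
var alwaysSkipped = map[string]bool{"node_modules": true, ".next": true, ".nuxt": true}

// ignored reports whether a slash-separated path relative to the repo root
// is excluded. Supported subset: "name/" matches a directory with that name
// at any depth, patterns containing "/" are matched against the whole path
// from the root, and other patterns are globs matched against the base name.
func ignored(rel string, patterns []string) bool {
	segs := strings.Split(rel, "/")
	dirs, base := segs[:len(segs)-1], segs[len(segs)-1]
	for _, d := range dirs {
		if alwaysSkipped[d] {
			return true
		}
	}
	for _, p := range patterns {
		switch {
		case strings.HasSuffix(p, "/"):
			name := strings.TrimSuffix(p, "/")
			for _, d := range dirs {
				if d == name {
					return true
				}
			}
		case strings.Contains(p, "/"):
			if ok, _ := path.Match(strings.TrimPrefix(p, "/"), rel); ok {
				return true
			}
		default:
			if ok, _ := path.Match(p, base); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := []string{"build/", "dist/", "*.generated.go"}
	for _, f := range []string{"build/out.go", "internal/api/types.generated.go", "web/node_modules/x/index.js", "internal/store/store.go"} {
		fmt.Printf("%-35s ignored=%v\n", f, ignored(f, patterns))
	}
}
```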


Architecture

cmd/codegraph           CLI entrypoint
internal/
  agent/                Agentic reasoning (ReAct loop over Ollama)
  cli/                  Command handlers and MCP auto-configuration
  config/               Config loading and path resolution
  embedding/            Vector embedding (Ollama HTTP client)
  export/               JSON and DOT graph export
  framework/            Framework detection (20+ frameworks)
  graph/                Core types (Symbol, Edge, Reference, …)
  indexer/              Repository scan, incremental updates, embedding
  mcp/                  MCP stdio server (29 tools)
  parser/               Parser interface and adapters
    treesitter/         Tree-sitter adapters (12 languages)
    golang/             Go AST parser (legacy)
    python/             Python heuristic parser (legacy)
    heuristic/          Regex-based parsers (legacy fallback)
  query/                Query orchestration and hybrid search
  store/                SQLite storage, migrations, graph analytics
  viz/                  Interactive D3.js graph visualization
  watcher/              File watch and debounced updates


License

This project is licensed under the Functional Source License, Version 1.1, MIT Future License (FSL-1.1-MIT).

On the second anniversary of each version's release, that version converts to the MIT License. See LICENSE for full terms.
