A lightweight, effective (AST-based) semantic code search tool for your codebase. Built on CocoIndex, an ultra-performant, Rust-based data transformation engine. Use it from the CLI, or integrate with Claude, Codex, Cursor, or any other coding agent via Skill or MCP.
- Instant token savings of ~70%.
- 1-minute setup — install and go, zero config needed!
⭐ Please help star CocoIndex if you like this project!
Deutsch | English | Español | français | 日本語 | 한국어 | Português | Русский | 中文
Using pipx:

```bash
pipx install cocoindex-code   # first install
pipx upgrade cocoindex-code   # upgrade
```

Using uv:

```bash
uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
```

Then, in your project:

```bash
ccc init                          # initialize project (creates settings)
ccc index                         # build the index
ccc search "authentication logic" # search!
```

That's it! The background daemon starts automatically on first use. The default embedding model runs locally (`sentence-transformers/all-MiniLM-L6-v2`) — no API key required, completely free.
Tip: `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.
Install the ccc skill so your coding agent automatically uses semantic search when needed:
```bash
npx skills add cocoindex-io/cocoindex-code
```

This installs the skill into your project's `.claude/skills/` directory. Once installed, the agent automatically triggers semantic code search when it would be helpful — no manual prompting required.
Works with Claude Code and other skill-compatible agents.
Alternatively, use `ccc mcp` to run as an MCP server:
Claude Code

```bash
claude mcp add cocoindex-code -- ccc mcp
```

Codex

```bash
codex mcp add cocoindex-code -- ccc mcp
```

OpenCode

```bash
opencode mcp add
```

Then answer the prompts:

- Enter MCP server name: `cocoindex-code`
- Select MCP server type: `local`
- Enter command to run: `ccc mcp`
Or use opencode.json:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "cocoindex-code": {
      "type": "local",
      "command": ["ccc", "mcp"]
    }
  }
}
```

Once configured, the agent automatically decides when semantic code search is helpful — finding code by description, exploring unfamiliar codebases, making fuzzy/conceptual matches, or locating implementations without knowing exact names.
Note: The `cocoindex-code` command (without a subcommand) still works as an MCP server for backward compatibility. It auto-creates settings from environment variables on first run.
MCP Tool Reference
When running as an MCP server (`ccc mcp`), the following tool is exposed:

`search` — Search the codebase using semantic similarity.
```python
search(
    query: str,                          # Natural language query or code snippet
    limit: int = 5,                      # Maximum results (1-100)
    offset: int = 0,                     # Pagination offset
    refresh_index: bool = True,          # Refresh index before querying
    languages: list[str] | None = None,  # Filter by language (e.g. ["python", "typescript"])
    paths: list[str] | None = None,      # Filter by path glob (e.g. ["src/utils/*"])
)
```
Returns matching code chunks with file path, language, code content, line numbers, and similarity score.
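For orientation, an MCP client invokes this tool with a standard `tools/call` request. A minimal sketch of the JSON-RPC payload — the `id` and argument values here are illustrative, and the exact response shape depends on the server:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "where do we validate JWT tokens?",
      "limit": 10,
      "languages": ["python"],
      "paths": ["src/auth/*"]
    }
  }
}
```

Most agents construct this call for you; it is shown only to clarify how the parameters above map onto the wire format.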
- Semantic Code Search: Find relevant code using natural language queries when grep doesn't work well — and save tokens immediately.
- Ultra Performant: ⚡ Built on top of an ultra-performant Rust indexing engine. Only re-indexes changed files for fast updates.
- Multi-Language Support: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.
- Embedded: Portable and just works, no database setup required!
- Flexible Embeddings: Local SentenceTransformers by default (free!) or 100+ cloud providers via LiteLLM.
| Command | Description |
|---|---|
| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` |
| `ccc index` | Build or update the index (auto-inits if needed). Shows streaming progress. |
| `ccc search <query>` | Semantic search across the codebase |
| `ccc status` | Show index stats (chunk count, file count, language breakdown) |
| `ccc mcp` | Run as MCP server in stdio mode |
| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |
| `ccc daemon status` | Show daemon version, uptime, and loaded projects |
| `ccc daemon restart` | Restart the background daemon |
| `ccc daemon stop` | Stop the daemon |
```bash
ccc search database schema                        # basic search
ccc search --lang python --lang markdown schema   # filter by language
ccc search --path 'src/utils/*' query handler     # filter by path
ccc search --offset 10 --limit 5 database schema  # pagination
ccc search --refresh database schema              # update index first, then search
```

By default, `ccc search` scopes results to your current working directory (relative to the project root). Use `--path` to override.
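The cwd-scoping behavior can be pictured as automatically prepending a path filter derived from where you run the command. A minimal sketch in Python — `default_path_filter` is a hypothetical helper for illustration, not part of the tool:

```python
import os

def default_path_filter(cwd: str, project_root: str) -> list[str]:
    """Derive an implicit path filter from the current working directory.

    Illustrative only: approximates scoping results to the cwd,
    expressed relative to the project root.
    """
    rel = os.path.relpath(cwd, project_root)
    if rel == ".":
        return []          # at the root: no implicit filter, search everything
    return [f"{rel}/**"]   # below the root: scope to this subtree

# Running `ccc search` from <root>/src/utils would implicitly scope like:
print(default_path_filter("/repo/src/utils", "/repo"))  # ['src/utils/**']
print(default_path_filter("/repo", "/repo"))            # []
```

Passing `--path` explicitly replaces this implicit scope.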
Configuration lives in two YAML files, both created automatically by `ccc init`.
Shared across all projects. Controls the embedding model and environment variables for the daemon.
```yaml
embedding:
  provider: sentence-transformers   # or "litellm"
  model: sentence-transformers/all-MiniLM-L6-v2
  device: mps                       # optional: cpu, cuda, mps (auto-detected if omitted)
envs:                               # extra environment variables for the daemon
  OPENAI_API_KEY: your-key          # only needed if not already in your shell environment
```

Note: The daemon inherits your shell environment. If an API key (e.g. `OPENAI_API_KEY`) is already set as an environment variable, you don't need to duplicate it in `envs`. The `envs` field is only for values that aren't in your environment.
Per-project. Controls which files to index.
```yaml
include_patterns:
  - "**/*.py"
  - "**/*.js"
  - "**/*.ts"
  - "**/*.rs"
  - "**/*.go"
  # ... (sensible defaults for 28+ file types)
exclude_patterns:
  - "**/.*"            # hidden directories
  - "**/__pycache__"
  - "**/node_modules"
  - "**/dist"
  # ...
language_overrides:
  - ext: inc           # treat .inc files as PHP
    lang: php
```

`.cocoindex_code/` is automatically added to `.gitignore` during init.
By default, a local SentenceTransformers model (`sentence-transformers/all-MiniLM-L6-v2`) is used — no API key required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.
The `envs` entries below are only needed if the key isn't already in your shell environment — the daemon inherits your environment automatically.
Ollama (Local)

```yaml
embedding:
  model: ollama/nomic-embed-text
```

Set `OLLAMA_API_BASE` in `envs:` if your Ollama server is not at `http://localhost:11434`.

OpenAI

```yaml
embedding:
  model: text-embedding-3-small
envs:
  OPENAI_API_KEY: your-api-key
```

Azure OpenAI

```yaml
embedding:
  model: azure/your-deployment-name
envs:
  AZURE_API_KEY: your-api-key
  AZURE_API_BASE: https://your-resource.openai.azure.com
  AZURE_API_VERSION: "2024-06-01"
```

Gemini

```yaml
embedding:
  model: gemini/gemini-embedding-001
envs:
  GEMINI_API_KEY: your-api-key
```

Mistral

```yaml
embedding:
  model: mistral/mistral-embed
envs:
  MISTRAL_API_KEY: your-api-key
```

Voyage (Code-Optimized)

```yaml
embedding:
  model: voyage/voyage-code-3
envs:
  VOYAGE_API_KEY: your-api-key
```

Cohere

```yaml
embedding:
  model: cohere/embed-v4.0
envs:
  COHERE_API_KEY: your-api-key
```

AWS Bedrock

```yaml
embedding:
  model: bedrock/amazon.titan-embed-text-v2:0
envs:
  AWS_ACCESS_KEY_ID: your-access-key
  AWS_SECRET_ACCESS_KEY: your-secret-key
  AWS_REGION_NAME: us-east-1
```

Nebius

```yaml
embedding:
  model: nebius/BAAI/bge-en-icl
envs:
  NEBIUS_API_KEY: your-api-key
```

Any LiteLLM-supported model works. When using a LiteLLM model, set `provider: litellm` (or omit `provider` — LiteLLM is the default for non-sentence-transformers models).
Set `provider: sentence-transformers` and use any SentenceTransformers model (no API key required).

Example — general-purpose text model:

```yaml
embedding:
  provider: sentence-transformers
  model: nomic-ai/nomic-embed-text-v1.5
```

GPU-optimized code retrieval:

`nomic-ai/CodeRankEmbed` delivers significantly better code retrieval than the default model. It has 137M parameters, requires ~1 GB of VRAM, and has an 8192-token context window.

```yaml
embedding:
  provider: sentence-transformers
  model: nomic-ai/CodeRankEmbed
```

Note: Switching models requires re-indexing your codebase (`ccc reset && ccc index`) since the vector dimensions differ.
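The re-indexing requirement follows from how vector search works: query and chunk embeddings are compared with a fixed-dimension similarity measure, so vectors produced by different models aren't comparable. A quick cosine-similarity sketch — illustrative only, not the tool's internals:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-dimension embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ -- re-index required")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
# Mixing a 384-dim index (all-MiniLM-L6-v2) with a query vector from a
# hypothetical 768-dim model fails outright:
try:
    cosine_similarity([0.1] * 384, [0.1] * 768)
except ValueError as e:
    print(e)  # embedding dimensions differ -- re-index required
```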
| Language | Aliases | File Extensions |
|---|---|---|
| c | | .c |
| cpp | c++ | .cpp, .cc, .cxx, .h, .hpp |
| csharp | csharp, cs | .cs |
| css | | .css, .scss |
| dtd | | .dtd |
| fortran | f, f90, f95, f03 | .f, .f90, .f95, .f03 |
| go | golang | .go |
| html | | .html, .htm |
| java | | .java |
| javascript | js | .js |
| json | | .json |
| kotlin | | .kt, .kts |
| lua | | .lua |
| markdown | md | .md, .mdx |
| pascal | pas, dpr, delphi | .pas, .dpr |
| php | | .php |
| python | | .py |
| r | | .r |
| ruby | | .rb |
| rust | rs | .rs |
| scala | | .scala |
| solidity | | .sol |
| sql | | .sql |
| swift | | .swift |
| toml | | .toml |
| tsx | | .tsx |
| typescript | ts | .ts |
| xml | | .xml |
| yaml | | .yaml, .yml |
Some Python installations (e.g. the one pre-installed on macOS) ship with a SQLite library that doesn't enable extensions.
macOS fix: Install Python through Homebrew:

```bash
brew install python3
```

Then re-install cocoindex-code (see Get Started for install options):

Using pipx:

```bash
pipx install cocoindex-code   # first install
pipx upgrade cocoindex-code   # upgrade
```

Using uv (install or upgrade):

```bash
uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
```

If you previously configured cocoindex-code via environment variables, the `cocoindex-code` MCP command still reads them and auto-migrates to YAML settings on first run. We recommend switching to YAML settings for new setups.
| Environment Variable | YAML Equivalent |
|---|---|
| `COCOINDEX_CODE_EMBEDDING_MODEL` | `embedding.model` in `global_settings.yml` |
| `COCOINDEX_CODE_DEVICE` | `embedding.device` in `global_settings.yml` |
| `COCOINDEX_CODE_ROOT_PATH` | Run `ccc init` in your project root instead |
| `COCOINDEX_CODE_EXCLUDED_PATTERNS` | `exclude_patterns` in project `settings.yml` |
| `COCOINDEX_CODE_EXTRA_EXTENSIONS` | `include_patterns` + `language_overrides` in project `settings.yml` |
CocoIndex is an ultra-efficient indexing engine that also works on large codebases at enterprise scale. In enterprise scenarios, sharing indexes with teammates is much more efficient when repositories are large or numerous. We also offer advanced features designed for enterprise users, such as branch dedupe.
If you need help with remote setup, please email our maintainer at linghua@cocoindex.io — we're happy to help!
Apache-2.0