nguyenchiencong/local-docs-mcp

Local Docs MCP

A modular semantic search system with MCP (Model Context Protocol) integration for searching local documentation. It is a Retrieval-Augmented Generation (RAG) system: it indexes your documents as chunks for knowledge retrieval and exposes semantic search over them to AI assistants through MCP.

Key Features

| Core Capability | Technical Implementation |
| --- | --- |
| Document Indexing | A full indexing pipeline that processes documents from the docs/ directory, chunks them, and creates embeddings using Ollama. With Cocoindex, only the parts that have changed are updated: when users edit or add content, the system detects the changes and re-indexes selectively. |
| Vector Database | Uses Qdrant to store document embeddings for semantic search. |
| Retrieval | The search service provides semantic search with multiple strategies (semantic, hybrid, and filtered). |
| MCP Integration | The MCP server exposes these retrieval capabilities to AI assistants. |
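
The selective-update behaviour in the Document Indexing row can be illustrated with a content-hash comparison. This is only a sketch of the general idea and is not taken from Cocoindex; `fingerprint` and `diff_snapshots` are hypothetical names, and the in-memory dictionaries stand in for files under docs/.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable content hash for one document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: content} snapshots and report what needs reindexing."""
    prev_hashes = {path: fingerprint(body) for path, body in previous.items()}
    curr_hashes = {path: fingerprint(body) for path, body in current.items()}
    return {
        "added": sorted(p for p in curr_hashes if p not in prev_hashes),
        "changed": sorted(
            p for p in curr_hashes
            if p in prev_hashes and curr_hashes[p] != prev_hashes[p]
        ),
        "removed": sorted(p for p in prev_hashes if p not in curr_hashes),
    }


# Hypothetical snapshots: b.md was edited and c.md was added.
before = {"docs/a.md": "alpha", "docs/b.md": "beta"}
after = {"docs/a.md": "alpha", "docs/b.md": "beta v2", "docs/c.md": "gamma"}
result = diff_snapshots(before, after)
print(result)
```

Only the paths listed under "added" and "changed" would need new embeddings; unchanged documents are skipped.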

Quick Start

Installation

  1. Clone and set up the project:
git clone git@github.com:nguyenchiencong/local-docs-mcp.git
cd local-docs-mcp
  2. Start required services:
# Start Qdrant
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

# Make sure Ollama is running with the embedding model
ollama pull hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:F16

# Set up Postgres for Cocoindex
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d
  3. Configure environment:
# Copy the example file, then edit .env with your specific configuration
cp .env.example .env

# Don't forget to set up your .cocoignore file
cp .cocoignore.example .cocoignore
  4. Install dependencies:
uv sync

Usage

Before indexing, add your documents to the docs folder.

To index your documents:

uv run python -m src.indexing.main_flow
# To run the force reindex utility
uv run python -m src.indexing.force_reindex

To start the MCP server:

uv run python -m src.mcp_server.server

Make the CLI available on your PATH

If you want to run local-docs-mcp from any directory:

  • Windows (PowerShell):

    setx PATH "path\to\local-docs-mcp\.venv\Scripts;$($env:Path)"
  • Linux/macOS (bash shown; for zsh, append to ~/.zshrc instead):

    echo 'export PATH="/path/to/local-docs-mcp/.venv/bin:$PATH"' >> ~/.bashrc

To run MCP tools directly from the CLI (one-off calls):

# Start the server (default behavior)
local-docs-mcp

# Run a semantic search once and exit
local-docs-mcp semantic_search --query "vector search overview" --limit 5

# Run hybrid search
local-docs-mcp hybrid_search --query "async await" --semantic-weight 0.7 --limit 5

# Run a filtered search with metadata (JSON object)
local-docs-mcp search_with_metadata_filter --query "UI tutorial" --metadata-filter '{"filename": "ui.md"}' --limit 5

# Retrieve a specific document by ID
local-docs-mcp document_retrieval --document-id "doc-123"

# Fetch collection info
local-docs-mcp get_collection_info --json
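
When calling the CLI from a script, it can help to build the argument list programmatically rather than concatenating a shell string. The following is a minimal sketch that uses only the flags shown above; `build_search_command` is a hypothetical helper and not part of the project.

```python
import shlex


def build_search_command(query: str, limit: int = 5, tool: str = "semantic_search") -> list[str]:
    """Build the argument list for a one-off search call (no shell involved)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return ["local-docs-mcp", tool, "--query", query, "--limit", str(limit)]


cmd = build_search_command("vector search overview", limit=5)
# shlex.join quotes the query so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

Once local-docs-mcp is on your PATH, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. The output format is not documented here, so parse it with care.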

Configuration

System Settings

All configuration is managed in pyproject.toml under the [tool.local-docs] section:

[tool.local-docs]
# Qdrant configuration
qdrant_url = "http://localhost:6334"
qdrant_collection = "local-docs-collection"

# Ollama configuration
ollama_url = "http://localhost:11434"
ollama_model = "hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:F16"
embedding_dimension = 1024

# Document configuration
docs_directory = "docs"
supported_extensions = [".md", ".rst", ".txt"]

# Chunking configuration
chunk_size = 1200
chunk_overlap = 200

# Search configuration
search_limit = 10
similarity_threshold = 0.15
search_hnsw_ef = 256
hybrid_semantic_weight = 0.85
mmr_lambda = 0.75
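
To show how chunk_size and chunk_overlap interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which the configuration above does not state, and `chunk_text` is a hypothetical function rather than the project's chunker.

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(3000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With the defaults, each window starts 1000 characters after the previous one, so consecutive chunks share 200 characters of context.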

Environment Variables: Override any setting with LOCAL_DOCS_* environment variables:

export LOCAL_DOCS_SEARCH_LIMIT=20
export LOCAL_DOCS_OLLAMA_MODEL="different-model"
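
One way such overrides could be resolved is sketched below. The naming rule (prefix plus the upper-cased key) is inferred from the two examples above, and the type coercion is an assumption; `resolve_settings` is a hypothetical helper, not the project's loader.

```python
import os
from typing import Any, Mapping

# A subset of the defaults from [tool.local-docs], copied here for illustration.
DEFAULTS = {
    "search_limit": 10,
    "similarity_threshold": 0.15,
    "ollama_model": "hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:F16",
}


def resolve_settings(
    defaults: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
    prefix: str = "LOCAL_DOCS_",
) -> dict[str, Any]:
    """Return defaults with any LOCAL_DOCS_* overrides applied."""
    settings = dict(defaults)
    for key, default in defaults.items():
        raw = environ.get(prefix + key.upper())
        if raw is None:
            continue
        # Coerce the string from the environment to the type of the default.
        if isinstance(default, bool):
            settings[key] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif isinstance(default, (int, float)):
            settings[key] = type(default)(raw)
        else:
            settings[key] = raw
    return settings


fake_env = {"LOCAL_DOCS_SEARCH_LIMIT": "20", "LOCAL_DOCS_OLLAMA_MODEL": "different-model"}
resolved = resolve_settings(DEFAULTS, fake_env)
print(resolved)
```

List-valued settings such as supported_extensions are left out of this sketch because the expected environment format for them is not documented.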

MCP Client Setup

Add this to your MCP client configuration (e.g., Claude Code):

{
  "mcpServers": {
    "local-docs-mcp": {
      "command": "uv",
      "args": ["run", "--project", "/path/to/local-docs-mcp", "-m", "src.mcp_server.server"]
    }
  }
}
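
If you prefer to generate this entry instead of editing JSON by hand, a short script avoids quoting mistakes, for example with Windows paths that contain backslashes. This is a sketch only; `mcp_client_entry` is a hypothetical helper, and where the resulting JSON must be saved depends on your MCP client.

```python
import json


def mcp_client_entry(project_dir: str) -> dict:
    """Build the mcpServers entry shown above for a given project directory."""
    return {
        "mcpServers": {
            "local-docs-mcp": {
                "command": "uv",
                "args": ["run", "--project", project_dir, "-m", "src.mcp_server.server"],
            }
        }
    }


config = mcp_client_entry("/path/to/local-docs-mcp")
# json.dumps handles escaping, so the output is always valid JSON.
print(json.dumps(config, indent=2))
```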

MCP Tools

The MCP server exposes the following semantic search tools to AI assistants:

| Tool | Purpose | Parameters | Example Prompt |
| --- | --- | --- | --- |
| semantic_search | Perform semantic search on indexed documents. Finds content based on meaning and context rather than exact keywords. | query (required string), limit (optional number, default: 10), min_similarity_score (optional number, default: 0.0) | "Find information about error handling patterns in the codebase" |
| hybrid_search | Combine semantic search with keyword matching. Useful when exact terminology matters alongside conceptual meaning. | query (required string), semantic_weight (optional number, default: 0.7), limit (optional number, default: 10), min_similarity_score (optional number, default: 0.0) | "Search for 'async await' patterns and asynchronous programming concepts" |
| document_retrieval | Retrieve complete document by ID. Use this when you need the full context of a specific document found in search results. | document_id (required string) | "Get the full document for ID 'doc_12345'" |
| search_with_metadata_filter | Search with metadata constraints. Use this to narrow down search results by specific document properties. | query (required string), metadata_filter (optional object), limit (optional number, default: 10), min_similarity_score (optional number, default: 0.0) | "Search for API documentation in files with filename containing 'api'" |
| get_collection_info | Get information about the indexed document collection, including statistics and status. | none | "Show me collection statistics and indexing status" |
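
The role of semantic_weight in hybrid_search can be pictured as a weighted blend of two scores. The sketch below assumes a simple linear combination and applies min_similarity_score to the blended score; neither detail is confirmed by this README, and `hybrid_score` and `rank` are hypothetical names.

```python
def hybrid_score(semantic: float, keyword: float, semantic_weight: float = 0.7) -> float:
    """Blend a semantic score and a keyword score, both assumed to lie in [0, 1]."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return semantic_weight * semantic + (1.0 - semantic_weight) * keyword


def rank(
    candidates: list[tuple[str, float, float]],
    semantic_weight: float = 0.7,
    limit: int = 10,
    min_similarity_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Score (doc_id, semantic, keyword) triples, drop weak hits, return the best first."""
    scored = [(doc_id, hybrid_score(s, k, semantic_weight)) for doc_id, s, k in candidates]
    kept = [item for item in scored if item[1] >= min_similarity_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:limit]


# Made-up scores for illustration only.
candidates = [("doc-a", 0.9, 0.1), ("doc-b", 0.6, 0.9), ("doc-c", 0.2, 0.0)]
print([doc_id for doc_id, _ in rank(candidates)])
print([doc_id for doc_id, _ in rank(candidates, semantic_weight=1.0)])
```

With the default weight, doc-b ranks first because its strong keyword match outweighs doc-a's higher semantic score; with a weight of 1.0 the ranking follows the semantic score alone.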

Use Cases

Documentation Research:

  • "What are signals and how do they work in Godot?"
  • "Find tutorials about character controllers"
  • "Explain the difference between KinematicBody and RigidBody"

Problem Solving:

  • "How do I fix 'node not found' errors?"
  • "What are the best practices for performance optimization?"
  • "Search for debugging techniques in Godot"

Learning Paths:

  • "I'm a beginner, show me getting started content"
  • "What should I learn after basic GDScript?"
  • "Find intermediate tutorials about physics"

Specific Searches:

  • "Show me the top 5 most relevant results about animations"
  • "Find only tutorial files about UI design"
  • "Look for performance optimization guides"

Development

Running Tests

uv run pytest tests/

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Related Projects
