
Add code search tool with RAG capabilities #34

Closed
neubig wants to merge 7 commits into main from add-code-search-tool

neubig commented Dec 22, 2024

Description

This PR adds a new code search tool that uses Retrieval Augmented Generation (RAG) to enable semantic code search across repositories.

Features

  • Semantic code search using sentence transformers (embedding model configurable via an environment variable)
  • Fast similarity search using FAISS
  • Support for indexing any git repository with configurable file extensions
  • Save/load functionality for search indices
  • Comprehensive error handling and test coverage
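
To illustrate the retrieval step behind the first two features, here is a minimal, dependency-free sketch. It is not the PR's implementation: `embed` is a toy bag-of-words stand-in for a sentence-transformer model, and `search` does a brute-force inner-product ranking over normalized vectors, which is the kind of similarity lookup a FAISS index would perform far more efficiently. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-transformer: an L2-normalized bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {token: c / norm for token, c in counts.items()}


def inner_product(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def build_index(docs):
    """Map each document id to its embedding (hypothetical in-memory index)."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def search(index, query, k=5):
    """Return the k (doc_id, score) pairs with the highest inner product."""
    q = embed(query)
    scored = [(inner_product(q, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

Because the vectors are normalized, the inner product equals cosine similarity, so scores fall between 0 and 1 for this toy embedding.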

Implementation Details

  • Added new dependencies: PyTorch CPU, Sentence Transformers, FAISS
  • New module structure:
    • code_search/core.py: Core indexing and search functionality
    • code_search/tools.py: High-level tool functions
    • Tests in tests/test_code_search.py
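
As a rough idea of what the indexing side has to do before embedding anything, the sketch below gathers files from a repository checkout by extension. It is an assumption about the approach, not code from `code_search/core.py`; the function name `collect_source_files` and the choice to skip the `.git` directory are hypothetical.

```python
import os
from pathlib import Path


def collect_source_files(repo_path, extensions=(".py",)):
    """Return sorted paths under repo_path whose suffix is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    found = []
    for root, dirs, files in os.walk(repo_path):
        # Prune the .git directory in place so os.walk does not descend into it.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            if Path(name).suffix.lower() in wanted:
                found.append(Path(root) / name)
    return sorted(found)
```

Each returned path could then be read, split into chunks, and embedded for the index.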

Example Usage

# Initialize search for a repository
result = initialize_code_search(
    repo_path="/path/to/repo",
    save_dir="/path/to/save",
    extensions=[".py"],  # optional
    embedding_model="BAAI/bge-base-en-v1.5"  # optional
)

# Search code
result = search_code(
    save_dir="/path/to/save",
    query="function that handles HTTP requests",
    k=5  # number of results
)

Testing

  • Added comprehensive unit tests
  • Tested on the openhands-aci repository itself with good results
  • Example search results for "code that handles file editing":
    File: openhands_aci/editor/__init__.py
    Score: 0.727
    ...
    

Notes

  • The embedding model can be configured through an environment variable
  • All tests are passing
  • Documentation included in code
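
The PR does not name the environment variable, so the following is only a sketch of how such a lookup is commonly written. The variable name `CODE_SEARCH_EMBEDDING_MODEL` is hypothetical, and using the model from the usage example as the default is an assumption.

```python
import os

# Assumption: the model shown in the usage example is treated as the default.
DEFAULT_EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"
# Hypothetical variable name; the PR does not state the real one.
ENV_VAR = "CODE_SEARCH_EMBEDDING_MODEL"


def resolve_embedding_model(explicit=None, environ=None):
    """Pick the model: explicit argument first, then env var, then default."""
    env = os.environ if environ is None else environ
    return explicit or env.get(ENV_VAR) or DEFAULT_EMBEDDING_MODEL
```

With this ordering, an `embedding_model` argument passed by the caller would override the environment, which in turn overrides the default.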

neubig changed the title from "[AI Generated] Add code search tool with RAG capabilities" to "Add code search tool with RAG capabilities" on Dec 22, 2024
openhands-agent and others added 4 commits December 22, 2024 16:43
- Remove binary index files and add to .gitignore
- Reorganize dependencies into optional groups:
  - code-search: sentence-transformers and faiss-cpu
  - pytorch-cpu: PyTorch CPU version
  - pytorch: Default PyTorch version

3 participants