
Easy RAG

A modern, containerized Retrieval-Augmented Generation (RAG) system built with Node.js, LangChain, ChromaDB, and Ollama. Designed for scalable, local-first knowledge base search and LLM-powered question answering.

You stay fully in control of your data: the app uses a locally served vector database and locally run LLM models.

🚀 Introduction

Easy RAG is a backend service that enables you to upload plain text knowledge bases, embed them into a vector store (ChromaDB), and query them using LLMs (via Ollama). It is optimized for local development and containerized deployment, with a focus on extensibility and transparency.

🥅 Goals

  • Teach myself RAG through implementation.
  • Offer a fully-assembled working solution* for anyone else wanting to learn.

* Tutorials on the web (including those in the LangChain and Chroma docs) become outdated quickly, leading to frustration and wasted time for beginners.

πŸ—οΈ System Design & Architecture

graph TD;
    A[User Uploads .txt File] --> B[API Server - Express]
    B --> C[Chunk & Embed - LangChain, Ollama]
    C --> D[Vector Store - ChromaDB]
    D -->|Query| B
    B -->|LLM Augmentation| E[Ollama LLM]

Key Components:

  • API Layer (Express): Handles file uploads, triggers vector store initialization, and exposes endpoints for querying.
  • Vector Store (ChromaDB): Stores document embeddings for fast similarity search.
  • Embeddings (Ollama): Generates vector representations of text using local LLMs.
  • Chunking & Batching: Large files are split and processed in batches for reliability and performance.

Data Flow:

  1. Upload: The user uploads a .txt file via the API; it is saved as kb.txt.
  2. Chunk & Embed: The file is split into overlapping chunks, each chunk is embedded using Ollama.
  3. Store: Embeddings are stored in ChromaDB, organized by collection.
  4. Query: Users can query the vector store for relevant knowledge, with results augmented by LLMs.
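
The chunk-embed-store steps above can be sketched in a few lines of LangChain JS. The package names, class names, and chunk sizes below are illustrative assumptions based on current LangChain documentation, not necessarily what this repo uses internally:

import fs from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OllamaEmbeddings } from "@langchain/ollama";
import { Chroma } from "@langchain/community/vectorstores/chroma";

// Split the uploaded knowledge base into overlapping chunks.
const text = await fs.readFile("kb.txt", "utf8");
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
const docs = await splitter.createDocuments([text]);

// Embed each chunk with a local Ollama model and persist to ChromaDB.
const embeddings = new OllamaEmbeddings({
  model: "nomic-embed-text:v1.5",
  baseUrl: "http://localhost:11434",
});
await Chroma.fromDocuments(docs, embeddings, {
  collectionName: "book-alice-wonderland",
  url: "http://localhost:8000",
});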

Containerization:

  • All services (API, ChromaDB, Ollama) run as Docker containers, with persistent volumes for data and models.
  • Inter-service communication is handled via Docker networking.
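
In practice, Docker networking means the API container reaches its peers by service name rather than localhost. A minimal sketch, assuming the service names chromadb and ollama and an OLLAMA_URL variable (only CHROMA_URL appears in this repo's troubleshooting notes):

// Resolve peer services from the environment, falling back to Docker service names.
// The names "chromadb" and "ollama" and the OLLAMA_URL variable are assumptions.
const CHROMA_URL = process.env.CHROMA_URL ?? "http://chromadb:8000";
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://ollama:11434";

// Fail fast at startup if ChromaDB is unreachable (heartbeat path varies by Chroma version).
const res = await fetch(`${CHROMA_URL}/api/v1/heartbeat`);
if (!res.ok) throw new Error(`ChromaDB unreachable at ${CHROMA_URL}`);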

⚡ Installation & Setup

Prerequisites

  • Docker and Docker Compose (for the containerized setup)
  • Node.js and npm (for local development)
  • Git

1. Clone the Repository

git clone https://github.com/anu-rock/easy-rag.git
cd easy-rag

2. Install Dependencies (for local dev)

npm install

3. Start with Docker Compose

docker-compose up --build
  • This will start:
    • The API server (port 3000)
    • ChromaDB (port 8000, data in chroma_data/)
    • Ollama (port 11434, models in ollama_data/)
  • Models (llama3.1:8b, nomic-embed-text:v1.5) are auto-downloaded if not present.

🛠️ Usage

Tip

For a better experience interacting with API endpoints, use Bruno; the codebase includes a ready-to-use Bruno collection (api/docs). Import it into Bruno to easily test and explore all available endpoints.

Screenshot of Bruno showing an example request to the /api/ask endpoint.

1. Upload Knowledge Base

Endpoint: POST /api/upload
Description: Upload a plain text file (max 50 MB). Only .txt files are accepted.

curl -X POST http://localhost:3000/api/upload \
    -H "Content-Type: multipart/form-data" \
    -F "file=@yourfile.txt"
  • The file is saved as kb.txt in the project root.

Example: Uploading a Book from Project Gutenberg

Download "Alice's Adventures in Wonderland":

curl -O https://www.gutenberg.org/files/11/11-0.txt
curl -X POST http://localhost:3000/api/upload \
    -H "Content-Type: multipart/form-data" \
    -F "file=@11-0.txt"

Example: Uploading a Large Article Series

Suppose you have a long-form article or a concatenated series (e.g., a Wikipedia dump or a multi-part blog series). Save it as series.txt and upload:

curl -X POST http://localhost:3000/api/upload \
    -H "Content-Type: multipart/form-data" \
    -F "file=@series.txt"

2. Initialize Vector Store

Endpoint: POST /api/init-kb
Description: Triggers chunking, embedding, and storage of the uploaded knowledge base.
Returns: Server-Sent Events (SSE) stream with progress updates.

curl -X POST http://localhost:3000/api/init-kb \
    -H "Content-Type: application/json" \
    -d '{"collectionName": "book-alice-wonderland"}'
  • The collectionName parameter is required in the JSON body to specify where the embeddings will be stored.
  • Progress and errors are streamed in real time.
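
Because this endpoint expects a POST with a JSON body, the browser EventSource API doesn't fit; from Node 18+ you can read the SSE stream directly off the fetch response body. A minimal sketch that prints progress events verbatim:

const res = await fetch("http://localhost:3000/api/init-kb", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ collectionName: "book-alice-wonderland" }),
});

// res.body is a ReadableStream; iterate its chunks as they arrive.
const decoder = new TextDecoder();
for await (const chunk of res.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}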

3. Query Knowledge Base

Endpoint: GET /api/ask
Description: Query the vector store and receive LLM-augmented answers.

curl -G http://localhost:3000/api/ask \
    --data-urlencode "collectionName=book-alice-wonderland" \
    --data-urlencode "query=What is Alice doing in Wonderland?"
  • The response contains relevant context from your uploaded knowledge base, along with an answer generated by the LLM.
  • Use this endpoint to interactively search and retrieve information from your documents.
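
The same query can be issued from Node. The response shape below is an assumption; swap res.json() for res.text() if the server returns plain text:

const params = new URLSearchParams({
  collectionName: "book-alice-wonderland",
  query: "What is Alice doing in Wonderland?",
});

// Assumes a JSON response body.
const res = await fetch(`http://localhost:3000/api/ask?${params}`);
console.log(await res.json());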

🧩 Project Structure

api/                # API route handlers
api/docs/           # Bruno collection that doubles as API documentation
lib/                # Core logic (vector store, chunking, RAG, etc.)
chroma_data/        # ChromaDB persistent data
ollama_data/        # Ollama model cache
kb.txt              # Uploaded knowledge base
.vscode/            # Debugging and dev configs

🐳 Docker Notes

  • All data and models are persisted on the host for fast restarts.
  • To rebuild with fresh dependencies:
    docker-compose build --no-cache
  • To stop all services:
    docker-compose down

🧪 Testing

  • Unit tests are written with Vitest.
  • Run all tests:
    npm test
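
As a flavor of what a test looks like here, below is a minimal Vitest sketch; chunkText is a hypothetical helper written inline for illustration, not an export of lib/:

import { describe, expect, it } from "vitest";

// Hypothetical chunker: fixed-size windows with overlap, for illustration only.
function chunkText(text, size, overlap) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

describe("chunkText", () => {
  it("produces overlapping chunks", () => {
    const chunks = chunkText("abcdefghij", 4, 2);
    expect(chunks[0]).toBe("abcd");
    expect(chunks[1]).toBe("cdef");
  });
});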

🪜 Potential Roadmap

  • Support for more data source formats: PDF, EPUB, DOCX, HTML.
  • Support for larger data source files (currently limited to 50 MiB).
  • More integrations: embedding models, vector stores, LLMs.
  • UI dashboard:
    • Chat interface (with history)
    • Configuration management: choice of models and store
    • Knowledge base management

Basically, an open-source RAG-as-a-service app similar to Verba 🤔

🤝 Contributing

  • PRs and issues are welcome!
  • Please lint and test your code before submitting. I use the Prettier VS Code extension with its out-of-the-box config.

🛟 Troubleshooting

  • ChromaDB connection errors: Ensure ChromaDB is running and accessible at the correct URL (CHROMA_URL).
  • Ollama/model errors: Check that the required models are downloaded and Ollama is running.
  • File upload issues: Only .txt files up to 50 MB are accepted.

🖥️ System Requirements

You will need a decently spec'd machine to run this. I developed and tested it on an Apple M3 Pro MacBook Pro with 36 GiB of memory. Even then, RAG searches (inference) took north of 2 minutes, and initializing a knowledge base roughly 7K lines long took 5-10 minutes.

Non-containerized performance will likely be better: install ChromaDB and Ollama directly on your machine and point the main Node app at them for lower latency.

📄 License

See LICENSE.
