A modern, containerized Retrieval-Augmented Generation (RAG) system built with Node.js, LangChain, ChromaDB, and Ollama. Designed for scalable, local-first knowledge base search and LLM-powered question answering.
You are fully in control of your data. This app uses a locally served vector database and locally run LLMs.
Easy RAG is a backend service that enables you to upload plain text knowledge bases, embed them into a vector store (ChromaDB), and query them using LLMs (via Ollama). It is optimized for local development and containerized deployment, with a focus on extensibility and transparency.
- Teach myself RAG through implementation.
- Offer a fully-assembled working solution* for anyone else wanting to learn.
* Tutorials on the web (including ones in LangChain and Chroma docs) become outdated quickly, leading to frustration and wasted time for beginners.
```mermaid
graph TD;
    A[User Uploads .txt File] --> B[API Server - Express]
    B --> C[Chunk & Embed - LangChain, Ollama]
    C --> D[Vector Store - ChromaDB]
    D -->|Query| B
    B -->|LLM Augmentation| E[Ollama LLM]
```
Key Components:
- API Layer (Express): Handles file uploads, triggers vector store initialization, and exposes endpoints for querying.
- Vector Store (ChromaDB): Stores document embeddings for fast similarity search.
- Embeddings (Ollama): Generates vector representations of text using locally served embedding models.
- Chunking & Batching: Large files are split and processed in batches for reliability and performance.
Data Flow:
- Upload: User uploads a `.txt` file (`kb.txt`) via the API.
- Chunk & Embed: The file is split into overlapping chunks; each chunk is embedded using Ollama (see the sketch after this list).
- Store: Embeddings are stored in ChromaDB, organized by collection.
- Query: Users can query the vector store for relevant knowledge, with results augmented by LLMs.
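Under the hood this is a standard LangChain ingestion pipeline. The real implementation lives in `lib/`; the sketch below only illustrates the Chunk & Embed and Store steps. The package names (`@langchain/textsplitters`, `@langchain/ollama`, `@langchain/community`), chunk sizes, URLs, and the `initKnowledgeBase` function name are assumptions for illustration and may differ from the repo's actual code.

```js
// Illustrative sketch only; the real implementation lives in lib/.
// Package names, chunk sizes, and URLs are assumptions.
import { readFile } from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OllamaEmbeddings } from "@langchain/ollama";
import { Chroma } from "@langchain/community/vectorstores/chroma";

async function initKnowledgeBase(collectionName) {
  // 1. Read the uploaded knowledge base from the project root.
  const text = await readFile("kb.txt", "utf-8");

  // 2. Split into overlapping chunks (sizes here are placeholders).
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const docs = await splitter.createDocuments([text]);

  // 3. Embed each chunk with Ollama and persist the vectors in ChromaDB.
  const embeddings = new OllamaEmbeddings({
    model: "nomic-embed-text:v1.5",
    baseUrl: "http://localhost:11434",
  });
  return Chroma.fromDocuments(docs, embeddings, {
    collectionName,
    url: "http://localhost:8000",
  });
}
```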
Containerization:
- All services (API, ChromaDB, Ollama) run as Docker containers, with persistent volumes for data and models.
- Inter-service communication is handled via Docker networking.
```bash
git clone https://github.com/anu-rock/easy-rag.git
cd easy-rag
npm install
docker-compose up --build
```

- This will start:
  - The API server (port 3000)
  - ChromaDB (port 8000, data in `chroma_data/`)
  - Ollama (port 11434, models in `ollama_data/`)
- Models (`llama3.1:8b`, `nomic-embed-text:v1.5`) are auto-downloaded if not present.
Tip
For a better experience interacting with the API endpoints, use Bruno: the codebase includes a ready-to-use Bruno collection (`api/docs`). Import it into Bruno to easily test and explore all available endpoints.
Endpoint: POST /api/upload
Description: Upload a plain text file (max 50MB). Only .txt files are accepted.
```bash
curl -X POST http://localhost:3000/api/upload \
  -H "Content-Type: multipart/form-data" \
  -F "file=@yourfile.txt"
```

- The file is saved as `kb.txt` in the project root.
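If you prefer to call the endpoint from Node rather than curl, a sketch like the following should work on Node 18+ (built-in `fetch`, `FormData`, and `Blob`). The file name is a placeholder; the `file` field name matches the curl example above.

```js
// Programmatic upload from Node 18+ (ESM, top-level await).
import { readFile } from "node:fs/promises";

const content = await readFile("yourfile.txt");
const form = new FormData();
form.append("file", new Blob([content], { type: "text/plain" }), "yourfile.txt");

const res = await fetch("http://localhost:3000/api/upload", {
  method: "POST",
  body: form, // fetch sets the multipart boundary automatically
});
console.log(res.status, await res.text());
```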
Download "Alice's Adventures in Wonderland":
```bash
curl -O https://www.gutenberg.org/files/11/11-0.txt
curl -X POST http://localhost:3000/api/upload \
  -H "Content-Type: multipart/form-data" \
  -F "file=@11-0.txt"
```

Suppose you have a long-form article or a concatenated series (e.g., a Wikipedia dump or a multi-part blog series). Save it as `series.txt` and upload:

```bash
curl -X POST http://localhost:3000/api/upload \
  -H "Content-Type: multipart/form-data" \
  -F "file=@series.txt"
```

Endpoint: POST /api/init-kb
Description: Triggers chunking, embedding, and storage of the uploaded knowledge base.
Returns: Server-Sent Events (SSE) stream with progress updates.
```bash
curl -X POST http://localhost:3000/api/init-kb \
  -H "Content-Type: application/json" \
  -d '{"collectionName": "book-alice-wonderland"}'
```

- The `collectionName` parameter is required in the JSON body to specify where the embeddings will be stored.
- Progress and errors are streamed in real time.
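To consume the SSE progress stream programmatically, you can read the response body as a stream. This sketch (Node 18+) simply echoes the raw event lines; the exact payload format is whatever the API emits.

```js
// Kick off knowledge-base initialization and print the SSE progress stream (Node 18+).
const res = await fetch("http://localhost:3000/api/init-kb", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ collectionName: "book-alice-wonderland" }),
});

const decoder = new TextDecoder();
for await (const chunk of res.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}
```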
Endpoint: GET /api/ask
Description: Query the vector store and receive LLM-augmented answers.
```bash
curl -G http://localhost:3000/api/ask \
  --data-urlencode "collectionName=book-alice-wonderland" \
  --data-urlencode "query=What is Alice doing in Wonderland?"
```

- The response contains relevant context from your uploaded knowledge base, along with an answer generated by the LLM.
- Use this endpoint to interactively search and retrieve information from your documents.
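For completeness, the same query issued from Node (18+); how the response body is structured is up to the API, so it is printed verbatim here.

```js
// Same query as the curl example, issued from Node 18+.
const params = new URLSearchParams({
  collectionName: "book-alice-wonderland",
  query: "What is Alice doing in Wonderland?",
});

const res = await fetch(`http://localhost:3000/api/ask?${params}`);
console.log(await res.text()); // the response shape is defined by the API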
```
api/          # API route handlers
api/docs/     # Bruno collection that doubles as API documentation
lib/          # Core logic (vector store, chunking, RAG, etc.)
chroma_data/  # ChromaDB persistent data
ollama_data/  # Ollama model cache
kb.txt        # Uploaded knowledge base
.vscode/      # Debugging and dev configs
```
- All data and models are persisted on the host for fast restarts.
- To rebuild with fresh dependencies: `docker-compose build --no-cache`
- To stop all services: `docker-compose down`
- Unit tests are written with Vitest.
- Run all tests: `npm test`
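For orientation, a Vitest unit test looks like the sketch below. The `chunkText` helper and its module path are hypothetical, used only to show the shape of a test; the repo's real tests target its own `lib/` modules.

```js
// Minimal Vitest sketch; `chunkText` and its import path are hypothetical.
import { describe, it, expect } from "vitest";
import { chunkText } from "../lib/chunking.js";

describe("chunkText", () => {
  it("splits long input into more than one overlapping chunk", () => {
    const chunks = chunkText("a".repeat(2500), { chunkSize: 1000, chunkOverlap: 200 });
    expect(chunks.length).toBeGreaterThan(1);
  });
});
```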
- Support for more data source formats: PDF, EPUB, DOCX, HTML.
- Support for larger data source files (currently limited to 50 MiB).
- More integrations: embedding models, vector stores, LLMs.
- UI dashboard:
- chat interface (with history)
- manage configuration - choice of models and store
- manage knowledge bases
Basically, an open-source RAG-as-a-service app similar to Verba.
- PRs and issues are welcome!
- Please lint and test your code before submitting. I use the Prettier VS Code extension with its out-of-the-box config.
- ChromaDB connection errors: Ensure ChromaDB is running and accessible at the correct URL (`CHROMA_URL`).
- Ollama/model errors: Check that the required models are downloaded and Ollama is running.
- File upload issues: Only `.txt` files up to 50MB are accepted.
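As a quick sanity check, the sketch below pings both backing services from Node 18+. The endpoint paths shown are illustrative and may differ across ChromaDB/Ollama versions.

```js
// Quick connectivity check for the backing services (Node 18+).
// Endpoint paths are illustrative and may vary by ChromaDB/Ollama version.
const checks = {
  chroma: "http://localhost:8000/api/v1/heartbeat", // ChromaDB heartbeat
  ollama: "http://localhost:11434/api/tags",        // lists locally available models
};

for (const [name, url] of Object.entries(checks)) {
  try {
    const res = await fetch(url);
    console.log(`${name}: HTTP ${res.status}`);
  } catch (err) {
    console.error(`${name}: unreachable (${err.message})`);
  }
}
```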
You will need a decently spec'd machine to run this. I developed and tested it on an Apple M3 Pro MBP with 36 GiB of memory. Even then, RAG searches (inferences) took north of 2 minutes, and initializing a knowledge base roughly 7K lines long took 5-10 minutes.
Non-containerized performance will likely be better: install ChromaDB and Ollama directly on your machine and point the main Node app at them for lower latency.
See LICENSE.
