A lightweight Retrieval-Augmented Generation (RAG) app to query competition rules using semantic search and a local Mistral LLM via Ollama.
- 📄 Extracts text from rulebook PDFs
- ✂️ Splits text into semantic chunks (~200 words)
- 🧠 Embeds the chunks using MiniLM via `sentence-transformers`
- 🗂️ Stores them in a FAISS vector index
- 💬 Retrieves relevant chunks and sends them to a local Mistral LLM for answering
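The chunking step above can be sketched as follows. This is a minimal illustration of splitting text into ~200-word chunks on paragraph boundaries; the function name and exact splitting rules are assumptions, not necessarily what `index_pdfs.py` does.

```python
def chunk_text(text, max_words=200):
    """Split text into chunks of at most max_words words,
    preferring to break on paragraph boundaries."""
    chunks, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Flush the running chunk if adding this paragraph would overflow it.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
        # A single paragraph longer than the budget is split hard.
        while len(current) > max_words:
            chunks.append(" ".join(current[:max_words]))
            current = current[max_words:]
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping chunks near 200 words balances retrieval precision (small chunks) against giving the LLM enough surrounding context (large chunks).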
```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Place your WRRC PDFs in a folder called `regulations_pdfs` in the project root:

```shell
mkdir regulations_pdfs
# add your PDFs here
```

Then build the index:

```shell
python index_pdfs.py
```

This script will:
- Extract and chunk text
- Embed chunks using MiniLM
- Create `faiss_index.idx` and `index_metadata.pkl`
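For intuition, FAISS's simplest exact index computes L2 distances between a query embedding and every stored chunk embedding. The sketch below reproduces that computation in pure NumPy with random vectors standing in for MiniLM embeddings; the real script would instead call the FAISS API (e.g. an `IndexFlatL2`) and persist it to `faiss_index.idx`.

```python
import numpy as np

# Random stand-ins for MiniLM chunk embeddings (MiniLM produces 384-dim vectors).
rng = np.random.default_rng(0)
chunk_embs = rng.normal(size=(100, 384)).astype("float32")

# A query embedding very close to chunk 42, simulating a relevant question.
query_emb = chunk_embs[42] + 0.01 * rng.normal(size=384).astype("float32")

# Squared L2 distance from the query to every stored chunk embedding,
# then the indices of the 3 closest chunks -- what index.search(q, k=3) returns.
dists = ((chunk_embs - query_emb) ** 2).sum(axis=1)
top_k = np.argsort(dists)[:3]
```

At this small scale a brute-force scan is fine; FAISS matters when the corpus grows to millions of vectors.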
In a separate terminal:
```shell
ollama serve
ollama run mistral
```

This will start the Ollama server and load the Mistral model (downloading it if needed).
```shell
python query_rag.py
```

Example:

```
Ask a question (or type 'exit'): How many categories are defined in the competition rules?
```
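The answering step can be sketched like this: assemble the retrieved chunks and the question into a prompt, then POST it to Ollama's `/api/generate` endpoint. The function names and prompt wording are illustrative assumptions; the endpoint URL and payload shape follow Ollama's generate API.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_prompt(question, chunks):
    """Combine retrieved rule chunks and the user question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the competition rules below.\n\n"
        f"Rules:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_mistral(question, chunks):
    """Send the prompt to the local Mistral model via Ollama and return its answer."""
    payload = {
        "model": "mistral",
        "prompt": build_prompt(question, chunks),
        "stream": False,  # return the full answer in one JSON response
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]
```

Grounding the prompt in retrieved chunks (rather than asking the model directly) is what keeps answers tied to the actual rulebook text.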
- Python 3.8+
- `sentence-transformers`, `faiss-cpu`, `pdfplumber`, `requests`
- Ollama with the `mistral` model