A local RAG (Retrieval-Augmented Generation) chatbot that answers housing-related questions using a vector database and a local or containerized LLM. It is designed to help users understand rental assistance programs, affordable housing listings, and tenant rights using real local documents instead of generic internet data.
## Table of Contents

- Overview
- How It Works
- Prerequisites
- Local Setup (CPU)
- Run Ingestion Pipeline in Google Colab (Optional)
- GPU Environment (Optional)
- Troubleshooting
## Overview

This project is a housing-focused chatbot built on a RAG pipeline:
- A parsing and ingestion pipeline processes local housing PDFs / web pages into structured chunks.
- A vector store (backed by precomputed embeddings in `all_parsed_docs.joblib`) is used for semantic retrieval.
- A backend service builds prompts with citations from the retrieved chunks and calls a local or containerized LLM.
- A simple web UI exposes a chat interface where users can ask housing questions and see cited sources.
The goal is to provide grounded, citation-rich answers using official local documents, improving reliability over generic LLM responses.
## How It Works

The ingestion step runs in a notebook: `ingesting/Ingest_pipeline.ipynb`.
At a high level, it:
- Loads source materials (PDFs, web archives, etc.).
- Cleans and normalizes the text (removing boilerplate where possible).
- Splits the content into overlapping chunks suitable for semantic search.
- Computes embeddings for each chunk.
- Stores the resulting list of chunks, metadata, and embeddings in a single file: `all_parsed_docs.joblib`.
This file is later loaded by the backend to initialize or populate the vector index.
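The notebook is the source of truth for this step. As a rough sketch of the flow above, a minimal version might look like the following (the chunk size, overlap, embedding model, and field names here are illustrative assumptions, not the notebook's actual choices):

```python
import joblib
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split cleaned text into overlapping chunks for semantic search."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# Illustrative embedder; the notebook may use a different model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder input: in the notebook this comes from parsed PDFs/web archives.
docs = {"tenant_rights.pdf": "…extracted and cleaned text…"}

all_parsed_docs = []
for path, text in docs.items():
    for i, chunk in enumerate(chunk_text(text)):
        all_parsed_docs.append({
            "title": path,
            "path": path,
            "chunk_id": i,
            "text": chunk,
            "embedding": model.encode(chunk),
        })

joblib.dump(all_parsed_docs, "all_parsed_docs.joblib")
```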
When a user asks a question in the chat UI:
- The backend receives the user message and recent conversation history.
- A semantic search step retrieves the top-k most relevant chunks from the vector index (backed by `all_parsed_docs.joblib`).
- The system constructs a prompt that includes:
  - A system message describing the assistant’s role (housing-focused, grounded in documents).
  - The current conversation history.
  - The retrieved chunks, numbered so the model can cite them as [1], [2], etc.
- The LLM generates an answer using only the provided context for policy details, and includes inline citations like [1].
- The backend returns both:
  - The answer text.
  - The list of citations (with title, path, and snippet) so the UI can show “View details” for each source.
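A minimal sketch of this retrieval and prompt-building step, reusing the chunk records from the ingestion sketch above (the embedding model, field names, and brute-force cosine scoring are assumptions; the backend may use a proper vector index instead):

```python
import joblib
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
docs = joblib.load("data/all_parsed_docs.joblib")

def retrieve(query: str, k: int = 5) -> list[dict]:
    """Return the top-k chunks ranked by cosine similarity to the query."""
    q = model.encode(query)
    q = q / np.linalg.norm(q)
    scored = []
    for d in docs:
        e = np.asarray(d["embedding"])
        scored.append((float(q @ (e / np.linalg.norm(e))), d))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number the retrieved chunks so the model can cite them as [1], [2], etc."""
    context = "\n\n".join(
        f"[{i + 1}] {c['title']}\n{c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "You are a housing assistant. Answer using ONLY the numbered context "
        "below, and cite sources inline as [1], [2], etc.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

chunks = retrieve("How do I apply for rental assistance?")
prompt = build_prompt("How do I apply for rental assistance?", chunks)
```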
The frontend (HTML + CSS + JS) provides:
- A chat window for messages from the user and the bot.
- A “View sources” area listing the retrieved documents.
- Clickable citation markers like [1], [2] within the bot message that open a modal with:
  - Title
  - Archive/source type
  - Source path / URL
  - A short snippet of the original text
This makes it easy for users to verify where each answer came from.
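For illustration, the backend response the UI renders from might look like this (a hypothetical shape based on the fields described above; the actual API contract may differ):

```python
# Hypothetical response shape; real field names may differ.
response = {
    "answer": "You may qualify for emergency rental assistance [1].",
    "citations": [
        {
            "id": 1,
            "title": "…document title…",
            "source_type": "…archive/source type…",
            "path": "…source path or URL…",
            "snippet": "…short excerpt of the original text…",
        }
    ],
}
```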
## Prerequisites

- Docker and Docker Compose installed
- (For GPU mode) NVIDIA GPU + drivers + NVIDIA Container Toolkit
- (Optional) Google Colab account if you want to run the ingestion pipeline there
## Local Setup (CPU)

```bash
# Clone the repository
git clone <repo-url>
cd NLP_Housing_Vector_DB   # adjust if your repo folder name is different

# Build and start the app (CPU)
docker compose -f docker-compose.cpu.yml up --build
```

Once the containers are up, open your browser at the port defined in `docker-compose.cpu.yml` (for example: `http://localhost:<port>`).
## Run Ingestion Pipeline in Google Colab (Optional)

The ingestion pipeline parses the housing PDFs/web data and creates a single joblib file with all chunks.
- Open `ingesting/Ingest_pipeline.ipynb` in Google Colab.
- Run all cells.
It will generate `all_parsed_docs.joblib`.
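Optionally, sanity-check the file before moving it. This assumes the notebook stores a list of chunk records, as sketched earlier:

```python
import joblib

docs = joblib.load("all_parsed_docs.joblib")
print(len(docs), "chunks")
print(docs[0])  # inspect one record's fields (e.g. title, path, text, embedding)
```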
Now move this file into the project `data/` directory so the app can load it.
If your repo is cloned in Colab as `NLP_Housing_Vector_DB/`:
```
%cd /content/NLP_Housing_Vector_DB
!mkdir -p data
!mv all_parsed_docs.joblib data/
```

If you instead downloaded the file locally, move it on your machine to `NLP_Housing_Vector_DB/data/all_parsed_docs.joblib`.
Then restart the app so it picks up the new data:

```bash
docker compose -f docker-compose.cpu.yml restart app
```

## GPU Environment (Optional)

Use this if you have a GPU-capable machine and want faster LLM inference.
```bash
# Clone the repository
git clone <repo-url>
cd NLP_Housing_Vector_DB

# Build and start the app (GPU)
docker compose -f docker-compose.gpu.yml up --build
```

Make sure `data/all_parsed_docs.joblib` exists in the same project directory on the host:
```bash
mkdir -p data
mv all_parsed_docs.joblib data/   # if you haven’t moved it already
```

Then restart the GPU app:

```bash
docker compose -f docker-compose.gpu.yml restart app
```

Open your browser at the port defined in `docker-compose.gpu.yml`.
## Troubleshooting

- **Containers won’t start / crash immediately**

  Run:

  ```bash
  docker compose -f docker-compose.cpu.yml logs app
  ```

  or for GPU:

  ```bash
  docker compose -f docker-compose.gpu.yml logs app
  ```

  and check for missing files or configuration errors.
- **App says it cannot find `all_parsed_docs.joblib`**
  - Confirm the file exists at `NLP_Housing_Vector_DB/data/all_parsed_docs.joblib`.
  - Ensure the `data/` directory is mounted correctly in the relevant `docker-compose.*.yml` file.
  - Restart the app container after adding the file.
- **GPU not being used**
  - Verify `nvidia-smi` works on the host.
  - Check that the GPU service in `docker-compose.gpu.yml` is configured for the NVIDIA runtime or equivalent.
  - Make sure you started the project with the GPU compose file:

    ```bash
    docker compose -f docker-compose.gpu.yml up --build
    ```
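If the backend image includes PyTorch (an assumption; the project may use a different inference runtime), a quick in-container check of GPU visibility is:

```python
# Assumes PyTorch is installed in the backend image; adapt to the actual runtime.
import torch

print(torch.cuda.is_available())  # True if the container can see a GPU
print(torch.cuda.device_count())  # number of visible GPUs
```

Run it inside the app container (for example via `docker compose -f docker-compose.gpu.yml exec app python`) so the result reflects the container’s view of the GPU rather than the host’s.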