A multimodal fashion recommendation project that builds a visual embedding index from product images and serves a Streamlit chat UI powered by a local LLM (Ollama) and ChromaDB.
- Image embedding with CLIP via `sentence-transformers` (see the sketch after this list)
- Vector search using ChromaDB (Docker)
- Local LLM responses via Ollama (Docker)
- Streamlit-based chat UI
- Simple indexing pipeline for large datasets
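To make the first bullet concrete, here is a minimal sketch of CLIP embedding with `sentence-transformers`; the checkpoint name and image paths are illustrative assumptions, not values taken from `src/embedding.py`:

```python
# Minimal sketch of CLIP image/text embedding via sentence-transformers.
# Model name and paths are assumptions; see src/embedding.py for the real code.
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

# Images and text are encoded into the same vector space,
# so a text query can retrieve visually similar products.
image_vecs = model.encode([Image.open("archive/data/10000.jpg")])
text_vecs = model.encode(["red floral summer dress"])
```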
- `main.py`: Builds/updates the vector index from `archive/data.csv` and `archive/data/`
- `chat_app.py`: Streamlit chat UI for recommendations
- `src/data_loader.py`: CSV loading and image path validation
- `src/embedding.py`: Image/text embedding utilities
- `src/vector_db.py`: ChromaDB client wrapper (sketched below)
- `src/llm_service.py`: Ollama LLM wrapper
- `docker-compose.yml`: ChromaDB + Ollama services
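To illustrate what the `src/vector_db.py` wrapper is responsible for, here is a minimal sketch of the ChromaDB calls involved; the collection name, embedding dimensionality, and metadata keys are assumptions rather than this repo's actual values:

```python
# Sketch of the ChromaDB side: connect to the Dockerized server,
# add image embeddings, and query by a text embedding.
# Collection name, dimensionality, and metadata keys are assumptions.
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("fashion_products")

# Store one precomputed 512-dim CLIP embedding keyed by product id.
collection.add(
    ids=["10000"],
    embeddings=[[0.0] * 512],  # placeholder; real code uses CLIP vectors
    metadatas=[{"image_path": "archive/data/10000.jpg"}],
)

# Query with a (placeholder) text embedding; returns nearest product ids.
results = collection.query(query_embeddings=[[0.0] * 512], n_results=5)
print(results["ids"][0], results["distances"][0])
```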
- Python 3.10+ recommended
- Docker + Docker Compose
- Sufficient disk space for dataset and vector index
This repo expects the dataset to exist locally (not committed to Git).
Dataset source:
Expected structure:
```
archive/
  data.csv
  data/
    10000.jpg
    10001.jpg
    ...
```
Notes:
- `archive/` is ignored by `.gitignore` to keep Git history small.
- If you want to store the dataset in Git, use Git LFS.
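Given this layout, the loading and validation step in `src/data_loader.py` plausibly looks like the sketch below; the `id` column name is an assumption about `data.csv`'s schema:

```python
# Sketch: load data.csv and keep only rows whose image file actually exists.
# The "id" column name is an assumption about data.csv's schema.
from pathlib import Path
import pandas as pd

df = pd.read_csv("archive/data.csv")
df["image_path"] = df["id"].astype(str).map(lambda i: f"archive/data/{i}.jpg")
df = df[df["image_path"].map(lambda p: Path(p).exists())]
print(f"{len(df)} products with valid images")
```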
Create a virtual environment and install dependencies:
```
python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS/Linux
# source .venv/bin/activate
pip install -r requirements.txt
```
Run ChromaDB and Ollama:

```
docker-compose up -d
```

Pull the required Ollama models inside the container:
```
docker exec -it ollama_service ollama pull llama3
docker exec -it ollama_service ollama pull llava
```
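With the models pulled, the app can talk to Ollama over its REST API. A minimal sketch of the kind of call `src/llm_service.py` makes; the prompt and options here are illustrative:

```python
# Sketch: call the local Ollama server's /api/generate endpoint.
# Endpoint and payload follow Ollama's documented REST API; the prompt is illustrative.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Suggest an outfit to match white sneakers.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```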
Next, build the index. This step reads the dataset and stores embeddings in ChromaDB:

```
python main.py
```
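Under stated assumptions about `main.py` (batch size, CSV columns, collection name), the indexing loop amounts to roughly the following; `BATCH_SIZE` is the knob referenced in the troubleshooting section:

```python
# Sketch of the indexing loop: embed images in batches and add them to ChromaDB.
# BATCH_SIZE and the DataFrame columns are assumptions about main.py.
import chromadb
import pandas as pd
from PIL import Image
from sentence_transformers import SentenceTransformer

BATCH_SIZE = 64  # lower this if CPU indexing is too slow or memory-bound

model = SentenceTransformer("clip-ViT-B-32")
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("fashion_products")

df = pd.read_csv("archive/data.csv")
for start in range(0, len(df), BATCH_SIZE):
    batch = df.iloc[start : start + BATCH_SIZE]
    paths = [f"archive/data/{i}.jpg" for i in batch["id"]]
    vecs = model.encode([Image.open(p) for p in paths])
    collection.add(
        ids=[str(i) for i in batch["id"]],
        embeddings=vecs.tolist(),
        metadatas=[{"image_path": p} for p in paths],
    )
```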
Start the Streamlit UI:

```
streamlit run chat_app.py
```

Open the URL printed in the terminal (usually http://localhost:8501).
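For orientation, a bare-bones version of the chat loop in `chat_app.py` could look like the following; the `recommend` helper is a hypothetical stand-in for this repo's retrieval and LLM pipeline:

```python
# Sketch of a minimal Streamlit chat loop; `recommend` is a hypothetical
# stand-in for the actual retrieval + LLM pipeline.
import streamlit as st

def recommend(query: str) -> str:
    return f"(placeholder) recommendations for: {query}"

st.title("Fashion Recommender")
if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far.
for role, text in st.session_state.history:
    st.chat_message(role).write(text)

if query := st.chat_input("What are you looking for?"):
    st.chat_message("user").write(query)
    answer = recommend(query)
    st.chat_message("assistant").write(answer)
    st.session_state.history += [("user", query), ("assistant", answer)]
```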
- ChromaDB runs on `localhost:8000` by default.
- Ollama runs on `localhost:11434` by default.
- If you do not have an NVIDIA GPU, remove the `deploy.resources` GPU block in `docker-compose.yml`.
- `chroma_data/` stores local ChromaDB data and is ignored by Git.
- Chroma connection error: Ensure the services are running (`docker-compose up -d`) and port `8000` is open (see the connectivity check below).
- Ollama model not found: Run the `ollama pull` commands inside the container.
- Very slow indexing: CPU indexing can be slow for large datasets; reduce the batch size or use a GPU.
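Before digging deeper, a quick Python check that both services are reachable on their default ports; both calls use the services' standard APIs:

```python
# Sketch: sanity-check that ChromaDB and Ollama are reachable on their default ports.
import chromadb
import requests

# heartbeat() raises if the Chroma server cannot be reached.
print("Chroma heartbeat:", chromadb.HttpClient(host="localhost", port=8000).heartbeat())

# /api/tags lists models already pulled into Ollama.
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print("Ollama models:", [m["name"] for m in tags.get("models", [])])
```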
MIT License. See LICENSE for details.