An AI-powered product research assistant built with FastAPI, LangGraph, Qdrant, and Ollama. The system helps e-commerce teams make data-driven decisions about products through intelligent query routing and multi-tool orchestration.
- Product Catalog RAG: Semantic search over product catalog with metadata filtering
- Web Search: Market trends and competitor research (with mock fallback)
- Price Analysis: Deterministic margin calculations and pricing recommendations
- Intelligent Routing: LangGraph-based agent automatically selects appropriate tools
- Chat Interface: Modern React-based frontend for interactive research
- Query History: Track all queries with feedback support
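The routing behavior in the feature list can be pictured with a minimal keyword heuristic. This is a sketch only — the real system delegates tool selection to an LLM-driven LangGraph agent — and the tool names below are illustrative stand-ins for the three tools above:

```python
def route_query(query: str) -> str:
    """Pick a tool for a query. Illustrative keyword heuristic, not the
    actual LLM-based routing performed by the LangGraph agent."""
    q = query.lower()
    if any(k in q for k in ("margin", "profit", "pricing")):
        return "price_analysis"   # deterministic pricing calculations
    if any(k in q for k in ("market", "trend", "competitor")):
        return "web_search"       # external market research
    return "product_rag"          # default: semantic search over the catalog

print(route_query("Which products have lowest profit margins?"))  # price_analysis
```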
- Docker and Docker Compose
- Node.js 18+ (for Frontend)
- Python 3.11+ (for local development/evaluation)
- 8GB+ RAM recommended (for Ollama)
- GPU recommended but not required (Ollama can run on CPU)
- Clone the repository:
  git clone <repository-url>
  cd ai-product-research-assistant
- Create the environment file:
  cp .env.example .env
- Start the backend services:
  docker-compose up -d
- Pull the Ollama model (first time only):
  docker exec -it ollama ollama pull qwen3:0.6b
  Note: You can use larger models like llama3.2 or mistral by updating .env if your hardware permits.
- Run data ingestion:
  docker exec -it product-research-assistant python -m src.ingestion.pipeline
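Conceptually, the ingestion step turns catalog rows into text documents (plus metadata) ready for embedding into Qdrant. A rough sketch — the column names here are assumptions for illustration, not the actual catalog.csv schema (see src/ingestion/ for the real pipeline):

```python
import csv
import io

def rows_to_documents(csv_text: str) -> list[dict]:
    """Convert catalog CSV rows into {text, metadata} documents.
    Column names are illustrative, not the project's actual schema."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Combine name and description into the text that gets embedded
        text = f"{row['name']}: {row['description']}"
        docs.append({"text": text, "metadata": {"category": row["category"]}})
    return docs

sample = "name,description,category\nEarbuds X,Wireless earbuds with ANC,audio\n"
print(rows_to_documents(sample))
```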
- Verify the backend:
  curl http://localhost:8000/health
- Navigate to the frontend directory:
  cd frontend
- Install dependencies:
  npm install
- Start the development server:
  npm run dev

Access the UI at http://localhost:5173.
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What wireless headphones do we have in stock?"}'

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Current market price for noise-cancelling headphones?"}'

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Which products have lowest profit margins?"}'

curl http://localhost:8000/queries

Run the automated test suite to measure agent performance (accuracy, tool selection, goal completion):
# From the root directory (ensure a Python environment is set up)
pip install -r requirements.txt
python src/evaluation/evaluator.py

Reports will be saved to evaluation_reports/.
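Tool-selection accuracy, one of the metrics mentioned above, reduces to simple counting over labeled cases. A sketch — the case format and field names here are illustrative, not the actual structures in src/evaluation/:

```python
def tool_selection_accuracy(cases: list[dict]) -> float:
    """Fraction of cases where the agent picked the expected tool."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c["selected_tool"] == c["expected_tool"])
    return hits / len(cases)

# Hypothetical labeled cases (not from the real evaluation set)
cases = [
    {"query": "stock of wireless headphones",
     "expected_tool": "product_rag", "selected_tool": "product_rag"},
    {"query": "market price for ANC headphones",
     "expected_tool": "web_search", "selected_tool": "web_search"},
    {"query": "lowest profit margins",
     "expected_tool": "price_analysis", "selected_tool": "web_search"},
]
print(f"tool selection accuracy: {tool_selection_accuracy(cases):.2f}")  # 0.67
```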
pip install locust
cd load_tests
locust -f locustfile.py --host=http://localhost:8000
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│    Frontend    │     │    FastAPI     │     │   LangGraph    │
│    (React)     │ ──▶ │     Server     │ ──▶ │     Agent      │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
        ┌──────────────────────┬──────────────────────┤
        ▼                      ▼                      ▼
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│  Product RAG   │     │   Web Search   │     │ Price Analysis │
│    (Qdrant)    │     │  (Tavily/API)  │     │ (Deterministic)│
└────────────────┘     └────────────────┘     └────────────────┘
See architecture/ARCHITECTURE.md for detailed documentation.
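The Price Analysis tool is deterministic: its core is plain margin arithmetic rather than LLM reasoning. A sketch of that idea — field names and sample data are illustrative, not the project's actual schema:

```python
def profit_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the selling price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost) / price

def lowest_margin_products(catalog: list[dict], n: int = 3) -> list[str]:
    """Names of the n products with the smallest margins."""
    ranked = sorted(catalog, key=lambda p: profit_margin(p["price"], p["cost"]))
    return [p["name"] for p in ranked[:n]]

# Hypothetical rows standing in for data/catalog.csv
catalog = [
    {"name": "Earbuds X", "price": 49.99, "cost": 44.0},
    {"name": "Headset Pro", "price": 199.0, "cost": 90.0},
    {"name": "Mini Speaker", "price": 25.0, "cost": 21.0},
]
print(lowest_margin_products(catalog, n=2))  # ['Earbuds X', 'Mini Speaker']
```

Because the calculation is deterministic, the same query always yields the same ranking — one reason to route margin questions away from the LLM-backed tools.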
ai-product-research-assistant/
├── architecture/          # System design docs
├── data/                  # Raw data (catalog.csv)
├── frontend/              # React application
│   └── src/
│       ├── api/           # API client
│       └── components/    # Chat UI components
├── src/
│   ├── agent/             # LangGraph agent logic
│   ├── evaluation/        # Evaluation metrics & scripts
│   ├── ingestion/         # Data pipeline
│   ├── tools/             # RAG, Search, Analysis tools
│   ├── models/            # Pydantic schemas
│   └── server.py          # FastAPI main app
├── tests/                 # Unit tests
├── docker-compose.yml     # Service orchestration
└── requirements.txt       # Python dependencies
- Memory: Running Qdrant, Ollama, and the API services simultaneously requires significant RAM (8GB+ recommended).
- Model Size: Using qwen3:0.6b (for speed and memory) trades off some reasoning capability compared to larger models like Llama 3.
- No Auth: The API currently has no authentication mechanism.
- Product List: The product list is not included in the response body of catalog search requests in the current API schema.
- Caching Layer: Implement Redis for frequent query caching.
- Conversation Memory: Enhance agent with long-term memory across sessions.
- Advanced Auth: API Key/OAuth integration.
- Rate Limiting: Protect endpoints from abuse.
- Dashboard: Expand frontend to include admin/metrics view.
- Monitoring: Add monitoring of agent performance and system health.
- Logging: Add structured logging across the agent and API layers.
- Error Handling: Improve error handling and graceful degradation when individual tools fail.
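Before introducing Redis, the caching idea from the list above can be prototyped in-process. A minimal TTL-cache sketch (an illustration of the concept, not the planned implementation):

```python
import time

class TTLCache:
    """Tiny time-to-live cache; a single-process stand-in for Redis."""
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # evict expired entry
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("q:headphones", {"answer": "3 models in stock"})
print(cache.get("q:headphones"))
time.sleep(0.06)
print(cache.get("q:headphones"))  # None after expiry
```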
- Evaluation Framework: Automated metric collection for RAG and agent performance.
- Resource Management: Balancing Docker resource limits for local LLM inference.
- Tool Selection: Tuning prompts to ensure the small model selects the correct tool (Search vs RAG).
- Architecture: Designing clean boundaries between Agent logic and API layer.
For lessons learned, see walkthrough.md.
