ScribeNova is a fully local, privacy-first AI chat application built on Next.js 16, LangChain, and Ollama. It features persistent vector memory, website crawling & Q&A, a fully customizable chatbot persona, and a canvas-rendered 3D animated mascot — all running on your own machine with zero cloud dependency.
Features · Architecture · Quick Start · Configuration · API Reference · Troubleshooting
- Powered by Ollama — fully local LLM inference, no API keys needed
- ReAct agent via LangGraph — reasons, selects tools, and responds
- 5 built-in tools: web search, calculator, clock, Pokémon info, website Q&A
- Markdown responses with clickable links, emails, and phone numbers
- Add personal facts in plain language: "My name is Tarun", "I live in Delhi"
- Facts are embedded and stored in Qdrant vector DB
- Semantically retrieved at query time — the bot knows who you are
- Manage facts from the Settings panel (add / delete)
- Every conversation is embedded and stored in Qdrant
- Hybrid retrieval: top-2 semantically similar + top-3 most recent
- 95% similarity deduplication — no redundant storage
- Survives server restarts
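Conceptually, the hybrid step merges two ranked lists and caps the result. A minimal sketch with hypothetical names (the real logic spans `lib/vectorMemory.ts` and `lib/agent.ts`):

```typescript
// Illustrative merge of hybrid retrieval: semantically similar turns first,
// then most recent, deduplicated by id and capped. Names are hypothetical.
interface Turn {
  id: string;
  text: string;
}

function mergeHistory(semantic: Turn[], recent: Turn[], cap = 3): Turn[] {
  const seen = new Set<string>();
  const merged: Turn[] = [];
  for (const turn of [...semantic, ...recent]) {
    if (!seen.has(turn.id)) {
      seen.add(turn.id);
      merged.push(turn);
    }
  }
  return merged.slice(0, cap); // cap total context passed to the LLM
}
```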
- Paste any URL → the bot crawls up to 15 pages with Playwright
- Content is chunked, embedded, and indexed in Qdrant
- Fuzzy URL matching — ask about "iotsolvez" and it finds `iotsolvez_vercel_app`
- Manage indexed websites from Settings: see chunk counts, delete entries
- Rich structured responses: bold sections, clickable links, contact blocks
- Set a custom name and description from the Settings panel
- The agent's system prompt updates live — the bot introduces itself by your chosen name
- Persona persists across the session
- Pure HTML5 Canvas, zero external dependencies
- 3D white sphere with perspective-projected eyes, specular highlights, ground shadow
- 6 expressions: `idle`, `happy`, `think`, `surprise`, `loading`, `sleep`
- Auto-blink, auto-glance, squash-and-stretch bounce physics
- Expression driven by chat state:
  - Loading → cycles `loading → think → surprise` over time
  - Response arrives → `happy` for 2s → `idle`
  - Error → `surprise` for 1.5s → `idle`
  - Past messages → `sleep` (closed eyes, floating z's)
  - Latest message → `idle` (alive and breathing)
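As an illustration (not the actual `KiroMascot.tsx` code), the chat-state mapping above can be written as a pure function; the 2-second cycle interval for the loading phases is an assumption:

```typescript
// Sketch of the expression state machine described above. The real component
// also handles blinking, glancing, and bounce physics.
type Expression = "idle" | "happy" | "think" | "surprise" | "loading" | "sleep";

type ChatState =
  | { kind: "loading"; elapsedMs: number }
  | { kind: "responded"; elapsedMs: number }
  | { kind: "error"; elapsedMs: number }
  | { kind: "pastMessage" }
  | { kind: "latestMessage" };

function expressionFor(state: ChatState): Expression {
  switch (state.kind) {
    case "loading": {
      // Cycle loading → think → surprise while waiting (interval assumed: 2s)
      const phases: Expression[] = ["loading", "think", "surprise"];
      return phases[Math.floor(state.elapsedMs / 2000) % phases.length];
    }
    case "responded":
      return state.elapsedMs < 2000 ? "happy" : "idle"; // happy for 2s → idle
    case "error":
      return state.elapsedMs < 1500 ? "surprise" : "idle"; // surprise for 1.5s → idle
    case "pastMessage":
      return "sleep"; // closed eyes, floating z's
    case "latestMessage":
      return "idle"; // alive and breathing
  }
}
```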
```
┌─────────────────────────────────────────────────────────────────┐
│                        Browser (Next.js)                        │
│                                                                 │
│  Chat.tsx ──► KiroMascot.tsx (canvas, RAF loop)                 │
│    │                                                            │
│    ├── POST /api/agent { message, botName, botDescription }     │
│    ├── GET/POST/DELETE /api/memory (custom facts)               │
│    ├── POST /api/website (crawl & index)                        │
│    └── GET/DELETE /api/websites (list & remove)                 │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                       Next.js API Routes                        │
│                                                                 │
│  agent/route.ts                                                 │
│   └── runAgent(message, userId, botName, botDescription)        │
│        ├── VectorMemory.getRelevantHistory() ─┐                 │
│        ├── VectorMemory.getRecentHistory()    ├─ Qdrant         │
│        ├── CustomMemory.getRelevantFacts() ───┘                 │
│        ├── createReactAgent(llm, tools, systemPrompt)           │
│        │    ├── searchTool (DuckDuckGo)                         │
│        │    ├── calculatorTool                                  │
│        │    ├── timeTool                                        │
│        │    ├── pokemonTool                                     │
│        │    └── websiteQATool                                   │
│        │         ├── resolveWebsiteDomain() ── Qdrant           │
│        │         ├── crawlWebsite() ─────────── Playwright      │
│        │         ├── chunkText()                                │
│        │         ├── createVectorstore() ────── Qdrant          │
│        │         └── getQaChain() ───────────── Ollama LLM      │
│        └── VectorMemory.saveConversation() ──── Qdrant          │
└─────────────────────────────────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                      Local Infrastructure                       │
│                                                                 │
│  Ollama (port 11434)          Qdrant (port 6333)                │
│   ├── qwen2.5:1.5b (LLM)       ├── conversation_memory          │
│   └── nomic-embed-text         ├── user_custom_memory           │
│       (embeddings, 768d)       └── website_chunks               │
└─────────────────────────────────────────────────────────────────┘
```
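Condensed into code, the request flow above looks roughly like this (a sketch with assumed signatures, not the actual `lib/agent.ts`):

```typescript
// Hypothetical shape of runAgent's orchestration. Dependencies are injected
// here so the sketch stays self-contained; the real code calls the
// VectorMemory/CustomMemory classes and the LangGraph agent directly.
async function runAgent(
  message: string,
  userId: string,
  botName: string,
  botDescription: string,
  deps: {
    getRelevantHistory(u: string, q: string, k: number): Promise<string[]>;
    getRecentHistory(u: string, k: number): Promise<string[]>;
    getRelevantFacts(u: string, q: string): Promise<string[]>;
    invoke(systemPrompt: string, history: string[]): Promise<string>;
    saveConversation(u: string, q: string, a: string): Promise<void>;
  },
): Promise<string> {
  // 1. Gather context from Qdrant (semantic + recent + facts)
  const [similar, recent, facts] = await Promise.all([
    deps.getRelevantHistory(userId, message, 2),
    deps.getRecentHistory(userId, 3),
    deps.getRelevantFacts(userId, message),
  ]);
  // 2. Build the persona-aware system prompt
  const systemPrompt = `You are ${botName}. ${botDescription}\nKnown facts: ${facts.join("; ")}`;
  // 3. Run the ReAct agent over the combined history
  const answer = await deps.invoke(systemPrompt, [...similar, ...recent, message]);
  // 4. Persist the new turn back to Qdrant
  await deps.saveConversation(userId, message, answer);
  return answer;
}
```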
```
scribe-nova/
├── app/
│   ├── api/
│   │   ├── agent/route.ts      # Main chat endpoint
│   │   ├── memory/route.ts     # Custom facts CRUD
│   │   ├── website/route.ts    # Crawl & index a URL
│   │   └── websites/route.ts   # List & delete indexed sites
│   ├── components/
│   │   ├── Chat.tsx            # Full chat UI + Settings modal
│   │   └── KiroMascot.tsx      # Canvas 3D animated mascot
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── lib/
│   ├── agent.ts                # ReAct agent orchestration
│   ├── chunker.ts              # RecursiveCharacterTextSplitter
│   ├── crawler.ts              # Playwright website crawler
│   ├── customMemory.ts         # User facts (Qdrant)
│   ├── memory.ts               # Legacy (unused)
│   ├── qa.ts                   # RAG Q&A chain
│   ├── tools.ts                # Tool definitions
│   ├── vectorMemory.ts         # Conversation memory (Qdrant)
│   ├── vectorstore.ts          # Website chunks + fuzzy resolver
│   └── websiteTool.ts          # LangChain website_qa tool
├── .env.local                  # Environment variables
├── README.md                   # This file
└── SYSTEM.md                   # Deep-dive technical reference
```
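For illustration, a simplified sliding-window version of what `lib/chunker.ts` does. The project actually uses LangChain's `RecursiveCharacterTextSplitter`, which additionally splits on paragraph and sentence boundaries; the 500/50 sizes below are assumptions, not the project's settings:

```typescript
// Simplified sliding-window chunker (illustrative only; see lib/chunker.ts
// for the real splitter). `overlap` characters are repeated between chunks
// so no sentence is cut off without context.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```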
| Requirement | Version | Notes |
|---|---|---|
| Node.js | 20+ | |
| Ollama | latest | ollama.ai |
| Docker | any | for Qdrant |
| Playwright Chromium | auto-installed | via npx playwright install |
```shell
git clone https://github.com/tarunkumar-sys/CHAT_BOT.git
cd CHAT_BOT
npm install
```

Install Ollama from https://ollama.ai, then:

```shell
ollama pull qwen2.5:1.5b
ollama pull nomic-embed-text

# Verify
ollama list
```

Start Qdrant:

```shell
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# Verify
curl http://localhost:6333
```

Install the Playwright browser:

```shell
npx playwright install chromium
```

Create `.env.local` in the project root:
```shell
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:1.5b

# Qdrant
QDRANT_URL=http://localhost:6333

# Optional: LangSmith tracing
# LANGCHAIN_TRACING_V2=true
# LANGCHAIN_API_KEY=your-key
# LANGCHAIN_PROJECT=scribe-nova
```

Start the dev server:

```shell
npm run dev
```

Type any message and press Enter or click Send. The agent automatically selects the right tool.
```
You: What time is it in Tokyo?
Kiro: It is currently Monday, March 21, 2026, 06:30 PM JST.

You: Calculate 1234 * 5678
Kiro: 1234 × 5678 = 7,006,652

You: Search for latest LLM benchmarks
Kiro: [searches DuckDuckGo and summarizes top 3 results]
```
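The UI talks to a single endpoint, so you can also drive the bot from a script. A sketch assuming the dev server is running on port 3000 (the request shape matches the API reference later in this README):

```typescript
// Sketch of calling POST /api/agent outside the UI. The localhost:3000 URL
// assumes `npm run dev` with default settings.
interface AgentRequest {
  message: string;
  botName: string;
  botDescription: string;
}

// Build the fetch init object for the chat endpoint
function buildAgentRequest(body: AgentRequest) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

async function askAgent(message: string): Promise<string> {
  const res = await fetch(
    "http://localhost:3000/api/agent",
    buildAgentRequest({
      message,
      botName: "ScribeNova",
      botDescription: "Your intelligent AI assistant",
    }),
  );
  const data = (await res.json()) as { response: string };
  return data.response;
}
```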
Open Settings → Memory and add personal facts:

```
My name is Tarun
I am a software engineer
I live in Delhi
I prefer concise answers
My favourite language is Python
```

Now ask:

```
You: What do you know about me?
Kiro: You're Tarun, a software engineer based in Delhi who prefers
      concise answers and works primarily with Python.
```
Facts are embedded and retrieved semantically — the bot only surfaces facts relevant to the current question.
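Under the hood, "relevant" means nearest by cosine similarity in embedding space. A self-contained sketch of that ranking (ScribeNova delegates this to Qdrant, so the code below is illustrative only):

```typescript
// Cosine similarity between two equal-length embedding vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored facts by similarity to the query embedding, keep the top k
function topK(query: number[], facts: { text: string; vec: number[] }[], k: number): string[] {
  return [...facts]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k)
    .map((f) => f.text);
}
```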
Option A — From Settings panel:
- Open Settings → Website
- Paste a URL and click "Crawl & Index Website"
- Wait for indexing (15–60s depending on site size)
- Ask questions in chat
Option B — Directly in chat:

```
You: Tell me about https://example.com
Kiro: [crawls automatically if not indexed, then answers]

You: What services does example offer?
Kiro: [uses fuzzy matching to find the indexed site]
```
Fuzzy URL matching — once a site is indexed, you can refer to it by partial name:

```
# Site indexed as: iotsolvez_vercel_app
You: What is iotsolvez about?              ← works without full URL
You: Tell me about iotsolvez.vercel.app    ← also works
```
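One plausible way to implement such matching: normalize both the query and the stored domain key, then compare substrings. The real resolver lives in `lib/vectorstore.ts`; the normalization rules below are assumptions:

```typescript
// Normalize a URL or name into a domain key like "iotsolvez_vercel_app"
function normalize(s: string): string {
  return s
    .toLowerCase()
    .replace(/^https?:\/\//, "")   // drop protocol
    .replace(/[^a-z0-9]+/g, "_")   // non-alphanumerics become underscores
    .replace(/^_+|_+$/g, "");      // trim stray underscores
}

// Resolve a partial name against the list of indexed domain keys
function resolveDomain(query: string, indexed: string[]): string | undefined {
  const q = normalize(query);
  return (
    indexed.find((d) => d === q) ??                        // exact match first
    indexed.find((d) => d.includes(q) || q.includes(d))    // then substring match
  );
}
```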
Open Settings → General:
- Bot Name — changes the name shown in the header and used in the system prompt
- Description — added to the system prompt so the bot adopts a persona
```
Name: DevBot
Description: A no-nonsense assistant for senior engineers
```

The agent will now introduce itself as DevBot and respond accordingly.
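A hypothetical sketch of how those two settings might be folded into the system prompt (the real template lives in `lib/agent.ts` and will differ):

```typescript
// Illustrative persona-to-prompt step; the actual template is in lib/agent.ts.
function buildSystemPrompt(botName: string, botDescription: string): string {
  return [
    `You are ${botName}. ${botDescription}.`,
    "Use the available tools when they help answer the user.",
  ].join("\n");
}
```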
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `qwen2.5:1.5b` | LLM model name |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant server URL |
| `LANGCHAIN_TRACING_V2` | — | Enable LangSmith tracing |
| `LANGCHAIN_API_KEY` | — | LangSmith API key |
| `LANGCHAIN_PROJECT` | — | LangSmith project name |
Edit `OLLAMA_MODEL` in `.env.local`:

```shell
# Fastest (least capable)
OLLAMA_MODEL=tinyllama

# Default — good balance
OLLAMA_MODEL=qwen2.5:1.5b

# Better quality, slower
OLLAMA_MODEL=llama3.2:3b

# Best quality, requires more RAM
OLLAMA_MODEL=mistral:7b
```

In `lib/agent.ts`:
```typescript
// How many semantically similar past conversations to load
const relevantHistory = await vectorMemory.getRelevantHistory(userId, input, 2);

// How many most-recent conversations to load
const recentHistory = await vectorMemory.getRecentHistory(userId, 3);

// Max total conversations passed to LLM
.slice(0, 3)
```

In `lib/websiteTool.ts`:
```typescript
const pages = await crawlWebsite(fullUrl, { maxPages: 15 });
// 5  → fast, shallow
// 15 → default
// 30 → thorough, slow
```

In `lib/qa.ts`:
```typescript
numPredict: 600, // max tokens in Q&A response
k: 8,            // chunks retrieved per query

// context limit
return context.length > 3500 ? context.substring(0, 3500) + '...' : context;
```

In `lib/vectorMemory.ts`:
```typescript
const similarityThreshold = 0.95;
// 0.90 → flags near-matches as duplicates (deduplicates more aggressively)
// 0.98 → only near-identical text is skipped (saves more unique conversations)
```

**POST `/api/agent`**

Run the AI agent.
Request:

```json
{
  "message": "What is on example.com?",
  "botName": "ScribeNova",
  "botDescription": "Your intelligent AI assistant"
}
```

Response:

```json
{
  "response": "Example.com is a domain reserved for illustrative examples..."
}
```

**GET `/api/memory`**

List all custom memory facts for the default user.
Response:

```json
{
  "facts": [
    { "id": "uuid", "text": "My name is Tarun", "userId": "default-user", "createdAt": "2026-03-21T..." }
  ]
}
```

**POST `/api/memory`**

Add a new fact.
Request:

```json
{ "fact": "I prefer dark mode", "userId": "default-user" }
```

**DELETE `/api/memory`**

Delete a fact by ID.
Request:

```json
{ "factId": "uuid", "userId": "default-user" }
```

**POST `/api/website`**

Crawl and index a website.
Request:

```json
{ "url": "https://example.com" }
```

Response:

```json
{ "success": true, "pages": 12, "chunks": 87, "url": "https://example.com" }
```

**GET `/api/websites`**

List all indexed websites with chunk counts.
Response:

```json
{
  "sites": [
    { "domain": "example_com", "url": "https://example.com", "chunks": 87 }
  ]
}
```

**DELETE `/api/websites`**

Remove all indexed data for a domain.

Request:

```json
{ "domain": "example_com" }
```

If Qdrant is not running:

```shell
# Check
curl http://localhost:6333

# Start
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# Restart existing container
docker start qdrant
```

If Ollama models are missing:

```shell
ollama list                  # see what's installed
ollama pull qwen2.5:1.5b     # pull the LLM
ollama pull nomic-embed-text # pull the embedding model
ollama serve                 # make sure the server is running
```

If Playwright is missing its browser:

```shell
npx playwright install chromium

# If on Linux, also install system deps:
npx playwright install-deps chromium
```

The first query to a new website takes 15–60s (crawling + embedding). Subsequent queries use the cached index and respond in 5–15s. This is expected behavior.
Some sites are single-page apps or use JavaScript routing. The crawler only follows `<a href>` links on the same domain. This is a known limitation of static crawling.
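For illustration, a sketch of the same-domain filter such a crawler applies when collecting links (the real logic lives in `lib/crawler.ts` and may differ):

```typescript
import { URL } from "node:url";

// Keep only http(s) links on the same host as the page being crawled,
// resolving relative hrefs and collapsing #fragment-only variants.
function sameDomainLinks(baseUrl: string, hrefs: string[]): string[] {
  const base = new URL(baseUrl);
  const out = new Set<string>();
  for (const href of hrefs) {
    try {
      const u = new URL(href, base); // resolves relative links against the page
      if (u.hostname === base.hostname && (u.protocol === "http:" || u.protocol === "https:")) {
        u.hash = ""; // a #fragment points at the same page
        out.add(u.toString());
      }
    } catch {
      // ignore malformed hrefs (javascript:, broken URLs, etc.)
    }
  }
  return [...out];
}
```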
Qdrant stores data in-memory by default with the basic Docker command. To persist data across container restarts:

```shell
docker run -d --name qdrant \
  -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```

To reset everything:

```shell
# Clear all Qdrant collections
curl -X DELETE http://localhost:6333/collections/conversation_memory
curl -X DELETE http://localhost:6333/collections/user_custom_memory
curl -X DELETE http://localhost:6333/collections/website_chunks

# Clear Next.js build cache
rm -rf .next

# Reinstall dependencies
rm -rf node_modules && npm install

# Restart
npm run dev
```

To deploy with hosted backends:

- Deploy the Next.js app to Vercel
- Host Ollama on a GPU VM (e.g. RunPod, vast.ai, or a VPS)
- Host Qdrant on Qdrant Cloud (free tier available)
- Set environment variables in Vercel dashboard
```shell
OLLAMA_BASE_URL=https://your-ollama-server.com
OLLAMA_MODEL=qwen2.5:1.5b
QDRANT_URL=https://your-cluster.qdrant.io:6333
```

Or run everything with Docker Compose:

```yaml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - QDRANT_URL=http://qdrant:6333
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  qdrant_data:
```

Note: Ollama with GPU support requires the NVIDIA Container Toolkit and a separate compose profile. See Ollama Docker docs.
- Vector-based persistent conversation memory
- Custom user memory (personal facts)
- Website crawling and Q&A
- Fuzzy URL / domain matching
- Indexed website management (list + delete)
- Customizable bot name and description
- Canvas 3D animated mascot (KiroMascot)
- Expression-driven mascot state machine
- Streaming responses (SSE)
- Multi-user / authentication
- File upload and document Q&A
- Voice input (Web Speech API)
- Conversation export (JSON / Markdown)
- Mobile-responsive layout
| Layer | Technology |
|---|---|
| Framework | Next.js 16.1.6, React 19 |
| Language | TypeScript 5 |
| Styling | TailwindCSS 4 |
| AI Framework | LangChain 1.x, LangGraph 1.x |
| LLM | Ollama (qwen2.5:1.5b) |
| Embeddings | nomic-embed-text (768d, via Ollama) |
| Vector DB | Qdrant |
| Web Scraping | Playwright (Chromium) |
| Web Search | DuckDuckGo (duck-duck-scrape) |
| Mascot | HTML5 Canvas 2D API (zero deps) |
| Icons | Lucide React |
| Markdown | react-markdown 9 |
MIT — see LICENSE for details.
Built with care using Next.js · LangChain · Ollama · Qdrant