A small Python command-line research assistant powered by Groq + LangChain.
You type a question like:
“research Bangladesh population, save to a file”
…and it will:
- search the web
- pull background from Wikipedia
- optionally scrape a URL or summarize a PDF you give it
- write a clean research report
- optionally save that report to `research_output.txt`
## Features

- **Web search**: uses DuckDuckGo to fetch recent information.
- **Wikipedia lookup**: grabs concise background knowledge.
- **Optional web scraping**: scrapes and cleans the main text from a URL you paste (with simple CAPTCHA / error handling and length limits).
- **Optional PDF summarization**: summarizes local `.pdf` files when you give an explicit file path.
- **Save to file**: when you say "save to a file", the final report is appended to `research_output.txt`.
- **Tool gating logic**: `scrape_website` is only enabled if your query contains `http://` or `https://`; `summarize_pdf` is only enabled if your query contains `.pdf`.
- **Fail-fast startup**: if `GROQ_API_KEY` is missing, the app fails fast with a clear error (see the sketch below).
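
To make the fail-fast behavior concrete, here is a minimal sketch of the startup check; the exact code and error message in `main.py` may differ:

```python
# Minimal sketch: load .env and fail fast if the key is missing.
import os

from dotenv import load_dotenv  # provided by python-dotenv

load_dotenv()  # reads GROQ_API_KEY from .env in the project root

if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set")
```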
## Project structure

```
.
├── main.py           # Entry point – CLI research agent
├── tools.py          # LangChain tools: search, wikipedia, save, scrape wrapper
├── web_scraper.py    # Low-level requests + BeautifulSoup scraping utilities
├── pdf_tools.py      # PDF summarization tool
├── requirements.txt  # Python dependencies
├── .gitignore
└── README.md
```
## Prerequisites

- Python 3.10+
- A Groq API key (for `langchain-groq`): sign up at Groq and create an API key, then keep it somewhere safe.
## Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/vic219/ai-personal-assistant.git
   cd ai-personal-assistant
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv
   # Windows (PowerShell)
   venv\Scripts\Activate.ps1
   # Windows (cmd)
   venv\Scripts\activate.bat
   # macOS / Linux
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file in the project root:

   ```
   GROQ_API_KEY=your_actual_groq_api_key_here
   ```
## Usage

From the project directory (with the virtualenv activated):

```bash
python main.py
```

You'll be prompted:

```
What can I help you to research?
```
### Example 1 – Simple research + save to file

```
research Bangladesh population, save to a file
```

The assistant will:

- use `wikipedia` and `search`
- generate a short report
- call `save_text_to_file` and append the report to `research_output.txt` (sketched below)
- print the report in the terminal
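
For illustration, a saving tool along these lines could look like the following hypothetical sketch; the real `tools.py` may name and structure things differently:

```python
# Hypothetical sketch of the save tool; see tools.py for the real version.
from langchain_core.tools import tool

@tool
def save_text_to_file(text: str) -> str:
    """Append the final research report to research_output.txt."""
    with open("research_output.txt", "a", encoding="utf-8") as f:
        f.write(text + "\n\n")
    return "Report saved to research_output.txt"
```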
### Example 2 – Scrape a specific web page

```
summarize the key points from this article: https://example.com/some-article
```

Because the query contains `https://`, the `scrape_website` tool is enabled. The scraper, sketched after this list:

- fetches the page with `requests`
- strips scripts/styles with BeautifulSoup
- detects simple CAPTCHA pages and returns a message instead of junk HTML
- truncates overly long content before sending it to the LLM
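
An illustrative version of those steps; the truncation limit and CAPTCHA heuristic here are assumptions, and the actual `web_scraper.py` may differ:

```python
# Illustrative scraper sketch: fetch, clean, CAPTCHA-check, truncate.
import requests
from bs4 import BeautifulSoup

MAX_CHARS = 8_000  # assumed truncation limit

def scrape_website(url: str) -> str:
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
    except requests.RequestException as exc:
        return f"Failed to fetch {url}: {exc}"

    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # strip scripts and styles

    text = " ".join(soup.get_text(separator=" ").split())
    if "captcha" in text.lower():  # crude CAPTCHA detection
        return "The page appears to be a CAPTCHA / bot check; no content extracted."
    return text[:MAX_CHARS]  # truncate before handing to the LLM
```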
### Example 3 – Summarize a PDF

```
summarize this pdf: E:/papers/ai_safety.pdf, save to a file
```

- Because the query contains `.pdf`, the `summarize_pdf` tool becomes available (sketched below).
- The agent summarizes the file and, if you asked, saves the report to `research_output.txt`.
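
A sketch of what the PDF path might look like, assuming `pypdf` for extraction and `langchain-groq` for the summary; the library choice, model name, and prompt here are assumptions, not necessarily what `pdf_tools.py` uses:

```python
# Sketch: extract PDF text with pypdf, then ask the Groq LLM to summarize.
from langchain_groq import ChatGroq
from pypdf import PdfReader

def summarize_pdf(path: str) -> str:
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    llm = ChatGroq(model="llama-3.1-8b-instant")  # assumed model name
    reply = llm.invoke(f"Summarize this document:\n\n{text[:8000]}")
    return reply.content
```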
## How the tools are chosen

In `main.py`, the tools list is built per query.

Always available:

- `search`
- `wikipedia`
- `save_text_to_file`

Added only when needed:

- `scrape_website` → only if the user's query contains `http://` or `https://`
- `summarize_pdf` → only if the query contains `.pdf`
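
In code, the gating might look roughly like this; the tool and module names are taken from this README, and the actual `main.py` may differ:

```python
# Rough sketch of per-query tool gating, assuming these imports exist.
from pdf_tools import summarize_pdf
from tools import save_text_to_file, scrape_website, search, wikipedia

def build_tools(query: str) -> list:
    tools = [search, wikipedia, save_text_to_file]  # always available
    if "http://" in query or "https://" in query:
        tools.append(scrape_website)
    if ".pdf" in query:
        tools.append(summarize_pdf)
    return tools
```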
The system prompt also reminds the model:

- not to invent file paths like `/path/to/file.pdf`
- not to invent URLs from titles
- to pass the entire final report text to `save_text_to_file` (not a filename or JSON)
## Troubleshooting

- **`RuntimeError: GROQ_API_KEY is not set`**: create a `.env` file in the project root, set `GROQ_API_KEY`, then restart.
- **Scraping fails or returns a CAPTCHA message**: the site may be blocking bots, require JavaScript, or show a CAPTCHA. In that case, the tool returns a message instead of raw HTML.
- **PDF summarization error**: check that the file path is correct and accessible from your machine.