AI Personal Assistant

A small Python command-line research assistant powered by Groq + LangChain.

You type a question like:

“research Bangladesh population, save to a file”

…and it will:

  • search the web
  • pull background from Wikipedia
  • optionally scrape a URL or summarize a PDF you give it
  • write a clean research report
  • optionally save that report to research_output.txt

✨ Features

  • Web search
    Uses DuckDuckGo to fetch recent information.

  • Wikipedia lookup
    Grabs concise background knowledge.

  • Optional web scraping
    Scrapes and cleans the main text from a URL you paste
    (with simple CAPTCHA / error handling and length limits).

  • Optional PDF summarization
    Summarizes local .pdf files when you give an explicit file path.

  • Save to file
    When you say “save to a file”, the final report is appended to research_output.txt.

  • Tool gating logic

    • scrape_website is only enabled if your query contains http:// or https://.
    • summarize_pdf is only enabled if your query contains .pdf.
    • If GROQ_API_KEY is missing, the app fails fast with a clear error.
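The save-to-file behavior described above might look roughly like this (a sketch only; the timestamp header is an assumption, and the real implementation in tools.py may differ):

```python
from datetime import datetime

def save_text_to_file(text: str, path: str = "research_output.txt") -> str:
    """Append the full report text to the output file with a timestamp header."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"--- Research Report ({stamp}) ---\n{text}\n\n")
    return f"Saved report to {path}"
```

Because the file is opened in append mode, repeated "save to a file" requests accumulate reports rather than overwriting earlier ones.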

🧱 Project Structure

.
├── main.py           # Entry point – CLI research agent
├── tools.py          # LangChain tools: search, wikipedia, save, scrape wrapper
├── web_scraper.py    # Low-level requests + BeautifulSoup scraping utilities
├── pdf_tools.py      # PDF summarization tool
├── requirements.txt  # Python dependencies
├── .gitignore
└── README.md
1. Prerequisites

  • Python 3.10+
  • A Groq API key (for langchain-groq). Sign up at Groq, create an API key, and keep it somewhere safe.

2. Setup

  1. Clone the repository:

     git clone https://github.com/vic219/ai-personal-assistant.git
     cd ai-personal-assistant

  2. Create and activate a virtual environment (recommended):

     python -m venv venv

     Windows (PowerShell):  venv\Scripts\Activate.ps1
     Windows (cmd):         venv\Scripts\activate.bat
     macOS / Linux:         source venv/bin/activate

  3. Install dependencies:

     pip install -r requirements.txt

  4. Create a .env file in the project root:

     GROQ_API_KEY=your_actual_groq_api_key_here
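The fail-fast key check mentioned in the Features section might be sketched like this (an illustration; the real app presumably loads .env via python-dotenv before running this check):

```python
import os

def require_groq_key() -> str:
    """Fail fast with a clear error if GROQ_API_KEY is missing."""
    key = os.getenv("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to .env in the project root."
        )
    return key
```

Checking once at startup means a misconfigured key surfaces immediately, rather than as an opaque API error mid-run.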

3. Usage

From the project directory (with the virtualenv activated):

  python main.py

You’ll be prompted: What can I help you to research?

Example 1 – Simple research + save to file

  research Bangladesh population, save to a file

The assistant will:

  • use wikipedia and search
  • generate a short report
  • call save_text_to_file and append the report to research_output.txt
  • print the report in the terminal

Example 2 – Scrape a specific web page

  summarize the key points from this article: https://example.com/some-article

Because the query contains https://, the scrape_website tool is enabled.

The scraper:

  • fetches the page with requests
  • strips scripts and styles with BeautifulSoup
  • detects simple CAPTCHA pages and returns a message instead of junk HTML
  • truncates overly long content before sending it to the LLM
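The cleaning steps the scraper performs could be sketched like this (a simplified illustration of web_scraper.py; the character limit and the substring-based CAPTCHA check are assumptions):

```python
from bs4 import BeautifulSoup

def clean_html(html: str, max_chars: int = 8000) -> str:
    """Strip scripts/styles, collapse whitespace, guard against CAPTCHA pages."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # remove the tag and its contents entirely
    text = " ".join(soup.get_text(separator=" ").split())
    if "captcha" in text.lower():
        return "The page appears to be a CAPTCHA challenge; no content extracted."
    return text[:max_chars]  # keep the prompt within the LLM's context budget
```

In the real tool this would be applied to the body of a requests.get() response before the text is handed to the agent.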

Example 3 – Summarize a PDF

  summarize this pdf: E:/papers/ai_safety.pdf, save to a file

Because the query contains .pdf, the summarize_pdf tool becomes available.

The agent summarizes the file and, if you asked, saves the report to research_output.txt.

How the tools are chosen

In main.py, the tools list is built per query:

Always available:

  • search
  • wikipedia
  • save_text_to_file

Added only when needed:

  • scrape_website → only if the user’s query contains http:// or https://
  • summarize_pdf → only if the query contains .pdf

The system prompt also reminds the model:

  • not to invent file paths like /path/to/file.pdf
  • not to invent URLs from titles
  • to pass the entire final report text to save_text_to_file (not a filename or JSON)
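The per-query gating described in this section can be sketched as follows (the function name and tool placeholders are illustrative, not the exact main.py code):

```python
def build_tools(query: str, base_tools: list, scrape_tool, pdf_tool) -> list:
    """Return the tool list for this query: base tools plus any gated extras."""
    tools = list(base_tools)  # search, wikipedia, save_text_to_file
    if "http://" in query or "https://" in query:
        tools.append(scrape_tool)   # only enable scraping when a URL is present
    if ".pdf" in query.lower():
        tools.append(pdf_tool)      # only enable PDF summarization for .pdf queries
    return tools
```

Gating the tools per query keeps the agent from hallucinating URL or file-path arguments for tools it has no real input for.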

Troubleshooting

  • RuntimeError: GROQ_API_KEY is not set → create .env in the project root, set GROQ_API_KEY, and restart.

  • Scraping fails or returns a CAPTCHA message → the site may be blocking bots, require JavaScript, or show a CAPTCHA. In that case, the tool returns a message instead of raw HTML.

  • PDF summarization error → check that the file path is correct and accessible from your machine.
