This project includes an efficient web scraper and a RAG (Retrieval-Augmented Generation) search system that supports both semantic and keyword-based searches.
- Web Scraper:
  - Scrapes the website https://cryptorank.io/all-coins-list to extract URLs for individual cryptocurrency coins.
  - Subsequently, scrapes each coin's page for specific data, such as the website, social media links, and other metadata, to generate a comprehensive description.
  - The extracted data is saved in:
    - full_scrape.txt: Contains detailed JSON data for the first 10 coins.
    - data_dir/description.txt: Stores token descriptions only.
  - Note: The scraping is limited to 10 coins to minimize API call costs.
- RAG Search System:
  - Enables robust search functionality that supports both semantic and keyword-based queries.
  - Provides answers to qualitative and quantitative questions about cryptocurrencies, based on the descriptions found in the data_dir/description.txt database.
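
As a rough illustration of how semantic and keyword-based scores can be combined into one ranking, here is a minimal, self-contained sketch. It is not the project's implementation: the function names, the `alpha` weighting, and the `toy_embed` stand-in are all hypothetical. In the real system the embeddings would presumably come from OpenAI via Llama-index rather than from hand-picked word counts.

```python
import math
import re
from typing import Callable, Sequence


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, doc: str) -> float:
    """Fraction of distinct query tokens that also occur in the document."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(doc))) / len(query_tokens)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    alpha: float = 0.5,
) -> list[tuple[float, str]]:
    """Score each doc as alpha * semantic + (1 - alpha) * keyword, best first."""
    query_vec = embed(query)
    scored = [
        (alpha * cosine(query_vec, embed(doc)) + (1 - alpha) * keyword_score(query, doc), doc)
        for doc in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: counts of topic words."""
    tokens = tokenize(text)
    topics = (("payment", "payments", "currency"), ("contract", "contracts", "dapps"))
    return [float(sum(tokens.count(word) for word in group)) for group in topics]


if __name__ == "__main__":
    descriptions = [
        "Bitcoin is a peer-to-peer digital currency for payments.",
        "Ethereum runs smart contracts and dapps.",
    ]
    for score, doc in hybrid_rank("smart contract platform", descriptions, toy_embed):
        print(f"{score:.3f}  {doc}")
```

Setting `alpha` closer to 1 favors semantic similarity, while values closer to 0 favor exact keyword overlap; the right balance depends on the kinds of questions being asked.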
- Llama-index
- OpenAI
- Crawl4AI
- Flask
- Use GitHub Codespaces to create a development environment.
- When prompted, ensure you build the development container for proper functionality.
- Use the provided Makefile for easy setup. Run the following command in the terminal:
  make all
  If the virtual environment doesn't activate after make all, try running poetry env activate in the terminal. This prints the activation command; run that command to activate the virtual environment.
- Create a .env file:
  - Use the .env_example file as a reference.
  - This file should include your OpenAI API key.
  - Note: This step is necessary if you plan to run scrapper.py and rag.py from the terminal.
  - Alternatively, you can view the scraped content in full_scrape.txt and token descriptions in data_dir/description.txt. The scraping process is limited to 10 coins to manage API costs.
- Run the Scraper and RAG Search System:
  - To scrape data, execute:
    python scrapper.py
  - To run the RAG system, execute:
    python rag.py
- Run the Flask Application:
  - You can skip running scrapper.py and rag.py directly.
  - Instead, start the Flask app by running:
    python app.py
  - Running the Flask app provides a URL where you can test the RAG search system through a simple HTML, CSS, and JavaScript interface.
- Note:
  - The project is not deployed online; it is designed to be run locally.
  - The scraping process is limited to 10 coins to efficiently manage API costs.
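
The 10-coin cap and the two output files described in this README could be sketched roughly as follows. This is an illustration only, not the code in scrapper.py: the `save_scrape` helper, the `name` and `description` record fields, and the exact file layout (a JSON array, and descriptions separated by blank lines) are assumptions.

```python
import json
from pathlib import Path

MAX_COINS = 10  # cap on scraped coins, to keep API costs down


def save_scrape(coins, out_dir="."):
    """Write at most MAX_COINS records to both output files; return the kept records."""
    kept = list(coins)[:MAX_COINS]
    root = Path(out_dir)
    data_dir = root / "data_dir"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Full records, serialized as a JSON array (assumed layout).
    (root / "full_scrape.txt").write_text(json.dumps(kept, indent=2), encoding="utf-8")

    # Descriptions only, separated by blank lines (assumed layout).
    descriptions = [coin.get("description", "") for coin in kept]
    (data_dir / "description.txt").write_text("\n\n".join(descriptions), encoding="utf-8")
    return kept


if __name__ == "__main__":
    # Hypothetical records; real ones would come from the scraper.
    sample = [{"name": f"coin{i}", "description": f"Description of coin {i}."} for i in range(12)]
    print(len(save_scrape(sample, "example_output")), "coins saved")
```

Applying the cap before any per-coin page is fetched, rather than only at write time, is what would actually save API calls; the slice here just shows where the limit sits in the data flow.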