Dcentralab Assignment

Project Description

This project combines a web scraper with a RAG (Retrieval-Augmented Generation) search system that supports both semantic and keyword-based search.

Components:

Web Scraper:
- Scrapes the website https://cryptorank.io/all-coins-list to extract URLs for individual cryptocurrency coins.
- Then scrapes each coin's page for its website, social media links, and other metadata, and uses that data to generate a description of the coin.
- The extracted data is saved in:
  - full_scrape.txt: Contains detailed JSON data for the first 10 coins.
  - data_dir/description.txt: Stores token descriptions only.
- Note: The scraping is limited to 10 coins to minimize API call costs.
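
The two-stage flow above (collect coin URLs, cap them at 10, then save the results) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code in scrapper.py: the project itself uses Crawl4AI, and the "/price/" link prefix, the "description" key, and the one-JSON-object-per-line layout of full_scrape.txt are all hypothetical.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin

BASE_URL = "https://cryptorank.io/all-coins-list"
MAX_COINS = 10  # the cap on scraped coins described in this README


class CoinLinkParser(HTMLParser):
    """Collects hrefs that look like coin pages (the '/price/' prefix is an assumption)."""

    def __init__(self, prefix="/price/"):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.prefix):
            url = urljoin(BASE_URL, href)
            if url not in self.links:  # keep the first occurrence, preserve order
                self.links.append(url)


def extract_coin_urls(html, limit=MAX_COINS):
    """Return at most `limit` absolute coin URLs found in the listing HTML."""
    parser = CoinLinkParser()
    parser.feed(html)
    return parser.links[:limit]


def save_results(coins, out_dir="."):
    """Write full records and description-only text (file layout is an assumption)."""
    out = Path(out_dir)
    (out / "data_dir").mkdir(parents=True, exist_ok=True)
    with open(out / "full_scrape.txt", "w", encoding="utf-8") as f:
        for coin in coins:
            f.write(json.dumps(coin) + "\n")  # one JSON object per line
    with open(out / "data_dir" / "description.txt", "w", encoding="utf-8") as f:
        for coin in coins:
            f.write(coin["description"] + "\n")  # hypothetical 'description' key
```

Capping the URL list before visiting any coin page is what keeps the number of paid API calls bounded.
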

RAG Search System:
- Supports both semantic and keyword-based queries.
- Answers qualitative and quantitative questions about cryptocurrencies, using the descriptions stored in data_dir/description.txt.
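
To make the idea of combining semantic and keyword search concrete, here is a toy sketch that blends a vector-similarity score with a keyword-overlap score. It is not how rag.py works: the project relies on Llama-index and OpenAI, whereas the embed function below is a bag-of-words stand-in for a real embedding model, and the function names and the alpha weight are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, doc):
    """Fraction of query terms that appear verbatim in the document."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def embed(text):
    """Stand-in for an embedding model: a term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=3):
    """Rank docs by alpha * vector similarity + (1 - alpha) * keyword overlap."""
    q_vec = embed(query)
    scored = []
    for doc in docs:
        score = alpha * cosine(q_vec, embed(doc)) + (1 - alpha) * keyword_score(query, doc)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, hybrid_search("smart contracts platform", descriptions) returns up to three (score, description) pairs, best match first. With real embeddings, the vector term would also match paraphrases that share no words with the query, which is the part the keyword score cannot cover.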

Libraries Used

- Llama-index
- OpenAI
- Crawl4AI
- Flask

Running the Project

1. Create a Codespace and Build the Development Container

Create a development environment with GitHub Codespaces.
When prompted, build the development container so that the project's tooling is set up correctly.

2. Install Dependencies and Activate the Virtual Environment

Use the provided Makefile for easy setup. Run the following command in the terminal:
```
make all
```
If the virtual environment is not active after make all finishes, run poetry env activate in the terminal. This prints the command that activates the environment; run that command to activate it.

Create a .env file:
- Use the .env_example file as a reference.
- This file should include your OpenAI API key.
- Note: This step is required only if you plan to run scrapper.py and rag.py from the terminal.
- Otherwise, you can read the already-scraped content in full_scrape.txt and the token descriptions in data_dir/description.txt without running anything.
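
As a rough illustration of what the .env step provides, the sketch below parses simple KEY=VALUE lines and checks that an API key is present. The variable name OPENAI_API_KEY is an assumption based on the name the OpenAI library reads by default; check .env_example for the name the project actually expects. The helper names are hypothetical, and the project may load the file differently.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Copy values from the file into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)


def require_api_key(env=os.environ):
    """Return the API key, or fail early with a clear message (assumed variable name)."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message is friendlier than letting the first API call fail partway through a scrape.
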

Run the Scraper and RAG Search System:
- To scrape data, execute:
```
python scrapper.py
```
- To run the RAG system, execute:
```
python rag.py
```

Run the Flask Application:
- Instead of running scrapper.py and rag.py directly, you can start the Flask app:
```
python app.py
```
- The app prints a local URL on startup. Open it in a browser to test the RAG search system through a simple HTML, CSS, and JavaScript interface.

Note:
- The project is not deployed online; it is designed to be run locally.
- The scraping process is limited to 10 coins to efficiently manage API costs.
