111 changes: 98 additions & 13 deletions README.md
> There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living. Nelson Mandela.

# Distributed Training Framework
![Subnet25](https://github.com/HeliosPrimeOne/DistributedTraining/assets/89754687/9c2c5413-e07b-4ef0-8891-815429be032e)



## Welcome to Subnet 25



## Essential Guide for Bittensor Enthusiasts:
Beginners to Bittensor are encouraged to start with the basics by visiting the Bittensor official site. This foundational step is crucial for understanding the innovative landscape Bittensor is shaping.

## Launching Subnet Hivemind-25: Our Quest:
Our ambition is to train the largest Large Language Model (LLM) ever, harnessing the unique strengths of the Bittensor network in a completely decentralised manner. Our approach is rooted in transparency and open collaboration in the AI space.

## The Significance of Our Endeavor:
An overview of the "LMSYS Chatbot Arena Leaderboard" highlights the predominance of proprietary models. Our endeavor seeks to challenge this norm, ushering in a new phase of open and inclusive AI development practices.

![Screenshot from 2024-03-31 23-58-53](https://github.com/HeliosPrimeOne/DistributedTraining/assets/89754687/70c204a4-98ef-46d8-bda1-fc4b1aac7dd8)


## What we’ve done
After a period of intense experimentation and evaluation, we have successfully trained our inaugural model (tinygpt2), marking the first ever incentivized distributed training over the internet (incentivized with TAO).

We are currently in our next project phase (training a slightly larger GPT-2 model with 677M parameters), and we invite you to join our mission by dedicating your computational resources to this training run. All you have to do is run a miner, and it may just change the world.


## New Bounties
-- Improving our validation mechanism is key. If you want to tackle a bounty, reach out first (@bitcurrent on Discord) so we can agree on terms and timelines; once your proposal is accepted, you can start work. Please propose before executing!

## Frequently Asked Questions
-- What are the minimum requirements to run a validator? A GPU with at least 16 GB of VRAM, e.g. an RTX A4000.
-- What are the minimum requirements to run a miner? A GPU with at least 16 GB of VRAM, e.g. an RTX A4000.

## Running a Miner on HiveMind: A Step-by-Step Guide

## Running a Miner on Testnet
For detailed instructions on how to run a miner on the testnet, please refer to the following documentation: Running a Miner on Testnet

## Prerequisites
Before you start, ensure your system meets the following requirements:

- Your machine meets the minimum hardware requirements for mining on Subnet 25: a GPU with at least 16 GB of VRAM (e.g. an RTX A4000) for both miners and validators.
- You have enough TAO in your wallet to cover the registration fee (approx. 0.00001 TAO at the time of writing).
- Python 3.8 or higher is installed; this repository requires it.
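The Python requirement can be checked before installing anything. The sketch below is a minimal, hypothetical helper (`meets_python_requirement` is not part of this repository); it checks only the interpreter version, not the GPU or wallet requirements.

```python
import sys

# Minimum interpreter version listed in the prerequisites.
MIN_PYTHON = (3, 8)


def meets_python_requirement(version_info=None) -> bool:
    """Return True if version_info (default: the running interpreter) is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch number does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_python_requirement() else "too old, 3.8 or higher is required"
    print(f"Python {running}: {status}")
```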

## Setting Up
Clone the repository: start by cloning the Distributed Training repository.

```
git clone https://github.com/bit-current/DistributedTraining
```

Navigate to the repository: change into the cloned directory.

```
cd DistributedTraining
```

Install dependencies: install all necessary dependencies and run the post-install script.

```
pip install -e . && python post_install.py
```

You also need to install pm2.
On Linux:
```
sudo apt update && sudo apt install jq && sudo apt install npm && sudo npm install pm2 -g && pm2 update
```
On macOS:
```
brew update && brew install jq && brew install npm && sudo npm install pm2 -g && pm2 update
```


Register on Subnet 25 by running the following command:
```
btcli subnet register --netuid 25 --subtensor.network test --wallet.name miner --wallet.hotkey hotkey
```

Once you have installed this repo, you can run the miner and validator with auto-updates enabled using the following commands.

## To run the miner
```
chmod +x run_miner.sh
# --netuid, --subtensor.chain_endpoint: obtain these by following the instructions in the docs/running_on_*.md files
# --wallet.name, --wallet.hotkey: must be created using the Bittensor CLI (btcli)
# --logging.debug runs in debug mode; use --logging.trace instead for trace mode
pm2 start run_miner.sh --name distributed_training_miner_auto_update -- \
    --netuid <your netuid> \
    --subtensor.chain_endpoint <your chain url> \
    --wallet.name <your miner wallet> \
    --wallet.hotkey <your miner hotkey> \
    --logging.debug \
    --miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address
```

## To run the validator
```
chmod +x run_validator.sh
# --netuid, --subtensor.chain_endpoint: obtain these by following the instructions in the docs/running_on_*.md files
# --wallet.name, --wallet.hotkey: must be created using the Bittensor CLI (btcli)
# --logging.debug runs in debug mode; use --logging.trace instead for trace mode
# --axon.port: an open port to serve the Bittensor axon on
# --dht.port: another open port to serve the DHT on
# --dht.announce_ip: your device's IP address
pm2 start run_validator.sh --name distributed_training_auto_update -- \
    --netuid <your netuid> \
    --subtensor.chain_endpoint <your chain url> \
    --wallet.name <your validator wallet> \
    --wallet.hotkey <your validator hotkey> \
    --logging.debug \
    --axon.port <axon port> \
    --dht.port <dht port> \
    --dht.announce_ip <your device ip address>
```
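Both `--axon.port` and `--dht.port` need to be open, distinct ports. The sketch below checks, before `pm2 start`, that each port can be bound on this machine and that the two values differ. `port_is_free` and `check_validator_ports` are hypothetical helpers, not part of the repository.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Already bound by another process, or not permitted (e.g. ports < 1024 without root).
            return False
    return True


def check_validator_ports(axon_port: int, dht_port: int) -> list:
    """Collect problems with the --axon.port / --dht.port pair; an empty list means none found."""
    problems = []
    if axon_port == dht_port:
        problems.append("--axon.port and --dht.port must be different ports")
    for flag, port in (("--axon.port", axon_port), ("--dht.port", dht_port)):
        if not port_is_free(port):
            problems.append(f"{flag} {port} is already in use on this machine")
    return problems
```

An empty list from `check_validator_ports(<axon port>, <dht port>)` means both ports can be bound locally. It does not prove they are reachable from the internet, so firewall and NAT rules still need to allow inbound connections.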
## Distributed Training Framework

(WARNING: IN ACTIVE DEVELOPMENT)

```
pip install -e .
```

## Checkout the Dev Branch

```
git checkout test-lightning
```

## Build the Docker Image

Add this environment variable to your `.env` file:

```
INITIAL_PEERS="/ip4/peer_ip/tcp/peer_dht_port/p2p/12D3KooWE_some_hash_that_looks_like_this_VqgXKo9EUQ4hguny9"
--miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address
```
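The `INITIAL_PEERS` value is a libp2p-style multiaddress. The sketch below splits the `/ip4/<ip>/tcp/<port>/p2p/<peer_id>` shape shown above into its parts, which can help catch copy-paste mistakes before starting a miner. `parse_initial_peer` and `PeerAddress` are hypothetical names; the sketch deliberately handles only this one shape, while real multiaddresses can also use other protocols (for example `ip6`, `dns`, or `udp`).

```python
from typing import NamedTuple


class PeerAddress(NamedTuple):
    ip: str
    port: int
    peer_id: str


def parse_initial_peer(multiaddr: str) -> PeerAddress:
    """Split an /ip4/<ip>/tcp/<port>/p2p/<peer_id> multiaddress into its parts."""
    # Tolerate surrounding whitespace and the quotes used in the .env example.
    parts = multiaddr.strip().strip('"').split("/")
    # A leading "/" gives an empty first element: ["", "ip4", ip, "tcp", port, "p2p", peer_id]
    if (len(parts) != 7 or parts[0] != "" or parts[1] != "ip4"
            or parts[3] != "tcp" or parts[5] != "p2p" or not parts[6]):
        raise ValueError(f"unexpected multiaddress format: {multiaddr!r}")
    port = int(parts[4])  # raises ValueError if the port is not a number
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return PeerAddress(ip=parts[2], port=port, peer_id=parts[6])
```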

After that, you may join the training run with:
```
btcli regen_coldkey --mnemonic your super secret mnemonic
btcli regen_hotkey --mnemonic your super secret mnemonic
btcli s register --netuid 100 --subtensor.network test
btcli s register --netuid 25 --subtensor.network finney
```

## Miner Run Command

```
python miner.py --netuid 25 --wallet.name some_test_wallet_cold --wallet.hotkey some_test_wallet_hot --initial_peers (please add an existing miner's dht address here. Check discord pinned post or ask on discord channel)
pm2 start neurons/miner.py --interpreter python3 --name trainer -- --netuid 25 --wallet.name xxxxx --wallet.hotkey xxxxx --subtensor.network finney --logging.debug --miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address
```

## Validator

### Validators need at least 10 test TAO on testnet, or at least 1000 TAO on mainnet, to be able to set weights.

```
python validator.py --netuid 25 --wallet.name some_test_wallet_cold --wallet.hotkey some_test_wallet_hot --axon.external_ip your_external_ip --axon.port your_external_port --logging.debug --logging.trace --axon.ip your_external_ip_still --axon.external_port your_external_port_still --flask.host_address on_device_ip_to_bind_to --flask.host_port on_device_port_to_bind_to
pm2 start neurons/validator.py --interpreter python3 --name VAL -- --netuid 25 --wallet.name xxxx --wallet.hotkey xxxx --axon.external_ip x.x.x.x --axon.port xxxx --subtensor.network finney --logging.debug --axon.ip x.x.x.x --axon.external_port xxxx --flask.host_address 0.0.0.0 --flask.host_port xxxx
```

## Bug Reporting and Contributions
39 changes: 39 additions & 0 deletions entrypoint-miner.md
# Hive Mining Script

This script launches the HiveTrain miner, `hivetrain/hiveminer.py`, which joins the Subnet 25 distributed training run.

## Prerequisites

1. Make sure you have Python 3.8 or higher installed on your system.
2. Install the repository's dependencies by running `pip install -e .` from the repository root.

## Usage

```bash
#!/bin/bash

python3 hivetrain/hiveminer.py \
--initial_peers ${INITIAL_PEERS} \
--batch_size ${BATCH_SIZE} \
--save_every ${SAVE_EVERY}
```

The script runs the following command, which launches the HiveTrain miner with custom options:

```bash
python3 hivetrain/hiveminer.py [options]
```

## Options

- `--initial_peers <multiaddress> ...`: The DHT address(es) of existing peers used to join the training network, in the form `/ip4/<peer_ip>/tcp/<peer_dht_port>/p2p/<peer_id>`.
- `--batch_size <integer>`: The training batch size, i.e. how many samples the miner processes per training step.
- `--save_every <N>`: Saves the miner's state every `<N>` iterations. This helps preserve the miner's progress across system restarts or crashes.

## Running the Script

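1. Save the script in a file, for example `mine.sh`.
2. Set `INITIAL_PEERS`, `BATCH_SIZE`, and `SAVE_EVERY` to the desired values for your setup (for example, by exporting them as environment variables), or replace `${INITIAL_PEERS}`, `${BATCH_SIZE}`, and `${SAVE_EVERY}` in the script directly.
3. Run the script with `bash mine.sh`.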

This script is a thin wrapper around `hivetrain/hiveminer.py` that makes it easy to launch the miner with custom options from the command line.
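If you prefer to launch the miner from Python, the same command can be assembled from the environment variables and validated first. This is a hypothetical sketch (`build_miner_command` is not part of the repository); it splits `INITIAL_PEERS` on whitespace, as the unquoted `${INITIAL_PEERS}` in `mine.sh` does, and it assumes `BATCH_SIZE` and `SAVE_EVERY` must be positive integers.

```python
import os

REQUIRED_VARS = ("INITIAL_PEERS", "BATCH_SIZE", "SAVE_EVERY")


def build_miner_command(env=None) -> list:
    """Return the argv that mine.sh runs, or raise if a variable is missing or malformed."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Assumption: both values must be positive integers.
    for name in ("BATCH_SIZE", "SAVE_EVERY"):
        value = env[name]
        if not value.isdecimal() or int(value) == 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return [
        "python3", "hivetrain/hiveminer.py",
        # Mirror bash word-splitting of the unquoted ${INITIAL_PEERS}.
        "--initial_peers", *env["INITIAL_PEERS"].split(),
        "--batch_size", env["BATCH_SIZE"],
        "--save_every", env["SAVE_EVERY"],
    ]
```

From the repository root, `subprocess.run(build_miner_command(), check=True)` would then start the miner with the current environment's values.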
33 changes: 33 additions & 0 deletions entrypoint-validator.md
# HiveTrain Validator Script

This script runs the `validator.py` file from the `hivetrain` directory with Python 3. Its `#!/bin/bash` shebang line makes the system execute the script itself with Bash.

```bash
#!/bin/bash

# Start the execution of validator.py with Python 3
python3 hivetrain/validator.py
```

## Prerequisites

- A working Python environment (Python 3.8 or higher) should be installed and configured on your system.
- The HiveTrain project, which includes the `validator.py` file, should be present in the specified directory `hivetrain`.

## Usage

1. Make sure you have all required packages for running the script.
2. Save the script in a `.sh` file with an appropriate name, e.g., `run_validator.sh`.
3. Grant execution permissions to the script using the command: `chmod +x run_validator.sh`
4. Run the script using: `./run_validator.sh`

## Configuration

The script passes no extra options by default, but you can add arguments to the `validator.py` call, for example:

```bash
python3 hivetrain/validator.py \
--port 4000
```

Replace the value of `--port 4000` with your desired configuration option, if needed. For more information about available options, consult the `validator.py` documentation or use the command `python3 hivetrain/validator.py -h`.
22 changes: 22 additions & 0 deletions hivetrain/__init__.md
# Module Documentation

## Overview
This module defines the package version and imports the sub-modules for connecting to the Bittensor network (`btt_connector`) and handling request authentication (`auth`).

```python
__version__ = "0.3.0"
version_split = __version__.split(".")
__spec_version__ = (100 * int(version_split[0])) + (10 * int(version_split[1])) + (1 * int(version_split[2]))
```

### Version Management
The version of the software is stored as a string in the `__version__` variable. The code splits this string into its major, minor, and patch components with `split(".")` and combines them into a single integer stored in `__spec_version__`.

Each component is multiplied by a weight (100 for major, 10 for minor, and 1 for patch) and the results are summed, so version `0.3.0` gives 100 × 0 + 10 × 3 + 1 × 0 = 30. This is a compact integer encoding of the version rather than part of the semantic versioning standard, and it keeps one decimal digit per component only while the minor and patch numbers stay below 10.
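A standalone copy of the formula makes the worked calculation easy to check and shows what happens once a minor or patch number reaches 10. `spec_version` is a hypothetical helper for illustration; it reproduces the calculation from the snippet above but is not part of the module.

```python
def spec_version(version: str) -> int:
    """Combine "major.minor.patch" as 100*major + 10*minor + patch, like __spec_version__."""
    major, minor, patch = (int(part) for part in version.split("."))
    return 100 * major + 10 * minor + patch


print(spec_version("0.3.0"))  # 30
# A patch number of 10 overlaps with the next minor version:
print(spec_version("0.1.10"), spec_version("0.2.0"))  # 20 20
```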

## Imported Sub-Modules
### btt_connector
The `btt_connector` sub-module handles the connection to the Bittensor network, abstracting away the details of setting it up so the rest of the code can interact with the network in a simple way.

### auth
The `auth` sub-module provides the `authenticate_request_with_bittensor` decorator, which authenticates incoming Flask requests by checking that the sender is registered in the Bittensor metagraph, verifying the request signature, and applying rate limiting.
51 changes: 51 additions & 0 deletions hivetrain/auth.md
# Authenticate Request with Bittensor Decorator

This Python script defines a decorator function called `authenticate_request_with_bittensor` that can be used to authenticate and authorize incoming requests in Flask applications. The decorator uses Bittensor's metagraph, wallet, and rate limiter for authentication and verification.

## Dependencies

To use the `authenticate_request_with_bittensor` decorator, you need to import the following modules:

- `functools`: For using the `wraps()` function.
- `flask`: For accessing the request data and creating response objects.
- `bittensor`: The Bittensor library for handling various tasks such as metagraph, wallet, and rate limiter.
- `logging`: For logging error messages.
- `substrateinterface`: For handling public key verification.

Additionally, you need to import the necessary functions, classes, and variables from the specified modules.

## Metagraph Syncing

The snippet does not show this, but the metagraph should be synced before the decorator uses it, for example by uncommenting `metagraph = bittensor.metagraph()`. This can be done outside the decorator function.

## Logger Initialization

The logger is initialized with a minimum log level of DEBUG:

```python
import logging

logger = logging.getLogger('waitress')
logger.setLevel(logging.DEBUG)
```

## Decorator Logic

The `authenticate_request_with_bittensor` decorator function takes an existing function `f` as its argument and returns a new decorated function. The decorated function checks the incoming request data for required authentication information such as message, signature, public address, and miner version. It then performs the following checks:

1. Check if the necessary data is present in the request. If not, return an error with status code 400.
2. Check if the miner version is correct. If not, return an error with status code 403.
3. Check if the public address is registered in the metagraph. If not, return an error with status code 403.
4. Perform signature verification using either Bittensor's wallet or Substrateinterface.
5. Check if the rate limiter allows the request from the given public address. If not, return an error with status code 429.

If all checks pass, the decorated function is executed with the original arguments and keyword arguments. Otherwise, an appropriate error message and response are returned.
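The five checks listed above can be sketched without Flask or Bittensor so their order and status codes are easy to test. This is a hypothetical illustration, not the module's code: the request field names are assumed, an HMAC-SHA256 check stands in for the Bittensor wallet / `substrateinterface` signature verification, a simple sliding-window limiter stands in for the rate limiter the decorator uses, and the 403 returned for a failed signature is an assumption because the status code for that case is not documented.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque

# Hypothetical field names; the real request keys may differ.
REQUIRED_FIELDS = ("message", "signature", "public_address", "miner_version")


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)

    def allow(self, key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget requests that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def make_demo_verifier(secrets: dict):
    """HMAC-SHA256 stand-in for sr25519 signature verification (demo only)."""
    def verify(address: str, message: str, signature: str) -> bool:
        key = secrets.get(address)
        if key is None:
            return False
        expected = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected.encode(), str(signature).encode())
    return verify


def check_request(data: dict, *, expected_version, registered, verify, limiter):
    """Apply the five checks in order and return (status_code, message)."""
    if any(field not in data for field in REQUIRED_FIELDS):
        return 400, "missing authentication data"
    if data["miner_version"] != expected_version:
        return 403, "miner version mismatch"
    if data["public_address"] not in registered:
        return 403, "address not registered in the metagraph"
    if not verify(data["public_address"], data["message"], data["signature"]):
        # Assumption: the status code for a bad signature is not documented.
        return 403, "signature verification failed"
    if not limiter.allow(data["public_address"]):
        return 429, "rate limit exceeded"
    return 200, "ok"
```

In a Flask decorator, `check_request` would run on the parsed request body before the wrapped view is called, returning the message and status code whenever the status is not 200.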

## Usage

To use this decorator in your Flask application, you can simply apply it to the endpoint or view function:

```python
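from flask import Flask
from hivetrain.auth import authenticate_request_with_bittensor  # module path assumed from this file's name

app = Flask(__name__)

@app.route('/some_endpoint')
@authenticate_request_with_bittensor
def some_function():
    # Function logic goes here
    return "OK"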
```