diff --git a/README.md b/README.md index ef9c107..8c710cb 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,97 @@ -> There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living. Nelson Mandela. -# Distributed Training Framework +![Subnet25](https://github.com/HeliosPrimeOne/DistributedTraining/assets/89754687/9c2c5413-e07b-4ef0-8891-815429be032e) + +##  There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living. Nelson Mandela. + + +## Welcome to Subnet 25 + + + +## Essential Guide for Bittensor Enthusiasts: +Beginners to Bittensor are encouraged to start with the basics by visiting the official Bittensor site. This foundational step is crucial for understanding the innovative landscape Bittensor is shaping. + +## Launching Subnet Hivemind-25: Our Quest: +Our ambition is to train the largest ever Large Language Model (LLM), harnessing the unique strengths of the Bittensor network in a completely decentralised manner. Our approach is rooted in transparency and open collaboration in the AI space. + +## The Significance of Our Endeavor: +An overview of the "LMSYS Chatbot Arena Leaderboard" highlights the predominance of proprietary models. Our endeavor seeks to challenge this norm, ushering in a new phase of open and inclusive AI development practices. + +![Screenshot from 2024-03-31 23-58-53](https://github.com/HeliosPrimeOne/DistributedTraining/assets/89754687/70c204a4-98ef-46d8-bda1-fc4b1aac7dd8) + + +## What we’ve done +After a period of intense experimentation and evaluation, we have successfully trained our inaugural model (tinygpt2), marking the first ever incentivized distributed training over the internet (incentivized with TAO). 
+ +We are currently in our next project phase (training a slightly larger GPT2 model - 677m params), and we invite you to join our mission by dedicating your computational resources towards this training run. All you have to do is run a miner, and it may just change the world. + + +## New Bounties +-- Improving our validation mechanism is key, reach out if you want to tackle a bounty so we are aligned on terms and timelines. (@bitcurrent on discord) - if accepted you can execute. Please propose before execution! + +## Frequently Asked Questions +-- What are the minimum requirements to run a validator? A GPU with a minimum of 16GB RAM e.g. RTX A4000 +-- What are the minimum requirements to run a miner? A GPU with a minimum of 16GB RAM e.g. RTX A4000 + +## Running a Miner on HiveMind : A Step-by-Step Guide + +## Running a Miner on Testnet +For detailed instructions on how to run a miner on the testnet, please refer to the following documentation: Running a Miner on Testnet + +## Prerequisites +Before you start, ensure your system meets the following requirements: + +Your machine meets the minimum hardware requirements for mining on subnet 25: miner/validator GPU - 16GB RAM e.g. RTX A4000. +You have the requisite amount of tao in your wallet for registration fees (approx. 0.00001 Tao at the time of writing). +This repository requires python3.8 or higher. + +## Setting Up +Clone the Repository: Start by cloning the Distributed Training repository. + +git clone https://github.com/bit-current/DistributedTraining + +Navigate to the Repository: Change your directory to the cloned repository. +cd DistributedTraining + +Install Dependencies: Install all necessary dependencies and run post-install scripts. +pip install -e . && python post_install.py + +You also need to install pm2. 
+On Linux: +sudo apt update && sudo apt install jq && sudo apt install npm && sudo npm install pm2 -g && pm2 update +On macOS: +brew update && brew install jq && brew install npm && sudo npm install pm2 -g && pm2 update + + +Register on Subnet 25: To register, execute the following command: +btcli subnet register --netuid 25 --subtensor.network test --wallet.name miner --wallet.hotkey hotkey + +Once you have installed this repo, you can run the miner and validator with auto updates enabled using the following commands. + +## To run the miner +``` +chmod +x run_miner.sh +pm2 start run_miner.sh --name distributed_training_miner_auto_update -- + --netuid # Must be obtained by following the instructions in the docs/running_on_*.md files + --subtensor.chain_endpoint # Must be obtained by following the instructions in the docs/running_on_*.md files + --wallet.name # Must be created using the bittensor-cli + --wallet.hotkey # Must be created using the bittensor-cli + --logging.debug # Run in debug mode, alternatively --logging.trace for trace mode + --miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address +``` + +## To run the validator +``` +chmod +x run_validator.sh +pm2 start run_validator.sh --name distributed_training_auto_update -- + --netuid # Must be obtained by following the instructions in the docs/running_on_*.md files + --subtensor.chain_endpoint # Must be obtained by following the instructions in the docs/running_on_*.md files + --wallet.name # Must be created using the bittensor-cli + --wallet.hotkey # Must be created using the bittensor-cli + --logging.debug # Run in debug mode, alternatively --logging.trace for trace mode + --axon.port + --dht.port + --dht.announce_ip +``` +## Distributed Training Framework (WARNING: IN ACTIVE DEVELOPMENT) @@ -33,12 +124,6 @@ cd DistributedTraining pip install -e . 
``` -## Checkout the Dev Branch - -``` -git checkout test-lightning -``` - ## Build the Docker Image ``` @@ -58,7 +143,7 @@ This will be provided for you. For the latest, see the pinned post on the Discor Add this environment variable to your `.env` file: ``` -INITIAL_PEERS="/ip4/peer_ip/tcp/peer_dht_port/p2p/12D3KooWE_some_hash_that_looks_like_this_VqgXKo9EUQ4hguny9" +--miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address ``` After that, you may join the training run with: @@ -98,21 +183,21 @@ pip install -e . ``` btcli regen_coldkey --mnemonic your super secret mnemonic btcli regen_hotkey --mnemonic your super secret mnemonic -btcli s register --netuid 100 --subtensor.network test +btcli s register --netuid 25 --subtensor.network finney ``` ## Miner Run Command ``` -python miner.py --netuid 25 --wallet.name some_test_wallet_cold --wallet.hotkey some_test_wallet_hot --initial_peers (please add an existing miner's dht address here. Check discord pinned post or ask on discord channel) +pm2 start neurons/miner.py --interpreter python3 --name trainer -- --netuid 25 --wallet.name xxxxx --wallet.hotkey xxxxx --subtensor.network finney --logging.debug --miner.bootstrapping_server http://35.239.40.23:4999/return_dht_address ``` ## Validator -### Validators need to have at least 10 test TAO to be able to set weights. +### Validators need to have at least 1000 TAO to be able to set weights. 
``` -python validator.py --netuid 25 --wallet.name some_test_wallet_cold --wallet.hotkey some_test_wallet_hot --axon.external_ip your_external_ip --axon.port your_external_port --logging.debug --logging.trace --axon.ip your_extrenal_ip_still --axon.external_port your_external_port_still --flask.host_address on_device_ip_to_bind_to --flask.host_port on_device_port_to_bind_to +pm2 start neurons/validator.py --interpreter python3 --name VAL -- --netuid 25 --wallet.name xxxx --wallet.hotkey xxxx --axon.external_ip x.x.x.x --axon.port xxxx --subtensor.network finney --logging.debug --axon.ip x.x.x.x --axon.external_port xxxx --flask.host_address 0.0.0.0 --flask.host_port xxxx ``` ## Bug Reporting and Contributions diff --git a/entrypoint-miner.md b/entrypoint-miner.md new file mode 100644 index 0000000..bc27dcb --- /dev/null +++ b/entrypoint-miner.md @@ -0,0 +1,39 @@ + # Hive Mining Script + +This script launches the HiveTrain miner, `hivetrain/hiveminer.py`, which takes part in the distributed training run. + +## Prerequisites + +1. Make sure you have Python 3 installed on your system. +2. Install this repository and its dependencies as described in the README. + +## Usage + +```bash +#!/bin/bash + +python3 hivetrain/hiveminer.py \ + --initial_peers ${INITIAL_PEERS} \ + --batch_size ${BATCH_SIZE} \ + --save_every ${SAVE_EVERY} +``` + +The script wraps the following command, which launches the miner with customized options: + +```bash +python3 hivetrain/hiveminer.py [options] +``` + +## Options + +- `--initial_peers`: A comma-separated list of initial peers to connect to when joining the training run. +- `--batch_size`: The batch size used for training. +- `--save_every`: How often the miner saves its current state. 
This option helps maintain the miner's progress and settings across system restarts or crashes. + +## Running the Script + +1. Save the script in a file, for example `mine.sh`. +2. Replace `${INITIAL_PEERS}`, `${BATCH_SIZE}`, and `${SAVE_EVERY}` with the desired values for your setup, or export them as environment variables before running the script. +3. Run the script using: `bash mine.sh` + +This script is a simple wrapper around `hiveminer.py`, providing an easy way to launch it with custom options from the command line. \ No newline at end of file diff --git a/entrypoint-validator.md b/entrypoint-validator.md new file mode 100644 index 0000000..6a73e40 --- /dev/null +++ b/entrypoint-validator.md @@ -0,0 +1,33 @@ + # HiveTrain Validator Script + +This script runs the `validator.py` file from the `hivetrain` directory with Python 3. The script's shebang line sets the interpreter to `/bin/bash`. + +```bash +#!/bin/bash + +# Start the execution of validator.py with Python 3 +python3 hivetrain/validator.py \ + --port 4000 +``` + +## Prerequisites + +- A working Python environment (Python 3.8 or higher, as the README requires) should be installed and configured on your system. +- The HiveTrain project, which includes the `validator.py` file, should be present in the `hivetrain` directory. + +## Usage + +1. Make sure you have all required packages for running the script. +2. Save the script in a `.sh` file with an appropriate name, e.g., `run_validator.sh`. +3. Grant execution permissions to the script using the command: `chmod +x run_validator.sh` +4. Run the script using: `./run_validator.sh` + +## Configuration + +The script has no configuration options of its own, but you can modify the arguments passed to `validator.py` in the following lines: + +```bash +python3 hivetrain/validator.py \ + --port 4000 +``` + +Replace the value of `--port 4000` with your desired port, or add other arguments, if needed. 
For more information about available options, consult the `validator.py` documentation or use the command `python3 hivetrain/validator.py -h`. \ No newline at end of file diff --git a/hivetrain/__init__.md b/hivetrain/__init__.md new file mode 100644 index 0000000..2d7a63c --- /dev/null +++ b/hivetrain/__init__.md @@ -0,0 +1,22 @@ + # Module Documentation + +## Overview +This module defines the version of the software. It also imports the `btt_connector` and `auth` sub-modules. + +```python +__version__ = "0.3.0" +version_split = __version__.split(".") +__spec_version__ = (100 * int(version_split[0])) + (10 * int(version_split[1])) + (1 * int(version_split[2])) +``` + +### Version Management +The version of the software is stored as a string in the `__version__` variable. The script then splits this string into its major, minor, and patch components using the `split()` method. These components are then combined into a single integer, which is stored in the `__spec_version__` variable. + +The integer is obtained by multiplying each component by a factor (100 for major, 10 for minor, and 1 for patch) and summing the results, so version `0.3.0` yields `0 * 100 + 3 * 10 + 0 * 1 = 30`. This is a compact numeric encoding of the version string rather than part of the semantic versioning standard. Its digits map one-to-one onto major, minor, and patch only while the minor and patch components stay below 10. + +## Imported Sub-Modules +### btt_connector +The `btt_connector` sub-module contains the logic for interacting with the Bittensor network, including the wallet, subtensor, and metagraph objects. It hides the details of setting up these objects so that the rest of the code can use them through a single interface. + +### auth +The `auth` sub-module provides functions for authenticating incoming requests. 
These functions verify the request's signature, check that the sender is registered in the metagraph, and apply rate limiting. \ No newline at end of file diff --git a/hivetrain/auth.md b/hivetrain/auth.md new file mode 100644 index 0000000..339d38a --- /dev/null +++ b/hivetrain/auth.md @@ -0,0 +1,51 @@ + # Authenticate Request with Bittensor Decorator + +This Python script defines a decorator function called `authenticate_request_with_bittensor` that can be used to authenticate and authorize incoming requests in Flask applications. The decorator uses Bittensor's metagraph, wallet, and rate limiter for authentication and verification. + +## Dependencies + +To use the `authenticate_request_with_bittensor` decorator, you need to import the following modules: + +- `functools`: For using the `wraps()` function. +- `flask`: For accessing the request data and creating response objects. +- `bittensor`: The Bittensor library for handling various tasks such as metagraph, wallet, and rate limiter. +- `logging`: For logging error messages. +- `substrateinterface`: For handling public key verification. + +## Metagraph Syncing + +Although not included in the provided code snippet, it's recommended to ensure that the metagraph is synced before the decorator uses it, for example by uncommenting `metagraph = bittensor.metagraph()`. This can be done outside the decorator function. + +## Logger Initialization + +The logger is initialized with a minimum log level of DEBUG: + +```python +logger = logging.getLogger('waitress') +logger.setLevel(logging.DEBUG) +``` + +## Decorator Logic + +The `authenticate_request_with_bittensor` decorator function takes an existing function `f` as its argument and returns a new decorated function. 
The decorated function checks the incoming request data for required authentication information such as message, signature, public address, and miner version. It then performs the following checks: + +1. Check if the necessary data is present in the request. If not, return an error with status code 400. +2. Check if the miner version is correct. If not, return an error with status code 403. +3. Check if the public address is registered in the metagraph. If not, return an error with status code 403. +4. Perform signature verification using either Bittensor's wallet or Substrateinterface. +5. Check if the rate limiter allows the request from the given public address. If not, return an error with status code 429. + +If all checks pass, the decorated function is executed with the original arguments and keyword arguments. Otherwise, an appropriate error message and response are returned. + +## Usage + +To use this decorator in your Flask application, you can simply apply it to the endpoint or view function: + +```python +@app.route('/some_endpoint') +@authenticate_request_with_bittensor +def some_function(): + # Function logic goes here +``` \ No newline at end of file diff --git a/hivetrain/btt_connector.md b/hivetrain/btt_connector.md new file mode 100644 index 0000000..e91d5f3 --- /dev/null +++ b/hivetrain/btt_connector.md @@ -0,0 +1,107 @@ + # Bittensor Network Module Documentation + +This documentation describes the logic and functionality of the `BittensorNetwork` Python module. The module is designed to interact with the Bittensor network, managing wallets, subtensors, metagraphs, configurations, and various network-related tasks. + +## Contents +1. [Import Statements](#import-statements) +2. [Initialization](#initialization) +3. 
[Functions](#functions) + * [initialize_bittensor_objects()](#initialize_bittensor_objects) + * [check_registered(netuid)](#check_registered) + * [resync_metagraph()] + * [should_sync_metagraph(last_sync_time, sync_interval)] + * [sync(last_sync_time, sync_interval, config)] + * [serve_extrinsic()] + * [serve_axon(netuid, host_address, external_address, host_port, external_port)] +4. [BittensorNetwork Class](#bittensornetwork-class) + * [__new__(cls)] + * [initialize(config)] + * [set_weights(scores)] + * [should_set_weights()] + * [detect_metric_anomaly(metric="loss", OUTLIER_THRESHOLD=2, MEDIAN_ABSOLUTE_DEVIATION=True)] + * [run_evaluation()] + * [rate_limiter(public_address, n=10, t=60)] + +## Import Statements + +The module imports the following libraries and modules: + +```python +import bittensor as bt +import copy +import math +import numpy as np +import torch +import time +from typing import List, Tuple +import bittensor.utils.networking as net +import threading +import logging +from . import __spec_version__ +``` + +## Initialization + +The `BittensorNetwork` module initializes the necessary objects for interacting with the Bittensor network. These objects include wallets, subtensors, metagraphs, and configurations. The initialization process checks if a hotkey is registered on the specified netuid, creates new objects if not, and sets up various locks for thread safety. + +## Functions + +### `initialize_bittensor_objects()` + +This function initializes the Bittensor wallet, subtensor, metagraph, and configuration based on the provided base config. If the config's mock flag is set to True, mock objects are created instead of real ones for testing purposes. + +### `check_registered(netuid)` + +Checks if a hotkey with the specified netuid is registered on the current Subtensor network and prints an error message if not. 
+ +### `resync_metagraph()` + +Resynchronizes the metagraph with the latest state from the Bittensor network, updating the local copy of the metagraph object. + +### `should_sync_metagraph(last_sync_time, sync_interval)` + +Determines if the metagraph should be synced based on the last sync time and specified sync interval. + +### `sync(last_sync_time, sync_interval, config)` + +Synchronizes the metagraph with the latest state from the Bittensor network if it's been enough time since the last sync. The function also handles any errors or exceptions that may occur during synchronization. + +### `serve_extrinsic()` + +Subscribes a Bittensor endpoint to the subtensor chain by serving an extrinsic with the specified parameters. The function checks if the axon information is up-to-date before attempting to serve and returns True if successful or False otherwise. + +### `serve_axon(netuid, host_address, external_address, host_port, external_port)` + +Initializes a new Axon instance with the specified parameters and serves it on the network using the Subtensor API. The function returns the newly created Axon object. + +## BittensorNetwork Class + +The `BittensorNetwork` class is a singleton designed to manage various aspects of the Bittensor network, such as wallets, subtensors, metagraphs, configurations, and thread safety. + +### `__new__(cls)` + +Initializes a new instance of the `BittensorNetwork` class, creating instances of the required objects (wallet, subtensor, metagraph, and config), registering the hotkey on the network if necessary, and setting up locks for thread safety. + +### `initialize(config)` + +Initializes the Bittensor wallet, subtensor, metagraph, and configuration based on the provided base config, just like the `initialize_bittensor_objects()` function. + +### `set_weights(scores)` + +Sets the neuron weights with the specified scores on the Subtensor network. 
The function processes the raw scores to fit within Subtensor's limitations and sends them to the network for storage. + +### `should_set_weights()` + +Determines if the neuron weights should be updated based on the current block and last update of the metagraph. Returns a boolean value indicating whether it's time to set new weights. + +### `detect_metric_anomaly(metric="loss", OUTLIER_THRESHOLD=2, MEDIAN_ABSOLUTE_DEVIATION=True)` + +Detects anomalies in the specified metric for each miner by calculating their scores based on whether they are outliers. Returns a dictionary with public addresses as keys and their corresponding scores as values. + +### `run_evaluation()` + +Evaluates the miners based on their model checksum consensus or metric anomalies, setting new weights accordingly if necessary. Clears both the `model_checksums` and `metrics_data` dictionaries after each evaluation. + +### `rate_limiter(public_address, n=10, t=60)` + +Checks if the specified public address has exceeded the maximum number of requests within the given time window. If so, it adds the address to a blacklist for the specified time period. Otherwise, it allows the request to proceed. \ No newline at end of file diff --git a/hivetrain/config/__init__.md b/hivetrain/config/__init__.md new file mode 100644 index 0000000..7ef6fde --- /dev/null +++ b/hivetrain/config/__init__.md @@ -0,0 +1,29 @@ + # Subnet Configuration Logic + +This documentation describes the logic behind a Python script that imports modules from various files to configure subnets. + +## Dependencies + +The following Python code relies on three imported modules: `base_subnet_config`, `config`, and `hivetrain_config`. These modules provide different functionalities necessary for the subnet configuration logic. 
+ +```python +from .base_subnet_config import * # Importing functions and variables from base_subnet_config +from .config import check_config, Configurator # Importing check_config function and Configurator class from config module +from .hivetrain_config import * # Importing functions and variables from hivetrain_config +``` + +## Base Subnet Configuration + +The `base_subnet_config` module is the foundation for the subnet configuration logic, where "subnet" means a Bittensor subnet rather than an IP subnet. It defines a `check_config` function and the argument helpers `add_neuron_args`, `add_miner_args`, and `add_validator_args`, which register command-line options such as `--netuid` and `--neuron.device` on an `argparse` parser. The wildcard import makes these helpers available from the package. + +## Config Validation and Classes + +The `config` module validates the configuration object with its `check_config` function. It also provides the `Configurator` class, whose `combine_configs` static method gathers the wallet, subtensor, logging, axon, and role-specific arguments into a single configuration object. + +## HiveTrain Configuration + +The `hivetrain_config` module defines the argument helpers that are specific to HiveTrain: `add_meta_miner_args`, `add_torch_miner_args`, and `add_orchestrator_args`. The wildcard import makes them available from the package as well. + +## Combining Logic + +`Configurator.combine_configs` builds one parser, applies the argument helpers from `base_subnet_config` and `hivetrain_config` to it, and returns the resulting configuration. `check_config` can then validate that configuration and set up the neuron directory and event log. 
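The argument helpers in these modules share one pattern: each receives an `argparse` parser and registers its own options on it. The following is a minimal sketch of that pattern using only the standard library, not the project's code. The two helpers are trimmed copies of `add_neuron_args` and `add_validator_args`, `build_parser` is a hypothetical name introduced for illustration, and the real `Configurator` passes the parser to `bt.config` instead of calling `parse_args` directly.

```python
import argparse


def add_neuron_args(parser: argparse.ArgumentParser) -> None:
    """Register a few neuron options (trimmed copy for illustration)."""
    parser.add_argument("--netuid", type=int, help="Subnet netuid", default=1)
    parser.add_argument(
        "--neuron.epoch_length",
        type=int,
        help="How often weights are set, measured in 12 second blocks.",
        default=100,
    )


def add_validator_args(parser: argparse.ArgumentParser) -> None:
    """Register one validator option (trimmed copy for illustration)."""
    parser.add_argument(
        "--neuron.timeout",
        type=float,
        help="The timeout for each forward call in seconds.",
        default=10,
    )


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: apply every add_*_args function to one parser."""
    parser = argparse.ArgumentParser(description="Combined configuration sketch")
    add_neuron_args(parser)
    add_validator_args(parser)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = build_parser().parse_args(["--netuid", "25", "--neuron.timeout", "5"])

# With plain argparse, a dotted option name is stored under the same dotted attribute name.
print(args.netuid, getattr(args, "neuron.epoch_length"), getattr(args, "neuron.timeout"))
```

Keeping each role's options in its own function lets the miner, validator, and orchestrator share one parser without duplicating option definitions.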
\ No newline at end of file diff --git a/hivetrain/config/base_subnet_config.md b/hivetrain/config/base_subnet_config.md new file mode 100644 index 0000000..a083825 --- /dev/null +++ b/hivetrain/config/base_subnet_config.md @@ -0,0 +1,181 @@ + # The MIT License (MIT) +# Copyright © 2023 Yuma Rao +# Copyright © 2023 Opentensor Foundation + +# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated +# documentation files (the “Software”), to deal in the Software without restriction, including without limitation +# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, +# and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in all copies or substantial portions of +# the Software. + +# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO +# THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +# DEALINGS IN THE SOFTWARE. 
+ +```python +# Import necessary libraries and modules +import os +import torch +import argparse +import bittensor as bt +from loguru import logger + +# Checks/validates the config namespace object +def check_config(config): + """Checks and validates the configuration object.""" + # Call check_config function from logging module in bittensor package + bt.logging.check_config(RCFile=config) + + # Define full path for neuron directory using config object values + full_path = os.path.expanduser( + "{}/{}/{}/netuid{}/{}".format( + config.logging.logging_dir, # TODO: change from ~/.bittensor/miners to ~/.bittensor/neurons + config.wallet.name, + config.wallet.hotkey, + config.netuid, + config.neuron.name, + ) + ) + + # Assign expanded user path to neuron's full_path attribute + config.neuron.full_path = os.path.expanduser(full_path) + + # If not set to save events, create and configure event logger + if not config.neuron.dont_save_events: + logger.level("EVENTS", no=38, icon="📝") + logger.add( + os.path.join(config.neuron.full_path, "events.log"), + rotation=config.neuron.events_retention_size, + serialize=True, + enqueue=True, + backtrace=False, + diagnose=False, + level="EVENTS", + format="{time:YYYY-MM-DD at HH:mm:ss} | {level} | {message}", + ) + +# Add relevant arguments to the parser for neuron operation +def add_neuron_args(parser): + """Adds neuron specific arguments to the parser.""" + parser.add_argument("--netuid", type=int, help="Subnet netuid", default=1) + + parser.add_argument( + "--neuron.device", + type=str, + help="Device to run on.", + default="cuda" if torch.cuda.is_available() else "cpu", + ) + + parser.add_argument( + "--neuron.epoch_length", + type=int, + help="The default epoch length (how often we set weights, measured in 12 second blocks).", + default=100, + ) + + parser.add_argument( + "--mock", + action="store_true", + help="Mock neuron and all network components.", + default=False, + ) + + parser.add_argument( + 
"--neuron.events_retention_size", + type=str, + help="Events retention size.", + default="2 GB", + ) + + parser.add_argument( + "--neuron.dont_save_events", + action="store_true", + help="If set, we dont save events to a log file.", + default=False, + ) + + parser.add_argument( + "--neuron.initial_peers", + type=str, + nargs="+", + help="Initial peers to connect to.", + default=None, + ) + +# Add arguments specific to miner operation +def add_miner_args(parser): + """Adds miner specific arguments to the parser.""" + + parser.add_argument( + "--blacklist.force_validator_permit", + action="store_true", + help="If set, we will force incoming requests to have a permit.", + default=False, + ) + + parser.add_argument( + "--blacklist.allow_non_registered", + action="store_true", + help="If set, miners will accept queries from non registered entities. (Dangerous!)", + default=False, + ) + +# Add arguments specific to validator operation +def add_validator_args(parser): + """Adds validator specific arguments to the parser.""" + + parser.add_argument( + "--neuron.timeout", + type=float, + help="The timeout for each forward call in seconds.", + default=10, + ) + + parser.add_argument( + "--neuron.num_concurrent_forwards", + type=int, + help="The number of concurrent forwards running at any time.", + default=1, + ) + + parser.add_argument( + "--neuron.sample_size", + type=int, + help="The number of miners to query in a single step.", + default=50, + ) + + parser.add_argument( + "--neuron.disable_set_weights", + action="store_true", + help="Disables setting weights.", + default=False, + ) + + parser.add_argument( + "--neuron.moving_average_alpha", + type=float, + help="Moving average alpha parameter, how much to add of the new observation.", + default=0.1, + ) + + parser.add_argument( + "--neuron.axon_off", + "--axon_off", + action="store_true", + help="Set this flag to not attempt to serve an Axon.", + default=False, + ) + + parser.add_argument( + "--neuron.vpermit_tao_limit", 
+ type=int, + help="The maximum number of TAO allowed to query a validator with a vpermit.", + default=4096, + ) +``` +This Python script is part of the Bittensor network implementation. It includes several functions and classes for configuring and running a neuron on the network using the provided configuration object and command line arguments. The code also includes checks to validate the configuration object and create necessary directories for logging events. The `check_config` function validates the `bt.Config` object, sets the full path of the neuron directory based on configuration values, and initializes the event logger if not disabled. The `add_neuron_args`, `add_miner_args`, and `add_validator_args` functions add relevant command line arguments to the parser for specific use cases (neuron, miner, or validator) when using the script. \ No newline at end of file diff --git a/hivetrain/config/config.md b/hivetrain/config/config.md new file mode 100644 index 0000000..3e36226 --- /dev/null +++ b/hivetrain/config/config.md @@ -0,0 +1,75 @@ + # Bittensor Configurator Documentation + +This Python script outlines the configuration setup for a Bittensor node. It utilizes various libraries, including Torch, Argparse, and the logging library Loguru. The script also imports custom configurations from `hivetrain_config` and `base_subnet_config`. + +## Importing Required Libraries +```python +import os +import torch +import argparse +import bittensor as bt +from loguru import logger +from argparse import ArgumentParser +import bittensor.hivetrain_config as hivetrain_config # s, bittensor.base_subnet_config as base_subnet_config +from .hivetrain_config import add_meta_miner_args, add_orchestrator_args, add_torch_miner_args, add_validator_args +from .base_subnet_config import add_neuron_args, add_validator_args, add_miner_args +``` +Here, we import the necessary modules and functions for our script. 
We also import custom argument helper functions from `hivetrain_config` and `base_subnet_config`. + +## Checking the Config Object +```python +def check_config(cls, config: "bt.Config"): + """Checks/validates the config namespace object.""" + bt.logging.check_config(config) + + full_path = os.path.expanduser( + "{}/{}/{}/netuid{}/{}".format( + config.logging.logging_dir, # TODO: change from ~/.bittensor/miners to ~/.bittensor/neurons + config.wallet.name, + config.wallet.hotkey, + config.netuid, + config.neuron.name, + ) + ) + print("full path:", full_path) + config.neuron.full_path = os.path.expanduser(full_path) + if not os.path.exists(config.neuron.full_path): + os.makedirs(config.neuron.full_path, exist_ok=True) + + if not config.neuron.dont_save_events: + # Add custom event logger for the events. + logger.level("EVENTS", no=38, icon="📝") + logger.add( + os.path.join(config.neuron.full_path, "events.log"), + rotation=config.neuron.events_retention_size, + serialize=True, + enqueue=True, + backtrace=False, + diagnose=False, + level="EVENTS", + format="{time:YYYY-MM-DD at HH:mm:ss} | {level} | {message}", + ) +``` +The `check_config` function first validates the configuration object through `bt.logging.check_config`. It then builds the full path to the neuron directory from the logging directory, wallet name, hotkey, netuid, and neuron name, and creates the directory if it doesn't exist. Finally, unless `neuron.dont_save_events` is set, it registers a custom `EVENTS` log level and writes those events to `events.log` in that directory. 
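The path construction in `check_config` can be tried in isolation. The following is a minimal sketch, not the project's code: `neuron_full_path` and the `SimpleNamespace` stand-in for `bt.Config` are assumptions made for illustration. The wallet name, hotkey, and netuid are borrowed from the README's registration command, the logging directory from the TODO comment, and the neuron name `miner` is hypothetical.

```python
import os
from types import SimpleNamespace


def neuron_full_path(config) -> str:
    """Build the neuron directory path with the same format string as check_config."""
    return os.path.expanduser(
        "{}/{}/{}/netuid{}/{}".format(
            config.logging.logging_dir,
            config.wallet.name,
            config.wallet.hotkey,
            config.netuid,
            config.neuron.name,
        )
    )


# Hypothetical values standing in for a real bt.Config object.
example_config = SimpleNamespace(
    logging=SimpleNamespace(logging_dir="~/.bittensor/miners"),
    wallet=SimpleNamespace(name="miner", hotkey="hotkey"),
    netuid=25,
    neuron=SimpleNamespace(name="miner"),
)

example_path = neuron_full_path(example_config)
print(example_path)
```

Because the wallet name, hotkey, and netuid are all part of the path, each registered hotkey on each subnet gets its own directory for `events.log`.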
+ +## Configurator Class +```python +class Configurator: + @staticmethod + def combine_configs(): + parser = ArgumentParser(description="Unified Configuration for Bittensor") + bt.wallet.add_args(parser) + bt.subtensor.add_args(parser) + bt.logging.add_args(parser) + bt.axon.add_args(parser) + + add_torch_miner_args(parser) + add_meta_miner_args(parser) + add_orchestrator_args(parser) + add_neuron_args(parser) + add_miner_args(parser) + add_validator_args(parser) + args = parser.parse_args() + return bt.config(parser)[0] +``` +The `Configurator` class combines all the arguments into a single configuration object using Argparse. The resulting configuration can be accessed by calling the `combine_configs` method. \ No newline at end of file diff --git a/hivetrain/config/hivetrain_config.md b/hivetrain/config/hivetrain_config.md new file mode 100644 index 0000000..2357d05 --- /dev/null +++ b/hivetrain/config/hivetrain_config.md @@ -0,0 +1,70 @@ + # Bittensor Meta-Miner and Torch Miner Arguments Documentation + +## Table of Contents +1. [Introduction](#intro) +2. [Importing Necessary Libraries and Modules](#imports) +3. [Defining Argument Parsers for Meta-Miner and Torch Miner](#args) +4. [`add_meta_miner_args` Function](#add-meta-miner-args) +5. [`add_torch_miner_args` Function](#add-torch-miner-args) +6. [`add_orchestrator_args` Function](#add-orchestrator-args) +7. [Conclusion](#conclusion) + + +## Introduction +This documentation explains the logic behind the provided Python code, which consists of several functions and imports necessary for running Bittensor Meta-Miner and Torch Miner applications. The code snippet is structured to add custom arguments to `argparse` for easier configuration of these applications. + + +## Importing Necessary Libraries and Modules +First, the script imports essential libraries and modules: + +1. `os`: Operating system dependent functionality +2. `torch`: PyTorch machine learning library +3. 
`argparse`: For parsing command-line arguments +4. `bittensor as bt`: Bittensor framework for decentralized machine learning +5. `logger from loguru`: For logging messages + + +## Defining Argument Parsers for Meta-Miner and Torch Miner +The script then defines three functions: + +1. `add_meta_miner_args(parser)` +2. `add_torch_miner_args(parser)` +3. `add_orchestrator_args(parser)` + +These functions are used to add specific arguments for Meta-Miner, Torch Miner, and Orchestrator respectively. + + +## `add_meta_miner_args` Function +The `add_meta_miner_args(parser)` function sets up the arguments for Meta-Miner: + +1. **Boolean flag -** `--meta-miner.log-activity`: Displays logging message every request +2. **String argument -** `--meta-miner.orchestrator-url`: URL of the orchestrator +3. **String default argument -** `--miner-script`: The miner script to execute for training (default value: "miner_cpu.py") +4. **Integer arguments -** `--miner.batch-size`, `--miner.epochs`: Batch size per forward/backward pass and number of epochs to train +5. **String list arguments -** `--miner.validator-urls`, `--miner.tcp-store-address`: URLs of the validators for local testing only (accepts multiple values) +6. **String argument -** `--bootstrapping_server`: Bootstrapping server address +7. **String argument -** `--flask.host_address`: URLs of the validators for local testing only +8. **Integer argument -** `--flask.host_port`: URLs of the validators for local testing only + + +## `add_torch_miner_args(parser)` Function +The `add_torch_miner_args(parser)` function sets up the arguments for Torch Miner: + +1. **Integer argument -** `--rank`: Rank of process/node in training run +2. **Integer argument -** `--world-size`: Number of processes/nodes in training run +3. **String argument -** `--store-address`: IP/URL of the TCPStore +4. **Integer argument -** `--store-port`: Port of the test TCPStore +5. **List argument -** `--initial_peers`: Add a peer. 
Can be used multiple times to pass multiple peers. +6. **Integer argument -** `--batch_size`: The largest batch size able to fit on your GPU. +7. **Integer argument -** `--save_every`: Save the model every X global steps. + + +## `add_orchestrator_args(parser)` Function +The `add_orchestrator_args(parser)` function sets up the arguments for Orchestrator: + +1. **Integer argument -** `--port`: Port for the orchestrator +2. **String argument -** `--host-address`: Host address for the orchestrator + + +## Conclusion +These functions provide a clear way to define and parse custom arguments specific to Bittensor Meta-Miner, Torch Miner, and Orchestrator. This modular approach makes it easy to configure these applications based on the required settings. \ No newline at end of file diff --git a/hivetrain/utils/auto_update.md b/hivetrain/utils/auto_update.md new file mode 100644 index 0000000..3a9194c --- /dev/null +++ b/hivetrain/utils/auto_update.md @@ -0,0 +1,36 @@ + # Monitor GitHub repository and run a script when the `__init__.py` file is updated + +## Table of Contents +1. [Introduction](#introduction) +2. [Dependencies](#dependencies) +3. [Functions Description](#functions-description) +4. [monitor_repo() Function](#monitor-repo-function) +5. [Run Script (run_script()) Function](#run-script-function) +6. [Get Latest Commit SHA (get_latest_commit_sha()) Function](#get-latest-commit-sha-function) + +## Introduction +This Python script monitors a GitHub repository and runs a local script whenever the `__init__.py` file is updated. The script uses `os`, `subprocess`, `time`, and `requests` libraries. + +## Dependencies +- python: >=3.7 +- git: For cloning and pulling the repository +- pm2: To start and stop the local script + +## Functions Description +### `run_script(repo_dir)` +This function checks if the given repository directory exists, removes it if it does, installs a package (hivetrain) using pip, stops an existing PM2 process, and starts the script using pm2. 
+ +### `get_latest_commit_sha(repo_owner, repo_name, file_path)` +This function retrieves the latest commit SHA for the given GitHub repository file path. + +### `monitor_repo()` +This function checks if the GitHub repository's `__init__.py` file has been updated and runs the script if an update is detected. It also installs the package (hivetrain) and starts/stops the script using pm2 when necessary. + +## monitor_repo() Function +The `monitor_repo()` function monitors a GitHub repository for changes to the `__init__.py` file, installs the package, and starts/stops the script accordingly. It checks every 60 seconds (adjustable). + +## Run Script (run_script()) Function +The `run_script()` function clones or pulls the latest version of a GitHub repository, installs the package using pip, stops an existing pm2 process, and starts a new one. It then returns to the original working directory. + +## Get Latest Commit SHA (get_latest_commit_sha()) Function +The `get_latest_commit_sha()` function uses the GitHub API to get the latest commit SHA for a given file in a repository. If the request is successful, it returns the latest commit SHA; otherwise, it prints an error message and returns None. \ No newline at end of file diff --git a/hivetrain/utils/bootstrap_server.md b/hivetrain/utils/bootstrap_server.md new file mode 100644 index 0000000..b71b405 --- /dev/null +++ b/hivetrain/utils/bootstrap_server.md @@ -0,0 +1,89 @@ + # DHT Manager Documentation + +This Python script is designed to manage a Distributed Hash Table (DHT) network. It uses the Hivemind library for the DHT functionality and Waitress to serve the Flask application as a production-ready web server. This documentation explains the logic behind the provided code. 
+ +## Importing Libraries +```python +import argparse +from flask import Flask, jsonify +import hivemind +import time +import random +import threading +import logging +import sys +``` +The script starts by importing required libraries and modules: +- `argparse`: For parsing command line arguments. +- `Flask`: A web framework for building web applications in Python. +- `hivemind`: The library used to create and manage DHTs. +- `time`: For handling time-related functionalities. +- `random`: To select a random element from the list. +- `threading`: To create thread-safe locks. +- `logging`: For setting up logging. + +## Logging Configuration +```python +logging.basicConfig(level=logging.ERROR) +logger = logging.getLogger('bootstrap') +logger.setLevel(logging.INFO) +handler = logging.StreamHandler() +formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') +handler.setFormatter(formatter) +logger.addHandler(handler) +``` +The root logger is configured at the ERROR level. A dedicated `bootstrap` logger is then set to INFO and given a `StreamHandler` (which writes to stderr by default) with a timestamped formatter. + +## Creating and Managing DHTs +The `check_and_manage_dhts` function checks each DHT in the list for availability by opening a temporary test connection to it. If the connection fails, the function terminates that DHT and removes it from the list. If the list then holds fewer than 10 DHTs, the function creates one new DHT, connected to the existing ones, and appends it, so repeated calls grow the list toward 10. 
+ +```python +def check_and_manage_dhts(): + global last_checked + + for dht in dht_list: + try: + test_dht = hivemind.DHT(initial_peers=[str(dht.get_visible_maddrs()[0])], start=True) + test_dht.shutdown() + except Exception as e: + dht.terminate() + dht_list.remove(dht) + + if len(dht_list) < 10: + initial_peers = [dht.get_visible_maddrs()[0] for dht in dht_list] + new_dht = hivemind.DHT(host_maddrs=[f"/ip4/{args.host_address}/tcp/0", f"/ip4/{args.host_address}/udp/0/quic"], initial_peers=initial_peers, start=True) + dht_list.append(new_dht) + + last_checked = time.time() +``` + +## Flask Application Setup +A new Flask application is created with `__name__` as its import name. An empty list named `dht_list` stores the interconnected DHTs, and a global variable `last_checked` is initialized to 0. A lock object named `lock` is created for thread-safe access to shared resources. + +## Before Request Hook +```python +@app.before_request +def before_request(): + global last_checked + + if (time.time() - last_checked > 100) and len(dht_list) > 0: + check_and_manage_dhts() +``` +A before-request hook runs ahead of every incoming request. If more than 100 seconds have passed since the last check and the list contains at least one DHT, it calls `check_and_manage_dhts()` to drop unresponsive DHTs and add a new one while the list holds fewer than 10. + +## Routes +The script defines a single route named `/return_dht_address`, which returns initial peer addresses that another node can use to connect to an available DHT. The route checks whether the list contains any DHTs, selects one at random, and returns its initial peers as a JSON response. If the list is empty, it creates a new DHT and adds it to the list before returning its initial peers. 
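
The selection logic described for the route can be sketched without Flask or Hivemind. This is a simplified illustration, not the route itself: `pick_peer_address` and the sample multiaddress are hypothetical, and the real route works with `hivemind.DHT` objects and wraps the result in a JSON response.

```python
import random


def pick_peer_address(addresses, create_address, rng=random):
    """Return one known address at random, creating one first if the list is empty."""
    if not addresses:
        # Mirrors the "create a new DHT when none is available" branch.
        addresses.append(create_address())
    return rng.choice(addresses)


known = []
# The address below is a placeholder, not a real peer.
first = pick_peer_address(known, lambda: "/ip4/127.0.0.1/tcp/4001")
print(first, known)
```

Choosing a random entry spreads joining nodes across the available DHTs instead of pointing every newcomer at the same peer.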
+ +## Main Function +The script serves the Flask application with Waitress rather than `app.run()`. Waitress is a production WSGI server, whereas Flask's built-in server is intended for development. + +```python +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='DHT Manager') + parser.add_argument('--host_address', type=str, default="0.0.0.0", help='Machine\'s internal IP') + parser.add_argument('--host_port', type=int, default=5000, help='Port number (default: 5000)') + parser.add_argument('--external_address', type=str, default="20.20.20.20", help='Machine\'s external IP') + args = parser.parse_args() + serve(app, host=args.host_address, port=args.host_port) +``` +The main block builds an `argparse` parser, parses the command-line arguments, and then passes the application instance, host address, and port to Waitress's `serve()` function. \ No newline at end of file diff --git a/hivetrain/utils/bootstrap_stress.md b/hivetrain/utils/bootstrap_stress.md new file mode 100644 index 0000000..354461d --- /dev/null +++ b/hivetrain/utils/bootstrap_stress.md @@ -0,0 +1,42 @@ + # Stress Test Client Documentation + +This Python script is designed to perform a stress test on a given server by sending multiple concurrent HTTP requests and measuring the response time. The following components make up the logic of this script: + +## Import Statements +```python +import argparse +import asyncio +import aiohttp +import random +import time +``` +The import statements load the modules required to run the stress test. `argparse` parses command-line arguments, `asyncio` and `aiohttp` handle asynchronous tasks and HTTP requests, respectively, and `random` and `time` are used for randomizing request order and measuring elapsed time. 
+ +## `ping_server` Function (Async) +```python +async def ping_server(session, server_url): + ... +``` +The `ping_server` function sends a single HTTP GET request to the provided server URL using an asynchronous session created with `aiohttp.ClientSession()`. If the response status code is 200 OK, it extracts the DHT address from the server's response and calculates the latency time. The function prints the received DHT address and its corresponding latency to the console. + +## `stress_test` Function (Async) +```python +async def stress_test(server_url, num_requests, concurrent_requests): + ... +``` +The `stress_test` function is responsible for sending multiple requests in parallel. It creates a given number of tasks each running the `ping_server` function and limits the active number of tasks based on the specified `concurrent_requests`. Once the limit is reached, it waits for all finished tasks to complete before starting new ones. This process repeats until all the required number of requests (`num_requests`) have been sent. + +## `run_stress_test` Function (Async) +```python +async def run_stress_test(server_url, num_requests, concurrent_requests, duration): + ... +``` +The `run_stress_test` function is the main entry point of the script. It sets up the server URL, number of requests, and concurrent requests based on the command-line arguments provided. The function then runs an asynchronous loop that continues for the specified test duration. During each iteration of the loop, it calls `stress_test` with the given configuration to send the required number of requests in parallel. The loop prints a message at the end of each completed iteration, and finally, it prints a summary message once the stress test has ended. + +## Command-Line Arguments (argparse) +```python +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Stress Test Client') + ... 
+``` +The script's main logic is wrapped in an `if __name__ == '__main__'` block, which initializes the `argparse` parser and sets up various arguments such as `--server_url`, `--num_requests`, `--concurrent_requests`, and `--duration`. These arguments can be provided when running the script from the command line to customize its behavior. Finally, it calls `asyncio.run()` to execute the `run_stress_test` function asynchronously using the provided arguments. \ No newline at end of file diff --git a/neurons/miner.md b/neurons/miner.md new file mode 100644 index 0000000..de2ee36 --- /dev/null +++ b/neurons/miner.md @@ -0,0 +1,254 @@ + # HiveMiner Python Code Documentation + +This Python script is designed for training a machine learning model using PyTorch and Bittensor's Hivemind decentralized training platform. The code imports necessary libraries, sets up configurations, and trains the model in a distributed manner. + +## Importing Libraries +```python +import argparse # For command-line arguments +import ipaddress # For IP address manipulation +import logging # For logging +import os # For file system operations +import random # For random number generation +import re # For regular expressions +import sys # For interacting with the Python interpreter +from functools import partial # For creating function wrappers +import math # For mathematical functions +import bittensor as bt # Bittensor library for Hivemind integration +from bittensor import metagraph # Import Metagraph module for handling network connections + +import numpy as np # For numerical operations +import requests # For making HTTP requests +import torch # PyTorch library for machine learning and deep learning +from datasets import load_dataset # For loading remote datasets +from hivemind.utils.networking import log_visible_maddrs +from lightning.fabric.utilities.seed import reset_seed, seed_everything +from lightning.pytorch import LightningModule, LightningTrainer +from lightning.pytorch.callbacks 
import Callback +from lightning.pytorch.core.datamodule import LightningDataModule +from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer + +from hivetrain.btt_connector import ( + BittensorNetwork, + # get_validator_uids_and_addresses, + serve_axon, +) +from hivetrain.config import Configurator +``` + +## Configuration +```python +logging.getLogger("lightning.pytorch").setLevel(logging.INFO) +logger = logging.getLogger("lightning.pytorch") + +args = Configurator.combine_configs() + +# ... (rest of the code) +``` + +This section initializes the logging system and merges various configuration files into a single `args` object, which will be used throughout the script. + +## Helper Functions +```python +def flatten_list(nested_list): + """Flattens a nested list.""" + if nested_list and isinstance(nested_list[0], list): + # Assumes only one level of nesting + return [item for sublist in nested_list for item in sublist] + return nested_list + +# ... (rest of the code) +``` + +This section defines a helper function `flatten_list()`, which is used to flatten lists that may have one level of nesting. + +## Basic Configuration Values +```python +inital_peers_request = requests.get(args.miner.bootstrapping_server) +initial_peers = inital_peers_request.json()["initial_peers"] +assert not (initial_peers is None) +# initial_peers = flatten_list(args.initial_peers) +batch_size = args.batch_size +save_every = args.save_every +block_size = 512 +num_steps = 100_000_000_000 # infinite training +target_batch_size = 81920 # when to average all weights. + +dataset_config = { + "dataset": "tiiuae/falcon-refinedweb", + "key": "content", + "split": "train", + "block_size": block_size, +} +``` + +This section sets some basic configuration values such as batch size, save interval, initial peers list from the `args` object, and dataset-related configurations. 
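
To make the relationship between `batch_size` and `target_batch_size` concrete, here is a rough back-of-the-envelope sketch. It assumes that averaging is triggered once all peers together have processed `target_batch_size` samples and that every peer runs at the same speed. The helper name, the batch size of 16, and the peer count of 10 are hypothetical values chosen for illustration.

```python
import math


def local_steps_per_round(target_batch_size, batch_size, num_peers):
    """Rough number of local steps each peer runs before the swarm reaches the target."""
    samples_per_swarm_step = batch_size * num_peers
    return math.ceil(target_batch_size / samples_per_swarm_step)


# 81920 is the target_batch_size from the configuration above;
# 16 and 10 are assumed values, not taken from the script.
print(local_steps_per_round(81920, 16, 10))  # 512
```

Under these assumptions, adding peers or raising the per-device batch size shortens the time between averaging rounds.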
+ +## Initializing Model Components +```python +config = AutoConfig.from_pretrained( + "gpt2", + n_embd=block_size, + n_ctx=block_size, + n_layer=2, + n_head=2, + n_positions=block_size, + n_inner=block_size * 4, + resid_pdrop=0.1, + embd_pdrop=0.1, + attn_pdrop=0.1, + summary_first_dropout=0.1, + layer_norm_epsilon=1e-5, + initializer_range=0.05, + summary_type="cls_index", + summary_proj_to_labels=True, + summary_use_proj=True, + torch_dtype=torch.bfloat16, +) + +print(config) + +model = AutoModelForCausalLM.from_config(config) +tokenizer = AutoTokenizer.from_pretrained( + "openai-community/gpt2", + cache_dir="/tmp/tokenizer", + padding="max_length", + padding_side="left", + use_fast=True, + return_overflowing_tokens=True, + truncation=True, +) +tokenizer.pad_token = tokenizer.eos_token +``` + +This section initializes and loads the model components using Hugging Face's `AutoConfig` and `AutoModelForCausalLM` classes. Additionally, it loads the tokenizer for converting text to tensors. + +## Dataset Handling +```python +class StreamingDataModule(LightningDataModule): + # ... (class definition) + +class StreamingDataset(IterableDataset): + # ... (class definition) + +dataset = StreamingDataModule(tokenizer, dataset_config) +``` + +This section defines two custom classes `StreamingDataModule` and `StreamingDataset` for handling streaming datasets. The `StreamingDataModule` class wraps the `StreamingDataset` instance to be compatible with PyTorch's `LightningDataModule`. + +## Model Training +```python +class MinerTrainer(LightningModule): + # ... 
(class definition) + +hparams = dict( + learning_rate=0.001, + weight_decay=0.1, + eps=1e-8, + warmup_steps=10, + batch_size=batch_size, + num_steps=num_steps, + block_size=block_size, +) + +strategy = HivemindStrategy( + run_id=f"hiveminer_{str(__spec_version__)}", + batch_size=batch_size, + target_batch_size=target_batch_size, + initial_peers=initial_peers, + use_ipfs=False, + use_relay=True, + use_auto_relay=True, + verbose=False, + wait_timeout=180, + bootstrap_timeout=135, + matchmaking_time=360.0, + averaging_timeout=600.0, + delay_state_averaging=True, + delay_grad_averaging=True, + delay_optimizer_step=True, + offload_optimizer=True, + reuse_grad_buffers=False, + # grad_compression=Float16Compression(), + # state_averaging_compression=Float16Compression(), + # load_state_compression=NoCompression(), + # scheduler_fn=partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.9999), +) + +visible_addresses = [ + str(a) + for a in strategy.dht.get_visible_maddrs() + if not ipaddress.ip_address(a.values()[0]).is_loopback +] + +log_visible_maddrs(strategy.dht.get_visible_maddrs(), only_p2p=False) +# my_ids = [] +# pattern = r"(/p2p/.*)" +# for peer in list(visible_addresses): +# match = re.search(pattern, peer) +# if match: +# my_ids.append(match.group(1)) + +# for peer in list(set(my_ids)): +# print(f"PEER-ID: {peer}") + +params = set_trainable_parameters(model, hparams) +optimizer = AdamW(params, lr=hparams["learning_rate"], eps=hparams["eps"]) + +class MinerConsoleLogging(Callback): + # ... (class definition) + +class MinerModelSaver(Callback): + # ... (class definition) + +class ValidationCommunicator(Callback): + # ... 
(class definition) + +train_params = dict( + accelerator="auto", + strategy=strategy, + devices="auto", + max_steps=num_steps * target_batch_size, + max_epochs=-1, + reload_dataloaders_every_n_epochs=1, + precision="32-true", + accumulate_grad_batches=1, # must be 1 for Hivemind training + gradient_clip_val=1.0, + gradient_clip_algorithm="norm", + benchmark=True, + enable_progress_bar=False, + callbacks=[], +) + +# Set weights as trainable (looks like this is useless) +# def set_trainable_parameters(model, hparams): +# no_decay = ["bias", "LayerNorm.weight"] +# grouped_parameters = [] + +# for n, p in model.named_parameters(): +# if not p.requires_grad: +# continue + +# if any(nd in n for nd in no_decay): +# weight_decay = 0.0 +# else: +# weight_decay = hparams["weight_decay"] + +# grouped_parameters.append( +# { +# "params": [p], +# "weight_decay": weight_decay, +# } +# ) + +# Set model parameters as trainable +params = set_trainable_parameters(model, hparams) + +optimizer = AdamW(params, lr=hparams.get("learning_rate", 0.001), eps=hparams.get("eps", 1e-8)) + +train_model = MinerTrainer(model, optimizer, hparams) + +trainer = Trainer(**train_params) +trainer.fit(train_model, dataset) +``` + +This section initializes the Hivemind strategy and sets up various training-related configurations. Then, it creates a PyTorch trainer instance and trains the model using the provided dataset with the defined `MinerTrainer` class. \ No newline at end of file diff --git a/neurons/validator.md b/neurons/validator.md new file mode 100644 index 0000000..c2837f2 --- /dev/null +++ b/neurons/validator.md @@ -0,0 +1,44 @@ + # Bittensor Validator Flask App Documentation + +This Python script is a Flask application designed to validate models and collect metrics from validators in the Bittensor network. The app uses various libraries, including `threading`, `bittensor`, `argparse`, `Flask`, `numpy`, `time`, `hivetrain`, `waitress`, and `logging`. 
+ +## Import Statements + +The script starts by importing essential libraries and modules: + +- `threading`: For using locks to secure concurrent access to shared data. +- `bittensor as bt`: The main library for interacting with the Bittensor network. +- `argparse`: For parsing command line arguments. (Not used in this script) +- `Flask`, `request`, and `jsonify`: For creating and handling HTTP requests. +- `numpy as np`: For numerical computations. +- `time`: For measuring time intervals. +- `hivetrain.auth` and `hivetrain.config`: Custom modules from the HiveTrain project for authentication and configuration. +- `hivetrain.btt_connector`: Custom module from the HiveTrain project for interacting with the Bittensor network. +- `__spec_version__`: For accessing the current version of the script. +- `torch`: For implementing machine learning models. (Not used in this script) +- `logging`: For creating and managing loggers. + +## Initialization + +After importing required libraries, the script sets up a logger for Waitress, initializes Flask app, and defines some global variables with their default values: + +- `last_evaluation_time`: The time of the last model evaluation. +- `evaluation_interval`: The interval between two consecutive model evaluations in seconds. +- `sync_interval`: The synchronization interval in seconds between the validator and the Bittensor network. +- `last_sync_time`: The time of the last synchronization with the Bittensor network. +- `config`: A configuration object from HiveTrain's Configurator class. +- `BittensorNetwork`, `wallet`, `subtensor`, and `metagraph`: Instances of the corresponding classes in the hivetrain.btt_connector module for interacting with the Bittensor network. +- `model_checksums_lock` and `metrics_data_lock`: Threading locks to secure concurrent access to shared data. +- `evaluation_time_lock` and `sync_time_lock`: Threading locks to secure concurrent access to shared variables. 
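
The pairing of a timestamp with its own lock can be sketched as follows. This is a minimal illustration of the pattern, not the validator's code: `should_evaluate` is a hypothetical helper and the 300-second interval is a placeholder value.

```python
import threading
import time

evaluation_interval = 300  # seconds; placeholder, not taken from the script
last_evaluation_time = 0.0
evaluation_time_lock = threading.Lock()


def should_evaluate(now=None):
    """Return True at most once per interval, updating the timestamp under the lock."""
    global last_evaluation_time
    now = time.time() if now is None else now
    with evaluation_time_lock:
        if now - last_evaluation_time >= evaluation_interval:
            last_evaluation_time = now
            return True
        return False
```

Checking and updating the timestamp inside the same `with` block prevents two request threads from both deciding that an evaluation is due.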
+ +## Authentication and Metrics Handling Functions (Commented Out) + +The script defines two functions, `verify_model_checksum` and `detect_metric_anomaly`, for handling model checksums and detecting anomalous metrics respectively. Both functions are commented out in the provided code. + +## Flask Routes + +The script registers a `before_request` hook that evaluates models and synchronizes the validator with the network before processing any request. It also defines two routes for handling model validation and metric submission requests, respectively. Both routes use the `@authenticate_request_with_bittensor` decorator for authentication checks. + +## Starting the App + +Finally, the script starts the Flask app by initializing an Axon object from Bittensor, serves it using Waitress, and runs the application. \ No newline at end of file diff --git a/post_install.md b/post_install.md new file mode 100644 index 0000000..5264f1c --- /dev/null +++ b/post_install.md @@ -0,0 +1,64 @@ + # Documentation for `remove_nest_asyncio_import.py` + +This Python script, named `remove_nest_asyncio_import.py`, is designed to remove an import statement of a specific package (`nest_asyncio`) from the `__init__.py` file of another package called `bittensor`. Additionally, it uninstalls the `nest_asyncio` package if found in the system. + +## Dependencies + +To run this script, you need to have Python and the following packages installed: +- `os` (Python built-in library) +- `pkg_resources` (provided by `setuptools`) + +## Functionality + +The main function of the script is defined as `remove_nest_asyncio_import()`. It contains logic for locating the `bittensor` package, checking if its `__init__.py` file exists, removing any import statements related to `nest_asyncio`, and uninstalling the `nest_asyncio` package if it is installed. 
+ +### Locate the installation directory of bittensor + +```python +distribution = pkg_resources.get_distribution('bittensor') +bittensor_path = distribution.location +``` + +This code uses `pkg_resources` to get information about the installed `bittensor` package and retrieves its installation directory. + +### Check for existence of bittensor __init__.py + +```python +if not os.path.exists(init_path): + print("bittensor __init__.py not found. Ensure bittensor is correctly installed.") + return +``` + +This block checks if the `__init__.py` file for `bittensor` exists in its installation directory. If it doesn't, an error message is printed and the function returns without making changes. + +### Remove nest_asyncio imports from bittensor __init__.py + +```python +with open(init_path, 'r') as file: + lines = file.readlines() + +with open(init_path, 'w') as file: + for line in lines: + if 'nest_asyncio' not in line: + file.write(line) +``` + +This block reads the contents of `bittensor/__init__.py`, drops every line that mentions `nest_asyncio`, and writes the remaining lines back to the file. + +### Uninstall nest_asyncio + +```python +os.system("pip uninstall -y nest_asyncio") +``` + +This command uses `os.system()` to run pip and uninstall the `nest_asyncio` package. + +## Usage + +To use this script, simply run it with Python: + +```bash +python remove_nest_asyncio_import.py +``` + +The script will attempt to locate the `bittensor` package, check that its `__init__.py` exists, and remove any references to `nest_asyncio` if present. Additionally, it will uninstall the `nest_asyncio` package if found. \ No newline at end of file diff --git a/run.md b/run.md new file mode 100644 index 0000000..354dd4d --- /dev/null +++ b/run.md @@ -0,0 +1,26 @@ + # Neurons Miner Shell Script + +This shell script is designed to execute the `neurons/miner.py` Python script using the specified arguments. 
The following documentation outlines the logic of the script. + +```bash +#!/bin/bash + +# Commence the execution of the neurons miner script with the provided arguments +python3 neurons/miner.py \ + --netuid 25 \ + --wallet.name "JJcold" \ + --wallet.hotkey "JJb" \ + --miner.bootstrapping_server "http://35.239.40.23:4999/return_dht_address" +``` + +## Script Logic + +1. The shebang `#!/bin/bash` line specifies that this script should be executed using the Bash shell interpreter. + +2. The command `python3 neurons/miner.py` invokes the Python 3 interpreter and runs the `neurons/miner.py` script, resolved relative to the current directory. + +3. The following arguments are passed to the `neurons/miner.py` script: + - `--netuid 25`: Selects the Bittensor subnet with network UID 25, the subnet this repository targets. + - `--wallet.name "JJcold"`: Names the Bittensor wallet (coldkey) to use, here "JJcold." + - `--wallet.hotkey "JJb"`: Names the hotkey within that wallet, here "JJb." This is a key name, not a recovery phrase. + - `--miner.bootstrapping_server "http://35.239.40.23:4999/return_dht_address"`: Sets the URL of the bootstrapping server. The miner requests this endpoint to obtain the initial DHT peers it needs to join the distributed training run. \ No newline at end of file diff --git a/run_miner.md b/run_miner.md new file mode 100644 index 0000000..fee9a86 --- /dev/null +++ b/run_miner.md @@ -0,0 +1,50 @@ + # Script Documentation + +This script is designed to automate the process of updating and running a Python script called `miner.py` using Git, pm2, and various utilities like jq and base64. The script also ensures that the required packages are installed and the Python version is up-to-date. 
+ +## Prerequisites + +Before executing this script, make sure you have the following dependencies installed: + +1. Git +2. pm2 +3. jq (optional but recommended for checking remote versions) +4. Python 3 and pip for running the script + +## Script Logic + +The script begins by initializing various variables such as the path to the Python script, the name of the process that will be run using pm2, arguments for cloning the repository and installing packages, and locations of important files. + +Next, it checks if the required dependency `pm2` is installed. If not, the script exits with an error message. + +### get_version_difference Function + +This function calculates the difference in version numbers between two strings provided as arguments (in the format 'vX.Y.Z'). It returns the difference as a numerical value. + +### read_version_value Function + +This function reads the value of the `__version__` variable from the specified file and stores it in the local variable `local_version`. + +### check_variable_value_on_github Function + +This function fetches the value of a given variable from a specific GitHub repository's file on the main branch and returns it. + +### check_package_installed Function + +This function checks whether a specified package (such as jq) is installed based on the operating system. + +### strip_quotes Function + +This function removes leading and trailing quotes from a given string. + +## Main Logic + +The script then performs the following tasks: + +1. It clones the required repository if it's not already available. +2. Installs required packages using pip. +3. Starts the Python script with pm2. +4. Checks for updated versions of the script on GitHub and enforces the latest version if necessary. +5. Continuously checks for updated versions every 3 hours until the local version matches or exceeds the remote version. If the local branch is not main, it enforces the main branch's changes instead. This process repeats indefinitely. 
+ +In summary, this script automates updating and running a Python script while ensuring that all dependencies are met and the latest versions of the code and packages are used. \ No newline at end of file diff --git a/run_validator.md b/run_validator.md new file mode 100644 index 0000000..de43371 --- /dev/null +++ b/run_validator.md @@ -0,0 +1,57 @@ + # Neurons Validator Script Documentation + +This shell script keeps a Python script named `validator.py` running in the Distributed Training project, using Git for version synchronization and PM2 for process management. The script checks for updates on GitHub, re-clones the repository or enforces the main branch's changes depending on the current branch, and starts or restarts the script with PM2. + +## Initializing Variables + +The script initializes several variables such as: + +- `script`: The path to the Python script that needs to be run. +- `autoRunLoc`: The resolved absolute path (`realpath`) of this shell script. +- `proc_name`: The name of the process that will be managed by PM2. +- `args`: An array to store command-line arguments. +- `version_location`: The location of the `__init__.py` file in the repository where the version information is stored. +- `version`: The name of the variable that holds the Python package version number. +- `repo`: The GitHub repository URL. +- `branch`: The branch to fetch updates from. +- `repo_url`: The base URL for cloning the Git repository. + +## Checking pm2 Installation and Dependencies + +Before proceeding, the script checks whether PM2 is installed using `command -v`. If it is not, an error message is displayed and the script exits. It also checks whether necessary dependencies, such as `jq`, are present. + +## Helper Functions + +The script includes several helper functions to perform specific tasks: + +### `get_version_difference` + +This function calculates the difference between two version numbers (in semver format) by splitting them into arrays and comparing each component individually. 
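A component-wise comparison of this kind might look like the Python sketch below. The shell implementation is not shown in this document, so the return convention here is an assumption: the function returns the difference at the first component where the two versions disagree (remote minus local), and 0 when they match. `split_version` is a hypothetical helper.

```python
from itertools import zip_longest
from typing import List


def split_version(version: str) -> List[int]:
    """Split 'v1.2.3' or '1.2.3' into [1, 2, 3]."""
    return [int(part) for part in version.strip().lstrip("v").split(".")]


def get_version_difference(local: str, remote: str) -> int:
    """Return remote - local at the first differing component, or 0 if equal.

    Assumed convention: a positive result means the remote version is newer.
    """
    pairs = zip_longest(split_version(local), split_version(remote), fillvalue=0)
    for local_part, remote_part in pairs:
        if local_part != remote_part:
            return remote_part - local_part
    return 0
```

Padding the shorter version with zeros means `1.2` and `1.2.0` are treated as the same version.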
+ +### `read_version_value` + +Reads the value of the Python package version from the specified file (in this case, the `__init__.py` file). + +### `check_variable_value_on_github` + +Fetches the current value of a GitHub file's content and checks if it contains the given variable name. + +### `check_package_installed` + +Checks whether the specified package is installed on the system using either `dpkg-query` for Linux or Homebrew for macOS. + +### `strip_quotes` + +Removes leading and trailing double quotes from a given string. + +## Controlling the Clone and Enforce Workflow + +The script provides two main functions to handle cloning/checking out the latest version of the repository (`check_and_clone`) and enforcing the main branch changes locally (`enforce_main`). These functions are responsible for updating the local environment with changes from GitHub. + +## Managing Command-line Arguments + +The script processes command-line arguments, keeping track of flags and their values. The `script` argument is mandatory; if it's not provided, an error message is displayed and the script exits. + +## Enforcing Updates and Starting/Restarting the Python Script + +Finally, the script checks whether packages are installed and performs Git updates (checking the GitHub value, cloning or enforcing changes) if necessary. Once all prerequisites are met, it creates a PM2 configuration file and starts or restarts the script as required. \ No newline at end of file diff --git a/setup.md b/setup.md new file mode 100644 index 0000000..fddc356 --- /dev/null +++ b/setup.md @@ -0,0 +1,59 @@ + # hivetrain + +This documentation describes the logic behind the `hivetrain` Python package as defined in the following code. + +## Installation + +To install the `hivetrain` package, you can use pip: + +```bash +pip install hivetrain +``` + +The package requirements are listed in the `requirements.txt` file and will be installed along with it. 
+ +## Package Information + +The code uses the `setuptools` library to define and manage the package metadata. The main function is `setup()`, which takes several arguments: + +```python +setup( + # Package information + name='hivetrain', + version='0.2.7', + author='Hivetrain', + author_email='test@test.com', + description='A short description of your project', + long_description=open('README.md').read(), + long_description_content_type='text/markdown', + url='https://github.com/yourgithubusername/your_project_repo', + # Package contents and dependencies + packages=find_packages(), + include_package_data=True, + install_requires=open('requirements.txt').read().splitlines(), + classifiers=[ + 'Programming Language :: Python :: 3', + 'License :: OSI Approved :: MIT License', + 'Operating System :: OS Independent', + ], + python_requires='>=3.6', +) +``` + +### Package Information + +- `name`: The name of the package. +- `version`: The current version of the package. +- `author`: The name of the author. +- `author_email`: The email address of the author. +- `description`: A short description of what the project does. +- `long_description`: A more detailed description of the project, typically in markdown format and located in a separate file (`README.md`). +- `url`: The URL to the project repository on GitHub. + +### Package Contents and Dependencies + +- `packages`: A list of packages that should be included in the distribution. `find_packages()` is used to automatically find all Python packages in the current directory and its subdirectories. +- `include_package_data`: A boolean flag indicating whether to include non-Python files (such as images, templates, etc.) in the package distribution. +- `install_requires`: A list of dependencies that should be installed when installing the package. These dependencies are listed in the `requirements.txt` file. 
+- `classifiers`: A list of classifiers that describe the package and its intended audience (e.g., programming language, license, operating system). +- `python_requires`: The minimum Python version required to use the package. In this case, it is set to Python 3.6 or later. \ No newline at end of file diff --git a/template/__init__.md b/template/__init__.md new file mode 100644 index 0000000..752da91 --- /dev/null +++ b/template/__init__.md @@ -0,0 +1,35 @@ + # Documentation for Version Nomenclature and Compatibility Logic + +## Introduction +This documentation explains the logic behind the version nomenclature and compatibility checks implemented in the following Python code. + +## Code Snippet +```python +# For backward compatibility with Auto-Update + +# The MIT License (MIT) +# ... + +__version__ = "0.0.30" + +version_split = __version__.split(".") +__spec_version__ = (100 * int(version_split[0])) + (10 * int(version_split[1])) + (1 * int(version_split[2])) +``` +## Version Nomenclature +The version number in the `__version__` variable follows the MAJOR.MINOR.PATCH nomenclature, where: +- MAJOR is incremented when making backwards-incompatible changes (e.g., changing an API). +- MINOR is incremented when adding new features or improvements that maintain backwards compatibility with previous versions. +- PATCH is incremented for bug fixes and minor changes that do not impact the API. + +## Version Specification +The `__spec_version__` variable is a single integer derived from the MAJOR, MINOR, and PATCH components of `__version__`. It's calculated as: +```makefile +spec_version = 100 * major + 10 * minor + patch +``` +For `__version__ = "0.0.30"` this gives 30. Because the result is a plain integer, two versions can be compared numerically without parsing the version string. 
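To make the formula concrete, here is a small Python sketch of the same calculation. The function name `spec_version` is chosen for illustration and does not appear in the source file.

```python
def spec_version(version: str) -> int:
    """Compute 100 * major + 10 * minor + patch for a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return 100 * major + 10 * minor + patch
```

With these weights, `spec_version("0.0.30")` evaluates to 30, and so does `spec_version("0.3.0")`. Any MINOR or PATCH component of 10 or more can therefore collide with another version, which is worth keeping in mind before relying on the integer alone to order releases.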
+ +## Backward Compatibility and Auto-Update +The comment `# For backward compatibility with Auto-Update` suggests that this code is kept so that an auto-update mechanism can continue to read the version from this file. The code converts the `__version__` string into the integer `__spec_version__`, giving such tooling a single number to compare across versions of the library. + +## License +This code is released under the MIT License. For further details on the terms of use, please refer to the LICENSE file provided with this software. \ No newline at end of file