alphaStral

A project by the Rustaceans team:

  • Adrien Pelfresne github | linkedin | resume

    I'm Adrien Pelfresne, a 22-year-old backend software engineer specializing in distributed systems. If you are hiring for such a position, please reach out to me :)

  • Alexis Vapaille github | linkedin | resume

    I'm Alexis Vapaille, a backend software developer who loves distributed systems and Rust. I also build neural networks from scratch in Rust - check out nn_lib. If you are hiring, please reach out to me :)


About

Fine-tuned vs foundation: who will win a Pokemon Showdown match?


Showdown server


Pokemon Showdown is the biggest online Pokemon battle platform. It acts as the game engine for the model battles, which run on the local machine running the program. You will need to host an instance by following these steps:

git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
npm install
cp config/config-example.js config/config.js
node pokemon-showdown start --no-security

Then run battles:

uv run python main.py --p1 random --p2 random --n 10

Warning

Due to a bug, the live battle is only viewable in Chrome and Firefox, not Safari :(

CLI reference

uv run python main.py [OPTIONS]
Argument              Default            Description
--p1                  random             Agent for player 1
--p2                  random             Agent for player 2
--n                   1                  Number of battles to run
--format              gen9randombattle   Battle format (poke-env format ID, e.g. gen9ou, gen8randombattle)
--move-delay SECONDS  0                  Wait before each move; 2-3 is comfortable for live spectating
--log-level           INFO               Verbosity: DEBUG, INFO, WARNING, ERROR (also the LOG_LEVEL env var)
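
The flag set above could be declared with argparse roughly like this. This is a sketch mirroring the table, not the actual contents of main.py, and the helper name build_parser is an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the CLI surface described above; defaults mirror the table.
    parser = argparse.ArgumentParser(
        description="Run Pokemon Showdown battles between agents"
    )
    parser.add_argument("--p1", default="random", help="Agent for player 1")
    parser.add_argument("--p2", default="random", help="Agent for player 2")
    parser.add_argument("--n", type=int, default=1, help="Number of battles to run")
    parser.add_argument("--format", default="gen9randombattle",
                        help="Battle format (poke-env format ID)")
    parser.add_argument("--move-delay", metavar="SECONDS", type=float, default=0,
                        help="Wait before each move (useful for live spectating)")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser

args = build_parser().parse_args(["--p1", "random", "--p2", "random", "--n", "10"])
```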

Examples

# 10 local battles, with no delay between moves (this might go brrr)
uv run python main.py --p1 random --p2 random --n 10

# watch live (slowed down)
uv run python main.py --format gen9ou --n 1 --move-delay 2

How to watch a live battle?

  1. Start a battle (random vs random for reference):

uv run python main.py --p1 random --p2 random --move-delay 2

  2. Go to the locally hosted Showdown and click Watch Battle; you will see the currently running battle in the right side panel.

[battle-example screenshot]

  3. Click on the current battle and watch it!

[watch_battle screenshot]

Results

After each run, a JSON file lands in runs/ and a visual HTML report in reports/. Open the report in your browser:

uv run python -m viz runs/<your_run_file>.json
open reports/<your_run_file>.html
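
If you want a quick summary without opening the HTML report, the run file can be inspected directly. This is a sketch that assumes a hypothetical schema with a "battles" list of records carrying a "winner" field; check an actual file in runs/ for the real keys:

```python
import json
from collections import Counter

def summarize_run(path: str) -> Counter:
    # Assumed schema: {"battles": [{"winner": "p1"}, ...]} - hypothetical,
    # the real run files produced by main.py may use different keys.
    with open(path) as f:
        run = json.load(f)
    return Counter(battle["winner"] for battle in run["battles"])

# Usage (with a made-up file name): summarize_run("runs/my_run.json")
```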

[Random vs Random benchmark report]

Fine-tuning

Warning

To make things go brrr, we bought Colab Pro and fine-tuned our model on 100k samples on an H100 GPU. Do not try this at home!


The dataset is scraped from Pokemon Showdown replays and enriched with PokeAPI data (types, stats, move power/category).

uv run python finetune/scraper.py

Each sample is a (prompt, completion) pair where the prompt describes the battle state and the completion is the chosen move. The dataset is split 95% train / 5% validation.

{
  "prompt": "Turn 3. Weather: none. Your pokemon: Magnezone (99/100 HP, healthy) | Type: electric/steel | Atk: 70 SpA: 130 Spe: 60. Opponent: Raging Bolt (100/100 HP, healthy) | Type: electric/dragon | Def: 91 SpD: 89 Spe: 75 | Moves seen: Air Slash (flying, 75pw, special). What move do you use?",
  "completion": "Flash Cannon (steel, 80pw, special)"
}
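
The 95% / 5% split described above can be done with a deterministic shuffle. A minimal sketch, assuming the samples are already a list of (prompt, completion) dicts; the actual scraper may split differently:

```python
import random

def split_dataset(samples, val_ratio=0.05, seed=42):
    # Deterministic shuffle, then 95% train / 5% validation as described above.
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

samples = [{"prompt": f"Turn {i}. ...", "completion": "Flash Cannon"}
           for i in range(100)]
train, val = split_dataset(samples)
# len(train) == 95, len(val) == 5
```

Fixing the seed keeps the validation set stable across runs, so eval losses from different training configurations stay comparable.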

Test 1 - QLoRA on T4 (finetune/finetune_colab_qlora.ipynb)

First experiment run on a Colab T4. Dataset: gen9ou replays at 1500+ ELO. The model was loaded with 4-bit NF4 quantization to fit in VRAM. The adapter was pushed directly to HuggingFace Hub without merging.
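
A 4-bit NF4 load as described typically looks like this with transformers + bitsandbytes. This is a sketch of the quantization config only; the README does not state the base checkpoint ID, so it is left out here:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as used in Test 1 to fit the model in T4 VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

The config is then passed as quantization_config= to AutoModelForCausalLM.from_pretrained when loading the base model.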

LoRA

Parameter        Value
r                16
lora_alpha       16
target_modules   q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
lora_dropout     0
bias             none
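
Expressed as a peft adapter config, the table above corresponds to roughly the following (a sketch; the task_type value is an assumption based on the causal-LM setup):

```python
from peft import LoraConfig

# Mirrors the Test 1 adapter settings listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```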

Training

Parameter                      Value
max_steps                      100
learning_rate                  1e-4
warmup_steps                   10
batch_size (per device)        4
gradient_accumulation_steps    4 (effective batch: 16)
optimizer                      paged_adamw_8bit
eval / save                    every 50 steps
max_length                     512
precision                      bf16

Test 2 - LoRA on A100 (finetune/finetune_colab.ipynb)

Second experiment on an A100. Dataset: gen9randombattle replays. Quantization was dropped entirely in favor of full BF16, allowing a larger rank and 10x more training steps. LoRA weights were merged into the base model before pushing the full model to mistral-hackaton-2026/ministral-3b-pokemon-showdown on HuggingFace Hub.

LoRA

Parameter        Value
r                32
lora_alpha       32
target_modules   q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
lora_dropout     0
bias             none

Training

Parameter                      Value
max_steps                      1000
learning_rate                  1e-4
warmup_steps                   50
batch_size (per device)        16
gradient_accumulation_steps    1 (effective batch: 16)
optimizer                      adamw_torch
eval / save                    every 200 steps
max_length                     512
precision                      bf16

Benchmarks

Matchup                                      Battles    Win rate
AlphaStral vs ministral-3b-latest (3B)       10         80%
AlphaStral vs mistral-small-latest (24B)     10         50%
AlphaStral vs mistral-large-latest (123B)    10         40%
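
Win rates estimated from only 10 battles carry wide uncertainty. A Wilson score interval (a standard binomial confidence interval, not part of this repo) makes that concrete:

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96):
    # 95% Wilson score interval for a binomial proportion wins/n.
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(8, 10)  # the 80% win rate vs ministral-3b-latest
# roughly (0.49, 0.94): 10 battles leave a wide confidence band
```

In other words, even the 80% result is statistically compatible with a roughly even matchup; running more battles per pairing would tighten these intervals.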

Potential improvements

  • Dataset diversity - scraping more replays could improve generalization.
  • Richer prompt context - adding held items to the prompt could help the model make better decisions.
  • Larger rank - increasing r beyond 32 would give the adapters more capacity to learn complex battle strategies, at the cost of more VRAM.
