
Conversation

@jeromeku (Contributor) commented Jun 26, 2025

Unsloth Blackwell Compatibility

Overview

Blackwell (sm100+) requires all dependent libraries to be compiled with CUDA 12.8.

The core libraries for running unsloth that depend on the CUDA version are:

  • bitsandbytes - already has wheels built with CUDA 12.8 so pip install should work out of the box
  • triton - requires triton>=3.3.1
  • torch - requires installing with pip install torch --extra-index-url https://download.pytorch.org/whl/cu128
  • vllm - safest is to use the nightly build: uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
  • xformers - as of 6/26, xformers wheels are not yet built with sm100+ enabled, since support was only recently added, so a source build is required (see below).
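
A quick way to check what you are starting from (a minimal sketch) is to print the GPU's compute capability and the CUDA version that the currently installed torch build, if any, was compiled against:

    # Sanity check (sketch): confirm the GPU is Blackwell and inspect the current torch build.
    import torch

    print("compute capability:", torch.cuda.get_device_capability(0))  # (12, 0) on RTX 5090, (10, 0) on B200
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)  # want a 12.8 (+cu128) build
    print("sm_120 kernels present:", "sm_120" in torch.cuda.get_arch_list())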

Installation

The installation order is important, since we want to overwrite bundled dependencies with specific versions (namely, xformers and triton).

  1. I prefer to use uv over pip as it's faster and better at resolving dependencies, especially for libraries that depend on torch but require a specific CUDA version, as in this scenario.

    Install uv

    curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env

    Create a project dir and venv:

    mkdir unsloth-blackwell && cd unsloth-blackwell
    uv venv .venv --python=3.12 --seed
    source .venv/bin/activate
  2. Install vllm

    uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly

    Note that we have to specify cu128, otherwise vllm will install torch==2.7.0 but with cu126.

  3. Install unsloth dependencies

    uv pip install unsloth unsloth_zoo bitsandbytes
  4. Download and build xformers

    # First uninstall xformers installed by previous libraries
    uv pip uninstall xformers
    
    # Clone and build
    git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
    cd xformers
    export TORCH_CUDA_ARCH_LIST="12.0"
    python setup.py install

    Note that we have to explicitly set TORCH_CUDA_ARCH_LIST=12.0 so that the build targets Blackwell; a quick smoke test of the finished build is included just before the Test section below.

  5. Update triton

    uv pip install -U "triton>=3.3.1"

    triton>=3.3.1 is required for Blackwell support.

  6. transformers
    transformers >= 4.53.0 breaks unsloth inference. Specifically, transformers with gradient_checkpointing enabled will automatically switch off caching.

    When using unsloth FastLanguageModel to generate directly after training with use_cache=True, this results in a mismatch between expected and actual outputs.

    The temporary solution is to switch off gradient_checkpointing (e.g., model.disable_gradient_checkpointing()) before generation if you are on 4.53.0, or to stick with 4.52.4 for now:

    uv pip install -U transformers==4.52.4
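
    If you do stay on 4.53.0, here is a minimal sketch of the workaround; the model name and prompt are placeholders, and it uses the transformers method gradient_checkpointing_disable():

    # Sketch: disable gradient checkpointing before generating to avoid the use_cache mismatch.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder; use the model you just trained
        max_seq_length=2048,
    )
    # ... training with gradient checkpointing enabled happens here ...

    model.gradient_checkpointing_disable()  # standard transformers method
    FastLanguageModel.for_inference(model)

    inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=16, use_cache=True)
    print(tokenizer.decode(out[0]))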

After installation, your environment should look similar to blackwell.requirements.txt.

Note: you might need to downgrade numpy (to numpy<=2.2) after all the installs.
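
Before running the test scripts, a quick GPU smoke test (a minimal sketch, assuming the steps above completed) compiles a trivial Triton kernel and runs one xformers attention call on the card, which exercises the Blackwell code paths in both libraries:

    # smoke_test.py - sketch: exercise the Triton JIT and the source-built xformers kernels.
    import torch
    import triton
    import triton.language as tl
    from xformers.ops import memory_efficient_attention

    print("torch", torch.__version__, "| CUDA", torch.version.cuda,
          "| capability", torch.cuda.get_device_capability(0), "| triton", triton.__version__)

    @triton.jit
    def add_one(x_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)

    x = torch.zeros(4096, device="cuda")
    add_one[(triton.cdiv(x.numel(), 1024),)](x, x.numel(), BLOCK=1024)
    assert x.sum().item() == x.numel(), "Triton kernel produced a wrong result"

    # xformers memory_efficient_attention expects (batch, seq_len, heads, head_dim)
    q = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.bfloat16)
    out = memory_efficient_attention(q, q, q)
    print("OK: triton kernel and xformers attention ran; attention output shape", tuple(out.shape))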

Test

Both test_llama32_sft.py and test_qwen3_grpo.py should run without issue if the install is correct. If not, check the diff between your installed environment and blackwell.requirements.txt, for example with the sketch below.
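
A minimal sketch of that comparison (it assumes the requirements file uses simple name==version pins and sits in the current directory):

    # Sketch: report packages whose installed version differs from blackwell.requirements.txt.
    from importlib.metadata import PackageNotFoundError, version

    with open("blackwell.requirements.txt") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue  # skip comments and non-pinned entries
            name, _, pinned = line.partition("==")
            try:
                installed = version(name)
            except PackageNotFoundError:
                installed = "MISSING"
            if installed != pinned:
                print(f"{name}: requirements={pinned} installed={installed}")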

Tested on an RTX 5090, though it should also work for sm100+ in general.

@jeromeku jeromeku changed the title Add instructions for installing on blackwell Add instructions for installing unsloth on RTX 5090 Jun 26, 2025
@rolandtannous (Collaborator)

In terms of documentation, I think we should also include instructions that use pip and conda, as a lot of people are not using uv yet and we do not want to force a specific package manager. I can probably test the equivalent using conda/pip, and we can add those in a separate section.

@danielhanchen (Contributor)

Very nice - I'll also use disable_gradient_checkpointing()

@danielhanchen danielhanchen merged commit b02be21 into unslothai:main Jun 27, 2025
@l-cacherr

Very, very nice job, thanks! In my env, it's necessary to downgrade numpy<=2.2 after all the installs. test_qwen3_grpo.py succeeded on the 5090; max memory usage was ~8 GB of host memory, 25 GB of CUDA memory, and 2.7 GB of CUDA shared memory. test_llama32_sft.py also succeeded on the 5090, with almost no memory usage (< 5 GB).

@0xrushi commented Jul 13, 2025

How did you run that vllm install command?

$ uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
error: invalid value 'cu128' for '--torch-backend <TORCH_BACKEND>'
  [possible values: auto, cpu, cu126, cu125, cu124, cu123, cu122, cu121, cu120, cu118, cu117, cu116, cu115, cu114, cu113, cu112, cu111, cu110, cu102, cu101, cu100, cu92, cu91, cu90, cu80]

  tip: a similar value exists: 'cu102'

@0xrushi commented Jul 14, 2025

Never mind, my uv was outdated; I had to run uv self update.

@john-yick-modv

A few notes on things I needed to do to get it running on my 5090 to fine-tune Qwen-14B:

  1. The user guide at https://docs.unsloth.ai/basics/training-llms-with-blackwell-rtx-50-series-and-unsloth works fine, but there are some caveats.
  2. The CUDA toolkit MUST be version 12.8; using 12.9 will cause errors. This is what caught me for well over a day, as 12.9 kept throwing errors that prevented unsloth from running.
  3. I am using NVIDIA driver 575.64.03.
  4. Not sure if it is strictly needed, but I had to monkey-patch .venv/lib/python3.12/site-packages/transformers/configuration_utils.py

by adding this function to the bottom of the file

# Shim added at the bottom of configuration_utils.py: validate that the config's
# layer_type is one of the expected layer types, raising ValueError otherwise.
def layer_type_validation(config, expected_layer_types):
    layer_type = getattr(config, "layer_type", None)
    if layer_type not in expected_layer_types:
        raise ValueError(f"Unexpected layer type: {layer_type}")
    return True
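
An equivalent approach (a sketch) is to install the same shim at runtime instead of editing site-packages:

# Sketch: an alternative to editing site-packages - attach the same shim at runtime.
# Run this before importing unsloth or loading the model, so the symbol exists when needed.
import transformers.configuration_utils as _cu

def _layer_type_validation(config, expected_layer_types):
    layer_type = getattr(config, "layer_type", None)
    if layer_type not in expected_layer_types:
        raise ValueError(f"Unexpected layer type: {layer_type}")
    return True

if not hasattr(_cu, "layer_type_validation"):
    _cu.layer_type_validation = _layer_type_validation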

After this I was able to run my normal training code

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 07-23 22:51:59 [__init__.py:235] Automatically detected platform cuda.
==((====))==  Unsloth 2025.7.8: Fast Qwen3 patching. Transformers: 4.52.4. vLLM: 0.10.0rc2.dev73+gf59ec35b7.
   \\   /|    NVIDIA GeForce RTX 5090. Num GPUs = 1. Max memory: 31.354 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: https://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.02s/it]
Unsloth 2025.7.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 7,628 | Num Epochs = 3 | Total steps = 2,862
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 64,225,280 of 14,832,532,480 (0.43% trained)
  0%|                                                                                                                                                                                                 | 0/2862 [00:00<?, ?it/s]Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 2.0211, 'grad_norm': 0.24703896045684814, 'learning_rate': 0.0, 'epoch': 0.0}                                                                                                                                         
{'loss': 2.0929, 'grad_norm': 0.2276933193206787, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}                                                                                                                       
{'loss': 1.934, 'grad_norm': 0.3005346357822418, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.0}                                                                                                                         
{'loss': 1.7917, 'grad_norm': 0.22617483139038086, 'learning_rate': 6e-06, 'epoch': 0.0}                                                                                                                                       
{'loss': 1.911, 'grad_norm': 0.21807023882865906, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.01}                                                                             

It is processing at around 7 seconds per iteration.
