# Add instructions for installing unsloth on RTX 5090 #2812
## Conversation

In terms of documentation, I think we should also include instructions that use pip and conda, as a lot of people are not using uv yet and we do not want to force a specific package manager. I can probably test the equivalent using conda/pip and we can add those in a separate section.

Very nice - I'll also use

Very very nice job, thanks! In my env, it's necessary to downgrade to numpy<=2.2 after all the installs. Test succeeded with a 5090 on test_qwen3_grpo.py. Max memory usage: mem = ~8G, CUDA mem = 25G, CUDA shared mem = 2.7G. Moreover, the test succeeded with a 5090 on test_llama32_sft.py with almost no memory usage, < 5G.

How did you install that vllm command?

nvm, my uv was outdated - I had to do

A few notes on things I needed to do to get it running on my 5090 to fine-tune Qwen-14B, by adding this function to the bottom of the file. After this I was able to run my normal training code. It is processing at around 7 s/it.
## Unsloth Blackwell Compatibility

### Overview
Blackwell (sm100+) requires all dependent libraries to be compiled with CUDA 12.8. The core libs for running unsloth which have dependencies on the CUDA version are:

- `bitsandbytes` - already has wheels built with CUDA 12.8, so `pip install` should work out of the box
- `triton` - requires `triton>=3.3.1`
- `torch` - requires installing with `pip install torch --extra-index-url https://download.pytorch.org/whl/cu128`
- `vllm` - safest is to use the nightly build: `uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly`
- `xformers` - as of 6/26, `xformers` wheels are not yet built with sm100+ enabled, as support was only recently added, so a source build is required (see below)

### Installation
The installation order is important, since we want to overwrite bundled dependencies with specific versions (namely, `xformers` and `triton`).

I prefer to use `uv` over `pip` as it's faster and better at resolving dependencies, especially for libraries which depend on `torch` but for which a specific CUDA version is required, as in this scenario.

#### Install uv

Create a project dir and venv:
#### Install vllm

Note that we have to specify `cu128`, otherwise `vllm` will install `torch==2.7.0` but with `cu126`.
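Using the nightly command from the overview above:

```bash
# Nightly vllm wheels are built against CUDA 12.8; --torch-backend=cu128 keeps torch on cu128
uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
```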
#### Install unsloth dependencies
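A plausible form of this step; the exact package list is an assumption (`unsloth` and `unsloth_zoo` are the obvious candidates, and `bitsandbytes` ships cu128 wheels per the overview):

```bash
# Assumed package set - adjust to match blackwell.requirements.txt
uv pip install unsloth unsloth_zoo bitsandbytes
```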
#### Download and build xformers

Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST=12.0`.
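A sketch of the source build, assuming the upstream repo; `--no-deps` and `--no-build-isolation` keep the already-installed cu128 `torch` in place:

```bash
# Build xformers from source with Blackwell (sm_120) enabled
git clone --recursive https://github.com/facebookresearch/xformers
cd xformers
TORCH_CUDA_ARCH_LIST="12.0" uv pip install -v --no-deps --no-build-isolation .
cd ..
```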
#### Update triton

```bash
uv pip install -U 'triton>=3.3.1'
```

`triton>=3.3.1` is required for Blackwell support (the spec is quoted so the shell doesn't treat `>=` as a redirect).

#### transformers

`transformers >= 4.53.0` breaks unsloth inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically switch off caching.
When using unsloth `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this will result in a mismatch between expected and actual outputs here.

The temporary solution is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0`, or stick with `4.52.4` for now:
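A sketch of the pin, using the version from the note above:

```bash
# Stay on the last known-good transformers release
uv pip install 'transformers==4.52.4'
```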
After installation, your environment should look similar to blackwell.requirements.txt.

Note: you might need to downgrade to `numpy<=2.2` after all the installs.
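For example:

```bash
# Downgrade numpy if the installs pulled in a newer version
uv pip install 'numpy<=2.2'
```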
### Test

Both test_llama32_sft.py and test_qwen3_grpo.py should run without issue if the install is correct. If not, check the diff between your installed env and blackwell.requirements.txt.

Tested on an RTX 5090, though it should also work for sm100+ in general.
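To confirm the environment actually targets Blackwell, a quick sanity check (not part of the original guide):

```bash
# torch should report CUDA 12.8 and compute capability (12, 0) on an RTX 5090
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_capability())"

# xformers prints its build info, including the arch list it was compiled for
python -m xformers.info
```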