Reminder
System Info
- llamafactory version: 0.8.4.dev0
- Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.27
- Python version: 3.11.9
- PyTorch version: 2.4.0+cu121 (GPU)
- Transformers version: 4.43.4
- Datasets version: 2.20.0
- Accelerate version: 0.32.0
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A100-PCIE-40GB
- DeepSpeed version: 0.15.0
- Bitsandbytes version: 0.43.3
- vLLM version: 0.5.5
Reproduction
In the current highest supported version of accelerate (0.32.0), loading the 7B model with the following code in my environment results in a speed of "Loading checkpoint shards 00:30, 7.63s/it". However, in version 0.33.0 of accelerate, the loading speed is "Loading checkpoint shards 00:04, 1.14s/it".
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_name)
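For reference, the timings above were taken by wrapping the `from_pretrained` call in a simple wall-clock timer; a minimal sketch of that measurement (the helper name `timed` and the commented-out model call are illustrative, not part of LLaMA-Factory):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage (assumes transformers is installed and model_name is set):
# from transformers import AutoModelForCausalLM
# model, seconds = timed(AutoModelForCausalLM.from_pretrained, model_name)
# print(f"Loaded checkpoint in {seconds:.1f}s")
```

Running this once under accelerate 0.32.0 and once under 0.33.0 in the same environment reproduces the gap described above.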
Additionally, when loading the model in web UI chat mode, loading takes about twice as long.
Expected behavior
Please raise the accelerate requirement to version 0.33.0, or let me know of any mistake on my side that could make model loading this slow under accelerate 0.32.0.
Others
No response