Releases · hiyouga/LlamaFactory
v0.9.4: Goodbye 2025
Farewell to 2025. Thank you to all contributors and supporters. We will continue to deliver an easy and efficient LLM fine-tuning framework to the community in 2026. Stay tuned.
Breaking
- Repository name updated: LLaMA-Factory → LlamaFactory
- Python 3.9 and 3.10 are deprecated; LlamaFactory now requires Python 3.11–3.13
- Migrated from pip to uv; use `uv pip install llamafactory`
- The official LlamaFactory blog is now live: https://blog.llamafactory.net/en/
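For a fresh setup under the new tooling, installation might look like the following sketch (it assumes `uv` is already available; the `uv pip install llamafactory` command comes from the note above, everything else is illustrative):

```shell
# Create an isolated environment with a supported Python version (3.11–3.13)
uv venv --python 3.11
source .venv/bin/activate

# Install LlamaFactory through uv's pip-compatible interface
uv pip install llamafactory
```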
New features
- 🔥 Support Orthogonal Fine-Tuning (OFT) by @zqiu24 in #8623
- 🔥 Support Semantic Initialization for newly added tokens by @ximinng in #9267
- 🔥 Support Megatron-LM training via MCoreAdapter by @Kuangdd01 in #9237
- 🔥 Support KTransformers backend by @JimmyPeilinLi in #9400
- Support MPO algorithm by @Kuangdd01 in #8930
- Support FP8 training by @penfever in #8960
- Support Transformers v5 by @tangefly in #9569
- Support reasoning and plaintext in function call message by @tangefly in #9610
- Support DeepSpeed AutoTP by @sunyi0505 in #9602
- Support efficient NPU fused kernels by @frozenleaves in #9520
- Support TRL 0.24 by @UsernameFull in #9617
Models
- Falcon H1 by @dhiaEddineRhaiem in #8403
- Kimi-VL and GLM-4.5V by @Kuangdd01 in #8462
- Gemma3n by @Kuangdd01 in #8509
- Granite4 by @Tuyohai in #8680
- Qwen3-2507 by @hiyouga in #8750
- MiniCPM-V 4.0 by @ZMXJJ in #8813
- Intern-S1-mini by @hhaAndroid in #8976
- Seed-OSS by @Kuangdd01 in #8992
- MiniCPM-V 4.5 by @tc-mb in #9022
- InternVL-3.5 by @Kuangdd01 in #9028
- ERNIE-4.5-Text and ERNIE-4.5-VL by @isLinXu in #9165
- Ling-V2 by @wangsff in #9188
- Qwen3-VL and Qwen3-Omni by @xvxuopop and @Kuangdd01 in #9196
- Hunyuan-mt by @wyfdgg in #9284
- GLM-4.6V by @isLinXu in #9586
- Ministral 3 by @tangefly in #9582
- VibeThinker by @isLinXu in #9616
- MiMo-V2-Flash by @isLinXu in #9637
- MiniMax-M1 and MiniMax-M2 by @isLinXu in #9680
Thanks to teams collaborating with LlamaFactory in 2025
- NPU Team: @jiaqiw09 @frozenleaves @xvxuopop @UsernameFull @codemayq
- KTransformers Team: @JimmyPeilinLi @poryfly @mrhaoxx
- ROLL Team
And to individuals who made significant contributions
Full Changelog: v0.9.3...v0.9.4
v0.9.3: Llama4, Gemma3, Qwen3, InternVL3, Qwen2.5-Omni
We will attend the AWS Summit Shanghai 2025 on June 20th! See you in Shanghai 👋
New features
- 🔥 InternVL2.5/InternVL3 model by @Kuangdd01 in #7258
- 🔥 Qwen2.5-Omni model by @Kuangdd01 in #7537
- 🔥 Llama 4 and Gemma 3 multimodal model by @hiyouga in #7273 and #7611
- 🔥 Official GPU docker image by @yzoaim in #8181
- 🔥 SGLang inference by @Qiaolin-Yu and @jhinpan in #7278
- GLM-4-0414 and GLM-Z1 model by @zRzRzRzRzRzRzR in #7695
- Kimi-VL model by @Kuangdd01 in #7719
- Qwen3 model by @hiyouga in #7885
- MiMo and MiMo-VL model by @Kuangdd01 in #7946 #8249
- SmolLM/SmolLM2 model by @akshatsehgal in #8050 #8220
- MiniCPM4 model by @LDLINGLINGLING in #8314
- Mistral-Small-3.1 model by @Kuangdd01 in #8335
- Add `scripts/eval_bleu_rouge.py` by @SnowFox4004 in #7419
- Add Muon optimizer by @tianshijing in #7749
- Support video/audio inference with vLLM by @hiyouga in #7566
- Support S3/GCS cloud data by @erictang000 in #7567
- Support vLLM-ascend by @leo-pony in #7739
- Support OmegaConf by @hiyouga in #7793
- Support early-stopping by @hiyouga in #7797
- Add `enable_thinking` argument for reasoning models by @hiyouga in #7928
- PyTorch-elastic and fault-tolerant launch by @hubutui in #8286
- Length Desensitization DPO (LD-DPO) by @amangup in #8362
New models
- Base models
- SmolLM/SmolLM2 (135M/360M/1.7B) 📄
- Qwen3 Base (0.6B/1.7B/4B/8B/14B/30B) 📄
- Gemma 3 (1B/4B/12B/27B) 📄🖼️
- MedGemma (4B) 📄🩺
- MiMo Base (7B) 📄
- Seed-Coder Base (8B) 📄⌨️
- Mistral-Small-3.1 Base (24B) 📄🖼️
- GLM-4-0414 Base (32B) 📄
- Llama 4 (109B/492B) 📄🖼️
- Instruct/Chat models
- SmolLM/SmolLM2 Instruct (135M/360M/1.7B) 📄🤖
- MiniCPM4 (0.5B/8B) 📄🤖
- Qwen3 (0.6B/1.7B/4B/8B/14B/32B/30B/235B) 📄🤖🧠
- Gemma 3 Instruct (1B/4B/12B/27B) 📄🤖🖼️
- InternVL2.5/3 Instruct/MPO (1B/2B/8B/14B/38B/78B) 📄🤖🖼️
- Qwen2.5-Omni (3B/7B) 📄🤖🖼️🔈
- MedGemma Instruct (4B/27B) 📄🤖🩺
- MiMo SFT/RL (7B) 📄🤖
- MiMo-VL SFT/RL (7B) 📄🤖🖼️
- Hunyuan Instruct (7B) 📄🤖
- Seed-Coder Instruct/Reasoning (8B) 📄🤖🧠⌨️
- GLM-4-0414/GLM-Z1 Instruct (9B/32B) 📄🤖🧠
- DeepSeek-R1-0528 (8B/671B) 📄🤖🧠
- Kimi-VL Instruct/Thinking (17B) 📄🤖🧠🖼️
- Mistral-Small-3.1 Instruct (24B) 📄🤖🖼️
- Qwen2.5-VL Instruct (32B) 📄🤖🖼️
- Llama 4 Instruct (109B/492B) 📄🤖🖼️
New datasets
- Preference datasets
- COIG-P (zh) 📄
Bug fix
- Fix add new tokens by @flashJd in #7253
- Fix ultrachat_200k dataset by @felladrin in #7259
- Add efficient 4D attention mask for neat packing by @BlackWingedKing in #7272
- Fix WSD lr scheduler by @x22x22 in #7304
- Fix position ids in neat packing by @BlackWingedKing in #7318
- Fix proxy setting in webui by @taoharry in #7332
- Improve entrypoint by @ENg-122 in #7345
- Fix ray destroy process group by @erictang000 in #7395
- Fix SGLang dependencies by @guoquan in #7432
- Upgrade docker package version by @rumichi2210 in #7442
- Update liger kernel for qwen2.5-vl by @xiaosu-zhu in #7453
- Fix lora on quant models by @GuoCoder in #7456
- Enable liger kernel for gemma3 by @kennylam777 in #7462
- Enable liger kernel for paligemma by @eljandoubi in #7466
- Add Swanlab lark notification by @Xu-pixel in #7481
- Fix gemma3 use cache attribute by @ysjprojects in #7500
- Fix pixtral plugin by @Kuangdd01 in #7505
- Fix KTO mismatch pair strategy by @himalalps in #7509
- Support `dataset_shards` by @aliencaocao in #7530
- Fix qwen2.5omni plugin by @Kuangdd01 in #7573 #7578 #7883
- Fix ppo trainer by @gechengze in #7576
- Fix workflow by @Shawn-Tao in #7635
- Support qwen2.5omni audio+video2text by @Kuangdd01 in #7638
- Upgrade deps for SGLang by @adarshxs in #7639
- Allow ray env setting by @erictang000 in #7647
- Fix CUDA warning on intel xpus by @jilongW in #7655
- Fix liger kernel patch by @danny980521 in #7660
- Fix rocm dockerfile by @fluidnumerics-joe in #7725
- Fix qwen2vl with neat packing by @GeoffreyChen777 in #7754
- Fix a constant by @AlphaBladez in #7765
- Fix autogptq for Gemma by @ddddng in #7786
- Fix internvl models by @Kuangdd01 in #7801 #7803 #7817 #8129
- Fix DeepSpeed ZeRO3 on moe models by @hiyouga in #7826 #7879
- Fix gradient checkpoint func for vit by @hiyouga in #7830
- Support S3 ray storage by @erictang000 in #7854
- Fix Kimi-VL attention by @Kuangdd01 in #7867
- Fix minicpm-o vllm inference by @hiyouga in #7870
- Unfreeze multimodal projector in freeze training by @zhaop-l in #7872
- Fix Qwen2.5-omni plugin by @hiyouga in #7875 #7962
- Add warp support link by @ericdachen in #7887
- Replace eos token for base model by @hiyouga in #7911
- Add `eval_on_each_dataset` arg by @hiyouga in #7912
- Fix qwen3 loss by @hiyouga in #7923 #8109
- Add repetition_penalty to api by @wangzhanxd in #7958
- Add graphgen to readme by @tpoisonooo in #7974
- Support video params in vllm batch infer by @Kuangdd01 in #7992
- Fix tool formatter by @yunhao-tech in #8000
- Fix kimi vl plugin by @hiyouga in #8015
- Support batch preprocess in vllm batch infer by @Shawn-Tao in #8051
- Support loading remote folder by @erictang000 in #8078
- Fix video utils import by @Kuangdd01 in #8077
- Fix SGLang LoRA inference by @Kiko-RWan in #8067
- Fix cli by @Wangbiao2 in #8095
- Fix pretrain workflow by @SunnyHaze in #8099
- Fix rope args for yarn by @piamo in #8101
- Add no build isolation in installing by @hiyouga in #8103
- Switch to GPTQModel and deprecate AutoGPTQ by @hiyouga in #8108
- Support llama3 parallel function call by @hiyouga in #8124
- Add `data_shared_file_system` by @hiyouga in #8179
- Fix load remote files by @youngwookim in #8183
- Fix dataset info by @Muqi1029 in #8197
- Fix qwen2.5 omni merge script by @Kuangdd01 in #8227 #8293
- Add unittest for VLM save load by @Kuangdd01 in #8248
- Add tag in swanlab by @Zeyi-Lin in #8258
- Support input video frames by @Kuangdd01 in #8264
- Fix empty template by @hiyouga in #8312
- Support full-finetuning with unsloth by @Remorax in #8325
- Add awesome work by @MING-ZCH in #8333
- Release v0.9.3 by @hiyouga in #8386
- Fix qwen2vl position ids by @hiyouga in #8387
- Fix vlm utils by @hiyouga in #8388
- Fix #3802 #4443 #5548 #6236 #6322 #6432 #6708 #6739 #6881 #6919 #7080 #7105 #7119 #7225 #7267 #7327 #7389 #7416 #7427 #7428 #7443 #7447 #7454 #7490 #7501 #7502 #7513 #7520 #7541 #7545 #7552 #7563 #7598 #7600 #7613 #7636 #7678 #7680 #7687 #7688 #7730 #7743 #7772 #7791 #7800 #7816 #7829 #7845 #7865 #7874 #7889 #7905 #7906 #7907 #7909 #7916 #7918 #7919 #7939 #7953 #7965 #7990 #8008 #8056 #8061 #8066 #8069 #8087 #8091 #8092 #8096 #8097 #8111 #8119 #8147 #8166 #8169 #8174 #8182 #8189 #8223 #8241 #8247 #8253 #8294 #8309 #8324 #8326 #8332
Full Changelog: v0.9.2...v0.9.3
v0.9.2: MiniCPM-o, SwanLab, APOLLO
We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing 👋
New features
- 🔥 APOLLO optimizer by @zhuhanqing in #6617
- 🔥 SwanLab experiment tracker by @Zeyi-Lin in #6401
- 🔥 Ray Trainer by @erictang000 in #6542
- Batch inference with vLLM TP by @JieShenAI in #6190
- QLoRA on Ascend NPU by @codemayq in #6601
- Yarn and Llama3 rope scaling by @hiyouga in #6693
- Support `uv run` by @erictang000 in #6907
- Ollama modelfile auto-generation by @codemayq in #4686
- Mistral tool prompt by @AlongWY in #5473
- Llama3 and Qwen2 tool prompt by @hiyouga in #6367 and #6369
New models
- Base models
- GPT2 (0.1B/0.4B/0.8B/1.5B) 📄
- Granite 3.0-3.1 (1B/2B/3B/8B) 📄
- PaliGemma2 (3B/10B/28B) 📄🖼️
- Moonlight (16B) 📄
- DeepSeek V2-V2.5 Base (236B) 📄
- DeepSeek V3 Base (671B) 📄
- Instruct/Chat models
- Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922 📄🤖
- DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767 📄🤖
- TeleChat2 (3B/7B/12B/35B/115B) @ge-xing in #6313 📄🤖
- Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 📄🤖🖼️
- PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 📄🤖🖼️
- Qwen2 Audio (7B) by @BUAADreamer in #6701 📄🤖🔈
- MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 📄🤖🖼️🔈
- InternLM3-Instruct (8B) by @hhaAndroid in #6640 📄🤖
- Marco-o1 (8B) 📄🤖
- Skywork-o1 (8B) 📄🤖
- Phi-4 (14B) 📄🤖
- Moonlight Instruct (16B) 📄
- Mistral Small (24B) 📄🤖
- QwQ (32B) 📄🤖
- Llama-3.3-Instruct (70B) 📄🤖
- QvQ (72B) 📄🤖🖼️
- DeepSeek V2-V2.5 (236B) 📄🤖
- DeepSeek V3 (671B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- OpenO1 (en) 📄
- Open Thoughts (en) 📄
- Open-R1-Math (en) 📄
- Chinese-DeepSeek-R1-Distill (zh) 📄
Changes
- Refactor VLMs register by @hiyouga in #6600
- Refactor mm plugin by @hiyouga in #6895
- Refactor template by @hiyouga in #6896
- Refactor data pipeline by @hiyouga in #6901
- Update vlm arguments by @hiyouga in #6976
- We have cleaned large files in git history using BFG Repo-Cleaner, find the backup repo here
Bug fix
- Add `trust_remote_code` option by @yafshar in #5819
- Fix mllama config by @hiyouga in #6137 and #6140
- Fix mllama pad by @hiyouga in #6151 and #6874
- Pin tokenizers version by @hiyouga in #6157
- Fix tokenized data loading by @village-way in #6160
- Show hostname in webui by @hykilpikonna in #6170
- Fix VLMs zero3 training by @hiyouga in #6233
- Add `skip_special_tokens` by @hiyouga in #6363
- Support non-reentrant GC by @hiyouga in #6364
- Add `disable_shuffling` option by @hiyouga in #6388
- Fix gen kwargs by @hiyouga in #6395
- Enable module run by @youkaichao in #6457
- Fix eval loss value by @hiyouga in #6465
- Fix paligemma inference by @hiyouga in #6483
- Add deepseek v3 template by @piamo in #5507
- Add http proxy argument in dockerfile by @shibingli in #6462
- Fix trainer generate by @hiyouga in #6512
- Fix pixtral DPO training by @hiyouga in #6547
- Fix ray args by @stephen-nju in #6564
- Fix minicpm template by @BUAADreamer in #6620
- Fix stop tokens for visual detection by @hiyouga in #6624
- Pin vllm version by @hiyouga in #6629
- Fix mllama any image by @hiyouga in #6637 and #7053
- Fix tokenizer max length by @xiaosu-zhu in #6632
- Fix webui locale by @steveepreston in #6653
- Fix MiniCPM-o DPO training by @BUAADreamer in #6657
- Fix Qwen2 MoE training by @hiyouga in #6684
- Upgrade to gradio 5 by @hiyouga in #6688
- Support Japanese local file by @engchina in #6698
- Fix DPO loss by @yinpu in #6722
- Webui thinking mode by @hiyouga in #6778
- Upgrade to transformers 4.48 by @hiyouga in #6628
- Fix ci by @hiyouga in #6787
- Fix instructions about installing fa2 on win platform in readme by @neavo in #6788
- Fix minicpmv plugin by @BUAADreamer in #6801, #6890, #6946 and #6998
- Fix qwen2 tool prompt by @yueqis in #6796
- Fix llama pro by @hiyouga in #6814
- Allow thought in function call by @yueqis in #6797
- Add `ALLOW_EXTRA_ARGS` by @hiyouga in #6831
- Fix Qwen2vl plugin by @hiyouga in #6855
- Upgrade vllm to 0.7.2 by @hiyouga in #6857
- Fix unit test for tool using by @hiyouga in #6865
- Skip broken data in sharegpt converter by @JJJYmmm in #6879
- Fix qwen2.5 plugin for video by @JJJYmmm in #6868
- Parsing chat template from tokenizer by @hiyouga in #6905 (experimental)
- Fix mllama KTO training by @marko1616 in #6904
- Fix grad checkpointing by @hiyouga in #6916 and #6931
- Fix ollama template by @hiyouga in #6902
- Fix ray example by @erictang000 in #6906
- Improve error handling for media by @noahc1510 in #6128
- Support split on each dataset by @SrWYG in #5522
- Fix gen kwargs in training by @aliencaocao in #5451
- Liger kernel for qwen2.5vl by @hiyouga in #6930
- Fix lora target modules by @hiyouga in #6944
- Add `ray_storage_path` by @erictang000 in #6920
- Fix trainer.predict by @hiyouga in #6972
- Add min resolution control by @hiyouga in #6975
- Upgrade transformers to 4.49 by @hiyouga in #6982
- Add seed in vllm batch predict by @JieShenAI in #7058
- Fix pyproject.toml by @hiyouga in #7067
- Upgrade CANN images by @leo-pony in #7061
- Display swanlab link by @Zeyi-Lin in #7089
- Fix hf engine by @hiyouga in #7120
- Add bailing chat template by @oldstree in #7117
- Use bicubic resampler instead of nearest by @hiyouga in #7143
- Fix Qwen2Audio plugin by @lsrami in #7166
- Destroy process group by @hiyouga in #7174
- Fix swanlab callback by @Zeyi-Lin in #7176
- Fix paligemma plugin by @hiyouga in #7181
- Escape html tag in webui by @hiyouga in #7190
- Upgrade vllm to 0.7.3 by @hiyouga in #7183 and #7193
- Fix parser by @hiyouga in #7204
- Fix function formatter by @zhangch-ss in #7201
- Fix deepspeed config by @hiyouga in #7205
- Fix dataloader by @hiyouga in #7207
- Fix export tokenizer by @hiyouga in #7230
- Update arguments by @hiyouga in #7231
- Add `swanlab_logdir` by @Zeyi-Lin in #7219
- Fix vllm batch prediction by @hiyouga in #7235
- Avoid exit after saving tokenized data by @hiyouga in #7244
- Support commit in env by @hiyouga in #7247
- Release v0.9.2 by @hiyouga in #7242
- Fix #1204 #3306 #3462 #5121 #5270 #5404 #5444 #5472 #5518 #5616 #5712 #5714 #5756 #5944 #5986 #6020 #6056 #6092 #6136 #6139 #6149 #6165 #6213 #6287 #6320 #6345 #6345 #6346 #6348 #6358 #6362 #6391 #6415 #6439 #6448 #6452 #6482 #6499 #6543 #6546 #6551 #6552 #6610 #6612 #6636 #6639 #6662 #6669 #6738 #6772 #6776 #6780 #6782 #6793 #6806 #6812 #6819 #6826 #6833 #6839 #6850 #6854 #6860 #6878 #6885 #6889 #6937 #6948 #6952 #6960 #6966 #6973 #6981 #7036 #7064 #7072 #7116 #7125 #7130 #7171 #7173 #7180 #7182 #7184 #7192 #7198 #7213 #7234 #7243
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Many Vision Models, Qwen2.5 Coder, Gradient Fix
New features
- 🔥 Support Llama-3.2 and Llama-3.2-Vision by @marko1616 in #5547 and #5555
- 🔥 Support LLaVA-NeXT, LLaVA-NeXT-Video and Video-LLaVA by @BUAADreamer in #5574
- 🔥 Support Pixtral model by @Kuangdd01 in #5581
- Support EXAONE3.0 by @shing100 in #5585
- Support Index-series models by @Cuiyn in #5910
- Support Liger-Kernel for Qwen2-VL by @aliencaocao in #5438
- Support download models from ModelHub by @huniu20 in #5642
- Fix abnormal loss values in transformers 4.46 by @hiyouga in #5852 #5871
- Support multi-image inference by @hiyouga in #5895
- Support calculating effective tokens for SFT and DPO by @wtmlon in #6078
Note: you can now install `transformers>=4.46.0,<=4.46.1` to enable the gradient accumulation fix.
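The version pin from the note above can be applied directly; quoting matters here, since `>=` and `<=` would otherwise be interpreted as shell redirections:

```shell
pip install "transformers>=4.46.0,<=4.46.1"
```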
New models
- Base models
- Qwen2.5 (0.5B/1.5B/3B/7B/14B/32B/72B) 📄
- Qwen2.5-Coder (0.5B/1.5B/3B/7B/14B/32B) 📄🖥️
- Llama-3.2 (1B/3B) 📄
- OpenCoder (1.5B/8B) 📄🖥️
- Index (1.9B) 📄
- Instruct/Chat models
- Qwen2.5-Instruct (0.5B/1.5B/3B/7B/14B/32B/72B) 📄🤖
- Qwen2.5-Coder-Instruct (0.5B/1.5B/3B/7B/14B/32B) 📄🤖🖥️
- Llama-3.2-Instruct (1B/3B) 📄🤖
- OpenCoder-Instruct (1.5B/8B) 📄🤖🖥️
- Index-Chat (1.9B) 📄🤖
- LLaVA-NeXT (7B/8B/13B/34B/72B/110B) 📄🤖🖼️
- LLaVA-NeXT-Video (7B/34B) 📄🤖🖼️
- Video-LLaVA (7B) 📄🤖🖼️
- Pixtral (12B) 📄🤖🖼️
- EXAONE-3.0-Instruct (8B) 📄🤖
Security fix
- Fix CVE-2024-52803 by @superboy-zjc in aa6a174
Bug fix
- Update version of rocm docker by @HardAndHeavy in #5427
- Fix Phi-3-small template by @menibrief in #5475
- Fix function call dataset process function by @whybeyoung in #5483
- Add docker args by @StrangeBytesDev in #5533
- Fix logger by @chengchengpei in #5546
- Fix Gemma2 flash attention warning by @amrear in #5580
- Update setup by @johnnynunez in #5615 #5665
- Add project by @NLPJCL in #5801
- Fix saving Qwen2-VL processor by @hiyouga in #5857
- Support change base image in dockerfile by @sd3ntato in #5880
- Fix template replace behaviour by @hiyouga in #5907
- Add `image_dir` argument by @hiyouga in #5909
- Add rank0 logger by @hiyouga in #5912
- Fix DPO metrics by @hiyouga in #5913 #6052
- Update datasets version by @hiyouga in #5926
- Fix chat engines by @hiyouga in #5927
- Fix vllm 0.6.3 by @hiyouga in #5970
- Fix extra args in llamaboard by @hiyouga in #5971
- Fix vllm input args by @JJJJerry in #5973
- Add `vllm_config` args by @hiyouga in #5982 #5990
- Add shm_size in docker compose config by @XYZliang in #6010
- Fix tyro version by @hiyouga in #6065
- Fix ci by @hiyouga in #6120
- Fix Qwen2-VL inference on vLLM by @hiyouga in #6123 #6126
- Release v0.9.1 by @hiyouga in #6124
- Fix #3881 #4712 #5411 #5542 #5549 #5611 #5668 #5705 #5747 #5749 #5768 #5796 #5797 #5883 #5904 #5966 #5988 #6050 #6061
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini
Congratulations on 30,000 stars 🎉 Follow us on X (Twitter)
New features
- 🔥 Support fine-tuning Qwen2-VL model on multi-image datasets by @simonJJJ in #5290
- 🔥 Support time- and memory-efficient Liger-Kernel via the `enable_liger_kernel` argument by @hiyouga
- 🔥 Support memory-efficient Adam-mini optimizer via the `use_adam_mini` argument by @relic-yuexi in #5095
- Support fine-tuning Qwen2-VL model on video datasets by @hiyouga in #5365 and @BUAADreamer in #4136 (needs patch huggingface/transformers#33307)
- Support fine-tuning vision language models (VLMs) using RLHF/DPO/ORPO/SimPO approaches by @hiyouga
- Support Unsloth's asynchronous activation offloading method via the `use_unsloth_gc` argument
- Support vLLM 0.6.0
- Support MFU calculation by @yzoaim in #5388
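The `enable_liger_kernel` and `use_adam_mini` switches above are plain boolean training arguments. A minimal sketch of wiring them into a training config follows; the model, dataset, and output paths are placeholders, not values from the release notes:

```shell
# Append the two new efficiency switches to an SFT config.
# All other keys are illustrative; adjust them to your own setup.
cat > liger_adam_mini_sft.yaml <<'EOF'
stage: sft
do_train: true
finetuning_type: lora
model_name_or_path: your/base-model     # placeholder
dataset: your_dataset                   # placeholder
output_dir: saves/demo                  # placeholder
enable_liger_kernel: true   # time- and memory-efficient fused kernels
use_adam_mini: true         # memory-efficient Adam-mini optimizer
EOF

# Launch (requires LlamaFactory to be installed):
# llamafactory-cli train liger_adam_mini_sft.yaml
```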
New models
- Base models
- Qwen2-Math (1.5B/7B/72B) 📄🔢
- Yi-Coder (1.5B/9B) 📄🖥️
- InternLM2.5 (1.8B/7B/20B) 📄
- Gemma-2-2B 📄
- Meta-Llama-3.1 (8B/70B) 📄
- Instruct/Chat models
- MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
- Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
- Yi-Coder-Chat (1.5B/9B) 📄🤖🖥️
- InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
- Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
- Gemma-2-2B-it by @codemayq in #5037 📄🤖
- Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
- Mistral-Nemo-Instruct (12B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Magpie-ultra-v0.1 (en) 📄
- Pokemon-gpt4o-captions (en&zh) 📄🖼️
- Preference datasets
- RLHF-V (en) 📄🖼️
- VLFeedback (en) 📄🖼️
Changes
- Due to compatibility considerations, fine-tuning vision language models (VLMs) requires `transformers>=4.35.0.dev0`; try `pip install git+https://github.com/huggingface/transformers.git` to install it. `visual_inputs` has been deprecated; you no longer need to specify this argument.
- LlamaFactory now adopts lazy loading for multimodal inputs, see #5346 for details. Please use `preprocessing_batch_size` to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323).
- LlamaFactory now supports `lmf` (equivalent to `llamafactory-cli`) as a shortcut command.
Bug fix
- Fix LlamaBoard export by @liuwwang in #4950
- Add ROCm dockerfiles by @HardAndHeavy in #4970
- Fix deepseek template by @piamo in #4892
- Fix pissa savecallback by @codemayq in #4995
- Add Korean display language in LlamaBoard by @Eruly in #5010
- Fix deepseekcoder template by @relic-yuexi in #5072
- Fix examples by @codemayq in #5109
- Fix `mask_history` truncate from last by @YeQiuO in #5115
- Fix jinja template by @YeQiuO in #5156
- Fix PPO optimizer and lr scheduler by @liu-zichen in #5163
- Add SailorLLM template by @chenhuiyu in #5185
- Fix XPU device count by @Zxilly in #5188
- Fix bf16 check in NPU by @Ricardo-L-C in #5193
- Update NPU docker image by @MengqingCao in #5230
- Fix image input api by @marko1616 in #5237
- Add liger-kernel link by @ByronHsu in #5317
- Fix #4684 #4696 #4917 #4925 #4928 #4944 #4959 #4992 #5035 #5048 #5060 #5092 #5228 #5252 #5292 #5295 #5305 #5307 #5308 #5324 #5331 #5334 #5338 #5344 #5366 #5384
v0.8.3: Neat Packing, Split Evaluation
New features
- 🔥 Support contamination-free packing via the `neat_packing` argument by @chuan298 in #4224
- 🔥 Support split evaluation via the `eval_dataset` argument by @codemayq in #4691
- 🔥 Support HQQ/EETQ quantization via the `quantization_method` argument by @hiyouga
- 🔥 Support ZeRO-3 when using BAdam by @Ledzy in #4352
- Support training on the last turn only via the `mask_history` argument by @aofengdaxia in #4878
- Add NPU Dockerfile by @MengqingCao in #4355
- Support building FlashAttention2 in Dockerfile by @hzhaoy in #4461
- Support `batch_eval_metrics` at evaluation by @hiyouga
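As a rough sketch, the packing and evaluation arguments named above could be combined in a single config; the dataset names are placeholders and the surrounding keys are illustrative:

```shell
# Sketch of a config using neat_packing, eval_dataset, and batch_eval_metrics.
cat > neat_packing_sft.yaml <<'EOF'
stage: sft
do_train: true
dataset: your_train_dataset      # placeholder
eval_dataset: your_eval_dataset  # evaluate on a separate held-out dataset
neat_packing: true               # contamination-free packing
batch_eval_metrics: true         # compute metrics batch by batch
EOF

# Launch (requires LlamaFactory to be installed):
# llamafactory-cli train neat_packing_sft.yaml
```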
New models
- Base models
- InternLM2.5-7B 📄
- Gemma2 (9B/27B) 📄
- Instruct/Chat models
Changes
- Fix DPO cutoff len and deprecate the `reserved_label_len` argument
- Improve loss function for reward modeling
Bug fix
- Fix numpy version by @MengqingCao in #4382
- Improve cli by @kno10 in #4409
- Add `tool_format` parameter to control prompt by @mMrBun in #4417
- Automatically label NPU issues by @MengqingCao in #4445
- Fix flash_attn args by @stceum in #4446
- Fix docker-compose path by @MengqingCao in #4544
- Fix torch-npu dependency by @hashstone in #4561
- Fix deepspeed + pissa by @hzhaoy in #4580
- Improve cli by @injet-zhou in #4590
- Add project by @wzh1994 in #4662
- Fix docstring by @hzhaoy in #4673
- Fix Windows command preview in WebUI by @marko1616 in #4700
- Fix vllm 0.5.1 by @T-Atlas in #4706
- Fix save value head model callback by @yzoaim in #4746
- Fix CUDA Dockerfile by @hzhaoy in #4781
- Fix examples by @codemayq in #4804
- Fix evaluation data split by @codemayq in #4821
- Fix CI by @codemayq in #4822
- Fix #2290 #3974 #4113 #4379 #4398 #4402 #4410 #4419 #4432 #4456 #4458 #4549 #4556 #4579 #4592 #4609 #4617 #4674 #4677 #4683 #4684 #4699 #4705 #4731 #4742 #4779 #4780 #4786 #4792 #4820 #4826
v0.8.2: PiSSA, Parallel Functions
New features
- Support GLM-4 tools and parallel function calling by @mMrBun in #4173
- Support PiSSA fine-tuning by @hiyouga in #4307
New models
- Base models
- DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
- Instruct/Chat models
- MiniCPM-2B 📄🤖
- DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en) by @EliMCosta in #4309
- WebInstruct (en) by @EliMCosta in #4309
Bug fix
- Fix DPO+ZeRO3 problem by @hiyouga
- Add MANIFEST.in by @iamthebot in #4191
- Fix eos_token in llama3 pretrain by @dignfei in #4204
- Fix vllm version by @kimdwkimdw and @hzhaoy in #4234 and #4246
- Fix Dockerfile by @EliMCosta in #4314
- Fix pandas version by @zzxzz12345 in #4334
- Fix #3162 #3196 #3778 #4198 #4209 #4221 #4227 #4238 #4242 #4271 #4292 #4295 #4326 #4346 #4357 #4362
v0.8.1: Patch release
v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO
Stronger LlamaBoard 💪😀
- Support single-node distributed training in Web UI
- Add dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
- Support selecting checkpoints of full/freeze tuning
- Add throughput metrics to LlamaBoard by @injet-zhou in #4066
- Faster UI loading
New features
- Add KTO algorithm by @enji-zhou in #3785
- Add SimPO algorithm by @hiyouga
- Support passing `max_lora_rank` to the vLLM backend by @jue-jue-zi in #3794
- Support preference datasets in sharegpt format and remove big files from git repo by @hiyouga in #3799
- Support setting system messages in CLI inference by @ycjcl868 in #3812
- Add `num_samples` option in `dataset_info.json` by @seanzhang-zhichen in #3829
- Add NPU docker image by @dongdongqiang2018 in #3876
- Improve NPU document by @MengqingCao in #3930
- Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
- Add `llamafactory-cli env` for bug reports
- Support image input in the API mode
- Support random initialization via the `train_from_scratch` argument
- Initialize CI
New models
- Base models
- Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
- PaliGemma-3B (pt/mix) 📄🖼️
- GLM-4-9B 📄
- Falcon-11B 📄
- DeepSeek-V2-Lite (16B) 📄
- Instruct/Chat models
New datasets
- Pre-training datasets
- FineWeb (en)
- FineWeb-Edu (en)
- Supervised fine-tuning datasets
- Ruozhiba-GPT4 (zh)
- STEM-Instruction (zh)
- Preference datasets
- Argilla-KTO-mix-15K (en)
- UltraFeedback (en)
Bug fix
- Fix RLHF for multimodal finetuning
- Fix LoRA target in multimodal finetuning by @BUAADreamer in #3835
- Fix `yi` template by @Yimi81 in #3925
- Fix abort issue in LlamaBoard by @injet-zhou in #3987
- Pass `scheduler_specific_kwargs` to `get_scheduler` by @Uminosachi in #4006
- Fix hyperparameter help texts by @xu-song in #4007
- Update issue template by @statelesshz in #4011
- Fix vllm dtype parameter
- Fix exporting hyperparameters by @MengqingCao in #4080
- Fix DeepSpeed ZeRO3 in PPO trainer
- Fix #3108 #3387 #3646 #3717 #3764 #3769 #3803 #3807 #3818 #3837 #3847 #3853 #3873 #3900 #3931 #3965 #3971 #3978 #3992 #4005 #4012 #4013 #4022 #4033 #4043 #4061 #4075 #4077 #4079 #4085 #4090 #4120 #4132 #4137 #4139
v0.7.1: Ascend NPU Support, Yi-VL Models
🚨🚨 Core refactor 🚨🚨
- Add CLI usage: we now recommend using `llamafactory-cli` to launch training and inference; the entry point is located at `cli.py`
- Rename files: `train_bash.py` -> `train.py`, `train_web.py` -> `webui.py`, `api_demo.py` -> `api.py`
- Remove files: `cli_demo.py`, `evaluate.py`, `export_model.py`, `web_demo.py`; use `llamafactory-cli chat/eval/export/webchat` instead
- Use YAML configs in examples instead of shell scripts for readability
- Remove the SHA-1 hash check when loading datasets
- Rename arguments: `num_layer_trainable` -> `freeze_trainable_layers`, `name_module_trainable` -> `freeze_trainable_modules`
The above changes are made by @hiyouga in #3596
REMINDER: Installation is now mandatory to use LLaMA Factory
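Under the refactored layout, a typical workflow might look like the following sketch; the config path is illustrative, and `pip install -e .` is one way to satisfy the installation requirement from a source checkout:

```shell
# Install from a source checkout (installation is now mandatory)
pip install -e .

# Old entry points           ->  new unified CLI
# python src/train_bash.py   ->  llamafactory-cli train <config>.yaml
# python src/train_web.py    ->  llamafactory-cli webui
# python src/api_demo.py     ->  llamafactory-cli api
llamafactory-cli train path/to/your_sft_config.yaml  # placeholder config path
```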
New features
- Support training and inference on the Ascend NPU 910 devices by @zhou-wjjw and @statelesshz (docker images are also provided)
- Support `stop` parameter in vLLM engine by @zhaonx in #3527
- Support fine-tuning token embeddings in freeze tuning via the `freeze_extra_modules` argument
- Add Llama3 quickstart to readme
New models
- Base models
- Yi-1.5 (6B/9B/34B) 📄
- DeepSeek-V2 (236B) 📄
- Instruct/Chat models
- Yi-1.5-Chat (6B/9B/34B) 📄🤖
- Yi-VL-Chat (6B/34B) by @BUAADreamer in #3748 📄🖼️🤖
- Llama3-Chinese-Chat (8B/70B) 📄🤖
- DeepSeek-V2-Chat (236B) 📄🤖
Bug fix
- Add badam arguments to LlamaBoard by @codemayq in #3487
- Add openai data format to readme by @khazic in #3490
- Fix slow operation in dpo/orpo trainer by @hiyouga
- Fix badam examples by @pha123661 in #3578
- Fix download link of the nectar_rm dataset by @ZeyuTeng96 in #3588
- Add project by @Katehuuh in #3601
- Fix dockerfile by @gaussian8 in #3604
- Fix full tuning of MLLMs by @BUAADreamer in #3651
- Fix gradio environment variables by @cocktailpeanut in #3654
- Fix typo and add log in API by @Tendo33 in #3655
- Fix download link of the phi-3 model by @YUUUCC in #3683
- Fix #3559 #3560 #3602 #3603 #3606 #3625 #3650 #3658 #3674 #3694 #3702 #3724 #3728