🤨 Problem Description
Fast-LLM doesn't yet support importing or exporting Qwen2 models such as https://huggingface.co/Qwen/Qwen2-7B or https://huggingface.co/Qwen/Qwen2.5-7B-Instruct.
These models are particularly relevant because DeepSeek-R1 has been distilled into Qwen2 (e.g., https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B). Many teams are investigating the reproduction of R1, and Qwen models appear to be a viable path forward. Additionally, Qwen exists in specialized variants for math and code, making it a useful foundation for exploring upcycled mixture-of-experts (MoE) models with the BTX (branch, train, mix) method.
💡 Proposed Solution
Add a Qwen2 HF converter that supports both import and export functionality:
- Enable exporting a Fast-LLM Qwen2-like model to Hugging Face's Qwen2ForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2/modeling_qwen2.py).
- Load HF Qwen2 models into Fast-LLM.
- Ensure weight and output equivalence post-conversion.
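The equivalence requirement in the last bullet could be checked along these lines. This is only a minimal sketch, not Fast-LLM code: `find_mismatches` is a hypothetical helper, the parameter names are illustrative, and plain Python lists stand in for flattened tensors so the example runs without any dependencies. A real test would more likely compare checkpoint tensors and model logits with the framework's own comparison utilities.

```python
import math


def find_mismatches(reference, converted, rel_tol=1e-5, abs_tol=1e-8):
    """Compare two {name: flat list of floats} mappings (hypothetical helper).

    Returns a list of human-readable problems; an empty list means the two
    mappings agree within the given tolerances.
    """
    problems = []
    # Names present on only one side point at a missing or extra mapping.
    for name in sorted(set(reference) ^ set(converted)):
        side = "reference" if name in reference else "converted"
        problems.append(f"{name}: only present in {side}")
    # Names present on both sides are compared element by element.
    for name in sorted(set(reference) & set(converted)):
        ref, conv = reference[name], converted[name]
        if len(ref) != len(conv):
            problems.append(f"{name}: size {len(ref)} != {len(conv)}")
            continue
        close = all(
            math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
            for r, c in zip(ref, conv)
        )
        if not close:
            worst = max(abs(r - c) for r, c in zip(ref, conv))
            problems.append(f"{name}: max abs diff {worst:.3e}")
    return problems


# Illustrative values only: weights before and after an import/export round trip.
original = {"q_proj.weight": [0.1, -0.2], "q_proj.bias": [0.01, 0.02]}
round_tripped = {"q_proj.weight": [0.1, -0.2], "q_proj.bias": [0.01, 0.02]}
assert find_mismatches(original, round_tripped) == []
```

The same helper could be applied to model outputs by passing the logits of the original and the converted model for an identical input batch, with looser tolerances if the two implementations use different kernels or precisions.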
🌟 Potential Benefits
- Allows for continual pretraining and fine-tuning of Qwen2-based models within Fast-LLM.
- Enables benchmarking and deployment of Qwen2-based models trained with Fast-LLM.
- Supports training and experimenting with Qwen2-based MoE models.
📝 Additional Context
- config.json for Qwen 2.5 7B Instruct: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/config.json
- Qwen2MoeForCausalLM model class from https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py
- Fast-LLM/fast_llm/models/gpt/conversion.py, line 323 in e359cbb