[feat] Qwen2 converter #135

# 🤨 Problem Description

Fast-LLM doesn't yet support importing or exporting Qwen2 models such as https://huggingface.co/Qwen/Qwen2-7B or https://huggingface.co/Qwen/Qwen2.5-7B-Instruct.

These models are particularly relevant because DeepSeek-R1 has been distilled into Qwen2-architecture models (e.g., https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B). Many teams are trying to reproduce R1, and Qwen models appear to be a viable starting point. Qwen also comes in specialized variants for math and code, which makes it a useful foundation for exploring upcycled mixture-of-experts (MoE) models with the BTX (Branch-Train-MiX) method.

# 💡 Proposed Solution

Add a Qwen2 HF converter that supports both import and export functionality:

1. Enable exporting a Fast-LLM Qwen2-like model to Hugging Face's `Qwen2ForCausalLM` format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2/modeling_qwen2.py).
2. Enable importing Hugging Face Qwen2 checkpoints into Fast-LLM.
3. Verify that weights and model outputs are equivalent after conversion.
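The equivalence requirement in step 3 could be checked along the following lines. This is a minimal sketch, not Fast-LLM code: `compare_state_dicts` is a hypothetical helper, and flat lists of floats stand in for real tensors so the example runs without any dependencies. A real test would compare actual tensors and also compare logits on a fixed input batch.

```python
def compare_state_dicts(reference, converted, atol=1e-6):
    """Return human-readable mismatches between two checkpoints.

    Both arguments map parameter names to flat lists of floats
    (a stand-in for tensors; this layout is an assumption for illustration).
    """
    problems = []
    for name in sorted(set(reference) | set(converted)):
        if name not in converted:
            problems.append(f"missing after conversion: {name}")
        elif name not in reference:
            problems.append(f"unexpected after conversion: {name}")
        elif len(reference[name]) != len(converted[name]):
            problems.append(f"size mismatch: {name}")
        else:
            # Largest element-wise deviation for this parameter.
            worst = max(
                (abs(a - b) for a, b in zip(reference[name], converted[name])),
                default=0.0,
            )
            if worst > atol:
                problems.append(
                    f"value mismatch: {name} (max abs diff {worst:.3g})"
                )
    return problems


# Toy example: a checkpoint that survived an export/import round trip unchanged.
original = {"model.norm.weight": [1.0, 1.0], "lm_head.weight": [0.5, -0.25]}
round_tripped = {"model.norm.weight": [1.0, 1.0], "lm_head.weight": [0.5, -0.25]}
assert compare_state_dicts(original, round_tripped) == []
```

Reporting missing and unexpected names separately from value mismatches makes it easier to tell a name-mapping bug from a numerical one.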

# 🌟 Potential Benefits

* Allows for continual pretraining and fine-tuning of Qwen2-based models within Fast-LLM.
* Enables benchmarking and deployment of Qwen2-based models trained with Fast-LLM.
* Supports training and experimenting with Qwen2-based MoE models.

# 📝 Additional Context

* Qwen2 model class: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2/modeling_qwen2.py
* Example `config.json` for Qwen 2.5 7B Instruct: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/config.json
* DeepSeek-R1 distillation into Qwen2: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
* Qwen has variants for math and code (e.g., https://huggingface.co/Qwen/Qwen2.5-Math-7B and https://huggingface.co/Qwen/Qwen2.5-Coder-7B)
* Qwen2 MoE model: https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B, using the `Qwen2MoeForCausalLM` model class from https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py
* Fast-LLM model converter reference documentation: https://servicenow.github.io/Fast-LLM/developer_guide/conversion/
* Llama 3 converter implementation: https://github.com/ServiceNow/Fast-LLM/blob/e359cbbad429d06aaf2ccf2798912bd600c6bbae/fast_llm/models/gpt/conversion.py#L323
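For orientation, the sketch below lists the per-layer parameter names a converter would need to map, based on my reading of the `Qwen2ForCausalLM` implementation linked above. The main difference from the Llama layout appears to be that the query, key, and value projections carry a bias while the output projection and the MLP do not. This should be verified against `modeling_qwen2.py` before relying on it. `qwen2_layer_param_names` is a hypothetical helper, not part of Fast-LLM or Transformers.

```python
def qwen2_layer_param_names(layer_idx):
    """List the HF parameter names expected for one Qwen2 decoder layer.

    Assumption: q/k/v projections have weight and bias; o_proj and the
    MLP projections have weights only; both norms are RMSNorm weights.
    """
    prefix = f"model.layers.{layer_idx}"
    names = []
    for proj in ("q_proj", "k_proj", "v_proj"):
        names.append(f"{prefix}.self_attn.{proj}.weight")
        names.append(f"{prefix}.self_attn.{proj}.bias")
    # Output projection: weight only.
    names.append(f"{prefix}.self_attn.o_proj.weight")
    # Gated MLP: weights only.
    for proj in ("gate_proj", "up_proj", "down_proj"):
        names.append(f"{prefix}.mlp.{proj}.weight")
    names.append(f"{prefix}.input_layernorm.weight")
    names.append(f"{prefix}.post_attention_layernorm.weight")
    return names


for name in qwen2_layer_param_names(0):
    print(name)
```

A list like this can double as a checklist when testing the converter: every name should be consumed exactly once on import and produced exactly once on export.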
