[feat] Qwen2 converter #135

@tscholak

🤨 Problem Description

Fast-LLM doesn't yet support importing or exporting Qwen2 models such as https://huggingface.co/Qwen/Qwen2-7B or https://huggingface.co/Qwen/Qwen2.5-7B-Instruct.

These models are particularly relevant because DeepSeek-R1 has been distilled into Qwen2-architecture models (e.g., https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B). Many teams are trying to reproduce R1, and Qwen models appear to be a viable path forward. Additionally, Qwen is available in specialized variants for math and code, which makes it a useful foundation for exploring upcycled mixture-of-experts (MoE) models with the BTX (branch, train, mix) method.

💡 Proposed Solution

Add a Qwen2 HF converter that supports both import and export functionality:

  1. Export a Fast-LLM Qwen2-like model to Hugging Face's Qwen2ForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2/modeling_qwen2.py).
  2. Import HF Qwen2 models into Fast-LLM.
  3. Verify that weights and model outputs are equivalent after conversion.
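
As a rough illustration of steps 1 to 3, the sketch below renames parameters in both directions and checks that a round trip leaves the weights unchanged. The Hugging Face names follow the Qwen2ForCausalLM state dict (which, unlike Llama, carries biases on the q/k/v projections). The internal names on the left of `PAIRS`, and the helpers `rename`, `convert`, and `max_abs_diff`, are hypothetical placeholders and not Fast-LLM's actual layout or API. Tensors are stood in for by flat lists of floats so the example runs without extra dependencies.

```python
import re

# (hypothetical internal name, Hugging Face Qwen2ForCausalLM name).
# "{i}" stands for the decoder layer index. Only a few parameters are listed.
PAIRS = [
    ("embed.weight", "model.embed_tokens.weight"),
    ("layers.{i}.attn.q.weight", "model.layers.{i}.self_attn.q_proj.weight"),
    ("layers.{i}.attn.q.bias", "model.layers.{i}.self_attn.q_proj.bias"),
    ("layers.{i}.attn.out.weight", "model.layers.{i}.self_attn.o_proj.weight"),
    ("final_norm.weight", "model.norm.weight"),
    ("head.weight", "lm_head.weight"),
]


def _pattern(template):
    """Turn a name template into a regex that captures the layer index."""
    escaped = re.escape(template).replace(re.escape("{i}"), r"(\d+)")
    return re.compile("^" + escaped + "$")


def rename(name, pairs=PAIRS, to_hf=True):
    """Map one parameter name to the other naming scheme."""
    for internal, hf in pairs:
        src, dst = (internal, hf) if to_hf else (hf, internal)
        match = _pattern(src).match(name)
        if match:
            return dst.replace("{i}", match.group(1)) if match.groups() else dst
    raise KeyError(f"no conversion rule for {name!r}")


def convert(state, pairs=PAIRS, to_hf=True):
    """Rename every key of a state dict; values are passed through untouched."""
    return {rename(key, pairs, to_hf): value for key, value in state.items()}


def max_abs_diff(a, b):
    """Largest element-wise difference between two state dicts with equal keys."""
    if a.keys() != b.keys():
        raise ValueError("state dicts have different parameter names")
    return max(abs(x - y) for key in a for x, y in zip(a[key], b[key]))


# Usage: export, import again, and confirm nothing changed.
internal_state = {
    "embed.weight": [0.1, 0.2],
    "layers.3.attn.q.bias": [1.0, -1.0],
    "head.weight": [0.5],
}
hf_state = convert(internal_state, to_hf=True)
round_trip = convert(hf_state, to_hf=False)
assert max_abs_diff(internal_state, round_trip) == 0.0
```

This only covers the weight half of step 3. Checking output equivalence would additionally require running both models on the same input and comparing logits within a numerical tolerance.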

🌟 Potential Benefits

  • Allows for continual pretraining and fine-tuning of Qwen2-based models within Fast-LLM.
  • Enables benchmarking and deployment of Qwen2-based models trained with Fast-LLM.
  • Supports training and experimenting with Qwen2-based MoE models.

Labels

enhancement