[feat] OLMoE hf converter

# 🧐 Problem Description

Fast-LLM doesn't yet support importing or exporting OLMoE models such as https://huggingface.co/allenai/OLMoE-1B-7B-0924.

# 💡 Proposed Solution

Add an OLMoE HF converter that offers both export and import functionality:

1. Make it possible to export a Fast-LLM OLMoE-like model to HF's `OlmoeForCausalLM` format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmoe/modeling_olmoe.py). 

2. Load HF OLMoE models into Fast-LLM.

3. Verify that model weights and outputs are equivalent after conversion. In particular, watch for discrepancies in the order of the FFN, LayerNorm, and Dropout layers between Fast-LLM's GPT and OLMoE; compare https://github.com/ServiceNow/Fast-LLM/blob/436d8d22cd6c6de934197e8c78d43d43ffd2b06a/fast_llm/layers/transformer/transformer.py#L83 with https://github.com/huggingface/transformers/blob/54be2d7ae87e873482b984cc956e165ca4dc0ba3/src/transformers/models/olmoe/modeling_olmoe.py#L688
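
The weight check in step 3 could look roughly like the sketch below. It is only an illustration, not part of Fast-LLM: `diff_state_dicts` is a hypothetical helper, the parameter names and values in the toy example are made up, and weights are assumed to be available as flat sequences of floats (for example via `tensor.flatten().tolist()`).

```python
from typing import Dict, List, Sequence


def diff_state_dicts(
    reference: Dict[str, Sequence[float]],
    converted: Dict[str, Sequence[float]],
    atol: float = 1e-6,
) -> Dict[str, List[str]]:
    """Compare two name -> flattened-weights mappings (hypothetical helper).

    Reports parameters missing from the converted checkpoint, parameters
    that only exist in the converted checkpoint, and parameters whose
    length or values differ by more than `atol`.
    """
    report: Dict[str, List[str]] = {
        "missing": sorted(set(reference) - set(converted)),
        "unexpected": sorted(set(converted) - set(reference)),
        "mismatched": [],
    }
    for name in sorted(set(reference) & set(converted)):
        ref, conv = reference[name], converted[name]
        if len(ref) != len(conv):
            # Different number of elements: the shapes cannot agree.
            report["mismatched"].append(name)
            continue
        max_abs_diff = max((abs(r - c) for r, c in zip(ref, conv)), default=0.0)
        if max_abs_diff > atol:
            report["mismatched"].append(name)
    return report


# Toy example with made-up names and values.
reference = {
    "model.embed_tokens.weight": [0.1, 0.2, 0.3],
    "model.norm.weight": [1.0, 1.0],
    "lm_head.weight": [0.5, -0.5],
}
converted = {
    "model.embed_tokens.weight": [0.1, 0.2, 0.3],
    "model.norm.weight": [1.0, 1.5],  # deliberately wrong value
    "extra.weight": [0.0],  # deliberately unexpected parameter
}
report = diff_state_dicts(reference, converted)
print(report)
```

With real tensors, `torch.testing.assert_close` would likely be a better fit than a hand-written tolerance check. The same idea applies to outputs: run both models on identical inputs and compare the logits within a tolerance.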

# 🔄 Alternatives Considered

It might be possible to export OLMoE-like models in HF Mixtral format.
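
To gauge how feasible this alternative is, a rough sketch of the parameter renaming is given below. It rests on assumptions that should be checked against the installed `transformers` version: that OLMoE stores experts under `mlp.experts.{j}.{gate_proj,up_proj,down_proj}` with the router at `mlp.gate`, that Mixtral stores them under `block_sparse_moe.experts.{j}.{w1,w3,w2}` with the router at `block_sparse_moe.gate`, and that OLMoE's `q_norm`/`k_norm` weights have no Mixtral counterpart. `olmoe_to_mixtral_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed projection correspondence: gate -> w1, down -> w2, up -> w3.
_EXPERT_PROJ = {"gate_proj": "w1", "down_proj": "w2", "up_proj": "w3"}

_EXPERT_RE = re.compile(
    r"^(model\.layers\.\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)
_ROUTER_RE = re.compile(r"^(model\.layers\.\d+)\.mlp\.gate\.weight$")
_QK_NORM_RE = re.compile(r"^model\.layers\.\d+\.self_attn\.(q_norm|k_norm)\.weight$")


def olmoe_to_mixtral_name(name: str) -> Optional[str]:
    """Map an OLMoE parameter name to its assumed Mixtral name.

    Returns None for parameters assumed to have no Mixtral counterpart.
    """
    if _QK_NORM_RE.match(name):
        return None  # assumption: Mixtral has no query/key normalization weights
    match = _EXPERT_RE.match(name)
    if match:
        layer, expert, proj = match.groups()
        return f"{layer}.block_sparse_moe.experts.{expert}.{_EXPERT_PROJ[proj]}.weight"
    match = _ROUTER_RE.match(name)
    if match:
        return f"{match.group(1)}.block_sparse_moe.gate.weight"
    return name  # assumption: all remaining names are shared between the formats


print(olmoe_to_mixtral_name("model.layers.0.mlp.experts.3.up_proj.weight"))
print(olmoe_to_mixtral_name("model.layers.0.self_attn.q_norm.weight"))
```

If the assumptions hold, any parameter that maps to `None` means a Mixtral-format export would be lossy for such models, which is one reason a dedicated OLMoE converter may be preferable.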

# 📈 Potential Benefits

Allows for:

* Continual pretraining of existing OLMoE checkpoints from the HF Hub.
* Benchmarking and deployment of OLMoE-like models trained with Fast-LLM.

# 📝 Additional Context

* OLMoE model code: https://github.com/allenai/OLMo/blob/04a2da53db172bd9a0450705592ed50888bdcaa7/olmo/model.py#L674
* PR #48 introduced clamping of initial weights, which was shown to improve training stability for OLMoE models.
* Issue #56 describes a bug in Triton that prevents Fast-LLM from training OLMoE models with 64 experts and dropless MoE enabled.

