🧐 Problem Description
Fast-LLM doesn't yet support importing or exporting OLMoE models such as https://huggingface.co/allenai/OLMoE-1B-7B-0924.
💡 Proposed Solution
Add an OLMoE HF converter that offers both export and import functionality:
- Make it possible to export a Fast-LLM OLMoE-like model to HF's OlmoeForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmoe/modeling_olmoe.py).
- Load HF OLMoE models into Fast-LLM.
- Verify that model weights and outputs are equivalent after conversion. One thing to watch for is any discrepancy in the order of the FFN, LayerNorm, and Dropout layers between Fast-LLM's GPT and OLMoE, i.e. Fast-LLM/fast_llm/layers/transformer/transformer.py (line 83 in 436d8d2):
  def forward(self, input_: torch.Tensor, kwargs: dict, losses: dict | None = None, metrics: dict | None = None):
  vs. https://github.com/huggingface/transformers/blob/54be2d7ae87e873482b984cc956e165ca4dc0ba3/src/transformers/models/olmoe/modeling_olmoe.py#L688
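The weight-equivalence check in the last item could look roughly like the sketch below. It is only an illustration: `max_abs_diff` and `compare_weights` are hypothetical helper names, not part of Fast-LLM or transformers, and the weights are plain nested lists so the snippet runs without any dependencies. With real checkpoints, the same comparison would more likely be done on state-dict tensors, for example with `torch.testing.assert_close`.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally shaped nested lists."""
    if isinstance(a, (list, tuple)):
        if not isinstance(b, (list, tuple)) or len(a) != len(b):
            raise ValueError("shape mismatch")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def compare_weights(reference, converted, atol=1e-6):
    """Report keys that are missing, unexpected, or differ by more than atol.

    Both arguments are hypothetical stand-ins for state dicts:
    a mapping from parameter name to nested lists of floats.
    """
    report = {
        "missing": sorted(set(reference) - set(converted)),
        "unexpected": sorted(set(converted) - set(reference)),
        "mismatched": {},
    }
    for name in sorted(set(reference) & set(converted)):
        diff = max_abs_diff(reference[name], converted[name])
        if diff > atol:
            report["mismatched"][name] = diff
    return report


# Toy example with made-up parameter names.
reference = {"layer.weight": [[1.0, 2.0], [3.0, 4.0]], "layer.bias": [0.0, 0.0]}
converted = {"layer.weight": [[1.0, 2.0], [3.0, 4.5]], "extra": [1.0]}
report = compare_weights(reference, converted)
print(report)
```

An empty report after an export followed by a re-import would indicate that the weights survived the round trip. Comparing model outputs on the same input batch would additionally catch layer-ordering differences that a pure weight comparison cannot see.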
🔄 Alternatives Considered
It might be possible to export OLMoE-like models in HF Mixtral format.
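A rough sketch of what such a Mixtral-format export would involve at the parameter-name level is shown below. The key patterns are assumptions based on how the HF OLMoE and Mixtral checkpoints appear to name their MoE weights (`mlp.experts.N.{gate,up,down}_proj` vs. `block_sparse_moe.experts.N.{w1,w3,w2}`) and should be checked against real state dicts. `olmoe_to_mixtral_key` is a hypothetical helper, and treating the QK-norm weights as having no Mixtral counterpart is likewise an assumption.

```python
import re

# Assumed correspondence between expert projection names; verify before relying on it.
_EXPERT_PROJ = {"gate_proj": "w1", "down_proj": "w2", "up_proj": "w3"}

_EXPERT_RE = re.compile(
    r"^(model\.layers\.\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)
_ROUTER_RE = re.compile(r"^(model\.layers\.\d+)\.mlp\.gate\.weight$")


def olmoe_to_mixtral_key(key):
    """Return the Mixtral-style name for an OLMoE key, or None if no counterpart is assumed."""
    match = _EXPERT_RE.match(key)
    if match:
        layer, expert, proj = match.groups()
        return f"{layer}.block_sparse_moe.experts.{expert}.{_EXPERT_PROJ[proj]}.weight"
    match = _ROUTER_RE.match(key)
    if match:
        return f"{match.group(1)}.block_sparse_moe.gate.weight"
    if ".q_norm." in key or ".k_norm." in key:
        # Assumption: these weights cannot be represented in the Mixtral layout.
        return None
    # Everything else (embeddings, attention projections, norms) keeps its name.
    return key


print(olmoe_to_mixtral_key("model.layers.0.mlp.experts.3.up_proj.weight"))
```

Any key that maps to `None` marks a place where the two architectures differ, which is why a dedicated OLMoE converter may still be needed for an exact export.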
📈 Potential Benefits
Allows for:
- Continual pretraining of existing OLMoE checkpoints from the HF Hub.
- Benchmarking and deployment of OLMoE-like models trained with Fast-LLM.
📝 Additional Context