[feat] OLMoE hf converter #61

@tscholak

🧐 Problem Description

Fast-LLM doesn't yet support importing or exporting OLMoE models such as https://huggingface.co/allenai/OLMoE-1B-7B-0924.

💡 Proposed Solution

Add an OLMoE HF converter that offers both export and import functionality:

  1. Make it possible to export a Fast-LLM OLMoE-like model to HF's OlmoeForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmoe/modeling_olmoe.py).

  2. Load HF OLMoE models into Fast-LLM.

  3. Verify the equivalence of model weights and outputs post-conversion. One thing to watch for is any discrepancy in the order of the FFN, LayerNorm, and Dropout layers between Fast-LLM's GPT and OLMoE, i.e.

    def forward(self, input_: torch.Tensor, kwargs: dict, losses: dict | None = None, metrics: dict | None = None):
    vs. https://github.com/huggingface/transformers/blob/54be2d7ae87e873482b984cc956e165ca4dc0ba3/src/transformers/models/olmoe/modeling_olmoe.py#L688
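The weight-equivalence part of step 3 could start with a round-trip check like the sketch below. It is only an illustration under stated assumptions: the Fast-LLM-side parameter names are hypothetical, the HF-side names follow the usual `OlmoeForCausalLM` layout but should be checked against the actual checkpoint, and weights are stood in for by flat lists of floats rather than tensors. It does not cover the output comparison or the layer-ordering question.

```python
import re

# Left: HYPOTHETICAL Fast-LLM parameter names, invented for this sketch.
# Right: HF-style OLMoE names (assumed; verify against the real checkpoint).
EXPORT_RULES = [
    (r"embeddings\.weight", "model.embed_tokens.weight"),
    (r"layers\.(\d+)\.router\.weight", r"model.layers.\1.mlp.gate.weight"),
    (r"layers\.(\d+)\.experts\.(\d+)\.w_in\.weight",
     r"model.layers.\1.mlp.experts.\2.up_proj.weight"),
    (r"final_norm\.weight", "model.norm.weight"),
    (r"head\.weight", "lm_head.weight"),
]


def export_key(name: str) -> str:
    """Translate one (hypothetical) Fast-LLM name to its HF-style name."""
    for pattern, template in EXPORT_RULES:
        match = re.fullmatch(pattern, name)
        if match:
            return match.expand(template)
    raise KeyError(f"no export rule for parameter {name!r}")


def export_state(state: dict) -> dict:
    """Rename every key; fail loudly if two parameters collide."""
    exported = {}
    for name, value in state.items():
        hf_name = export_key(name)
        if hf_name in exported:
            raise ValueError(f"duplicate HF name {hf_name!r}")
        exported[hf_name] = value
    return exported


def import_state(hf_state: dict, reference_names) -> dict:
    """Invert the export by re-deriving the mapping from known names."""
    inverse = {export_key(name): name for name in reference_names}
    return {inverse[hf_name]: value for hf_name, value in hf_state.items()}


def max_abs_diff(a: dict, b: dict) -> float:
    """Largest element-wise difference between two states of flat lists."""
    if a.keys() != b.keys():
        raise ValueError("states have different parameter names")
    worst = 0.0
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name!r}")
        for x, y in zip(a[name], b[name]):
            worst = max(worst, abs(x - y))
    return worst
```

With real checkpoints the same structure would apply to tensors, and a second check would compare logits from both implementations on the same input batch within a small tolerance.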

🔄 Alternatives Considered

It might be possible to export OLMoE-like models in HF Mixtral format.

📈 Potential Benefits

Allows for:

  • Continual pretraining of existing OLMoE checkpoints from the HF Hub.
  • Benchmarking and deployment of OLMoE-like models trained with Fast-LLM.

📝 Additional Context
