There are issues with the support for the Gemma 3 1B model #7427

@WhenMelancholy

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-6.8.0-55-generic-x86_64-with-glibc2.39
  • Python version: 3.12.9
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.50.0
  • Datasets version: 3.3.2
  • Accelerate version: 1.4.0
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • GPU number: 8
  • GPU memory: 79.25GB
  • DeepSpeed version: 0.16.4
  • vLLM version: 0.7.3
  • Git commit: 76cb0a9fc305ee42e26472d5afe219caffc36bd7

Reproduction

Referring to Issues #7385, #7325, #7320, and #7291, several users have reported the error Processor was not found, please check and update your processor config when training with Gemma 3. After inspecting the source code, I found the cause in this line: https://github.com/hiyouga/LLaMA-Factory/blob/165d3ed084b093accb2aa1d1209d1903183245e1/src/llamafactory/model/loader.py#L96. When loading the processor, Gemma 3 1B fails to find one because it is a pure text model. This then causes an error later during the process_messages call here: https://github.com/hiyouga/LLaMA-Factory/blob/ef5f1c1def3da62ee2d5e6ba933f9d7d6aab4340/src/llamafactory/data/mm_plugin.py#L401, since process_messages appears to be written with a multimodal architecture in mind. It might be necessary to add conditional handling for text-only Gemma 3 checkpoints to address this.
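As a rough illustration of the kind of conditional handling suggested above (a sketch only; the helper names `load_processor_or_none` and the simplified `process_messages` below are my own, not LLaMA-Factory's actual API), the loader could tolerate a missing processor config and the multimodal plugin could pass messages through unchanged when no processor exists:

```python
# Sketch of a possible guard for text-only checkpoints such as Gemma 3 1B.
# Names and signatures here are illustrative, not the actual patch.

def load_processor_or_none(model_path, processor_loader):
    """Return a multimodal processor, or None when the checkpoint
    has no processor config (pure text model)."""
    try:
        return processor_loader(model_path)
    except ValueError:
        # transformers-style loaders raise when no processor class
        # matches the model config; treat that as "text-only"
        return None

def process_messages(messages, processor=None):
    """Apply multimodal preprocessing only when a processor exists."""
    if processor is None:
        # Text-only model: skip image/video token handling entirely
        return messages
    return processor(messages)
```

With this shape, the mm_plugin code path is only entered for genuinely multimodal models, and text-only Gemma 3 variants train without hitting the "Processor was not found" error.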

Others

No response

Metadata

Assignees

No one assigned

    Labels

solved: This problem has been already solved
