
Qwen3 training config #3

@zhuhanqing


Hi author,

Could you share your training config for the Qwen3-Base series? We also have a question: in the DAPO setting without a KL penalty, we found that the Qwen3-Base model easily collapses after 200 training steps. Did you run into similar training instability when doing off-policy RL? Thank you!
