Problem with PPO training #6881

@Lxhnnn

Reminder

  • I have read the above rules and searched the existing issues.

System Info

Latest version

Reproduction

llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path /data/lxh/pretrained/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset alpaca_zh_demo \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 20.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-10-14-41-02 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-09-13-18-00 \
    --reward_model_type lora \
    --top_k 0 \
    --top_p 0.9
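
For anyone reading the batch-related flags above, here is a minimal sketch of how many samples feed one optimizer step. It assumes the common convention that this equals per-device batch size times gradient accumulation steps times device count. The report does not state the device count, so `num_devices` is a hypothetical parameter and defaults to 1. The function name is illustrative and is not part of LLaMA-Factory.

```python
def samples_per_optimizer_step(
    per_device_batch: int,
    grad_accum_steps: int,
    num_devices: int = 1,  # assumption: not given in the report
) -> int:
    """Samples consumed before one optimizer update, under the usual convention."""
    return per_device_batch * grad_accum_steps * num_devices


# Values taken from the command above:
# --per_device_train_batch_size 2, --gradient_accumulation_steps 8
print(samples_per_optimizer_step(2, 8))  # 16 per device
```

On a multi-GPU run the total scales with the device count, which is worth keeping in mind when comparing PPO logs across setups.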

Others

No response

Labels: solved (This problem has already been solved)
