Reminder
System Info
Latest version
Reproduction
```bash
llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path /data/lxh/pretrained/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset alpaca_zh_demo \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 20.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-10-14-41-02 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-09-13-18-00 \
    --reward_model_type lora \
    --top_k 0 \
    --top_p 0.9
```
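For easier reproduction, the same run can also be expressed as a YAML config passed to `llamafactory-cli train`. The sketch below simply mirrors the flags above (all values and paths are copied verbatim from this report; the field names assume LLaMA-Factory's standard argument names, which match the CLI flags without the `--` prefix):

```yaml
### model
model_name_or_path: /data/lxh/pretrained/Qwen2.5-1.5B-Instruct
trust_remote_code: true

### method (PPO on top of a LoRA reward model)
stage: ppo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target: all
reward_model: saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-09-13-18-00
reward_model_type: lora

### dataset
dataset: alpaca_zh_demo
dataset_dir: data
template: qwen
cutoff_len: 2048
max_samples: 100000
preprocessing_num_workers: 16

### output
output_dir: saves/Qwen2.5-1.5B-Instruct/lora/train_2025-02-10-14-41-02
logging_steps: 5
save_steps: 100
plot_loss: true
report_to: none

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-5
num_train_epochs: 20.0
lr_scheduler_type: cosine
max_grad_norm: 1.0
warmup_steps: 0
bf16: true
optim: adamw_torch

### generate (sampling settings for PPO rollouts)
top_k: 0
top_p: 0.9
```

Saved as, e.g., `qwen_ppo_lora.yaml` (filename is illustrative), it would be launched with `llamafactory-cli train qwen_ppo_lora.yaml`.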
Others
No response