
After SFT or DPO, the GGUF exported from the merged LoRA model performs poorly on Ollama #6020


Description

@NeilL0412

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.11.0
  • PyTorch version: 2.4.1+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: GRID A100X-40C

Reproduction

The model used is Qwen/Qwen2.5-1.5B-Instruct.

  • SFT
    I ran 6 rounds of SFT fine-tuning, 30 epochs each. The merged LoRA model answers reasonably well, but after converting to GGUF the answer quality is much worse.
    Below is the merged LoRA model:
    [screenshot: 微信图片_20241113232052]

Below is the exported GGUF running on Ollama:
[screenshot: 微信图片_20241113231955]
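
For context, the GGUF was presumably produced and loaded roughly like this (a sketch of the assumed workflow, not shown in the issue; the converter path, output filename, and model tag are placeholders):

# convert the merged HF model with llama.cpp's converter (f16, no quantization)
python llama.cpp/convert_hf_to_gguf.py mymodel/qwen2.5/qwen2.5-1.5b/FT6/lora \
    --outfile qwen2.5-1.5b-ft6.gguf --outtype f16

# Modelfile: the TEMPLATE must reproduce Qwen2.5's ChatML format exactly,
# otherwise Ollama feeds the model a prompt format it was never trained on
FROM ./qwen2.5-1.5b-ft6.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
PARAMETER stop <|im_end|>

# register the Modelfile and run
ollama create qwen2.5-ft6 -f Modelfile
ollama run qwen2.5-ft6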

The fine-tuning parameters were roughly as follows:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset it_qa \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 30.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 20 \
    --optim adamw_torch \
    --packing False \
    --mask_history True \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2024-11-13-23-25-43 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT5 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all

The LoRA merge parameters:

### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
adapter_name_or_path: /app/neil/llm/LLaMA-Factory/saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT6
template: qwen
finetuning_type: lora

### export
export_dir: mymodel/qwen2.5/qwen2.5-1.5b/FT6/lora
export_size: 2
export_device: gpu
export_legacy_format: false
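
This config is applied with the export subcommand (assuming the YAML above is saved as merge_lora.yaml; the filename is illustrative):

# merge the FT6 adapter into the base weights and write the full model to export_dir
llamafactory-cli export merge_lora.yaml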

  • DPO
    After the first 6 rounds of SFT fine-tuning, the 7th round was DPO reinforcement learning.
    Below are the merged LoRA model's answers:
    [screenshot: 微信图片_20241113232052]

Below is the GGUF running on Ollama; its answers always come out in this multiple-choice Q&A format:
[screenshot: 微信图片_20241113235131]
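
One way to narrow this down might be to run the same GGUF directly with llama.cpp (a hedged check; the filename matches the placeholder above). Conversation mode applies the chat template embedded in the GGUF metadata, so if answers look fine here but degrade under Ollama, the Modelfile TEMPLATE is the likely mismatch:

# -cnv uses the GGUF's built-in chat template, bypassing Ollama's Modelfile
./llama.cpp/llama-cli -m qwen2.5-1.5b-ft6.gguf -cnv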

The DPO training parameters were roughly as follows (the command below is the training run; the merge config was similar to the SFT one):

llamafactory-cli train \
    --stage dpo \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen \
    --flash_attn auto \
    --dataset_dir data \
    --dataset it_qa_dpo \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 30.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 20 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-1.5B-Instruct/lora/train_2024-11-13-23-25-43 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/Qwen2.5-1.5B-Instruct/lora/Qwen2.5-1.5b-FT6 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_ftx 0 \
    --pref_loss sigmoid

The export parameters are almost the same as for SFT, so I won't paste them here.
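
Before converting, the merged DPO model can be sanity-checked with LLaMA-Factory's own chat CLI (the path below is hypothetical, since the DPO export_dir is not shown); if answers are already bad here, the problem is the training rather than the GGUF step:

llamafactory-cli chat \
    --model_name_or_path mymodel/qwen2.5/qwen2.5-1.5b/FT7/dpo \
    --template qwen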

Could someone please take a look at what might be going on here, and explain the possible causes of this problem and the ways to fix it? Thanks.

Expected behavior

No response

Others

No response


Labels: good first issue, solved
