
engine_args.update(model_args.vllm_config) TypeError: 'NoneType' object is not iterable (model_args has no vllm_config field) #5988

@sunbeibei-hub

Description


Reminder

  • I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.1.dev0
Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
Python version: 3.10.15
PyTorch version: 2.4.0+cu121 (GPU)
Transformers version: 4.45.2
Datasets version: 3.1.0
Accelerate version: 1.0.1
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: NVIDIA A800-SXM4-80GB
vLLM version: dev

Reproduction

CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml

YAML file contents:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8

The error output is as follows:

[INFO|2024-11-11 10:50:51] llamafactory.data.template:157 >> Replace eos token: <|im_end|>

Traceback (most recent call last):
  File "/root/miniconda/envs/bei_llamaFactory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/bei/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 158, in run_chat
    chat_model = ChatModel()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 55, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 88, in __init__
    engine_args.update(model_args.vllm_config)
TypeError: 'NoneType' object is not iterable

The contents of model_args are as follows:

ModelArguments(vllm_maxlen=4096, vllm_gpu_util=0.8, vllm_enforce_eager=True, vllm_max_lora_rank=32, vllm_config=None, export_dir=None, export_size=1, export_device='cpu', export_quantization_bit=None, export_quantization_dataset=None, export_quantization_nsamples=128, export_quantization_maxlen=1024, export_legacy_format=False, export_hub_model_id=None, image_resolution=512, video_resolution=128, video_fps=2.0, video_maxlen=64, quantization_method='bitsandbytes', quantization_bit=None, quantization_type='nf4', double_quantization=True, quantization_device_map=None, model_name_or_path='/root/bei/Models/qwen/Qwen2-0___5B-Instruct/', adapter_name_or_path=None, adapter_folder=None, cache_dir=None, use_fast_tokenizer=True, resize_vocab=False, split_special_tokens=False, new_special_tokens=None, model_revision='main', low_cpu_mem_usage=True, rope_scaling=None, flash_attn='auto', shift_attn=False, mixture_of_depths=None, use_unsloth=False, use_unsloth_gc=False, enable_liger_kernel=False, moe_aux_loss_coef=None, disable_gradient_checkpointing=False, upcast_layernorm=False, upcast_lmhead_output=False, train_from_scratch=False, infer_backend='vllm', offload_folder='offload', use_cache=True, infer_dtype='auto', hf_hub_token=None, ms_hub_token=None, om_hub_token=None, print_param_status=False, compute_dtype=None, device_map='auto', model_max_length=None, block_diag_attn=False)
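The dump above shows `vllm_config=None`, and `dict.update(None)` always raises exactly this `TypeError`, since `update` tries to iterate its argument. A minimal reproduction, plus a defensive sketch of the call site (a hypothetical guard, not necessarily the project's actual patch):

```python
# Reproduce the crash: dict.update(None) raises TypeError,
# mirroring engine_args.update(model_args.vllm_config) with vllm_config=None.
engine_args = {"model": "/root/bei/Models/qwen/Qwen2-0___5B-Instruct/"}

try:
    engine_args.update(None)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable

# Hypothetical fix sketch: only merge vllm_config when it is actually a dict.
vllm_config = None  # stands in for model_args.vllm_config
if isinstance(vllm_config, dict):
    engine_args.update(vllm_config)
```

With the guard in place, `engine_args` is left unchanged when `vllm_config` is unset, and the engine can be constructed from the remaining arguments.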

Expected behavior

Multi-GPU inference with vLLM should work normally.

Others

Labels

solved: This problem has been already solved