[INFO|2024-11-11 10:50:51] llamafactory.data.template:157 >> Replace eos token: <|im_end|>
Traceback (most recent call last):
  File "/root/miniconda/envs/bei_llamaFactory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/bei/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 158, in run_chat
    chat_model = ChatModel()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 55, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 88, in __init__
    engine_args.update(model_args.vllm_config)
TypeError: 'NoneType' object is not iterable
ModelArguments(vllm_maxlen=4096, vllm_gpu_util=0.8, vllm_enforce_eager=True, vllm_max_lora_rank=32, vllm_config=None, export_dir=None, export_size=1, export_device='cpu', export_quantization_bit=None, export_quantization_dataset=None, export_quantization_nsamples=128, export_quantization_maxlen=1024, export_legacy_format=False, export_hub_model_id=None, image_resolution=512, video_resolution=128, video_fps=2.0, video_maxlen=64, quantization_method='bitsandbytes', quantization_bit=None, quantization_type='nf4', double_quantization=True, quantization_device_map=None, model_name_or_path='/root/bei/Models/qwen/Qwen2-0___5B-Instruct/', adapter_name_or_path=None, adapter_folder=None, cache_dir=None, use_fast_tokenizer=True, resize_vocab=False, split_special_tokens=False, new_special_tokens=None, model_revision='main', low_cpu_mem_usage=True, rope_scaling=None, flash_attn='auto', shift_attn=False, mixture_of_depths=None, use_unsloth=False, use_unsloth_gc=False, enable_liger_kernel=False, moe_aux_loss_coef=None, disable_gradient_checkpointing=False, upcast_layernorm=False, upcast_lmhead_output=False, train_from_scratch=False, infer_backend='vllm', offload_folder='offload', use_cache=True, infer_dtype='auto', hf_hub_token=None, ms_hub_token=None, om_hub_token=None, print_param_status=False, compute_dtype=None, device_map='auto', model_max_length=None, block_diag_attn=False)
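The dump above shows vllm_config=None, and the traceback ends in dict.update(None). A minimal sketch of that failure mode and a possible guard (illustrative names only, not the actual LLaMA-Factory code):

```python
# Reproduce the crash: dict.update(None) raises TypeError, because update()
# tries to iterate its argument and None is not iterable.
engine_args = {"model": "/root/bei/Models/qwen/Qwen2-0___5B-Instruct/"}
vllm_config = None  # model_args.vllm_config is None when the yaml omits it

try:
    engine_args.update(vllm_config)
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable

# A guard like this would skip the update when the option is unset:
if isinstance(vllm_config, dict):
    engine_args.update(vllm_config)
```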
Reminder
System Info
llamafactory version: 0.9.1.dev0
Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
Python version: 3.10.15
PyTorch version: 2.4.0+cu121 (GPU)
Transformers version: 4.45.2
Datasets version: 3.1.0
Accelerate version: 1.0.1
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: NVIDIA A800-SXM4-80GB
vLLM version: dev
Reproduction
CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml
Contents of the yaml file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
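One possible workaround (an untested assumption about this LLaMA-Factory version, not a confirmed fix) is to set vllm_config explicitly in the yaml so that the update() call receives a mapping instead of None:

```yaml
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
vllm_config: {}  # hypothetical workaround: an empty mapping instead of the None default
```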
The error message is pasted at the top of this issue.
The model_args contents are also shown above.
Expected behavior
vLLM multi-GPU inference should work normally.
Others
None