
vLLM 0.6.3 raises TypeError: Unexpected keyword argument 'use_beam_search' #5966

@sunbeibei-hub

Description


Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.4.0+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800-SXM4-80GB
  • vLLM version: dev
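  (Note: the version string above reads "dev" while the title says 0.6.3. A quick, generic way to confirm the installed vLLM version, not specific to this project:)

```python
import vllm

# Print the installed vLLM version; per the issue title this should be 0.6.3.
print(vllm.__version__)
```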

Reproduction

CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml

YAML file contents:

```yaml
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
```
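The failure can be reproduced outside LLaMA-Factory. A minimal sketch, assuming vllm==0.6.3 is installed: beam search appears to have been removed from `SamplingParams` in that release, so passing the old keyword raises the same `TypeError` shown in the log below.

```python
# Minimal standalone reproduction (assumes vllm==0.6.3).
from vllm import SamplingParams

# In vLLM 0.6.3 the SamplingParams constructor no longer accepts this field,
# so it rejects the keyword instead of silently ignoring it:
params = SamplingParams(temperature=0.95, top_p=0.7, use_beam_search=False)
# TypeError: Unexpected keyword argument 'use_beam_search'
```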

The error message is as follows:

```
Welcome to the CLI application, use clear to remove the history, use exit to exit the application.

User: 你好
Assistant: [rank0]: Traceback (most recent call last):
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in <module>
[rank0]:     main()
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]:     run_chat()
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]:     for new_text in chat_model.stream_chat(messages):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]:     yield task.result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]:     return self.__get_result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]:     raise self._exception
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]:     async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]:     generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]:     sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
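For reference, one possible workaround is to gate the keyword on the installed vLLM version at the `SamplingParams` call site. This is a hypothetical sketch, not the project's actual fix; the surrounding variable names and example values are assumptions.

```python
# Hypothetical version guard around the SamplingParams call in
# src/llamafactory/chat/vllm_engine.py (variable names are assumptions).
import vllm
from packaging import version
from vllm import SamplingParams

sampling_kwargs = {
    "temperature": 0.95,  # example values, not the engine's real defaults
    "top_p": 0.7,
    "max_tokens": 512,
}
if version.parse(vllm.__version__) < version.parse("0.6.3"):
    # Older vLLM still accepts the flag; 0.6.3 dropped it from SamplingParams.
    sampling_kwargs["use_beam_search"] = False

sampling_params = SamplingParams(**sampling_kwargs)
```

Alternatively, pinning vLLM below 0.6.3 (e.g. vllm==0.6.2) or updating LLaMA-Factory to a release that no longer passes `use_beam_search` should avoid the error.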

Expected behavior

Inference should complete normally.

Others

No response

Labels: solved (This problem has been already solved)