Reminder
System Info
- llamafactory version: 0.9.1.dev0
- Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.4.0+cu121 (GPU)
- Transformers version: 4.45.2
- Datasets version: 3.1.0
- Accelerate version: 1.0.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A800-SXM4-80GB
- vLLM version: dev
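The vLLM version is reported only as `dev`, so it may help to record the exact installed build; a quick check using vLLM's standard version metadata (nothing project-specific):

```python
import vllm

# Print the exact installed vLLM build, since "dev" alone does not say
# whether it postdates the removal of use_beam_search from SamplingParams.
print(vllm.__version__)
```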
Reproduction
CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml
Contents of the YAML file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
The error output is as follows:
Welcome to the CLI application, use clear to remove the history, use exit to exit the application.
User: 你好
Assistant: [rank0]: Traceback (most recent call last):
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in
[rank0]: main()
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]: run_chat()
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]: for new_text in chat_model.stream_chat(messages):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]: yield task.result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]: return self.__get_result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]: raise self._exception
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]: async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]: generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]: sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
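The `TypeError` is raised where `vllm_engine.py` constructs `SamplingParams` with a `use_beam_search` argument; newer vLLM builds (including dev builds) removed beam search from `SamplingParams`, so the keyword is rejected. A minimal sketch of a compatibility guard at that call site, with illustrative kwarg values (in the real code these are assembled from `input_kwargs`):

```python
from vllm import SamplingParams

# Illustrative sampling settings; in vllm_engine.py these come from the
# generating arguments and input_kwargs.
kwargs = dict(temperature=0.7, top_p=0.9, max_tokens=512)

try:
    # Older vLLM releases accept use_beam_search on SamplingParams.
    sampling_params = SamplingParams(use_beam_search=False, **kwargs)
except TypeError:
    # Newer vLLM removed beam search from SamplingParams; drop the kwarg.
    sampling_params = SamplingParams(**kwargs)
```

Pinning vLLM to a release that this LLaMA-Factory version declares support for, rather than running against a dev build, should also avoid the mismatch.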
Expected behavior
Inference should complete normally.
Others
No response