Reminder
System Info
- llamafactory version: 0.9.1.dev0
- Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.4.0+cu121 (GPU)
- Transformers version: 4.45.2
- Datasets version: 3.1.0
- Accelerate version: 1.0.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A800-SXM4-80GB
- vLLM version: dev
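The vLLM version is reported only as `dev`, so it may help to record the exact installed build; a quick check using vLLM's standard version metadata (nothing project-specific):

```python
import vllm

# Print the exact installed vLLM build, since "dev" alone does not say
# whether it postdates the removal of use_beam_search from SamplingParams.
print(vllm.__version__)
```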
Reproduction
CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml
Contents of the YAML file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
The error output is as follows:
Welcome to the CLI application, use clear to remove the history, use exit to exit the application.
User: 你好
Assistant: [rank0]: Traceback (most recent call last):
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in
[rank0]: main()
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]: run_chat()
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]: for new_text in chat_model.stream_chat(messages):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]: yield task.result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]: return self.__get_result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]: raise self._exception
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]: async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]: generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]: sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
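The `TypeError` is raised where `vllm_engine.py` constructs `SamplingParams` with a `use_beam_search` argument; newer vLLM builds (including dev builds) removed beam search from `SamplingParams`, so the keyword is rejected. A minimal sketch of a compatibility guard at that call site, with illustrative kwarg values (in the real code these are assembled from `input_kwargs`):

```python
from vllm import SamplingParams

# Illustrative sampling settings; in vllm_engine.py these come from the
# generating arguments and input_kwargs.
kwargs = dict(temperature=0.7, top_p=0.9, max_tokens=512)

try:
    # Older vLLM releases accept use_beam_search on SamplingParams.
    sampling_params = SamplingParams(use_beam_search=False, **kwargs)
except TypeError:
    # Newer vLLM removed beam search from SamplingParams; drop the kwarg.
    sampling_params = SamplingParams(**kwargs)
```

Pinning vLLM to a release that this LLaMA-Factory version declares support for, rather than running against a dev build, should also avoid the mismatch.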
Expected behavior
Inference should complete normally.
Others
No response