Error when running vLLM inference on a V100

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

Running
`CUDA_VISIBLE_DEVICES=0 llamafactory-cli api LLaMA-Factory/examples/inference/qwen_vllm.yaml`
fails with the error
You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half
I found an existing issue reporting the same error, but I am on the latest code and it still fails.
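
For context, the message suggests the engine is being asked for a dtype the V100 cannot run. Below is a minimal sketch of the dtype choice I would expect, assuming bfloat16 requires compute capability 8.0 or higher (the V100 is 7.0). The helper name `pick_vllm_dtype` is hypothetical and only illustrates the check; it is not part of LLaMA-Factory or vLLM.

```python
def pick_vllm_dtype(compute_capability):
    """Return a dtype string for a GPU with the given (major, minor) capability.

    Assumption: bfloat16 needs compute capability 8.0 or higher (Ampere and
    later); older GPUs such as the V100 (7.0) fall back to float16 ("half").
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "half"


# A V100 reports compute capability (7, 0). With PyTorch installed, the value
# for the current GPU can be read with torch.cuda.get_device_capability().
print(pick_vllm_dtype((7, 0)))  # prints "half"
```

The returned string matches the value named in the error message (`--dtype=half`).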

### Expected behavior

_No response_

### System Info

_No response_

### Others

_No response_


Error when running vLLM inference on a V100 #3717
