Reminder
Reproduction
Run:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli api LLaMA-Factory/examples/inference/qwen_vllm.yaml
Error:
You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half
I saw an existing report of the same error, but I pulled the latest code and it still fails.
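The quoted message is only the tail of the error, so the cause is not confirmed here. A minimal sketch, assuming it comes from a bfloat16 check that requires GPU compute capability 8.0 or newer: the helper below (`pick_dtype` is a hypothetical name, not part of LLaMA-Factory or vLLM) shows which dtype such a check would allow for a given capability.

```python
# Hypothetical helper for illustration; not part of LLaMA-Factory or vLLM.
def pick_dtype(major: int, minor: int) -> str:
    """Return a dtype string for a GPU with the given compute capability."""
    # Assumption: bfloat16 needs compute capability 8.0 or newer;
    # older GPUs fall back to float16 ("half").
    return "bfloat16" if (major, minor) >= (8, 0) else "half"


if __name__ == "__main__":
    # Example values only. On a real machine the pair could come from
    # torch.cuda.get_device_capability(0).
    print(pick_dtype(7, 5))
    print(pick_dtype(8, 0))
```

How the chosen value reaches the inference backend depends on the configuration in use; the error text itself only mentions the --dtype=half CLI flag.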
Expected behavior
No response
System Info
No response
Others
No response