CUDA out of memory when loading qwen2-72B-Instruct-AWQ with llamafactory-cli api #4326

@ToviHe

Reminder

  • I have read the README and searched the existing issues.

System Info

image

This is running in an intranet environment, so for now I can only share screenshots.

Reproduction

image

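I checked the GPU memory figures published on the official Qwen2 site, and the model does need more than 40 GB. Does the AWQ-quantized version not support multi-GPU inference?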
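For context on why a single 40 GB card is not enough, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: the 72e9 parameter count is taken from the "72B" in the model name, 4 bits per weight is the usual AWQ setting, and the helper name `weight_memory_gib` is made up for illustration. Quantization scales, unquantized layers, the KV cache, activations, and the CUDA context all come on top of this figure.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumption: nominal parameter count implied by the "72B" model name.
params = 72e9

fp16_gib = weight_memory_gib(params, 16)  # unquantized half-precision weights
awq4_gib = weight_memory_gib(params, 4)   # 4-bit quantized weights (assumed AWQ setting)

print(f"fp16 weights : {fp16_gib:.1f} GiB")
print(f"4-bit weights: {awq4_gib:.1f} GiB")
```

Even under these optimistic assumptions the 4-bit weights already take most of one 40 GB card, which is consistent with the official figure of more than 40 GB once runtime overhead is included.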
image

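The machine has four A100 40G GPUs, but the container was started with only two of them. Judging from the current run, only one of those two is used (the program aborts because it runs out of memory), and the other GPU is never touched at all.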

image

Expected behavior

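I would like a parameter that lets the AWQ-quantized model, or the regular model, run inference successfully. I have gone through the issues and the documentation and could not find how to specify multi-GPU inference.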
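As a point of comparison outside LLaMA-Factory, the sketch below shows how the Hugging Face `transformers` loader can be asked to spread a model across several GPUs through its `device_map="auto"` and `max_memory` arguments. This is not confirmed to be what `llamafactory-cli api` does internally or how this issue was resolved. The helper names `build_load_kwargs` and `load_sharded`, the 4 GiB headroom, and the assumption that the AWQ runtime dependencies are installed are all illustrative.

```python
from typing import Any, Dict


def build_load_kwargs(num_gpus: int, gpu_total_gib: int, headroom_gib: int = 4) -> Dict[str, Any]:
    """Build keyword arguments that let the loader place layers on several GPUs.

    Each GPU gets a budget of (total - headroom) GiB, leaving room for
    activations and the KV cache. The headroom value is an illustrative guess.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if headroom_gib >= gpu_total_gib:
        raise ValueError("headroom_gib must be smaller than gpu_total_gib")
    budget = f"{gpu_total_gib - headroom_gib}GiB"
    return {
        "device_map": "auto",  # let the loader assign layers to devices
        "max_memory": {gpu: budget for gpu in range(num_gpus)},
    }


def load_sharded(model_path: str, num_gpus: int = 2, gpu_total_gib: int = 40):
    """Load a causal LM spread over the visible GPUs (hypothetical helper)."""
    # Imported lazily so the kwargs helper above works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        **build_load_kwargs(num_gpus, gpu_total_gib),
    )
```

For the two-GPU container described above, `build_load_kwargs(2, 40)` gives each card a 36 GiB budget; whether the model then actually fits depends on context length and batch size.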

Others

No response

Labels

solved: This problem has been already solved
