CANN: Multi-Stream Support #23

cerrarlosojos · 2025-12-28T15:29:18Z

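Description

This PR adds multi-stream parallel execution and compute-graph optimization to the CANN backend. The main changes are:

1. Multi-stream parallel execution

Adds the GGML_CANN_MULTI_STREAM environment variable, which enables multi-stream parallel execution mode.
Adds the GGML_CANN_NUM_STREAMS environment variable, which sets the number of parallel streams (default 4, maximum GGML_CANN_MAX_STREAMS).
Implements dependency-based stream scheduling: nodes without dependencies are assigned to streams round-robin, while nodes with dependencies reuse the stream of a source node where possible.
Uses the ACL Event mechanism for cross-stream synchronization so that data dependencies are honored.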
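
The scheduling rule described above can be modeled in a few lines. This is only a sketch of the idea, not the code in this PR: the `Node` class, the `assign_streams` function, and the choice to reuse the stream of the first source are all assumptions made for illustration. Each `(consumer, producer)` pair it returns marks a place where a real backend would record an event on the producer's stream and make the consumer's stream wait on it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; not a ggml type.
    name: str
    srcs: list = field(default_factory=list)  # names of producer nodes


def assign_streams(nodes, num_streams=4):
    """Assign each node (given in topological order) to a stream index.

    Returns (stream_of, waits), where waits lists the (consumer, producer)
    pairs that live on different streams and therefore need an event.
    """
    stream_of = {}
    waits = []
    next_rr = 0  # round-robin cursor for nodes without dependencies
    for node in nodes:
        deps = [s for s in node.srcs if s in stream_of]
        if deps:
            # Assumption: reuse the stream of the first source node.
            stream = stream_of[deps[0]]
        else:
            stream = next_rr % num_streams
            next_rr += 1
        stream_of[node.name] = stream
        for s in deps:
            if stream_of[s] != stream:
                # Cross-stream dependency: needs event record + wait.
                waits.append((node.name, s))
    return stream_of, waits


graph = [
    Node("a"),
    Node("b"),
    Node("c", ["a", "b"]),
    Node("d", ["c"]),
]
stream_of, waits = assign_streams(graph)
print(stream_of)
print(waits)
```

In this toy graph `a` and `b` land on different streams, `c` follows `a`, and only the `c`-on-`b` dependency crosses streams, so a single event is enough.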

2. Compute-graph node reordering

Following the PR "vulkan: sort graph to allow more parallel execution" (ggml-org/llama.cpp#15850), adds a ggml_cann_optimize_graph function that reorders the compute graph nodes to increase parallelism.
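
To illustrate why reordering helps, here is a minimal sketch that groups nodes by dependency depth so that independent nodes sit next to each other. It is a plain level-based topological sort chosen for brevity; it is not claimed to be what ggml_cann_optimize_graph or the referenced Vulkan PR does, and `reorder_by_level` and the node names are hypothetical.

```python
def reorder_by_level(nodes):
    """Reorder (name, [source names]) pairs given in topological order.

    Nodes with the same dependency depth become adjacent, which gives a
    multi-stream scheduler consecutive independent nodes to spread out.
    """
    depth = {}
    for name, srcs in nodes:
        # Sources that are not graph nodes (e.g. weights) do not add depth.
        depth[name] = 1 + max((depth[s] for s in srcs if s in depth), default=-1)
    # sorted() is stable, so the original order is kept inside each level,
    # and every node still comes after all of its sources.
    return [name for name, _ in sorted(nodes, key=lambda n: depth[n[0]])]


graph = [
    ("q", []),
    ("q_rope", ["q"]),
    ("k", []),
    ("k_rope", ["k"]),
    ("attn", ["q_rope", "k_rope"]),
]
order = reorder_by_level(graph)
print(order)
```

In the original order each independent node is immediately followed by its consumer; after reordering, `q` and `k` are adjacent, as are `q_rope` and `k_rope`, so each pair can be issued to different streams back to back.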

Testing

Performance was measured with the Llama-3.1-8B-Instruct-Q4_K_M model on an Ascend 910B3. Each configuration was run 10 times and the results were averaged:

Configuration	Prompt Eval Speed	Eval Speed
Multi-stream (4 streams)	6.14 tokens/s	0.455 tokens/s
Single stream (original mode)	4.28 tokens/s	0.246 tokens/s
Improvement	+43%	+85%
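
For reference, the improvement row follows from the two throughput rows as (multi-stream / single-stream - 1) × 100. The helper name below is an assumption; the numbers are the ones reported in the table.

```python
def improvement_percent(baseline, optimized):
    """Relative throughput gain of optimized over baseline, in percent."""
    return (optimized / baseline - 1.0) * 100.0


# Values taken from the table above (tokens/s).
prompt_gain = improvement_percent(4.28, 6.14)
eval_gain = improvement_percent(0.246, 0.455)
print(f"prompt eval: +{prompt_gain:.0f}%, eval: +{eval_gain:.0f}%")
```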

Example command

# Enable multi-stream mode (4 streams by default)
GGML_CANN_MULTI_STREAM=1 ./build/bin/llama-cli -m models/Llama-3.1-8B-Instruct-Q4_K_M.gguf -p "Building a website can be done in 10 steps:" -ngl 32

noemotiovon · 2026-01-07T08:58:55Z

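This looks like great work! Could you contribute the code to the upstream community?
Also, your test used Llama-3.1-8B-Instruct-Q4_K_M, and that quantization type is not yet supported on the NPU. You could test with a model such as Qwen2.5-0.5B-FP16 or Qwen2.5-7B-FP16 instead.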

hipudding · 2026-01-07T09:08:15Z

Please use the fp16 data format for the performance comparison.

CeoxNim2000 · 2026-01-12T17:37:27Z

I'm a member of the team. Here are additional test results for the Qwen2.5-0.5b-instruct-fp16 model.

Qwen2.5-0.5b-instruct-fp16 performance test (Ascend 910B3, single run)

Configuration	Prompt Eval Speed (tokens/s)	Eval Speed (tokens/s)
Single stream (original mode)	94.78	5.22
Multi-stream (4 streams)	517.38	13.27
Improvement	+445.8%	+154.2%

noemotiovon · 2026-01-13T06:56:25Z

I just tried this branch. Without setting any environment variables, I ran:

./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd -p "Building a website can be done in 10 steps:" -ngl 99

The output shows an accuracy problem:

feat: multi stream support

7ac470b
