Added custom low_latency operators for dispatch/combine in the A2 dec… #166

Merged

Yael-X merged 9 commits into sgl-project:main from oagniqgnat:a2_low_latency on Nov 6, 2025


Conversation


@oagniqgnat oagniqgnat commented Nov 6, 2025

Modifications:

  1. A2 and A3 are now packaged separately. To build the A2 package, run `bash build.sh -a deepep2`.
  2. If `HCCL_INTRA_PCIE_ENABLE=1` and `HCCL_INTRA_ROCE_ENABLE=0` are configured on A2, the hierarchical implementation of dispatch_low_latency/combine_low_latency is executed; otherwise, the non-hierarchical implementation is used.
  3. The A2 hierarchical dispatch_low_latency implementation takes an additional parameter, `topk_weights`, and returns an extra 1-D tensor `expand_scales` of shape (A,). `expand_scales` then replaces `topk_weights` as the weight parameter for the internal kernel in low_latency_combine.
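To illustrate the call-flow change in point 3, here is a minimal sketch of how the extra `expand_scales` output threads through to combine. All function signatures here are illustrative stand-ins, not the actual DeepEP kernel API; the real operators run on-device and route tokens across ranks.

```python
# Hypothetical sketch of the A2 hierarchical low-latency flow: dispatch now
# takes topk_weights and returns expand_scales, which combine consumes in
# place of topk_weights. Names and shapes are assumptions for illustration.

def dispatch_low_latency(hidden_states, topk_idx, topk_weights):
    """Mock hierarchical dispatch: alongside the routed states, it returns
    a 1-D expand_scales of shape (A,), one scale per dispatched token."""
    # The real kernel computes these scales internally; here we simply
    # echo the weights to model the extra (A,)-shaped output.
    expand_scales = [float(w) for w in topk_weights]
    recv_states = list(hidden_states)  # placeholder for actual token routing
    return recv_states, expand_scales

def combine_low_latency(recv_states, expand_scales):
    """Mock combine: expand_scales replaces topk_weights as the weight
    input of the internal reduction kernel."""
    return [x * s for x, s in zip(recv_states, expand_scales)]

# Usage: the scales produced by dispatch are fed straight into combine.
states, scales = dispatch_low_latency(
    [1.0, 2.0], topk_idx=[0, 1], topk_weights=[0.5, 0.25])
out = combine_low_latency(states, scales)
```

The key point is that callers on A2 no longer pass `topk_weights` to combine directly; they forward the `expand_scales` tensor that dispatch returned.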

@oagniqgnat force-pushed the a2_low_latency branch 9 times, most recently from 4eeb5bc to 1aafad8 on November 6, 2025 at 03:07
@Yael-X Yael-X merged commit 15801a8 into sgl-project:main Nov 6, 2025
5 of 7 checks passed