Added custom low_latency operators for dispatch/combine in the A2 dec… #166

Merged

Yael-X merged 9 commits into sgl-project:main from oagniqgnat:a2_low_latency on Nov 6, 2025


Conversation


@oagniqgnat oagniqgnat commented Nov 6, 2025

Modifications:

  1. A2 and A3 are now packaged separately. To build the A2 package, run `bash build.sh -a deepep2`.
  2. If `HCCL_INTRA_PCIE_ENABLE=1` and `HCCL_INTRA_ROCE_ENABLE=0` are configured on A2, the hierarchical implementation of dispatch_low_latency/combine_low_latency is executed; otherwise, the non-hierarchical implementation is used.
  3. The A2 hierarchical dispatch_low_latency implementation takes an additional parameter, `topk_weights`, and returns an extra 1-D tensor `expand_scales` of shape (A,). `expand_scales` then replaces `topk_weights` as the weight parameter for the internal kernel in low_latency_combine.
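To illustrate the call-flow change in point 3, here is a minimal sketch of how the extra `expand_scales` output threads through to combine. All function signatures here are illustrative stand-ins, not the actual DeepEP kernel API; the real operators run on-device and route tokens across ranks.

```python
# Hypothetical sketch of the A2 hierarchical low-latency flow: dispatch now
# takes topk_weights and returns expand_scales, which combine consumes in
# place of topk_weights. Names and shapes are assumptions for illustration.

def dispatch_low_latency(hidden_states, topk_idx, topk_weights):
    """Mock hierarchical dispatch: alongside the routed states, it returns
    a 1-D expand_scales of shape (A,), one scale per dispatched token."""
    # The real kernel computes these scales internally; here we simply
    # echo the weights to model the extra (A,)-shaped output.
    expand_scales = [float(w) for w in topk_weights]
    recv_states = list(hidden_states)  # placeholder for actual token routing
    return recv_states, expand_scales

def combine_low_latency(recv_states, expand_scales):
    """Mock combine: expand_scales replaces topk_weights as the weight
    input of the internal reduction kernel."""
    return [x * s for x, s in zip(recv_states, expand_scales)]

# Usage: the scales produced by dispatch are fed straight into combine.
states, scales = dispatch_low_latency(
    [1.0, 2.0], topk_idx=[0, 1], topk_weights=[0.5, 0.25])
out = combine_low_latency(states, scales)
```

The key point is that callers on A2 no longer pass `topk_weights` to combine directly; they forward the `expand_scales` tensor that dispatch returned.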

@oagniqgnat force-pushed the a2_low_latency branch 9 times, most recently from 4eeb5bc to 1aafad8 on November 6, 2025 at 03:07
@Yael-X Yael-X merged commit 15801a8 into sgl-project:main Nov 6, 2025
5 of 7 checks passed