Releases: sgl-project/sgl-kernel-npu
20251206
What's Changed
- add catlass ops demo by @ltcs11 in #200
- Add release package dependencies by @monkeyLoveding in #215
- bugfix: The rint interface causes UB to be unaligned by @chenxu140 in #208
- add pybind11 by @monkeyLoveding in #217
- Upload wheel to release by @BourneSun0527 in #218
- Upload wheel to release (#218) by @BourneSun0527 in #220
- optimize internode_dispatch host-bound overhead by @zuje123 in #211
- Release workflow switch to cpu by @BourneSun0527 in #222
- switch release machine by @BourneSun0527 in #224
- Add cann path to LD_LIBRARY_PATH by @BourneSun0527 in #225
- [Feat] Lightning indexer op & GE helper engineering by @randgun in #203
- Fix missing .so files by @monkeyLoveding in #226
- Try fixing missing .so files by @monkeyLoveding in #228
- add sinks_attention for GPT-OSS by @Todobe in #216
- add zero_experts_compute_identity by @Todobe in #214
- Add performance testing section to the moe script by @goosj in #198
- normal_dispatch num_recv_tokens_per_expert_list support prefixSum by @zuje123 in #221
- A2 dispatch/combine layered operator adaptation for SGLang interface by @oagniqgnat in #209
- Add swiglu_oai for GPT-OSS by @Todobe in #233
- [DFX] Adaptable to multiple model validations for fused moe by @kaniel-outis in #229
- [Bugfix] add padding cases for causal_conv1d_update by @ltcs11 in #235
- [Feat] add chunk_gated_delta_rule triton support by @ltcs11 in #232
- Add two mixed tests: normal + low latency, and normal + fused deep moe by @goosj in #206
- debug deepep build by @BourneSun0527 in #231
- rework release build by @iforgetmyname in #237
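Several entries above touch MoE token dispatch (e.g. the prefixSum support for num_recv_tokens_per_expert_list). As background only, here is a minimal sketch of the idea; `expert_offsets` is a hypothetical name, not this repo's API:

```python
def expert_offsets(num_recv_tokens_per_expert):
    # Exclusive prefix sum: offsets[e] is where expert e's tokens start
    # in the packed receive buffer; the final entry is the total count.
    offsets = [0]
    for n in num_recv_tokens_per_expert:
        offsets.append(offsets[-1] + n)
    return offsets
```

Emitting offsets instead of raw counts lets the consumer index each expert's slice without a second pass.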
New Contributors
- @ltcs11 made their first contribution in #200
- @monkeyLoveding made their first contribution in #215
Full Changelog: 2025112...2025120
20251128
What's Changed
- add test internode for deepep by @zuje123 in #193
- Support run normal mode deepep on a single A2 machine by @luanyundu in #201
- [Test] Testing the generalization of fused moe by @kaniel-outis in #167
- Add whl packages to Github Release by @BourneSun0527 in #204
- Add two scripts by @DubhepPan in #119
- support long cat on a3 by @luanyundu in #182
- calculate dispatch normal input parameters using npu instead of cpu by @lih827 in #177
- Add alloc_extend_kernel by @hw-csong in #196
- Modify deepep README_CN.md by @oagniqgnat in #187
- notify_dispatch kernel change magic from int32_t to uint64_t by @zuje123 in #202
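The dispatch-parameter work above (computing dispatch inputs on the NPU rather than the CPU) revolves around per-expert token counting. A hedged host-side sketch of that counting step, with a hypothetical helper name and the common convention that -1 marks a masked top-k slot:

```python
def count_tokens_per_expert(topk_idx, num_experts):
    # Count how many (token, expert) pairs target each expert;
    # a negative index marks a slot that should not be dispatched.
    counts = [0] * num_experts
    for token_experts in topk_idx:
        for e in token_experts:
            if e >= 0:
                counts[e] += 1
    return counts
```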
Full Changelog: 2025112...2025112
20251120
What's Changed
- dispatch and combine batchsize support 4096 for A2 by @ruiqiangworking in #173
- remove redundant check by @ruiqiangworking in #175
- optimize deepep setup, package name with cann version by @zuje123 in #178
- deepep low_latency d&c support a2 single server by @zuje123 in #176
- Add README files for mlapo and batch_transpose_matmul by @randgun in #104
- Support device with different counts of AICore (FusedDeepMoe operator) by @wangqiankun13 in #180
- Add triton decode attention kernels by @RuixuanZhang06 in #184
- fix cann version check by @hustmf in #188
- Update the HCCL_BUFFSIZE verification for moe by @goosj in #183
- Fix bug in the transfer_kv op by @husf1130 in #194
- add_norm_bias and split_qkv_norm_rope for qwen3 by @chenxu140 in #157
- [Chore] Upgrade CANN to 8.3.RC1 by @iforgetmyname in #195
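The HCCL_BUFFSIZE verification mentioned above amounts to an environment check before launching communication kernels. A minimal sketch, assuming the variable holds a size in MB; `hccl_buffsize_ok` is a hypothetical helper, not this repo's function:

```python
import os

def hccl_buffsize_ok(min_mb):
    # Hypothetical check: treat an unset or non-numeric HCCL_BUFFSIZE,
    # or one below the required minimum (in MB), as a failure.
    val = os.environ.get("HCCL_BUFFSIZE")
    if val is None or not val.isdigit():
        return False
    return int(val) >= min_mb
```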
New Contributors
- @hustmf made their first contribution in #188
- @chenxu140 made their first contribution in #157
Full Changelog: 2025111...2025112
20251110
What's Changed
- Added custom low_latency operators for dispatch/combine in the A2 dec… by @oagniqgnat in #166
- deepep support internode api by @zuje123 in #169
- add layout to ops2 directory by @luanyundu in #171
- Modified the deep_ep README and add A2 operator performance data. by @oagniqgnat in #168
- feat: add verify_tree_greedy_kernel triton kernel by @ranjiewen in #165
- optimize a2 layered combine kernel code by @ruiqiangworking in #172
- feat: tiny bugfix & performance optimization by @Yael-X in #170
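The verify_tree_greedy_kernel above belongs to speculative-decoding verification. A hedged, linear-chain simplification of the idea (the real kernel verifies a token tree on-device; `verify_greedy` is an illustrative name, not the kernel's API):

```python
def verify_greedy(draft_tokens, target_greedy_tokens):
    # Accept draft tokens left to right while each matches the target
    # model's greedy (argmax) choice at that position; stop at the
    # first mismatch and return how many were accepted.
    accepted = 0
    for d, t in zip(draft_tokens, target_greedy_tokens):
        if d != t:
            break
        accepted += 1
    return accepted
```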
New Contributors
- @ruiqiangworking made their first contribution in #172
Full Changelog: 2025110...2025111
20251106
What's Changed
- Add dependency on the moe header file of CANN by @DubhepPan in #152
- support small bs = 1 or 2 by @wangyibo1005 in #150
- feat: adapt x86_64 compilation by @Yael-X in #143
- [DFX] Compatible with CANN 8.2 and CANN 8.3 by @kaniel-outis in #158
- add mla_preprocess test script by @LinyuanLi0046 in #153
- [DFX] adapt cann8.3 by @kaniel-outis in #159
- [bugfix] swiglu quant by @Liwansi in #162
- [New Ops] build tree efficient by @hw-csong in #161
- support shallow fused topk=-1 by @wangyibo1005 in #160
- support kvcacheio by @husf1130 in #163
- improve layout kernel on a2 by @luanyundu in #164
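The swiglu quant bugfix above concerns the SwiGLU activation used throughout these MoE kernels. As a reference point only, the standard (unquantized, unfused) formula in plain Python:

```python
import math

def swiglu(gate, up):
    # SwiGLU: silu(gate) * up, elementwise,
    # where silu(g) = g * sigmoid(g) = g / (1 + exp(-g)).
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]
```

The fused device kernels compute this together with quantization in one pass; this sketch shows only the math.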
New Contributors
- @DubhepPan made their first contribution in #152
- @Liwansi made their first contribution in #162
- @hw-csong made their first contribution in #161
Full Changelog: 2025103...2025110
20251030
What's Changed
- add a2 dispatch layout and update its test by @luanyundu in #149
- support topk=-1 by @wangyibo1005 in #132
- add env to decide whether send out prefix sum or not by @luanyundu in #151
- refactor: make hiddenStateDim a class member in MlaTilingData, Follow up closed PR#82 by @LinyuanLi0046 in #133
- support cachemode int8_nzcache with bf16 in mla_preprocess by @LinyuanLi0046 in #135
- add op transfer_kv_dim_exchange by @husf1130 in #148
- impl fused_swiglu_quant with group_list for deepep-low-latency by @xiaobaicxy in #155
- [Kernel] add Flash-Linear-Attention/layernorm_gated Triton op by @iforgetmyname in #154
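The fused_swiglu_quant entry above pairs the activation with low-precision output. A hedged sketch of the quantization half, using common symmetric per-token int8 scaling (`quant_int8_per_token` is an illustrative name, not this repo's API):

```python
def quant_int8_per_token(x):
    # Symmetric per-token int8 quantization: one scale per token vector,
    # chosen so the largest magnitude maps to 127.
    amax = max(abs(v) for v in x)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale
```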
New Contributors
- @LinyuanLi0046 made their first contribution in #133
- @husf1130 made their first contribution in #148
- @xiaobaicxy made their first contribution in #155
Full Changelog: 2025102...2025103
20251023
What's Changed
- Change the padding generation from randperm back to arange by @oagniqgnat in #140
- LoRA: moving kernels from vllm-ascend repo by @vlserov in #128
- Update README.md of DeepEp by @goosj in #144
Full Changelog: 2025102...2025102
20251022
What's Changed
- Update README.md: Add performance of normal and low latency dispatch/combine by @oagniqgnat in #106
- Support debug info for build by @jia-rundong in #99
- Update README by @oagniqgnat in #115
- Synchronous fusion moe by @kaniel-outis in #108
- Fix the severe performance degradation issue of the top9 dispatch in normal mode compared to top8. by @oagniqgnat in #117
- feat: add moe fused operator test draft by @Yael-X in #120
- mlapo fit different hidden state dim by @Todobe in #82
- Do not use download.pytorch.org by @jia-rundong in #121
- EPLB for fused_deep_moe by @wangyibo1005 in #116
- [FusedDeepMoe] Support EPLB by @kaniel-outis in #118
- Support different token hidden sizes and gmm hidden sizes [FusedDeepMoe Operator] by @wangqiankun13 in #123
- Delete left useless code [FusedDeepMoe Operator] by @wangqiankun13 in #129
- update qwen3-next performance kernels by @iforgetmyname in #130
- [Bugfix] Remove unused code that causes split failure in Qwen3-Next by @iforgetmyname in #142
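Two entries above add EPLB (expert-parallel load balancing) support. The core routing idea is picking among physical replicas of a hot logical expert; a minimal sketch under that assumption, with a hypothetical helper name:

```python
def eplb_route(logical_expert, replica_map, token_id):
    # Pick one physical replica of a logical expert; round-robin by
    # token id spreads a hot expert's load across its replicas.
    replicas = replica_map[logical_expert]
    return replicas[token_id % len(replicas)]
```

Real EPLB implementations choose the replica placement from measured load; only the per-token selection step is shown here.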
New Contributors
- @Todobe made their first contribution in #82
- @wangqiankun13 made their first contribution in #123
Full Changelog: 2025092...2025102
20250926
What's Changed
- Reapply fix for hccl buffer use and verify by @zuje123 in #91
- Clean up pending compilation warnings by @oagniqgnat in #86
- Add CI to test args -a deepep by @jia-rundong in #84
- [Feature] Add diagnostic modules to dispatch and combine by @oagniqgnat in #95
- add FusedDeepMoe by @wangyibo1005 in #92
- fused_moe_for_sglang by @kaniel-outis in #94
- fix some bug for fused moe by @kaniel-outis in #102
- unfold layout expert limit and fix bug by @luanyundu in #107
- Added --pressure-test function in test_low_latency by @oagniqgnat in #101
- fix for sglang verl and readme by @lbk-sys in #98
- [feat] add batch_matmul_transpose op by @randgun in #77
- update fused moe readme by @kaniel-outis in #110
- Modify test_low_latency to support int8 quantization testing. by @oagniqgnat in #109
- feat: add env var to switch quant by @Yael-X in #112
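The batch_matmul_transpose op above computes, per batch, a matmul against a transposed right operand. A hedged reference sketch of that contract in plain Python (the actual op runs on the NPU; names and shapes here are illustrative):

```python
def batch_matmul_transpose(a, b):
    # Per batch i: C[i] = A[i] @ B[i]^T, with A[i] of shape (m, k)
    # and B[i] of shape (n, k), so C[i] has shape (m, n).
    out = []
    for A, B in zip(a, b):
        out.append([[sum(x * y for x, y in zip(row, col)) for col in B]
                    for row in A])
    return out
```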
New Contributors
- @oagniqgnat made their first contribution in #86
- @wangyibo1005 made their first contribution in #92
- @kaniel-outis made their first contribution in #94
Full Changelog: 2025091...2025092
20250913
What's Changed
- mlapo support bf16 KV Cache NZ format by @shengzhaotian in #79
- Fix the memory verification issue within intranode dispatch by @lih827 in #83
- [Feature] add fla and mamba kernels by @iforgetmyname in #87
- Revert "Fix the memory verification issue within intranode dispatch" by @iforgetmyname in #88
- Revert "Separate the buffers used by D/C and notify_dispatch to avoid conflicts" by @iforgetmyname in #89
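The fla and mamba kernels added above center on a gated linear recurrence. As background, its sequential form in plain Python (the device kernels compute this in parallel with chunking; this sketch is only the defining recurrence):

```python
def gated_linear_scan(a, b, x):
    # Sequential form of the gated recurrence h_t = a_t * h_{t-1} + b_t * x_t
    # underlying mamba/FLA-style linear-attention kernels.
    h = 0.0
    out = []
    for at, bt, xt in zip(a, b, x):
        h = at * h + bt * xt
        out.append(h)
    return out
```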
Full Changelog: 2025090...2025091