[Bug] NPU Qwen3-32B eagle3 failed with sgl_kernel_npu error.

## Bug Report
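The server fails to start with the following error: an `AssertionError` (`assert KV_BLOCK_SIZE == head_dim`) raised in `split_qkv_rmsnorm_rope` from `sgl_kernel_npu` while the EAGLE3 draft model runs its prefill extend step.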

```
[2026-01-22 08:26:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-01-22 08:26:23 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 2957, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 1126, in event_loop_overlap
    batch_result = self.run_batch(batch)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 2278, in run_batch
    batch_result = self.model_worker.forward_batch_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 656, in forward_batch_generation
    self.draft_worker._draft_extend_for_prefill(
  File "/home/chz/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 500, in _draft_extend_for_prefill
    logits_output = self.draft_runner.forward(forward_batch).logits_output
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2234, in forward
    output = self._forward_raw(
             ^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2333, in _forward_raw
    ret = self.forward_extend(
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2173, in forward_extend
    return self.model.forward(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 508, in forward
    hidden_states = self.model(
                    ^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/models/llama_eagle3.py", line 170, in forward
    hidden_states, residual = self.midlayer(
                              ^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/models/llama_eagle3.py", line 90, in forward
    hidden_states = self.self_attn(
                    ^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 230, in forward
    q, k, v = self.forward_prepare_npu(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 208, in forward_prepare_npu
    q, k, v = split_qkv_rmsnorm_rope(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11/lib/python3.11/site-packages/sgl_kernel_npu/norm/split_qkv_rmsnorm_rope.py", line 218, in split_qkv_rmsnorm_rope
    assert KV_BLOCK_SIZE == head_dim
           ^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

server_eagle.sh: line 24: 576903 Killed                  python3 -m sglang.launch_server --model-path /home/chz/Qwen3-32B --host 0.0.0.0 --port 37654 --device npu --attention-backend ascend --tp-size 4 --dp-size 1 --mem-fraction-static 0.7 --disable-cuda-graph --speculative-draft-model-path "/home/ckpt/Qwen3-32B_eagle3" --speculative-algorithm "EAGLE3" --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

```
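Since the failing assertion compares `KV_BLOCK_SIZE` with `head_dim`, one thing that may be worth checking is the head dimension that the target and draft checkpoints actually declare. The snippet below is a minimal diagnostic sketch, not a confirmed root-cause analysis. It assumes both model directories contain a Hugging Face style `config.json` with `hidden_size`, `num_attention_heads`, and an optional `head_dim` field. The helper names (`effective_head_dim`, `is_power_of_two`, `describe`) are hypothetical, and the power-of-two check is only a guess at what the kernel requires; how `sgl_kernel_npu` derives `KV_BLOCK_SIZE` has not been verified here.

```python
import json
import sys
from pathlib import Path


def effective_head_dim(config: dict) -> int:
    """Return the per-head dimension implied by a Hugging Face style config.

    Uses the explicit `head_dim` field when present, otherwise falls back to
    hidden_size // num_attention_heads.
    """
    explicit = config.get("head_dim")
    if explicit is not None:
        return int(explicit)
    return config["hidden_size"] // config["num_attention_heads"]


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def describe(model_dir: str) -> str:
    """Summarize the head dimension declared by <model_dir>/config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    head_dim = effective_head_dim(config)
    # ASSUMPTION (unverified): the fused kernel may expect a power-of-two
    # head_dim. This is a guess based on the assertion text only.
    note = "power of two" if is_power_of_two(head_dim) else "NOT a power of two"
    return f"{model_dir}: head_dim={head_dim} ({note})"


if __name__ == "__main__":
    # Pass the target and draft model directories as arguments.
    for model_dir in sys.argv[1:]:
        print(describe(model_dir))
```

Saved as, say, `check_head_dim.py` (a hypothetical file name), it could be run as `python3 check_head_dim.py /home/chz/Qwen3-32B /home/ckpt/Qwen3-32B_eagle3`. If the two checkpoints report different head dimensions, or the draft model's value is unusual, that would be useful information to attach to this issue.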

## Reproduction
Launched with the following script, based on https://docs.sglang.io/platforms/ascend_npu_qwen3_examples.html.

```
export SGLANG_SET_CPU_AFFINITY=1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export STREAMS_PER_DEVICE=32
export HCCL_OP_EXPANSION_MODE=AIV
export SGLANG_ENABLE_OVERLAP_PLAN_STREAM=1
export SGLANG_ENABLE_SPEC_V2=1

python3 -m sglang.launch_server \
    --model-path /home/chz/Qwen3-32B \
    --host 0.0.0.0 \
    --port 37654 \
    --device npu \
    --attention-backend ascend \
    --tp-size 4 \
    --dp-size 1 \
    --mem-fraction-static 0.7 \
    --disable-cuda-graph \
    --speculative-draft-model-path "/home/ckpt/Qwen3-32B_eagle3" \
    --speculative-algorithm "EAGLE3" \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 
```

## Environment
```
Python: 3.11.11 (main, May 29 2025, 13:02:50) [Clang 17.0.6 ( 17.0.6-7.oe2203sp4)]
NPU available: True
NPU 0,1,2,3,4,5,6,7: Ascend910B1
CANN_HOME: /usr/local/Ascend/ascend-toolkit/latest
CANN: 8.3.0.1.200:8.3.RC1
BiSheng: 2025-10-24T18:53:37+08:00 clang version 15.0.5 (clang-5c68a1cb1231 flang-5c68a1cb1231)
Ascend Driver Version: 25.2.0
PyTorch: 2.7.1+cpu
sglang: 0.5.5.post3
sgl_kernel: Module Not Found
flashinfer_python: Module Not Found
flashinfer_cubin: Module Not Found
flashinfer_jit_cache: Module Not Found
triton: Module Not Found
transformers: 4.57.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.12.13
fastapi: 0.115.14
hf_transfer: 0.1.9
huggingface_hub: 0.34.3
interegular: 0.3.3
modelscope: 1.28.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.7
python-multipart: 0.0.20
pyzmq: 27.0.0
uvicorn: 0.35.0
uvloop: 0.21.0
vllm: 0.8.4.dev0+g296c657.d20250701.empty
xgrammar: 0.1.25
openai: 1.99.1
tiktoken: 0.9.0
anthropic: 0.57.1
litellm: 1.74.0.post1
decord2: 2.0.0
torch_npu: 2.7.1.post1.dev20251107
sgl-kernel-npu: 2026.1.12
deep_ep: Module Not Found
Ascend Topology:
           NPU0       NPU1       NPU2       NPU3       NPU4       NPU5       NPU6       NPU7       CPU Affinity
NPU0       X          HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       144-167
NPU1       HCCS       X          HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       0-23
NPU2       HCCS       HCCS       X          HCCS       HCCS       HCCS       HCCS       HCCS       144-167
NPU3       HCCS       HCCS       HCCS       X          HCCS       HCCS       HCCS       HCCS       0-23
NPU4       HCCS       HCCS       HCCS       HCCS       X          HCCS       HCCS       HCCS       96-119
NPU5       HCCS       HCCS       HCCS       HCCS       HCCS       X          HCCS       HCCS       48-71
NPU6       HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       X          HCCS       96-119
NPU7       HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       HCCS       X          48-71

Legend:

  X    = Self
  SYS  = Path traversing PCIe and NUMA nodes. Nodes are connected through SMP, such as QPI, UPI.
  PHB  = Path traversing PCIe and the PCIe host bridge of a CPU.
  PIX  = Path traversing a single PCIe switch
  PXB  = Path traversing multipul PCIe switches
  HCCS = Connection traversing HCCS.
  NA   = Unknown relationship.

ulimit soft: 1048576
```
