Open
Description
Bug Report
The server fails to start with the following errors.
[2026-01-22 08:26:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-01-22 08:26:23 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 2957, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 1126, in event_loop_overlap
batch_result = self.run_batch(batch)
^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/managers/scheduler.py", line 2278, in run_batch
batch_result = self.model_worker.forward_batch_generation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 656, in forward_batch_generation
self.draft_worker._draft_extend_for_prefill(
File "/home/chz/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 500, in _draft_extend_for_prefill
logits_output = self.draft_runner.forward(forward_batch).logits_output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2234, in forward
output = self._forward_raw(
^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2333, in _forward_raw
ret = self.forward_extend(
^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/model_executor/model_runner.py", line 2173, in forward_extend
return self.model.forward(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 508, in forward
hidden_states = self.model(
^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/models/llama_eagle3.py", line 170, in forward
hidden_states, residual = self.midlayer(
^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/models/llama_eagle3.py", line 90, in forward
hidden_states = self.self_attn(
^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 230, in forward
q, k, v = self.forward_prepare_npu(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chz/sglang/python/sglang/srt/models/llama.py", line 208, in forward_prepare_npu
q, k, v = split_qkv_rmsnorm_rope(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Python-3.11/lib/python3.11/site-packages/sgl_kernel_npu/norm/split_qkv_rmsnorm_rope.py", line 218, in split_qkv_rmsnorm_rope
assert KV_BLOCK_SIZE == head_dim
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
server_eagle.sh: line 24: 576903 Killed python3 -m sglang.launch_server --model-path /home/chz/Qwen3-32B --host 0.0.0.0 --port 37654 --device npu --attention-backend ascend --tp-size 4 --dp-size 1 --mem-fraction-static 0.7 --disable-cuda-graph --speculative-draft-model-path "/home/ckpt/Qwen3-32B_eagle3" --speculative-algorithm "EAGLE3" --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
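The assertion compares `KV_BLOCK_SIZE` with `head_dim` inside `split_qkv_rmsnorm_rope`, and it fires while the EAGLE3 draft model is running, so a reasonable first check is the head dimension each checkpoint's `config.json` implies. Below is a minimal diagnostic sketch, not a confirmed root cause. It assumes Hugging Face style config keys (`head_dim`, `hidden_size`, `num_attention_heads`), and the power-of-two check is only a guess about how the kernel derives `KV_BLOCK_SIZE`; the traceback does not show that computation. The helper names are hypothetical.

```python
import json
from pathlib import Path


def effective_head_dim(config: dict) -> int:
    """Per-head dimension implied by a Hugging Face style config dict.

    Uses the explicit "head_dim" key when present; otherwise falls back to
    hidden_size // num_attention_heads.
    """
    explicit = config.get("head_dim")
    if explicit is not None:
        return int(explicit)
    return int(config["hidden_size"]) // int(config["num_attention_heads"])


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... and False otherwise."""
    return n > 0 and (n & (n - 1)) == 0


def describe(name: str, config: dict) -> str:
    """One-line summary of the head dimension for a model config."""
    dim = effective_head_dim(config)
    # ASSUMPTION: the kernel may expect a power-of-two head_dim. This is a
    # hypothesis to test, not something the traceback confirms.
    return f"{name}: head_dim={dim}, power_of_two={is_power_of_two(dim)}"


def load_config(model_dir: str) -> dict:
    """Read config.json from a local checkpoint directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())


def compare_model_dirs(target_dir: str, draft_dir: str) -> list[str]:
    """Summaries for the target and draft checkpoints, in that order."""
    return [
        describe("target", load_config(target_dir)),
        describe("draft", load_config(draft_dir)),
    ]
```

Calling `compare_model_dirs("/home/chz/Qwen3-32B", "/home/ckpt/Qwen3-32B_eagle3")` on the machine from this report would print-ready summarize both checkpoints. If the draft checkpoint reports a different or non-power-of-two `head_dim` compared with the target, that mismatch would be worth including in the report.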
Reproduction
Launched with the following script, based on https://docs.sglang.io/platforms/ascend_npu_qwen3_examples.html
export SGLANG_SET_CPU_AFFINITY=1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export STREAMS_PER_DEVICE=32
export HCCL_OP_EXPANSION_MODE=AIV
export SGLANG_ENABLE_OVERLAP_PLAN_STREAM=1
export SGLANG_ENABLE_SPEC_V2=1
python3 -m sglang.launch_server \
--model-path /home/chz/Qwen3-32B \
--host 0.0.0.0 \
--port 37654 \
--device npu \
--attention-backend ascend \
--tp-size 4 \
--dp-size 1 \
--mem-fraction-static 0.7 \
--disable-cuda-graph \
--speculative-draft-model-path "/home/ckpt/Qwen3-32B_eagle3" \
--speculative-algorithm "EAGLE3" \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4
Environment
Python: 3.11.11 (main, May 29 2025, 13:02:50) [Clang 17.0.6 ( 17.0.6-7.oe2203sp4)]
NPU available: True
NPU 0,1,2,3,4,5,6,7: Ascend910B1
CANN_HOME: /usr/local/Ascend/ascend-toolkit/latest
CANN: 8.3.0.1.200:8.3.RC1
BiSheng: 2025-10-24T18:53:37+08:00 clang version 15.0.5 (clang-5c68a1cb1231 flang-5c68a1cb1231)
Ascend Driver Version: 25.2.0
PyTorch: 2.7.1+cpu
sglang: 0.5.5.post3
sgl_kernel: Module Not Found
flashinfer_python: Module Not Found
flashinfer_cubin: Module Not Found
flashinfer_jit_cache: Module Not Found
triton: Module Not Found
transformers: 4.57.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.12.13
fastapi: 0.115.14
hf_transfer: 0.1.9
huggingface_hub: 0.34.3
interegular: 0.3.3
modelscope: 1.28.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.7
python-multipart: 0.0.20
pyzmq: 27.0.0
uvicorn: 0.35.0
uvloop: 0.21.0
vllm: 0.8.4.dev0+g296c657.d20250701.empty
xgrammar: 0.1.25
openai: 1.99.1
tiktoken: 0.9.0
anthropic: 0.57.1
litellm: 1.74.0.post1
decord2: 2.0.0
torch_npu: 2.7.1.post1.dev20251107
sgl-kernel-npu: 2026.1.12
deep_ep: Module Not Found
Ascend Topology:
NPU0 NPU1 NPU2 NPU3 NPU4 NPU5 NPU6 NPU7 CPU Affinity
NPU0 X HCCS HCCS HCCS HCCS HCCS HCCS HCCS 144-167
NPU1 HCCS X HCCS HCCS HCCS HCCS HCCS HCCS 0-23
NPU2 HCCS HCCS X HCCS HCCS HCCS HCCS HCCS 144-167
NPU3 HCCS HCCS HCCS X HCCS HCCS HCCS HCCS 0-23
NPU4 HCCS HCCS HCCS HCCS X HCCS HCCS HCCS 96-119
NPU5 HCCS HCCS HCCS HCCS HCCS X HCCS HCCS 48-71
NPU6 HCCS HCCS HCCS HCCS HCCS HCCS X HCCS 96-119
NPU7 HCCS HCCS HCCS HCCS HCCS HCCS HCCS X 48-71
Legend:
X = Self
SYS = Path traversing PCIe and NUMA nodes. Nodes are connected through SMP, such as QPI, UPI.
PHB = Path traversing PCIe and the PCIe host bridge of a CPU.
PIX = Path traversing a single PCIe switch
PXB = Path traversing multiple PCIe switches
HCCS = Connection traversing HCCS.
NA = Unknown relationship.
ulimit soft: 1048576