sglang on Ascend 910C runs less than half the speed of vllm-ascend

Hi, I'm seeing a significant performance gap between sglang and vllm-ascend when running on the Ascend 910C.

### Test Results

| Hardware     | Framework   | Model     | Speed (tokens/sec) |
|--------------|-------------|-----------|--------------------|
| NVIDIA 4090D | vLLM        | Qwen3-8B  | ~1055              |
| NVIDIA 4090D | sglang      | Qwen3-8B  | ~1109              |
| Ascend 910C  | vllm-ascend | Qwen3-32B | ~567               |
| Ascend 910C  | sglang      | Qwen3-32B | ~244               |

On the 4090D, vLLM and sglang perform about the same.  
On the Ascend 910C, however, sglang (~244 tokens/sec) runs at less than half the speed of vllm-ascend (~567 tokens/sec).
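
For reference, here is a minimal sketch of the arithmetic behind that comparison. The throughput values are the approximate figures from the table; the names `results` and `sglang_ratio` are just illustrative and not part of either framework.

```python
# Approximate throughput (tokens/sec), copied from the Test Results table.
results = {
    ("NVIDIA 4090D", "Qwen3-8B"): {"vLLM": 1055, "sglang": 1109},
    ("Ascend 910C", "Qwen3-32B"): {"vllm-ascend": 567, "sglang": 244},
}


def sglang_ratio(frameworks):
    """Return sglang throughput divided by the other framework's throughput."""
    sglang = frameworks["sglang"]
    # Each hardware entry holds exactly one non-sglang framework.
    other = next(v for name, v in frameworks.items() if name != "sglang")
    return sglang / other


# Ratio per hardware platform, rounded to two decimals.
ratios = {hw: round(sglang_ratio(fw), 2) for (hw, _model), fw in results.items()}

for hw, ratio in ratios.items():
    print(f"{hw}: sglang runs at {ratio:.2f}x the other framework's throughput")
```

This gives roughly 1.05x on the 4090D and 0.43x on the 910C. Note that the two platforms were tested with different models (Qwen3-8B vs. Qwen3-32B), so only the within-platform ratios are comparable.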

What factors could be causing this performance difference on the Ascend 910C?

Thank you for your help!

Below are screenshots of the experimental results.

<img width="1184" height="456" alt="Image" src="https://github.com/user-attachments/assets/6308b41b-67ef-446c-a33c-418c7c62e0e8" />

---

<img width="1284" height="521" alt="Image" src="https://github.com/user-attachments/assets/d84bd652-ad81-4073-978a-6c30285b589b" />

