Hi, I'm seeing a significant throughput gap between sglang and vllm-ascend on Ascend 910C.
Test Results
| Hardware | Framework | Model | Speed (tokens/sec) |
|---|---|---|---|
| NVIDIA 4090D | vLLM | Qwen3-8B | ~1055 |
| NVIDIA 4090D | sglang | Qwen3-8B | ~1109 |
| Ascend 910C | vllm-ascend | Qwen3-32B | ~567 |
| Ascend 910C | sglang | Qwen3-32B | ~244 |
On the 4090D, vLLM and sglang deliver very similar throughput (~1055 vs. ~1109 tokens/sec).
On the Ascend 910C, however, sglang reaches less than half the throughput of vllm-ascend (~244 vs. ~567 tokens/sec).
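To put the gap in relative terms, the snippet below divides the sglang throughput by the vLLM-family baseline on the same hardware and model. It only uses the approximate (~) figures from the table. Because the 4090D runs Qwen3-8B and the 910C runs Qwen3-32B, the ratios compare frameworks within each hardware/model pair, not the two hardware platforms against each other.

```python
# Approximate throughput (tokens/sec) taken from the table in this report.
results = {
    ("NVIDIA 4090D", "Qwen3-8B"): {"vLLM": 1055, "sglang": 1109},
    ("Ascend 910C", "Qwen3-32B"): {"vllm-ascend": 567, "sglang": 244},
}

def sglang_ratio(hardware, model):
    """Return sglang throughput divided by the vLLM-family baseline for one pair."""
    speeds = results[(hardware, model)]
    # The baseline is whichever framework in this pair is not sglang.
    baseline = next(v for k, v in speeds.items() if k != "sglang")
    return speeds["sglang"] / baseline

for hw, model in results:
    print(f"{hw} / {model}: sglang = {sglang_ratio(hw, model):.2f}x baseline")
# prints:
# NVIDIA 4090D / Qwen3-8B: sglang = 1.05x baseline
# Ascend 910C / Qwen3-32B: sglang = 0.43x baseline
```

So sglang is about 5% faster than vLLM on the 4090D, but runs at roughly 43% of vllm-ascend's throughput on the 910C.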
What factors could be causing this performance difference on Ascend 910C?
Thank you for your help!
Below is a screenshot of the experimental results.
