Description
We are looking to evaluate the current inference performance of zai-org/GLM-Image when running on the sglang-diffusion engine compared to the baseline Diffusers implementation.
Preliminary observations suggest that the current implementation for GLM-Image within our stack may be under-optimized. Specifically, it appears to lack support for Sequence Parallelism (SP), which is crucial for handling high-resolution image generation efficiently. Improving this will not only boost GLM-Image performance but also provide architectural insights for the broader SGLang-D project.
Goals
- Benchmarking: Establish a performance baseline (latency, throughput, and VRAM usage) for GLM-Image using both `sglang-diffusion` and `diffusers`.
- Profiling: Identify bottlenecks in the current `sglang-diffusion` path for this model (e.g., attention kernels, memory overhead).
- Optimization (Optional/Bonus): Propose or implement initial optimizations, such as enabling Sequence Parallelism or improving memory management.
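The benchmarking goal above can be sketched with a minimal latency harness. This is a stdlib-only sketch: the `run_once` callable, warmup count, and repeat count are illustrative assumptions, not an existing API — in practice you would wrap one `diffusers` or `sglang-diffusion` pipeline call in it, and read peak VRAM separately (e.g. via `torch.cuda.max_memory_allocated()`).

```python
import statistics
import time

def benchmark(run_once, warmup=2, repeats=5):
    """Time a zero-argument callable; return (mean_s, stdev_s, runs_per_s).

    `run_once` stands in for one pipeline invocation, e.g.
    `lambda: pipe(prompt, height=1024, width=1024)` -- an assumed interface.
    VRAM is not measured here; record it separately per run.
    """
    for _ in range(warmup):  # discard compilation / cache warmup runs
        run_once()
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - t0)
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev, 1.0 / mean
```

Running the same harness against both engines with identical prompts, seeds, and step counts keeps the comparison apples-to-apples.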
Technical Tasks
- Set up a reproducible benchmarking script for GLM-Image.
- Compare inference latency across different batch sizes and resolutions.
- Analyze if and where Sequence Parallelism can be integrated into the current GLM-Image wrapper.
- Document the findings in a detailed report or table within this issue.
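For the batch-size/resolution sweep and the reporting task, a small helper that renders results as a markdown table (pasteable into this issue) could look like the sketch below. The `measure` callable is a hypothetical stand-in for whichever engine is under test; names and columns are assumptions.

```python
def sweep_to_markdown(measure, batch_sizes, resolutions):
    """Run `measure(batch, res) -> latency_s` over a grid and render a
    markdown table of latency and images/sec for this issue's report.
    """
    rows = [
        "| batch | resolution | latency (s) | imgs/s |",
        "|-------|------------|-------------|--------|",
    ]
    for b in batch_sizes:
        for res in resolutions:
            lat = measure(b, res)  # one timed generation at this config
            rows.append(f"| {b} | {res}x{res} | {lat:.3f} | {b / lat:.2f} |")
    return "\n".join(rows)
```

Producing one such table per engine makes the `sglang-diffusion` vs. `diffusers` gap visible at a glance across configurations.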
You can read this as a reference:
Calling SGLang-D community members! If you are interested in high-performance computing, kernel optimization, or the latest diffusion models, we would love your help on this.