[Feature] Benchmark and Optimize GLM-Image Inference Efficiency (SGLang-D vs. Diffusers) #18077

@zhaochenyang20

Description

We want to compare the inference performance of zai-org/GLM-Image on the sglang-diffusion engine against the baseline Diffusers implementation.

Preliminary observations suggest that GLM-Image may be under-optimized in our stack. In particular, it appears to lack support for Sequence Parallelism (SP), which is crucial for generating high-resolution images efficiently. Fixing this would improve GLM-Image performance and also provide architectural insights for the broader SGLang-D project.
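To make the SP motivation concrete, here is a rough sketch of how the token count grows with resolution and how a sequence could be split evenly across ranks. The 8x VAE downsampling factor and 2x2 patch size are assumptions typical of latent diffusion transformers, not confirmed values for GLM-Image, and `latent_tokens` and `shard_bounds` are hypothetical helper names.

```python
import math


def latent_tokens(height, width, vae_downsample=8, patch_size=2):
    """Tokens the transformer sees for one image.

    ASSUMPTION: 8x VAE downsampling and 2x2 patches; check the real
    GLM-Image config before relying on these numbers.
    """
    stride = vae_downsample * patch_size
    return math.ceil(height / stride) * math.ceil(width / stride)


def shard_bounds(seq_len, world_size):
    """Contiguous [start, end) token ranges, one per rank.

    Shard sizes differ by at most one token.
    """
    base, extra = divmod(seq_len, world_size)
    bounds = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    for side in (512, 1024, 2048):
        n = latent_tokens(side, side)
        # Doubling the side length quadruples the token count, and
        # self-attention cost grows quadratically in the token count.
        print(f"{side}x{side}: {n} tokens, rank 0 shard = {shard_bounds(n, 4)[0]}")
```

Under these assumptions a 2048x2048 image carries 16 times as many tokens as a 512x512 one, which is why sharding the sequence across GPUs matters most at high resolution.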

Goals

  1. Benchmarking: Establish a performance baseline (latency, throughput, and VRAM usage) for GLM-Image using both sglang-diffusion and diffusers.
  2. Profiling: Identify bottlenecks in the current sglang-diffusion path for this model (e.g., attention kernels, memory overhead).
  3. Optimization (Optional/Bonus): Propose or implement initial optimizations, such as enabling Sequence Parallelism or improving memory management.
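As a starting point for the benchmarking goal, the following is a minimal sketch of a timing harness. It assumes each engine can be wrapped in a callable `generate_fn(batch_size)`; that wrapper and the `benchmark` helper are hypothetical and not part of either library. Peak VRAM is read through `torch.cuda.max_memory_allocated()` only when PyTorch and CUDA are available.

```python
import statistics
import time

try:
    import torch  # optional: only needed for GPU sync and VRAM stats
except ImportError:
    torch = None


def benchmark(generate_fn, batch_size, warmup=1, iters=5):
    """Time generate_fn(batch_size) and report latency, throughput, peak VRAM."""
    cuda = torch is not None and torch.cuda.is_available()

    # Warm-up runs absorb one-time costs (compilation, allocator growth).
    for _ in range(warmup):
        generate_fn(batch_size)
    if cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        generate_fn(batch_size)
        if cuda:
            # CUDA kernels launch asynchronously; wait before stopping the clock.
            torch.cuda.synchronize()
        latencies.append(time.perf_counter() - start)

    mean = statistics.mean(latencies)
    return {
        "latency_mean_s": mean,
        "latency_median_s": statistics.median(latencies),
        "throughput_img_per_s": batch_size / mean,
        "peak_vram_gib": torch.cuda.max_memory_allocated() / 2**30 if cuda else None,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real wrapper around each engine.
    def fake_generate(batch_size):
        time.sleep(0.01 * batch_size)

    print(benchmark(fake_generate, batch_size=2, warmup=1, iters=3))
```

Note that `torch.cuda.max_memory_allocated()` only reflects the PyTorch allocator of the current process. If the engine runs in a separate server process, VRAM would need to be measured there or with an external tool such as `nvidia-smi`.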

Technical Tasks

  • Set up a reproducible benchmarking script for GLM-Image.
  • Compare inference latency across different batch sizes and resolutions.
  • Analyze whether and where Sequence Parallelism can be integrated into the current GLM-Image wrapper.
  • Document the findings in a detailed report or table within this issue.
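One possible shape for the batch-size and resolution comparison is sketched below: it sweeps a grid over several backends and renders the results as a Markdown table that can be pasted into this issue. The backend callables are placeholders with an assumed `(batch_size, height, width)` signature; real wrappers around sglang-diffusion and Diffusers would need to be written separately.

```python
import itertools
import time


def time_once(fn, *args):
    """Wall-clock seconds for a single call of fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def sweep(backends, batch_sizes, resolutions):
    """Run every (backend, batch size, resolution) combination once."""
    rows = []
    grid = itertools.product(backends.items(), batch_sizes, resolutions)
    for (name, fn), batch, (height, width) in grid:
        rows.append({
            "backend": name,
            "batch": batch,
            "resolution": f"{height}x{width}",
            "latency_s": time_once(fn, batch, height, width),
        })
    return rows


def to_markdown(rows):
    """Format sweep results as a Markdown table."""
    lines = [
        "| backend | batch | resolution | latency (s) |",
        "|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r['backend']} | {r['batch']} | {r['resolution']} | {r['latency_s']:.3f} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder backends: sleep in proportion to the pixel count.
    def fake_backend(batch, height, width):
        time.sleep(batch * height * width * 1e-9)

    backends = {"sglang-diffusion": fake_backend, "diffusers": fake_backend}
    rows = sweep(backends, batch_sizes=[1, 2], resolutions=[(512, 512), (1024, 1024)])
    print(to_markdown(rows))
```

A single timed call per cell keeps the sketch short; for reportable numbers each cell should be warmed up and averaged over several runs.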

For background, see this code walk-through of SGLang Diffusion:

https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/sgl_diffusion_en.md

Calling SGLang-D community members! If you are interested in high-performance computing, kernel optimization, or the latest diffusion models, we would love your help on this.

Metadata

    Labels

    Good Pro Issue: Issues for experienced contributors; requires a solid understanding of SGLang internals.
    diffusion: SGLang Diffusion
