
[SGLang-Diffusion] Add offline throughput benchmark script for multi-modal models#18154

Merged
BBuf merged 6 commits into sgl-project:main from haojin2:offline_bench on Mar 4, 2026

Conversation

@haojin2
Contributor

@haojin2 haojin2 commented Feb 3, 2026

Motivation

Address part of step 1 for #18077

Modifications

  • Added `bench_offline_throughput.py` under `multimodal_gen`, similar to its LLM counterpart

Accuracy Tests

N/A

Benchmarking and Profiling

  • Requires all diffusion dependencies: `pip install imageio cache_dit remote-pdb accelerate addict`

  • Requires source installs of `transformers` and `diffusers`:

    • `pip install git+https://github.com/huggingface/transformers`

    • `pip install git+https://github.com/huggingface/diffusers`

  • Sample single-GPU (RTX 6000 Pro) run with `GLM-Image` + `sglang` backend + `torch.compile`: `python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend sglang --enable-torch-compile --num-prompts 20 --batch-size 1`, with the resulting report:

==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     233.38                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0857                        
MP Throughput (MP/sec):                       0.0225                        
Requests Per Second:                          0.0857                        
Latency Per Request (sec):                    11.6688                       
Peak Memory (MB):                             0                             
==============================================================================================================
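For reference, the throughput figures in these reports follow directly from the raw counts. A minimal sketch of the arithmetic (the helper name is illustrative, not the script's actual API):

```python
def throughput_metrics(num_frames, width, height, duration_s):
    """Derive report-style throughput figures from raw counts."""
    megapixels = num_frames * width * height / 1e6
    return {
        "frames_per_sec": num_frames / duration_s,
        "megapixels": megapixels,
        "mp_per_sec": megapixels / duration_s,
        "latency_per_request_s": duration_s / num_frames,
    }

# Numbers from the sglang-backend run above: 20 frames at 512x512 in 233.38 s
m = throughput_metrics(20, 512, 512, 233.38)
# frames/sec ~ 0.0857, megapixels ~ 5.24, MP/sec ~ 0.0225
```

Note that "Requests Per Second" equals "Frame Throughput" here because each request produces exactly one frame at `--batch-size 1`.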
  • Sample single-GPU (RTX 6000 Pro) run with `GLM-Image` + `diffusers` backend: `python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend diffusers --num-prompts 20 --batch-size 1`, with the resulting report:
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     246.26                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0812                        
MP Throughput (MP/sec):                       0.0213                        
Requests Per Second:                          0.0812                        
Latency Per Request (sec):                    12.3132                       
Peak Memory (MB):                             0                             
==============================================================================================================
  • Verification of the refactored `bench_serving.py` (on RTX 6000 Pro) with `GLM-Image`:

Server: `sglang serve --model-path zai-org/GLM-Image --backend sglang`
bench_serving: `python3 -m sglang.multimodal_gen.benchmarks.bench_serving --dataset random --num-prompts 10 --width 512 --height 512 --model zai-org/GLM-Image`

================= Serving Benchmark Result =================
Task:                                    text-to-image  
Model:                                   zai-org/GLM-Image
Dataset:                                 random         
--------------------------------------------------
Benchmark duration (s):                  131.30         
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     10/10             
--------------------------------------------------
Request throughput (req/s):              0.08           
Latency Mean (s):                        13.1293        
Latency Median (s):                      12.9035        
Latency P99 (s):                         14.9457        
--------------------------------------------------
Peak Memory Max (MB):                    35387.64       
Peak Memory Mean (MB):                   35387.45       
Peak Memory Median (MB):                 35387.64       
============================================================
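The latency statistics in the serving report can be reproduced from per-request latencies. A minimal sketch, assuming a nearest-rank P99 convention (the actual script may interpolate differently; the helper name and the sample latencies are illustrative, not the run's real data):

```python
import statistics

def latency_summary(latencies_s):
    """Mean, median, and nearest-rank P99 over per-request latencies."""
    ordered = sorted(latencies_s)
    rank = max(1, round(0.99 * len(ordered)))  # nearest-rank percentile
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
    }

# Hypothetical per-request latencies, for illustration only
summary = latency_summary([12.9, 13.1, 12.8, 14.9, 13.0])
```

With only 10 requests, P99 is effectively the slowest request, which is why it sits well above the median in the report.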
  • TODO: verify runnability on all currently-supported models under multimodal_gen

Checklist

  • [x] Format your code according to the Format code with pre-commit guide.
  • [x] Add unit tests according to the Run and add unit tests guide.
  • [ ] Update documentation according to Write documentations.
  • [x] Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed guides.
  • [x] Follow the SGLang code style guidance.

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 3, 2026
@haojin2 haojin2 force-pushed the offline_bench branch 2 times, most recently from 466c89f to b81f932 Compare February 3, 2026 06:25
@mickqian
Collaborator

mickqian commented Feb 3, 2026

Also, could you clean up the code a bit?

@haojin2
Contributor Author

haojin2 commented Feb 5, 2026

cc @zhaochenyang20: refactored as requested.
Also tested the new bench_serving script.

@zhaochenyang20
Collaborator

Could you also update your PR description? I think it's two days out of date.

@zhaochenyang20
Collaborator

We can merge this PR first. I also think profiling the diffusion router could be interesting:

radixark/miles#544 (comment)

Collaborator

@zhaochenyang20 zhaochenyang20 left a comment


  1. Refactor the duplicated report-printing lines in LLM and Diffusion. I think you can put a helper function in https://github.com/sgl-project/sglang/blob/main/python/sglang/test/test_utils.py

  2. Debug the bench_offline launch commands for running the engine over multiple GPUs.
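A shared report-printing helper along the lines of point 1 might look like this. This is only a sketch: the function name and its placement in `test_utils.py` are illustrative, not an existing SGLang API.

```python
def format_benchmark_report(title, rows, width=110):
    """Render a banner-framed report like the ones shown in this PR."""
    lines = [f" {title} ".center(width, "=")]
    # Left-align labels in a fixed-width column, matching the report layout
    lines += [f"{label + ':':<46}{value}" for label, value in rows]
    lines.append("=" * width)
    return "\n".join(lines)

report = format_benchmark_report(
    "Offline Throughput Benchmark Result",
    [("Model", "zai-org/GLM-Image"), ("Total Requests", 20)],
)
print(report)
```

Both the LLM and diffusion benchmark scripts could then call the same helper with their own rows.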


@zhaochenyang20 zhaochenyang20 left a comment


I have a strong suggestion regarding the architecture of our benchmark tools. Instead of maintaining two separate scripts (bench_offline_throughput.py and bench_serving.py), we should merge them into a single, unified entry point (e.g., bench_throughput.py).

Both scenarios share identical logic for Argument Parsing, Dataset Loading, and Result Reporting/Metrics Calculation. The only distinct logic is the inference backend execution.

  1. Unified Argument Parsing: add a --backend argument (e.g., choices=["engine", "server"]) to switch modes.

  2. Shared Data Loading: reuse the datasets.py logic for both modes.

  3. Backend Abstraction:
    • If backend == "engine": initialize and launch the GPUWorker.
    • If backend == "server": check the health of the endpoint.

  4. Execution Loop: send requests via the selected backend interface.

  5. Unified Reporting: calculate and print metrics using shared logic to ensure a fair comparison between offline and online performance.

This refactoring would maximize code reuse and improve maintainability. What do you think?
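The proposal above can be sketched as a single entry point with a backend switch. All class and function names here are illustrative placeholders, not the actual SGLang APIs; the backends are stubbed out where the real code would launch the GPU worker or contact the server.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Unified throughput benchmark (sketch)")
    parser.add_argument("--backend", choices=["engine", "server"], default="engine")
    parser.add_argument("--num-prompts", type=int, default=20)
    return parser

class EngineBackend:
    def run(self, prompts):
        # Real code would initialize and launch the in-process GPU worker
        return [f"generated:{p}" for p in prompts]

class ServerBackend:
    def run(self, prompts):
        # Real code would health-check the endpoint and POST each request
        return [f"served:{p}" for p in prompts]

def select_backend(name):
    return EngineBackend() if name == "engine" else ServerBackend()

# Shared data loading, execution loop, and reporting would wrap this core
args = build_parser().parse_args(["--backend", "engine", "--num-prompts", "2"])
outputs = select_backend(args.backend).run([f"prompt-{i}" for i in range(args.num_prompts)])
```

Because argument parsing, dataset loading, and reporting sit outside the backend classes, both modes would exercise identical measurement code paths.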

@haojin2 haojin2 requested a review from ping1jing2 as a code owner February 20, 2026 05:10
@haojin2 haojin2 force-pushed the offline_bench branch 4 times, most recently from a797cb0 to e752743 Compare February 20, 2026 06:02
@yhyang201
Collaborator

He has already switched to DiffGenerator. Could you please take a look? Thanks. @mickqian

Add default value `eps=1e-5` to `register_fake` implementations of
`fused_norm_scale_shift` and `fused_scale_residual_norm_scale_shift`
custom ops, matching the default in the actual custom_op signatures.

Made-with: Cursor
@zhaochenyang20
Collaborator

zhaochenyang20 commented Mar 3, 2026

My testing for this is:

uv pip install -e ".[diffusion]"  
# This is for GLM image
pip install --upgrade transformers
python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput \
    --model-path zai-org/GLM-Image \
    --height 512 --width 512 \
    --num-inference-steps 3 \
    --backend sglang \
    --num-prompts 3
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          3                             
---------------------------------------------------------------------------
Total Requests:                               3                             
Successful Requests:                          3                             
Failed Requests:                              0                             
Total Duration (seconds):                     31.16                         
---------------------------------------------------------------------------
Frames Generated:                             3                             
Megapixels Generated:                         0.79                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.10                          
MP Throughput (MP/sec):                       0.03                          
Requests Per Second:                          0.10                          
Latency Per Request (sec):                    10.39                         
Peak Memory (MB):                             35610.00                      
==============================================================================================================
python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput \
    --model-path zai-org/GLM-Image \
    --height 512 --width 512 \
    --num-inference-steps 3 \
    --backend sglang \
    --enable-torch-compile \
    --num-prompts 3
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          3                             
---------------------------------------------------------------------------
Total Requests:                               3                             
Successful Requests:                          3                             
Failed Requests:                              0                             
Total Duration (seconds):                     31.47                         
---------------------------------------------------------------------------
Frames Generated:                             3                             
Megapixels Generated:                         0.79                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.10                          
MP Throughput (MP/sec):                       0.02                          
Requests Per Second:                          0.10                          
Latency Per Request (sec):                    10.49                         
Peak Memory (MB):                             35634.00                      
==============================================================================================================
python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput \
    --model-path zai-org/GLM-Image \
    --height 512 --width 512 \
    --num-inference-steps 3 \
    --backend sglang \
    --num-prompts 3 \
    --skip-warmup \
    --output-file /tmp/bench_result.json

cat /tmp/bench_result.json
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          3                             
---------------------------------------------------------------------------
Total Requests:                               3                             
Successful Requests:                          3                             
Failed Requests:                              0                             
Total Duration (seconds):                     39.99                         
---------------------------------------------------------------------------
Frames Generated:                             3                             
Megapixels Generated:                         0.79                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.08                          
MP Throughput (MP/sec):                       0.02                          
Requests Per Second:                          0.08                          
Latency Per Request (sec):                    13.33                         
Peak Memory (MB):                             35610.00                      
==============================================================================================================

@BBuf
Collaborator

BBuf commented Mar 3, 2026

Why is peak memory reported as 0 MiB?


@BBuf
Collaborator

BBuf commented Mar 3, 2026


updated in #18154 (comment)

Collaborator

@BBuf BBuf left a comment


LGTM.

@BBuf
Collaborator

BBuf commented Mar 3, 2026

/tag-and-rerun-ci

@BBuf BBuf merged commit a69b943 into sgl-project:main Mar 4, 2026
48 checks passed
Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026
…modal models (sgl-project#18154)

Co-authored-by: Hao Jin <Hao Jin>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026
…modal models (sgl-project#18154)

Co-authored-by: Hao Jin <Hao Jin>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Labels

diffusion SGLang Diffusion run-ci
