Merged

47 commits
90399cb
fix(ci): recover from corrupted MMMU parquet cache (#17256)
harvenstar Jan 18, 2026
e486a4d
[diffusion] feat: support default 4-step inference for Flux2-Klein di…
RuixiangMa Jan 18, 2026
7edb061
Add runner utilization report workflow (#17234)
alisonshao Jan 18, 2026
09491a9
cli: support sglang version (#17250)
mickqian Jan 18, 2026
e499258
Use swa radix cache and memory pool for gpt-oss model (#17261)
ispobock Jan 18, 2026
6d29d8a
[VLM][Reland] Refactor load_mm_data to improve performance (#16152)
yuan-luo Jan 18, 2026
088758c
[Tiny] Improve docs (#17264)
mmangkad Jan 18, 2026
8fd3399
[diffusion] fix: set guidance_scale default to None (#17182)
ChangyiYang Jan 18, 2026
f78201f
Tiny fix comment typo (#17287)
ispobock Jan 18, 2026
a45e0e5
[SPEC_V2] Enable cudagraph draft_extend for trtllm_mla_backend and Ac…
YAMY1234 Jan 18, 2026
1fe0c82
Add kl test for swa radix cache (#17281)
ispobock Jan 18, 2026
2069050
fix: Handle multiple named chat templates in HuggingFace tokenizers (…
JustinTong0323 Jan 18, 2026
f3a7c7d
Move radix cache related tests (#17295)
ispobock Jan 18, 2026
4df74eb
[Refactor] Add `-fp4-gemm-backend` to replace `SGLANG_FLASHINFER_FP4_…
b8zhong Jan 18, 2026
bb6055b
[Bugfix] Fix PD accuracy when MTP is not configured on the prefill no…
Ch3ngY1 Jan 18, 2026
330605c
[Diffusion] Apply jit qk_norm to flux1 (#17296)
BBuf Jan 18, 2026
9343372
[Refactor] Split out deepseek v2 weight loader function into mixin (#…
xyjixyjixyji Jan 18, 2026
733de6b
[NPU]Support GPT-OSS for NPU (#14197)
Todobe Jan 18, 2026
e00b434
[jit-kernel] Add CuTe DSL GDN Decode Kernel (#15631)
liz-badada Jan 18, 2026
d3eafc7
[GLM 4.7] Add RTX 6000 Pro aka sm120 (#17235)
koush Jan 18, 2026
51f147a
Update CODEOWNERS for multimodal_gen (#17308)
mickqian Jan 19, 2026
ad1b4e4
[Feature] overlap LoRA weight loading with compute (#15512)
glenliu21 Jan 19, 2026
0227db8
[PD] Optimize MHA models pp util calculation logic (#17306)
ShangmingCai Jan 19, 2026
ea879c7
[Minor] Correct sglang version when installing from source (#17315)
Fridge003 Jan 19, 2026
84c8390
Use dsv3 optimized routing `fused_topk_deepseek` instead of `moe_fuse…
leejnau Jan 19, 2026
d2105d4
[DeepSeek v3.2] Opt MTP decode cuda batch sizes and nsa implementatio…
xu-yfei Jan 19, 2026
fc4b932
Update code sync scripts (#17319)
merrymercy Jan 19, 2026
e619f53
[Auto Sync] Update tokenizer_manager.py (20260119) (#17317)
merrymercy Jan 19, 2026
858a4d6
support new qwen3_coder_detector (#16744)
attack204 Jan 19, 2026
9fe56cd
Fix kernel selection in biased_grouped_topk_gpu (#17325)
yudian0504 Jan 19, 2026
5836324
KV Cache Events with Attention DP bug fix (#16030) (#16412)
kartikx Jan 19, 2026
6494667
[Perf] fuse q, k norm for Flux2Attention (#17241)
zminglei Jan 19, 2026
2d72e16
[CI] Add partition to stage-b-test-large-1-gpu (11->12) (#17245)
alisonshao Jan 19, 2026
fb88fb6
fix(ci): rate limit and permission errors in trace publishing (#17238)
alisonshao Jan 19, 2026
a3d9a21
Revert "[Perf] fuse q, k norm for Flux2Attention (#17241)" (#17332)
BBuf Jan 19, 2026
8916b9d
Migrate performance, accuracy, and quantization tests to CI registry …
alisonshao Jan 19, 2026
5c02217
Inclusion of nvfp4 blockscale in EPLB Rebalance (#17158)
wenscarl Jan 19, 2026
f374623
[Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-…
b8zhong Jan 19, 2026
cc410a1
[Diffusion] Apply qknorm to flux2 and apply lightx2v rms_norm_one_pas…
BBuf Jan 19, 2026
ebca587
Fix v32 continue_final_message not work (#16567)
whybeyoung Jan 19, 2026
ce8a6ac
Evict swa kv cache during decoding (#17220)
ispobock Jan 19, 2026
20b0523
[RadixTree][1/N Refactor]: Support unified match_prefix params (#17142)
hzh0425 Jan 19, 2026
2ea02f0
[AMD CI] Migrate and Add More Testcases (#17116)
bingxche Jan 19, 2026
1a053a8
[AMD] CI - add partitions for stage-b-test-small-1-gpu-amd (#17345)
yctseng0211 Jan 19, 2026
0458136
Merge branch 'main' into utils-refactor
kashifulhaque Jan 19, 2026
69af665
Restore deepseek_v2.py to main's code, except the utils
kashifulhaque Jan 19, 2026
816712d
Ran `pre-commit`
kashifulhaque Jan 19, 2026
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -4,6 +4,8 @@
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
/python/sglang/jit_kernel @DarkSharpness @BBuf
/python/sglang/multimodal_gen @mickqian @yhyang201
/python/sglang/multimodal_gen/runtime/layers @mickqian @yhyang201 @BBuf
/python/sglang/multimodal_gen/runtime/models/dits @mickqian @yhyang201 @BBuf
/python/sglang/srt/batch_invariant_ops @Fridge003 @hebiao064
/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
/python/sglang/srt/compilation @hebiao064
2 changes: 1 addition & 1 deletion .github/workflows/open-pr-copy-from-oss.yml
@@ -23,6 +23,6 @@ jobs:

- name: Copy from OSS code
env:
GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_PRIVATE }}
run: |
python3 scripts/code_sync/copy_from_oss.py
2 changes: 1 addition & 1 deletion .github/workflows/open-pr-copy-to-oss.yml
@@ -26,6 +26,6 @@ jobs:
- name: Copy to OSS code
env:
GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_OSS }}
run: |
python3 scripts/code_sync/copy_to_oss.py --commit ${{ github.event.inputs.commit_sha }}
164 changes: 99 additions & 65 deletions .github/workflows/pr-test-amd.yml
@@ -149,7 +149,10 @@ jobs:
docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_topk.py
docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_kvcacheio.py
docker exec -w /sglang-checkout/sgl-kernel/tests/sgl_diffusion ci_sglang python3 -m pytest test_timestep_embedding.py

docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_moe_topk_sigmoid.py
docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_torch_defaults_reset.py
docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_amd_deterministic_custom_allreduce.py
docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_amd_nccl_allreduce_determinism.py
# =============================================== primary ====================================================

stage-a-test-1-amd:
@@ -190,7 +193,7 @@ jobs:
- name: Run test
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-a-test-1
bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-a-test-1-amd

stage-b-test-small-1-gpu-amd:
needs: [check-changes, stage-a-test-1-amd]
@@ -208,7 +211,7 @@
fail-fast: false
matrix:
runner: [linux-mi325-gpu-1]
part: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
part: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
runs-on: ${{matrix.runner}}
steps:
- name: Checkout code
@@ -230,7 +233,7 @@
- name: Run test
timeout-minutes: 30
run: |
bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-b-test-small-1-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 12 --timeout-per-file 1800
bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-b-test-small-1-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 13 --timeout-per-file 1800

stage-b-test-small-1-gpu-amd-mi35x:
needs: [check-changes, stage-a-test-1-amd]
@@ -548,52 +551,13 @@ jobs:
echo "=== Post-test System Memory Status ==="
free -h

unit-test-backend-1-gpu-amd:
needs: [check-changes, stage-a-test-1-amd]
if: |
always() &&
(
(inputs.target_stage == 'unit-test-backend-1-gpu-amd') ||
(
!inputs.target_stage &&
(!failure() && !cancelled()) &&
((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
)
)
strategy:
fail-fast: false
matrix:
runner: [linux-mi325-gpu-1]
part: [0, 1]
runs-on: ${{matrix.runner}}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.pr_head_sha || inputs.ref || github.sha }}

- name: Ensure VRAM is clear
run: bash scripts/ensure_vram_clear.sh rocm

- name: Start CI container
run: bash scripts/ci/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd_ci_install_dependency.sh

- name: Run test
timeout-minutes: 30
run: |
bash scripts/ci/amd_ci_exec.sh python3 run_suite.py --suite per-commit-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2

unit-test-backend-8-gpu-amd:
needs: [check-changes, stage-a-test-1-amd]
stage-c-test-large-8-gpu-amd:
needs: [check-changes, call-gate, stage-b-test-small-1-gpu-amd, stage-b-test-large-2-gpu-amd]
if: |
always() &&
(
(inputs.target_stage == 'unit-test-backend-8-gpu-amd') ||
(inputs.target_stage == 'stage-c-test-large-8-gpu-amd') ||
(
!inputs.target_stage &&
(!failure() && !cancelled()) &&
@@ -634,7 +598,7 @@ jobs:
- name: Run test
timeout-minutes: 60
run: |
bash scripts/ci/amd_ci_exec.sh python3 run_suite.py --suite per-commit-8-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2 --timeout-per-file 3600
bash scripts/ci/amd_ci_exec.sh -w "/sglang-checkout/test" python3 run_suite.py --hw amd --suite stage-c-test-large-8-gpu-amd --auto-partition-id ${{ matrix.part }} --auto-partition-size 2 --timeout-per-file 3600

stage-c-test-large-8-gpu-amd-mi35x:
needs: [check-changes, call-gate, stage-b-test-small-1-gpu-amd, stage-b-test-large-2-gpu-amd]
@@ -713,23 +677,29 @@
- name: Benchmark single latency
timeout-minutes: 20
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_bs1_small
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_bs1_default
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_1gpu.TestBenchOneBatch1GPU.test_bs1_small
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_1gpu.TestBenchOneBatch1GPU.test_bs1_default

- name: Benchmark online latency
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_online_latency_default

- name: Benchmark online latency (LoRA)
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_lora_online_latency
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_lora_online_latency_with_concurrent_adapter_updates

- name: Benchmark offline throughput
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_default

- name: Benchmark offline throughput (Non-streaming, small batch size)
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_non_stream_small_batch_size
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_non_stream_small_batch_size

performance-test-1-gpu-part-2-amd:
needs: [check-changes, stage-a-test-1-amd]
@@ -768,17 +738,81 @@
- name: Benchmark offline throughput (w/o RadixAttention)
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_without_radix_cache

- name: Benchmark offline throughput (w/ Triton)
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_part1.TestBenchServing1GPUPart1.test_offline_throughput_with_triton_attention_backend

- name: Benchmark offline throughput (w/ FP8)
timeout-minutes: 15
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default_fp8
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_1gpu_large.TestBenchServing1GPULarge.test_offline_throughput_default_fp8

- name: Benchmark VLM offline throughput
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_vlm_offline_throughput

- name: Benchmark VLM online latency
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_vlm_online_latency

performance-test-1-gpu-part-3-amd:
needs: [check-changes, stage-a-test-1-amd]
if: |
always() &&
(
(inputs.target_stage == 'performance-test-1-gpu-part-3-amd') ||
(
!inputs.target_stage &&
(!failure() && !cancelled()) &&
((needs.check-changes.outputs.main_package == 'true') || (needs.check-changes.outputs.sgl_kernel == 'true'))
)
)
strategy:
fail-fast: false
matrix:
runner: [linux-mi325-gpu-1]
runs-on: ${{matrix.runner}}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.pr_head_sha || inputs.ref || github.sha }}

- name: Ensure VRAM is clear
run: bash scripts/ensure_vram_clear.sh rocm

- name: Start CI container
run: bash scripts/ci/amd_ci_start_container.sh
env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Install dependencies
run: bash scripts/ci/amd_ci_install_dependency.sh

- name: Benchmark Scores online latency and throughput
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_score_api_latency_throughput

- name: Benchmark Scores online latency and throughput (batch size scaling)
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_score_api_batch_scaling

- name: Benchmark Embeddings online latency and throughput
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_embeddings_api_latency_throughput

- name: Benchmark Embeddings online latency and throughput (batch size scaling)
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_embeddings_api_batch_scaling

performance-test-2-gpu-amd:
needs: [check-changes, stage-a-test-1-amd]
@@ -822,32 +856,32 @@ jobs:
- name: Benchmark single latency (TP=2)
timeout-minutes: 25
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_moe_tp2_bs1
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_2gpu.TestBenchOneBatch2GPU.test_moe_tp2_bs1

- name: Benchmark single latency + torch.compile (TP=2)
timeout-minutes: 25
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_one_batch.TestBenchOneBatch.test_torch_compile_tp2_bs1
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_one_batch_2gpu.TestBenchOneBatch2GPU.test_torch_compile_tp2_bs1

- name: Benchmark offline throughput (TP=2)
timeout-minutes: 25
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_default
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_moe_offline_throughput_default

- name: Benchmark offline throughput (w/o RadixAttention) (TP=2)
timeout-minutes: 25
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_moe_offline_throughput_without_radix_cache
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_moe_offline_throughput_without_radix_cache

- name: Benchmark offline PP decode throughput (PP=2)
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_pp_offline_throughput_default_decode
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_pp_offline_throughput_default_decode

- name: Benchmark offline PP prefill throughput (PP=2)
timeout-minutes: 10
run: |
bash scripts/ci/amd_ci_exec.sh python3 -m unittest test_bench_serving.TestBenchServing.test_pp_long_context_prefill
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/perf python3 -m unittest test_bench_serving_2gpu.TestBenchServing2GPU.test_pp_long_context_prefill

accuracy-test-1-gpu-amd:
needs: [check-changes, stage-a-test-1-amd]
@@ -886,7 +920,7 @@ jobs:
- name: Evaluate Accuracy
timeout-minutes: 30
run: |
bash scripts/ci/amd_ci_exec.sh -e SGLANG_USE_AITER=0 python3 test_eval_accuracy_large.py
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/eval -e SGLANG_USE_AITER=0 python3 test_eval_accuracy_large.py

accuracy-test-2-gpu-amd:
needs: [check-changes, accuracy-test-1-gpu-amd]
@@ -926,7 +960,7 @@ jobs:
- name: Evaluate accuracy (TP=2)
timeout-minutes: 30
run: |
bash scripts/ci/amd_ci_exec.sh -e SGLANG_USE_AITER_AR=0 -e SGLANG_USE_AITER=0 -e HF_HUB_ENABLE_HF_TRANSFER=0 python3 test_moe_eval_accuracy_large.py
bash scripts/ci/amd_ci_exec.sh -w /sglang-checkout/test/registered/eval -e SGLANG_USE_AITER_AR=0 -e SGLANG_USE_AITER=0 -e HF_HUB_ENABLE_HF_TRANSFER=0 python3 test_moe_eval_accuracy_large.py

pr-test-amd-finish:
needs:
@@ -942,11 +976,11 @@ jobs:
stage-b-test-small-1-gpu-amd,
stage-b-test-small-1-gpu-amd-mi35x,
stage-b-test-large-2-gpu-amd,
unit-test-backend-1-gpu-amd,
unit-test-backend-8-gpu-amd,
stage-c-test-large-8-gpu-amd,
stage-c-test-large-8-gpu-amd-mi35x,
performance-test-1-gpu-part-1-amd,
performance-test-1-gpu-part-2-amd,
performance-test-1-gpu-part-3-amd,
performance-test-2-gpu-amd,
accuracy-test-1-gpu-amd,
accuracy-test-2-gpu-amd,