Pull request overview
This pull request performs a comprehensive refactoring of the SGLang test infrastructure and several core components. The changes focus on improving code organization, test suite management, and implementing new features for LoRA overlap loading and better cache management.
Changes:
- Reorganization of test suites with new performance/accuracy split and suite naming updates
- Introduction of MatchPrefixParams dataclass to unify prefix matching API across cache implementations
- Implementation of LoRA overlap loading feature for asynchronous adapter weight loading
- Extraction of DeepSeek weight loading logic into reusable mixin classes
- Improvements to multimodal processing with fast path optimization
- Various utility script additions for CI management
Reviewed changes
Copilot reviewed 131 out of 132 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| test/srt/test_nvfp4_gemm.py | Refactored FP4 GEMM tests with base class pattern and multiple backend tests |
| test/srt/test_deepseek_v3_mtp.py | Simplified acceptance length assertions |
| test/srt/test_bench_serving.py | Deleted - tests migrated to registered/perf/ directory |
| test/srt/run_suite.py | Updated suite names and removed quantization_test suite |
| test/run_suite.py | Added new performance and accuracy test suites for CUDA |
| python/sglang/srt/mem_cache/*.py | Updated match_prefix API to use MatchPrefixParams |
| python/sglang/srt/managers/schedule_batch.py | Added unified SWA eviction logic in maybe_evict_swa() |
| python/sglang/srt/lora/ | Implemented LoRA overlap loading feature |
| python/sglang/srt/models/deepseek_*.py | Extracted weight loading to mixin class |
| scripts/ci/*.py | Added new CI utility scripts for runner tracking and commit checking |
Comments suppressed due to low confidence (6)
test/srt/test_nvfp4_gemm.py:14
- The model name has been updated from "nvidia/Llama-3.1-8B-Instruct-FP4" to "nvidia/Llama-3.1-8B-Instruct-NVFP4". Please verify this model name exists in the HuggingFace model registry, as this appears to be a significant change that could cause test failures if the model doesn't exist under this exact name.
test/srt/test_nvfp4_gemm.py:17 - The class name `FP4GemmBase` doesn't follow the naming convention of a test class. Consider renaming to `TestFP4GemmBase` to be consistent with unittest naming patterns, which expect test classes to start with "Test".
test/srt/test_nvfp4_gemm.py:62 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:67 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:72 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:77 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
```python
if req.decode_batch_idx % sliding_window_size == 1:
    self._evict_swa(req, req.seqlen - 1)
```
The condition `req.decode_batch_idx % sliding_window_size == 1` does not work when `sliding_window_size` is 1: `x % 1` is always 0, so the eviction never triggers. Consider whether the logic should be `req.decode_batch_idx % sliding_window_size == 0`, or handle the `sliding_window_size == 1` case separately.
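A hedged sketch of the failure mode and one possible guard (the function name and shape are illustrative, not the PR's actual code):

```python
def should_evict(decode_batch_idx: int, sliding_window_size: int) -> bool:
    """Illustrative predicate: with the quoted `% size == 1` test,
    a window of size 1 never evicts, because `x % 1` is always 0."""
    if sliding_window_size <= 1:
        return True  # degenerate window: evict on every decode step
    return decode_batch_idx % sliding_window_size == 1


# The original condition never fires for size 1:
assert all(i % 1 != 1 for i in range(1, 100))
# The guarded version still fires once per window for larger sizes:
assert should_evict(4, 3) and not should_evict(5, 3)
assert should_evict(7, 1)
```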
```python
created_at = parse_time(run.get("created_at"))
if created_at and created_at >= since:
    runs.append(run)
elif created_at and created_at < since:
```
This `elif` test is always true when reached: if `created_at` is truthy and the first condition (`created_at >= since`) failed, then `created_at < since` necessarily holds, so a plain `else` suffices.
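Assuming the surrounding loop walks runs newest-first (an assumption, not confirmed by the diff), the redundant comparison can become a plain `else` with an early exit; a sketch with illustrative names:

```python
def filter_runs_since(runs, since, parse_time):
    """Illustrative rewrite: once the first branch fails with a truthy
    created_at, `created_at < since` is implied, so `else` is enough."""
    recent = []
    for run in runs:
        created_at = parse_time(run.get("created_at"))
        if created_at is None:
            continue  # unparseable timestamp: skip rather than compare
        if created_at >= since:
            recent.append(run)
        else:  # created_at < since holds here by elimination
            break  # runs are assumed newest-first, so stop scanning
    return recent


runs = [{"created_at": 9}, {"created_at": 7}, {"created_at": 3}]
assert filter_runs_since(runs, 5, lambda t: t) == runs[:2]
```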
```python
is_varlen_decode = B_q == 1 and T_q == N and N > 1
if scale is None:
    scale = K**-0.5
```
Variable scale is not used.
Suggested change:

```diff
  scale = K**-0.5
+ logger.debug("cutedsl_fused_sigmoid_gating_delta_rule_update using scale=%s", scale)
```
```python
_CACHED_COMMIT_HASH = commit_hash
return commit_hash
except (subprocess.CalledProcessError, FileNotFoundError):
    _CACHED_COMMIT_HASH = "N/A"
```
Variable _CACHED_COMMIT_HASH is not used.
Suggested change:

```diff
- _CACHED_COMMIT_HASH = commit_hash
  return commit_hash
  except (subprocess.CalledProcessError, FileNotFoundError):
-     _CACHED_COMMIT_HASH = "N/A"
```
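If the intent of `_CACHED_COMMIT_HASH` is module-level memoization, the "not used" warning usually means the assignments inside the function bind locals instead of updating the module variable. A hedged sketch of one fix with an explicit `global` — the helper name and git invocation are illustrative, not necessarily the PR's script:

```python
import subprocess

_CACHED_COMMIT_HASH = None


def get_commit_hash() -> str:
    """Illustrative memoization: without `global`, the assignments below
    would create function locals and the module cache would never change."""
    global _CACHED_COMMIT_HASH
    if _CACHED_COMMIT_HASH is not None:
        return _CACHED_COMMIT_HASH
    try:
        _CACHED_COMMIT_HASH = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        _CACHED_COMMIT_HASH = "N/A"  # git missing or not a repo
    return _CACHED_COMMIT_HASH
```

Alternatively, dropping the cache entirely (as the suggestion above does) is simpler if the call is cheap and infrequent.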
```python
if in_warp_tid == 0:
    x = r_a + r_dt_bias
    beta_x = softplus_beta * x
    softplus_x = 0.0
```
```python
v = torch.randn(B, T, HV, V, dtype=torch.bfloat16, device="cuda")
indices = torch.arange(B, dtype=torch.int32, device="cuda")
state_cutedsl = torch.randn(B, HV, K, V, dtype=torch.float32, device="cuda")
state_triton = state_cutedsl.clone().reshape(-1).contiguous()
```
```python
try:
    import cuda.bindings.driver as cuda_driver
    import cutlass  # noqa: F401
```
Import of 'cutlass' is not used.
Suggested change:

```diff
- import cutlass  # noqa: F401
```
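If the import exists purely as an availability probe (the `# noqa: F401` hints at that), `importlib.util.find_spec` reports presence without importing the package and without triggering the unused-import warning. A generic sketch, not the PR's actual helper:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Report whether a top-level module could be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


assert module_available("json")  # stdlib module: always present
assert not module_available("no_such_pkg_xyz")
```

Note that if `import cutlass` is needed for its side effects (kernel registration, etc.), the `noqa`-marked import is the right tool and the lint finding can be dismissed.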
```python
if __name__ == "__main__":
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
```
'except' clause does nothing but pass and there is no explanatory comment.
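One way to satisfy the lint is to narrow what the `except` swallows and document it; a sketch assuming `mp` is `multiprocessing` (which the quoted alias suggests) and a hypothetical wrapper name:

```python
import multiprocessing as mp


def ensure_spawn_start_method():
    """Illustrative rewrite: keep the except, but explain it and verify
    that the already-set method is actually the one we wanted."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # set_start_method raises RuntimeError once a context exists;
        # that is fine on re-entry, but a different method is a real error.
        if mp.get_start_method(allow_none=True) != "spawn":
            raise
```

This keeps the intended "already configured" tolerance while no longer hiding the case where some other component set an incompatible start method first.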
```python
    assert len(prompt) and isinstance(prompt[0], int)
    prompt = self._processor.tokenizer.decode(prompt)
else:
    prompt = prompt
```
This assignment assigns a variable to itself.
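The no-op branch can simply be dropped; a sketch of the tightened shape, where the helper name and the `decode` callable are illustrative stand-ins for the tokenizer call in the quoted code:

```python
def normalize_prompt(prompt, decode):
    """Illustrative cleanup: decode token-id lists to text; strings (and
    anything else) pass through, so no `else: prompt = prompt` is needed."""
    if prompt and isinstance(prompt[0], int):
        prompt = decode(prompt)
    return prompt


assert normalize_prompt([101, 102], lambda ids: f"<{len(ids)} tokens>") == "<2 tokens>"
assert normalize_prompt("hello", None) == "hello"
```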
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci