Utils refactor #1

Merged
DotSlash-A merged 47 commits into DotSlash-A:utils-refactor from kashifulhaque:utils-refactor
Jan 19, 2026

@kashifulhaque
Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

harvenstar and others added 30 commits January 17, 2026 17:32
…stilled models (sgl-project#17225)

Signed-off-by: Lancer <maruixiang6688@gmail.com>
…ct#16152)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
…GEMM_BACKEND` (sgl-project#16534)

Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
Co-authored-by: root <root@ubuntu-nvidia.localdomain>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: liugaoji.lgj <liugaoji.lgj@alibaba-inc.com>
alisonshao and others added 15 commits January 18, 2026 23:19
…s kernel(without residual) (sgl-project#17305)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…roject#17142)

Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
Copilot AI review requested due to automatic review settings January 19, 2026 17:16
@DotSlash-A DotSlash-A merged commit 6c763c1 into DotSlash-A:utils-refactor Jan 19, 2026
5 checks passed
Copilot AI left a comment

Pull request overview

This pull request performs a comprehensive refactoring of the SGLang test infrastructure and several core components. The changes focus on improving code organization, test suite management, and implementing new features for LoRA overlap loading and better cache management.

Changes:

  • Reorganization of test suites with new performance/accuracy split and suite naming updates
  • Introduction of MatchPrefixParams dataclass to unify prefix matching API across cache implementations
  • Implementation of LoRA overlap loading feature for asynchronous adapter weight loading
  • Extraction of DeepSeek weight loading logic into reusable mixin classes
  • Improvements to multimodal processing with fast path optimization
  • Various utility script additions for CI management

Reviewed changes

Copilot reviewed 131 out of 132 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| test/srt/test_nvfp4_gemm.py | Refactored FP4 GEMM tests with base class pattern and multiple backend tests |
| test/srt/test_deepseek_v3_mtp.py | Simplified acceptance length assertions |
| test/srt/test_bench_serving.py | Deleted; tests migrated to registered/perf/ directory |
| test/srt/run_suite.py | Updated suite names and removed quantization_test suite |
| test/run_suite.py | Added new performance and accuracy test suites for CUDA |
| python/sglang/srt/mem_cache/*.py | Updated match_prefix API to use MatchPrefixParams |
| python/sglang/srt/managers/schedule_batch.py | Added unified SWA eviction logic in maybe_evict_swa() |
| python/sglang/srt/lora/ | Implemented LoRA overlap loading feature |
| python/sglang/srt/models/deepseek_*.py | Extracted weight loading to mixin class |
| scripts/ci/*.py | Added new CI utility scripts for runner tracking and commit checking |

Comments suppressed due to low confidence (6)

test/srt/test_nvfp4_gemm.py:14
  • The model name has been updated from "nvidia/Llama-3.1-8B-Instruct-FP4" to "nvidia/Llama-3.1-8B-Instruct-NVFP4". Please verify this model name exists in the HuggingFace model registry, as this appears to be a significant change that could cause test failures if the model doesn't exist under this exact name.

test/srt/test_nvfp4_gemm.py:17
  • The class name FP4GemmBase doesn't follow the naming convention of a test class. Consider renaming to TestFP4GemmBase to be consistent with unittest naming patterns, which expect test classes to start with "Test".

test/srt/test_nvfp4_gemm.py:62
  • Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
    Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().

test/srt/test_nvfp4_gemm.py:67
  • Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
    Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().

test/srt/test_nvfp4_gemm.py:72
  • Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
    Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().

test/srt/test_nvfp4_gemm.py:77
  • Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
    Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().

Comment on lines +2278 to +2279
if req.decode_batch_idx % sliding_window_size == 1:
    self._evict_swa(req, req.seqlen - 1)
Copilot AI Jan 19, 2026

The condition req.decode_batch_idx % sliding_window_size == 1 can never be true when sliding_window_size is 1, because n % 1 is 0 for every integer n, so eviction would never be triggered in that case. Consider whether the check should be req.decode_batch_idx % sliding_window_size == 0, or handle a sliding_window_size of 1 separately.
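
The edge case can be checked directly. In the sketch below, `should_evict` is a hypothetical stand-in for the condition quoted from the diff, not the actual scheduler code; it only illustrates how the modulo test behaves for different window sizes.

```python
def should_evict(decode_batch_idx: int, sliding_window_size: int) -> bool:
    """Hypothetical stand-in for the condition quoted in the review comment."""
    return decode_batch_idx % sliding_window_size == 1


# With a window of 4, the check fires once every 4 decode steps.
fires_window_4 = [i for i in range(1, 10) if should_evict(i, 4)]

# With a window of 1, n % 1 is always 0, so the check never fires.
fires_window_1 = [i for i in range(1, 10) if should_evict(i, 1)]

print(fires_window_4)  # [1, 5, 9]
print(fires_window_1)  # []
```

Whether a window of size 1 can actually occur in this code path is not stated in the PR, so this shows only the arithmetic, not a confirmed bug.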

Copilot uses AI. Check for mistakes.
created_at = parse_time(run.get("created_at"))
if created_at and created_at >= since:
    runs.append(run)
elif created_at and created_at < since:
Copilot AI Jan 19, 2026

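The test created_at < since in the elif branch is always true when it is evaluated: the preceding if branch already handles created_at >= since, so a truthy created_at that reaches the elif must be earlier than since. The comparison is redundant.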
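
A minimal sketch of the simplified branch follows. The `parse_time` body and the `split_runs` function are hypothetical illustrations (the script's real helper and what it does in the `elif` branch are not shown in the PR); the point is only that `elif created_at:` is equivalent to the original test.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def parse_time(value: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: parse an ISO-8601 timestamp, or return None."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_runs(runs: List[Dict], since: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Separate runs created at or after `since` from older ones."""
    recent, older = [], []
    for run in runs:
        created_at = parse_time(run.get("created_at"))
        if created_at and created_at >= since:
            recent.append(run)
        elif created_at:
            # A truthy created_at reaching this branch is already known
            # to be earlier than `since`, so no second comparison is needed.
            older.append(run)
    return recent, older


since = datetime(2026, 1, 18, tzinfo=timezone.utc)
sample = [
    {"created_at": "2026-01-19T17:16:00Z"},
    {"created_at": "2026-01-17T17:32:00Z"},
    {},  # missing timestamp: falls into neither list
]
recent, older = split_runs(sample, since)
```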

is_varlen_decode = B_q == 1 and T_q == N and N > 1
if scale is None:
    scale = K**-0.5
Copilot AI Jan 19, 2026

Variable scale is not used.

Suggested change
-    scale = K**-0.5
+    scale = K**-0.5
+    logger.debug("cutedsl_fused_sigmoid_gating_delta_rule_update using scale=%s", scale)

Comment on lines +169 to +172
    _CACHED_COMMIT_HASH = commit_hash
    return commit_hash
except (subprocess.CalledProcessError, FileNotFoundError):
    _CACHED_COMMIT_HASH = "N/A"
Copilot AI Jan 19, 2026

Variable _CACHED_COMMIT_HASH is not used.

Suggested change
_CACHED_COMMIT_HASH = commit_hash
return commit_hash
except (subprocess.CalledProcessError, FileNotFoundError):
_CACHED_COMMIT_HASH = "N/A"
return commit_hash
except (subprocess.CalledProcessError, FileNotFoundError):

if in_warp_tid == 0:
    x = r_a + r_dt_bias
    beta_x = softplus_beta * x
    softplus_x = 0.0
Copilot AI Jan 19, 2026

This assignment to 'softplus_x' is unnecessary as it is redefined before this value is used.

Suggested change
-    softplus_x = 0.0
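
The lines that follow the excerpt are not shown in the PR. Assuming the kernel continues with a thresholded softplus of the usual form (the standard formula below a threshold, a linear pass-through above it), both branches assign `softplus_x`, which is why an initial `0.0` would be dead code. The plain-Python sketch below illustrates that shape; the `threshold` parameter and its default are assumptions, not values taken from the kernel.

```python
import math


def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """Thresholded softplus; every path assigns softplus_x exactly once."""
    beta_x = beta * x
    if beta_x <= threshold:
        # Standard form: (1 / beta) * log(1 + exp(beta * x)).
        softplus_x = math.log1p(math.exp(beta_x)) / beta
    else:
        # For large inputs softplus(x) is approximately x; skipping exp()
        # avoids overflow.
        softplus_x = x
    return softplus_x
```

In a compiled DSL kernel an up-front initialization is sometimes needed so that a variable has a defined type on every path, so whether the assignment can really be dropped depends on the kernel framework's rules.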

v = torch.randn(B, T, HV, V, dtype=torch.bfloat16, device="cuda")
indices = torch.arange(B, dtype=torch.int32, device="cuda")
state_cutedsl = torch.randn(B, HV, K, V, dtype=torch.float32, device="cuda")
state_triton = state_cutedsl.clone().reshape(-1).contiguous()
Copilot AI Jan 19, 2026

This assignment to 'state_triton' is unnecessary as it is redefined before this value is used.

Suggested change
-state_triton = state_cutedsl.clone().reshape(-1).contiguous()

try:
    import cuda.bindings.driver as cuda_driver
    import cutlass # noqa: F401
Copilot AI Jan 19, 2026

Import of 'cutlass' is not used.

Suggested change
-    import cutlass # noqa: F401
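
If the import exists only to probe whether the package is installed (an assumption, since the rest of the `try` block is not shown), deleting it would change behavior rather than just silence a warning. One alternative sketch probes availability with `importlib.util.find_spec` instead of an unused import; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path.

    The module itself is not imported, although for a dotted name such as
    "pkg.sub" the parent package is imported in order to search it.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package of a dotted name is missing.
        # ValueError: the module is loaded but its __spec__ is unset.
        return False


# Possible use in place of the try/except probe (flag name is illustrative).
HAS_CUTLASS = module_available("cutlass")
```

Note that a successful `find_spec` does not guarantee the import itself will succeed, for example when a compiled extension fails to load.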

if __name__ == "__main__":
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
Copilot AI Jan 19, 2026

'except' clause does nothing but pass and there is no explanatory comment.
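
A minimal sketch of how the empty handler could be documented. `multiprocessing.set_start_method` raises `RuntimeError` when the start method has already been set, so the comment states why ignoring it is acceptable; the wrapper function `ensure_spawn_start_method` is a hypothetical name used only for illustration.

```python
import multiprocessing as mp


def ensure_spawn_start_method() -> str:
    """Try to select the "spawn" start method and report the one in effect."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # The start method was already set earlier in this process;
        # keep the existing one instead of failing.
        pass
    return mp.get_start_method()


if __name__ == "__main__":
    print(ensure_spawn_start_method())
```

If "spawn" must win regardless of earlier configuration, `mp.set_start_method("spawn", force=True)` is an alternative, at the cost of overriding whatever was set before.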

    assert len(prompt) and isinstance(prompt[0], int)
    prompt = self._processor.tokenizer.decode(prompt)
else:
    prompt = prompt
Copilot AI Jan 19, 2026

This assignment assigns a variable to itself.
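
The self-assignment can be dropped by removing the `else` branch entirely. The sketch below assumes the hidden `if` condition tests whether the prompt is a list of token ids; `to_text` and its `decode` parameter are hypothetical stand-ins for the method and for `self._processor.tokenizer.decode`.

```python
from typing import Callable, List, Union


def to_text(
    prompt: Union[str, List[int]],
    decode: Callable[[List[int]], str],
) -> str:
    """Return the prompt as text, decoding token ids when necessary."""
    if isinstance(prompt, list):
        assert len(prompt) and isinstance(prompt[0], int)
        return decode(prompt)
    # A string prompt passes through unchanged; no else branch is needed.
    return prompt


# Usage with a toy decoder that joins ids with spaces.
decoded = to_text([1, 2, 3], lambda ids: " ".join(str(i) for i in ids))
unchanged = to_text("hello", lambda ids: "")
```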
