Pull request overview
This pull request performs a comprehensive refactoring of the SGLang test infrastructure and several core components. The changes focus on improving code organization, test suite management, and implementing new features for LoRA overlap loading and better cache management.
Changes:
- Reorganization of test suites with new performance/accuracy split and suite naming updates
- Introduction of MatchPrefixParams dataclass to unify prefix matching API across cache implementations
- Implementation of LoRA overlap loading feature for asynchronous adapter weight loading
- Extraction of DeepSeek weight loading logic into reusable mixin classes
- Improvements to multimodal processing with fast path optimization
- Various utility script additions for CI management
Reviewed changes
Copilot reviewed 131 out of 132 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| test/srt/test_nvfp4_gemm.py | Refactored FP4 GEMM tests with base class pattern and multiple backend tests |
| test/srt/test_deepseek_v3_mtp.py | Simplified acceptance length assertions |
| test/srt/test_bench_serving.py | Deleted - tests migrated to registered/perf/ directory |
| test/srt/run_suite.py | Updated suite names and removed quantization_test suite |
| test/run_suite.py | Added new performance and accuracy test suites for CUDA |
| python/sglang/srt/mem_cache/*.py | Updated match_prefix API to use MatchPrefixParams |
| python/sglang/srt/managers/schedule_batch.py | Added unified SWA eviction logic in maybe_evict_swa() |
| python/sglang/srt/lora/ | Implemented LoRA overlap loading feature |
| python/sglang/srt/models/deepseek_*.py | Extracted weight loading to mixin class |
| scripts/ci/*.py | Added new CI utility scripts for runner tracking and commit checking |
Comments suppressed due to low confidence (6)
test/srt/test_nvfp4_gemm.py:14
- The model name has been updated from "nvidia/Llama-3.1-8B-Instruct-FP4" to "nvidia/Llama-3.1-8B-Instruct-NVFP4". Please verify this model name exists in the HuggingFace model registry, as this appears to be a significant change that could cause test failures if the model doesn't exist under this exact name.
test/srt/test_nvfp4_gemm.py:17 - The class name `FP4GemmBase` doesn't follow the naming convention of a test class. Consider renaming to `TestFP4GemmBase` to be consistent with unittest naming patterns, which expect test classes to start with "Test".
test/srt/test_nvfp4_gemm.py:62 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:67 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:72 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
test/srt/test_nvfp4_gemm.py:77 - Base classes have conflicting values for attribute 'setUpClass': classmethod() and classmethod().
Base classes have conflicting values for attribute 'tearDownClass': classmethod() and classmethod().
```python
if req.decode_batch_idx % sliding_window_size == 1:
    self._evict_swa(req, req.seqlen - 1)
```
The condition `req.decode_batch_idx % sliding_window_size == 1` does not work when `sliding_window_size` is 1: `x % 1` is always 0, so the eviction never triggers. Consider whether the logic should be `req.decode_batch_idx % sliding_window_size == 0`, or handle the `sliding_window_size == 1` case separately.
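A hedged sketch of the failure mode and one possible guard (the function name and shape are illustrative, not the PR's actual code):

```python
def should_evict(decode_batch_idx: int, sliding_window_size: int) -> bool:
    """Illustrative predicate: with the quoted `% size == 1` test,
    a window of size 1 never evicts, because `x % 1` is always 0."""
    if sliding_window_size <= 1:
        return True  # degenerate window: evict on every decode step
    return decode_batch_idx % sliding_window_size == 1


# The original condition never fires for size 1:
assert all(i % 1 != 1 for i in range(1, 100))
# The guarded version still fires once per window for larger sizes:
assert should_evict(4, 3) and not should_evict(5, 3)
assert should_evict(7, 1)
```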
```python
created_at = parse_time(run.get("created_at"))
if created_at and created_at >= since:
    runs.append(run)
elif created_at and created_at < since:
```
This `elif` test is always true when reached: if `created_at` is truthy and the first condition (`created_at >= since`) failed, then `created_at < since` necessarily holds, so a plain `else` suffices.
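Assuming the surrounding loop walks runs newest-first (an assumption, not confirmed by the diff), the redundant comparison can become a plain `else` with an early exit; a sketch with illustrative names:

```python
def filter_runs_since(runs, since, parse_time):
    """Illustrative rewrite: once the first branch fails with a truthy
    created_at, `created_at < since` is implied, so `else` is enough."""
    recent = []
    for run in runs:
        created_at = parse_time(run.get("created_at"))
        if created_at is None:
            continue  # unparseable timestamp: skip rather than compare
        if created_at >= since:
            recent.append(run)
        else:  # created_at < since holds here by elimination
            break  # runs are assumed newest-first, so stop scanning
    return recent


runs = [{"created_at": 9}, {"created_at": 7}, {"created_at": 3}]
assert filter_runs_since(runs, 5, lambda t: t) == runs[:2]
```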
```python
is_varlen_decode = B_q == 1 and T_q == N and N > 1
if scale is None:
    scale = K**-0.5
```
Variable scale is not used.
Suggested change:

```diff
  scale = K**-0.5
+ logger.debug("cutedsl_fused_sigmoid_gating_delta_rule_update using scale=%s", scale)
```
```python
_CACHED_COMMIT_HASH = commit_hash
return commit_hash
except (subprocess.CalledProcessError, FileNotFoundError):
    _CACHED_COMMIT_HASH = "N/A"
```
Variable _CACHED_COMMIT_HASH is not used.
Suggested change:

```diff
- _CACHED_COMMIT_HASH = commit_hash
  return commit_hash
  except (subprocess.CalledProcessError, FileNotFoundError):
-     _CACHED_COMMIT_HASH = "N/A"
```
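If the intent of `_CACHED_COMMIT_HASH` is module-level memoization, the "not used" warning usually means the assignments inside the function bind locals instead of updating the module variable. A hedged sketch of one fix with an explicit `global` — the helper name and git invocation are illustrative, not necessarily the PR's script:

```python
import subprocess

_CACHED_COMMIT_HASH = None


def get_commit_hash() -> str:
    """Illustrative memoization: without `global`, the assignments below
    would create function locals and the module cache would never change."""
    global _CACHED_COMMIT_HASH
    if _CACHED_COMMIT_HASH is not None:
        return _CACHED_COMMIT_HASH
    try:
        _CACHED_COMMIT_HASH = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        _CACHED_COMMIT_HASH = "N/A"  # git missing or not a repo
    return _CACHED_COMMIT_HASH
```

Alternatively, dropping the cache entirely (as the suggestion above does) is simpler if the call is cheap and infrequent.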
```python
if in_warp_tid == 0:
    x = r_a + r_dt_bias
    beta_x = softplus_beta * x
    softplus_x = 0.0
```
```python
v = torch.randn(B, T, HV, V, dtype=torch.bfloat16, device="cuda")
indices = torch.arange(B, dtype=torch.int32, device="cuda")
state_cutedsl = torch.randn(B, HV, K, V, dtype=torch.float32, device="cuda")
state_triton = state_cutedsl.clone().reshape(-1).contiguous()
```
```python
try:
    import cuda.bindings.driver as cuda_driver
    import cutlass  # noqa: F401
```
Import of 'cutlass' is not used.
Suggested change:

```diff
- import cutlass  # noqa: F401
```
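If the import exists purely as an availability probe (the `# noqa: F401` hints at that), `importlib.util.find_spec` reports presence without importing the package and without triggering the unused-import warning. A generic sketch, not the PR's actual helper:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Report whether a top-level module could be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


assert module_available("json")  # stdlib module: always present
assert not module_available("no_such_pkg_xyz")
```

Note that if `import cutlass` is needed for its side effects (kernel registration, etc.), the `noqa`-marked import is the right tool and the lint finding can be dismissed.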
```python
if __name__ == "__main__":
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
```
'except' clause does nothing but pass and there is no explanatory comment.
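One way to satisfy the lint is to narrow what the `except` swallows and document it; a sketch assuming `mp` is `multiprocessing` (which the quoted alias suggests) and a hypothetical wrapper name:

```python
import multiprocessing as mp


def ensure_spawn_start_method():
    """Illustrative rewrite: keep the except, but explain it and verify
    that the already-set method is actually the one we wanted."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # set_start_method raises RuntimeError once a context exists;
        # that is fine on re-entry, but a different method is a real error.
        if mp.get_start_method(allow_none=True) != "spawn":
            raise
```

This keeps the intended "already configured" tolerance while no longer hiding the case where some other component set an incompatible start method first.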
```python
    assert len(prompt) and isinstance(prompt[0], int)
    prompt = self._processor.tokenizer.decode(prompt)
else:
    prompt = prompt
```
This assignment assigns a variable to itself.
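The no-op branch can simply be dropped; a sketch of the tightened shape, where the helper name and the `decode` callable are illustrative stand-ins for the tokenizer call in the quoted code:

```python
def normalize_prompt(prompt, decode):
    """Illustrative cleanup: decode token-id lists to text; strings (and
    anything else) pass through, so no `else: prompt = prompt` is needed."""
    if prompt and isinstance(prompt[0], int):
        prompt = decode(prompt)
    return prompt


assert normalize_prompt([101, 102], lambda ids: f"<{len(ids)} tokens>") == "<2 tokens>"
assert normalize_prompt("hello", None) == "hello"
```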
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci