[WIP] Update Dsv4 B300 configs #1656
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
- config-keys:
    - dsv4-fp4-b300-vllm
  description:
    - "Update B300 dsv4 image to nvfp4"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652
🔴 Two issues in the new perf-changelog entry for dsv4-fp4-b300-vllm: (1) The pr-link references #1652, but that's the MiniMax-M2.5 PR (the entry immediately above) — this is a copy-paste, the link should point to this PR (#1656). (2) The description says "Update B300 dsv4 image to nvfp4", but the diff only changes the model field; the image (vllm/vllm-openai:v0.22.0) is unchanged, so it should say something like "Update B300 dsv4 model to NVFP4 checkpoint".
Extended reasoning...
Bug 1: Wrong PR link
The new changelog entry at line 3438 sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652. However, PR #1652 is [B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs (already merged at commit 7d4063d) — a completely different PR. The current PR is #1656 ("[WIP] Update Dsv4 B300 configs"). The entry immediately above (line 3432, for minimaxm2.5-fp4-b300-dynamo-vllm) correctly references #1652, so this is almost certainly a copy-paste error from the adjacent entry.
Bug 2: Misleading description ("image" vs "model")
The description says "Update B300 dsv4 image to nvfp4". But the diff in .github/configs/nvidia-master.yaml only changes the model field:
- model: deepseek-ai/DeepSeek-V4-Pro
+ model: nvidia/DeepSeek-V4-Pro-NVFP4
The image field (vllm/vllm-openai:v0.22.0) is unchanged. In perf-changelog.yaml, "image" consistently refers to the Docker container image; for example, line 3430 of the same file says "Image: vllm/vllm-openai:v0.20.1". The PR's own Cursor Bugbot summary even acknowledges this: "Image, search space, and other recipe fields are unchanged." Calling the model swap an "image" change is misleading.
Step-by-step proof
- The PR metadata shows this is PR #1656, titled "[WIP] Update Dsv4 B300 configs".
- Recent commit history (7d4063d) confirms PR #1652 is a separate, already-merged PR titled "[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs".
- In perf-changelog.yaml, the entry at line 3432 (the MiniMax-M2.5 entry) legitimately uses pr-link: .../pull/1652.
- The newly added entry at line 3438 (this PR's entry) uses the same pr-link: .../pull/1652, duplicating the link from the entry above instead of referencing #1656.
- Reading the diff in .github/configs/nvidia-master.yaml, only model: changes; image: stays at vllm/vllm-openai:v0.22.0. So "Update B300 dsv4 image to nvfp4" describes a change that did not happen, and omits the change that did.
Suggested fix
- config-keys:
    - dsv4-fp4-b300-vllm
  description:
    - "Update B300 dsv4 model to NVFP4 checkpoint (nvidia/DeepSeek-V4-Pro-NVFP4)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656
Impact
Documentation-only — no runtime behavior is affected. But the changelog is the canonical record for tracing perf changes back to their PR, and the incorrect link points readers to an unrelated PR with no context for this model swap.
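A check along the following lines could catch a copy-pasted pr-link before merge. This is only a minimal sketch: the function name check_last_pr_link is hypothetical, and it assumes that new entries are appended at the end of perf-changelog.yaml and that each entry carries a single pr-link line, neither of which is confirmed by this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeed only if the LAST
# pr-link in the changelog ends with /pull/<expected PR number>.
# Assumes new entries are appended at the end of the file.
check_last_pr_link() {
    local changelog="$1" expected_pr="$2" last_link
    # Take the final pr-link line and strip the key plus surrounding whitespace.
    last_link="$(grep -E '^[[:space:]]*pr-link:' "$changelog" \
        | tail -n 1 \
        | sed -E -e 's/^[[:space:]]*pr-link:[[:space:]]*//' -e 's/[[:space:]]*$//')"
    case "$last_link" in
        */pull/"$expected_pr")
            return 0 ;;
        *)
            echo "pr-link mismatch: got '$last_link', expected PR #$expected_pr" >&2
            return 1 ;;
    esac
}
```

For this PR, a call such as check_last_pr_link perf-changelog.yaml 1656 would fail on the entry as submitted and pass once the link is corrected.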
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26901571751
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 41630f2.
fi
# if [ "${DP_ATTENTION}" = "true" ]; then
#     MOE_ARGS=(--moe-backend deep_gemm_mega_moe)
# fi
DP-attn megamoe backend disabled
Medium Severity
With DP_ATTENTION=true, the script no longer passes --moe-backend deep_gemm_mega_moe, but dsv4-fp4-b300-vllm still schedules high-concurrency dp-attn/ep points. That diverges from the prior B300 pareto recipe and from dsv4_fp4_b300_vllm_mtp.sh / B200 vLLM siblings, so those runs may not match the intended serving path.
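One way to make the backend choice explicit, instead of leaving the block commented out, is sketched below. The function moe_args_for and its second parameter (an opt-out switch) are hypothetical and do not exist in the benchmark script; only DP_ATTENTION and the --moe-backend deep_gemm_mega_moe flag come from the diff. Whether that backend is appropriate for the NVFP4 checkpoint is not established in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra MoE flags, one token per line.
#   $1 = DP attention setting ("true" or "false")
#   $2 = opt-out switch (assumed name and meaning); "false" skips the flag
moe_args_for() {
    local dp_attention="${1:-false}" use_mega_moe="${2:-true}"
    # Only emit the flag for DP-attention layouts that have not opted out.
    if [ "$dp_attention" = "true" ] && [ "$use_mega_moe" = "true" ]; then
        printf '%s\n' --moe-backend deep_gemm_mega_moe
    fi
}
```

In a bash script the output could be collected with mapfile -t MOE_ARGS < <(moe_args_for "${DP_ATTENTION}"), so a recipe that needs the default backend passes "false" as the second argument and the divergence from the sibling scripts stays visible in the code.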
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26973148843


Note
Low Risk
Benchmark and runner config only; no auth, serving API, or production inference path changes.
Overview
Switches the dsv4-fp4-b300-vllm recipe to the nvidia/DeepSeek-V4-Pro-NVFP4 checkpoint instead of the upstream deepseek-ai/DeepSeek-V4-Pro weights, and registers DeepSeek-V4-Pro-NVFP4 in the B300 runner’s pre-staged model list so jobs can load from scratch without a fresh HF pull.
The single-node benchmark script no longer passes --moe-backend deep_gemm_mega_moe when data-parallel attention is enabled; that block is commented out, so DP layouts run with default MoE backend behavior for this NVFP4 setup.
Reviewed by Cursor Bugbot for commit 41630f2. Bugbot is set up for automated code reviews on this repo.