Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) by Oseltamivir · Pull Request #1636 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-01T21:27:00Z

Points both B300 DSv4 TRT configs at the official NVIDIA release image and adds the MTP sibling to the sweep:

dsv4-fp4-b300-trt (non-MTP): feat-deepseek_v4-2dd03e6 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
dsv4-fp4-b300-trt-mtp (MTP): feat-deepseek_v4-9aa3715 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
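The image references above use `registry#repository:tag` notation, while the overview further down spells the same image as `nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1`. Assuming the `#` is the enroot/Pyxis-style registry separator, here is a minimal Python sketch that converts between the two spellings and extracts the tag being swapped. `enroot_to_docker_ref` and `image_tag` are hypothetical helpers, not part of InferenceX.

```python
from typing import Optional


def enroot_to_docker_ref(ref: str) -> str:
    """Rewrite 'registry#repo:tag' (assumed enroot/Pyxis style) as 'registry/repo:tag'."""
    registry, sep, rest = ref.partition("#")
    if not sep:
        # No '#': assume the reference is already in slash form.
        return ref
    return f"{registry}/{rest}"


def image_tag(ref: str) -> Optional[str]:
    """Return the tag after the final ':' of the last path component, if any."""
    last_component = ref.rsplit("/", 1)[-1]
    _, sep, tag = last_component.rpartition(":")
    return tag if sep else None


old_ref = "ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6"
new_ref = "nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1"
print(enroot_to_docker_ref(new_ref))  # nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1
print(image_tag(old_ref), "->", image_tag(new_ref))  # feat-deepseek_v4-2dd03e6 -> 1.3.0rc15.post1
```

Looking only at the last path component keeps a registry port (for example `localhost:5000/...`) from being mistaken for a tag.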

This drops the custom ghcr.io semianalysis feat/deepseek_v4 builds in favor of the official RC, to evaluate whether the official image can serve DeepSeek-V4-Pro (non-MTP and MTP). The non-MTP launcher's TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 workaround (specific to the custom build) is removed so the official image runs with its native behavior, matching the MTP launcher which never had it.

Known risk

A prior run of 1.3.0rc15.post1 with attention-DP (dpa=true) served a couple of iterations and then crashed with CUDA_ERROR_ILLEGAL_ADDRESS in kv_cache_manager.free_resources (run 26786937394) — a different failure from the custom build's SWA-scratch-revert crash. So dpa=true jobs may still fail on the official image; the pure-TP (dpa=false) cases are more likely to pass. MTP on the official RC is untested. This sweep is what tells us where it stands.

Scope

B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715); its OOM follow-up is tracked separately.

🤖 Generated with Claude Code

Note

Medium Risk
Benchmark-only image swap for a large model on TRT-LLM; official RC may still hit known CUDA failures with dp-attn, affecting sweep stability rather than production services.

Overview
Switches B300 DeepSeek-V4-Pro FP4 TensorRT-LLM benchmark configs dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp from custom ghcr.io/semianalysisai/trtllm-deepseek-v4 feature builds to the official NVIDIA image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1, so sweeps evaluate the official RC for both standard and MTP runs.

Documents the change in perf-changelog.yaml (including dropping the custom-build-only TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround for the non-MTP path). B200 TRT configs are not changed in this diff.

Reviewed by Cursor Bugbot for commit ad529fb. Bugbot is set up for automated code reviews on this repo.

…03e6 Bumps the TensorRT-LLM DeepSeek-V4-Pro image for dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6. The -mtp variants are intentionally left on feat-deepseek_v4-9aa3715. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T21:27:08Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please first open a PR against those official recipes before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs usually fixes them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T21:27:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26783090679
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26783090679

claude · 2026-06-01T21:30:43Z

+- config-keys:
+    - dsv4-fp4-b200-trt
+    - dsv4-fp4-b300-trt
+  description:
+    - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🟡 The new perf-changelog.yaml entry leaves the pr-link as the unfilled template placeholder https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. It should be pull/1636 to match the actual PR number and the convention of every preceding entry, otherwise the pr-link is broken for both humans and any tooling that consumes the changelog.

Extended reasoning...

What the bug is. The diff appends a new entry to perf-changelog.yaml (lines 3358-3363) for the DSv4 TRT image bump on dsv4-fp4-b200-trt and dsv4-fp4-b300-trt. The last field of that entry is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. XXXX is the literal template placeholder — it was never filled in with the actual PR number.

Why this is a real issue. Every other recent entry in the same file follows the convention of using the real PR number — the five entries immediately above this one link to pull/1602, pull/1624, pull/1616, pull/1626, and pull/1630 respectively. The PR metadata for this change shows it is PR #1636, so the value should be https://github.com/SemiAnalysisAI/InferenceX/pull/1636. With XXXX left in place, the link does not resolve to any PR, breaking the traceability that the pr-link field exists to provide.

Impact. This does not affect the actual image bump or any sweep behavior — the runtime is unchanged. The damage is to the changelog's documentation/audit value: anyone trying to find the originating PR for these two config-key changes from the changelog hits a dead 404, and any tooling that parses pr-link (e.g., to cross-link sweep results back to PRs, generate release notes, or validate entries) will either fail or produce a broken link.

Why existing checks didn't prevent it. There appears to be no schema validation that rejects XXXX as a PR number; the YAML is valid syntactically and the link is a syntactically valid URL — it just points nowhere meaningful. The placeholder is the kind of thing only a reviewer or a numeric-PR-id linter would catch.

Fix. Replace the placeholder with the real PR number:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636

Step-by-step proof.

Open perf-changelog.yaml at line 3363.

Observe the literal line: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

Compare to the five entries immediately above (lines ending around 3332, 3338, 3344, 3350, 3356), which read pull/1602, pull/1624, pull/1616, pull/1626, pull/1630 — all real PR numbers.

Check the PR metadata in this review: PR number is 1636.

Click (or curl) https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX — it does not resolve to a PR. Click https://github.com/SemiAnalysisAI/InferenceX/pull/1636 — it resolves to this PR. The placeholder thus makes the field useless for its stated purpose.
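The review notes that only a reviewer or a numeric-PR-id linter would catch this placeholder. Below is a minimal Python sketch of such a check. It assumes the changelog keeps one `pr-link:` per line, as in the entry quoted above. `find_bad_pr_links` and its regexes are hypothetical, not an existing InferenceX tool. The sketch scans raw text with regular expressions instead of a YAML parser, so it needs only the standard library.

```python
import re
from typing import List, Tuple

# Matches a 'pr-link:' field line and captures its value.
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A valid value must end in a numeric PR id (so 'pull/XXXX' is rejected).
VALID_PR_URL_RE = re.compile(r"^https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+$")


def find_bad_pr_links(changelog_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, url) for every pr-link that is not a numeric PR URL."""
    bad = []
    for lineno, line in enumerate(changelog_text.splitlines(), start=1):
        match = PR_LINK_RE.match(line)
        if match and not VALID_PR_URL_RE.match(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad


sample = (
    "- config-keys:\n"
    "    - dsv4-fp4-b300-trt\n"
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1630\n"
    "- config-keys:\n"
    "    - dsv4-fp4-b200-trt\n"
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX\n"
)
print(find_bad_pr_links(sample))
```

If a CI step ran this against the file contents, any non-empty result would fail the check, and the XXXX placeholder would be reported together with its line number.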

functionstackx · 2026-06-01T21:47:50Z


 dsv4-fp4-b200-trt:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715
+  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6


is there any official nvidia RC that works...

Image is from dsv4 branch: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/deepseek_v4

Main dsv4 failing DPA: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26786937394

github-actions · 2026-06-01T22:33:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26783097365
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26783097365

github-actions · 2026-06-01T22:33:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786056973
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786056973

github-actions · 2026-06-01T22:37:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786107993
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786107993

… (non-MTP) Swap dsv4-fp4-b200-trt and dsv4-fp4-b300-trt from the custom ghcr.io semianalysis feat/deepseek_v4 build to the official nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 to test whether the official RC can serve DeepSeek-V4-Pro. The -mtp variants are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T01:18:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786937394
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786937394

…on-MTP) The official nvcr.io tensorrt-llm/release:1.3.0rc15.post1 loads DSv4-Pro but its DP-attention path deadlocks/crashes under concurrent load (every dpa=true job hung or failed; only pure-TP conc-1 points passed). Revert to the stable custom build until upstream fixes DSv4 + attention-DP (NVIDIA/TensorRT-LLM#13431). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 4bc5592.

github-actions · 2026-06-02T06:57:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26803566770
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26803566770

github-actions · 2026-06-02T08:01:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26803566770
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26803566770

Bump dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d (TRT-LLM feat/deepseek_v4 @ 084cf2ba + kv_cache_manager_v2 fix). This resolves the engine crash on attention-DP context/generation reverts at high concurrency (the b300 8k1k conc>=512 "LLM is shutting down" hang). The -mtp variants stay on feat-deepseek_v4-9aa3715. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T09:40:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26803566770
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26803566770

github-actions · 2026-06-02T09:41:10Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26811531104
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26811531104

github-actions · 2026-06-02T12:54:25Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26811681728
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26811681728

…-2dd03e6

…reuse The c914d6d image's kv_cache_manager_v2 patch was wrong: freeing SWA scratch slots on the attention-DP revert->resize(shrink) path hits finish_event=None (a deferred request never forwarded), crashing every dpa=true job and hanging the engine. Root cause is a V2-scheduler / SWA-scratch-reuse conflict: the V2 scheduler grows a context request's KV cache (incl. SWA scratch) before delay batching can defer it, so revert_allocate_context -> resize(shrink) must release scratch slots that have no finish_event. Revert both non-MTP images to feat-deepseek_v4-2dd03e6 and set TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in the launchers so no scratch slots are allocated and the revert shrinks cleanly. MTP configs untouched (9aa3715). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
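The failure sequence in this commit message can be modelled with a toy allocator. This is a hypothetical Python sketch, not TRT-LLM code. `ToyRequest`, `ToyKVCache` and `simulate_deferred_revert` are illustrative names, and the real kv_cache_manager_v2 logic is not shown in this PR. The sketch only encodes the ordering the commit describes. The KV cache, plus an SWA scratch slot when reuse is on, is grown before delay batching defers the request. The revert/shrink path must then release a scratch slot whose finish_event is still None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a context request; not a TRT-LLM class."""
    kv_blocks: int = 0
    scratch_slots: List[int] = field(default_factory=list)
    # Only set once the request is actually forwarded; a deferred request keeps None.
    finish_event: Optional[object] = None


class ToyKVCache:
    """Toy allocator modelling the V2-scheduler / SWA-scratch-reuse conflict."""

    def __init__(self, enable_swa_scratch_reuse: bool, num_scratch_slots: int = 4):
        self.enable_swa_scratch_reuse = enable_swa_scratch_reuse
        self.free_scratch = list(range(num_scratch_slots))

    def grow(self, req: ToyRequest, blocks: int) -> None:
        # The scheduler grows the KV cache (and, with reuse on, takes a scratch slot)
        # before delay batching decides whether to defer the request.
        req.kv_blocks += blocks
        if self.enable_swa_scratch_reuse:
            req.scratch_slots.append(self.free_scratch.pop())

    def shrink_on_revert(self, req: ToyRequest) -> None:
        # Revert path: give back everything the request was granted.
        for slot in req.scratch_slots:
            # Toy version of the bug: scratch release assumes a finish_event exists.
            if req.finish_event is None:
                raise RuntimeError("releasing SWA scratch slot with finish_event=None")
            self.free_scratch.append(slot)
        req.scratch_slots.clear()
        req.kv_blocks = 0


def simulate_deferred_revert(enable_swa_scratch_reuse: bool) -> str:
    """Grow a request, defer it (never forwarded), then revert; report the outcome."""
    cache = ToyKVCache(enable_swa_scratch_reuse)
    req = ToyRequest()
    cache.grow(req, blocks=4)
    try:
        cache.shrink_on_revert(req)  # finish_event is still None at this point
    except RuntimeError:
        return "crash"
    return "clean"


print(simulate_deferred_revert(True), simulate_deferred_revert(False))
```

With scratch reuse enabled, the toy revert raises, which mirrors the crash. With reuse disabled, no scratch slot is ever handed out, so the shrink completes. That is the reasoning behind setting TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in the launchers.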

github-actions · 2026-06-02T19:59:36Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-02T22:17:01Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-03T00:09:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-03T19:13:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

…-2dd03e6

B200 reverts to feat-deepseek_v4-9aa3715: the 2dd03e6 image OOMs on B200's smaller HBM at conc-256 once SWA scratch reuse is disabled. Only B300 moves to 2dd03e6 + TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in its launcher. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T21:14:24Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26912996470
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26912996470

…TP + MTP) Point dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp at the official nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 (from the custom feat/deepseek_v4 builds 2dd03e6 / 9aa3715) and drop the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround so the official image runs with native behavior. B200 TRT unchanged (9aa3715). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T23:41:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26914210927
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26914210927

github-actions · 2026-06-04T06:07:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26914210927
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26914210927

Oseltamivir requested a review from a team June 1, 2026 21:27

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners June 1, 2026 21:27

github-project-automation Bot added this to InferenceMAX Board Jun 1, 2026

Oseltamivir added the full-sweep-enabled label Jun 1, 2026

Backfill PR number in changelog pr-link

6b7558c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

claude Bot reviewed Jun 1, 2026


functionstackx reviewed Jun 1, 2026


functionstackx requested changes Jun 1, 2026


Oseltamivir added sweep-enabled and removed full-sweep-enabled labels Jun 1, 2026

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

bd3c94c

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6~~ Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP) Jun 1, 2026

Oseltamivir changed the title ~~Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP)~~ Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6 Jun 2, 2026

cursor Bot reviewed Jun 2, 2026


Comment thread .github/configs/nvidia-master.yaml Outdated

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

1b0afeb

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6~~ Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d) Jun 2, 2026

Oseltamivir and others added 2 commits June 2, 2026 12:32

Merge remote-tracking branch 'origin/main' into update-dsv4-trt-image…

242ab88

…-2dd03e6

Oseltamivir force-pushed the update-dsv4-trt-image-2dd03e6 branch from 1f70cac to e23a541 Compare June 2, 2026 19:34

Oseltamivir and others added 2 commits June 3, 2026 14:03

Merge remote-tracking branch 'origin/main' into update-dsv4-trt-image…

6118a76

…-2dd03e6

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d)~~ Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse Jun 3, 2026

Oseltamivir added full-sweep-enabled and removed sweep-enabled labels Jun 3, 2026

Oseltamivir changed the title ~~Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse~~ Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) Jun 3, 2026
