
Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP)#1636

Open
Oseltamivir wants to merge 12 commits into main from update-dsv4-trt-image-2dd03e6


@Oseltamivir Oseltamivir commented Jun 1, 2026

Points both B300 DSv4 TRT configs at the official NVIDIA release image and adds the MTP sibling to the sweep:

  • dsv4-fp4-b300-trt (non-MTP): feat-deepseek_v4-2dd03e6 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
  • dsv4-fp4-b300-trt-mtp (MTP): feat-deepseek_v4-9aa3715 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
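
The image references in this thread appear in two spellings: `nvcr.io#nvidia/...` in the config and commit messages, and `nvcr.io/nvidia/...` in the review summary. The following is a minimal sketch for comparing them, assuming the `#` only separates the registry host from the image path (as in enroot-style image URIs; the PR itself does not state this). The helper name `to_docker_ref` is hypothetical and not part of the repo.

```python
def to_docker_ref(image: str) -> str:
    """Rewrite 'registry#path:tag' as 'registry/path:tag'.

    Assumption: '#' is only a registry/path separator. References that
    contain no '#' are returned unchanged.
    """
    registry, sep, rest = image.partition("#")
    if not sep:
        return image
    return f"{registry}/{rest}"


# Both B300 configs in this PR point at the same official release image.
official = to_docker_ref("nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1")
print(official)
```

With this normalization, the `#` and `/` spellings of the official image compare equal as plain strings.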

This drops the custom ghcr.io semianalysis feat/deepseek_v4 builds in favor of the official RC, to evaluate whether the official image can serve DeepSeek-V4-Pro (non-MTP and MTP). The non-MTP launcher's TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 workaround (specific to the custom build) is removed so the official image runs with its native behavior, matching the MTP launcher which never had it.

Known risk

A prior run of 1.3.0rc15.post1 with attention-DP (dpa=true) served a couple of iterations and then crashed with CUDA_ERROR_ILLEGAL_ADDRESS in kv_cache_manager.free_resources (run 26786937394) — a different failure from the custom build's SWA-scratch-revert crash. So dpa=true jobs may still fail on the official image; the pure-TP (dpa=false) cases are more likely to pass. MTP on the official RC is untested. This sweep is what tells us where it stands.

Scope

B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715); its OOM follow-up is tracked separately.

🤖 Generated with Claude Code


Note

Medium Risk
Benchmark-only image swap for a large model on TRT-LLM; official RC may still hit known CUDA failures with dp-attn, affecting sweep stability rather than production services.

Overview
Switches B300 DeepSeek-V4-Pro FP4 TensorRT-LLM benchmark configs dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp from custom ghcr.io/semianalysisai/trtllm-deepseek-v4 feature builds to the official NVIDIA image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1, so sweeps evaluate the official RC for both standard and MTP runs.

Documents the change in perf-changelog.yaml (including dropping the custom-build-only TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround for the non-MTP path). B200 TRT configs are not changed in this diff.

Reviewed by Cursor Bugbot for commit ad529fb. Bugbot is set up for automated code reviews on this repo.

…03e6

Bumps the TensorRT-LLM DeepSeek-V4-Pro image for dsv4-fp4-b200-trt and
dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6.
The -mtp variants are intentionally left on feat-deepseek_v4-9aa3715.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 1, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 1, 2026

Comment thread perf-changelog.yaml Outdated
Comment on lines +3358 to +3363
    - config-keys:
        - dsv4-fp4-b200-trt
        - dsv4-fp4-b300-trt
      description:
        - "Update the TensorRT-LLM DeepSeek-V4-Pro image to ghcr.io/semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6"
      pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

🟡 The new perf-changelog.yaml entry leaves the pr-link as the unfilled template placeholder https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. It should be pull/1636 to match the actual PR number and the convention of every preceding entry, otherwise the pr-link is broken for both humans and any tooling that consumes the changelog.

Extended reasoning...

What the bug is. The diff appends a new entry to perf-changelog.yaml (lines 3358-3363) for the DSv4 TRT image bump on dsv4-fp4-b200-trt and dsv4-fp4-b300-trt. The last field of that entry is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. XXXX is the literal template placeholder — it was never filled in with the actual PR number.

Why this is a real issue. Every other recent entry in the same file follows the convention of using the real PR number — the five entries immediately above this one link to pull/1602, pull/1624, pull/1616, pull/1626, and pull/1630 respectively. The PR metadata for this change shows it is PR #1636, so the value should be https://github.com/SemiAnalysisAI/InferenceX/pull/1636. With XXXX left in place, the link does not resolve to any PR, breaking the traceability that the pr-link field exists to provide.

Impact. This does not affect the actual image bump or any sweep behavior — the runtime is unchanged. The damage is to the changelog's documentation/audit value: anyone trying to find the originating PR for these two config-key changes from the changelog hits a dead 404, and any tooling that parses pr-link (e.g., to cross-link sweep results back to PRs, generate release notes, or validate entries) will either fail or produce a broken link.

Why existing checks didn't prevent it. There appears to be no schema validation that rejects XXXX as a PR number; the YAML is valid syntactically and the link is a syntactically valid URL — it just points nowhere meaningful. The placeholder is the kind of thing only a reviewer or a numeric-PR-id linter would catch.

Fix. Replace the placeholder with the real PR number:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636

Step-by-step proof.

  1. Open perf-changelog.yaml at line 3363.
  2. Observe the literal line: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. Compare to the five entries immediately above (lines ending around 3332, 3338, 3344, 3350, 3356), which read pull/1602, pull/1624, pull/1616, pull/1626, pull/1630 — all real PR numbers.
  4. Check the PR metadata in this review: PR number is 1636.
  5. Click (or curl) https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX — it does not resolve to a PR. Click https://github.com/SemiAnalysisAI/InferenceX/pull/1636 — it resolves to this PR. The placeholder thus makes the field useless for its stated purpose.
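
The numeric-PR-id lint mentioned above could be sketched roughly as follows. This is an illustrative sketch, not an existing check in the repo: the function name `find_bad_pr_links` is hypothetical, and the entries are given as plain dictionaries on the assumption that perf-changelog.yaml has already been parsed into a list of mappings by a YAML loader.

```python
import re

# A pr-link is accepted only if it ends in a purely numeric PR id.
PR_LINK = re.compile(r"^https://github\.com/SemiAnalysisAI/InferenceX/pull/(\d+)$")


def find_bad_pr_links(entries):
    """Return (index, link) pairs for entries whose pr-link is missing
    or does not end in a numeric PR id."""
    bad = []
    for index, entry in enumerate(entries):
        link = entry.get("pr-link") or ""
        if not PR_LINK.match(str(link)):
            bad.append((index, link))
    return bad


# Two hand-written entries standing in for the parsed changelog.
entries = [
    {
        "config-keys": ["dsv4-fp4-b300-trt"],
        "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/1636",
    },
    {
        "config-keys": ["dsv4-fp4-b200-trt", "dsv4-fp4-b300-trt"],
        "pr-link": "https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX",
    },
]

for index, link in find_bad_pr_links(entries):
    print(f"entry {index}: invalid pr-link {link!r}")
```

A check of this shape only confirms that the link is well-formed; it does not confirm that the number matches the PR that introduced the entry.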

Comment thread .github/configs/nvidia-master.yaml Outdated

     dsv4-fp4-b200-trt:
    -  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715
    +  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6

is there any official nvidia RC that works...


… (non-MTP)

Swap dsv4-fp4-b200-trt and dsv4-fp4-b300-trt from the custom
ghcr.io semianalysis feat/deepseek_v4 build to the official
nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 to test whether the
official RC can serve DeepSeek-V4-Pro. The -mtp variants are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Oseltamivir Oseltamivir changed the title from "Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6" to "Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP)" on Jun 1, 2026

…on-MTP)

The official nvcr.io tensorrt-llm/release:1.3.0rc15.post1 loads DSv4-Pro but
its DP-attention path deadlocks/crashes under concurrent load (every dpa=true
job hung or failed; only pure-TP conc-1 points passed). Revert to the stable
custom build until upstream fixes DSv4 + attention-DP (NVIDIA/TensorRT-LLM#13431).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Oseltamivir Oseltamivir changed the title from "Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP)" to "Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6" on Jun 2, 2026

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4bc5592.

Comment thread .github/configs/nvidia-master.yaml Outdated

Bump dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to
ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d
(TRT-LLM feat/deepseek_v4 @ 084cf2ba + kv_cache_manager_v2 fix). This resolves
the engine crash on attention-DP context/generation reverts at high concurrency
(the b300 8k1k conc>=512 "LLM is shutting down" hang). The -mtp variants stay on
feat-deepseek_v4-9aa3715.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Oseltamivir Oseltamivir changed the title from "Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6" to "Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d)" on Jun 2, 2026

Oseltamivir and others added 2 commits June 2, 2026 12:32
…reuse

The c914d6d image's kv_cache_manager_v2 patch was wrong: freeing SWA scratch
slots on the attention-DP revert->resize(shrink) path hits finish_event=None
(a deferred request never forwarded), crashing every dpa=true job and hanging
the engine. Root cause is a V2-scheduler / SWA-scratch-reuse conflict: the V2
scheduler grows a context request's KV cache (incl. SWA scratch) before delay
batching can defer it, so revert_allocate_context -> resize(shrink) must release
scratch slots that have no finish_event.

Revert both non-MTP images to feat-deepseek_v4-2dd03e6 and set
TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in the launchers so no scratch slots are
allocated and the revert shrinks cleanly. MTP configs untouched (9aa3715).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
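
The workaround described in this commit amounts to exporting one environment variable before the server starts. The following is a minimal sketch of that idea, assuming the launcher can decide from the image tag alone. The environment variable name and the image tags come from this thread; the function `build_launcher_env`, the `WORKAROUND_TAGS` set, and the tag-based rule are hypothetical, and the real launcher scripts are not shown here.

```python
import os

SWA_REUSE_VAR = "TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE"

# Hypothetical rule: only the custom 2dd03e6 build needs scratch reuse disabled.
WORKAROUND_TAGS = {"feat-deepseek_v4-2dd03e6"}


def build_launcher_env(image, base_env=None):
    """Return a copy of the environment for the server process, with SWA
    scratch reuse disabled when the image tag is in WORKAROUND_TAGS."""
    env = dict(os.environ if base_env is None else base_env)
    tag = image.rpartition(":")[2]  # text after the last ':' in the reference
    if tag in WORKAROUND_TAGS:
        env[SWA_REUSE_VAR] = "0"
    return env


custom = build_launcher_env(
    "ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6", {}
)
other = build_launcher_env(
    "ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715", {}
)
print(custom.get(SWA_REUSE_VAR), other.get(SWA_REUSE_VAR))
```

Keying the rule on the tag keeps the MTP image (9aa3715) untouched, as this commit intends, and leaves any other image on its default behavior.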
@Oseltamivir Oseltamivir force-pushed the update-dsv4-trt-image-2dd03e6 branch from 1f70cac to e23a541 Compare June 2, 2026 19:34

Oseltamivir and others added 2 commits June 3, 2026 14:03
B200 reverts to feat-deepseek_v4-9aa3715: the 2dd03e6 image OOMs on B200's
smaller HBM at conc-256 once SWA scratch reuse is disabled. Only B300 moves
to 2dd03e6 + TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in its launcher.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Oseltamivir Oseltamivir changed the title from "Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d)" to "Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse" on Jun 3, 2026

…TP + MTP)

Point dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp at the official
nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 (from the custom
feat/deepseek_v4 builds 2dd03e6 / 9aa3715) and drop the
TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround so the official
image runs with native behavior. B200 TRT unchanged (9aa3715).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Oseltamivir Oseltamivir changed the title from "Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse" to "Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP)" on Jun 3, 2026