[TRTLLM-12128][feat] enable SageAttention for Wan/FLUX (new commits) #13570
xrq-phys wants to merge 12 commits into NVIDIA:main from
Conversation
/bot run
PR_Github #45971 [ run ] triggered by Bot. Commit:
📝 Walkthrough
This pull request introduces SageAttention support to the visual generation pipeline. Changes include adding a new
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py (1)
1-1: ⚠️ Potential issue | 🟠 Major: Update the copyright year in this modified file.
The header still says 2025, but this file now has 2026 changes.
As per coding guidelines, "Include NVIDIA copyright header on all new files; update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py` at line 1, the file still contains a 2025 NVIDIA copyright header; update the header year to 2026 in the top-of-file comment (the SPDX/copyright line) so the file reflects the current modification year.
tests/unittest/_torch/visual_gen/test_attention_perf.py (1)
1-1: ⚠️ Potential issue | 🟠 Major: Update the copyright year in this modified file.
The header still says 2025, but this file now has 2026 changes.
As per coding guidelines, "Include NVIDIA copyright header on all new files; update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/visual_gen/test_attention_perf.py` at line 1, Update the SPDX copyright header year from 2025 to 2026 in the file by editing the top-of-file header line that currently reads "SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved." to reflect 2026 so the modified file's header matches the current changes.
🧹 Nitpick comments (1)
tests/unittest/_torch/visual_gen/test_attention_integration.py (1)
693-701: QA integration test-list updates are not needed for this change. This PR touches tests/unittest/... coverage; no tests/integration/defs/... additions were made, so QA scheduled-list changes are unnecessary here. As per coding guidelines: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/visual_gen/test_attention_integration.py` around lines 693 - 701, This change only updates unit tests under tests/unittest/_torch/visual_gen/test_attention_integration.py (the loop invoking test_sage_attention_self_attention with results[label]) and does not add any integration defs, so explicitly state that QA integration list updates are unnecessary by adding a clear one-line comment or PR note near the test block (referencing test_sage_attention_self_attention and the results dict/label loop) that says "QA integration test-list updates are unnecessary for this PR" to satisfy the coding guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/visual_gen/README.md`:
- Around line 91-100: The README currently shows a SageAttention usage example
for WAN (visual_gen_wan_t2v.py with --enable_sage_attention and
--attention_backend TRTLLM) but the PR also adds FLUX support; update the docs
to either (A) add a FLUX-specific SageAttention example/row mirroring the WAN
example (use the FLUX model name, appropriate script or --attention_backend
value for FLUX and include --enable_sage_attention), or (B) explicitly state
that SageAttention is WAN-only in the example and the support table; modify the
example block and the support table entries consistently so they reflect the
chosen scope.
In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Line 241: The API exposes seq_len_kv but the non-Sage path in
trtllm.py/_concat_qkv (and surrounding logic where seq_len_kv is accepted) only
supports self-attention; add a guard that if seq_len_kv is not None and
seq_len_kv != seq_len you raise a clear ValueError (or NotImplementedError)
explaining cross-attention is unsupported yet, referencing the parameter name
seq_len_kv and the helper _concat_qkv so callers (e.g., attention module) get a
deterministic failure instead of an obscure reshape error; apply the same check
in the other related block(s) around lines 268-295 where seq_len_kv is used.
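A minimal sketch of that guard; the helper name below is hypothetical, while seq_len, seq_len_kv, and _concat_qkv are the names used in the comment above:

```python
from typing import Optional

def _validate_self_attention_seq_lens(seq_len: int, seq_len_kv: Optional[int]) -> None:
    """Fail fast when a caller requests cross-attention on the non-Sage path."""
    if seq_len_kv is not None and seq_len_kv != seq_len:
        raise NotImplementedError(
            f"seq_len_kv={seq_len_kv} != seq_len={seq_len}: cross-attention is "
            "not supported by this path yet (see _concat_qkv)."
        )
```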
- Around line 271-274: Replace the inline assert in the Sage attention input
check with an explicit input-validation exception: in the method where
self.sage_attention_config is checked (the block that currently does "assert k
is not None and v is not None, ('SageAttention requires separate Q, K, V
tensors')"), change it to explicitly test "if k is None or v is None:" and raise
a ValueError("SageAttention requires separate Q, K, V tensors") so the invalid
inputs fail deterministically at this point (before any reshape or downstream
ops).
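A sketch of the suggested replacement, wrapped in a hypothetical helper so it is self-contained; the message is taken from the original assert:

```python
from typing import Optional

import torch

def _require_separate_qkv(k: Optional[torch.Tensor], v: Optional[torch.Tensor]) -> None:
    # Explicit validation still runs under `python -O`, unlike an assert.
    if k is None or v is None:
        raise ValueError("SageAttention requires separate Q, K, V tensors")
```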
In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py`:
- Around line 114-119: The code currently forwards
attention_config.sage_attention_config into kwargs before checking backend,
which can pass TRTLLM-only args to other backends; modify the logic so that
sage_attention_config is only added to kwargs when backend.upper() == "TRTLLM"
and attention_config is not None and attention_config.sage_attention_config is
not None (i.e., move or wrap the existing kwargs["sage_attention_config"]
assignment behind the backend check), referencing
attention_config.sage_attention_config, kwargs, and backend to locate and fix
the spot.
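A sketch of the gated forwarding, again as a hypothetical helper; backend, attention_config, and kwargs follow the names in the comment:

```python
def _maybe_forward_sage_config(backend: str, attention_config, kwargs: dict) -> None:
    """Forward sage_attention_config only when the TRTLLM backend will consume it."""
    if (
        backend.upper() == "TRTLLM"
        and attention_config is not None
        and attention_config.sage_attention_config is not None
    ):
        kwargs["sage_attention_config"] = attention_config.sage_attention_config
```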
In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 121-124: The error text in the raise ValueError uses misleading
wording that claims a fallback ("Fallback to non-SageAttention TRTLLM
attention") even though the code actually raises an exception; update the
message in the raise ValueError in the block referencing
self.sage_attention_config to accurately state that the configuration is
unsupported and that no fallback will occur (e.g., "Unsupported
{self.sage_attention_config}; cannot fallback to non-SageAttention"), and ensure
the message includes the self.sage_attention_config value for clarity.
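A drop-in sketch of the reworded raise, to be placed in the block that checks self.sage_attention_config:

```python
raise ValueError(
    f"Unsupported sage_attention_config: {self.sage_attention_config!r}; "
    "cannot fall back to non-SageAttention TRTLLM attention."
)
```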
- Around line 54-80: The SageAttentionConfig class currently inherits from
BaseModel which permits extra fields; change it to inherit from StrictBaseModel
to enforce extra='forbid' and fail fast on unknown fields: update the class
definition for SageAttentionConfig to extend StrictBaseModel (and add/import
StrictBaseModel where necessary), keeping all existing PydanticField attributes
(num_elts_per_blk_q, num_elts_per_blk_k, num_elts_per_blk_v, qk_int8) unchanged.
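A sketch of the strict-model pattern; the repo is said to already provide StrictBaseModel, so the stand-in below only shows the equivalent effect, and the default values are placeholders based on the (1,16,1) preset mentioned elsewhere in this thread:

```python
from pydantic import BaseModel, ConfigDict

class StrictBaseModel(BaseModel):
    """Stand-in for the repo's StrictBaseModel: unknown fields fail fast."""
    model_config = ConfigDict(extra="forbid")

class SageAttentionConfig(StrictBaseModel):
    # Field names from the review comment; defaults are illustrative.
    num_elts_per_blk_q: int = 1
    num_elts_per_blk_k: int = 16
    num_elts_per_blk_v: int = 1
    qk_int8: bool = True

# With extra="forbid", a typo now raises a ValidationError instead of being
# silently accepted:
# SageAttentionConfig(num_elts_per_blk_qq=1)  # -> pydantic.ValidationError
```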
In `@tensorrt_llm/_torch/visual_gen/modules/attention.py`:
- Around line 311-327: The code incorrectly computes seq_len and seq_len_kv
using q.shape[1], which is the number of heads for HND layout; update the
computation to derive sequence length from the correct dimension after reshape
(use q.shape[2] for HND layout) and likewise compute seq_len_kv from k.shape[2]
when k is not None; then pass those corrected seq_len and seq_len_kv values in
the kwargs to self.attn.forward (adjust logic around seq_len = q.shape[...] and
seq_len_kv = k.shape[...] to handle both HND and other layouts consistently).
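A drop-in sketch of the corrected indexing, assuming HND layout where tensors are (batch, num_heads, seq_len, head_dim); the k-is-None fallback is an assumption:

```python
# Dim 1 is the head count in HND layout; the sequence length lives in dim 2.
seq_len = q.shape[2]
# Assumed fallback: self-attention length when K is fused into the QKV tensor.
seq_len_kv = k.shape[2] if k is not None else seq_len
```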
In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 291-387: The test test_sage_attention_self_attention parametrizes
seq_len with very large values (e.g., 16384, 32760) causing CI timeouts/OOM;
change the parametrization to a small "fast" set for normal unit runs (e.g.,
[256, 512, 1024]) and move the heavy shapes into a separate gated marker (e.g.,
pytest.mark.slow or a new param set gated by pytest.config option) so the heavy
runs are only executed when requested; update the decorator around
test_sage_attention_self_attention and reference the same
SageAttentionConfig/Attention/NaiveWanSelfAttention flow so the logic and
weight-copying (copy_weights_self_attention) remain unchanged.
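A sketch of the split parametrization using pytest.mark.slow as the gate; this assumes a "slow" marker is registered in the pytest config, and the fast/heavy values follow the comment:

```python
import pytest

FAST_SEQ_LENS = [256, 512, 1024]
HEAVY_SEQ_LENS = [16384, 32760]  # only run when slow tests are requested

@pytest.mark.parametrize(
    "seq_len",
    FAST_SEQ_LENS
    + [pytest.param(s, marks=pytest.mark.slow) for s in HEAVY_SEQ_LENS],
)
def test_sage_attention_self_attention(seq_len):
    ...  # existing SageAttentionConfig / NaiveWanSelfAttention flow unchanged
```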
In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 947-1038: The tests only assert timing results but not that the
SageAttention execution path ran; update the production benchmarking path to
emit an observable flag (e.g., add a field like "used_sage" or "exec_path" to
the dict returned by _bench and benchmark.benchmark_single when the
SageAttention code path is taken) and then assert that flag in
test_sage_runs_and_times, test_sage_vs_vanilla_quick, and
test_sage_vs_trtllm_wan_shapes (use the existing result variables sage, vanilla,
trtllm and the helper _bench/benchmark_single symbols to locate changes); ensure
the runtime sets that flag inside the SageAttention dispatch/implementation code
so the tests fail if code silently falls back to TRTLLM.
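A sketch of the observable-flag idea; _bench, exec_path, and last_exec_path are hypothetical names following the comment:

```python
import time

def _bench(attn_module, run_fn) -> dict:
    """Times run_fn and records which attention path actually executed."""
    t0 = time.perf_counter()
    run_fn()
    elapsed_ms = (time.perf_counter() - t0) * 1e3
    return {
        "mean_ms": elapsed_ms,
        # Assumed attribute set by the SageAttention dispatch; lets tests
        # detect a silent fallback to the vanilla TRTLLM path.
        "exec_path": getattr(attn_module, "last_exec_path", "unknown"),
    }

# Usage in the tests (names follow the comment above):
# sage = _bench(sage_attn, lambda: sage_attn(q, k, v))
# assert sage["exec_path"] == "sage", "SageAttention path did not run"
```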
- Around line 897-910: Change the mutable lists WAN_SEQ_LENS and QUICK_SEQ_LENS
to immutable tuples (replace [...] with (...)) and update the three zip calls
that pair _SAGE_CONFIGS with _SAGE_CONFIG_IDS to use zip(..., strict=True) so
mismatched lengths raise errors; specifically modify the zip invocations that
iterate over _SAGE_CONFIGS and _SAGE_CONFIG_IDS (the three places where those
two names are zipped) to pass strict=True.
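A sketch of both hardening changes; the values below are illustrative placeholders, not the real WAN shapes or Sage configs:

```python
# Immutable module-level constants instead of mutable lists.
WAN_SEQ_LENS = (16384, 32760)
QUICK_SEQ_LENS = (256, 1024)

_SAGE_CONFIGS = ({"qk_int8": True}, {"qk_int8": False})  # placeholders
_SAGE_CONFIG_IDS = ("int8", "fp16")                      # placeholders

# strict=True (Python 3.10+) raises ValueError on a length mismatch instead
# of silently truncating to the shorter sequence.
for cfg, cfg_id in zip(_SAGE_CONFIGS, _SAGE_CONFIG_IDS, strict=True):
    print(cfg_id, cfg)
```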
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2e0d8f56-e8da-4005-8f3a-f81b82bfd310
📒 Files selected for processing (15)
- cpp/tensorrt_llm/thop/attentionOp.cpp
- examples/visual_gen/README.md
- examples/visual_gen/visual_gen_flux.py
- examples/visual_gen/visual_gen_ltx2.py
- examples/visual_gen/visual_gen_wan_i2v.py
- examples/visual_gen/visual_gen_wan_t2v.py
- tensorrt_llm/_torch/attention_backend/trtllm.py
- tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
- tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
- tensorrt_llm/_torch/visual_gen/config.py
- tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2.py
- tensorrt_llm/_torch/visual_gen/modules/attention.py
- tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_sage_attention.py
- tests/unittest/_torch/visual_gen/test_attention_integration.py
- tests/unittest/_torch/visual_gen/test_attention_perf.py
PR_Github #45971 [ run ] completed with state
83d8984 to b7be394 (force-pushed)
/bot run
PR_Github #46052 [ run ] triggered by Bot. Commit:
PR_Github #46052 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #46095 [ run ] triggered by Bot. Commit:
PR_Github #46095 [ run ] completed with state
/bot run
PR_Github #46180 [ run ] triggered by Bot. Commit:
b7be394 to 276a366 (force-pushed)
I rebased to pick up #13598, which is needed to fix the debug build failures.
/bot run
def _sage_attention_config_for_model(model_path: str) -> tuple[dict, str]:
    """INT8 Q/K Sage preset: (1,4,1) for Wan2.1-I2V, else (1,16,1)."""
This is interesting – it seems I2V is more sensitive to attn accuracy. Is that right?
Tbh I haven't actually compared I2V against T2V.
Overall, smaller models tend to pose higher precision requirements. From my perspective, the special handling should detect Wan2.1 1.3B instead of Wan2.1 I2V.
Perhaps @o-stoner implemented this only for I2V based on actual tests. I'll have to verify with real runs later.
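For reference, a hypothetical reconstruction of the helper from the quoted signature and docstring; the dict keys reuse the SageAttentionConfig field names from the review, and the model-detection string is an assumption:

```python
def _sage_attention_config_for_model(model_path: str) -> tuple[dict, str]:
    """INT8 Q/K Sage preset: (1,4,1) for Wan2.1-I2V, else (1,16,1)."""
    if "Wan2.1-I2V" in model_path:  # assumed detection heuristic
        return {
            "num_elts_per_blk_q": 1,
            "num_elts_per_blk_k": 4,
            "num_elts_per_blk_v": 1,
            "qk_int8": True,
        }, "Wan2.1-I2V"
    return {
        "num_elts_per_blk_q": 1,
        "num_elts_per_blk_k": 16,
        "num_elts_per_blk_v": 1,
        "qk_int8": True,
    }, "default"
```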
| "backend": args.attention_backend, | ||
| } | ||
| if args.enable_sage_attention: | ||
| sage_cfg, sage_preset = _sage_attention_config_for_model(args.model_path) |
There was a problem hiding this comment.
What does sage_preset stand for? Is sage_cfg alone not enough?
It seems every model has a recommended sage config. Can we relocate this into the model's default settings instead of scattering it across the examples?
These are preset names (Wan2.1-I2V or default).
Looks like AI code style again.
Sorry, I wanted to first check whether all features were functioning and haven't carefully reviewed these lines yet. 🙇
PR_Github #46196 [ run ] triggered by Bot. Commit:
PR_Github #46180 [ run ] completed with state
PR_Github #46196 [ run ] completed with state
/bot run
PR_Github #46221 [ run ] triggered by Bot. Commit:
PR_Github #46221 [ run ] completed with state
/bot run
ed2a1b6 to 90509d3 (force-pushed)
/bot run
PR_Github #46422 [ run ] triggered by Bot. Commit:
PR_Github #46422 [ run ] completed with state
90509d3 to 1f031bc (force-pushed)
/bot run --add-multi-gpu-test
PR_Github #46466 [ run ] triggered by Bot. Commit:
PR_Github #46466 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #46474 [ run ] triggered by Bot. Commit:
PR_Github #46474 [ run ] completed with state
Signed-off-by: Olivia Stoner <245287810+o-stoner@users.noreply.github.com>
Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>
1f031bc to ca40211 (force-pushed)
/bot run --add-multi-gpu-test
PR_Github #46488 [ run ] triggered by Bot. Commit:
Summary by CodeRabbit
New Features
- --enable_sage_attention flag (requires TRTLLM attention backend).
Documentation
Tests
Description
Copied from #13425 with new commits to fix CI failures.
Integrate SageAttention kernels from #12937 into VisualGen for Wan and FLUX.2 models.
LTX-2 SageAttention is not yet supported.
Preliminary perf / quality compared to #12548 baseline from example scripts:
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.