
[TRTLLM-12128][feat] enable SageAttention for Wan/FLUX (new commits) #13570

Open
xrq-phys wants to merge 12 commits into NVIDIA:main from xrq-phys:user/o-stoner/visual-gen-enable-sageattn

Conversation

xrq-phys (Collaborator) commented Apr 28, 2026

Summary by CodeRabbit

  • New Features

    • Added SageAttention optimization support for visual generation models. Users can enable it via the --enable_sage_attention flag (requires TRTLLM attention backend).
  • Documentation

    • Updated visual generation documentation with SageAttention configuration examples and parameter descriptions.
  • Tests

    • Added SageAttention integration and performance test suites for validation.

Description

Copied from #13425 with new commits to fix CI failures.

Integrate SageAttention kernels from #12937 into VisualGen for Wan and FLUX.2 models.
LTX-2 SageAttention is not yet supported.

Preliminary performance / quality results compared to the #12548 baseline, measured with the example scripts:

| Model | Configs | Pipeline Time (Baseline) | Output (Baseline) | Pipeline Time (SageAttention) | Output (SageAttention) | Speedup |
|---|---|---|---|---|---|---|
| Wan2.1-T2V-1.3B | "A cute cat playing piano", 480x832, 33 frames, 50 steps, trtllm-fp8-blockwise linear | 7.94s | video | 7.54s | video | 1.05x |
| Wan2.1-I2V-14B-480P | 480x832, 81 frames, 50 steps, trtllm-nvfp4 linear, parallel VAE disabled | 128.79s | video | 117.38s | video | 1.10x |
| Wan2.1-I2V-14B-720P | 720x1280, 81 frames, 50 steps, trtllm-nvfp4 linear, parallel VAE disabled | 501.23s | video | 458.33s | video | 1.09x |
| Wan2.1-T2V-14B-Diffusers | "A cute cat playing piano", 720x1280, 81 frames, 50 steps, trtllm-nvfp4 linear | 484.63s | video | 413.65s | video | 1.17x |
| Wan2.2-I2V-A14B | 720x1280, 81 frames, 40 steps, trtllm-nvfp4 linear, parallel VAE disabled | 387.83s | video | 340.48s | video | 1.14x |
| Wan2.2-T2V-A14B | "A cute cat playing piano", 720x1280, 81 frames, 40 steps, trtllm-nvfp4 linear, parallel VAE disabled | 389.10s | video | 331.24s | video | 1.17x |
| FLUX.2-dev | "A cat sitting on a windowsill", 1024x1024, 50 steps, trtllm-nvfp4 linear | 5.60s | PNG | 5.61s | PNG | 1.00x |
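
For context, the Wan2.1-T2V-1.3B row above corresponds to a run along these lines. This is a minimal sketch: `--enable_sage_attention` and `--attention_backend TRTLLM` are the flags confirmed by this PR, while the remaining flag names are assumptions and may differ from the actual example script:

```bash
# Hypothetical invocation; flags other than --enable_sage_attention and
# --attention_backend may differ from the actual example script.
python examples/visual_gen/visual_gen_wan_t2v.py \
  --model_path Wan-AI/Wan2.1-T2V-1.3B \
  --prompt "A cute cat playing piano" \
  --attention_backend TRTLLM \
  --enable_sage_attention
```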

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

xrq-phys (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #45971 [ run ] triggered by Bot. Commit: 83d8984 Link to invocation

coderabbitai (Bot, Contributor) commented Apr 28, 2026

📝 Walkthrough

This pull request introduces SageAttention support to the visual generation pipeline. Changes include adding a new SageAttentionConfig schema, integrating it into the attention backend selection, updating multiple visual generation scripts with CLI flags to enable SageAttention, refactoring attention module APIs to thread the configuration through layers, and adding comprehensive test coverage with performance benchmarking.
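
As a rough illustration of how an example script wires this up, here is a sketch assuming argparse-style flags. Only `--enable_sage_attention`, `--attention_backend`, the `backend`/`sage_attention_config` keys, the preset values, and the `_sage_attention_config_for_model` helper appear in this PR (quoted in the review threads below); the checkpoint-detection logic and everything else is assumed:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", default="Wan-AI/Wan2.1-T2V-1.3B")
parser.add_argument("--attention_backend", default="TRTLLM")
parser.add_argument("--enable_sage_attention", action="store_true")
args = parser.parse_args()


def _sage_attention_config_for_model(model_path: str) -> tuple[dict, str]:
    """INT8 Q/K Sage preset: (1,4,1) for Wan2.1-I2V, else (1,16,1) (per the quoted diff)."""
    # Hypothetical checkpoint detection; the PR mentions helpers for Wan2.1 variants.
    if "Wan2.1" in model_path and "I2V" in model_path:
        return {"num_elts_per_blk_q": 1, "num_elts_per_blk_k": 4,
                "num_elts_per_blk_v": 1, "qk_int8": True}, "Wan2.1-I2V"
    return {"num_elts_per_blk_q": 1, "num_elts_per_blk_k": 16,
            "num_elts_per_blk_v": 1, "qk_int8": True}, "default"


attention_cfg = {"backend": args.attention_backend}
if args.enable_sage_attention:
    sage_cfg, sage_preset = _sage_attention_config_for_model(args.model_path)
    attention_cfg["sage_attention_config"] = sage_cfg
```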

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Schema: tensorrt_llm/_torch/visual_gen/config.py | Adds new SageAttentionConfig pydantic model with per-block quantization parameters and an optional sage_attention_config field on AttentionConfig. Includes validation logic requiring the TRTLLM backend when a SageAttention config is provided. |
| Attention Backend API Refactoring: tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py, tensorrt_llm/_torch/visual_gen/attention_backend/utils.py | Refactors TrtllmAttention to accept a sage_attention_config object instead of individual SageAttention parameters. Updates the forward signature to require explicit batch_size/seq_len inputs and an optional seq_len_kv. Dispatch logic conditionally uses the SageAttention path when the config is provided. The create_attention helper forwards the config to the backend constructor. |
| Attention Module Threading: tensorrt_llm/_torch/visual_gen/modules/attention.py, tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2.py | _attn_impl now derives and injects batch_size, seq_len, and seq_len_kv into the attention forward call. The LTX2 pipeline raises NotImplementedError when a SageAttention config is detected. |
| Visual Generation Scripts: examples/visual_gen/visual_gen_flux.py, examples/visual_gen/visual_gen_wan_t2v.py, examples/visual_gen/visual_gen_wan_i2v.py, examples/visual_gen/visual_gen_ltx2.py | Adds an --enable_sage_attention CLI flag to the FLUX and WAN scripts. Each constructs an attention_cfg dict with the backend and conditionally adds sage_attention_config with model-specific block quantization parameters. Includes checkpoint-detection helpers for Wan2.1 model variants. |
| Backend Validation: cpp/tensorrt_llm/thop/attentionOp.cpp, tensorrt_llm/_torch/attention_backend/trtllm.py | Adds runtime checks: the C++ error message now explicitly lists allowed non-MLA configs when SageAttention is selected, and the TRT-LLM-Gen path is blocked when any SageAttention parameter is non-zero. |
| Documentation: examples/visual_gen/README.md | Documents the new --enable_sage_attention flag and its TRTLLM backend dependency in the WAN text-to-video example commands. |
| Test Updates: tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_sage_attention.py, tests/unittest/_torch/visual_gen/test_attention_integration.py, tests/unittest/_torch/visual_gen/test_attention_perf.py | Updates the existing test to use a SageAttentionConfig object. Adds a new SageAttention self-attention integration test with cosine-similarity validation. Introduces an SM100-only TestSageAttentionPerformance suite benchmarking five config tuples with speedup reporting versus vanilla and TRTLLM baselines. |
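
A minimal sketch of what the new schema pieces might look like, based on the field names cited in the review comments below (num_elts_per_blk_q/k/v, qk_int8) and the preset values quoted in the threads; the types, defaults, and validator body are assumptions:

```python
from typing import Optional

from pydantic import BaseModel, Field, model_validator


class SageAttentionConfig(BaseModel):
    """Per-block quantization parameters (sketch; field names from the review, defaults assumed)."""
    num_elts_per_blk_q: int = Field(1, description="Quantization block size along Q")
    num_elts_per_blk_k: int = Field(16, description="Quantization block size along K")
    num_elts_per_blk_v: int = Field(1, description="Quantization block size along V")
    qk_int8: bool = Field(True, description="Quantize Q/K to INT8")


class AttentionConfig(BaseModel):
    backend: str = "TRTLLM"
    sage_attention_config: Optional[SageAttentionConfig] = None

    @model_validator(mode="after")
    def _require_trtllm_for_sage(self) -> "AttentionConfig":
        # Validation described in the PR: SageAttention needs the TRTLLM backend.
        if self.sage_attention_config is not None and self.backend.upper() != "TRTLLM":
            raise ValueError("sage_attention_config requires the TRTLLM attention backend")
        return self
```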

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.45%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly indicates the main feature being enabled (SageAttention) and for which models (Wan/FLUX), accurately reflecting the primary changes across the changeset. |
| Description check | ✅ Passed | The PR description provides context (integration of SageAttention kernels), references the original PR (#13425), explains the limitation (LTX-2 not yet supported), and includes comprehensive performance data. However, the Test Coverage section is incomplete/empty. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai (Bot, Contributor) left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update the copyright year in this modified file.

The header still says 2025, but this file now has 2026 changes.

As per coding guidelines, "Include NVIDIA copyright header on all new files; update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py` at line 1, The
file trtllm.py still contains a 2025 NVIDIA copyright header; update the header
year to 2026 in the top-of-file comment (the SPDX/copyright line) so the file
reflects the current modification year.
tests/unittest/_torch/visual_gen/test_attention_perf.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update the copyright year in this modified file.

The header still says 2025, but this file now has 2026 changes.

As per coding guidelines, "Include NVIDIA copyright header on all new files; update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_attention_perf.py` at line 1, Update
the SPDX copyright header year from 2025 to 2026 in the file by editing the
top-of-file header line that currently reads "SPDX-FileCopyrightText: Copyright
(c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved." to reflect 2026
so the modified file's header matches the current changes.
🧹 Nitpick comments (1)
tests/unittest/_torch/visual_gen/test_attention_integration.py (1)

693-701: QA integration test-list updates are not needed for this change.

This PR touches tests/unittest/... coverage; no tests/integration/defs/... additions were made, so QA scheduled list changes are unnecessary here.

As per coding guidelines: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_attention_integration.py` around lines
693 - 701, This change only updates unit tests under
tests/unittest/_torch/visual_gen/test_attention_integration.py (the loop
invoking test_sage_attention_self_attention with results[label]) and does not
add any integration defs, so explicitly state that QA integration list updates
are unnecessary by adding a clear one-line comment or PR note near the test
block (referencing test_sage_attention_self_attention and the results dict/label
loop) that says "QA integration test-list updates are unnecessary for this PR"
to satisfy the coding guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/visual_gen/README.md`:
- Around line 91-100: The README currently shows a SageAttention usage example
for WAN (visual_gen_wan_t2v.py with --enable_sage_attention and
--attention_backend TRTLLM) but the PR also adds FLUX support; update the docs
to either (A) add a FLUX-specific SageAttention example/row mirroring the WAN
example (use the FLUX model name, appropriate script or --attention_backend
value for FLUX and include --enable_sage_attention), or (B) explicitly state
that SageAttention is WAN-only in the example and the support table; modify the
example block and the support table entries consistently so they reflect the
chosen scope.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Line 241: The API exposes seq_len_kv but the non-Sage path in
trtllm.py/_concat_qkv (and surrounding logic where seq_len_kv is accepted) only
supports self-attention; add a guard that if seq_len_kv is not None and
seq_len_kv != seq_len you raise a clear ValueError (or NotImplementedError)
explaining cross-attention is unsupported yet, referencing the parameter name
seq_len_kv and the helper _concat_qkv so callers (e.g., attention module) get a
deterministic failure instead of an obscure reshape error; apply the same check
in the other related block(s) around lines 268-295 where seq_len_kv is used (a sketch of this guard and the assert-to-ValueError change below appears after this list).
- Around line 271-274: Replace the inline assert in the Sage attention input
check with an explicit input-validation exception: in the method where
self.sage_attention_config is checked (the block that currently does "assert k
is not None and v is not None, ('SageAttention requires separate Q, K, V
tensors')"), change it to explicitly test "if k is None or v is None:" and raise
a ValueError("SageAttention requires separate Q, K, V tensors") so the invalid
inputs fail deterministically at this point (before any reshape or downstream
ops).

In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py`:
- Around line 114-119: The code currently forwards
attention_config.sage_attention_config into kwargs before checking backend,
which can pass TRTLLM-only args to other backends; modify the logic so that
sage_attention_config is only added to kwargs when backend.upper() == "TRTLLM"
and attention_config is not None and attention_config.sage_attention_config is
not None (i.e., move or wrap the existing kwargs["sage_attention_config"]
assignment behind the backend check), referencing
attention_config.sage_attention_config, kwargs, and backend to locate and fix
the spot.

In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 121-124: The error text in the raise ValueError uses misleading
wording that claims a fallback ("Fallback to non-SageAttention TRTLLM
attention") even though the code actually raises an exception; update the
message in the raise ValueError in the block referencing
self.sage_attention_config to accurately state that the configuration is
unsupported and that no fallback will occur (e.g., "Unsupported
{self.sage_attention_config}; cannot fallback to non-SageAttention"), and ensure
the message includes the self.sage_attention_config value for clarity.
- Around line 54-80: The SageAttentionConfig class currently inherits from
BaseModel which permits extra fields; change it to inherit from StrictBaseModel
to enforce extra='forbid' and fail fast on unknown fields: update the class
definition for SageAttentionConfig to extend StrictBaseModel (and add/import
StrictBaseModel where necessary), keeping all existing PydanticField attributes
(num_elts_per_blk_q, num_elts_per_blk_k, num_elts_per_blk_v, qk_int8) unchanged.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py`:
- Around line 311-327: The code incorrectly computes seq_len and seq_len_kv
using q.shape[1], which is the number of heads for HND layout; update the
computation to derive sequence length from the correct dimension after reshape
(use q.shape[2] for HND layout) and likewise compute seq_len_kv from k.shape[2]
when k is not None; then pass those corrected seq_len and seq_len_kv values in
the kwargs to self.attn.forward (adjust logic around seq_len = q.shape[...] and
seq_len_kv = k.shape[...] to handle both HND and other layouts consistently; see the layout sketch after this list).

In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 291-387: The test test_sage_attention_self_attention parametrizes
seq_len with very large values (e.g., 16384, 32760) causing CI timeouts/OOM;
change the parametrization to a small "fast" set for normal unit runs (e.g.,
[256, 512, 1024]) and move the heavy shapes into a separate gated marker (e.g.,
pytest.mark.slow or a new param set gated by pytest.config option) so the heavy
runs are only executed when requested; update the decorator around
test_sage_attention_self_attention and reference the same
SageAttentionConfig/Attention/NaiveWanSelfAttention flow so the logic and
weight-copying (copy_weights_self_attention) remain unchanged.

In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 947-1038: The tests only assert timing results but not that the
SageAttention execution path ran; update the production benchmarking path to
emit an observable flag (e.g., add a field like "used_sage" or "exec_path" to
the dict returned by _bench and benchmark.benchmark_single when the
SageAttention code path is taken) and then assert that flag in
test_sage_runs_and_times, test_sage_vs_vanilla_quick, and
test_sage_vs_trtllm_wan_shapes (use the existing result variables sage, vanilla,
trtllm and the helper _bench/benchmark_single symbols to locate changes); ensure
the runtime sets that flag inside the SageAttention dispatch/implementation code
so the tests fail if code silently falls back to TRTLLM.
- Around line 897-910: Change the mutable lists WAN_SEQ_LENS and QUICK_SEQ_LENS
to immutable tuples (replace [...] with (...)) and update the three zip calls
that pair _SAGE_CONFIGS with _SAGE_CONFIG_IDS to use zip(..., strict=True) so
mismatched lengths raise errors; specifically modify the zip invocations that
iterate over _SAGE_CONFIGS and _SAGE_CONFIG_IDS (the three places where those
two names are zipped) to pass strict=True.
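
To make a few of these suggestions concrete, here is a minimal sketch of the proposed guards for the visual_gen TRTLLM backend. Only the parameter and attribute names (seq_len_kv, sage_attention_config, _concat_qkv) come from the review comments; the class and method shape are assumptions:

```python
class TrtllmAttentionSketch:
    """Hypothetical condensation of the guards proposed above."""

    def __init__(self, sage_attention_config=None):
        self.sage_attention_config = sage_attention_config

    def forward(self, q, k=None, v=None, *, batch_size, seq_len, seq_len_kv=None):
        # Guard 1: the non-Sage path (_concat_qkv) only supports self-attention,
        # so fail fast on cross-attention shapes instead of surfacing an obscure
        # reshape error downstream.
        if seq_len_kv is not None and seq_len_kv != seq_len:
            raise NotImplementedError(
                "seq_len_kv != seq_len: cross-attention is not supported yet")
        # Guard 2: an explicit ValueError instead of an inline assert, so invalid
        # inputs fail deterministically before any reshape or downstream ops.
        if self.sage_attention_config is not None and (k is None or v is None):
            raise ValueError("SageAttention requires separate Q, K, V tensors")
        ...  # dispatch to the SageAttention or vanilla TRTLLM path
```

And for the modules/attention.py layout comment, the corrected sequence-length derivation might look like this (hypothetical variable names; assumes HND tensors are shaped (batch, heads, seq, head_dim)):

```python
# Sequence length lives at dim 2 for HND layout (dim 1 is the head count).
seq_len = q.shape[2] if layout == "HND" else q.shape[1]
seq_len_kv = (k.shape[2] if layout == "HND" else k.shape[1]) if k is not None else None
```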


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2e0d8f56-e8da-4005-8f3a-f81b82bfd310

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6d431 and 83d8984.

📒 Files selected for processing (15)
  • cpp/tensorrt_llm/thop/attentionOp.cpp
  • examples/visual_gen/README.md
  • examples/visual_gen/visual_gen_flux.py
  • examples/visual_gen/visual_gen_ltx2.py
  • examples/visual_gen/visual_gen_wan_i2v.py
  • examples/visual_gen/visual_gen_wan_t2v.py
  • tensorrt_llm/_torch/attention_backend/trtllm.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_sage_attention.py
  • tests/unittest/_torch/visual_gen/test_attention_integration.py
  • tests/unittest/_torch/visual_gen/test_attention_perf.py

tensorrt-cicd (Collaborator):

PR_Github #45971 [ run ] completed with state FAILURE. Commit: 83d8984
/LLM/main/L0_MergeRequest_PR pipeline #36124 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

zhenhuaw-me requested a review from chang-l on April 29, 2026 01:05
xrq-phys force-pushed the user/o-stoner/visual-gen-enable-sageattn branch from 83d8984 to b7be394 on April 29, 2026 03:50
xrq-phys (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #46052 [ run ] triggered by Bot. Commit: b7be394 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46052 [ run ] completed with state FAILURE. Commit: b7be394
/LLM/main/L0_MergeRequest_PR pipeline #36196 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys (Collaborator, Author):

/bot run --disable-fail-fast

tensorrt-cicd (Collaborator):

PR_Github #46095 [ run ] triggered by Bot. Commit: b7be394 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46095 [ run ] completed with state SUCCESS. Commit: b7be394
/LLM/main/L0_MergeRequest_PR pipeline #36238 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #46180 [ run ] triggered by Bot. Commit: b7be394 Link to invocation

tburt-nv force-pushed the user/o-stoner/visual-gen-enable-sageattn branch from b7be394 to 276a366 on April 29, 2026 17:18
tburt-nv (Collaborator):

I rebased to pick up #13598, which is needed to fix the debug build failures.

tburt-nv (Collaborator):

/bot run

Quoted diff:

```python
def _sage_attention_config_for_model(model_path: str) -> tuple[dict, str]:
    """INT8 Q/K Sage preset: (1,4,1) for Wan2.1-I2V, else (1,16,1)."""
```
Collaborator:

This is interesting – it seems I2V is more sensitive to attn accuracy. Is that right?

xrq-phys (Collaborator, Author):

Tbh I haven't actually compared I2V against T2V.

Overall, smaller models tend to have stricter precision requirements. From my perspective, the special handling should detect Wan2.1 1.3B rather than Wan2.1 I2V.

Perhaps @o-stoner implemented this only for I2V based on actual tests. I'll have to verify it with real runs later.

"backend": args.attention_backend,
}
if args.enable_sage_attention:
sage_cfg, sage_preset = _sage_attention_config_for_model(args.model_path)
Collaborator:

What does sage_preset stand for? Is sage_cfg alone not enough?
It seems every model has a recommended Sage config. Can we make this part of the model's default settings, instead of scattering it across the examples?

xrq-phys (Collaborator, Author):

These are preset names (Wan2.1-I2V or default).
Looks like AI code style again.

Sorry, I wanted to first see whether all features are functioning; I haven't carefully checked these lines. 🙇

tensorrt-cicd (Collaborator):

PR_Github #46196 [ run ] triggered by Bot. Commit: 276a366 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46180 [ run ] completed with state ABORTED. Commit: b7be394

Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46196 [ run ] completed with state SUCCESS. Commit: 276a366
/LLM/main/L0_MergeRequest_PR pipeline #36310 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

chang-l (Collaborator) commented Apr 29, 2026

/bot run

tensorrt-cicd (Collaborator):

PR_Github #46221 [ run ] triggered by Bot. Commit: ed2a1b6 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46221 [ run ] completed with state FAILURE. Commit: ed2a1b6
/LLM/main/L0_MergeRequest_PR pipeline #36333 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys (Collaborator, Author):

/bot run

xrq-phys force-pushed the user/o-stoner/visual-gen-enable-sageattn branch from ed2a1b6 to 90509d3 on April 30, 2026 17:55
xrq-phys (Collaborator, Author):

/bot run

tensorrt-cicd (Collaborator):

PR_Github #46422 [ run ] triggered by Bot. Commit: 90509d3 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46422 [ run ] completed with state FAILURE. Commit: 90509d3
/LLM/main/L0_MergeRequest_PR pipeline #36494 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

xrq-phys force-pushed the user/o-stoner/visual-gen-enable-sageattn branch from 90509d3 to 1f031bc on May 1, 2026 03:49
xrq-phys (Collaborator, Author) commented May 1, 2026

/bot run --add-multi-gpu-test

tensorrt-cicd (Collaborator):

PR_Github #46466 [ run ] triggered by Bot. Commit: 1f031bc Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46466 [ run ] completed with state FAILURE. Commit: 1f031bc

Link to invocation

xrq-phys (Collaborator, Author) commented May 1, 2026

/bot run --add-multi-gpu-test

tensorrt-cicd (Collaborator):

PR_Github #46474 [ run ] triggered by Bot. Commit: 1f031bc Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #46474 [ run ] completed with state SUCCESS. Commit: 1f031bc
/LLM/main/L0_MergeRequest_PR pipeline #36538 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

o-stoner and others added 12 commits on May 1, 2026 19:02 (signed off by Olivia Stoner and Ruqing Xu).
xrq-phys force-pushed the user/o-stoner/visual-gen-enable-sageattn branch from 1f031bc to ca40211 on May 1, 2026 10:02
xrq-phys (Collaborator, Author) commented May 1, 2026

/bot run --add-multi-gpu-test

tensorrt-cicd (Collaborator):

PR_Github #46488 [ run ] triggered by Bot. Commit: ca40211 Link to invocation
