
[TRTLLM-11119][feat] Blackwell SageAttention, Integrate into AttentionOp API#11718

Merged
zhenhuaw-me merged 1 commit into NVIDIA:main from xrq-phys:ruqingx/visual_gen/sage_attn
Apr 2, 2026

Conversation

@xrq-phys
Collaborator

@xrq-phys xrq-phys commented Feb 25, 2026

Summary by CodeRabbit

  • New Features
    • Added new FMHA kernel variants with extended configuration and tuning options
    • Introduced Sage Attention support with configurable scaling factors for enhanced performance
    • Extended kernel selection mechanism to support additional parameters for improved optimization

Description

  • Add SageAttention kernels from TRTLLM-Gen and create appropriate runners for them.
  • Integrate them into the AttentionOp API.
  • Minor variant: QkvFp8
  • Major variant: QkInt8PvInt8 (small effort)

Test Coverage

TBA

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/developer-guide/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@xrq-phys xrq-phys marked this pull request as draft February 25, 2026 13:55
@xrq-phys xrq-phys self-assigned this Feb 25, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

This PR introduces support for extended FMHA kernel variants by adding Git LFS pointers for numerous prebuilt CUDA kernel binaries and extending the kernel infrastructure to support new metadata structures with SageAttention parameters and tuning configurations.

Changes

  • Git LFS Kernel Binaries (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_*.cpp, FmhaSm103aKernel_*.cpp, FmhaSm101aKernel_*.cpp): Added ~150 Git LFS pointer files for prebuilt CUDA kernel binaries across various SM architectures, data types (Bfloat16, Int8, E4M3), and configurations (PersistentContext, StaticContext, SageQ variants). Each pointer contains version, OID, and size metadata.
  • Kernel Metadata Extension (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/kernelMetaInfoVx.h): Introduces a new public struct TllmGenFmhaKernelMetaInfoVx extending the base kernel metadata with vx-specific fields (mMaxNumHeadsQPerKvInCta, mNumEltsPerSageAttnBlk*, mDataTypeQkReinterpret). Includes multiple constructors and a static array sTllmGenFmhaKernelMetaInfosVx binding cubins to kernel variants. Adds extern declarations for ~100+ kernel binaries and length symbols, plus a TLLM_GEN_VERSION macro.
  • Kernel Factory & Infrastructure (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h): Extended TllmGenFmhaKernel and TllmFmhaKernelFactory to support the extended kernel metadata (KernelMetaVx). Updated constructor signatures, added a getKernelMetaVx() method, extended the kernel selection logic, and modified the hashing scheme to encode the new parameters (maxNumHeadsQPerKvInCta, numEltsPerSageAttnBlk*, dataTypeQkReinterpret).
  • Runner Configuration (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h, fmhaRunner.cpp): Extended the TllmGenFmhaRunner constructor to accept the new parameters (maxNumHeadsQPerKvInCta, numEltsPerSageAttnBlk*, dataTypeQkReinterpret) with defaults. Updated kernel creation calls to pass the extended metadata to getTllmFmhaKernels.
  • Parameter Structs (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunnerParams.h): Added optional SageAttention scaling-factor pointers (Q, K, P, V) and log-based block element counts to both the TllmGenFmhaRunnerParams and TllmGenSelectKernelParams structs.
  • Type Trait Support (cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h): Added a HasSageAttnParams type trait to detect SageAttention support in options types. Conditionally populates SageAttention pointers and parameters when available; sets mInflateMax to 0.4f for Sage paths.
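
Such a trait is typically built with the standard detection idiom. A minimal sketch, assuming the member name sageAttnSfsQPtr from the summary above (the option structs here are simplified stand-ins for illustration, not the real TRT-LLM types):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Primary template: by default, assume the options type carries no
// SageAttention parameters.
template <typename T, typename = void>
struct HasSageAttnParams : std::false_type
{
};

// Partial specialization chosen only when T exposes the Q scale-factor
// pointer; a fuller trait could check every SageAttention member it reads.
template <typename T>
struct HasSageAttnParams<T, std::void_t<decltype(std::declval<T const&>().sageAttnSfsQPtr)>> : std::true_type
{
};

// Hypothetical stand-in option structs for illustration.
struct PlainOptions
{
    int headDim;
};

struct SageOptions
{
    int headDim;
    float const* sageAttnSfsQPtr;
};

static_assert(!HasSageAttnParams<PlainOptions>::value, "plain options must not match");
static_assert(HasSageAttnParams<SageOptions>::value, "sage options must match");
```

std::void_t maps any well-formed member access to void, so the partial specialization is viable exactly when the member exists, which is what makes the conditional population in kernelParams.h possible.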

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#6379: Directly related kernel infrastructure updates for FMHA cubins and kernel metadata structures.

Suggested reviewers

  • PerkzZheng
  • Wanli-Jiang
  • byshiue
  • Tracin
  • lowsfer
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning. The PR description is incomplete: test coverage is marked as 'TBA' with no specific test cases, and multiple checklist items are unchecked, including CODEOWNERS updates, documentation, and architecture diagram updates. Resolution: complete the PR description by (1) providing specific test cases or a timeline for the 'TBA' test coverage, (2) explicitly checking off or explaining unchecked items, (3) confirming all required documentation/architecture updates are complete, and (4) verifying CODEOWNERS has been updated if needed.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title '[TRTLLM-11119][feat] Blackwell SageAttention, Integrate into AttentionOp API' directly summarizes the main change: adding SageAttention kernel support for Blackwell with integration into AttentionOp.
  • Docstring Coverage — ✅ Passed. No functions were found in the changed files to evaluate, so the docstring coverage check was skipped.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year in this modified file.

Header still says 2020-2023 while this file has meaningful 2026 changes.

As per coding guidelines: "Include NVIDIA copyright header on ALL new files; update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp` at line 2,
Update the copyright header string at the top of the file from "2020-2023" to
"2020-2026" so the modified file reflects 2026 changes; locate the existing
header comment containing "Copyright (c) 2020-2023, NVIDIA CORPORATION." and
replace the year range with "2020-2026".
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year in this modified file.

Header still says 2020-2025 while this file has meaningful 2026 edits.

As per coding guidelines: "Include NVIDIA copyright header on ALL new files; update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h` at line 2,
Update the copyright header in fmhaKernels.h to include 2026 (e.g., change
"2020-2025" to include 2026 such as "2020-2026" or the appropriate range),
ensuring the top-of-file NVIDIA copyright comment reflects the new modification
year.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year in this modified file.

Header still says 2020-2023 while this file was meaningfully changed in this PR.

As per coding guidelines: "Include NVIDIA copyright header on ALL new files; update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h` at line 2,
Update the copyright header in the top comment of fmhaRunner.h to reflect the
current modification year (change "2020-2023" to include 2026, e.g.,
"2020-2026"); locate the header comment at the top of the file and replace the
year range accordingly so the file shows the updated copyright span.
🧹 Nitpick comments (2)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/kernelMetaInfoVx.h (1)

27-27: Prefer a constexpr version constant instead of a macro.

Use a typed constant for scope and compiler visibility.

Proposed refactor
-#define TLLM_GEN_VERSION "ebe6f5a2-dirty"
+inline constexpr char kTllmGenVersion[] = "ebe6f5a2-dirty";

As per coding guidelines: "Prefer const or constexpr variables over #define directives, as the latter are not visible to the compiler".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/kernelMetaInfoVx.h` at
line 27, Replace the preprocessor macro TLLM_GEN_VERSION with a typed
compile-time constant to give it proper scope and compiler visibility; locate
the definition of TLLM_GEN_VERSION in kernelMetaInfoVx.h and change it from a
`#define` to a constexpr (e.g., constexpr const char* or constexpr
std::string_view) named TLLM_GEN_VERSION, keeping the same string value and
ensuring linkage/visibility is appropriate for header usage.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h (1)

45-53: Make the trait guard match all accessed SageAttn members.

The trait currently checks only sageAttnSfsQPtr, but the guarded block uses additional SageAttn pointers and log fields.

Proposed refactor
 template <typename T>
-struct HasSageAttnParams<T, std::void_t<decltype(std::declval<T const&>().sageAttnSfsQPtr)>> : std::true_type
+struct HasSageAttnParams<T, std::void_t<
+    decltype(std::declval<T const&>().sageAttnSfsQPtr),
+    decltype(std::declval<T const&>().sageAttnSfsKPtr),
+    decltype(std::declval<T const&>().sageAttnSfsPPtr),
+    decltype(std::declval<T const&>().sageAttnSfsVPtr),
+    decltype(std::declval<T const&>().mLogNumEltsPerSageAttnBlkQ),
+    decltype(std::declval<T const&>().mLogNumEltsPerSageAttnBlkK),
+    decltype(std::declval<T const&>().mLogNumEltsPerSageAttnBlkP),
+    decltype(std::declval<T const&>().mLogNumEltsPerSageAttnBlkV)>> : std::true_type
 {
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h` around lines
45 - 53, The HasSageAttnParams trait only checks for sageAttnSfsQPtr but the
guarded code reads other SageAttn members; update HasSageAttnParams to require
all SageAttn fields used in the guarded block (e.g. include checks for
sageAttnSfsKPtr, sageAttnSfsVPtr and any log/index fields like sageAttnLogPtr or
sageAttnLogIdx that the code accesses) by replacing the single decltype check
with a std::void_t of decltype(...) for each accessed member so the trait is
true only when every referenced SageAttn member is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp`:
- Around line 1-3: CMake is globbing Git LFS pointer files in the cubin/
directory (via the file(GLOB_RECURSE SRC_CPP ...) call) so the SRC_CPP list
contains LFS pointer “_cubin.cpp” files that break compilation; open
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt and after the
file(GLOB_RECURSE SRC_CPP ...) invocation filter the SRC_CPP list to exclude the
cubin/ directory (i.e., remove any entries matching the cubin/ path) so those
pointer files are not treated as source; alternatively, change the glob to only
include the real source layout or move/load cubin binaries outside the C++
source tree.
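
A minimal sketch of the filtering approach in CMake, assuming the SRC_CPP variable named in the comment (the target name and paths are illustrative, not the actual CMakeLists contents):

```cmake
file(GLOB_RECURSE SRC_CPP ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
# Drop LFS pointer files under cubin/ so they are never handed to the compiler.
list(FILTER SRC_CPP EXCLUDE REGEX "/cubin/")
add_library(fmha_kernels OBJECT ${SRC_CPP})
```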

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp`:
- Around line 1-3: CI is parsing Git LFS pointer files (e.g., *_cubin.cpp)
because LFS objects aren't hydrated; update the workflow that runs static
analysis/compile to either enable LFS on checkout (set actions/checkout@v6 with
lfs: true) or add an explicit git lfs pull step before the lint/compile jobs,
and ensure this change covers jobs that process files matched by .gitattributes
(see the '*cubin.cpp filter=lfs' entry) so static analyzers read real binary
CUBIN content rather than pointer metadata.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
is a Git LFS pointer (starts with "version https://git-lfs.github.com/spec/v1")
so adding the required NVIDIA Apache-2.0 header will corrupt the LFS pointer;
fix by either renaming the file to a non-compiled extension (e.g., change the
extension from .cpp to .cubin or .bin so the pointer remains intact and build
systems won’t treat it as C++), or add an explicit lint exclusion rule for
**/cubin/*.cpp in the repo copyright-header check with a brief justification
comment referencing the LFS pointer constraint and this filename.
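
For reference, a Git LFS pointer file is exactly this three-line text format (the oid and size below are illustrative placeholders, not values from this PR):

```
version https://git-lfs.github.com/spec/v1
oid sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
size 12345
```

Any tool that reads such a file without hydrating it through git lfs — a compiler, a linter, a copyright-header checker — sees only these three lines, which is the root cause of the failures described in these comments.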

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The build is failing because CMakeLists.txt uses
file(GLOB_RECURSE SRC_CPP *.cpp) which picks up Git LFS pointer-only `.cpp`
cubin files (e.g., the cubin file in the cubin/ subdirectory) that are not valid
C++ sources; update the CMake logic in the CMakeLists.txt under the fmha
directory to avoid adding those files — either (A) stop globbing into the cubin/
directory (exclude the cubin/ path from the glob), (B) only include .cpp files
from explicit source dirs and/or add an exclude list for files matching cubin/*
or the specific pattern used for these cubin assets, or (C) rename those files
to a non-.cpp extension (e.g., .cubin/.bin) or remove them from source control
and fetch them as binary artifacts at build time; adjust the file(GLOB_RECURSE
SRC_CPP *.cpp) use or the target source list accordingly so functions/classes in
normal .cpp files are still compiled but cubin pointer files (like
FmhaSm100aKernel_..._cubin.cpp) are not added to the target.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The glob in this CMakeLists (SRC_CPP via file(GLOB_RECURSE
*.cpp)) can pick up LFS pointer files for *cubin.cpp; before the
add_library(...) that uses SRC_CPP, add a pre-build validation that finds at
least one *.cubin.cpp (or pattern "cubin.cpp") from the glob and uses file(SIZE
<found_file> size) to ensure size > 1024 bytes, and if not call
message(FATAL_ERROR ...) with a clear instruction to run "git lfs install && git
lfs pull" (mirroring the check used for internal_cutlass_kernels in
cpp/tensorrt_llm/CMakeLists.txt), so the build fails with a helpful LFS-missing
error instead of passing a 3-line LFS pointer to the compiler.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The recursive glob in CMakeLists.txt (file(GLOB_RECURSE
SRC_CPP *.cpp)) is picking up Git LFS pointer files like the cubin.cpp in the
cubin/ subdir; update the CMake glob to exclude these by adding an explicit
exclusion for files matching "*cubin.cpp" (or use a FILTER step to remove
entries that match that pattern) so kernels/trtllmGenKernels/fmha/CMakeLists.txt
does not attempt to compile cubin LFS pointer files; alternatively, document and
enforce LFS materialization in CI if you prefer not to change the glob.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The build is picking up Git LFS pointer cubin .cpp files
because CMakeLists.txt uses file(GLOB_RECURSE SRC_CPP *.cpp); update the build
to either ensure LFS files are hydrated in CI or exclude cubin sources from the
C++ target by filtering SRC_CPP to remove files matching *cubin.cpp (or use
set_source_files_properties / target_sources to omit them) so the compiler never
receives non-hydrated LFS pointer files; reference CMakeLists.txt and the
file(GLOB_RECURSE SRC_CPP *.cpp) glob when making this change.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkBfloat16VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: This file is an LFS pointer (three-line Git LFS format) and
so must not be modified in-place to add the required Apache-2.0 NVIDIA copyright
header; instead, either (A) replace this pointer-file approach with a small
companion .cpp wrapper that contains the required NVIDIA Apache-2.0 copyright
block and then `#include` or otherwise load the LFS-tracked binary/compiled
resource, or (B) keep the LFS-tracked binary as-is and add a thin sidecar source
(e.g.,
FmhaSm103aKernel_QkBfloat16VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin_wrapper.cpp)
that contains the copyright header and any necessary glue to reference the
binary; do not alter the three-line pointer contents in the current .cpp file
itself. Ensure the new sidecar contains the exact required NVIDIA Apache-2.0
block and adjust build rules to compile or include the wrapper so the copyright
is present for new .cpp additions.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The build is picking up Git LFS pointer files like the cubin
.cpp (file:
FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp)
because your CMake uses file(GLOB_RECURSE SRC_CPP *.cpp) and then
add_library(...) with SRC_CPP; fix by excluding the cubin artifacts: either move
these files out of the source tree (e.g., data/ or artifacts/), rename them to a
non-source extension (.cubin), or update CMake to filter out the cubin/
directory before add_library (example: remove paths matching */cubin/* from
SRC_CPP) so that these 3-line Git LFS pointer files are not compiled.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkvE4m3OE4m3H256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The CI is analyzing pointer stubs from LFS-tracked cubin
files (e.g.,
FmhaSm103aKernel_QkvE4m3OE4m3H256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp);
update the C++ analysis workflows (pr-check.yml and precommit-check.yml) to
hydrate LFS objects by ensuring actions/checkout is called with fetch-depth: 0
and lfs: true or add explicit steps to run git lfs install --local && git lfs
pull (or git lfs fetch && git lfs checkout) before any static
analysis/compilation steps, or alternatively exclude *_cubin.cpp from the
linters until hydrated—make the change in the workflows that run C++
linting/compilation.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h`:
- Around line 850-852: The context-kernel detection predicate in the if
condition that checks isContextKernel(params.mKernelType) and
isSeparateQkv(params.mQkvLayout) currently only considers
params.mLogNumEltsPerSageAttnBlkQ, K, and V; add
params.mLogNumEltsPerSageAttnBlkP to that OR list so SageAttention configs that
only set P are recognized as context kernels (i.e., update the condition in the
same block to include || params.mLogNumEltsPerSageAttnBlkP).
- Around line 1025-1029: The block-size hash currently encodes
numEltsPerSageAttnBlkP twice (for P and V), causing collisions; update the bit
construction that calls computeLog2BlockSize for the V range so it uses
numEltsPerSageAttnBlkV instead of numEltsPerSageAttnBlkP (the expression
combining computeLog2BlockSize(numEltsPerSageAttnBlkQ/K/P/P) should become
computeLog2BlockSize(numEltsPerSageAttnBlkQ/K/P/V)), leaving the shifts and the
final dataTypeQkReinterpret shift unchanged.
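
The collision can be seen with a minimal bit-packing sketch; computeLog2BlockSize mirrors the helper named in the comment, while the 4-bit field widths and function shape are assumptions for illustration, not the real hashing scheme:

```cpp
#include <cassert>
#include <cstdint>

// Block sizes are powers of two, so their log2 fits in a few bits each.
constexpr uint64_t computeLog2BlockSize(uint64_t n)
{
    uint64_t log2 = 0;
    while (n > 1)
    {
        n >>= 1;
        ++log2;
    }
    return log2;
}

// Packs the four SageAttention block sizes into one hash fragment, 4 bits per
// field. Passing V for the last field (rather than P twice) is the point of
// the fix: otherwise two configs differing only in V hash identically.
constexpr uint64_t packSageBlockSizes(uint64_t q, uint64_t k, uint64_t p, uint64_t v)
{
    return (computeLog2BlockSize(q) << 0) | (computeLog2BlockSize(k) << 4)
        | (computeLog2BlockSize(p) << 8) | (computeLog2BlockSize(v) << 12);
}

static_assert(packSageBlockSizes(64, 64, 64, 32) != packSageBlockSizes(64, 64, 64, 64),
    "configs differing only in the V block size must hash differently");
```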

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp`:
- Around line 58-59: After calling getTllmFmhaKernels(...) to populate mKernel
in fmhaRunner (the assignment shown), validate that mKernel is not null before
returning from the constructor/init path: if mKernel is null, log an error
(include context) and set the runner into a safe failure state (e.g., set a bool
like mSupported=false or propagate an error/throw) so that run() and
support-check methods which dereference mKernel do not crash; update any callers
or support-check logic to respect this failure state. Ensure you reference
mKernel and getTllmFmhaKernels when adding the null check and error handling.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h`:
- Around line 877-891: The SageAttn-related members of params
(ptrSageAttnSfsQ/K/P/V, mLogNumEltsPerSageAttnBlkQ/K/P/V, mInflateMax) are left
uninitialized when detail::HasSageAttnParams<FmhaOptions_>::value is false; add
an else branch after that constexpr check to explicitly set those
ptrSageAttnSfs* members to nullptr (or 0), the mLogNumEltsPerSageAttnBlk*
members to 0, and mInflateMax to 0.0f so kernel launch parameters are
deterministic (modify the block around the existing if constexpr in
kernelParams.h where params and options are used).
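
A sketch of the suggested else branch, with illustrative member and struct names rather than the real TRT-LLM layouts:

```cpp
#include <cassert>
#include <type_traits>

// Illustrative subset of the kernel launch parameters.
struct KernelParams
{
    float const* ptrSageAttnSfsQ;
    int mLogNumEltsPerSageAttnBlkQ;
    float mInflateMax;
};

// An options type with no SageAttention members.
struct PlainOptions
{
};

// Detection trait; only the primary (false) template is needed for this demo.
template <typename T, typename = void>
struct HasSageAttnParams : std::false_type
{
};

template <typename Options>
KernelParams buildParams(Options const& /*options*/)
{
    KernelParams params{};
    if constexpr (HasSageAttnParams<Options>::value)
    {
        // ... copy SageAttention pointers and log block sizes from options ...
    }
    else
    {
        // Explicit defaults keep the launch parameters deterministic when the
        // options type carries no SageAttention fields.
        params.ptrSageAttnSfsQ = nullptr;
        params.mLogNumEltsPerSageAttnBlkQ = 0;
        params.mInflateMax = 0.0f;
    }
    return params;
}
```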

---

Outside diff comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h`:
- Line 2: Update the copyright header in fmhaKernels.h to include 2026 (e.g.,
change "2020-2025" to include 2026 such as "2020-2026" or the appropriate
range), ensuring the top-of-file NVIDIA copyright comment reflects the new
modification year.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp`:
- Line 2: Update the copyright header string at the top of the file from
"2020-2023" to "2020-2026" so the modified file reflects 2026 changes; locate
the existing header comment containing "Copyright (c) 2020-2023, NVIDIA
CORPORATION." and replace the year range with "2020-2026".

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h`:
- Line 2: Update the copyright header in the top comment of fmhaRunner.h to
reflect the current modification year (change "2020-2023" to include 2026, e.g.,
"2020-2026"); locate the header comment at the top of the file and replace the
year range accordingly so the file shows the updated copyright span.

---

Duplicate comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is currently stored as a Git LFS pointer instead of the actual binary content;
to fix, run git lfs install, ensure the pattern for this cubin (or its
directory) is tracked (git lfs track "<pattern>"), then run git lfs migrate
import
--include="cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp"
to replace the pointer with a proper LFS-managed object, commit the updated file
and push so CI and other tooling receive the actual LFS-managed binary rather
than a pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file contains a Git LFS pointer (lines 1–3) instead of
the real C++ source, which prevents compilation; fix by replacing the pointer
with the actual generated C++ contents or ensure CI/PR includes the real file
(or configure Git LFS to fetch the file during build). Locate the offending file
name shown in the diff
(FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp)
and either (a) commit the expanded .cpp source into the repo in this PR, or (b)
update build/CI to run git lfs pull (or add LFS pointers to the build step) so
the actual binary/source is available to the compiler; after replacing/fetching,
verify the file begins with valid C++ content rather than the LFS pointer
header.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
is a Git LFS pointer (lines show version/oid/size) not actual C++/CUDA code
which breaks compilation; replace the pointer with the real compiled cubin/C++
source or stop treating it as a .cpp source in the build. Fix by either (a)
committing the actual binary/source artifact for that symbol (the real cubin or
.cpp) so the build sees valid code, or (b) update the build/Makefile/CMakeLists
that references this filename (the cubin .cpp target) to consume a prebuilt
binary or exclude this file from compilation; ensure the repository stores the
real file (not just LFS pointer) or the build rules reference the correct
prebuilt artifact.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
currently contains an LFS pointer instead of the actual binary content (same
LFS-pointer-in-`.cpp` validation issue); remove the pointer content from the
commit and either add the real compiled binary via Git LFS (ensure
.gitattributes tracks *.cpp binaries) or replace the file with a small
stub/source file that references the real binary artifact stored in
LFS/artifacts, then amend the commit to remove the pointer-only entry so
CI/validation sees the proper artifact.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
is currently a Git LFS pointer (pointer-only content) which breaks analyzers/CI;
replace or hydrate it with the real binary blob so tools can read it: run git
lfs pull (or ensure CI runs git lfs fetch && git lfs checkout), re-add the real
file and commit, or update repository LFS configuration so this cubin is stored
and retrieved as a real artifact; verify by opening the file and confirming it
is not the text pointer (the file name above identifies the problematic asset).
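The "open the file and confirm it is not the text pointer" check above can be scripted. A minimal sketch (the helper name and the sample pointer content are illustrative, not from the repo):

```shell
# Hedged sketch: report whether a file is still an un-hydrated Git LFS pointer.
# Pointer files begin with the LFS spec header line; hydrated sources do not.
is_lfs_pointer() {
    head -c 40 "$1" | grep -q '^version https://git-lfs.github.com/spec'
}

# Demo on a throwaway file shaped like a pointer (oid/size values are made up).
tmp=$(mktemp)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 1\n' > "$tmp"
if is_lfs_pointer "$tmp"; then echo "pointer"; else echo "hydrated"; fi
rm -f "$tmp"
```

Running this over the cubin directory after `git lfs checkout` should report no pointers left.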

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file begins with Git LFS pointer text ("version
https://git-lfs.github.com/spec/v1" and the oid/size lines) instead of the
actual C++ source and is missing the required file header; replace the LFS
pointer block (the "version ...", "oid sha256:...", "size ...") with the actual
.cpp contents (or re-upload the real source), and prepend the project-required
file header/license comment to the top of
FmhaSm100aKernel_QkInt8...SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
so the file contains real code and the mandated header rather than pointer
metadata.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
appears as a Git LFS pointer (hydration/tooling problem); fix by removing the
pointer commit and properly tracking and uploading the binary with Git LFS: add
a .gitattributes entry to track the .cpp (or .cubin) pattern, run git lfs track
for that pattern, remove the pointer from the index (git rm --cached <that
file>), commit the removal, add the actual binary back, commit, then push LFS
objects (git lfs push --all origin <branch>) so the real file is stored in LFS
rather than leaving a pointer in the repo history.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is a Git LFS pointer (pointer-only) causing analyzer/CI failures; replace the
pointer with the actual binary or ensure CI fetches LFS objects: commit the real
built .cubin (or add a proper placeholder artifact) instead of the pointer, or
update CI to run git lfs pull (and confirm .gitattributes tracks *.cubin) so the
analyzer sees the hydrated binary; locate the file name above to apply the fix.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
is a duplicate LFS-pointer stub; remove the duplicate pointer file, or replace
it with the actual binary/source content, so that a single valid entry remains
for this artifact. Locate the duplicate by the filename above in the commit and
either delete the second stub entry or re-add the real file blob (or point the
git-lfs pointer at the correct oid) so the repository contains a single
authoritative LFS object for that kernel binary.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is committed as an LFS pointer (the version/oid header) instead of its actual
binary content; replace the pointer entry with properly tracked LFS content by
adding the actual binary to Git LFS (or re-adding the real file): ensure
.gitattributes includes the .cpp/.cubin pattern for LFS, run git lfs track for
that pattern and git add the file, run git lfs push/pull so the repository stores
the actual blob, and recommit the file so the repository contains the hydrated
binary rather than the pointer (reference file symbol:
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp).

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: Ensure this cubin file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
contains the same copyright/header verification logic used by the
PersistentContext variant; locate the header check or verification routine used
for the PersistentContext variant and replicate it here (add the same
copyright/license block and any runtime header validation), and ensure the file
exposes the same symbol(s)/macro(s) used for verification so automated checks
pass.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The committed file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
currently contains a Git LFS pointer (the three-line oid/version header) instead
of the real C++/CUDA source or compiled cubin, which blocks compilation; replace
the pointer with the actual file contents expected by the build (either the full
.cpp/.cu source or the actual binary cubin content) or remove the incorrect file
and add the correct artifact under the same filename so the build system can
find the real kernel implementation.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
contains Git LFS pointer metadata instead of actual binary/source content and is
missing the required file header; replace the pointer text (the version/oid/size
block at the top) with the actual compiled/source content or remove the pointer
file from the repo and add the correct binary artifact, and add the required
file header comment (license/ownership/build metadata) to the top of
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
so the build and compliance checks succeed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: This file contains an LFS pointer instead of the actual
binary C++ content for
FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp;
fix by ensuring the real binary is committed via Git LFS for this filename:
install/enable Git LFS in your environment, add the filename (or appropriate
pattern) to .gitattributes so .cpp binaries are tracked, run git lfs track for
the pattern, re-add the actual file content (git add), commit and push so the
repository stores the actual LFS-managed asset instead of the pointer.
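Several comments above hinge on the .gitattributes step; `git lfs track` ultimately just appends an attributes line like the one below, which can be added directly (the path pattern is an assumption about this repo's layout):

```shell
# Hedged sketch: idempotently add an LFS tracking rule for the cubin .cpp blobs.
workdir=$(mktemp -d)
cd "$workdir"
attr='cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp filter=lfs diff=lfs merge=lfs -text'
# Append only if the exact line is not already present.
grep -qxF "$attr" .gitattributes 2>/dev/null || printf '%s\n' "$attr" >> .gitattributes
cat .gitattributes
```

After this, `git add` on a matching file stores an LFS pointer in the index and uploads the blob on push.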

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The committed file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
contains a Git LFS pointer instead of real content; remove this pointer commit
and replace it with the correct artifact: either (a) add the real C++ source
content if this is intended to be source code, or (b) if this is a binary kernel
blob, give it a non-.cpp extension and ensure it is properly tracked with Git
LFS (use git lfs track and update .gitattributes), then re-add the actual binary
file so the repository stores the real file rather than a pointer. Ensure the
final commit contains the actual file content for the named file rather than an
LFS pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: This file currently contains Git LFS pointer text instead of
the actual C++ source and is missing the required source file header; replace
the pointer lines at the top of
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
with the proper C++ file header (copyright, SPDX license identifier, brief
description) and ensure the file contains the actual compiled code or a valid
C++ wrapper that loads an external .cubin asset; if the intent was to store a
binary blob, move the binary into a separate resource (e.g., .cubin or assets
directory) and update functions that reference it to load that file instead
(adjust any loader/initializer functions that reference this cubin file name
accordingly).

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
contains non-source pointer text at the top (lines 1–3) and lacks the required
project file header; remove the git-lfs pointer block at the top and prepend the
standard required file header used across the repo (copyright/license/author and
brief description), ensuring the file begins with the proper header comment
instead of the oid/version lines so builds/tooling and compliance checks will
pass.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The uploaded cubin file is a Git LFS pointer instead of the
actual binary, causing CI/hydration failures; replace the pointer with the real
binary (or re-add the file via Git LFS) for the artifact named
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp:
remove the pointer commit and re-commit the real .cubin using git lfs track if
not already tracked (or run git lfs migrate import --include="*.cubin"), push
LFS objects to the remote, and verify CI can download the file (git lfs pull) so
analyzers see the real binary rather than the pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp`:
- Around line 1-3: This PR includes a large binary blob (the cubin file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp)
that appears as an LFS pointer concern; remove the raw binary from the commit
history and re-add it via Git LFS tracking: remove the file from the repo (git
rm --cached), add a git-lfs tracking rule for this filename or pattern, commit
the pointer file, and push; alternatively move the artifact to your external
large-artifact store and reference it from the repo and update any
loading/tooling code that expects this cubin file to use the external location.
Ensure .gitattributes contains the LFS pattern for this file name so future
binaries are stored correctly.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is currently an LFS pointer (not actual C++), which breaks compilation; replace
the pointer with the real generated C++/CUBIN content or ensure Git LFS is used
to fetch the real file before build. Locate the entry for this file in your
repo/build (the file named
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp)
and either commit the actual compiled/binary content expected by the toolchain
or update CI/build scripts to run `git lfs pull` (or equivalent) so the true
file is present at build time; alternatively remove the file from the build if
it was added mistakenly.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: This file is a Git LFS pointer (not real C++ source) and must
be excluded from compilation to avoid false-positive static analysis; update the
build system rule that compiles fmha kernels (the same rule changed for the
StaticContext variant) to ignore the file named
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
(or any *.cpp files that are LFS pointers) by adding an exclusion or guard in
the kernel sources list or CMake/BUILD file so the pointer file is not passed to
Clang/Cppcheck, mirroring the StaticContext fix.
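The exclusion guard suggested above can be prototyped outside the build files; a minimal sketch that filters pointer files out of a compile list before it reaches the analyzer (the file names and `compiling:` output are illustrative, not the repo's actual rule):

```shell
# Hedged sketch: skip any *_cubin.cpp that is still an LFS pointer before
# passing sources to the compiler/analyzer, mirroring the StaticContext guard.
workdir=$(mktemp -d)
cd "$workdir"
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:0\nsize 1\n' > pointer_cubin.cpp
printf '// hydrated source\n' > real_cubin.cpp
sources=""
for f in *_cubin.cpp; do
    if head -n 1 "$f" | grep -q '^version https://git-lfs.github.com/spec'; then
        echo "skipping un-hydrated pointer: $f" >&2
    else
        sources="$sources $f"
    fi
done
echo "compiling:$sources"
```

The same predicate could live in a CMake `foreach` over the kernel sources list.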

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: This file contains a Git LFS pointer (lines starting with
"version https://git-lfs.github.com/spec/v1" and oid sha256:9535f9bce640...) and
needs the same hydration/exclusion treatment applied to the other cubin pointer
file: remove the pointer commit and either replace it with the actual hydrated
cubin stored in LFS (track the filename with git-lfs and recommit the real
binary) or exclude the binary from the repo (add the filename to
.gitattributes/.gitignore or adjust export rules) so the repo no longer contains
an LFS pointer; ensure the cubin pointer file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is updated consistently with the prior fix.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
currently contains a Git LFS pointer (the version/oid/size header) instead of
actual C++ source which blocks compilation; replace the LFS pointer with the
real compiled CUDA/C++ binary or source contents (or remove and add the correct
.cpp/.cubin artifact) so the build receives a valid artifact, ensuring the real
file content is committed in place of the pointer referenced by the oid.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: This file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
is a git-LFS pointer (not the actual binary) and will break CI/build; either
ensure the real cubin is hydrated during CI or remove it from LFS/tracking and
commit the real artifact. Fix by updating the build/CI pipeline to run git lfs
fetch && git lfs checkout (or your repo's equivalent) before build steps that
need this cubin, or remove the file from LFS and re-add the actual binary
content (adjust .gitattributes accordingly) so the build sees the real cubin
rather than the pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
contains pointer text (LFS pointer) instead of the real binary/source and is
also missing the required file header; replace the pointer content with the
actual .cpp/cubin content or remove the binary from the repo (store via Git LFS
correctly) and add the mandated file header (license, SPDX identifier, generated
file notice) at the top of the file; ensure the file committed is the real
artifact (or a small generated-source stub) and that the header is present in
the FmhaSm100aKernel_QkInt8...StaticContext_cubin.cpp to satisfy
build/tooling/compliance.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: This file is an LFS pointer (contains "oid
sha256:a1358aaca1660c5d558ea4739bbc8481eb7b10a4c8ae122dd908528bee71184a") and
should receive the same treatment as the other cubin pointer: either hydrate the
actual binary into the repo (replace the pointer with the real cubin) or mark it
for exclusion in .gitattributes (or CI rules) so pointers are not treated as
source — update the cubin pointer entry matching the oid above to the real
binary or add the exclusion/hydration rule so the pointer file is no longer
committed as a plain LFS pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
contains a Git LFS pointer rather than the actual binary content; replace the
pointer commit by adding the actual binary to Git LFS and recommitting so the
repository stores the real LFS object. Fix this by tracking the .cpp/cubin
binary type with Git LFS (ensure .gitattributes includes the pattern), re-add
the real binary file, and run the appropriate git-lfs migrate/import workflow to
rewrite this commit so the entry becomes a valid LFS pointer referencing the
uploaded object, then force-push the corrected branch;
target the file named
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
when making these changes.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
is a git-lfs pointer instead of the actual compiled/source content; replace the
pointer-only blob with the real file (or remove the duplicate pointer commit) by
re-adding the true binary/source using git-lfs tracking (e.g., ensure .cpp is
tracked with git lfs track and re-commit the real file) or by removing the
duplicate LFS-pointer entry so the repository contains the actual artifact;
verify the fix by confirming the file content is no longer the small pointer
text and that the duplicate pointer issue is resolved.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: This file is a duplicate LFS-pointer entry of the same .cpp
observed earlier; remove the duplicate and ensure only a single LFS-tracked
pointer for
FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
remains in the branch: delete the redundant file/commit entry, re-add the
correct LFS pointer if needed, verify git-lfs tracking (.gitattributes) includes
the pattern for this artifact, and recommit so the repository contains one
consistent LFS pointer for that filename.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The commit contains a raw Git LFS pointer/blob (oid
sha256:4d1bda8605a03b1cf5455111bfc9a26736b250780b4590ae93db368b747f43e3) which
triggers the LFS-hydration/tooling duplicate issue; fix by removing the large
binary from the commit history and re-adding it under Git LFS tracking (update
.gitattributes, run git lfs track for the file pattern, use git rm --cached
<file> then git add and commit, or run git lfs migrate import --include to
rewrite history), or replace the committed binary with a lightweight
stub/resource and ensure CI/checkout steps run git lfs pull so the real artifact
is hydrated only when needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
is an LFS pointer (contains only the pointer metadata) instead of the real
binary/source; remove the incorrect pointer commit and re-add the real artifact:
run git lfs install (if not done), ensure the file is stored in LFS via git lfs
track for the appropriate pattern or place the actual .cpp/.cubin content in the
repo, then replace the pointer in the commit by deleting the pointer file,
adding the real file (git add), and amending or recommitting so the repository
contains the real file content rather than the LFS pointer. Ensure CI/tooling
that validates .cpp files sees the actual content after pushing the corrected
commit.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: This file currently contains a Git LFS pointer (starts with
"version https://git-lfs.github.com/spec/v1" and the oid
"ac07d546aa72a08408f2ecbe152b408813138abe86ca022076ea4e1acc097b6b") and must
receive the same hydration/exclusion treatment as the other cubin pointer:
either replace the pointer with the actual binary blob when committing artifacts
that require the binary, or exclude it from the repo and add a matching
.gitattributes entry so the cubin is tracked by LFS and CI is responsible for
hydrating it at build time; update the relevant build/CI steps to run git lfs
pull (or otherwise fetch the real cubin) when needed and ensure the pointer file
is not left as a committed placeholder in source branches (locate the pointer by
searching for the "version https://git-lfs.github.com/spec/v1" header and the
oid above).

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: This file is a Git LFS pointer (risk of being committed as
pointer rather than hydrated), so apply the same LFS-pointer handling as the
previous cubin: detect the pointer content in
FmhaSm100aKernel_QkvE4m3OBfloat16H256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
and either hydrate the actual binary into the repo or add it to
.gitattributes/.gitignore to exclude pointers from CI processing; update the
repo tooling or pre-commit hook that handled the earlier cubin pointer to also
process this symbol (the cubin pointer file name) so it is never left as an LFS
pointer in the tree.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: This file contains only a Git LFS pointer (the "version / oid
sha256 / size" lines) instead of the actual compiled/binary content; replace the
pointer with the real blob by ensuring the artifact is tracked and uploaded via
Git LFS: run git lfs track for the appropriate pattern (if missing), fetch/pull
the real LFS object (git lfs fetch && git lfs pull) or re-add the real file (git
add the actual binary/cubin), then commit and push so the repository stores the
proper LFS pointer referencing an uploaded object; verify with git lfs ls-files
that the file is tracked and not an accidental empty pointer before requesting
re-review.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
contains an LFS pointer blob on lines 1–3 instead of actual C++ source and is
missing the required file header; replace the pointer text with the real
generated C++/CUDA source (or re-checkout the actual file from LFS) and prepend
the standard project file header comment, ensuring the top of the file (in this
filename) no longer contains the oid/size pointer and includes the required
license/header metadata so tooling and compliance pass.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
currently contains a Git LFS pointer instead of the real binary source; replace
the pointer with the actual compiled binary (or the proper source) and ensure
Git LFS is used correctly by adding the pattern to .gitattributes, removing the
pointer-committed file from the index (git rm --cached), re-adding the real file
so it’s tracked by LFS, and recommitting; if history needs cleanup run git lfs
migrate import for that filename to rewrite prior commits so the real file is
stored in LFS rather than as a pointer in the repository.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: This file contains a Git LFS pointer (lines starting with
"version https://git-lfs.github.com/spec/v1", the "oid
sha256:6ca3c61176dbc8f405fe84a858700900a3e9d72c6ecc15f9b87de34530a7a32b" entry
and "size 812853") and should be treated the same as the other cubin LFS
pointers: either replace the pointer with the actual hydrated binary or exclude
the binary from the repo and keep a proper pointer handling rule; update the
commit to remove the raw LFS pointer content and add the real cubin (or add it
to .gitattributes/.gitignore per project policy) so the repository does not
contain unresolved LFS pointers for
FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file is committed as a Git LFS pointer (starts with
"version https://git-lfs.github.com/spec/v1") which breaks analyzers/CI; either
replace the pointer with the real binary blob for
FmhaSm100aKernel_QkvE4m3OE4m3H256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
or ensure CI performs a git lfs install && git lfs pull (or equivalent) before
build/tests so the actual .cubin content is present; verify by checking the file
no longer begins with the LFS header and re-run the failing analyzer/CI job.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The .cpp currently contains an LFS pointer instead of the
actual cubin binary and also is missing the required header include; replace the
pointer text at the top of
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
with the real cubin data (or generate/commit the compiled binary blob) and
ensure the matching header that declares the extern symbol(s) for this cubin is
included (add the same header include used in the other fixed kernel files and
reference the cubin symbol name used by the kernel loader).

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The uploaded cubin file
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
is a Git LFS pointer rather than the real binary, causing analyzer/CI hydration
failures; replace the pointer-only commit with the actual binary content (or
re-add it via git lfs track and commit the real file), ensure the repo's LFS
tracking includes this filename/pattern and that CI checks out LFS objects
(e.g., use actions/checkout with lfs:true or run git lfs pull in CI), and verify
with git lfs ls-files that the file is stored in LFS and the CI step
successfully retrieves the real file before analysis runs.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp`:
- Around line 1-3: The submitted cubin source file
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
currently contains a Git LFS pointer instead of the actual binary content and
lacks the required new-file header; fix by replacing the pointer content with
the real cubin data (or re-add the file properly through Git LFS so the real
file is present in the commit) and add the project's required new-file header at
the top of the file so the file is recognized as a proper new artifact in the
repo.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
is an LFS pointer (not real C++), which blocks compilation; replace the pointer
with the actual compiled cubin/binary content or remove the .cpp LFS pointer
from the source tree and add the real asset to the release artifacts, update any
references that expect a compiled kernel (search for the exact filename
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp)
so the build/tooling consumes the correct binary rather than the LFS pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is an LFS pointer rather than the real cubin; either hydrate it to the real
binary or remove it from LFS tracking exactly as done for the earlier pointer:
fetch and checkout the real object (git lfs fetch
--include=".../FmhaSm100aKernel_..._cubin.cpp" && git lfs checkout
"path/to/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp")
and commit the hydrated file, or run git lfs untrack for that cubin pattern,
update .gitattributes (remove the matching pattern), run git add --renormalize
and commit so the repository contains the actual cubin file and not an LFS
pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The submitted .cpp is a Git LFS pointer (it contains the LFS
header "version https://git-lfs.github.com/spec/v1" and an oid sha256) instead
of the real compiled/binary blob. Replace the pointer with the actual file
content by ensuring the file is properly stored via Git LFS (run git lfs track
for this filename pattern, git add the real binary, and commit/push so the LFS
object is uploaded); or, if the C++ source should be stored instead, regenerate
and commit the real source for
FmhaSm103aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext
(remove the pointer header and the oid lines), then re-add and push so the repo
contains the actual file content rather than an LFS pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: This file is committed as a Git LFS pointer only (causing
analyzer/CI to fail to hydrate the blob); replace the pointer with a properly
hydrated binary or update CI to fetch LFS objects: ensure the actual cubin
binary for
FmhaSm103aKernel_QkBfloat16VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin
is present in the repo (or re-add the real file to Git LFS and push), and update
CI job(s) to run "git lfs pull --include=<path>" or enable LFS fetch so the
analyzer sees the real file rather than the pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: A large LFS binary object (the committed blob starting with
"version https://git-lfs.github.com/spec/v1" and oid
sha256:ab369016b2c839ef25d7fa0bd2dd2593351a28968f65e9a34102728edb2f982b) was
accidentally committed. Remove the binary from the PR and replace it with a
proper Git LFS pointer or an external artifact reference: remove the blob from
the branch (use git rm --cached, or git filter-repo/BFG to purge it from
history), add an appropriate .gitattributes entry to track the matching file
pattern with LFS, re-add the file so Git stores only the LFS pointer, and push
the cleaned branch; or upload the binary to the release/storage bucket and
reference it in the repo instead.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
was committed as a Git LFS pointer stub instead of the real binary; fix by
replacing the pointer with the actual binary blob and ensuring Git LFS tracking
is correct: run git lfs install and git lfs pull (or re-add the real .cpp with
git lfs track configured for this pattern), then commit the real file so the
repository contains the actual content rather than the stub pointer; verify the
committed object is the full file by checking git lfs ls-files or cat the file
contents.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
currently contains only a Git LFS pointer instead of the real .cpp source;
replace the pointer with the actual hydrated/compiled source (or remove the
large binary and add the real source code) and re-commit so reviewers can
inspect the code. To fix: retrieve the real .cpp content from the original
source/build artifact, ensure .gitattributes/git-lfs settings are correct,
remove the pointer commit (or run git lfs migrate/import to convert the pointer
to the real file), and push the updated commit so the file contains the actual
C++ content rather than the LFS pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The large binary blob file
FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
was committed as raw content instead of a Git LFS pointer; remove the raw blob,
add the file pattern to .gitattributes and re-commit the kernel binary via Git
LFS (git rm --cached the large file, add and commit .gitattributes entry for
this cubin pattern, then git lfs track and re-add/commit the file) so the repo
stores only LFS pointers rather than the full binary. Ensure CI/tooling that
performs LFS hydration is configured to fetch LFS objects when checking out this
file.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
contains Git LFS pointer lines at the top and lacks the required source file
header; remove the LFS pointer block (the repeated "version/oid/size" lines)
from the start of the .cpp and replace it with your project's standard file
header (license, author, brief description, and any required annotations), and
ensure the file actually contains the compiled/binary data or—preferably—store
the binary in LFS and keep a thin C++ wrapper or generated header in repo;
update any build scripts or CMake targets referencing this symbol to point to
the proper artifact location.
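
For reference, the "thin C++ wrapper" pattern mentioned above usually looks like a generated translation unit that embeds the kernel image as a byte array. Everything below (symbol names, header placeholder, byte values) is illustrative, not the project's actual generated output:

```cpp
// <project license/copyright header goes here>
#include <cstddef>

extern "C"
{
// Compiled kernel image embedded as a byte array. Truncated to four
// placeholder bytes here; a real generated file carries the full cubin.
unsigned char const fmha_example_kernel_cubin[] = {0x7f, 0x45, 0x4c, 0x46};
size_t const fmha_example_kernel_cubin_len = sizeof(fmha_example_kernel_cubin);
}
```

The wrapper compiles like any other source file, so the build needs no special handling, while the raw binary itself can stay in LFS or release artifacts.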

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The checked-in file is an LFS pointer (lines containing
"version", "oid", "size") rather than the actual C++/cubin content, causing the
compile/tooling blocker; replace the pointer with the real file contents (or
ensure Git LFS is used to populate the real blob) for
FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
by either committing the actual compiled/source file or updating CI to run git
lfs pull so the build sees the real file instead of the pointer.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
is a git-lfs pointer without the required NVIDIA copyright header; fix by
providing the actual cubin blob that contains the mandated copyright header (or,
per the project-level resolution, add the approved header via the agreed
workflow such as replacing the LFS pointer with the real file or adding an
accompanying header file and updating the lfs commit), and update the commit for
FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
to include that header.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp`:
- Around line 1-3: This PR contains a large binary
(FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp)
that appears to be incorrectly committed instead of tracked/packaged via Git LFS
or release artifacts; remove the binary from the commit history and ensure it is
properly tracked or delivered via artifacts. Fix by removing the file from the
commit (or replacing it with the small LFS pointer), add a .gitattributes entry
for the cubin pattern (e.g., *.cubin) so Git LFS tracks future binaries, re-add
the file through LFS (or move it to CI/release artifacts) and rewrite the branch
history so the large blob is not in the repo history; reference the filename
FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
when making these changes.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm103aKernel_QkvE4m3OBfloat16H256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp`:
- Around line 1-3: The file
FmhaSm103aKernel_QkvE4m3OBfloat16H256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
is committed as a Git LFS pointer instead of the actual binary, causing
analyzer/CI hydration failures; replace the pointer with the real binary by
ensuring Git LFS is set up and the file is hydrated (run git lfs install && git
lfs pull or re-add the real .cubin via git lfs track and git add/commit), or
alternatively vendor the actual binary into the repo/artifacts so CI can access
it; after hydrating, re-run the pipeline to confirm the analyzer no longer
fails.

---

Nitpick comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/kernelMetaInfoVx.h`:
- Line 27: Replace the preprocessor macro TLLM_GEN_VERSION with a typed
compile-time constant to give it proper scope and compiler visibility; locate
the definition of TLLM_GEN_VERSION in kernelMetaInfoVx.h and change it from a
`#define` to a constexpr (e.g., constexpr const char* or constexpr
std::string_view) named TLLM_GEN_VERSION, keeping the same string value and
ensuring linkage/visibility is appropriate for header usage.
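
A minimal sketch of the suggested change; the version string below is a placeholder, not the real TLLM_GEN_VERSION value:

```cpp
#include <string_view>

// Before: a raw macro, which has no type or scope and leaks into every TU.
// #define TLLM_GEN_VERSION "trtllm-gen 1.0"

// After: a typed compile-time constant that the compiler can see and check;
// `inline` keeps it safe to define in a header without ODR violations.
inline constexpr std::string_view TLLM_GEN_VERSION{"trtllm-gen 1.0"}; // placeholder value

static_assert(!TLLM_GEN_VERSION.empty(), "version string must be set");
```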

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h`:
- Around line 45-53: The HasSageAttnParams trait only checks for sageAttnSfsQPtr
but the guarded code reads other SageAttn members; update HasSageAttnParams to
require all SageAttn fields used in the guarded block (e.g. include checks for
sageAttnSfsKPtr, sageAttnSfsVPtr and any log/index fields like sageAttnLogPtr or
sageAttnLogIdx that the code accesses) by replacing the single decltype check
with a std::void_t of decltype(...) for each accessed member so the trait is
true only when every referenced SageAttn member is present.
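
The detection-idiom fix described above can be sketched as follows. The member names mirror those mentioned in the comment, while the structs and the exact field list are illustrative assumptions, not the real TRT-LLM types:

```cpp
#include <type_traits>
#include <utility>

// Illustrative params structs (not the real kernelParams.h types).
struct WithSage
{
    void* sageAttnSfsQPtr;
    void* sageAttnSfsKPtr;
    void* sageAttnSfsVPtr;
};

struct PartialSage
{
    void* sageAttnSfsQPtr; // only one of the fields is present
};

// Primary template: false unless the specialization below matches.
template <typename T, typename = void>
struct HasSageAttnParams : std::false_type
{
};

// Matches only when *every* referenced SageAttn member is well-formed,
// so the trait cannot be true for a type that has just sageAttnSfsQPtr.
template <typename T>
struct HasSageAttnParams<T,
    std::void_t<decltype(std::declval<T>().sageAttnSfsQPtr),
        decltype(std::declval<T>().sageAttnSfsKPtr),
        decltype(std::declval<T>().sageAttnSfsVPtr)>> : std::true_type
{
};

static_assert(HasSageAttnParams<WithSage>::value, "all fields present");
static_assert(!HasSageAttnParams<PartialSage>::value, "partial fields must disable the trait");
```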

Comment thread cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h Outdated
Comment thread cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h Outdated
Comment thread cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp
Comment thread cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParams.h Outdated
@zhenhuaw-me zhenhuaw-me requested a review from chang-l February 26, 2026 01:43
@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/sage_attn branch from 7b6dbd4 to 98bc80e Compare February 26, 2026 06:58
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
@chang-l chang-l requested a review from PerkzZheng February 27, 2026 17:23
@chang-l
Collaborator

chang-l commented Feb 27, 2026

@PerkzZheng @yuxianq can you please take a look at this PR?

@chang-l chang-l requested a review from yuxianq February 27, 2026 17:39
Comment thread cpp/tensorrt_llm/thop/attentionOp.cpp
@PerkzZheng PerkzZheng requested a review from xueweilnvidia March 2, 2026 02:18
@PerkzZheng
Collaborator

> @PerkzZheng @yuxianq can you please take a look at this PR?

Thanks. Also adding @xueweilnvidia for visibility.

Collaborator

@PerkzZheng PerkzZheng left a comment


@xrq-phys do you know if this is required by any customers that are using TRTLLM? I am wondering if we could just reuse the FI efforts.
Adding @wenmingw for opinions, as they are driving the FI integration effort. Sage attention kernels can be used as a standalone operation without RoPE / KV-cache updates.

Comment thread cpp/tensorrt_llm/common/attentionOp.cpp
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
Comment thread tensorrt_llm/_torch/attention_backend/trtllm.py Outdated
@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/sage_attn branch from 3831dcc to 88efcb9 Compare March 10, 2026 14:11
@xrq-phys
Collaborator Author

Change rebased.

@chang-l please check that the latest commit contains all your intended changes. I'm not 100% sure about the fidelity of my rework of your commits.

Thanks so much!

@xrq-phys xrq-phys marked this pull request as ready for review March 10, 2026 14:13
@xrq-phys xrq-phys requested review from a team as code owners March 10, 2026 14:13
@zhenhuaw-me
Member

/bot run

@xrq-phys xrq-phys force-pushed the ruqingx/visual_gen/sage_attn branch from 88bd342 to 128e392 Compare March 30, 2026 03:58
@xrq-phys
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40652 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40652 [ run ] completed with state SUCCESS. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #31687 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhenhuaw-me
Member

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40868 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40868 [ run ] completed with state SUCCESS. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #31875 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xrq-phys
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40945 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40945 [ run ] completed with state SUCCESS. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #31933 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhenhuaw-me
Member

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41071 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41071 [ run ] completed with state FAILURE. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #32047 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xrq-phys
Collaborator Author

xrq-phys commented Apr 1, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41107 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41107 [ run ] completed with state FAILURE. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #32081 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xrq-phys
Collaborator Author

xrq-phys commented Apr 1, 2026

The failed test runs normally on internal environments:

tests/integration/defs/accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype_with_helix

@xrq-phys
Collaborator Author

xrq-phys commented Apr 1, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41179 [ run ] triggered by Bot. Commit: 128e392 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41179 [ run ] completed with state SUCCESS. Commit: 128e392
/LLM/main/L0_MergeRequest_PR pipeline #32142 completed with status: 'SUCCESS'

CI Report

Link to invocation

@zhenhuaw-me zhenhuaw-me merged commit 1b66e96 into NVIDIA:main Apr 2, 2026
5 checks passed
yunruis added a commit to yunruis/TensorRT-LLM that referenced this pull request Apr 2, 2026
…AttentionOp API (NVIDIA#11718)"

This reverts commit 1b66e96.

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
yuxianq pushed a commit that referenced this pull request Apr 2, 2026
…Integrate into AttentionOp API (#11718)" (#12679)

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…nOp API (NVIDIA#11718)

Signed-off-by: Ruqing Xu <ruqingx@nvidia.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…Integrate into AttentionOp API (NVIDIA#11718)" (NVIDIA#12679)

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
7 participants