[CUTLASS] Refactor cutlass kernel generation and selection #9800
Merged
comaniac merged 5 commits into apache:main on Dec 30, 2021
Author (Member):
@comaniac Can you take a look? There is no functional change, so it should be easy to review. The CUTLASS-side change that enables residual block fusion was merged yesterday in NVIDIA/cutlass#391, so I'm ready to send residual fusion support (with a good speedup!).
ylc pushed a commit to ylc/tvm that referenced this pull request on Jan 7, 2022.
ylc pushed a commit to ylc/tvm that referenced this pull request on Jan 13, 2022.
Currently, when we enumerate CUTLASS kernels for profiling, we generate, for each parameter configuration, every variant of the kernel with a different epilogue. See, for example, lines 67 to 106 of `tvm/python/tvm/contrib/cutlass/gen_gemm.py` at commit 1afcf36.
After profiling, we select which epilogue variant to use based on the pattern name; see lines 219 to 230 of `tvm/python/tvm/contrib/cutlass/build.py` at commit 1afcf36.
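To make the current scheme concrete, here is a minimal sketch of it, assuming a handful of made-up tile configurations and epilogue suffixes. The names `TILE_CONFIGS`, `EPILOGUES`, `generate_all_variants`, `select_variant`, and the pattern-to-suffix table are all hypothetical and are not the identifiers used in `gen_gemm.py` or `build.py`. It shows that the number of generated kernels is the product of configurations and epilogues, and that selection is a lookup keyed on the pattern name.

```python
from itertools import product

# Hypothetical tile configurations and epilogue suffixes (illustrative only;
# not the actual values or identifiers used in gen_gemm.py).
TILE_CONFIGS = ["128x128_32x2", "128x64_32x2", "64x64_32x2"]
EPILOGUES = ["", "_bias", "_bias_relu", "_bias_gelu"]

# Hypothetical mapping from a pattern name to the epilogue suffix it needs.
PATTERN_TO_SUFFIX = {
    "cutlass.dense": "",
    "cutlass.dense_bias": "_bias",
    "cutlass.dense_bias_relu": "_bias_relu",
    "cutlass.dense_bias_gelu": "_bias_gelu",
}


def generate_all_variants(configs, epilogues):
    """Current scheme: emit one kernel per (config, epilogue) pair up front."""
    return [f"gemm_{cfg}{epi}" for cfg, epi in product(configs, epilogues)]


def select_variant(variants, best_config, pattern_name):
    """After profiling picks best_config, choose the epilogue variant
    by mapping the pattern name to a kernel-name suffix."""
    wanted = f"gemm_{best_config}{PATTERN_TO_SUFFIX[pattern_name]}"
    if wanted not in variants:
        raise KeyError(f"no generated kernel named {wanted}")
    return wanted


variants = generate_all_variants(TILE_CONFIGS, EPILOGUES)
print(len(variants))  # 3 configs x 4 epilogues = 12 kernels
print(select_variant(variants, "128x64_32x2", "cutlass.dense_bias_relu"))
# gemm_128x64_32x2_bias_relu
```

Every new epilogue multiplies the kernel count, and every new pattern needs another entry in the lookup table.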
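This approach does not scale once we add support for residual connection fusion: there are so many different kinds of epilogues that generating every variant for every parameter configuration, and selecting among them by pattern name, becomes impractical.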
The idea of this change is to split kernel generation into two steps:
(1) First, generate all kernels without any epilogue. These are used only for profiling.
(2) After profiling decides the best parameter configuration, use that configuration to generate a single kernel with the required epilogue.
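The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `GemmConfig`, `enumerate_configs`, `emit_kernel`, and `select_and_generate` are hypothetical names, the profiler is passed in as a plain callable, and the timings below are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GemmConfig:
    """Hypothetical parameter configuration (tile shape and pipeline stages)."""
    tile_m: int
    tile_n: int
    tile_k: int
    stages: int


def enumerate_configs():
    # Hypothetical search space; the real one is much larger.
    return [
        GemmConfig(128, 128, 32, 2),
        GemmConfig(128, 64, 32, 2),
        GemmConfig(64, 64, 32, 2),
    ]


def emit_kernel(config: GemmConfig, epilogue: Optional[str] = None) -> str:
    """Stand-in for code generation: returns a kernel name instead of source."""
    name = f"gemm_{config.tile_m}x{config.tile_n}_{config.tile_k}x{config.stages}"
    return name if epilogue is None else f"{name}_{epilogue}"


def select_and_generate(profile: Callable[[str], float], epilogue: str):
    configs = enumerate_configs()
    # Step 1: profile only epilogue-free kernels, one per configuration.
    best = min(configs, key=lambda c: profile(emit_kernel(c)))
    # Step 2: generate a single kernel for the winner with the needed epilogue.
    return best, emit_kernel(best, epilogue)


# Made-up timings (ms) standing in for real profiler measurements.
fake_timings_ms = {
    "gemm_128x128_32x2": 0.41,
    "gemm_128x64_32x2": 0.35,
    "gemm_64x64_32x2": 0.52,
}
best, kernel = select_and_generate(fake_timings_ms.__getitem__, "bias_residual_add_relu")
print(kernel)  # gemm_128x64_32x2_bias_residual_add_relu
```

The number of profiled kernels now depends only on the size of the configuration space, so adding a new epilogue (for example a residual add) costs one extra kernel at generation time rather than one per configuration. This relies on the assumption that the epilogue has little effect on which configuration is fastest.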
Overall, I believe this refactoring of kernel generation and selection has made things much cleaner and leaves us well prepared for residual block fusion.
cc @comaniac @Laurawly