[TIR] Add CUDA int4 tensor core intrinsics #14598
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Hzfengsy
left a comment
LGTM. But I want to remind you that int4 Tensor Core support has been removed from 4th-generation Tensor Cores (RTX 40 series and Hopper).
@Hzfengsy, int4 Tensor Cores are still supported in the RTX 40 series, per the Ada whitepaper.
yzh119
left a comment
A slight issue, otherwise LGTM.
    *get_wmma_sync_intrin(16, 16, 16, "int8", "int32", True),
)

WMMA_SYNC_8x8x32_s4s4s32_TRANS_INTRIN = "wmma_sync_8x8x32_s4s4s32_trans"
"wmma_sync_8x8x32_s4s4s32" is missing.
Sub-byte tensor core operations only allow A in row-major and B in column-major layout, so only the `_trans` variant can be provided.
Oh that's interesting! Maybe we can leave a note somewhere.
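One possible shape for such a note, sketched in plain Python: a guard that documents the layout restriction right where intrinsic names are built. The helper `wmma_sync_intrin_name` and the `SUB_BYTE_DTYPES` set are hypothetical and not part of TVM; they only mirror the naming pattern visible in this thread.

```python
# Hypothetical helper, not TVM API: it mirrors the naming pattern seen in this
# thread, e.g. "wmma_sync_8x8x32_s4s4s32_trans".
SUB_BYTE_DTYPES = {"int4"}  # assumption: the only sub-byte dtype discussed here

# Short dtype tags as they appear in the intrinsic names above.
_SHORT = {"int4": "s4", "int8": "s8", "int32": "s32"}


def wmma_sync_intrin_name(m, n, k, in_dtype, out_dtype, b_transposed):
    """Build an intrinsic name, rejecting layouts that sub-byte WMMA cannot express."""
    if in_dtype in SUB_BYTE_DTYPES and not b_transposed:
        # NOTE: sub-byte tensor core only allows A in row major and B in col
        # major, so there is no non-transposed variant for these dtypes.
        raise ValueError(
            f"{in_dtype} WMMA requires row-major A and column-major B; "
            "only the '_trans' variant exists"
        )
    name = f"wmma_sync_{m}x{n}x{k}_{_SHORT[in_dtype]}{_SHORT[in_dtype]}{_SHORT[out_dtype]}"
    return (name + "_trans") if b_transposed else name


print(wmma_sync_intrin_name(8, 8, 32, "int4", "int32", True))
```

A guard like this turns the "missing" non-transposed int4 intrinsic into an explicit error with an explanation, instead of leaving readers to wonder why it was never registered.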
This PR adds int4 tensor intrinsics for CUDA Tensor Cores.
cc @junrushao @tqchen @masahi
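For readers unfamiliar with the `_trans` suffix, here is a minimal pure-Python reference of what an 8x8x32 `s4s4s32` transposed multiply-accumulate computes. It is a sketch, not the PR's implementation: the tile shape and dtypes are read off the intrinsic name, and the assumption is that B is stored as an N x K buffer (that is, column-major K x N), matching the layout restriction discussed in the review. The function name `wmma_sync_s4_trans_reference` is made up for illustration.

```python
M, N, K = 8, 8, 32          # tile shape taken from "8x8x32" in the intrinsic name
INT4_MIN, INT4_MAX = -8, 7  # value range of a signed 4-bit integer


def wmma_sync_s4_trans_reference(a, b, c):
    """Accumulate C[i][j] += sum_k A[i][k] * B[j][k] in place.

    a: M x K nested list (row-major A)
    b: N x K nested list (B stored transposed, i.e. column-major K x N)
    c: M x N nested list of integer accumulators
    """
    for matrix in (a, b):
        for row in matrix:
            if any(not INT4_MIN <= v <= INT4_MAX for v in row):
                raise ValueError("operand value does not fit in signed int4")
    for i in range(M):
        for j in range(N):
            # Both operands are indexed [row][k]: that is the "trans" layout.
            c[i][j] += sum(a[i][k] * b[j][k] for k in range(K))
    return c


a = [[1] * K for _ in range(M)]
b = [[j - 4] * K for j in range(N)]  # row j of the stored B holds the value j - 4
c = wmma_sync_s4_trans_reference(a, b, [[0] * N for _ in range(M)])
print(c[0][0], c[0][7])
```

Because B is indexed as `b[j][k]`, each output column `j` depends only on row `j` of the stored B, which is what distinguishes the transposed variant from a plain `A @ B`.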