
[MetaSchedule] Introduce MMA Tensor Core Multilevel Tiling #14673

Merged
Hzfengsy merged 12 commits into apache:main from cblmemo:mma-auto on Jun 28, 2023

Conversation

cblmemo (Contributor) commented Apr 19, 2023

This PR introduces MMA intrinsics into the multilevel tiling tensor core schedule rule.

For the benchmark results, please refer to https://docs.google.com/spreadsheets/d/1thf1jsbX87WokRfESXO14fx40H3vYHDk6EWkb_wnv5Y

For all tuning logs, the best-performing schedule scripts, and the Python tuning & benchmarking scripts, please refer to https://github.com/cblmemo/TVMGemmAsync/tree/main/mma
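
As a rough guide, a tuning run that exercises this rule might look like the sketch below; the workload, shapes, trial budget, and work directory are illustrative, and it assumes the new MMA rule participates in the default CUDA tensor core search space.

import tvm
from tvm import te, meta_schedule as ms

# Hypothetical fp16 GEMM workload; names and shapes are illustrative.
def matmul_fp16(n, m, k):
    A = te.placeholder((n, k), name="A", dtype="float16")
    B = te.placeholder((k, m), name="B", dtype="float16")
    r = te.reduce_axis((0, k), name="r")
    C = te.compute((n, m), lambda i, j: te.sum(A[i, r] * B[r, j], axis=r), name="C")
    return te.create_prim_func([A, B, C])

mod = matmul_fp16(1024, 1024, 1024)
target = tvm.target.Target("nvidia/nvidia-a100")

# Search, then compile the best record found for this workload.
database = ms.tune_tir(mod=mod, target=target, work_dir="./tune_mma", max_trials_global=1000)
sch = ms.tir_integration.compile_tir(database, mod, target)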

tvm-bot (Collaborator) commented Apr 19, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

# The *_desc / *_impl TIR functions are defined alongside these registrations (elided here).
TensorIntrin.register("m16n8k8_sync", m16n8k8_sync_desc, m16n8k8_sync_impl)
TensorIntrin.register(
    "m16n8k8_store_C_row_major", m16n8k8_store_C_row_major_desc, m16n8k8_store_C_row_major_impl
)
Member commented:
Why can't the existing intrinsic definitions for m16n16k16 be used?
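
For context, names registered via TensorIntrin.register, for any mma shape, are later consumed by schedule primitives such as sch.tensorize. A minimal sketch, assuming a hypothetical tir.Schedule over a matmul already tiled down to a 16x8x8 fragment; the block name and loop structure are not from this PR's tests:

from tvm import tir

# Assume `sch` is a tir.Schedule over a matmul whose innermost loops already
# form a 16x8x8 fragment (hypothetical setup).
def tensorize_fragment(sch: tir.Schedule) -> None:
    block = sch.get_block("C")  # "C" is a hypothetical block name
    i16, j8, k8 = sch.get_loops(block)[-3:]
    # Apply the intrinsic registered above to the 16x8x8 tile.
    sch.tensorize(i16, "m16n8k8_sync")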

cblmemo marked this pull request as ready for review on Jun 1, 2023
cblmemo changed the title from "[WIP][MetaSchedule] Introduce MMA Tensor Core Multilevel Tiling" to "[MetaSchedule] Introduce MMA Tensor Core Multilevel Tiling" on Jun 1, 2023
FrozenGene (Member) commented
How can I reproduce it? I would like to test it on an A100. @cblmemo

cblmemo (Contributor, Author) commented Jun 2, 2023

> How can I reproduce it? I would like to test it on an A100. @cblmemo

You can find all the scripts you need at https://github.com/cblmemo/TVMGemmAsync/tree/main/mma 🫡 @FrozenGene

FrozenGene (Member) commented

> You can find all the scripts you need at https://github.com/cblmemo/TVMGemmAsync/tree/main/mma 🫡 @FrozenGene

@cblmemo Small bug: https://github.com/cblmemo/TVMGemmAsync/blob/main/mma/GemmRuleGenerate.py#L135 should be tensorcore_outputs, not outputs.

FrozenGene (Member) commented

@cblmemo Great job! I have tested it on an A100 with 1024x1024x1024; it achieves the same performance level as CUTLASS. If we could add more MMA variants, maybe we could achieve even better results.

cblmemo (Contributor, Author) commented Jun 3, 2023

> @cblmemo Small bug: https://github.com/cblmemo/TVMGemmAsync/blob/main/mma/GemmRuleGenerate.py#L135 should be tensorcore_outputs, not outputs.

@FrozenGene Thanks for pointing that out! I use a lot of directory names, and it appears that I made a mistake when uploading my script 🙂

cblmemo (Contributor, Author) commented Jun 3, 2023

> @cblmemo Great job! I have tested it on an A100 with 1024x1024x1024; it achieves the same performance level as CUTLASS. If we could add more MMA variants, maybe we could achieve even better results.

@FrozenGene Sure. m16n8k16 and the fp32 accumulator are WIP now 🧐

FrozenGene (Member) commented

> @FrozenGene Sure. m16n8k16 and the fp32 accumulator are WIP now 🧐

@cblmemo Sounds great! Also consider supporting signed (and unsigned) i8 * i8 -> i32 (accumulator) (m16n8k32/m8n8k16), which is commonly used in quantized models. If we have this, we could run more interesting benchmarks!
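
For reference, the s8 * s8 -> s32 workload suggested here can be written in TE roughly as below; this is a sketch with illustrative names and shapes, and the int32 accumulation matches the m16n8k32 mma shape mentioned above.

from tvm import te

n = m = k = 1024  # illustrative shapes

A = te.placeholder((n, k), name="A", dtype="int8")
B = te.placeholder((k, m), name="B", dtype="int8")
r = te.reduce_axis((0, k), name="r")
# s8 * s8 multiply with an s32 accumulator, as used in quantized models.
C = te.compute(
    (n, m),
    lambda i, j: te.sum(A[i, r].astype("int32") * B[r, j].astype("int32"), axis=r),
    name="C",
)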

Review comment threads:
- include/tvm/tir/transform.h (outdated)
- python/tvm/tir/schedule/schedule.py
- python/tvm/tir/tensor_intrin/cuda.py (outdated)
- src/driver/driver_api.cc
cblmemo force-pushed the mma-auto branch 2 times, most recently from 996063c to 8ad16d9 on Jun 25, 2023
Hzfengsy (Member) left a review:

LGTM

Review comment threads:
- src/tir/transforms/inject_permuted_layout.cc (outdated, 2 threads)
Hzfengsy (Member) commented

Please take another look if you are interested: @spectrometerHBH @masahi @vinx13 @FrozenGene @junrushao

cblmemo and others added 5 commits on Jun 26, 2023
Hzfengsy merged commit c8f5595 into apache:main on Jun 28, 2023
Hzfengsy (Member) commented

Thanks @cblmemo for the excellent work, and thanks to @spectrometerHBH and @FrozenGene for the reviews!
