[Hopper TMA] Add CUDA codegen support for bulk asynchronous copy#15656

Merged
masahi merged 2 commits intoapache:mainfrom
adstraw:straw-cp-async-bulk
Sep 5, 2023
Merged


adstraw (Contributor) commented on Sep 1, 2023

Adds CUDA codegen support for bulk asynchronous copy, a family of new instructions introduced with Hopper. Also includes some cleanup of PR #15616 in the form of comments and tests. Notably, this PR does not include any TIR transform work for lowering to the new bulk asynchronous copy instructions; that will come in a future PR. Also note the "workaround" and TODO addressing the lack of CUDA codegen support for allocation alignment.
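The alignment TODO matters because, as I read the PTX ISA, `cp.async.bulk` requires 16-byte-aligned source and destination addresses and a byte count that is a multiple of 16. The sketch below illustrates that arithmetic only: `align_up`, `is_valid_bulk_copy`, and `pack_shared_buffers` are hypothetical helpers, not TVM APIs, and this is not the workaround used in this PR.

```python
# Hypothetical helpers (not part of TVM) illustrating the alignment rules
# assumed for cp.async.bulk: 16-byte-aligned addresses, size a multiple of 16.
BULK_COPY_ALIGN = 16  # bytes; assumption based on the PTX ISA constraints


def align_up(offset: int, alignment: int = BULK_COPY_ALIGN) -> int:
    """Round a non-negative byte offset up to the next multiple of `alignment`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Power-of-two alignment lets us round up with a mask.
    return (offset + alignment - 1) & ~(alignment - 1)


def is_valid_bulk_copy(dst_offset: int, src_offset: int, nbytes: int) -> bool:
    """Check the assumed alignment and size rules for one bulk copy."""
    return (
        nbytes > 0
        and nbytes % BULK_COPY_ALIGN == 0
        and dst_offset % BULK_COPY_ALIGN == 0
        and src_offset % BULK_COPY_ALIGN == 0
    )


def pack_shared_buffers(sizes):
    """Lay out shared-memory buffers back to back, each starting 16-byte aligned.

    Returns the list of start offsets and the total (aligned) allocation size.
    """
    offsets, cursor = [], 0
    for size in sizes:
        cursor = align_up(cursor)  # pad so this buffer starts aligned
        offsets.append(cursor)
        cursor += size
    return offsets, align_up(cursor)
```

For example, `pack_shared_buffers([4, 100, 64])` places the buffers at offsets 0, 16, and 128 with a total of 192 bytes; without the padding, the second and third buffers would start at misaligned offsets 4 and 104.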

Review comment threads on python/tvm/tir/op.py (outdated) and include/tvm/tir/builtin.h
masahi merged commit d26fdcf into apache:main on Sep 5, 2023
adstraw deleted the straw-cp-async-bulk branch on September 6, 2023, 11:39

2 participants