[KVCache] Fix attention kernel for ROCm #16551

Merged
tqchen merged 1 commit into apache:main from MasterJH5574:tvm-dev/2024-02-09-rocm-kv-cache
Feb 11, 2024

Conversation

@MasterJH5574
Contributor

When compiling a TensorIR function to the ROCm backend, we need to be careful to store and load local registers in a consistent manner (using either vector access or scalar access throughout); otherwise there will be correctness issues.

This PR fixes the attention kernel for ROCm. It also adds tests for the float32 dtype and for head dimensions other than 128. A sketch of the access pattern in question follows below.
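For illustration, here is a minimal TVMScript sketch of the pattern the description refers to. It is a hypothetical copy kernel (the function, buffer names, and shapes are made up for this example, not taken from the PR's actual attention kernel): a local-scope register buffer is written with vectorized stores and read back with vectorized loads, so both sides of the access agree.

```python
# A minimal sketch, NOT the PR's actual attention kernel: it only
# illustrates the consistent-access rule for local registers on ROCm.
import tvm
from tvm.script import tir as T


@T.prim_func
def copy_via_local(A: T.Buffer((128,), "float16"),
                   B: T.Buffer((128,), "float16")):
    T.func_attr({"global_symbol": "copy_via_local", "tir.noalias": True})
    for bx in T.thread_binding(1, thread="blockIdx.x"):
        for tx in T.thread_binding(16, thread="threadIdx.x"):
            with T.block("copy"):
                # Per-thread registers ("local" scope).
                regs = T.alloc_buffer((8,), "float16", scope="local")
                # Vectorized store into the registers ...
                for v in T.vectorized(8):
                    regs[v] = A[tx * 8 + v]
                # ... paired with a vectorized load back out. Mixing a
                # vectorized store with scalar loads (or vice versa) is
                # the kind of inconsistency that produces wrong results
                # on ROCm, per the description above.
                for v in T.vectorized(8):
                    B[tx * 8 + v] = regs[v]
```

Something like `tvm.build(copy_via_local, target="rocm")` would exercise the relevant codegen path; the point of the sketch is only that the store loop and the load loop over `regs` use the same (vectorized) access style.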

@tqchen tqchen merged commit 0449a16 into apache:main Feb 11, 2024