Skip to content

Cherry-pick RDNA1 fix into 6.1 release#1916

Closed
GZGavinZhao wants to merge 1 commit intoROCm:release/rocm-rel-6.1from
GZGavinZhao:rocm-rel-6.1-RDNA1-fix
Closed

Cherry-pick RDNA1 fix into 6.1 release#1916
GZGavinZhao wants to merge 1 commit intoROCm:release/rocm-rel-6.1from
GZGavinZhao:rocm-rel-6.1-RDNA1-fix

Conversation

@GZGavinZhao
Copy link
Contributor

This PR requests to cherry-pick #1897 into ROCm 6.1 release.

#1897 enables RDNA1 (gfx101*) users to run rocBLAS with the rocblas package from AMD's official repository. Since rocBLAS is almost ubiquitous in ML workflows, this means that all RDNA1 users are essentially unable to run any programs related to ML right now (ROCm/ROCm#2527).

Admittedly, RDNA1 GPUs are not officially supported, but this is such a small change that would benefit all users of RDNA1 GPUs, so I believe is very worth it.

This is a NFC for all architectures besides gfx1010.

#1897 requires #1888 to function properly, but it seems like #1888 is already cherry-picked into the release/rocm-rel-6.1 branch in #1905, so cherry-picking #1897 would not break anything.

Fixes ROCm#1757.

Enables architectures that don't have optimized logic files to also produce
libraries when `--separate-architectures` or `--lazy-library-loading` is
turned on. Previously, one must disable both of these two flags in order for
rocBLAS to run on architectures like `gfx1010`.

Test plan:
```
cmake -GNinja -B build -S . \
    -DCMAKE_C_COMPILER=hipcc \
    -DCMAKE_CXX_COMPILER=hipcc \
    -DBUILD_CLIENTS_TESTS=OFF \
    -DBUILD_CLIENTS_BENCHMARKS=OFF \
    -DBUILD_CLIENTS_SAMPLES=OFF \
    -DBUILD_TESTING=OFF \
    -DBUILD_WITH_TENSILE=ON \
    -DTensile_PRINT_DEBUG=ON \
    -DTensile_LIBRARY_FORMAT=msgpack \
    -DTensile_CPU_THREADS=14 \
    -DTensile_LAZY_LIBRARY_LOADING=ON \
    -DAMDGPU_TARGETS="..."
```
With `AMDGPU_TARGETS` being one of the following
- `AMDGPU_TARGETS=gfx1010`
- `AMDGPU_TARGETS=gfx1030;gfx1010`
- `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102`

In all three cases,
`$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced
and all other `*.dat` files remain unchanged.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
@nakajee
Copy link
Contributor

nakajee commented Apr 25, 2024

Rocm6.1 release branch is already finalized and no further release plan for 6.1.
This will be included in rocm 6.2.

@GZGavinZhao
Copy link
Contributor Author

@nakajee Thank you for your quick response. Would you mind if I create a PR to release-staging/rocm-rel-6.2, or will someone from AMD cherry-pick this internally?

@GZGavinZhao
Copy link
Contributor Author

Nevermind, I see this is already included in release-staging/rocm-rel-6.2. Thank you!

@GZGavinZhao GZGavinZhao deleted the rocm-rel-6.1-RDNA1-fix branch April 25, 2024 20:32
@cgmb
Copy link
Contributor

cgmb commented Apr 25, 2024

Rocm6.1 release branch is already finalized and no further release plan for 6.1.

To be clear, there is a ROCm 6.1.1 release planned. Though, perhaps there are no changes planned for Tensile specifically.

@GZGavinZhao, it is not a simple process to cherry-pick a change into a patch release. There is an internal change management process for cherry-picks. Unfortunately, it is mostly handled manually and it is very time-consuming. And even a small change in Tensile requires retesting a large suite of libraries and applications for performance and correctness.

This is a great PR and it highlights some of the difficulties with the existing cherry-pick process. I'm hopeful we can improve to the point that a PR like this could be accepted in the future.

@GZGavinZhao
Copy link
Contributor Author

@cgmb I see. I understand that the cherry-picking process may require extensive testing, hence this PR is more of a request to cherry-pick. I thought that maybe some Tensile changes are planned for the next patch release, so if this change can be considered together that'd be great, but releasing in ROCm 6.2 is totally fine as well. Anyway, Thank you so much for your explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants