[cuda.cooperative] Support multidimensional thread blocks in block load/store and improve load/store docs #3161

Merged
bernhardmgruber merged 3 commits into NVIDIA:main from brycelelbach:pr/cuda.cooperative/block_load_store_multidimensional_blocks
Mar 3, 2025

@brycelelbach brycelelbach commented Dec 13, 2024

Description

This PR changes cuda.cooperative.block.load and cuda.cooperative.block.store to accept a multi-dimensional block shape.
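
To illustrate what accepting a multi-dimensional block shape involves, here is a minimal pure-Python sketch. It is not the `cuda.cooperative` implementation, and the helper names `normalize_block_shape`, `linear_thread_id`, and `blocked_item_range` are hypothetical. It assumes the usual CUDA convention for linearizing a thread index (x varies fastest, then y, then z) and a blocked arrangement in which each thread owns a contiguous run of items.

```python
def normalize_block_shape(threads_per_block):
    """Hypothetical helper: pad an int or a 1-3 element tuple to (x, y, z)."""
    if isinstance(threads_per_block, int):
        dims = (threads_per_block,)
    else:
        dims = tuple(threads_per_block)
    if not 1 <= len(dims) <= 3 or any(d < 1 for d in dims):
        raise ValueError("expected 1 to 3 positive dimensions")
    # Missing trailing dimensions default to 1, as in a CUDA launch.
    return dims + (1,) * (3 - len(dims))


def linear_thread_id(thread_idx, block_dim):
    """Row-major rank of a thread in its block: x + y*dim_x + z*dim_x*dim_y."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def blocked_item_range(thread_idx, block_dim, items_per_thread):
    """Indices a thread would own in a blocked arrangement (assumed layout)."""
    start = linear_thread_id(thread_idx, block_dim) * items_per_thread
    return range(start, start + items_per_thread)
```

Under these assumptions, a block given as `(8, 4)` is treated as `(8, 4, 1)`, and thread `(3, 2, 0)` has linear rank 19, so with 4 items per thread it would cover items 76 through 79.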

Checklist

  • Implementation.
  • Add tests.
  • Add docs.

@brycelelbach brycelelbach requested a review from a team as a code owner December 13, 2024 19:48
@brycelelbach brycelelbach requested a review from shwina December 13, 2024 19:48

copy-pr-bot bot commented Dec 13, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@brycelelbach brycelelbach force-pushed the pr/cuda.cooperative/block_load_store_multidimensional_blocks branch 2 times, most recently from f5773e3 to ae7b656 Compare February 22, 2025 17:36
@brycelelbach (Contributor, Author) commented

@shwina @tpn @gevtushenko Please review and merge.

@brycelelbach brycelelbach force-pushed the pr/cuda.cooperative/block_load_store_multidimensional_blocks branch from a885984 to 232b226 Compare March 2, 2025 13:09
@brycelelbach brycelelbach changed the title from "[cuda.cooperative] Support multidimensional thread blocks in block load/store" to "[cuda.cooperative] Support multidimensional thread blocks in block load/store and improve load/store docs" Mar 2, 2025
[cuda.cooperative] Remove an unnecessary synchronization from the block load/store example and fix the return types of block load/store in the docs.
@bernhardmgruber (Contributor) commented

/ok to test


github-actions bot commented Mar 3, 2025

🟩 CI finished in 58m 40s: Pass: 100%/1 | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
  • 🟩 python: Pass: 100%/1 | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 58m 40s | Avg: 58m 40s | Max: 58m 40s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber bernhardmgruber merged commit 1852d12 into NVIDIA:main Mar 3, 2025
15 of 18 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Mar 3, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
…ad/store and improve load/store docs (NVIDIA#3161)

* [cuda.cooperative] Support multidimensional thread blocks in block load/store
* [cuda.cooperative] Add tests for multidimensional block loads and stores and add
documentation for block loads and stores.
* [cuda.cooperative] Remove an unnecessary synchronization from the block
load/store example and fix the return types of block load/store in the docs.
@brycelelbach brycelelbach deleted the pr/cuda.cooperative/block_load_store_multidimensional_blocks branch November 3, 2025 23:47