Add test_reduce_api.py::test_reduce_struct_type_minmax#3938

Merged
oleksandr-pavlyk merged 5 commits into NVIDIA:main from oleksandr-pavlyk:add-reduce-minmax-example
Feb 25, 2025
Conversation

@oleksandr-pavlyk
Contributor

@oleksandr-pavlyk oleksandr-pavlyk commented Feb 25, 2025

This PR adds an example that uses a struct holding minimum and maximum values to compute the smallest and largest values of an array with a single call to cub::DeviceReduce.

It also demonstrates the use of TransformedIterator to map input array values as v -> MinMax(v, v). With this mapping, both operands and the result of the binary reduction operator are MinMax values, so the operator is commutative and the data-parallel algorithm applies. Since the mapping is done in registers, each element of the input array is accessed only once.
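
A minimal host-side sketch of the idea, in plain Python rather than the cuda.parallel API: each value v is lifted to MinMax(v, v), and a binary operator that takes and returns MinMax folds the lifted values. The names `MinMax`, `lift`, `minmax_op`, and `minmax` are illustrative assumptions, not identifiers taken from this PR.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class MinMax:
    # Hypothetical stand-in for the struct that holds both extremes.
    min_val: float
    max_val: float


def lift(v):
    # Map v -> MinMax(v, v) so every operand of the reduction has the same type.
    return MinMax(v, v)


def minmax_op(a, b):
    # Combine two partial results; swapping or regrouping the operands
    # does not change the outcome.
    return MinMax(min(a.min_val, b.min_val), max(a.max_val, b.max_val))


def minmax(values):
    # Identity element: combining it with any MinMax returns that MinMax.
    init = MinMax(float("inf"), float("-inf"))
    # map() is lazy, so each input value is read once and lifted on the fly.
    return reduce(minmax_op, map(lift, values), init)


result = minmax([3.0, -1.5, 8.0, 2.0])
```

Here `result` holds both extremes after a single pass over the input. On the GPU, the lazy `map` would presumably correspond to the transforming iterator, and `reduce` to the device-wide reduction.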

The docstring of gpu_struct is modified to reference the newly added example.

Description

closes gh-3937

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@oleksandr-pavlyk oleksandr-pavlyk requested a review from a team as a code owner February 25, 2025 18:16
@github-actions
Contributor

🟥 CI finished in 7m 40s: Pass: 0%/1 | Total: 7m 40s | Avg: 7m 40s | Max: 7m 40s
  • 🟥 python: Pass: 0%/1 | Total: 7m 40s | Avg: 7m 40s | Max: 7m 40s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-rtx2080-latest-1

@github-actions
Contributor

🟩 CI finished in 50m 06s: Pass: 100%/1 | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
  • 🟩 python: Pass: 100%/1 | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 50m 06s | Avg: 50m 06s | Max: 50m 06s
    


@github-actions
Contributor

🟩 CI finished in 50m 53s: Pass: 100%/1 | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
  • 🟩 python: Pass: 100%/1 | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
    


@oleksandr-pavlyk oleksandr-pavlyk merged commit 29d14d4 into NVIDIA:main Feb 25, 2025
15 of 18 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the add-reduce-minmax-example branch February 25, 2025 22:46
davebayer pushed a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
* Add test_reduce_api.py::test_reduce_struct_type_minmax

* Refined docstring text

* Change reduce-minmax to compute minimum/maximum absolute values of an array

* Replace embedding_op with transform_op, add comment
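
The later commits switch the example to absolute values and name the mapping transform_op. A hedged host-side sketch of that variant follows. It is plain Python, not the test added in this PR. Reducing chunks independently and then combining the partial results stands in for what a parallel reduction does. The helper `chunked_minmax_abs`, the tuple layout, and the body of `transform_op` are assumptions made for illustration.

```python
from functools import reduce


def transform_op(v):
    # Map v -> (|v|, |v|); a plain tuple stands in for the MinMax struct.
    av = abs(v)
    return (av, av)


def minmax_op(a, b):
    # Keep the smaller first component and the larger second component.
    return (min(a[0], b[0]), max(a[1], b[1]))


def chunked_minmax_abs(values, chunk_size):
    # Reduce each chunk independently, as parallel workers would,
    # then combine the partial results with the same operator.
    init = (float("inf"), float("-inf"))
    partials = [
        reduce(minmax_op, map(transform_op, values[i:i + chunk_size]), init)
        for i in range(0, len(values), chunk_size)
    ]
    return reduce(minmax_op, partials, init)


data = [-7, 3, -2, 5, 4]
result = chunked_minmax_abs(data, chunk_size=2)
```

Because the operator can be applied to partial results in any grouping, the chunk size does not affect the answer. That property is what lets a parallel reduction split the work freely.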

Development

Successfully merging this pull request may close these issues.

[DOC]: Add cuda_parallel example of using reduce_into to compute minimum and maximum values of an array at once

2 participants