Merged
Also avoid recomputing `cccl_value` of `init` in both `segmented_reduce` and `reduce`.
1. Include `np.complex64`.
2. Store the output size in a variable and reuse it to avoid repeated occurrences of literal values.
3. Generate real/imaginary values for complex arrays in a single call to the sampling function, for efficiency.
4. Change the range of generated integral arrays based on the signedness of the integral data type. For unsigned types we continue to sample from the interval [0, 10); for signed types we sample from [-5, 5].
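A minimal sketch of the fixture changes in items 3 and 4, assuming NumPy's `Generator` API (`default_rng`, `random`, `integers`). The helper name `sample_array` is hypothetical; this is not the PR's `input_array` fixture, which per a later commit note samples with `np.random.random`. Both components of a complex array come from one sampling call, and the integer range depends on signedness.

```python
import numpy as np


def sample_array(dtype, sample_size, rng):
    """Hypothetical helper: draw `sample_size` test values of type `dtype`.

    Supports signed/unsigned integers, float32/float64, complex64/complex128.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "c":
        # Real component type, e.g. float32 for complex64.
        real_dtype = np.empty(0, dtype=dtype).real.dtype
        # One sampling call produces (real, imag) pairs in [0, 1) ...
        parts = rng.random((sample_size, 2), dtype=real_dtype)
        # ... which are reinterpreted in place as complex values.
        return parts.view(dtype).reshape(sample_size)
    if dtype.kind == "u":
        # Unsigned integers: sample from [0, 10).
        return rng.integers(0, 10, size=sample_size, dtype=dtype)
    if dtype.kind == "i":
        # Signed integers: sample from [-5, 5], both ends included.
        return rng.integers(-5, 5, size=sample_size, dtype=dtype, endpoint=True)
    # Floating point (float32 or float64 only): sample from [0, 1).
    return rng.random(sample_size, dtype=dtype)
```

Viewing the `(sample_size, 2)` real array as complex reuses its buffer, so no second sampling call or separate combining step is needed.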
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
…e base. Additionally, changed the `__hash__` of `IteratorKind` to mix the hash of its value with the hash of `self.__class__`.
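The hashing change can be sketched in plain Python. This is an illustrative model, not the PR's code: the attribute name `value` and the `__eq__` body are assumptions. Mixing the class into the hash (and comparing types in `__eq__`) keeps two kinds with equal values but different classes distinct as cache keys, while a subclass such as `TransformIteratorKind` inherits both methods without overriding them.

```python
class IteratorKind:
    """Hypothetical stand-in for a cache key describing an iterator's kind."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Kinds of different classes never compare equal, even with equal values.
        return type(self) is type(other) and self.value == other.value

    def __hash__(self):
        # Mix the class into the hash of the value.
        return hash((type(self), self.value))


class TransformIteratorKind(IteratorKind):
    # Inherits __eq__ and __hash__ from the base; no override needed.
    pass
```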
This is used to advance a given iterator `it` by `offset` steps without running into multiple definitions of the advance/dereference methods.
This calls IteratorBase.__add__ to produce an iterator whose state is advanced by 1, but which shares the same advance/dereference methods.
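The idea behind `start_offsets + 1` can be modeled in plain Python: `__add__` copies the iterator and advances only the copy's state, so the result has the same class and therefore the very same `advance`/`dereference` methods. This is a hypothetical host-side toy (its `data` and `state` attributes are assumptions), not the PR's device iterator or `make_advanced_iterator`.

```python
import copy


class IteratorBase:
    """Toy model: per-instance state plus methods defined once per class."""

    def __init__(self, data, state=0):
        self.data = data
        self.state = state  # current position of the iterator

    def advance(self, n):
        self.state += n

    def dereference(self):
        return self.data[self.state]

    def __add__(self, offset: int):
        # Shallow-copy the iterator, then advance only the copy's state.
        # The copy keeps its class, so no new methods are defined.
        advanced = copy.copy(self)
        advanced.advance(offset)
        return advanced


offsets = IteratorBase([0, 3, 6, 9])
start_offsets = offsets
end_offsets = offsets + 1  # shares data and methods, state is one step ahead
```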
🟩 CI finished in 40m 15s: Pass: 100%/1 | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

🏃 Runner counts (total jobs: 1)
| # | Runner |
|---|---|
| 1 | linux-amd64-gpu-rtx2080-latest-1 |
Branch updated from 73e2154 to ed864d7.
🟩 CI finished in 40m 27s: Pass: 100%/1 | Total: 40m 27s | Avg: 40m 27s | Max: 40m 27s
rwgk approved these changes on Feb 25, 2025.
- `python/cuda_parallel/cuda/parallel/experimental/algorithms/segmented_reduce.py` (outdated, resolved)
- `python/cuda_parallel/cuda/parallel/experimental/iterators/_iterators.py` (three threads, all outdated and resolved)
Also make generation of complex arrays in `test_reduce.py` more efficient by generating real and imaginary components in a single call to `np.random.random` instead of two calls.
These were only defined for the `TransformIterator` and `AdvancedIterator` classes, but not for other classes. Implemented the review suggestion to use `type(self)` instead of `self.__class__`.
…cumulation. For small-range data types we take a small slice of the input array to avoid overflow. This works because the `input_array` fixture samples from a uniform discrete distribution with a small exclusive upper bound (8), so each value is at most 7 and a sum of 31 `uint8` elements is at most 31 * 7 = 217 (< 255), which fits in the type.
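The bound in this note can be checked with NumPy, assuming (as the note states) that every sampled value is at most 7. Accumulating in `uint8` matches an exact `int64` scan for the 31-element slice, and the same arithmetic shows where wraparound would start.

```python
import numpy as np

# Assumption taken from the commit note: values lie in [0, 8), so each is <= 7.
max_value = 7
uint8_max = int(np.iinfo(np.uint8).max)  # 255

# Worst case for the 31-element slice: every element equals max_value.
worst_case = np.full(31, max_value, dtype=np.uint8)
prefix_narrow = np.cumsum(worst_case, dtype=np.uint8)  # accumulate in uint8
prefix_wide = np.cumsum(worst_case, dtype=np.int64)    # exact reference
# The last partial sum is 31 * 7 = 217 <= 255, so both scans agree.

# Longest slice guaranteed not to wrap around: 255 // 7 == 36.
max_safe_len = uint8_max // max_value

# One element more (37 * 7 = 259) wraps modulo 256 to 3 in uint8.
overflowed = np.cumsum(
    np.full(max_safe_len + 1, max_value, dtype=np.uint8), dtype=np.uint8
)
```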
🟩 CI finished in 40m 55s: Pass: 100%/1 | Total: 40m 55s | Avg: 40m 55s | Max: 40m 55s
rwgk approved these changes on Feb 26, 2025.
This finds the compute capability and include paths and appends them to the algorithm-specific arguments. Used the utility in `segmented_reduce`.
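A hypothetical sketch of the `call_build` pattern: query device-specific values, then append them after the algorithm-specific positional arguments while forwarding `*args, **kwargs` (a later commit makes `call_build` take `*args, **kwargs`). `get_compute_capability`, `get_include_paths`, the argument order, and the stub return values are all assumptions for illustration, not the PR's `_bindings` code.

```python
from typing import Any, Callable, Sequence, Tuple


def get_compute_capability() -> Tuple[int, int]:
    # Hypothetical stub: a real version would query the current CUDA device.
    return (7, 5)


def get_include_paths() -> Sequence[str]:
    # Hypothetical stub: a real version would locate the required headers.
    return ("/path/to/cccl/include",)


def call_build(build_impl: Callable[..., Any], *args, **kwargs):
    """Append device-specific build arguments to the algorithm-specific ones."""
    cc_major, cc_minor = get_compute_capability()
    include_paths = get_include_paths()
    # Algorithm-specific arguments first, then the shared device arguments.
    return build_impl(*args, cc_major, cc_minor, *include_paths, **kwargs)
```

Centralizing this lookup means each algorithm's build call only passes its own arguments.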
shwina reviewed on Feb 26, 2025.
🟩 CI finished in 52m 20s: Pass: 100%/1 | Total: 52m 20s | Avg: 52m 20s | Max: 52m 20s
🟩 CI finished in 51m 12s: Pass: 100%/1 | Total: 51m 12s | Avg: 51m 12s | Max: 51m 12s
🟩 CI finished in 52m 35s: Pass: 100%/1 | Total: 52m 35s | Avg: 52m 35s | Max: 52m 35s
shwina approved these changes on Feb 28, 2025.
shwina left a comment:

Looks great - thank you, Sasha!
oleksandr-pavlyk added a commit to oleksandr-pavlyk/cccl that referenced this pull request on Feb 28, 2025.
oleksandr-pavlyk added a commit that referenced this pull request on Mar 11, 2025.
davebayer pushed a commit to davebayer/cccl that referenced this pull request on Mar 12, 2025.
bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request on Mar 13, 2025.
davebayer pushed a commit to davebayer/cccl that referenced this pull request on Apr 7, 2025:

* Add `algorithms.segmented_reduce` Python API. Also avoid recomputing `cccl_value` of `init` in both `segmented_reduce` and `reduce`.
* Change to `input_array` fixture:
  1. Include `np.complex64`.
  2. Store the output size in a variable and reuse it to avoid repeated occurrences of literal values.
  3. Generate real/imaginary values for complex arrays in a single call to the sampling function, for efficiency.
  4. Change the range of generated integral arrays based on the signedness of the integral data type. For unsigned types we continue to sample from the interval [0, 10); for signed types we sample from [-5, 5].
* Correct docstring of `segmented_reduce` function
* Add initial tests for `segmented_reduce`
* Improve readability of `test_segmented_reduce_api` example
* `TransformIteratorKind` need not override `__eq__`/`__hash__` methods of the base. Additionally, changed the `__hash__` of `IteratorKind` to mix the hash of its value with the hash of `self.__class__`.
* Add `AdvancedIterator(it, offset=1)` function. This is used to advance a given iterator `it` by `offset` steps without running into multiple definitions of the advance/dereference methods.
* Add example for summing rows of a matrix using `segmented_reduce`
* Implement `IteratorBase.__add__(self, offset: int)` using `make_advanced_iterator`
* Use `end_offsets = start_offsets + 1`. This calls `IteratorBase.__add__` to produce an iterator whose state is advanced by 1, but which shares the same advance/dereference methods.
* Add a test for `segmented_reduce` on `gpu_struct`
* Change hash of transform iterator to mix its kind
* Rename variable `n` to `sample_size`. Also make generation of complex arrays in `test_reduce.py` more efficient by generating real and imaginary components in a single call to `np.random.random` instead of two calls.
* Remove `__hash__` and `__eq__` special methods from some iterator classes. These were only defined for the `TransformIterator` and `AdvancedIterator` classes, but not for other classes. Implemented the review suggestion to use `type(self)` instead of `self.__class__`.
* Tweak `test_scan_array_input` to avoid integer overflows during host accumulation. For small-range data types we take a small slice of the input array to avoid overflow. This works because the `input_array` fixture samples from a uniform discrete distribution with a small exclusive upper bound (8), so each value is at most 7 and a sum of 31 `uint8` elements is at most 31 * 7 = 217 (< 255), which fits in the type.
* Add `cccl.set_cccl_iterator_state` utility function and use it in `segmented_reduce.py`
* Introduce `_bindings.call_build` utility. This finds the compute capability and include paths and appends them to the algorithm-specific arguments. Used the utility in `segmented_reduce`.
* Make `call_build` take `*args, **kwargs`

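To make the semantics concrete, here is a pure-NumPy host reference for a segmented reduction applied to the row-sum example mentioned in the commit list: segment `i` covers `d_in[start_offsets[i]:end_offsets[i]]`. This is not the `cuda.parallel` API; the function name `segmented_reduce_reference` and its parameter order are hypothetical. Slicing `offsets[1:]` plays the role that `start_offsets + 1` plays for iterators in the PR.

```python
import numpy as np


def segmented_reduce_reference(d_in, start_offsets, end_offsets, op, h_init):
    """Hypothetical host reference: reduce each segment with `op`, seeded by `h_init`."""
    out = []
    for begin, end in zip(start_offsets, end_offsets):
        acc = h_init
        for x in d_in[begin:end]:
            acc = op(acc, x)
        out.append(acc)
    return np.asarray(out)


# Sum the rows of an n_rows x n_cols matrix stored flat in row-major order.
n_rows, n_cols = 3, 4
mat = np.arange(n_rows * n_cols, dtype=np.int32)
offsets = np.arange(0, (n_rows + 1) * n_cols, n_cols)  # [0, 4, 8, 12]
start_offsets = offsets[:-1]  # row starts
end_offsets = offsets[1:]     # row ends: the same offsets shifted by one
row_sums = segmented_reduce_reference(
    mat, start_offsets, end_offsets, lambda a, b: a + b, 0
)
# row_sums holds [6, 22, 38]
```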
davebayer pushed a commit to davebayer/cccl that referenced this pull request on Apr 7, 2025.
Description
Closes gh-3715.
This PR adds a Python API for the `segmented_reduce` algorithm.

Checklist