Add segmented_reduce python api #3906
Merged
shwina merged 21 commits into NVIDIA:main from oleksandr-pavlyk:add-segmented-reduce-python-api on Feb 28, 2025
Commits
9672a22 Add algorithms.segmented_reduce Python API (oleksandr-pavlyk)
dfe317a Change to input_array fixture (oleksandr-pavlyk)
ae9ee6f Corrected docstring of segmented_reduce function (oleksandr-pavlyk)
ad3b103 Add initial tests for segmented_reduce (oleksandr-pavlyk)
6937a17 Improve readability of test_segmented_reduce_api example (oleksandr-pavlyk)
2753cf5 TransformIteratorKind need not override __eq__/__hash__ methods of th… (oleksandr-pavlyk)
5c0ce63 Add AdvancedIterator(it, offset=1) function (oleksandr-pavlyk)
bb10d46 Add example for summing rows of a matrix using segmented_reduce (oleksandr-pavlyk)
799267a Implement IteratorBase.__add__(self, offset : int) using make_advance… (oleksandr-pavlyk)
57fed46 Use end_offsets = start_offsets + 1 (oleksandr-pavlyk)
b96e9e2 Add a test for segmented_reduce on gpu_struct (oleksandr-pavlyk)
c651a67 Merge branch 'main' into add-segmented-reduce-python-api (oleksandr-pavlyk)
ed864d7 Change hash of transform iterator to mix its kind (oleksandr-pavlyk)
2a83978 Rename variable n to sample_size (oleksandr-pavlyk)
15a3012 Remove __hash__ and __eq__ special methods from some iterator classes (oleksandr-pavlyk)
08cbd94 Tweak test_scan_array_input to avoid integer overflows during host ac… (oleksandr-pavlyk)
d6d39fa Add cccl.set_cccl_iterator_state utility function and use in segmente… (oleksandr-pavlyk)
13d8d19 Introduce _bindings.call_build utility (oleksandr-pavlyk)
9f65dee Merge branch 'main' into add-segmented-reduce-python-api (oleksandr-pavlyk)
ecfca41 Make call_build take *args, **kwargs (oleksandr-pavlyk)
af23cca Merge branch 'main' into add-segmented-reduce-python-api (oleksandr-pavlyk)
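Two of the commit titles above mention summing the rows of a matrix and using `end_offsets = start_offsets + 1`. The sketch below is a host-side, pure-Python illustration of how such offsets could be laid out for a row-major matrix. It is not the code from this PR: `row_offsets` and `row_sums` are hypothetical helper names, and reading "+ 1" as "the same offset sequence advanced by one element" is an assumption based on the commit titles alone.

```python
def row_offsets(n_rows, n_cols):
    """Return (start_offsets, end_offsets) for the rows of a row-major matrix.

    Hypothetical helper for illustration; not part of cuda.parallel.
    """
    # n_rows + 1 row boundaries: 0, n_cols, 2 * n_cols, ..., n_rows * n_cols
    boundaries = [i * n_cols for i in range(n_rows + 1)]
    start_offsets = boundaries[:-1]
    # The end offsets are the same sequence advanced by one element.
    end_offsets = boundaries[1:]
    return start_offsets, end_offsets


def row_sums(flat, n_rows, n_cols):
    """Sum each row of a flattened row-major matrix, one segment per row."""
    starts, ends = row_offsets(n_rows, n_cols)
    return [sum(flat[s:e]) for s, e in zip(starts, ends)]


# A 3 x 4 matrix stored row by row: [0..3], [4..7], [8..11]
flat = list(range(12))
sums = row_sums(flat, 3, 4)
```

Here each row is one segment, so a single boundary array of length `n_rows + 1` supplies both offset sequences.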
169 changes: 169 additions & 0 deletions
python/cuda_parallel/cuda/parallel/experimental/algorithms/segmented_reduce.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| import ctypes | ||
| from typing import Callable | ||
| | ||
| import numba | ||
| import numpy as np | ||
| from numba.cuda.cudadrv import enums | ||
| | ||
| from .. import _cccl as cccl | ||
| from .._bindings import call_build, get_bindings | ||
| from .._caching import CachableFunction, cache_with_key | ||
| from .._utils import protocols | ||
| from ..iterators._iterators import IteratorBase | ||
| from ..typing import DeviceArrayLike, GpuStruct | ||
| | ||
| | ||
| class _SegmentedReduce: | ||
| def __del__(self): | ||
| if self.build_result is None: | ||
| return | ||
| bindings = get_bindings() | ||
| bindings.cccl_device_segmented_reduce_cleanup(ctypes.byref(self.build_result)) | ||
| | ||
| def __init__( | ||
| self, | ||
| d_in: DeviceArrayLike | IteratorBase, | ||
| d_out: DeviceArrayLike, | ||
| start_offsets_in: DeviceArrayLike | IteratorBase, | ||
| end_offsets_in: DeviceArrayLike | IteratorBase, | ||
| op: Callable, | ||
| h_init: np.ndarray | GpuStruct, | ||
| ): | ||
| self.build_result = None | ||
| self.d_in_cccl = cccl.to_cccl_iter(d_in) | ||
| self.d_out_cccl = cccl.to_cccl_iter(d_out) | ||
| self.start_offsets_in_cccl = cccl.to_cccl_iter(start_offsets_in) | ||
| self.end_offsets_in_cccl = cccl.to_cccl_iter(end_offsets_in) | ||
| self.h_init_cccl = cccl.to_cccl_value(h_init) | ||
| if isinstance(h_init, np.ndarray): | ||
| value_type = numba.from_dtype(h_init.dtype) | ||
| else: | ||
| value_type = numba.typeof(h_init) | ||
| sig = (value_type, value_type) | ||
| self.op_wrapper = cccl.to_cccl_op(op, sig) | ||
| self.build_result = cccl.DeviceSegmentedReduceBuildResult() | ||
| self.bindings = get_bindings() | ||
| error = call_build( | ||
| self.bindings.cccl_device_segmented_reduce_build, | ||
| ctypes.byref(self.build_result), | ||
| self.d_in_cccl, | ||
| self.d_out_cccl, | ||
| self.start_offsets_in_cccl, | ||
| self.end_offsets_in_cccl, | ||
| self.op_wrapper, | ||
| self.h_init_cccl, | ||
| ) | ||
| if error != enums.CUDA_SUCCESS: | ||
| raise ValueError("Error building segmented reduce") | ||
| | ||
| def __call__( | ||
| self, | ||
| temp_storage, | ||
| d_in, | ||
| d_out, | ||
| num_segments: int, | ||
| start_offsets_in, | ||
| end_offsets_in, | ||
| h_init, | ||
| stream=None, | ||
| ): | ||
| set_state_fn = cccl.set_cccl_iterator_state | ||
| set_state_fn(self.d_in_cccl, d_in) | ||
| set_state_fn(self.d_out_cccl, d_out) | ||
| set_state_fn(self.start_offsets_in_cccl, start_offsets_in) | ||
| set_state_fn(self.end_offsets_in_cccl, end_offsets_in) | ||
| self.h_init_cccl.state = h_init.__array_interface__["data"][0] | ||
| | ||
| stream_handle = protocols.validate_and_get_stream(stream) | ||
| | ||
| if temp_storage is None: | ||
| temp_storage_bytes = ctypes.c_size_t() | ||
| d_temp_storage = None | ||
| else: | ||
| temp_storage_bytes = ctypes.c_size_t(temp_storage.nbytes) | ||
| d_temp_storage = protocols.get_data_pointer(temp_storage) | ||
| | ||
| error = self.bindings.cccl_device_segmented_reduce( | ||
| self.build_result, | ||
| ctypes.c_void_p(d_temp_storage), | ||
| ctypes.byref(temp_storage_bytes), | ||
| self.d_in_cccl, | ||
| self.d_out_cccl, | ||
| ctypes.c_ulonglong(num_segments), | ||
| self.start_offsets_in_cccl, | ||
| self.end_offsets_in_cccl, | ||
| self.op_wrapper, | ||
| self.h_init_cccl, | ||
| ctypes.c_void_p(stream_handle), | ||
| ) | ||
| | ||
| if error != enums.CUDA_SUCCESS: | ||
| raise ValueError("Error running segmented reduce") | ||
| | ||
| return temp_storage_bytes.value | ||
| | ||
| | ||
| def _to_key(d_in: DeviceArrayLike | IteratorBase): | ||
| "Return key for an input array-like argument or an iterator" | ||
| d_in_key = ( | ||
| d_in.kind if isinstance(d_in, IteratorBase) else protocols.get_dtype(d_in) | ||
| ) | ||
| return d_in_key | ||
| | ||
| | ||
| def make_cache_key( | ||
| d_in: DeviceArrayLike | IteratorBase, | ||
| d_out: DeviceArrayLike, | ||
| start_offsets_in: DeviceArrayLike | IteratorBase, | ||
| end_offsets_in: DeviceArrayLike | IteratorBase, | ||
| op: Callable, | ||
| h_init: np.ndarray | GpuStruct, | ||
| ): | ||
| d_in_key = _to_key(d_in) | ||
| d_out_key = protocols.get_dtype(d_out) | ||
| start_offsets_in_key = _to_key(start_offsets_in) | ||
| end_offsets_in_key = _to_key(end_offsets_in) | ||
| op_key = CachableFunction(op) | ||
| h_init_key = h_init.dtype | ||
| return ( | ||
| d_in_key, | ||
| d_out_key, | ||
| start_offsets_in_key, | ||
| end_offsets_in_key, | ||
| op_key, | ||
| h_init_key, | ||
| ) | ||
| | ||
| | ||
| @cache_with_key(make_cache_key) | ||
| def segmented_reduce( | ||
| d_in: DeviceArrayLike | IteratorBase, | ||
| d_out: DeviceArrayLike, | ||
| start_offsets_in: DeviceArrayLike | IteratorBase, | ||
| end_offsets_in: DeviceArrayLike | IteratorBase, | ||
| op: Callable, | ||
| h_init: np.ndarray | GpuStruct, | ||
| ): | ||
| """Computes a device-wide segmented reduction using the specified binary ``op`` and initial value ``h_init``. | ||
| | ||
| Example: | ||
| Below, ``segmented_reduce`` is used to compute the minimum value of each segment of a sequence of integers. | ||
| | ||
| .. literalinclude:: ../../python/cuda_parallel/tests/test_segmented_reduce_api.py | ||
| :language: python | ||
| :dedent: | ||
| :start-after: example-begin segmented-reduce-min | ||
| :end-before: example-end segmented-reduce-min | ||
| | ||
| Args: | ||
| d_in: Device array or iterator containing the input sequence of data items | ||
| d_out: Device array that will store the result of the reduction | ||
| start_offsets_in: Device array or iterator containing offsets to start of segments | ||
| end_offsets_in: Device array or iterator containing offsets to end of segments | ||
| op: Callable representing the binary operator to apply | ||
| h_init: NumPy array storing the initial value of the reduction | ||
| | ||
| Returns: | ||
| A callable object that can be used to perform the reduction | ||
| """ | ||
| return _SegmentedReduce(d_in, d_out, start_offsets_in, end_offsets_in, op, h_init) |
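
To make the docstring's description concrete, here is a minimal host-side model of a segmented reduction in pure Python. It only illustrates the intended semantics and is not the device implementation or the library's API. `segmented_reduce_reference` and `min_op` are hypothetical names, and the choice that an empty segment yields `h_init` is an assumption of this model. The model folds left to right, whereas a device implementation may combine elements in a different order, so `op` should be associative.

```python
from functools import reduce


def segmented_reduce_reference(d_in, start_offsets, end_offsets, op, h_init):
    """Host-side model: reduce d_in[start:end] with op for every segment.

    Hypothetical reference helper; each reduction is seeded with h_init.
    """
    out = []
    for start, end in zip(start_offsets, end_offsets):
        # reduce() returns h_init unchanged when the slice is empty
        out.append(reduce(op, d_in[start:end], h_init))
    return out


def min_op(a, b):
    """Binary minimum, standing in for the user-supplied op."""
    return a if a < b else b


d_in = [8, 6, 7, 5, 3, 0, 9]
start_offsets = [0, 3, 3]
end_offsets = [3, 3, 7]      # the second segment is empty
h_init = 2**31 - 1           # largest int32 value, the identity for min here

result = segmented_reduce_reference(d_in, start_offsets, end_offsets, min_op, h_init)
```

The number of segments equals the length of the offset sequences, which corresponds to the `num_segments` argument that the callable in the diff above takes explicitly.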