Move transform kernels to NVRTC compilable header #3875
aa66915 to ff8311f
🟩 CI finished in 1h 36m: Pass: 100%/93 | Total: 2d 15h | Avg: 41m 11s | Max: 1h 13m | Hits: 66%/134095
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
fa9003d to f2fa182
  #include <cuda/std/__type_traits/conjunction.h>
- #include <type_traits>
+ #include <cuda/std/type_traits>
Q: I assume we included only one internal type trait header (the line above) for a reason? Do we want everything here?
I simply replaced #include <type_traits> with #include <cuda/std/type_traits> so that this header compiles under NVRTC; I hadn't looked into how it is actually used.
My guess is that we do need it for things like ::cuda::std::is_trivially_copyable.
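To illustrate the point about the include swap, here is a minimal sketch rather than the code from this PR. It assumes NVRTC is the only compiler that lacks the host `<type_traits>` header, and it selects the provider with the `__CUDACC_RTC__` macro so that the snippet also builds with a plain host compiler. The PR itself switches to `<cuda/std/type_traits>` unconditionally. The names `traits`, `plain_record`, `custom_copy`, and `is_bitwise_copyable_v` are hypothetical.

```cpp
// Sketch only: choose a type-traits provider that the current compiler can see.
// __CUDACC_RTC__ is defined when compiling with NVRTC.
#if defined(__CUDACC_RTC__)
#  include <cuda/std/type_traits>
namespace traits = ::cuda::std;
#else
#  include <type_traits>
namespace traits = ::std;
#endif

// Hypothetical payload types used only for this illustration.
struct plain_record
{
  int id;
  float weight;
};

struct custom_copy
{
  custom_copy() = default;
  custom_copy(const custom_copy&) {} // user-provided copy: not trivially copyable
};

// Hypothetical helper: true when T may be copied byte by byte.
template <typename T>
constexpr bool is_bitwise_copyable_v = traits::is_trivially_copyable<T>::value;

static_assert(is_bitwise_copyable_v<plain_record>, "aggregate of scalars is trivially copyable");
static_assert(!is_bitwise_copyable_v<custom_copy>, "user-provided copy constructor is not trivial");
```

With the unconditional `cuda/std` include used in the PR, the same trait is spelled `::cuda::std::is_trivially_copyable` in both host and NVRTC builds, so no macro switch is needed.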
🟩 CI finished in 1h 37m: Pass: 100%/93 | Total: 2d 16h | Avg: 41m 46s | Max: 1h 18m | Hits: 62%/133925
bernhardmgruber
left a comment
I think you also want to move the following functions to the kernel header:
- make_iterator_kernel_arg
- make_aligned_base_ptr_kernel_arg
- needs_aligned_ptr_v
- select_kernel_arg
- transform_kernel

And I think this does not need to be in the kernel header, since it's only used for the setup code:
- cuda_expected
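The split suggested above can be sketched as follows. This is a hypothetical illustration, not CUB's actual code: `transform_tile`, `setup_result`, `pick_tile_size`, and the `SKETCH_HOST_DEVICE` macro are made-up names, and both parts are shown in one snippet only so that it is self-contained. The assumption is that code reachable from the kernel avoids host-only headers, while setup helpers may use them freely.

```cpp
// ---- Part 1: what would go into the kernel header (hypothetical) ----
// No host-only standard headers here, so a runtime compiler could consume it.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

namespace sketch_kernels
{
// Applies f to each of the n input elements and writes the results to out.
template <typename F, typename T>
SKETCH_HOST_DEVICE void transform_tile(F f, const T* in, T* out, int n)
{
  for (int i = 0; i < n; ++i)
  {
    out[i] = f(in[i]);
  }
}
} // namespace sketch_kernels

// ---- Part 2: what would stay with the host-side setup code (hypothetical) ----
// Only the host sees this part, so host standard library headers are fine.
#include <string>

namespace sketch_dispatch
{
// Minimal success-or-error result used only while configuring a launch.
struct setup_result
{
  bool ok;
  int tile_size;
  std::string error;
};

// Picks a tile size of at most 256 items, or reports an error for bad input.
inline setup_result pick_tile_size(int num_items)
{
  if (num_items < 0)
  {
    return {false, 0, "negative item count"};
  }
  return {true, num_items < 256 ? num_items : 256, ""};
}
} // namespace sketch_dispatch
```

In this layout, the first part contains everything the kernel needs, and the result type in the second part never has to compile under NVRTC.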
Thanks! My initial understanding was that we would have to replicate things like [...]. I'll go ahead and place these in [...].
f2fa182 to 6bff5fe
🟩 CI finished in 1h 37m: Pass: 100%/93 | Total: 2d 12h | Avg: 39m 16s | Max: 1h 18m | Hits: 74%/133925
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
🟩 CI finished in 1h 51m: Pass: 100%/93 | Total: 2d 11h | Avg: 38m 10s | Max: 1h 17m | Hits: 79%/133925
* Move transform kernels to NVRTC compilable header
* Update cub/cub/device/dispatch/kernels/transform.cuh

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
Description
Closes #3874