
Move transform kernels to NVRTC compilable header#3875

Merged
shwina merged 2 commits intoNVIDIA:mainfrom
shwina:nvrtc-compilable-transform-headers
Feb 21, 2025

Conversation

@shwina
Contributor

@shwina shwina commented Feb 20, 2025

Description

Closes #3874

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@shwina shwina requested review from a team as code owners February 20, 2025 13:02
@shwina shwina force-pushed the nvrtc-compilable-transform-headers branch from aa66915 to ff8311f on February 20, 2025 13:15
@github-actions
Contributor

🟩 CI finished in 1h 36m: Pass: 100%/93 | Total: 2d 15h | Avg: 41m 11s | Max: 1h 13m | Hits: 66%/134095
  • 🟩 cub: Pass: 100%/45 | Total: 1d 15h | Avg: 52m 36s | Max: 1h 13m | Hits: 70%/53665

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 13h | Avg: 52m 17s | Max:  1h 13m | Hits:  70%/51227 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 22s | Max: 59m 52s | Hits:  68%/2438  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 49m | Avg: 57m 48s | Max:  1h 03m | Hits:  59%/5928  
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  68%/2256  
      🟩 12.8               Pass: 100%/38  | Total:  1d 08h | Avg: 51m 05s | Max:  1h 13m | Hits:  72%/45481 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  75%/2108  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 49m | Avg: 57m 48s | Max:  1h 03m | Hits:  59%/5928  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  68%/2256  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 06h | Avg: 50m 30s | Max:  1h 13m | Hits:  71%/43373 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  75%/2108  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 13h | Avg: 52m 11s | Max:  1h 13m | Hits:  70%/51557 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 49m | Avg: 57m 18s | Max: 59m 53s | Hits:  69%/4884  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 44s | Max: 58m 14s | Hits:  68%/2438  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 57s | Max: 59m 20s | Hits:  68%/2438  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 10s | Max: 58m 34s | Hits:  68%/2438  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 44m | Avg: 49m 12s | Max:  1h 02m | Hits:  79%/8203  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 49m | Avg: 54m 50s | Max: 55m 02s | Hits:  68%/2442  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  68%/1221  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 48s | Max: 59m 35s | Hits:  68%/2442  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 29s | Max:  1h 02m | Hits:  68%/2442  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 05s | Max: 58m 38s | Hits:  68%/2438  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 50m | Avg: 55m 04s | Max: 55m 43s | Hits:  68%/2438  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 46m | Avg: 36m 58s | Max:  1h 10m | Hits:  85%/13409 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 13m | Hits:  14%/2088  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 13m | Hits:  14%/2088  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  68%/2256  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 17m | Avg: 53m 57s | Max:  1h 02m | Hits:  73%/20401 
      🟩 GCC                Pass: 100%/22  | Total: 17h 13m | Avg: 46m 58s | Max:  1h 10m | Hits:  77%/26832 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 39m | Avg:  1h 09m | Max:  1h 13m | Hits:  14%/4176  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  68%/2256  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 12m | Avg: 24m 17s | Max: 26m 12s | Hits:  89%/3657  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 13m | Hits:  63%/40256 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 08m | Avg: 31m 02s | Max:  1h 02m | Hits:  92%/9752  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 17s | Max:  1h 13m | Hits:  63%/43913 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 12s | Avg: 21m 12s | Max: 21m 12s | Hits:  99%/1219  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 42s | Avg: 16m 42s | Max: 16m 42s | Hits:  99%/1219  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 08m | Avg: 22m 59s | Max: 23m 45s | Hits:  99%/3657  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 06m | Avg: 22m 17s | Max: 23m 26s | Hits:  99%/3657  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 12m | Avg: 24m 17s | Max: 26m 12s | Hits:  89%/3657  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  68%/1219  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 49m | Avg: 59m 28s | Max:  1h 13m | Hits:  61%/23615 
      🟩 20                 Pass: 100%/25  | Total: 19h 37m | Avg: 47m 06s | Max:  1h 13m | Hits:  77%/30050 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 23h 34m | Avg: 31m 26s | Max: 1h 02m | Hits: 63%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 35m 59s | Avg: 17m 59s | Max: 24m 56s | Hits:  78%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 22h 36m | Avg: 31m 32s | Max:  1h 02m | Hits:  64%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 58m 31s | Avg: 29m 15s | Max: 30m 58s | Hits:  57%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 58m | Avg: 35m 37s | Max: 50m 28s | Hits:  60%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  42%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 18h 32m | Avg: 29m 17s | Max:  1h 01m | Hits:  65%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 52m 01s | Avg: 26m 00s | Max: 26m 04s | Hits:  57%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 58m | Avg: 35m 37s | Max: 50m 28s | Hits:  60%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  42%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 17h 40m | Avg: 29m 28s | Max:  1h 01m | Hits:  65%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 01s | Avg: 26m 00s | Max: 26m 04s | Hits:  57%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 22h 42m | Avg: 31m 41s | Max:  1h 02m | Hits:  64%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 24s | Max: 32m 27s | Hits:  61%/7124  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 36s | Max: 33m 00s | Hits:  57%/3562  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 33s | Max: 30m 51s | Hits:  57%/3562  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 00m | Avg: 30m 08s | Max: 30m 15s | Hits:  57%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 38m | Avg: 22m 36s | Max: 31m 09s | Hits:  69%/12467 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 02s | Max: 31m 07s | Hits:  64%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 31m 35s | Avg: 31m 35s | Max: 31m 35s | Hits:  57%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 03s | Max: 32m 19s | Hits:  65%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 03s | Max: 36m 41s | Hits:  57%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 17s | Max: 33m 10s | Hits:  57%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 03s | Max: 36m 21s | Hits:  56%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 31m | Avg: 21m 08s | Max: 35m 29s | Hits:  79%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 50s | Max: 50m 28s | Hits:  51%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 24m | Avg: 48m 14s | Max:  1h 01m | Hits:  49%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  42%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 54m | Avg: 27m 54s | Max: 33m 00s | Hits:  63%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  9h 32m | Avg: 27m 14s | Max: 36m 41s | Hits:  69%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 04m | Avg: 48m 52s | Max:  1h 01m | Hits:  50%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  42%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 19m 28s | Hits:  78%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 19m | Avg: 35m 08s | Max:  1h 02m | Hits:  57%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 43m | Avg: 22m 23s | Max:  1h 01m | Hits:  80%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 22h 04m | Avg: 34m 51s | Max:  1h 02m | Hits:  57%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 38s | Avg: 15m 12s | Max: 29m 39s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 08s | Avg: 11m 02s | Max: 11m 30s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 19m 28s | Hits:  78%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 29m 13s | Avg: 29m 13s | Max: 29m 13s | Hits:  80%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 03m | Avg: 36m 09s | Max:  1h 00m | Hits:  57%/35611 
      🟩 20                 Pass: 100%/23  | Total: 10h 55m | Avg: 28m 29s | Max:  1h 02m | Hits:  68%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 57s | Avg: 6m 28s | Max: 10m 46s | Hits: 98%/294

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 57s | Avg:  6m 28s | Max: 10m 46s | Hits:  98%/294   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s | Hits:  98%/147   
      🟩 Test               Pass: 100%/1   | Total: 10m 46s | Avg: 10m 46s | Max: 10m 46s | Hits:  98%/147   
    
  • 🟩 python: Pass: 100%/1 | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 35m 36s | Avg: 35m 36s | Max: 35m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@shwina shwina force-pushed the nvrtc-compilable-transform-headers branch 2 times, most recently from fa9003d to f2fa182 on February 20, 2025 20:16
 #include <cuda/std/__type_traits/conjunction.h>
 
-#include <type_traits>
+#include <cuda/std/type_traits>
Member

Q: I assume we included only one internal type trait header (the line above) for a reason? Do we want everything here?

Contributor Author

I blindly replaced #include <type_traits> with #include <cuda/std/type_traits> to make this compilable by NVRTC; I hadn't dug into its actual usage.

My guess is that we do need it for things like ::cuda::std::is_trivially_copyable.
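The reasoning in this thread (the kernel header must compile under NVRTC, so its traits come from <cuda/std/type_traits> rather than the host <type_traits>) can be illustrated with a small sketch. Everything below is hypothetical and not CCCL code: the SKETCH_HAS_CUDA_STD macro, the traits namespace, and can_copy_bytewise are illustrative names. The shim falls back to the host STL when libcu++ is not on the include path, so it also builds as ordinary C++, and it shows the kind of check is_trivially_copyable enables: only trivially copyable types may be moved around as raw bytes.

```cpp
// Hypothetical portability shim (not CCCL code): prefer libcu++, which NVRTC
// can compile, and fall back to the host STL so this also builds as plain C++.
#if defined(__has_include)
#  if __has_include(<cuda/std/type_traits>)
#    include <cuda/std/type_traits>
#    define SKETCH_HAS_CUDA_STD 1
#  endif
#endif
#ifndef SKETCH_HAS_CUDA_STD
#  include <type_traits>
#endif

namespace traits {
#ifdef SKETCH_HAS_CUDA_STD
using ::cuda::std::is_trivially_copyable;
#else
using ::std::is_trivially_copyable;
#endif
}  // namespace traits

// Copying elements as raw bytes (e.g. memcpy into a staging buffer) is only
// well-defined for trivially copyable types, so a kernel would gate such a
// code path on this trait.
template <typename T>
constexpr bool can_copy_bytewise() {
  return traits::is_trivially_copyable<T>::value;
}

struct Pod {
  int a;
  float b;
};

struct HasCustomCopy {
  HasCustomCopy() = default;
  HasCustomCopy(const HasCustomCopy&) {}  // user-provided copy: not trivially copyable
};

static_assert(can_copy_bytewise<Pod>(), "Pod can be copied bytewise");
static_assert(!can_copy_bytewise<HasCustomCopy>(), "custom copy rules out bytewise copy");
```

The same source compiles whether or not libcu++ is available; only the namespace the trait is taken from changes.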

@github-actions
Contributor

🟩 CI finished in 1h 37m: Pass: 100%/93 | Total: 2d 16h | Avg: 41m 46s | Max: 1h 18m | Hits: 62%/133925
  • 🟩 cub: Pass: 100%/45 | Total: 1d 15h | Avg: 52m 49s | Max: 1h 18m | Hits: 69%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 13h | Avg: 52m 22s | Max:  1h 18m | Hits:  69%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  67%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 49m | Avg: 57m 52s | Max:  1h 03m | Hits:  58%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  67%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 08h | Avg: 51m 28s | Max:  1h 18m | Hits:  71%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 57m | Avg: 58m 35s | Max: 59m 13s | Hits:  74%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 49m | Avg: 57m 52s | Max:  1h 03m | Hits:  58%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  67%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 06h | Avg: 51m 05s | Max:  1h 18m | Hits:  71%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 35s | Max: 59m 13s | Hits:  74%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 13h | Avg: 52m 33s | Max:  1h 18m | Hits:  69%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 44m | Avg: 56m 02s | Max: 57m 35s | Hits:  68%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 20s | Max: 59m 27s | Hits:  68%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 18s | Max: 59m 24s | Hits:  68%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 03m | Hits:  68%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 39m | Avg: 48m 26s | Max:  1h 01m | Hits:  79%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 27s | Max: 58m 02s | Hits:  67%/2434  
      🟩 GCC8               Pass: 100%/1   | Total: 54m 50s | Avg: 54m 50s | Max: 54m 50s | Hits:  67%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 34s | Max: 59m 38s | Hits:  67%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 00m | Hits:  67%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 56m | Avg: 58m 16s | Max:  1h 01m | Hits:  67%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 13s | Max:  1h 01m | Hits:  67%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 38m | Avg: 36m 12s | Max:  1h 09m | Hits:  85%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 14m | Hits:  14%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 18m | Hits:  14%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  67%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 17m | Avg: 53m 58s | Max:  1h 03m | Hits:  72%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 17h 19m | Avg: 47m 13s | Max:  1h 09m | Hits:  76%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 18m | Hits:  14%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  67%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 08m | Avg: 22m 52s | Max: 24m 33s | Hits:  89%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 18m | Hits:  62%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 04m | Avg: 30m 34s | Max:  1h 01m | Hits:  91%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 39s | Max:  1h 18m | Hits:  63%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 43s | Avg: 20m 43s | Max: 20m 43s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 40s | Avg: 16m 40s | Max: 16m 40s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 00s | Max: 23m 10s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 59s | Max: 22m 32s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 08m | Avg: 22m 52s | Max: 24m 33s | Hits:  89%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:  67%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 56m | Avg: 59m 49s | Max:  1h 14m | Hits:  61%/23535 
      🟩 20                 Pass: 100%/25  | Total: 19h 40m | Avg: 47m 13s | Max:  1h 18m | Hits:  76%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 00h | Avg: 32m 23s | Max: 1h 04m | Hits: 57%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 19s | Avg: 18m 39s | Max: 26m 13s | Hits:  74%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 23h 19m | Avg: 32m 32s | Max:  1h 04m | Hits:  57%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 58m 47s | Avg: 29m 23s | Max: 31m 06s | Hits:  49%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 07m | Avg: 37m 35s | Max: 59m 02s | Hits:  54%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  32%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 19h 04m | Avg: 30m 07s | Max:  1h 03m | Hits:  58%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 16s | Avg: 26m 38s | Max: 26m 40s | Hits:  49%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 07m | Avg: 37m 35s | Max: 59m 02s | Hits:  54%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  32%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 18h 11m | Avg: 30m 18s | Max:  1h 03m | Hits:  59%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 16s | Avg: 26m 38s | Max: 26m 40s | Hits:  49%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 23h 24m | Avg: 32m 39s | Max:  1h 04m | Hits:  57%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 15s | Max: 32m 31s | Hits:  58%/7124  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 07s | Max: 32m 28s | Hits:  49%/3562  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 23s | Max: 31m 48s | Hits:  49%/3562  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 46s | Max: 32m 31s | Hits:  49%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 44m | Avg: 23m 28s | Max: 34m 54s | Hits:  65%/12467 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 06m | Avg: 33m 00s | Max: 33m 04s | Hits:  56%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 31m 10s | Avg: 31m 10s | Max: 31m 10s | Hits:  49%/1782  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 55s | Max: 33m 17s | Hits:  59%/3564  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 22s | Max: 33m 51s | Hits:  49%/3564  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 15s | Max: 35m 19s | Hits:  49%/3564  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 37s | Max: 33m 40s | Hits:  49%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 33m | Avg: 21m 20s | Max: 34m 51s | Hits:  75%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 46s | Max: 59m 02s | Hits:  31%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 34m | Avg: 51m 21s | Max:  1h 03m | Hits:  38%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  32%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 59m | Avg: 28m 13s | Max: 34m 54s | Hits:  58%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  9h 40m | Avg: 27m 39s | Max: 35m 19s | Hits:  63%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 31m | Avg: 54m 19s | Max:  1h 03m | Hits:  35%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m | Hits:  32%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 46s | Avg: 15m 53s | Max: 21m 03s | Hits:  74%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 55m | Avg: 36m 13s | Max:  1h 04m | Hits:  49%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 50m | Avg: 23m 05s | Max:  1h 03m | Hits:  77%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 22h 50m | Avg: 36m 04s | Max:  1h 04m | Hits:  49%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 43m 45s | Avg: 14m 35s | Max: 28m 53s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 30s | Avg: 10m 52s | Max: 11m 22s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 46s | Avg: 15m 53s | Max: 21m 03s | Hits:  74%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 29m 39s | Avg: 29m 39s | Max: 29m 39s | Hits:  77%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 37m | Avg: 37m 52s | Max:  1h 01m | Hits:  48%/35611 
      🟩 20                 Pass: 100%/23  | Total: 11h 02m | Avg: 28m 49s | Max:  1h 04m | Hits:  62%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 01s | Avg: 7m 30s | Max: 12m 48s | Hits: 98%/304

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 12m 48s | Hits:  98%/304   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s | Hits:  98%/152   
      🟩 Test               Pass: 100%/1   | Total: 12m 48s | Avg: 12m 48s | Max: 12m 48s | Hits:  98%/152   
    
  • 🟩 python: Pass: 100%/1 | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 35m 17s | Avg: 35m 17s | Max: 35m 17s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Contributor

@bernhardmgruber bernhardmgruber left a comment

I think you also want to move the following functions to the kernel header:

make_iterator_kernel_arg
make_aligned_base_ptr_kernel_arg
needs_aligned_ptr_v
select_kernel_arg
transform_kernel

And I think this does not need to be in the kernel header, since it's only used for the setup code:

cuda_expected
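Judging by their names, several of the helpers listed above construct argument values that the kernel consumes, which fits keeping them next to transform_kernel in the NVRTC-compilable header, while cuda_expected is only needed by the host setup code. As a rough, host-only illustration of what an aligned-base-pointer argument involves, the sketch below rounds a pointer down to a power-of-two alignment and records the skipped bytes so the original elements can still be located. aligned_base_ptr_arg, make_aligned_base_ptr_arg, and element_ptr are hypothetical names; this is not CUB's implementation of make_aligned_base_ptr_kernel_arg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel argument: a pointer rounded down to `alignment` bytes,
// plus the number of bytes skipped so the first real element can be found.
template <typename T>
struct aligned_base_ptr_arg {
  const char* base;        // aligned address at or before the first element
  std::size_t head_bytes;  // distance from `base` to the first element
};

// Host-side construction. `alignment` must be a power of two.
template <typename T>
aligned_base_ptr_arg<T> make_aligned_base_ptr_arg(const T* p, std::size_t alignment) {
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  const auto mask = static_cast<std::uintptr_t>(alignment) - 1;
  const auto aligned = addr & ~mask;  // round down to the alignment boundary
  return {reinterpret_cast<const char*>(aligned), static_cast<std::size_t>(addr - aligned)};
}

// Kernel-side code could recover element i from the aligned base like this.
template <typename T>
const T* element_ptr(const aligned_base_ptr_arg<T>& arg, std::size_t i) {
  return reinterpret_cast<const T*>(arg.base + arg.head_bytes) + i;
}
```

For example, for an int array aligned to 16 bytes, make_aligned_base_ptr_arg(data + 1, 16) yields base == data and head_bytes == sizeof(int), and element_ptr(arg, 0) points back at data[1].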

@shwina
Contributor Author

shwina commented Feb 21, 2025

> I think you also want to move the following functions to the kernel header:

Thanks! My initial understanding was that we would have to replicate things like kernel_arg on the C side (as it's templated on the value type). I did sync with @gevtushenko to get a better understanding, but it's still unclear to me exactly how we'll interface with some of these.

I'll go ahead and place these in transform.cuh and figure out the details later.
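The concern that kernel_arg is templated on the value type can be made concrete: when the value type is only known at run time, as on a C API boundary, the argument's layout has to be recomputed from sizeof/alignof numbers instead of being deduced by the compiler. The sketch below is hypothetical (typed_arg, erased_layout, and layout_for are illustrative names, not CUB or CCCL C Parallel Library code) and assumes the usual ABI rule that members are laid out in declaration order, each at an offset rounded up to its alignment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a kernel argument templated on the value type.
template <typename T>
struct typed_arg {
  const void* data;  // first member: a pointer
  T value;           // second member: its offset depends on alignof(T)
};

inline std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// What a type-erased mirror would have to compute from run-time information.
struct erased_layout {
  std::size_t value_offset;
  std::size_t total_size;
  std::size_t alignment;
};

inline erased_layout layout_for(std::size_t value_size, std::size_t value_align) {
  const std::size_t ptr_align = alignof(const void*);
  const std::size_t align = value_align > ptr_align ? value_align : ptr_align;
  const std::size_t offset = round_up(sizeof(const void*), value_align);
  // Total size is padded to the struct's overall alignment.
  return {offset, round_up(offset + value_size, align), align};
}
```

Any mismatch between such a run-time description and what the C++ compiler produces for the templated type would corrupt kernel arguments, which is why replicating templated argument types on the C side needs care.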

@shwina shwina force-pushed the nvrtc-compilable-transform-headers branch from f2fa182 to 6bff5fe on February 21, 2025 15:36
@github-actions
Contributor

🟩 CI finished in 1h 37m: Pass: 100%/93 | Total: 2d 12h | Avg: 39m 16s | Max: 1h 18m | Hits: 74%/133925
  • 🟩 cub: Pass: 100%/45 | Total: 1d 14h | Avg: 51m 55s | Max: 1h 18m | Hits: 70%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 12h | Avg: 51m 32s | Max:  1h 18m | Hits:  70%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  68%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 48m | Avg: 57m 36s | Max:  1h 03m | Hits:  58%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  68%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 07h | Avg: 50m 20s | Max:  1h 18m | Hits:  71%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 56m | Avg: 58m 06s | Max: 58m 23s | Hits:  74%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 48m | Avg: 57m 36s | Max:  1h 03m | Hits:  58%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  68%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 05h | Avg: 49m 54s | Max:  1h 18m | Hits:  71%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 06s | Max: 58m 23s | Hits:  74%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 13h | Avg: 51m 37s | Max:  1h 18m | Hits:  69%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 47m | Avg: 56m 53s | Max:  1h 00m | Hits:  68%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 50m | Avg: 55m 26s | Max: 55m 38s | Hits:  68%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 50m | Avg: 55m 20s | Max: 57m 04s | Hits:  68%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 32s | Max: 58m 36s | Hits:  68%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 33m | Avg: 47m 34s | Max:  1h 00m | Hits:  79%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 00s | Max: 54m 20s | Hits:  68%/2434  
      🟩 GCC8               Pass: 100%/1   | Total: 59m 05s | Avg: 59m 05s | Max: 59m 05s | Hits:  68%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 51m | Avg: 55m 42s | Max: 56m 58s | Hits:  68%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 32s | Max: 56m 23s | Hits:  68%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 16s | Max: 57m 32s | Hits:  68%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 48s | Max:  1h 00m | Hits:  68%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 35m | Avg: 35m 59s | Max:  1h 04m | Hits:  85%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 10m | Hits:  14%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 18m | Hits:  14%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  68%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 57m | Avg: 52m 46s | Max:  1h 00m | Hits:  72%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 16h 57m | Avg: 46m 15s | Max:  1h 04m | Hits:  76%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 46m | Avg:  1h 11m | Max:  1h 18m | Hits:  14%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  68%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 10m | Avg: 23m 30s | Max: 25m 11s | Hits:  89%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 09h | Avg: 59m 33s | Max:  1h 18m | Hits:  63%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 00m | Avg: 30m 04s | Max: 56m 06s | Hits:  91%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 58m 24s | Max:  1h 18m | Hits:  63%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 46s | Avg: 22m 46s | Max: 22m 46s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 21s | Avg: 18m 21s | Max: 18m 21s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 25s | Max: 23m 53s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 03m | Avg: 21m 17s | Max: 21m 53s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 10m | Avg: 23m 30s | Max: 25m 11s | Hits:  89%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 04m | Avg:  1h 04m | Max:  1h 04m | Hits:  68%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 45m | Avg: 59m 17s | Max:  1h 18m | Hits:  61%/23535 
      🟩 20                 Pass: 100%/25  | Total: 19h 10m | Avg: 46m 00s | Max:  1h 13m | Hits:  76%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 21h 00m | Avg: 28m 00s | Max: 59m 49s | Hits: 76%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 54s | Avg: 20m 57s | Max: 25m 18s | Hits:  74%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 20h 06m | Avg: 28m 04s | Max: 59m 49s | Hits:  76%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 53m 09s | Avg: 26m 34s | Max: 28m 02s | Hits:  77%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 34m | Avg: 30m 59s | Max: 48m 21s | Hits:  72%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  1h 29m | Avg: 44m 45s | Max: 45m 08s | Hits:  65%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 16h 55m | Avg: 26m 43s | Max: 59m 49s | Hits:  78%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 47m 09s | Avg: 23m 34s | Max: 24m 27s | Hits:  77%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 34m | Avg: 30m 59s | Max: 48m 21s | Hits:  72%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 29m | Avg: 44m 45s | Max: 45m 08s | Hits:  65%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 16h 08m | Avg: 26m 54s | Max: 59m 49s | Hits:  78%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 09s | Avg: 23m 34s | Max: 24m 27s | Hits:  77%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 20h 12m | Avg: 28m 12s | Max: 59m 49s | Hits:  76%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 07s | Max: 27m 09s | Hits:  77%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 54m 23s | Avg: 27m 11s | Max: 27m 56s | Hits:  77%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 56m 26s | Avg: 28m 13s | Max: 29m 18s | Hits:  77%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 55m 01s | Avg: 27m 30s | Max: 27m 40s | Hits:  77%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 25m | Avg: 20m 48s | Max: 27m 58s | Hits:  83%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 55m 20s | Avg: 27m 40s | Max: 28m 49s | Hits:  77%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 26m 45s | Avg: 26m 45s | Max: 26m 45s | Hits:  77%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 55m 46s | Avg: 27m 53s | Max: 28m 27s | Hits:  77%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 56m 13s | Avg: 28m 06s | Max: 28m 21s | Hits:  77%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 56m 46s | Avg: 28m 23s | Max: 29m 38s | Hits:  77%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 58m 23s | Avg: 29m 11s | Max: 30m 09s | Hits:  77%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 31m | Avg: 21m 11s | Max: 32m 46s | Hits:  83%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 09s | Max: 48m 21s | Hits:  54%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 17m | Avg: 45m 45s | Max: 59m 49s | Hits:  60%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 29m | Avg: 44m 45s | Max: 45m 08s | Hits:  65%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  6h 55m | Avg: 24m 28s | Max: 29m 18s | Hits:  79%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  8h 41m | Avg: 24m 48s | Max: 32m 46s | Hits:  80%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  3h 53m | Avg: 46m 43s | Max: 59m 49s | Hits:  58%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 29m | Avg: 44m 45s | Max: 45m 08s | Hits:  65%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 08s | Avg: 15m 04s | Max: 18m 28s | Hits:  88%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 16h 40m | Avg: 30m 18s | Max: 48m 21s | Hits:  74%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 49m | Avg: 22m 58s | Max: 59m 49s | Hits:  82%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 19h 23m | Avg: 30m 36s | Max: 59m 49s | Hits:  74%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 46m 21s | Avg: 15m 27s | Max: 30m 03s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 50m 25s | Avg: 12m 36s | Max: 16m 36s | Hits:  92%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 08s | Avg: 15m 04s | Max: 18m 28s | Hits:  88%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 30m 37s | Avg: 30m 37s | Max: 30m 37s | Hits:  77%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 10h 30m | Avg: 31m 30s | Max: 48m 21s | Hits:  73%/35611 
      🟩 20                 Pass: 100%/23  | Total:  9h 48m | Avg: 25m 34s | Max: 59m 49s | Hits:  80%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 19s | Avg: 7m 39s | Max: 12m 57s | Hits: 98%/304

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 19s | Avg:  7m 39s | Max: 12m 57s | Hits:  98%/304   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s | Hits:  98%/152   
      🟩 Test               Pass: 100%/1   | Total: 12m 57s | Avg: 12m 57s | Max: 12m 57s | Hits:  98%/152   
    
  • 🟩 python: Pass: 100%/1 | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 41m 24s | Avg: 41m 24s | Max: 41m 24s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
@shwina shwina enabled auto-merge (squash) February 21, 2025 18:29
@github-actions
Contributor

🟩 CI finished in 1h 51m: Pass: 100%/93 | Total: 2d 11h | Avg: 38m 10s | Max: 1h 17m | Hits: 79%/133925
  • 🟩 cub: Pass: 100%/45 | Total: 1d 14h | Avg: 51m 43s | Max: 1h 17m | Hits: 73%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 12h | Avg: 51m 18s | Max:  1h 17m | Hits:  73%/51055 
      🟩 arm64              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m | Hits:  73%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 49m | Avg: 57m 54s | Max:  1h 02m | Hits:  63%/5908  
      🟩 12.5               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  1d 07h | Avg: 50m 10s | Max:  1h 17m | Hits:  75%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  76%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 49m | Avg: 57m 54s | Max:  1h 02m | Hits:  63%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 05h | Avg: 49m 29s | Max:  1h 17m | Hits:  75%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  76%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 12h | Avg: 51m 13s | Max:  1h 17m | Hits:  73%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 43m | Avg: 55m 57s | Max: 58m 18s | Hits:  73%/4868  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 48m | Avg: 54m 20s | Max: 54m 30s | Hits:  73%/2430  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 53m | Avg: 56m 56s | Max: 56m 59s | Hits:  73%/2430  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 49m | Avg: 54m 46s | Max: 55m 05s | Hits:  73%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 42m | Avg: 48m 59s | Max:  1h 03m | Hits:  82%/8175  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 07s | Max: 54m 22s | Hits:  72%/2434  
      🟩 GCC8               Pass: 100%/1   | Total: 53m 04s | Avg: 53m 04s | Max: 53m 04s | Hits:  73%/1217  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 30s | Max: 59m 18s | Hits:  73%/2434  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 46s | Max: 56m 51s | Hits:  73%/2434  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 19s | Max: 58m 18s | Hits:  73%/2430  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 51m | Avg: 55m 46s | Max: 57m 03s | Hits:  73%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 41m | Avg: 36m 28s | Max:  1h 05m | Hits:  87%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 17m | Hits:  15%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 13m | Hits:  15%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 58m | Avg: 52m 52s | Max:  1h 03m | Hits:  76%/20333 
      🟩 GCC                Pass: 100%/22  | Total: 16h 55m | Avg: 46m 09s | Max:  1h 05m | Hits:  80%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 41m | Avg:  1h 10m | Max:  1h 17m | Hits:  15%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  68%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 10m | Avg: 23m 35s | Max: 24m 10s | Hits:  90%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 09h | Avg: 58m 58s | Max:  1h 17m | Hits:  67%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 11m | Avg: 31m 26s | Max:  1h 03m | Hits:  93%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 11h | Avg: 58m 06s | Max:  1h 17m | Hits:  67%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 03s | Avg: 23m 03s | Max: 23m 03s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 28s | Avg: 16m 28s | Max: 16m 28s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 14s | Max: 24m 54s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 48s | Max: 23m 24s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 10m | Avg: 23m 35s | Max: 24m 10s | Hits:  90%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m | Hits:  73%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 19h 34m | Avg: 58m 44s | Max:  1h 17m | Hits:  65%/23535 
      🟩 20                 Pass: 100%/25  | Total: 19h 12m | Avg: 46m 06s | Max:  1h 13m | Hits:  79%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 19h 31m | Avg: 26m 01s | Max: 50m 09s | Hits: 83%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 32m 40s | Avg: 16m 20s | Max: 21m 32s | Hits:  92%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 18h 43m | Avg: 26m 08s | Max: 50m 09s | Hits:  83%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 47m 33s | Avg: 23m 46s | Max: 24m 53s | Hits:  84%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 27m | Avg: 29m 26s | Max: 40m 49s | Hits:  81%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  1h 34m | Avg: 47m 26s | Max: 50m 09s | Hits:  65%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 15h 29m | Avg: 24m 27s | Max: 49m 13s | Hits:  84%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 40m 28s | Avg: 20m 14s | Max: 21m 00s | Hits:  84%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 27m | Avg: 29m 26s | Max: 40m 49s | Hits:  81%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 34m | Avg: 47m 26s | Max: 50m 09s | Hits:  65%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 14h 48m | Avg: 24m 41s | Max: 49m 13s | Hits:  84%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 40m 28s | Avg: 20m 14s | Max: 21m 00s | Hits:  84%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 18h 50m | Avg: 26m 18s | Max: 50m 09s | Hits:  83%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 09s | Max: 27m 53s | Hits:  84%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 52s | Hits:  84%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 50m 45s | Avg: 25m 22s | Max: 25m 42s | Hits:  84%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 49m 32s | Avg: 24m 46s | Max: 25m 14s | Hits:  84%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 11m | Avg: 18m 46s | Max: 26m 25s | Hits:  88%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 53m 55s | Avg: 26m 57s | Max: 26m 58s | Hits:  84%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 30m 30s | Avg: 30m 30s | Max: 30m 30s | Hits:  48%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 55m 13s | Avg: 27m 36s | Max: 28m 40s | Hits:  84%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 53m 16s | Avg: 26m 38s | Max: 28m 16s | Hits:  84%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 54m 58s | Avg: 27m 29s | Max: 28m 17s | Hits:  84%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 52m 26s | Avg: 26m 13s | Max: 26m 17s | Hits:  84%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 03m | Avg: 18m 19s | Max: 27m 28s | Hits:  90%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 21m | Avg: 40m 34s | Max: 40m 49s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 08m | Avg: 42m 48s | Max: 49m 13s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 34m | Avg: 47m 26s | Max: 50m 09s | Hits:  65%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  6h 23m | Avg: 22m 33s | Max: 27m 53s | Hits:  86%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  8h 03m | Avg: 23m 01s | Max: 30m 30s | Hits:  85%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  3h 29m | Avg: 41m 54s | Max: 49m 13s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 34m | Avg: 47m 26s | Max: 50m 09s | Hits:  65%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 25m 39s | Avg: 12m 49s | Max: 14m 44s | Hits:  92%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 15h 38m | Avg: 28m 26s | Max: 50m 09s | Hits:  80%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 26m | Avg: 20m 41s | Max: 49m 13s | Hits:  89%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 17h 58m | Avg: 28m 22s | Max: 50m 09s | Hits:  80%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 38s | Avg: 16m 32s | Max: 33m 59s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 35s | Avg: 10m 53s | Max: 11m 13s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 25m 39s | Avg: 12m 49s | Max: 14m 44s | Hits:  92%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 27m 19s | Avg: 27m 19s | Max: 27m 19s | Hits:  84%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  9h 51m | Avg: 29m 34s | Max: 50m 09s | Hits:  79%/35611 
      🟩 20                 Pass: 100%/23  | Total:  9h 07m | Avg: 23m 47s | Max: 49m 13s | Hits:  85%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 44s | Avg: 7m 52s | Max: 13m 24s | Hits: 98%/304

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max: 13m 24s | Hits:  98%/304   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s | Hits:  98%/152   
      🟩 Test               Pass: 100%/1   | Total: 13m 24s | Avg: 13m 24s | Max: 13m 24s | Hits:  98%/152   
    
  • 🟩 python: Pass: 100%/1 | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 36m 11s | Avg: 36m 11s | Max: 36m 11s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@shwina shwina merged commit 4190e9c into NVIDIA:main Feb 21, 2025
104 of 107 checks passed
davebayer pushed a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
* Move transform kernels to NVRTC compilable header

* Update cub/cub/device/dispatch/kernels/transform.cuh

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Move transform kernels to NVRTC compilable header

3 participants