
add _CCCL_HAS_NVFP8 macro #3429

Merged
fbusato merged 4 commits into NVIDIA:main from fbusato:fp8-macro on Jan 21, 2025

Conversation

fbusato (Contributor) commented on Jan 16, 2025

Description

Every other non-standard data type already has a corresponding _CCCL detection macro, but FP8 does not.
This PR adds the _CCCL_HAS_NVFP8() detection macro.
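A minimal sketch of how a _CCCL_HAS_XXX()-style detection macro can be defined and consumed. MYLIB_HAS_NVFP8(), the MYLIB_DISABLE_NVFP8 opt-out switch, and fp8_round_trip are hypothetical names; the conditions the real _CCCL_HAS_NVFP8() checks (compiler, CUDA toolkit version, opt-out macros) are not described in this PR and may differ. Only <cuda_fp8.h> and __nv_fp8_e4m3 come from the CUDA toolkit.

```cpp
#include <cassert>

// HYPOTHETICAL stand-in for _CCCL_HAS_NVFP8(); the real conditions may differ.
// A function-like macro that always expands to 0 or 1 lets call sites write
// `#if MYLIB_HAS_NVFP8()`. If the defining header was not included, that line
// fails to preprocess instead of silently evaluating to 0.
#if defined(__has_include)
#  if __has_include(<cuda_fp8.h>) && !defined(MYLIB_DISABLE_NVFP8) // opt-out switch is hypothetical
#    define MYLIB_HAS_NVFP8() 1
#  endif
#endif
#if !defined(MYLIB_HAS_NVFP8)
#  define MYLIB_HAS_NVFP8() 0
#endif

#if MYLIB_HAS_NVFP8()
#  include <cuda_fp8.h> // CUDA toolkit header providing __nv_fp8_e4m3 / __nv_fp8_e5m2
#endif

// Hypothetical helper: converts a float through FP8 (E4M3) and back when FP8
// support is detected; otherwise returns the value unchanged.
inline float fp8_round_trip(float value)
{
#if MYLIB_HAS_NVFP8()
  const __nv_fp8_e4m3 narrow(value); // float -> FP8 (E4M3)
  return static_cast<float>(narrow); // FP8 -> float
#else
  return value; // FP8 unavailable: no-op fallback
#endif
}
```

Without the CUDA toolkit headers on the include path the macro expands to 0 and fp8_round_trip is an identity function. With the headers available, values that are exactly representable in E4M3 (for example 1.5f or -2.0f) survive the round trip unchanged.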

TODO:

  • add the macro description to the internal documentation
  • replace the CUB- and libcu++-specific FP8 macros with _CCCL_HAS_NVFP8()

fbusato added the 2.8.0 label Jan 16, 2025
fbusato self-assigned this Jan 16, 2025
fbusato requested a review from a team as a code owner January 16, 2025 21:19
fbusato requested a review from wmaxey January 16, 2025 21:19
github-actions (Contributor) commented:

🟩 CI finished in 1h 46m: Pass: 100%/144 | Total: 1d 10h | Avg: 14m 33s | Max: 1h 12m | Hits: 239%/25754
  • 🟩 libcudacxx: Pass: 100%/46 | Total: 10h 05m | Avg: 13m 09s | Max: 41m 25s | Hits: 372%/12472

    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  9h 56m | Avg: 13m 33s | Max: 41m 25s | Hits: 372%/12472 
      🟩 arm64              Pass: 100%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 29s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 38m | Avg: 12m 18s | Max: 35m 34s | Hits: 400%/4869  
      🟩 12.5               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 33s | Max: 33m 47s
      🟩 12.6               Pass: 100%/36  | Total:  7h 19m | Avg: 12m 12s | Max: 41m 25s | Hits: 354%/7603  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 06m | Avg: 16m 37s | Max: 20m 44s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 38m | Avg: 12m 18s | Max: 35m 34s | Hits: 400%/4869  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 07m | Avg: 33m 33s | Max: 33m 47s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  6h 13m | Avg: 11m 39s | Max: 41m 25s | Hits: 354%/7603  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 06m | Avg: 16m 37s | Max: 20m 44s
      🟩 nvcc               Pass: 100%/42  | Total:  8h 58m | Avg: 12m 49s | Max: 41m 25s | Hits: 372%/12472 
    🟩 cxx
      🟩 Clang14            Pass: 100%/6   | Total: 40m 41s | Avg:  6m 46s | Max: 21m 00s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
      🟩 Clang17            Pass: 100%/1   | Total:  8m 28s | Avg:  8m 28s | Max:  8m 28s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 11s | Max: 20m 44s
      🟩 GCC7               Pass: 100%/5   | Total: 29m 02s | Avg:  5m 48s | Max: 13m 44s
      🟩 GCC8               Pass: 100%/1   | Total:  4m 47s | Avg:  4m 47s | Max:  4m 47s
      🟩 GCC9               Pass: 100%/3   | Total: 10m 39s | Avg:  3m 33s | Max:  3m 52s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 17m | Avg: 13m 45s | Max: 28m 22s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 47m | Avg: 35m 42s | Max: 40m 02s | Hits: 377%/7354  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 19m | Avg: 39m 40s | Max: 41m 25s | Hits: 364%/5118  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 33s | Max: 33m 47s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 36m | Avg:  9m 13s | Max: 21m 00s
      🟩 GCC                Pass: 100%/22  | Total:  3h 14m | Avg:  8m 51s | Max: 28m 22s
      🟩 MSVC               Pass: 100%/5   | Total:  3h 06m | Avg: 37m 17s | Max: 41m 25s | Hits: 372%/12472 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 33s | Max: 33m 47s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 10h 05m | Avg: 13m 09s | Max: 41m 25s | Hits: 372%/12472 
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  7h 46m | Avg: 11m 57s | Max: 41m 25s | Hits: 372%/12472 
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 33m | Avg: 23m 17s | Max: 28m 22s
      🟩 Test               Pass: 100%/2   | Total: 43m 52s | Avg: 21m 56s | Max: 26m 14s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 16s | Avg: 12m 16s | Max: 12m 16s
      🟩 90a                Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max: 13m 57s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total: 46m 29s | Avg:  7m 44s | Max: 19m 37s
      🟩 14                 Pass: 100%/4   | Total:  1h 07m | Avg: 16m 55s | Max: 31m 31s | Hits: 399%/2394  
      🟩 17                 Pass: 100%/14  | Total:  3h 58m | Avg: 17m 03s | Max: 40m 02s | Hits: 355%/7445  
      🟩 20                 Pass: 100%/21  | Total:  4h 10m | Avg: 11m 54s | Max: 41m 25s | Hits: 395%/2633  
    
  • 🟩 cub: Pass: 100%/38 | Total: 11h 50m | Avg: 18m 41s | Max: 1h 12m | Hits: 38%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total: 11h 40m | Avg: 19m 27s | Max:  1h 12m | Hits:  38%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  9m 35s | Avg:  4m 47s | Max:  4m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 27m | Avg: 17m 25s | Max:  1h 05m | Hits:  38%/885   
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 12.6               Pass: 100%/31  | Total:  8h 08m | Avg: 15m 45s | Max:  1h 12m | Hits:  38%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 28s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 27m | Avg: 17m 25s | Max:  1h 05m | Hits:  38%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 nvcc12.6           Pass: 100%/29  | Total:  7h 59m | Avg: 16m 32s | Max:  1h 12m | Hits:  38%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 28s
      🟩 nvcc               Pass: 100%/36  | Total: 11h 41m | Avg: 19m 29s | Max:  1h 12m | Hits:  38%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 19s | Avg:  5m 19s | Max:  5m 27s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 34m | Avg: 13m 27s | Max: 42m 06s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 06s | Avg:  5m 33s | Max:  5m 36s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  5m 53s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 04s | Avg:  6m 04s | Max:  6m 04s
      🟩 GCC12              Pass: 100%/3   | Total: 29m 50s | Avg:  9m 56s | Max: 19m 43s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 37m | Avg: 12m 09s | Max: 21m 11s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 12m | Hits:  38%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 10m | Hits:  38%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  2h 12m | Avg:  9m 29s | Max: 42m 06s
      🟩 GCC                Pass: 100%/18  | Total:  2h 46m | Avg:  9m 15s | Max: 21m 11s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 36m | Avg:  1h 09m | Max:  1h 12m | Hits:  38%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 23m 56s | Avg: 11m 58s | Max: 19m 43s
      🟩 v100               Pass: 100%/36  | Total: 11h 26m | Avg: 19m 03s | Max:  1h 12m | Hits:  38%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  9h 04m | Avg: 17m 33s | Max:  1h 12m | Hits:  38%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 43s | Avg: 20m 43s | Max: 20m 43s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 30s | Avg: 15m 30s | Max: 15m 30s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 07s | Max: 27m 27s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 38s | Max: 42m 06s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 23m 56s | Avg: 11m 58s | Max: 19m 43s
      🟩 90a                Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  5h 26m | Avg: 23m 19s | Max:  1h 12m | Hits:  38%/2655  
      🟩 20                 Pass: 100%/24  | Total:  6h 23m | Avg: 15m 59s | Max:  1h 10m | Hits:  38%/885   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 10h 23m | Avg: 16m 50s | Max: 1h 12m | Hits: 145%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 19m 00s | Avg:  9m 30s | Max: 12m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 10h 13m | Avg: 17m 32s | Max:  1h 12m | Hits: 145%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  4m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 18m | Avg: 15m 44s | Max: 58m 53s | Hits:  80%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 12.6               Pass: 100%/30  | Total:  6h 39m | Avg: 13m 19s | Max:  1h 04m | Hits: 161%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 18m | Avg: 15m 44s | Max: 58m 53s | Hits:  80%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/28  | Total:  6h 28m | Avg: 13m 53s | Max:  1h 04m | Hits: 161%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 38s
      🟩 nvcc               Pass: 100%/35  | Total: 10h 12m | Avg: 17m 29s | Max:  1h 12m | Hits: 145%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 04s | Avg:  5m 16s | Max:  5m 45s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 57s | Avg:  5m 57s | Max:  5m 57s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 Clang18            Pass: 100%/7   | Total: 52m 08s | Avg:  7m 26s | Max: 17m 26s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 12s | Avg:  5m 06s | Max:  5m 24s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 34s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 02m | Avg:  7m 48s | Max: 13m 59s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 06s | Max: 59m 20s | Hits: 100%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 44m | Avg: 54m 58s | Max:  1h 04m | Hits: 175%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 30m | Avg:  6m 27s | Max: 17m 26s
      🟩 GCC                Pass: 100%/16  | Total:  1h 45m | Avg:  6m 34s | Max: 13m 59s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 43m | Avg: 56m 37s | Max:  1h 04m | Hits: 145%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 10h 23m | Avg: 16m 50s | Max:  1h 12m | Hits: 145%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  8h 46m | Avg: 16m 59s | Max:  1h 12m | Hits:  90%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 24s | Avg: 17m 28s | Max: 36m 14s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 44m 14s | Avg: 14m 44s | Max: 17m 26s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  5h 08m | Avg: 22m 02s | Max:  1h 12m | Hits:  93%/5532  
      🟩 20                 Pass: 100%/21  | Total:  4h 55m | Avg: 14m 05s | Max:  1h 12m | Hits: 222%/3688  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 59m | Avg: 5m 59s | Max: 17m 08s | Hits: 81%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 49m | Avg:  6m 49s | Max: 17m 08s | Hits:  81%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 27s | Avg:  2m 36s | Max:  2m 39s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 52s | Avg: 12m 52s | Max: 12m 52s | Hits:  81%/261   
      🟩 12.5               Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 25s
      🟩 12.6               Pass: 100%/17  | Total:  1h 28m | Avg:  5m 11s | Max: 17m 08s | Hits:  81%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 52s | Avg: 12m 52s | Max: 12m 52s | Hits:  81%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 25s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 28m | Avg:  5m 11s | Max: 17m 08s | Hits:  81%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 59m | Avg:  5m 59s | Max: 17m 08s | Hits:  81%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
      🟩 Clang18            Pass: 100%/4   | Total: 25m 29s | Avg:  6m 22s | Max: 17m 08s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 GCC12              Pass: 100%/2   | Total: 19m 20s | Avg:  9m 40s | Max: 16m 05s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 49s | Avg:  2m 42s | Max:  2m 53s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 52s | Avg: 12m 52s | Max: 12m 52s | Hits:  81%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 54s | Avg: 12m 54s | Max: 12m 54s | Hits:  81%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 25s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 38m 37s | Avg:  4m 49s | Max: 17m 08s
      🟩 GCC                Pass: 100%/8   | Total: 36m 40s | Avg:  4m 35s | Max: 16m 05s
      🟩 MSVC               Pass: 100%/2   | Total: 25m 46s | Avg: 12m 53s | Max: 12m 54s | Hits:  81%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 59m | Avg:  5m 59s | Max: 17m 08s | Hits:  81%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 26m | Avg:  4m 48s | Max: 12m 54s | Hits:  81%/522   
      🟩 Test               Pass: 100%/2   | Total: 33m 13s | Avg: 16m 36s | Max: 17m 08s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 43s | Avg:  2m 43s | Max:  2m 43s
      🟩 90a                Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 23s | Avg:  4m 20s | Max:  9m 25s
      🟩 20                 Pass: 100%/16  | Total:  1h 42m | Avg:  6m 23s | Max: 17m 08s | Hits:  81%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 51s | Avg: 4m 55s | Max: 7m 56s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  7m 56s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
      🟩 Test               Pass: 100%/1   | Total:  7m 56s | Avg:  7m 56s | Max:  7m 56s
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 144)

# Runner
98 linux-amd64-cpu16
19 linux-amd64-gpu-v100-latest-1
16 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

github-actions (Contributor) commented:

🟩 CI finished in 1h 27m: Pass: 100%/144 | Total: 1d 01h | Avg: 10m 43s | Max: 39m 39s | Hits: 542%/25759
  • 🟩 libcudacxx: Pass: 100%/46 | Total: 9h 21m | Avg: 12m 12s | Max: 39m 39s | Hits: 681%/12477

    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  9h 14m | Avg: 12m 36s | Max: 39m 39s | Hits: 681%/12477 
      🟩 arm64              Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  3m 37s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 04m | Avg:  8m 06s | Max: 22m 56s | Hits: 680%/4871  
      🟩 12.5               Pass: 100%/2   | Total:  1h 00m | Avg: 30m 28s | Max: 31m 42s
      🟩 12.6               Pass: 100%/36  | Total:  7h 15m | Avg: 12m 06s | Max: 39m 39s | Hits: 681%/7606  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 29s | Max: 20m 55s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 04m | Avg:  8m 06s | Max: 22m 56s | Hits: 680%/4871  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 00m | Avg: 30m 28s | Max: 31m 42s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  6h 09m | Avg: 11m 33s | Max: 39m 39s | Hits: 681%/7606  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 29s | Max: 20m 55s
      🟩 nvcc               Pass: 100%/42  | Total:  8h 15m | Avg: 11m 47s | Max: 39m 39s | Hits: 681%/12477 
    🟩 cxx
      🟩 Clang14            Pass: 100%/6   | Total: 37m 57s | Avg:  6m 19s | Max: 17m 31s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 30s | Avg:  4m 30s | Max:  4m 30s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 35m | Avg: 11m 57s | Max: 20m 55s
      🟩 GCC7               Pass: 100%/5   | Total: 30m 50s | Avg:  6m 10s | Max: 18m 03s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s
      🟩 GCC9               Pass: 100%/3   | Total: 11m 29s | Avg:  3m 49s | Max:  4m 51s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 35s | Avg:  3m 35s | Max:  3m 35s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 57s | Avg:  4m 57s | Max:  4m 57s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 47m | Avg: 16m 43s | Max: 39m 39s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 11m | Avg: 23m 41s | Max: 26m 19s | Hits: 681%/7357  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 21s | Avg: 28m 10s | Max: 30m 19s | Hits: 680%/5120  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 28s | Max: 31m 42s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 27m | Avg:  8m 40s | Max: 20m 55s
      🟩 GCC                Pass: 100%/22  | Total:  3h 45m | Avg: 10m 15s | Max: 39m 39s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 07m | Avg: 25m 29s | Max: 30m 19s | Hits: 681%/12477 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 28s | Max: 31m 42s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  9h 21m | Avg: 12m 12s | Max: 39m 39s | Hits: 681%/12477 
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  6h 32m | Avg: 10m 03s | Max: 31m 42s | Hits: 681%/12477 
      🟩 NVRTC              Pass: 100%/4   | Total:  2h 12m | Avg: 33m 03s | Max: 39m 39s
      🟩 Test               Pass: 100%/2   | Total: 35m 24s | Avg: 17m 42s | Max: 18m 01s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 49s | Avg:  1m 49s | Max:  1m 49s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 11s | Avg: 12m 11s | Max: 12m 11s
      🟩 90a                Pass: 100%/2   | Total: 16m 20s | Avg:  8m 10s | Max: 12m 38s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total: 37m 55s | Avg:  6m 19s | Max: 22m 14s
      🟩 14                 Pass: 100%/4   | Total:  1h 31m | Avg: 22m 52s | Max: 34m 07s | Hits: 682%/2395  
      🟩 17                 Pass: 100%/14  | Total:  3h 17m | Avg: 14m 05s | Max: 39m 39s | Hits: 680%/7448  
      🟩 20                 Pass: 100%/21  | Total:  3h 52m | Avg: 11m 05s | Max: 36m 14s | Hits: 682%/2634  
    
  • 🟩 cub: Pass: 100%/38 | Total: 7h 27m | Avg: 11m 46s | Max: 38m 17s | Hits: 539%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  7h 17m | Avg: 12m 09s | Max: 38m 17s | Hits: 539%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  9m 47s | Avg:  4m 53s | Max:  4m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 48m 07s | Avg:  9m 37s | Max: 27m 29s | Hits: 539%/885   
      🟩 12.5               Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 33s
      🟩 12.6               Pass: 100%/31  | Total:  6h 20m | Avg: 12m 16s | Max: 38m 17s | Hits: 539%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 03s | Avg:  4m 31s | Max:  4m 34s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 48m 07s | Avg:  9m 37s | Max: 27m 29s | Hits: 539%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 33s
      🟩 nvcc12.6           Pass: 100%/29  | Total:  6h 11m | Avg: 12m 49s | Max: 38m 17s | Hits: 539%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 03s | Avg:  4m 31s | Max:  4m 34s
      🟩 nvcc               Pass: 100%/36  | Total:  7h 18m | Avg: 12m 10s | Max: 38m 17s | Hits: 539%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 37s | Avg:  5m 24s | Max:  5m 41s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 54s | Avg:  5m 54s | Max:  5m 54s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 23m | Avg: 11m 54s | Max: 38m 17s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 25s | Avg:  5m 12s | Max:  5m 22s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 37s | Avg:  5m 37s | Max:  5m 37s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 54s | Avg:  5m 27s | Max:  5m 40s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 55s | Avg:  5m 55s | Max:  5m 55s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
      🟩 GCC12              Pass: 100%/3   | Total: 29m 40s | Avg:  9m 53s | Max: 19m 28s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 01m | Avg: 15m 14s | Max: 29m 54s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 01s | Avg: 28m 00s | Max: 28m 32s | Hits: 539%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 07s | Max: 31m 36s | Hits: 539%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 33s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  2h 02m | Avg:  8m 44s | Max: 38m 17s
      🟩 GCC                Pass: 100%/18  | Total:  3h 10m | Avg: 10m 33s | Max: 29m 54s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 56m | Avg: 29m 04s | Max: 31m 36s | Hits: 539%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max:  9m 33s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 24m 03s | Avg: 12m 01s | Max: 19m 28s
      🟩 v100               Pass: 100%/36  | Total:  7h 03m | Avg: 11m 45s | Max: 38m 17s | Hits: 539%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  4h 28m | Avg:  8m 39s | Max: 31m 36s | Hits: 539%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 27m 03s | Avg: 27m 03s | Max: 27m 03s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 45s | Avg: 17m 45s | Max: 17m 45s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 03s | Max: 26m 36s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 08m | Avg: 34m 05s | Max: 38m 17s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 24m 03s | Avg: 12m 01s | Max: 19m 28s
      🟩 90a                Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  2h 27m | Avg: 10m 31s | Max: 28m 39s | Hits: 539%/2655  
      🟩 20                 Pass: 100%/24  | Total:  5h 00m | Avg: 12m 30s | Max: 38m 17s | Hits: 539%/885   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 6h 20m | Avg: 10m 17s | Max: 35m 16s | Hits: 365%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 22m 40s | Avg: 11m 20s | Max: 16m 30s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total:  6h 10m | Avg: 10m 35s | Max: 35m 16s | Hits: 365%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  5m 03s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 46m 22s | Avg:  9m 16s | Max: 25m 49s | Hits: 365%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 31m 24s | Avg: 15m 42s | Max: 15m 50s
      🟩 12.6               Pass: 100%/30  | Total:  5h 02m | Avg: 10m 05s | Max: 35m 16s | Hits: 365%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  6m 09s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 46m 22s | Avg:  9m 16s | Max: 25m 49s | Hits: 365%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 31m 24s | Avg: 15m 42s | Max: 15m 50s
      🟩 nvcc12.6           Pass: 100%/28  | Total:  4h 51m | Avg: 10m 24s | Max: 35m 16s | Hits: 365%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  6m 09s
      🟩 nvcc               Pass: 100%/35  | Total:  6h 09m | Avg: 10m 32s | Max: 35m 16s | Hits: 365%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 58s | Avg:  5m 14s | Max:  5m 47s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
      🟩 Clang18            Pass: 100%/7   | Total: 50m 46s | Avg:  7m 15s | Max: 16m 20s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 43s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  6m 13s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 15s | Avg:  6m 15s | Max:  6m 15s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 03s | Avg:  6m 03s | Max:  6m 03s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 03m | Avg:  7m 56s | Max: 16m 30s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 55m 26s | Avg: 27m 43s | Max: 29m 37s | Hits: 365%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 36m | Avg: 32m 05s | Max: 35m 16s | Hits: 365%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 31m 24s | Avg: 15m 42s | Max: 15m 50s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 27m | Avg:  6m 15s | Max: 16m 20s
      🟩 GCC                Pass: 100%/16  | Total:  1h 49m | Avg:  6m 52s | Max: 16m 30s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 31m | Avg: 30m 20s | Max: 35m 16s | Hits: 365%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total: 31m 24s | Avg: 15m 42s | Max: 15m 50s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total:  6h 20m | Avg: 10m 17s | Max: 35m 16s | Hits: 365%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  4h 46m | Avg:  9m 14s | Max: 30m 54s | Hits: 365%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 50s | Avg: 16m 56s | Max: 35m 16s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 43m 19s | Avg: 14m 26s | Max: 16m 30s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  2h 37m | Avg: 11m 13s | Max: 30m 06s | Hits: 365%/5532  
      🟩 20                 Pass: 100%/21  | Total:  3h 20m | Avg:  9m 33s | Max: 35m 16s | Hits: 365%/3688  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 2h 00m | Avg: 6m 00s | Max: 24m 27s | Hits: 388%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 49m | Avg:  6m 51s | Max: 24m 27s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 21s | Avg:  2m 35s | Max:  2m 38s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 51s | Avg: 11m 51s | Max: 11m 51s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 17s
      🟩 12.6               Pass: 100%/17  | Total:  1h 37m | Avg:  5m 45s | Max: 24m 27s | Hits: 388%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 51s | Avg: 11m 51s | Max: 11m 51s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 17s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 37m | Avg:  5m 45s | Max: 24m 27s | Hits: 388%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  2h 00m | Avg:  6m 00s | Max: 24m 27s | Hits: 388%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang18            Pass: 100%/4   | Total: 28m 46s | Avg:  7m 11s | Max: 20m 10s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 GCC12              Pass: 100%/2   | Total: 27m 34s | Avg: 13m 47s | Max: 24m 27s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 43s | Avg:  2m 40s | Max:  2m 56s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 51s | Avg: 11m 51s | Max: 11m 51s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 24s | Avg: 11m 24s | Max: 11m 24s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 17s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 41m 55s | Avg:  5m 14s | Max: 20m 10s
      🟩 GCC                Pass: 100%/8   | Total: 44m 36s | Avg:  5m 34s | Max: 24m 27s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 15s | Avg: 11m 37s | Max: 11m 51s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  2h 00m | Avg:  6m 00s | Max: 24m 27s | Hits: 388%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 15m | Avg:  4m 11s | Max: 11m 51s | Hits: 388%/522   
      🟩 Test               Pass: 100%/2   | Total: 44m 37s | Avg: 22m 18s | Max: 24m 27s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 39s | Avg:  2m 39s | Max:  2m 39s
      🟩 90a                Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 05s | Avg:  3m 16s | Max:  5m 17s
      🟩 20                 Pass: 100%/16  | Total:  1h 47m | Avg:  6m 41s | Max: 24m 27s | Hits: 388%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 19s | Avg: 4m 39s | Max: 7m 22s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  7m 22s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s
      🟩 Test               Pass: 100%/1   | Total:  7m 22s | Avg:  7m 22s | Max:  7m 22s
    
  • 🟩 python: Pass: 100%/1 | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 24m 29s | Avg: 24m 29s | Max: 24m 29s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 144)

# Runner
98 linux-amd64-cpu16
19 linux-amd64-gpu-v100-latest-1
16 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

@bernhardmgruber bernhardmgruber left a comment

Please remove the nvcc > 11.8 check; otherwise LGTM.

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
@fbusato fbusato enabled auto-merge (squash) January 21, 2025 18:26
@github-actions
Contributor

🟩 CI finished in 1h 57m: Pass: 100%/135 | Total: 1d 09h | Avg: 14m 59s | Max: 1h 17m | Hits: 237%/23404
  • 🟩 cub: Pass: 100%/38 | Total: 11h 40m | Avg: 18m 26s | Max: 1h 15m | Hits: 38%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total: 11h 30m | Avg: 19m 11s | Max:  1h 15m | Hits:  38%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 01s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 21m | Avg: 16m 19s | Max:  1h 00m | Hits:  38%/885   
      🟩 12.5               Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 15m
      🟩 12.6               Pass: 100%/31  | Total:  7h 58m | Avg: 15m 26s | Max:  1h 09m | Hits:  38%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  4m 42s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 21m | Avg: 16m 19s | Max:  1h 00m | Hits:  38%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 15m
      🟩 nvcc12.6           Pass: 100%/29  | Total:  7h 49m | Avg: 16m 11s | Max:  1h 09m | Hits:  38%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  4m 42s
      🟩 nvcc               Pass: 100%/36  | Total: 11h 31m | Avg: 19m 12s | Max:  1h 15m | Hits:  38%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 28s | Avg:  5m 22s | Max:  5m 45s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 48s | Avg:  5m 48s | Max:  5m 48s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 18m | Avg: 11m 08s | Max: 29m 48s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  5m 20s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  6m 00s
      🟩 GCC10              Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s
      🟩 GCC12              Pass: 100%/3   | Total: 29m 20s | Avg:  9m 46s | Max: 19m 15s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 53m | Avg: 14m 10s | Max: 26m 40s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 06m | Hits:  38%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  39%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 15m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 56m | Avg:  8m 17s | Max: 29m 48s
      🟩 GCC                Pass: 100%/18  | Total:  3h 02m | Avg: 10m 07s | Max: 26m 40s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 21m | Avg:  1h 05m | Max:  1h 09m | Hits:  38%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 15m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 23m 48s | Avg: 11m 54s | Max: 19m 15s
      🟩 v100               Pass: 100%/36  | Total: 11h 16m | Avg: 18m 47s | Max:  1h 15m | Hits:  38%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  8h 55m | Avg: 17m 16s | Max:  1h 15m | Hits:  38%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 22s | Avg: 18m 22s | Max: 18m 22s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 05m | Avg: 21m 50s | Max: 23m 39s
      🟩 TestGPU            Pass: 100%/2   | Total: 56m 28s | Avg: 28m 14s | Max: 29m 48s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 23m 48s | Avg: 11m 54s | Max: 19m 15s
      🟩 90a                Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  5h 11m | Avg: 22m 13s | Max:  1h 06m | Hits:  38%/2655  
      🟩 20                 Pass: 100%/24  | Total:  6h 29m | Avg: 16m 13s | Max:  1h 15m | Hits:  39%/885   
    
  • 🟩 libcudacxx: Pass: 100%/37 | Total: 8h 09m | Avg: 13m 13s | Max: 43m 03s | Hits: 400%/10162

    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total:  7h 59m | Avg: 13m 41s | Max: 43m 03s | Hits: 400%/10162 
      🟩 arm64              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  6m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 45m 13s | Avg:  9m 02s | Max: 31m 05s | Hits: 400%/2495  
      🟩 12.5               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 35s | Max: 39m 14s
      🟩 12.6               Pass: 100%/30  | Total:  6h 12m | Avg: 12m 25s | Max: 43m 03s | Hits: 400%/7667  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 18s | Max: 19m 42s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 13s | Avg:  9m 02s | Max: 31m 05s | Hits: 400%/2495  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 11m | Avg: 35m 35s | Max: 39m 14s
      🟩 nvcc12.6           Pass: 100%/26  | Total:  5h 07m | Avg: 11m 49s | Max: 43m 03s | Hits: 400%/7667  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 18s | Max: 19m 42s
      🟩 nvcc               Pass: 100%/33  | Total:  7h 03m | Avg: 12m 50s | Max: 43m 03s | Hits: 400%/10162 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 19m 36s | Avg:  4m 54s | Max:  7m 53s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
      🟩 Clang16            Pass: 100%/1   | Total:  7m 29s | Avg:  7m 29s | Max:  7m 29s
      🟩 Clang17            Pass: 100%/1   | Total:  7m 55s | Avg:  7m 55s | Max:  7m 55s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 42m | Avg: 12m 51s | Max: 19m 42s
      🟩 GCC7               Pass: 100%/2   | Total:  6m 59s | Avg:  3m 29s | Max:  3m 34s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 33s | Avg:  3m 33s | Max:  3m 33s
      🟩 GCC9               Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  3m 46s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 38s | Avg:  3m 38s | Max:  3m 38s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 37m | Avg: 12m 12s | Max: 40m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 05m | Avg: 32m 32s | Max: 34m 00s | Hits: 401%/5000  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 23m | Avg: 41m 51s | Max: 43m 03s | Hits: 399%/5162  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 11m | Avg: 35m 35s | Max: 39m 14s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/15  | Total:  2h 22m | Avg:  9m 29s | Max: 19m 42s
      🟩 GCC                Pass: 100%/16  | Total:  2h 06m | Avg:  7m 55s | Max: 40m 34s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 28m | Avg: 37m 12s | Max: 43m 03s | Hits: 400%/10162 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 35s | Max: 39m 14s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total:  8h 09m | Avg: 13m 13s | Max: 43m 03s | Hits: 400%/10162 
    🟩 jobs
      🟩 Build              Pass: 100%/32  | Total:  6h 27m | Avg: 12m 07s | Max: 43m 03s | Hits: 400%/10162 
      🟩 NVRTC              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 11s | Max: 40m 34s
      🟩 Test               Pass: 100%/2   | Total: 34m 46s | Avg: 17m 23s | Max: 18m 22s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 13m 04s | Avg: 13m 04s | Max: 13m 04s
      🟩 90a                Pass: 100%/2   | Total: 16m 41s | Avg:  8m 20s | Max: 13m 05s
    🟩 std
      🟩 17                 Pass: 100%/15  | Total:  3h 38m | Avg: 14m 33s | Max: 40m 40s | Hits: 400%/7505  
      🟩 20                 Pass: 100%/21  | Total:  4h 28m | Avg: 12m 47s | Max: 43m 03s | Hits: 399%/2657  
    
  • 🟩 thrust: Pass: 100%/37 | Total: 10h 40m | Avg: 17m 18s | Max: 1h 17m | Hits: 143%/9180

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 19m 23s | Avg:  9m 41s | Max: 13m 47s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 10h 30m | Avg: 18m 00s | Max:  1h 17m | Hits: 143%/9180  
      🟩 arm64              Pass: 100%/2   | Total:  9m 48s | Avg:  4m 54s | Max:  5m 01s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 02m | Hits:  79%/1836  
      🟩 12.5               Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 17m
      🟩 12.6               Pass: 100%/30  | Total:  6h 45m | Avg: 13m 30s | Max:  1h 07m | Hits: 159%/7344  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 30s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 02m | Hits:  79%/1836  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 17m
      🟩 nvcc12.6           Pass: 100%/28  | Total:  6h 34m | Avg: 14m 06s | Max:  1h 07m | Hits: 159%/7344  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 30s
      🟩 nvcc               Pass: 100%/35  | Total: 10h 29m | Avg: 17m 59s | Max:  1h 17m | Hits: 143%/9180  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 25s | Avg:  5m 21s | Max:  5m 50s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
      🟩 Clang18            Pass: 100%/7   | Total: 48m 40s | Avg:  6m 57s | Max: 14m 09s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 20s | Avg:  5m 10s | Max:  5m 10s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 46s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 56s | Avg:  5m 56s | Max:  5m 56s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 14s | Avg:  6m 14s | Max:  6m 14s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 03m | Avg:  7m 55s | Max: 14m 40s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m | Hits:  96%/3672  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 49m | Avg: 56m 22s | Max:  1h 07m | Hits: 174%/5508  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 17m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 26m | Avg:  6m 11s | Max: 14m 09s
      🟩 GCC                Pass: 100%/16  | Total:  1h 48m | Avg:  6m 46s | Max: 14m 40s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 53m | Avg: 58m 37s | Max:  1h 07m | Hits: 143%/9180  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 17m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 10h 40m | Avg: 17m 18s | Max:  1h 17m | Hits: 143%/9180  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  9h 03m | Avg: 17m 31s | Max:  1h 17m | Hits:  87%/7344  
      🟩 TestCPU            Pass: 100%/3   | Total: 54m 35s | Avg: 18m 11s | Max: 38m 32s | Hits: 365%/1836  
      🟩 TestGPU            Pass: 100%/3   | Total: 42m 36s | Avg: 14m 12s | Max: 14m 40s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  5h 20m | Avg: 22m 52s | Max:  1h 17m | Hits:  90%/5508  
      🟩 20                 Pass: 100%/21  | Total:  5h 00m | Avg: 14m 19s | Max:  1h 14m | Hits: 222%/3672  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 2h 11m | Avg: 6m 35s | Max: 22m 40s | Hits: 81%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  2h 01m | Avg:  7m 34s | Max: 22m 40s | Hits:  81%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 35s | Avg:  2m 38s | Max:  2m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 37s | Avg: 12m 37s | Max: 12m 37s | Hits:  81%/261   
      🟩 12.5               Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max:  8m 55s
      🟩 12.6               Pass: 100%/17  | Total:  1h 41m | Avg:  5m 58s | Max: 22m 40s | Hits:  81%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 37s | Avg: 12m 37s | Max: 12m 37s | Hits:  81%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max:  8m 55s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 41m | Avg:  5m 58s | Max: 22m 40s | Hits:  81%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  2h 11m | Avg:  6m 35s | Max: 22m 40s | Hits:  81%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 36s | Avg:  3m 36s | Max:  3m 36s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 Clang18            Pass: 100%/4   | Total: 31m 13s | Avg:  7m 48s | Max: 22m 40s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 10s | Avg:  3m 10s | Max:  3m 10s
      🟩 GCC12              Pass: 100%/2   | Total: 26m 24s | Avg: 13m 12s | Max: 22m 29s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 45s | Avg:  2m 41s | Max:  2m 54s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 37s | Avg: 12m 37s | Max: 12m 37s | Hits:  81%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 12s | Avg: 13m 12s | Max: 13m 12s | Hits:  81%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max:  8m 55s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 44m 47s | Avg:  5m 35s | Max: 22m 40s
      🟩 GCC                Pass: 100%/8   | Total: 43m 32s | Avg:  5m 26s | Max: 22m 29s
      🟩 MSVC               Pass: 100%/2   | Total: 25m 49s | Avg: 12m 54s | Max: 13m 12s | Hits:  81%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max:  8m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  2h 11m | Avg:  6m 35s | Max: 22m 40s | Hits:  81%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 26m | Avg:  4m 48s | Max: 13m 12s | Hits:  81%/522   
      🟩 Test               Pass: 100%/2   | Total: 45m 09s | Avg: 22m 34s | Max: 22m 40s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 40s | Avg:  2m 40s | Max:  2m 40s
      🟩 90a                Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 35s | Avg:  4m 08s | Max:  8m 41s
      🟩 20                 Pass: 100%/16  | Total:  1h 55m | Avg:  7m 11s | Max: 22m 40s | Hits:  81%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 17m 51s | Avg: 8m 55s | Max: 15m 52s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 52s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
      🟩 Test               Pass: 100%/1   | Total: 15m 52s | Avg: 15m 52s | Max: 15m 52s
    
  • 🟩 python: Pass: 100%/1 | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 44m 46s | Avg: 44m 46s | Max: 44m 46s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 135)

# Runner
92 linux-amd64-cpu16
17 linux-amd64-gpu-v100-latest-1
15 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@fbusato fbusato merged commit da9f6e3 into NVIDIA:main Jan 21, 2025
145 of 148 checks passed
@davebayer
Contributor

I've found a small problem: the `<cuda_fp8.h>` header includes the `<cuda_fp16.h>` and `<cuda_bf16.h>` headers. Shouldn't we explicitly handle the case when `CCCL_DISABLE_FP16_SUPPORT` or `CCCL_DISABLE_BF16_SUPPORT` is defined?

We require NVFP16 to enable NVBF16, too.

@miscco
Contributor

miscco commented Jan 22, 2025

> We require NVFP16 to enable NVBF16, too.

I am addressing this in #3470

bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request Jan 22, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 22, 2025
davebayer added a commit to davebayer/cccl that referenced this pull request Jan 22, 2025
update docs

update docs

add `memcmp`, `memmove` and `memchr` implementations

implement tests

Use cuda::std::min/max in Thrust (NVIDIA#3364)

Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (NVIDIA#3361)

* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`

Cleanup util_arch (NVIDIA#2773)

Deprecate thrust::null_type (NVIDIA#3367)

Deprecate cub::DeviceSpmv (NVIDIA#3320)

Fixes: NVIDIA#896

Improves `DeviceSegmentedSort` test run time for large number of items and segments (NVIDIA#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Compile basic infra test with C++17 (NVIDIA#3377)

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (NVIDIA#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for large number of segments

* adds support for large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on type returned by segment iterator begin/end iterators

* minor style improvements

Exit with error when RAPIDS CI fails. (NVIDIA#3385)

cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Deprecate thrust::async (NVIDIA#3324)

Fixes: NVIDIA#100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)

Fix broken `_CCCL_BUILTIN_ASSUME` macro (NVIDIA#3314)

* add compiler-specific path
* fix device code path
* add _CCCL_ASSUME

Deprecate thrust::numeric_limits (NVIDIA#3366)

Replace `typedef` with `using` in libcu++ (NVIDIA#3368)

Deprecate thrust::optional (NVIDIA#3307)

Fixes: NVIDIA#3306

Upgrade to Catch2 3.8 (NVIDIA#3310)

Fixes: NVIDIA#1724

refactor `<cuda/std/cstdint>` (NVIDIA#3325)

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

Update CODEOWNERS (NVIDIA#3331)

* Update CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Fix sign-compare warning (NVIDIA#3408)

Implement more cmath functions to be usable on host and device (NVIDIA#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Fix assert definition for NVHPC due to constexpr issues (NVIDIA#3418)

NVHPC cannot decide at compile time where the code will run, so _CCCL_ASSERT within a constexpr function breaks it.

Fix this by always using the host definition which should also work on device.

Fixes NVIDIA#3411

Extend CUB reduce benchmarks (NVIDIA#3401)

* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters

Fixes: NVIDIA#3283

Update upload-pages-artifact to v3 (NVIDIA#3423)

* Update upload-pages-artifact to v3

* Empty commit

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Replace and deprecate thrust::cuda_cub::terminate (NVIDIA#3421)

`std::linalg` accessors and `transposed_layout` (NVIDIA#2962)

Add round up/down to multiple (NVIDIA#3234)

[FEA]: Introduce Python module with CCCL headers (NVIDIA#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_aude_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under NVIDIA#3201 (comment))

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* NVIDIA#3201 (comment)

* NVIDIA#3201 (comment)

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* NVIDIA#3201 (comment)

Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d6.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: the other repo took a long time to install into the pre-commit cache, so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (NVIDIA#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a21.

Error message: NVIDIA#3201 (comment)

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd.

* Implement suggestion by @shwina (NVIDIA#3201 (review))

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test

Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (NVIDIA#3434)

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes NVIDIA#3404

Fix CI issues (NVIDIA#3443)

Remove deprecated `cub::min` (NVIDIA#3450)

* Remove deprecated `cub::{min,max}`

* Drop unused `thrust::remove_cvref` file

Fix typo in builtin (NVIDIA#3451)

Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)

uses unsigned offset types in thrust's scan dispatch (NVIDIA#3436)

Default transform_iterator's copy ctor (NVIDIA#3395)

Fixes: NVIDIA#2393

Turn C++ dialect warning into error (NVIDIA#3453)

Uses unsigned offset types in thrust's sort algorithm calling into `DispatchMergeSort` (NVIDIA#3437)

* uses thrust's dynamic dispatch for merge_sort

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Refactor allocator handling of contiguous_storage (NVIDIA#3050)

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Drop thrust::detail::integer_traits (NVIDIA#3391)

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Improve docs of std headers (NVIDIA#3416)

Drop C++11 and C++14 support for all of cccl (NVIDIA#3417)

* Drop C++11 and C++14 support for all of cccl

---------

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

Deprecate a few CUB macros (NVIDIA#3456)

Deprecate thrust universal iterator categories (NVIDIA#3461)

Fix launch args order (NVIDIA#3465)

Add `--extended-lambda` to the list of removed clangd flags (NVIDIA#3432)

add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)

Add `_CCCL_BUILTIN_PREFETCH` (NVIDIA#3433)

Drop universal iterator categories (NVIDIA#3474)

Ensure that headers in `<cuda/*>` can be built with a C++ only compiler (NVIDIA#3472)

Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)

Also ensure that FP8 can actually be enabled, given that it depends on FP16 and BF16 support

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)

* moves emptykernel to detail ns

* second batch

* third batch

* fourth batch

* fixes cuda parallel

* concatenates nested namespaces

Deprecate block/warp algo specializations (NVIDIA#3455)

Fixes: NVIDIA#3409

Refactor CUB's util_debug (NVIDIA#3345)
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 22, 2025
miscco added a commit that referenced this pull request Jan 22, 2025
* add `_CCCL_HAS_NVFP8` macro (#3429)

* Add cuda::is_floating_point supporting half and bfloat (#3379)

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

* Specialize __is_extended_floating_point for FP8 types (#3470)

Also ensure that FP8 can actually be enabled, given that it depends on FP16 and BF16 support

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

---------

Co-authored-by: Federico Busato <50413820+fbusato@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 23, 2025
davebayer added a commit to davebayer/cccl that referenced this pull request Jan 23, 2025
Cleanup util_arch (NVIDIA#2773)

Improves `DeviceSegmentedSort` test run time for large number of items and segments (NVIDIA#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (NVIDIA#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for large number of segments

* adds support for large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on type returned by segment iterator begin/end iterators

* minor style improvements

cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Deprecate thrust::async (NVIDIA#3324)

Fixes: NVIDIA#100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)

Deprecate thrust::numeric_limits (NVIDIA#3366)

Upgrade to Catch2 3.8 (NVIDIA#3310)

Fixes: NVIDIA#1724

Fix sign-compare warning (NVIDIA#3408)

Implement more cmath functions to be usable on host and device (NVIDIA#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes NVIDIA#3404

Remove deprecated `cub::min` (NVIDIA#3450)

* Remove deprecated `cub::{min,max}`

* Drop unused `thrust::remove_cvref` file

Fix typo in builtin (NVIDIA#3451)

Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)

Drop thrust::detail::integer_traits (NVIDIA#3391)

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)

Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)

Also ensure that FP8 can actually be enabled, given that it depends on FP16 and BF16 support

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)

* moves emptykernel to detail ns

* second batch

* third batch

* fourth batch

* fixes cuda parallel

* concatenates nested namespaces

Deprecate block/warp algo specializations (NVIDIA#3455)

Fixes: NVIDIA#3409

fix documentation
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 29, 2025
@fbusato fbusato deleted the fp8-macro branch February 11, 2025 18:24
Labels

None yet

Projects

Archived in project

5 participants