
Specialize numeric_limits for CUDA 12.8 FP types #3832

Merged
miscco merged 6 commits into NVIDIA:main from davebayer:new_fp_limits
Feb 17, 2025

@davebayer
Contributor

@davebayer davebayer commented Feb 17, 2025

This PR specializes cuda::std::numeric_limits for the __nv_fp8_e8m0, __nv_fp6_e2m3, __nv_fp6_e3m2 and __nv_fp4_e2m1 floating-point types introduced in CUDA 12.8.
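As a rough illustration of what such a specialization involves, here is an abbreviated sketch in plain standard C++. It is not the code from this PR: `fp4_e2m1_sketch` and `to_float_sketch` are hypothetical stand-ins for the CUDA type and its conversion, `std::numeric_limits` is used instead of `cuda::std::numeric_limits`, and the bit layout (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1, no infinity or NaN) is an assumption about the E2M1 encoding. A conforming specialization would define every member; only a subset is shown.

```cpp
#include <cassert>  // for the runtime checks mentioned in the usage note
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a 4-bit E2M1 storage type (low 4 bits used).
struct fp4_e2m1_sketch {
  std::uint8_t bits;
};

// Decode the assumed layout: bit 3 = sign, bits 2..1 = exponent, bit 0 = mantissa.
constexpr float to_float_sketch(fp4_e2m1_sketch v) {
  const int sign = (v.bits >> 3) & 1;
  const int exp  = (v.bits >> 1) & 3;
  const int man  = v.bits & 1;
  // exp == 0 encodes zero or the single subnormal value 0.5.
  const float mag = (exp == 0)
    ? 0.5f * static_cast<float>(man)
    : (1.0f + 0.5f * static_cast<float>(man)) * static_cast<float>(1 << (exp - 1));
  return sign ? -mag : mag;
}

namespace std {
// Abbreviated specialization: only a subset of the members is shown.
template <>
class numeric_limits<fp4_e2m1_sketch> {
public:
  static constexpr bool is_specialized    = true;
  static constexpr bool is_signed         = true;
  static constexpr bool is_integer        = false;
  static constexpr bool is_exact          = false;
  static constexpr int radix              = 2;
  static constexpr int digits             = 2;  // implicit bit + 1 mantissa bit
  static constexpr bool has_infinity      = false;
  static constexpr bool has_quiet_NaN     = false;
  static constexpr bool has_signaling_NaN = false;
  static constexpr bool is_iec559         = false;
  static constexpr bool is_bounded        = true;
  static constexpr bool traps             = false;
  static constexpr bool tinyness_before   = false;
  // Illustrative choice only; the right value is a judgment call.
  static constexpr float_round_style round_style = round_indeterminate;

  static constexpr fp4_e2m1_sketch min() noexcept { return fp4_e2m1_sketch{0x2}; }         // 1.0
  static constexpr fp4_e2m1_sketch max() noexcept { return fp4_e2m1_sketch{0x7}; }         // 6.0
  static constexpr fp4_e2m1_sketch lowest() noexcept { return fp4_e2m1_sketch{0xF}; }      // -6.0
  static constexpr fp4_e2m1_sketch denorm_min() noexcept { return fp4_e2m1_sketch{0x1}; }  // 0.5
};
} // namespace std
```

Generic code can then query the type like any other, for example by checking that `to_float_sketch(std::numeric_limits<fp4_e2m1_sketch>::max())` equals `6.0f` under the assumed encoding.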

Partially implements #3558.

@davebayer davebayer requested a review from a team as a code owner February 17, 2025 10:03
@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


static constexpr bool traps = false;
static constexpr bool tinyness_before = false;
static constexpr float_round_style round_style = round_toward_zero;
Contributor Author

I am not sure what the right round style is here. The conversion functions from, e.g., float to __nv_fp8_e8m0 allow only rounding toward zero or toward positive infinity.
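To make the two rounding directions concrete, here is a minimal sketch of encoding a positive, finite float into an exponent-only 8-bit value. It does not call the CUDA conversion functions: `round_mode_sketch`, `encode_e8m0_sketch` and `decode_e8m0_sketch` are hypothetical names, and the encoding (value = 2^(bits - 127), no sign bit, no mantissa bits) is an assumption about the E8M0 format. Zero, negative, infinite and NaN inputs are deliberately not handled.

```cpp
#include <algorithm>
#include <cassert>  // for the runtime checks mentioned in the usage note
#include <cmath>
#include <cstdint>

// Hypothetical rounding selector; the real API uses CUDA's own rounding enum.
enum class round_mode_sketch { toward_zero, toward_pos_inf };

// Encode a positive, finite float as a biased power-of-two exponent (assumed bias: 127).
inline std::uint8_t encode_e8m0_sketch(float x, round_mode_sketch mode) {
  // ilogb returns floor(log2(x)) for positive finite x, i.e. the round-toward-zero exponent.
  int e = std::ilogb(x);
  // Rounding toward +infinity bumps the exponent unless x is already an exact power of two.
  if (mode == round_mode_sketch::toward_pos_inf && std::ldexp(1.0f, e) < x) {
    ++e;
  }
  // Saturate to the range representable with the assumed bias.
  e = std::max(-127, std::min(127, e));
  return static_cast<std::uint8_t>(e + 127);
}

// Decode back to float: 2^(bits - 127).
inline float decode_e8m0_sketch(std::uint8_t bits) {
  return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}
```

With these definitions, 5.0f encodes to the value 4.0f when rounding toward zero and to 8.0f when rounding toward positive infinity, while an exact power of two such as 4.0f encodes identically in both modes.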

Contributor Author

Maybe all of the fp8, fp6 and fp4 types should be round_indeterminate, because they don't implement any arithmetic operations and the wmma instructions define the rounding to be unspecified.

Contributor

I am fine with that, but then the question is what happens if we implement parts of the machinery through conversions to floating point?

Contributor Author

The constructors from standard floating-point types use cudaRoundZero.
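For readers unfamiliar with the term, round-toward-zero on a very small value set can be sketched as follows. This is an illustration only and does not reproduce the CUDA constructors: `kE2M1MagnitudesSketch` and `quantize_toward_zero_sketch` are hypothetical names, the list of magnitudes is an assumption about the E2M1 value set, and saturating out-of-range inputs to the largest magnitude is a choice made for this sketch. NaN inputs are not handled.

```cpp
#include <array>
#include <cassert>  // for the runtime checks mentioned in the usage note
#include <cmath>

// Assumed non-negative magnitudes representable in a 4-bit E2M1 format.
constexpr std::array<float, 8> kE2M1MagnitudesSketch = {
  0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Round toward zero: keep the largest representable magnitude not exceeding |x|.
inline float quantize_toward_zero_sketch(float x) {
  const float mag = std::fabs(x);
  float best = 0.0f;
  for (float v : kE2M1MagnitudesSketch) {
    if (v <= mag) {
      best = v;  // table is sorted ascending, so the last match is the largest
    }
  }
  // Restore the sign of the input; values beyond 6.0 saturate to 6.0.
  return std::copysign(best, x);
}
```

Under these assumptions, 2.9f becomes 2.0f and -2.9f becomes -2.0f, so the result never has a larger magnitude than the input.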

Contributor

@miscco miscco left a comment

Looks good, thanks for figuring out all the small issues.

@miscco
Contributor

miscco commented Feb 17, 2025

/ok to test

@miscco
Contributor

miscco commented Feb 17, 2025

/ok to test

@miscco
Contributor

miscco commented Feb 17, 2025

/ok to test

@miscco
Contributor

miscco commented Feb 17, 2025

/ok to test

@github-actions
Contributor

🟩 CI finished in 1h 38m: Pass: 100%/158 | Total: 2d 22h | Avg: 26m 49s | Max: 1h 25m | Hits: 60%/248346
  • 🟩 cub: Pass: 100%/45 | Total: 1d 10h | Avg: 45m 46s | Max: 1h 25m | Hits: 70%/53761

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 08h | Avg: 45m 35s | Max:  1h 25m | Hits:  70%/51319 
      🟩 arm64              Pass: 100%/2   | Total:  1h 39m | Avg: 49m 46s | Max: 52m 07s | Hits:  75%/2442  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 53m | Avg: 22m 43s | Max:  1h 01m | Hits:  84%/5939  
      🟩 12.5               Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  23%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 06h | Avg: 47m 31s | Max:  1h 25m | Hits:  70%/45562 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 53m | Avg: 56m 52s | Max: 57m 11s | Hits:  82%/2114  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 53m | Avg: 22m 43s | Max:  1h 01m | Hits:  84%/5939  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  23%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 04h | Avg: 47m 00s | Max:  1h 25m | Hits:  70%/43448 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 52s | Max: 57m 11s | Hits:  82%/2114  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 08h | Avg: 45m 15s | Max:  1h 25m | Hits:  69%/51647 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 55m | Avg: 28m 54s | Max: 54m 28s | Hits:  87%/4892  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 37m | Avg: 48m 41s | Max: 49m 29s | Hits:  75%/2442  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 37m | Avg: 48m 46s | Max: 50m 50s | Hits:  75%/2442  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 46m | Avg: 53m 22s | Max: 53m 39s | Hits:  75%/2442  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 03m | Avg: 43m 22s | Max: 57m 11s | Hits:  84%/8219  
      🟩 GCC7               Pass: 100%/2   | Total: 58m 22s | Avg: 29m 11s | Max: 52m 56s | Hits:  81%/2446  
      🟩 GCC8               Pass: 100%/1   | Total: 53m 00s | Avg: 53m 00s | Max: 53m 00s | Hits:  58%/1223  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 20s | Max: 55m 15s | Hits:  81%/2446  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 47m | Avg: 53m 52s | Max: 53m 58s | Hits:  61%/2446  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 49m | Avg: 54m 39s | Max: 56m 03s | Hits:  56%/2442  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 46s | Max: 58m 18s | Hits:  54%/2442  
      🟩 GCC13              Pass: 100%/11  | Total:  6h 11m | Avg: 33m 45s | Max:  1h 01m | Hits:  82%/13431 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 09m | Hits:  14%/2094  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 25m | Hits:  12%/2094  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  23%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 12h 00m | Avg: 42m 24s | Max: 57m 11s | Hits:  81%/20437 
      🟩 GCC                Pass: 100%/22  | Total: 15h 05m | Avg: 41m 10s | Max:  1h 01m | Hits:  74%/26876 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 52m | Avg:  1h 13m | Max:  1h 25m | Hits:  13%/4188  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 10m | Hits:  23%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 07m | Avg: 22m 39s | Max: 26m 50s | Hits:  93%/3663  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 05h | Avg: 51m 45s | Max:  1h 25m | Hits:  63%/40330 
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 52m | Avg: 29m 01s | Max: 55m 40s | Hits:  90%/9768  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 07h | Avg: 50m 54s | Max:  1h 25m | Hits:  63%/43993 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 54s | Avg: 20m 54s | Max: 20m 54s | Hits:  99%/1221  
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 39s | Avg: 16m 39s | Max: 16m 39s | Hits:  99%/1221  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 52s | Max: 26m 50s | Hits:  99%/3663  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 27s | Max: 22m 50s | Hits:  99%/3663  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 07m | Avg: 22m 39s | Max: 26m 50s | Hits:  93%/3663  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  49%/1221  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 16h 47m | Avg: 50m 21s | Max:  1h 15m | Hits:  63%/23659 
      🟩 20                 Pass: 100%/25  | Total: 17h 32m | Avg: 42m 06s | Max:  1h 25m | Hits:  75%/30102 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 17h 01m | Avg: 22m 41s | Max: 1h 03m | Hits: 76%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 32m 25s | Avg: 16m 12s | Max: 21m 19s | Hits:  83%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 16h 25m | Avg: 22m 54s | Max:  1h 03m | Hits:  76%/76573 
      🟩 arm64              Pass: 100%/2   | Total: 36m 05s | Avg: 18m 02s | Max: 20m 34s | Hits:  77%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 07m | Avg: 13m 28s | Max: 47m 25s | Hits:  84%/8901  
      🟩 12.5               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  30%/3562  
      🟩 12.8               Pass: 100%/38  | Total: 13h 51m | Avg: 21m 53s | Max:  1h 03m | Hits:  78%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 27m 41s | Avg: 13m 50s | Max: 14m 30s | Hits:  90%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 07m | Avg: 13m 28s | Max: 47m 25s | Hits:  84%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  30%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 13h 24m | Avg: 22m 20s | Max:  1h 03m | Hits:  77%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 27m 41s | Avg: 13m 50s | Max: 14m 30s | Hits:  90%/3562  
      🟩 nvcc               Pass: 100%/43  | Total: 16h 33m | Avg: 23m 06s | Max:  1h 03m | Hits:  76%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 48m 27s | Avg: 12m 06s | Max: 19m 59s | Hits:  94%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 36m 30s | Avg: 18m 15s | Max: 19m 36s | Hits:  89%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 35m 51s | Avg: 17m 55s | Max: 18m 12s | Hits:  89%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 40m 23s | Avg: 20m 11s | Max: 21m 01s | Hits:  88%/3562  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 41m | Avg: 14m 28s | Max: 20m 24s | Hits:  92%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 23m 23s | Avg: 11m 41s | Max: 18m 24s | Hits:  94%/3564  
      🟩 GCC8               Pass: 100%/1   | Total: 22m 21s | Avg: 22m 21s | Max: 22m 21s | Hits:  65%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 23m 55s | Avg: 11m 57s | Max: 18m 29s | Hits:  94%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 46m 48s | Avg: 23m 24s | Max: 23m 41s | Hits:  66%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 48m 57s | Avg: 24m 28s | Max: 26m 05s | Hits:  65%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 54m 25s | Avg: 27m 12s | Max: 28m 35s | Hits:  65%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  2h 49m | Avg: 16m 55s | Max: 27m 00s | Hits:  84%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 22s | Max: 49m 20s | Hits:  36%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 30m | Avg: 50m 17s | Max:  1h 03m | Hits:  26%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  30%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  4h 22m | Avg: 15m 26s | Max: 21m 01s | Hits:  91%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  6h 29m | Avg: 18m 31s | Max: 28m 35s | Hits:  80%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 07m | Avg: 49m 31s | Max:  1h 03m | Hits:  30%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  30%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 24m 14s | Avg: 12m 07s | Max: 12m 11s | Hits:  86%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total: 13h 12m | Avg: 24m 00s | Max:  1h 03m | Hits:  74%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 24m | Avg: 20m 27s | Max:  1h 03m | Hits:  81%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 15h 31m | Avg: 24m 31s | Max:  1h 03m | Hits:  73%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 44m 49s | Avg: 14m 56s | Max: 29m 06s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 41s | Avg: 11m 10s | Max: 12m 03s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 24m 14s | Avg: 12m 07s | Max: 12m 11s | Hits:  86%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total: 23m 37s | Avg: 23m 37s | Max: 23m 37s | Hits:  87%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  8h 16m | Avg: 24m 49s | Max: 58m 46s | Hits:  72%/35611 
      🟩 20                 Pass: 100%/23  | Total:  8h 12m | Avg: 21m 24s | Max:  1h 03m | Hits:  80%/40961 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 16h 29m | Avg: 23m 00s | Max: 47m 01s | Hits: 39%/102909

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 15h 57m | Avg: 23m 21s | Max: 47m 01s | Hits:  38%/97264 
      🟩 arm64              Pass: 100%/2   | Total: 31m 20s | Avg: 15m 40s | Max: 21m 08s | Hits:  56%/5645  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 27m | Avg: 17m 24s | Max: 32m 17s | Hits:  59%/13652 
      🟩 12.5               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max: 35m 12s | Hits:  28%/5590  
      🟩 12.8               Pass: 100%/36  | Total: 13h 54m | Avg: 23m 10s | Max: 47m 01s | Hits:  36%/83667 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 42m 49s | Avg: 21m 24s | Max: 21m 30s | Hits:  26%/5610  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 27m | Avg: 17m 24s | Max: 32m 17s | Hits:  59%/13652 
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max: 35m 12s | Hits:  28%/5590  
      🟩 nvcc12.8           Pass: 100%/34  | Total: 13h 11m | Avg: 23m 16s | Max: 47m 01s | Hits:  37%/78057 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 49s | Avg: 21m 24s | Max: 21m 30s | Hits:  26%/5610  
      🟩 nvcc               Pass: 100%/41  | Total: 15h 46m | Avg: 23m 05s | Max: 47m 01s | Hits:  40%/97299 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 17m | Avg: 19m 20s | Max: 24m 02s | Hits:  42%/11184 
      🟩 Clang15            Pass: 100%/2   | Total: 26m 23s | Avg: 13m 11s | Max: 13m 59s | Hits:  63%/5602  
      🟩 Clang16            Pass: 100%/2   | Total: 48m 13s | Avg: 24m 06s | Max: 25m 09s | Hits:  32%/5602  
      🟩 Clang17            Pass: 100%/2   | Total: 47m 12s | Avg: 23m 36s | Max: 25m 40s | Hits:  32%/5602  
      🟩 Clang18            Pass: 100%/6   | Total:  2h 36m | Avg: 26m 07s | Max: 47m 01s | Hits:  30%/14034 
      🟩 GCC7               Pass: 100%/2   | Total: 26m 00s | Avg: 13m 00s | Max: 20m 31s | Hits:  65%/5540  
      🟩 GCC8               Pass: 100%/1   | Total: 20m 47s | Avg: 20m 47s | Max: 20m 47s | Hits:  32%/2780  
      🟩 GCC9               Pass: 100%/2   | Total: 43m 09s | Avg: 21m 34s | Max: 24m 37s | Hits:  39%/5552  
      🟩 GCC10              Pass: 100%/2   | Total: 46m 36s | Avg: 23m 18s | Max: 23m 31s | Hits:  32%/5608  
      🟩 GCC11              Pass: 100%/2   | Total: 47m 19s | Avg: 23m 39s | Max: 24m 46s | Hits:  32%/5604  
      🟩 GCC12              Pass: 100%/2   | Total: 48m 44s | Avg: 24m 22s | Max: 26m 41s | Hits:  32%/5604  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 09m | Avg: 18m 55s | Max: 45m 53s | Hits:  52%/14291 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 06m | Avg: 33m 20s | Max: 34m 24s | Hits:  35%/5078  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 17m | Avg: 38m 31s | Max: 42m 04s | Hits:  22%/5238  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max: 35m 12s | Hits:  28%/5590  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  5h 55m | Avg: 22m 14s | Max: 47m 01s | Hits:  38%/42024 
      🟩 GCC                Pass: 100%/21  | Total:  7h 01m | Avg: 20m 05s | Max: 45m 53s | Hits:  43%/44979 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 23m | Avg: 35m 55s | Max: 42m 04s | Hits:  28%/10316 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max: 35m 12s | Hits:  28%/5590  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 20m 21s | Avg: 10m 10s | Max: 13m 15s | Hits:  84%/2912  
      🟩 rtx2080            Pass: 100%/41  | Total: 16h 08m | Avg: 23m 37s | Max: 47m 01s | Hits:  38%/99997 
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 14h 10m | Avg: 22m 58s | Max: 42m 04s | Hits:  39%/102869
      🟩 NVRTC              Pass: 100%/2   | Total: 30m 34s | Avg: 15m 17s | Max: 15m 29s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total:  1h 46m | Avg: 35m 23s | Max: 47m 01s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 30m 34s | Avg: 15m 17s | Max: 15m 29s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 20m 21s | Avg: 10m 10s | Max: 13m 15s | Hits:  84%/2912  
      🟩 90;90a;100         Pass: 100%/1   | Total: 32m 58s | Avg: 32m 58s | Max: 32m 58s | Hits:  31%/2912  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  7h 45m | Avg: 22m 09s | Max: 34m 58s | Hits:  39%/54868 
      🟩 20                 Pass: 100%/21  | Total:  8h 41m | Avg: 24m 51s | Max: 47m 01s | Hits:  39%/48041 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 04m | Avg: 5m 38s | Max: 14m 14s | Hits: 95%/11244

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 52m | Avg:  6m 16s | Max: 14m 14s | Hits:  94%/9020  
      🟩 arm64              Pass: 100%/4   | Total: 11m 13s | Avg:  2m 48s | Max:  2m 54s | Hits:  99%/2224  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s | Hits:  61%/262   
      🟩 12.5               Pass: 100%/2   | Total: 11m 14s | Avg:  5m 37s | Max:  5m 50s | Hits:  96%/708   
      🟩 12.8               Pass: 100%/19  | Total:  1h 44m | Avg:  5m 28s | Max: 14m 14s | Hits:  95%/10274 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s | Hits:  61%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 14s | Avg:  5m 37s | Max:  5m 50s | Hits:  96%/708   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 44m | Avg:  5m 28s | Max: 14m 14s | Hits:  95%/10274 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 04m | Avg:  5m 38s | Max: 14m 14s | Hits:  95%/11244 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s | Hits: 100%/558   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 39s | Avg:  4m 39s | Max:  4m 39s | Hits:  88%/556   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s | Hits:  99%/556   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 43s | Avg:  4m 43s | Max:  4m 43s | Hits:  88%/556   
      🟩 Clang18            Pass: 100%/4   | Total: 21m 04s | Avg:  5m 16s | Max: 11m 23s | Hits:  99%/2224  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s | Hits:  99%/558   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 52s | Avg:  4m 52s | Max:  4m 52s | Hits:  88%/556   
      🟩 GCC12              Pass: 100%/2   | Total: 17m 58s | Avg:  8m 59s | Max: 13m 17s | Hits:  93%/1112  
      🟩 GCC13              Pass: 100%/6   | Total: 29m 08s | Avg:  4m 51s | Max: 14m 14s | Hits:  99%/3336  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s | Hits:  61%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 10m 43s | Avg: 10m 43s | Max: 10m 43s | Hits:  48%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 14s | Avg:  5m 37s | Max:  5m 50s | Hits:  96%/708   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 37m 39s | Avg:  4m 42s | Max: 11m 23s | Hits:  97%/4450  
      🟩 GCC                Pass: 100%/10  | Total: 55m 42s | Avg:  5m 34s | Max: 14m 14s | Hits:  97%/5562  
      🟩 MSVC               Pass: 100%/2   | Total: 19m 36s | Avg:  9m 48s | Max: 10m 43s | Hits:  54%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 14s | Avg:  5m 37s | Max:  5m 50s | Hits:  96%/708   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 11s | Avg:  8m 35s | Max: 14m 14s | Hits:  99%/1112  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 47m | Avg:  5m 21s | Max: 13m 17s | Hits:  94%/10132 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 25m | Avg:  4m 29s | Max: 10m 43s | Hits:  94%/9576  
      🟩 Test               Pass: 100%/3   | Total: 38m 54s | Avg: 12m 58s | Max: 14m 14s | Hits:  99%/1668  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 10s | Avg:  6m 43s | Max: 14m 14s | Hits:  99%/1668  
      🟩 90a                Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s | Hits:  99%/556   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 21s | Avg:  3m 35s | Max:  5m 50s | Hits:  99%/2022  
      🟩 20                 Pass: 100%/18  | Total:  1h 49m | Avg:  6m 06s | Max: 14m 14s | Hits:  94%/9222  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 59s | Avg: 6m 29s | Max: 10m 43s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 59s | Avg:  6m 29s | Max: 10m 43s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 43s | Avg: 10m 43s | Max: 10m 43s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 02s | Avg: 30m 02s | Max: 30m 02s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@miscco miscco merged commit 964efd9 into NVIDIA:main Feb 17, 2025
174 of 176 checks passed
@miscco
Contributor

miscco commented Feb 17, 2025

Thanks a lot for improving support for those new floating point types 🎉

davebayer added a commit to davebayer/cccl that referenced this pull request Feb 20, 2025
* implement limits for new fp types

* modularize `numeric_limits`
davebayer added a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
* implement limits for new fp types

* modularize `numeric_limits`
@miscco miscco deleted the new_fp_limits branch July 14, 2025 16:33