Specialize numeric_limits for CUDA 12.8 FP types (#3832)
Conversation
libcudacxx/include/cuda/std/limits (outdated diff excerpt)

    static constexpr bool traps = false;
    static constexpr bool tinyness_before = false;
    static constexpr float_round_style round_style = round_toward_zero;
I am not sure what the right round style is here. The conversion functions from, e.g., float to __nv_fp8_e8m0 allow only rounding towards zero or towards positive infinity.
Maybe all of the fp8, fp6, and fp4 types should use round_indeterminate, because they don't implement any arithmetic operations and the wmma instructions define the rounding as unspecified.
I am fine with that, but then the question is what happens if we implement parts of the machinery through conversions to standard floating point types?
The constructors from standard floating point types use the cudaRoundZero rounding mode.
...st/libcudacxx/std/language.support/support.limits/limits/numeric.limits.members/min.pass.cpp
miscco left a comment:
Looks good, thanks for figuring out all the small issues.
/ok to test
🟩 CI finished in 1h 38m: Pass: 100%/158 | Total: 2d 22h | Avg: 26m 49s | Max: 1h 25m | Hits: 60%/248346
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 158)
| # | Runner |
|---|---|
| 111 | linux-amd64-cpu16 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 8 | linux-amd64-gpu-rtx2080-latest-1 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 5 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
Thanks a lot for improving support for those new floating point types 🎉
* implement limits for new fp types
* modularize `numeric_limits`
This PR specializes `cuda::std::numeric_limits` for the `__nv_fp8_e8m0`, `__nv_fp6_e2m3`, `__nv_fp6_e3m2`, and `__nv_fp4_e2m1` floating point types introduced in CUDA 12.8.

Partially implements #3558.