Use libcu++ limits/trait in tests/benchmarks#3822
Use libcu++ limits/trait in tests/benchmarks#3822bernhardmgruber merged 5 commits intoNVIDIA:mainfrom
Conversation
b39a3ff to
5c8f54e
Compare
5c8f54e to
96ed90b
Compare
🟨 CI finished in 1h 27m: Pass: 97%/93 | Total: 1d 20h | Avg: 28m 27s | Max: 1h 11m | Hits: 80%/132111
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| CCCL C Parallel Library | |
| +/- | Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
96ed90b to
4d90b5d
Compare
c2h/include/c2h/custom_type.h
Outdated
| _LIBCUDACXX_BEGIN_NAMESPACE_STD | ||
| template <template <typename> class... Policies> | ||
| class numeric_limits<c2h::custom_type_t<Policies...>> | ||
| class __numeric_limits_impl<c2h::custom_type_t<Policies...>, __numeric_limits_type::__other> |
There was a problem hiding this comment.
Shouldnt this also work if we just specialized numeric_limits
There was a problem hiding this comment.
It doesn't and I need your guidance here. The problem is that CUB sometimes queries numeric_limits<const T> and that does not find a specialization if I only specialize:
class numeric_limits<c2h::custom_type_t<Policies...>> { ... };
Specializing __numeric_limits_impl works, because numeric_limits strips CV qualifiers and passes the type to __numeric_limits_impl. What's the best practice? Should be specialize numeric_limits multiple times for T, const T, and maybe also for referneces of those? Or should we try harder inside CUB to call like numeric_limits<decay_t<T>>?
There was a problem hiding this comment.
That is a bug in our implementation then: https://eel.is/c++draft/numeric.limits#general-5
The value of each member of a specialization of numeric_limits on a cv-qualified type cv T shall be equal to the value of the corresponding member of the specialization on the unqualified type T.
| // replace with | ||
| // std::numeric_limits<T>::max() when | ||
| // C++11 support is more prevalent | ||
| // TODO(bgruber): replace with ::cuda::std::numeric_limits<T>::max() (breaking change) |
There was a problem hiding this comment.
I thought we were breaking things?
There was a problem hiding this comment.
I wanted to separate non-breaking from breaking changes, so we can more easily roll if needed, and have a smaller diff to inspect with more attention.
miscco
left a comment
There was a problem hiding this comment.
I went through it, no change requested.
That said I found a few places where we are unconsistent about fully qualifying cuda::std
Feel free to ignore
ab531ba to
bab6e89
Compare
🟨 CI finished in 1h 25m: Pass: 97%/93 | Total: 1d 22h | Avg: 30m 11s | Max: 1h 13m | Hits: 86%/131751
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| CCCL C Parallel Library | |
| +/- | Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
|
I specialized |
ok, this is a fun one. The test is broken on |
|
I reported the bug here: #3835. And will disable the test now in this PR. |
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
bab6e89 to
fb4cf88
Compare
| // FIXME(bgruber): uchar3 fails the test, see #3835 | ||
| using vec_types = c2h::type_list<ulonglong4, /*uchar3,*/ short2>; |
There was a problem hiding this comment.
I am disabling the failing test here. We should update #3835 once this PR is merged.
🟩 CI finished in 1h 27m: Pass: 100%/93 | Total: 1d 23h | Avg: 30m 41s | Max: 1h 14m | Hits: 86%/134193
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| CCCL C Parallel Library | |
| +/- | Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
This pulls out the non-breaking part of #3384, which:
This PR does NOT replace any use of CUB traits in the CUB/Thrust headers, because that would break any user opting in with a custom type by specializing CUB traits.