replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE #2439
ericniebler merged 5 commits into NVIDIA:main
🟨 CI finished in 1h 19m: Pass: 99%/368 | Total: 2d 03h | Avg: 8m 22s | Max: 49m 36s | Hits: 74%/25647
Modifications in project?
| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?
| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 368)
| # | Runner |
|---|---|
| 297 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
🟩 CI finished in 2h 32m: Pass: 100%/368 | Total: 2d 03h | Avg: 8m 24s | Max: 49m 36s | Hits: 74%/25647

      template <class Vector>
    - void TestTransformInputOutputIterator()
    + THRUST_DISABLE_BROKEN_GCC_VECTORIZER void TestTransformInputOutputIterator()

This fixes our tests, but won't gcc still be miscompiling Thrust for users?

That is nothing we can change. I want to note that this is exceptionally fickle and dependent on exact sizes and optimization settings, so I don't see anything we can do there.
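
As a rough illustration of the kind of per-function workaround being discussed, the sketch below turns off GCC's tree vectorizer for a single function with the `optimize` function attribute. The macro name `MY_DISABLE_GCC_VECTORIZER` and the function `sum_squares` are hypothetical stand-ins; this is not claimed to be how `THRUST_DISABLE_BROKEN_GCC_VECTORIZER` is actually defined.

```cpp
#include <cstddef>

// Hypothetical stand-in macro (an assumption, not the Thrust definition).
// On GCC it asks the compiler not to auto-vectorize the annotated function;
// on other compilers it expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
#  define MY_DISABLE_GCC_VECTORIZER __attribute__((optimize("no-tree-vectorize")))
#else
#  define MY_DISABLE_GCC_VECTORIZER
#endif

// Only this function is affected; code elsewhere keeps the default
// optimization settings of the translation unit.
MY_DISABLE_GCC_VECTORIZER int sum_squares(const int* data, std::size_t n)
{
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    total += data[i] * data[i];
  }
  return total;
}
```

Because the attribute is attached to individual functions, annotating a test in this way says nothing about user code that hits the same optimizer behavior, which is the concern raised above.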

      _CCCL_EXEC_CHECK_DISABLE
      template <typename... Ts>
    - _CCCL_FORCEINLINE _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
    + inline _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const

@miscco at least locally, this change avoids the gcc optimizer issue.

That is awesome. I will test whether we can avoid that hack in the other tests too. Will file a separate PR though

looks like this is still failing with gcc 12 😿
https://github.com/NVIDIA/cccl/actions/runs/10967082905/job/30507674109?pr=2439

it fixed almost all of the failures tho. just two remain. i can track those down when i get back from vacation.
/ok to test
🟨 CI finished in 2h 00m: Pass: 99%/368 | Total: 7d 00h | Avg: 27m 29s | Max: 1h 25m | Hits: 54%/25647
🟨 CI finished in 2d 08h: Pass: 99%/368 | Total: 7d 00h | Avg: 27m 25s | Max: 1h 25m | Hits: 54%/25647
/ok to test
🟩 CI finished in 1h 49m: Pass: 100%/370 | Total: 7d 15h | Avg: 29m 44s | Max: 1h 14m | Hits: 9%/25696
🏃 Runner counts (total jobs: 370)
| # | Runner |
|---|---|
| 297 | linux-amd64-cpu16 |
| 30 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
Description
cccl has `_CCCL_FORCEINLINE` and `_CCCL_ALWAYS_INLINE`. there should be only one. also, `_CCCL_FORCEINLINE` currently expands to `inline` when not using a CUDA compiler. that is unexpected. it should expand to either `__attribute__((always_inline))` or `__forceinline` depending on which is supported by the host compiler.

closes #2438
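
The expansion described above could look roughly like the following sketch. `MY_FORCEINLINE` and `add_one` are hypothetical names chosen for illustration, and the compiler checks are an assumption; the actual `_CCCL_FORCEINLINE` definition in cccl may use different detection macros and additional branches.

```cpp
// Hypothetical force-inline macro (an assumption, not the cccl definition).
// It picks the spelling the host compiler understands instead of falling
// back to plain `inline`.
#if defined(_MSC_VER)
// MSVC (and compilers emulating it) use the __forceinline keyword.
#  define MY_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC-compatible compilers use the always_inline attribute.
#  define MY_FORCEINLINE __inline__ __attribute__((always_inline))
#else
// Unknown compiler: plain inline is the only portable option left.
#  define MY_FORCEINLINE inline
#endif

// Example use: the macro replaces `inline` in the declaration.
MY_FORCEINLINE int add_one(int x)
{
  return x + 1;
}
```

Expanding directly to the compiler's own spelling keeps the macro's meaning the same whether or not a CUDA compiler is in use.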

This PR moves the definition of `_CCCL_FORCEINLINE` from `execution_space.h` to `visibility.h`. it also changes the definition to expand directly to either `__inline__ __attribute__((always_inline))` or `__forceinline` rather than indirectly through the `__forceinline__` macro defined in `host_defines.h`.

Checklist