update docs
update docs
add `memcmp`, `memmove` and `memchr` implementations
implement tests
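A minimal host-side sketch of the `memcmp`/`memchr` semantics being added; the names here are illustrative stand-ins, not the actual `cuda::std` implementations:

```cpp
#include <cstddef>

// Illustrative byte-wise comparison, mirroring C's memcmp contract:
// returns <0, 0, or >0 for the first differing byte.
inline int my_memcmp(const void* lhs, const void* rhs, std::size_t count)
{
  auto* l = static_cast<const unsigned char*>(lhs);
  auto* r = static_cast<const unsigned char*>(rhs);
  for (std::size_t i = 0; i < count; ++i)
  {
    if (l[i] != r[i])
    {
      return l[i] < r[i] ? -1 : 1;
    }
  }
  return 0;
}

// Illustrative memchr: finds the first occurrence of a byte, or nullptr.
inline const void* my_memchr(const void* ptr, int ch, std::size_t count)
{
  auto* p = static_cast<const unsigned char*>(ptr);
  for (std::size_t i = 0; i < count; ++i)
  {
    if (p[i] == static_cast<unsigned char>(ch))
    {
      return p + i;
    }
  }
  return nullptr;
}
```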
Use cuda::std::min/max in Thrust (NVIDIA#3364)
Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (NVIDIA#3361)
* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`
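The shape of such a specialization can be sketched on the host with a hypothetical 16-bit type standing in for `__half` (the real specialization lives in libcu++ and requires the CUDA toolkit; the binary16 constants below are the well-known IEEE values):

```cpp
#include <limits>

// Hypothetical binary16 stand-in; the real __half needs a CUDA toolkit.
struct my_half
{
  unsigned short bits;
};

namespace std
{
// Sketch of how a numeric_limits specialization is shaped.
template <>
class numeric_limits<my_half>
{
public:
  static constexpr bool is_specialized = true;
  static constexpr bool is_signed      = true;
  static constexpr bool is_integer     = false;
  static constexpr int digits          = 11; // mantissa bits incl. implicit bit
  static constexpr int max_exponent    = 16;

  static constexpr my_half max() noexcept
  {
    return my_half{0x7BFF}; // bit pattern of 65504, the largest finite binary16
  }
};
} // namespace std
```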
Cleanup util_arch (NVIDIA#2773)
Deprecate thrust::null_type (NVIDIA#3367)
Deprecate cub::DeviceSpmv (NVIDIA#3320)
Fixes: NVIDIA#896
Improves `DeviceSegmentedSort` test run time for large numbers of items and segments (NVIDIA#3246)
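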
* fixes segment offset generation
* switches to analytical verification
* switches to analytical verification for pairs
* fixes spelling
* adds tests for large number of segments
* fixes narrowing conversion in tests
* addresses review comments
* fixes includes
Compile basic infra test with C++17 (NVIDIA#3377)
Adds support for large numbers of items and segments to `DeviceSegmentedSort` (NVIDIA#3308)
* fixes segment offset generation
* switches to analytical verification
* switches to analytical verification for pairs
* addresses review comments
* introduces segment offset type
* adds tests for large number of segments
* adds support for large number of segments
* drops segment offset type
* fixes thrust namespace
* removes about-to-be-deprecated cub iterators
* no exec specifier on defaulted ctor
* fixes gcc7 linker error
* uses local_segment_index_t throughout
* determine the offset type based on the type returned by the segment begin/end iterators
* minor style improvements
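Deriving the offset type from what the segment begin/end iterators return can be sketched like this (illustrative only, not the CUB dispatch code):

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Sketch: take the offset type from the segment offset iterator itself,
// rather than hard-coding a fixed integer type.
template <class BeginOffsetIt>
using segment_offset_t = typename std::iterator_traits<BeginOffsetIt>::value_type;

// A raw pointer to long yields long offsets; a vector<int> iterator yields int.
static_assert(std::is_same<segment_offset_t<const long*>, long>::value,
              "pointer to long -> long offsets");
```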
Exit with error when RAPIDS CI fails. (NVIDIA#3385)
cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)
* Introduce gpu_struct decorator and typing
* Enable `reduce` to accept arrays of structs as inputs
* Add test for reducing arrays-of-struct
* Update documentation
* Use a numpy array rather than ctypes object
* Change zeros -> empty for output array and temp storage
* Add a TODO for typing GpuStruct
* Documentation updates
* Remove test_reduce_struct_type from test_reduce.py
* Revert to `to_cccl_value()` accepting ndarray + GpuStruct
* Bump copyrights
---------
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Deprecate thrust::async (NVIDIA#3324)
Fixes: NVIDIA#100
Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)
Fix broken `_CCCL_BUILTIN_ASSUME` macro (NVIDIA#3314)
* add compiler-specific path
* fix device code path
* add `_CCCL_ASSUME`
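A compiler-specific "assume" macro of this kind can be sketched as follows; `MY_ASSUME` is an illustrative name, not the actual `_CCCL_BUILTIN_ASSUME` definition:

```cpp
// Pick the compiler's native assumption hint; fall back to a no-op.
#if defined(__clang__)
#  define MY_ASSUME(expr) __builtin_assume(expr)
#elif defined(_MSC_VER)
#  define MY_ASSUME(expr) __assume(expr)
#elif defined(__GNUC__)
#  define MY_ASSUME(expr) ((expr) ? static_cast<void>(0) : __builtin_unreachable())
#else
#  define MY_ASSUME(expr) static_cast<void>(0)
#endif

// The assumption lets the optimizer drop the divide-by-zero path;
// behavior is unchanged for valid inputs.
inline int fast_div(int a, int b)
{
  MY_ASSUME(b != 0);
  return a / b;
}
```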
Deprecate thrust::numeric_limits (NVIDIA#3366)
Replace `typedef` with `using` in libcu++ (NVIDIA#3368)
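The mechanical shape of that rewrite: `typedef` and `using` declare the same alias, but `using` reads left-to-right and also supports alias templates, which `typedef` cannot express directly:

```cpp
#include <type_traits>
#include <vector>

typedef std::vector<int> int_vector_old; // before
using int_vector = std::vector<int>;     // after: identical alias

// Alias templates are only possible with `using`.
template <class T>
using vec = std::vector<T>;

static_assert(std::is_same<int_vector_old, int_vector>::value,
              "both spellings name the same type");
```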
Deprecate thrust::optional (NVIDIA#3307)
Fixes: NVIDIA#3306
Upgrade to Catch2 3.8 (NVIDIA#3310)
Fixes: NVIDIA#1724
refactor `<cuda/std/cstdint>` (NVIDIA#3325)
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
Update CODEOWNERS (NVIDIA#3331)
* Update CODEOWNERS
* Update CODEOWNERS
* Update CODEOWNERS
* [pre-commit.ci] auto code formatting
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Fix sign-compare warning (NVIDIA#3408)
Implement more cmath functions to be usable on host and device (NVIDIA#3382)
* Implement more cmath functions to be usable on host and device
* Implement math roots functions
* Implement exponential functions
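The host/device pattern behind such functions can be sketched with a wrapper; this is a host-side stand-in only, since the real libcu++ implementations dispatch to device intrinsics under nvcc:

```cpp
#include <cmath>

// Annotate with __host__ __device__ when compiling with nvcc,
// compile as a plain inline function otherwise.
#if defined(__CUDACC__)
#  define MY_HOST_DEVICE __host__ __device__
#else
#  define MY_HOST_DEVICE
#endif

MY_HOST_DEVICE inline double my_exp(double x)
{
#if defined(__CUDA_ARCH__)
  return ::exp(x); // device code path (intrinsic)
#else
  return std::exp(x); // host code path (standard library)
#endif
}
```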
Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)
* Redefine and deprecate thrust::remove_cvref
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
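The C++20 `std::remove_cvref` definition that the redefined alias matches is a one-liner in C++17:

```cpp
#include <type_traits>

// remove_cvref strips references first, then cv-qualifiers,
// exactly as C++20's std::remove_cvref does.
template <class T>
struct remove_cvref
{
  using type = std::remove_cv_t<std::remove_reference_t<T>>;
};

template <class T>
using remove_cvref_t = typename remove_cvref<T>::type;
```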
Fix assert definition for NVHPC due to constexpr issues (NVIDIA#3418)
NVHPC cannot decide at compile time where the code will run, so `_CCCL_ASSERT` within a constexpr function breaks it.
Fix this by always using the host definition, which should also work on device.
Fixes NVIDIA#3411
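The fix boils down to using the host `assert`, which has been valid inside constexpr functions since C++14; `MY_ASSERT` below is an illustrative name, not the real `_CCCL_ASSERT`:

```cpp
#include <cassert>

// Always use the host assert definition so the macro stays valid in
// constexpr functions that NVHPC cannot split into host/device variants.
#define MY_ASSERT(expr) assert(expr)

constexpr int checked_increment(int x)
{
  MY_ASSERT(x < 100); // OK in a constexpr function since C++14
  return x + 1;
}
```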
Extend CUB reduce benchmarks (NVIDIA#3401)
* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters
Fixes: NVIDIA#3283
Update upload-pages-artifact to v3 (NVIDIA#3423)
* Update upload-pages-artifact to v3
* Empty commit
---------
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Replace and deprecate thrust::cuda_cub::terminate (NVIDIA#3421)
`std::linalg` accessors and `transposed_layout` (NVIDIA#2962)
Add round up/down to multiple (NVIDIA#3234)
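For unsigned integers, the usual definitions of round up/down to a multiple are short enough to sketch directly (illustrative names, assuming a nonzero multiple; not the actual libcu++ API):

```cpp
#include <cstdint>

// Round value down to the nearest multiple of `multiple` (multiple > 0).
constexpr std::uint64_t round_down_to_multiple(std::uint64_t value, std::uint64_t multiple)
{
  return (value / multiple) * multiple;
}

// Round value up to the nearest multiple of `multiple` (multiple > 0).
constexpr std::uint64_t round_up_to_multiple(std::uint64_t value, std::uint64_t multiple)
{
  return ((value + multiple - 1) / multiple) * multiple;
}
```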
[FEA]: Introduce Python module with CCCL headers (NVIDIA#3201)
* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative
* Run `copy_cccl_headers_to_cuda_cccl_include()` before `setup()`
* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.
* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel
* Bug fix: cuda/_include only exists after shutil.copytree() ran.
* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py
* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)
* Replace := operator (needs Python 3.8+)
* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md
* Restore original README.md: `pip3 install -e` now works on first pass.
* cuda_cccl/README.md: FOR INTERNAL USE ONLY
* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under NVIDIA#3201 (comment))
Command used: ci/update_version.sh 2 8 0
* Modernize pyproject.toml, setup.py
Trigger for this change:
* NVIDIA#3201 (comment)
* NVIDIA#3201 (comment)
* Install CCCL headers under cuda.cccl.include
Trigger for this change:
* NVIDIA#3201 (comment)
Unexpected accidental discovery: the cuda.cooperative unit tests pass entirely without CCCL headers.
* Factor out cuda_cccl/cuda/cccl/include_paths.py
* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative
* Add missing Copyright notice.
* Add missing __init__.py (cuda.cccl)
* Add `"cuda.cccl"` to `autodoc.mock_imports`
* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)
* Add # TODO: move this to a module-level import
* Modernize cuda_cooperative/pyproject.toml, setup.py
* Convert cuda_cooperative to use hatchling as build backend.
* Revert "Convert cuda_cooperative to use hatchling as build backend."
This reverts commit 61637d6.
* Move numpy from [build-system] requires -> [project] dependencies
* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH
* Remove copy_license() and use license_files=["../../LICENSE"] instead.
* Further modernize cuda_cccl/setup.py to use pathlib
* Trivial simplifications in cuda_cccl/pyproject.toml
* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code
* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml
* Add taplo-pre-commit to .pre-commit-config.yaml
* taplo-pre-commit auto-fixes
* Use pathlib in cuda_cooperative/setup.py
* CCCL_PYTHON_PATH in cuda_cooperative/setup.py
* Modernize cuda_parallel/pyproject.toml, setup.py
* Use pathlib in cuda_parallel/setup.py
* Add `# TOML lint & format` comment.
* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml
* Use pathlib in cuda/cccl/include_paths.py
* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)
* Fixes after git merge main
* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'
```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
/home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>
Traceback (most recent call last):
File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
^^^^^^^^^^^^^^^^^
AttributeError: '_Reduce' object has no attribute 'build_result'
warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```
* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`
* Introduce cuda_cooperative/constraints.txt
* Also add cuda_parallel/constraints.txt
* Add `--constraint constraints.txt` in ci/test_python.sh
* Update Copyright dates
* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)
For completeness: The other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.
* Remove unused cuda_parallel jinja2 dependency (noticed by chance).
* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.
* Make cuda_cooperative, cuda_parallel testing completely independent.
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Fix sign-compare warning (NVIDIA#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"
This reverts commit ea33a21.
Error message: NVIDIA#3201 (comment)
* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Restore original ci/matrix.yaml [skip-rapids]
* Use for loop in test_python.sh to avoid code duplication.
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]
* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]
* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"
This reverts commit ec206fd.
* Implement suggestion by @shwina (NVIDIA#3201 (review))
* Address feedback by @leofang
---------
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)
* Add optional stream argument to reduce_into()
* Add tests to check for reduce_into() stream behavior
* Move protocol related utils to separate file and rework __cuda_stream__ error messages
* Fix synchronization issue in stream test and add one more invalid stream test case
* Rename cuda stream validation function after removing leading underscore
* Unpack values from __cuda_stream__ instead of indexing
* Fix linting errors
* Handle TypeError when unpacking invalid __cuda_stream__ return
* Use stream to allocate cupy memory in new stream test
Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (NVIDIA#3434)
Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)
* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++
Fixes NVIDIA#3404
Fix CI issues (NVIDIA#3443)
Remove deprecated `cub::min` (NVIDIA#3450)
* Remove deprecated `cub::{min, max}`
* Drop unused `thrust::remove_cvref` file
Fix typo in builtin (NVIDIA#3451)
Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)
uses unsigned offset types in thrust's scan dispatch (NVIDIA#3436)
Default transform_iterator's copy ctor (NVIDIA#3395)
Fixes: NVIDIA#2393
Turn C++ dialect warning into error (NVIDIA#3453)
Uses unsigned offset types in thrust's sort algorithm calling into `DispatchMergeSort` (NVIDIA#3437)
* uses thrust's dynamic dispatch for merge_sort
* [pre-commit.ci] auto code formatting
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Refactor allocator handling of contiguous_storage (NVIDIA#3050)
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Drop thrust::detail::integer_traits (NVIDIA#3391)
Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
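The trait extension pattern can be sketched with hypothetical stand-in types, since the real `__half`/`__nv_bfloat16` require the CUDA toolkit; this is illustrative, not the `cuda::is_floating_point` implementation:

```cpp
#include <type_traits>

// Hypothetical stand-ins for __half and __nv_bfloat16.
struct fake_half
{};
struct fake_bfloat16
{};

// Start from the standard trait, then opt in the extended floating-point types.
template <class T>
struct is_extended_floating_point : std::is_floating_point<T>
{};

template <>
struct is_extended_floating_point<fake_half> : std::true_type
{};

template <>
struct is_extended_floating_point<fake_bfloat16> : std::true_type
{};
```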
Improve docs of std headers (NVIDIA#3416)
Drop C++11 and C++14 support for all of cccl (NVIDIA#3417)
* Drop C++11 and C++14 support for all of cccl
---------
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
Deprecate a few CUB macros (NVIDIA#3456)
Deprecate thrust universal iterator categories (NVIDIA#3461)
Fix launch args order (NVIDIA#3465)
Add `--extended-lambda` to the list of removed clangd flags (NVIDIA#3432)
add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)
Add `_CCCL_BUILTIN_PREFETCH` (NVIDIA#3433)
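A prefetch wrapper in this spirit can be sketched as follows; the macro name is illustrative, and because prefetch is purely an optimization hint, the no-op fallback keeps results identical on compilers without it:

```cpp
// Map to the compiler's prefetch builtin where available; otherwise no-op.
#if defined(__GNUC__) || defined(__clang__)
#  define MY_PREFETCH(addr) __builtin_prefetch(addr)
#else
#  define MY_PREFETCH(addr) static_cast<void>(addr)
#endif

// Demo: hint that `data` will be read soon, then sum it.
// The result is the same with or without the hint.
inline long sum_demo()
{
  static const int data[4] = {1, 2, 3, 4};
  MY_PREFETCH(data);
  return static_cast<long>(data[0]) + data[1] + data[2] + data[3];
}
```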
Drop universal iterator categories (NVIDIA#3474)
Ensure that headers in `<cuda/*>` can be built with a C++-only compiler (NVIDIA#3472)
Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)
Also ensure that FP8 can actually be enabled, given its FP16 and BF16 requirements
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)
* moves emptykernel to detail ns
* second batch
* third batch
* fourth batch
* fixes cuda parallel
* concatenates nested namespaces
Deprecate block/warp algo specializations (NVIDIA#3455)
Fixes: NVIDIA#3409
Refactor CUB's util_debug (NVIDIA#3345)
Accidentally compiling with < C++17 issues a warning (that C++ < 17 is no longer supported), but it also causes errors later during compilation, and the dialect warning gets lost in the compiler error novel. We should error out sooner so users get a clear error message.