add byteswap
fbusato committed Feb 27, 2025
commit f00ed427acc34245f3b4bdc7b4a94ec075ff0ad7
9 changes: 5 additions & 4 deletions docs/libcudacxx/standard_api/numerics_library/bit.rst
@@ -1,14 +1,15 @@
.. _libcudacxx-standard-api-numerics-bit:

``<cuda/std/bit>``
-======================
+==================

CUDA Performance Considerations
-------------------------------

- ``bit_width()`` translates into a single ``FLO`` SASS instruction. The result is assumed to be in the range ``[0, N-bit]``.
-- ``bit_ceil()`` translates into ``FLO, SHL`` SASS instructions. The result is assumed to be greater than or equal to the input.
-- ``bit_floor()`` translates into ``ADD, FLO, SHL, IMINMAX`` SASS instructions. The result is assumed to be less than or equal to the input.
+- ``bit_ceil()`` translates into ``ADD, FLO, SHL, IMINMAX`` SASS instructions. The result is assumed to be greater than or equal to the input.
+- ``bit_floor()`` translates into ``FLO, SHL`` SASS instructions. The result is assumed to be less than or equal to the input.
+- ``byteswap()`` translates into a single ``PRMT`` SASS instruction.
- ``popcount()`` translates into a single ``POPC`` SASS instruction. The result is assumed to be in the range ``[0, N-bit]``.
- ``has_single_bit()`` translates into ``POPC + ISETP`` SASS instructions.
- ``rotl()/rotr()`` translate into a single ``SHF`` (funnel shift) SASS instruction.
@@ -21,5 +22,5 @@ Additional Notes
----------------

- All functions are marked ``[[nodiscard]]`` and ``noexcept``
-- All functions support ``__uint128_t``
+- All functions support 128-bit integer types
- ``bit_ceil()`` checks for overflow in debug mode