GH-48177: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on s390x by Vishwanatha-HD · Pull Request #48180 · apache/arrow

Vishwanatha-HD · 2025-11-19T14:07:50Z

Rationale for this change

This PR is part of the effort to enable Parquet DB support on big-endian (s390x) systems. It fixes the failure of the "arrow-acero-asof-join-node-test" test case.

On big-endian platforms, the "arrow-acero-asof-join-node-test" test case aborted with a core dump:

$ ./arrow-acero-asof-join-node-test
[ RUN      ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/3
[       OK ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/3 (201 ms)
[ RUN      ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/4
arrow/cpp/src/arrow/compute/util.cc:35:  Check failed: false 
Aborted (core dumped)

What changes are included in this PR?

The fix changes "util.cc" to address the abort/core dump.

Are these changes tested?

Yes. The changes were tested on s390x to confirm that the tests pass, and on x86 to confirm that no regression was introduced.

Are there any user-facing changes?

No

GitHub Issue: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on Big-Endian (s390x) systems #48177
GitHub main Issue link: [C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151

github-actions · 2025-11-19T21:49:58Z

⚠️ GitHub issue #48177 has been automatically assigned in GitHub to PR creator.

kou

Could you fix the lint failure?

https://github.com/apache/arrow/actions/runs/19504115732/job/55873075753?pr=48180#step:6:84

diff --git a/cpp/src/arrow/compute/util.cc b/cpp/src/arrow/compute/util.cc
index 66c48631dc..3b671db021 100644
--- a/cpp/src/arrow/compute/util.cc
+++ b/cpp/src/arrow/compute/util.cc
@@ -325,10 +325,11 @@ void bytes_to_bits(int64_t hardware_flags, const int num_bits, const uint8_t* by
     bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
 #else
     if (tail == 8) {
-      bytes_next = util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes + num_bits - tail));
+      bytes_next =
+          util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes + num_bits - tail));
     } else {
-      // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian order
-      // to ensure compatibility with subsequent bit operations
+      // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
+      // order to ensure compatibility with subsequent bit operations
       bytes_next = 0;
       for (int i = 0; i < tail; ++i) {
         bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);

You can use nice pre-commit run --show-diff-on-failure --color=always --all-files cpp.

kou · 2025-11-20T00:29:51Z

cpp/src/arrow/compute/util.cc

-#endif
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));


Does this work on big-endian system?

Thanks for pointing this out. With the way we now handle tail_bytes and load the word data, we don't actually need to change the "SafeLoadUpTo8Bytes()" function. With the conditional compilation, this function will never be called on big-endian architectures.
I have reverted this change and tested on s390x to check that all the tests pass. I have pushed a new commit. Please give your review comments. Thanks.

So we are not going to update this function for big-endian because it won't be called? If so, why don't we keep the above DCHECK(false)?

kou · 2025-11-20T00:33:23Z

cpp/src/arrow/compute/util.cc

+#if ARROW_LITTLE_ENDIAN
    uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
+#else
+    int tail_bytes = (tail + 7) / 8;
+    uint64_t word;
+    if (tail_bytes == 8) {
+      word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
+    } else {
+      // For bit manipulation, always load into least significant bits
+      // to ensure compatibility with CountTrailingZeros on Big-endian systems
+      word = 0;
+      for (int i = 0; i < tail_bytes; ++i) {
+        word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
+      }
+    }
+#endif


Why do we need this?

The SafeLoadUpTo8Bytes() change adds support for big-endian, right?

Now that I have removed the big-endian support from the SafeLoadUpTo8Bytes() function, these changes are required because they handle the tail bytes on big-endian systems. If tail_bytes equals 8, we call SafeLoad directly to load the data into the "word" variable. In all other cases, we need to load the bytes into the least significant bits to stay compatible with "CountTrailingZeros". This is why we can't call "SafeLoadUpTo8Bytes()" directly for every tail_bytes value.
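To make the reasoning in the comment above concrete, here is a minimal sketch of a tail load that places bytes[0] in the least significant byte on any host, followed by a trailing-zero scan. The helper names (LoadTailLittleEndian, CountTrailingZerosPortable, TailBitsToIndexes) are hypothetical and are not Arrow's API; the sketch only assumes 1 <= tail <= 64.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): assemble up to 8 bytes so that
// bytes[0] lands in the least significant byte, regardless of host endianness.
inline uint64_t LoadTailLittleEndian(const uint8_t* bytes, int num_bytes) {
  uint64_t word = 0;
  for (int i = 0; i < num_bytes; ++i) {
    word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  return word;
}

// Portable count-trailing-zeros; the caller must pass a non-zero word.
inline int CountTrailingZerosPortable(uint64_t word) {
  int n = 0;
  while ((word & 1) == 0) {
    word >>= 1;
    ++n;
  }
  return n;
}

// Collect the indexes of set bits among the first `tail` bits (1 <= tail <= 64).
inline std::vector<int> TailBitsToIndexes(const uint8_t* bits_tail, int tail) {
  uint64_t word = LoadTailLittleEndian(bits_tail, (tail + 7) / 8);
  word &= ~0ULL >> (64 - tail);  // mask out bits beyond the requested range
  std::vector<int> indexes;
  while (word != 0) {
    indexes.push_back(CountTrailingZerosPortable(word));
    word &= word - 1;  // clear the lowest set bit
  }
  return indexes;
}
```

Because the word is built with shifts rather than a raw memory copy, bit k of the bitmap always ends up at bit position k of the word, which is what a trailing-zero scan expects.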

I have fixed the lint errors and pushed my changes. Thanks.

IIUC, here you want to load these bytes in little-endian order so that CountTrailingZeros can process them. Rather than leveraging SafeLoadUpTo8Bytes(), which is supposed to load bytes in big-endian order (and is currently not implemented), you are writing your own little-endian load.

This should work. But I think we'd better do it the other way:

Implement the big-endian loading in SafeLoadUpTo8Bytes() (you already did it in your previous commit), and keep the call to it here for both little- and big-endian.

For big-endian, issue an explicit byte swap:

#if !ARROW_LITTLE_ENDIAN
    word = bit_util::ByteSwap(word);
#endif

This way, the code can be more compact and semantically clear. The cost is an extra byte swap, which is trivial imho. cc @kou

Hi @zanmato1984,
I made the code changes as per your suggestion above, but unfortunately the test case doesn't pass. It's not just the byte swap that is required on BE systems.

./debug/arrow-compute-row-test --gtest_filter=KeyCompare.CompareColumnsToRowsCuriousFSB
Note: Google Test filter = KeyCompare.CompareColumnsToRowsCuriousFSB
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from KeyCompare
[ RUN ] KeyCompare.CompareColumnsToRowsCuriousFSB
arrow/cpp/src/arrow/compute/row/compare_test.cc:103: Failure
Expected equality of these values:
num_rows_no_match
Which is: 7
1

[ FAILED ] KeyCompare.CompareColumnsToRowsCuriousFSB (2 ms)
[----------] 1 test from KeyCompare (2 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (18 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] KeyCompare.CompareColumnsToRowsCuriousFSB

1 FAILED TEST

Also, please update the code so I can help with any further failures. The current change isn't sufficient, and I'm worried that we may have false positives.

@zanmato1984, here are my latest code changes to util.cc:

inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
  } else {
    uint64_t word = 0;
#if ARROW_LITTLE_ENDIAN
    for (int i = 0; i < num_bytes; ++i) {
      word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
#else
    // Big-endian: most significant byte first
    for (int i = 0; i < num_bytes; ++i) {
      word |= static_cast<uint64_t>(bytes[i]) << (8 * (num_bytes - 1 - i));
    }
#endif
    return word;
  }
}

In the bits_to_indexes_internal() function:

  // Optionally process the last partial word with masking out bits outside range
  if (tail) {
    const uint8_t* bits_tail = bits + (num_bits - tail) / 8;
    uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
#if !ARROW_LITTLE_ENDIAN
    word = ::arrow::bit_util::ByteSwap(word);
#endif
    if (bit_to_search == 0) {
      word = ~word;
    }
    word &= ~0ULL >> (64 - tail);
    if (filter_input_indexes) {
      bits_filter_indexes_helper(word, input_indexes + num_bits - tail, num_indexes,
                                 indexes);
    } else {
      bits_to_indexes_helper(word, num_bits - tail + base_index, num_indexes, indexes);
    }
  }
}

In the bytes_to_bits() function:

  if (tail) {
    uint64_t bytes_next;
    bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
#if !ARROW_LITTLE_ENDIAN
    bytes_next = ::arrow::bit_util::ByteSwap(bytes_next);
#endif
    bytes_next &= 0x0101010101010101ULL;
    bytes_next |= (bytes_next >> 7);   // Pairs of adjacent output bits in individual bytes
    bytes_next |= (bytes_next >> 14);  // 4 adjacent output bits in individual bytes
    bytes_next |= (bytes_next >> 28);  // All 8 output bits in the lowest byte
    bits[num_bits / 8] = static_cast<uint8_t>(bytes_next & 0xff);
  }

And yes, this looks to be a new test case failure with the changes mentioned above.
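One arithmetic detail in the snippets above may be worth checking, although this thread does not confirm it as the cause of the failure. When fewer than 8 bytes are loaded most-significant-byte first into the low end of a 64-bit word, a full 64-bit byte swap moves them to the high end of the word, so the result is not the same as a little-endian load. The sketch below models this with plain shifts so it behaves identically on any host; all function names are hypothetical and not Arrow's API.

```cpp
#include <cassert>
#include <cstdint>

// Models the big-endian branch quoted above: bytes[0] becomes the most
// significant of the low `num_bytes` bytes of the word.
inline uint64_t LoadMsbFirst(const uint8_t* bytes, int num_bytes) {
  uint64_t word = 0;
  for (int i = 0; i < num_bytes; ++i) {
    word |= static_cast<uint64_t>(bytes[i]) << (8 * (num_bytes - 1 - i));
  }
  return word;
}

// Little-endian style load: bytes[0] becomes the least significant byte.
inline uint64_t LoadLsbFirst(const uint8_t* bytes, int num_bytes) {
  uint64_t word = 0;
  for (int i = 0; i < num_bytes; ++i) {
    word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  return word;
}

// Portable 64-bit byte swap (stand-in for a library ByteSwap).
inline uint64_t ByteSwap64(uint64_t v) {
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xff);
    v >>= 8;
  }
  return r;
}

// Swap, then shift the bytes back down to the low end (1 <= num_bytes <= 8).
inline uint64_t SwapAndAlign(const uint8_t* bytes, int num_bytes) {
  return ByteSwap64(LoadMsbFirst(bytes, num_bytes)) >> (8 * (8 - num_bytes));
}
```

For the three bytes 0x01, 0x02, 0x03, the swapped word carries the bytes in its top three byte positions, while the little-endian load carries them in its bottom three. Under this model, a swap alone matches the little-endian load only when num_bytes is 8; shorter tails would also need the extra right shift shown in SwapAndAlign.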

Thanks for the update. I'll look into it.

Hi @Vishwanatha-HD, I guess the KeyCompare.CompareColumnsToRowsCuriousFSB test would still fail even without the change I proposed. The fact that it is failing means it calls SafeLoadUpTo8Bytes, which was supposed to trigger a DCHECK failure. Your change of removing that DCHECK, which goes against its intended design, makes it pass as a false positive.

Meanwhile, do you see other tests failing with the change I proposed?

I've pushed a commit containing more fixes that I think are necessary. I don't have big-endian hardware, so I can't test it locally. Please pull the code, check whether the tests pass, and let me know the result. Thanks @Vishwanatha-HD.

Vishwanatha-HD

I have addressed all the review comments. Please re-review. Thanks.

Vishwanatha-HD · 2025-11-20T10:28:55Z

cpp/src/arrow/compute/util.cc

-#endif
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));


Thanks for pointing this out. With the way we now handle tail_bytes and load the word data, we don't actually need to change the "SafeLoadUpTo8Bytes()" function. With the conditional compilation, this function will never be called on big-endian architectures.
I have reverted this change and tested on s390x to check that all the tests pass. I have pushed a new commit. Please give your review comments. Thanks.

Vishwanatha-HD · 2025-11-20T11:23:05Z

cpp/src/arrow/compute/util.cc

+#if ARROW_LITTLE_ENDIAN
    uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
+#else
+    int tail_bytes = (tail + 7) / 8;
+    uint64_t word;
+    if (tail_bytes == 8) {
+      word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
+    } else {
+      // For bit manipulation, always load into least significant bits
+      // to ensure compatibility with CountTrailingZeros on Big-endian systems
+      word = 0;
+      for (int i = 0; i < tail_bytes; ++i) {
+        word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
+      }
+    }
+#endif


Now that I have removed the big-endian support from the SafeLoadUpTo8Bytes() function, these changes are required because they handle the tail bytes on big-endian systems. If tail_bytes equals 8, we call SafeLoad directly to load the data into the "word" variable. In all other cases, we need to load the bytes into the least significant bits to stay compatible with "CountTrailingZeros". This is why we can't call "SafeLoadUpTo8Bytes()" directly for every tail_bytes value.

Vishwanatha-HD · 2025-11-20T12:52:51Z

cpp/src/arrow/compute/util.cc

+#if ARROW_LITTLE_ENDIAN
    uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
+#else
+    int tail_bytes = (tail + 7) / 8;
+    uint64_t word;
+    if (tail_bytes == 8) {
+      word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
+    } else {
+      // For bit manipulation, always load into least significant bits
+      // to ensure compatibility with CountTrailingZeros on Big-endian systems
+      word = 0;
+      for (int i = 0; i < tail_bytes; ++i) {
+        word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
+      }
+    }
+#endif


I have fixed the lint errors and pushed my changes. Thanks.

cpp/src/arrow/compute/util.cc

kou

Does this work?

diff --git a/cpp/src/arrow/compute/util.cc b/cpp/src/arrow/compute/util.cc
index b90b3a6405..163a80d9d4 100644
--- a/cpp/src/arrow/compute/util.cc
+++ b/cpp/src/arrow/compute/util.cc
@@ -30,33 +30,41 @@ namespace util {
 namespace bit_util {
 
 inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
-    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+    auto word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+#if !ARROW_LITTLE_ENDIAN
+    word = bit_util::ByteSwap(word);
+#endif
+    return word;
   } else {
     uint64_t word = 0;
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
+#else
+      word |= static_cast<uint64_t>(bytes[num_bytes - 1 - i]) << (8 * i);
+#endif
     }
     return word;
   }
 }
 
 inline void SafeStoreUpTo8Bytes(uint8_t* bytes, int num_bytes, uint64_t value) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
+#if ARROW_LITTLE_ENDIAN
     util::SafeStore(reinterpret_cast<uint64_t*>(bytes), value);
+#else
+    util::SafeStore(reinterpret_cast<uint64_t*>(bytes), bit_util::ByteSwap(value));
+#endif
   } else {
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       bytes[i] = static_cast<uint8_t>(value >> (8 * i));
+#else
+      bytes[i] = static_cast<uint8_t>(value >> (8 * (num_bytes - 1 - i)));
+#endif
     }
   }
 }

k8ika0s · 2025-11-23T22:56:42Z

Mostly looks good to me — just one thought after reading through the recent back-and-forth...

Given the updated handling of tail bytes and the SafeLoadUpTo8Bytes discussion, I think this PR’s direction still makes sense. I’d just double-check that the tail==8 path really can’t happen with the current unroll logic, since @kou raised that question.
Otherwise the fixes seem aligned with the latest comments.

Willing to help test once the approach is finalized.

Vishwanatha-HD · 2025-11-24T09:34:58Z

Hi @kou,
I have now reverted the changes made to the "SafeLoadUpTo8Bytes()" function on s390x. They are not required at all. Thanks.

Vishwanatha-HD · 2025-11-24T09:48:50Z

Thanks @k8ika0s as well for your review comments. I have checked the tail==8 code path, and it's not required anymore. I have reverted the changes and pushed again.

Vishwanatha-HD

I have addressed all the review comments. Please re-review the changes. Thanks.

Vishwanatha-HD · 2025-11-24T18:37:36Z

@k8ika0s, thanks very much for your review. Sure, please go ahead and cherry-pick my PR patches and run the tests on your end. Please let me know the final status. Thanks!

Vishwanatha-HD

Resolved all the code review comments

cpp/src/arrow/compute/util.cc

kou · 2025-11-25T02:22:04Z

cpp/src/arrow/compute/util.cc

+    uint64_t bytes_next;
+#if ARROW_LITTLE_ENDIAN
+    bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
+#else
+    // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
+    // order to ensure compatibility with subsequent bit operations
+    bytes_next = 0;
+    for (int i = 0; i < tail; ++i) {
+      bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);
+    }
+#endif


Can we revert this change with the latest SafeLoadUpTo8Bytes() (that has big endian support)?

@kou, I tried that, but the test case failed on s390x. On big-endian we need "bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);", which we don't get when we call SafeLoadUpTo8Bytes() directly.

Could you share the code you tried?

#48180 (review) includes word |= static_cast<uint64_t>(bytes[num_bytes - 1 - i]) << (8 * i);.

@kou, the SafeLoadUpTo8Bytes() function remains unchanged:

inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
  } else {
    uint64_t word = 0;
    for (int i = 0; i < num_bytes; ++i) {
      word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return word;
  }
}

The bytes_to_bits() function handles the endianness fix independently. If I do the endianness conversion inside the SafeLoadUpTo8Bytes() function rather than here, the test case fails.

  if (tail) {
    uint64_t bytes_next;
#if ARROW_LITTLE_ENDIAN
    bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
#else
    // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
    // order to ensure compatibility with subsequent bit operations
    bytes_next = 0;
    for (int i = 0; i < tail; ++i) {
      bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);
    }
#endif
    bytes_next &= 0x0101010101010101ULL;
    bytes_next |= (bytes_next >> 7);   // Pairs of adjacent output bits in individual bytes
    bytes_next |= (bytes_next >> 14);  // 4 adjacent output bits in individual bytes
    bytes_next |= (bytes_next >> 28);  // All 8 output bits in the lowest byte
    bits[num_bits / 8] = static_cast<uint8_t>(bytes_next & 0xff);
  }
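The mask-and-fold steps in the bytes_to_bits() tail above depend on byte i of the input sitting at bit position 8 * i of the word. A minimal standalone sketch of that packing follows, assuming 0 <= tail <= 8; the function name PackTailBytesToBits is hypothetical and is not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>

// Pack the low bit of each of up to 8 input bytes into one output byte,
// with input byte i mapped to output bit i.
inline uint8_t PackTailBytesToBits(const uint8_t* bytes, int tail) {
  // Endian-independent load: bytes[i] occupies bits [8*i, 8*i + 7].
  uint64_t bytes_next = 0;
  for (int i = 0; i < tail; ++i) {
    bytes_next |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  bytes_next &= 0x0101010101010101ULL;  // keep only the low bit of each byte
  bytes_next |= (bytes_next >> 7);      // pairs of adjacent output bits
  bytes_next |= (bytes_next >> 14);     // groups of 4 adjacent output bits
  bytes_next |= (bytes_next >> 28);     // all 8 output bits in the lowest byte
  return static_cast<uint8_t>(bytes_next & 0xff);
}
```

Each fold shifts by a multiple of 7, so the bit at position 8 * i moves down to position i and no two input bytes collide in the low output byte. If the word were instead assembled with the first byte in the most significant position, the same folds would produce the bits in reversed order, which is consistent with the ordering concern raised in this thread.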

cpp/src/arrow/compute/util.cc

zanmato1984 · 2025-12-04T11:20:46Z

Hi @kou, I have a question: why do we need to swap the bytes for big-endian when num_bytes == 8? The underlying util::SafeLoad/Store are just memcpy, so the byte order of the value and of the bytes should be the same, right?

kou · 2025-12-04T11:37:29Z

I thought that we needed to convert to little-endian. Is my assumption wrong? If so, my suggested code was wrong. Sorry.

zanmato1984 · 2025-12-04T11:46:32Z

Thanks for explaining. I'm not sure either. My assumption is that SafeLoad/Store need to preserve the machine's endianness. For example:

uint64_t value = SafeLoad(bytes);
uint8_t *p_value = reinterpret_cast<uint8_t *>(&value);
assert(p_value[0] == bytes[0]);
assert(p_value[1] == bytes[1]);
...
assert(p_value[7] == bytes[7]);
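The assumption above can be checked in isolation. The sketch below uses std::memcpy as a stand-in for util::SafeLoad, following the statement earlier in this thread that SafeLoad/Store are just memcpy; LoadViaMemcpy, PreservesByteOrder, and HostIsLittleEndian are hypothetical names, not Arrow functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a memcpy-based 8-byte load (assumed behaviour of SafeLoad).
inline uint64_t LoadViaMemcpy(const uint8_t* bytes) {
  uint64_t value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}

// True if the in-memory bytes of the loaded value equal the source bytes.
inline bool PreservesByteOrder(const uint8_t* bytes) {
  uint64_t value = LoadViaMemcpy(bytes);
  uint8_t out[sizeof(value)];
  std::memcpy(out, &value, sizeof(value));
  return std::memcmp(out, bytes, sizeof(value)) == 0;
}

// On a little-endian host the first byte in memory is the least significant.
inline bool HostIsLittleEndian() {
  const uint8_t bytes[8] = {0x01, 0, 0, 0, 0, 0, 0, 0};
  return (LoadViaMemcpy(bytes) & 0xff) == 0x01;
}
```

The byte-for-byte identity holds on any host, while the numeric value of the loaded word differs between little- and big-endian machines. Code that interprets the word numerically, for example through shifts or trailing-zero counts, therefore has to account for the host byte order separately from the load itself.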

zanmato1984 · 2025-12-04T10:12:14Z

cpp/src/arrow/compute/util.cc

-#endif
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));


So we are not going to update this function for big-endian because it won't be called? If so, why don't we keep the above DCHECK(false)?

zanmato1984 · 2025-12-05T10:09:21Z

cpp/src/arrow/compute/util.cc

+#if ARROW_LITTLE_ENDIAN
    uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
+#else
+    int tail_bytes = (tail + 7) / 8;
+    uint64_t word;
+    if (tail_bytes == 8) {
+      word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
+    } else {
+      // For bit manipulation, always load into least significant bits
+      // to ensure compatibility with CountTrailingZeros on Big-endian systems
+      word = 0;
+      for (int i = 0; i < tail_bytes; ++i) {
+        word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
+      }
+    }
+#endif


IIUC, here you want to load these bytes in little-endian to be further processed by CountTrailingZeros. What you are doing is not leveraging SafeLoadUpTo8Bytes(), which is supposed to load bytes in big-endian (and currently not implemented), but write your own little-endian loading.

This should work. But I think we'd better do it the other way:

Implement the big-endian loading in SafeLoadUpTo8Bytes() (you already did it in your previous commit), keep the call to it here, for both little- and big-endian.

For big-endian, issue an explicit byte swapping for big-endian: #if !ARROW_LITTLE_ENDIAN word = bit_util::ByteSwap(word); #endif

This way, the code can be more compact and semantic clear. The cost is an extra byte-swapping, which is trivial imho. cc @kou

…ilures on s390x

github-actions bot added the Component: C++ and awaiting review labels Nov 19, 2025

Vishwanatha-HD mentioned this pull request Nov 19, 2025

[C++][Parquet] Fix arrow-acero-asof-join-node-test failures on Big-Endian (s390x) systems #48177

Open

kou changed the title ~~GH-48151: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures…~~ GH-48177: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on s390x Nov 19, 2025

kou reviewed Nov 20, 2025

Vishwanatha-HD force-pushed the fixParqIssues2 branch from d80ee37 to ec51da8 Compare November 20, 2025 12:35

Vishwanatha-HD commented Nov 20, 2025

Vishwanatha-HD mentioned this pull request Nov 21, 2025

[C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151

Open

kou reviewed Nov 22, 2025

kou reviewed Nov 22, 2025

github-actions bot added the awaiting changes label and removed the awaiting review label Nov 24, 2025

Vishwanatha-HD force-pushed the fixParqIssues2 branch from ec51da8 to 6b16301 Compare November 24, 2025 09:54

github-actions bot added the awaiting change review label and removed the awaiting changes label Nov 24, 2025

Vishwanatha-HD force-pushed the fixParqIssues2 branch 4 times, most recently from 471c817 to 4d4691a Compare November 24, 2025 11:09

Vishwanatha-HD commented Nov 24, 2025

kou reviewed Nov 25, 2025

github-actions bot added the awaiting changes label and removed the awaiting change review label Nov 25, 2025

Vishwanatha-HD force-pushed the fixParqIssues2 branch from 4d4691a to b564d35 Compare November 29, 2025 13:30

github-actions bot added the awaiting change review label and removed the awaiting changes label Nov 29, 2025

zanmato1984 requested changes Dec 5, 2025

Vishwanatha-HD added 2 commits December 8, 2025 21:34

apacheGH-48151: [C++][Parquet] Fix arrow-acero-asof-join-node-test fa…

93ff1a5

…ilures on s390x

apacheGH-48151: [C++][Parquet] Fix arrow-acero-asof-join-node-test fa…

4daec9b

…ilures on s390x

Vishwanatha-HD force-pushed the fixParqIssues2 branch from b564d35 to 4daec9b Compare December 8, 2025 18:50

Add some fixes

965c9e8

zanmato1984 force-pushed the fixParqIssues2 branch from d4ad7fb to 965c9e8 Compare December 10, 2025 09:01
