Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jan 9, 2026

Rationale for this change

The TODO comment saying explicitly setting output->buffers[2] = nullptr in the else branch was redundant.

The output ArrayData is allocated by PrepareOutput() before the kernel executes:

while (span_iterator_.Next(&input)) {
ARROW_ASSIGN_OR_RAISE(output.value, PrepareOutput(input.length));

PrepareOutput() resizes the buffers vector to the required size for the output type (3 for String/Binary: validity, offsets, data):

Result<std::shared_ptr<ArrayData>> PrepareOutput(int64_t length) {
auto out = std::make_shared<ArrayData>(output_type_.GetSharedPtr(), length);
out->buffers.resize(output_num_buffers_);

For String/Binary output types, ComputeDataPreallocate() only preallocates the offsets buffer (not the data buffer):

case Type::BINARY:
case Type::STRING:
case Type::LIST:
case Type::MAP:
widths->emplace_back(32, /*added_length=*/1);

PrepareOutput() only allocates buffers that are in the data_preallocated_ vector. For String/Binary, this means only buffers[1] (offsets) is allocated:

for (size_t i = 0; i < data_preallocated_.size(); ++i) {
const auto& prealloc = data_preallocated_[i];
if (prealloc.bit_width >= 0) {
ARROW_ASSIGN_OR_RAISE(
out->buffers[i + 1],
AllocateDataBuffer(kernel_ctx_, length + prealloc.added_length,
prealloc.bit_width));
}
}

Since std::vector::resize()constructs all elements, buffers[2] is already nullptr from the resize call. Therefore, explicitly assigning nullptr in the else branch is redundant.

What changes are included in this PR?

Converted the redundant else clause to DCHECK_EQ.

Are these changes tested?

Yes, I locally ranCast.FixedSizeBinaryToBinaryOrString and Cast.FixedSizeBinaryToBinaryOrStringWithSlice.

Are there any user-facing changes?

No.

…g/Binary cast

The output->buffers vector is already resized with nullptr-initialized elements by PrepareOutput() in exec.cc before the kernel executes, so explicitly setting buffers[2] to nullptr when input_data is nullptr is redundant. Added a DCHECK to verify this invariant in debug builds.
@github-actions
Copy link

github-actions bot commented Jan 9, 2026

⚠️ GitHub issue #48802 has been automatically assigned in GitHub to PR creator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant