fix: handle file reader cancellation and empty nested concat #327
Merged
CurtHagenlocher merged 3 commits on Apr 24, 2026
Ensure async record batch reads use the caller token while loading file schemas before reading dictionaries or batches.
Keep zero-length child data when concatenating empty nested arrays and avoid negative list-view child slices for null...
762e2b4 to 2b61678
CurtHagenlocher (Contributor) approved these changes on Apr 24, 2026, and left a comment:
Thanks for these fixes!
Summary
This PR fixes two correctness edge cases found while reviewing the IPC reader and array concatenation paths.
File reader cancellation
ArrowFileReaderImplementation.ReadRecordBatchAsync(...) and ReadNextRecordBatchAsync(...) accepted a caller cancellation token, but the schema/footer loading step still called
ReadSchemaAsync() without passing that token.
That meant a canceled async file read could still continue through schema/footer
I/O before cancellation was observed later in the record batch path. This PR
passes the caller token into schema loading so cancellation is honored before
dictionary and record batch reads begin.
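The token-forwarding pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Apache.Arrow code: `SketchFileReader`, its `IoCalls` counter, and the string it returns are hypothetical stand-ins, and the simulated I/O is just a short delay. Only the shape of the fix (passing the caller's token into the schema-loading step) comes from the PR description.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Demo: a token that is already canceled should stop the read before any I/O.
var reader = new SketchFileReader();
using var cts = new CancellationTokenSource();
cts.Cancel();

try
{
    await reader.ReadRecordBatchAsync(0, cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine($"canceled, simulated I/O calls: {reader.IoCalls}");
}

// Hypothetical stand-in for a file reader; not the Apache.Arrow implementation.
public sealed class SketchFileReader
{
    private bool _schemaLoaded;

    // Counts how many simulated I/O steps actually ran.
    public int IoCalls { get; private set; }

    private async Task ReadSchemaAsync(CancellationToken cancellationToken)
    {
        if (_schemaLoaded)
        {
            return;
        }

        // Observe cancellation before touching the (simulated) stream.
        cancellationToken.ThrowIfCancellationRequested();
        await Task.Delay(1, cancellationToken).ConfigureAwait(false);
        IoCalls++;
        _schemaLoaded = true;
    }

    public async Task<string> ReadRecordBatchAsync(
        int index, CancellationToken cancellationToken = default)
    {
        // Forward the caller's token instead of loading the schema without one.
        await ReadSchemaAsync(cancellationToken).ConfigureAwait(false);

        cancellationToken.ThrowIfCancellationRequested();
        IoCalls++;
        return $"batch {index}";
    }
}
```

With a pre-canceled token, the sketch throws from the schema step, so the counter stays at zero. If the token were not forwarded, the schema read would complete and cancellation would only surface at the later check in the record batch path.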
Empty nested array concatenation
ArrayDataConcatenator could drop the required child array structure when concatenating nested list arrays whose parent arrays were all empty. Nested Arrow
arrays still need a valid zero-length child
ArrayData; otherwise downstream array construction can fail or produce structurally invalid nested arrays.
This PR preserves a zero-length child for all-empty nested/list inputs.
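To make the invariant concrete, here is a small sketch using a simplified model of a list array. `ListData` and `ListConcat` are hypothetical types invented for illustration (an offsets array plus an `int` child array); they do not mirror the real `ArrayData` layout or the concatenator's internals. The point shown is only that the result carries a zero-length child, never a missing one, when every input is empty.

```csharp
using System;
using System.Collections.Generic;

// Demo: concatenating two empty list arrays still yields a (zero-length) child.
var empty = new ListData(new[] { 0 }, Array.Empty<int>());
var result = ListConcat.Concatenate(new[] { empty, empty });
Console.WriteLine($"length={result.Length}, childLength={result.Child.Length}");

// Simplified, hypothetical model of a list array: Offsets has Length + 1 entries.
public sealed record ListData(int[] Offsets, int[] Child)
{
    public int Length => Offsets.Length - 1;
}

public static class ListConcat
{
    public static ListData Concatenate(IReadOnlyList<ListData> inputs)
    {
        var offsets = new List<int> { 0 };
        var child = new List<int>();

        foreach (var input in inputs)
        {
            for (int i = 0; i < input.Length; i++)
            {
                // Copy the child values referenced by list entry i.
                for (int j = input.Offsets[i]; j < input.Offsets[i + 1]; j++)
                {
                    child.Add(input.Child[j]);
                }

                offsets.Add(child.Count);
            }
        }

        // Even when every input is empty, return a zero-length child
        // rather than dropping it, so the nested structure stays valid.
        return new ListData(offsets.ToArray(), child.ToArray());
    }
}
```

In this model the all-empty case needs no special branch because the child list always exists. In a buffer-based implementation the equivalent is making sure the child is still emitted when there is nothing to copy.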
Null-only ListView concatenation
For ListView and LargeListView, null rows can have size == 0, and their offset values should not contribute to the child slice range. The previous logic
used every offset when computing the child bounds, even for zero-size entries.
This PR computes list-view child bounds only from entries with
size > 0, so null-only inputs keep an empty child slice instead of deriving a range from null
rows.
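The bounds rule can be sketched as below. `ListViewBounds.Compute` is a hypothetical helper written for this illustration, operating on plain `int` arrays rather than Arrow buffers; it is not the concatenator's actual code. It returns the half-open child range covered by entries with a positive size, and an empty range when there are none.

```csharp
using System;

// Demo: only the middle entry has size > 0, so only it defines the range.
var mixed = ListViewBounds.Compute(new[] { 7, 3, 9 }, new[] { 0, 2, 0 });
Console.WriteLine($"mixed: [{mixed.Start}, {mixed.End})");

// Demo: null-only input (all sizes zero) keeps an empty child slice.
var nullOnly = ListViewBounds.Compute(new[] { 7, 9 }, new[] { 0, 0 });
Console.WriteLine($"null-only: [{nullOnly.Start}, {nullOnly.End})");

public static class ListViewBounds
{
    // Returns the half-open child range [Start, End) referenced by entries
    // with size > 0, or (0, 0) when no entry has a positive size.
    public static (int Start, int End) Compute(int[] offsets, int[] sizes)
    {
        if (offsets.Length != sizes.Length)
        {
            throw new ArgumentException("offsets and sizes must have the same length");
        }

        bool any = false;
        int start = int.MaxValue;
        int end = 0;

        for (int i = 0; i < sizes.Length; i++)
        {
            // Zero-size entries (including null rows) reference no child
            // values, so their offsets are ignored.
            if (sizes[i] <= 0)
            {
                continue;
            }

            any = true;
            start = Math.Min(start, offsets[i]);
            end = Math.Max(end, offsets[i] + sizes[i]);
        }

        return any ? (start, end) : (0, 0);
    }
}
```

In the first demo the range is [3, 5), taken from the single entry with size 2. In the second, a computation that used every offset would derive a range from the values 7 and 9 even though no child data is referenced, which is the situation the fix avoids.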
Validation
dotnet test test/Apache.Arrow.Tests/Apache.Arrow.Tests.csproj -c Release --filter "FullyQualifiedName~ArrowFileReaderTests.ReadRecordBatchAsync_HonorsPreCanceledTokenDuringSchemaRead|FullyQualifiedName~ArrowFileReaderTests.ReadNextRecordBatchAsync_HonorsPreCanceledTokenDuringSchemaRead|FullyQualifiedName~ArrowArrayConcatenatorTests.TestConcatenateAllEmpty|FullyQualifiedName~ArrowArrayConcatenatorTests.TestConcatenateNullOnly"
dotnet build Apache.Arrow.sln -c Release