fix: handle file reader cancellation and empty nested concat#327

Merged: CurtHagenlocher merged 3 commits into apache:main from InCerryGit:fix/ipc-reader-correctness-guardrails on Apr 24, 2026

@InCerryGit
Contributor

Summary

This PR fixes correctness edge cases in two areas, the IPC file reader and array concatenation, found while reviewing those code paths.

File reader cancellation

ArrowFileReaderImplementation.ReadRecordBatchAsync(...) and
ReadNextRecordBatchAsync(...) accepted a caller cancellation token, but the
schema/footer loading step still called ReadSchemaAsync() without passing that
token.

That meant a canceled async file read could still continue through schema/footer
I/O before cancellation was observed later in the record batch path. This PR
passes the caller token into schema loading so cancellation is honored before
dictionary and record batch reads begin.
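
The effect of forwarding the token can be illustrated with a small Python model. This is only a sketch of the control flow, not the C# change itself: `CancellationToken`, `OperationCanceledError`, `FileReaderModel`, and the `forward_token` switch are hypothetical names invented for the illustration.

```python
class OperationCanceledError(Exception):
    """Hypothetical analogue of .NET's OperationCanceledException."""


class CancellationToken:
    """Hypothetical stand-in for a caller-supplied cancellation token."""

    def __init__(self, canceled=False):
        self.canceled = canceled

    def throw_if_cancellation_requested(self):
        if self.canceled:
            raise OperationCanceledError("read was canceled")


class FileReaderModel:
    """Toy reader that records which I/O steps actually ran."""

    def __init__(self):
        self.io_log = []
        self.schema_loaded = False

    def read_schema(self, token):
        # Schema/footer loading observes whatever token it is given.
        token.throw_if_cancellation_requested()
        self.io_log.append("footer+schema")
        self.schema_loaded = True

    def read_record_batch(self, index, token, forward_token=True):
        if not self.schema_loaded:
            # forward_token=False models the old behavior: schema loading
            # runs with a fresh, never-canceled token.
            schema_token = token if forward_token else CancellationToken()
            self.read_schema(schema_token)
        token.throw_if_cancellation_requested()
        self.io_log.append(f"batch {index}")
```

In this model, a pre-canceled token with `forward_token=False` still logs the `footer+schema` step before the cancellation surfaces, while `forward_token=True` raises before any I/O is recorded.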

Empty nested array concatenation

ArrayDataConcatenator could drop the required child array structure when
concatenating nested list arrays whose parent arrays were all empty. Nested Arrow
arrays still need a valid zero-length child ArrayData; otherwise downstream
array construction can fail or produce structurally invalid nested arrays.

This PR preserves a zero-length child for all-empty nested/list inputs.
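
As a rough illustration of the invariant, here is a Python model of list concatenation in which each input is an `(offsets, child_values)` pair. The function name and data layout are assumptions made for this sketch and do not mirror `ArrayDataConcatenator`'s actual code.

```python
def concat_list_arrays(arrays):
    """Concatenate simplified list arrays.

    Each input is (offsets, child_values), where len(offsets) == length + 1
    and offsets may start above zero for a sliced array.
    """
    out_offsets = [0]
    out_child = []  # always materialized, even if nothing is ever appended
    for offsets, child in arrays:
        start, end = offsets[0], offsets[-1]
        # Rebase this input's offsets onto the end of the output child.
        base = out_offsets[-1] - start
        out_offsets.extend(o + base for o in offsets[1:])
        out_child.extend(child[start:end])
    return out_offsets, out_child
```

When every input is empty, the result is `([0], [])`: a zero-length parent that still carries a zero-length child rather than no child at all.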

Null-only ListView concatenation

For ListView and LargeListView, null rows can have size == 0, and their
offset values should not contribute to the child slice range. The previous logic
used every offset when computing the child bounds, even for zero-size entries.

This PR computes list-view child bounds only from entries with size > 0, so
null-only inputs keep an empty child slice instead of deriving a range from null
rows.
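
The bounds rule described above can be sketched in Python as follows. The helper name and the `(start, length)` return shape are assumptions for illustration only, not the signature used in the PR.

```python
def list_view_child_bounds(offsets, sizes):
    """Return (start, length) of the child range referenced by a list view.

    Entries with size == 0 (including null rows) are skipped, so their
    offsets cannot widen or shift the range.
    """
    lo = hi = None
    for offset, size in zip(offsets, sizes):
        if size <= 0:
            continue  # zero-size entry: its offset carries no information
        end = offset + size
        lo = offset if lo is None else min(lo, offset)
        hi = end if hi is None else max(hi, end)
    if lo is None:
        return 0, 0  # no non-empty entries: empty child slice
    return lo, hi - lo
```

For a null-only input such as offsets `[7, 3]` with sizes `[0, 0]`, this yields `(0, 0)`, whereas taking the range from every offset would derive bounds from values that refer to no child data.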

Validation

  • dotnet test test/Apache.Arrow.Tests/Apache.Arrow.Tests.csproj -c Release --filter "FullyQualifiedName~ArrowFileReaderTests.ReadRecordBatchAsync_HonorsPreCanceledTokenDuringSchemaRead|FullyQualifiedName~ArrowFileReaderTests.ReadNextRecordBatchAsync_HonorsPreCanceledTokenDuringSchemaRead|FullyQualifiedName~ArrowArrayConcatenatorTests.TestConcatenateAllEmpty|FullyQualifiedName~ArrowArrayConcatenatorTests.TestConcatenateNullOnly"
  • dotnet build Apache.Arrow.sln -c Release

Comment thread src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs
Ensure async record batch reads use the caller token while loading file schemas before reading dictionaries or batches.
Keep zero-length child data when concatenating empty nested arrays and avoid negative list-view child slices for null...
@InCerryGit force-pushed the fix/ipc-reader-correctness-guardrails branch from 762e2b4 to 2b61678 on April 24, 2026 14:42
Contributor

@CurtHagenlocher left a comment

Thanks for these fixes!

@CurtHagenlocher merged commit 7949fb6 into apache:main on Apr 24, 2026
14 checks passed

2 participants