
feat: Add support for Run-End Encoded arrays #308

Merged
CurtHagenlocher merged 7 commits into apache:main from CurtHagenlocher:run-end-encoding
Apr 8, 2026

Conversation

@CurtHagenlocher
Contributor

What's Changed

This PR adds basic support for Run-End Encoded arrays by following established codebase patterns.

Notably:

  • New ArrowTypeId added.
  • New array type RunEndEncodedArray added.
  • New visitor method to handle the new array type.
  • New entry in the IPC serializer field type switch.
  • New RunEndEncodedType nested type.
  • Basic feature tests.
  • C API support.
  • Concatenation support.

Co-authored-by: Jorge Candeias <jorge.candeias@outcompute.com>

Supersedes #260

JorgeCandeias and others added 4 commits February 13, 2026 00:29
Introduced RunEndEncodedType and RunEndEncodedArray classes to represent run-end encoded arrays, including validation and logical length calculation. Integrated REE support into ArrowArrayFactory and IPC serialization/deserialization (ArrowStreamWriter, ArrowReaderImplementation, ArrowTypeFlatbufferBuilder, MessageSerializer). Added unit tests for REE array creation, validation, serialization, and indexing. This enables efficient handling of consecutive runs of the same value in Arrow .NET.
… API, the integration tests and the concatenator.
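
For readers unfamiliar with the layout, the idea of storing "consecutive runs of the same value" can be illustrated with a small, language-neutral sketch. It is written in Python rather than C#, and the names `ree_encode` and `ree_decode` are hypothetical helpers invented for this illustration; they are not part of Apache.Arrow. Each run is represented by one entry in `values` and one entry in `run_ends`, where a run end is the exclusive logical index at which that run stops.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ree_encode(items: Sequence[T]) -> Tuple[List[int], List[T]]:
    """Return (run_ends, values) for a sequence of logical values.

    Hypothetical helper for illustration only, not the Arrow .NET API.
    """
    run_ends: List[int] = []
    values: List[T] = []
    for i, item in enumerate(items):
        if values and values[-1] == item:
            # Same value as the current run: extend the run by moving its end.
            run_ends[-1] = i + 1
        else:
            # New run: its end is the exclusive logical index i + 1.
            values.append(item)
            run_ends.append(i + 1)
    return run_ends, values


def ree_decode(run_ends: Sequence[int], values: Sequence[T]) -> List[T]:
    """Expand (run_ends, values) back into the full logical sequence."""
    out: List[T] = []
    start = 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out


run_ends, values = ree_encode(["a", "a", "a", "b", "c", "c"])
print(run_ends, values)  # [3, 4, 6] ['a', 'b', 'c']
```

In the Arrow columnar format the run ends are stored in a signed 16-, 32-, or 64-bit integer child array; the plain Python `int` used here ignores that width restriction.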

Copilot AI left a comment

Pull request overview

Adds first-class support for Run-End Encoded (REE) arrays across Apache.Arrow .NET, integrating the new logical type into core type/array modeling, IPC read/write, C Data interface import/export, concatenation, and test coverage.

Changes:

  • Introduces ArrowTypeId.RunEndEncoded, RunEndEncodedType, and RunEndEncodedArray, and wires them into visitors/factories.
  • Extends IPC serialization/deserialization and JSON integration parsing to recognize REE schemas and arrays.
  • Adds concatenation support and new/updated tests covering REE behavior (including IPC roundtrip and concatenation scenarios).
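
The file summary in this review describes `RunEndEncodedArray` as deriving a logical length and looking up physical indices. The following is a minimal sketch of how such a lookup can work, assuming an unsliced array (offset 0) and using Python for illustration. The function names are hypothetical and the PR's actual C# implementation may differ; the sketch uses a binary search over `run_ends` to find the first run whose end is greater than the requested logical index.

```python
from bisect import bisect_right
from typing import Sequence


def logical_length(run_ends: Sequence[int]) -> int:
    """Logical length of an unsliced REE array: the last run end, or 0 if empty."""
    return run_ends[-1] if run_ends else 0


def find_physical_index(run_ends: Sequence[int], logical_index: int) -> int:
    """Map a logical index to the index of the run (and value) that covers it.

    Hypothetical helper; assumes run_ends is strictly increasing and that the
    array has not been sliced.
    """
    if not 0 <= logical_index < logical_length(run_ends):
        raise IndexError(f"logical index {logical_index} is out of range")
    # bisect_right returns the position of the first run end that is strictly
    # greater than logical_index, which is the run containing that index.
    return bisect_right(run_ends, logical_index)


run_ends = [3, 4, 6]          # runs cover logical indices 0-2, 3, and 4-5
values = ["a", "b", "c"]
print(values[find_physical_index(run_ends, 3)])  # b
```

Because the search is logarithmic in the number of runs, random access stays cheap even when a single run spans many logical elements.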

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| test/Apache.Arrow.Tests/TestData.cs | Adds REE fields and array creation support in test schema/data generation. |
| test/Apache.Arrow.Tests/TableTests.cs | Updates expected column counts due to added REE test columns. |
| test/Apache.Arrow.Tests/RunEndEncodedArrayTests.cs | New unit tests for REE type/array creation, validation, IPC roundtrip, and factory build. |
| test/Apache.Arrow.Tests/ArrowReaderVerifier.cs | Extends array comparison visitor to support RunEndEncodedArray. |
| test/Apache.Arrow.Tests/ArrowArrayConcatenatorTests.cs | Adds concatenation tests for REE arrays (incl. sliced inputs and mismatch errors). |
| test/Apache.Arrow.IntegrationTest/JsonFile.cs | Adds JSON integration parsing and array creation support for REE. |
| src/Apache.Arrow/Types/RunEndEncodedType.cs | New nested type representing REE (run_ends + values) with run_ends type validation. |
| src/Apache.Arrow/Types/IArrowType.cs | Adds ArrowTypeId.RunEndEncoded. |
| src/Apache.Arrow/Ipc/MessageSerializer.cs | Adds IPC schema/type deserialization for REE field types. |
| src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs | Adds flatbuffer type emission for REE type. |
| src/Apache.Arrow/Ipc/ArrowStreamWriter.cs | Adds IPC record batch buffer/node traversal for RunEndEncodedArray. |
| src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs | Updates reader buffer-count logic for REE arrays (no top-level buffers). |
| src/Apache.Arrow/C/CArrowSchemaImporter.cs | Adds C Data interface schema import support for REE (+r). |
| src/Apache.Arrow/C/CArrowSchemaExporter.cs | Adds C Data interface schema export format for REE (+r). |
| src/Apache.Arrow/C/CArrowArrayImporter.cs | Adds C Data interface array import support for REE children handling. |
| src/Apache.Arrow/Arrays/RunEndEncodedArray.cs | New array implementation for REE with logical length derivation and physical-index lookup. |
| src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs | Adds concatenation logic for REE arrays (run_ends adjustment + values concatenation). |
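
The "run_ends adjustment + values concatenation" described for the concatenator can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the PR's C# code: `ree_concat` is a hypothetical name, inputs are plain `(run_ends, values)` pairs, and sliced inputs (which the PR's tests do cover) are not handled. Each input's run ends are shifted by the total logical length of the inputs before it, and the values are appended unchanged.

```python
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ree_concat(
    parts: Iterable[Tuple[Sequence[int], Sequence[T]]],
) -> Tuple[List[int], List[T]]:
    """Concatenate unsliced REE arrays given as (run_ends, values) pairs.

    Hypothetical helper for illustration; adjacent runs that happen to hold
    equal values are kept as separate runs rather than merged.
    """
    run_ends: List[int] = []
    values: List[T] = []
    offset = 0  # total logical length of the parts processed so far
    for part_run_ends, part_values in parts:
        if len(part_run_ends) != len(part_values):
            raise ValueError("run_ends and values must have the same length")
        # Shift this part's run ends so they continue after the previous parts.
        run_ends.extend(end + offset for end in part_run_ends)
        values.extend(part_values)
        if part_run_ends:
            offset += part_run_ends[-1]
    return run_ends, values


first = ([2, 3], ["a", "b"])    # logical: a a b
second = ([1, 4], ["b", "c"])   # logical: b c c c
print(ree_concat([first, second]))  # ([2, 3, 4, 7], ['a', 'b', 'b', 'c'])
```

Leaving the two adjacent `"b"` runs unmerged keeps the operation a single pass over the inputs; whether the PR merges such runs is not stated in this conversation.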

Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs Outdated
Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs
Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs
Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs
Comment thread src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs Outdated
Comment thread src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs
Comment thread src/Apache.Arrow/Types/RunEndEncodedType.cs

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Comment thread src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
Comment thread test/Apache.Arrow.Tests/RunEndEncodedArrayTests.cs
Comment thread src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs
Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs
Comment thread src/Apache.Arrow/C/CArrowArrayImporter.cs

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Comment thread src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
Comment thread src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
Comment thread src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
Comment thread src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
Comment thread src/Apache.Arrow/Arrays/RunEndEncodedArray.cs
Contributor

@adamreeve adamreeve left a comment

This all looks good to me, thanks Curt and @JorgeCandeias.

@CurtHagenlocher CurtHagenlocher merged commit c0f957a into apache:main Apr 8, 2026
14 checks passed
@CurtHagenlocher CurtHagenlocher deleted the run-end-encoding branch April 23, 2026 00:04
adamreeve pushed a commit that referenced this pull request May 18, 2026
This feature is implemented as of the v23.0.0 release (PR #308),
so it was removed from the "Not Implemented" section of the docs.

4 participants