feat: Add support for Run-End Encoded arrays #308
Introduced RunEndEncodedType and RunEndEncodedArray classes to represent run-end encoded arrays, including validation and logical length calculation. Integrated REE support into ArrowArrayFactory and IPC serialization/deserialization (ArrowStreamWriter, ArrowReaderImplementation, ArrowTypeFlatbufferBuilder, MessageSerializer). Added unit tests for REE array creation, validation, serialization, and indexing. This enables efficient handling of consecutive runs of the same value in Arrow .NET.
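For readers new to the layout, here is a minimal, language-neutral sketch of what "validation and logical length calculation" can mean for an REE array. It is written in plain Python rather than against the Apache.Arrow .NET API, and the function names (`run_end_encode`, `validate_ree`, `logical_length`) are hypothetical, chosen for illustration only. The checks follow the Arrow format's general rules for run ends (positive, strictly increasing, one per value); the checks actually performed by `RunEndEncodedArray` in this PR may differ.

```python
from typing import Any, List, Sequence, Tuple


def run_end_encode(items: Sequence[Any]) -> Tuple[List[int], List[Any]]:
    """Collapse consecutive equal items into (run_ends, values)."""
    run_ends: List[int] = []
    values: List[Any] = []
    for i, item in enumerate(items):
        if values and values[-1] == item:
            # Same value as the current run: extend the run's end.
            run_ends[-1] = i + 1
        else:
            # New value: start a new run ending just past this item.
            values.append(item)
            run_ends.append(i + 1)
    return run_ends, values


def validate_ree(run_ends: Sequence[int], values: Sequence[Any]) -> None:
    """Raise ValueError if the two children are inconsistent (assumed checks)."""
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    previous = 0
    for end in run_ends:
        if end <= previous:
            raise ValueError("run_ends must be positive and strictly increasing")
        previous = end


def logical_length(run_ends: Sequence[int]) -> int:
    """The logical length of an unsliced REE array is its last run end."""
    return run_ends[-1] if run_ends else 0
```

For example, `run_end_encode(["a", "a", "b", "c", "c", "c"])` yields run ends `[2, 3, 6]` and values `["a", "b", "c"]`, so six logical elements are stored as three physical runs.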
… API, the integration tests and the concatenator.
Pull request overview
Adds first-class support for Run-End Encoded (REE) arrays across Apache.Arrow .NET, integrating the new logical type into core type/array modeling, IPC read/write, C Data interface import/export, concatenation, and test coverage.
Changes:
- Introduces `ArrowTypeId.RunEndEncoded`, `RunEndEncodedType`, and `RunEndEncodedArray`, and wires them into visitors/factories.
- Extends IPC serialization/deserialization and JSON integration parsing to recognize/run REE schemas and arrays.
- Adds concatenation support and new/updated tests covering REE behavior (including IPC roundtrip and concatenation scenarios).
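As a rough illustration of the concatenation idea mentioned in the list above, the sketch below joins unsliced REE arrays by shifting each input's run ends by the logical length accumulated so far and appending the values unchanged. It is plain Python, not the code in `ArrayDataConcatenator.cs`; `concat_ree` and the `(run_ends, values)` tuple representation are assumptions made for this example, and sliced inputs (which the PR's tests do cover) are deliberately left out.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical in-memory stand-in for an REE array: (run_ends, values).
ReeArray = Tuple[List[int], List[Any]]


def concat_ree(arrays: Sequence[ReeArray]) -> ReeArray:
    """Concatenate unsliced REE arrays by offsetting run ends."""
    out_ends: List[int] = []
    out_values: List[Any] = []
    offset = 0  # logical length of everything appended so far
    for run_ends, values in arrays:
        if len(run_ends) != len(values):
            raise ValueError("run_ends and values must have the same length")
        # Shift this input's run ends so they continue after the previous input.
        out_ends.extend(end + offset for end in run_ends)
        out_values.extend(values)
        if run_ends:
            offset += run_ends[-1]
    return out_ends, out_values
```

This sketch does not merge adjacent runs that happen to hold equal values across an input boundary. The result is still a valid encoding, just not a minimal one; whether the PR merges such runs is not stated here.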
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/Apache.Arrow.Tests/TestData.cs | Adds REE fields and array creation support in test schema/data generation. |
| test/Apache.Arrow.Tests/TableTests.cs | Updates expected column counts due to added REE test columns. |
| test/Apache.Arrow.Tests/RunEndEncodedArrayTests.cs | New unit tests for REE type/array creation, validation, IPC roundtrip, and factory build. |
| test/Apache.Arrow.Tests/ArrowReaderVerifier.cs | Extends array comparison visitor to support RunEndEncodedArray. |
| test/Apache.Arrow.Tests/ArrowArrayConcatenatorTests.cs | Adds concatenation tests for REE arrays (incl. sliced inputs and mismatch errors). |
| test/Apache.Arrow.IntegrationTest/JsonFile.cs | Adds JSON integration parsing and array creation support for REE. |
| src/Apache.Arrow/Types/RunEndEncodedType.cs | New nested type representing REE (run_ends + values) with run_ends type validation. |
| src/Apache.Arrow/Types/IArrowType.cs | Adds ArrowTypeId.RunEndEncoded. |
| src/Apache.Arrow/Ipc/MessageSerializer.cs | Adds IPC schema/type deserialization for REE field types. |
| src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs | Adds flatbuffer type emission for REE type. |
| src/Apache.Arrow/Ipc/ArrowStreamWriter.cs | Adds IPC record batch buffer/node traversal for RunEndEncodedArray. |
| src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs | Updates reader buffer-count logic for REE arrays (no top-level buffers). |
| src/Apache.Arrow/C/CArrowSchemaImporter.cs | Adds C Data interface schema import support for REE (+r). |
| src/Apache.Arrow/C/CArrowSchemaExporter.cs | Adds C Data interface schema export format for REE (+r). |
| src/Apache.Arrow/C/CArrowArrayImporter.cs | Adds C Data interface array import support for REE children handling. |
| src/Apache.Arrow/Arrays/RunEndEncodedArray.cs | New array implementation for REE with logical length derivation and physical-index lookup. |
| src/Apache.Arrow/Arrays/ArrowArrayFactory.cs | Enables building RunEndEncodedArray from ArrayData. |
| src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs | Adds concatenation logic for REE arrays (run_ends adjustment + values concatenation). |
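The "physical-index lookup" noted for `RunEndEncodedArray.cs` maps a logical position to the run that contains it. One common way to do this is a binary search over the run ends, sketched below in plain Python. This is an illustrative assumption, not the PR's implementation: `find_physical_index` and its `offset` parameter (standing in for a slice offset) are hypothetical names, and the .NET code may use a different search strategy.

```python
from bisect import bisect_right
from typing import Sequence


def find_physical_index(
    run_ends: Sequence[int], logical_index: int, offset: int = 0
) -> int:
    """Return the index of the run that holds `logical_index`.

    `offset` models a sliced array whose logical element 0 sits at
    absolute position `offset` in the underlying run ends.
    """
    length = (run_ends[-1] if run_ends else 0) - offset
    if not 0 <= logical_index < length:
        raise IndexError("logical index out of range")
    # The containing run is the first one whose end is strictly greater
    # than the absolute position, which is what bisect_right returns.
    return bisect_right(run_ends, logical_index + offset)
```

With run ends `[2, 3, 6]`, logical indices 0 and 1 fall in run 0, index 2 in run 1, and indices 3 through 5 in run 2. The lookup costs O(log n) in the number of runs, not in the logical length.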
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
This all looks good to me, thanks Curt and @JorgeCandeias.
This feature is implemented as of the v23.0.0 release (PR #308), so it was removed from the "Not Implemented" section of the docs.
What's Changed
This PR adds basic support for Run-End Encoded arrays by following established codebase patterns.
Notably:
- `ArrowTypeId` added.
- `RunEndEncodedArray` added.
- `RunEndEncodedType` nested type.

Co-authored-by: Jorge Candeias jorge.candeias@outcompute.com
Supersedes #260