When ArrowStreamWriter writes a RecordBatch that contains nulls, it mixes up the columns' NullCount values.
You can see it in arrow/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs, lines 195 to 200 at commit 90affbd:
for (var i = 0; i < fieldCount; i++)
{
    var fieldArray = recordBatch.Column(i);
    fieldNodeOffsets[i] =
        Flatbuf.FieldNode.CreateFieldNode(Builder, fieldArray.Length, fieldArray.NullCount);
}
It writes the field nodes in order from 0 to fieldCount. Further down, however, it writes the fields in the opposite order, from fieldCount down to 0.
The Java implementation has this comment:
// struct vectors have to be created in reverse order
A simple test of roundtripping the following RecordBatch shows the issue:
using Apache.Arrow;
using Apache.Arrow.Types;

var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        // "age": one value whose validity bit is 0, i.e. a null
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        // "CharCount": one non-null value, so no validity bitmap is needed
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);
Here, the "age" column should contain one null. However, after writing this RecordBatch and reading it back, the "CharCount" column has NullCount == 1 and the "age" column has NullCount == 0.
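The swap can be reproduced with a toy model of the ordering rule. A FlatBuffers builder fills its buffer back-to-front, so the elements of a vector must be added in reverse order; that is what the Java comment refers to. The sketch below is not the real FlatBuffers or Apache.Arrow API: BuildNodesVector is a hypothetical helper that simply prepends each (Length, NullCount) node, and a reader then pairs the i-th node with the i-th schema field. It assumes the column values from the test above and suggests, without claiming this is the actual patch, that creating the nodes from fieldCount - 1 down to 0 restores the correct pairing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a FlatBuffers-style vector builder (NOT the real
// FlatBufferBuilder API). Each node is prepended, mimicking back-to-front
// writing, so the last node written is the first one a reader sees.
static List<(int Length, int NullCount)> BuildNodesVector(
    IEnumerable<(int Length, int NullCount)> nodesInWriteOrder)
{
    var buffer = new LinkedList<(int Length, int NullCount)>();
    foreach (var node in nodesInWriteOrder)
    {
        buffer.AddFirst(node);
    }
    return buffer.ToList();
}

// Schema fields and their field nodes, taken from the test RecordBatch:
// "age" has one null, "CharCount" has none.
string[] fieldNames = { "age", "CharCount" };
(int Length, int NullCount)[] fieldNodes = { (1, 1), (1, 0) };

// Buggy order: nodes written from 0 to fieldCount - 1.
var buggy = BuildNodesVector(fieldNodes);

// Reversed order: nodes written from fieldCount - 1 down to 0.
var fixedOrder = BuildNodesVector(Enumerable.Reverse(fieldNodes));

// A reader matches nodes to schema fields purely by position.
for (var i = 0; i < fieldNames.Length; i++)
{
    Console.WriteLine(
        $"{fieldNames[i]}: buggy NullCount = {buggy[i].NullCount}, " +
        $"reversed NullCount = {fixedOrder[i].NullCount}");
}
```

With the forward write order, "age" reads back NullCount 0 and "CharCount" reads back 1, which is the symptom reported above; writing in reverse pairs each count with its own column again.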
Reporter: Eric Erhardt / @eerhardt
Assignee: Eric Erhardt / @eerhardt
PRs and other links:
Note: This issue was originally created as ARROW-5887. Please see the migration documentation for further details.