When writing an IPC file with multiple record batches, the schema provided to IpcFormatWriter is correctly written to the IPC file's footer. However, if a record batch carries its own batch-specific metadata, that metadata is not written.
This can be reproduced with the following test case (using pyarrow):
import pyarrow as pa


def test_chunked_record_batch_meta():
    num_batches = 2
    # chunk_size was not defined in the original snippet; 10 is an arbitrary choice.
    chunk_size = 10
    ipc_file = "/tmp/batches_with_metadata.arrow"
    int_array = pa.array(range(chunk_size))
    # Schema-level metadata, expected to end up in the IPC file footer.
    schema = pa.schema(
        [
            ("values", pa.int64()),
        ],
        metadata={"foo": "bar"},
    )
    writer = pa.RecordBatchFileWriter(ipc_file, schema)
    for i in range(num_batches):
        # follow examples here:
        # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
        # Each batch carries its own batch-specific metadata.
        batch = pa.record_batch(
            [int_array],
            names=["values"],
            metadata={"batch_id": str(i)},
        )
        writer.write_batch(batch)
    writer.close()

    # Read the file back and inspect the metadata of the first batch.
    mmapped_file = pa.memory_map(ipc_file)
    reader = pa.ipc.open_file(mmapped_file)
    batch_0 = reader.get_record_batch(0)
    assert batch_0.schema.metadata
Reporter: Yue Ni / @niyue
Assignee: Yue Ni / @niyue
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-16131. Please see the migration documentation for further details.