Closed
Labels
arrow: Changes to the arrow crate
arrow-flight: Changes to the arrow-flight crate
bug
next-major-release: the PR has API changes and is waiting on the next major version
parquet: Changes to the parquet crate
Description
Describe the bug
When with_preserve_dict_id(false) is set on the IpcWriteOptions of a StreamWriter and a record batch is written that contains multiple dictionary columns whose Fields in the Schema all have dict_id: 0, the last column's dictionary is used for every one of those columns.
In the best case this silently produces incorrect data; in the worst case it causes a panic. That is how I found the problem: my first dictionary had more entries than the second, which caused an out-of-bounds panic.
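The symptom can be illustrated with a toy model that uses only the standard library. This is a sketch under an assumption, not the arrow-rs implementation: it supposes that dictionaries are looked up by dict_id alone, so two columns sharing dict_id 0 end up sharing one slot. ToyDictColumn and roundtrip_by_dict_id are hypothetical names made up for this illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary-encoded column (hypothetical, not an arrow-rs type):
/// a dict_id, the dictionary values, and keys that index into those values.
struct ToyDictColumn {
    dict_id: i64,
    values: Vec<&'static str>,
    keys: Vec<usize>,
}

/// Hypothetical round trip in which dictionaries are stored by dict_id alone
/// (an assumption made for illustration). Returns the decoded strings per
/// column, or None if any key is out of bounds for the dictionary it resolves to.
fn roundtrip_by_dict_id(columns: &[ToyDictColumn]) -> Option<Vec<Vec<&'static str>>> {
    let mut by_id: HashMap<i64, &Vec<&'static str>> = HashMap::new();
    for col in columns {
        // A later column with the same dict_id overwrites the earlier entry.
        by_id.insert(col.dict_id, &col.values);
    }
    columns
        .iter()
        .map(|col| {
            let dict = by_id.get(&col.dict_id)?;
            col.keys
                .iter()
                .map(|&k| dict.get(k).copied())
                .collect::<Option<Vec<_>>>()
        })
        .collect()
}
```

With two columns that both carry dict_id 0 and hold ["c", "d"] and ["e", "f"], both columns decode against ["e", "f"]. If the first column has a key that only exists in its own, larger dictionary, the lookup goes out of bounds, which the toy reports as None and which corresponds to the panic case described above.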
To Reproduce
Running https://gist.github.com/brancz/067bfe6c9f9dfa7a7db82da1757e0edc results in:
assertion `left == right` failed
left: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "b", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [DictionaryArray {keys: PrimitiveArray<Int32>
[
0,
1,
] values: StringArray
[
"e",
"f",
]}
, DictionaryArray {keys: PrimitiveArray<Int32>
[
0,
1,
] values: StringArray
[
"e",
"f",
]}
], row_count: 2 }
right: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "b", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [DictionaryArray {keys: PrimitiveArray<Int32>
[
0,
1,
] values: StringArray
[
"c",
"d",
]}
, DictionaryArray {keys: PrimitiveArray<Int32>
[
0,
1,
] values: StringArray
[
"e",
"f",
]}
], row_count: 2 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Expected behavior
Each dictionary column is read back with its own dictionary when the dict_id of the Schema's Fields is requested not to be preserved.
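One way to picture the expected behavior is a toy model in which the dict_id from the schema is ignored and every dictionary column gets a fresh id based on its position. This is only a sketch using the standard library; it is not the arrow-rs implementation, and ToyDictColumn and roundtrip_with_fresh_ids are hypothetical names made up for this illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary-encoded column (hypothetical, not an arrow-rs type).
struct ToyDictColumn {
    /// The id from the schema; deliberately ignored below.
    #[allow(dead_code)]
    dict_id: i64,
    values: Vec<&'static str>,
    keys: Vec<usize>,
}

/// Hypothetical round trip that does not preserve the schema's dict_id:
/// each column is given a fresh id equal to its position, so dictionaries
/// never share a slot. Returns None if a key is out of bounds.
fn roundtrip_with_fresh_ids(columns: &[ToyDictColumn]) -> Option<Vec<Vec<&'static str>>> {
    let mut by_id: HashMap<i64, &Vec<&'static str>> = HashMap::new();
    for (fresh_id, col) in columns.iter().enumerate() {
        // Key by the freshly assigned id, not by col.dict_id.
        by_id.insert(fresh_id as i64, &col.values);
    }
    columns
        .iter()
        .enumerate()
        .map(|(fresh_id, col)| {
            let dict = by_id.get(&(fresh_id as i64))?;
            col.keys
                .iter()
                .map(|&k| dict.get(k).copied())
                .collect::<Option<Vec<_>>>()
        })
        .collect()
}
```

In this model, two columns that both declare dict_id 0 and hold ["c", "d"] and ["e", "f"] decode to their own values, which matches the right-hand side of the failed assertion above.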