IPC writer not respecting with_preserve_dict_id(false) #6443

@brancz

Describe the bug

When with_preserve_dict_id(false) is set on the IpcWriteOptions of a StreamWriter and a record batch is written that contains multiple dictionary columns whose Fields in the Schema all have dict_id: 0, the last column's dictionary is used for every dictionary column when the data is read back.

In the best case this silently produces incorrect data; in the worst case it causes a panic. That is how I found it: my first dictionary had more entries than the second, which caused an out-of-bounds panic.
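To illustrate the symptom, here is a minimal std-only model of what a dictionary ID collision looks like. It is a sketch under the assumption that dictionaries are resolved by ID on the reading side. `DictColumn` and `round_trip` are hypothetical names, not arrow-rs APIs, and this is not the actual arrow-rs code path.

```rust
use std::collections::HashMap;

/// Hypothetical model of a dictionary-encoded column (not an arrow-rs type):
/// integer keys that index into a list of dictionary values.
pub struct DictColumn {
    pub dict_id: i64,
    pub keys: Vec<usize>,
    pub values: Vec<&'static str>,
}

/// Models a round-trip in which dictionaries are stored and looked up by ID.
/// If two columns share an ID, the later dictionary overwrites the earlier one.
/// A column decodes to `None` when one of its keys is out of bounds for the
/// dictionary it resolves to (the model's stand-in for the panic).
pub fn round_trip(columns: &[DictColumn]) -> Vec<Option<Vec<&'static str>>> {
    let mut by_id: HashMap<i64, &[&'static str]> = HashMap::new();
    for col in columns {
        // Assumption: last writer wins when IDs collide.
        by_id.insert(col.dict_id, col.values.as_slice());
    }
    columns
        .iter()
        .map(|col| {
            let dict = by_id[&col.dict_id];
            col.keys.iter().map(|&k| dict.get(k).copied()).collect()
        })
        .collect()
}
```

With two columns that both carry dict_id 0, keys [0, 1], and values ["c", "d"] and ["e", "f"], the model decodes both columns as ["e", "f"], which matches the failing assertion below. If the first column also has a key 2 and a third dictionary value, it decodes to `None`, the analogue of the out-of-bounds panic.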

To Reproduce

Running https://gist.github.com/brancz/067bfe6c9f9dfa7a7db82da1757e0edc results in:

assertion `left == right` failed
  left: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "b", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [DictionaryArray {keys: PrimitiveArray<Int32>
[
  0,
  1,
] values: StringArray
[
  "e",
  "f",
]}
, DictionaryArray {keys: PrimitiveArray<Int32>
[
  0,
  1,
] values: StringArray
[
  "e",
  "f",
]}
], row_count: 2 }
 right: RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "b", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [DictionaryArray {keys: PrimitiveArray<Int32>
[
  0,
  1,
] values: StringArray
[
  "c",
  "d",
]}
, DictionaryArray {keys: PrimitiveArray<Int32>
[
  0,
  1,
] values: StringArray
[
  "e",
  "f",
]}
], row_count: 2 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Expected behavior

Each dictionary column should be written and read back with its own dictionary when with_preserve_dict_id(false) is set, i.e. when the dict_id on the Schema's Fields is not preserved.
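One way to picture the expected behavior is the following std-only sketch. It assumes that "not preserving" means the writer ignores the declared dict_id and hands out a fresh, unique ID per dictionary field in schema order. `DictField` and `wire_dict_ids` are hypothetical names used for illustration; the real arrow-rs writer may assign IDs differently.

```rust
/// Hypothetical model of a schema field holding a dictionary
/// (not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct DictField {
    pub name: String,
    pub dict_id: i64,
}

/// Returns the dictionary ID that would be used on the wire for each field.
/// With `preserve == true` the declared IDs are passed through unchanged.
/// Otherwise each field gets the next value of a counter, so the IDs are
/// unique even if every field declares dict_id 0 (an assumed policy).
pub fn wire_dict_ids(fields: &[DictField], preserve: bool) -> Vec<i64> {
    if preserve {
        fields.iter().map(|f| f.dict_id).collect()
    } else {
        (0..fields.len() as i64).collect()
    }
}
```

For the two fields "a" and "b" from the reproduction, both declared with dict_id 0, this sketch yields [0, 1] when IDs are not preserved, so the two dictionaries can no longer collide. With preservation enabled it yields [0, 0], and keeping the IDs distinct is left to the caller.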

@alamb @tustvold

Labels: arrow, arrow-flight, bug, next-major-release, parquet
