WIP - Fix bug JSON input barfs on {"emptylist":[]} #1063
one_column_roundtrip("list_single_column", values, true, Some(SMALL_SIZE / 2));
}

#[test]
💯 for test driven development 👍
dbg!(&self);
let is_nullable = match self.level_type {
    LevelType::Primitive(is_nullable) => is_nullable,
    LevelType::List(is_nullable) => is_nullable,
This is the only change to the code path. It addresses the panic we were seeing, where the `List` level type was being picked up by the catch-all arm of the match.
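To illustrate the change described above, here is a minimal sketch using a hypothetical stand-in for `LevelType` (the `Struct` variant and both function names are assumptions, and the sketch returns an `Err` where the original code path panicked). It only shows how an explicit `List` arm keeps that variant out of the catch-all.

```rust
// Hypothetical stand-in for the crate's LevelType. Only `Primitive` and
// `List` appear in the diff; `Struct` is an assumption for illustration.
#[derive(Debug, Clone, Copy)]
enum LevelType {
    Primitive(bool),
    List(bool),
    Struct(bool),
}

// Before the change: `List` has no arm of its own, so it reaches the
// catch-all. The sketch reports an error instead of panicking.
fn is_nullable_before(level_type: LevelType) -> Result<bool, String> {
    match level_type {
        LevelType::Primitive(is_nullable) => Ok(is_nullable),
        other => Err(format!("unsupported level type: {:?}", other)),
    }
}

// After the change: `List` is matched explicitly and yields its flag.
fn is_nullable_after(level_type: LevelType) -> Result<bool, String> {
    match level_type {
        LevelType::Primitive(is_nullable) => Ok(is_nullable),
        LevelType::List(is_nullable) => Ok(is_nullable),
        other => Err(format!("unsupported level type: {:?}", other)),
    }
}

fn main() {
    let list = LevelType::List(true);
    println!("before: {:?}", is_nullable_before(list));
    println!("after:  {:?}", is_nullable_after(list));
    // Other variants still reach the catch-all in both versions.
    println!("struct: {:?}", is_nullable_after(LevelType::Struct(false)));
}
```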
// one_column_roundtrip("null_list_single_column", values, true, Some(SMALL_SIZE / 2));

let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();
roundtrip("test_null_list_single_column.parquet", batch, None);
The error I am now seeing comes from the reader being unable to read the parquet that is written out.
thread 'arrow::arrow_writer::tests::null_list_single_column' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to convert parquet INT32 logical type Some(UNKNOWN(NullType)) or converted type NONE")', parquet/src/arrow/array_reader.rs:1502:14
I have peppered the code with dbg! to see if I can make sense of what's going on but so far all I have is a hunch that we've also got a potential bug in the reader. Detailed output from this test appears below:
---- arrow::arrow_writer::tests::null_list_single_column stdout ----
[parquet/src/arrow/levels.rs:754] &self = LevelInfo {
definition: [
1,
],
repetition: Some(
[
0,
],
),
array_offsets: [
0,
0,
],
array_mask: [
true,
],
max_definition: 2,
level_type: List(
true,
),
offset: 0,
length: 0,
}
[parquet/src/arrow/levels.rs:788] &filtered = []
[parquet/src/arrow/array_reader.rs:1844] &context = ArrayReaderBuilderContext {
def_level: 0,
rep_level: 0,
path: ColumnPath {
parts: [
"arrow_schema",
],
},
}
[parquet/src/arrow/array_reader.rs:1845] &child = GroupType {
basic_info: BasicTypeInfo {
name: "emptylist",
repetition: Some(
OPTIONAL,
),
converted_type: LIST,
logical_type: Some(
LIST(
ListType,
),
),
id: None,
},
fields: [
GroupType {
basic_info: BasicTypeInfo {
name: "list",
repetition: Some(
REPEATED,
),
converted_type: NONE,
logical_type: None,
id: None,
},
fields: [
PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
},
],
},
],
}
[parquet/src/arrow/array_reader.rs:1495] &list_type = GroupType {
basic_info: BasicTypeInfo {
name: "emptylist",
repetition: Some(
OPTIONAL,
),
converted_type: LIST,
logical_type: Some(
LIST(
ListType,
),
),
id: None,
},
fields: [
GroupType {
basic_info: BasicTypeInfo {
name: "list",
repetition: Some(
REPEATED,
),
converted_type: NONE,
logical_type: None,
id: None,
},
fields: [
PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
},
],
},
],
}
[parquet/src/arrow/array_reader.rs:1496] &context = ArrayReaderBuilderContext {
def_level: 0,
rep_level: 0,
path: ColumnPath {
parts: [
"arrow_schema",
],
},
}
[parquet/src/arrow/array_reader.rs:1497] &item_type = PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
}
[parquet/src/arrow/array_reader.rs:1498] &new_context = ArrayReaderBuilderContext {
def_level: 2,
rep_level: 1,
path: ColumnPath {
parts: [
"arrow_schema",
"emptylist",
],
},
}
thread 'arrow::arrow_writer::tests::null_list_single_column' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to convert parquet INT32 logical type Some(UNKNOWN(NullType)) or converted type NONE")', parquet/src/arrow/array_reader.rs:1502:14
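The panic message suggests a type conversion with no arm for an INT32 column annotated as `UNKNOWN`. The sketch below reproduces that failure mode with hypothetical stand-in enums and functions (none of these names are the parquet crate's own types), and shows one possible direction: treating the `UNKNOWN` annotation as a null type. This is an assumption for illustration, not the fix adopted in this PR.

```rust
// Hypothetical stand-ins; these are NOT the parquet crate's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    Integer,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Int32,
    Int64,
    Null,
}

// A strict mapping with no arm for `Unknown`: it fails in the same way
// as the conversion reported in the panic message.
fn to_arrow_strict(
    physical: PhysicalType,
    logical: Option<LogicalType>,
) -> Result<ArrowType, String> {
    match (physical, logical) {
        (PhysicalType::Int32, None)
        | (PhysicalType::Int32, Some(LogicalType::Integer)) => Ok(ArrowType::Int32),
        (PhysicalType::Int64, None)
        | (PhysicalType::Int64, Some(LogicalType::Integer)) => Ok(ArrowType::Int64),
        (p, l) => Err(format!("Unable to convert parquet {:?} logical type {:?}", p, l)),
    }
}

// One possible direction (an assumption): map `Unknown` to a null type
// before falling back to the strict mapping.
fn to_arrow_with_null(
    physical: PhysicalType,
    logical: Option<LogicalType>,
) -> Result<ArrowType, String> {
    match (physical, logical) {
        (_, Some(LogicalType::Unknown)) => Ok(ArrowType::Null),
        _ => to_arrow_strict(physical, logical),
    }
}

fn main() {
    let unknown = Some(LogicalType::Unknown);
    println!("{:?}", to_arrow_strict(PhysicalType::Int32, unknown));
    println!("{:?}", to_arrow_with_null(PhysicalType::Int32, unknown));
}
```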
@alamb FYI I am afk until early Jan now so won't be making any progress on this until then.

Still here and interested. Sorry for the hiatus - will try and spin back up shortly. Confirming that this is still relevant pls @alamb
👋 @novemberkilo #1036 is still open and I assume it is still relevant, but we will have to confirm with @chadbrewbaker I think |
@alamb I have an update.
I am not sure whether I have uncovered a bug with
I am going to close this PR for now because it is stale and a bit noisy. I will recreate a clean PR when I am clearer about the outcome of the code that appears here. Copying @tustvold because I wonder if this is now close to the work you've done recently in #1246

Thanks @novemberkilo
Which issue does this PR close?
Closes #1036
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?