Feature Request / Improvement
Hi!
We are seeing that after we update a table's sort order, the new sort order is not reflected in newly written manifests.
This is reproducible by writing a unit test:
@TestTemplate
public void testSortOrder() {
  sql(
      "CREATE TABLE %s (col string, userid int, dt string) USING iceberg partitioned by (dt)",
      tableName);
  sql("ALTER TABLE %s WRITE ORDERED BY userid", tableName);
  sql("INSERT OVERWRITE %s PARTITION (dt='dt1') VALUES ('str1', 1)", tableName);
  Assert.assertEquals(
      "Should have 1 row with sort order id is 1, and column stats is not null",
      1L,
      scalarSql(
          "SELECT count(*) FROM %s.files where sort_order_id = 1 and column_sizes is not null",
          tableName));
}
And we see the following failure:
Should have 1 row with sort order id is 1, and column stats is not null
Expected :1
Actual :0
We're able to confirm that the files metadata table contains exactly 1 row:
Assert.assertEquals(
    "Should have 1 row",
    1L,
    scalarSql(
        "SELECT count(*) FROM %s.files",
        tableName));
Internally, we noticed that updating SparkWrite.createWriter() to create writerFactory with .dataSortOrder(table.sortOrder()) resolves this issue:
SparkFileWriterFactory writerFactory =
    SparkFileWriterFactory.builderFor(table)
        .dataFileFormat(format)
        .dataSchema(writeSchema)
        .dataSparkType(dsSchema)
        .dataSortOrder(table.sortOrder())
        .writeProperties(writeProperties)
        .build();
Thanks!
Query engine
None
Willingness to contribute