feat: support v3 column default values in UpdateSchema (3/4)#793
feat: support v3 column default values in UpdateSchema (3/4)#793huan233usc wants to merge 4 commits into
Conversation
| bool is_defaulted_add = false; | ||
| // A column added in this update with a default value can be made required: rows | ||
| // written before the change read the initial-default instead of null. | ||
| bool is_defaulted_add = added_name_to_id_.contains(CaseSensitivityAwareName(name)) && |
There was a problem hiding this comment.
For map/list nested adds, AddColumnInternal stores the new field in added_name_to_id_ using the canonical internal path:
added_name_to_id_[CaseSensitivityAwareName(full_name)] = new_id;
where full_name becomes something like:
locations.value.alt
points.element.z
But RequireColumn receives and checks the user-facing shorthand path:
added_name_to_id_.contains(CaseSensitivityAwareName(name))
where name is typically:
locations.alt
points.z
So the lookup misses, even though it refers to the same added column
There was a problem hiding this comment.
Thanks for catching this -- yes it stored only the canonical full_name but callers use the short name. Fixed by also indexing the short name (matching how the schema name index handles map value / list element paths), with a regression test for requiring a defaulted nested column by short name in the same transaction.
It seems that same bug exists in Java; working on a fix apache/iceberg#17034.
f51fb4a to
d87a2b8
Compare
AddColumn / AddRequiredColumn now accept an optional default value, used as both the initial-default and write-default of the new column; a non-null default also lets a required column be added (or an added column be made required) without AllowIncompatibleChanges(). UpdateColumnDefault sets or clears the write-default. Defaults are cast to the column type (rejecting uncastable or out-of-range values) and preserved across rename / doc / type updates and nested field-id reassignment. Part 4 of the v3 column-default-values work (POC apache#731), built on apache#746.
d87a2b8 to
1eb9681
Compare
| *new_default); | ||
| ICEBERG_BUILDER_ASSIGN_OR_RETURN_WITH_ERROR( | ||
| Literal typed_default, | ||
| new_default->CastTo(internal::checked_pointer_cast<PrimitiveType>(field.type())), |
There was a problem hiding this comment.
Could we share the decimal default promotion logic used above in UpdateColumn here too? Literal::CastTo only returns decimal values when source and target DecimalType are exactly equal; it does not handle same-scale precision widening. That means adding decimal(18,2) with Literal::Decimal(..., 9, 2) is rejected even though this is the same allowed promotion handled above. The same issue appears in AddColumnInternal.
There was a problem hiding this comment.
Done — shared it into a CastDefaultToType helper used in all three spots, with tests. Thanks!
Extract the decimal same-scale precision-widening logic into a shared CastDefaultToType helper and use it in UpdateColumn, UpdateColumnDefault, and AddColumnInternal. Previously only UpdateColumn handled it, so setting or adding a decimal default whose precision differs from the column (e.g. Decimal(_,9,2) into a decimal(18,2) column) was wrongly rejected.
| target_type->type_id() == TypeId::kDecimal) { | ||
| const auto& decimal_type = internal::checked_cast<const DecimalType&>(*target_type); | ||
| return Literal::Decimal(std::get<Decimal>(value.value()).value(), | ||
| decimal_type.precision(), decimal_type.scale()); |
There was a problem hiding this comment.
Could we guard this decimal fast path with a same-scale check before rebuilding the literal? Decimal stores only the unscaled value, so accepting decimal(9,3) as a default for decimal(18,2) would reinterpret 1234 from 1.234 to 12.34 instead of rejecting it. Java default parsing/serialization requires the decimal default scale to match the column type, so only same-scale precision widening should be accepted here.
There was a problem hiding this comment.
Good catch — added a same-scale guard so different-scale defaults fall through to CastTo and get rejected. Tested both paths. Thanks!
Guard the decimal default fast path in CastDefaultToType with a same-scale check. Decimal stores only the unscaled value, so rebuilding a decimal(9,3) default as decimal(18,2) would reinterpret 1234 from 1.234 to 12.34. Only same-scale precision widening is accepted; a differing scale falls through to CastTo and is rejected, matching Java's requirement that a decimal default's scale match the column type.
What
Part 3 of 4 of Iceberg v3 column default-value support (POC #731), built on the
schema-layer support merged in #746. Independent of the read-path PRs (#792 and
the Avro follow-up).
Adds default-value handling to
UpdateSchema(schema evolution).Changes
AddColumn/AddRequiredColumntake an optionaldefault_value. Whenprovided it is set as both the column's
initial-defaultandwrite-default.A non-null default also lets a required column be added without
AllowIncompatibleChanges()— rows written before the change read the defaultinstead of null.
UpdateColumnDefault(name, default)(new) sets, or clears withstd::nullopt, a column'swrite-default; theinitial-defaultis fixed whenthe column is added.
values) and preserved across rename / doc / type-promotion updates and
nested field-id reassignment.
RequireColumnmay now mark a column that was added with a default required.The
SchemaFieldconstructor stores defaults verbatim (it does not coercethem), so the cast/promotion is performed explicitly at each evolution site —
the same effect as Java, where
NestedField's constructor runscastDefault.Same-scale decimal precision widening is handled directly (the unscaled value is
unchanged), since
Literal::CastTodoes not cast between decimal types.Tests
13 cases in
update_schema_test.cc: add optional/required/nested column with adefault, mismatched/narrowing rejection,
UpdateColumnDefault(set / clear / cast-to-type / pre-existing column), require-after-default, and
preservation across doc updates and type promotion — including same-scale
decimal precision promotion.
Stack
addColumn/updateColumnDefault