Follow-up from apache/iceberg-go#1110.
iceberg-rust uses Bucket(u32) / Truncate(u32), accepting N up to MaxUint32 (4,294,967,295). Java's reference impl uses int, capping at Integer.MAX_VALUE (2,147,483,647). iceberg-go just landed apache/iceberg-go#1110 which validates n <= math.MaxInt32 at parse time, aligning with Java. iceberg-cpp uses int32_t, so the upper bound matches Java via the type. PyIceberg uses PositiveInt with no explicit ceiling.
So among the five clients:
| client |
type |
upper bound on N |
| Java |
int |
MaxInt32 (2,147,483,647) |
| iceberg-go (post-#1110) |
int + parse check |
MaxInt32 |
| iceberg-cpp |
int32_t |
INT32_MAX |
| iceberg-rust |
u32 |
MaxUint32 (4,294,967,295) |
| PyIceberg |
PositiveInt |
unbounded |
The spec doesn't bound N explicitly — Integer.MAX_VALUE only appears as a mask operand in the bucket formula (murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N, not as a ceiling. So N > Integer.MAX_VALUE is undefined per-spec rather than disallowed, but Java is the de facto reference and three of the four other clients match.
Worth noting: iceberg-cpp's TransformFromString parses via ParseNumber<int32_t> (rejects overflow at parse time via the type), but the Bucket / Truncate constructors don't validate > 0, so bucket[0] parses cleanly and only fails later at runtime — same shape as pre-#1110 iceberg-go. Probably a separate small issue worth filing on cpp's side.
Doesn't bite today — no real table is plausibly partitioned with 2–4 billion buckets. But rust is currently the only client that could author a spec in (MaxInt32, MaxUint32], and such a spec would throw NumberFormatException in Java and fail to parse in iceberg-go post-#1110.
If u32 is intentional, a doc comment near Bucket / Truncate capturing the divergence would help future readers not read it as a parity gap. If it's incidental, narrowing to i32 (or a NonZero newtype) would converge with the reference impl.
Follow-up from apache/iceberg-go#1110.
iceberg-rust uses
Bucket(u32)/Truncate(u32), acceptingNup toMaxUint32(4,294,967,295). Java's reference impl usesint, capping atInteger.MAX_VALUE(2,147,483,647). iceberg-go just landed apache/iceberg-go#1110 which validatesn <= math.MaxInt32at parse time, aligning with Java. iceberg-cpp usesint32_t, so the upper bound matches Java via the type. PyIceberg usesPositiveIntwith no explicit ceiling.So among the five clients:
NintMaxInt32(2,147,483,647)int+ parse checkMaxInt32int32_tINT32_MAXu32MaxUint32(4,294,967,295)PositiveIntThe spec doesn't bound
Nexplicitly —Integer.MAX_VALUEonly appears as a mask operand in the bucket formula(murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N, not as a ceiling. SoN > Integer.MAX_VALUEis undefined per-spec rather than disallowed, but Java is the de facto reference and three of the four other clients match.Worth noting: iceberg-cpp's
TransformFromStringparses viaParseNumber<int32_t>(rejects overflow at parse time via the type), but theBucket/Truncateconstructors don't validate> 0, sobucket[0]parses cleanly and only fails later at runtime — same shape as pre-#1110 iceberg-go. Probably a separate small issue worth filing on cpp's side.Doesn't bite today — no real table is plausibly partitioned with 2–4 billion buckets. But rust is currently the only client that could author a spec in
(MaxInt32, MaxUint32], and such a spec would throwNumberFormatExceptionin Java and fail to parse in iceberg-go post-#1110.If
u32is intentional, a doc comment nearBucket/Truncatecapturing the divergence would help future readers not read it as a parity gap. If it's incidental, narrowing toi32(or aNonZeronewtype) would converge with the reference impl.