Core: Make partition field in TrackedFile optional#17000
Conversation
d6cf350 to
ebe1087
Compare
| /** Adapts {@link TrackedFile} entries to the {@link DataFile} and {@link DeleteFile} APIs. */ | ||
| class TrackedFileAdapters { | ||
|
|
||
| static final Types.StructType EMPTY_STRUCT_TYPE = Types.StructType.of(); |
There was a problem hiding this comment.
we can probably use the package private constants from the BaseFile class.
There was a problem hiding this comment.
You're right. I removed introducing it here and reuse the on in BaseFile
| @Override | ||
| public StructLike partition() { | ||
| return file().partition(); | ||
| return file().partition() != null ? file().partition() : EMPTY_PARTITION_DATA; |
There was a problem hiding this comment.
nit: We could use the MoreObjects.firstNonNull method. Feel free to disregard this comment since this method isn't widely used in the codebase.
There was a problem hiding this comment.
I just did a grep on the Java codebase, apparently, there is a single usage of this. I'd prefer to keep the ternary as is to follow the pattern around these files.
ebe1087 to
21f8c80
Compare
gaborkaszab
left a comment
There was a problem hiding this comment.
Thanks for the reviews, @stevenzwu , @ebyhr !
| // partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its | ||
| // ordinal is looked up by field ID. | ||
| private static final int PARTITION_ORDINAL = ordinalOf(TrackedFile.PARTITION_ID); | ||
| private static final int CONTENT_STATS_ORDINAL = ordinalOf(TrackedFile.CONTENT_STATS_ID); |
There was a problem hiding this comment.
Note, this was unused, removed as part of this PR
There was a problem hiding this comment.
Please revert unnecessary changes.
There was a problem hiding this comment.
Found an overkill to open a new PR to remove an unused variable from the tests and rewrite/remove the relevant comment.
Reverted these now, opened a separate PR: #17031
| /** Adapts {@link TrackedFile} entries to the {@link DataFile} and {@link DeleteFile} APIs. */ | ||
| class TrackedFileAdapters { | ||
|
|
||
| static final Types.StructType EMPTY_STRUCT_TYPE = Types.StructType.of(); |
There was a problem hiding this comment.
You're right. I removed introducing it here and reuse the on in BaseFile
| // should return EMPTY_PARTITION_DATA | ||
| assertThat(file.partition()).isNotNull(); | ||
| assertThat(file.partition().size()).isEqualTo(0); | ||
| assertThat(file.partition()).isNull(); |
There was a problem hiding this comment.
curious and non-blocking, if the intention is projection without partition. Do we need a coverage for projection where we have a valid/nonnull partition but not included in the projection as well?
There was a problem hiding this comment.
This is okay, but callers should not access fields that have not been projected and we make no guarantees about it when they do. In many cases, we fail if possible like when accessing DataFile.recordCount() without projecting, which results in a NullPointerException because recordCount() returns a primitive long.
| @Override | ||
| public StructLike partition() { | ||
| return file().partition(); | ||
| return file().partition() != null ? file().partition() : BaseFile.EMPTY_PARTITION_DATA; |
There was a problem hiding this comment.
Instead of relying on BaseFile, I'd prefer to move this constant to PartitionData.empty() or similar.
There was a problem hiding this comment.
I introduced PartitionData.EMPTY for this
| // partition and content_stats are rebuilt with the supplied struct types inside | ||
| // schemaWithContentStats, so their ordinals are looked up by field ID. | ||
| // partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its | ||
| // ordinal is looked up by field ID. |
There was a problem hiding this comment.
I would rather not have comments than have these that get updated every time an LLM touches the file.
There was a problem hiding this comment.
seem my comment below
rdblue
left a comment
There was a problem hiding this comment.
Overall this looks okay. The main thing is that we should not use BaseFile.EMPTY_PARTITION_DATA without moving it to a better place. Tests look fine for this update.
21f8c80 to
4ed590c
Compare
gaborkaszab
left a comment
There was a problem hiding this comment.
Thanks for taking a look, @rdblue !
| @Override | ||
| public StructLike partition() { | ||
| return file().partition(); | ||
| return file().partition() != null ? file().partition() : BaseFile.EMPTY_PARTITION_DATA; |
There was a problem hiding this comment.
I introduced PartitionData.EMPTY for this
| // partition and content_stats are rebuilt with the supplied struct types inside | ||
| // schemaWithContentStats, so their ordinals are looked up by field ID. | ||
| // partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its | ||
| // ordinal is looked up by field ID. |
There was a problem hiding this comment.
seem my comment below
| // partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its | ||
| // ordinal is looked up by field ID. | ||
| private static final int PARTITION_ORDINAL = ordinalOf(TrackedFile.PARTITION_ID); | ||
| private static final int CONTENT_STATS_ORDINAL = ordinalOf(TrackedFile.CONTENT_STATS_ID); |
There was a problem hiding this comment.
Found an overkill to open a new PR to remove an unused variable from the tests and rewrite/remove the relevant comment.
Reverted these now, opened a separate PR: #17031
No description provided.