Core: Make partition field in TrackedFile optional #17000

Open
gaborkaszab wants to merge 1 commit into apache:main from gaborkaszab:main_trackedfile_partition_optional

@gaborkaszab

Contributor

No description provided.

Comment thread core/src/main/java/org/apache/iceberg/TrackedFile.java
gaborkaszab force-pushed the main_trackedfile_partition_optional branch from d6cf350 to ebe1087 on June 29, 2026 14:08
/** Adapts {@link TrackedFile} entries to the {@link DataFile} and {@link DeleteFile} APIs. */
class TrackedFileAdapters {

static final Types.StructType EMPTY_STRUCT_TYPE = Types.StructType.of();

Contributor

we can probably use the package private constants from the BaseFile class.

Contributor Author

You're right. I dropped the new constant here and reused the one in BaseFile.

@Override
public StructLike partition() {
return file().partition();
return file().partition() != null ? file().partition() : EMPTY_PARTITION_DATA;

Member

nit: We could use the MoreObjects.firstNonNull method. Feel free to disregard this comment since this method isn't widely used in the codebase.

Contributor Author

I grepped the Java codebase, and apparently there is only a single usage of it. I'd prefer to keep the ternary as is, to follow the pattern used around these files.
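For comparison, both forms discussed in this thread return the fallback only when the value is null. A minimal sketch under stated assumptions: SimpleStruct, PartitionFallback and EMPTY are hypothetical stand-ins rather than Iceberg classes, and the JDK's Objects.requireNonNullElse (Java 9+) substitutes for Guava's MoreObjects.firstNonNull, which has the same contract (return the first argument if non-null, otherwise the second, throwing NullPointerException if both are null).

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical stand-in for StructLike; only size() is modeled.
interface SimpleStruct {
  int size();
}

final class PartitionFallback {
  // Hypothetical shared empty partition (stand-in for an EMPTY constant).
  static final SimpleStruct EMPTY = () -> 0;

  private PartitionFallback() {}

  // Ternary form, as in the PR. The getter is read once into a local here;
  // the PR's version calls file().partition() twice, which is harmless
  // as long as the getter has no side effects.
  static SimpleStruct withTernary(Supplier<SimpleStruct> partitionGetter) {
    SimpleStruct partition = partitionGetter.get();
    return partition != null ? partition : EMPTY;
  }

  // Library form: JDK analogue of Guava's MoreObjects.firstNonNull.
  static SimpleStruct withRequireNonNullElse(Supplier<SimpleStruct> partitionGetter) {
    return Objects.requireNonNullElse(partitionGetter.get(), EMPTY);
  }
}
```

The behavior is identical for a non-null fallback, so the choice is mostly about consistency with surrounding code, which is the reason given for keeping the ternary.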

gaborkaszab force-pushed the main_trackedfile_partition_optional branch from ebe1087 to 21f8c80 on June 30, 2026 07:13

gaborkaszab left a comment

Contributor Author

Thanks for the reviews, @stevenzwu, @ebyhr!

// partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its
// ordinal is looked up by field ID.
private static final int PARTITION_ORDINAL = ordinalOf(TrackedFile.PARTITION_ID);
private static final int CONTENT_STATS_ORDINAL = ordinalOf(TrackedFile.CONTENT_STATS_ID);

Contributor Author

Note: this was unused, so I removed it as part of this PR.

Contributor

Please revert unnecessary changes.

Contributor Author

I thought it was overkill to open a new PR just to remove an unused variable from the tests and rewrite/remove the relevant comment.

I've reverted these changes now and opened a separate PR: #17031
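The code comment in this thread explains that the partition ordinal is looked up by field ID because the struct is rebuilt with the supplied types. A minimal sketch of an ID-based lookup, assuming a hypothetical SimpleField type and a list-based ordinalOf(fields, fieldId) helper; the ordinalOf in the PR takes only a field ID, and its implementation is not shown here.

```java
import java.util.List;

// Hypothetical minimal field model; Iceberg's Types.NestedField carries much more.
final class SimpleField {
  final int id;
  final String name;

  SimpleField(int id, String name) {
    this.id = id;
    this.name = name;
  }
}

final class Ordinals {
  private Ordinals() {}

  // Returns the position of the field with the given ID, or -1 if no field has it.
  // Matching on ID rather than a hard-coded position keeps working when the
  // struct is rebuilt with different field types, as long as IDs are preserved.
  static int ordinalOf(List<SimpleField> fields, int fieldId) {
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).id == fieldId) {
        return i;
      }
    }
    return -1;
  }
}
```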

// should return EMPTY_PARTITION_DATA
assertThat(file.partition()).isNotNull();
assertThat(file.partition().size()).isEqualTo(0);
assertThat(file.partition()).isNull();

Contributor

Curious and non-blocking: if the intention is to test a projection without the partition, do we also need coverage for a projection where the file has a valid, non-null partition that is not included in the projection?

Contributor

This is okay, but callers should not access fields that have not been projected, and we make no guarantees about the result when they do. Where possible we fail, for example when accessing DataFile.recordCount() without projecting it, which results in a NullPointerException because recordCount() returns a primitive long.
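The fail-fast behavior described above comes from Java auto-unboxing. A minimal sketch, assuming a hypothetical ProjectedFile class (not Iceberg's DataFile implementation) that stores an unprojected count as a null Long:

```java
// Hypothetical projected file: values that were not projected are stored as null.
final class ProjectedFile {
  private final Long recordCount; // boxed, so it can hold null when not projected

  ProjectedFile(Long recordCount) {
    this.recordCount = recordCount;
  }

  // Returning a primitive forces auto-unboxing: a null field throws
  // NullPointerException here instead of silently returning a default.
  long recordCount() {
    return recordCount;
  }
}
```

A getter that returns an object type such as a partition struct has no such built-in check, which is why a null partition has to be handled explicitly.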

@Override
public StructLike partition() {
return file().partition();
return file().partition() != null ? file().partition() : BaseFile.EMPTY_PARTITION_DATA;

Contributor

Instead of relying on BaseFile, I'd prefer to move this constant to PartitionData.empty() or similar.

Contributor Author

I introduced PartitionData.EMPTY for this
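Hosting the empty constant on the partition data class, as requested above, lets adapters use it without reaching into BaseFile. A minimal sketch of such a constant, assuming a hypothetical SimplePartitionData class; it illustrates the idea and is not the PartitionData.EMPTY added in the PR.

```java
// Hypothetical, simplified partition tuple; Iceberg's actual PartitionData differs.
final class SimplePartitionData {
  // One shared empty tuple. With zero slots there is nothing to mutate,
  // so sharing a single instance across all callers is safe.
  static final SimplePartitionData EMPTY = new SimplePartitionData(0);

  private final Object[] values;

  SimplePartitionData(int size) {
    this.values = new Object[size];
  }

  int size() {
    return values.length;
  }

  Object get(int pos) {
    return values[pos];
  }

  void set(int pos, Object value) {
    values[pos] = value; // on EMPTY, any position throws ArrayIndexOutOfBoundsException
  }

  // Callers fall back to the shared constant instead of returning null.
  static SimplePartitionData orEmpty(SimplePartitionData partition) {
    return partition != null ? partition : EMPTY;
  }
}
```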

// partition and content_stats are rebuilt with the supplied struct types inside
// schemaWithContentStats, so their ordinals are looked up by field ID.
// partition is rebuilt with the supplied struct types inside schemaWithContentStats, so its
// ordinal is looked up by field ID.

Contributor

I would rather not have comments than have these that get updated every time an LLM touches the file.

Contributor Author

See my comment below.

rdblue left a comment

Contributor

Overall this looks okay. The main thing is that we should not use BaseFile.EMPTY_PARTITION_DATA without moving it to a better place. Tests look fine for this update.

gaborkaszab force-pushed the main_trackedfile_partition_optional branch from 21f8c80 to 4ed590c on July 1, 2026 13:15

gaborkaszab left a comment

Contributor Author

Thanks for taking a look, @rdblue!

gaborkaszab requested a review from rdblue on July 1, 2026 13:16

Status: In review

7 participants