-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Core: Introduce builder for TrackedFile #16769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,364 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.iceberg; | ||
|
|
||
| import java.nio.ByteBuffer; | ||
| import java.util.List; | ||
| import org.apache.iceberg.relocated.com.google.common.base.Preconditions; | ||
|
|
||
| class TrackedFileBuilder { | ||
| private final long snapshotId; | ||
| private final FileContent contentType; | ||
|
|
||
| // Required fields | ||
| private Integer writerFormatVersion = null; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would this ever be Does the second case go through this builder? I would expect to use a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, I have a follow-up PR (draft status now) to adapt v3 shapes (like ManifestFile and DataFile) to v4 TrackedFile. It would be good to use builder for those adaptions too, unless you are thinking about using the constructor directly. Regarding the null or not, current proposal/spec PR said it is not null (0 for pre-v4 and format versions for v4+)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking about this more, would we use adapters to wrap I could see us producing instances of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Apparently, as Steven noted, we could use the TrackedFileBuilder for the wrappers around V3 metadata structs, and there we could use this
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe we concluded on |
||
| private String location = null; | ||
| private FileFormat fileFormat = null; | ||
| private Long recordCount = null; | ||
| private Long fileSizeInBytes = null; | ||
| private PartitionData partitionData = null; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Also, this is in the "required fields" section, but I think that we want to allow this to be null for unpartitioned cases. The partition tuple will be null when |
||
|
|
||
| // optional fields | ||
| private Integer specId = null; | ||
| private ContentStats contentStats = null; | ||
| private Integer sortOrderId = null; | ||
| private DeletionVector deletionVector = null; | ||
| private ManifestInfo manifestInfo = null; | ||
| private ByteBuffer keyMetadata = null; | ||
| private List<Long> splitOffsets = null; | ||
| private List<Integer> equalityIds = null; | ||
|
|
||
| // tracking-related fields | ||
| private Tracking sourceTracking = null; | ||
| private boolean dvUpdated = false; | ||
| private ByteBuffer deletedPositions = null; | ||
| private ByteBuffer replacedPositions = null; | ||
|
|
||
| /** | ||
| * Creates a builder for a newly added data file entry. | ||
| * | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFileBuilder data(long newSnapshotId) { | ||
| return new TrackedFileBuilder(FileContent.DATA, newSnapshotId); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a builder for a newly added equality delete file entry. | ||
| * | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFileBuilder equalityDelete(long newSnapshotId) { | ||
| return new TrackedFileBuilder(FileContent.EQUALITY_DELETES, newSnapshotId); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a builder for a newly added data manifest entry. | ||
| * | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFileBuilder dataManifest(long newSnapshotId) { | ||
| return new TrackedFileBuilder(FileContent.DATA_MANIFEST, newSnapshotId); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a builder for a newly added delete manifest entry. | ||
| * | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFileBuilder deleteManifest(long newSnapshotId) { | ||
| return new TrackedFileBuilder(FileContent.DELETE_MANIFEST, newSnapshotId); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a builder for a tracked file derived from {@code source}. | ||
| * | ||
| * @param source source tracked file to copy fields from | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFileBuilder from(TrackedFile source, long newSnapshotId) { | ||
| Preconditions.checkArgument(source != null, "Invalid source: null"); | ||
| return new TrackedFileBuilder(source, newSnapshotId); | ||
| } | ||
|
|
||
| /** | ||
| * Returns a DELETED tracked file derived from {@code source}. | ||
| * | ||
| * @param source source tracked file | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFile deleted(TrackedFile source, long newSnapshotId) { | ||
| Preconditions.checkArgument(source != null, "Invalid source: null"); | ||
| return terminal(source, TrackingBuilder.deleted(source.tracking(), newSnapshotId)); | ||
| } | ||
|
|
||
| /** | ||
| * Returns a REPLACED tracked file derived from {@code source}. | ||
| * | ||
| * <p>Manifest entries cannot transition to REPLACED. | ||
| * | ||
| * @param source source tracked file | ||
| * @param newSnapshotId the snapshot ID in which the new tracked file will be committed | ||
| */ | ||
| static TrackedFile replaced(TrackedFile source, long newSnapshotId) { | ||
| Preconditions.checkArgument(source != null, "Invalid source: null"); | ||
| Preconditions.checkArgument( | ||
| !isLeafManifest(source.contentType()), | ||
| "Manifest entries cannot transition to REPLACED, but entry type is: %s", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is less direct than we typically prefer, but is not bad. This states a general rule and then says that it applies. We would normally say that at once:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx! I think we can even remove the content type parameter of the message: |
||
| source.contentType()); | ||
| return terminal(source, TrackingBuilder.replaced(source.tracking(), newSnapshotId)); | ||
| } | ||
|
|
||
| private static TrackedFile terminal(TrackedFile source, Tracking tracking) { | ||
| return new TrackedFileStruct( | ||
| tracking, | ||
| source.contentType(), | ||
| source.writerFormatVersion(), | ||
| source.location(), | ||
| source.fileFormat(), | ||
| (PartitionData) source.partition(), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should not need unchecked casts like this.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The difficulty here is the |
||
| source.recordCount(), | ||
| source.fileSizeInBytes(), | ||
| source.specId(), | ||
| source.contentStats(), | ||
| source.sortOrderId(), | ||
| source.deletionVector(), | ||
| source.manifestInfo(), | ||
| source.keyMetadata(), | ||
| source.splitOffsets(), | ||
| source.equalityIds()); | ||
| } | ||
|
|
||
| private TrackedFileBuilder(FileContent contentType, long snapshotId) { | ||
| this.contentType = contentType; | ||
| this.snapshotId = snapshotId; | ||
| } | ||
|
|
||
| private TrackedFileBuilder(TrackedFile source, long snapshotId) { | ||
| this.contentType = source.contentType(); | ||
| this.snapshotId = snapshotId; | ||
| this.writerFormatVersion = source.writerFormatVersion(); | ||
| this.location = source.location(); | ||
| this.fileFormat = source.fileFormat(); | ||
| this.recordCount = source.recordCount(); | ||
| this.fileSizeInBytes = source.fileSizeInBytes(); | ||
| this.partitionData = (PartitionData) source.partition(); | ||
| this.specId = source.specId(); | ||
| this.contentStats = source.contentStats(); | ||
| this.sortOrderId = source.sortOrderId(); | ||
| this.deletionVector = source.deletionVector(); | ||
| this.manifestInfo = source.manifestInfo(); | ||
| this.keyMetadata = source.keyMetadata(); | ||
| this.splitOffsets = source.splitOffsets(); | ||
| this.equalityIds = source.equalityIds(); | ||
| this.sourceTracking = source.tracking(); | ||
| } | ||
|
|
||
| TrackedFileBuilder writerFormatVersion(int newWriterFormatVersion) { | ||
| Preconditions.checkArgument( | ||
| newWriterFormatVersion >= 0, | ||
| "Invalid writer format version: %s (must be >= 0)", | ||
| newWriterFormatVersion); | ||
| this.writerFormatVersion = newWriterFormatVersion; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder location(String newLocation) { | ||
| Preconditions.checkArgument(newLocation != null, "Invalid location: null"); | ||
| this.location = newLocation; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder fileFormat(FileFormat newFileFormat) { | ||
| Preconditions.checkArgument(newFileFormat != null, "Invalid file format: null"); | ||
| this.fileFormat = newFileFormat; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder recordCount(long newRecordCount) { | ||
| Preconditions.checkArgument( | ||
| newRecordCount >= 0, "Invalid record count: %s (must be >= 0)", newRecordCount); | ||
| this.recordCount = newRecordCount; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder fileSizeInBytes(long newFileSizeInBytes) { | ||
| Preconditions.checkArgument( | ||
| newFileSizeInBytes >= 0, | ||
| "Invalid file size in bytes: %s (must be >= 0)", | ||
| newFileSizeInBytes); | ||
| this.fileSizeInBytes = newFileSizeInBytes; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder specId(int newSpecId) { | ||
| Preconditions.checkArgument(newSpecId >= 0, "Invalid spec ID: %s (must be >= 0)", newSpecId); | ||
| this.specId = newSpecId; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder partition(PartitionData newPartitionData) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that this should be combined with
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I probably miss something here, but I see that |
||
| Preconditions.checkArgument(newPartitionData != null, "Invalid partition: null"); | ||
| this.partitionData = newPartitionData; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder contentStats(ContentStats newContentStats) { | ||
| Preconditions.checkArgument(newContentStats != null, "Invalid content stats: null"); | ||
|
gaborkaszab marked this conversation as resolved.
|
||
| this.contentStats = newContentStats; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder sortOrderId(int newSortOrderId) { | ||
| Preconditions.checkArgument( | ||
| !isLeafManifest(contentType), | ||
| "Sort order ID cannot be added to manifest entries, but entry type is: %s", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Cannot set sort order for manifest files"?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed |
||
| contentType); | ||
| Preconditions.checkArgument( | ||
| newSortOrderId >= 0, "Invalid sort order ID: %s (must be >= 0)", newSortOrderId); | ||
| this.sortOrderId = newSortOrderId; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder deletionVector(DeletionVector newDeletionVector) { | ||
| Preconditions.checkArgument(newDeletionVector != null, "Invalid deletion vector: null"); | ||
| Preconditions.checkArgument( | ||
| contentType == FileContent.DATA, | ||
| "Deletion vector can only be added to DATA entries, but entry type is: %s", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed, and found couple other similar messages, fixed them too. |
||
| contentType); | ||
| Preconditions.checkArgument( | ||
| this.deletionVector == null || !this.deletionVector.equals(newDeletionVector), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Style:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed |
||
| "The same deletion vector already added"); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The check that I would expect is The cardinality check is closer to what we want although not foolproof. The underlying location and offset could be the same, although it is unlikely that would be the case. If you want to make sure that this is actually a different deletion vector, then I think the right thing to do is to do that check here: Preconditions.checkArgument(
deletionVector == null ||
deletionVector.cardinality() < newDeletionVector.cardinality(),
"Invalid DV update, cardinality must increase: existing=%s, new=%s", ...);
Preconditions.checkArgument(
deletionVector == null ||
!deletionVector.location().equals(newDeletionVector.location()) ||
!deletionVector.offset().equals(newDeletionVector.offset()),
"Invalid DV update: same offset and location has changed cardinality");
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, borrowed these checks! |
||
| this.deletionVector = newDeletionVector; | ||
| this.dvUpdated = true; | ||
| return this; | ||
|
stevenzwu marked this conversation as resolved.
|
||
| } | ||
|
|
||
| TrackedFileBuilder manifestInfo(ManifestInfo newManifestInfo) { | ||
| Preconditions.checkArgument(newManifestInfo != null, "Invalid manifest info: null"); | ||
| Preconditions.checkArgument( | ||
| isLeafManifest(contentType), | ||
| "Manifest info can only be added to manifests, but entry type is: %s", | ||
| contentType); | ||
| this.manifestInfo = newManifestInfo; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder keyMetadata(ByteBuffer newKeyMetadata) { | ||
| Preconditions.checkArgument(newKeyMetadata != null, "Invalid key metadata: null"); | ||
| this.keyMetadata = newKeyMetadata; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder splitOffsets(List<Long> newSplitOffsets) { | ||
| Preconditions.checkArgument(newSplitOffsets != null, "Invalid split offsets: null"); | ||
| Preconditions.checkArgument( | ||
| !isLeafManifest(contentType), | ||
| "Split offsets cannot be added to manifest entries, but entry type is: %s", | ||
| contentType); | ||
| this.splitOffsets = newSplitOffsets; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder equalityIds(List<Integer> newEqualityIds) { | ||
| Preconditions.checkArgument(newEqualityIds != null, "Invalid equality IDs: null"); | ||
| Preconditions.checkArgument( | ||
| contentType == FileContent.EQUALITY_DELETES, | ||
| "Equality IDs can only be added to EQUALITY_DELETES entries, but entry type is: %s", | ||
| contentType); | ||
| this.equalityIds = newEqualityIds; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder deletedPositions(ByteBuffer newDeletedPositions) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not against this since I haven't seen how it is used yet. But I find these methods that capture data that will be passed through to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Felt a bit odd for me when I implemented this but went this way for two reasons:
Introduced a |
||
| Preconditions.checkArgument(newDeletedPositions != null, "Invalid deleted positions: null"); | ||
| Preconditions.checkArgument( | ||
| isLeafManifest(contentType), | ||
| "Deleted positions can only be added to manifest entries, but entry type is: %s", | ||
| contentType); | ||
| this.deletedPositions = newDeletedPositions; | ||
| return this; | ||
| } | ||
|
|
||
| TrackedFileBuilder replacedPositions(ByteBuffer newReplacedPositions) { | ||
| Preconditions.checkArgument(newReplacedPositions != null, "Invalid replaced positions: null"); | ||
| Preconditions.checkArgument( | ||
| isLeafManifest(contentType), | ||
| "Replaced positions can only be added to manifest entries, but entry type is: %s", | ||
| contentType); | ||
| this.replacedPositions = newReplacedPositions; | ||
| return this; | ||
| } | ||
|
|
||
| private static boolean isLeafManifest(FileContent contentType) { | ||
| return contentType == FileContent.DATA_MANIFEST || contentType == FileContent.DELETE_MANIFEST; | ||
| } | ||
|
|
||
| TrackedFile build() { | ||
| Preconditions.checkArgument( | ||
| writerFormatVersion != null, "Missing required field: writer format version"); | ||
| Preconditions.checkArgument(location != null, "Missing required field: location"); | ||
| Preconditions.checkArgument(fileFormat != null, "Missing required field: file format"); | ||
| Preconditions.checkArgument(recordCount != null, "Missing required field: record count"); | ||
| Preconditions.checkArgument( | ||
| fileSizeInBytes != null, "Missing required field: file size in bytes"); | ||
| Preconditions.checkArgument(partitionData != null, "Missing required field: partition data"); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is correct.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx! |
||
| Preconditions.checkArgument( | ||
| !isLeafManifest(contentType) || manifestInfo != null, | ||
| "Missing required field: manifest info"); | ||
| Preconditions.checkArgument( | ||
| contentType != FileContent.EQUALITY_DELETES || equalityIds != null, | ||
| "Missing required field: equality IDs"); | ||
|
|
||
| TrackingBuilder trackingBuilder = | ||
| sourceTracking == null | ||
| ? TrackingBuilder.added(snapshotId) | ||
| : TrackingBuilder.from(sourceTracking, snapshotId); | ||
|
|
||
| if (dvUpdated) { | ||
| trackingBuilder.dvUpdated(); | ||
| } | ||
|
|
||
| if (deletedPositions != null) { | ||
| trackingBuilder.deletedPositions(deletedPositions); | ||
| } | ||
|
|
||
| if (replacedPositions != null) { | ||
| trackingBuilder.replacedPositions(replacedPositions); | ||
| } | ||
|
|
||
| return new TrackedFileStruct( | ||
| trackingBuilder.build(), | ||
| contentType, | ||
| writerFormatVersion, | ||
| location, | ||
| fileFormat, | ||
| partitionData, | ||
| recordCount, | ||
| fileSizeInBytes, | ||
| specId, | ||
| contentStats, | ||
| sortOrderId, | ||
| deletionVector, | ||
| manifestInfo, | ||
| keyMetadata, | ||
| splitOffsets, | ||
| equalityIds); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructLikeclasses in Iceberg often don't implement theequalsmethod. but in this case, I am ok with it for usage in theTrackedFileBuilderThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should remove this. There are multiple ways of considering something "equal" and we try to avoid
equalsmethods when it is unclear which one should be used. That's the case here, where it would be reasonable to assume that twoDeletionVectorinstances are the same if they represent the same DV (locationandoffsetare equal) but also reasonable to assume two instances are equal if they have the same values (mostly implemented here).In addition to the problem of what this means, there's another problem that this definition only works for other
DeletionVectorStructinstances. That means the definition of equality is more murky: twoDeletionVectorinstances can be equal by either definition above but still may not be equal here. The purpose of using interfaces for the public API is to make it so that implementations are interchangeable and this breaks that principle.I think this was introduced to be able to detect changes applied to the builder, but we should find a better way to solve that problem rather than introducing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue the comparison was introduced due to my comment here. Do you think it is sth we should guard against? if not, we can maybe just add Javadoc to explain the usage. if yes, should we just do a reference comparison like
this.deletionVector != newDeletionVectoror we want a util method like `DelectionVectorUtil.sameContents(DeletionVector, DeletionVector)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped these overrides here, and went for the verification suggested by Ryan within the builder class.