API, Core: Add data file reference to DeleteFile#11443
Conversation
| case 15: | ||
| return wrapped.sortOrderId(); | ||
| case 16: | ||
| if (wrapped instanceof DeleteFile) { |
There was a problem hiding this comment.
I am not a big fan of this instanceof given that it is called in a tight loop. We need it as IndexedDataFile is used for both data and deletes. At the same time, I am not sure it is reasonable to add IndexedDeleteFile for the sake of handling 3 different fields.
Any thoughts?
There was a problem hiding this comment.
I think I would define referencedDataFile on ContentFile and have DataFile always return null. That's how we handle content in v1/v2. DataFile returns a constant value.
There was a problem hiding this comment.
I wanted to avoid unnecessary methods in DataFile. The same goes for content offset and size, which will only be applicable to deletes. Let me think.
There was a problem hiding this comment.
I think it is better not to expose these methods on DataFile:
- Avoids reliance on validation outside of this block to ensure these fields are null for data files. We had bugs before that allowed persisting equality field IDs for data files.
- Avoids unnecessary methods that can be invoked but have no meaning, which we have to explain.
| case 17: | ||
| this.referencedDataFile = value != null ? value.toString() : null; | ||
| return; | ||
| case 18: |
There was a problem hiding this comment.
This changes the ordinal of fileOrdinal. This normally doesn't matter as we read with a projection. We also changed it a couple of times before. Anything I missed?
There was a problem hiding this comment.
As long as both reads and writes are updated, it should be fine.
There was a problem hiding this comment.
Thanks for confirming!
|
Thanks for reviewing, @rdblue @amogh-jahagirdar! |
This PR adds
referenced_data_fileto our delete metadata as per the design doc for improving position deletes.This PR cannot be merged until the vote is closed. This work is part of #11122.