Core: Don't bump DV snapshot ID for deleted_positions and replaced_positions#16823
Core: Don't bump DV snapshot ID for deleted_positions and replaced_positions#16823gaborkaszab wants to merge 1 commit into
Conversation
|
This came up during the V4 AMT sync today. Seems reasonable not to bump dv snapshot ID when setting deleted and replaced positions, because they are fields that relevant for the current snapshot only and we don't carry them forward anyway. |
|
Sorry, I missed the AMT sync this week.
Setting it is also consistent with how cc @stevenzwu who originally suggested changing |
|
yeah. I was at the Monday sync when Amogh brought up this question. Amogh's point was that |
|
@stevenzwu Does this mean that |
Practically yes. This is in line with the spec PR: |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Yeah the way I've reasoned about this is we have the following options:
1.) We could say that implementations must update dv_snapshot_id if replaced or deleted positions (the "diff" DVs are set). But if it's a diff representing the changes in that snapshot and nulled out when there are no changes, then what's the point of setting the dv_snapshot_id? The diff itself represents the changes in that snapshot. I guess one argument is consistency with how data DVs work but it feels like an extra spec rule that doesn't help anything on the read or change detection side, and makes write side more complex.
Let me know if I'm missing a case where the information of dv_snapshot id for leaf manifest DVs helps.
2.) We could not be strict in the spec about this and allow implementations to set it. Then it's one of those weird "optional" areas in the spec that I think we should avoid and it probably causes more confusion as there gets to be multiple implementations ("do we set it or not" is a debate that'd be nice to avoid a year from now).
3.) We could just strictly disallow setting dvSnapshotId for leaf manifests. I feel like this is the best option. It's less burden on implementations, less to get wrong on the change detection path. For leaf manifests, we currently track the total DV + the diff via replaced and deleted positions, so figuring out changes is just inherent in what's persisted in metadata at least in the current spec design.
Let me know if I'm missing something @anoop @stevenzwu @rdblue
| Tracking addedSource = manifestSourceTracking(); | ||
| Tracking modified = | ||
| TrackingBuilder.from(addedSource, 999L).deletedPositions(deletedBytes).build(); | ||
|
|
There was a problem hiding this comment.
Nit: Can we remove the newline, we can avoid conflicts
|
|
||
| assertThat(modified.status()).isEqualTo(EntryStatus.MODIFIED); | ||
| // the entry snapshot id is preserved; only the DV snapshot id advances to the commit snapshot | ||
| // the entry snapshot id is preserved; dv snapshot id is not relevant for manifest entries |
There was a problem hiding this comment.
DV snapshot ID must not be set?
There was a problem hiding this comment.
+1. DV snapshot ID must not be set for leaf manifest entries.
I would also split this comment line and write the 2nd half of the comment to be just above line 463.
There was a problem hiding this comment.
Thanks for the suggestions! done
|
|
||
| assertThat(modified.status()).isEqualTo(EntryStatus.MODIFIED); | ||
| // the entry snapshot id is preserved; only the DV snapshot id advances to the commit snapshot | ||
| // the entry snapshot id is preserved; dv snapshot id is not relevant for manifest entries |
There was a problem hiding this comment.
+1. DV snapshot ID must not be set for leaf manifest entries.
I would also split this comment line and write the 2nd half of the comment to be just above line 463.
…sitions Since deleted_positions and replaced_positions in Tracking are relevant only for the current snapshot, it is redundant to also bump the DV snapshot ID when setting these fields.
5574cf6 to
5fede4e
Compare
gaborkaszab
left a comment
There was a problem hiding this comment.
Thanks for taking a look! Addressed the comments
|
|
||
| assertThat(modified.status()).isEqualTo(EntryStatus.MODIFIED); | ||
| // the entry snapshot id is preserved; only the DV snapshot id advances to the commit snapshot | ||
| // the entry snapshot id is preserved; dv snapshot id is not relevant for manifest entries |
There was a problem hiding this comment.
Thanks for the suggestions! done
| Tracking addedSource = manifestSourceTracking(); | ||
| Tracking modified = | ||
| TrackingBuilder.from(addedSource, 999L).deletedPositions(deletedBytes).build(); | ||
|
|
|
I agree that setting There is still an argument for setting The argument against using After thinking through it, I'm not convinced that this is a good idea. I think I'd prefer keeping the update to |
|
closing this because we agreed on bumping dv_snapshot_id for replaced/deleted positions too |
There was a problem hiding this comment.
I went back and forth on this but I think I agree with @rdblue . It is indeed more clear to keep set dv_snapshot_id for leaf manifests just as we do for leaf data file DV changes. I also do think that it's not that much more difficult to have that set, especially if the alternative is a requirement to drop the diff DV when there are no changes, it's an equal requirement to just set the dv_snapshot_id (or in spec terms, the dvs represent the changes in the snapshot dv_snapshot_id)
I also thought about this statement
In this case, the spec would state that the bitmaps reflect the changes in the snapshot identified by dv_snapshot_id for manifest entries and not have a hard requirement on dropping the bitmaps immediately, which are likely to be small.
So in this case, we wouldn't be using the diff DV presence to determine changes, but rather we would check if dv_snapshot_id is set to the current snapshot , then we know one of replaced_positions or deleted_positions has changed. And to be fair this is similar in principle to the check we'd do for regular data DVs.
Since deleted_positions and replaced_positions in Tracking are relevant only for the current snapshot, it is redundant to also bump the DV snapshot ID when setting these fields.