Skip to content

Core: V4 write direction wrappers#16936

Open
stevenzwu wants to merge 1 commit into
apache:mainfrom
stevenzwu:v4_write_direction_wrappers
Open

Core: V4 write direction wrappers#16936
stevenzwu wants to merge 1 commit into
apache:mainfrom
stevenzwu:v4_write_direction_wrappers

Conversation

@stevenzwu

Copy link
Copy Markdown
Contributor

No description provided.

* <p>REPLACED files are the prior-state entries of v4 REPLACED/MODIFIED pairs and are not live.
* Returns null for manifest files written by pre-v4 writers.
*/
default Integer replacedFilesCount() {

@stevenzwu stevenzwu Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, public-interface changes should be minimized and delayed. Justification for keeping these two on the interface in this PR:

In-tree consumers (this PR + close follow-ups):

  • ContentEntryAdapters.manifestInfo() (this PR) populates the v4 root manifest's manifest_info struct from manifest.replacedFilesCount() / replacedRowsCount().
  • MergingSnapshotProducer.validateAddedDVs (follow-up phase) filters concurrent data manifests by replacedFilesCount > 0 to detect v4 colocated-DV write conflicts — without it, every concurrent data manifest must be deep-scanned.

Convention. The interface-default pattern matches v3's existing extensions for spec-defined optional fields on ManifestFile: firstRowId (row lineage), keyMetadata (encryption), containsNaN (partition NaN). v4 spec-defined manifest_info counts belong on the same interface in the same shape.

A TrackedFile-based parallel v4 API was considered. Would require rewriting ManifestGroup and every engine that consumes FileScanTask. The cost is disproportionate to the savings.

@stevenzwu stevenzwu Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projected new default methods across the full v4 stack, all spec-defined additions backing v4 manifest_info / content_entry fields:

In this PR:

  • default Integer replacedFilesCount()
  • default Long replacedRowsCount()
  • default int formatVersion()

when root-level manifest DV bitmaps are implemented:

  • default ByteBuffer deletedPositions()
  • default ByteBuffer replacedPositions()

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @stevenzwu ,
I went through mostly for my own understanding. Left some comments, mostly I'm a bit hesitant to expose setting status and sequence numbers directly through the builders. Let me know what you think!

Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/TrackedFileBuilder.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/TrackingBuilder.java Outdated
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 5 times, most recently from c78d6b1 to 828e0c7 Compare June 25, 2026 21:35
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch from 828e0c7 to fa86dc1 Compare June 25, 2026 22:18
@github-actions github-actions Bot added the API label Jun 25, 2026
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 5 times, most recently from fa46d3d to 55f5573 Compare June 26, 2026 05:57
* whose tracking values come from outside the writer (v4 manifest references, projections from
* legacy ManifestEntry).
*/
static TrackedFileBuilder explicitTracking(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaborkaszab following up on this comment from your PR, I am adding this new factory method for the new code path in this PR.

@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 8 times, most recently from b5c92a0 to d5c696b Compare June 28, 2026 06:09
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Jun 28, 2026
Per review (gaborkaszab on PR apache#16936): the first if-block existed only
to produce specific error messages distinguishing v3 DVs from v2
position delete files; the actual invariant (only EQUALITY_DELETES
allowed in v4 leaf delete manifests) is enforced by the Preconditions
check that immediately follows. Keep the explanatory comment on why
POSITION_DELETES is rejected (with both context bullets) above the
single invariant check. Update the two POSITION_DELETES rejection tests
to assert on the unified error message.

Generated-by: Claude Code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 2 times, most recently from 0f64967 to bff274c Compare June 28, 2026 22:53
@stevenzwu stevenzwu marked this pull request as ready for review June 29, 2026 01:33
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 2 times, most recently from 30e188a to 6bb3622 Compare June 29, 2026 05:15
Introduces the Phase 1 foundation for v4 metadata tree writes: TrackedFile
and supporting types are extended with v4-specific shape, and the new
ContentEntryAdapters utility projects legacy ManifestEntry rows and
ManifestFile references into TrackedFile instances for the v4 content_entry
schema.

ManifestFile gains V4_FORMAT_VERSION and LEGACY_FORMAT_VERSION constants and
default formatVersion() / recordCount() accessors. GenericManifestFile gets a
v4 constructor variant that populates these fields at construction time;
pre-v4 callers use the existing 17-arg constructor which delegates with
defaults.

ContentEntryAdapters.fromManifestFile dispatches the record_count source by
the manifest's intrinsic formatVersion: v4+ manifests must carry a non-null
recordCount populated by the writer or reader; pre-v4 manifests have it
summed from the per-status file counts.

TestContentEntryAdapters covers ADDED/EXISTING/DELETED tracking shape, the
unpartitioned and partitioned-projection paths, format_version dispatch,
record_count requirement for v4+, and the rejected status transitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch from 6bb3622 to 0535c88 Compare June 29, 2026 05:24
* manifest's snapshot_id
*/
static TrackedFileBuilder data(long newSnapshotId) {
static TrackedFileBuilder data(Long newSnapshotId) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when writing a data file entry to a staged leaf manifest file, the snapshot id is null and will be inherited from the manifest's snapshot id when written to the root manifest file.

@stevenzwu stevenzwu moved this to In review in V4: metadata tree Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants