Spec: V4 Adaptive Metadata Tree Spec Changes for Entry Structures by amogh-jahagirdar · Pull Request #16025 · apache/iceberg

amogh-jahagirdar · 2026-04-18T02:21:00Z

This is one PR for V4 Adaptive Metadata Tree spec changes. The focus of this PR is to update the proposed entry structure in https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.wd1z5eeup025#heading=h.80fbnuij9rhg

There will be other PRs for updating the scan planning section, adding implementation notes for CDC etc.

amogh-jahagirdar · 2026-04-28T18:11:53Z

Two general working principles:

1. Make it easy for someone who just wants to implement v4 to click the v4 tabs and find them self-describing.
2. Do 1 without shifting too much of the existing spec contents, to avoid the risk of changing the spec for v1-v3.

amogh-jahagirdar · 2026-04-28T18:36:18Z

-| _optional_ | _optional_ | **`min-snapshots-to-keep`**  | `int`     | For `branch` type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots. Defaults to table property `history.expire.min-snapshots-to-keep`. |
-| _optional_ | _optional_ | **`max-snapshot-age-ms`**    | `long`    | For `branch` type only, a positive number for the max age of snapshots to keep when expiring, including the latest snapshot. Defaults to table property `history.expire.max-snapshot-age-ms`. |
-| _optional_ | _optional_ | **`max-ref-age-ms`**         | `long`    | For snapshot references except the `main` branch, a positive number for the max age of the snapshot reference to keep while expiring snapshots. Defaults to table property `history.expire.max-ref-age-ms`. The `main` branch never expires. |
+=== "v1 - v3"


If people like the two-tabbed approach, then we should get #14656 in first; that will make this diff easier to review.

stevenzwu · 2026-05-06T20:13:42Z

@@ -130,8 +130,10 @@ Tables do not require rename, except for tables that use atomic rename to implem
 * **Schema** -- Names and types of fields in a table.
 * **Partition spec** -- A definition of how partition values are derived from data fields.
 * **Snapshot** -- The state of a table at some point in time, including the set of all data files.
-* **Manifest list** -- A file that lists manifest files; one per snapshot.
-* **Manifest** -- A file that lists data or delete files; a subset of a snapshot.
+* **Manifest list** -- (V1-V3 only) A file that lists manifest files; one per snapshot.


nit: most of the time, lower case v1, v2 are used in the spec

stevenzwu · 2026-05-06T20:14:18Z

-* **Manifest** -- A file that lists data or delete files; a subset of a snapshot.
+* **Manifest list** -- (V1-V3 only) A file that lists manifest files; one per snapshot.
+* **Root Manifest** -- (V4+) A manifest that can reference data files, delete files, and other data and delete manifests; one per snapshot. Replaces manifest lists in V4.
+* **Data manifest** -- A file that lists data files; a subset of a snapshot.


Do we want to mention colocated DVs and column files for data manifests in v4+?

stevenzwu · 2026-05-06T20:16:33Z

@@ -484,7 +486,7 @@ Note that:

 ### Partitioning

-Data files are stored in manifests with a tuple of partition values that are used in scans to filter out files that cannot contain records that match the scan’s filter predicate. Partition values for a data file must be the same for all records stored in the data file. (Manifests store data files from any partition, as long as the partition spec is the same for the data files.)
+Data files are stored in manifests with partition values that are used in scans to filter out files that cannot contain records that match the scan’s filter predicate. Partition values for a data file must be the same for all records stored in the data file. In V1-V3, manifests store data files from any partition, as long as the partition spec is the same for the data files. In V4, manifests can store data files from different partition specs because partition values are stored as column statistics.


because partition values are stored as column statistics.

This sentence probably needs to be updated based on the last community sync.

stevenzwu · 2026-05-06T20:25:51Z

+
+    | Field id | Name | Type | Write | Read | Description |
+    |----------|------|------|-------|------|-------------|
+    | 134 | **`content_type`** | `int` (0: DATA, 2: EQUALITY DELETES, 3: DATA_MANIFEST, 4: DELETE_MANIFEST) | *required* | *required* | Type of content stored in the entry. Content types 3 and 4 are only valid in root manifests. |


POSITION_DELETES handling — the table omits value 1 entirely, but the comment two paragraphs below ("Value 1 (POSITION_DELETES) no longer applies in entries") is the only place that says so. Worth making the table itself unambiguous: either list 1: RESERVED (writers must not produce; readers must reject) or call out in the table cell that 1 is intentionally skipped. Otherwise a reader reaches for the int and wonders if the spec just forgot it.
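A minimal sketch of the "1: RESERVED" option suggested above, assuming readers reject value 1 outright. The enum values come from the table in the diff; the `parse_content_type` helper and its error messages are hypothetical and are not part of the spec or of any Iceberg library.

```python
from enum import IntEnum


class ContentType(IntEnum):
    """content_type values from the proposed v4 content_entry table."""
    DATA = 0
    # 1 (POSITION_DELETES in earlier versions) is intentionally absent.
    EQUALITY_DELETES = 2
    DATA_MANIFEST = 3
    DELETE_MANIFEST = 4


# Per the table: content types 3 and 4 are only valid in root manifests.
ROOT_ONLY = frozenset({ContentType.DATA_MANIFEST, ContentType.DELETE_MANIFEST})


def parse_content_type(value: int, in_root_manifest: bool) -> ContentType:
    """Hypothetical reader-side check that treats value 1 as reserved."""
    if value == 1:
        raise ValueError("content_type 1 (POSITION_DELETES) is reserved in v4 entries")
    try:
        content_type = ContentType(value)
    except ValueError:
        raise ValueError(f"unknown content_type: {value}") from None
    if content_type in ROOT_ONLY and not in_root_manifest:
        raise ValueError(f"{content_type.name} is only valid in a root manifest")
    return content_type
```

Listing the reserved value explicitly, as here, gives a reader a specific error for 1 instead of a generic "unknown value" failure.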

stevenzwu · 2026-05-06T20:25:52Z

+    | Field id | Name | Type | Write | Read | Description |
+    |----------|------|------|-------|------|-------------|
+    | 134 | **`content_type`** | `int` (0: DATA, 2: EQUALITY DELETES, 3: DATA_MANIFEST, 4: DELETE_MANIFEST) | *required* | *required* | Type of content stored in the entry. Content types 3 and 4 are only valid in root manifests. |
+    | 157 | **`writer_format_version`** | `int` (0: PRE-V4, 1: V4) | *required* | *required* | Writer format version. V4 writers must produce `writer_format_version` 1. |


V4 writers must produce writer_format_version 1.

why not 4 for V4 writers? null for entries written before V4.

Also, the 0: PRE-V4 enum is odd: PRE-V4 manifests use the manifest_entry schema, not content_entry, so they can't have this field. When does value 0 ever appear?

this should always be 1, right? Shouldn't we remove this and allocate field ID 157 to column_files? cc @amogh-jahagirdar

stevenzwu · 2026-05-11T20:58:04Z

+    | 158 | **`column_files`** | `list<column_file>` | *optional* | *optional* | Column update files associated with this entry. |
+    | 101 | **`file_format`** | `string` | *required* | *required* | String file format name: `avro`, `orc`, `parquet`, or `puffin` |
+    | 147 | **`tracking`** | `tracking` struct | *required* | *required* | Groups status, snapshot, and sequence number. See tracking struct below. |
+    | 148 | **`deletion_vector`** | `deletion_vector` struct | *optional* | *optional* | Row-level deletion vector for a data file. |


the deletion_vector schema is only for external data DVs with location field. the inline manifest DV is stored as part of the manifest_info, which is a bit inconsistent to me. Ideally, the DV struct can capture both outline (data DV) and inline (manifest DV) at the top level. E.g. the cardinality field can be shared.

stevenzwu · 2026-05-11T21:06:04Z

+
+    Value 1 (POSITION_DELETES) no longer applies in entries because deletion vector metadata is colocated with data files (`content_type` 0).
+
+    Leaf data manifests may only contain entries with `content_type` 0 (DATA); leaf delete manifests may only contain entries with `content_type` 2 (EQUALITY DELETES).


leaf delete manifest files can also contain entries with position deletes written back when the format version is 2. that also means the line 683 above is also inaccurate

stevenzwu · 2026-05-11T21:07:09Z

+
+    Leaf data manifests may only contain entries with `content_type` 0 (DATA); leaf delete manifests may only contain entries with `content_type` 2 (EQUALITY DELETES).
+
+    The following constraints apply based on `content_type`:


it is better to capture these in the description column in the table (instead of a separate bullet list here).

stevenzwu · 2026-05-11T21:19:07Z

-|            |            | _required_ | **`first-row-id`**           | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage)        |
-|            |            | _required_ | **`added-rows`**             | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
-|            |            | _optional_ | **`key-id`**                 | ID of the encryption key that encrypts the manifest list key metadata |
+=== "v1 - v3"


also need a table for v4

RussellSpitzer · 2026-05-12T14:46:51Z

@@ -75,9 +75,9 @@ This table format tracks individual data files in a table instead of directories

 Table state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic swap. The table metadata file tracks the table schema, partitioning config, custom properties, and snapshots of the table contents. A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.

-Data files in snapshots are tracked by one or more manifest files that contain a row for each data file in the table, the file's partition data, and its metrics. The data in a snapshot is the union of all files in its manifests. Manifest files are reused across snapshots to avoid rewriting metadata that is slow-changing. Manifests can track data files with any subset of a table and are not associated with partitions.


Do we want to add anything up front about V4 unifying this structure? I just note that you extended this paragraph to say Data Manifests and Delete Manifests, but now we will only have one type of manifest

That sounds like a good idea to me.

RussellSpitzer · 2026-05-12T14:48:00Z


-The manifests that make up a snapshot are stored in a manifest list file. Each manifest list stores metadata about manifests, including partition stats and data file counts. These stats are used to avoid reading manifests that are not required for an operation.
+In V1-V3, the manifests that make up a snapshot are stored in a manifest list file. Each manifest list stores metadata about manifests, including partition stats and data file counts. These stats are used to avoid reading manifests that are not required for an operation. In V4, manifest lists are replaced by a single root manifest per snapshot, which can contain references to data files, delete files, and other data and delete manifests in a unified structure.


I thought technically we aren't allowing a reference to a delete file in the Root Manifest or in any V4 Manifest except for V4 Delete manifests for equality deletes. Shouldn't it always be a coupled entry of DV and DataFile or DV and Manifest?

RussellSpitzer · 2026-05-12T14:51:27Z


-A manifest is a valid Iceberg data file: files must use valid Iceberg formats, schemas, and column projection.


Not sure why we are changing the pluralization here but the change is ok. Just wondering because we immediately switch back to singular in the next paragraph.

RussellSpitzer · 2026-05-12T14:53:17Z

-1. Technically, data files can be deleted when the last snapshot that contains the file as “live” data is garbage collected. But this is harder to detect and requires finding the diff of multiple snapshots. It is easier to track what files are deleted in a snapshot and delete them when that snapshot expires.  It is not recommended to add a deleted file back to a table. Adding a deleted file can lead to edge cases where incremental deletes can break table snapshots.
-2. Manifest list files are required in v2, so that the `sequence_number` and `snapshot_id` to inherit are always available.
+- V1-V3: A manifest stores files for a single partition spec. When a table’s partition spec changes, old files remain in the older manifest and newer files are written to a new manifest. This is required because a manifest file’s schema is based on its partition spec. The partition spec of each manifest is used to transform predicates on the table’s data rows into predicates on partition values during job planning.
+- V4: Manifests are not bound to a single partition spec. Files with different partition specs can coexist in the same manifest because partition values are stored in column statistics using source column IDs rather than in a partition-spec-specific struct. The `partition-spec-id` in manifest metadata is tracked for informational purposes but does not constrain the contents.


I'd leave the rationale out of this paragraph. I think it's fine to just say that they are not bound to a partition spec. I think partition-spec-id needs a better description here ... The spec id used by the writer when generating this data file?

RussellSpitzer · 2026-05-12T14:54:32Z

+
+#### Manifest File Format
+
+Manifests are Avro files in V1-V3. Starting in V4, writers must produce manifests in Parquet.


While I support this for simplicity, I know @rdblue still wants to have the Avro option. currently the code in my PR lets you write either in the SDK and i'm not sure it is much more expensive to allow both in the spec. Worth having a community discussion though.

RussellSpitzer · 2026-05-12T14:57:44Z

+    |            | _required_ | `content`           | Type of content files tracked by the manifest: "data" or "deletes"                                                                          |
+
+=== "v4"
+    | Write      | Read       | Key                 | Value                                                                                                                                       |


Why is there write and read? What would optional "read" be? I assume this was probably an LLM just trying to make a balanced table.

More importantly, do we want to relax the write requirements? In V2/3 these were all required, but now they are optional

RussellSpitzer · 2026-05-12T14:58:20Z

+    | Write      | Read       | Key                 | Value                                                                                                                                       |
+    |------------|------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+    | _optional_ | _optional_ | `schema-id`         | ID of the schema used to write the manifest as a string                                                                                     |
+    | _optional_ | _optional_ | `partition-spec-id` | ID of the partition spec used to write the manifest as a string                                                                             |


Not sure this one makes sense now? Entries should all have a spec, but i'm not sure it makes sense to have a global spec id for the manifest anymore?

Miss on my part, yes manifests are no longer bound to a partition spec!

RussellSpitzer · 2026-05-12T14:59:44Z

+    | _optional_ | _optional_ | `schema-id`         | ID of the schema used to write the manifest as a string                                                                                     |
+    | _optional_ | _optional_ | `partition-spec-id` | ID of the partition spec used to write the manifest as a string                                                                             |
+    | _optional_ | _optional_ | `format-version`    | Table format version number of the manifest as a string                                                                                     |
+    | _optional_ | _optional_ | `content`           | Type of content files tracked by the manifest: "data" or "deletes"                                                                          |


In V4 this is data or "equality deletes" but I think it's fine to just call it deletes

- Collapse v1/v2/v3 separate tabs into single v1-v3 tab across all manifest sections
- Add v4 tab to Data File Fields with content_entry, tracking, and deletion_vector structs using Write/Read columns
- Reconcile v4 architecture prose from v4-amt-changes: root manifest concept, Parquet format, partition spec binding, updated terms/glossary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…add matching reader

V4Writer and V4DeleteWriter now emit content_entry Parquet rows via TrackedFileWrapper/ContentEntryAdapter rather than the legacy manifest_entry Avro shape. ContentEntryReader and ContentEntryManifestReaderAdapter project content_entry rows back to ManifestEntry<DataFile/DeleteFile> so all downstream consumers (ManifestGroup, MergingSnapshotProducer rewrite paths) work unchanged.

Read-path dispatch in ManifestFiles is layered:

1. Avro manifests are always legacy (no file inspection).
2. Snapshot-tree callers thread an Integer writerFormatVersion hint through the new package-private read overloads: 1 routes to ContentEntryReader, 0 routes to legacy.
3. Callers without a hint (tests writing-then-reading, ad-hoc tooling) fall back to inspecting the Parquet footer schema for field id 134 (content_type) or 147 (tracking). The footer read is delegated to InternalParquet via DynMethods so core has no compile-time dependency on iceberg-parquet.

Key design choices:

- TrackedFile.schemaWithContentStats omits partition and content_stats when their struct types are empty (Parquet rejects empty groups).
- TrackedFileWrapper uses hasPartition/hasContentStats flags to map positions dynamically when either optional group is absent.
- V4Writer.add(DataFile) bypasses Delegates.suppressFirstRowId so per-entry firstRowId is stored in the tracking struct rather than at manifest level.
- ContentEntryReader.setEntry uses wrapAppendPreservingFirstRowId for ADDED entries so firstRowId read from the tracking struct is not re-suppressed.
- ContentEntryAdapter preserves firstRowId for EXISTING entries so uncommitted manifests can round-trip per-entry row IDs.
- ContentEntryManifestReaderAdapter applies the same committed/uncommitted firstRowId nullification logic as ManifestReader.idAssigner.
- ContentEntryManifestReaderAdapter.iterator tracks ordinal position and sets fileOrdinal and manifestLocation on each BaseFile to match Avro reader behavior.
- Parquet.readSchema(InputFile) is a new public helper that returns just the Iceberg-converted file schema; InternalParquet.readSchema delegates to it for the DynMethods entry point.
- v4 spec forbids content_type=POSITION_DELETES (PR apache#16025); three TestManifestReader tests that write standalone position-delete files / DV delete files are guarded with assumeThat isLessThan(4) and will be removed once PR apache#16677 (or its successor) gates v4 out of the broad parameterized test suite during incubation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
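The layered read-path dispatch described in this commit message can be sketched as follows. This is an illustration in Python, not the Java code from the commit: `choose_reader`, its parameters, and the returned labels are hypothetical names. Two behaviours are assumptions that the message does not state: rejecting hint values other than 0 and 1, and defaulting to the legacy reader when the footer has neither field id.

```python
from typing import Optional, Set

CONTENT_TYPE_FIELD_ID = 134  # content_type
TRACKING_FIELD_ID = 147      # tracking


def choose_reader(file_format: str,
                  writer_format_version: Optional[int] = None,
                  footer_field_ids: Optional[Set[int]] = None) -> str:
    """Return "content_entry" or "legacy" following the three layers."""
    # Layer 1: Avro manifests are always legacy; the file is not inspected.
    if file_format == "avro":
        return "legacy"

    # Layer 2: an explicit hint wins; 1 -> content_entry, 0 -> legacy.
    if writer_format_version is not None:
        if writer_format_version == 1:
            return "content_entry"
        if writer_format_version == 0:
            return "legacy"
        # Assumption: other hint values are treated as errors.
        raise ValueError(f"unexpected writer format version: {writer_format_version}")

    # Layer 3: no hint, so look for the v4 field ids in the Parquet footer schema.
    ids = footer_field_ids or set()
    if CONTENT_TYPE_FIELD_ID in ids or TRACKING_FIELD_ID in ids:
        return "content_entry"
    # Assumption: a Parquet file with neither field id is read as legacy.
    return "legacy"
```

For example, `choose_reader("parquet", footer_field_ids={1, 147})` selects the content_entry path without a hint, because field id 147 is present in the footer.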

gaborkaszab · 2026-06-12T05:57:14Z

+    |----------|------|------|-------|------|-------------|
+    | 0 | **`status`** | `int` (0: EXISTING, 1: ADDED, 2: DELETED, 3: REPLACED) | *required* | *required* | Used to track additions, deletions, and replacements. REPLACED indicates entries with data column updates or `deletion_vector` changes. Deleted entries are required when the snapshot has a non-null parent. Deletes are not used in scans. |
+    | 1 | **`snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the file was added or deleted. Inherited when null. Optional for leaf manifests, required for root. |
+    | 5 | **`dv_snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the deletion vector was added. Inherited when null. Must be null when `deletion_vector` is null. |


According to the latest implementation, the purpose of dv_snapshot_id has changed: It's no longer just for deletion_vector, we set it also when adding deleted_positions and/or replaced_positions.

Discussed offline and @gaborkaszab published a PR #16823, but I think the implementation previously wasn't quite right. I think we don't want to require setting dv_snapshot_id for the diff DV changes since the diff DV presence itself implies that the changes were in that snapshot.

I think we will want to also add a column file snapshot ID as well, will get that on my next update.
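For reference, a minimal sketch of the `snapshot_id` and `dv_snapshot_id` rules exactly as the table in the diff words them ("Inherited when null", "Must be null when `deletion_vector` is null"). It does not reflect the revised semantics discussed in this thread. The `Tracking` class, `resolve_tracking`, and the `inherited_snapshot_id` parameter are hypothetical names, and where the inherited value comes from is an assumption left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tracking:
    """Subset of the proposed tracking struct (field ids 1 and 5)."""
    snapshot_id: Optional[int] = None
    dv_snapshot_id: Optional[int] = None


def resolve_tracking(tracking: Tracking,
                     has_deletion_vector: bool,
                     inherited_snapshot_id: int) -> Tracking:
    """Apply the null-inheritance rules quoted in the table above."""
    # dv_snapshot_id must be null when deletion_vector is null.
    if not has_deletion_vector and tracking.dv_snapshot_id is not None:
        raise ValueError("dv_snapshot_id must be null when deletion_vector is null")

    # snapshot_id is inherited when null.
    snapshot_id = tracking.snapshot_id
    if snapshot_id is None:
        snapshot_id = inherited_snapshot_id

    # dv_snapshot_id is inherited when null, but only if a DV is present.
    dv_snapshot_id = tracking.dv_snapshot_id
    if has_deletion_vector and dv_snapshot_id is None:
        dv_snapshot_id = inherited_snapshot_id

    return Tracking(snapshot_id, dv_snapshot_id)
```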

amogh-jahagirdar · 2026-06-22T15:57:44Z

+
+    | Field id | Name | Type | Write | Read | Description |
+    |----------|------|------|-------|------|-------------|
+    | 0 | **`status`** | `int` (0: EXISTING, 1: ADDED, 2: DELETED, 3: REPLACED, 4: MODIFIED) | *required* | *required* | Used to track additions, deletions, replacements, and modifications. When a data file's `deletion_vector` or `column_files` change, REPLACED marks the prior version of the entry and MODIFIED marks the new, live version. For leaf manifest entries, MODIFIED marks a live manifest whose `dv` changed. Deletes are not used in scans. |


We should have a different place to keep all the rules for Replaced/modified, I don't think it should be here in the description
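As a starting point for collecting those rules in one place, here is a small sketch of the liveness reading of the five status values in the diff above. The status numbers come from the table. Treating EXISTING and ADDED as live follows the existing manifest entry semantics, MODIFIED is described as the new, live version, and treating REPLACED as not live is an assumption drawn from "marks the prior version of the entry". The helper names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, List


class Status(IntEnum):
    """status values from the proposed v4 tracking struct."""
    EXISTING = 0
    ADDED = 1
    DELETED = 2
    REPLACED = 3  # prior version of an entry whose DV or column files changed
    MODIFIED = 4  # new, live version of that entry


# Assumption: only these statuses contribute entries to a scan.
LIVE_STATUSES = frozenset({Status.EXISTING, Status.ADDED, Status.MODIFIED})


def is_live(status: int) -> bool:
    """True if an entry with this status should be visible to scans."""
    return Status(status) in LIVE_STATUSES


def live_entries(entries: Iterable[Dict]) -> List[Dict]:
    """Filter entries (dicts with a "status" key) down to the live ones."""
    return [entry for entry in entries if is_live(entry["status"])]
```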

amogh-jahagirdar · 2026-06-22T15:59:23Z

+    |----------|------|------|-------|------|-------------|
+    | 0 | **`status`** | `int` (0: EXISTING, 1: ADDED, 2: DELETED, 3: REPLACED) | *required* | *required* | Used to track additions, deletions, and replacements. REPLACED indicates entries with data column updates or `deletion_vector` changes. Deleted entries are required when the snapshot has a non-null parent. Deletes are not used in scans. |
+    | 1 | **`snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the file was added or deleted. Inherited when null. Optional for leaf manifests, required for root. |
+    | 5 | **`dv_snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the deletion vector was added. Inherited when null. Must be null when `deletion_vector` is null. |


Discussed offline and @gaborkaszab published a PR #16823, but I think the implementation previously wasn't quite right. I think we don't want to require setting dv_snapshot_id for the diff DV changes since the diff DV presence itself implies that the changes were in that snapshot.

amogh-jahagirdar · 2026-06-22T15:59:42Z

+    |----------|------|------|-------|------|-------------|
+    | 0 | **`status`** | `int` (0: EXISTING, 1: ADDED, 2: DELETED, 3: REPLACED) | *required* | *required* | Used to track additions, deletions, and replacements. REPLACED indicates entries with data column updates or `deletion_vector` changes. Deleted entries are required when the snapshot has a non-null parent. Deletes are not used in scans. |
+    | 1 | **`snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the file was added or deleted. Inherited when null. Optional for leaf manifests, required for root. |
+    | 5 | **`dv_snapshot_id`** | `long` | *optional* | *optional* | Snapshot ID where the deletion vector was added. Inherited when null. Must be null when `deletion_vector` is null. |


I think we will want to also add a column file snapshot ID as well, will get that on my next update.

stevenzwu · 2026-06-24T21:57:15Z

+    | 147 | **`tracking`** | `tracking` struct | *required* | *required* | Groups status, snapshot, and sequence number. See tracking struct below. |
+    | 141 | **`spec_id`** | `int` | *optional* | *optional* | ID of the partition spec used to write this manifest or data file. |
+    | 140 | **`sort_order_id`** | `int` | *optional* | *optional* | ID representing sort order for this file. |
+    | 103 | **`record_count`** | `long` | *required* | *required* | Number of records in this file. |


for a leaf manifest file entry, this captures the number of data files in the manifest file, right? we are not talking about the sum of the number of rows in every data file. ManifestInfo has those added/deleted/replaced-rows-count.

Hey, yes: for a manifest this should just be the physical record count in the manifest, just like it'd be the physical record count in a data file. As you said, manifest_info already captures all the aggregated stats information, so I think that's the correct separation. I can make this a bit clearer in the spec.

github-actions Bot added the Specification Issues that may introduce spec changes. label Apr 18, 2026

amogh-jahagirdar force-pushed the v4-amt-changes branch from 544e59e to f3d76cd Compare April 21, 2026 17:15

nastra mentioned this pull request Apr 24, 2026

Spec: Add content stats to spec #14234

Merged

amogh-jahagirdar changed the title ~~V4 amt spec changes~~ Spec: V4 Adaptive Metadata Tree Spec Changes for Entry Structures Apr 28, 2026

amogh-jahagirdar commented Apr 28, 2026

amogh-jahagirdar marked this pull request as ready for review April 28, 2026 18:49

amogh-jahagirdar requested review from RussellSpitzer, danielcweeks, nastra, rdblue and stevenzwu May 5, 2026 23:39

stevenzwu added this to V4: metadata tree May 6, 2026

stevenzwu reviewed May 11, 2026

RussellSpitzer reviewed May 12, 2026

anuragmantri mentioned this pull request May 19, 2026

Spec: Add V4 column updates to the spec #16425

Draft

amogh-jahagirdar and others added 6 commits May 31, 2026 20:46

Spec: Udpate formatting to use material content tabs

90223bb

Collapse v1-v3 into a single tab

d938b83

fixes

ae4998e

more cleanup/fixes

a088065

address some comments

9daff91

amogh-jahagirdar force-pushed the v4-amt-changes branch from 16a16d6 to 9daff91 Compare June 1, 2026 04:12

gaborkaszab reviewed Jun 12, 2026

add modified status wording, fix split_offsets, sort order ID

a4d0ca1

amogh-jahagirdar commented Jun 22, 2026

stevenzwu reviewed Jun 24, 2026

5 participants
