Spec: Add V4 column updates to the spec#16425

Draft
anuragmantri wants to merge 2 commits into
apache:mainfrom
anuragmantri:spec/column-updates-semantics

@anuragmantri

Collaborator

These are the spec changes for V4 column updates from this proposal. They are meant to land in conjunction with the other V4 spec PRs: #16025, #14234, and #15630.

@anuragmantri anuragmantri marked this pull request as draft May 19, 2026 14:42
@github-actions github-actions Bot added the Specification Issues that may introduce spec changes. label May 19, 2026
@anuragmantri anuragmantri force-pushed the spec/column-updates-semantics branch from d997ce3 to 7f300b3 Compare June 1, 2026 16:58
Comment thread format/spec.md

* Rows in a column file with a null `_last_updated_sequence_number` inherit the base data file's (bumped) data sequence number. This applies to rows whose values were updated in the commit that produced the column file.
* Rows that carry over unchanged values from a prior column file retain their original `_last_updated_sequence_number`, which the writer materializes as a physical column in the column file.
* The `_row_id` for updated rows is unchanged; it is derived from the base data file's `first_row_id` and the row's position. Column files do not reassign row ids.
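
For readers of this thread, the three rules above can be sketched as a small read-side resolution step. This is only an illustration under stated assumptions, not code from the PR or from any Iceberg library: `BaseDataFile` and `resolve_row_lineage` are hypothetical names, the column file is modeled as a plain list of stored `_last_updated_sequence_number` values (with `None` for null), and `data_sequence_number` is assumed to already be the bumped value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BaseDataFile:
    """Hypothetical stand-in for the base data file's manifest metadata."""
    first_row_id: int
    data_sequence_number: int  # assumed to be the already-bumped value


def resolve_row_lineage(
    base: BaseDataFile,
    stored_seqs: List[Optional[int]],
) -> List[Tuple[int, int]]:
    """Return (_row_id, _last_updated_sequence_number) for each row position.

    stored_seqs[i] is the value materialized in the column file for row i,
    or None when the row was updated in the commit that wrote the file.
    """
    resolved = []
    for pos, stored in enumerate(stored_seqs):
        # _row_id is never reassigned: first_row_id plus the row's position.
        row_id = base.first_row_id + pos
        # A null sequence number inherits from the base data file;
        # a materialized one (carried-over row) is kept as written.
        seq = base.data_sequence_number if stored is None else stored
        resolved.append((row_id, seq))
    return resolved


base = BaseDataFile(first_row_id=100, data_sequence_number=7)
# Rows 0 and 2 were updated in this commit; row 1 was carried over from sequence 4.
print(resolve_row_lineage(base, [None, 4, None]))
```

In this sketch, the carried-over row keeps sequence number 4 while the two updated rows pick up 7, and all three row ids come purely from `first_row_id` and position.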

@Tishj Tishj Jun 18, 2026

This is confusing to me. It assumes that the `_row_id` value isn't materialized yet (and can therefore be derived from the "base" `data_file`), but I don't understand how we can make that assumption.

Wait, I just realized: the `_row_id` of a data file should not be in a column data file.
I finally understand this section, but it's not worded very clearly.
I think it should instead say:
Column data files MUST NOT contain the `_row_id` field (field-id 2147483540). Regular "Row ID Assignment" (link to section) rules for reading apply.
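
If wording like the suggestion above were adopted, a writer or validator could enforce it with a simple field-id check. The following is a minimal sketch, assuming the column file's schema is available as a list of field ids; `column_file_has_row_id` is a hypothetical helper, not an existing Iceberg API, and the only value taken from this thread is the field id 2147483540.

```python
from typing import Iterable

# Reserved field id for _row_id, as quoted in the comment above.
ROW_ID_FIELD_ID = 2147483540


def column_file_has_row_id(field_ids: Iterable[int]) -> bool:
    """Return True if a column file schema (given as field ids) contains _row_id.

    Under the suggested wording, a True result would mean the file is invalid.
    """
    return ROW_ID_FIELD_ID in set(field_ids)


# A column file carrying two user columns: acceptable under the suggested rule.
print(column_file_has_row_id([3, 5]))
# A column file that also materializes _row_id: would be rejected.
print(column_file_has_row_id([3, 5, ROW_ID_FIELD_ID]))
```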
