feat: sql catalog support update table#862

Closed
Li0k wants to merge 5 commits into
apache:mainfrom
Li0k:li0k/catalog_sql_update_table

@Li0k Li0k commented Dec 31, 2024

Contributor

This PR adds support for the update_table interface in the SQL catalog:

  • support update_table
  • add some unit tests

Other PRs for reference:

After these PRs have been merged, we can use a SQL database as the catalog backend.

Li0k commented Dec 31, 2024

Contributor Author

cc @Xuanwo @liurenjie1024 @ZENOTME

@liurenjie1024 liurenjie1024 left a comment

Contributor

Thanks @Li0k for this PR, but I have concerns about introducing update table at this moment, as many features are still missing, such as conflict detection and commit retry.


/// Returns snapshot references.
#[inline]
pub fn snapshot_refs(&self) -> &HashMap<String, SnapshotReference> {

Contributor

Why do we add this method? We already have a lookup method for snapshots.

/// TableCommit represents the commit of a table in the catalog.
#[derive(Debug, TypedBuilder)]
-#[builder(build_method(vis = "pub(crate)"))]
+#[builder(build_method(vis = "pub"))]

Contributor

The reason we make TableCommit crate-only is that we don't want to allow users to build it manually; all table commit construction should go through the transaction API.

update_table_metadata_builder = table_update.apply(update_table_metadata_builder)?;
}

for table_requirement in requirements {

@DerGut DerGut May 21, 2025

Contributor

Shouldn't the requirements be checked in a transaction (that executes the update statement)? Otherwise a conflicting concurrent commit can update first and we end up in a broken table state.

The table metadata that's used to validate the requirements would also need to be loaded within the transaction.
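
To illustrate the concern, here is a minimal, self-contained sketch. It is not the PR's code and does not talk to a real database: `CatalogRow`, `Requirement`, `CommitError`, and `commit` are hypothetical names, and a `Mutex` stands in for a SQL transaction so that loading the current state, checking the requirements, and writing the new metadata pointer happen as one atomic step.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the catalog row that tracks a table's current state.
#[derive(Debug, Clone, PartialEq)]
struct CatalogRow {
    metadata_location: String,
    current_snapshot_id: Option<i64>,
}

/// Hypothetical, simplified requirement: the table must still be at this snapshot.
#[derive(Debug)]
enum Requirement {
    SnapshotIdMatches(Option<i64>),
}

#[derive(Debug, PartialEq)]
enum CommitError {
    RequirementFailed,
}

impl Requirement {
    /// Validate the requirement against the row as it is right now.
    fn check(&self, row: &CatalogRow) -> Result<(), CommitError> {
        match self {
            Requirement::SnapshotIdMatches(expected)
                if *expected == row.current_snapshot_id =>
            {
                Ok(())
            }
            Requirement::SnapshotIdMatches(_) => Err(CommitError::RequirementFailed),
        }
    }
}

/// The lock plays the role of the database transaction: the row is loaded,
/// the requirements are checked, and the update is applied while it is held,
/// so no other commit can slip in between the check and the write.
fn commit(
    catalog: &Mutex<CatalogRow>,
    requirements: &[Requirement],
    new_row: CatalogRow,
) -> Result<(), CommitError> {
    let mut row = catalog.lock().unwrap();
    for requirement in requirements {
        requirement.check(&row)?;
    }
    *row = new_row;
    Ok(())
}
```

In this sketch, two commits built from the same base snapshot cannot both succeed: the second one sees the row written by the first and fails its requirement check instead of overwriting it.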

@DerGut DerGut May 21, 2025

Contributor

I believe it would also make sense to explicitly set a transaction isolation level of repeatable read. Postgres, for example, defaults to read committed, which can similarly get us into a broken table state:

Read committed allows us to see different versions of the same row between the SELECT statement (which we use to validate the commit requirements) and the UPDATE statement. Effectively, a conflicting update that commits concurrently between our SELECT and UPDATE will still allow our UPDATE to succeed. We would have checked only the old table requirements, never the new ones, so we end up in a broken state.

With repeatable read on the other hand, the UPDATE should safely fail with a serialization error.
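
The difference can be sketched with a toy model. This is only an illustration under stated assumptions, not Postgres itself and not the PR's code: `Row`, `Isolation`, `Txn`, `UpdateError`, and `race` are hypothetical names, a version counter stands in for row versions, and the UPDATE is assumed to carry no guard on the old value in its WHERE clause.

```rust
/// Toy model of one catalog row; `version` is bumped on every committed write.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    version: u64,
    metadata_location: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Isolation {
    ReadCommitted,
    RepeatableRead,
}

#[derive(Debug, PartialEq)]
enum UpdateError {
    SerializationFailure,
}

/// A transaction that remembers which row version its first read saw.
struct Txn {
    isolation: Isolation,
    snapshot_version: u64,
}

impl Txn {
    /// Models the SELECT that is used to validate the commit requirements.
    fn begin_and_select(isolation: Isolation, row: &Row) -> (Txn, Row) {
        (Txn { isolation, snapshot_version: row.version }, row.clone())
    }

    /// Models the UPDATE. Under repeatable read it is rejected if the row was
    /// changed by another committed transaction since our first read; under
    /// read committed it is applied to whatever the latest row version is.
    fn update(&self, row: &mut Row, new_location: &str) -> Result<(), UpdateError> {
        let changed_since_read = row.version != self.snapshot_version;
        if changed_since_read && self.isolation == Isolation::RepeatableRead {
            return Err(UpdateError::SerializationFailure);
        }
        row.version += 1;
        row.metadata_location = new_location.to_string();
        Ok(())
    }
}

/// Replays the interleaving described above: we SELECT, a conflicting writer
/// commits, then we UPDATE. Returns our outcome and the final row.
fn race(isolation: Isolation) -> (Result<(), UpdateError>, Row) {
    let mut row = Row { version: 1, metadata_location: "v1.json".to_string() };
    let (ours, _validated_against) = Txn::begin_and_select(isolation, &row);

    // A concurrent writer reads, updates, and commits before our UPDATE runs.
    let (theirs, _) = Txn::begin_and_select(isolation, &row);
    theirs.update(&mut row, "v2-theirs.json").unwrap();

    let outcome = ours.update(&mut row, "v2-ours.json");
    (outcome, row)
}
```

In this model, `race(Isolation::ReadCommitted)` lets our update silently overwrite the concurrent commit, while `race(Isolation::RepeatableRead)` returns a serialization failure and leaves the concurrent writer's row in place, so the caller can reload the metadata and re-check the requirements.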

@github-actions

Contributor

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Feb 26, 2026
github-actions Bot commented Mar 5, 2026

Contributor

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Mar 5, 2026