
HIVE-26882: Allow transactional check of Table parameter before altering the Table #3888

Merged
pvary merged 3 commits into apache:master from pvary:expected on Jan 8, 2023


@pvary (Contributor) commented Dec 21, 2022

What changes were proposed in this pull request?

Adds the possibility to transactionally check whether a Table parameter has changed before altering the table in the HMS.

Why are the changes needed?

This would provide an alternative, less error-prone and faster way to commit an Iceberg table. Currently, an Iceberg commit needs to:

  • Create an exclusive lock
  • Get the table metadata to check that the current snapshot has not changed
  • Update the table metadata
  • Release the lock

After the change, these 4 HMS calls could be replaced with a single alter table call.
This would also avoid cases where locks are left hanging by failed processes.

Does this PR introduce any user-facing change?

HMS API changes:

  • Added expectedParameterKey and expectedParameterValue to the AlterTableRequest object
  • Alternatively, hive_metastoreConstants.EXPECTED_PARAMETER_KEY and hive_metastoreConstants.EXPECTED_PARAMETER_VALUE can be provided through the EnvironmentContext
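A minimal client-side sketch of the EnvironmentContext option, assuming the context properties behave like a plain Map<String, String>. The two constant values below are hypothetical stand-ins for hive_metastoreConstants.EXPECTED_PARAMETER_KEY and EXPECTED_PARAMETER_VALUE, and the class name ExpectedParameterContext is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpectedParameterContext {
    // HYPOTHETICAL values: stand-ins for hive_metastoreConstants.EXPECTED_PARAMETER_KEY and
    // EXPECTED_PARAMETER_VALUE; the real strings are defined by the HMS Thrift IDL.
    static final String EXPECTED_PARAMETER_KEY = "expected_parameter_key";
    static final String EXPECTED_PARAMETER_VALUE = "expected_parameter_value";

    // Builds the EnvironmentContext properties asking HMS to apply the alter only if
    // table parameter `key` still has the value `value`.
    static Map<String, String> expecting(String key, String value) {
        if (key == null || value == null) {
            // The server-side check is skipped unless both entries are present.
            throw new IllegalArgumentException("both key and value are needed for the check to run");
        }
        Map<String, String> props = new HashMap<>();
        props.put(EXPECTED_PARAMETER_KEY, key);
        props.put(EXPECTED_PARAMETER_VALUE, value);
        return props;
    }
}
```

With the Thrift-generated EnvironmentContext class, the two entries would presumably be copied in through its generated putToProperties method before calling the alter table API; the AlterTableRequest route carries the same pair in its expectedParameterKey and expectedParameterValue fields.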

How was this patch tested?

Added 2 new unit tests

github-actions bot requested a review from klcopp on December 21, 2022 09:10
@pvary changed the title from "HMS: Allow transactional check of Table parameter before altering the Table" to "HIVE-26882: Allow transactional check of Table parameter before altering the Table" on Dec 21, 2022
Review thread on the alter_table_core signature change:

 private void alter_table_core(String catName, String dbname, String name, Table newTable,
-    EnvironmentContext envContext, String validWriteIdList, List<String> processorCapabilities, String processorId)
+    EnvironmentContext envContext, String validWriteIdList, List<String> processorCapabilities,
+    String processorId, String expectedPropertyKey, String expectedPropertyValue)
Member

Did you consider allowing a list of table properties here?

Contributor Author

Iceberg does not explicitly need this, and it would somewhat complicate the code, so I decided against it. Still, I do not feel strongly either way, so if you think it could be useful, I am open to adding multiple checks based on a map (do we have maps in Thrift?)

Contributor Author

After some more thought, I would opt for keeping the PR as it is, checking only a single property.
If someone needs to check more properties, they could still create an uber property (a hash of the relevant properties) and use EXPECTED_PARAMETER_KEY and EXPECTED_PARAMETER_VALUE to check for changes to this uber property. And if that proves too hard, they could still add a new feature later.
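The uber-property workaround can be sketched as follows, assuming the writer maintains one extra table parameter holding a digest of the parameters it wants to guard. UberProperty and hashOf are hypothetical names, and SHA-256 is one reasonable choice of digest, not something the PR prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.Map;
import java.util.TreeSet;

final class UberProperty {
    // Hashes the selected table parameters into one hex string that could be stored as a
    // single "uber" parameter and guarded via EXPECTED_PARAMETER_KEY / EXPECTED_PARAMETER_VALUE.
    static String hashOf(Map<String, String> params, Collection<String> trackedKeys) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
        // Sorting makes the result independent of the caller's key order.
        for (String key : new TreeSet<>(trackedKeys)) {
            update(digest, key);
            update(digest, params.get(key));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // A presence marker plus a length prefix keep field boundaries unambiguous, so ("ab", "c")
    // and ("a", "bc") hash differently, and a missing parameter differs from an empty one.
    private static void update(MessageDigest digest, String s) {
        if (s == null) {
            digest.update((byte) 0);
            return;
        }
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        digest.update((byte) 1);
        digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        digest.update(bytes);
    }
}
```

A writer would recompute the digest whenever it changes a tracked parameter and store it under its own key; the expected value of its next alter is then the digest it last read.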

Member

Though I still think it's cleaner to support a multi-property transactional check, given that Table has a properties map, I'll leave it to you and @prasanthj.

Review thread on the expected-parameter check:

environmentContext.getProperties().get(hive_metastoreConstants.EXPECTED_PARAMETER_VALUE) : null;
if (expectedKey != null && expectedValue != null
&& !expectedValue.equals(oldt.getParameters().get(expectedKey))) {
throw new MetaException("The table has been modified. The parameter value for key '" + expectedKey + "' is '"
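A self-contained sketch of the condition in the excerpt, assuming the expected key and value have already been read from the request or the EnvironmentContext. ExpectedParameterCheck is a hypothetical name, IllegalStateException stands in for MetaException, and the message text is illustrative because the original message is truncated above.

```java
import java.util.Collections;
import java.util.Map;

final class ExpectedParameterCheck {
    // Same condition as the excerpt: the check only runs when both the expected key and the
    // expected value are set, and it fails if the current table's parameter differs.
    static void verify(Map<String, String> currentParameters, String expectedKey, String expectedValue) {
        // An unset Thrift map field is null, so treat a missing parameter map as empty.
        Map<String, String> params =
                currentParameters == null ? Collections.<String, String>emptyMap() : currentParameters;
        if (expectedKey != null && expectedValue != null
                && !expectedValue.equals(params.get(expectedKey))) {
            // Stand-in for MetaException; the wording is illustrative.
            throw new IllegalStateException("Table parameter '" + expectedKey + "' is '"
                    + params.get(expectedKey) + "', expected '" + expectedValue + "'");
        }
    }
}
```

Note that a parameter missing from the table also fails the check, because expectedValue.equals(null) is false.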
Member

@szehon-ho Dec 22, 2022

For my understanding: this only works if the user holds a table lock while calling alter, right?

If the user does not hold a lock, Hive has no internal lock to prevent two users from both seeing their expected value and proceeding, right?

Contributor Author

Thankfully, this is not the case.
HMS makes these changes using the underlying DB's transaction.
See:

Contributor Author

It turns out that you were right. We use the READ_COMMITTED isolation level for our commits.
Because of this, concurrent commits could both succeed, and data could be lost.

I had to explicitly set the isolation level for these transactions to REPEATABLE_READ.

@szehon-ho and @prasanthj, could you please check it when you are back from the holidays?

Thanks,
Peter
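The lost update discussed above can be illustrated without a database. This is an in-memory analogy, not HMS code: ParamStore and LostUpdateDemo are hypothetical, and a synchronized method stands in for the check and the write running in one sufficiently isolated transaction. Both scenarios replay a fixed sequence of two writers that each expect the value "v1".

```java
import java.util.HashMap;
import java.util.Map;

final class ParamStore {
    private final Map<String, String> params = new HashMap<>();

    ParamStore(String key, String value) {
        params.put(key, value);
    }

    synchronized String get(String key) {
        return params.get(key);
    }

    // Non-atomic variant: the caller checked earlier, so the write does not re-check.
    synchronized void put(String key, String value) {
        params.put(key, value);
    }

    // Atomic variant: check and write happen under the same lock, like a compare-and-set.
    synchronized boolean putIfExpected(String key, String expected, String value) {
        if (!expected.equals(params.get(key))) {
            return false;
        }
        params.put(key, value);
        return true;
    }
}

final class LostUpdateDemo {
    // Check-then-write: both writers validate against "v1" before either one writes,
    // so both believe they committed. Returns the number of "successful" commits.
    static int checkThenWrite(ParamStore store) {
        boolean aOk = "v1".equals(store.get("loc"));
        boolean bOk = "v1".equals(store.get("loc"));
        if (aOk) store.put("loc", "vA");
        if (bOk) store.put("loc", "vB"); // silently overwrites A's commit
        return (aOk ? 1 : 0) + (bOk ? 1 : 0);
    }

    // Atomic check-and-write: once A has committed, B's expectation no longer holds.
    static int compareAndSet(ParamStore store) {
        boolean aOk = store.putIfExpected("loc", "v1", "vA");
        boolean bOk = store.putIfExpected("loc", "v1", "vB"); // sees "vA", so it is rejected
        return (aOk ? 1 : 0) + (bOk ? 1 : 0);
    }
}
```

With check-then-write both writers report success and A's value is lost; with the atomic variant exactly one succeeds. How a given backing database enforces this at REPEATABLE_READ varies by vendor.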

Member

Thanks for checking. Is this a setting users can configure in Hive? If so, it sounds like we should document it as a prerequisite for users to enable this feature.

Contributor Author

Users do not have to do anything. We set the isolation level per transaction automatically in the code, similarly to what we do in the TxnHandler methods where the default isolation level is not enough.

Contributor Author

This is the relevant code part:

msdb.openTransaction(Constants.TX_REPEATABLE_READ)
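For comparison, the same idea expressed in plain JDBC might look like the sketch below. It uses only java.sql.Connection and is not the code behind msdb.openTransaction; the IsolatedTx helper and its ConnectionWork callback are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class IsolatedTx {
    // Hypothetical callback for the work done inside the transaction.
    interface ConnectionWork<T> {
        T run(Connection conn) throws SQLException;
    }

    // Runs `work` in a single transaction at REPEATABLE_READ, then restores the connection's
    // previous isolation level and auto-commit mode.
    static <T> T inRepeatableRead(Connection conn, ConnectionWork<T> work) throws SQLException {
        int previousIsolation = conn.getTransactionIsolation();
        boolean previousAutoCommit = conn.getAutoCommit();
        // Set before any statement runs: changing isolation mid-transaction is
        // implementation-defined in JDBC.
        conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
        conn.setAutoCommit(false);
        try {
            T result = work.run(conn);
            conn.commit();
            return result;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setTransactionIsolation(previousIsolation);
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Restoring the previous settings matters when connections are pooled, so that the stricter level does not leak into unrelated work.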

Member

Sounds good.

@pvary merged commit 7a5bc54 into apache:master on Jan 8, 2023
@pvary deleted the expected branch on January 8, 2023 22:02
@pvary (Contributor Author) commented Jan 8, 2023

Pushed to master.
Thanks @prasanthj and @szehon-ho for the review!

rkirtir pushed a commit to rkirtir/hive that referenced this pull request Jan 9, 2023
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)
pvary pushed a commit to pvary/hive that referenced this pull request Jan 13, 2023
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)
pvary pushed a commit to pvary/hive that referenced this pull request Jan 14, 2023
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)
pvary added a commit that referenced this pull request Jan 15, 2023
…ing the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (#3943)
pvary added a commit that referenced this pull request Jan 15, 2023
…ing the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (#3944)
pvary added a commit that referenced this pull request Jan 15, 2023
…ing the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (#3946)

* HIVE-17981 Create a set of builders for Thrift classes.  This closes #274.  (Alan Gates, reviewed by Peter Vary)

* HIVE-18355: Add builder for metastore Thrift classes missed in the first pass - FunctionBuilder (Peter Vary, reviewed by Alan Gates)

* HIVE-18372: Create testing infra to test different HMS instances (Peter Vary, reviewed by Marta Kuczora, Vihang Karajgaonkar and Adam Szita)

* HIVE-26882: Allow transactional check of Table parameter before altering the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)

Co-authored-by: Alan Gates <gates@hortonworks.com>
Co-authored-by: Peter Vary <pvary@cloudera.com>
Co-authored-by: Peter Vary <peter_vary4@apple.com>
pvary added a commit that referenced this pull request Jan 15, 2023
…ing the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (#3947)

* HIVE-17981 Create a set of builders for Thrift classes.  This closes #274.  (Alan Gates, reviewed by Peter Vary)

* HIVE-18355: Add builder for metastore Thrift classes missed in the first pass - FunctionBuilder (Peter Vary, reviewed by Alan Gates)

* HIVE-18372: Create testing infra to test different HMS instances (Peter Vary, reviewed by Marta Kuczora, Vihang Karajgaonkar and Adam Szita)

* HIVE-26882: Allow transactional check of Table parameter before altering the Table (#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)

Co-authored-by: Alan Gates <gates@hortonworks.com>
Co-authored-by: Peter Vary <pvary@cloudera.com>
Co-authored-by: Peter Vary <peter_vary4@apple.com>
yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho)
karjensia added a commit to nexr/hive that referenced this pull request Jan 2, 2024
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (apache#3944) (#5)

Co-authored-by: pvary <peter.vary.apache@gmail.com>
minyk pushed a commit to nexr/hive that referenced this pull request Feb 29, 2024
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (apache#3944) (#5)

Co-authored-by: pvary <peter.vary.apache@gmail.com>
mudit1289 pushed a commit to mudit1289/hive that referenced this pull request Apr 22, 2024
…ing the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (apache#3944)
mudit1289 pushed a commit to mudit1289/hive that referenced this pull request Jun 26, 2024
HIVE-26882: Allow transactional check of Table parameter before altering the Table (apache#3888) (Peter Vary reviewed by Prasanth Jayachandran and Szehon Ho) (apache#3944)
4 participants