Hive: Set the Table owner on table creation#5763
Conversation
|
This didn't get much attention I'm afraid. @pvary would you mind taking a quick look? |
|
Is there a way to do this without a change in the core? |
|
In PyIceberg we had a similar concern. We ended up by fetching username from the properties: https://github.com/apache/iceberg/blob/master/python/pyiceberg/catalog/hive.py#L317 Similar to what we do for location ( |
c72e5df to
8a0b5ca
Compare
Thanks for taking a look, @Fokko ! For some reason I figured that using a table property to provide the owner is not the proper way to go. However, I changed my patch to use a table property as you suggested. This frankly seems a way simpler implementation. |
Thanks for taking a look, @pvary ! After changing my patch as Fokko suggested there is still a line of change in core: Adding the constant for the new tableproperty. Do you think this can remain or should I find another place in hive-metastore? |
8a0b5ca to
a906297
Compare
| TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl"); | ||
| String location = temp.newFolder("tbl").toString(); | ||
| String owner = "some_owner"; | ||
| ImmutableMap<String, String> properties = |
There was a problem hiding this comment.
Does this also write the username to the HMS table property list?
Is this what we expect?
There was a problem hiding this comment.
Does this also write the username to the HMS table property list? Is this what we expect?
Good point! This was also added ad an HMS table property but I think it's not needed. I changed the code to only set the owner of the table but not to add the owner field as a table property to HMS.
a906297 to
abb7818
Compare
|
|
||
| // 'hms_table_owner' is only used for setting the owner of the table during table creation. No | ||
| // need to add it as a table property to HMS. | ||
| parameters.remove(TableProperties.HMS_TABLE_OWNER); |
There was a problem hiding this comment.
Any reason why you did not put this into the previous foreach?
There was a problem hiding this comment.
Nothing specific. Moved this to a filter() before the foreach()
abb7818 to
5c90479
Compare
|
I think the broken build here is due to a flaky test in TestSnapshotUtil. 62eb74c this is meant to fix it but wasn't part of this build. |
|
@gaborkaszab: Could you please check the failed test? Otherwise LGTM +1 |
I would just change the comment of the last commit and force push to the branch |
5c90479 to
caee19d
Compare
When HiveCatalog is used for creating a table the table owner property is taken from a system property of the process running Iceberg. However, in some clients (e.g. Apache Impala) it's not the owner of the process who runs the create table operation and as a result not the desired owner gets written to HMS. For instance, in Impala's case the owner of Catalogd (the process that makes the Iceberg API calls) was set as the table owner and not the user who ran the CREATE TABLE statement. I changed Hive Catalog to accept an owner via a table property from the clients. Note, other Catalog implementations aren't affected because the issue is related to HMS only.
caee19d to
b51c6e6
Compare
thx. a rebase and a re-run helped. |
|
Thanks for the PR @gaborkaszab! |
IMPALA-11429 made the creation of an Iceberg table happen in 2 steps. The first step creates the table, however, with wrong owner and the second step is an ALTER TABLE to set the correct table owner. Since Iceberg 1.1.0 there is now a way to provide a table owner via a table property so we can make the create table operation to take one step again. apache/iceberg#5763 apache/iceberg#6154 This patch implements this behavior, when creating an Iceberg table through HiveCatalog, specifying HMS_TABLE_OWNER as hive.metastore.table.owner in properties can do it in one step. Testing: - Use the existing test test_iceberg.py test_table_owner. Change-Id: I56ef7929449105af571d1fb9cb585d9b0733a39d Reviewed-on: http://gerrit.cloudera.org:8080/20646 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When HiveCatalog is used for creating a table the table owner property is taken from a system property of the process running Iceberg. However, in some clients (e.g. Apache Impala) it's not the owner of the process who runs the create table operation and as a result not the desired owner gets written to HMS.
For instance, in Impala's case the owner of Catalogd (the process that makes the Iceberg API calls) was set as the table owner and not the user who ran the CREATE TABLE statement.
I changed HiveCatalog to accept an owner parameter from the clients. Note, other Catalog implementations aren't affected because the issue is related to HMS only.