Skip to content

Hive: Set the Table owner on table creation#5763

Merged
pvary merged 1 commit into
apache:masterfrom
gaborkaszab:table_owner
Oct 21, 2022
Merged

Hive: Set the Table owner on table creation#5763
pvary merged 1 commit into
apache:masterfrom
gaborkaszab:table_owner

Conversation

@gaborkaszab
Copy link
Copy Markdown
Contributor

When HiveCatalog is used for creating a table the table owner property is taken from a system property of the process running Iceberg. However, in some clients (e.g. Apache Impala) it's not the owner of the process who runs the create table operation and as a result not the desired owner gets written to HMS.
For instance, in Impala's case the owner of Catalogd (the process that makes the Iceberg API calls) was set as the table owner and not the user who ran the CREATE TABLE statement.

I changed HiveCatalog to accept an owner parameter from the clients. Note, other Catalog implementations aren't affected because the issue is related to HMS only.

@gaborkaszab
Copy link
Copy Markdown
Contributor Author

This didn't get much attention I'm afraid. @pvary would you mind taking a quick look?

@pvary
Copy link
Copy Markdown
Contributor

pvary commented Sep 30, 2022

Is there a way to do this without a change in the core?

@Fokko
Copy link
Copy Markdown
Contributor

Fokko commented Oct 3, 2022

In PyIceberg we had a similar concern. We ended up by fetching username from the properties: https://github.com/apache/iceberg/blob/master/python/pyiceberg/catalog/hive.py#L317 Similar to what we do for location (METADATA_LOCATION_PROP).

@gaborkaszab
Copy link
Copy Markdown
Contributor Author

In PyIceberg we had a similar concern. We ended up by fetching username from the properties: https://github.com/apache/iceberg/blob/master/python/pyiceberg/catalog/hive.py#L317 Similar to what we do for location (METADATA_LOCATION_PROP).

Thanks for taking a look, @Fokko ! For some reason I figured that using a table property to provide the owner is not the proper way to go. However, I changed my patch to use a table property as you suggested. This frankly seems a way simpler implementation.

@gaborkaszab
Copy link
Copy Markdown
Contributor Author

Is there a way to do this without a change in the core?

Thanks for taking a look, @pvary ! After changing my patch as Fokko suggested there is still a line of change in core: Adding the constant for the new tableproperty. Do you think this can remain or should I find another place in hive-metastore?

TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
String location = temp.newFolder("tbl").toString();
String owner = "some_owner";
ImmutableMap<String, String> properties =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also write the username to the HMS table property list?
Is this what we expect?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also write the username to the HMS table property list? Is this what we expect?

Good point! This was also added ad an HMS table property but I think it's not needed. I changed the code to only set the owner of the table but not to add the owner field as a table property to HMS.


// 'hms_table_owner' is only used for setting the owner of the table during table creation. No
// need to add it as a table property to HMS.
parameters.remove(TableProperties.HMS_TABLE_OWNER);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you did not put this into the previous foreach?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing specific. Moved this to a filter() before the foreach()

@gaborkaszab
Copy link
Copy Markdown
Contributor Author

I think the broken build here is due to a flaky test in TestSnapshotUtil. 62eb74c this is meant to fix it but wasn't part of this build.
Is there a way to re-execute the tests on this PR? (without pushing any new code to this review)

@pvary
Copy link
Copy Markdown
Contributor

pvary commented Oct 20, 2022

@gaborkaszab: Could you please check the failed test? Otherwise LGTM +1

@pvary
Copy link
Copy Markdown
Contributor

pvary commented Oct 20, 2022

I think the broken build here is due to a flaky test in TestSnapshotUtil. 62eb74c this is meant to fix it but wasn't part of this build. Is there a way to re-execute the tests on this PR? (without pushing any new code to this review)

I would just change the comment of the last commit and force push to the branch

When HiveCatalog is used for creating a table the table owner property
is taken from a system property of the process running Iceberg. However,
in some clients (e.g. Apache Impala) it's not the owner of the
process who runs the create table operation and as a result not the
desired owner gets written to HMS.
For instance, in Impala's case the owner of Catalogd (the process that
makes the Iceberg API calls) was set as the table owner and not the
user who ran the CREATE TABLE statement.

I changed Hive Catalog to accept an owner via a table property from the
clients.
Note, other Catalog implementations aren't affected because the issue
is related to HMS only.
@gaborkaszab
Copy link
Copy Markdown
Contributor Author

I think the broken build here is due to a flaky test in TestSnapshotUtil. 62eb74c this is meant to fix it but wasn't part of this build. Is there a way to re-execute the tests on this PR? (without pushing any new code to this review)

I would just change the comment of the last commit and force push to the branch

thx. a rebase and a re-run helped.

@pvary pvary merged commit b215c48 into apache:master Oct 21, 2022
@pvary
Copy link
Copy Markdown
Contributor

pvary commented Oct 21, 2022

Thanks for the PR @gaborkaszab!

@pvary pvary changed the title Core, Hive-metastore: Table owner can be incorrect in HMS Hive: Set the Table owner on table creation Oct 21, 2022
@gaborkaszab
Copy link
Copy Markdown
Contributor Author

Thanks for reviewing @pvary and @Fokko!

@gaborkaszab gaborkaszab deleted the table_owner branch October 21, 2022 10:47
asfgit pushed a commit to apache/impala that referenced this pull request Nov 16, 2023
IMPALA-11429 made the creation of an Iceberg table happen in 2 steps.
The first step creates the table, however, with wrong owner and the
second step is an ALTER TABLE to set the correct table owner.

Since Iceberg 1.1.0 there is now a way to provide a table owner via a
table property so we can make the create table operation to take one
step again.
apache/iceberg#5763
apache/iceberg#6154

This patch implements this behavior, when creating an Iceberg table
through HiveCatalog, specifying HMS_TABLE_OWNER as
hive.metastore.table.owner in properties can do it in one step.

Testing:
- Use the existing test test_iceberg.py test_table_owner.

Change-Id: I56ef7929449105af571d1fb9cb585d9b0733a39d
Reviewed-on: http://gerrit.cloudera.org:8080/20646
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants