Hive: Arrange common part of the code for Iceberg View.#10001
Conversation
| protected void doCommit(TableMetadata base, TableMetadata metadata) { | ||
| boolean newTable = base == null; | ||
| String newMetadataLocation = writeNewMetadataIfRequired(newTable, metadata); | ||
| boolean hiveEngineEnabled = hiveEngineEnabled(metadata, conf); |
There was a problem hiding this comment.
All this common code has been moved to HiveOperationsBase .
There was a problem hiding this comment.
see my comment in https://github.com/apache/iceberg/pull/10001/files#r1533430710. Please revert these changes here. Once view support is in, we can revisit what we'll extract out
| && e.getMessage().contains(String.format("new table %s already exists", to))) { | ||
| throw new org.apache.iceberg.exceptions.AlreadyExistsException( | ||
| "Table already exists: %s", to); | ||
| throwErrorForExistedToContent(fromIdentifier, removeCatalogName(toIdentifier)); |
There was a problem hiding this comment.
I don't think we want to do this here, as this would do another call to retrieve the table. I would leave the code here as it was before
There was a problem hiding this comment.
Motivation behind this changes is
ViewCatalogTests expects error message like Cannot rename %s to %s. %s already exists. It should explicitly tell the upstream TO tableType.
Also we are making only one call to retrieve TO content.
Nevertheless, What do you propose to match the error messages for ViewCatalogTests ?
There was a problem hiding this comment.
what I mean is that we shouldn't be doing whather it's doing in throwErrorForExistedToContent() at this point and just throw the AlreadyExistsException as it was done previously. We don't want do add new functionality in this PR IMO
| Maps.newHashMap(), | ||
| null, | ||
| null, | ||
| properties.getOrDefault(HiveCatalog.VIEW_ORIGINAL_TEXT, null), |
There was a problem hiding this comment.
why is this adding view-specific things?
There was a problem hiding this comment.
I think this might not be specific to Hive-View. This is another field in Hive-Table . IMO, This field either we can ignore for both iceberg-table and view or make it optional with properties.
Ref: https://github.com/apache/hive/blame/df45194268ffb10f9f7aae0ab4e3aec35a7b31d8/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java#L23
There was a problem hiding this comment.
I think this PR should be a pure refactoring and not add new things/functionality. The current code on main doesn't look at this property, so why do we need to check this property as part of this PR?
|
@nastra , Please let me know if there are more comments ? |
| * @param newMetadataLocation newly written metadata location | ||
| * @return true if the new metadata location presents with current or previous metadata files. | ||
| */ | ||
| protected boolean checkCurrentMetadataLocation(String newMetadataLocation) { |
There was a problem hiding this comment.
why does this have to be protected?
There was a problem hiding this comment.
Not Private , bcoz It should be available outside of the core-package. Not public as it should be only scoped too child class. Do we need public privilege here ?
There was a problem hiding this comment.
I think this should be private. Why would it need to be available for subclasses? I don't think subclasses would call this code?
There was a problem hiding this comment.
It depends how the checkCommitStatus is being called. It it is using
BaseMetastoreTableOperations.checkCommitStatus it can be private . it it is using BaseMetastoreOperations.checkCommitStatus it should be protected.
| * Checks the new metadata location presents or not after refreshing the table. | ||
| * | ||
| * @param newMetadataLocation newly written metadata location | ||
| * @return true if the new metadata location presents with current or previous metadata files. |
There was a problem hiding this comment.
| * @return true if the new metadata location presents with current or previous metadata files. | |
| * @return true if the new metadata location is the current metadata location or present within previous metadata files. |
please also update the javadoc in L317 with a similar sentence as here
| TableIdentifier from, | ||
| TableIdentifier originalTo, | ||
| HiveOperationsBase.ContentType contentType) { | ||
| Preconditions.checkArgument(isValidIdentifier(from), "Invalid identifier: %s", from); |
There was a problem hiding this comment.
this is changing existing behavior. Previously it would throw NoSuchTableException
There was a problem hiding this comment.
Intent here was to have common exception for Table/View. But I think with View impl we will have to revisit this.
| } | ||
|
|
||
| @SuppressWarnings("checkstyle:CyclomaticComplexity") | ||
| default void commitWithHiveLock( |
There was a problem hiding this comment.
can we revert any changes in this class starting from L276 to L473? I feel like we're extracting too much functionality into a common place and some things are slightly different for tables/views later.
That's why I think it would be better to revert L276 to L473 at this point. In the first iteration of view support it's probably ok if some things are duplicated and once View support for Hive is in, we can do another round of refactoring to extract what's really common
| return null; | ||
| } | ||
| } | ||
| public void setHmsParameters( |
There was a problem hiding this comment.
It has been split into two parts , one is specific to Table like snapshot,stats, etc. Other pat is common parameters as schema, location, uuid.
There was a problem hiding this comment.
please revert this to how it was before
…able/View with other comments.
90e8aaa to
ac26860
Compare
| tableName, | ||
| e); | ||
| commitStatus = checkCommitStatus(newMetadataLocation, metadata); | ||
| commitStatus = |
There was a problem hiding this comment.
Here we can use the original method from BaseMetastoreTableOperations.checkCommitStatus too.
…us instead of BaseMetastoreOperations.checkCommitStatus for Hive-Table.
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class BaseMetastoreOperations { |
There was a problem hiding this comment.
| public class BaseMetastoreOperations { | |
| public abstract class BaseMetastoreOperations { |
| public void fullTableNameWithDifferentValues() { | ||
| String uriTypeCatalogName = "thrift://host:port/db.table"; | ||
| String nameSpaceWithOneLevel = "ns"; | ||
| String nameSpaceWithTwoLevel = "ns.l2"; |
There was a problem hiding this comment.
| String nameSpaceWithTwoLevel = "ns.l2"; | |
| String namespaceWithTwoLevels = "ns.l2"; |
| @Test | ||
| public void fullTableNameWithDifferentValues() { | ||
| String uriTypeCatalogName = "thrift://host:port/db.table"; | ||
| String nameSpaceWithOneLevel = "ns"; |
There was a problem hiding this comment.
| String nameSpaceWithOneLevel = "ns"; | |
| String namespace = "ns"; |
| try { | ||
| Table table = clients.run(client -> client.getTable(fromDatabase, fromName)); | ||
| HiveOperationsBase.validateTableIsIceberg(table, fullTableName(name, from)); | ||
| validateTableIsIcebergTableOrView(contentType, name, table, from); |
There was a problem hiding this comment.
rather than passing the catalog name / table / from separately, it's probably better to just pass the table and the fullTableName: fullTableName(name, from)
|
|
||
| private void validateTableIsIcebergTableOrView( | ||
| HiveOperationsBase.ContentType contentType, | ||
| String catalogName, |
There was a problem hiding this comment.
| String catalogName, |
| HiveOperationsBase.ContentType contentType, | ||
| String catalogName, | ||
| Table table, | ||
| TableIdentifier identifier) { |
There was a problem hiding this comment.
| TableIdentifier identifier) { | |
| String fullTableName) { |
|
|
||
| String table(); | ||
|
|
||
| String catalogName(); |
There was a problem hiding this comment.
I don't think we should be introducing catalogName()/ contentType() to this API just for printing it in a TRACE log inside loadHmsTable(). Please remove catalogName()/ contentType() from this API
| return metaClients().run(client -> client.getTable(database(), table())); | ||
| } catch (NoSuchObjectException nte) { | ||
| LOG.trace( | ||
| "{} not found {}", contentType(), catalogName() + "." + database() + "." + table(), nte); |
There was a problem hiding this comment.
| "{} not found {}", contentType(), catalogName() + "." + database() + "." + table(), nte); | |
| "{} not found {}", contentType(), database() + "." + table(), nte); |
this is just TRACE logging, so it should be ok to not have the catalog. I'm not even sure that logging stmt should even exist
| Optional.ofNullable(tbl.getParameters()).orElseGet(Maps::newHashMap); | ||
|
|
||
| if (!obsoleteProps.contains(TableProperties.UUID) && uuid != null) { | ||
| parameters.put(TableProperties.UUID, uuid); |
There was a problem hiding this comment.
In https://github.com/apache/iceberg/pull/9852/files#diff-db46657b84d66e084e15f31b8dab21577efb2ae7102863f94c6c9477782de676R206 you're passing the View UUID and I wonder whether this is actually correct to do so?
This is another example where it can be dangerous to have a common method to set something that seems could be common across tables/views.
I would suggest to remove setCommonHmsParameters() from this API
| return maxHiveTablePropertySize() > 0; | ||
| } | ||
|
|
||
| default void setSchema(TableMetadata metadata, Map<String, String> parameters) { |
There was a problem hiding this comment.
we need to keep the original method (for backwards compatibiliy), but the method can refer to the new setSchema() method:
default void setSchema(TableMetadata metadata, Map<String, String> parameters) {
setSchema(metadata.schema(), parameters);
}
| } | ||
| } | ||
|
|
||
| static StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hiveEngineEnabled) { |
There was a problem hiding this comment.
same as above, we need to keep StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hiveEngineEnabled) but it can internally call static StorageDescriptor storageDescriptor(Schema schema, String location, boolean hiveEngineEnabled)
| public void setHmsParameters( | ||
| TableMetadata metadata, Table tbl, String newMetadataLocation, Set<String> obsoleteProps) { | ||
|
|
||
| private void setHmsTableParameters( |
There was a problem hiding this comment.
please revert these changes in this method
| } | ||
|
|
||
| @Override | ||
| public String catalogName() { |
There was a problem hiding this comment.
please remove these two methods
| .doesNotContainKey(CURRENT_SNAPSHOT_TIMESTAMP); | ||
|
|
||
| ops.setSchema(metadata, parameters); | ||
| ops.setSchema(metadata.schema(), parameters); |
There was a problem hiding this comment.
| ops.setSchema(metadata.schema(), parameters); | |
| ops.setSchema(metadata, parameters); |
There was a problem hiding this comment.
no need to change this, as we need to keep the original method signature
…nd HiveOperationsBase.
Hive considers iceberg-view just another type of Hive-Table. So Iceberg-View on Hive should have same strategy for commit as Iceberg-Table on Hive. As part of this refactoring all the common code for table which can be used by view too has been moved to common places. Now
HiveOperationsBasehas all the common code for committing the iceberg-table and iceberg-view.This change also define a new abstract class
BaseMetastoreOperationswhich can contains common code fromBaseMetastoreTableOperationsandBaseViewOperations.Motivation behind this change to have less diff for implementing Iceberg-view on Hive.