Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 #6570

Merged
pvary merged 5 commits into apache:master from pvary:env
Apr 25, 2023

Conversation

Contributor

@pvary pvary commented Jan 12, 2023

HIVE-26882 makes it possible to atomically change the metadata_location using a single alter_table HMS call. The change is a new feature, so an exception is needed from the Hive community to include it in new releases. OTOH, I deliberately kept it simple, so anyone who uses their own Hive release can backport the change easily.
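
To illustrate the idea, here is a minimal in-memory sketch of the kind of conditional update that a single atomic alter_table call makes possible. It assumes the call behaves like a compare-and-swap on the `metadata_location` table parameter. `InMemoryMetastoreSketch` and `alterTableIfUnchanged` are hypothetical names used only for illustration; they are not Hive or Iceberg APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metastore that keeps one parameter map for a table. */
class InMemoryMetastoreSketch {
  private final Map<String, String> tableParams = new HashMap<>();

  InMemoryMetastoreSketch(String initialLocation) {
    tableParams.put("metadata_location", initialLocation);
  }

  /**
   * Swaps metadata_location only if it still holds the value the committer read earlier.
   * The check and the update run under one monitor, modelling a single atomic call.
   */
  synchronized boolean alterTableIfUnchanged(String expectedLocation, String newLocation) {
    if (!expectedLocation.equals(tableParams.get("metadata_location"))) {
      return false; // another writer committed first; the caller must refresh and retry
    }
    tableParams.put("metadata_location", newLocation);
    return true;
  }

  synchronized String metadataLocation() {
    return tableParams.get("metadata_location");
  }
}
```

With this shape, a writer holding a stale `expectedLocation` gets a failed commit instead of overwriting a concurrent change, which is the property a separate lock would otherwise have to provide.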

If we start using this possibility we can avoid using HMS locks to ensure the atomicity of the HiveTableOperations.commit. This has the following benefits:

The solution has 2 parts:

Some of my concerns and thoughts:

  • I am not sure when HIVE-26882 will be available in an OSS release. Here I would ask for help from the Hive folks: @InvisibleProgrammer, @TuroczyX, @deniskuzZ.
    • The simplicity of the Hive PR helps alleviate my fears here
  • We already have LockManager interface defined in the iceberg-api module. After some back-and-forth I decided against using it, because of the following reasons:
    • I do not think anyone would like to use HiveLockManager without HiveCatalog
    • The interface is not that useful for us:
      • We would need to keep track of the HMS lockId internally
      • We would need to update the LockManager if the setConf method is called on the HiveCatalog
      • We would need to add something like ensureActive to the interface which is needed for HiveTableOperations
    • The BaseLockManager does not provide much functionality
    • The current configuration keys are different from the ones used by the LockManager implementations

WDYT?
Open for comments and suggestions.

Thanks,
Peter

Member

@deniskuzZ deniskuzZ left a comment

LGTM, thanks for the patch Peter, that should definitely improve the Hive Iceberg performance. Let me know what kind of help is needed from the Hive folks.

Comment thread core/src/main/java/org/apache/iceberg/TableProperties.java Outdated
Comment thread docs/configuration.md Outdated
of the Hive Metastore (`hive.txn.timeout` or `metastore.txn.timeout` in the newer versions). Otherwise, the heartbeats on the lock (which happens during the lock checks) would end up expiring in the
Hive Metastore before the lock is retried from Iceberg.

Note: `iceberg.lock.hive.enabled` should only be set to `false` if [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
Member

Could it cause issues in the case of HA with inconsistent HMS configs across the instances?

Contributor Author

On HMS side HIVE-26882 is needed on both HMS nodes.
This config value is on the Iceberg side (client side from the HMS point of view).

But the question is good, because it highlights an important thing:

  • All of the Iceberg writers should have the same configuration

This is the exact reason why I have added the table level configuration, so we can change the behaviour of all clients "atomically" by changing the table property.
So the upgrade process would look like this:

  • Upgrade all of the clients to have the new code
  • Change the table property to turn off the locking

Contributor

Do you know if there's a plan to backport HIVE-26882 into previous Hive major releases, or is this only present on the newest major version (Hive 4)?

Contributor Author

@VisibleForTesting
HiveLock lockObject(TableMetadata metadata) {
if (hiveLockEnabled(metadata, conf)) {
return new MetastoreLock(conf, metaClients, catalogName, database, tableName);
Member

The previous locking strategy was migrated to MetastoreLock, right?

Contributor Author

Yes, you are right

.add("lockState", lockState)
.toString();
@VisibleForTesting
HiveLock lockObject(TableMetadata metadata) {
Member

@deniskuzZ deniskuzZ Jan 16, 2023

Could we delegate to LockManager to decide what locks need to be created (create an abstraction)? I think it would be more convenient to work with in the future. WDYT?

Contributor Author

This is a thing I was debating a lot.

The situation is that we already have a LockManager interface which is used by other catalogs, but which is not appropriate for us. See the PR description for more details.

IMHO if you are using HiveCatalog, you should stick to Hive for making sure that the Iceberg commit is atomic. So I do not see multiple implementations in the future for our OtherLockManager interface. Also, creating another interface for LockManager would be confusing at best.

These were the reasons behind my decision, but I am not strongly convinced either way.

Contributor Author

pvary commented Jan 17, 2023

LGTM, thanks for the patch Peter, that should definitely improve the Hive Iceberg performance. Let me know what kind of help is needed from the Hive folks.

Hi @deniskuzZ, good to hear from you. I hope you are ok considering...

Happily, I got the approvals from @ayushtkn which were needed for backporting HIVE-26882 to the Hive release branches. So the only thing remaining on the Hive side is a release that contains the code, so other projects can start using it.

Thanks for the review!
Peter

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java Outdated
*/
package org.apache.iceberg.hive;

public class NoLock implements HiveLock {
Contributor

How is this used?

Contributor Author

Based on the configuration of the Catalog and the Table the lock could be either NoLock or MetastoreLock
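
A minimal sketch of that selection, assuming a simplified lock contract with `lock`, `ensureActive` and `unlock`. All type names ending in `Sketch` are hypothetical stand-ins, not the classes from this PR, and the metastore-backed lock is replaced by one that only records its state.

```java
/** Hypothetical, simplified lock contract; the real interface may differ. */
interface HiveLockSketch {
  void lock();

  void ensureActive();

  void unlock();
}

/** No-op lock: atomicity is expected to come from the metastore call itself. */
class NoLockSketch implements HiveLockSketch {
  @Override
  public void lock() {}

  @Override
  public void ensureActive() {}

  @Override
  public void unlock() {}
}

/** Stand-in for a lock backed by the metastore; here it only records state. */
class RecordingLockSketch implements HiveLockSketch {
  boolean held = false;

  @Override
  public void lock() {
    held = true;
  }

  @Override
  public void ensureActive() {
    if (!held) {
      throw new IllegalStateException("lock is not held");
    }
  }

  @Override
  public void unlock() {
    held = false;
  }
}

class CommitFlowSketch {
  /** Picks the lock implementation from the already-resolved configuration flag. */
  static HiveLockSketch lockObject(boolean hiveLockEnabled) {
    return hiveLockEnabled ? new RecordingLockSketch() : new NoLockSketch();
  }

  /** The commit path is identical for both implementations. */
  static void commit(HiveLockSketch lock, Runnable persistMetadata) {
    lock.lock();
    try {
      lock.ensureActive();
      persistMetadata.run();
    } finally {
      lock.unlock();
    }
  }
}
```

The point of the no-op implementation is that the commit code does not need a separate branch for the lock-free case.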

Member

@szehon-ho szehon-ho left a comment

Thanks for these efforts @pvary. I left some comments. I think it would be easier to review if we just did the refactor first; then the changes for supporting the new NoLock mechanism, which I haven't wrapped my mind around fully yet, would be more minimal.

Comment thread core/src/main/java/org/apache/iceberg/TableProperties.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java Outdated
Contributor Author

pvary commented Feb 3, 2023

@szehon-ho, @RussellSpitzer: Updated after the merge of the lock refactor.
This is a much smaller, more straightforward change now.
Could you please review?

Contributor Author

pvary commented Feb 8, 2023

@szehon-ho: Gentle reminder, could you please take a look when you have some time?
Thanks, Peter

@pvary pvary force-pushed the env branch 3 times, most recently from 6672ecb to c54765a Compare March 17, 2023 09:46
public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;

public static final String HIVE_LOCK_ENABLED = "hive.lock.enabled";
public static final boolean HIVE_LOCK_ENABLED_DEFAULT = true;
Contributor

Could we keep the engine.hive prefix? How about engine.hive.lock-enabled?

Contributor Author

Done

Member

Interesting, I always read 'engine' as meaning hive-mr, i.e. running hive-on-iceberg, whereas this is for all engines (Spark/Flink) that use the Hive catalog.

private ConfigProperties() {}

public static final String ENGINE_HIVE_ENABLED = "iceberg.engine.hive.enabled";
public static final String LOCK_HIVE_ENABLED = "iceberg.lock.hive.enabled";
Contributor

This should follow the convention from table properties, like engine.hive.enabled, adding the iceberg prefix to the table property.

Contributor Author

Done

Contributor Author

pvary commented Mar 25, 2023

Thanks @rdblue for the review. Fixed the config keys

Member

@szehon-ho szehon-ho left a comment

I realize it's already documented, but I am still concerned, given how many Iceberg engines we have, about how easy it is for a user to unknowingly run an old version of Iceberg that uses the old mechanism to lock Hive, does not know about this mechanism, and corrupts the table. But as the concerns are already laid out, we can proceed with the PR. I left some questions/comments.

* @param conf The hive configuration to use
* @return if the hive engine related values should be enabled or not
*/
private static boolean hiveLockEnabled(TableMetadata metadata, Configuration conf) {
Member

@szehon-ho szehon-ho Apr 6, 2023

Why do we want to allow overriding the table property using conf? I thought either all writers should use the Hive lock, or none of them should.

Contributor Author

The table property could override the catalog property.
This way there is a graceful way to enable the config for every writer at the same time:

  • Start by disabling the config on table level
  • Enable the writers one-by-one based on the writers' release process
  • When you are sure that every writer uses the new code, then you can change the table config and all of the writers start using the new locking method at the same time
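
The precedence described here can be sketched as follows. This is an illustration built on plain maps rather than the Iceberg classes; `LockConfigSketch` is a hypothetical name, the two keys are the final names used in the documentation of this PR, and the default of `true` mirrors `HIVE_LOCK_ENABLED_DEFAULT`.

```java
import java.util.Map;

class LockConfigSketch {
  // Keys as they appear in the documentation of this PR.
  static final String TABLE_KEY = "engine.hive.lock-enabled";
  static final String CATALOG_KEY = "iceberg.engine.hive.lock-enabled";
  // Locking stays enabled unless something explicitly turns it off.
  static final boolean DEFAULT = true;

  /** A table property, when present, wins over the writer's catalog configuration. */
  static boolean hiveLockEnabled(Map<String, String> tableProps, Map<String, String> catalogConf) {
    String tableValue = tableProps.get(TABLE_KEY);
    if (tableValue != null) {
      return Boolean.parseBoolean(tableValue);
    }

    String catalogValue = catalogConf.get(CATALOG_KEY);
    return catalogValue != null ? Boolean.parseBoolean(catalogValue) : DEFAULT;
  }
}
```

Because the table property is stored with the table, changing it flips the behaviour for every upgraded writer at once, regardless of each writer's own catalog setting.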

Member

@szehon-ho szehon-ho Apr 6, 2023

That sounds OK if it's a double-check, but the way I read this code, if the table config is set to NO_LOCK_ENABLED, it doesn't matter what the writer sets? So it looks like the table config is all that matters.

Should we improve this check to make sure both the writer and the table config are NO_LOCK?

Member

Actually, I was thinking more about it: even if we were to make this check return NO_LOCK only if both are NO_LOCK, it still doesn't address the original concern that if one writer does not have NO_LOCK but the table has NO_LOCK, then that writer will not switch to NO_LOCK and risks being corrupted by others that do.

So I see two options:

  • all writers just follow table's NO_LOCK config
  • writer could do a check, if table is set to NO_LOCK but they are not set to NO_LOCK, throw an exception and prevent the write.

Contributor Author

My thinking was somewhat different.
I expect that we should highlight in the documentation the possible issues, and then with the configuration provide flexibility to use it correctly.

I expect that if someone makes the effort of backporting the required changes, then they are in one of the following situations:

  1. They control everything and migrate with a big boom
  2. Few problematic tables which they need to write without locks, but they do not want to touch the other tables.
  3. They have a few tables which are written by old clients and most of the tables will be written by new clients

In cases 2/3 there might be a situation where a particular client would like to write one table with locks and another table without locks, while there is a general rule that most of the clients would like to adhere to.

Am I overcomplicating things? If we do not expect this kind of situation, then I would prefer a simpler, table level conf.

What do you think?
Thanks, Peter

Member

Sorry, I'm trying but still don't understand how having a client level (catalog level) config, which is a fallback to the table config, allows cases 2/3?

To me, as it is, it seems to cause more potential problems than it solves (client says NO_LOCK but table does not have NO_LOCK). Unless we want to use the catalog config as a safety check (throw an exception if the catalog config is NO_LOCK but the table does not have NO_LOCK), as I was mentioning before, which makes more sense to me.

Contributor Author

I think I finally understand our differences here.

  • You think about Iceberg as a service, and the users as clients. In this view we have service level configurations (TableProperties), and client level configurations (CatalogProperties)
  • I think about Iceberg as a library which provides different levels of configuration. The configuration provided by the developer (CatalogProperties), and global overrides which prevent configuration discrepancies between different users (TableProperties)

Does this make sense?
Am I right in understanding the different points of view?

Member

No I am not thinking of Iceberg like a service, but in my thought, indeed, the TableProperties are a single configuration persisted on the table, and the CatalogProperties is set by individual clients that can override.

My concern in the previous comments was, say we have one table, written by clients A and B. Table property for lock is not set.

  • Client A sets catalog property LOCK_HIVE_ENABLED = "false"
  • Client B has catalog property LOCK_HIVE_ENABLED = "true"

This will make the clients corrupt each other's commits. So I was suggesting either:

  • Remove catalog config altogether
  • Have Client A throw exception in this case (as I was mentioning, this may be better)

Does that make sense, or am I missing something?

Contributor Author

Remove catalog config altogether

Then it is not possible to handle the 2/3 use-cases (we either have to set the flag for every table, or rely on the default provided by the Iceberg source code; I think it is more useful for the user to have an override possibility on the catalog level too). I am open to debating the usefulness of these cases, but if we want to support them, then we need 2 levels of configuration.

Have Client A throw exception in this case (as I was mentioning, this may be better)

We can throw an exception if the Table level and the Catalog level configs are contradictory. In this case every table used by a Catalog instance should have the same locking configuration if it is set for at least one of the tables used.
If we decide on this then I agree with you that there is no need to have a Catalog level configuration, as it is just for throwing exceptions in some cases.

I am starting to feel that the main difference is:

  • What do we consider more important:
    • Configuration flexibility: If the user wants to set all of the tables to use the new locking mechanism, they should not need to go to every one of the tables and alter them one-by-one. A global catalog configuration should be enough in these cases. For me the question was how to handle the case when we would like to treat a table differently from the general rule, and the table level config is proposed for that.
    • Preventing wrong configurations: With the "only Table level config" solution we can be sure that writers using a code version which contains these changes will not have a writing conflict. BTW this can be achieved even with the previous proposal if all of the tables are altered one-by-one to set the Table level config.

Member

OK I think I get the difference as well, if I can summarize, we have two choices

  1. catalog config = NO_LOCK => start writing to all tables with no_lock, unless table is set explicitly to LOCK.
  2. catalog config = NO_LOCK => start writing to tables with no_lock only if table set explicitly to NO_LOCK.

In both cases, we can achieve case 2/3, the issue is whether the user has to set the tables explicitly to exclude or include them in new system.

You are right that maybe Option 2 is too much of a burden for the user to manually mark all tables as NO_LOCK. If all Iceberg clients are upgraded in the system, Option 2 may be a good check as then we are sure all of them will write NO_LOCK together, but we come back to the same problem if some clients are not upgraded. (which is what case 2/3 is trying to solve).

Let me think about it, but I think you are right that we cannot do better in the mixed-version scenarios.
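
The two choices summarized in this comment differ only in how a table with no explicit setting is treated. A small sketch, assuming the catalog config is already NO_LOCK and modelling the table-level setting as an optional "lock enabled" flag; `LockOptionsSketch` and its method names are hypothetical and only illustrate the comparison.

```java
import java.util.Optional;

/** Both methods assume the writer's catalog config is already NO_LOCK. */
class LockOptionsSketch {
  /** Option 1: skip locks unless the table is explicitly set to use them. */
  static boolean skipLockOption1(Optional<Boolean> tableLockEnabled) {
    return !tableLockEnabled.orElse(false);
  }

  /** Option 2: skip locks only when the table is explicitly set to skip them. */
  static boolean skipLockOption2(Optional<Boolean> tableLockEnabled) {
    return !tableLockEnabled.orElse(true);
  }
}
```

For a table with an explicit setting both options agree; for an unset table, Option 1 commits without locks while Option 2 keeps locking, which is the "exclude versus include" burden discussed above.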

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java Outdated
* @param conf The hive configuration to use
* @return if the hive engine related values should be enabled or not
*/
private static boolean hiveLockEnabled(TableMetadata metadata, Configuration conf) {
Member

Also, is there any way to validate Hive compatibility, to prevent old Hive versions from disabling Hive locks?

Contributor Author

Since there is no official release with the new code yet, sadly this is not an option 😢

Member

@szehon-ho szehon-ho Apr 6, 2023

It would be great if we could at least protect the user from this potential corruption error (running without the patch HIVE-26882)

General question: is there any way we can check Hive server version? Would be nice in general.

Contributor Author

It is.
So once it gets released, we can check the version.

Member

Great, I think this would be very important to get in for users of OSS Hive; it's a bit hacky otherwise.

Member

Let's make an issue to track it on Iceberg side

Contributor Author

Created #7418

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java Outdated
Map<String, String> env = Maps.newHashMapWithExpectedSize(extraEnv.size() + 1);
env.putAll(extraEnv);
env.put(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);

Member

@szehon-ho szehon-ho Apr 6, 2023

Checking: if the version of alter_table does not support env context, will it throw an exception instead of silently failing?

Contributor Author

Sadly no 😢 . Historical reasons again

Member

Yea I was thinking, can we throw a ValidationException here then, if the API is to pass in env context, and the alter_table method that we load does not take it?

Contributor Author

We cannot distinguish between the 2 cases. The API is backwards compatible (as expected by the Hive team), so the HMS will always take the env context input variables, but will not act on them.

I think this is not that interesting a case, as the writer should have clear information about the HMS they are connecting to.

Member

@szehon-ho szehon-ho Apr 10, 2023

Oh, maybe I was not clear: I meant the alter_table method that we load dynamically: https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java#L29 . The original question was just asking: if our method object is the one loaded without env, and we pass in env here, what will happen? (Is the error message good enough for the user to decipher, and it won't silently pass?) It's a pretty rare scenario, just wondering.

Contributor Author

This method is used in other cases as well and doing some extra checks would be edgy.

Added a check to the NoLock constructor to ensure that at least a Hive 2 client is used when a NoLock object is created. This ensures that the correct client is used in this case.

Member

Hey, sorry, I thought I commented on this, but must have missed it. I was thinking something like:

if (env.size() > 0) {
  Preconditions.checkArgument(
      ALTER_TABLE.args().size() == 5,
      "Environment context is non-empty but alter_table method does not support it");
}

What do you think? Isn't alter_table only called with envContext in the new code?

Contributor Author

We always add ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE) which is an optimisation for skipping stats generation / file listing when it is not needed on alter table commands. This always depended on DynMethod skipping last parameters (maybe by accident)

See:

EnvironmentContext envContext =
new EnvironmentContext(
ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE));
ALTER_TABLE.invoke(client, databaseName, tblName, table, envContext);

We can modify the proposed command to:

if (env.size() > 1) {
  Preconditions.checkArgument(
      ALTER_TABLE.args().size() == 5,
      "Environment context is non-empty but alter_table method does not support it");
}

Or add a new check to the new alterTable method, but I find it better / more readable to add the check to the NoLock constructor.

WDYT?

Member

Yea I was thinking at the very beginning of the method, before we put the ALTER_TABLE? Yea it may be better for understanding of the code if we have both checks, but you are right it's not a big deal in practice, just in the rare case there's a patched version of Hive 2 out there that has alter_table without 5 arguments.

Contributor Author

Checked, and every Hive 2 should have the required method on the HMS API:
https://github.com/apache/hive/blob/release-2.0.0/metastore/if/hive_metastore.thrift

So we are good here

Comment thread docs/configuration.md Outdated
Comment thread docs/configuration.md Outdated
Hive Metastore before the lock is retried from Iceberg.

Note: `iceberg.engine.hive.lock-enabled` should only be set to `false` if [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882)
is available on the Hive Metastore server and every writer of a given table is using Iceberg 1.2 or later.
Member

@szehon-ho szehon-ho Apr 20, 2023

We should change this to a warning. We are also missing a warning about some catalogs setting true and others setting false.

How about this, to add some details? I use HiveCatalog here, even though it's not defined in the doc; otherwise it gets quite lengthy to keep repeating "catalog using Hive Metastore connector".

Warn: Setting iceberg.engine.hive.lock-enabled=false will cause HiveCatalog to commit to tables without using Hive locks. This should only be set to false if all following conditions are met:

  • HIVE-26882 is available on the Hive Metastore server
  • All other HiveCatalogs committing to tables that this HiveCatalog commits to are also on Iceberg 1.3 or later
  • All other HiveCatalogs committing to tables that this HiveCatalog commits to have also disabled Hive locks on commit.

Failing to ensure these conditions risks corrupting the table.

Even with iceberg.engine.hive.lock-enabled set to false, a HiveCatalog can still use locks for individual tables by setting the table property 'engine.hive.lock-enabled'='true'. This is useful in the case where other HiveCatalogs cannot be upgraded and set to commit without using Hive locks.

Contributor Author

Done.

Added extra highlighting for the warning "Failing to ensure these conditions risks corrupting the table."

Map<String, String> env = Maps.newHashMapWithExpectedSize(extraEnv.size() + 1);
env.putAll(extraEnv);
env.put(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);

Member

I was looking through DynMethod: there is no check, and it will just pass whatever args it can fit. I think it would be nice to add an 'argLength' method to DynMethod and then assert that argLength == 5 here, to ensure the env is not silently dropped.
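
A sketch of that suggestion using plain `java.lang.reflect`. `TruncatingMethodSketch` is a hypothetical stand-in that reproduces the argument-dropping behaviour described above; it is not the DynMethod class, and `ClientApiSketch` merely imitates an older and a newer client method.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical wrapper that, like the behaviour described above, drops surplus arguments. */
class TruncatingMethodSketch {
  private final Method method;

  private TruncatingMethodSketch(Method method) {
    this.method = method;
  }

  static TruncatingMethodSketch of(Class<?> target, String name, Class<?>... paramTypes) {
    try {
      return new TruncatingMethodSketch(target.getDeclaredMethod(name, paramTypes));
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** Number of parameters the wrapped method declares. */
  int argLength() {
    return method.getParameterCount();
  }

  /** Invokes a static method, passing only as many arguments as it declares. */
  Object invoke(Object... args) {
    try {
      return method.invoke(null, Arrays.copyOf(args, argLength()));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Fails fast instead of silently dropping arguments the caller relies on. */
  void checkArgLength(int required) {
    if (argLength() != required) {
      throw new IllegalStateException(
          "Method takes " + argLength() + " arguments but " + required + " are required");
    }
  }
}

/** Two overloads standing in for an older and a newer client API. */
class ClientApiSketch {
  static String alter(String table) {
    return table;
  }

  static String alter(String table, String env) {
    return table + "+" + env;
  }
}
```

Calling `checkArgLength` before the invocation turns the silent drop of the environment argument into an explicit failure.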

@szehon-ho
Member

Hey thanks, I had two additional comments for consideration, otherwise it looks good to me.

Member

@szehon-ho szehon-ho left a comment

Added just one minor reply, anyway approved as all major comments have been addressed, thanks @pvary

@pvary pvary merged commit c3232b6 into apache:master Apr 25, 2023
Contributor Author

pvary commented Apr 25, 2023

Thanks for the review @deniskuzZ, @rdblue, @haizhou-zhao!
And special thanks to @szehon-ho!

@pvary pvary deleted the env branch April 25, 2023 07:14
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
@nastra nastra mentioned this pull request Feb 5, 2024
@chenwyi2

There is a problem with https://issues.apache.org/jira/browse/HIVE-26882; should we add some documentation to inform users?

Contributor Author

pvary commented Mar 28, 2024

@chenwyi2: Maybe we should wait a bit and see if we have a fix and document that.

@chenwyi2

"Minimally Hive 2 HMS client is needed to use HIVE-26882 based locking": why do we have to check for Hive 2? Suppose I am on Hive 1 and I cherry-pick HIVE-26882; would that not work?

Contributor Author

pvary commented Oct 14, 2024

@chenwyi2: If you backport the changes to Hive 1, then you can use the feature. I suggest creating your own Iceberg release as well.

6 participants