Refactor commit lock mechanism from HiveTableOperations#5877
Conversation
Change-Id: I32ef3305c9df091ad1ccc994a542ab9b54d89cfa
|
|
||
| try { | ||
| if (state.get().equals(LockState.WAITING)) { | ||
| // Retry count is the typical "upper bound of retries" for Tasks.run() function. In fact, |
There was a problem hiding this comment.
Nit: could you please reformat the comment?
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class HiveCommitLock { |
There was a problem hiding this comment.
Would it be useful to implement the standard java.util.concurrent.locks.Lock interface?
There was a problem hiding this comment.
If we do, we may also make it clear in the doc that unlike standard Lock, this is not multi thread safe? (ie, it is not safe to call acquire lock from multiple threads) If my observation is right, I guess its worth documenting in any case
|
@szlta: any particular change to highlight for the review? Also, please check the failing tests |
szehon-ho
left a comment
There was a problem hiding this comment.
I think its good to refactor it, but definitely need to spend a bit of time to review this critical code. Gave a pass
| "Failed to heartbeat for hive lock. %s", | ||
| hiveLockHeartbeat.encounteredException.getMessage()); | ||
| if (!commitLock.isHeartbeatInProgress()) { | ||
| throw new CommitFailedException("Failed to heartbeat for hive lock."); |
There was a problem hiding this comment.
I feel here we may be losing some information in the stack? in the case that encounteredException is not null?
| && hiveLockHeartbeat.encounteredException == null; | ||
| } | ||
|
|
||
| private void acquireLockFromHms() throws UnknownHostException, TException, InterruptedException { |
There was a problem hiding this comment.
Nit: should method be named acquireHmsLock to be consistent with release?
| LOG.warn("Failed to unlock {}.{}", databaseName, tableName, e); | ||
| } | ||
| } | ||
| if (hiveLockHeartbeat != null) { |
There was a problem hiding this comment.
In original code, cancelling seemed to be run before unlock, should we keep it that way?
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class HiveCommitLock { |
There was a problem hiding this comment.
If we do, we may also make it clear in the doc that unlike standard Lock, this is not multi thread safe? (ie, it is not safe to call acquire lock from multiple threads) If my observation is right, I guess its worth documenting in any case
| private final ClientPool<IMetaStoreClient, TException> metaClients; | ||
| private final ScheduledExecutorService exitingScheduledExecutorService; | ||
|
|
||
| private Optional<Long> hmsLockId = Optional.empty(); |
There was a problem hiding this comment.
I am wondering, is there justification to make these Optional versus just null checks?
There are some inteillij warnings because we are using this. Looking into it, there are some who say that Java Optional is not meant for a general purpose Maybe type: https://stackoverflow.com/questions/26327957/should-java-8-getters-return-optional-type/26328555#26328555 hence the intellij warnings. It seems its meant more for return types of methods to indicate that its possible something is null. Of course its more of a debate, but just noticing we dont use a whole lot of Optional elsewhere and may be worth to keep it simple.
I do realize that originally lockId is optional, but mentioning because we are adding another optional (jvmLock). Wdyt?
| } | ||
|
|
||
| private void acquireJvmLock() { | ||
| if (jvmLock.isPresent()) { |
There was a problem hiding this comment.
Im just wondering, as we check both (jvmLock and lockId) for null, it seems it will protect only the case where single caller calls acquire twice? It wont protect from multi thread calls as these are only optional and not atomic, and method acquire is not synchronized. Would a simple boolean suffice then to avoid two null checks?
| "HMS lock ID=%s already acquired for table %s.%s", | ||
| hmsLockId.get(), databaseName, tableName)); | ||
| } | ||
| final LockComponent lockComponent = |
There was a problem hiding this comment.
I do realize its copy and paste, but I know from @aokolnychyi 's comments , we dont tend to use final in inline variable as there's little signfiicance. Could we remove them here?
| "Waiting for lock on table %s.%s", databaseName, tableName)); | ||
| } | ||
| } catch (InterruptedException e) { | ||
| Thread.interrupted(); // Clear the interrupt status flag |
There was a problem hiding this comment.
Again I see its copy and paste, but its weird, it seems this is checking a boolean and ignoring the value. Could you check this?
| } | ||
|
|
||
| private void releaseHmsLock() { | ||
| if (hmsLockId.isPresent()) { |
There was a problem hiding this comment.
If we have a simple boolean acquired (as mentioned in my other comment), we can just replace these theese two null checks with a single check on release(). As these are both private methods, checking both seems excessive?
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
HiveTableOperationshas grown to a substantial size and got pretty complex.A large portion of the logic is the implementation of the locking mechanism which I think could be separated out to reduce this complexity.
Let me know what you think @pvary @szehon-ho