Adjust the auto rebalancer state assignment logic to reduce top state transition.#986
Conversation
| stateCount--; | ||
| String proposedInstance = instance; | ||
| // Additional check and alternate the assignment for reducing top state handoff. | ||
| if (state.equals(stateModelDef.getTopState()) && !stateModelDef.getSecondTopStates() |
There was a problem hiding this comment.
I think you already sorted the preference list based on the state. Why not just replace the looping of preference list with the sorted list but add this logic?
This logic makes two order of preference list intersected each other that logic is not very clear.
There was a problem hiding this comment.
I thought about it. But even I move this logic to the sorting logic, the sorting itself will be comparing one list with 2 logics. This interesting trick is just moved to another place. I don't have a way to eliminate it for now.
Then why I want to put it here instead of the sorting logic? Because the sorting logic is away from this business logic block. That would be even harder for me to reason even with comments.
Put it here, I understand it is intersecting the same list, but I can at least give it a good reason.
There was a problem hiding this comment.
Please consider this as a skiplist type design/implementation.
There was a problem hiding this comment.
I see you point. Purely relies on sorting is fine because the previous author use it in wrong way. We suppose to sort it before looping.
Anyway, I am OK to keep the current behavior. But can you add a TODO here. To remind that we can refactor the logic based on sorting with comprehensive test in the future?
There was a problem hiding this comment.
Not exactly. Actually, I don't agree this is a TODO work to put everything in a sorting method. That is just an alternative design, and I don't think it is cleaner than the current one.
Considering the proposed universal sorting method, it is actually merging 2 orders (preference list order and the current state order). However, there is no clear way to define the priority score.
Note the logic here is that if the node is not on secondary state or above but it is attempted to be a master, we should find alternatives according to the current state order.
This logic cannot be done with only the 2 orders. We have to know the proposed states assignment. For example, how many top states and where to assign based on the preference state.
Give this, to merging the sort in one method (instead of the current design), we will need to:
- propose a state assignment based on the preference list
- input this proposed state mapping as one additional input to the sorting method, and then sort accordingly.
- calculate another state assignment based on the re-sorted list.
It seems more confusing and slower than the current design.
There was a problem hiding this comment.
Hmm. The sorting logic could be complicated. Alright, let's revisit the logic if it requires refactoring later.
junkaixue
left a comment
There was a problem hiding this comment.
Overall LGTM, please add comments to that.
… transition. The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example, 1. The current states are: [N1:M, N2:S, N3,S] 2. The desired states are: [N4:M, N2:S, N1:S] 3. Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In which case, the controller will assign master to N2 before N4 has a slave state replica. 4. When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff. To be clear, we don't want step 3. But only the state transition in step 4. In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.
|
This PR is ready to be merged, approved by @dasahcc |
…cted. This is to fix a regression which has been introduced by PR apache#986. The PR tried to prioritize the preference list to avoid unnecessary top state transition. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the cauculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This fix will ensure the skipped participant being checked again until it gets the assignment.
…cted. This is to fix a regression which was introduced by PR apache#986. The PR tried to prioritize the preference list to avoid unnecessary top state transition. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the cauculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This fix will ensure the skipped participant being checked again until it gets the assignment.
…cted. (#1098) This is to fix a regression that was introduced by PR #986 The PR tried to prioritize the preference list to avoid unnecessary top state transitions. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the calculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This only happens if the state model has been customized so it is multiple top states and there is an intermediate state with expected count -1 between the top state and the other states. This fix will ensure the skipped participant being checked again until it gets the assignment.
… transition. (apache#986) The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example, 1. The current states are: [N1:M, N2:S, N3,S] 2. The desired states are: [N4:M, N2:S, N1:S] 3. Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In which case, the controller will assign master to N2 before N4 has a slave state replica. 4. When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff. To be clear, we don't want step 3. But only the state transition in step 4. In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.
…cted. (apache#1098) This is to fix a regression that was introduced by PR apache#986 The PR tried to prioritize the preference list to avoid unnecessary top state transitions. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the calculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This only happens if the state model has been customized so it is multiple top states and there is an intermediate state with expected count -1 between the top state and the other states. This fix will ensure the skipped participant being checked again until it gets the assignment.
Issues
#985
Description
The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example,
To be clear, we don't want step 3. But only the state transition in step 4.
In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.
Tests
Add the test scenario to TestAbstractRebalancer.
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestJobQueueCleanUp.testJobQueueAutoCleanUp » ThreadTimeout Method org.testng....
[ERROR] TestWorkflowTimeout.testWorkflowTimeoutWhenWorkflowCompleted:116 expected: but was:
[INFO]
[ERROR] Tests run: 1145, Failures: 2, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:21 h
[INFO] Finished at: 2020-04-30T23:30:07-07:00
[INFO] ------------------------------------------------------------------------
Rerun:
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.931 s - in TestSuite
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.212 s
[INFO] Finished at: 2020-04-30T23:32:13-07:00
[INFO] ------------------------------------------------------------------------
Commits
Documentation (Optional)
(Link the GitHub wiki you added)
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)