Adjust the auto rebalancer state assignment logic to reduce top state transition. by jiajunwang · Pull Request #986 · apache/helix

jiajunwang · 2020-05-01T22:48:13Z

Issues

My PR addresses the following Helix issues and references them in the PR description:

Description

Here are some details about my PR, including screenshots of any UI changes:

The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example,

The current states are: [N1:M, N2:S, N3,S]
The desired states are: [N4:M, N2:S, N1:S]
Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In which case, the controller will assign master to N2 before N4 has a slave state replica.
When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff.
To be clear, we don't want step 3. But only the state transition in step 4.

In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.

Tests

The following tests are written for this issue:

Add the test scenario to TestAbstractRebalancer.

The following is the result of the "mvn test" command on the appropriate module:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestJobQueueCleanUp.testJobQueueAutoCleanUp » ThreadTimeout Method org.testng....
[ERROR] TestWorkflowTimeout.testWorkflowTimeoutWhenWorkflowCompleted:116 expected: but was:
[INFO]
[ERROR] Tests run: 1145, Failures: 2, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:21 h
[INFO] Finished at: 2020-04-30T23:30:07-07:00
[INFO] ------------------------------------------------------------------------

Rerun:
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.931 s - in TestSuite
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.212 s
[INFO] Finished at: 2020-04-30T23:32:13-07:00
[INFO] ------------------------------------------------------------------------

Commits

My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters (not including Jira issue reference)
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

Documentation (Optional)

In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Code Quality

My diff has been formatted using helix-style.xml
(helix-style-intellij.xml if IntelliJ IDE is used)

junkaixue · 2020-05-01T23:21:40Z

-            stateCount--;
+        String proposedInstance = instance;
+        // Additional check and alternate the assignment for reducing top state handoff.
+        if (state.equals(stateModelDef.getTopState()) && !stateModelDef.getSecondTopStates()


I think you already sorted the preference list based on the state. Why not just replace the looping of preference list with the sorted list but add this logic?

This logic makes two order of preference list intersected each other that logic is not very clear.

I thought about it. But even I move this logic to the sorting logic, the sorting itself will be comparing one list with 2 logics. This interesting trick is just moved to another place. I don't have a way to eliminate it for now.
Then why I want to put it here instead of the sorting logic? Because the sorting logic is away from this business logic block. That would be even harder for me to reason even with comments.
Put it here, I understand it is intersecting the same list, but I can at least give it a good reason.

Please consider this as a skiplist type design/implementation.

I see you point. Purely relies on sorting is fine because the previous author use it in wrong way. We suppose to sort it before looping.

Anyway, I am OK to keep the current behavior. But can you add a TODO here. To remind that we can refactor the logic based on sorting with comprehensive test in the future?

Not exactly. Actually, I don't agree this is a TODO work to put everything in a sorting method. That is just an alternative design, and I don't think it is cleaner than the current one.
Considering the proposed universal sorting method, it is actually merging 2 orders (preference list order and the current state order). However, there is no clear way to define the priority score.
Note the logic here is that if the node is not on secondary state or above but it is attempted to be a master, we should find alternatives according to the current state order.
This logic cannot be done with only the 2 orders. We have to know the proposed states assignment. For example, how many top states and where to assign based on the preference state.
Give this, to merging the sort in one method (instead of the current design), we will need to:

propose a state assignment based on the preference list

input this proposed state mapping as one additional input to the sorting method, and then sort accordingly.

calculate another state assignment based on the re-sorted list.

It seems more confusing and slower than the current design.

Hmm. The sorting logic could be complicated. Alright, let's revisit the logic if it requires refactoring later.

junkaixue

Overall LGTM, please add comments to that.

… transition. The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example, 1. The current states are: [N1:M, N2:S, N3,S] 2. The desired states are: [N4:M, N2:S, N1:S] 3. Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In which case, the controller will assign master to N2 before N4 has a slave state replica. 4. When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff. To be clear, we don't want step 3. But only the state transition in step 4. In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.

jiajunwang · 2020-05-06T22:47:22Z

This PR is ready to be merged, approved by @dasahcc

…cted. This is to fix a regression which has been introduced by PR apache#986. The PR tried to prioritize the preference list to avoid unnecessary top state transition. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the cauculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This fix will ensure the skipped participant being checked again until it gets the assignment.

…cted. This is to fix a regression which was introduced by PR apache#986. The PR tried to prioritize the preference list to avoid unnecessary top state transition. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the cauculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This fix will ensure the skipped participant being checked again until it gets the assignment.

…cted. (#1098) This is to fix a regression that was introduced by PR #986 The PR tried to prioritize the preference list to avoid unnecessary top state transitions. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the calculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This only happens if the state model has been customized so it is multiple top states and there is an intermediate state with expected count -1 between the top state and the other states. This fix will ensure the skipped participant being checked again until it gets the assignment.

… transition. (apache#986) The old state assignment logic assign the states to selected nodes according to the priority of the current replica state that is on the instance. Moreover, the sorting algorithm is designed to prioritize both current topstate and current secondary states equally. The result is that we will have premature mastership handoff to a current seconardy state host before the real desired master host is ready. For example, 1. The current states are: [N1:M, N2:S, N3,S] 2. The desired states are: [N4:M, N2:S, N1:S] 3. Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In which case, the controller will assign master to N2 before N4 has a slave state replica. 4. When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff. To be clear, we don't want step 3. But only the state transition in step 4. In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.

…cted. (apache#1098) This is to fix a regression that was introduced by PR apache#986 The PR tried to prioritize the preference list to avoid unnecessary top state transitions. However, there was a bug in the prioritizing logic and if one participant is skipped due to low priority, it won't be picked up again during the calculating. As a result, this participant won't be assigned with any replica even it is originally in the preference list. This only happens if the state model has been customized so it is multiple top states and there is an intermediate state with expected count -1 between the top state and the other states. This fix will ensure the skipped participant being checked again until it gets the assignment.

junkaixue reviewed May 1, 2020

View reviewed changes

junkaixue approved these changes May 6, 2020

View reviewed changes

jiajunwang force-pushed the statetest branch from 30816ee to 89ce5da Compare May 6, 2020 22:31

jiajunwang merged commit a1aba60 into apache:master May 6, 2020

jiajunwang deleted the statetest branch May 6, 2020 22:48

This was referenced Jun 16, 2020

The instance may not be assigned with the replica as expected with some customized state model #1097

Closed

Fix the issue that the instance may not be assigned a replica as expected. #1098

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust the auto rebalancer state assignment logic to reduce top state transition.#986

Adjust the auto rebalancer state assignment logic to reduce top state transition.#986
jiajunwang merged 1 commit into
apache:masterfrom
jiajunwang:statetest

jiajunwang commented May 1, 2020

Uh oh!

junkaixue May 1, 2020

Uh oh!

jiajunwang May 1, 2020 •

edited

Loading

Uh oh!

jiajunwang May 1, 2020

Uh oh!

junkaixue May 6, 2020

Uh oh!

jiajunwang May 6, 2020

Uh oh!

junkaixue May 6, 2020

Uh oh!

junkaixue left a comment

Uh oh!

jiajunwang commented May 6, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jiajunwang commented May 1, 2020

Issues

Description

Tests

Commits

Documentation (Optional)

Code Quality

Uh oh!

junkaixue May 1, 2020

Choose a reason for hiding this comment

Uh oh!

jiajunwang May 1, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jiajunwang May 1, 2020

Choose a reason for hiding this comment

Uh oh!

junkaixue May 6, 2020

Choose a reason for hiding this comment

Uh oh!

jiajunwang May 6, 2020

Choose a reason for hiding this comment

Uh oh!

junkaixue May 6, 2020

Choose a reason for hiding this comment

Uh oh!

junkaixue left a comment

Choose a reason for hiding this comment

Uh oh!

jiajunwang commented May 6, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jiajunwang May 1, 2020 •

edited

Loading