[4.0] rabbitmq: Make sure rabbitmq is running on cluster HA #1697
Merged
Because the rabbitmq resource agent for cluster HA restarts the rabbitmq service several times, the current check can wrongly validate the rabbitmq status: it may run at one of the moments when rabbit is briefly up while creating or joining the cluster. If the check passes and the chef run continues, the next steps can fail, because they depend on a running rabbitmq while the server may still be restarting. Instead, expand the checks to first look for a rabbit master for the resource, and extend the check for a locally running rabbit to make sure we are checking the local copy. Also add an extra check after the crm checks to make sure there are no pending operations for the resource, so we avoid continuing while a promotion is in progress. (cherry picked from commit 3060a3e)
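The ordering of the checks described above can be sketched as follows. This is only an illustration, not the code from this PR: the method name `wait_for_rabbit_ready`, its parameters, and the three probe callables are hypothetical, and the real recipe would back each probe with its own crm/rabbitmq queries.

```ruby
# Hypothetical sketch: poll three conditions in order until all hold.
# The probes are plain callables supplied by the caller (assumptions,
# not real crm or rabbitmqctl invocations).
def wait_for_rabbit_ready(master_present:, local_running:, pending_ops:,
                          max_attempts: 150, interval: 2,
                          sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    # 1. the resource has a master, 2. the local copy is running,
    # 3. no operations (e.g. a promotion) are still pending.
    ready = master_present.call && local_running.call && !pending_ops.call
    return true if ready

    sleeper.call(interval)
  end
  false
end
```

Injecting the probes and the sleeper keeps the polling logic testable without a cluster; a caller that gets `false` back would typically abort the chef run rather than continue.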
houndci-bot
reviewed
Jul 3, 2018
# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do
  block do
Metrics/BlockLength: Block has too many lines. [42/40]
end

# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do
Metrics/BlockLength: Block has too many lines. [45/40]
Member
Author
Do not merge until we know whether #1637 needs a backport (it is included here as part of the backport; if it is not needed, we should remove it from this PR).
jsuchome
previously approved these changes
Jul 4, 2018
The other checks are not enough, because pacemaker keeps restarting rabbitmq, so we need a more robust way of checking that rabbit has reached a stable state. Check that rabbit is up 5 times in a row, with a delay of 2 seconds between checks, to make sure pacemaker has left it alone. Also, only trigger that check for rabbit if the pacemaker_transaction is updated; otherwise there is no need to do so. (cherry picked from commit 8b56894)
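The "5 times in a row, 2 seconds apart" rule amounts to a consecutive-success counter that resets on any failure. A minimal sketch, assuming a hypothetical `stable?` helper and a caller-supplied probe (neither is taken from the PR's actual code):

```ruby
# Hypothetical sketch: the probe must succeed `required` times in a row,
# with `delay` seconds between checks; any failure resets the streak.
def stable?(probe, required: 5, delay: 2, max_attempts: 60,
            sleeper: ->(seconds) { sleep(seconds) })
  streak = 0
  max_attempts.times do
    if probe.call
      streak += 1
      return true if streak >= required
    else
      # A restart by pacemaker shows up as a failed probe: start over.
      streak = 0
    end
    sleeper.call(delay)
  end
  false
end
```

Resetting the streak, rather than counting total successes, is what distinguishes a rabbit that is merely up between restarts from one that pacemaker has stopped touching.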
584bc25 to 99aa7c7
houndci-bot
reviewed
Jul 4, 2018
end

# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do
Metrics/BlockLength: Block has too many lines. [42/40]
ilausuch
approved these changes
Jul 5, 2018
AbelNavarro
approved these changes
Jul 5, 2018
Member
Ignoring hound and merging at the request of @ilausuch (there were already two +1s).
Backport-of: #1396