[4.0] rabbitmq: Make sure rabbitmq is running on cluster HA #1697

Merged
vuntz merged 2 commits into crowbar:stable/4.0 from Itxaka:backport_rabbit_check on Jul 16, 2018
Conversation

Itxaka commented Jul 3, 2018

Member

Because the resource agent for rabbitmq with cluster HA restarts the rabbitmq
service several times, the current check can wrongly validate rabbitmq
status: it may run at one of the moments when rabbit is briefly up while
creating/joining the cluster. If the check passes and the chef execution
continues, the next steps can fail, as they depend on a running rabbitmq
while the rabbitmq server may still be restarting.

Instead, expand the checks to first look for a rabbit master for the
resource, and expand the check for a locally running rabbit to make sure
we are checking the local copy.

Backport-of: #1396

Because the resource agent for rabbitmq with cluster HA restarts the rabbitmq
service several times, the current check can wrongly validate rabbitmq
status: it may run at one of the moments when rabbit is briefly up while
creating/joining the cluster. If the check passes and the chef execution
continues, the next steps can fail, as they depend on a running rabbitmq
while the rabbitmq server may still be restarting.

Instead, expand the checks to first look for a rabbit master for the
resource, and expand the check for a locally running rabbit to make sure
we are checking the local copy. Also add an extra check after
the crm checks to make sure there are no pending operations for the
resource, so we can avoid continuing while a promotion is in progress.

(cherry picked from commit 3060a3e)
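
For readers who want a feel for the two crm checks described in the commit message, here is a minimal sketch and not the code from this PR. It assumes the cluster tooling reports resource placement in lines of the form `resource <name> is running on: <node> [Master]`. That output format, the helper names `master_present?` and `running_locally?`, and the sample node names are all assumptions for illustration. The pending-operations check is left out because the conversation does not show how it is done.

```ruby
# Hypothetical helpers; the "is running on:" line format is an assumption
# about the cluster tool's output, not something shown in this PR.

# True if any line reports the resource running somewhere as Master.
def master_present?(status_output)
  status_output.each_line.any? do |line|
    line =~ /is running on: \S+ Master\s*$/
  end
end

# True if any line reports the resource running on the given node,
# regardless of its role. The hostname is escaped and must be followed by
# whitespace or end of line, so "node2" does not match "node20".
def running_locally?(status_output, hostname)
  pattern = /is running on: #{Regexp.escape(hostname)}(\s|$)/
  status_output.each_line.any? { |line| line =~ pattern }
end

# Made-up sample output for two nodes.
sample = <<~OUT
  resource ms-rabbitmq is running on: node1 Master
  resource ms-rabbitmq is running on: node2
OUT

ready = master_present?(sample) && running_locally?(sample, "node2")
puts "ready: #{ready}"
```

Requiring both conditions is what separates "rabbit happens to be up somewhere" from "the cluster has a master and the local copy is running", which is the distinction the commit message draws.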

# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do
block do

Metrics/BlockLength: Block has too many lines. [42/40]

end

# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do

Metrics/BlockLength: Block has too many lines. [45/40]

Itxaka commented Jul 3, 2018

Member Author

Do not merge until we know whether #1637 needs a backport. It's included here as part of this backport; if it's not needed, we should remove it from this PR.

jsuchome previously approved these changes Jul 4, 2018
The other checks are not enough because pacemaker keeps restarting
rabbitmq, so we need a more robust way of checking that rabbit has reached
a stable state.

So check that rabbit is up 5 times in a row, with a delay of 2 seconds
between checks, to make sure pacemaker has left it alone.

Also, only trigger that check for rabbit if the pacemaker_transaction is
updated; otherwise there is no need to do so.

(cherry picked from commit 8b56894)
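
The "5 times in a row with a delay of 2 seconds" rule from the commit message can be sketched roughly as below. This is an illustration only, not the recipe code from this PR: the helper name `stable?`, its keyword arguments, the `max_attempts` cap, and the injectable `sleeper` are all hypothetical. The command in the usage comment is likewise an assumption about how "rabbit is up" might be probed.

```ruby
# Hypothetical helper: yields to a check block repeatedly and returns true
# once the block has succeeded `required` times in a row. Any failure
# resets the streak. Returns false if `max_attempts` checks are used up.
def stable?(required: 5, delay: 2, max_attempts: 60, sleeper: ->(s) { sleep(s) })
  streak = 0
  max_attempts.times do
    if yield
      streak += 1
      return true if streak >= required
    else
      streak = 0 # a restart in between means we start counting again
    end
    sleeper.call(delay)
  end
  false
end

# Possible usage inside a ruby_block (the probe command is an assumption):
#
#   unless stable? { system("rabbitmqctl status > /dev/null 2>&1") }
#     raise "rabbitmq did not stay up for 5 consecutive checks"
#   end
```

Resetting the streak on every failure is the important part: a single successful probe between two pacemaker restarts no longer counts as "up".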
end

# wait for service to have a master, and to be active
ruby_block "wait for #{ms_name} to be started" do

Metrics/BlockLength: Block has too many lines. [42/40]

vuntz commented Jul 16, 2018

Member

Ignoring hound and merging at the request of @ilausuch (there were already two +1s).

vuntz merged commit b1376ac into crowbar:stable/4.0 on Jul 16, 2018
6 participants