nova: Don't retry creating existing flavors #2142
In some cases the flavor create call succeeds but the client still returns a non-zero status. Retries of the create call then fail with "Flavor already exists" and the retry loop never succeeds. The added check is executed on every loop turn and stops retrying if the flavor already exists. An example scenario where the flavor might be correctly created but the client doesn't return zero is when one of the HA nodes executes flavor create commands while the others perform a delayed restart of the nova API after config files are modified. If the "create" request hits the API just before the restart, it could be accepted but the client might not get the correct response back.
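To make the failure mode concrete, here is a minimal Ruby sketch of the behaviour described above. It is not the cookbook code: `FakeNova`, `ensure_flavor`, and the simulated "connection reset" error are hypothetical stand-ins, and counting `retries` as extra attempts after the first one is an assumption. It only illustrates why checking for the flavor on every loop turn lets the loop finish instead of failing repeatedly with "already exists".

```ruby
# Hypothetical in-memory stand-in for the nova API; not the real client.
class FakeNova
  attr_reader :create_calls

  def initialize
    @flavors = {}
    @create_calls = 0
  end

  # Simulates "openstack flavor show": true when the flavor exists.
  def flavor_exists?(id)
    @flavors.key?(id)
  end

  # Simulates a create whose first request is accepted by the API but
  # whose response is lost, so the client reports a failure.
  def create_flavor(id)
    @create_calls += 1
    raise "Flavor with ID #{id} already exists" if @flavors.key?(id)
    @flavors[id] = true
    raise "connection reset during API restart" if @create_calls == 1
    true
  end
end

# Retry loop that re-checks existence on every turn.
# Assumption: `retries` means extra attempts after the first one.
def ensure_flavor(nova, id, retries: 5)
  (retries + 1).times do
    return :present if nova.flavor_exists?(id)
    begin
      nova.create_flavor(id)
      return :created
    rescue RuntimeError
      next
    end
  end
  :failed
end

nova = FakeNova.new
result = ensure_flavor(nova, "42")
```

In this simulation the first create is accepted but reported as failed, and the second loop turn sees the flavor and stops without issuing another create.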
nicolasbock
left a comment
I don't think this change breaks anything, but I am struggling to see when the new code would trigger.
The flavors are created on the cluster founder, i.e. this code path is serial. First it runs a list on existing flavors and only if the flavor to be created is not on that list does it continue. Then it issues the flavor create command.
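The serial flow described here can be sketched in plain Ruby as follows. The flavor IDs and names and the helpers `list_flavor_ids` and `flavors_to_create` are made up for illustration and are not taken from the cookbook.

```ruby
# Hypothetical flavor data; the IDs and names are illustrative only.
wanted = { "1" => "m1.tiny", "2" => "m1.small", "3" => "m1.medium" }

# Stand-in for the single "list existing flavors" call on the founder.
def list_flavor_ids(existing)
  existing.keys
end

# Returns the flavors for which a create command would be issued.
def flavors_to_create(wanted, existing_ids)
  wanted.reject { |id, _name| existing_ids.include?(id) }
end

existing = { "1" => "m1.tiny" }
missing = flavors_to_create(wanted, list_flavor_ids(existing))
```

Only the flavors absent from the list remain in `missing`, so a create is attempted just for those.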
I don't quite see why the cluster founder would get an incorrect list of existing flavors.
flavor_create.command command
flavor_create.retries 5
# don't retry after "Flavor with ID ... already exists"
flavor_create.not_if "#{openstack} flavor show #{id}"
Do we still need ruby_block "Get current flavors" do if we do the check here?
We might not need it but it checks the flavors in one API call and saves us some unneeded chef resources... not sure what is faster.
We kind of need it for the second and later runs of the nova cookbook. On the second run, the list is gathered with all the existing flavors and we do not create any flavor creation resources for those, thus skipping not only the resource creation and execution but also the not_if call.
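A rough way to see the trade-off is to count simulated calls per run. This is only a sketch: `run_cost` and the flavor IDs are hypothetical, and it assumes one list call per run plus one "flavor show" guard per declared create resource.

```ruby
# Counts simulated API calls for one cookbook run.
# wanted and existing are arrays of flavor IDs (hypothetical values).
def run_cost(wanted, existing)
  list_calls = 1                 # one list call per run
  missing = wanted - existing    # only these get a create resource
  guard_calls = missing.size     # one "flavor show" guard per resource
  { list: list_calls, guards: guard_calls, resources: missing.size }
end

wanted = %w[1 2 3 4]
first_run  = run_cost(wanted, [])
second_run = run_cost(wanted, wanted)
```

Under these assumptions the second run costs a single list call and declares no create resources, so no guard is evaluated at all.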
@nicolasbock I'm also not 100% sure what happens, but here's a log snippet (mixed):
In the failing logs there's no GET following the POST(s), so maybe the client call that creates this flavor is failing not because of the create itself but because it can't get the flavor info to display after it's created... just a guess.
Itxaka
left a comment
Seems like a good addition.
cluster sync of resource management is hard :P
My tests confirmed that Example error from that point: