nova: Don't retry creating existing flavors #2142

Merged: nicolasbock merged 1 commit into crowbar:master from skazi0:nova-flavor-existing on May 30, 2019

skazi0 commented May 29, 2019

Member

In some cases the flavor create call succeeds but the client still returns
a non-zero status. Retries of the create call then fail with "Flavor already
exists", so the retry loop never succeeds. The added check is executed on
every loop turn and stops retrying if the flavor already exists.

An example scenario where the flavor is created correctly but the client
doesn't return zero: one of the HA nodes executes the flavor create
commands while the others perform a delayed restart of the nova API after
config files are modified. If the "create" request hits the API just before
the restart, it could be accepted but the client might not get the correct
response back.
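
The retry behaviour described above can be sketched in plain Ruby. This is a minimal simulation, not the cookbook code: FakeFlavorApi and run_with_retries are hypothetical names, and the fake API assumes the failure mode from the description (the create is stored server-side but the client reports failure).

```ruby
# Hypothetical stand-in for the nova API plus client. The first create is
# stored, but the client still reports failure; later creates fail because
# the flavor already exists.
class FakeFlavorApi
  attr_reader :create_calls

  def initialize
    @flavors = {}
    @create_calls = 0
  end

  # Returns true when the client would exit with status 0.
  def create(id)
    @create_calls += 1
    return false if @flavors.key?(id) # "Flavor with ID ... already exists"

    @flavors[id] = true
    false # flavor stored, but the client never saw a usable response
  end

  def exists?(id)
    @flavors.key?(id)
  end
end

# Hypothetical retry loop: one initial attempt plus up to `retries` retries.
# The guard is evaluated before every attempt, as the description states.
def run_with_retries(retries:, guard:)
  attempts_left = retries
  loop do
    return :skipped_by_guard if guard.call
    return :succeeded if yield
    return :failed if attempts_left.zero?

    attempts_left -= 1
  end
end

# Without a guard, every retry hits "already exists" until retries run out.
api = FakeFlavorApi.new
without_guard = run_with_retries(retries: 5, guard: -> { false }) { api.create(4) }

# With an existence check as the guard, the loop stops after the first attempt.
guarded_api = FakeFlavorApi.new
with_guard = run_with_retries(retries: 5, guard: -> { guarded_api.exists?(4) }) do
  guarded_api.create(4)
end
```

In this sketch the unguarded run makes six create calls and ends as :failed, while the guarded run makes one call and is then skipped.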

nicolasbock left a comment

Contributor

I don't think this change breaks anything, but I am struggling to see when the new code would trigger.

The flavors are created on the cluster founder, i.e. this code path is serial. First it runs a list on existing flavors and only if the flavor to be created is not on that list does it continue. Then it issues the flavor create command.

I don't quite see why the cluster founder would get an incorrect list of existing flavors.

flavor_create.command command
flavor_create.retries 5
# don't retry after "Flavor with ID ... already exists"
flavor_create.not_if "#{openstack} flavor show #{id}"
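
For context on the not_if line: in Chef, a guard given as a string is run as a shell command, and not_if skips the resource when that command exits with status 0. A rough plain-Ruby imitation of that rule follows. It assumes a POSIX sh is available; the helper name not_if_skips? is made up, and "exit 0" / "exit 1" only stand in for a succeeding or failing openstack flavor show.

```ruby
# Hypothetical helper: true when a string guard would make not_if skip
# the resource, i.e. when the guard command exits with status 0.
def not_if_skips?(guard_command)
  # Kernel#system returns true for exit status 0, false for a non-zero
  # status, and nil if the command could not be started at all.
  system("sh", "-c", guard_command, out: File::NULL, err: File::NULL) == true
end

flavor_exists  = not_if_skips?("exit 0") # stands in for a successful "flavor show"
flavor_missing = not_if_skips?("exit 1") # stands in for a failing "flavor show"
```

So once the flavor is visible to the show command, the create resource is skipped instead of retried.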

Contributor

Do we still need the ruby_block "Get current flavors" if we do the check here?

Member Author

We might not need it but it checks the flavors in one API call and saves us some unneeded chef resources... not sure what is faster.

Member

We kind of need it for the second and later runs of the nova cookbook. On a second run, the list is gathered with all the existing flavors and we don't define any flavor creation resources for those, which skips not only the resource creation and execution but also the not_if call.
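
The list-then-create filtering described here can be sketched as follows. This is an illustration only: flavors_to_create is a hypothetical helper, and the flavor table is assumed apart from m1.large with ID 4, which is the only pair that appears in this thread.

```ruby
# Illustrative flavor table (assumed, except for ID 4 / m1.large).
wanted_flavors = {
  1 => "m1.tiny",
  2 => "m1.small",
  3 => "m1.medium",
  4 => "m1.large",
  5 => "m1.xlarge"
}

# Hypothetical helper: given the IDs returned by a single list call,
# keep only the flavors that still need a create resource.
def flavors_to_create(wanted, existing_ids)
  wanted.reject { |id, _name| existing_ids.include?(id) }
end

first_run   = flavors_to_create(wanted_flavors, [])                  # nothing exists yet
second_run  = flavors_to_create(wanted_flavors, wanted_flavors.keys) # everything exists
partial_run = flavors_to_create(wanted_flavors, [1, 2, 3])           # run interrupted after 3
```

On a later run with all flavors present, the result is empty, so no create resources and no per-flavor guard calls are needed.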

skazi0 commented May 29, 2019

Member Author

@nicolasbock I'm also not 100% sure what happens but here's a log snippet (mixed):

c1:[2019-05-29T07:18:30+00:00] INFO: Processing execute[Create flavor m1.large] action run (dynamically defined)

c2:2019-05-29 07:18:30.785 5236 INFO nova.osapi_compute.wsgi.server [req-2b71c56e-4253-4d50-b7ec-60ae3d6eee1a] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 200 len: 884 time: 0.8050320

c2:[2019-05-29T07:18:31+00:00] INFO: template[/etc/nova/nova.conf.d/101-nova-placement.conf] sending restart action to service[nova-api] (delayed)
c2:[2019-05-29T07:18:31+00:00] INFO: Processing service[nova-api] action restart (nova::api line 27)

c2:2019-05-29 07:18:31.066 5241 INFO nova.wsgi [-] Stopping WSGI server.

c1:2019-05-29 07:18:33.724 13711 INFO nova.osapi_compute.wsgi.server [req-3be5575f-3912-4b19-9bea-386ab1cbbe4c] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 200 len: 883 time: 0.2285290

c1:[2019-05-29T07:19:31+00:00] INFO: Retrying execution of execute[Create flavor m1.large], 4 attempt(s) left

c3:2019-05-29 07:19:47.118   563 INFO nova.api.openstack.wsgi [req-1662b243-b99e-4960-8915-5b857cc311c3] HTTP exception thrown: Flavor with ID 4 already exists.
c3:2019-05-29 07:19:47.121   563 INFO nova.osapi_compute.wsgi.server [req-1662b243-b99e-4960-8915-5b857cc311c3] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 409 len: 498 time: 5.8394370

c1:[2019-05-29T07:19:47+00:00] INFO: Retrying execution of execute[Create flavor m1.large], 3 attempt(s) left

c1:2019-05-29 07:20:01.829 13710 INFO nova.api.openstack.wsgi [req-f49b3490-3e29-4bd8-8550-df014e57cca3] HTTP exception thrown: Flavor with ID 4 already exists.
c1:2019-05-29 07:20:01.832 13710 INFO nova.osapi_compute.wsgi.server [req-f49b3490-3e29-4bd8-8550-df014e57cca3] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 409 len: 498 time: 5.1512830

c1:[2019-05-29T07:20:01+00:00] INFO: Retrying execution of execute[Create flavor m1.large], 2 attempt(s) left

c3:2019-05-29 07:20:08.845   563 INFO nova.api.openstack.wsgi [req-1c28b909-0369-4cde-8887-606fe943af2d] HTTP exception thrown: Flavor with ID 4 already exists.
c3:2019-05-29 07:20:08.848   563 INFO nova.osapi_compute.wsgi.server [req-1c28b909-0369-4cde-8887-606fe943af2d] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 409 len: 498 time: 2.7859600

c1:[2019-05-29T07:20:08+00:00] INFO: Retrying execution of execute[Create flavor m1.large], 1 attempt(s) left

c1:2019-05-29 07:20:18.107 13711 INFO nova.api.openstack.wsgi [req-5cd8c1ee-ea19-4205-acd4-5875bf33bc8d] HTTP exception thrown: Flavor with ID 4 already exists.
c1:2019-05-29 07:20:18.109 13711 INFO nova.osapi_compute.wsgi.server [req-5cd8c1ee-ea19-4205-acd4-5875bf33bc8d] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 409 len: 498 time: 2.2201080

c1:[2019-05-29T07:20:18+00:00] INFO: Retrying execution of execute[Create flavor m1.large], 0 attempt(s) left

c2:2019-05-29 07:20:26.289 10844 INFO nova.api.openstack.wsgi [req-dfd93eb3-d108-49fe-a433-21894b5aa845] HTTP exception thrown: Flavor with ID 4 already exists.
c2:2019-05-29 07:20:26.292 10844 INFO nova.osapi_compute.wsgi.server [req-dfd93eb3-d108-49fe-a433-21894b5aa845] 192.168.124.86 "POST /v2.1/9086de40d41e417e942f5b2996606149/flavors HTTP/1.1" status: 409 len: 498 time: 3.9004841
  • the founder gets an empty list of existing flavors and decides to create all of them (outside of the log scope)
  • after creating the first 3 it tries to create the 4th
  • the nova API call gets directed to the c2 node and hits a restart caused by the 101-nova-placement.conf update
  • note that the call to c2 returns 200, but there's another log entry on c1 which also says 200, so some internal retry in novaclient might have been used
  • the usual log pattern for flavor creation is this:
2019-05-29 11:52:21.522 18716 INFO nova.osapi_compute.wsgi.server [req-7dc4fc10-4413-450a-b174-225aa6c77f01] 10.164.233.2 "POST /v2.1/aaf67342dad24fe5893ed1663417df91/flavors HTTP/1.1" status: 200 len: 921 time: 0.5697560
2019-05-29 11:52:21.670 18715 INFO nova.osapi_compute.wsgi.server [req-6b282719-68c6-42d8-9367-9a89879c37ba] 10.164.233.2 "GET /v2.1/aaf67342dad24fe5893ed1663417df91/flavors/f316cec0-5079-4d69-b4d2-e3d9b3e961ff/os-extra_specs HTTP/1.1" status: 200 len: 412 time: 0.1365621

In the failing logs there's no GET following the POST(s), so maybe the client call that creates this flavor is failing not because of the create but because it can't get the flavor info to display after it's created... just a guess.

Itxaka left a comment

Member

Seems like a good addition.

cluster sync of resource management is hard :P

skazi0 commented May 30, 2019

Member Author

My tests confirmed that openstack flavor create can successfully create the flavor and still return a non-zero status if the nova API is restarted between the POST (create) and the GET (fetch details to display).

Example error from that point:

SSL exception connecting to https://<%HOSTREMOVED%>:8774/v2.1/092a27eefd924e95906858b685751dae/flavors/688/os-extra_specs: HTTPSConnectionPool(host='<%HOSTREMOVED%>', port=8774): Max retries exceeded with url: /v2.1/092a27eefd924e95906858b685751dae/flavors/688/os-extra_specs (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
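
The POST-then-GET sequence behind this failure can be modelled in a few lines of plain Ruby. This is a sketch under stated assumptions, not the real client: RestartingApi, ConnectionLost and flavor_create are hypothetical names, and the fake API is assumed to begin its restart right after accepting the POST.

```ruby
# Hypothetical error raised while the fake API is restarting.
class ConnectionLost < StandardError; end

# Hypothetical API that accepts one create and then starts restarting.
class RestartingApi
  attr_reader :flavors

  def initialize
    @flavors = []
    @restarting = false
  end

  # Stands in for POST .../flavors
  def post_flavor(id)
    raise ConnectionLost, "restart in progress" if @restarting

    @flavors << id
    @restarting = true # the restart begins right after the create is accepted
  end

  # Stands in for GET .../flavors/<id>/os-extra_specs
  def get_extra_specs(_id)
    raise ConnectionLost, "restart in progress" if @restarting

    {}
  end
end

# Stand-in for the CLI call: create, then fetch details to display.
# Returns the exit status the caller would see.
def flavor_create(api, id)
  api.post_flavor(id)
  api.get_extra_specs(id)
  0
rescue ConnectionLost
  1
end

api = RestartingApi.new
status = flavor_create(api, 688)
```

Here status is 1 even though api.flavors already contains 688, which is the state that makes a plain retry of the create fail with "already exists".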
