Skip to content

Retry XML downloads#189

Merged
allenh1 merged 4 commits intoros-infrastructure:masterfrom
lgsvl:retry_xml_download
May 17, 2019
Merged

Retry XML downloads#189
allenh1 merged 4 commits intoros-infrastructure:masterfrom
lgsvl:retry_xml_download

Conversation

@andre-rosa
Copy link
Contributor

@andre-rosa
Copy link
Contributor Author

@allenh1 << ptal

@andre-rosa andre-rosa force-pushed the retry_xml_download branch from e557abe to 2583959 Compare May 10, 2019 19:01
  - Adds a retry mechanism to the XML downloads. It makes generation more
    robust upon transient failures of the download which occurs more often
    than not. Normally downloads work fine just by retrying them.
  - Fixes ros-infrastructure#188 .
@andre-rosa andre-rosa force-pushed the retry_xml_download branch from 2583959 to 3944ddd Compare May 10, 2019 19:04
  - Always try callback at least once
  - Handle zero or negative retries
@andre-rosa andre-rosa force-pushed the retry_xml_download branch from 3944ddd to 2db2a17 Compare May 10, 2019 19:19
Copy link
Member

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, I would likely have approached it as a recursive call with the call through decrementing the retry count to the lower one.

Also thinking about the other places where we've put in this sort of logic it's actually been much more effective to also bulid in a slightly escalating sleep to the logic. So that an intermittent network link doesn't just fail the retry 5 times in one second. Waiting half a second, 1 second, 2, 4, 8 is relatively reasonable and significantly increases the likely hood of recovery. For larger retry counts you don't really want to continue the exponential as the overall timeout gets pretty large where you actually want it to fail not just hang forever.

Sorry i didn't think of this earlier I was going to approve but I think the backoff logic is actually important to avoiding the transient errors in #188

  - Add an intermediate sleep that scales by doubling while num_retry <= 6
    and then switches back to 125ms. Sleep pattern with default parameters:
    0, 125 ms, 250 ms, .5 sec, 1 sec, 2 sec, 4 sec, 8 sec, 125 ms, 125 ms ...
  - Adopt recursive tail call
@andre-rosa andre-rosa force-pushed the retry_xml_download branch from 86a651d to a4df608 Compare May 16, 2019 00:32
@andre-rosa
Copy link
Contributor Author

@tfoote << ptal

@allenh1 allenh1 merged commit be22184 into ros-infrastructure:master May 17, 2019
@andre-rosa andre-rosa deleted the retry_xml_download branch May 17, 2019 04:54
zffgithub pushed a commit to zffgithub/superflore that referenced this pull request Apr 11, 2023
* Retry XML downloads

  - Adds a retry mechanism to the XML downloads. It makes generation more
    robust upon transient failures of the download which occurs more often
    than not. Normally downloads work fine just by retrying them.
  - Fixes ros-infrastructure#188 .

* Improve retry semantics

  - Always try callback at least once
  - Handle zero or negative retries

* Add intermediate sleeps between retries

  - Add an intermediate sleep that scales by doubling while num_retry <= 6
    and then switches back to 125ms. Sleep pattern with default parameters:
    0, 125 ms, 250 ms, .5 sec, 1 sec, 2 sec, 4 sec, 8 sec, 125 ms, 125 ms ...
  - Adopt recursive tail call

* Use format instead of f-strings in order to support python < 3.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sometimes a transient XML download failure aborts generation

3 participants