Loop with DNS lookup when trying for SwitchLocation #7779
This might be a useful step, but I don't think it will fix the case of a cold boot of the rack while one Scrimlet is down. In that case, as I understand things, the control plane will not come up because of this, and repeating the DNS lookup won't help because DNS will still be reporting the same thing.
bnaecker left a comment:
Looks OK to me; one suggestion about backoff, but it's not blocking.
```rust
}
Err(e) => {
    warn!(log, "Failed to map switch zone addr: {e}, retrying");
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;
```
We may want to use a backoff policy here, something like the "local service policy".
The 2 seconds is what the loop had before, so I'm not making it worse than I found it :)
I agree a backoff policy here would be better, but I could not quite figure out how to disentangle
the two actions in this loop to fit them into the retry setup, and I want this change landed before 14
if possible.
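The reply above notes that the two actions in this loop (the DNS lookup and the address mapping) are hard to fit into the retry setup. One way around that is to put both actions inside a single closure and retry the closure as a whole. Below is a minimal synchronous sketch using only `std`, assuming a simple doubling backoff with a cap. `Backoff` and `retry_with_backoff` are hypothetical names, the delay values are illustrative, and the sketch does not reproduce the "local service policy" or the real async (tokio) code.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff: start at `initial`, double on each
/// retry, never exceed `max`. Values are illustrative assumptions.
#[derive(Debug, Clone, Copy)]
pub struct Backoff {
    pub initial: Duration,
    pub max: Duration,
}

impl Backoff {
    /// Delay before retry number `attempt` (0-based), saturating at `max`.
    pub fn delay(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        self.initial.saturating_mul(factor).min(self.max)
    }
}

/// Call `attempt` until it returns `Ok`, sleeping per `policy` between tries.
/// The caller puts *both* steps (DNS lookup, then mapping) inside `attempt`,
/// so each retry starts from a fresh DNS answer. `sleep` is injected so the
/// sketch can be exercised without real delays.
pub fn retry_with_backoff<T, E, F, S>(policy: Backoff, mut attempt: F, mut sleep: S) -> T
where
    F: FnMut() -> Result<T, E>,
    E: std::fmt::Display,
    S: FnMut(Duration),
{
    let mut n: u32 = 0;
    loop {
        match attempt() {
            Ok(v) => return v,
            Err(e) => {
                let d = policy.delay(n);
                eprintln!("attempt {n} failed: {e}, retrying in {d:?}");
                sleep(d);
                n = n.saturating_add(1);
            }
        }
    }
}
```

Folding both actions into one closure is what lets a single backoff policy govern the whole lookup-and-map step, rather than needing two nested retry loops.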
```rust
    log,
    "Failed to map switch zone addr: {e}"
);
tokio::time::sleep(
```
Same note here, may want to back off more carefully.
Currently, callers of `map_switch_zone_addrs()` first get the IP for `ServiceName::Dendrite` from DNS, then loop (forever) trying to translate that IP into a `SwitchLocation`. Under normal conditions, this is fine. However, if a sled has been expunged, or a new sled is being added, what the `lookup_all_ipv6()` call returns may change. If that change happens after we start looping in `map_switch_zone_addrs()`, the loop will go on forever looking for an address that is no longer correct.

To fix this, we move the `lookup_all_ipv6()` call into the loop by using the function `switch_zone_address_mappings()` instead. The loop in `switch_zone_address_mappings()` performs the DNS lookup and then calls `map_switch_zone_addrs()`, so every retry starts from a fresh DNS answer.

Most places that called `map_switch_zone_addrs()` were also making the same `lookup_all_ipv6()` call, so switching them to `switch_zone_address_mappings()` is a drop-in change.

A fix for #7739
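The restructured loop described in the PR description can be sketched as follows. This is a simplified, synchronous illustration, not the omicron implementation: `switch_zone_address_mappings_sketch` and `Mappings` are hypothetical names, the two-variant `SwitchLocation` is a stand-in, and the `lookup` and `map` closures stand in for `lookup_all_ipv6(ServiceName::Dendrite)` and `map_switch_zone_addrs()`, whose real signatures are not shown here.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Simplified stand-in for `SwitchLocation` (assumption: two switches).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SwitchLocation {
    Switch0,
    Switch1,
}

/// Hypothetical result type: which switch zone each address belongs to.
pub type Mappings = HashMap<SwitchLocation, Ipv6Addr>;

/// Both steps run on every iteration:
/// 1. `lookup` stands in for the DNS query for the Dendrite service;
/// 2. `map` stands in for `map_switch_zone_addrs()`.
/// Because the lookup is repeated, an answer that changed after a sled was
/// expunged or added is picked up on the next iteration.
pub fn switch_zone_address_mappings_sketch<L, M, S>(
    mut lookup: L,
    mut map: M,
    mut sleep: S,
) -> Mappings
where
    L: FnMut() -> Result<Vec<Ipv6Addr>, String>,
    M: FnMut(&[Ipv6Addr]) -> Result<Mappings, String>,
    S: FnMut(),
{
    loop {
        let addrs = match lookup() {
            Ok(addrs) => addrs,
            Err(e) => {
                eprintln!("DNS lookup failed: {e}, retrying");
                sleep();
                continue;
            }
        };
        match map(&addrs) {
            Ok(mappings) => return mappings,
            Err(e) => {
                eprintln!("Failed to map switch zone addr: {e}, retrying");
                sleep();
            }
        }
    }
}
```

In the old shape, the lookup ran once before the loop, so a stale address (for example, one belonging to an expunged sled) would be retried indefinitely. Here, the second iteration sees the updated DNS answer and can succeed.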