
switchdc cookbook should perform exponential backoff when checking DNS TTL
Open, Low, Public

Description

Noticed during the June 2021 switchover.

In step 00, when lowering the DNS TTLs, the cookbook printed output like:

[1/10, retrying in 3.00s] Waiting for DNS TTL update...: Expected TTL '10', got '300' for record 10.2.2.22
...

On IRC, @Joe said that it should perform an exponential backoff instead of retrying at a fixed 3s interval.

Event Timeline

I took a look at the logs: in all runs it took 3 or at most 4 tries (2 or 3 retries) to find the updates, which doesn't seem very noisy to me. That is between 9 and 12 seconds of sleep after the change.
What exponential delay would you propose? FYI, linear and power backoffs are also available; see the spicerack.decorators.retry documentation for more details.

For example, a 3-second exponential backoff would most likely waste time: if the update is not visible within the first 2 retries (3s and 9s sleeps), the next retry waits an additional 27s.
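
To make the comparison concrete, here is a small standalone sketch of the sleep schedules. It does not import spicerack; `backoff_sleep` and `cumulative_sleep` are hypothetical helpers, and the formulas are assumptions. The exponential one is chosen to match the 3s, 9s, 27s sequence quoted above; the linear and power ones should be checked against the spicerack.decorators.retry documentation.

```python
def backoff_sleep(mode: str, base: float, retry: int) -> float:
    """Return the sleep in seconds before the given retry (1-based).

    The formulas are assumptions made for illustration only.
    """
    if mode == "constant":
        return base
    if mode == "linear":
        return base * retry
    if mode == "power":
        return base * 2 ** (retry - 1)
    if mode == "exponential":
        # With base=3 this yields 3, 9, 27, ... as in the comment above.
        return base ** retry
    raise ValueError(f"unknown backoff mode: {mode}")


def cumulative_sleep(mode: str, base: float, retries: int) -> float:
    """Total time spent sleeping after the given number of retries."""
    return sum(backoff_sleep(mode, base, i) for i in range(1, retries + 1))


if __name__ == "__main__":
    for mode in ("constant", "linear", "power", "exponential"):
        sleeps = [backoff_sleep(mode, 3, i) for i in range(1, 4)]
        print(f"{mode:12} sleeps={sleeps} total={cumulative_sleep(mode, 3, 3)}s")
```

Under these assumptions, three retries cost 9s of sleep with the constant 3s delay and 39s with the 3s exponential one, which is the extra wait described above.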

We're currently not using check_ttl() during the RO period, but that doesn't mean it won't be used in a more time-critical path in the future. So I'd weigh that into the decision before changing it just to avoid one or two lines of logs.
I would certainly not change the settings for check_record(), which is used during the RO period.

If we change anything at all, I'd rather wait a few seconds before calling the check method in the cookbook, since the cookbook knows how critical that particular step is. That would make it less noisy.
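
A rough sketch of that last option, assuming the cookbook wraps its existing check call. `check_after_grace` and the grace value are hypothetical names for illustration, and `check` stands in for whatever method the cookbook already calls:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def check_after_grace(
    check: Callable[[], T],
    grace_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Sleep for a grace period, then run the check once.

    Hypothetical helper: the retry behaviour of the check itself is left
    untouched, only the first call is delayed. The sleep function is
    injectable so the helper can be exercised without really waiting.
    """
    sleep(grace_seconds)
    return check()
```

The delay lives in the caller, so a more time-critical caller of the same check can skip it.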