
switchdc cookbook says confusing "failed to check $name" when switching services
Closed, Resolved (Public)

Description

I don't have the exact message anymore, but the switchdc.services cookbook says something like "failed to check $name" when switching services. That is confusing because it suggests something has failed, even though a record not having updated yet is expected at that point.

Something like "$name hasn't updated yet" would be better.

Event Timeline

Legoktm created this task.

I think we need the exact message to decide what to do... you should find it in the logs. Was it by any chance this one?

[ERROR] Expected IP '10.2.1.11', got '10.2.2.11' for record apertium
[WARNING] Failed to call 'spicerack.dnsdisc.check_record' [1/10, retrying in 3.00s]: Failed to check record apertium

It shouldn't be that one -- we fixed it in https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/638753 and got rid of that first Failed to call part of the message. From checking the logs on cumin1001, I do see that log entry but only dated from the previous switchover in 2020. (Which is a relief -- I took a while trying to figure out how we were still getting that message after we deleted it!)

In the logs from this switchover, I see:

2021-06-28 14:36:46,819 jayme 28351 [ERROR] Expected IP '10.2.1.13', got '10.2.2.13' for record kartotherian
2021-06-28 14:36:46,819 jayme 28351 [WARNING] [1/10, retrying in 3.00s] Waiting for DNS record update...: Failed to check record kartotherian

So, we improved the first part but not the second.

That Failed to check portion after the colon comes from the implementation of Discovery.check_record in Spicerack's dnsdisc.py, where we just say

raise DiscoveryError('Failed to check record {name}'.format(name=name))

if the record doesn't match the expected result. Straightforward to fix, I'll send a patch.
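For illustration, here is a minimal sketch of what a less alarming message could look like. It is not the Spicerack code and not the wording of the eventual patch, which isn't quoted in this task: DiscoveryError is redefined locally as a stand-in, the function signature and message text are hypothetical, and format_retry_warning is a made-up helper that only mimics the shape of the retry log line shown above.

```python
class DiscoveryError(Exception):
    """Local stand-in for Spicerack's DiscoveryError (assumption, not the real class)."""


def check_record(name, expected_ip, resolved_ip):
    """Raise DiscoveryError if the record does not resolve to expected_ip yet.

    Hypothetical signature: the caller passes in the resolved IP, so this
    sketch needs no DNS access.
    """
    if resolved_ip != expected_ip:
        # Hypothetical wording: describe the state ("not updated yet")
        # instead of reporting a generic failure.
        raise DiscoveryError(
            "Record {name} has not updated yet: expected IP '{expected}', got '{actual}'".format(
                name=name, expected=expected_ip, actual=resolved_ip
            )
        )


def format_retry_warning(attempt, tries, delay, failure_message, exc):
    """Build a retry log line shaped like the one in the logs above.

    Hypothetical helper: it shows that the text after the colon is simply
    the exception message, which is why that message needs to read well.
    """
    return "[{attempt}/{tries}, retrying in {delay:.2f}s] {message}: {exc}".format(
        attempt=attempt, tries=tries, delay=delay, message=failure_message, exc=exc
    )


if __name__ == "__main__":
    try:
        check_record("kartotherian", "10.2.1.13", "10.2.2.13")
    except DiscoveryError as error:
        print(format_retry_warning(1, 10, 3.0, "Waiting for DNS record update...", error))
```

With wording along these lines, the retry line would end in a description of the record's state, so an operator reading it mid-switchover sees an expected wait rather than an apparent failure.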

Change 703879 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message

https://gerrit.wikimedia.org/r/703879

Change 703879 merged by jenkins-bot:

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message

https://gerrit.wikimedia.org/r/703879