
switchdc cookbook says confusing "failed to check $name" when switching services
Closed, Resolved, Public


I don't have the exact message anymore, but when switching services the cookbook says something like "failed to check $name". That's confusing because it suggests something has failed, even though this is expected.

Something like "$name hasn't updated yet" would be better.

Event Timeline

Legoktm created this task.

I think we need the exact message to decide what to do; you should find it in the logs. Was it by any chance this one?

[ERROR] Expected IP '', got '' for record apertium
[WARNING] Failed to call 'spicerack.dnsdisc.check_record' [1/10, retrying in 3.00s]: Failed to check record apertium

It shouldn't be that one -- we fixed it in and got rid of that first "Failed to call" part of the message. From checking the logs on cumin1001, I do see that log entry, but only dated from the previous switchover in 2020. (Which is a relief -- I spent a while trying to figure out how we were still getting that message after we deleted it!)

In the logs from this switchover, I see:

2021-06-28 14:36:46,819 jayme 28351 [ERROR] Expected IP '', got '' for record kartotherian
2021-06-28 14:36:46,819 jayme 28351 [WARNING] [1/10, retrying in 3.00s] Waiting for DNS record update...: Failed to check record kartotherian

So, we improved the first part but not the second.
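To see why the checker's wording ends up in the retry warning, here is a minimal sketch assuming a generic retry wrapper that logs its own progress message, then a colon, then the text of the exception raised by the checked call. The `retry` and `format_retry_warning` helpers and the `DiscoveryError` class below are hypothetical stand-ins, not Spicerack's actual code.

```python
import logging
import time

logger = logging.getLogger(__name__)


class DiscoveryError(Exception):
    """Hypothetical stand-in for Spicerack's DiscoveryError."""


def format_retry_warning(attempt, tries, delay, message, exc):
    # Same shape as the log line above:
    # "[attempt/tries, retrying in N.NNs] <message>: <exception text>"
    return "[{}/{}, retrying in {:.2f}s] {}: {}".format(attempt, tries, delay, message, exc)


def retry(func, tries=10, delay=3.0, message="Waiting for DNS record update..."):
    """Call func until it stops raising DiscoveryError, up to `tries` attempts."""
    for attempt in range(1, tries + 1):
        try:
            return func()
        except DiscoveryError as exc:
            if attempt == tries:
                raise  # out of attempts: let the caller see the last error
            # The exception text is appended verbatim after the progress message.
            logger.warning(format_retry_warning(attempt, tries, delay, message, exc))
            time.sleep(delay)
```

With `tries=10` and `delay=3.0`, the first failed attempt produces the "[1/10, retrying in 3.00s] Waiting for DNS record update..." prefix seen in the logs, so whatever the checker puts in its exception is what readers see after the colon.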

That "Failed to check" portion after the colon comes from the implementation of Discovery.check_record in Spicerack's, where we just say

raise DiscoveryError('Failed to check record {name}'.format(name=name))

if the record doesn't match the expected result. Straightforward to fix, I'll send a patch.
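A minimal sketch of the kind of change described, assuming a hypothetical standalone `check_record(name, expected_ip, resolve)` function with an injected `resolve` callable, rather than Spicerack's real `Discovery.check_record` method. The message text follows the wording suggested in the task description and is not necessarily what the merged patch uses.

```python
class DiscoveryError(Exception):
    """Hypothetical stand-in for Spicerack's DiscoveryError."""


def check_record(name, expected_ip, resolve):
    """Raise DiscoveryError if `name` doesn't resolve to `expected_ip` yet.

    `resolve` is a hypothetical callable standing in for the real DNS lookup.
    """
    actual_ip = resolve(name)
    if actual_ip != expected_ip:
        # Phrase this as "not updated yet" rather than "failed": a mismatch is
        # expected while the change propagates, and the caller will retry.
        raise DiscoveryError("Record {name} hasn't updated yet".format(name=name))
```

Since the expected and actual IPs are already logged at ERROR level just before the retry warning, the exception message only needs to say which record is still pending.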

Change 703879 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message

Change 703879 merged by jenkins-bot:

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message