switchdc cookbook says confusing "failed to check $name" when switching services
Closed, Resolved · Public

Assigned To

Authored By

	Legoktm
	Jun 28 2021, 5:28 PM

Description

I don't have the exact message anymore, but the switchdc.services cookbook says something like "failed to check $name" while switching services. That's confusing because it suggests something has failed, even though it's expected that the record hasn't updated yet.

Something like "$name hasn't updated yet" would be better.

Details

	Subject	Repo	Branch	Lines +/-
	dnsdisc: Improve "failed to check record" error message	operations/software/spicerack	master	+2 -2


Event Timeline

Legoktm triaged this task as Low priority. Jun 28 2021, 5:28 PM

Legoktm created this task.

Restricted Application added a subscriber: Aklapper. Jun 28 2021, 5:28 PM

I think we need the exact message to decide what to do; you should find it in the logs. Was it by any chance this one?

[ERROR] Expected IP '10.2.1.11', got '10.2.2.11' for record apertium
[WARNING] Failed to call 'spicerack.dnsdisc.check_record' [1/10, retrying in 3.00s]: Failed to check record apertium

Yep, I think that was it.

It shouldn't be that one: we fixed it in https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/638753 and got rid of the leading "Failed to call" part of the message. Checking the logs on cumin1001, I do see that log entry, but only from the previous switchover in 2020. (Which is a relief; it took me a while to figure out how we could still be getting that message after we deleted it!)

In the logs from this switchover, I see:

2021-06-28 14:36:46,819 jayme 28351 [ERROR] Expected IP '10.2.1.13', got '10.2.2.13' for record kartotherian
2021-06-28 14:36:46,819 jayme 28351 [WARNING] [1/10, retrying in 3.00s] Waiting for DNS record update...: Failed to check record kartotherian

So, we improved the first part but not the second.
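A minimal sketch of how such a warning line can be assembled, assuming a retry wrapper that logs its own attempt prefix, a caller-supplied failure message, and then the text of the exception it caught. `retry_with_log` and this `DiscoveryError` are hypothetical stand-ins, not Spicerack's actual retry decorator; the point is only that rewording the prefix and failure message leaves the exception text, and therefore "Failed to check record ...", untouched.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class DiscoveryError(Exception):
    """Hypothetical stand-in for the error raised when a record check fails."""


def retry_with_log(check: Callable[[], None], tries: int, delay: float, failure_message: str) -> None:
    """Call `check` up to `tries` times, logging a warning after each failed attempt.

    Hypothetical helper: the logged line is the wrapper's own prefix, then
    `failure_message`, then whatever text the caught exception carries.
    """
    for attempt in range(1, tries + 1):
        try:
            check()
            return
        except DiscoveryError as exc:
            if attempt == tries:
                raise  # Out of attempts: let the last error propagate.
            # The part after the colon comes from the exception, not from the wrapper.
            logger.warning("[%d/%d, retrying in %.2fs] %s: %s", attempt, tries, delay, failure_message, exc)
            time.sleep(delay)
```

With a check that fails once and then succeeds, the single warning reads `[1/10, retrying in 0.00s] Waiting for DNS record update...: Failed to check record kartotherian` when `delay=0.0`, mirroring the log line above.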

That "Failed to check record" portion after the colon comes from the implementation of Discovery.check_record in Spicerack's dnsdisc.py, where we just say

raise DiscoveryError('Failed to check record {name}'.format(name=name))

if the record doesn't match the expected result. Straightforward to fix; I'll send a patch.
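For illustration, a sketch of a check that logs each mismatch and then raises with wording along the lines the task description suggests. This `check_record` takes the resolved answers as a plain list (a hypothetical simplification; the real `Discovery.check_record` queries the nameservers itself), and the message "Record {name} hasn't been updated yet" is illustrative, not the wording merged in change 703879.

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)


class DiscoveryError(Exception):
    """Hypothetical stand-in for spicerack's DiscoveryError."""


def check_record(name: str, expected_ip: str, answers: Iterable[str]) -> None:
    """Raise DiscoveryError unless every answer equals `expected_ip`.

    `answers` is a hypothetical stand-in for the IPs returned by each
    authoritative nameserver.
    """
    mismatched = False
    for ip in answers:
        if ip != expected_ip:
            # Same shape as the [ERROR] line in the switchover logs.
            logger.error("Expected IP '%s', got '%s' for record %s", expected_ip, ip, name)
            mismatched = True
    if mismatched:
        # Phrased as "not updated yet" rather than "failed": a retry is expected here.
        raise DiscoveryError(f"Record {name} hasn't been updated yet")
```

A caller that retries on `DiscoveryError` would then log something like `...: Record kartotherian hasn't been updated yet`, which no longer reads as a hard failure.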

Change 703879 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message

https://gerrit.wikimedia.org/r/703879

gerritbot added a project: Patch-For-Review. Jul 9 2021, 11:46 PM

Change 703879 merged by jenkins-bot:

[operations/software/spicerack@master] dnsdisc: Improve "failed to check record" error message

https://gerrit.wikimedia.org/r/703879

RLazarus closed this task as Resolved. Jul 12 2021, 2:10 PM

Maintenance_bot removed a project: Patch-For-Review. Jul 12 2021, 2:10 PM
