Surprising new svc.eqiad.wmnet dns entry deployed: similar-users on host decommission
Closed, ResolvedPublic

Description

While decommissioning a host (helium), the script proposed deploying the following change, in addition to the expected removal of helium's DNS and reverse DNS entries:

diff --git a/svc.eqiad.wmnet b/svc.eqiad.wmnet                                                                                        
index 21d096e..125179f 100644                                                                                                         
--- a/svc.eqiad.wmnet                                                                                                                 
+++ b/svc.eqiad.wmnet                                                                                                                 
@@ -39,6 +39,7 @@ restbase                                 1H IN A 10.2.2.17                                                          
 schema                                   1H IN A 10.2.2.43                                                                           
 search                                   1H IN A 10.2.2.30                                                                           
 sessionstore                             1H IN A 10.2.2.29                                                                           
+similar-users                            1H IN A 10.2.2.57                                                                           
 termbox                                  1H IN A 10.2.2.46                                                                           
 thanos-query                             1H IN A 10.2.2.53                                                                           
 thanos-swift                             1H IN A 10.2.2.54                                                                           
METADATA: {"path": "/tmp/dns-c25pcHBldHM-u03_64pz", "sha1": "3e8406cc734cac758145453af4e969771f867924", "insertions": 2, "deletions": 6, "lines": 8, "files": 7}

Which was unexpected. Maybe a merged change that wasn't previously deployed?

Event Timeline

That's probably the netbox equivalent of https://gerrit.wikimedia.org/r/c/operations/dns/+/658976

The back story is that svc IP addresses haven't yet been fully migrated to netbox, pending T270071. So for now they are duplicated (and the one you saw is actually not yet used). It's quite probable that the object was created in netbox but sre.dns.netbox wasn't run. Adding @Volans cause cumin :-)
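To illustrate the duplication, here is a rough sketch that lists names defined both in the manually maintained zone data and in the netbox-generated data. It assumes both sources can be read as plain text in the `name TTL IN TYPE value` layout seen in the diff above; the helper names `parse_records` and `duplicated` are hypothetical and are not part of sre.dns.netbox.

```python
from typing import Dict, Tuple

Record = Tuple[str, str]  # (record type, value)


def parse_records(zone_text: str) -> Dict[str, Record]:
    """Parse simple 'name TTL IN TYPE value' lines; anything else is skipped."""
    records: Dict[str, Record] = {}
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop zone-file comments
        fields = line.split()
        if len(fields) == 5 and fields[2] == "IN":
            name, _ttl, _cls, rtype, value = fields
            records[name] = (rtype, value)
    return records


def duplicated(manual: Dict[str, Record],
               generated: Dict[str, Record]) -> Dict[str, Tuple[Record, Record]]:
    """Return names present in both sources, with the record from each."""
    return {name: (manual[name], generated[name])
            for name in sorted(manual.keys() & generated.keys())}
```

A name that shows up in the result with two differing records would be worth a closer look before either side is deployed.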

Oh, interesting! So the change to the dns repo is still required in this case, just to confirm?

jcrespo renamed this task from Surprising new svc.eqiad.wmnet ip deployed: similar-users to Surprising new svc.eqiad.wmnet dns entry deployed: similar-users on host decommission. Jan 29 2021, 10:31 AM

If everything looks good now, maybe this can be converted into a feature request (lower priority) to "check uncommitted netbox changes to dns before decom"? I am not too familiar with the workflows, so apologies if that doesn't make a lot of sense. Probably a lot of the work on netbox/automation is in progress, so it doesn't make sense at the moment; in that case, please ignore and resolve.

Oh, interesting! So the change to the dns repo is still required in this case, just to confirm?

yes

If everything looks good now, maybe this can be converted into a feature request (lower priority) to "check uncommitted netbox changes to dns before decom"?

There's an icinga alert already: https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes, so what you suggest has already been partly implemented.

I am not too familiar with the workflows, so apologies if that doesn't make a lot of sense. Probably a lot of the work on netbox/automation is in progress, so it doesn't make sense at the moment; in that case, please ignore and resolve.

I think that if we wrap up T270071 we'll be in a better position already. So let's see what we can do about that...

jcrespo assigned this task to hnowlan.

I don't think there are further actionables here, except for @Volans to read this conversation and see if he believes there is any improvement/idea to wrap up T270071.

@jcrespo I'm aware of this conversation, I just didn't have anything to add, as @akosiaris had already given all the related details and explanation.