
Inconsistent lists of labs-ns* nameservers
Closed, Resolved (Public)

Description

We advertise labs-ns0.wikimedia.org and labs-ns1.wikimedia.org as the nameservers for the wmflabs.org., 128-25.155.80.208.in-addr.arpa., and now 56.15.185.in-addr.arpa. zones in the parent-zone delegations. But querying labs-ns* directly for the NS records of those zones returns two extra nameservers from Designate: labs-ns2.wikimedia.org and labs-ns3.wikimedia.org. This was brought up in #wikimedia-traffic:

<bblack> Krenair: re the right/wrongness of listing labs-ns[23] in the local zone data NS records: I don't know designate well, but it could be tacking that information onto lots of responses, which will pollute caches with it quickly.
<bblack> Krenair: but even if it doesn't include unnecessary NS records with normal responses, if anyone does an explicit query for NS records through a recursive cache, it may learn them that way and try to use them.  Depends on the cache implementation.
<bblack> Krenair: so if ns[23] aren't meant to be used yet (testing and/or unreliable), they probably shouldn't be there JIC
<paravoid> Krenair: the disparity between what labs-ns think it's their NSes, and what .org and now RIPE think is a bug either way, so I'd open a task :)
<Krenair> andrewbogott, what's the deal with labs-ns0[23] ?
<Krenair> It seems they're not advertised but designate has them as nameservers for stuff
<andrewbogott> 2 and 3 are cloudservices1003 and 1004.  They're a designate cluster that gets notifications for VMs created in eqiad1-r (by listening to the 1-r rabbit)
<andrewbogott> So when we turn off eqiad they'll become the main cloud nameservers.  I might rename them to labs-ns0 and labs-ns1 when that happens.
<Krenair> okay
<Krenair> so at some point it's planned for them to disappear?
<andrewbogott> probably
<andrewbogott> but their state should be consistent with 0 and 1

(there's also talk about them becoming cloud-ns instead of labs-ns later)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Krenair closed this task as Resolved. (Edited Apr 23 2019, 9:01 PM)
Krenair assigned this task to Andrew.
Krenair added a subscriber: Andrew.

With the shutdown of labs-ns* looming, @Andrew made a change needed for T221531: Update RIPE about changes in WMCS auth servers, and that change cleaned this up in the process:

<XioNoX> getting a RIPE error:
<XioNoX>  Parent has nameserver(s) not listed at the child (cloud-ns0.wikimedia.org; cloud-ns1.wikimedia.org).
<XioNoX> None of the nameservers listed at the parent are listed at the child. 
<XioNoX> looking at what it mean exactly
<Krenair> Maybe it wants us to update our end first XioNoX ?
<Krenair> Right now pri.authdns.ripe.net serves this:
<Krenair> 56.15.185.in-addr.arpa. 172800 IN NS labs-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 172800 IN NS labs-ns1.wikimedia.org.
<Krenair> Whereas we serve:
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns1.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns2.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns3.wikimedia.org.
<Krenair> Maybe to get RIPE to add cloud-ns0 we have to add the cloud-ns0 on our end etc.?
<XioNoX> Krenair: possibly, can you do it now?
<Krenair> I can't but andrewbogott could probably
<andrewbogott> I'm not 100% sure I know what you mean but let me look...
<andrewbogott> um… ok, now I officially don't know how to do that.  Is it something in our dns repo?
<Krenair> no, this would be a setting in designate somewhere I think
<Krenair> it might be referred to as a pool
<andrewbogott> hm...
<andrewbogott> I can't find that set anywhere in designate, but I do need to update this pool anyway
<andrewbogott> so will do that and see what shakes out
<Krenair> At some point you presumably did it to add ns2 and ns3 andrewbogott 
<Krenair> possibly with novaadmin credentials, 'designate server-list', 'designate server-update', or the openstackclient equivalents?
<Krenair> andrewbogott, possibly something under https://docs.openstack.org/designate/pike/admin/designate-manage.html
<andrewbogott> yep, I upgraded that just now
<andrewbogott> although I failed to check if ptr records were working properly before the change :(
<Krenair> mitaka docs: https://docs.openstack.org/designate/mitaka/pools.html#designate-manage-pools-command-reference
<andrewbogott> Krenair: do things look any different on your end?
<andrewbogott> I changed the cloud-ns0 servers but the labs-ns0/ns1 servers don't know about it so if you have those cached you'll still see the old results
<Krenair> yes:
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns1.wikimedia.org.
<Krenair> that looks righght
<Krenair> right* excuse my keyboard
<andrewbogott> oh, great
<andrewbogott> ok, so now… XioNoX did that warning go away?
<Krenair> It also looks fine when I query labs-ns*
<andrewbogott> oh, that's right, they share a db
<andrewbogott> so I updated it everywhere
<XioNoX> Your object has been successfully modified

So now everything is nice and consistent.
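
For anyone wanting to re-check this kind of mismatch later, here is a minimal sketch of the comparison discussed above: take the NS answer from the parent zone and the NS answer from the zone's own servers, and diff the two sets. The function names (`ns_targets`, `compare_delegation`) are hypothetical, the input lines are the dig-style answers pasted in the log above, and this is only an illustration of the idea, not RIPE's actual validation logic or anything Designate provides.

```python
def ns_targets(answer_lines):
    """Collect NS target names from dig-style answer lines.

    Each line is assumed to look like:
    <owner> <ttl> <class> NS <target>
    """
    targets = set()
    for line in answer_lines:
        fields = line.split()
        if len(fields) == 5 and fields[3].upper() == "NS":
            # Normalise: drop the trailing dot and lowercase the name.
            targets.add(fields[4].rstrip(".").lower())
    return targets


def compare_delegation(parent_lines, child_lines):
    """Diff the NS set served by the parent against the child's own NS set."""
    parent = ns_targets(parent_lines)
    child = ns_targets(child_lines)
    return {
        "parent_only": sorted(parent - child),
        "child_only": sorted(child - parent),
        "consistent": parent == child,
    }


# Answers copied from the log above, before the pool was updated.
parent_answer = [
    "56.15.185.in-addr.arpa. 172800 IN NS labs-ns0.wikimedia.org.",
    "56.15.185.in-addr.arpa. 172800 IN NS labs-ns1.wikimedia.org.",
]
child_answer = [
    "56.15.185.in-addr.arpa. 3599 IN NS labs-ns1.wikimedia.org.",
    "56.15.185.in-addr.arpa. 3599 IN NS labs-ns2.wikimedia.org.",
    "56.15.185.in-addr.arpa. 3599 IN NS labs-ns0.wikimedia.org.",
    "56.15.185.in-addr.arpa. 3599 IN NS labs-ns3.wikimedia.org.",
]

report = compare_delegation(parent_answer, child_answer)
print(report)
```

With the pre-change answers, `child_only` lists labs-ns2 and labs-ns3, which is the inconsistency this task describes. Feeding in matching parent and child answers should yield `consistent: True`.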