
Offer AuthDNS service over IPv6
Open, LowPublic

Assigned To
None
Authored By
Aklapper
Oct 22 2012, 6:59 PM
Referenced Files
F71670832: image.png
Wed, Feb 4, 12:32 PM
F71541094: resolver_capture.pcap
Fri, Jan 16, 8:21 PM

Description

We've never yet offered IPv6-native authdns, for various historical reasons of variable validity.

I think at this point many of the blockers are behind us: IPv6 on the Internet is considerably more-mature now, an increasing percentage of client traffic is really IPv6, our GeoIP databases for IPv6 seem to be of reasonable quality (and we're also using them to route clients anyways, in cases where IPv4 recursors send us IPv6 edns-client-subnet), etc.

It's still not a quick and easy step and not without risk, but it's within reasonable reach.

We're also working on other AuthDNS improvements concurrently though, and I think it makes sense to get through some of those other transitions first. Chiefly, I think we should transition to our Anycasted IPv4 model first ( T98006 ), and then look at adding IPv6 addresses as anycast as well, after that. It just makes for less churn/noise in changes to our upstream NS sets with registrars (we have hundreds of domains to affect), and fewer concurrent experiments in this space.

Event Timeline


Status changed from 'open' to 'stalled' by faidon

faidon lowered the priority of this task from Medium to Low. Dec 18 2014, 5:27 PM
faidon updated the task description. (Show Details)
faidon changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".
faidon changed the edit policy from "WMF-NDA (Project)" to "All Users".
faidon set Security to None.
faidon added subscribers: Unknown Object (MLST), Jasper, faidon and 3 others.

GeoIP2 has city resolution for IPv6 now (still of unknown quality). @BBlack has coded support for it in gdnsd, which will land in the next release. I've done some work towards integrating the new libraries & GeoIP code into our infrastructure (packages & Puppet code). This is finally, slowly, progressing!

We've now switched to both gdnsd 2.2.0 and GeoIP2, which comes with non-lite City IPv6 support. Next steps are evaluating somehow whether that support is sufficient and then, if so, going ahead and adding AAAAs to our zones and upstream glue records.

What's our evaluation plan here? Do we want to stall on proper IPv6 in our VCL geoip lookup service first and do comparisons on that data? Or do some kind of direct survey of the two datasets? Or ask MaxMind how they think the relative quality fares?

Even if the V6 data is comparably-good for the V6 internet, we potentially face the additional issue that V6 DNS lookups may route differently than matching V4 user traffic. The scenario would be something like this:

  1. The real client is V4-only (perhaps because their DSL router/modem combo is V4 only because it's an older model). They use the default DNS servers from their ISP (over V4 for client->cache).
  2. Their ISP supports V6 to some degree, and will preferentially send lookups over IPv6 to us from their caches.
  3. Their ISP doesn't support edns-client-subnet (only about 1/3 of our requests have it, so it's not yet common).
  4. Their ISP has significantly different routing to us over IPv6 than IPv4: perhaps they tunnel all their global IPv6 traffic through an exit point in Los Angeles and all their V6 is marked there in MaxMind, but the user is in NYC and v4 would route locally there. This causes their DNS cache request over IPv6 to choose ulsfo for this east-coast user, whereas without authdns AAAA we would've picked the more-appropriate eqiad for them.
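The mis-routing scenario above can be reduced to a toy sketch (all addresses, locations, and mappings below are invented purely for illustration): without edns-client-subnet, authdns can only geolocate the resolver's source address, so a tunneled v6 resolver path overrides the user's actual location.

```python
# Toy illustration only -- hypothetical geolocation data, not real GeoIP.
# Authdns picks a datacenter from the *resolver's* source address when no
# edns-client-subnet is present, so a v6 tunnel exit can win over a
# well-located v4 path for the very same resolver.

GEO = {
    "192.0.2.1": "NYC",      # ISP resolver, native v4 path
    "2001:db8::1": "LAX",    # same resolver, seen via a v6 tunnel exit
}
NEAREST_DC = {"NYC": "eqiad", "LAX": "ulsfo"}

def pick_dc(resolver_ip):
    """Choose the datacenter nearest to wherever the resolver appears to be."""
    return NEAREST_DC[GEO[resolver_ip]]

print(pick_dc("192.0.2.1"), pick_dc("2001:db8::1"))  # eqiad ulsfo
```

The east-coast user behind this resolver gets ulsfo instead of eqiad purely because of which address family the cache used to reach us.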

I'm honestly not worried all that much about tunnels anymore. In my experience, they're very rare nowadays and especially in this cross-country fashion (Google's 6to4 & Teredo statistics seem to concur).

I don't have any great ideas on how to compare the MaxMind data. Last time I looked up a bunch of RIPE Atlas nodes, since RIPE lists both the IPv4/IPv6 address for each, and found quite a few differences, most of which were of the limited accuracy type (e.g. correctly locating the country but not the city). That said, the Atlas dataset isn't especially great, as it contains a lot of probes located within datacenters and weird address spaces — not exactly an unbiased end-user sample. Perhaps a good approximation of a nameserver address sample, although it's hard to know for sure.

How do you envision testing this with the VCL GeoIP service? I think we have the same kind of concerns for that one too, unless you have thought of a good idea to test both? I suppose we could create a more controlled (and convoluted) experiment where we asynchronously load resources over separate IPv4-only and IPv6-only hostnames with a unique token for both, to check for parity… but still, we wouldn't know which of the two is the right one.

Perhaps we should just try it and look at our performance metrics from a 10,000 ft view (page load time etc.)? Thoughts? Any other clever ideas?

For the VCL stuff, what I meant is that for IPv6 user traffic, we could compare the runtime lookup we do for Set-Cookie on the IPv6 address to the one done via IPv4-only geoiplookup-lb. Would require some JS support to tie the two together.

Moving forward and checking perf metrics after is an option, too. But unless the change is quite dramatic it will be hard to see. Rolling forward and back on that wouldn't be quick with registrar involvement. On top of that, there's a fairly long TTL smear before all the resolvers that can switch to IPv6 queries have done so. And during all of this there will of course be continuing deployments at various levels that affect perf as well.

[edit: removed wrong stuff about glue]

BBlack renamed this task from No IPv6 addresses on Wikimedia nameservers ns(0-2).wikimedia.org to Offer AuthDNS service over IPv6. Dec 6 2019, 2:20 PM
BBlack removed faidon as the assignee of this task.
BBlack updated the task description. (Show Details)

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox . Thank you!

What's the status on this fix? Currently, an IPv6-only client using an IPv6-only DNS resolver will fail to reach Wikimedia services. If their DNS resolver is capable of using a NAT64 translator, that might be a reasonable workaround (as suggested by https://datatracker.ietf.org/doc/draft-momoka-v6ops-ipv6-only-resolver/), but a workaround shouldn't be necessary for foundational websites like Wikipedia.

ssingh added subscribers: cmooney, ssingh.

In discussion with @cmooney, we will be revisiting this task again when Traffic does some other authdns-related work, so removing it from the Traffic-Icebox.

I note that a current draft in the IETF DNSOPS Working Group, aimed to replace RFC3901, draft-momoka-dnsop-3901bis-03 states:

Every authoritative DNS zone SHOULD be served by at least one IPv6-reachable authoritative name server to maintain name space continuity. The delegation configuration (Resolution of the parent, resolution of out-of-bailiwick names, GLUE) MUST not rely on IPv4 connectivity being available.

Hi, is there any update on this? I see that some *.wikimedia.org hosts have IPv6 addresses, which is useless if the client is IPv6-only, since they can't query the DNS as it's IPv4-only.

Yes, thanks for the ping @Paladox. We should most certainly pick this up again. @BBlack: any fresh 2026 thoughts? You listed some concerns above but some of them don't apply anymore -- should we do a quick sync here to understand if you have any lingering concerns?

@cmooney and I can take care of the deployment (there was a time when our v6 bird unicast setup wasn't tested, but that's also not true anymore so that's not a blocker).

I should mention that ns[01] v6 will be unicast, like v4, and ns2 will be anycast v6, just like the v4 one. But these are minor operational details, the real question is if we are ready to do this from an authdns point of view.

We have to take this plunge someday, and that someday probably should've been years ago, just too many other pressing things to focus on for anyone to remember to come back here and look! A few notes:

  • Obviously, it would make sense to take this on in phases: one nsX at a time, space it out by long enough (a week or two) to validate all the things and have time for any non-obvious failure reports to percolate up to us.
  • Currently, our upstream (in the DNS hierarchy) NS record TTLs are all 3600. This actually seems quite a bit shorter than it should be, to me. I don't recall the history on this, but I would've expected something more like 86400, which is what we set in our local zonefiles (the local records in our zonefiles matter far less to resolvers than the upstream ones in the .org servers, but it all still matters!).
  • My bikeshed on the TTL issues would be: I would drop our local TTLs on these to 3600 like upstream ahead of these changes, just because it makes rollback a bit less slow (assuming the registrar has fast processes), and keep A and AAAA having matching TTLs at all times, and bump it all (.org via the registrar, and our zonefiles) to 86400 after we're done with the IPv6 change.

[In fact, on that point, I'd note a quick survey of a handful of other major sites on the Internet shows a common pattern of 2 days for the NS records and 2-4 days on the matching address records. In any case, getting us to 1 day everywhere consistently would be a good start!]
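As a rough sketch of why the TTLs matter for rollback (editor's illustration, not existing tooling): caches may keep serving the old NS/address data until the largest relevant TTL in the chain expires, so dropping the local 1-day records to match the upstream 3600 shrinks the worst-case rollback window.

```python
# Editor's sketch: worst-case time for resolvers to notice a rollback is
# bounded by the largest TTL among the records that pin the old server.

def rollback_window(ttls_seconds):
    """Caches may serve stale data until the largest TTL in the chain expires."""
    return max(ttls_seconds)

# Current state: upstream .org NS/glue at 3600, local zonefiles at 86400 (1D)
current = rollback_window([3600, 86400])   # 24h worst case
# After dropping local TTLs to 3600 to match upstream:
proposed = rollback_window([3600, 3600])   # 1h worst case
print(current, proposed)  # 86400 3600
```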

We have to take this plunge someday, and that someday probably should've been years ago, just too many other pressing things to focus on for anyone to remember to come back here and look! A few notes:

  • Obviously, it would make sense to take this on in phases: one nsX at a time, space it out by long enough (a week or two) to validate all the things and have time for any non-obvious failure reports to percolate up to us.

Makes sense -- https://w.wiki/HSCH indicates per wmf_netflow that the anycast ns2 gets the most traffic, followed by ns0 and ns1. So we can pick ns1 in codfw for a safe test rollout.

  • Currently, our upstream (in the DNS hierarchy) NS record TTLs are all 3600. This actually seems quite a bit shorter than it should be, to me. I don't recall the history on this, but I would've expected something more like 86400, which is what we set in our local zonefiles (the local records in our zonefiles matter far less to resolvers than the upstream ones in the .org servers, but it all still matters!).

Yeah that's certainly interesting; I wasn't aware of that so we can patch it up and fix it. Looking at Markmonitor, I am assuming this is somewhat of a default on their end because we are not setting one explicitly and so we have just carried this over?

  • My bikeshed on the TTL issues would be: I would drop our local TTLs on these to 3600 like upstream ahead of these changes, just because it makes rollback a bit less slow (assuming the registrar has fast processes), and keep A and AAAA having matching TTLs at all times, and bump it all (.org via the registrar, and our zonefiles) to 86400 after we're done with the IPv6 change.

Sounds good, thanks. While we do that, let us know if you think of something else before we roll this out :)

Our glue records also have a disparity.

dig wikimedia.org NS +trace +additional
ns2.wikimedia.org.	3600	IN	A	198.35.27.27
ns1.wikimedia.org.	3600	IN	A	208.80.153.231
ns0.wikimedia.org.	3600	IN	A	208.80.154.238

whereas, we set them to:

ns0         1D  IN A    208.80.154.238
ns1         1D  IN A    208.80.153.231
ns2         1D  IN A    198.35.27.27 ; anycasted authdns
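For reference, a small hypothetical helper (not part of any Wikimedia tooling) that expands BIND-style TTL shorthand makes the disparity between the two snippets explicit:

```python
# Hypothetical helper: convert BIND-style TTL shorthand ("1D", "2h", "3600")
# to seconds, to compare the delegation glue with our own zonefiles.

UNITS = {"S": 1, "M": 60, "H": 3600, "D": 86400, "W": 604800}

def ttl_to_seconds(ttl):
    ttl = ttl.strip().upper()
    if ttl[-1] in UNITS:
        return int(ttl[:-1]) * UNITS[ttl[-1]]
    return int(ttl)

delegation_ttl = ttl_to_seconds("3600")  # glue in the .org zone
zonefile_ttl = ttl_to_seconds("1D")      # our own zonefiles
print(delegation_ttl, zonefile_ttl)      # 3600 86400
```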

Change #1226904 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] wikimedia/wikipedia.org: match TTLs for NS and glue records

https://gerrit.wikimedia.org/r/1226904

@cmooney: Any picks for your favourite v6 address for ns1? I was thinking of allocating 2620:0:860:ed1a::4/128 under the LVS service IPs 2620:0:860:ed1a::/64, since unfortunately that is where we have put the v4s, but I also don't want to carry that mistake forward, so deferring to you on that.

Change #1226928 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] dnsbox: codfw: advertise ns1 IPv6

https://gerrit.wikimedia.org/r/1226928

taavi changed the task status from Stalled to Open. Wed, Jan 14, 7:20 PM

@cmooney: Any picks for your favourite v6 address for ns1? I was thinking of allocating 2620:0:860:ed1a::4/128 under the LVS service IPs 2620:0:860:ed1a::/64, since unfortunately that is where we have put the v4s, but I also don't want to carry that mistake forward, so deferring to you on that.

Yeah, it's not behind the LVS so I think it's somewhat confusing to allocate it an address from that range.

I'd be tempted to assign 2620:0:860:53::/64 from the codfw public subnets /56 for this. And then use 2620:0:860:53::/128 as the ns1 address?

We could possibly follow the same approach for ns0? Or is there an existing plan for eqiad?

Our glue records also have a disparity.

I was interested to know what effect this would have. One data-point for Bind (at least my local instance) is that it is caching the TTL that comes back from our servers, not the glue records in the ORG zone:

cathal@officepc:~$ dig A ns0.wikimedia.org @192.168.240.1

; <<>> DiG 9.20.15-1~deb13u1-Debian <<>> A ns0.wikimedia.org @192.168.240.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55864
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1400
; COOKIE: 6d3761bd9ecfb45301000000696a50bef494110bbf7eb76b (good)
;; QUESTION SECTION:
;ns0.wikimedia.org.		IN	A

;; ANSWER SECTION:
ns0.wikimedia.org.	30297	IN	A	208.80.154.238

@cmooney: Any picks for your favourite v6 address for ns1? I was thinking of allocating 2620:0:860:ed1a::4/128 under the LVS service IPs 2620:0:860:ed1a::/64, since unfortunately that is where we have put the v4s, but I also don't want to carry that mistake forward, so deferring to you on that.

Yeah, it's not behind the LVS so I think it's somewhat confusing to allocate it an address from that range.

I'd be tempted to assign 2620:0:860:53::/64 from the codfw public subnets /56 for this. And then use 2620:0:860:53::/128 as the ns1 address?

We could possibly follow the same approach for ns0? Or is there an existing plan for eqiad?

Thanks! The plan is to do the same for eqiad and then find a range for the anycast of ns2. Any thoughts on the last one (anycast range)? We will do that at the very end given ns2 sees the most traffic, but checking.

Thanks! The plan is to do the same for eqiad

Ok I've reserved those two ranges/IPs in Netbox now.

Any thoughts on the last one (anycast range)?

We'll want a new /48 separate prefix to announce into the DFZ for it I assume? We can possibly allocate 2a02:ec80:53::/40 for it in Netbox (better to only carve up the block at that size to keep things consistent), announce 2a02:ec80:53::/48 to the internet, and then use 2a02:ec80:53::/128 as the IP?
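A quick stdlib sanity check (editor's sketch) that the proposed allocations nest as intended. One wrinkle worth noting: 2a02:ec80:53:: is not itself on a /40 boundary, so the containing /40 allocation is 2a02:ec80::/40.

```python
# Editor's sketch: verify the proposed anycast addressing nests correctly.
import ipaddress

# strict=False normalizes 2a02:ec80:53::/40 to its containing network,
# since 2a02:ec80:53:: has bits set beyond the /40 boundary.
block = ipaddress.ip_network("2a02:ec80:53::/40", strict=False)
announce = ipaddress.ip_network("2a02:ec80:53::/48")  # announced to the DFZ
vip = ipaddress.ip_network("2a02:ec80:53::/128")      # proposed ns2 address

print(block)  # 2a02:ec80::/40
assert announce.subnet_of(block)
assert vip.subnet_of(announce)
```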

Thanks! The plan is to do the same for eqiad

Ok I've reserved those two ranges/IPs in Netbox now.

Many thanks!

Any thoughts on the last one (anycast range)?

We'll want a new /48 separate prefix to announce into the DFZ for it I assume? We can possibly allocate 2a02:ec80:53::/40 for it in Netbox (better to only carve up the block at that size to keep things consistent), announce 2a02:ec80:53::/48 to the internet, and then use 2a02:ec80:53::/128 as the IP?

That seems to be what we are doing for Wikimedia DNS (and what we did), so I guess it makes sense -- but I am leaving that to your expertise :)

(2001:67c:930::/48 and then 2001:67c:930::1/128 for the Wikimedia DNS v6 anycast.)

overall lgtm

Using a full /64 unicast 2620:0:860:53::/64 for a single service looks a bit weird, but as it's something critical like AuthDNS it doesn't shock me too much. The alternative is to allocate a codfw /64 for any kind of "bird" services, one of them being authdns. Like we have a LVS range, which contains multiple services.

We'll want a new /48 separate prefix to announce into the DFZ for it I assume?

It's not strictly needed (especially if we internally route it in case of local failure), but it doesn't hurt to add one more entry in the DFZ to give us more control over the announcements.

with a slight preference to use 2a02:ec80:53::1/128 rather than 2a02:ec80:53::/128 as IPs to not freak people out.

overall lgtm

Using a full /64 unicast 2620:0:860:53::/64 for a single service looks a bit weird, but as it's something critical like AuthDNS it doesn't shock me too much. The alternative is to allocate a codfw /64 for any kind of "bird" services, one of them being authdns. Like we have a LVS range, which contains multiple services.

Not that it makes it any better but we are "wasting" IPv4s as well for the ns2 and the Wikimedia DNS anycast ranges, and the rationale we applied at that time as well was to keep them distinct. So I am guessing the IPv6 addresses don't have the same shortage/resource constraints and therefore we can continue using the full /64 unless there are other considerations from your end.

We'll want a new /48 separate prefix to announce into the DFZ for it I assume?

It's not strictly needed (especially if we internally route it in case of local failure), but it doesn't hurt to add one more entry in the DFZ to give us more control over the announcements.

with a slight preference to use 2a02:ec80:53::1/128 rather than 2a02:ec80:53::/128 as IPs to not freak people out.

I had that in mind as we do the same for text-lb, upload-lb, and Wikimedia DNS, but no strong preference here, so leaving that to you both. (FWIW, Cathal already assigned 2a02:ec80:53::/128.)

Our glue records also have a disparity.

I was interested to know what effect this would have. One data-point for Bind (at least my local instance) is that it is caching the TTL that comes back from our servers, not the glue records in the ORG zone:

cathal@officepc:~$ dig A ns0.wikimedia.org @192.168.240.1

[...]

Except almost nobody but engineers are going to directly query that record. Most caches will learn and re-learn it as they traverse the delegation from the root of the DNS, and thus if the end-users are querying say en.wikipedia.org, the cache will learn it from the glue records at the Afilias org-level authservers. But sometimes they may learn it from us, if they happen to first traverse one of our non-.org domains that uses our .org NS records.

Except almost nobody but engineers are going to directly query that record. Most caches will learn and re-learn it as they traverse the delegation from the root of the DNS, and thus if the end-users are querying say en.wikipedia.org, the cache will learn it from the glue records at the Afilias org-level authservers. But sometimes they may learn it from us, if they happen to first traverse one of our non-.org domains that uses our .org NS records.

That doesn't seem to be happening for me. I did a pcap to see what was going on, and made a query for 'en.wikipedia.org' with dig.

{F71570195}

It seems after Bind gets the NS query response from the .org servers, including the glue 'A' entries for our authdns (packet 23), it immediately makes follow-up 'A' queries to our authdns for those same names, using the IPs from glue (packets 24-29). It's the responses to those, from our authdns with the TTL we set, that Bind caches.


As a further test I wiped my cache, started a packet capture and did a dig for 'en.wikimedia.org'.

did you intend to use a non-canonical domain here? pedia vs media.

But it seems Bind does not cache the glue records / additional that comes back from the .org authdns. At least for any length of time. It immediately makes separate A record queries for the nameservers .org returns, sent to the IPs from the glue records. It's the responses from those direct queries to us that it caches, with a TTL of 86400.

That all makes sense, and once they're loaded by that direct query, it would probably even use them in place of any missing "glue" (e.g. for wikipedia.com lookups that return NS delegations without addresses glued). But still, "nobody" is directly querying the nsX A-records in the normal mass case in public resolvers. I would expect it does cache the glue (attached to the NS) for an hour in the true delegation case.

did you intend to use a non-canonical domain here? pedia vs media.

Ah sorry that was a typo, corrected now. I looked up en.wikipedia.org (pcap doesn't lie at least).

That all makes sense, and once they're loaded by that direct query, it would probably even use them in place of any missing "glue"

Yeah those subsequent A record queries - when it already has the glue records from .org servers - were a surprise. I wonder if that's common to other recursors or only Bind? I guess doesn't really matter as mostly they should be identical.

Change #1228518 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/software/homer/deploy@master] plugins/wmf-netbox: remove ipv4 only for DNS hosts BGP

https://gerrit.wikimedia.org/r/1228518

@cmooney: Per the discussion above with Arzhel, we think that 2a02:ec80:53::1/128 is better for readability and consistency with other v6 records, than the current 2a02:ec80:53::/128. Any thoughts on that? If this makes sense to you as well, please feel free to re-assign in Netbox (or ask me to do it :) and we can move ahead with the other stuff. Thanks!

Change #1228576 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add ns2.wikimedia.org anycast block to anycast config

https://gerrit.wikimedia.org/r/1228576

@cmooney: Per the discussion above with Arzhel, we think that 2a02:ec80:53::1/128 is better for readability and consistency with other v6 records, than the current 2a02:ec80:53::/128. Any thoughts on that? If this makes sense to you as well, please feel free to re-assign in Netbox (or ask me to do it :) and we can move ahead with the other stuff. Thanks!

Yeah that's fine. The anycast IP hadn't been assigned in Netbox, so I've done that now. I've also create the RIPE IRR records for the block and the required RPKI ROA object.

So to confirm the assigned addresses are:

2620:0:861:53::1    ns0.wikimedia.org
2620:0:860:53::1    ns1.wikimedia.org
2a02:ec80:53::1     ns2.wikimedia.org

Many thanks @cmooney 🙏! I will go ahead with 2620:0:860:53::1/128 for ns1 and update that everywhere in the current CRs.

As discussed I think a good way to bring this live might be:

  1. Merge patch in puppet repo to make Bird announce the new IPs at all sites
  2. Merge the patch to enable IPv6 peering to the dns-boxes in the homer-deploy repo
  3. Release new homer version (plugin only release) to include this change
  4. Merge the patch to accept the new anycast addresses over BGP from the authdns boxes (done)
  5. Run Homer against all our core routers

At this point all the authdns boxes should be announcing the new IPs to our core routers. We should be receiving them and they should be pingable from the internet. We can check:

  1. Internet reachability
  2. BGP announcements of the new Anycast /48 are being made at all points and accepted by upstreams
  3. Manual queries direct to the IPs are answered (with dig)

Once we are happy all is good with that we can start adjusting our DNS zones, publishing the new AAAA records one by one starting with codfw/ns1. We should aim to add the record to our own boxes first, but quickly thereafter update the 'glue' with the registrar too.
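The manual checks in steps 1 and 3 could be scripted along these lines (a hypothetical helper, not existing tooling, using the v6 addresses assigned earlier in this task):

```python
# Editor's sketch: emit the per-address manual verification commands
# (reachability ping and a direct dig) for each new AAAA-to-be address.

NEW_V6 = {
    "ns0.wikimedia.org": "2620:0:861:53::1",
    "ns1.wikimedia.org": "2620:0:860:53::1",
    "ns2.wikimedia.org": "2a02:ec80:53::1",
}

def verification_commands(addrs):
    cmds = []
    for name, ip in addrs.items():
        cmds.append(f"ping6 -c 3 {ip}")                      # internet reachability
        cmds.append(f"dig @{ip} en.wikipedia.org A +short")  # query answered directly
    return cmds

for c in verification_commands(NEW_V6):
    print(c)
```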

As discussed I think a good way to bring this live might be:
[...]

Sounds like a plan and it makes sense -- we can test everything on our end before enabling the change selectively, starting with ns1. The updated design also helps us avoid the problem of having to silence any alerts, which may hide other problems and is perhaps not desirable in hindsight.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1226928 has been updated to effectively roll this out everywhere on the DNS hosts.

We also have an additional layer of control on the DNS hosts, though the current design makes it slightly more coupled than desired. Essentially,

sukhe@puppetserver1001:~$ sudo confctl select 'name=dns2004.wikimedia.org' get
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=ntp-a"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=recdns"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=authdns-ns1"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=authdns-ns2"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=authdns-update"}

This means that we can theoretically depool authdns-ns1, but that means depooling both v4 and v6. In some ways this is not ideal. But then, I can't think of a reason why we would want to keep one pooled but not the other, and perhaps there is value in having one service for both addresses. But this is something that is up for discussion and if required, I can update the various bits to have two services per authdns-nsX: authdns-ns1-v4 and authdns-ns1-v6.

That's some extra work so perhaps for later but I just wanted to mention that in case someone else has thoughts.
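Since confctl emits one JSON object per line, a small sketch (hypothetical, not existing tooling) shows how the per-service pooled state can be read programmatically from the (abridged) output above:

```python
# Editor's sketch: parse confctl's line-oriented JSON output and list the
# services a host is currently pooled for.
import json

CONFCTL_OUTPUT = '''\
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=ntp-a"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=recdns"}
{"dns2004.wikimedia.org": {"weight": 100, "pooled": "yes"}, "tags": "dc=codfw,cluster=dnsbox,service=authdns-ns1"}
'''

def pooled_services(output):
    services = []
    for line in output.splitlines():
        obj = json.loads(line)
        tags = dict(t.split("=") for t in obj["tags"].split(","))
        # the one key that isn't "tags" is the hostname -> state mapping
        host_state = next(v for k, v in obj.items() if k != "tags")
        if host_state["pooled"] == "yes":
            services.append(tags["service"])
    return services

print(pooled_services(CONFCTL_OUTPUT))  # ['ntp-a', 'recdns', 'authdns-ns1']
```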

A maybe safer alternative is to first enable IPv6 BGP peering between the network and all the dnsbox with profile::bird::do_ipv6: true (and the Homer patches). BGP over v6 will be established, but the dnsbox won't be advertising any prefix.
Then if all good merge a patch to make bird/anycast-hc advertise the new IPs.

Change #1228576 merged by jenkins-bot:

[operations/homer/public@master] Add config for authdns IPv6 public IPs

https://gerrit.wikimedia.org/r/1228576

A maybe safer alternative is to first enable IPv6 BGP peering between the network and all the dnsbox with profile::bird::do_ipv6: true (and the Homer patches). BGP over v6 will be established, but the dnsbox won't be advertising any prefix.
Then if all good merge a patch to make bird/anycast-hc advertise the new IPs.

Cool. I didn't know if the Bird role would support that (i.e. enable an IPv6 BGP peering without announcing any IPs).

Ultimately I think we are safe to set up the routing step-by-step or in one big bang, it's not until we publish the AAAA records that there is potential to disrupt anything. So I'll leave to @ssingh and Traffic team to make the call on how to make the Bird changes.

Yeah, unless we update our own zone files and, more importantly, Markmonitor, nothing really changes, so we can just go ahead with the approach @cmooney suggested and enable it everywhere so that we can test things, and then turn it on when we are ready by publishing the updated glue records.

What @ayounsi is suggesting will of course also work -- and that's the same thing when a service is depooled, bird is set up but doesn't announce the IP -- but I really don't think it's required.

Change #1226904 merged by Ssingh:

[operations/dns@master] wikimedia/wikipedia.org: match TTLs for NS and glue records

https://gerrit.wikimedia.org/r/1226904

There is another layer of complexity we need to be aware of. Essentially, authdns_addrs in hieradata/common.yaml has the list of v4 authdns IP records, and now will have the v6 ones. In the above Puppet/bird patch, I thought we could simply skip adding the v6 IPs there till we have everything running, assuming that key only dictates the haproxy configuration for DNS over TLS on port 853. I was wrong -- and I only realized this after looking at the PCC output -- since that particular key sets up not only the DoTLS bits but also the gdnsd bits, including the listen addresses and firewall rules; it's the canonical list of IPs for the auth servers.

See https://puppet-compiler.wmflabs.org/output/1226928/7913/dns2004.wikimedia.org/index.html (authdns_addr set for the v6 bits) and compare it with https://puppet-compiler.wmflabs.org/output/1226928/7918/dns1004.wikimedia.org/index.html (authdns_addrs not set).

This is not a big deal since we are not actually publishing these records, so it doesn't matter if haproxy is listening on 853/v6 for DoTLS but we need to be aware of this.

Change #1228518 merged by Cathal Mooney:

[operations/software/homer/deploy@master] plugins/wmf-netbox: remove ipv4 only for DNS hosts BGP

https://gerrit.wikimedia.org/r/1228518

Mentioned in SAL (#wikimedia-operations) [2026-01-21T17:37:52Z] <sukhe> sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1226928'": T81605

Change #1226928 merged by Ssingh:

[operations/puppet@production] dnsbox: advertise ns[0-2] IPv6

https://gerrit.wikimedia.org/r/1226928

Mentioned in SAL (#wikimedia-operations) [2026-01-21T19:30:02Z] <sukhe> re-enable puppet on A:dnsbox: T81605

Just want to add my two cents on the problem we hit trying to make the IPv6 IPs live.

  • I personally think it's cleaner if the dns servers only have configured, and only listen on, the IPs that are actually routing to them
    • I reckon in a disaster it's as easy (and as delicate) to modify the dnsboxes at a site to listen on/announce extra IPs as it is to route them manually on the network
  • Ultimately netops have no preference
    • We only care that the dns boxes announce the correct set of IPs at any location and answer queries sent to them
    • So whatever way traffic want to handle this is fine, we can either stop configuring core site IPs on the dns servers at POPs, or continue to do that but adjust the automation so it sets the IPv6 ones up with the correct netmask

Change #1230351 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] dnsbox: advertise ns[0-2] IPv6

https://gerrit.wikimedia.org/r/1230351

https://puppet-compiler.wmflabs.org/output/1230351/7941/

Interface::Ip[ns1-v6]
Exec[ip addr add 2620:0:861:53::1/128 label lo:anycast dev lo]
Augeas[lo_2620:0:860:53::1/128]
Exec[ip addr add 2a02:ec80:53::1/128 label lo:anycast dev lo]
Interface::Ip[lo-vip-ns2.wikimedia.org-ipv6]
Interface::Ip[lo-vip-ns0.wikimedia.org-ipv6]
Augeas[lo_2a02:ec80:53::1/128]
Exec[ip addr add 2620:0:860:53::1/128 dev lo]
Augeas[lo_2620:0:861:53::1/128]

With @taavi's change in https://gerrit.wikimedia.org/r/q/I5f05c2e4f90dfa8517e7f658216891f46a6a6964, I think we can move ahead with this. If we take eqiad and ns0, we set up ns0 and ns2 (anycast) with bird, therefore they get the loopbacks added with the right labels:

Exec[ip addr add 2620:0:861:53::1/128 label lo:anycast dev lo]

and

Exec[ip addr add 2a02:ec80:53::1/128 label lo:anycast dev lo]

But we also need to add ns1, and that is taken care of by the puppetization for the authdns servers itself (2620:0:860:53::1/128) and this block in modules/profile/manifests/dns/auth/config.pp:

# Create the loopback IPs used for public service (defined here since we
# also create the matching listener config here)
# Skip loopbacks if bird sets up the loopbacks in a given site.
$authdns_addrs.each |$alabel,$adata| {
    unless $adata['skip_loopback'] or $adata['skip_loopback_site'] == $::site {
        interface::ip { $alabel:
            address   => $adata['address'],
            interface => 'lo',
        }
    }
}
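The `unless` guard above can be mirrored in a few lines of Python to make the selection logic explicit (labels, addresses, and flags below are illustrative, not the real hiera data):

```python
# Editor's sketch of the Puppet guard: create a loopback IP directly unless
# bird manages it everywhere (skip_loopback) or at this site (skip_loopback_site).

def loopbacks_to_create(authdns_addrs, site):
    return [
        label for label, data in authdns_addrs.items()
        if not (data.get("skip_loopback") or data.get("skip_loopback_site") == site)
    ]

# Illustrative data only -- not the actual hieradata:
addrs = {
    "ns1-v6": {"address": "2620:0:860:53::1"},                       # puppet-managed
    "ns2-v6": {"address": "2a02:ec80:53::1", "skip_loopback": True}, # bird-managed
}
print(loopbacks_to_create(addrs, "codfw"))  # ['ns1-v6']
```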

Mentioned in SAL (#wikimedia-operations) [2026-02-03T15:32:06Z] <sukhe> sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1230351'": T81605

Change #1230351 merged by Ssingh:

[operations/puppet@production] dnsbox: advertise ns[0-2] IPv6

https://gerrit.wikimedia.org/r/1230351

Mentioned in SAL (#wikimedia-operations) [2026-02-03T17:46:25Z] <sukhe> sudo cumin -b1 -s120 "A:dnsbox and not P{dns1004* or dns7001*}" "run-puppet-agent --enable 'merging CR 1230351'": T81605

Change #1236354 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] wikimedia.org: add IPv6 AAAA glue record for ns1

https://gerrit.wikimedia.org/r/1236354

RIPEstat looks good in terms of visibility of the new ns2 Anycast prefix:

image.png (671×624 px, 82 KB)

Change #1236798 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] wikimedia.org: add IPv6 AAAA record for ns0

https://gerrit.wikimedia.org/r/1236798

Change #1236803 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] wikimedia.org: add IPv6 AAAA record for ns2

https://gerrit.wikimedia.org/r/1236803