
Anycast AuthDNS
Open, LowPublic

Description

As our network of PoPs expands, it makes sense to start thinking about distributing critical services other than pure HTTP(S) to them. The obvious one that immediately comes to mind is authoritative DNS, as it is critical for serving user traffic efficiently. DNS can take a significant chunk of page load time when the recursor's cache is cold or expired (something that is unlikely for our sites, though).

Right now, we have three nameservers (ns0-1-2), one in each of eqiad/codfw/esams. ns0/1/2 are service IPs, each local to its PoP and pointed at a single server. Downtime on one of them, e.g. for a server reboot, means that clients (recursors) will have to repeat a query, imposing additional latency. Downtime on all three of them would be catastrophic. The three nameservers are in different parts of the world, and some recursors are/can be smart about selecting the lowest-latency authoritative server for a domain (SRTT); not all of them implement this, though, and those that do don't always implement it well.

Rather than add yet another nameserver to ulsfo and possibly our future PoPs, it makes more sense to start thinking about setting up anycast for our DNS service IPs.

I've thought about this a bit and here's what I have come up with so far:

  • We designate an IPv4 /24 (& IPv6 /48?) from our (limited) unused IP space as an anycast IP space that we will advertise from all of our PoPs. Either 198.35.27.0/24 or a /24 from 185.15.56.0/22 can be used for that; the latter may have been a martian before, and there may be some risk in using it exclusively. If we're feeling generous and very risk-averse, we can assign two /24s from disparate subnets for extra protection against routing failures.
  • Out of these 1-2 /24s, we assign 2-4 IPs for ns0-[13] as service IPs. The rest will remain unused unless we come up with some other useful service to be anycasted.
  • We set up two servers or VMs per PoP to serve as AuthDNS (among others?), each listening on *all* 2-4 IPs.
  • We set up those (now global across our network!) service IPs behind LVS in all of our sites. Pybal does BGP anyway and already supports DNS monitoring (for recdns).
  • We add an option to Pybal to not advertise the IP over BGP if all of the realservers are marked as down. This would ensure that if all servers in one PoP are down, traffic would be rerouted (internally) to another PoP. This is essentially an alternative action to depool-threshold (rather than stop depooling, stop announcing the IP) and could be generally useful for anycasting even TCP services internally (see the sketch after this list).
  • We configure static routes, less preferred than the BGP-learned routes (i.e. a fallback), pointing to one of the realservers, so that if all servers across all sites are marked as down (because of a misconfiguration or a broken Pybal version), the nameservers would still be reachable. We do this already for other Pybal endpoints as well, but it is even more important here because of the absence of depool-threshold.
  • Optional: in Pybal, we weight e.g. ns0's traffic 90/10 between box1 and box2, and the reverse (10/90) for ns1; this ensures that a) traffic is load-balanced between the two servers, b) each of them gets the bulk of the traffic for half of the IPs (easier troubleshooting, DDoS protection), and c) each of them also gets a small portion of the other IP's traffic, to make sure that everything is working when the time comes.
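
Below is a minimal sketch (in Python, which is what Pybal is written in) of the "stop announcing when every realserver is down" behaviour proposed above. The class names, the healthcheck callback and the BGP stand-in are purely illustrative assumptions, not actual Pybal code:

```
# Hypothetical sketch of the proposed Pybal option: rather than refusing to
# depool below depool-threshold, withdraw the BGP route for the service IP
# when *all* realservers are marked down, so anycast routing shifts traffic
# to another PoP. AnycastService/BGPSession are illustrative, not real Pybal classes.
from dataclasses import dataclass, field


@dataclass
class BGPSession:
    """Stand-in for the BGP speaker; real Pybal talks to the routers itself."""
    announced: set = field(default_factory=set)

    def announce(self, prefix: str) -> None:
        if prefix not in self.announced:
            self.announced.add(prefix)
            print(f"announcing {prefix}")

    def withdraw(self, prefix: str) -> None:
        if prefix in self.announced:
            self.announced.remove(prefix)
            print(f"withdrawing {prefix}")


@dataclass
class AnycastService:
    prefix: str        # service IP announced via BGP, e.g. "198.35.27.1/32" (placeholder)
    realservers: dict  # server name -> is_up
    bgp: BGPSession

    def on_healthcheck(self, server: str, is_up: bool) -> None:
        """Called after each monitoring result for one realserver."""
        self.realservers[server] = is_up
        if any(self.realservers.values()):
            # At least one backend is healthy: keep (or restore) the announcement.
            self.bgp.announce(self.prefix)
        else:
            # All backends down: stop announcing so another PoP absorbs the traffic.
            self.bgp.withdraw(self.prefix)


if __name__ == "__main__":
    svc = AnycastService("198.35.27.1/32", {"box1": True, "box2": True}, BGPSession())
    svc.on_healthcheck("box1", False)  # still announced: box2 is up
    svc.on_healthcheck("box2", False)  # both down: route withdrawn
    svc.on_healthcheck("box1", True)   # back up: route re-announced
```

Combined with the static fallback routes in the previous bullet, a withdrawn announcement only ever shifts traffic; it never blackholes it.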

Event Timeline

faidon created this task. May 4 2015, 2:23 PM
faidon raised the priority of this task to Low.
faidon updated the task description.
faidon added projects: acl*sre-team, Traffic.
faidon added subscribers: faidon, BBlack.
Restricted Application added a subscriber: Aklapper. May 4 2015, 2:23 PM
BBlack added a comment (edited). May 6 2015, 5:30 PM

Just tracking some stuff from an IRC conversation:

  • The past-martianness of 185.15.56.0/22 probably isn't a pragmatic issue and can be ignored. It was only a past-martian because it was unallocated; it was allocated to RIPE and removed from such lists back in Feb 2011. In general, many such networks are now in active use due to address depletion, so by now most network admins should have figured out that they can't use outdated lists for this stuff.
  • We've only got, effectively, 4x /24 left in our current spaces to allocate to this and upcoming future PoPs: 198.35.27.0/24 + 185.15.5[678].0/24. 185.15.59.0/24 is currently in use for knams<->esams stuff, although that could maybe be moved to free up another if necessary.
  • It's probably ok to go ahead and be generous/risk-averse and plan to use 2x disparate /24's for anycast, which would be 198.35.27.0/24 + 185.15.56.0/24, leaving room to create addressing for our next two (or maybe three) PoPs before we have to request/buy more space somewhere.
elukey added a subscriber: elukey. Apr 21 2016, 3:36 PM

Change 286066 had a related patch set uploaded (by BBlack):
note future anycast networks

https://gerrit.wikimedia.org/r/286066

Actually, amending the network thoughts above: we should use 198.35.27.0/24 + 185.15.58.0/24. Using 58 instead of 56 leaves us a contiguous /23 for more future flexibility, instead of two broken-up /24's.
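
As a quick sanity check of the subnet arithmetic (just a throwaway sketch using Python's ipaddress module, not part of any tooling):

```
import ipaddress

# 185.15.56.0/22 splits into four /24s; .59 is already used for knams<->esams.
block = ipaddress.ip_network("185.15.56.0/22")
print(list(block.subnets(new_prefix=24)))
# [185.15.56.0/24, 185.15.57.0/24, 185.15.58.0/24, 185.15.59.0/24]

# Taking .58 for anycast leaves .56 + .57, which collapse into one aligned /23:
print(list(ipaddress.collapse_addresses(
    [ipaddress.ip_network("185.15.56.0/24"), ipaddress.ip_network("185.15.57.0/24")])))
# [185.15.56.0/23]

# Taking .56 instead would leave .57 + .58, which cannot form a /23
# (a /23 has to start on an even /24 boundary):
print(list(ipaddress.collapse_addresses(
    [ipaddress.ip_network("185.15.57.0/24"), ipaddress.ip_network("185.15.58.0/24")])))
# [185.15.57.0/24, 185.15.58.0/24]
```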

Change 286066 merged by BBlack:
note future anycast networks

https://gerrit.wikimedia.org/r/286066

mark added a subscriber: mark. Jun 3 2016, 1:58 PM
BBlack moved this task from Triage to DNS Infra on the Traffic board. Sep 30 2016, 2:11 PM
ayounsi added a subscriber: ayounsi. Apr 3 2017, 7:16 PM
ayounsi moved this task from Backlog to Configuration on the netops board. Jun 27 2017, 2:51 PM

Change 391149 had a related patch set uploaded (by BBlack; owner: Ayounsi):
[operations/puppet@production] Have every rdns advertise a private anycast VIP

https://gerrit.wikimedia.org/r/391149

Change 392635 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] dnsrecursor: send hostname in version responses

https://gerrit.wikimedia.org/r/392635

Change 391149 merged by BBlack:
[operations/puppet@production] Have every rdns advertise a private anycast VIP

https://gerrit.wikimedia.org/r/391149

Change 392635 merged by BBlack:
[operations/puppet@production] dnsrecursor: send hostname in version responses

https://gerrit.wikimedia.org/r/392635

Change 393668 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Bird: add monitoring to the VIP and bird process

https://gerrit.wikimedia.org/r/393668

Change 393668 merged by Ayounsi:
[operations/puppet@production] Bird: add monitoring to the VIP and bird process

https://gerrit.wikimedia.org/r/393668

faidon moved this task from Configuration to Troubleshooting on the netops board. Aug 3 2018, 1:57 AM

Some interesting stuff here (see also the Mailing Lists link there in the datatracker for discussion): https://datatracker.ietf.org/doc/draft-moura-dnsop-authoritative-recommendations/?include_text=1

jbond added a subscriber: jbond. Mar 13 2019, 2:23 PM
jbond added a comment. Mar 13 2019, 6:53 PM

Some comments for consideration

We set up those (now global across our network!) service IPs behind LVS in all of our sites.

Is auth DNS already behind LVS? Putting anything in front of production auth DNS is always a red flag to me, especially if it keeps state. If possible I would have the DNS servers talk BGP directly to the edge routers and have the edge routers configured with ECMP (disclaimer: I'm not familiar with LVS, so this could be an unfounded fear).

We configure static routes, less preferred than the BGP-learned routes, pointing to one of the realservers

Another option: if you have a contiguous /23 to use for the anycast prefix, then you can have all instances advertise both the /24 and the /23. To depool a server you simply withdraw the /24 prefix. BGP will always prefer the more specific prefix, so any nodes advertising only the /23 will not receive traffic; and in the case that depooling goes wrong and depools all servers (withdraws the /24 everywhere), routing falls back to the /23, meaning everything serves traffic. We used this effectively and tested it heavily at a previous job. Note you would also need a /47 for IPv6 to do the same thing there as well.
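
A toy illustration of that longest-prefix-match behaviour (the prefixes and site names below are placeholders, and the dict lookup is only a stand-in for what the routers do):

```
import ipaddress

def best_route(dst: str, routes: dict):
    """Return the longest matching prefix and its advertisers, like a FIB lookup."""
    addr = ipaddress.ip_address(dst)
    matches = [p for p in routes if addr in p]
    if not matches:
        return None
    best = max(matches, key=lambda p: p.prefixlen)
    return best, routes[best]

covering = ipaddress.ip_network("10.99.0.0/23")  # placeholder: advertised by every instance
specific = ipaddress.ip_network("10.99.0.0/24")  # placeholder: advertised only by pooled instances

routes = {covering: ["site-a", "site-b", "site-c"], specific: ["site-a", "site-b"]}
print(best_route("10.99.0.10", routes))  # -> the /24: only pooled instances get traffic

del routes[specific]                     # depool gone wrong: /24 withdrawn everywhere
print(best_route("10.99.0.10", routes))  # -> falls back to the /23: everything still serves
```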

Out of these 1-2 /24s, we assign 2-4 IPs for ns0-[13]

I think we should knock this down to two servers; I don't see an advantage of maintaining 4 NS servers going forward, it only adds complexity.

Thanks for your comments!

Is auth DNS already behind LVS?

AuthDNS is not behind LVS; we currently have static routes on the routers to redirect the VIPs to the proper machines.
See for example: https://wikitech.wikimedia.org/wiki/Service_restarts#Authoritative_DNS

If possible I would have the DNS servers talk BGP directly to the edge routers and have the edge routers configured with ECMP (disclaimer: I'm not familiar with LVS, so this could be an unfounded fear).

Indeed. That's what we're currently (slowly) experimenting with, for the recursive DNS servers, see https://wikitech.wikimedia.org/wiki/Anycast_recursive_DNS
If it's successful for the internal recursive DNS, we're considering doing the same with the public Authoritative DNS.

Another option: if you have a contiguous /23 to use for the anycast prefix, then you can have all instances advertise both the /24 and the /23. To depool a server you simply withdraw the /24 prefix.

The idea of using two distinct /24s is to reduce the risk of a single BGP typo on the internet taking down/redirecting/etc. all of our DNS. The downside is that we're "wasting" many IPs.
Afaik, no decision has been made about using one or two /24s, but a contiguous /23 would have the downside without the advantage.
Also, a static route is a "last resort" for any kind of dynamic-system failure (be it BGP or Pybal, etc.). Advertising a /23 plus /24s is an interesting depool idea though!

I think we should knock this down to two servers; I don't see an advantage of maintaining 4 NS servers going forward, it only adds complexity.

With anycast, the more distributed the servers, the better, as they get closer to users and reduce latency. So we should also have them in PoPs.

In terms of NS records, there are different options with performance/redundancy/"cost" tradeoffs.
If for example we set:
ns0: Anycast IP from one /24
ns1: Anycast IP from the second /24
We get the best in terms of performance (only anycast IPs) and redundancy (IPs from two different prefixes), but at the "cost" of more wasted IPs.

If we only set:
ns0: Anycast IP from one /24
we get performance and low cost, but no redundancy.

Last, if we set:
ns0: Anycast IP from one /24
ns1: eqiad NS server IP
We get redundancy and low cost, but we sacrifice performance, as in theory only half of the clients will hit the closest anycast server.

Volans added a subscriber: Volans. Mar 13 2019, 11:41 PM

Thanks for the response

In the last option the anycast prefix should get more than 50% of the traffic due to the SRTT algorithm mentioned by bblack, but I take your point. Also worth mentioning that RSSAC has a working group[1] which should update the study referenced in the first post, and I believe it has some of the authors from that original study.

i don't see an advantage of maintaining 4 NS servers going forward

I think I'll retract this, as I now see the other option:
ns0-2: remain as they are
ns3: new anycast prefix

[1] https://www.icann.org/en/system/files/files/rssac-sow-resolver-behaviors-07aug18-en.pdf

jijiki added a subscriber: jijiki. Jul 29 2019, 4:21 PM
BBlack added a comment (edited). Aug 15 2019, 2:48 PM

General status updates and planning, for this very old ticket which is still on the radar!

T186550 and T228190 cover anycasting our internal recdns, which is nearing completion and will probably be mostly done by EOQ. The basic model there is that the dnsX00N hosts (two per site, at every site including the far-flung edges) run pdns-recursor, bird, some healthchecking stuff and systemd rules (e.g. the BGP daemon requires the recdns daemon to already be running, etc.), and advertise 10.3.0.1 via BGP to the local routers.
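
As a rough illustration of the healthchecking side (the actual setup relies on existing tooling and the systemd dependencies mentioned above; the probe target, query name and exit-code contract below are assumptions, not the deployed configuration), the idea is simply "only keep advertising 10.3.0.1 while the local recursor actually answers queries":

```
#!/usr/bin/env python3
# Minimal DNS liveness probe: send one UDP query to the local recursor and
# exit 0/1 depending on whether any well-formed response comes back. A wrapper
# (healthcheck tooling / systemd) would use the exit code to decide whether the
# anycast VIP should keep being advertised by bird.
import random
import socket
import struct
import sys


def dns_probe(server: str, qname: str, timeout: float = 2.0) -> bool:
    txid = random.randint(0, 0xFFFF)
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD=1, one question
    question = b"".join(bytes([len(p)]) + p.encode() for p in qname.split(".")) + b"\x00"
    question += struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(header + question, (server, 53))
            data, _ = s.recvfrom(512)
        except OSError:
            return False
    # The response must echo our transaction ID and have the QR (response) bit set.
    return len(data) >= 4 and data[:2] == struct.pack(">H", txid) and bool(data[2] & 0x80)


if __name__ == "__main__":
    sys.exit(0 if dns_probe("127.0.0.1", "wikimedia.org") else 1)
```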

The current draft plan for working towards anycasting AuthDNS in stages looks something like this:

  1. Parallelize and/or otherwise improve authdns-update. Currently this runs serially through the 3x authdns servers one by one, sshing to each and executing all the check->deploy steps. Since the rest of the anycast authdns plan involves many more authdns servers (temporarily as many as 13 initially, eventually settling down to 10, but then +2 for each future edge site down the road...), we'll need to at least parallelize the DNS patch deployment process (see the fanout sketch after this list), and possibly look at other related bits for dns-discovery, and at tolerating site-isolation failures in better ways, etc.
  2. Bring up local authdns instances in all of the current recdns clusters globally (the dnsX00N machines), with our current ns[012] IPs defined on the loopback as usual, but without any advertised routing for public authdns traffic. This basically spins up 10 more authdnses (in addition to the current public three), whose sole purpose is to locally answer authdns requests over the loopback for all of our anycasted recdnses, making them more performant and reliable at their own job, which is mostly resolving our own domains via queries to our authdns. We'll need to set up dependencies and/or healthchecks here to ensure that the local authdns daemons are running reliably as a prerequisite to starting the recdns daemon and/or advertising the internal recdns anycast.
  3. Start advertising the current unicast ns[012] IPs via bird with healthchecking, from the dnsX00N hosts to their local routers. Use ECMP at the router for these, and then deprioritize and later withdraw the current static routes to the legacy authdns machines. This moves public authdns resolution to the dnsX00N hosts' authdns daemons as well, but using the existing unicast public IPs, which are only advertised from codfw, eqiad, and esams respectively. It also allows us to decom those legacy authdns machines without replacement, making dnsX00N the DNS clusters for both recdns and authdns, and it improves the resiliency of our existing unicast authdns by having a pair of machines active at each of the 3 sites with the current unicast authdns IPs.
  4. Define/allocate our public authdns anycast IP(s), including resolving the design questions around how many /24's and which ones; define these on the DNS cluster machines' loopbacks with bird routing advertisements towards the routers as well, and begin advertising the anycast authdns space(s) from all of our edge routers to the world.
  5. Decide on a target solution for our delegation NS records upstream at the TLD servers / whois, and the steps to get there. For example, we might initially add a fourth (and fifth, if we go with 2x anycast IPs) NS IP to the set without touching the existing ns[012] IPs, and then later, as our comfort level grows, withdraw the unicasts over time until it's all-anycast. There's also some naming bikeshedding to do here about the new 1-2 nameserver names, so we don't end up with just ns3.wikimedia.org and ns4.wikimedia.org inexplicably as the names of our only public NS IPs. I like nsa.wikimedia.org and nsb.wikimedia.org myself, since having the hostname nsa is kind of amusing :)
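
For step 1, a rough sketch of what parallelizing the authdns-update fanout could look like (host names, the remote command and the error handling are placeholders; the real tool's per-host check->deploy steps are more involved):

```
# Hypothetical parallel ssh fanout for deploying a DNS change to many authdns
# hosts at once instead of serially; everything named here is a placeholder.
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

AUTHDNS_HOSTS = [  # placeholder host list; the real one would come from config
    "dns1001.example.org", "dns1002.example.org",
    "dns2001.example.org", "dns2002.example.org",
]


def deploy(host: str) -> tuple[str, bool, str]:
    """Run the (placeholder) check+deploy step on one host over ssh."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "authdns-check-and-deploy"],
            capture_output=True, text=True, timeout=300,
        )
    except subprocess.TimeoutExpired:
        return host, False, "timed out"
    return host, proc.returncode == 0, proc.stdout + proc.stderr


def main() -> int:
    failures = 0
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(deploy, h): h for h in AUTHDNS_HOSTS}
        for fut in as_completed(futures):
            host, ok, output = fut.result()
            print(f"{'OK  ' if ok else 'FAIL'} {host}")
            if not ok:
                failures += 1
                print(output)
    return 1 if failures else 0


if __name__ == "__main__":
    raise SystemExit(main())
```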
BBlack renamed this task from Anycast (Auth)DNS to Anycast AuthDNS. Tue, Nov 5, 6:08 PM