
Anycast NTP and update the list of timeservers for P:systemd::timesyncd
Closed, Resolved, Public

Description

In continuation of the work carried out under T347054, we want to reduce the toil and SPOFs around our DNS setup. This also covers our NTP setup, given that we run ntpd on the DNS boxes and use systemd-timesyncd as the NTP client.

Introduction

We have three important uses/current implementations around our NTP setup:

  1. For the Debian installer, we use the anycast address ntp.anycast.wmnet as the NTP server under modules/install_server/files/autoinstall/common.cfg:
d-i	clock-setup/ntp-server	string	ntp.anycast.wmnet

This address is announced by all the DNS boxes, so the host being installed should reach the one closest to it. There is nothing more to be done here.

  2. For the DNS hosts themselves, where we run ntpd, we generate the peer list automatically in modules/profile/manifests/dns/recursor.pp, but statically, from the authdns_servers Hiera key. We do want to get rid of this as well at some point, but that's for another task. Note that this peer list is only for the DNS hosts and does not directly affect the main consumers, which are all the other hosts.
  3. For the clients themselves, we use P:systemd::timesyncd as the NTP client, and this is what this task is about. The current list of these hosts is:
sukhe@cumin1002:~$ sudo cumin "P:systemd::timesyncd"
2184 hosts will be targeted:

Current problem with the P:systemd::timesyncd NTP servers list

The current NTP servers are generated in modules/profile/manifests/systemd/timesyncd.pp and look like:

# For historical context, this array was manually managed via
# hieradata/$::site/profile/systemd/timesyncd.yaml.
#
# To set ntp_servers in a site, use the ntp_peers under it and the peers of
# the closest core site, which we determine from $::datacenters_tree.
if $ntp_servers == undef {
    $_ntp_servers = [$ntp_peers[$::site], $ntp_peers[$site_nearest_core[$::site]]].flatten
} else {
    $_ntp_servers = $ntp_servers
}

class {'systemd::timesyncd':
    ensure      => $ensure,
    ntp_servers => $_ntp_servers,
}

The logic here is pretty simple: for a host in a given site, the list of servers is the list of DNS boxes in that site plus the list of DNS boxes in the nearest core site. For, say, cp7001 in magru, this list looks like:

sukhe@cp7001:~$ cat /etc/systemd/timesyncd.conf 
## THIS FILE IS MANAGED BY PUPPET

[Time]
Servers=dns7001.wikimedia.org dns7002.wikimedia.org dns1004.wikimedia.org dns1005.wikimedia.org dns1006.wikimedia.org
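To make the current selection logic concrete, here is a minimal Python sketch of what the manifest computes. It assumes hypothetical `NTP_PEERS` and `SITE_NEAREST_CORE` dictionaries shaped like `$ntp_peers` and `$site_nearest_core`; only the magru to eqiad mapping is filled in, since the core-site mappings aren't shown here. It is an illustration, not the Puppet code itself.

```python
from itertools import chain

# Hypothetical inputs shaped like the Puppet variables; the real values come
# from Hiera ($ntp_peers) and $::datacenters_tree ($site_nearest_core).
NTP_PEERS = {
    "eqiad": ["dns1004.wikimedia.org", "dns1005.wikimedia.org", "dns1006.wikimedia.org"],
    "magru": ["dns7001.wikimedia.org", "dns7002.wikimedia.org"],
}
SITE_NEAREST_CORE = {"magru": "eqiad"}  # only the mapping needed for this example


def timesyncd_servers(site, ntp_servers=None):
    """Return the Servers= list for a host in `site`.

    Mirrors the manifest: an explicit ntp_servers override wins; otherwise use
    the site's own DNS boxes followed by those of its nearest core site.
    """
    if ntp_servers is not None:
        return list(ntp_servers)
    core = SITE_NEAREST_CORE[site]
    # chain() plays the role of Puppet's .flatten on the two nested arrays.
    return list(chain(NTP_PEERS[site], NTP_PEERS[core]))


if __name__ == "__main__":
    print("Servers=" + " ".join(timesyncd_servers("magru")))
```

Running it prints the same Servers= line as the cp7001 example, which is exactly what makes the list static: it only changes when Puppet's inputs change.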

Note that the above is generated statically by Puppet. However, under T347054 we manage the state of the DNS hosts themselves dynamically via confd. This can result in a situation where a DNS host has been depooled but, unless it is also removed from Puppet (which we don't do unless we decommission the host), it will continue to appear in this list even though it may be unavailable, powered down, or rebooting.

By itself, this is not a serious problem, as NTP is meant to work with multiple servers (which is also why our setup has redundancy) and a single host being down is not an issue. But from our POV, this is not ideal: more than a single host can be affected, and every such change has to be made in Puppet and rolled out before systemd-timesyncd is aware of it. The list is also not a correct reflection of the state of the DNS boxes, and we should fix that.
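The redundancy argument can be illustrated with a simplified Python model of a client that walks its configured list and uses the first server that answers. systemd-timesyncd behaves roughly like this (it talks to one server at a time and moves on when one can't be reached), but this is an assumption-level sketch, not its code; `pick_server` and the `is_reachable` probe are hypothetical.

```python
from typing import Callable, Optional, Sequence

# The magru list from the cp7001 example.
SERVERS = [
    "dns7001.wikimedia.org",
    "dns7002.wikimedia.org",
    "dns1004.wikimedia.org",
    "dns1005.wikimedia.org",
    "dns1006.wikimedia.org",
]


def pick_server(servers: Sequence[str], is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first server that answers, or None if none do.

    `is_reachable` is a hypothetical probe standing in for a real NTP query.
    """
    for server in servers:
        if is_reachable(server):
            return server
    return None


if __name__ == "__main__":
    # dns7001 is depooled and powered down but still listed by Puppet.
    down = {"dns7001.wikimedia.org"}
    print(pick_server(SERVERS, lambda s: s not in down))  # dns7002.wikimedia.org
```

A dead entry costs the client a failed attempt before it moves on, which is why one stale host is tolerable; the entry itself stays in the list until Puppet is updated.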

Solutions

Using confd to manage this list

The simplest solution is to template and manage /etc/systemd/timesyncd.conf via confd, using the current state of the DNS boxes as reported by etcd/confctl. This is fairly easy to do, as we already do this in a bunch of other places on the DNS hosts themselves; however, it would involve rolling out confd to all 2184 other hosts, which may not be ideal.
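A minimal sketch of what the confd approach would render, assuming a simplified hostname-to-pooled-state mapping in place of the real etcd/conftool data. The mapping shape and `render_timesyncd_conf` are hypothetical, and real confd templates are Go templates, not Python. Only pooled DNS boxes make it into Servers=, in the same site-first, core-second order as today.

```python
from typing import Mapping, Sequence


def render_timesyncd_conf(candidates: Sequence[str], pool_state: Mapping[str, str]) -> str:
    """Render timesyncd.conf with only the pooled DNS boxes, in candidate order.

    `pool_state` is a hypothetical hostname -> pooled-value mapping standing in
    for what confd would read from etcd; only "yes" counts as pooled here.
    """
    pooled = [host for host in candidates if pool_state.get(host) == "yes"]
    if not pooled:
        # Refuse to produce an empty server list; the caller decides what to keep.
        raise ValueError("no pooled NTP servers; not rendering an empty Servers= line")
    return "[Time]\nServers=" + " ".join(pooled) + "\n"


if __name__ == "__main__":
    candidates = ["dns7001.wikimedia.org", "dns7002.wikimedia.org", "dns1004.wikimedia.org"]
    state = {
        "dns7001.wikimedia.org": "no",   # depooled via confctl
        "dns7002.wikimedia.org": "yes",
        "dns1004.wikimedia.org": "yes",
    }
    print(render_timesyncd_conf(candidates, state), end="")
```

Refusing to render an empty list is a design choice in this sketch: if every candidate is depooled, it is probably safer to keep the last good file than to write one with no servers.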

Anycast NTP

The motivation behind this task is to anycast the NTP servers. This will allow us to manage the servers more dynamically, as the DNS boxes can simply enter and exit the pool of available NTP servers when desired, without needing to update Puppet. This is also one of the main reasons why we did the current NTP anycast for the Debian installer.

To do so, we will add three new NTP addresses, ntp-[abc].anycast.wmnet, and then configure the clients to use these instead of the current list. In advertising these, we will follow the current logic:

ntp-a.anycast.wmnet: announced from all sites
ntp-b.anycast.wmnet: announced from all sites
ntp-c.anycast.wmnet: announced from only the core sites

ntp-[ab].anycast.wmnet are announced from all sites, but in each site only one DNS box announces each address. So in magru, dns7001 will advertise ntp-a.anycast.wmnet while dns7002 will advertise ntp-b.anycast.wmnet. The core sites have three DNS boxes each, so their third servers (dns1006.wikimedia.org and dns2006.wikimedia.org) will be the only ones in the network advertising ntp-c.anycast.wmnet.
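The distribution rule can be sketched in Python as a per-site mapping from anycast name to announcing host. The core-site list (eqiad, codfw) follows the dns1006/dns2006 example. The rule that boxes are taken in sorted order (so eqiad's ntp-a and ntp-b would land on dns1004 and dns1005) is an assumption, since only the magru and ntp-c assignments are spelled out here. `ntp_announcements` is hypothetical.

```python
from typing import Dict, Sequence

CORE_SITES = {"eqiad", "codfw"}
ANYCAST_NAMES = ["ntp-a.anycast.wmnet", "ntp-b.anycast.wmnet", "ntp-c.anycast.wmnet"]


def ntp_announcements(site: str, dnsboxes: Sequence[str]) -> Dict[str, str]:
    """Map each anycast NTP name announced in `site` to the one dnsbox announcing it.

    Assumption: boxes are taken in sorted order, so the first announces ntp-a,
    the second ntp-b and, in core sites only, the third ntp-c.
    """
    names = ANYCAST_NAMES if site in CORE_SITES else ANYCAST_NAMES[:2]
    boxes = sorted(dnsboxes)
    if len(boxes) < len(names):
        raise ValueError(f"{site}: need {len(names)} dnsboxes, found {len(boxes)}")
    return dict(zip(names, boxes))


if __name__ == "__main__":
    print(ntp_announcements("magru", ["dns7002", "dns7001"]))
    print(ntp_announcements("eqiad", ["dns1004", "dns1005", "dns1006"]))
```

Whatever the per-site assignment ends up being, every client gets the same three names, which is what lets timesyncd.conf stay constant.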

This way we can still map the current setup and maintain the same redundancy as desired but with a dynamic anycast setup. The output of this exercise will then look like:

/etc/systemd/timesyncd.conf
[Time]
Servers=ntp-a.anycast.wmnet ntp-b.anycast.wmnet ntp-c.anycast.wmnet

There should be no further updates to this file from this point forward.

Concerns

  1. The drawback of this approach, in general and relative to the confd one, is that the setup might be more complex, especially since we will have to come up with a smart way to distribute the ntp-[ab] announcements across different hosts. I don't think this is a huge blocker, as we are already doing similar things (for the unicast ns0/1 announcements).
  2. This brings yet another thing under the BGP/bird setup, increasing the critical services we have under it.
  3. We will need to deprecate ntp.anycast.wmnet and switch it over to something like ntp-[ab].anycast.wmnet. No issues there.
  4. This is a big change and we should be confident in moving towards this setup. We can start small and do it on a few hosts but eventually it will touch all 2184 hosts and more.
  5. We will need to update or improve the monitoring to adapt to this change.

netops [ @ayounsi / @cmooney ] your input required as always, thank you.

Event Timeline

ssingh triaged this task as Medium priority. Fri, May 31, 3:30 PM
ssingh added a subscriber: BBlack.

To clarify, there is no change to the configuration of the DNS hosts themselves and the peer list there. This is only for the consumers of P:systemd::timesyncd (which we don't use on the DNS hosts).

I suspect Brandon may be more versed in the ways of NTP than I am, and could advise if there are any pitfalls on the protocol side. But from my own understanding this should be ok. The important thing is that all the dns hosts peer with each other, and thus if an end-host connecting to ntp-X.anycast.wmnet ends up hitting another server (due to failure, depool, etc.), that next server will have similar time.
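As a toy model of that failover, assume (hypothetically) that ntp-a.anycast.wmnet is announced by dns7001 in magru and dns1004 in eqiad, and that a client simply reaches the announcer with the lowest routing cost. Real BGP best-path selection involves far more than one number; `reached_host` and the cost values are made up for illustration.

```python
from typing import Mapping, Optional, Set


def reached_host(announcers: Set[str], cost: Mapping[str, int]) -> Optional[str]:
    """Return the announcer a client's traffic lands on, or None if nobody announces.

    `cost` is a hypothetical per-host routing cost as seen from the client;
    ties are broken by hostname so the result is deterministic.
    """
    if not announcers:
        return None
    return min(announcers, key=lambda host: (cost[host], host))


if __name__ == "__main__":
    # Hypothetical view from cp7001: ntp-a is announced locally and from eqiad.
    cost = {"dns7001": 1, "dns1004": 10}
    announcers = {"dns7001", "dns1004"}
    print(reached_host(announcers, cost))  # dns7001
    announcers.discard("dns7001")          # depooled or failed: route withdrawn
    print(reached_host(announcers, cost))  # dns1004
```

The fallback is only harmless because the dnsboxes peer with each other, so dns1004's clock should agree closely with dns7001's.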

End hosts having 3 peers is a good idea. For the Debian installer we can maybe still just use one, perhaps ntp-a.anycast.wmnet. But overall the proposal looks good to me.

Do we need to do anything on the dns servers - or can we - to remove host(s) that are wildly out of sync with the rest of those in the cluster?

Yeah, I've looked at this from the deep-ntp-details POV and it's all pretty sane. We're in alignment with the recommendations in https://www.rfc-editor.org/rfc/rfc8633.html#page-17 and it should result in good time sync stability.

> Do we need to do anything on the dns servers - or can we - to remove host(s) that are wildly out of sync with the rest of those in the cluster?

We have a local (bird) healthchecker on each dnsbox that will withdraw the anycast advert if a given dnsbox doesn't have sane time sync with its peers and/or the public upstream pools. This will commonly be the case after a reimage or even reboot for a little while.
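The withdraw-on-bad-sync behaviour can be sketched as a small predicate. This is a hypothetical Python model: the inputs (a synchronized flag and an offset) and the 50 ms threshold are assumptions. The task does not describe how the real bird healthchecker derives its verdict from the peers and the public pools.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real healthchecker's criteria are not described here.
MAX_OFFSET_MS = 50.0


@dataclass
class SyncStatus:
    synchronized: bool  # has the local ntpd settled on a sync source?
    offset_ms: float    # estimated offset from that source, in milliseconds


def should_announce(status: SyncStatus, max_offset_ms: float = MAX_OFFSET_MS) -> bool:
    """Keep the anycast route only while the local clock looks trustworthy."""
    return status.synchronized and abs(status.offset_ms) <= max_offset_ms


if __name__ == "__main__":
    just_rebooted = SyncStatus(synchronized=False, offset_ms=0.0)
    steady = SyncStatus(synchronized=True, offset_ms=-2.5)
    print(should_announce(just_rebooted), should_announce(steady))  # False True
```

A freshly rebooted box reports as unsynchronized, so under this model it stays out of the anycast pool until ntpd settles, matching the reboot/reimage case described above.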

Moving the dynamic nature of NTP definition to some automated system instead of human or Puppet is a great idea :)
By "human" I mean that, right now, the list for network devices is hard-coded: https://github.com/wikimedia/operations-homer-public/blob/master/config/common.yaml#L365

Using Bird/BGP with 3 sets of servers makes sense; ntp.anycast.wmnet could just be a CNAME to one of them so we don't have to reconfigure PDUs and similar devices.

If we do A/B/C, maybe use the same logic as what's planned for DNS so we don't end up with too many different implementations.

Another idea would be to load balance between the 2/3 nodes in a same site using LVS/Liberica, so we would have X unicast IPs in the LVS low traffic range where X is our number of sites. That list would only need to be updated once we add a new site, so not too often.

> Moving the dynamic nature of NTP definition to some automated system instead of human or Puppet is a great idea :)
> By "human" I mean that, right now, the list for network devices is hard-coded: https://github.com/wikimedia/operations-homer-public/blob/master/config/common.yaml#L365

Yeah good point! We (at least I :) have missed updating that list in the past so I have it in my notes now but that's another big win IMO when we roll this out.

> Using Bird/BGP with 3 sets of servers makes sense; ntp.anycast.wmnet could just be a CNAME to one of them so we don't have to reconfigure PDUs and similar devices.

Last time we rolled out this change, it was simply updating modules/install_server/files/autoinstall/common.cfg. Do you have any other place in mind where this might need to be reconfigured? I am personally for removing this completely but it's not a big deal and we can keep it around as well.

> If we do A/B/C, maybe use the same logic as what's planned for DNS so we don't end up with too many different implementations.

Do you mean the same logic as planned for the nsX anycast work? Or the same logic as it exists right now in Puppet?

> Another idea would be to load balance between the 2/3 nodes in a same site using LVS/Liberica, so we would have X unicast IPs in the LVS low traffic range where X is our number of sites. That list would only need to be updated once we add a new site, so not too often.

The only issue with this is that Liberica, and what its design will end up looking like, is uncertain at this point. And we don't want to put anything behind LVS, so I guess we can either go with this or wait for Liberica to happen and see how it evolves. Regardless, I think moving away from the Puppet state is the intention, so whatever helps us achieve that goal works.

> Last time we rolled out this change, it was simply updating modules/install_server/files/autoinstall/common.cfg. Do you have any other place in mind where this might need to be reconfigured? I am personally for removing this completely but it's not a big deal and we can keep it around as well.

https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Opengear_Serial_Consoles
https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/ServerTech

They actually use the older naming/endpoint, which we might need to keep as a CNAME, as reconfiguring PDUs is manual.

> Do you mean the same logic as planned for the nsX anycast work? Or the same logic as it exists right now in Puppet?

I meant the same logic as the nsX anycast, if it's doable of course.

> The only issue with this is that Liberica, and what its design will end up looking like, is uncertain at this point. And we don't want to put anything behind LVS, so I guess we can either go with this or wait for Liberica to happen and see how it evolves. Regardless, I think moving away from the Puppet state is the intention, so whatever helps us achieve that goal works.

Agreed, I don't think there is a rush. I don't have a strong preference, but I wanted to make sure we considered all the options.

> Last time we rolled out this change, it was simply updating modules/install_server/files/autoinstall/common.cfg. Do you have any other place in mind where this might need to be reconfigured? I am personally for removing this completely but it's not a big deal and we can keep it around as well.

> https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Opengear_Serial_Consoles
> https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/ServerTech
>
> They actually use the older naming/endpoint, which we might need to keep as a CNAME, as reconfiguring PDUs is manual.

Ah right yeah. I didn't remember what we did for the previous NTP change so I went back and found T347054#9223568. It seems like as long as the per-site hostnames don't change, we should be good on what they point to? Which means we will most likely point it to ntp-a.anycast.wmnet.

> Do you mean the same logic as planned for the nsX anycast work? Or the same logic as it exists right now in Puppet?

> I meant the same logic as the nsX anycast, if it's doable of course.

The nsX anycast A/B/C split is somewhat matched by the distribution of ntp-[abc].anycast.wmnet and how that maps out to the various hosts. For a direct nsX anycast mapping -- I am not sure if the complexity is required but we can see.

> The only issue with this is that Liberica, and what its design will end up looking like, is uncertain at this point. And we don't want to put anything behind LVS, so I guess we can either go with this or wait for Liberica to happen and see how it evolves. Regardless, I think moving away from the Puppet state is the intention, so whatever helps us achieve that goal works.

> Agreed, I don't think there is a rush. I don't have a strong preference, but I wanted to make sure we considered all the options.

This is certainly an option for this and for other cases where we are considering anycast, to spread our setup if nothing else. I guess it comes down to timing: if we want to wait for Liberica, then we can take that route.

Re: "same logic" - they're different protocols, different hierarchies, and much different on the client behavior front as well. It doesn't make sense to share a strategy between the two.

Mentioned in SAL (#wikimedia-operations) [2024-06-12T13:28:37Z] <sukhe> add ntp-[abc].anycast.wmnet: 10.3.0.[5-7]/32: T366360

Change #1042354 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] [WIP]: dnsbox: advertise ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1042354

Change #1042354 abandoned by Ssingh:

[operations/puppet@production] [WIP]: dnsbox: advertise ntp-[abc].anycast.wmnet

Reason:

Abandoning to figure out a way to avoid the per-host override.

https://gerrit.wikimedia.org/r/1042354

Change #1046675 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] conftool-data: add ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046675

Change #1046685 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] dnsbox: announce ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046685

Change #1046689 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] durum: switch NTP peers to ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046689

Change #1046737 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] config/common: update list of ntp_servers to use anycast NTP servers

https://gerrit.wikimedia.org/r/1046737

Change #1046757 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] P:bird::anycast_monitoring: add monitoring for 10.3.0.[5-7]/32

https://gerrit.wikimedia.org/r/1046757

Change #1047073 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] install_server: update NTP server anycast address for d-i

https://gerrit.wikimedia.org/r/1047073

Change #1047074 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] wikimedia.org: switch ntp.$site to ntp-a.anycast.wmnet

https://gerrit.wikimedia.org/r/1047074

Change #1046675 merged by Ssingh:

[operations/puppet@production] conftool-data: add ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046675

Mentioned in SAL (#wikimedia-operations) [2024-06-19T13:23:48Z] <sukhe> sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360

Change #1046685 merged by Ssingh:

[operations/puppet@production] dnsbox: announce ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046685

Mentioned in SAL (#wikimedia-operations) [2024-06-19T15:16:44Z] <sukhe> sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360

Change #1046689 merged by Ssingh:

[operations/puppet@production] durum: switch NTP peers to ntp-[abc].anycast.wmnet

https://gerrit.wikimedia.org/r/1046689

Mentioned in SAL (#wikimedia-operations) [2024-06-19T16:42:04Z] <sukhe> sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360

Change #1047073 merged by Ssingh:

[operations/puppet@production] install_server: update NTP server anycast address for d-i

https://gerrit.wikimedia.org/r/1047073

Mentioned in SAL (#wikimedia-operations) [2024-06-20T12:54:39Z] <sukhe> sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360

Change #1046737 merged by Ssingh:

[operations/homer/public@master] config/common: update list of ntp_servers to use anycast NTP servers

https://gerrit.wikimedia.org/r/1046737

Mentioned in SAL (#wikimedia-operations) [2024-06-20T13:07:02Z] <sukhe> running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360

Change #1047998 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Loopback filter: allow ntp to/from private ranges

https://gerrit.wikimedia.org/r/1047998

Change #1047998 merged by Ayounsi:

[operations/homer/public@master] Loopback filter: allow ntp to/from private ranges

https://gerrit.wikimedia.org/r/1047998

Change #1046757 merged by Ssingh:

[operations/puppet@production] P:bird::anycast_monitoring: add monitoring for 10.3.0.[5-7]/32

https://gerrit.wikimedia.org/r/1046757

Change #1047074 merged by Ssingh:

[operations/dns@master] wikimedia.org: switch ntp.$site to ntp-a.anycast.wmnet

https://gerrit.wikimedia.org/r/1047074

Change #1048018 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] P:systemd::timesyncd: switch to anycast NTP peers

https://gerrit.wikimedia.org/r/1048018

Change #1048064 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera dnsbox and P:bird: remove references to ntp.anycast.wmnet

https://gerrit.wikimedia.org/r/1048064

Change #1048066 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] policies/cr-labs: remove obsolete ntp.anycast.wmnet

https://gerrit.wikimedia.org/r/1048066

Change #1048067 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] conftool-data: remove ntp service

https://gerrit.wikimedia.org/r/1048067

Frack config has been updated to use the new ntp-[abc].anycast.wmnet servers. The previous dnsXXXX and ntp.anycast.wmnet entries have been removed.

> Frack config has been updated to use the new ntp-[abc].anycast.wmnet servers. The previous dnsXXXX and ntp.anycast.wmnet entries have been removed.

Thanks @Dwisehaupt! The production switch is planned for this week but knowing frack has been updated helps in removing the ntp.anycast.wmnet entries.

Change #1048066 merged by Ssingh:

[operations/homer/public@master] policies/cr-labs: remove obsolete ntp.anycast.wmnet

https://gerrit.wikimedia.org/r/1048066

Mentioned in SAL (#wikimedia-operations) [2024-06-24T14:03:35Z] <sukhe> running homer in cr*{eqiad*,codfw*} to remove ntp.anycast.wmnet from policies/cr-labs: T366360

Change #1048018 merged by Ssingh:

[operations/puppet@production] P:systemd::timesyncd: switch to anycast NTP peers

https://gerrit.wikimedia.org/r/1048018

Change #1048064 merged by Ssingh:

[operations/puppet@production] hiera dnsbox and P:bird: remove references to ntp.anycast.wmnet

https://gerrit.wikimedia.org/r/1048064

Change #1048067 merged by Ssingh:

[operations/puppet@production] conftool-data: remove ntp service

https://gerrit.wikimedia.org/r/1048067

Mentioned in SAL (#wikimedia-operations) [2024-06-26T18:14:56Z] <sukhe> # etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360

ssingh claimed this task.

This was rolled out today to all 2166 hosts, which are now using ntp-[abc].anycast.wmnet. All traces of ntp.anycast.wmnet (including documentation) have been updated. If anything was missed, that's a mistake, so please let us know and we will update it here.

Thanks to netops for their help!