
Define 3-host infra cluster for traffic pops
Closed, Duplicate · Public

Description

In the long-term view, we want all of our cache PoPs to host, locally:

  1. Recursive DNS (for use by local machines only)
  2. NTP (ditto, peered with other pops + upstream)
  3. AuthDNS (for future Anycast work; see T98006)
  4. Possibly Prometheus
  5. Possibly kafka brokers + zookeeper
  6. Possibly etcd hosts as well?
  7. Possibly install-server stuff, apt mirrors, and/or webproxy

This seems like a good match for running ganeti over 3 physical hosts as an "infra" cluster within each PoP. Alternatively, we could use 3x traditional hosts with blended roles, hosting several of these lightweight services together. We should sort this out in ulsfo first (as it lacks *everything* on that list) to get working configurations, then upgrade esams (which has some of this, but not in a ganeti cluster), and then reuse the same basic configuration for future PoPs.
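For the non-virtualized variant, a blended role would just be a puppet role class pulling several lightweight profiles onto one host. A minimal sketch, with hypothetical role/profile class names (the real module layout may differ):

  # Hypothetical blended role for a cache PoP infra host; the profile
  # names are illustrative, not the actual ones in operations/puppet.
  class role::cache_pop_infra {
      include ::profile::dns_recursor   # recdns, for local machines only
      include ::profile::ntp            # peered with other PoPs + upstream
      include ::profile::authdns        # for the future Anycast work (T98006)
  }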

Event Timeline

BBlack raised the priority of this task to Low.
BBlack updated the task description.
BBlack added projects: acl*sre-team, Traffic.
BBlack subscribed.
BBlack renamed this task from "Deploy recdns + ntp @ ulsfo" to "Deploy infra ganeti cluster @ ulsfo". (May 4 2015, 2:53 PM)
BBlack updated the task description.
BBlack set Security to None.
BBlack added a subscriber: faidon.
BBlack added a subscriber: akosiaris.

Most of those services look like very good candidates for virtualization indeed. A couple of notes:

  • Timekeeping has a known bad history with virtualization; http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf has some pretty good background on the whole gamut of problems. We could sidestep this by running NTP on the physical boxes alongside ganeti anyway (see the sketch after this list). That being said, NTP is not a critical protocol for our operations (we would survive weeks of our NTP servers not running before clock drift became an issue), so it might not make sense at all to have PoP-specific NTP servers.
  • The apt mirror is the only service with resource-intensive needs, namely disk space. Right now our mirrors consume 1.1TB, which means spending 2.2TB in total per ganeti cluster (or 4.4TB if we get RAID1). It also adds possible mirror-synchronization issues.
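If we do go the virtualization route, pinning NTP to the bare metal is cheap to express in puppet. A minimal sketch, assuming hypothetical profile::ganeti and profile::ntp classes:

  # Sketch: keep timekeeping on the ganeti hardware itself, sidestepping
  # VM clock problems. Class names are assumptions for illustration.
  class role::ganeti_pop_node {
      include ::profile::ganeti   # the hypervisor itself
      include ::profile::ntp      # NTP stays on bare metal
  }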

Finally, why 3 physical boxes? I assume it's because of etcd/zookeeper and quorum, correct? Am I missing something else?

I think at least one of the reasons for the 3 hosts idea was that if one underlying ganeti box died, we could still have 2x instances of various types up and running with some redundancy while waiting on hw fixes.

and we need one server with class { 'ganglia_new::monitor::aggregator': } per site.

we usually put it on the install servers in other sites, so here it would technically be install4001.ulsfo.wmnet that we need
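A sketch of how that could be wired up, using the node definition for the install server above; the aggregator's parameter name is an assumption for illustration and would need checking against the ganglia_new module:

  # Sketch: pin the per-site ganglia aggregator to the install server,
  # mirroring other sites. The 'sites' parameter name is an assumption.
  node 'install4001.ulsfo.wmnet' {
      class { 'ganglia_new::monitor::aggregator':
          sites => 'ulsfo',
      }
  }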

Of course, another option here is that we can just do a blended role built for cache-PoP infrastructure that mixes a bunch of these things onto a few bare hosts. It gets tricky for some things, though.

> I think at least one of the reasons for the 3 hosts idea was that if one underlying ganeti box died, we could still have 2x instances of various types up and running with some redundancy while waiting on hw fixes.

That's very true of course, but if memory serves, the main reason we were talking about 3 was the Zookeeper/Kafka idea, which is best deployed with an odd number of servers to properly establish quorum and avoid split brain: with 3 nodes a majority of 2 survives one failure, while 4 nodes still only tolerate one failure, so even counts buy nothing. (That said, we haven't really decided whether we will actually deploy ZK/Kafka there.)

BBlack renamed this task from "Deploy infra ganeti cluster @ ulsfo" to "Define 3-host infra cluster for traffic pops". (Mar 21 2017, 10:57 PM)
BBlack updated the task description.
BBlack mentioned this in Unknown Object (Task). (Mar 21 2017, 11:42 PM)

The tentative and limited plan for now is to deploy 3x misc/infra hosts (meaning all the hosts other than lvs and cp) at each cache site and not use virtualization. We might revisit this at a later date. The basic layout looks like:

  1. bastN00x - bastion+installserver (like bast4001 and such today)
  2. dnsN001 - recdns + authdns + ntp
  3. dnsN002 - recdns + authdns + ntp
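In site.pp terms, the mapping could look roughly like this (a sketch only, assuming the usual role() wiring in site.pp; the role names and hostname patterns are assumptions, with ulsfo as site number 4):

  # Hypothetical node/role mapping for the 3-host layout; the role
  # names are illustrative, not the real ones in operations/puppet.
  node /^bast400\d\.wikimedia\.org$/ {
      role(bastionhost_installserver)   # bastion + install-server duties
  }
  node /^dns400[12]\.wikimedia\.org$/ {
      role(dnsbox)                      # recdns + authdns + ntp combined
  }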

The first site to see this new config will be ulsfo, as we install the new misc hosts that have arrived there. There will be some puppet-level work to do on combining the two DNS roles properly. Off the top of my head, the main things to remember are:

  1. authdns currently binds to the any-address, and should instead bind to the explicit authdns IPs, so that it doesn't conflict with recdns (the real authdns IPs are of course on loopback, like the LVS'd recdns IPs are currently).
  2. authdns gets monitored on the underlying hostname by icinga (and we also sometimes use that hostname for manual test queries), which the above will break. Probably the sanest thing here would be to add alias addresses+hostnames for authdns to listen on in addition to the canonical IPs, such as dnsN001-authdns, for icinga and manual checks to use.
  3. We'll need to sort out the mess of puppet/systemd service dependencies between the local authdns+recdns daemons in a way that doesn't cause downtime on reconfigurations (see the sketch after this list).
  4. Our current recdns package is a stretch backport, and stretch's ntp package also fixes a big ntp configuration issue for us, so we should probably build these new hosts on stretch from the get-go.
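On item 3, one way to get start ordering without coupling the daemons' lifecycles is a systemd drop-in managed by puppet. A minimal sketch, assuming the recursor is pdns-recursor and authdns is gdnsd (unit names and the exact directives would need verifying on the real hosts):

  # Sketch: order gdnsd (authdns) after pdns-recursor at boot. After=
  # only affects start ordering; deliberately no Requires=, so a
  # recursor restart on reconfiguration does not take authdns down too.
  file { '/etc/systemd/system/gdnsd.service.d':
      ensure => directory,
  }

  file { '/etc/systemd/system/gdnsd.service.d/ordering.conf':
      ensure  => file,
      content => "[Unit]\nAfter=pdns-recursor.service\n",
      require => File['/etc/systemd/system/gdnsd.service.d'],
      notify  => Exec['systemd-daemon-reload'],
  }

  exec { 'systemd-daemon-reload':
      command     => '/bin/systemctl daemon-reload',
      refreshonly => true,
  }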

^ Remaining work superseded by new plans in the ticket this was closed into.