
Define 3-host infra cluster for traffic pops
Closed, Duplicate · Public

Description

In the long-term view, we want all of our cache PoPs to host, locally:

  1. Recursive DNS (for use by local machines only)
  2. NTP (ditto, peered with other pops + upstream)
  3. AuthDNS (for future Anycast work; see T98006)
  4. Possibly Prometheus
  5. Possibly kafka brokers + zookeeper
  6. Possibly etcd hosts as well?
  7. Possibly install-server stuff, apt mirrors, and/or webproxy

This seems like a good match for running ganeti over 3 physical hosts as an "infra" cluster within each PoP. Alternatively, we could use 3x traditional hosts with blended roles, hosting several of these lightweight services together. We should sort this out in ulsfo first (as it lacks *everything* on that list) to get working configurations, then upgrade esams (which has some of this, but not in a ganeti cluster), and then reuse the same basic configuration for future PoPs.
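For the non-virtualized variant, a blended role would just be a puppet role class pulling several lightweight profiles onto one host. A minimal sketch, with hypothetical role/profile class names (the real module layout may differ):

  # Hypothetical blended role for a cache PoP infra host; the profile
  # names are illustrative, not the actual ones in operations/puppet.
  class role::cache_pop_infra {
      include ::profile::dns_recursor   # recdns, for local machines only
      include ::profile::ntp            # peered with other PoPs + upstream
      include ::profile::authdns        # for the future Anycast work (T98006)
  }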

Event Timeline

BBlack raised the priority of this task to Low.
BBlack updated the task description.
BBlack added projects: acl*sre-team, Traffic.
BBlack subscribed.
BBlack renamed this task from "Deploy recdns + ntp @ ulsfo" to "Deploy infra ganeti cluster @ ulsfo". (May 4 2015, 2:53 PM)
BBlack updated the task description.
BBlack set Security to None.
BBlack added a subscriber: faidon.
BBlack added a subscriber: akosiaris.

Most of those services look like very good candidates for virtualization indeed. A couple of notes:

  • Timekeeping has a known bad history with virtualization; http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf has some pretty good background on the whole gamut of problems. We could sidestep this by running NTP on the physical boxes alongside ganeti anyway (see the sketch after this list). That being said, NTP is not a critical protocol for our operations (we would survive weeks of our NTP servers not running before clock drift became an issue), so it might not make sense at all to have PoP-specific NTP servers.
  • The apt mirror is the only service with resource-intensive needs, namely disk space. Right now our mirrors consume 1.1TB, which means spending 2.2TB in total per ganeti cluster (or 4.4TB if we get RAID1). It also adds possible mirror-synchronization issues.
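If we do go the virtualization route, pinning NTP to the bare metal is cheap to express in puppet. A minimal sketch, assuming hypothetical profile::ganeti and profile::ntp classes:

  # Sketch: keep timekeeping on the ganeti hardware itself, sidestepping
  # VM clock problems. Class names are assumptions for illustration.
  class role::ganeti_pop_node {
      include ::profile::ganeti   # the hypervisor itself
      include ::profile::ntp      # NTP stays on bare metal
  }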

Finally, why 3 physical boxes? I assume it's because of etcd/zookeeper and quorum, correct? Am I missing something else?

I think at least one of the reasons for the 3 hosts idea was that if one underlying ganeti box died, we could still have 2x instances of various types up and running with some redundancy while waiting on hw fixes.

and we need one server with class { 'ganglia_new::monitor::aggregator': } per site.

we usually put it on the install servers in other sites, so here it would technically be install4001.ulsfo.wmnet that we need
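A sketch of how that could be wired up, using the node definition for the install server above; the aggregator's parameter name is an assumption for illustration and would need checking against the ganglia_new module:

  # Sketch: pin the per-site ganglia aggregator to the install server,
  # mirroring other sites. The 'sites' parameter name is an assumption.
  node 'install4001.ulsfo.wmnet' {
      class { 'ganglia_new::monitor::aggregator':
          sites => 'ulsfo',
      }
  }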

Of course, another option here is that we can just do a blended role built for cache-PoP infrastructure that mixes a bunch of these things onto a few bare hosts. It gets tricky for some things, though.

> I think at least one of the reasons for the 3 hosts idea was that if one underlying ganeti box died, we could still have 2x instances of various types up and running with some redundancy while waiting on hw fixes.

That's very true of course, but if memory serves, the main reason we were talking about 3 was the Zookeeper/Kafka idea, which is best deployed with an odd number of servers to properly establish quorum and avoid split brain: with 3 nodes a majority of 2 survives one failure, while 4 nodes still only tolerate one failure, so even counts buy nothing. (That said, we haven't really decided whether we will actually deploy ZK/Kafka there.)

BBlack renamed this task from "Deploy infra ganeti cluster @ ulsfo" to "Define 3-host infra cluster for traffic pops". (Mar 21 2017, 10:57 PM)
BBlack updated the task description.
BBlack mentioned this in Unknown Object (Task). (Mar 21 2017, 11:42 PM)

The tentative and limited plan for now is to deploy 3x misc/infra hosts (meaning all the hosts other than lvs and cp) at each cache site and not use virtualization. We might revisit this at a later date. The basic layout looks like:

  1. bastN00x - bastion+installserver (like bast4001 and such today)
  2. dnsN001 - recdns + authdns + ntp
  3. dnsN002 - recdns + authdns + ntp
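In site.pp terms, the mapping could look roughly like this (a sketch only, assuming the usual role() wiring in site.pp; the role names and hostname patterns are assumptions, with ulsfo as site number 4):

  # Hypothetical node/role mapping for the 3-host layout; the role
  # names are illustrative, not the real ones in operations/puppet.
  node /^bast400\d\.wikimedia\.org$/ {
      role(bastionhost_installserver)   # bastion + install-server duties
  }
  node /^dns400[12]\.wikimedia\.org$/ {
      role(dnsbox)                      # recdns + authdns + ntp combined
  }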

The first site to see this new config will be ulsfo, as we install the new misc hosts that have arrived there. There will be some puppet-level work to do on combining the two DNS roles properly. Off the top of my head, the main things to remember are:

  1. authdns currently binds to the any-address, and should instead bind to the explicit authdns IPs, so that it doesn't conflict with recdns (the real authdns IPs are of course on loopback, like the LVS'd recdns IPs are currently).
  2. authdns gets monitored on the underlying hostname by icinga (and we also sometimes use that hostname for manual test queries), which the above will break. Probably the sanest thing here would be to add alias addresses+hostnames for authdns to listen on in addition to the canonical IPs, such as dnsN001-authdns, for icinga and manual checks to use.
  3. We'll need to sort out the mess of puppet/systemd service dependencies between the local authdns+recdns daemons in a way that doesn't cause downtime on reconfigurations (see the sketch after this list).
  4. Our current recdns package is a stretch backport, and stretch's ntp package also fixes a big ntp configuration issue for us, so we should probably build these new hosts on stretch from the get-go.
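On item 3, one way to get start ordering without coupling the daemons' lifecycles is a systemd drop-in managed by puppet. A minimal sketch, assuming the recursor is pdns-recursor and authdns is gdnsd (unit names and the exact directives would need verifying on the real hosts):

  # Sketch: order gdnsd (authdns) after pdns-recursor at boot. After=
  # only affects start ordering; deliberately no Requires=, so a
  # recursor restart on reconfiguration does not take authdns down too.
  file { '/etc/systemd/system/gdnsd.service.d':
      ensure => directory,
  }

  file { '/etc/systemd/system/gdnsd.service.d/ordering.conf':
      ensure  => file,
      content => "[Unit]\nAfter=pdns-recursor.service\n",
      require => File['/etc/systemd/system/gdnsd.service.d'],
      notify  => Exec['systemd-daemon-reload'],
  }

  exec { 'systemd-daemon-reload':
      command     => '/bin/systemctl daemon-reload',
      refreshonly => true,
  }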

^ Remaining work superseded by new plans in the ticket this was closed into.