Paste P7699

discussion: elasticsearch multi instance traffic routing

Authored by Gehel on Oct 19 2018, 1:34 PM.
3:47 PM <gehel> unrelated, can I pick your brain about T207195 ?
3:47 PM <stashbot> T207195: Configure LVS endpoints for new elasticsearch clusters -
3:47 PM <gehel> and the related
3:48 PM <bblack> yes. I read a little about it yesterday, but I didn't quite see it all clearly
3:48 PM <gehel> yeah, it's not all that clear in my head yet
3:48 PM <gehel> and the context is spread over a number of tickets
3:48 PM <gehel> for the high level context, we're working on multiple elasticsearch instances on the same server
3:49 PM <bblack> I guess my first question: is this just about HTTPS flows into these split clusters, or also other protocols LVS is handling?
3:49 PM <gehel> the goal is to reduce the number of shards in a single cluster, since this has an impact on cluster wide operations
3:49 PM <bblack> I don't even remember if es has other protocols
3:49 PM <gehel> just HTTPS
3:49 PM <gehel> We expose HTTP as well, but since we're changing the endpoints, we should take the opportunity to drop it completely
3:50 PM <bblack> ok
3:50 PM <gehel> the goal is to have one large cluster for large / high traffic shards, spreading over all nodes
3:50 PM <bblack> so this is search.svc.(eqiad|codfw).wmnet
3:50 PM <bblack> ?
3:50 PM <bblack> (in LVS terms)
3:50 PM <gehel> and 2 small clusters, over half of the nodes each
3:50 PM <gehel> yes, search.svc
3:51 PM <bblack> and varnish doesn't map into this either I don't think, just internal direct svc->svc traffic
3:51 PM <gehel> yep
3:51 PM <gehel> we're also thinking about taking this opportunity to introduce discovery endpoints, but there are also a couple of loose ends on that side, so maybe better to do it in another step
3:51 PM <bblack> so right now, for the HTTPS traffic, it's all using https://search.svc.(eqiad|codfw).wmnet/ in terms of SNI and Host: headers?
3:52 PM <gehel> yep
3:53 PM <bblack> so, there's a few ways you could do this I guess, at the LVS+backend layer, that seem semi-reasonable:
3:53 PM <bblack> 1) Configure new listeners on new ports on all the backend nodes for some other hostname like search2.svc, and configure that as a separate LVS service and DNS entry and service IP, etc.
3:54 PM <bblack> 2) Re-use the same ports and same IPs and same LVS service, and just create a new hostname search2.svc aliasing the old one, and differentiate in the backends' HTTPS configuration by splitting on SNI (e.g. apache server stanzas for the different hostnames, with different certs?)
3:55 PM <bblack> (but then I suspect you're using puppet certs, I'm not sure if we can even make different certs with different SNIs for the same puppet host?)
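The SNI-splitting idea in option (2) would look roughly like this at the nginx terminator (a sketch only: the second hostname, cert paths, and upstream ports are illustrative assumptions, not the actual puppet-templated config):

```nginx
# Option (2) sketch: one IP, one port, two server blocks selected by
# SNI / Host. All names, paths, and ports below are hypothetical.
server {
    listen 9243 ssl;
    server_name search.svc.eqiad.wmnet;
    ssl_certificate     /etc/ssl/localcerts/search.svc.eqiad.wmnet.crt;
    ssl_certificate_key /etc/ssl/private/search.svc.eqiad.wmnet.key;
    location / {
        proxy_pass http://127.0.0.1:9200;  # large cluster's elasticsearch
    }
}

server {
    listen 9243 ssl;
    server_name search2.svc.eqiad.wmnet;
    ssl_certificate     /etc/ssl/localcerts/search2.svc.eqiad.wmnet.crt;
    ssl_certificate_key /etc/ssl/private/search2.svc.eqiad.wmnet.key;
    location / {
        proxy_pass http://127.0.0.1:9400;  # small cluster's elasticsearch
    }
}
```

Since both server blocks share one IP:port, LVS sees them as a single service, which is the shared-pooling-state drawback discussed next.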
3:56 PM <gehel> I'm suspicious of SNI support in client libraries (I've been bitten before)
3:56 PM <bblack> the main functional difference (aside from 2 being easier to configure if it works at all) is that with (1) you'd have multiple service defs at the lvs+etcd level to depool machines in and out of independently (the same set of machines, but two distinct lists)
3:56 PM <bblack> and with (2) they all stay together in terms of shared pooling-state
3:57 PM <bblack> SNI is pretty ancient at this point, I'd be shocked if anything we can still legitimately run at all doesn't support it
3:58 PM <gehel> also, since not all the elasticsearch clusters spread over all nodes, we probably need different LVS endpoints anyway
3:58 PM <gehel> yes, I know strange setup
3:58 PM <bblack> but they can share ports, or is it a given that each cluster must have unique listen ports when they share machines?
3:59 PM <gehel> there is an nginx doing ssl termination on the nodes, so we can abstract that via nginx
4:00 PM <gehel> but on the elastic side, we're probably opening a world of pain if we try to have them listen on the same port
4:00 PM <gehel> we can probably play with lo:aliases or something like that
4:00 PM <bblack> for some contextual reference: the end-user agents lacking SNI support that anyone bothers to report on are: Android 2.x, IE6-8 on XP, Java 6.
4:00 PM <bblack> ok
4:01 PM <gehel> So SNI probably OK, unless proven otherwise
4:01 PM <bblack> yeah but pointless if you don't want shared lists of backends for all clusters
4:01 PM <gehel> yep
4:01 PM <gehel> unless there is some magic that I don't know about at LVS level
4:02 PM <bblack> there's not, because it doesn't see inside protocols for things like SNI or Host: headers
4:02 PM <bblack> it just routes traffic based on IPs and ports in headers
4:02 PM <gehel> that's what I thought, not enough magic :)
4:02 PM <gehel> so we need either different IPs or different ports
4:02 PM <gehel> or both
4:03 PM <bblack> right, currently you have 1 IP for the whole DC, flowing port 9243 through LVS for HTTPS
4:03 PM <gehel> correct
4:03 PM <bblack> if you kept the same listen IP and differentiated solely on port, that would have to be true on the client end as well
4:04 PM <gehel> yep, that's not an issue
4:04 PM <bblack> e.g. a client would have to know to connect to search.svc:9243 for cluster1 and search.svc:9343 for cluster2
4:04 PM <bblack> and you'd get separate node lists to configure and/or depool nodes from independently
4:04 PM <bblack> or you can make it a separate new IP and hostname like search2.svc, keep your same standardized port number.
4:05 PM <bblack> from that POV, it's all basically the same to LVS, either way works. whatever seems more intuitive or cleaner for client config.
4:05 PM <bblack> oh except if we keep the same IP+port, then you're back to SNI at the apache level to tell the difference
4:05 PM <gehel> So if we differentiate on ports we can still have different host list for the same service name but different ports? I'm sure that's no problem on the LVS side, but do you know if there are some assumptions on the puppet side of things?
4:05 PM <bblack> err sorry, that's not true, I got lost
4:06 PM <bblack> let me redo the above, clearer:
4:07 PM <bblack> 1) You could keep your existing search.svc hostname and its IP, and differentiate solely on port number. Two distinct LVS services, with distinct node lists that may happen to overlap, with separate depooling.
4:08 PM <bblack> 2) You could keep your port number constant, and set up a search2.svc hostname + separate IP to differentiate on. Still two distinct LVS services. Your apache listeners now also differentiate on listen-IP instead of listen-Port from (1), and you still have the distinct, overlappable nodelists.
4:08 PM <bblack> (1) seems very slightly simpler
4:09 PM <bblack> (in that you don't have to go make DNS changes and allocate new IPs every time you make new clusters)
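At the nginx terminator, option (1) is just two TLS listeners on distinct ports, each proxying to its own local elasticsearch instance (a sketch; the second port number and upstream addresses are assumptions):

```nginx
# Option (1) sketch: same hostname and service IP, one TLS port per
# cluster. Port 9443 and the upstream ports are hypothetical.
server {
    listen 9243 ssl;                     # existing large cluster
    server_name search.svc.eqiad.wmnet;
    location / { proxy_pass http://127.0.0.1:9200; }
}

server {
    listen 9443 ssl;                     # new small cluster
    server_name search.svc.eqiad.wmnet;
    location / { proxy_pass http://127.0.0.1:9400; }
}
```

Each port then gets its own LVS service and pool, so nodes can be depooled from one cluster without touching the other.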
4:09 PM <gehel> and find new names for those clusters (you can't imagine the long discussions we've already had just on that topic :)
4:10 PM <bblack> there should be no lvs/puppet/pybal-level issue with that. It's essentially the same pattern we use for e.g. supporting HTTP+HTTPS for public wiki endpoint, or HTTP/HTTPS for your current nginx on search.svc
4:10 PM <bblack> you can think of 9243 and 9343 (or whatever) as being like http+https
4:10 PM <gehel> except the node list is different
4:10 PM <bblack> in terms of how puppet->lvs/pybal->nginx sees the listening/forwarding setup
4:11 PM <bblack> the node list is always different. even for http+https we depool separately
4:11 PM <gehel> ok, so all good (I have not checked the code lately)
4:11 PM <bblack> search-http and search-https are the same
4:11 PM <gehel> Ok, that's a very nice transition for my second question:
4:11 PM <bblack> (they use separate confd pools "elasticsearch" and "elasticsearch-ssl")
4:12 PM <gehel> since we want to have SSL termination on different ports, and that's not supported by our current puppet code, it needs a fix
4:12 PM <gehel> I'm looking for a reviewer and you seem to have written a good chunk of that class
4:13 PM — bblack disavows all knowledge
4:13 PM — gehel can paste a `git blame` if need be
4:14 PM <vgutierrez> lol
4:14 PM <bblack> I think the only thing you have to be careful of, is if you define multiple tlsproxy listeners that differentiate on port instead of ip, you can't use tlsproxy's "redir_port" to auto-redirect port 80.
4:14 PM <bblack> probably that didn't work for non-default TLS ports even in the single-listener case
4:14 PM <volans> yeah we talked about that yesterday
4:15 PM <volans> apparently redir_port is not used, right gehel ?
4:15 PM <bblack> it's not used anywhere, yet
4:15 PM <gehel> yep, not used anywhere as far as I can grep :)
4:15 PM <bblack> it's planned to be used in tlsproxy's primary use-case on the edge caches though
4:15 PM <bblack> once we get all the non-canonical domains securely redirected from elsewhere, which in turn is dependent on certcentral :P
4:16 PM <bblack> it's been a very long time coming heh
4:16 PM <gehel> but there isn't any sane way to redirect if there are multiple TLS ports for a single HTTP port
4:16 PM <volans> I see two options, either go towards forcing to have only one port if redir is used, or move it to an hash of redir ports
4:16 PM <volans> dest => src
4:16 PM <bblack> right
4:17 PM <gehel> I think (not tested) that the current code would work already if you have a 1:1 mapping of redir_port:tls_port
4:17 PM <volans> the only problem puppet wise is to check consistency as this is a define
4:17 PM <bblack> gehel: it wouldn't, because the template for the redirect doesn't even try to set a new port in the redirect destination URI
4:17 PM <bblack> it just uses https://$servername$1 or whatever
4:18 PM <bblack> you could fix that, where you plug in a :port variable there if the tls port isn't 443
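The template fix bblack sketches would amount to making the redirect target port-aware, along these lines (illustrative only, not the actual tlsproxy template; the hostname and port are assumed values):

```nginx
# Port 80 vhost emitting a port-aware redirect. The ":9243" would be
# interpolated by the puppet template only when the TLS port != 443.
server {
    listen 80;
    server_name search.svc.eqiad.wmnet;
    return 301 https://$host:9243$request_uri;
}
```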
4:18 PM <gehel> Oh right, so it only even works if the TLS port is 443?
4:18 PM <bblack> but then the conflict you still run into is that two such localssl's differing only by TLS port conflict in their port 80 listeners
4:19 PM <bblack> since almost surely nobody will ever actually do any of this in practice
4:19 PM <gehel> unless you have redir_port != 80 (which does not really make any sense)
4:19 PM <wikibugs> netops, Cloud-VPS, Operations, cloud-services-team, ops-eqiad: Rack/cable/configure asw2-b-eqiad switch stack - (ayounsi)
4:19 PM <bblack> probably the simplest approach would be to put a short snippet in the define that makes it break/fail if the tls_port != 443 and redir_port is defined
4:20 PM <gehel> and anyway, how would you do the routing to the right port?
4:20 PM <gehel> yeah, I can add that
4:20 PM <bblack> and then we don't have to mess with the redirect template, and in the multi-port-listeners case one of them has to be non-443 and it would break that way too
4:20 PM <volans> +1 to start simple and just fail if overused
4:23 PM <gehel> ok, change amended
4:23 PM <bblack> the other pathway is complicated: we'd have to template in the non-443 redirect destination, and also require unique redir_ports if defined (by I guess making a $redir_port_str that says "undefined" if not defined, and putting that in the notify hack too)
4:24 PM <bblack> or I guess, actually in a separate notify hack, or something, since that one only covers the default_server
4:24 PM <bblack> I don't know
4:24 PM <gehel> and we don't even know of a use case that would exercise that for real
4:25 PM <bblack> someday port 80 everywhere will just return ECONNREFUSED
4:26 PM <bblack> but more-seriously: are you merging that today? (the tlsproxy patch)
4:26 PM <bblack> it's super trivial, but any change to tlsproxy carries Risks like "omg you broke nginx on all the public terminators"
4:27 PM <vgutierrez> nice jedi mental trick on gehel, bblack
4:27 PM <bblack> at least pcc-check on some random cpNNNN hosts probably, and maybe have someone around to do a quick puppet run post-merge on a cache to validate it went fine.
4:27 PM <bblack> these are not the redir_ports you are looking for
4:28 PM <gehel> I'll make sure there is someone around when merging!