
Put lists.wikimedia.org web interface behind LVS
Open, Medium, Public

Description

As discussed in T278495: Figure out plan for mailman IP situation, we should put lists.wikimedia.org's web interface behind LVS. Exim/mail is excluded since we might go a different route for that: T232343#7059925.

Currently, we get a TLS cert from acme-chief, and Apache redirects nearly all HTTP traffic to HTTPS, where the HTTPS virtual host handles a number of routing rules and redirects.

We probably want to end up with Apache serving plain HTTP only, and envoy terminating HTTPS between Apache and LVS/the caches.

Event Timeline

Legoktm raised the priority of this task from Low to Medium. Nov 3 2021, 3:05 AM

Recent events mean we should probably do this sooner rather than waiting. The one catch is that mail delivery depends on the web server being up. Currently, because of issues serving it over localhost, Mailman connects to itself via https://lists.wikimedia.org/. That is not terrible while the request just loops back to the same host, but we probably don't want to keep doing it once traffic goes through LVS.

> because of issues serving it over localhost, Mailman connects to itself over https://lists.wikimedia.org/ which is not terrible when it just loops back to itself, but we probably don't want to keep doing that if we're going through LVS.

These "issues" are T190111: VirtualHost for mod_status breaks debugging Apache/MediaWiki from localhost (on jobrunners).

Setting up envoy as a tlsproxy should be straightforward. The one thing I'm not sure about is how to have it talk to Apache over HTTP, since Apache currently enforces the HTTPS redirect; see https://gerrit.wikimedia.org/g/operations/puppet/+/61f20b6b6c2478f782b53fb31ce95756441f8bdc/modules/profile/templates/lists/apache.conf.erb#7

Or should we have envoy talk to Apache over HTTPS during the migration period? Or...
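One common pattern for the envoy-to-Apache-over-HTTP question is sketched below purely as an illustration, not as what this task settled on. The TLS proxy sets `X-Forwarded-Proto: https`, and Apache skips its HTTPS redirect only when that header comes from a trusted local proxy. The Python function models that decision. Its name, the loopback-only trust list, and the assumption that envoy runs on lists1001 itself are all hypothetical. In Apache, the same check would typically be expressed as mod_rewrite conditions on `%{HTTP:X-Forwarded-Proto}`, ideally also checking `%{REMOTE_ADDR}`.

```python
from ipaddress import ip_address

# Hypothetical assumption: envoy runs on the same host as Apache, so only
# loopback peers are trusted to assert the client's original scheme.
TRUSTED_PROXIES = {ip_address("127.0.0.1"), ip_address("::1")}


def should_redirect_to_https(scheme: str, remote_addr: str, headers: dict[str, str]) -> bool:
    """Return True if Apache should answer with a redirect to https://."""
    if scheme == "https":
        # TLS was terminated by Apache itself (the current setup): no redirect.
        return False
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded_proto = normalized.get("x-forwarded-proto", "").lower()
    if ip_address(remote_addr) in TRUSTED_PROXIES and forwarded_proto == "https":
        # Plain HTTP from the local TLS proxy; the client already used HTTPS.
        return False
    # Anything else is a genuine plain-HTTP request (or an untrusted header).
    return True


if __name__ == "__main__":
    # Request relayed by a local envoy after TLS termination: serve it directly.
    print(should_redirect_to_https("http", "127.0.0.1", {"X-Forwarded-Proto": "https"}))
    # Same header sent by an arbitrary remote client: still redirect.
    print(should_redirect_to_https("http", "203.0.113.5", {"X-Forwarded-Proto": "https"}))
```

Restricting trust to the proxy's source address matters because any client can send an `X-Forwarded-Proto` header. Honoring it unconditionally would let plain-HTTP requests bypass the redirect.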

@Joe and I discussed how to do this today. To recap, the goals are to work towards T278495: Figure out plan for mailman IP situation (eliminating the special-case IP), and to sit behind the caching layer so we can take advantage of its facilities for rate limiting, IP/UA blocking, etc. We are also constrained by the fact that Mailman is not HA, and it would be more convenient to keep the mailman3 and mailman3-web services on the same host.

Joe said that, given those constraints, there's not much value in going behind LVS. Instead, we could just have ATS route lists.wikimedia.org directly to lists1001 (Apache would likely keep doing TLS, but probably needs a different cert?). We'd change DNS so the A/AAAA records point to dyna, while MX would still point directly to lists1001.

@BBlack we'd like your input/feedback on this, especially the DNS parts.

@Legoktm it looks like the easiest approach would be adding lists1001 as a backend server on ATS and setting the caching policy to pass. Under this scenario, the lists.wikimedia.org TLS certificate should be a private one issued by our PKI rather than an acme-chief/LE one. After that, we should drop the A/AAAA records and just add a DYNA record like this:

lists      600 IN DYNA geoip!text-addrs

The usual approach would be adding a CNAME to dyna, but that isn't feasible here because you still need the MX records at the same name, and a CNAME cannot coexist with any other record at its name. The MX records should point directly to lists1001.wikimedia.org rather than to lists.wikimedia.org as they do now.
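The reason a CNAME is ruled out comes from a standard DNS rule (RFC 1034 section 3.6.2, RFC 2181 section 10.1): a name that carries a CNAME must not carry any other data, so `lists` cannot have both a CNAME and an MX. The hypothetical helper below checks a list of (owner, type) pairs for that conflict. It treats DYNA as just another record type, relying on the comment's premise that a DYNA record can sit alongside MX at the same name. It is an illustrative sketch, not part of the actual DNS tooling.

```python
from collections import defaultdict


def cname_conflicts(records: list[tuple[str, str]]) -> list[str]:
    """Return owner names where a CNAME shares the name with another record type.

    Hypothetical helper. DNSSEC types (RRSIG, NSEC) are allowed alongside a
    CNAME in real zones, but this sketch does not special-case them.
    """
    types_by_owner: dict[str, set[str]] = defaultdict(set)
    for owner, rtype in records:
        # DNS names compare case-insensitively.
        types_by_owner[owner.lower()].add(rtype.upper())
    return sorted(
        owner
        for owner, types in types_by_owner.items()
        if "CNAME" in types and len(types) > 1
    )


# Proposed layout: DYNA (dynamic address records) plus MX at the same name.
proposed = [("lists", "DYNA"), ("lists", "MX")]
# The "usual" layout: a CNAME to dyna, which would collide with the MX.
usual = [("lists", "CNAME"), ("lists", "MX")]

print(cname_conflicts(proposed))  # []
print(cname_conflicts(usual))     # ['lists']
```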