In https://gerrit.wikimedia.org/r/#/c/424707/ I want to make the Wikimedia www portal work in beta (T173887), but I don't want to add an extra Apache config file just for beta to do so. Our beta and prod Apache configs already differ far too much, and I'm not going to widen that gap. So I'm looking for a way to share the configuration currently used in prod with beta.
Unfortunately, there is a major pitfall here: the relevant vhost has *.wikimedia.org as a ServerAlias. We can't just move it over to the shared config; the previous attempt to do so is documented in detail in an incident report.
My proposed solution is to kill the *.wikimedia.org wildcard altogether. To do so, I identified the domains that rely on it (see below) and suggest turning those into plain redirects to www.wikimedia.org in redirects.conf.
The impact of this is that subdomains added to the wikimedia.org DNS zone in the future will have to be added to an Apache vhost explicitly. If that's not done, they'll show the 'Domain not configured' page instead of automatically falling back to www.wikimedia.org. However, I think that's a good thing: it means someone has to deliberately think about and apply a change to make a website appear on that domain, which already seems to be the case for all or most of our other domains. It will also reduce the chance of accidentally repeating the incident above to zero.
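With the wildcard gone, a subdomain that exists in DNS but is named in no vhost will show 'Domain not configured'. A minimal sketch of a check for such gaps, assuming the DNS names come as one label per line (for example from the ack/awk pipeline below) and that vhosts declare their names via ServerName/ServerAlias. The function name `missing_vhosts` and the file layout are hypothetical. Wildcard aliases are ignored on purpose, since the point is to find names without an explicit vhost.

```sh
#!/bin/sh
# Hypothetical check: print *.wikimedia.org labels that exist in DNS but are
# not named by any ServerName/ServerAlias directive in the given config files.
#   $1    file with one subdomain label per line (e.g. "comcom")
#   $2... one or more Apache config files to scan
# Wildcard aliases such as *.wikimedia.org are deliberately ignored.
missing_vhosts() {
    dns_list="$1"; shift
    tmp_dns=$(mktemp) && tmp_conf=$(mktemp) || return 1
    sort -u "$dns_list" > "$tmp_dns"
    # Collect every name after ServerName/ServerAlias, strip the
    # .wikimedia.org suffix, drop wildcards, and sort for comm.
    awk 'tolower($1) == "servername" || tolower($1) == "serveralias" {
             for (i = 2; i <= NF; i++) print $i
         }' "$@" \
        | sed -n 's/\.wikimedia\.org$//p' \
        | grep -v '\*' \
        | sort -u > "$tmp_conf"
    # Lines only in the DNS list have no explicit vhost.
    comm -23 "$tmp_dns" "$tmp_conf"
    rm -f "$tmp_dns" "$tmp_conf"
}
```

For example, `missing_vhosts dns-names.txt /path/to/sites/*.conf` would list the labels still lacking a vhost; both paths are placeholders.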
Looking in the operations/dns repo, there are 202 $foo.wikimedia.org domains that are routed to the main Apache cluster (assuming all of these use geoip!text-addrs in our DNS template):
eddie@eddie-thinkpad:~/develop/operations/dns/templates (master)$ ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | wc -l
202
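The ack/awk pipeline above keeps the first field of every line in the wikimedia.org template that mentions text-addrs, skipping the first match (NR>1). A POSIX stand-in using grep, sketched under the assumption that ack prints the bare matching lines for a single file (which the awk step relies on); the function name `list_text_subdomains` is hypothetical.

```sh
#!/bin/sh
# Hypothetical grep-based stand-in for:
#   ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}'
# Reads zone-template lines on stdin, keeps those mentioning text-addrs,
# and prints their first field (the record name). NR > 1 skips the first
# match, exactly as the original awk does.
list_text_subdomains() {
    grep 'text-addrs' | awk 'NR > 1 { print $1 }'
}
```

From the templates directory, `list_text_subdomains < wikimedia.org | wc -l` should then print the same count as the ack version.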
Let's test a path that the *.wikimedia.org vhost currently redirects, for example $foo.wikimedia.org/ch-portal/, and look at the status code each of these 202 domains replies with for that path:
ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | sort | xargs -I '{}' sh -c 'echo {}; curl -I https://{}.wikimedia.org/ch-portal/ -k 2>/dev/null;' | grep HTTP | sort | uniq -c
     53 HTTP/2 301
    149 HTTP/2 404
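The pipeline above echoes each domain name but then drops it with `grep HTTP`, so the counts can't be traced back to subdomains. A sketch of a variant that keeps each status code next to its subdomain, relying on curl's `-w '%{http_code}'` write-out option; the helper names `probe_status` and `count_codes` are hypothetical.

```sh
#!/bin/sh
# Hypothetical variant of the probe above: print "<subdomain> <status>" per
# line so each code stays attached to its subdomain. -k mirrors the
# original's disabled certificate check.
probe_status() {
    while read -r sub; do
        # -I sends a HEAD request like the original; -o /dev/null discards
        # the headers and -w prints only the numeric status code.
        # Redirects are not followed, so a 301 is reported as 301.
        code=$(curl -k -s -I -o /dev/null -w '%{http_code}' \
            "https://${sub}.wikimedia.org/ch-portal/")
        printf '%s %s\n' "$sub" "$code"
    done
}

# Tally the status codes (second column), like the original's sort | uniq -c.
count_codes() {
    awk '{ print $2 }' | sort | uniq -c
}
```

Usage: pipe the sorted subdomain list into `probe_status > status.txt`, then run `count_codes < status.txt` for the tally and `awk '$2 == 301 { print $1 }' status.txt` for the subdomains answering 301.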
Anything replying with 404 can't be using the *.wikimedia.org vhost, because that vhost redirects this URI, right? So it's safe to just grep for 'HTTP/2 301'. Let's see where those redirect:
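The command used for this step isn't shown here. One possible way to do it, sketched with curl's `-w '%{redirect_url}'` write-out variable, which reports the Location target without following it. The helper names are hypothetical, and this is not necessarily how the list below was produced.

```sh
#!/bin/sh
# Hypothetical helper: for each subdomain on stdin, print
# "<subdomain> <redirect target>". Redirects are not followed.
probe_redirect() {
    while read -r sub; do
        target=$(curl -k -s -I -o /dev/null -w '%{redirect_url}' \
            "https://${sub}.wikimedia.org/ch-portal/")
        printf '%s %s\n' "$sub" "$target"
    done
}

# Keep only the subdomains whose redirect target mentions wikipedia.ch.
only_wikipedia_ch() {
    awk '$2 ~ /wikipedia\.ch/ { print $1 }'
}
```

Feed it the subdomains that answered 301, one per line, e.g. `probe_redirect < subdomains-301.txt | only_wikipedia_ch` (the file name is a placeholder).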
So we're talking about 18 subdomains that redirect to wikipedia.ch (the other 35 of the 53 redirect elsewhere). Let's see which ones they are:
So these 18 subdomains are actually using the *.wikimedia.org vhost:
- benefactors
- cache
- comcom
- donate-lb.codfw
- donate-lb.eqiad
- donate-lb.eqsin
- donate-lb.esams
- donate-lb.ulsfo
- langcom
- text-lb
- text-lb.codfw
- text-lb.eqiad
- text-lb.eqsin
- text-lb.esams
- text-lb.ulsfo
- wikimania2019
- wikimania2019.m
- www
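The proposal above is to turn these into plain redirects to www.wikimedia.org in redirects.conf. A sketch of generating such a vhost from the list, assuming a plain mod_alias `Redirect permanent` stanza on port 80; the helper name and stanza layout are hypothetical, and the real redirects.conf may be structured differently. www is skipped, since redirecting www.wikimedia.org to itself would loop; it presumably stays on the portal vhost.

```sh
#!/bin/sh
# Hypothetical generator: print one Apache vhost that 301-redirects every
# given *.wikimedia.org subdomain to https://www.wikimedia.org/.
# Stanza layout and port are assumptions, not the real redirects.conf format.
gen_redirect_vhost() {
    first=""
    for sub in "$@"; do
        # www must not redirect to itself.
        [ "$sub" = "www" ] && continue
        if [ -z "$first" ]; then
            first="$sub"
            printf '<VirtualHost *:80>\n'
            printf '    ServerName %s.wikimedia.org\n' "$sub"
        else
            printf '    ServerAlias %s.wikimedia.org\n' "$sub"
        fi
    done
    # Only www (or nothing) was given: nothing to redirect.
    if [ -z "$first" ]; then
        return 0
    fi
    printf '    Redirect permanent / https://www.wikimedia.org/\n'
    printf '</VirtualHost>\n'
}
```

For example, `gen_redirect_vhost benefactors comcom langcom www` prints a vhost with benefactors as ServerName and comcom and langcom as aliases. Note that mod_alias's Redirect appends the rest of the request path to the target, so comcom.wikimedia.org/foo would go to https://www.wikimedia.org/foo.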