
Remove wildcard vhost for *.wikimedia.org
Closed, Declined (Public)

Description

In https://gerrit.wikimedia.org/r/#/c/424707/ I want to make the wikimedia wwwportal work in beta (T173887), but I don't want to introduce an extra apache config file just for beta to do so. The fact that our beta and prod apache config differ so widely is bad, and I'm not going to add to that. So I'm looking for a way to share the configuration that is currently used in prod with beta.

Unfortunately, there is a major pitfall here: the relevant vhost has *.wikimedia.org as a ServerAlias. We can't just move this one over to the shared config; the previous attempt to do so is documented in detail in an incident report.

My proposed solution is to remove the wildcard *.wikimedia.org altogether. To do so, I identified the domains that rely on it (see below) and suggest turning those into plain redirects to www.wikimedia.org in redirects.conf.

The impact of this will be that subdomains added to the wikimedia.org DNS in the future will have to be added to an Apache vhost explicitly. If that's not done, they'll just show the 'Domain not configured' page instead of falling back to www.wikimedia.org automatically. However, I think that's a good thing, because it means you'll have to explicitly think about and apply a change to make a website appear for that domain, which seems to be true for all or most of our other domains already. It will also reduce the chance of accidentally repeating the above incident to zero.
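
Once the wildcard is gone, a DNS name without an explicit vhost would no longer fall back silently, so it may be useful to check for such names before a deploy. The following is only a rough sketch under stated assumptions: `unconfigured_hosts` is a hypothetical helper, and the two input files (one hostname per line, one file taken from DNS and one listing the names covered by explicit vhosts or redirects) are not existing files in any repo.

```sh
#!/bin/sh
# Hypothetical helper: print every hostname from the first file ($1, names
# found in DNS) that has no exact match in the second file ($2, names that
# have an explicit vhost or redirect).
# -v inverts the match, -x requires whole-line matches, -F treats the
# patterns as fixed strings, -f reads the patterns from a file.
unconfigured_hosts() {
    grep -vxF -f "$2" "$1"
}

# Usage, with assumed file names:
#   unconfigured_hosts dns-names.txt vhost-names.txt
```

Any name printed by such a check would show the 'Domain not configured' page after the change.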


Looking in the operations/dns repo, there are 202 $foo.wikimedia.org domains that are routed into the main apache cluster (assuming all of these use geoip!text-addrs in our dns template):

eddie@eddie-thinkpad:~/develop/operations/dns/templates (master)$ ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | wc -l
202

Let's test for something that the *.wikimedia.org vhost (the one in use right now) redirects, for example $foo.wikimedia.org/ch-portal/, and see which status code each of these 202 domains replies with for that path:

ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | sort | xargs -I '{}' sh -c 'echo {}; curl -I https://{}.wikimedia.org/ch-portal/ -k 2>/dev/null;' | grep HTTP | sort | uniq -c
     53 HTTP/2 301
    149 HTTP/2 404
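
The same tally can be written in a more readable form. This is a minimal sketch, not the command that was actually run: `probe_status`, `tally_status` and the `hosts.txt` input file are hypothetical names, and it relies on curl's `-w '%{http_code}'` write-out variable instead of grepping the header output.

```sh
#!/bin/sh
# Print the HTTP status code a host returns for a given path.
# -s silences progress output, -k skips certificate verification (as in the
# command above), -I sends a HEAD request, -o /dev/null discards the headers,
# and -w prints only the numeric status code.
probe_status() {
    curl -s -k -I -o /dev/null -w '%{http_code}' "https://$1.wikimedia.org$2"
}

# Read "host status" pairs on stdin and print one "count status" line per
# distinct status code, sorted by status code.
tally_status() {
    awk '{ count[$2]++ } END { for (s in count) print count[s], s }' | sort -k2,2
}

# Usage (performs real requests, so it is left commented out; hosts.txt is an
# assumed file with one subdomain per line):
#   while read -r host; do
#       printf '%s %s\n' "$host" "$(probe_status "$host" /ch-portal/)"
#   done < hosts.txt | tally_status
```

Keeping the probing and the counting separate means the counting step can be re-run on saved results without hitting the servers again.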

Everything replying with the status code 404 can't be using the *.wikimedia.org vhost, because that vhost redirects this URI, right? So it's safe to go ahead and grep for 'HTTP/2 301'. Let's see where those redirect:

eddie@eddie-thinkpad:~/develop/operations/dns/templates (master)$ ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | sort | xargs -I '{}' sh -c 'echo {}; curl -I https://{}.wikimedia.org/ch-portal/ -k 2>/dev/null;' | grep 'HTTP/2 301' -B1 -A4 | grep 'location' | sort | uniq -c
      2 location: https://affcom.wikimedia.org/ch-portal/
      1 location: https://blog.wikimedia.org/ch-portal/
      1 location: https://doc.wikimedia.org/
      1 location: https://dumps.wikimedia.org/ch-portal/
      2 location: https://ee.wikimedia.org/ch-portal/
      1 location: https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol
      1 location: https://meta.wikimedia.org/wiki/%EC%9C%84%ED%82%A4%EB%AF%B8%EB%94%94%EC%96%B4_%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD
      1 location: https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017
      1 location: https://meta.wikimedia.org/wiki/Wikipedia_to_the_Moon
      2 location: https://nostalgia.wikipedia.org/ch-portal/
      1 location: https://outreach.wikimedia.org/wiki/Bookshelf/ch-portal/
      1 location: https://outreach.wikimedia.org/wiki/Special:MyLanguage/Education
      1 location: https://phabricator.wikimedia.org/diffusion/
      1 location: https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:WU
      2 location: https://static-bugzilla.wikimedia.org/ch-portal/
      1 location: https://store.wikimedia.org/
      1 location: https://wikimania2018.wikimedia.org/ch-portal/
      2 location: https://wikimediafoundation.org/wiki/Job_openings/ch-portal/
      1 location: https://wikimedia.org.uk/ch-portal/
      2 location: https://wikitech.wikimedia.org/
      1 location: https://www.mediawiki.org/wiki/API:Data_and_developer_hub
      1 location: https://www.wikimedia.ch/
      1 location: https://www.wikimedia.org/ch-portal/
      1 location: http://wiki.media.hu/ch-portal/
      1 location: http://wikimedia.org.ve/ch-portal/
      1 location: http://wikimediapakistan.org/
     18 location: http://wikipedia.ch/
      2 location: http://www.wikimedia.cz/
      1 location: http://www.wikimedia.it/ch-portal/

So we're talking about 18 subdomains that redirect to wikipedia.ch. Let's see which ones these are:

eddie@eddie-thinkpad:~/develop/operations/dns/templates (master)$ ack text-addrs wikimedia.org | awk '{if(NR>1)print $1}' | sort | xargs -I '{}' sh -c 'echo {}; curl -I https://{}.wikimedia.org/ch-portal/ -k 2>/dev/null;' | grep 'HTTP/2 301' -B1 -A4 | grep 'location: http://wikipedia.ch/' -B5
benefactors
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:13 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
cache
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:15 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
comcom
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:19 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
donate-lb.codfw
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:22 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
donate-lb.eqiad
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:23 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
donate-lb.eqsin
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:24 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
donate-lb.esams
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:24 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
donate-lb.ulsfo
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:25 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
langcom
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:33 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:50 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb.codfw
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:50 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb.eqiad
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:51 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb.eqsin
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:52 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb.esams
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:52 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
text-lb.ulsfo
HTTP/2 301
date: Sat, 14 Apr 2018 12:56:53 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
wikimania2019
HTTP/2 301
date: Sat, 14 Apr 2018 12:57:04 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
wikimania2019.m
HTTP/2 301
date: Sat, 14 Apr 2018 12:57:04 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/
--
www
HTTP/2 301
date: Sat, 14 Apr 2018 12:57:05 GMT
content-type: text/html; charset=iso-8859-1
content-length: 228
location: http://wikipedia.ch/


So 18 hostnames are actually using the *.wikimedia.org vhost:

  • benefactors
  • cache
  • comcom
  • donate-lb.codfw
  • donate-lb.eqiad
  • donate-lb.eqsin
  • donate-lb.esams
  • donate-lb.ulsfo
  • langcom
  • text-lb
  • text-lb.codfw
  • text-lb.eqiad
  • text-lb.eqsin
  • text-lb.esams
  • text-lb.ulsfo
  • wikimania2019
  • wikimania2019.m
  • www
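
A list like the one above does not have to be read off the curl output by hand. The following is a minimal sketch, assuming the `echo {}; curl -I ...` output has been saved in the layout shown earlier (a hostname line, then the status line and headers); `hosts_redirecting_to` and `probe-output.txt` are hypothetical names used only for illustration.

```sh
#!/bin/sh
# Read saved "hostname + response headers" blocks on stdin and print each
# hostname whose Location header equals the target URL given as $1.
hosts_redirecting_to() {
    awk -v target="$1" '
        { sub(/\r$/, "") }                # curl prints headers with CRLF endings
        /^HTTP\// || /^--$/ { next }      # skip status lines and grep separators
        tolower($1) == "location:" {      # header names are case-insensitive
            if ($2 == target) print host
            next
        }
        /^[^:]+$/ { host = $0 }           # a line without a colon is a hostname
    '
}

# Usage, with an assumed file holding the saved output:
#   hosts_redirecting_to 'http://wikipedia.ch/' < probe-output.txt
```

The hostname heuristic (a line without a colon) holds for the output format shown in this task; other header layouts may need a stricter check.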

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 424707 had a related patch set uploaded (by EddieGP; owner: EddieGP):
[operations/puppet@production] mediawiki: Move www.wikimedia.org to wwwportals.conf

https://gerrit.wikimedia.org/r/424707

Assigning to joe - it seems you're the one most comfortable (or the only one comfortable?) with apache changes. Also, given your previous -2 on the patch, it's blocked on you anyway.

FWIW on my end, the following hostnames are definitely non-functional:

text-lb
text-lb.codfw
text-lb.eqiad
text-lb.eqsin
text-lb.esams
text-lb.ulsfo

I would imagine the donate-lb entries are similar, but check with @Jgreen maybe?

Also, cache looks suspicious. It's been there since before git history began, but seems to have no wiki or other useful content AFAICS. It's probably similar in nature to text-lb but has long been ignored and fallen out of use?

For all of these cases, I don't think you need to even define an apache-level vhost. They're at best for technical debugging/documentation use, and not expected to provide anything other than an error or generic redirect/landing if accessed as an HTTP-level hostname. (Additionally, we can never actually cover all such cases anyways. UAs can and do make up fake/random hostnames to throw at us, new debugging hostnames in our DNS can appear at random, etc).

If we remove all of these things, your list shortens considerably to:

benefactors
comcom
langcom
wikimania2019
wikimania2019.m
www

My guess is that the donate-lb.* records were part of a deprecated CNAME-pointed-at-colo-specific-A-record scheme for donate.wikimedia.org. I see no reason for the donate-lb.* records to remain as long as donate.wm.o resolves to the right IP.

@Jgreen yeah if you don't have any special purpose for them, then they're basically the same as the text-lb ones (we like having those in DNS so we don't have to remember IPs when doing certain kinds of manual debugging against specific sites, etc, but they're not functionally-used in any public way).

Right, I already wondered whether we need them or they can be removed. I pushed that idea back because I don't want to mix it with the commit getting rid of the wildcard vhost (to have smaller steps, one removing the wildcard vhost and replacing it with redirects, the other to remove the unwanted redirects). I'll upload another patch in the correct relationship when I find some time.

Joe triaged this task as Low priority. (Jun 20 2018, 7:23 AM)

Change 424707 abandoned by EddieGP:
mediawiki: Move www.wikimedia.org to wwwportals.conf

Reason:
Nobody interested in this.

https://gerrit.wikimedia.org/r/424707