VirtualHost for mod_status breaks debugging Apache/MediaWiki from localhost
Open, NormalPublic

Description

Various pages on Wikitech document that one can locally emulate external requests to Apache using the Host-header idiom. This makes sense.

For example, at https://wikitech.wikimedia.org/wiki/Application_servers and https://wikitech.wikimedia.org/wiki/Debugging_in_production#Locally. Typically something like:

mwdebug1001$ curl -H 'Host: en.wikipedia.org' "http://localhost/"
...
HTTP/1.1 301 Moved Permanently
Server: mwdebug1001.eqiad.wmnet
Location: https://en.wikipedia.org/wiki/Main_Page

or

mwdebug1001$ curl -H 'Host: en.wikipedia.org' "http://localhost/w/load.php"
...
HTTP/1.1 200 OK
Server: mwdebug1001.eqiad.wmnet
..
.. This file is the entry point for ResourceLoader ..

However, as of writing, this is not working. Instead, virtually any attempted url yields a 404 Not Found.

404 Not Found
mwdebug1002:~$ curl -v -H 'Host: test.wikipedia.org' "http://localhost/w/load.php"
* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 80 (#0)
> GET /w/load.php HTTP/1.1
> User-Agent: curl/7.38.0
> Accept: */*
> Host: test.wikipedia.org
> 
< HTTP/1.1 404 Not Found
< Date: Mon, 19 Mar 2018 23:43:16 GMT
* Server mwdebug1002.eqiad.wmnet is not blacklisted
< Server: mwdebug1002.eqiad.wmnet
< Content-Length: 327
< Content-Type: text/html; charset=iso-8859-1
< 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /w/load.php was not found on this server.</p>
<p>Additionally, a 404 Not Found
error was encountered while trying to use an ErrorDocument to handle the request.</p>
</body></html>
* Connection #0 to host localhost left intact

This has broken at several points over the past few years, and each time we found a workaround. At one point, I recall, it was important (for forgotten reasons, something about HTTPS) to leave the URL unchanged and instead swap the TCP destination by overriding DNS resolution, as follows:

$ curl -v --resolve 'test.wikipedia.org:80:127.0.0.1' "http://test.wikipedia.org/w/load.php"

But this too doesn't work.

At another point, it was important to include -H 'X-Forwarded-Proto: https'. I think that's still the case for some things, but at the Apache level, most things support both now, with Vary.

I've tried many different variations; none work.

  • curl -v -H 'Host: test.wikipedia.org' "http://localhost/w/load.php" (plain, with -6, with -g, with -g6)
  • curl -v -H 'Host: test.wikipedia.org' "http://127.0.0.1/w/load.php"
  • curl -v -H 'Host: test.wikipedia.org' "http://[::1]/w/load.php" (plain, with -6, with -g, with -g6)
  • curl -v --resolve 'test.wikipedia.org:80:127.0.0.1' "http://test.wikipedia.org/w/load.php"
  • curl -v --resolve 'test.wikipedia.org:80:::1' "http://test.wikipedia.org/w/load.php" (plain, with -6, with -g, with -g6)
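
The variations above can be probed in one pass. This is a minimal sketch, not part of the original report: the helper names probe_status and probe_targets are hypothetical, and only standard curl options are used (-s, -g, -o, -w, -H).

```shell
# Print only the HTTP status code for one Host-header request.
# $1 = value for the Host header, $2 = URL to connect to.
# -g turns off URL globbing so that IPv6 literals such as [::1] are taken as-is.
probe_status() {
  curl -s -g -o /dev/null -w '%{http_code}' -H "Host: $1" "$2"
}

# Probe the same path through several destination addresses,
# printing "<destination> <status>" on one line per destination.
# $1 = Host header, $2 = path, remaining arguments = destinations.
probe_targets() {
  local host_header="$1" path="$2" target
  shift 2
  for target in "$@"; do
    printf '%s %s\n' "$target" "$(probe_status "$host_header" "http://${target}${path}")"
  done
}
```

For example, `probe_targets test.wikipedia.org /w/load.php localhost 127.0.0.1 '[::1]' "$(hostname -i)"` prints one status code per destination, which makes it easier to spot which destination addresses behave differently.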

Eventually, I tried it from a different host to see if that would work. To my surprise, it did:

mwdebug1002$ curl -v -H 'Host: test.wikipedia.org' "http://mwdebug1001.eqiad.wmnet/w/load.php"
..
HTTP/1.1 200 OK
Server: mwdebug1001.eqiad.wmnet
..

It also works from mwdebug1001 itself, and it works when using mwdebug's local 10.x IP address. These all do work:

  • mwdebug1002$ curl -v -H 'Host: test.wikipedia.org' "http://mwdebug1001.eqiad.wmnet/w/load.php"
  • mwdebug1001$ curl -v -H 'Host: test.wikipedia.org' "http://mwdebug1001.eqiad.wmnet/w/load.php"
  • mwdebug1001$ curl -v -H 'Host: test.wikipedia.org' "http://10.64.32.123/w/load.php"
  • mwdebug1001$ curl -v --resolve 'test.wikipedia.org:80:10.64.32.123' "http://test.wikipedia.org/w/load.php"

The first thing that came to mind at this point is that maybe something is doing the opposite of Require local and denying connections for all production sites from locally initiated connections. However, even if such a thing were to exist, there are two things contradicting it:

  1. Locally initiating was still possible when using the local IP.
  2. It responds with our custom default VirtualHost, not with an error page.

This last point is important. Removing the path component of the URL and requesting the document root shows that it does actually match one of our VirtualHost configurations, just not the one it is supposed to.

mwdebug1002:~$ curl -v -H 'Host: test.wikipedia.org' "http://localhost/"
* Connected to localhost (::1) port 80 (#0)
> Host: test.wikipedia.org
>  [..]
< HTTP/1.1 200 OK [..]
< Server: mwdebug1002.eqiad.wmnet [..]
< 
<!DOCTYPE html>
<html lang=en>
<meta charset="utf-8">
<title>Unconfigured domain</title>
<link rel="shortcut icon" href="//wikimediafoundation.org/favicon.ico">
<style> [..]

This is /srv/mediawiki/docroot/default/index.html as configured by puppet:/modules/mediawiki/files/apache/sites/nonexistent.conf.

So how come it is matching that one but not the main ones?

Event Timeline

Krinkle created this task. Mar 19 2018, 11:58 PM
Restricted Application added a subscriber: Aklapper. Mar 19 2018, 11:58 PM
Krinkle added a comment. Edited · Mar 20 2018, 12:04 AM

Further testing shows that while it matches the custom default VirtualHost on mwdebug1001 and mwdebug1002, it behaves differently on a pooled app server (e.g. mw1299). There it responds with the Debian default page:

mw1299:~$ curl -v -H 'Host: test.wikipedia.org' "http://localhost/"
* Connected to ::1 (::1) port 80 (#0) [..]
> Host: www.mediawiki.org
>  [..]
< Server: Apache [..]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <title>Apache2 Debian Default Page: It works</title>

I still don't know why the fallback happens, why that fallback varies between mwdebug and app servers, or why there is no error message. But I did find out what the culprit was: server-status configuration. Since the resolution of T113090 with https://gerrit.wikimedia.org/r/#/c/239998/, a new virtualhost is registered by configuration:

<VirtualHost 127.0.0.1:80>
  ServerName localhost
  ServerAlias 127.0.0.1
  <Location /server-status>
    [..] Require local

Removing this made it all work again.

mwdebug1001$ sudo rm /etc/apache2/conf-enabled/50-server-status.conf
mwdebug1001$ sudo /etc/init.d/apache2 restart

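It seems Apache is considering the IP address of the connection more important than the Host header: a VirtualHost bound to a specific address (127.0.0.1:80) is matched before name-based matching on the wildcard VirtualHosts is attempted. It even appears to apply 127.0.0.1 to connections for ::1 (IPv6) as well.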
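
To see which VirtualHost blocks are bound to a specific address rather than a wildcard, the configuration can be searched directly. This is a minimal sketch, assuming the configuration lives under /etc/apache2 (the Debian default); the helper name list_vhost_binds is hypothetical.

```shell
# List every opening <VirtualHost ...> line under a config directory as
# "file:line:content", so that address-specific bindings (e.g. 127.0.0.1:80)
# stand out from wildcard ones (*:80).
# $1 = directory to search; defaults to /etc/apache2 (an assumption).
# -R follows symlinks, so a file linked from conf-enabled into conf-available
# may be listed twice.
list_vhost_binds() {
  grep -RnE '^[[:space:]]*<VirtualHost[[:space:]]' "${1:-/etc/apache2}"
}
```

On a live host, `apache2ctl -S` prints Apache's own parsed view of the virtual host table, which is the more authoritative source when the two disagree.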

I've documented a work-around at https://wikitech.wikimedia.org/wiki/Debugging_in_production#Locally, by using the LAN address rather than the localhost address. E.g.

$ curl -v -H 'Host: test.wikipedia.org' "http://$(hostname -i)/"
Krinkle updated the task description. Mar 20 2018, 12:30 AM
Krinkle added a subscriber: Joe. Mar 20 2018, 4:52 PM
Imarlier moved this task from Inbox to Radar on the Performance-Team board. Mar 26 2018, 8:22 PM
Imarlier edited projects, added Performance-Team (Radar); removed Performance-Team.

Forgot to say: The aforementioned workaround is not actually a workaround (sorry). The hostname -i hack "works" in the sense that it ends up routing to the MediaWiki virtualhost, which is good, but for my debugging to work, I need the application itself (MediaWiki) to see that the request comes from the local 127.0.0.1 or ::1 interface, so that it allows additional debug actions (such as dumping stuff to /tmp).

Those actions are restricted by REMOTE_ADDR, and connecting to Apache using the 10.x address (even from the same server) does not make REMOTE_ADDR the 127.0.0.1 or ::1 address. That makes sense, but it also means I'm still blocked :)
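
The kind of gate described above can be illustrated as follows. This is only a stand-in: the actual check in the debug code is not shown in this task, and the helper name is_loopback is hypothetical.

```shell
# Succeed (exit status 0) only when the given client address is a loopback
# address: anything in 127.0.0.0/8 for IPv4, or ::1 for IPv6.
# $1 = the address the application sees as REMOTE_ADDR.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}
```

With a check of this shape, `is_loopback 127.0.0.1` and `is_loopback ::1` succeed, while `is_loopback 10.64.32.123` fails, which is why reaching Apache through the LAN address routes to the right VirtualHost but still leaves the debug actions disabled.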

Dzahn added a subscriber: Dzahn. Apr 10 2018, 5:55 PM

I investigated a bit on the part ".. on mwdebug1001 and mwdebug1002, .. behaves differently on a pooled app server (e.g. mw1299)".

I found that a canary appserver like mw1261 and mwdebug1001 have identical apache config, but mw1299 does NOT.

/etc/apache2/apache2.conf is different on these. Among the differences is:

mw1299 does: IncludeOptional conf-enabled/*.conf
mw1261 does: Include conf-enabled/*.conf
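
To compare how two hosts pull in their configuration fragments, the include directives can be extracted from each apache2.conf and the results compared. This is a minimal sketch; the helper name list_includes is hypothetical.

```shell
# Print every active Include / IncludeOptional directive in an Apache config
# file, prefixed with its line number. Commented-out lines are skipped because
# the directive must be the first non-blank token on the line.
# $1 = path to the config file, e.g. /etc/apache2/apache2.conf
list_includes() {
  grep -nE '^[[:space:]]*Include(Optional)?[[:space:]]' "$1"
}
```

Running `list_includes /etc/apache2/apache2.conf` on each host and comparing the output shows the difference at a glance. Per the Apache 2.4 documentation, Include fails with an error when a wildcard matches no files, whereas IncludeOptional silently ignores that case.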

Dzahn added a comment. Apr 10 2018, 6:02 PM

The version of apache2.conf that the canaries and mwdebug have matches the puppet repo template:

mediawiki/templates/apache/apache2.conf.erb

The version on the pooled appserver mw1299 is different.

The template above gets installed from the class mediawiki::web which gets included in role/manifests/mediawiki/webserver.pp

mw1299 is special because it's a jobrunner, using role(mediawiki::jobrunner). That role does not include mediawiki::webserver, unlike the others.

This should explain why mw1299 is different.

Try testing with a pooled regular appserver that is not a jobrunner, like mw1267.eqiad.wmnet (role(mediawiki::appserver)).

RobH triaged this task as Normal priority. May 1 2018, 3:18 PM