Page MenuHomePhabricator

https://login.wikimedia.beta.wmflabs.org/ trapped in an infinite self-redirect
Closed, ResolvedPublic

Description

On beta, https://login.wikimedia.beta.wmflabs.org/ redirect to itself:

HTTP/1.1 301 Moved Permanently
Server: nginx/1.1.19
Date: Fri, 19 Jul 2013 20:08:24 GMT
Content-Type: text/html; charset=iso-8859-1
Content-Length: 249
Connection: keep-alive
Location: https://login.wikimedia.beta.wmflabs.org/
X-Varnish: 51865472 51864739, 1498759040
Via: 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 329
X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend miss (0)

HTTP/1.1 301 Moved Permanently
Server: nginx/1.1.19
Date: Fri, 19 Jul 2013 20:08:24 GMT
Content-Type: text/html; charset=iso-8859-1
Content-Length: 249
Connection: keep-alive
Location: https://login.wikimedia.beta.wmflabs.org/
X-Varnish: 51865472 51864739, 1498759041 1498759040
Via: 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 329
X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (1)

X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (2)
X-Cache: deployment-cache-text1 hit (1), deployment-cache-text1 frontend hit (3)
...

The Apache configuration in /data/project/apache/conf/loginwiki.php has:

RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^/(.*)$ https://login.wikimedia.beta.wmflabs.org/$1 [R=301,L]

On beta the protoproxy is local to the varnish cache (i.e. it is on deployment-cache-text1). I guess it does not properly set the "X-Forwarded-Proto: https" header.


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 1:51 AM
bzimport set Reference to bz51700.
hashar created this task.Jul 19 2013, 8:11 PM

Doing a curl query to the HTTPS URL

curl -I "https://login.wikimedia.beta.wmflabs.org/?date +%s"

varnishncsa let you dump any request header using the syntax %{X}-i %{X}o where X is the name of the header and 'i' stand for request header and 'o' for a response header.

I got:

On the frontend:

varnishncsa -n frontend -F 'Proto in: %{X-Forwarded-Proto}i out: %{X-Forwarded-Proto}o "%r" %s %b"'
Proto in: https out: - "HEAD http://login.wikimedia.beta.wmflabs.org/?1374484265 HTTP/1.0" 301 0"
Proto in: https out: - "HEAD http://login.wikimedia.beta.wmflabs.org/?1374484267 HTTP/1.0" 301 0"

On the backend:

varnishncsa -F 'Proto in: %{X-Forwarded-Proto}i out: %{X-Forwarded-Proto}o "%r" %s %b"'
Proto in: http out: - "GET http://login.wikimedia.beta.wmflabs.org/?1374484331 HTTP/1.1" 301 260"

Looking at the remote, the backend hits 127.0.0.1 so that would be put back in varnish text again hence a looping.

The requests do it the apaches:

$ tail -f /data/project/logs/apache-access.log |egrep ^login.wikimedia.beta.wmflabs.org
login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:47 +0000] "GET /?1374484729 HTTP/1.1" 301 260
login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:48 +0000] "GET /?1374484730 HTTP/1.1" 301 260
login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:50 +0000] "GET /?1374484732 HTTP/1.1" 301 260
login.wikimedia.beta.wmflabs.org 10.4.1.133 - - [22/Jul/2013:09:18:51 +0000] "GET /?1374484734 HTTP/1.1" 301 260

I have tweaked the Apache configuration to log the Apache IP and the X-Fowarded-Proto field.

/data/project/apache/conf/wmflabs-logging.conf:

-CustomLog /data/project/logs/apache-access.log "%v %h %l %u %t \"%r\" %>s %b"

+CustomLog /data/project/logs/apache-access.log "%A %v %h %l %u %t \"%r\" %>s %b xfp: %{X-Forwarded-Proto}i"

On varnish, to debug the queries made to login.wikimedia.beta.wmflabs.org, use the varnishlog utility and filter by host header:

Doing a curl "https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=date +%s" with:

varnishlog -n frontend -b -m TxHeader:login.wikimedia.beta.wmflabs

13 BackendOpen  b ipv4_127_0_0_1 127.0.0.1 14366 127.0.0.1 3128
13 TxRequest    b GET
13 TxURL        b /wiki/Main_Page?date=1374487021
13 TxProtocol   b HTTP/1.1
13 TxHeader     b Host: login.wikimedia.beta.wmflabs.org
13 TxHeader     b X-Real-IP: 88.X.X.X
13 TxHeader     b User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8x zlib/1.2.5
13 TxHeader     b Accept: */*
13 TxHeader     b X-Forwarded-For: 127.0.0.1
13 TxHeader     b X-Forwarded-Proto: http
13 TxHeader     b X-Varnish: 1498894934
13 TxHeader     b Accept-Encoding: gzip
13 RxProtocol   b HTTP/1.1
13 RxStatus     b 301
13 RxResponse   b Moved Permanently
13 RxHeader     b Server: Apache
13 RxHeader     b Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=1374487021
13 RxHeader     b Content-Type: text/html; charset=iso-8859-1
13 RxHeader     b Content-Length: 279
13 RxHeader     b Accept-Ranges: bytes
13 RxHeader     b Date: Mon, 22 Jul 2013 09:56:59 GMT
13 RxHeader     b X-Varnish: 52212372
13 RxHeader     b Age: 0
13 RxHeader     b Via: 1.1 varnish
13 RxHeader     b Connection: keep-alive
13 RxHeader     b X-Cache: deployment-cache-text1 miss (0)
13 Fetch_Body   b 4(length) cls 0 mklen 1
13 Length       b 279
13 BackendReuse b ipv4_127_0_0_1

Backend:
varnishlog -b -m TxHeader:login.wikimedia.beta.wmflabs

15 TxRequest    b GET
15 TxURL        b /wiki/Main_Page?date=1374486920
15 TxProtocol   b HTTP/1.1
15 TxHeader     b Host: login.wikimedia.beta.wmflabs.org
15 TxHeader     b X-Real-IP: 88.X.X.X
15 TxHeader     b User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8x zlib/1.2.5
15 TxHeader     b Accept: */*
15 TxHeader     b X-Varnish: 1498894871
15 TxHeader     b X-Forwarded-For: 127.0.0.1
15 TxHeader     b X-Varnish: 52212225
15 TxHeader     b Accept-Encoding: gzip
15 RxProtocol   b HTTP/1.1
15 RxStatus     b 301
15 RxResponse   b Moved Permanently
15 RxHeader     b Date: Mon, 22 Jul 2013 09:55:18 GMT
15 RxHeader     b Server: Apache
15 RxHeader     b Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page?date=1374486920
15 RxHeader     b Content-Length: 279
15 RxHeader     b Content-Type: text/html; charset=iso-8859-1
15 Fetch_Body   b 4(length) cls 0 mklen 1
15 Length       b 279
15 BackendReuse b ipv4_10_4_0_187

As seen in the frontend log, nginx set "X-Fowarded-Proto: http" header. It is not included in the request made by the varnish backend.

From the backend vcl, the header is stripped out whenever the client ip is not part of allow_xff or opera_mini access lists which are:

acl allow_xff {

"91.198.174.0"/24;
"208.80.152.0"/22;
"2620:0:860::"/46;
"10.0.0.0"/8;

}

acl opera_mini {
// bunch of opera requests
}

So I guess adding 127.0.0.1 in allow_xff would fix the issue.

Change 75085 had a related patch set uploaded by Hashar:
varnish: backends trust 127.0.0.1 for XFF

https://gerrit.wikimedia.org/r/75085

Change 75085 merged by Mark Bergsma:
varnish: backends trust 127.0.0.1 for XFF

https://gerrit.wikimedia.org/r/75085

The above change unlocked the redirect loop. There are still old copy of the pages cached in which will cause a redirect loop but that is related to vhtcpd not processing purges. That issue is bug 51874.

This is not entirely fixed. When querying the http://login.wikimedia.beta.wmflabs.org/ URL, a redirect loop is cached.

Purge all pages:

echo -e 'http://login.wikimedia.beta.wmflabs.org/\nhttps://login.wikimedia.beta.wmflabs.org/\nMain_Page'|mwscript purgeList.php --wiki=loginwiki

Connecting to the https url yields a Location redirect to the main page:

$ curl -i https://login.wikimedia.beta.wmflabs.org/ |grep Location
Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page

The http URL yields a Location redirect to the https root:

$ curl -i http://login.wikimedia.beta.wmflabs.org/ |grep Location
Location: https://login.wikimedia.beta.wmflabs.org/

But then the https:// is a redirect to itself:

$ curl -i https://login.wikimedia.beta.wmflabs.org/ |grep Location
Location: https://login.wikimedia.beta.wmflabs.org/

This is always reproducable when clearing the cache then querying the http /.

The login.wikimedia.beta.wmflabs.org virtualhost definition has:

RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^/(.*)$ https://login.wikimedia.beta.wmflabs.org/$1 [R=301,L]

Varnish would do a http query which redirect to https:

$ curl -s -i -x 127.0.0.1:80 http://login.wikimedia.beta.wmflabs.org/|grep Location
Location: https://login.wikimedia.beta.wmflabs.org/

If it has set the X-Fowarded-Proto: https we get the main page:

$ curl -s -i -H 'X-Forwarded-Proto: https' -x 127.0.0.1:80 http://login.wikimedia.beta.wmflabs.org/|grep Location
Location: https://login.wikimedia.beta.wmflabs.org/wiki/Main_Page

After talking with mark, varnish consider both queries (with and without X-Forwarded-Proto) as the same cached content. This is because Apache does not send a Vary: X-Forwarded-Proto header. So we need to fix up our Apache configuration.

In production on mw1177:

$ curl -s -i -x 127.0.0.1:80 http://login.wikimedia.org/|grep Location
Location: https://login.wikimedia.org/
$

$ curl -s -i -H 'X-Forwarded-Proto: https' -x 127.0.0.1:80 http://login.wikimedia.org/
...
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
X-Vary-Options: ...X-Forwarded-Proto...
Location: https://login.wikimedia.org/wiki/Main_Page
$

So something is set differently in production.

So finally here is the explanation:

In production the squids do not cache the redirects (we see MISS everywhere) whereas Varnish text does cache them. So in production the http:// URL is always redirected to the https:// one and thus the cache is only hit with http:// URL with X-Forwarded-Proto: https.

On labs, which uses varnish as a text cache, the redirect are cached. The cache is thus hit with http:// url with or without X-Forwarded-Proto: https set and with or without Vary: X-Forwarded-Proto.

When the X-Forwarded-Proto: https is set, the query hit the MediaWiki backend which properly set the Vary header.

When X-Forwarded-Proto is missing, the Apache does not set any Vary header. That needs to be fixed.

Change 75583 had a related patch set uploaded by Hashar:
Fix up Vary headers on 30x redirects from Apache

https://gerrit.wikimedia.org/r/75583

Puppet is broken in labs (bug 51955) so I cant test out mark fix above.

Change 434644 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] text-be: remove code adding X-F-P to Vary

https://gerrit.wikimedia.org/r/434644

Change 434644 abandoned by Ema:
text-be: remove code adding X-F-P to Vary

Reason:
We might want to do this in ATS once T133548, on which the change is stalled, is complete.

https://gerrit.wikimedia.org/r/434644