Test our production stack's HTTP/2 priority support
Closed, ResolvedPublic

Description

Pat Meenan put this test page together: https://github.com/pmeenan/http2priorities (see also https://twitter.com/patmeenan/status/1065275742483824642)

We could easily host it ourselves, on performance.wikimedia.org for example, and test it in production.

Gilles created this task. Thu, Nov 22, 12:10 PM
Restricted Application added a subscriber: Aklapper. Thu, Nov 22, 12:10 PM
Gilles triaged this task as Normal priority. Thu, Nov 22, 12:10 PM

Change 475310 had a related patch set uploaded (by Gilles; owner: Gilles):
[performance/docroot@master] Add HTTP/2 test page

https://gerrit.wikimedia.org/r/475310

Gilles updated the task description. Thu, Nov 22, 12:30 PM
Gilles updated the task description.

Change 475310 merged by jenkins-bot:
[performance/docroot@master] Add HTTP/2 test page

https://gerrit.wikimedia.org/r/475310

It looks like we're doing fine: https://www.webpagetest.org/result/181122_MY_402f279db0f28ab1957bb10b5d551d61/3/details/#waterfall_view_step1

However, compared to Pat's examples on https://github.com/pmeenan/http2priorities we are interleaving things on the connection a lot more.

And I'm not seeing that amount of interleaving on enwiki either:

https://www.webpagetest.org/result/181122_W4_866bb310d827df8537457c8e46c0649e/1/details/#step1_request

I think the issue might be that performance.wikimedia.org, while behind Varnish, probably sits behind a different Varnish configuration than en.wikipedia.org and upload.wikimedia.org.

Also, on performance.wikimedia.org everything is served over a single connection, whereas Wikipedia uses two.

Unfortunately, it seems like the setup of performance.wikimedia.org is too different from Wikipedia's to draw conclusions at this point.

There's one thing that's definitely bad in our results, though: since everything is interleaved, the high-priority resources share bandwidth with the low-priority ones. Some of that happens in Pat's "good" example as well, with the first hidden image competing with the high-priority resources.

Change 475345 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Add HTTP/2 priorities test to speed tests

https://gerrit.wikimedia.org/r/475345

Peter added a subscriber: Peter. Thu, Nov 22, 7:40 PM
Imarlier moved this task from Inbox to Doing on the Performance-Team board. Mon, Nov 26, 9:07 PM
ema added a subscriber: ema. Tue, Nov 27, 10:04 AM

There's already an ATS cluster we can hit internally. The config in ATS is unified for text and upload.

Emanuele provided a sample request:

curl -sv -H "Host: upload.wikimedia.org" 'http://cp1071.eqiad.wmnet:3129/wikipedia/commons/7/75/Salvator_Rosa_%28Italian%29_-_Allegory_of_Fortune_-_Google_Art_Project.jpg' > /dev/null

It's going to require some juggling to test it on the internal network, but we'll find a way. Worst case scenario we'll hardcode the URLs to point to ATS in the test code.
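For test code that needs to hardcode the cache host, the sample request above can be expressed roughly as follows. This is only a sketch: the helper name `build_cache_request` is hypothetical, and the request is built but not sent, since the cp host is only reachable from the internal network.

```python
from urllib.request import Request


def build_cache_request(cache_host: str, port: int, path: str, site: str) -> Request:
    """Build a request aimed at a specific cache host while keeping the
    public site name in the Host header (hypothetical helper)."""
    url = f"http://{cache_host}:{port}{path}"
    # An explicit Host header overrides the one urllib would derive from the URL.
    return Request(url, headers={"Host": site})


req = build_cache_request(
    "cp1071.eqiad.wmnet",
    3129,
    "/wikipedia/commons/7/75/Salvator_Rosa_%28Italian%29_-_Allegory_of_Fortune_-_Google_Art_Project.jpg",
    "upload.wikimedia.org",
)
print(req.full_url)
print(req.get_header("Host"))
# Sending it would be urllib.request.urlopen(req), from a machine on the internal network.
```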

Change 475345 merged by jenkins-bot:
[operations/mediawiki-config@master] Add HTTP/2 priorities test to speed tests

https://gerrit.wikimedia.org/r/475345

Mentioned in SAL (#wikimedia-operations) [2018-11-27T10:42:00Z] <gilles@deploy1001> Synchronized docroot/wikipedia.org/speed-tests/http2priorities: T210141 HTTP/2 prioritie speed test (duration: 00m 47s)

Change 475992 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp1008: point varnish-fe to ATS host

https://gerrit.wikimedia.org/r/475992

Change 475992 merged by Ema:
[operations/puppet@production] cp1008: point varnish-fe to ATS host

https://gerrit.wikimedia.org/r/475992

Change 476000 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/mediawiki-config@master] Add variant of HTTP/2 priorities test pointing to upload.*

https://gerrit.wikimedia.org/r/476000

Change 476000 merged by jenkins-bot:
[operations/mediawiki-config@master] Add variant of HTTP/2 priorities test pointing to upload.*

https://gerrit.wikimedia.org/r/476000

Mentioned in SAL (#wikimedia-operations) [2018-11-27T12:32:55Z] <gilles@deploy1001> Synchronized docroot/wikipedia.org/speed-tests/http2priorities/upload.wikimedia.org.html: T210141 Add variant of HTTP/2 priorities test pointing to upload (duration: 00m 46s)

Alright, I think this is the closest we can get to running something like the future ATS setup:

https://www.webpagetest.org/result/181127_ZX_5e0adc06e148ab8f217aae7c424afd88/3/details/#waterfall_view_step1

For reference, here's what we get when accessing the enwiki Barack Obama article with the same setup hitting cp1008/ATS:

https://www.webpagetest.org/result/181127_TJ_966b6b2153973e417fe118e8db8f5d82/9/details/#waterfall_view_step1

The test results are OK, with the high-priority items getting loaded early. They compete a little with some low-priority images that the browser discovered sooner, but that's not the server's fault; I imagine it's the browser's pre-parser kicking in. The high-priority font is triggered by CSS, so it can only be requested once the CSS has been parsed, which is why it starts slightly later. Interestingly, the browser seems smart enough not to request more low-priority images until the high-priority resources are done.

I think the high amount of multiplexing in the test is due to all test resources being fairly large (100 KB images), whereas a page like the Barack Obama article has lots of tiny images. If there's anything that could be improved, it's that multiplexing behaviour, which presumably comes from nginx. When the high-priority font is requested, for example, it's all that should get pushed through the wire. And yet the server still slices in portions of the low-priority images the pre-parser requested earlier, alternating between the contents of these different streams.
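To make the cost of that interleaving concrete, here is a toy model, not a description of how nginx actually schedules frames. It assumes a single connection that sends one chunk per tick, and compares round-robin interleaving with strict priority ordering. The `Stream` class, the weights, the sizes and the arrival ticks are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    weight: int   # higher means more important (made-up scale)
    size: int     # number of chunks to send
    arrival: int  # tick at which the request reaches the server


def simulate(streams, strict):
    """Send one chunk per tick; return {name: tick at which the stream finished}."""
    remaining = {s.name: s.size for s in streams}
    last_sent = {s.name: -1 for s in streams}
    finished = {}
    tick = 0
    while len(finished) < len(streams):
        ready = [s for s in streams if s.arrival <= tick and remaining[s.name] > 0]
        if ready:
            if strict:
                # Only the most important ready streams may send.
                top = max(s.weight for s in ready)
                ready = [s for s in ready if s.weight == top]
            # Round-robin: pick the ready stream that sent least recently.
            chosen = min(ready, key=lambda s: last_sent[s.name])
            last_sent[chosen.name] = tick
            remaining[chosen.name] -= 1
            if remaining[chosen.name] == 0:
                finished[chosen.name] = tick + 1
        tick += 1
    return finished


streams = [
    Stream("img1", weight=1, size=10, arrival=0),   # found early by the pre-parser
    Stream("img2", weight=1, size=10, arrival=0),
    Stream("font", weight=10, size=10, arrival=4),  # found after CSS is parsed
]
print("interleaved:", simulate(streams, strict=False))
print("strict:     ", simulate(streams, strict=True))
```

In this toy run the font finishes at tick 30 when everything is interleaved and at tick 14 under strict ordering, while the last image finishes at tick 30 either way. The total transfer time is unchanged; only the high-priority resource is delayed.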

We could attempt to reduce the number of concurrent HTTP/2 streams to a low value (see https://nginx.org/en/docs/http/ngx_http_v2_module.html), but in the case of this test it's unclear whether the server would then just push the low-priority images it encountered first in full before moving on to the high-priority font. In that case, limiting stream concurrency would be very counterproductive.
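The worry about a low concurrency limit can be sketched with another toy model. It assumes the client queues requests in discovery order and may only open a new stream when a slot is free, and that the server always serves the most important open stream. This is an illustration of the concern, not a measurement of nginx; the function name, weights, sizes and ticks are invented.

```python
def finish_times(requests, max_streams):
    """requests: list of (name, weight, chunks, arrival_tick) in discovery order.
    Returns {name: tick at which the resource finished}."""
    pending = list(requests)  # discovered but not yet admitted as a stream
    open_streams = []         # entries are [name, weight, chunks_left]
    done = {}
    tick = 0
    while pending or open_streams:
        # Admit queued requests in discovery order while slots are free.
        while pending and len(open_streams) < max_streams and pending[0][3] <= tick:
            name, weight, chunks, _ = pending.pop(0)
            open_streams.append([name, weight, chunks])
        if open_streams:
            # Among open streams, serve the highest weight first.
            stream = max(open_streams, key=lambda s: s[1])
            stream[2] -= 1
            if stream[2] == 0:
                done[stream[0]] = tick + 1
                open_streams.remove(stream)
        tick += 1
    return done


requests = [
    ("img1", 1, 10, 0),   # low-priority image found by the pre-parser
    ("img2", 1, 10, 0),
    ("font", 10, 10, 4),  # high-priority font, discovered later
]
for limit in (1, 2, 3):
    print(limit, finish_times(requests, limit)["font"])
```

With these made-up numbers the font finishes at tick 30 with a limit of 1, tick 20 with a limit of 2, and tick 14 with a limit of 3. A limit that is too low leaves the font waiting behind images that were requested first, which is exactly the counterproductive outcome described above.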

Other than that, the nginx http2 module doesn't offer meaningful configuration options related to priorities.

Gilles closed this task as Resolved. Thu, Nov 29, 12:47 PM