We're missing SPDY coalesce for upload.wm.o for images ref'd in projects' page outputs, but that's a trickier problem and we're not ready to move on any solution there. Arguably (a) it can wait and it's not that critical and (b) solving it too early hurts perf for non-H2 clients, too. Also, there are a lot of other questions about whether this is even a good idea and/or reasonably-implementable.
| Status | Assignee | Task |
|--------|----------|------|
| Stalled | None | T116132 Consider allowing H2 coalesce for upload.wikimedia.org for images used in wiki articles |
| Resolved | BBlack | T124482 Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two |
| Resolved | BBlack | T109286 Merge mobile cache into text cache |
| Resolved | ori | T120151 Improve handling of mobile variants in Varnish |
| Resolved | Ottomata | T122650 Disable legacy tsv mobile, zero and 5xx-mobile jobs |
| Resolved | BBlack | T124165 Fix mobile purging |
| Resolved | BBlack | T124166 Fix varnish handling of mobile hostname rewriting |
| Resolved | BBlack | T96848 Support HTTP/2 |
| Resolved | BBlack | T96850 Test then switch to openssl 1.0.2 + nginx 1.9.2 |
| Resolved | BBlack | T118892 Update CP cookie VCL once HTTP/2 support lands |
There are a few concerns here, which is why this is kind of "back burner" for now but still on the longer-term radar:
- Even in the desktop case, probably (need to confirm this?) having the separate hostnames (e.g. en.wikipedia.org + upload.wikimedia.org) resolve to the same per-DC IP with the same shared cert could hurt perf for non-SPDY/HTTP2 clients, as they might hit non-SPDY browsers' limit on connection parallelism to a given IP. We probably want to reach a certain point in SPDY/HTTP2 adoption before even going down this road. According to https://grafana.wikimedia.org/dashboard/db/client-connections we're averaging about 66% SPDY adoption presently. We haven't yet enabled HTTP/2 (tracked @ T96848), and we currently have an upstream issue with nginx dropping SPDY support in the same patch that adds HTTP/2...
- For mobile in particular, we have Zero-rating issues to look at. There's an earlier (but related) issue in T109286 about whether we (eventually) merge the mobile + desktop text IPs to coalesce on the redirects between the two. It's likely we'll eventually do that, but it needs more investigation first. Separately, even if the mobile and desktop text IPs merge and we want to coalesce text+upload connections (this ticket), we might have to avoid coalescing them specifically for mobile, so that carriers can still choose to whitelist by IP for text only and not multimedia content. If that ends up being a sticking point, we may have to look into a scheme such as using upload.m.wikimedia.org for the image links in the mobile version of the site, so that it can remain un-coalesced while the desktop site coalesces IPs with the primary upload.wikimedia.org.
On the Zero issues: the latest update from the Zero team is they still have exactly one carrier that cares about the multimedia-vs-text IP range differences for whitelisting. Apparently it's a very small carrier, too. I'm not pushing them on this yet, though, as even with the Zero side resolved we're not sure we're ready for this on a number of other levels:
- Whether we're willing to take the perf hit (if there is one) on HTTP/1 clients
- Whether this actually is a net gain even for HTTP/2 clients
- Whether we're ok with the effects at the LVS layer, where we currently have text and upload traffic on separate LVS clusters for decent reasons
Also worth noting (I thought it was mentioned earlier, but apparently not): we're not even sure how we'd structure this at the nginx/varnish layer. LVS can't see the request hostname, so the split of traffic into the cache cluster pools would have to happen somewhere in between LVS and varnish caching (e.g. nginx?), or we'd need some other solution. Having nginx do the split would imply a lot more inter-cache network traffic within each local DC than we have today...
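As a rough illustration of the "nginx does the split" option: a single TLS entry point routing by Host header to separate frontend cache pools. This is only a sketch; the upstream names, addresses, and port are placeholders, not our real configuration.

```nginx
# Hypothetical sketch: one shared TLS terminator splitting traffic
# between separate text and upload cache pools by request hostname.
# Upstream names/addresses below are invented for illustration.
upstream cache_text_fe   { server 10.0.0.10:3120; }
upstream cache_upload_fe { server 10.0.0.20:3120; }

map $host $fe_pool {
    default               cache_text_fe;
    upload.wikimedia.org  cache_upload_fe;
}

server {
    listen 443 ssl http2;
    server_name en.wikipedia.org upload.wikimedia.org;

    location / {
        # Any upload request landing on a text-cluster terminator (or
        # vice versa) gets proxied across the local DC network; this is
        # exactly the inter-cache traffic increase discussed above.
        proxy_pass http://$fe_pool;
        proxy_set_header Host $host;
    }
}
```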
Now that HTTP/2 is a thing and Zero won't be anymore, could we put connection coalescing for upload.wikimedia.org back on the table?
FYI there is now a draft for "secondary certificates" which would allow achieving connection coalescing when the same IP is used, without having to share the same cert.
Getting rid of the connection latency for images would be a pretty big deal, IMHO.
We actually do use the same cert for both, so we don't need the secondary certs bit.
Remaining basic issues/concerns:
- HTTP/2 adoption in terms of overall public request stats is still around ~80%. If we put images on the same IP, legacy HTTP/1 UAs may suffer due to UA limits on parallel connections to a given IP? However, I think this will only get better (in terms of real UA mix) as the TLS1.0 deadline approaches in ~4 months and causes more abandonment of legacy UAs. Also, the 80% figure doesn't try to split off requests from bots and other artificial sources...
- In the HTTP/2 world, I don't know that it's necessarily true that more coalescing is always a win, especially for networks with above-average loss and/or latency. This is something we could test for, though, by simulating real mobile UAs on realistic mobile networks. See e.g. https://www.twilio.com/blog/2017/10/http2-issues.html and some similar papers / postings about some of the downsides of reducing the TCP connection count on imperfect networks.
- We'll probably need to use HTTP/2 priority/push better, to ensure that below-fold heavy image content doesn't slow down the transfer of content necessary for the initial render. nginx (current h/2 terminator) and ATS (probable future one) both have some level of support for push/prio stuff, but we'll need to put some thought and testing into using it well.
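On the push part of the last point, nginx's ngx_http_v2_module does expose basic push controls. A hedged sketch (the paths and location are purely illustrative, not a proposal for our actual config):

```nginx
# Hypothetical sketch: push render-critical assets so below-fold image
# streams don't starve the initial render. Paths are placeholders.
server {
    listen 443 ssl http2;
    server_name en.wikipedia.org;

    location /wiki/ {
        # Push the site styles alongside the article HTML.
        http2_push /w/load.php?modules=site.styles;
        proxy_pass http://text_backend;  # placeholder upstream
    }

    # Or let the applayer drive it via Link: rel=preload headers.
    http2_push_preload on;
}
```

Priority (as opposed to push) is largely driven by the client's stream weights/dependencies, so the server-side levers here are limited; that's part of why this needs testing.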
Those sorts of questions aside, the big issue within our infra is whether/how we'd be able to handle combining the text and upload cache layers at the IP-level entrypoint, as mentioned earlier in this ticket. LVS can't split the traffic. In our current stack the nginx layer could split the traffic, but it would greatly multiply the amount of local network traffic within the cache clusters.
Splitting after the first layer of caches avoids the traffic increase, but this implies making the first cache layer shared between text/upload. That could reduce hitrates (memory for the first layer doesn't scale even as we add machines behind a single IP), and the tuning of the two cases is also fundamentally different: upload uses advanced tricks to tune cache admission based on knowing the size of every response object and some information about their distribution, whereas for most of our text traffic the applayer doesn't even provide a content-length to use for such purposes.
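To make the tuning difference concrete, size-based frontend admission looks roughly like the following VCL. This is a hand-waved sketch, not our production logic, and the 256 KB cutoff is an invented number:

```vcl
# Hypothetical sketch of size-based frontend cache admission, the kind
# of tuning cache_upload relies on. Threshold is illustrative only.
vcl 4.0;
import std;

sub vcl_backend_response {
    # Text/applayer responses often lack a Content-Length entirely, so
    # they'd default to 0 here and always be admitted -- which is why
    # this style of tuning doesn't transfer to a shared text pool.
    if (std.integer(beresp.http.Content-Length, 0) > 262144) {
        # Too large for the small in-memory frontend: mark it
        # uncacheable so it passes through to the bigger backend layer.
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
    }
}
```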
All of these latter details will get put through a blender as we re-architect the caching stuff next FY, though. I think we should probably keep this case in mind as a design goal (that the cache infra's design shouldn't preclude efficiently presenting a shared text/upload IP to the public), and look at potentially supporting it afterwards. Concurrent with the cache-layer rework ongoing next FY we can wait out the trailing Zero partners and research the above (1-3) better to shore up the case for it.
> legacy HTTP/1 UAs may suffer due to UA limits
I believe that the connection limits UAs have for HTTP/1 are per-domain, not per-IP, so we should be fine on that front.
> I don't know that it's necessarily true that more coalescing is always a win
I've also seen reports like that, but that doesn't replace trying it out for ourselves. The hard work is making this change possible; I imagine that reverting to two different IPs should be fairly straightforward if it doesn't work out in practice, right? I don't think that those tests simulating traffic really prove a point about average real-world conditions anyway. It's interesting to know that there are cases where HTTP/2 is worse, but that doesn't tell us how frequent those scenarios really are for users.
If moving to a single connection has a performance impact, it should be very visible in some of the crude RUM metrics, like the page onload event, which will capture whether images complete loading faster than before.
> We'll probably need to use HTTP/2 priority/push better
Indeed, it would be nice to have control over priority and over how content is interleaved on the connection when served. Push is useless until cache digests are supported. I don't see this as a requirement for trying out the single connection, though, as we're not doing anything smart or custom in that respect at the moment. (Barring any really bad behavior, like all the images being sent over the wire before the HTML; that shouldn't happen with default priorities.)
All the perf tradeoffs and relatively-trivial work aside, the major blocker we still face here is the likely problems created by either of the simplest methods of handling this at the tls / varnish-fe layers:
- Merge the TLS termination and frontend cache pools - probably has net benefits in several key areas (connection volume handling and traffic load-spreading, etc) - but it would cause the upload and text datasets to compete for the limited frontend cache space, and this might not go well, unless we explicitly split the space to ensure each gets a fair share...
- Leave the frontend cache pools separate, but merge the TLS layer and have the TLS terminator re-route traffic to the "other" cluster on a per-request basis - this could cause some very sharp increases in local inter-cache traffic and in total inbound traffic to the TLS terminators, and we'd need a way to hash it reasonably correctly. We could do some investigation to get a loose idea of how bad the impact would be.
We could start by gathering some updated metrics. What % of reqs (or conns?) are H/2 these days? What would the impact be of pushing most of cache_upload's traffic through cache_text TLS terminators in various DCs? Can we simulate or test what happens when we naively share a frontend cache pool between normal text and upload traffic?
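For the first of those questions, tallying the H/2 share from request records is trivial once we have the data. A minimal sketch, with made-up field names and sample values standing in for real webrequest log entries:

```python
# Hypothetical sketch: estimate the HTTP/2 share from a sample of
# request records. Field names and values are invented for
# illustration; real inputs would come from the webrequest logs.
from collections import Counter

sample_requests = [
    {"host": "en.wikipedia.org", "proto": "HTTP/2"},
    {"host": "upload.wikimedia.org", "proto": "HTTP/2"},
    {"host": "en.wikipedia.org", "proto": "HTTP/1.1"},
    {"host": "upload.wikimedia.org", "proto": "HTTP/2"},
]

def h2_share(requests):
    """Return the fraction of the given requests made over HTTP/2."""
    counts = Counter(r["proto"] for r in requests)
    total = sum(counts.values())
    return counts["HTTP/2"] / total if total else 0.0

print(h2_share(sample_requests))  # 0.75 on this toy sample
```

The interesting versions of this would segment by hostname and try to exclude bot traffic, per the earlier caveat that raw percentages mix in artificial sources.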