Support brotli compression
Open, Medium, Public

Description

Brotli is a relatively new compression algorithm from Google that's pretty awesome for web stuff. Firefox, Chrome, and Opera now support it and advertise it via Accept-Encoding. There's a C++ shared library and an nginx module already, too. Our options here would be:

  1. The simplest option to implement: switch off gzipping in Varnish entirely and move gzipping functionality to nginx first. It would then probably be trivial to load the nginx brotli module and support both algorithms. I don't know how substantial the CPU hit would be, but this is the simplest option. It also removes all gzip complexity from Varnish, which has been a source of past bugs and question marks. Another big downside is that Varnish's effective storage size would be diminished due to the lack of gzipped storage, at least on text. On the other hand, we already have more storage than we need there, and this doesn't really affect the bulk of cache_upload's objects anyway.
  2. The hard/questionable option: write a Varnish patch or vmod so that it can use brotli in addition to gzip if the client supports it, using the existing Varnish gzip support mechanisms. This might be tricky for a number of reasons: the current gzip-support code assumes normalization to one compressed encoding, and also assumes it's a win to store objects in compressed form and hope we rarely have to decompress them, since most clients support gzip. We'd have to either double-store objects in both formats, or store in one and decompress->recompress to the other, or always store uncompressed and commonly compress outputs on the fly.
  3. Hybrid preferring gzip, if we think brotli is only supported by a minority of clients: leave Varnish doing gzip as it does now (so we get storage savings and compression-CPU savings in the common case), have nginx force "AE: gzip" just to normalize everything for Varnish (and decompress gzip on the fly for non-gzip clients), and then also have nginx decode->recode to brotli on the fly when the client supports it.
  4. If we think brotli is supported by the majority of clients, we can reverse (3) and mix it with (2): patch Varnish to replace gzip support with brotli support universally (now we get even better storage savings than before), and have nginx decompress->recompress to gzip on the fly for older clients.
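For illustration, option 1's nginx side might look roughly like this (a sketch assuming the google/ngx_brotli module built as a dynamic module; the compression level and type lists are illustrative, not tuned values):

```nginx
# Load ngx_brotli (assumes the filter/static modules were built dynamically).
load_module modules/ngx_http_brotli_filter_module.so;
load_module modules/ngx_http_brotli_static_module.so;

http {
    # gzip moves here from Varnish, for clients that don't advertise br.
    gzip on;
    gzip_types text/plain text/css application/json application/javascript image/svg+xml;

    # On-the-fly brotli for clients sending "Accept-Encoding: ... br ...".
    brotli on;
    brotli_comp_level 6;  # illustrative; higher means smaller output, more CPU
    brotli_types text/plain text/css application/json application/javascript image/svg+xml;
}
```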

Probably the first thing to do here is to gauge brotli support level by sampling AE headers before varnish has normalized them to just gzip.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.
BBlack triaged this task as Medium priority. Jun 16 2016, 5:13 PM
BBlack added a project: Performance-Team.
BBlack updated the task description. (Show Details)

Other interesting references:

https://datatracker.ietf.org/doc/draft-alakuijala-brotli/ (IETF draft; seems pretty far along in the approval process)
https://blog.cloudflare.com/results-experimenting-brotli/ (Deep analysis by Cloudflare. Results are mixed, but this analysis is dated; the library has been improved substantially since.)

Semi-related find inside the above - Cloudflare has an optimized variant of zlib that's very promising in terms of faster compression speeds (meaning we could go higher-quality for the same CPU cost). It's just 17-19 commits on top of current zlib master, depending on which variant you use:
https://blog.cloudflare.com/cloudflare-fights-cancer/

BBlack lowered the priority of this task from Medium to Low. Jun 16 2016, 6:11 PM

A very quick check (just a couple of minutes on one cache_text machine) shows about 7% of requests indicate brotli support in Accept-Encoding. Not big enough to care too much yet, but could grow over time.

I think you may have been including vhtcpd PURGE requests in your counts. I'm seeing about 35% support.

[cp1055:~] $ varnishncsa -n frontend -m 'RxRequest:^(?!PURGE)' -F "%{accept-encoding}i" | head -100000 | grep -c br
35471

That matches caniuse: http://caniuse.com/#feat=brotli

Global 45.81%
Firefox 45+, Chrome 50+, Opera 38+, Chrome for Android 50+

Screen Shot 2016-06-16 at 20.47.43.png (548×1 px, 95 KB)

Screen Shot 2016-06-16 at 20.46.27.png (662×1 px, 93 KB)

This will most likely increase another 7-10% over the next month as users of Opera 37 and Chrome 49 are still in the middle of their auto-upgrade to Opera 38 and Chrome 50.

Yeah ori's right, I didn't filter properly. Interesting!

It might be a good idea to experiment with this locally using our real content, to see what kind of gains we'd be looking at.

SDCH+gzip might be worth looking into as well. There's less support for it (only Chrome and the Android browser), but it has the advantage of allowing us to transmit a custom dictionary. Meaning that instead of using an all-purpose dictionary like Brotli's static dictionary, which was generated from text in 93 languages, we could have a custom dictionary per wiki, generated from the corpus of content we intend to serve. This could also be the subject of local experimentation to see if it's worth pursuing in addition to Brotli.

I saw this drive-by comment on HN that suggests SDCH might be quite the performance gain (I'm guessing he's talking about Facebook?) and better than Brotli.

Interestingly, they compress their SDCH dictionary with Brotli to transmit it, since browsers that support Brotli are a superset of browsers that support SDCH :)

I agree that SDCH has better upsides (for supporting clients), it just also seems like a much larger effort to turn it on and get it tuned, and I have no idea how we'd integrate it with Varnish (again, given that Varnish tries to cache content in compressed form to save cache storage space and also reduce compression codec CPU use in the common cases). Brotli's at least a little more drop-in and simple.

Actually it's probably LinkedIn, not Facebook, that this guy works for. I pieced it together from his HN history: he often comments on Apache Traffic Server, which LinkedIn is known to use: http://www.slideshare.net/thenickberry/reflecting-a-year-after-migrating-to-apache-traffic-server as well as threads about LinkedIn.

To get a more up-to-date idea of the percentage of requests we get with AE:br, I've analyzed 30s of GET traffic on cp3033 and was surprised to find zero requests with AE:br. I then tried to match for non-PURGE, and interestingly during that timeframe we only received AE:br requests for methods other than GET (OPTIONS, POST).

$ timeout --foreground 30 varnishncsa -n frontend -q 'Reqmethod ne "PURGE" and ReqHeader:Accept-Encoding' -F '%m %{Accept-Encoding}i' | sort | uniq -c | sort -rn
 127364 GET gzip
   2997 POST gzip
   1991 GET identity
    223 HEAD gzip
    151 OPTIONS gzip, deflate, br
    142 POST gzip, deflate, br
     71 POST gzip,deflate
     53 POST identity
      7 POST gzip, deflate
      6 POST br, gzip, deflate
      5 GET *
      2 OPTIONS gzip, deflate
      1 PROPFIND gzip
      1 HEAD identity

during that timeframe we only received AE:br requests for methods other than GET (OPTIONS, POST).

That was due to the fact that varnishncsa only prints the last value of a header if it shows up multiple times in VSM. As we normalize Accept-Encoding for GET requests, the command in my comment above was not showing Accept-Encoding as received by clients, but rather our own mangled version.

I've tried instead to use varnishlog in raw mode and only print the first occurrence of Accept-Encoding in each transaction. That ended up with very wrong results, but I'm gonna paste the effort here anyway, as the issue of dealing with multiple VSM entries for a given header in the same transaction might come back in the future. In that case, it's better to start from past failures than from scratch:

$ timeout --foreground 10 varnishlog -c -n frontend -g raw -q 'ReqHeader:Accept-Encoding' -I ReqHeader:Accept-Encoding |
    awk '!seen[$1] { seen[$1]=1 ; $1=$2=$3=""; print }' |
        awk '/br/ { br++ } END { print br; print NR; print br * 100 / NR }'
418
39800
1.05025

This shows that in 10 seconds of logging we received 418 requests with AE:br out of 39800, which would be ~1%. That's not what reality looks like (commands executed kinda-concurrently):

$ timeout --foreground 10 varnishncsa -n frontend  -q 'ReqMethod ne "PURGE"  and ReqHeader:Accept-Encoding ~ "br"' | wc -l
29510
$ timeout --foreground 10 varnishncsa -n frontend  -q 'ReqMethod ne "PURGE"' | wc -l
42485

That's ~70% of requests being brotli capable.
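A single-pass variant of the same measurement avoids comparing two separate 10-second windows that sample different request sets (a sketch; the awk token match is an assumption about how clients format the header list):

```shell
#!/bin/sh
# Sketch: measure the br-capable share of requests in one pass. Input is one
# Accept-Encoding value per line, e.g. from:
#   timeout --foreground 10 varnishncsa -n frontend \
#     -q 'ReqMethod ne "PURGE"' -F '%{Accept-Encoding}i'
ae_br_share() {
  # Match "br" only as a list token, so other substrings don't count.
  awk '/(^|[, ])br([, ]|$)/ { br++ }
       END { if (NR) printf "%.1f%% br-capable (%d/%d)\n", br * 100 / NR, br, NR }'
}

# Example with inline sample data (real input would come from varnishncsa):
printf 'gzip\ngzip, deflate, br\nbr, gzip, deflate\nidentity\n' | ae_br_share
# -> 50.0% br-capable (2/4)
```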

And here's an update from third-party stats, which supports @ema's findings.

Global 45%: Firefox 45+, Chrome 50+, Opera 38+, Chrome for Android 50+.

At http://caniuse.com/#feat=brotli as of March 2018:

Global 84%: Edge 15+, Firefox 44+, Chrome 50+, Safari 11+, Opera 38+, iOS 11.2+, Android 5.x-6.x WebView, Chrome for Android 50+, UC Browser

Screen Shot 2018-04-09 at 20.17.29.png (563×2 px, 93 KB)
Screen Shot 2018-04-09 at 20.17.43.png (679×2 px, 90 KB)

Notable changes based on quick eyeballing:

  • Microsoft landed support in Edge.
  • Apple landed support in Safari and iOS.
  • Mozilla landed support in Firefox Mobile.
  • Opera and UC Browser have support now (upgraded their Chromium).
  • Decrease in use of IE desktop and Opera Mini (unsupported).
  • Overall shift from desktop to mobile continues.

The tricky part is this: Varnish does our compressing (which is in this case the right place to be doing it), and it compresses hittable things on their way into cache storage from the backend, for two obvious beneficial reasons:

  • Fit more things in cache
  • Only burn CPU compressing a hot object once, for many outputs to users

Even if we had Varnish+brotli code ready to go, it would mean either storing both compressed forms (cache storage space issues!), or picking the more popular of the two (now brotli?) for cache storage and then re-encoding to gzip on the fly per fetch of the hit object for gzip-only clients (or some hybrid of the two approaches, where we store both forms, but only once they've each been used by a client). We don't have Varnish+brotli code anyway, of course. Doing it further out at the edge (e.g. nginx) is possible, but it would mean compression-transcoding every response on the fly, which might also be a significant CPU bump.

Re-reading above: probably the better blend of options would be to swap gzip for brotli in Varnish one-for-one (without the whole storing-dual-forms mess) and then have nginx transcode back to gzip for gzip-only clients, and maybe not deploy it until the brotli percentage swings a bit higher than it already is.

For WebP my proposed strategy is to only offer the variant for popular thumbnails (e.g. more than X hits on the frontend). The same logic could be used here: generate, store and serve a Brotli variant only for the most-trafficked cache objects. Less popular items would stay the way they are now, gzip only. This would avoid the cache cost of doing this for the long tail while reaping the majority of the benefits. And it also lets us ramp up and down by changing the popularity threshold, which allows for a smoother transition to using more cache space.
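That popularity gate could be sketched as a trivial filter over aggregated hit counts (the input format, threshold value, and `popular_keys` helper are all hypothetical; the real gate would live wherever the Brotli variant gets generated):

```shell
#!/bin/sh
# Hypothetical sketch of the popularity gate: given "hits<TAB>object-key" lines
# aggregated from frontend request logs, emit the keys that would get a
# brotli variant under a tunable threshold.
THRESHOLD=1000   # hypothetical: minimum frontend hits before storing a br variant

popular_keys() {
  awk -v min="$THRESHOLD" -F '\t' '$1 >= min { print $2 }'
}

# Example with inline sample data (real input would come from request logs):
printf '1500\t/wiki/Main_Page\n3\t/wiki/Obscure_Page\n' | popular_keys
# -> /wiki/Main_Page
```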

A note about ATS: support for brotli was added in version 7.1.0. However, libbrotli-dev is not available in jessie. Given the scope of T199720, I'm just gonna drop brotli support from our ATS packages for now. When we start talking about using ATS on the frontends we will very likely be on a Debian release newer than jessie anyway, and we'll get brotli support without having to backport libbrotli-dev.

Re-reading above: probably the better blend of options would be to swap gzip for brotli in Varnish one-for-one (without the whole storing-dual-forms mess) and then have nginx transcode back to gzip for gzip-only clients, and maybe not deploy it until the brotli percentage swings a bit higher than it already is.

It's 90% now (up from 84% when the parent comment was written).

Re-reading above: probably the better blend of options would be to swap gzip for brotli in Varnish one-for-one (without the whole storing-dual-forms mess) and then have nginx transcode back to gzip for gzip-only clients [..]

I think this may be the best strategy to start with, i.e. the spot with the least risk that still has the biggest gain, and relatively little added complexity. It also means we don't require a Varnish-frontend solution for Brotli per se; it could come from an ATS backend instead, if that's easier.

Krinkle raised the priority of this task from Low to Medium. May 11 2019, 1:06 AM

@ema I assume that ATS frontends as currently deployed support Brotli, right?

We would need to enable the compress plugin, configure it for using brotli, and test things out. But yes, brotli support is there. Linkedin uses ATS so I assume they're using the compress plugin as well. I'll ask them.
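For the record, a minimal sketch of what that might look like, based on the compress plugin's documented options (the file path, hostnames, and option values here are illustrative assumptions, not a tested configuration):

```
# /etc/trafficserver/compress.config -- global defaults (path illustrative)
cache true
remove-accept-encoding true
compressible-content-type text/*
supported-algorithms br,gzip

# remap.config -- attach the plugin to a (hypothetical) remap rule:
map http://en.wikipedia.org/ http://appservers.example.local/ @plugin=compress.so @pparam=compress.config
```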

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!