
Spike: Does HTML minification gain us anything?
Closed, Declined; Public

Description

Page HTML size is a big pain point for load time on mobile over a slow connection, e.g. the Barack Obama page is about 1MB [1].

We should minimise the HTML by stripping comments and whitespace, e.g. with https://code.google.com/p/htmlcompressor/
For developers we can always introduce a query string parameter to override this, but the majority of our users shouldn't care about nice tidy HTML.

TODO:
For a selection of pages, e.g. Barack Obama, let's see what impact this compression has.

[1] http://www.webpagetest.org/result/150825_9V_1AYZ/

Event Timeline

Jdlrobson raised the priority of this task from to High.
Jdlrobson updated the task description.

I suspect that the gains obtained with something like htmlcompressor will be minimal after being gzipped. Repetitive whitespace gzips really well.
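A rough way to check this intuition before testing real articles is to compare raw and gzipped sizes before and after a naive whitespace collapse. The sketch below is only illustrative: the page is synthetic, `collapse_whitespace` is a hypothetical helper (it is not htmlcompressor), and the regex is unsafe for `<pre>` and similar whitespace-sensitive content.

```python
import gzip
import re


def collapse_whitespace(html):
    """Naive minifier: drop whitespace between adjacent tags (unsafe for <pre>)."""
    return re.sub(r">\s+<", "><", html)


def sizes(html):
    """Return (raw bytes, gzipped bytes) for an HTML string."""
    raw = html.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic, heavily indented page; real articles will behave differently.
rows = "\n".join(
    f'        <li>\n            <a href="/wiki/Item_{i}">Item {i}</a>\n        </li>'
    for i in range(500)
)
page = f"<ul>\n{rows}\n</ul>\n"
minified = collapse_whitespace(page)

raw_before, gz_before = sizes(page)
raw_after, gz_after = sizes(minified)
raw_saving = raw_before - raw_after
gz_saving = gz_before - gz_after

print(f"raw:     {raw_before} -> {raw_after} bytes (saved {raw_saving})")
print(f"gzipped: {gz_before} -> {gz_after} bytes (saved {gz_saving})")
```

On input like this, the bytes saved after gzip should be far fewer than the bytes saved on the raw HTML, because the repeated indentation was already cheap to encode.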

I agree with @Gilles, let's test it out in dev and report findings.

Ideally, check a few articles with templates expanded to make the test as realistic as possible.

Keep in mind that whitespace can be significant for rendering, depending on the CSS white-space styling (e.g. preformatted text, syntaxhighlight, etc.). It is also significant for editing.

For Parsoid HTML, stripping attributes made a big difference to compressed sizes: T78676

Another idea potentially worth exploring would be sorting attributes alphabetically to improve compression.

Jdlrobson renamed this task from HTML should be minimised to Spike: Does HTML minification gain us anything?. Sep 23 2015, 6:29 PM

Local tests show this has no impact. I suspect exploring attribute stripping might be more useful.

Jdlrobson lowered the priority of this task from High to Low. Sep 23 2015, 7:06 PM

@Jdlrobson Out of curiosity, do you still have the numbers for 'no impact'? Sub-kilobyte range for an article like Barack Obama?

@Jdlrobson, something I was wondering about idly is the effect of attribute serialization order on compression ratios. In theory, consistent order could let a compression algorithm encode repeating attribute combinations more efficiently, by finding longer runs of repeated strings. To test this, it might be interesting to tweak Parsoid's XMLSerializer to emit attributes in sorted order. Another variant might be to serialize the id attribute first, as that's very common in Parsoid output.

Edit: Never mind, I already mentioned this earlier in this task. I guess this confirms me as a broken record on this ;)
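One way to try the attribute-order idea without touching Parsoid is to re-serialize existing HTML with sorted attributes and compare gzipped sizes. The following is a minimal sketch using Python's standard `html.parser`; `AttributeSorter` and `sort_attributes` are hypothetical names, the sample markup is synthetic, and the re-serialization is simplistic (it is not Parsoid's XMLSerializer and may normalize quoting and case).

```python
import gzip
import random
from html import escape
from html.parser import HTMLParser


class AttributeSorter(HTMLParser):
    """Re-emit HTML with each tag's attributes in alphabetical order."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in sorted(attrs, key=lambda a: a[0]):
            # Valueless attributes (e.g. "hidden") arrive with value None.
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def sort_attributes(html):
    parser = AttributeSorter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Synthetic input: the same attribute set in a random order on every element.
rng = random.Random(0)
lines = []
for i in range(2000):
    pairs = [("id", f"mw{i}"), ("class", "mw-ref"),
             ("typeof", "mw:Extension/ref"), ("rel", "dc:references")]
    rng.shuffle(pairs)
    attr_text = " ".join(f'{n}="{v}"' for n, v in pairs)
    lines.append(f"<span {attr_text}>ref {i}</span>")
shuffled_doc = "\n".join(lines)
sorted_doc = sort_attributes(shuffled_doc)

print("gzipped, shuffled order:", len(gzip.compress(shuffled_doc.encode("utf-8"))))
print("gzipped, sorted order:  ", len(gzip.compress(sorted_doc.encode("utf-8"))))
```

Randomly shuffled input is a worst case; real serializer output is probably already fairly consistent, so any gain on actual pages would need to be measured rather than assumed.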

Krinkle added subscribers: TrevorParscal, Krinkle.

From Front-end-Standards-Group meeting:

  • Per @GWicke and @TrevorParscal, removal of whitespace would have to be done conservatively, and even then it's an error-prone endeavor.
  • Whitespace is rare to begin with, since our HTML is mostly generated by Html.php and OOUI. Most whitespace comes from the odd string literal in the skin, which gzip already compresses quite well.
  • Stripping comments would be useful. We can audit the few comments we have in the Vector skin and move them from the HTML string to the PHP context.

Compressing "City of London" with https://kangax.github.io/html-minifier/ which is fairly aggressive with the default settings (and would need to be scaled back to avoid the issues GWicke talks about) gives:

383k -> 363k (~5.3%)
gzipped:
77.4k -> 76.3k (~1.4%)

i.e. not much.
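For reference, the savings can be recomputed from the rounded figures quoted above. This is only a sketch of the arithmetic; `saving` is a hypothetical helper, and because the inputs are rounded to the nearest kilobyte the percentages may differ slightly from ones computed on exact byte counts.

```python
def saving(before, after):
    """Return (absolute saving, percentage saving) between two sizes."""
    diff = before - after
    return diff, 100 * diff / before


# Sizes in kilobytes, as quoted in the comment above.
raw_diff, raw_pct = saving(383, 363)
gz_diff, gz_pct = saving(77.4, 76.3)

print(f"raw:     {raw_diff:.1f}k saved ({raw_pct:.1f}%)")
print(f"gzipped: {gz_diff:.1f}k saved ({gz_pct:.1f}%)")
```

What users actually download is the gzipped response, so the number that matters is the roughly 1k saved on the wire, not the 20k saved on the uncompressed HTML.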

@Krinkle Looking at the source code of the City of London article, there is just the <div id="siteNotice"><!-- CentralNotice --></div> HTML comment and then the Parser Profiling Report comments, which are fairly large:

<!-- 
NewPP limit report
Parsed by mw1008
Cached time: 20151128083637
Cache expiry: 2592000
Dynamic content: false
CPU time usage: 2.217 seconds
Real time usage: 2.516 seconds
Preprocessor visited node count: 10437/1000000
Preprocessor generated node count: 0/1500000
Post‐expand include size: 428884/2097152 bytes
Template argument size: 113145/2097152 bytes
Highest expansion depth: 25/40
Expensive parser function count: 6/500
Lua time usage: 0.734/10.000 seconds
Lua memory usage: 9.29 MB/50 MB
Number of Wikibase entities loaded: 1-->

<!-- 
Transclusion expansion time report (%,ms,calls,template)
100.00% 1702.132      1 - -total
 28.39%  483.280      1 - Template:Reflist
 24.37%  414.799      1 - Template:Infobox_settlement
 21.17%  360.411      2 - Template:Infobox
 13.49%  229.562     37 - Template:Cite_web
 12.03%  204.777      1 - Template:Navboxes
 10.13%  172.341     11 - Template:Navbox
  5.09%   86.697     17 - Template:Convert
  4.87%   82.891      1 - Template:Weather_box
  4.45%   75.750      3 - Template:Citation_needed
-->

<!-- Saved in parser cache with key enwiki:pcache:idhash:6883-0!*!0!!en!4!* and timestamp 20151128083635 and revision id 690027635
 -->

Opened T120132: Usefully include Parser Profiling Report just on demand? for tackling the latter.
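To put a number on what such comments cost for a given page, something like the following could be run against saved page source. It is a minimal sketch: `strip_comments` and `comment_cost` are hypothetical helpers, the sample is a shortened excerpt of the report above, and the regex is naive (it would also remove conditional comments or comment-like text inside scripts).

```python
import gzip
import re

# Non-greedy match across newlines so each comment is removed separately.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Remove every HTML comment from the string."""
    return COMMENT_RE.sub("", html)


def comment_cost(html):
    """Return (raw bytes, gzipped bytes) attributable to HTML comments."""
    stripped = strip_comments(html)
    before, after = html.encode("utf-8"), stripped.encode("utf-8")
    raw = len(before) - len(after)
    gz = len(gzip.compress(before)) - len(gzip.compress(after))
    return raw, gz


sample = """<div id="siteNotice"><!-- CentralNotice --></div>
<p>Article text</p>
<!--
NewPP limit report
Parsed by mw1008
CPU time usage: 2.217 seconds
Real time usage: 2.516 seconds
-->
"""

raw_cost, gz_cost = comment_cost(sample)
print(f"comments cost {raw_cost} raw bytes, {gz_cost} gzipped bytes")
```

Unlike repeated whitespace, the profiling report is mostly unique text, so it may account for a larger share of the gzipped size than its raw size suggests; that would need checking on real pages.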

Jdlrobson claimed this task.

It sounds like it doesn't win us much, so I'm happy to decline this task, satisfied that we've explored it. We can always reopen it if something new comes to light.