
Measured impact of SVG optimizations
Closed, Resolved (Public)

Description

In the course of wrapping up task T178867: [EPIC] Unify and optimize SVG markup across Foundation products, and to add another viewpoint on the optimization impact beyond

  • simplifying designers' and developers' work by
    • unifying SVG markup across products (having the same go-to code for copying and pasting into new products),
    • establishing an SVG coding convention and
    • providing CI tools to ensure automated feedback/process for optimized code (still ongoing)

we would also like to gather feedback on the bytes saved over the wire.

Example of prev.svg of MMV before and after the change:
Before: compression ratio: 232 %, original size: 2080 bytes, gzipped result size: 898 bytes.
After: compression ratio: 204 %, original size: 1080 bytes, gzipped result size: 530 bytes, i.e. a 41% reduction in gzipped size.
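
As a sanity check on the figures above, here is a minimal Python sketch that recomputes the compression ratios and the reduction from the reported byte counts. The function names are made up for illustration, and the ratio is assumed to mean original size divided by gzipped size.

```python
def compression_ratio_percent(original_bytes: int, gzipped_bytes: int) -> float:
    """Original size as a percentage of the gzipped size (assumed definition)."""
    return 100.0 * original_bytes / gzipped_bytes


def reduction_percent(before_bytes: int, after_bytes: int) -> float:
    """Relative size reduction from before to after, in percent."""
    return 100.0 * (before_bytes - after_bytes) / before_bytes


# Byte counts for prev.svg as reported in this task.
before_raw, before_gz = 2080, 898
after_raw, after_gz = 1080, 530

print(round(compression_ratio_percent(before_raw, before_gz)))  # 232
print(round(compression_ratio_percent(after_raw, after_gz)))    # 204
print(round(reduction_percent(before_gz, after_gz)))            # 41
print(round(reduction_percent(before_raw, after_raw)))          # 48
```

Under that assumption the 41% figure refers to the gzipped size; the uncompressed file shrank by about 48%.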

Possibly interesting numbers would be:

  • total savings down the wire of SVGs before/after in our webrequests
  • exemplified impact in article namespace on an average mobile page

Deadline:
Monday, 8 January would be exceptional, but unrealistic. We need it by Thursday, 11 January in order to make it into the official slides.

Event Timeline

Volker_E triaged this task as Medium priority.Jan 5 2018, 12:23 AM
Volker_E created this task.

To recap from my earlier IRC conversation with Volker:

There is a straightforward way to approach the first question (estimating the overall amount of data saved): looking at the daily time series of bytes transferred for SVG webrequests. Our available traffic data should capture requests to these files, and contains a response_size field which could be summed up to total bytes transferred per day. With some luck the time series will show a change pattern that is clear enough to draw conclusions. (The most thorough method for estimating the savings would be an A/B test, but I assume that is out of the question here.) What was the exact timing and scope of the deployment?
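
To make the idea of a daily time series concrete, here is a rough Python sketch of the aggregation. Only the response_size field is taken from the discussion; the sample rows, the paths, the uri_path name and the ".svg" suffix filter are assumptions for illustration, not real webrequest data.

```python
from collections import defaultdict

# Hypothetical sample rows: (date, uri_path, response_size in bytes).
rows = [
    ("2017-09-18", "/static/images/a.svg", 900),
    ("2017-09-18", "/static/images/b.svg", 1200),
    ("2017-09-18", "/wiki/Main_Page", 50000),
    ("2017-09-20", "/static/images/a.svg", 530),
    ("2017-09-20", "/static/images/b.svg", 700),
]


def daily_svg_bytes(rows):
    """Sum response_size per day over requests whose path ends in .svg."""
    totals = defaultdict(int)
    for day, uri_path, response_size in rows:
        if uri_path.lower().endswith(".svg"):
            totals[day] += response_size
    return dict(sorted(totals.items()))


print(daily_svg_bytes(rows))
# {'2017-09-18': 2100, '2017-09-20': 1230}
```

A drop in the daily totals around the deployment date would be the pattern to look for, provided the request volume stayed comparable.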

Unfortunately it is unclear in general how reliable the data from this response_size field is. (This has come up in several different contexts recently. E.g. in case of the web team's current work on PDF downloads, where we saw a relatively small PDF file - 306KB - showing up with different response sizes in the logs. Or in case of the WP0 piracy data analysis, for large files - video and audio.) That is to say that we would also need to do some experimental vetting of this data for a few example SVGs.
I'm partially OOO next week and have some competing priorities, so let's see how far we can get. CCing my fellow Readers data analysts Chelsy and Mikhail in case they are able to jump in.

@Tbayer There's something which should have been pointed out in my original description: most of the optimizations are baked into SVG delivery as data URIs via ResourceLoader in CSS.
As an example, this article has 19 SVGs loaded within the CSS and only the Wikipedia wordmark as SVG in HTML.

image.png (534×1 px, 185 KB)

If at all, those numbers are the ones we should look out for.
The MobileFrontend patch was merged on 28 August; several important RL patches were deployed on 19 September.

So these are loaded as part of the CSS files? Then it might be a bit harder to assess this change from the webrequest data. Or do we know that on mobile there's basically the same set of UI SVGs loaded on every pageview? Then perhaps we could just compare the old and new CSS directly, and multiply the difference in bytes with the number of pageviews?
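
A minimal Python sketch of that comparison, assuming we have the old and new stylesheet contents at hand. The two payloads below are tiny hypothetical stand-ins, and Python's default gzip level is an assumption that may not match what the servers actually use.

```python
import gzip


def gzipped_size(payload: bytes) -> int:
    """Size in bytes of the payload after gzip compression."""
    return len(gzip.compress(payload))


def estimated_savings(old_css: bytes, new_css: bytes, pageviews: int) -> int:
    """Gzipped byte difference between two stylesheets times the pageview count."""
    return (gzipped_size(old_css) - gzipped_size(new_css)) * pageviews


# Hypothetical stand-ins: the same icon rule with and without an XML declaration.
old_css = (
    b'.i{background:url(data:image/svg+xml,%3C?xml version="1.0" '
    b'encoding="UTF-8"?%3E%3Csvg xmlns="http://www.w3.org/2000/svg"/%3E)}'
)
new_css = b'.i{background:url(data:image/svg+xml,%3Csvg xmlns="http://www.w3.org/2000/svg"/%3E)}'

print(gzipped_size(old_css), gzipped_size(new_css))
print(estimated_savings(old_css, new_css, pageviews=1_000_000))
```

This only holds if every pageview really fetches the stylesheet over the network; browser caching would lower the real number.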

Yeah, that's correct.
As anonymous user it's currently 17 icons on an article as data URIs, with the wordmark it's 18 SVG files.
I will make an approximate calculation. Apart from that, it would still be interesting to see whether there is a dip in file size over the wire on a specific article.

@Tbayer Would you put in a multiplier for whatever sum I come up with for those 18 SVGs on a mobile article (uncached, per month)?
Another idea was to compare all VisualEditor icons, so a multiplier for only edit pages with VE seems appropriate.

I am happy to retrieve the number of mobile web pageviews - not sure about how to isolate uncached views though...

Wait, why should we be interested in cached vs. uncached requests - the number of bytes transferred is the same, right? (assuming we are talking about server-side caching, not browser caching)

https://en.m.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.cite.styles%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cskins.minerva.base.reset%2Cstyles%7Cskins.minerva.content.styles%7Cskins.minerva.icons.images%7Cskins.minerva.tablet.styles&only=styles&skin=minerva

This has 8 icons included, which I will take as the basis for an example optimization result.
Another 6 icons are dynamically loaded for the menu, which will be left out for now.

Status                                            Size          Size gzipped
before Op < 15 -o-linear-gradient hack removal    40909 bytes   9475 bytes
before SVGO & manual optimization                 39461 bytes   9412 bytes
before XML declaration removal                    36074 bytes   8961 bytes
before quotes mangling                            35642 bytes   8916 bytes
Final with quotes mangling (not yet merged)       35322 bytes   8908 bytes

9475 - 8916 = 559 bytes, or about 0.55 KB of gzipped reduction, with only 8 critical rendering path icons.
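
The subtraction above uses the "before quotes mangling" row, i.e. the state without the quotes mangling change that is not yet merged. As a rough illustration, this Python sketch lists the cumulative gzipped saving at each step, with the numbers copied from the table above.

```python
# (status, size in bytes, gzipped size in bytes), copied from the table above.
steps = [
    ("before Op < 15 -o-linear-gradient hack removal", 40909, 9475),
    ("before SVGO & manual optimization", 39461, 9412),
    ("before XML declaration removal", 36074, 8961),
    ("before quotes mangling", 35642, 8916),
    ("final with quotes mangling (not yet merged)", 35322, 8908),
]

baseline_gzipped = steps[0][2]
for status, size, gzipped in steps:
    print(f"{status}: {baseline_gzipped - gzipped} gzipped bytes saved so far")

# Saving up to the last row without the unmerged quotes mangling change.
merged_saving = baseline_gzipped - steps[3][2]
print(merged_saving)  # 559
```

Once quotes mangling is merged, the same calculation would give 9475 - 8908 = 567 bytes.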

So looking just at the above CSS URL for en.m.wikipedia, it received over 2 million requests during one recent hour, corresponding to over 1 gigabyte of traffic saved during that hour.

Because of the strong daily and weekly variation of our traffic, this shouldn't be extrapolated yet - I'm running the same query for a longer timespan to do that, will post the result here once it has completed.

This will still only cover enWP though. If someone can describe the URL format of the analogous Minerva CSS across all wikis (in a form suitable to be captured in a regular expression), we could estimate the savings overall.

SELECT SUM(1) AS all_css_reqs,
SUM(IF(uri_query = '?debug=false&lang=en&modules=ext.cite.styles%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cskins.minerva.base.reset%2Cstyles%7Cskins.minerva.content.styles%7Cskins.minerva.icons.images%7Cskins.minerva.tablet.styles&only=styles&skin=minerva', 1, 0)) AS enwp_minerva_css_reqs  
FROM wmf.webrequest 
WHERE year = 2018 AND month = 1 AND day = 12 AND hour = 0
AND content_type LIKE 'text/css%';

all_css_reqs	enwp_minerva_css_reqs
18300245	2266163
1 row selected (64.205 seconds)

559 bytes * 2266163 = 1.266... gigabytes

Based on the query result for one week (below), we can now say something like "On average, this change saves our mobile readers around 30 gigabytes of traffic per day on the English Wikipedia alone".

SELECT SUM(1) AS all_css_reqs,
SUM(IF(uri_query = '?debug=false&lang=en&modules=ext.cite.styles%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cskins.minerva.base.reset%2Cstyles%7Cskins.minerva.content.styles%7Cskins.minerva.icons.images%7Cskins.minerva.tablet.styles&only=styles&skin=minerva', 1, 0)) AS enwp_minerva_css_reqs
FROM wmf.webrequest
WHERE year = 2018 AND month = 1 AND day >= 8 AND day <= 14
AND content_type LIKE 'text/css%';

all_css_reqs    enwp_minerva_css_reqs
3922193880      373021718
1 row selected (2040.665 seconds)

(559 bytes * 373 021 718) / 7 = 29.788... gigabytes
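
For anyone who wants to redo this arithmetic with updated request counts, here is a small Python sketch. It assumes decimal gigabytes (1 GB = 10^9 bytes), which is what the figures in this task appear to use; the function name is made up for illustration.

```python
BYTES_SAVED_PER_REQUEST = 559  # gzipped saving from the CSS comparison in this task


def total_saved_gb(requests: int, saved_per_request: int = BYTES_SAVED_PER_REQUEST) -> float:
    """Total bytes saved, in decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return requests * saved_per_request / 1e9


hourly = total_saved_gb(2_266_163)           # one hour on 12 January 2018
daily_avg = total_saved_gb(373_021_718) / 7  # 8 to 14 January 2018

print(f"{hourly:.3f} GB in the sampled hour")    # 1.267 GB
print(f"{daily_avg:.3f} GB per day on average")  # 29.788 GB
```

In binary units (GiB) the daily average would come out a bit lower, at roughly 27.7.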

(This task is mostly done and the result was used in the Contributors team's quarterly check-in, but we are keeping it open to extend the result to all languages - it looks like it's easy to extend the above query using a regex - and to hold off until after some further optimization work @Volker_E is doing.)

@Tbayer During Phab Review, the team thought that this ticket could be resolved, and a new one opened for the followup work. Does that seem reasonable to you?

@JKatzWMF also wanted to check with @Volker_E that this work was, in fact, done.

@Tbayer During Phab Review, the team

What does "the team" refer to? ;)

thought that this ticket could be resolved, and a new one opened for the followup work. Does that seem reasonable to you?

Well, as noted above in T184227#3936244, the original plan was a different one. But considering that half a year later, neither the URL format requested above on January 16 (necessary to extend the result beyond enwiki) nor the request to repeat the analysis following further SVG optimization work has materialized, I think it's reasonable to close this task now as done, with the option to open a new one once either of these two happens. Especially since within the Product Analytics team's Phab Review processes as they are currently set up, the presence of such open tickets seems to cause significant distraction, with several staff (including you and me right now) repeatedly spending time just for task management purposes.

To clarify, the original aim of this task was to provide a result about the impact for use in the January Contributors quarterly check-in, which was achieved.

In T184227#4474365, @MBinder_WMF wrote:
@Tbayer During Phab Review, the team

What does "the team" refer to? ;)

The Product-Analytics team! :)

But considering that half a year later, neither the URL format requested above on January 16 (necessary to extend the result beyond enwiki) nor the request to repeat the analysis following further SVG optimization work has materialized, I think it's reasonable to close this task now as done, with the option to open a new one once either of these two happens. Especially since within the Product Analytics team's Phab Review processes as they are currently set up, the presence of such open tickets seems to cause significant distraction, with several staff (including you and me right now) repeatedly spending time just for task management purposes.

I believe this was also the logic the Product-Analytics team came to. Thanks for reviewing this! :)