In total, the year-long effort is saving 4.3 Terabytes a day of data bandwidth for our users' page views.
The above graph shows the transfer size over time. Sizes are after compression (i.e. the net bandwidth cost as perceived from a browser).
How we did it
This registry contains the metadata for all front-end features deployed on Wikipedia. It enumerates their name, currently deployed version, and their dependency relationships to other such bundles of loadable code.
I started by identifying code that was never used in practice (T202154). This included picking up unfinished or forgotten software deprecations, and removing unused compatibility code for browsers that no longer passed our Grade A feature-test. I also wrote a document about Page load performance. This document serves as reference material, enabling developers to understand the impact of various types of changes on one or more stages of the page load process.
Next, I collaborated with the engineering teams here at Wikimedia Foundation and at Wikimedia Deutschland to identify features that were using more modules than necessary. A common remedy was to bundle together parts of the same feature that are generally always downloaded together. This leaves fewer entry points that need metadata in the ResourceLoader registry.
- WMF Editing team: The WikiEditor extension now has 11 fewer modules. Another 31 modules were removed in UploadWizard. Thanks Ed Sanders, Bartosz Dziewoński, and James Forrester.
- WMF Language team: Combined 24 modules of the ContentTranslation software. Thanks Santhosh Thottingal.
- WMF Reading Web: Combined 25 modules in MobileFrontend. Thanks Stephen Niedzielski and Jon Robson.
- WMDE Community Wishlist Team: Removed 20 modules from the RevisionSlider and TwoColConflict features. Thanks Amir Sarabadani.
Last but not least, there was the Wikidata client for Wikipedia. This was an epic journey of its own (T203696). This feature started out with a whopping 248 distinct modules registered on Wikipedia page views. The magnificent efforts of Amir removed over 200 modules, bringing it down to 42 today.
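To illustrate why bundling shrinks the registry, here is a minimal sketch. The module names, version IDs, and the JSON serialisation are all hypothetical and are not ResourceLoader's actual format; the point is only that every registered module costs a name, a version, and its dependency list, so merging parts that always load together removes whole entries.

```python
import json

# Hypothetical registry: module name -> (version ID, dependencies).
# None of these names or IDs come from the real ResourceLoader registry.
registry = {
    "ext.example.core": ("1a2b3c4", []),
    "ext.example.toolbar": ("5d6e7f8", ["ext.example.core"]),
    "ext.example.dialog": ("9g0h1i2", ["ext.example.core"]),
    "site.other": ("3j4k5l6", ["ext.example.toolbar"]),
}


def bundle(registry, parts, bundle_name, version):
    """Return a new registry in which `parts` are merged into one entry."""
    parts = set(parts)
    # Dependencies between the merged parts disappear; only
    # dependencies on modules outside the bundle remain.
    external_deps = set()
    for name in parts:
        external_deps.update(d for d in registry[name][1] if d not in parts)

    bundled = {}
    for name, (ver, deps) in registry.items():
        if name in parts:
            continue
        # Point dependants at the bundle, without duplicates.
        new_deps = []
        for dep in deps:
            dep = bundle_name if dep in parts else dep
            if dep not in new_deps:
                new_deps.append(dep)
        bundled[name] = (ver, new_deps)
    bundled[bundle_name] = (version, sorted(external_deps))
    return bundled


def manifest_size(registry):
    """Size in characters of a compact JSON dump (a stand-in for the real manifest)."""
    return len(json.dumps(registry, separators=(",", ":")))


if __name__ == "__main__":
    parts = ["ext.example.core", "ext.example.toolbar", "ext.example.dialog"]
    after = bundle(registry, parts, "ext.example", "0000000")
    print(len(registry), "entries,", manifest_size(registry), "characters")
    print(len(after), "entries,", manifest_size(after), "characters")
```

The trade-off is that a bundle is always downloaded as a whole, which is why this only makes sense for parts that are generally loaded together anyway.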
The bar chart above shows small improvements throughout the year, all moving us closer to the goal. Two major drops stand out in particular. One is around two-thirds of the way, in the first week of August. This is when the aforementioned Wikidata improvement was deployed. The second drop is toward the end of the chart and happened this week – more about that below.
This week's improvement was achieved by two holistic changes that organised the data in a smarter way overall.
Second – We shrank the average size of each entry in the registry (T229245). The startup manifest contains two pieces of data for each module: its name, and its version ID. This version ID previously required 7 bytes of data. After thinking through the Birthday mathematics problem in the context of ResourceLoader, we decided that the number of possible values for our version IDs can be safely reduced from 78 billion down to "only" 60 million. For more details see the code comments, but in summary it means we're saving 2 bytes for each of the 1100 modules still in the registry. This reduces the payload by another 2-3 KB.
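The two figures can be sanity-checked with a short calculation. The sketch below assumes version IDs are drawn from a 36-character alphabet (digits plus lowercase letters), which is consistent with 36^7 ≈ 78 billion and 36^5 ≈ 60 million but is not stated above. It uses the standard birthday-problem approximation for uniformly random IDs. Which comparison actually matters for ResourceLoader is explained in the code comments mentioned above; this only illustrates the arithmetic.

```python
import math

ALPHABET_SIZE = 36  # assumption: IDs use digits 0-9 and letters a-z


def id_space(length: int) -> int:
    """Number of distinct IDs of the given length."""
    return ALPHABET_SIZE ** length


def collision_probability(samples: int, space: int) -> float:
    """Birthday-problem approximation: the chance that at least two of
    `samples` uniformly random IDs drawn from `space` values coincide."""
    return -math.expm1(-samples * (samples - 1) / (2 * space))


if __name__ == "__main__":
    for length in (7, 5):
        space = id_space(length)
        print(f"length {length}: {space:,} possible IDs")
        print(f"  two given IDs coincide:       {1 / space:.3g}")
        print(f"  any pair among 1100 coincide: "
              f"{collision_probability(1100, space):.3g}")
```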
Below is a close-up for the last few days (this is from synthetic monitoring, plotting the raw/uncompressed size):
The change was detected in ResourceLoader's synthetic monitoring. The above is captured from the Startup manifest size dashboard on our public Grafana instance, showing a 2.8 KB decrease in the uncompressed data stream.
With this week's deployment, we've completed the goal of shrinking the startup manifest to under 28 KB. This cross-departmental and cross-organisational project reduced the startup manifest by 9 KB overall (net bandwidth, after compression): from 36.2 KB one year ago, down to 27.2 KB today.
We have around 363,000 page views a minute in total on Wikipedia and sister projects. That's 21.8M an hour, or 523 million every day (User pageview stats). This week's deployment saves around 1.4 Terabytes a day. In total, the year-long effort is saving 4.3 Terabytes a day of bandwidth on our users' page views.
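The page-view arithmetic above can be reproduced with a short sketch. It assumes the manifest is transferred once per page view and that "Terabytes" is read as binary units (2^40 bytes). Neither assumption is stated explicitly in the post, and the function names are made up for illustration.

```python
VIEWS_PER_MINUTE = 363_000  # figure quoted in the post
KIB = 1024
TIB = 1024 ** 4  # assumption: "Terabytes" means binary units


def views_per_day(per_minute: int) -> int:
    """Page views per day, given page views per minute."""
    return per_minute * 60 * 24


def daily_saving_tib(saved_kib_per_view: float,
                     per_minute: int = VIEWS_PER_MINUTE) -> float:
    """Daily bandwidth saving, assuming one manifest transfer per page view."""
    saved_bytes = saved_kib_per_view * KIB * views_per_day(per_minute)
    return saved_bytes / TIB


if __name__ == "__main__":
    print(f"{views_per_day(VIEWS_PER_MINUTE):,} page views a day")
    # 9 KB is the year-long reduction after compression (36.2 KB to 27.2 KB).
    print(f"{daily_saving_tib(9.0):.2f} TiB saved a day")
```

Under these assumptions the year-long 9 KB reduction works out to a little under 4.4 TiB a day, in line with the figure quoted above. Read in decimal units instead, the same product is about 4.7 TB.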
It's great to celebrate that Wikipedia's startup payload now neatly fits into the target budget of 28 KB – chosen as the lowest multiple of 14 KB we can fit within subsequent bursts of Internet packets to a web browser.
The challenge going forward will be to keep us there. Over the past year I've kept a very close eye (spreadsheet) on the startup manifest — to verify our progress, and to identify potential regressions. I've since automated this laborious process through a public Grafana dashboard.
We still have many more opportunities on that dashboard to improve bundling of our features, and (for the Performance Team) to make it even easier to implement such bundling. I hope these ongoing improvements will come in handy whilst we work on finding room in our performance budget for upcoming features.
– Timo Tijhof
Interesting. Thanks for sharing this. I wonder if this is an area that would benefit from more aggressive compression. The data is cached from what I understand, so it doesn't have to be compressed on the fly, and getting as much as possible into that initial window seems important.
To answer my own question, I did a quick test. Currently it's 26751 bytes compressed. Super aggressive gzip (zopfli) could in theory bring that down to 25532 bytes, for a saving of 1219 bytes. More sane would be brotli, which could bring it down to 23763 bytes, for a saving of 2988 bytes.