This week saw the conclusion of a project that I've been shepherding on and off since September of last year. The goal was for the initialisation of our asynchronous JavaScript pipeline (at the time, 36 kilobytes in size) to fit within a budget of 28 KB – the size of two 14 KB bursts of Internet packets.
In total, the year-long effort is saving 4.3 terabytes a day of bandwidth on our users' page views.
The above graph shows the transfer size over time. Sizes are after compression (i.e. the net bandwidth cost as perceived from a browser).
How we did it
The startup manifest is a difficult payload to optimise. The vast majority of its code isn't functional logic that can be optimised by traditional means. Rather, it is almost entirely made of pure data. The data is auto-generated by ResourceLoader and represents the registry of module bundles. (ResourceLoader is the delivery system Wikipedia uses for its JavaScript, CSS, and interface text.)
This registry contains the metadata for all front-end features deployed on Wikipedia. It enumerates each bundle's name, its currently deployed version, and its dependency relationships to other such bundles of loadable code.
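To give a sense of what that data looks like, here is a simplified sketch of the kind of registration call the startup manifest makes. The module names and version hashes are made up, and the real format is more compact (for example, dependencies are encoded as array indices rather than names):

```js
// Simplified sketch of the startup manifest's registry (illustrative only).
// Each entry: [ module name, version hash, dependencies ]
mw.loader.register( [
	[ 'jquery', 'v5x21', [] ],
	[ 'mediawiki.util', '0q3zx', [ 'jquery' ] ],
	[ 'ext.myFeature', '8kd2p', [ 'mediawiki.util' ] ]
] );
```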
I started by identifying code that was never used in practice (T202154). This included picking up unfinished or forgotten software deprecations, and removing unused compatibility code for browsers that no longer passed our Grade A feature-test. I also wrote a document about Page load performance. This document serves as reference material, enabling developers to understand the impact of various types of changes on one or more stages of the page load process.
Fewer modules
Next, I collaborated with the engineering teams here at the Wikimedia Foundation and at Wikimedia Deutschland to identify features that were using more modules than necessary – for example, by bundling together parts of the same feature that are generally downloaded together anyway. This leaves fewer entry points to carry metadata for in the ResourceLoader registry.
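As a rough illustration of what such bundling looks like in an extension's extension.json (the module and file names here are hypothetical), scripts that previously shipped as three separate modules – each costing a registry entry in the startup manifest – can be declared as a single module:

```json
{
	"ResourceModules": {
		"ext.myFeature": {
			"scripts": [
				"init.js",
				"dialog.js",
				"preview.js"
			],
			"styles": [ "myFeature.less" ],
			"dependencies": [ "mediawiki.api" ]
		}
	}
}
```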
Some highlights:
- WMF Editing team: The WikiEditor extension now has 11 fewer modules. Another 31 modules were removed in UploadWizard. Thanks Ed Sanders, Bartosz Dziewoński, and James Forrester.
- WMF Language team: Combined 24 modules of the ContentTranslation software. Thanks Santhosh Thottingal.
- WMF Reading Web: Combined 25 modules in MobileFrontend. Thanks Stephen Niedzielski and Jon Robson.
- WMDE Community Wishlist Team: Removed 20 modules from the RevisionSlider and TwoColConflict features. Thanks Rosalie Perside, Jakob Warkotsch, and Amir Sarabadani.
Last but not least, there was the Wikidata client for Wikipedia. This was an epic journey of its own (T203696). This feature started out with a whopping 248 distinct modules registered on Wikipedia page views. The magnificent efforts of WMDE removed over 200 modules, bringing it down to 42 today.
The bar chart above shows small improvements throughout the year, all moving us closer to the goal. Two major drops stand out in particular. One is around two-thirds of the way through, in the first week of August. This is when the aforementioned Wikidata improvement was deployed. The second drop is toward the end of the chart and happened this week – more about that below.
Less metadata
This week's improvement was achieved by two holistic changes that organised the data in a smarter way overall.
First – The EventLogging extension previously shipped its schema metadata as part of the startup manifest. Roan Kattouw (Growth Team) refactored this mechanism to instead bundle the schema metadata together with the JavaScript code of the EventLogging client. This means the startup footprint of EventLogging was reduced by over 90%. That's 2 KB less metadata in the critical path! It also means that, going forward, the startup cost of EventLogging no longer grows with each new event instrumentation. This clever bundling is powered by ResourceLoader's new Package files feature, which was expedited in February 2019 in part because of its potential to reduce the number of modules in our registry. Package files make it super easy to combine generated data with JavaScript code in a single module bundle.
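As a minimal sketch of the general pattern (not the actual EventLogging configuration; the module, file, and callback names below are hypothetical), a package-files module in extension.json can combine a server-generated JSON file with regular scripts:

```json
{
	"ResourceModules": {
		"ext.myFeature": {
			"localBasePath": "modules",
			"remoteExtPath": "MyExtension/modules",
			"packageFiles": [
				"index.js",
				{
					"name": "schemas.json",
					"callback": "MyExtension\\Hooks::getSchemaData"
				}
			]
		}
	}
}
```

The generated schemas.json then travels inside the feature's own bundle, fetched only when the feature is used, instead of adding weight to the startup manifest on every page view.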
Second – We shrank the average size of each entry in the registry (T229245). The startup manifest contains two pieces of data for each module: its name, and its version ID. This version ID previously required 7 bytes of data. After thinking through the birthday problem in the context of ResourceLoader, we decided that the space of possible version IDs can be safely reduced from 78 billion down to "only" 60 million. For more details see the code comments, but in summary it means we're saving 2 bytes for each of the 1100 modules still in the registry, reducing the payload by another 2-3 KB.
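As a back-of-the-envelope sketch of that reasoning (the real analysis lives in the linked code comments and is narrower, since a hash only has to differ from earlier versions of the same module), the standard birthday approximation gives a feel for the collision risk at each hash length, assuming base-36 version hashes:

```js
// Birthday approximation: chance that any two of k random hashes collide,
// when each hash has d equally likely values.
function collisionProbability( k, d ) {
	return 1 - Math.exp( -k * ( k - 1 ) / ( 2 * d ) );
}

const modules = 1100;
const before = Math.pow( 36, 7 ); // 7 base-36 chars ≈ 78 billion possible IDs
const after = Math.pow( 36, 5 );  // 5 base-36 chars ≈ 60 million possible IDs

console.log( collisionProbability( modules, before ) ); // ≈ 0.0000077
console.log( collisionProbability( modules, after ) );  // ≈ 0.0099
// Dropping 2 characters from ~1100 entries saves roughly 2.2 KB of manifest data.
```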
Below is a close-up for the last few days (this is from synthetic monitoring, plotting the raw/uncompressed size):
The change was detected in ResourceLoader's synthetic monitoring. The above is captured from the Startup manifest size dashboard on our public Grafana instance, showing a 2.8 KB decrease in the uncompressed data stream.
With this week's deployment, we've completed the goal of shrinking the startup manifest to under 28 KB. This cross-departmental and cross-organisational project reduced the startup manifest by 9 KB overall (net bandwidth, after compression): from 36.2 KB one year ago, down to 27.2 KB today.
We have around 363,000 page views a minute in total on Wikipedia and its sister projects. That's 21.8 million an hour, or 523 million every day (User pageview stats). This week's deployment saves around 1.4 terabytes a day. In total, the year-long effort is saving 4.3 terabytes a day of bandwidth on our users' page views.
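The rough arithmetic behind this week's figure (approximate, since the per-view saving varies with compression and caching):

```js
// Back-of-the-envelope for this week's saving (illustrative only).
const pageViewsPerDay = 363000 * 60 * 24; // ≈ 523 million page views per day
const savedPerView = 2.7 * 1000;          // assumed ≈ 2.7 KB saved per view after compression

const savedPerDay = pageViewsPerDay * savedPerView;
console.log( ( savedPerDay / 1e12 ).toFixed( 1 ) + ' TB/day' ); // ≈ 1.4 TB/day
```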
What's next
It's great to celebrate that Wikipedia's startup payload now neatly fits into the target budget of 28 KB – chosen as the lowest multiple of 14 KB we can fit within subsequent bursts of Internet packets to a web browser.
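For context on where the 14 KB figure comes from (see also "The 14KB Initial Window" linked below), a typical TCP connection may send about ten packets in its initial burst before waiting for an acknowledgement:

```js
// Approximate origin of the 14 KB figure (typical values, not a strict limit).
const initialWindowSegments = 10;  // common TCP initial congestion window (RFC 6928)
const payloadPerSegment = 1460;    // bytes per packet after IP/TCP headers (1500-byte MTU)

console.log( initialWindowSegments * payloadPerSegment ); // 14600 bytes ≈ 14 KB
// Two such bursts give the 28 KB budget that the startup manifest now fits in.
```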
The challenge going forward will be to keep us there. Over the past year I've kept a very close eye (spreadsheet) on the startup manifest — to verify our progress, and to identify potential regressions. I've since automated this laborious process through a public Grafana dashboard.
We still have many more opportunities on that dashboard to improve the bundling of our features, and (for the Performance Team) to make it even easier to implement such bundling. I hope these ongoing improvements will come in handy whilst we work on finding room in our performance budget for upcoming features.
– Timo Tijhof
Further reading:
- Metrics & Perf reports, on performance.wikimedia.org
- ResourceLoader Architecture, on mediawiki.org.
- The 14KB Initial Window, by Tyler Cipriani
Event Timeline
Interesting. Thanks for sharing this. I wonder if this is an area that would benefit from more aggressive compression. The data is cached from what I understand, so it doesn't have to be compressed on the fly, and getting as much as possible in that initial window seems important.
To answer my own question, I did a quick test: currently it's 26751 bytes compressed. Super aggressive gzip (zopfli) could in theory bring that down to 25532 bytes, for a saving of 1219 bytes. More sane would be brotli, which could bring it down to 23763 bytes, for a saving of 2988 bytes.
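For anyone who wants to reproduce a comparison along these lines, here is a minimal Node.js sketch (assuming startup.js is a locally saved copy of the manifest response; zopfli needs a separate tool, so this only compares standard gzip against Brotli):

```js
// Compare gzip (max level) and Brotli (max quality) sizes for a saved response.
const fs = require( 'fs' );
const zlib = require( 'zlib' );

const input = fs.readFileSync( 'startup.js' );

const gzipped = zlib.gzipSync( input, { level: 9 } );
const brotlied = zlib.brotliCompressSync( input, {
	params: { [ zlib.constants.BROTLI_PARAM_QUALITY ]: 11 }
} );

console.log( 'original :', input.length, 'bytes' );
console.log( 'gzip -9  :', gzipped.length, 'bytes' );
console.log( 'brotli 11:', brotlied.length, 'bytes' );
```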