The goals around this test have been documented at https://www.mediawiki.org/wiki/Performance_budgeting
Background
The web team enforces a performance budget through a PHPUnit test in the Vector skin:
https://github.com/wikimedia/mediawiki-skins-Vector/blob/master/tests/phpunit/structure/PerformanceBudgetTest.php
This has been successful at catching unintentional performance regressions to the main namespace (T350514), but it has also thrown up false positives, particularly because some projects run different extensions (e.g. Wikibase / FlaggedRevisions). Examples: T360102, T350338
It also runs on third-party extensions, which may want to provide different performance budgets or not monitor performance at all (T358432#9578189).
After talking to @Jdforrester-WMF we think this can evolve into a new type of test.
Requirements
- The budget can be changed for different setups. For example, projects with FlaggedRevisions should be able to define a slightly larger budget. [This is possible because each extension defines its own bundlesize.config.json file.]
- The code should live in core or its own dedicated extension.
- The code should run on all Wikimedia-deployed extensions
- Extension maintainers can choose to not track the exact size of bundles if they wish
- It should not be possible to skip tests when the budget is exceeded - when a test fails the only allowable action should be to discuss increasing the budget or optimizing the code that triggered the failure.
- Errors should only be triggered on patches that exceed the budget. We've had cases where skins/extensions have bypassed the budget, have been unable to merge code, and caused CI issues in extensions such as GrowthExperiments and Charts where no changes to JS/CSS have been made (see discussions in T373017). [Note: CodexModules and SkinModule modules (to some extent) remain the exception to the rule here.]
- The test should only run for anonymous page views.
- The test should prevent extensions adding large JS/CSS assets in Wikimedia production to the article namespace without conversation. For example, OOUI is currently not loaded on page load; if it were, we'd expect to see an associated large increase in bytes of CSS/JS shipped, which would warrant discussion. This should happen during the code review process, because once code is merged it is often too late to roll back, and the code can end up in production and in our HTML caches. We have seen this in two high-profile incidents so far: Graphs (when fallback images were removed) and the rollout of Phonos.
- It should be possible to monitor the size of compressed bundles
- It should be possible to monitor the size of uncompressed bundles, as this tends to contribute to visual completion and responsiveness
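To make the requirements above concrete, here is a minimal sketch of one way the check could behave. It is not the existing Vector test and not a proposed implementation. The config keys (`module`, `maxUncompressed`, `maxCompressed`) and the function names are hypothetical, chosen for illustration only; they are not the real bundlesize.config.json schema. The sketch assumes a limit of `null` means "do not track this size", and it reads "errors only on patches that exceed the budget" as "the patch grew the bundle and the result is over the limit", which is one possible interpretation.

```python
import gzip
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Budget:
    # Hypothetical shape of one budget entry; sizes are in bytes.
    module: str
    max_uncompressed: Optional[int]  # None: maintainer opted out of tracking
    max_compressed: Optional[int]    # None: maintainer opted out of tracking


def load_budgets(config_text: str) -> List[Budget]:
    """Parse a per-extension budget file (hypothetical JSON keys)."""
    return [
        Budget(e["module"], e.get("maxUncompressed"), e.get("maxCompressed"))
        for e in json.loads(config_text)
    ]


def measure(content: bytes) -> Tuple[int, int]:
    """Return (uncompressed, gzip-compressed) size in bytes."""
    return len(content), len(gzip.compress(content))


def check_patch(budget: Budget, before: bytes, after: bytes) -> List[str]:
    """Report budget violations introduced by a patch.

    A size only fails when it is over the limit AND larger than it was
    before the patch, so a patch that leaves the bundle unchanged never
    fails, even if another repository already pushed it over budget.
    """
    errors = []
    labels = ("uncompressed", "compressed")
    limits = (budget.max_uncompressed, budget.max_compressed)
    for label, limit, old, new in zip(labels, limits, measure(before), measure(after)):
        if limit is None:
            continue  # this size is not tracked for this module
        if new > limit and new > old:
            errors.append(
                f"{budget.module}: {label} size {new} B exceeds budget {limit} B"
            )
    return errors
```

With a budget of 150 uncompressed bytes, a patch that grows a bundle from 100 to 200 bytes would be reported, while a patch that leaves an already over-budget 200-byte bundle untouched would pass. There is deliberately no skip flag: the only ways to clear an error are to shrink the bundle or to change the budget file, and changing the budget file is visible in code review.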
Descoped
The following requirements have not been requested so far, even though the test has been enabled for several months now. Feel free to create a ticket if these impact you and further work is needed in this field:
- It should be possible to disable the test in certain setups e.g. SemanticMediaWiki should be able to disable the test
- Different projects should be able to define different budgets for different circumstances - for example Wikidata should be able to define a performance budget for the item namespace (e.g. Q1)
Notes
From chat with @Jdforrester-WMF
- The code could live in core
- Code could use a bespoke Quibble-buster-performance-test
- Could use fresnel