
Include module bundle size in RL Graphite stats
Closed, Resolved, Public

Description

Prior art:

We have a per-module breakdown of backend cache-miss build latency in production, a cache-hit/cache-miss rate, and a per-module breakdown of startup allocation cost in bytes for module bundle metadata.

What we don't yet have is a breakdown of the size of the module output itself.

High level approaches to consider:

  1. Measure on demand when building a module during a cache miss in production, from live load.php requests. This would match what we do today for measuring latencies. It makes sense for latencies and cache hit/miss ratios, as those must reflect what actually happens in production in order to be useful: they vary by current database load and memcached effectiveness, and may vary by wiki configuration, skin, and user language preference.

I looked into this and steered away from it because the size measurement would vary too much by request context and user preferences. The exact size for any given user context can be measured directly by developers through a trivial inspection in DevTools or with curl. The dashboard I imagine would be more useful if it gave a stable approximation that allows like-for-like comparison. In particular, there is variance between modules loaded via stylesheet requests and via JS responses.

One of the counter-intuitive things about caching is that the more popular something is, the less often you have a cache miss. (For example, Thumbor, our thumbnailing service, is known to be primarily a 404 generator that occasionally produces an image, because images are strongly cached whereas 404s are revalidated more often.)

This means that the size numbers would be disproportionately skewed by rarely requested contexts that may produce larger variants. One way to avoid that would be to only send stats to Graphite if the current request is "canonical", e.g. a stylesheet request for a styles module, with the wiki's default skin and language. This, however, means that modules that are primarily lazy-loaded may get the right metrics only very irregularly, or never. It's not uncommon for a styles module to be requested via JS.
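To make the "canonical" idea concrete, here is a minimal sketch of such a check. It is not the MediaWiki implementation: `RequestContext`, `WikiDefaults`, and `is_canonical` are hypothetical names, and the rule that a styles module is canonical only under `only=styles` (and any other module only under a combined response) is an assumption drawn from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical view of the load.php parameters that affect output size."""
    only: str  # "styles", "scripts", or "" for a combined response
    skin: str
    lang: str


@dataclass(frozen=True)
class WikiDefaults:
    """Hypothetical holder for the wiki's default skin and language."""
    skin: str
    lang: str


def is_canonical(ctx: RequestContext, defaults: WikiDefaults,
                 module_is_styles_only: bool) -> bool:
    """Return True if a size measured in this context is comparable over time.

    Assumption: styles modules are canonical when requested as a stylesheet,
    other modules when requested as a combined response; both must use the
    wiki's default skin and language.
    """
    expected_only = "styles" if module_is_styles_only else ""
    return (
        ctx.only == expected_only
        and ctx.skin == defaults.skin
        and ctx.lang == defaults.lang
    )
```

Under this rule, a styles module that is only ever lazy-loaded via JS never passes the check, which is exactly the gap described above.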

  2. Measure synthetically from a cronjob. This matches what we do today for startup allocation cost, because approximating that is relatively expensive (so best deferred), doesn't vary by factors that require live webserver context, and may benefit from being measured at a low but consistent rate from a cronjob, so that the size allocation has a predictable measurement frequency.

This would be a bit more code upfront, but would then give a stable comparison over time, with the drawback that it would not perfectly capture every variant of a module bundle.
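The synthetic approach could be sketched roughly as follows. This is an illustration only, not the code in blameStartupRegistry.php: `measure_bundle_sizes` and the `build` callable are hypothetical, and the choice to count uncompressed UTF-8 bytes is an assumption. The point is that every module is built in one fixed context on every run, so the numbers are comparable over time.

```python
from typing import Callable, Dict, Iterable


def measure_bundle_sizes(
    module_names: Iterable[str],
    build: Callable[[str], str],
) -> Dict[str, int]:
    """Build each module once in a fixed context and record its size.

    `build` is a hypothetical stand-in for whatever produces the module
    response for the default skin and language. Sizes are uncompressed
    UTF-8 byte counts (an assumption, not a statement about the patch).
    """
    sizes: Dict[str, int] = {}
    for name in module_names:
        bundle = build(name)
        # Count bytes rather than characters, since non-ASCII content
        # takes more than one byte per character on the wire.
        sizes[name] = len(bundle.encode("utf-8"))
    return sizes


def fake_build(name: str) -> str:
    """Toy builder with made-up module names, used only for the example."""
    bundles = {
        "example.small": "a{}",
        "example.large": "console.log('é');",
    }
    return bundles[name]


if __name__ == "__main__":
    for module, size in measure_bundle_sizes(
        ["example.small", "example.large"], fake_build
    ).items():
        print(module, size)
```

A cronjob would run this at a fixed interval and forward each value to the stats backend. The forwarding step is left out here because it depends on the deployment.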

Event Timeline

Change 737799 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Slight refactor to prep for non-startup metrics

https://gerrit.wikimedia.org/r/737799

Change 737828 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@master] [WIP] blameStartupRegistry: Add module response size metrics

https://gerrit.wikimedia.org/r/737828

Change 737799 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Slight refactor to prep for non-startup metrics

https://gerrit.wikimedia.org/r/737799

Change 737828 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Add module response size metrics

https://gerrit.wikimedia.org/r/737828

Change 746948 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/746948

Change 746948 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/746948

Change 746921 had a related patch set uploaded (by Ladsgroup; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@wmf/1.38.0-wmf.13] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/746921

Change 746921 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@wmf/1.38.0-wmf.13] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/746921

Change 747472 had a related patch set uploaded (by Ladsgroup; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@wmf/1.38.0-wmf.12] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/747472

Change 747472 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@wmf/1.38.0-wmf.12] blameStartupRegistry: Fix clash in $startupBytes variable name

https://gerrit.wikimedia.org/r/747472

Mentioned in SAL (#wikimedia-operations) [2021-12-15T14:15:31Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/extensions/WikimediaMaintenance/blameStartupRegistry.php: Backport: [[gerrit:747472|blameStartupRegistry: Fix clash in $startupBytes variable name (T295413)]] (duration: 01m 07s)

This is done.

Bundle sizes of largest modules:
https://grafana.wikimedia.org/d/000000430/resourceloader-modules-overview

Bundle size for a specific module. This one has an optional filter by component to quickly find the modules of any given extension or skin.
https://grafana.wikimedia.org/d/Zx5m6iT7k/resourceloader-bundle-size

Change 747651 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Escape dots in component name

https://gerrit.wikimedia.org/r/747651

Change 747651 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] blameStartupRegistry: Escape dots in stats component name

https://gerrit.wikimedia.org/r/747651
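As background for the last change: Graphite treats dots in a metric name as hierarchy separators, so a component or module name that contains dots would be split into several nested nodes unless it is escaped first. A minimal sketch of that kind of escaping follows. The `metric_key` helper, the underscore replacement character, and the key layout are all assumptions for illustration, not what the patch does.

```python
def metric_key(prefix: str, component: str, module: str) -> str:
    """Join metric path segments, escaping dots inside each segment.

    Assumption: dots are replaced with underscores and the layout is
    <prefix>.<component>.<module>; the real key layout may differ.
    """
    safe_segments = [segment.replace(".", "_") for segment in (component, module)]
    return ".".join([prefix, *safe_segments])


if __name__ == "__main__":
    # Hypothetical names; without escaping, this key would have five
    # path levels instead of three.
    print(metric_key("bundle_size", "Some.Extension", "ext.example"))
```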