
Per component/extension profiling of hooks and pre-send DeferredUpdates with Grafana dashboards
Open, High, Public

Description

Related T212482

To diagnose save timing problems more quickly, it would help to have runtime breakdowns by component/extension (e.g. FlaggedRevs, Echo). This could go beyond save timing, but should be segregated by entry point (save, upload, login, ...). Save timing would be a good start in any case. The idea is to make large increases in runtime attributed to a given module easy to spot, so that the appropriate maintainer can be notified and RelEng has a quick indication of what to revert. Right now this is very difficult.

Perhaps we can have per-wiki breakdowns for the top wikis.
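
A minimal sketch of the kind of breakdown described above, not MediaWiki code: it accumulates wall-clock time per (entry point, component) pair and renders the totals as per-wiki metric keys. The class name `ComponentProfiler`, the `measure` and `metrics` methods, and the `<wiki>.<entry point>.<component>` key layout are all assumptions made for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ComponentProfiler:
    """Hypothetical helper: sums wall-clock time per (entry point, component)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the profiler can be tested deterministically.
        self._clock = clock
        self._totals = defaultdict(float)

    @contextmanager
    def measure(self, entry_point, component):
        """Time the enclosed block and attribute it to the given component."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the handler raised.
            self._totals[(entry_point, component)] += self._clock() - start

    def metrics(self, wiki):
        """Return totals in milliseconds, keyed by an assumed metric name layout."""
        return {
            f"{wiki}.{entry_point}.{component}": seconds * 1000.0
            for (entry_point, component), seconds in self._totals.items()
        }


profiler = ComponentProfiler()
with profiler.measure("save", "FlaggedRevs"):
    pass  # a hook handler or pre-send deferred update would run here
save_metrics = profiler.metrics("enwiki")
```

Keying on the entry point first keeps save, upload and login timings separate, so a regression in one extension shows up against the entry point it actually affects.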

A related concern is having leaderboards for templates and Lua modules, but that is a separate task.

Event Timeline

This mechanism could also be used to enrich exception traces in Logstash with a string naming the extension, if the exception came from a hook handler. For example, the last string in the array that this feature would use could be pulled in from Logger/WikiProcessor in some way.
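
The idea in the comment above could look roughly like the following sketch. It assumes the feature keeps a stack (a plain list here) of component names while hook handlers run, and that the innermost name is attached to the exception before the stack unwinds. `run_handler`, `enrich_record` and the `component` field are hypothetical names, not existing MediaWiki or Logstash APIs.

```python
def run_handler(stack, component, handler, *args):
    """Run a hook handler while its component name sits on top of the stack."""
    stack.append(component)
    try:
        return handler(*args)
    except Exception as exc:
        # Tag only once, so the innermost (most specific) handler wins.
        if not hasattr(exc, "component"):
            exc.component = stack[-1]
        raise
    finally:
        stack.pop()


def enrich_record(record, exc):
    """Return a copy of a log record with the component name added, if known."""
    enriched = dict(record)
    component = getattr(exc, "component", None)
    if component is not None:
        enriched["component"] = component
    return enriched
```

Tagging the exception at raise time matters because the stack has already been popped by the time a log processor sees the exception.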

Krinkle triaged this task as High priority. Jul 10 2019, 3:36 PM

We now have dedicated PreSend and PostSend flame graphs published daily, which cover a fair bit of the original use case for this task.

https://performance.wikimedia.org/php-profiling/

The original idea of sending this to Grafana (e.g. via StatsD or Prometheus) is less needed now, but we're recycling this task for tracking Save Timing perf budgets, where we do specifically want to plot by higher-level components.
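
As a rough illustration of plotting by higher-level components against a budget, the sketch below rolls individual timings up into components and reports those over budget. The function name `over_budget`, the update names, the mapping and the budget figures are all made up for the example and do not reflect real measurements or agreed budgets.

```python
def over_budget(timings_ms, component_of, budgets_ms):
    """Sum timings per higher-level component and return those exceeding a budget.

    timings_ms: iterable of (update_name, milliseconds) pairs
    component_of: mapping from update name to component; unknown names go to "other"
    budgets_ms: mapping from component to its budget in milliseconds
    """
    totals = {}
    for name, ms in timings_ms:
        component = component_of.get(name, "other")
        totals[component] = totals.get(component, 0.0) + ms
    # Components without a budget are not reported.
    return {
        component: total
        for component, total in totals.items()
        if component in budgets_ms and total > budgets_ms[component]
    }


# Hypothetical inputs, for illustration only.
timings = [("ExampleNotifyUpdate", 30.0), ("ExampleCacheUpdate", 25.0), ("ExampleCoreUpdate", 40.0)]
mapping = {"ExampleNotifyUpdate": "Echo", "ExampleCacheUpdate": "Echo", "ExampleCoreUpdate": "core"}
budgets = {"Echo": 50.0, "core": 100.0}
violations = over_budget(timings, mapping, budgets)
```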

Change 698947 had a related patch set uploaded (by Krinkle; author: Kosta Harlan):

[mediawiki/core@master] DeferredUpdates: Log execution time for updates

https://gerrit.wikimedia.org/r/698947

Change 698947 merged by jenkins-bot:

[mediawiki/core@master] DeferredUpdates: Log execution time for updates

https://gerrit.wikimedia.org/r/698947