
Record long tasks in navtiming
Closed, Resolved (Public)

Description

Collect CPU long task information from our real users in the navigation timing extension. A long task is a CPU task that occupies the browser's main thread for more than 50 ms.

In our synthetic tests, where we use Chrome, we already collect CPU long task information: the number of long tasks and the total time spent in long tasks. We also count the number of tasks before first paint and before load event end.

The metrics that we get from real users do not include any long task data at all, so a first step could be to collect the number of long tasks and the total time spent in long tasks. That way we would start to get a feel for how many long tasks our real users experience and see how stable the metric is; maybe we can use it for alerting. One thing to remember is that long tasks depend on the user's hardware, so the numbers will differ depending on the device the user is using.

Long tasks will continue to happen throughout the user journey, but as a first implementation we can collect the long tasks that happen before we beacon back the navigation timing data.
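To make the proposed aggregation concrete, here is a minimal sketch of the arithmetic in Python. It is not the extension's code (that runs in the browser); `TaskEntry`, `summarize_long_tasks` and `beacon_time` are hypothetical names used only for illustration, and the choice to count only tasks that have finished before the beacon is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# From the task description: a long task takes more than 50 ms on the main thread.
LONG_TASK_THRESHOLD_MS = 50.0


@dataclass(frozen=True)
class TaskEntry:
    """Hypothetical stand-in for one main-thread task (times in milliseconds)."""
    start_time: float
    duration: float


def summarize_long_tasks(
    entries: Iterable[TaskEntry], beacon_time: float
) -> Tuple[int, float]:
    """Return (number of long tasks, total time spent in them).

    Only tasks longer than the threshold that finished at or before
    beacon_time are counted (assumption: unfinished tasks are not reported).
    """
    count = 0
    total_duration = 0.0
    for entry in entries:
        is_long = entry.duration > LONG_TASK_THRESHOLD_MS
        finished_before_beacon = entry.start_time + entry.duration <= beacon_time
        if is_long and finished_before_beacon:
            count += 1
            total_duration += entry.duration
    return count, total_duration
```

With tasks of 30 ms, 60 ms and 120 ms finishing before the beacon and a 200 ms task still running when it is sent, this yields a count of 2 and a total of 180 ms. The two numbers are the only values that would need to travel with the beacon.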

To collect the metric we need to do a couple of things (you can see the full picture at https://wikitech.wikimedia.org/wiki/Performance#/media/File:WMF_Performance_Team_infrastructure_2022.png):

  1. Add the collection of the actual metric to the navigation timing extension https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/NavigationTiming/+/refs/heads/master - you need to set up MediaWiki, make the changes in the extension, add tests for them and run the tests.
  2. Then we need to make sure that the data is stored in the schema https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/legacy/navigationtiming/ so we need to add the missing fields there.
  3. Then we need to process the data when it arrives and send it to Graphite/Prometheus. That happens in navtiming.py: https://gerrit.wikimedia.org/r/plugins/gitiles/performance/navtiming/
  4. Once the data has started to arrive, we can make a new dashboard/graph in Grafana where we can look at the new metric.
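As an illustration of the third step, the sketch below turns one incoming event into metric samples. It is not taken from navtiming.py: the field names `longTaskTotalTasks` and `longTaskTotalDuration` and both metric names are assumptions, as is the conversion from milliseconds to seconds (a common Prometheus convention).

```python
from typing import Any, Dict, List, Tuple


def _is_non_negative_number(value: Any) -> bool:
    """True for ints and floats >= 0; bools and missing values are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value >= 0


def longtask_samples(event: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Map one event to (metric name, value) pairs.

    Returns an empty list when the long task fields are absent or invalid,
    for example when the browser did not report any long task data.
    """
    count = event.get("longTaskTotalTasks")  # assumed field name
    duration_ms = event.get("longTaskTotalDuration")  # assumed field name
    if not _is_non_negative_number(count):
        return []
    if not _is_non_negative_number(duration_ms):
        return []
    return [
        ("navtiming_longtask_total", float(count)),  # hypothetical metric name
        ("navtiming_longtask_duration_seconds", duration_ms / 1000.0),
    ]
```

Returning an empty list for events without the fields keeps users on browsers that lack long task support out of the aggregate, rather than recording them as zero long tasks.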

Event Timeline

Change 631432 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Capture long tasks

https://gerrit.wikimedia.org/r/631432

Gilles triaged this task as Medium priority. Oct 1 2020, 11:57 AM

FYI: Some time ago I implemented the buffered flag in Browsertime, but I had to roll back to the old solution (inject the observer when the page starts to load) because I could see that the buffered flag missed long tasks. Also see https://twitter.com/Joseph_Wynn/status/1310755139704074240 and https://bugs.chromium.org/p/chromium/issues/detail?id=1131385&q=long%20task%20buffered&can=2

The buffered flag works fine for me: I wrote a test case in the patch where the long task happens before the observer is set up, and the observer catches it as expected. We just can't run that test on our CI infrastructure yet because it runs an "old" Chromium (73).

Change 631432 abandoned by Krinkle:

[mediawiki/extensions/NavigationTiming@master] Capture long tasks

Reason:

Closing for now until we're ready to pick up the task and the subsequent research and analysis.

https://gerrit.wikimedia.org/r/631432

Peter removed a subscriber: Aklapper.

I think we should continue with long tasks, but also keep in mind that an implementation of Long Animation Frames (LoAF, https://github.com/w3c/longtasks/blob/loaf-explainer/loaf-explainer.md) is coming. Firefox has stopped its implementation to wait and see what happens with LoAF.

Change 896370 had a related patch set uploaded (by Barakat Ajadi; author: Barakat Ajadi):

[schemas/event/secondary@master] Navtiming: Add total longtask and total longtask duration

https://gerrit.wikimedia.org/r/896370

Change 896370 merged by jenkins-bot:

[schemas/event/secondary@master] Navtiming: Add total longtask and total longtask duration

https://gerrit.wikimedia.org/r/896370

Change 898771 had a related patch set uploaded (by Barakat Ajadi; author: Barakat Ajadi):

[performance/navtiming@master] Navtiming: Collect longtask data and send to prometheus

https://gerrit.wikimedia.org/r/898771

Change 901174 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[mediawiki/extensions/NavigationTiming@master] Disconnect the observer when we collect the metrics.

https://gerrit.wikimedia.org/r/901174

Change 898771 merged by jenkins-bot:

[performance/navtiming@master] Navtiming: Collect longtask data and send to prometheus

https://gerrit.wikimedia.org/r/898771

Change 901174 merged by jenkins-bot:

[mediawiki/extensions/NavigationTiming@master] Disconnect the observer when we collect the metrics.

https://gerrit.wikimedia.org/r/901174

Change 896319 had a related patch set uploaded (by Krinkle; author: Barakat Ajadi):

[mediawiki/extensions/NavigationTiming@master] Navtiming: Collect longtask data and write test

https://gerrit.wikimedia.org/r/896319

Change 896319 abandoned by Krinkle:

[mediawiki/extensions/NavigationTiming@master] Navtiming: Collect longtask data and write test

Reason:

Superseded by other changes

https://gerrit.wikimedia.org/r/896319