Page MenuHomePhabricator

Remove mwLoadStart/mediaWikiLoadStart and track mwLoadEnd relative to fetchStart
Closed, ResolvedPublic

Description

Ever since we moved to 100% async JS, mwLoadStart stopped being useful, since it races with other things (causing the variation in described in the original description of this task).

References to mediaWikiLoadStart variable:

  • mediawiki/extensions/NavigationTiming: Used to compute the mediaWikiLoadComplete delta. Let's remove and report mediaWikiLoadEnd instead.
  • mediawiki/extensions/VisualEditor: Removed.
  • puppet: navtiming.py – Unused. (We only report mediaWikiLoadComplete). Safe to remove.
  • puppet: coal.py – Unused. (We only report mediaWikiLoadComplete). Safe to remove.

References to mwLoadStart mark:

  • mediawiki/core - Definition. Unused. Safe to remove. Maybe rename to mwStartup?

Details

Related Gerrit Patches:

Event Timeline

Peter created this task.Mar 13 2017, 7:52 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 13 2017, 7:52 AM
Peter added a comment.Mar 14 2017, 5:31 AM

I've started to collect some example tracelogs from Chrome. Drag and drop it into your devtools timeline (unpack it first), Taking two exteremes first: One with mwLoadStart 550 ms and one 1330 ms.


Peter triaged this task as Medium priority.May 30 2017, 9:40 AM
Krinkle added a subscriber: Krinkle.EditedJul 22 2017, 2:51 AM

Looking at it again, looks like it has gotten a fair bit more stable.

BrowserGraphVariance in last 5-10 days
Firefox
mwLoadStart/mwLoadEnd: ± 0.2s
Chrome
mwLoadStart/mwLoadEnd: ± 0.1s
IE
mwLoadStart: ± 0.1s, mwLoadEnd: Alternates between ~1.7s and ~2.6s (± 0.1s). But a gap of around 1s. It seems to consistently alternate between 2 scenarios.

The mwLoadStart doesn't gives us an extra value since we made the loading async. Instead we should change the mwLoadEnd to be relative when the page starts to load. Let me update the description later.

Gilles renamed this task from Investigate the big difference in mwLoadStart to Remove mwLoadStart and track mwLoadEnd relative to fetchStart.Sep 12 2017, 1:36 PM
Gilles updated the task description. (Show Details)

Change 387510 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/ImageMetrics@master] Remove use of 'mediaWikiLoadStart'

https://gerrit.wikimedia.org/r/387510

Krinkle updated the task description. (Show Details)Oct 31 2017, 3:01 AM
Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)Nov 1 2017, 12:36 AM
Krinkle removed a project: Patch-For-Review.

Change 387510 merged by jenkins-bot:
[mediawiki/extensions/ImageMetrics@master] Remove use of 'mediaWikiLoadStart'

https://gerrit.wikimedia.org/r/387510

Change 427838 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Remove unused mwLoadStart mark

https://gerrit.wikimedia.org/r/427838

Change 427931 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] Add mediaWikiLoadEnd, remove mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/427931

Change 427936 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/navtiming@master] Remove reporting for mediaWikiLoadStart and mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/427936

Change 427937 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/navtiming@master] Add tests for 'mediaWikiLoadEnd' metric

https://gerrit.wikimedia.org/r/427937

Krinkle claimed this task.Apr 20 2018, 5:51 PM

@Peter This has been in the backlog for a while, but to get metric in-place for T192623, I've written the patches for now. Was quite straight-forward because you already did most of the work in 2015 when we first introduced the mediaWikiLoadEnd metric :)

Krinkle updated the task description. (Show Details)Apr 20 2018, 5:51 PM
Peter added a comment.Apr 23 2018, 8:17 AM

I kind of like the mwLoadStart metric :) I've been looking into the problem we have with the unstable metrics for Firefox on synthetic T185186 and I think I see a pattern and the mwLoadStart helped me.

Here a summary for 11 runs on Firefox:
http://webpagereplay-wikimedia.s3.us-east-1.amazonaws.com/enwiki/desktop/firefox/100/Facebook/2018-04-23-05-53/browsertime.json

The firstPaint (we use both the one from the video but also the "hidden" one in Firefox that seems to work ok for Wikipedia, you enable it with dom.performance.time_to_non_blank_paint.enabled). It doesn't mattern which metric we use, the pattern is the same.

We have two kind of buckets when first paint happens here: One 1.5 seconds and one 2.5 seconds roughly. For the ones when first paint happens at 1.5 seconds the mwLoadStart match first paint, it happens around 1.5 seconds. So we start to render the screen and parse the JavaScript.

The other bucket start to render at 2.5 seconds and mwLoadStarts happens earlier 800-900 ms and block the rendering. It's seems to correlate for the other pages two where we have the two bucket scenarios: Early mwLoadStarts gives us a slow first paint on Firefox.

Without the mwLoadStart I've wouldn't been able to see that, the other metrics gives me no idea what's going on (missing HAR doesn't help either).

Change 427931 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add mediaWikiLoadEnd, remove mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/427931

Krinkle updated the task description. (Show Details)Apr 23 2018, 2:54 PM

Change 427936 merged by jenkins-bot:
[performance/navtiming@master] Remove reporting for mediaWikiLoadStart and mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/427936

Change 427937 merged by jenkins-bot:
[performance/navtiming@master] Add tests for 'mediaWikiLoadEnd' metric

https://gerrit.wikimedia.org/r/427937

The other bucket start to render at 2.5 seconds and mwLoadStarts happens earlier 800-900 ms and block the rendering. It's seems to correlate for the other pages two where we have the two bucket scenarios: Early mwLoadStarts gives us a slow first paint on Firefox.
Without the mwLoadStart I've wouldn't been able to see that, the other metrics gives me no idea what's going on (missing HAR doesn't help either).

I think the mwLoadStart still could help us, checkout https://phabricator.wikimedia.org/T160315#4148868 where it helped me to understand the slow first paint for Firefox. Even though the JavaScript is async the user have different experience depending in when the script is parsed right?

Yes, it is useful to see when the request happened, but we also have other requests that are useful to track in that way.I am not sure whether we should metrics for them, however, because they will vary a lot and (I believe) they are okay to vary.

The end-result matters more, which metrics like firstPaint and mwLoadEnd measure. When we find an issue there, or high variance, we can investigate based on other metrics, but also based on the timeline, right? For example, another problem we found in Firefox recently related to image size and load time.

Maybe we can check if the startup module request shows in the timeline and correlates the same way as mwLoadStart? Anyway, we can keep it for now, but perhaps without polyfill and with a slightly different name to fix the confusion of it relating to mwLoadEnd (which is a problem, per this task). Maybe mwStartup.

Change 428397 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/VisualEditor@master] init: Use navStart directly in mw.trackSubscriber

https://gerrit.wikimedia.org/r/428397

Change 428549 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] Actually update schema revision

https://gerrit.wikimedia.org/r/428549

Change 428553 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/coal@master] coal: Remove reporting for mediaWikiLoadStart and mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/428553

Change 428549 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Actually update schema revision

https://gerrit.wikimedia.org/r/428549

Krinkle updated the task description. (Show Details)Apr 25 2018, 10:47 PM

Change 428553 merged by Krinkle:
[performance/coal@master] coal: Remove reporting for mediaWikiLoadStart and mediaWikiLoadComplete

https://gerrit.wikimedia.org/r/428553

Change 429371 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] ext.navigationTiming: Don't send beacon without basic NavTiming support

https://gerrit.wikimedia.org/r/429371

Change 429372 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] ext.navigationTiming: Fix mediaWikiLoadEnd metric (offset, not timestamp)

https://gerrit.wikimedia.org/r/429372

Krinkle added a comment.EditedApr 27 2018, 3:39 AM

https://graphite.wikimedia.org/render?target=coal.mediaWikiLoadEnd&from=20180424&until=20180502&width=800&height=400&yMin=1524000000000

Time increasing, one minute every minute
                          ¯\_(ツ)_/¯

Change 427838 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Remove mwPerformance stub and rename mwLoadStart mark

https://gerrit.wikimedia.org/r/427838

Change 429371 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] ext.navigationTiming: Don't send beacon without basic NavTiming support

https://gerrit.wikimedia.org/r/429371

Change 429372 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] ext.navigationTiming: Fix mediaWikiLoadEnd metric (offset, not timestamp)

https://gerrit.wikimedia.org/r/429372

Mentioned in SAL (#wikimedia-operations) [2018-05-05T00:26:17Z] <Krinkle> Purging values for navtiming 'mediaWikiLoadEnd' sub-properties from 2018-04-21 to 2018-05-04T04:40:00 (T160315)

Mentioned in SAL (#wikimedia-operations) [2018-05-05T00:35:46Z] <Krinkle> Delete navtiming 'mediaWikiLoadComplete' metrics (T160315)

Purged timing metrics ('count', 'mean', 'p50', 'p95', 'p999', 'sample_rate', 'upper', 'lower', 'median', 'p75', 'p99', 'rate') on following base names between the point where we added the metric (1 week ago) and when the fix went out to all wikis (yesterday) – during which it was accidentally was recording timestamps instead of durations:

frontend.navtiming.mediaWikiLoadEnd.overall.%
frontend.navtiming2.mediaWikiLoadEnd.overall.%

frontend.navtiming.mediaWikiLoadEnd.mobile.overall.%
frontend.navtiming2.mediaWikiLoadEnd.mobile.overall.%
frontend.navtiming.mediaWikiLoadEnd.desktop.overall.%
frontend.navtiming2.mediaWikiLoadEnd.desktop.overall.%

frontend.navtiming_oversample.mediaWikiLoadEnd.overall.%
frontend.navtiming2_oversample.mediaWikiLoadEnd.overall.%

(using the ugly hack that is graphite_wipe.py)

Krinkle closed this task as Resolved.May 5 2018, 12:54 AM

Change 428397 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] mw.trackSubscriber: Remove action.init.timing value

https://gerrit.wikimedia.org/r/428397

Change 437647 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] resourceloader: remove "mediaWikiLoadStart" and move mwNow() to mediawiki.js

https://gerrit.wikimedia.org/r/437647

Krinkle renamed this task from Remove mwLoadStart and track mwLoadEnd relative to fetchStart to Remove mwLoadStart mark() and track mwLoadEnd relative to fetchStart.Jun 5 2018, 11:36 PM
Krinkle renamed this task from Remove mwLoadStart mark() and track mwLoadEnd relative to fetchStart to Remove mwLoadStart/mediaWikiLoadStart and track mwLoadEnd relative to fetchStart.
Krinkle updated the task description. (Show Details)

Change 437647 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: remove "mediaWikiLoadStart" and move mwNow() to mediawiki.js

https://gerrit.wikimedia.org/r/437647