Page MenuHomePhabricator

500ms increase in fully loaded desktop and mobile on 30th March 2018
Closed, ResolvedPublic

Description

Looking at https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1&from=now-7d&to=now we see a small spike on the 30th March for fully loaded time for both desktop and mobile. There appear to be no significant changes in CSS/HTML/JS that could be responsible.

Mobile comparison: http://wpt.wmftest.org/video/compare.php?tests=180331_J8_17Q,180330_FE_RC suggests extra time in the DNS
Same spike on desktop: https://grafana.wikimedia.org/dashboard/db/mobile-2g?orgId=1&panelId=90&fullscreen so I think we can rule out any changes our side.

We changed our deploy server around then (see email: "[Ops] new deployment server (tin -> deploy)"... maybe related)?

Event Timeline

Looking at the desktop site, it seems like it just went back to its previous baseline:

Capture d'écran 2018-04-13 10.52.22.png (667×1 px, 152 KB)

As for mobile, I think you've linked to the wrong graph, it's a page previews one and I don't see anything around March 30th there.

I think this graph of beta cluster and stable mobile fully loaded times is also alarming (and may be what @Jdlrobson intended to link to):

Screenshot from 2018-04-13 08-09-28.png (781×1 px, 194 KB)

(Ignore the great fluctuation in beta cluster before March 29th and after April 5th which were due to content changes.) There is an inexplicable half second increase on on March 30th. I tried to make similar comparisons on wpt.wmftest.org but I couldn't reproduce there.

Your WPT comparison is with the desktop site, though, and the graph is for the mobile site. I've shown above that for desktop it seems to have gotten back to its previous level.

Here's a WPT comparison for the same thing seen on your mobile Grafana graph:

http://wpt.wmftest.org/video/compare.php?tests=180328_GE_15,180412_2F_SY

On the most recent run we can clearly see the problem and some visual garbage on the pronunciation area and a bunch of red links.

Here's the side-by-side video: http://wpt.wmftest.org/video/view.php?id=180328_883eb7ae325df1835b2713fccf347b58c25940a8

I don't think it's the red link garbage that's responsible for the late visual completion (even though that's worrying in itself), it's the "Test banner one (CentralNotice)" showing up at the top.

I can't reproduce these issues locally. But then again, I'm not in "US east", so might not be targeted by this test banner if it is indeed mistakenly displayed in production.

Ah, nevermind, I was comparing stable and beta...

This is an appropriate before/after comparison for the production site: http://wpt.wmftest.org/video/compare.php?tests=180328_GE_15,180413_HY_VD

Looking at the details, I see that HTML and JS size have increased slightly (by 1 and 3%, respectively):

http://wpt.wmftest.org/result/180328_GE_15/1/breakdown/
http://wpt.wmftest.org/result/180413_HY_VD/4/breakdown/

Looking at differences, one thing I notice is that request 9 spends a lot more time busy with CPU and apparently keeping the main thread busy:

Capture d'écran 2018-04-13 22.43.46.png (320×99 px, 5 KB)

This is what this request contains before:

https://en.m.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.centralNotice.choiceData%2Cdisplay%2CgeoIP%2CimpressionDiet%2CkvStore%2CkvStoreMaintenance%2CstartUp%7Cext.centralauth.centralautologin%7Cext.eventLogging%2CnavigationTiming%2CwikimediaEvents%7Cext.eventLogging.Schema%2Csubscriber%7Cext.relatedArticles.readMore.bootstrap%2CeventLogging%2Cgateway%7Cext.visualEditor.targetLoader%2Ctrack%2Cve%7Cjquery.accessKeyLabel%2Cclient%2Ccookie%2CtextSelection%2Cthrottle-debounce%7Cmediawiki.RegExp%2CString%2CTitle%2CUri%2Capi%2Ccldr%2Cexperiments%2CjqueryMsg%2Clanguage%2Cnotify%2Crouter%2Cstorage%2Ctemplate%2Cuser%2Cutil%2Cviewport%7Cmediawiki.api.user%7Cmediawiki.language.data%2Cinit%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.startup%7Cmediawiki.template.hogan%2Cregexp%7Cmediawiki.ui.input%7Cmobile.ajax.styles%7Cmobile.editor.api%7Cmobile.init%2Cissues%2Creferences%2Csearch%2Csite%2Cstartup%2Ctoggle%2Cwatchstar%7Cmobile.issues.images%7Cmobile.loggingSchemas.mobileWebSearch%7Cmobile.pagelist.scripts%2Cstyles%7Cmobile.pagesummary.styles%7Cmobile.references.gateway%2Cimages%7Cmobile.search.api%2Cimages%2Cutil%7Cmobile.startup.images%7Cmobile.startup.images.variants%7Coojs%2Coojs-router%7Cschema.MobileWebSearch%7Cskins.minerva.editor%2CmainMenu%2Cscripts%2Ctoggling%2Cwatchstar%7Cskins.minerva.icons.images.scripts%2Cvariants%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cskins.minerva.scripts.top%7Cuser.defaults&skin=minerva&version=1uiswv0

and after:

https://en.m.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.centralNotice.bannerHistoryLogger%2CchoiceData%2Cdisplay%2CgeoIP%2CimpressionDiet%2CkvStore%2CkvStoreMaintenance%2ClargeBannerLimit%2ClegacySupport%2CstartUp%7Cext.centralauth.centralautologin%7Cext.eventLogging%2CnavigationTiming%2CwikimediaEvents%7Cext.eventLogging.Schema%2Csubscriber%7Cext.relatedArticles.readMore.bootstrap%2CeventLogging%2Cgateway%7Cext.visualEditor.targetLoader%2Ctrack%2Cve%7Cjquery.accessKeyLabel%2Cclient%2Ccookie%2CtextSelection%2Cthrottle-debounce%7Cmediawiki.RegExp%2CString%2CTitle%2CUri%2Capi%2Ccldr%2Ccookie%2Cexperiments%2CjqueryMsg%2Clanguage%2Cnotify%2Crouter%2Cstorage%2Ctemplate%2Cuser%2Cutil%2Cviewport%7Cmediawiki.api.user%7Cmediawiki.language.data%2Cinit%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.startup%7Cmediawiki.template.hogan%2Cregexp%7Cmediawiki.ui.input%7Cmobile.ajax.styles%7Cmobile.editor.api%7Cmobile.init%2Cissues%2Creferences%2Csearch%2Csite%2Cstartup%2Ctoggle%2Cwatchstar%7Cmobile.issues.images%7Cmobile.loggingSchemas.mobileWebSearch%7Cmobile.pagelist.scripts%2Cstyles%7Cmobile.pagesummary.styles%7Cmobile.references.gateway%2Cimages%7Cmobile.search.api%2Cimages%2Cutil%7Cmobile.site.styles%7Cmobile.startup.images%7Cmobile.startup.images.variants%7Coojs%2Coojs-router%7Cschema.MobileWebSearch%7Cskins.minerva.editor%2CmainMenu%2Cscripts%2Ctoggling%2Cwatchstar%7Cskins.minerva.icons.images.scripts%2Cvariants%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cskins.minerva.scripts.top%7Cuser.defaults&skin=minerva&version=14wxt72

But since that newer run is very recent, it could include code changes that are irrelevant to the metric bump. So I'm going to look for runs as close as possible to the change, before and after.

Two runs much closer to the change:

http://wpt.wmftest.org/video/compare.php?tests=180330_AV_ND%2C180330_V8_1C8&thumbSize=200&ival=100&end=full

The before and after uncompressed sizes don't change at all, the content served is absolutely identical. @Peter did you restart the linux agent on March 30th by any chance?

Something to consider is that "visual complete" and "document complete" are identical in both cases. "Fully loaded" is an invisible event to the user.

When you look at the waterfall, in the old run you see that some low priority requests happen after "fully loaded". In the more recent run, "fully loaded" is after these. I'm not sure what the definition of "fully loaded" is, but maybe it's not a good metric anyway, since it seems to be completely disconnected from the user experience?

Opening those two runs' HAR files, the total time for all requests is 4.67 and 4.72s, respectively. So no real difference there.

If you compare screenshots between both runs, and pick "visually complete" as the end point, you can see that they line up perfectly, at 5.1 seconds. This extra time is understandable, as after the very last request is complete, JS needs to run and CSS needs to be applied for what came from the last request.

If you switch to using "fully loaded" as the end point, the first run which has "fully loaded" at 4.6s is cut off in terms of visual completion. It doesn't have time for the star and download buttons to appear.

I think this suggests that prior to the change "fully loaded" matched when all the requests had been completed, regardless of whether or not the JS and CSS have been run and applied. This is logical compared to the wording.

My conclusion is that WPT's "fully loaded" has regressed or that its definition has changed, in that it doesn't stop at just network requests, but now waits for visual completion. It's possible that it's a bug, where for example the browser generates an internal network requests that messes up with WPT's detection (although that seems unlikely, I would imagine that WPT would base "fully loaded" on the HAR).

Regardless of that issue, "fully loaded" is a bad metric to keep track of, as it's disconnected from the user experience. As seen in this example, in can take as much as an extra 10% of time for the JS and CSS of the last request to be run and applied, which can include very visible changes above the fold.

I would advise you to stop tracking fullyLoaded in those Grafana dashboards and to instead rely on "last visual change" and/or "visualComplete99". I've checked and those were stable before and after the content change.

tl;dr use lastVisualChange and/or visualComplete99 instead of fullyLoaded

@Gilles no I didn't touch it the 30th as I remember. I try to add an annotation for every change I do so it's easier to remember. But checking the commit log for WPTagent it was a change: https://github.com/WPO-Foundation/wptagent/commit/217c36fea9fb534b570421e19e0f3148186bf2fe
with the fully loaded. It looks like we missed requests before, so the new way is correct.

When the Docker containers get tagged as in T192050 I'll switch to use them and then turn off the automatic updates.

It would be nice to have the git version of the agent logged somewhere inside the runs.

Yep, I asked before for the Windows version, but let me ask again for the agent. I've been wanting a git version, product version and a changelog for a while.

@Gilles and @Peter, thank you for the analysis and support!

Your WPT comparison is with the desktop site, though, and the graph is for the mobile site.

My mistake, thank you for noting this.

tl;dr use lastVisualChange and/or visualComplete99 instead of fullyLoaded

Thanks, I've revised the panel I referenced earlier.

Screenshot from 2018-04-17 10-02-19.png (958×1 px, 234 KB)

Gilles claimed this task.
This comment was removed by Jdlrobson.