
Possible regression firstVisualChange/SpeedIndex
Closed, Declined, Public

Description

I've been going through the updated WebPageTest dashboard, and it looks like we have a regression in firstVisualChange/SpeedIndex that started on April 15th.

First Visual Change for Chrome desktop appears to have gone up by about 100 ms for all URLs except the Obama page:
https://grafana.wikimedia.org/dashboard/db/webpagetest?orgId=1&from=now-7d&to=now&panelId=10&fullscreen

(Screenshot: Screen Shot 2018-04-18 at 1.05.54 PM.png)

The same with SpeedIndex on Chrome desktop:
https://grafana.wikimedia.org/dashboard/db/webpagetest?panelId=7&fullscreen&orgId=1

The same holds for Firefox, except that the Facebook page shows no change.

(Screenshot: Screen Shot 2018-04-18 at 1.01.50 PM.png)

On mobile, two out of three URLs also show an increase, but starting a couple of hours earlier:
https://grafana.wikimedia.org/dashboard/db/webpagetest?panelId=58&fullscreen&orgId=1

(Screenshot: Screen Shot 2018-04-18 at 1.03.10 PM.png)
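To put a number on a step change like the roughly 100 ms jump, rather than eyeballing the graphs, one option is to compare the median of the runs before the suspected onset with the median after it. A minimal sketch, assuming the per-run values have been exported as (date, milliseconds) pairs; the function name and input shape are hypothetical, not part of any dashboard API.

```python
from datetime import date
from statistics import median

# Start of the regression as reported in this task.
ONSET = date(2018, 4, 15)


def median_shift(samples, onset=ONSET):
    """Return median(after) - median(before) for (day, value_ms) samples.

    `samples` is a hypothetical list of (datetime.date, float) pairs, e.g.
    per-run values exported from a dashboard panel. Runs on or after
    `onset` count as "after".
    """
    before = [value for day, value in samples if day < onset]
    after = [value for day, value in samples if day >= onset]
    if not before or not after:
        raise ValueError("need samples on both sides of the onset date")
    return median(after) - median(before)
```

Medians are used instead of means so that a single noisy run on either side does not dominate the estimate.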

There was no change to Browsertime/Visual Metrics at that time; we got that change a couple of days earlier. We did have the restart on the instance, but at the same time, while our measurements were down, we enabled the new Page Previews for 10% of traffic. Or could it somehow be an AWS issue? I can't see a change on the Digital Ocean server I run on the side, but I've made some changes there, so it isn't 100% safe to draw any conclusions from it.

Event Timeline

Playing "spot the difference" between the runs before and after the issue, I notice that several CentralNotice modules are loaded in the slow run that aren't present in the fast run: choiceData, largeBannerLimit and legacySupport. They account for an extra 1.5 KB of compressed JS.
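The "spot the difference" step can be partly automated by expanding the module lists from the load.php requests of each run and diffing them. A minimal sketch, assuming each URL carries a `modules=` parameter in ResourceLoader's packed notation, where `|` separates groups and `foo.bar,baz` abbreviates `foo.bar|foo.baz`; the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def modules_in_url(url):
    """Expand the modules= parameter of a load.php URL into module names.

    Assumes the packed notation described above: groups separated by '|',
    and 'prefix.a,b' meaning 'prefix.a' and 'prefix.b'.
    """
    names = []
    for value in parse_qs(urlparse(url).query).get("modules", []):
        for group in value.split("|"):
            if "," not in group:
                names.append(group)  # a single, unpacked module name
                continue
            pos = group.rfind(".")
            if pos == -1:
                names.extend(group.split(","))  # prefixless names
            else:
                prefix = group[:pos]
                names.extend(f"{prefix}.{s}" for s in group[pos + 1:].split(","))
    return names


def modules_only_in(slow_urls, fast_urls):
    """Module names requested in the slow run but not in the fast run."""
    slow = {m for u in slow_urls for m in modules_in_url(u)}
    fast = {m for u in fast_urls for m in modules_in_url(u)}
    return sorted(slow - fast)
```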

@AndyRussG @Ejegg any idea why the "largeBannerLimit" and "legacySupport" CentralNotice modules started being loaded on the initial (empty-cache) page load in our US-based synthetic testing?

@Gilles Those modules are loaded when a user's language and project indicate they may be targeted by a CentralNotice campaign that uses the features those modules provide. Normally, all Fundraising campaigns load them. Since the RL cache doesn't fragment on country, we have to send the modules to lots of users even if client-side CentralNotice code eventually de-selects many of them due to geolocation. For more details, please see Campaign and banner selection.
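The key point is that module selection and campaign selection happen with different information: the cached page varies by language and project only, while country filtering happens later on the client. A toy model of that split, using a hypothetical, simplified campaign record (not CentralNotice's real data structures), shows how a visitor can receive a campaign's mixin modules without ever seeing the campaign.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    # Hypothetical, simplified campaign record for illustration only.
    name: str
    languages: set
    projects: set
    countries: set
    mixins: set = field(default_factory=set)


def modules_shipped(campaigns, language, project):
    """Mixin modules sent with the page. The cache varies by language and
    project only, so the visitor's country plays no part here."""
    return {m for c in campaigns
            if language in c.languages and project in c.projects
            for m in c.mixins}


def campaigns_chosen(campaigns, language, project, country):
    """Campaigns that survive client-side filtering, which knows the country."""
    return [c.name for c in campaigns
            if language in c.languages and project in c.projects
            and country in c.countries]
```

With a campaign targeting English Wikipedia in France, a US visitor on enwiki still gets its mixin modules from `modules_shipped`, while `campaigns_chosen` returns an empty list for them.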

If you go to the browser console on enwiki and type mw.CentralNotice.choiceData, you'll see two Fundraising campaigns targeting English Wikipedia in France. So it's expected that those modules will load for all anonymous users on enwiki. I'm not sure exactly why that would have started on April 15th. I'm afraid we don't have an easy way to reconstruct when and where campaigns actually went out. Glancing at the CentralNotice logs and the current campaign setup, I would have expected those modules to have been going out to enwiki before then. If you like, we can look more closely, though. :)

Gilles triaged this task as Medium priority.
Gilles moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.

Thanks @AndyRussG

Is there any way to disable CentralNotice with JS or a cookie? At least to the extent that it wouldn't load those extra modules triggered by campaigns.

This particular issue is pretty mild, but every year we deal with Big English disrupting our visual metrics, and it might be useful to be able to turn that off for some performance tests. In this particular instance it would also let us verify whether that extra bit of JS is responsible for what we're seeing, without having to wait for those campaigns to end.

Barring any new campaigns added before then, the largeBannerLimit module should stop loading after May 8th and legacySupport should stop after May 14th.

Looking at the configuration in choiceData, though, it seems that loading those modules in places the campaigns don't target is avoidable. That is, the configuration where the mixins are defined also specifies that the campaigns only run in the Netherlands and India, yet the mixin modules are loaded everywhere while these campaigns are active. I'll file a task for that.
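One way the avoidable loading could be addressed is to decide which mixin modules to fetch only after the client-side country check, so campaigns limited to the Netherlands and India never pull their mixins into other countries. This is a hypothetical sketch of that idea, not how CentralNotice works or how the follow-up task was resolved; the record and function names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Campaign:
    # Hypothetical, simplified record; field names are illustrative only.
    name: str
    countries: frozenset
    mixins: frozenset


def mixins_to_load(campaigns, country):
    """Collect mixin modules only for campaigns that target `country`."""
    needed = set()
    for c in campaigns:
        if country in c.countries:
            needed |= c.mixins
    return needed
```

The trade-off is an extra round trip for visitors who do match a campaign, in exchange for not shipping unused JS to everyone else.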

largeBannerLimit and legacySupport aren't in any active campaign anymore; let's check whether things have gone back to previous levels.

They haven't. CentralNotice isn't to blame for this.

I have noticed a much simpler difference between the before and after runs that I had missed earlier...

http://wpt.wmftest.org/result/180410_9F_QD/3/details/

user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 PTST/180406.130428

http://wpt.wmftest.org/result/180420_B1_99/4/details/

user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36 PTST/180417.190444
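When scanning many results for a browser upgrade like this, it can help to tag each run with its Chrome version parsed from the user-agent string. A minimal sketch; the regular expression assumes the `Chrome/major.minor.build.patch` token seen in the two strings above, and the function name is hypothetical.

```python
import re

# Matches the "Chrome/65.0.3325.181" style token in a user-agent string.
_CHROME = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")


def chrome_version(user_agent):
    """Return Chrome's version as a tuple of ints, or None if absent."""
    m = _CHROME.search(user_agent)
    return tuple(int(part) for part in m.groups()) if m else None
```

Returning a tuple means versions compare correctly with `<` and `>`, so runs can be split at the first one reporting major version 66.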

Looking back around the 15th, however, it seems the agent only picked up Chrome 66 on April 18th, so that browser upgrade doesn't explain the start of the spike on the 15th.

One of the last runs with Chrome 65: http://wpt.wmftest.org/result/180418_SB_G2/1/details/
One of the first runs with Chrome 66: http://wpt.wmftest.org/result/180418_DY_JR/1/details/

@Gilles I lost track of this. Is there anything more we can do or just drop it since it was so long ago?

Since neither of our leads turned out to be the cause, I think we can write this off as a total (old) mystery.