The regression started at around 2016-09-08 19:XX.
p95 increased from 4 s to over 10 s, and p99 increased from 14 s to over a minute.
Peter, Sep 12 2016, 6:04 AM
The regression is entirely on desktop. I couldn't see anything unusual in the hit ratio for nav timing, and there was no notable increase or decrease in the number of metrics reported from Chrome.
Chrome 53 was released for iOS & Android on 2016-09-07, but the desktop release was one week earlier, on 2016-08-31. And checking per version, the regression is all in Chrome 52:
It looks like a little change for authenticated users:
And major change for anonymous:
@Peter, the next step in isolating an issue like this is to check the Server Admin Log to see if the regression coincided with a deployment. In this case, it looks like the following entries are possible correlates:
It looks like it started at around 19:07, which lines up neatly with the group2 to wmf.18 deployment at 19:02 + 5 minutes of ResourceLoader cache time.
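The timing argument above can be written out as a small check. This is only a sketch: it assumes the deployment timestamp is in UTC on 2016-09-08 and takes the 5-minute ResourceLoader cache time from the comment as given. The function names and the 2-minute tolerance are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes of ResourceLoader cache time, as stated in the comment.
CACHE_TTL = timedelta(minutes=5)


def expected_onset(deploy_time, cache_ttl=CACHE_TTL):
    """Earliest time clients should start receiving the newly deployed code."""
    return deploy_time + cache_ttl


def lines_up(deploy_time, observed_onset, tolerance=timedelta(minutes=2)):
    """True if the observed regression onset is within `tolerance` of the expected onset.

    The tolerance is a hypothetical allowance for graph resolution.
    """
    return abs(observed_onset - expected_onset(deploy_time)) <= tolerance


# group2 to wmf.18 deployment at 19:02 (assumed UTC), regression first seen around 19:07.
deploy = datetime(2016, 9, 8, 19, 2, tzinfo=timezone.utc)
observed = datetime(2016, 9, 8, 19, 7, tzinfo=timezone.utc)

print(expected_onset(deploy).strftime("%H:%M"))  # 19:07
print(lines_up(deploy, observed))  # True
```

The same check can be run against any other deployment entry to rule it in or out as a candidate.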
Mentioned in SAL [2016-09-12T18:24:51Z] <ori> Changing wikiversion for group2 wikis on mw1017 to debug regression (T145359)
Looking for JavaScript code that has changed from wmf.17 to wmf.18 and that loads for anonymous users on most or all pages, I see changes in ULS and CentralNotice. @Nikerabbit, @AndyRussG, are you aware of any changes that could have caused a regression?
@ori there was a CN update that went out in a SWAT deploy, 23:00 - 0:00 UTC. Nothing earlier that day...
That ULS update contains a big jquery.uls update, but nothing in there catches my attention, as all of that code is supposed to be executed only after user interaction.