Page MenuHomePhabricator

Investigate 2016-09-08 increase in firstPaint for p95 & p99
Closed, DuplicatePublic

Description

Screen Shot 2016-09-12 at 7.59.24 AM.png (998×2 px, 251 KB)

It all started something like 2016-09-08 19:XX

p95 from 4 s to over 10 s, p99 increased from 14 s to over a minute.

Event Timeline

It's all in desktop. Couldn't see anything special in hit ratio for nav timing. And no special increase/decrease for how many metrics from Chrome.

On 2016-09-07 Chrome 53 was released for iOS & Android. But desktop was one week earlier: 2016-08-31. And checking per version, it all happens in 52:

Screen Shot 2016-09-12 at 8.22.40 AM.png (942×2 px, 260 KB)

It looks like a little change for authenticated users:

Screen Shot 2016-09-12 at 8.32.37 AM.png (924×2 px, 313 KB)

And major change for anonymous:

Screen Shot 2016-09-12 at 8.32.50 AM.png (904×2 px, 306 KB)

@Peter, the next step in isolating an issue like this is to check the Server Admin Log to see if the regression coincided with a deployment. In this case, it looks like the following entries are possible correlates:

  • 20:03 logmsgbot: demon@tin Synchronized php-1.28.0-wmf.18/extensions/Echo/includes/SeenTime.php: Trying to stop some duplicate redis fetches (duration: 00m 52s)
  • 19:57 bblack: repooling normal traffic to cache_upload in ulsfo
  • 19:02 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.18

It looks like it started at around 19:07, which lines up neatly with the group2 to wmf.18 deployment at 19:02 + 5 minutes of ResourceLoader cache time.

download (2).png (308×586 px, 17 KB)

Mentioned in SAL [2016-09-12T18:24:51Z] <ori> Changing wikiversion for group2 wikis on mw1017 to debug regression (T145359)

Looking for JavaScript code that has changed from wmf.17 to wmf.18 and that loads for anonymous users on most or all pages, I see changes in ULS and CentralNotice. @Nikerabbit, @AndyRussG, are you aware of any changes that could have caused a regression?

@ori there was a CN update that went out in a SWAT deploy, 23:00 - 0:00 UTC. Nothing earlier that day...

Krinkle renamed this task from Investigate increase in firstPaint for p95 & p99 to Investigate 2016-09-08 increase in firstPaint for p95 & p99.Sep 12 2016, 8:57 PM
Krinkle subscribed.

That ULS update contains big jquery.uls update, but nothing in there calls my attention, as all of that code is supposed to be executed only after user interaction.