Chrome 67 performance regression
Open, Normal, Public

Description

Yesterday when I pushed the new Chrome 67 I got an alert on my own setup that Speed Index had increased. I checked first visual change too, and it was the same there. I rolled back to 66 and the metrics returned to normal. Today I pushed 67 again and saw the same thing. I could also see first paint changes, but only on Wikipedia URLs. Paul Irish pinged me on Twitter and said he could help out. I've checked our WebPageTest tests today (I haven't updated Browsertime yet) and I could see the same thing there: 67 increased the metrics:

The original issue: https://github.com/sitespeedio/sitespeed.io/issues/2069
Upstream: https://bugs.chromium.org/p/chromium/issues/detail?id=849108

Since we see a change in first paint too I'm pretty sure we will see the same in our RUM metrics when 67 rolls out.

Peter created this task. Jun 2 2018, 6:24 PM
Restricted Application added a subscriber: Aklapper. Jun 2 2018, 6:24 PM
Peter renamed this task from Chome 67 regression to Chome 67 performance regression. Jun 2 2018, 6:25 PM
Aklapper renamed this task from Chome 67 performance regression to Chrome 67 performance regression. Jun 2 2018, 8:54 PM
Peter added a comment. Jun 3 2018, 3:37 PM

I've pushed 67 on Browsertime/WebPageReplay and will wait to see if we get the same thing there (I get it on the sitespeed.io servers, so we should too). I'll create an upstream issue, and then I need to go through our alert setup for WebPageTest, since it didn't fire.

Peter added a comment. Edited Jun 3 2018, 4:34 PM

Yep, checking the metrics on Browsertime, first visual change increases by 33 ms or so. I'll add the graphs when we have more data.

Krinkle updated the task description. Jun 3 2018, 5:12 PM
Krinkle updated the task description. Jun 3 2018, 5:17 PM
Peter added a comment. Jun 4 2018, 6:48 AM

This is interesting. Our alerts didn't fire because not all pages are affected:

Of the pages we test, Facebook, Aretha Franklin and Sweden have a regression. Obama is the same, and Metalloid renders a little faster.

On WebPageTest, though, we can see that the "fast" runs aren't there anymore for the Obama page:

If I look at the event level, there are two events that increase with the release. UploadLayerTree seems to change for all URLs:

The other increase is in BlinkGCMarking, about 30 ms per URL.

Legoktm added a subscriber: Legoktm. Jun 4 2018, 7:45 AM
Imarlier moved this task from Inbox to Radar on the Performance-Team board. Jun 4 2018, 8:01 PM
Imarlier edited projects, added Performance-Team (Radar); removed Performance-Team.
Peter updated the task description. Jun 7 2018, 11:52 AM
Peter added a comment. Jun 11 2018, 6:08 AM

Our RUM data has started alerting now:

And we have more traffic now on 67:

Gilles added a subscriber: Gilles. Jun 12 2018, 12:10 PM

...and what do humans think? :)

SET hive.auto.convert.join.noconditionaltask=false;

SELECT
  useragent.browser_major,
  event.surveyResponseValue,
  COUNT(*)
FROM event.quicksurveysresponses
WHERE year = 2018
  AND useragent.browser_family = 'Chrome'
  AND useragent.browser_major IN (66, 67)
  AND event.surveyCodeName = "perceived-performance-survey"
GROUP BY useragent.browser_major, event.surveyResponseValue;

browser_major  surveyResponseValue                                        count
66             ext-quicksurveys-example-internal-survey-answer-negative   86
66             ext-quicksurveys-example-internal-survey-answer-neutral    114
66             ext-quicksurveys-example-internal-survey-answer-positive   1178
67             ext-quicksurveys-example-internal-survey-answer-negative   26
67             ext-quicksurveys-example-internal-survey-answer-neutral    25
67             ext-quicksurveys-example-internal-survey-answer-positive   298

That is 6.96% negative ("no") responses for Chrome 66, and 8.02% for Chrome 67.
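
For reference, those shares could be produced in a single query; a minimal sketch, assuming the ratio is negative answers over all answers per browser version (the exact denominator used for the figures above isn't stated):

SELECT
  useragent.browser_major,
  -- share of negative answers among all answers for this browser version (assumed definition)
  SUM(IF(event.surveyResponseValue LIKE '%-negative', 1, 0)) / COUNT(*) AS negative_share,
  COUNT(*) AS responses
FROM event.quicksurveysresponses
WHERE year = 2018
  AND useragent.browser_family = 'Chrome'
  AND useragent.browser_major IN (66, 67)
  AND event.surveyCodeName = "perceived-performance-survey"
GROUP BY useragent.browser_major;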

Peter triaged this task as Normal priority. Jun 20 2018, 10:36 AM

I've looked at the RUM data and I'm pretty sure this is a Chrome regression:



Re-running the survey data I get:

93.02% satisfaction on Chrome 66 (1604 total responses)
93.00% satisfaction on Chrome 67 (4952 total responses)
94.16% satisfaction on Chrome 68 (3270 total responses)
94.22% satisfaction on Chrome 69 (1281 total responses)

Which means that, now that we have more data, the outcome has changed compared to June. This doesn't mean that there's no firstPaint regression, though: the initial machine learning findings have shown only a very weak correlation between user performance satisfaction and firstPaint.

Yep, you are right.

One thing that strikes me: do we collect which article the user is visiting when she gets the survey? Does the user get the questions before or after she has been able to read the article (sorry, I forgot)? I mean, could there be a higher correlation between performance satisfaction and whether the user finds what she wants, rather than with specific metrics?

A.k.a when I'm happy I feel that every web site is faaast :)

It can be any article. It derives from NavigationTiming sampling. Here's how fast the survey appears after the load event (something we have no control over), using performance.now() - navtiming2 loadEventEnd:

SELECT
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.5)) AS median,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.75)) AS p75,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.95)) AS p95,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.99)) AS p99,
  COUNT(*) AS count
FROM event.quicksurveyinitiation AS q
INNER JOIN event.navigationtiming n
  ON q.event.surveyInstanceToken = n.event.stickyRandomSessionId
WHERE n.year = 2018
  AND q.year = 2018
  AND q.event.surveyCodeName = "perceived-performance-survey"
  AND q.event.performanceNow IS NOT NULL
  AND n.event.loadEventEnd IS NOT NULL;
median  p75     p95      p99      count
925.0   2156.0  13826.0  88303.0  1749791

And here is the same figure for survey impressions that people actually responded to:

SELECT
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.5)) AS median,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.75)) AS p75,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.95)) AS p95,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.99)) AS p99,
  COUNT(*) AS count
FROM event.quicksurveyinitiation AS q
INNER JOIN event.navigationtiming n
  ON q.event.surveyInstanceToken = n.event.stickyRandomSessionId
INNER JOIN event.quicksurveysresponses q2
  ON q.event.surveyInstanceToken = q2.event.surveyInstanceToken
WHERE n.year = 2018
  AND q.year = 2018
  AND q2.year = 2018
  AND q.event.surveyCodeName = "perceived-performance-survey"
  AND q.event.performanceNow IS NOT NULL
  AND n.event.loadEventEnd IS NOT NULL
  AND q2.event.surveyResponseValue IS NOT NULL;
median  p75     p95      p99       count
1089.0  2862.0  21874.0  102742.0  78160

Out of curiosity, since this is something I haven't looked at before, let's see if those figures are different depending on people's response:

SELECT
  q2.event.surveyResponseValue,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.5)) AS median,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.75)) AS p75,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.95)) AS p95,
  ROUND(PERCENTILE(q.event.performanceNow - n.event.loadEventEnd, 0.99)) AS p99,
  COUNT(*) AS count
FROM event.quicksurveyinitiation AS q
INNER JOIN event.navigationtiming n
  ON q.event.surveyInstanceToken = n.event.stickyRandomSessionId
INNER JOIN event.quicksurveysresponses q2
  ON q.event.surveyInstanceToken = q2.event.surveyInstanceToken
WHERE n.year = 2018
  AND q.year = 2018
  AND q2.year = 2018
  AND q.event.surveyCodeName = "perceived-performance-survey"
  AND q.event.performanceNow IS NOT NULL
  AND n.event.loadEventEnd IS NOT NULL
  AND q2.event.surveyResponseValue IS NOT NULL
GROUP BY q2.event.surveyResponseValue;
surveyResponseValue                                        median  p75     p95      p99       count
ext-quicksurveys-example-internal-survey-answer-negative   1643.0  4044.0  26442.0  127901.0  5873
ext-quicksurveys-example-internal-survey-answer-neutral    1393.0  3736.0  30771.0  151637.0  5764
ext-quicksurveys-example-internal-survey-answer-positive   1031.0  2663.0  20545.0  98473.0   66523

The faster the page, the faster the survey appears, since the survey is loaded at low priority and therefore depends on the page's performance. We would have to look at surveys that appeared quickly despite initially bad page performance (if there are any such data points) to see whether the survey delay itself has an impact on responses. I'll ask the Telecom ParisTech team to look into that.

There is no correlation between survey responses and transferSize (HTML size). Article quality is usually proportional to length, i.e. if underwhelming stubs yielded worse survey responses, the effect would have been visible when we looked at transferSize individually or as part of the features fed to machine learning. I think that if article quality affected people's responses significantly, it would have surfaced that way.
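
As an illustration of that kind of check, a correlation between transferSize and a numeric encoding of the answer could be computed along these lines; a rough sketch only, where the 0/0.5/1 encoding is hypothetical, the transferSize field on the NavigationTiming events is assumed, and the join simply reuses the one from the queries above:

SELECT
  -- Pearson correlation between HTML transfer size and an assumed numeric
  -- encoding of the answer (negative = 0, neutral = 0.5, positive = 1)
  CORR(
    n.event.transferSize,
    CASE
      WHEN q2.event.surveyResponseValue LIKE '%-positive' THEN 1.0
      WHEN q2.event.surveyResponseValue LIKE '%-neutral' THEN 0.5
      ELSE 0.0
    END
  ) AS transfersize_vs_response
FROM event.quicksurveyinitiation AS q
INNER JOIN event.navigationtiming n
  ON q.event.surveyInstanceToken = n.event.stickyRandomSessionId
INNER JOIN event.quicksurveysresponses q2
  ON q.event.surveyInstanceToken = q2.event.surveyInstanceToken
WHERE n.year = 2018 AND q.year = 2018 AND q2.year = 2018
  AND q.event.surveyCodeName = "perceived-performance-survey"
  AND n.event.transferSize IS NOT NULL
  AND q2.event.surveyResponseValue IS NOT NULL;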

Gilles added a comment. Edited Fri, Sep 28, 1:40 PM

As for happiness/sadness, they looked at the Russian wiki survey responses around the time Russia got kicked out of the world cup (which should have had a depressing effect on a large portion of the Russian population) and didn't find any effect on the ratio of survey responses.

Gilles added a comment. Edited Mon, Oct 1, 5:02 AM

To verify the effect of the survey's late appearance further, I looked at cases where the survey speed is at odds with the page speed (see the query sketch after the figures below).

Baseline (all data) gives a 91.9% user satisfaction ratio

Slow page (>5s) and fast survey (<1s) gives a 76.9% user satisfaction ratio
Slow page (>5s) and slow survey (>5s) gives a 73.1% user satisfaction ratio

Fast page (<1s) and slow survey (>5s) gives a 92.3% user satisfaction ratio
Fast page (<1s) and fast survey (<1s) gives a 95.2% user satisfaction ratio
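
These buckets could be produced with a single query; a minimal sketch reusing the tables and join from the queries above, with the 1 s / 5 s thresholds written out and satisfaction defined here as the share of positive answers among all answers (which may differ slightly from the exact definition used for the figures):

SELECT
  -- page speed bucketed on loadEventEnd, survey delay bucketed on how long
  -- after the load event the survey appeared (thresholds are illustrative)
  CASE WHEN n.event.loadEventEnd < 1000 THEN 'fast page'
       WHEN n.event.loadEventEnd > 5000 THEN 'slow page'
       ELSE 'other' END AS page_bucket,
  CASE WHEN q.event.performanceNow - n.event.loadEventEnd < 1000 THEN 'fast survey'
       WHEN q.event.performanceNow - n.event.loadEventEnd > 5000 THEN 'slow survey'
       ELSE 'other' END AS survey_bucket,
  SUM(IF(q2.event.surveyResponseValue LIKE '%-positive', 1, 0)) / COUNT(*) AS satisfaction,
  COUNT(*) AS responses
FROM event.quicksurveyinitiation AS q
INNER JOIN event.navigationtiming n
  ON q.event.surveyInstanceToken = n.event.stickyRandomSessionId
INNER JOIN event.quicksurveysresponses q2
  ON q.event.surveyInstanceToken = q2.event.surveyInstanceToken
WHERE n.year = 2018 AND q.year = 2018 AND q2.year = 2018
  AND q.event.surveyCodeName = "perceived-performance-survey"
  AND q.event.performanceNow IS NOT NULL
  AND n.event.loadEventEnd IS NOT NULL
  AND q2.event.surveyResponseValue IS NOT NULL
GROUP BY
  CASE WHEN n.event.loadEventEnd < 1000 THEN 'fast page'
       WHEN n.event.loadEventEnd > 5000 THEN 'slow page'
       ELSE 'other' END,
  CASE WHEN q.event.performanceNow - n.event.loadEventEnd < 1000 THEN 'fast survey'
       WHEN q.event.performanceNow - n.event.loadEventEnd > 5000 THEN 'slow survey'
       ELSE 'other' END;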

Now, for the survey to appear after more than 5 s, you need a slow internet connection. To prove the real effect of a slow survey on an otherwise fast connection, we would need to artificially delay the survey for users with fast connections. With the current data, the late survey's own effect is therefore smaller than the percentage differences seen above. What I think is important here is that the satisfaction difference between slow and fast loadEventEnd is huge compared to the difference between a slow and a fast survey, which indicates that the survey mechanics play a small part compared to the speed of the page itself.

As a bonus, here's the distribution of survey display time and loadEventEnd: