
Regression: domInteractive & loadEventEnd
Closed, Invalid · Public

Description

We may have a regression in domInteractive and loadEventEnd:

When you zoom in, it seems to have happened on the 16th-17th:

Let's see if we can track it down. Thanks @ori for reporting.

https://grafana.wikimedia.org/dashboard/db/navigation-timing

Event Timeline

Peter created this task. Aug 29 2016, 5:54 AM
Restricted Application added a subscriber: Aklapper. Aug 29 2016, 5:54 AM
Peter updated the task description. Aug 29 2016, 5:58 AM
Peter added a subscriber: ori.
Gilles added a subscriber: Gilles. Aug 29 2016, 7:41 AM

For what it's worth, some of the RL changes were deployed on the 16th if I'm reading https://wikitech.wikimedia.org/wiki/Server_Admin_Log correctly.

Peter added a comment. Aug 29 2016, 8:00 AM

Ah, cool. I saw that the impact is bigger on desktop, since we released lazy loading of images for mobile on the 19th. I really like that we have https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-platform?panelId=13&fullscreen

Yeah, I don't think the changes I've found are responsible; lazy loading makes more sense.

The huge peak of domInteractive is desktop-only and clearly started on the 17th, though. Mobile is unaffected by that. Same for loadEventEnd.

Peter added a comment. Aug 29 2016, 8:27 AM

Yeah, lazy loading happened on the 19th, right, so it seems to be something else.

Gilles added a comment. Edited Aug 29 2016, 8:41 AM

Now that I'm reading the SAL more carefully, it seems unlikely that any deployment on the 16th is responsible. The RL changes were on a branch that was only deployed to group 0 wikis at the time; it only went out to all wikis on the 18th (group 1 on the 17th). On the 16th, the only deployments that went out to all wikis were for Flow and Kartographer: https://gerrit.wikimedia.org/r/#/c/304985/ https://gerrit.wikimedia.org/r/#/c/305080/ Both look highly unlikely to be the cause.

Maybe it's time to look at the data and figure out if this was a data collection problem. After all, if you look at a very long period, those metrics were only better than usual between July 20th and August 17th.
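One way to check for a data collection problem is to look at each browser's share of the reported samples per day. The snippet below is only a rough sketch: the `(day, browser)` sample format, the function name, and the numbers are all made up for illustration and do not reflect the real Navigation Timing schema or data.

```python
from collections import Counter, defaultdict


def browser_share_by_day(samples):
    """Return {day: {browser: share of that day's samples}}.

    `samples` is an iterable of (day, browser) pairs. This input
    format is hypothetical, not the actual event schema.
    """
    counts = defaultdict(Counter)
    for day, browser in samples:
        counts[day][browser] += 1
    return {
        day: {browser: n / sum(c.values()) for browser, n in c.items()}
        for day, c in counts.items()
    }


# Made-up samples: Chrome dominates on the first day, then evens out.
samples = (
    [("2016-08-16", "Chrome")] * 8
    + [("2016-08-16", "Firefox")] * 2
    + [("2016-08-18", "Chrome")] * 5
    + [("2016-08-18", "Firefox")] * 5
)
shares = browser_share_by_day(samples)
for day in sorted(shares):
    print(day, shares[day])
```

A sudden change in one browser's share between days would point at the sample mix rather than at a deployment.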

Gilles closed this task as Resolved. Aug 29 2016, 8:45 AM
Gilles claimed this task.

I've found the answer. The mix of browsers reporting data changed over time:

Chrome was reporting more than others for a while, then that phenomenon went away.
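To illustrate why a change in the browser mix alone can look like a regression, here is a minimal sketch with made-up numbers. It assumes the over-represented browser reports lower domInteractive values than the rest; the values, sample counts, and the `overall_median` helper are hypothetical.

```python
from statistics import median


def overall_median(samples_by_browser):
    """Median over all samples, ignoring which browser sent them."""
    all_values = [v for values in samples_by_browser.values() for v in values]
    return median(all_values)


# Per-browser timings (ms) are identical in both periods;
# only the number of samples each browser contributes changes.
while_overreporting = {"Chrome": [1000] * 7, "Other": [2000] * 3}
after_it_stopped = {"Chrome": [1000] * 4, "Other": [2000] * 6}

print(overall_median(while_overreporting))  # 1000.0
print(overall_median(after_it_stopped))     # 2000.0
```

No browser got slower here, yet the aggregate median doubles once the over-represented browser drops back to a smaller share.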

Gilles changed the task status from Resolved to Invalid. Edited Aug 29 2016, 8:46 AM

Not a regression, to be clear. Why Chrome started over-reporting and then stopped doing so might be worth investigating in a separate task.

Peter added a comment. Aug 29 2016, 8:48 AM

Ah cool. Good work!

This was caused by a misbehaving UA, which Ops blocked on August 17th: T141786