
2019-02-21 loadEventEnd overall + responseStart mobile regression
Closed, Resolved (Public)

Description

The spike that starts around 2019-02-21 03:00 is due to eqsin being taken out of rotation, with users based in Asia being directed to ulsfo instead: https://grafana.wikimedia.org/d/000000304/prometheus-varnish-dc-stats?orgId=1&var-datasource=eqsin%20prometheus%2Fops&var-cluster=cache_upload&var-layer=backend&var-layer=frontend&from=1550697339623&to=1550805339623

However, things went back to normal from a networking perspective around 2019-02-21 20:00.

This networking outage seems to have masked a likely MediaWiki regression that happened during that timespan, when group2 graduated to 1.33.0-wmf.18 at 2019-02-21 18:50. loadEventEnd overall and responseStart mobile have been at significantly higher levels since that deployment.

Event Timeline

Gilles triaged this task as Unbreak Now! priority. Feb 28 2019, 11:25 AM

The explanation is that we lost most of Safari in navtiming during that deployment because of T217210: Nav timing throws exception on Safari "TypeError: entryTypes contained only unsupported types"

Capture d'écran 2019-02-28 13.06.33.png (852×3 px, 608 KB)

Since iOS devices tend to be used by people in wealthier countries, the missing data contains more fast traffic than slow traffic, which pushes the aggregate metrics up without any individual page load getting slower. This is visible in the report rate for the US, which dropped more, relatively, than for other countries:

Capture d'écran 2019-02-28 13.07.50.png (862×2 px, 572 KB)
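To illustrate the effect described above, here is a minimal sketch of how losing mostly fast samples can raise a median even though no individual page load got slower. The sample values, the 20% keep rate, and the `median_after_drop` helper are all made up for illustration; none of them come from the actual navtiming data.

```python
from statistics import median


def median_after_drop(fast, slow, keep_fraction):
    """Median of the combined samples when only `keep_fraction`
    of the fast samples are still being reported."""
    kept_fast = fast[: int(len(fast) * keep_fraction)]
    return median(kept_fast + slow)


# Hypothetical loadEventEnd samples in milliseconds (illustrative only).
fast_samples = [800 + 10 * i for i in range(100)]    # e.g. clients on fast devices
slow_samples = [2000 + 20 * i for i in range(100)]   # e.g. clients on slower devices

# All samples reported, then most fast samples lost (assumed 20% kept).
before = median_after_drop(fast_samples, slow_samples, 1.0)
after = median_after_drop(fast_samples, slow_samples, 0.2)

print(f"median before: {before} ms, median after: {after} ms")
```

The median moves up purely because the population mix changed, which would be consistent with the metrics recovering once Safari reports are collected again.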

Things should get back to normal when the backported bugfix goes live today with the 1.33.0-wmf.19 release.