
navtiming: firstPaint.mobile metric broken on wmf.24
Closed, Resolved · Public

Description

When the wmf.24 branch went out, two navtiming alerts started firing:

The alert for the metric value:

Screenshot 2019-04-04 at 22.09.36.png (690×1 px, 44 KB)

The apparent speed regression is just a side-effect of another issue, which is that there is no data:

Screenshot 2019-04-04 at 22.14.09.png (396×2 px, 85 KB)

Screenshot 2019-04-04 at 22.09.29.png (696×2 px, 42 KB)

Event Timeline

Restricted Application added a subscriber: Aklapper.

Commits to Navigation Timing that are new in wmf.24:

I don't see a regression in EventLogging intake, which suggests the client is still sending data at the same rate, and still in a valid format:

And the new PaintTiming schema is also working:

This suggests the problem might instead be in the navtiming.py service, which might not be processing those events correctly. Note that the firstPaint.overall metric is still receiving data, but buckets such as .mobile.anonymous are not.

Gilles triaged this task as High priority.

frontend.navtiming2.by_browser.*.* worked throughout the incident

frontend.navtiming2.desktop.overall was twice what it was supposed to be during the incident (it was getting the mobile events as well):

Capture d'écran 2019-04-05 07.29.53.png (488×402 px, 30 KB)

frontend.navtiming2.desktop.authenticated included all anonymous events as well:

Capture d'écran 2019-04-05 07.31.07.png (490×478 px, 29 KB)

It seems like instead of differentiating "site" and "auth" by their usual values, it assumed everything was desktop/authenticated. I think I know why: it must be getting that information from values inside the schema and not from the capsule. Just like the firstPaint schema needs oversampling information, I bet it needs the isAnon and mobileMode fields too.

Yep:

if 'mobileMode' in event:
    if event['mobileMode'] == 'stable':
        site = 'mobile'
    else:
        site = 'mobile-beta'
else:
    site = 'desktop'
auth = 'anonymous' if event.get('isAnon') else 'authenticated'
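
To make the failure mode concrete, here is a minimal sketch that wraps the logic quoted above in a standalone function and runs it on events with and without the two fields. The function name `derive_buckets` and the sample payloads (`name`, `startTime`) are hypothetical and only for illustration; only `mobileMode` and `isAnon` come from the snippet above, and this is not the actual navtiming.py code path.

```python
def derive_buckets(event):
    """Derive (site, auth) with the same rules as the snippet quoted above.

    Hypothetical helper for illustration, not the real navtiming.py function.
    """
    if 'mobileMode' in event:
        site = 'mobile' if event['mobileMode'] == 'stable' else 'mobile-beta'
    else:
        # A missing mobileMode is indistinguishable from a desktop event.
        site = 'desktop'
    # A missing isAnon is falsy, so the event counts as authenticated.
    auth = 'anonymous' if event.get('isAnon') else 'authenticated'
    return site, auth


# Hypothetical payloads: the 'name' and 'startTime' keys are placeholders.
paint_event_without_context = {'name': 'first-paint', 'startTime': 850}
paint_event_with_context = {
    'name': 'first-paint',
    'startTime': 850,
    'isAnon': True,
    'mobileMode': 'stable',
}

for label, event in [
    ('without isAnon/mobileMode', paint_event_without_context),
    ('with isAnon/mobileMode', paint_event_with_context),
]:
    site, auth = derive_buckets(event)
    print(f'{label}: {site}.{auth}')
```

Under these assumptions, an event from an anonymous mobile user that lacks both fields is counted under desktop and authenticated, which matches the doubled desktop.overall rate and the inflated desktop.authenticated bucket seen during the incident.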

Change 501484 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Add isAnon and mobileMode to PaintTiming context

https://gerrit.wikimedia.org/r/501484

Change 501604 had a related patch set uploaded (by Krinkle; owner: Gilles):
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.24] Add isAnon and mobileMode to PaintTiming context

https://gerrit.wikimedia.org/r/501604

Change 501604 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.24] Add isAnon and mobileMode to PaintTiming context

https://gerrit.wikimedia.org/r/501604

Mentioned in SAL (#wikimedia-operations) [2019-04-05T15:57:55Z] <krinkle@deploy1001> Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be850d35c7d19 / T220156 (duration: 01m 00s)

Change 501484 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add isAnon and mobileMode to PaintTiming context

https://gerrit.wikimedia.org/r/501484