When the wmf.24 branch went out, two navtiming alerts started firing:
The alert for the metric value:
The apparent speed regression is just a side-effect of another issue, which is that there is no data:
Krinkle | Apr 4 2019, 9:14 PM
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | Release | dduvall | T206678 1.33.0-wmf.24 deployment blockers
Resolved | | Gilles | T220156 navtiming: firstPaint.mobile metric broken on wmf.24
Commits to Navigation Timing that are new in wmf.24:
I don't see a regression in EventLogging intake, which suggests the client is still sending data at the same rate, and still in a valid format:
And the new PaintTiming schema is also working:
This suggests the problem might be in the navtiming.py service instead, which might not be processing those events correctly. Note that the firstPaint.overall metric was still receiving data, but buckets such as .mobile.anonymous were not.
rENTI72dbd1cbd8c0: Collect Layout Stability API jank scores
rENTI88f679b97b43: Add Element Timing support
I believe those 2 were already in wmf.23?
frontend.navtiming2.by_browser.*.* worked throughout the incident
frontend.navtiming2.desktop.overall was twice what it was supposed to be during the incident (it was getting the mobile ones as well):
frontend.navtiming2.desktop.authenticated included all anonymous traffic as well:
It seems like instead of differentiating "site" and "auth" by their usual values, it assumed everything was desktop/authenticated. I think I know why: it must be getting that information from values inside the schema and not from the capsule. Just like the firstPaint schema needs oversampling information, I bet it needs the isAnon and mobileMode fields too.
Yep:
if 'mobileMode' in event:
    if event['mobileMode'] == 'stable':
        site = 'mobile'
    else:
        site = 'mobile-beta'
else:
    site = 'desktop'
auth = 'anonymous' if event.get('isAnon') else 'authenticated'
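To illustrate why this produces the graphs above, here is a minimal sketch that wraps the quoted logic in a function and runs it over a few made-up events. The `classify` name, the `firstPaint` field and the sample payloads are assumptions for illustration only, not the actual navtiming.py code or real PaintTiming events.

```python
from collections import Counter


def classify(event):
    """Bucket an event by site and auth, following the logic quoted above."""
    if 'mobileMode' in event:
        site = 'mobile' if event['mobileMode'] == 'stable' else 'mobile-beta'
    else:
        # A missing mobileMode is indistinguishable from a desktop page view.
        site = 'desktop'
    # A missing isAnon is falsy, so it is treated as a logged-in user.
    auth = 'anonymous' if event.get('isAnon') else 'authenticated'
    return site, auth


# Hypothetical payloads: the same four page views, once with the context
# fields present and once without them (the suspected wmf.24 situation).
with_context = [
    {'firstPaint': 400, 'isAnon': True},
    {'firstPaint': 900, 'isAnon': True, 'mobileMode': 'stable'},
    {'firstPaint': 500, 'isAnon': False},
    {'firstPaint': 950, 'isAnon': False, 'mobileMode': 'beta'},
]
without_context = [{'firstPaint': e['firstPaint']} for e in with_context]

buckets_ok = Counter(classify(e) for e in with_context)
buckets_broken = Counter(classify(e) for e in without_context)

print(buckets_ok)      # four distinct site/auth buckets, one event each
print(buckets_broken)  # everything lands in ('desktop', 'authenticated')
```

With the fields missing, every event falls through to the defaults, which would match desktop.overall absorbing mobile traffic and desktop.authenticated absorbing anonymous traffic, while the mobile and anonymous buckets receive nothing.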
Change 501484 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Add isAnon and mobileMode to PaintTiming context
Change 501604 had a related patch set uploaded (by Krinkle; owner: Gilles):
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.24] Add isAnon and mobileMode to PaintTiming context
Change 501604 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.24] Add isAnon and mobileMode to PaintTiming context
Mentioned in SAL (#wikimedia-operations) [2019-04-05T15:57:55Z] <krinkle@deploy1001> Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be850d35c7d19 / T220156 (duration: 01m 00s)
Change 501484 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add isAnon and mobileMode to PaintTiming context