In T205582 we made it possible to count client side errors.
Enable wgMinervaCountErrors on all wikipedia wikis
Enabling it on wikipedia should give us the coverage we need.
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
Enable client side error counting on Minerva | operations/mediawiki-config | master | +5 -0 |
Status | Subtype | Assigned | Task |
---|---|---|---|
Resolved | | Jdlrobson | T195473 [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture |
Resolved | | Jdlrobson | T195475 [EPIC] Automate asset bundling in MobileFrontend |
Resolved | | Jdlrobson | T166905 [EPIC] Talk about and improve our frontend code architecture |
Declined | | None | T106915 Use Sentry in production |
Resolved | | Jdlrobson | T167699 [EPIC] Enable JS error reporting in the mobile website |
Resolved | | Jdlrobson | T206702 Enable client side error counting on Minerva production (wikipedia only) |
Change 467760 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Enable client side error counting on Minerva
Change 467760 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable client side error counting on Minerva
Mentioned in SAL (#wikimedia-operations) [2018-10-17T11:30:34Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: T206702 [[gerrit:467760|Enable client side error counting on Minerva]] (duration: 00m 57s)
The change was deployed to production. No stats are being tracked yet because the Minerva skin in production is not up to date: the current Minerva version is e679f5207aafa4815298ad923225a0ba8c543b9a (Oct 2nd).
I have a couple of concerns with this change that I spoke about with @pmiazga but wanted to raise here too:
Also, @pmiazga and I have both arrived at the following performance optimisation independently: since we're just counting the number of errors and discarding any detail, it should be possible to count the number of errors per pageview and increment our counter when the page unloads. This would minimise the number of HTTP requests we make to the statsv beacon endpoint.
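A minimal sketch of the batching idea described above, assuming a hypothetical `createErrorCounter` helper and a caller-supplied `send` callback standing in for whatever actually increments the statsv counter. None of these names come from MinervaNeue; they are for illustration only.

```typescript
// Hypothetical helper: counts errors in memory and reports the total once.
// `send` is an assumed callback that performs the actual counter increment
// (for example, a single request to the stats endpoint).
function createErrorCounter(send: (count: number) => void) {
  let count = 0;
  return {
    // Call once per client-side error; no error detail is kept.
    record(): void {
      count += 1;
    },
    // Call when the page unloads; sends at most one report per call
    // and nothing at all if no errors were recorded.
    flush(): void {
      if (count > 0) {
        send(count);
        count = 0;
      }
    },
  };
}

// Possible browser wiring (left as a comment so the sketch runs anywhere):
//   const counter = createErrorCounter((n) => { /* increment counter by n */ });
//   window.addEventListener('error', () => counter.record());
//   window.addEventListener('beforeunload', () => counter.flush());
```

With this shape, a pageview that raises three errors results in one report carrying the value 3 rather than three separate requests.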
I made a patch to reduce the number of requests: https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/467984, but I'm afraid this might not be the best way to do it. There are 2 problems:
Most probably we want to track all errors, especially for older browsers. But older browsers might not support sending stats onbeforeunload and because of that, we might lose some data.
https://caniuse.com/#search=beforeunload begs to differ (acknowledging that we can't strictly rely on CIU) but do you have another source?
- not all browsers support AJAX requests during unload events.
statsv requests are sent by the Beacon API when it's supported.
As we discussed, I'm grateful that you raised these points, as they highlight a potential problem with the proposed performance improvement: the browsers that we deliver JS assets to but that don't support the Beacon API are likely the ones most affected by JS errors, and they are also the ones for which we stand to lose the most data. I'm not sure whether this problem becomes moot if we're only concerned with counting the global error rate.
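One possible way to picture the trade-off, as a sketch only and not something proposed in the patch: batch errors until unload where the Beacon API is available, and report each error immediately where it is not, so that those browsers do not depend on a request surviving the unload. `createReporter`, `hasBeacon`, and `send` are hypothetical names.

```typescript
// Hypothetical reporter: the strategy depends on Beacon API support.
// In a browser, `hasBeacon` could be computed as
//   typeof navigator.sendBeacon === 'function'
// `send` is an assumed callback that increments the error counter by `count`.
function createReporter(hasBeacon: boolean, send: (count: number) => void) {
  let pending = 0;
  return {
    // Call once per client-side error.
    record(): void {
      if (hasBeacon) {
        // Safe to defer: the beacon is expected to be delivered on unload.
        pending += 1;
      } else {
        // No Beacon API: report right away instead of risking loss on unload.
        send(1);
      }
    },
    // Call when the page unloads; a no-op if nothing is pending.
    flush(): void {
      if (pending > 0) {
        send(pending);
        pending = 0;
      }
    },
  };
}
```

The cost is that browsers without the Beacon API still make one request per error, so the optimisation would only apply where batching is unlikely to lose data.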
Hmm, I didn't check CIU first. I remember having some issues with that event, but maybe they were related to using alerts/confirms rather than to the event itself. If CIU says it's supported, let's use it.
Moving to Needs More Work to reflect the current state.
Please create a new task for the additional work proposed. This task was only about deploying the change, which looks to have been done.
Given T205582 is in 1.32.0-wmf.26 we won't see this change in production for a while.
I'm moving this to the backlog for the time being. I'll move it back onto the board when it's possible to sign this off.
Errors are coming in!
https://grafana.wikimedia.org/dashboard/db/reading-web-dashboard?orgId=1&panelId=15&fullscreen
MAGIC!
@phuedx the UDP packet rate increased, but at ~7am UTC today it dropped. The graphs look OK, as do all the other graphs; there's nothing we should worry about.