Maniphest T206702

Enable client side error counting on Minerva production (wikipedia only)
Closed, Resolved · Public · 1 Estimated Story Point

Assigned To

	Jdlrobson

Authored By

	Jdlrobson
	Oct 10 2018, 9:04 PM

Description

In T205582 we made it possible to count client-side errors.

Enable wgMinervaCountErrors on all Wikipedia wikis.

Enabling it on Wikipedia should give us the coverage we need.
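For readers unfamiliar with the feature, here is a minimal TypeScript sketch of what a flag such as wgMinervaCountErrors gates. It is not Minerva's code. The countErrors parameter stands in for the config variable, and the metric name MediaWiki.minerva.WebClientError and the injected send function are assumptions. The sketch also assumes one counter increment per error, which is the behaviour the batching proposal later in this task sets out to improve.

```typescript
// Minimal sketch, not MinervaNeue's implementation. All names are hypothetical.

// Anything that can deliver "error" events, e.g. `window` in a browser.
interface ErrorSource {
  addEventListener(type: "error", listener: () => void): void;
}

// Sends one statsv-style counter increment (assumed to hit the beacon endpoint).
type SendIncrement = (metric: string, amount: number) => void;

// Assumed metric name; the real key used by Minerva may differ.
const METRIC = "MediaWiki.minerva.WebClientError";

/**
 * Attach an error listener only when the feature flag is on.
 * Returns true if counting was enabled.
 */
function setupErrorCounting(
  countErrors: boolean, // stands in for wgMinervaCountErrors
  source: ErrorSource,
  send: SendIncrement,
): boolean {
  if (!countErrors) {
    return false; // flag off: no listener, so no extra requests
  }
  source.addEventListener("error", () => {
    // One request per error: simple, but request volume grows with error volume.
    send(METRIC, 1);
  });
  return true;
}
```

With the flag off, no listener is attached at all, so wikis without it pay no extra request cost.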

Details

	Subject	Repo	Branch	Lines +/-
	Enable client side error counting on Minerva	operations/mediawiki-config	master	+5 -0


Related Objects

Status	Assigned	Task
Resolved	Jdlrobson	T195473 [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture
Resolved	Jdlrobson	T195475 [EPIC] Automate asset bundling in MobileFrontend
Resolved	Jdlrobson	T166905 [EPIC] Talk about and improve our frontend code architecture
Declined	None	T106915 Use Sentry in production
Resolved	Jdlrobson	T167699 [EPIC] Enable JS error reporting in the mobile website
Resolved	Jdlrobson	T206702 Enable client side error counting on Minerva production (wikipedia only)

Event Timeline

Jdlrobson created this task. · Oct 10 2018, 9:04 PM

Restricted Application added a subscriber: Aklapper. · Oct 10 2018, 9:04 PM

Jdlrobson triaged this task as High priority. · Oct 10 2018, 9:04 PM

Restricted Application added a subscriber: Dereckson. · Oct 10 2018, 9:04 PM

ovasileva set the point value for this task to 1. · Oct 16 2018, 4:47 PM

ovasileva edited projects, added Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2); removed Web-Team-Backlog.

Change 467760 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Enable client side error counting on Minerva

https://gerrit.wikimedia.org/r/467760

gerritbot added a project: Patch-For-Review. · Oct 16 2018, 6:43 PM

Piotr will SWAT this tomorrow morning.

Change 467760 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable client side error counting on Minerva

https://gerrit.wikimedia.org/r/467760

Mentioned in SAL (#wikimedia-operations) [2018-10-17T11:30:34Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: T206702 [[gerrit:467760|Enable client side error counting on Minerva]] (duration: 00m 57s)

The change was deployed to production. No stats are tracked yet because the Minerva skin in production is not up to date: the current Minerva version is e679f5207aafa4815298ad923225a0ba8c543b9a (Oct 2).

I have a couple of concerns with this change that I spoke about with @pmiazga but wanted to raise here too:

There doesn't seem to be a note about how to monitor the additional load on Varnish or Graphite. For the latter, I would advise watching https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?refresh=1m&orgId=1 closely (though I can't find any clarification on-wiki about the difference between the local and frontend graphs).
Further to the above, we appear to be going from 0 to 100% without testing what the additional load on Varnish or Graphite is. Perhaps we could roll this out to one or two large wikis first?

Also, @pmiazga and I have independently arrived at the following performance optimisation: since we're only counting errors and discarding any detail, we could count the errors per pageview and increment our counter once, when the page unloads. This would cut the number of HTTP requests we make to the statsv beacon endpoint from one per error to at most one per pageview.
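The batching idea might look roughly like this. This is a hypothetical TypeScript sketch, not the MinervaNeue patch discussed in this task. Errors are tallied in memory, and a single request carrying the total is sent at unload. The endpoint path, the metric name, and the statsv-style name=<count>c encoding are assumptions. The sender is injected so that a browser build could wrap navigator.sendBeacon.

```typescript
// Hypothetical sketch of per-pageview error batching; all names are illustrative.

// Delivers one request; in a browser this could wrap navigator.sendBeacon.
type Sender = (url: string) => void;

class PageviewErrorCounter {
  private count = 0;
  private readonly endpoint: string; // assumed, e.g. "/beacon/statsv"
  private readonly metric: string; // assumed metric name

  constructor(endpoint: string, metric: string) {
    this.endpoint = endpoint;
    this.metric = metric;
  }

  /** Called from the error handler: only bumps an in-memory tally. */
  record(): void {
    this.count += 1;
  }

  /** Number of errors seen since the last flush. */
  get pending(): number {
    return this.count;
  }

  /**
   * Called when the page unloads. Sends a single counter increment with the
   * total, or nothing if there were no errors. Returns the URL sent, or null.
   */
  flush(send: Sender): string | null {
    if (this.count === 0) {
      return null; // error-free pageviews cost no request at all
    }
    // statsv-style counter encoding assumed: <metric>=<value>c
    const url = `${this.endpoint}?${encodeURIComponent(this.metric)}=${this.count}c`;
    send(url);
    this.count = 0; // a second unload-time event will not double-count
    return url;
  }
}

// Browser wiring (illustrative only, not executed here):
// const counter = new PageviewErrorCounter("/beacon/statsv", "MediaWiki.minerva.WebClientError");
// window.addEventListener("error", () => counter.record());
// window.addEventListener("beforeunload", () => counter.flush((u) => { navigator.sendBeacon(u); }));
```

Resetting the tally after each flush means that if more than one unload-time event fires, errors are not counted twice.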

I made a patch to reduce the number of requests: https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/467984, but I'm afraid it may not be the best way to do it. There are two problems:

not all browsers support onbeforeunload
not all browsers support AJAX requests during unload events.

We most likely want to track all errors, especially in older browsers. However, older browsers might not support sending stats during onbeforeunload, so we might lose some data.

Jdlrobson moved this task from Needs Code Review to Ready for Signoff on the Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2) board. · Oct 17 2018, 5:13 PM

pmiazga moved this task from Ready for Signoff to Needs More Work on the Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2) board. · Oct 17 2018, 5:17 PM

Moving to Needs More Work to reflect the current state.

In T206702#4674734, @pmiazga wrote:

not all browsers support onbeforeunload

https://caniuse.com/#search=beforeunload begs to differ (acknowledging that we can't strictly rely on CIU) but do you have another source?

not all browsers support AJAX requests during unload events.

statsv requests are sent by the Beacon API when it's supported.

As we discussed, I'm grateful that you raised these points, as they highlight a potential problem with the proposed performance improvement: the browsers we deliver JS assets to that don't support the Beacon API are likely the ones most affected by JS errors, yet they are also the ones we stand to lose the most data from. I'm not sure whether this problem becomes moot if we're only concerned with counting the global error rate.
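One possible way to address that concern is to batch only where the Beacon API is available and fall back to one request per error elsewhere. Browsers without Beacon would then still report every error, at the cost of more requests. The following TypeScript sketch is hypothetical, and whether the trade-off is worthwhile was not settled here. createReporter, the endpoint, and the metric name are illustrative, and the statsv-style encoding is an assumption.

```typescript
// Hypothetical sketch: pick a reporting strategy based on Beacon support.

type Send = (url: string) => void;

interface ErrorReporter {
  onError(): void; // call from the window "error" handler
  onUnload(): void; // call from the unload-time handler
}

// Assumed statsv-style URL builder: <endpoint>?<metric>=<n>c
function counterUrl(endpoint: string, metric: string, n: number): string {
  return `${endpoint}?${metric}=${n}c`;
}

function createReporter(
  hasBeacon: boolean, // e.g. typeof navigator.sendBeacon === "function"
  endpoint: string,
  metric: string,
  send: Send,
): ErrorReporter {
  if (!hasBeacon) {
    // No Beacon: an unload-time request may never leave the browser,
    // so report each error immediately (more requests, but no batching loss).
    return {
      onError: () => send(counterUrl(endpoint, metric, 1)),
      onUnload: () => {},
    };
  }
  // Beacon available: count in memory and send one total at unload.
  let count = 0;
  return {
    onError: () => {
      count += 1;
    },
    onUnload: () => {
      if (count > 0) {
        send(counterUrl(endpoint, metric, count));
        count = 0;
      }
    },
  };
}
```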

In T206702#4675428, @phuedx wrote:

In T206702#4674734, @pmiazga wrote:

not all browsers support onbeforeunload

https://caniuse.com/#search=beforeunload begs to differ (acknowledging that we can't strictly rely on CIU) but do you have another source?

Hmm, I didn't check CIU first. I remember having some issues with that event, but maybe they were related to using alerts/confirms rather than to the event itself. If CIU says it's supported, let's use it.

Moving to Needs More Work to reflect the current state.

Please create a new task for the additional work proposed. This task was only about deploying the change, which looks to have been done.
Given that T205582 is in 1.32.0-wmf.26, we won't see this change in production for a while.
I'm moving this to the backlog for the time being. I'll move it back onto the board when it's possible to sign this off.

Jdlrobson edited projects, added Web-Team-Backlog; removed Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2). · Oct 17 2018, 9:19 PM

Jdlrobson moved this task from Upcoming to Needs Prioritization on the Web-Team-Backlog board.

Jdlrobson added a parent task: T167699: [EPIC] Enable JS error reporting in the mobile website. · Oct 18 2018, 4:16 PM

Jdlrobson mentioned this in T167699: [EPIC] Enable JS error reporting in the mobile website.

Errors are coming in!

https://grafana.wikimedia.org/dashboard/db/reading-web-dashboard?orgId=1&panelId=15&fullscreen

Screen Shot 2018-10-18 at 4.51.48 PM.png (67 KB)

MAGIC!

… and how's Graphite coping?

@phuedx The UDP packet rate increased, but then dropped at around 7am UTC today. All other graphs look OK; nothing we should worry about.

	F26625040: Screen Shot 2018-10-18 at 4.51.48 PM.png
	Oct 18 2018, 11:52 PM
