Page MenuHomePhabricator

Many errors on ReadingDepth.enable (?) schema
Closed, ResolvedPublic

Description

(assigning to Jon, please reassign as needed)

NOTE: setting priority to Unbreak Now because this is generating over 200 errors per second and messing with the stability of our systems.

We have some events that "look" like reading depth schemas but are coming from what seems to be a mobile android UA that are grossly malformed, schema is set to ReadingDepthSchema.enable which does not exist

"event": {

  "code": "validation",
  "message": "Invalid revision ID -1",
  "rawEvent": "?%7B%22event%22%3A%22page-issues-b_sample%22%2C%22revision%22%3A-1%2C%22schema%22%3A%22ReadingDepthSchema.enable%22%2C%22webHost%22%3A%22fa.m.wikipedia.org%22%2C%22wiki%22%3A%22fawiki%22%7D;\tcp3032.esams.wmnet\t552705932\t2018-10-18T21:00:01\t5.116.187.133\t\"Mozilla/5.0 (Linux; U; Android 4.3; fa-ir; G630-U20 Build/HuaweiG630-U20) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\"",
  "revision": -1,
  "schema": "ReadingDepthSchema.enable"
},
"recvFrom": "eventlog1002.eqiad.wmnet",
"revision": 14035058,
"schema": "EventError",
"timestamp": 1539896402,
"uuid": "c7a1dcded31811e8b7e61418775b0d42",
"wiki": ""

Developer notes

Any event prefixed with "wikimedia.events" will be routed to EventLogging schema land.

wikimedia.event.ReadingDepthSchema.enable should be renamed wikimedia.ReadingDepthSchema.enable

Related Objects

Event Timeline

Nuria created this task.Oct 18 2018, 9:13 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 468485 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] WikimediaEvents hook was renamed

https://gerrit.wikimedia.org/r/468485

Change 468486 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@master] Don't send events to nonexistent wikimedia.ReadingDepthSchema.enable

https://gerrit.wikimedia.org/r/468486

This is getting to be a huge number of errors, just 20.000 in the last minute.

kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -t eventlogging_EventError

Nuria triaged this task as High priority.Oct 18 2018, 10:08 PM
elukey added a subscriber: elukey.Oct 19 2018, 5:49 AM
Nuria added a comment.Oct 22 2018, 3:11 PM

See graph of errors: https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1&from=now-2d&to=now-5m if we can deploy this sooner rather than later it will be great.

Milimetric raised the priority of this task from High to Unbreak Now!.Oct 22 2018, 3:27 PM
Milimetric updated the task description. (Show Details)
Restricted Application added subscribers: Liuxinyu970226, Mholloway, TerraCodes. · View Herald TranscriptOct 22 2018, 3:27 PM
Milimetric moved this task from Incoming to Radar on the Analytics board.Oct 22 2018, 3:36 PM

Change 469031 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@wmf/1.32.0-wmf.26] WikimediaEvents hook was renamed

https://gerrit.wikimedia.org/r/469031

Change 469032 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@wmf/1.32.0-wmf.26] Don't send events to nonexistent wikimedia.ReadingDepthSchema.enable

https://gerrit.wikimedia.org/r/469032

Change 468486 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Don't send events to nonexistent wikimedia.ReadingDepthSchema.enable

https://gerrit.wikimedia.org/r/468486

Change 469032 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.32.0-wmf.26] Don't send events to nonexistent wikimedia.ReadingDepthSchema.enable

https://gerrit.wikimedia.org/r/469032

Change 469031 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@wmf/1.32.0-wmf.26] WikimediaEvents hook was renamed

https://gerrit.wikimedia.org/r/469031

Change 468485 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] WikimediaEvents hook was renamed

https://gerrit.wikimedia.org/r/468485

Mentioned in SAL (#wikimedia-operations) [2018-10-22T18:35:27Z] <jforrester@deploy1001> Synchronized php-1.32.0-wmf.26/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.readingDepth.js: SWAT Fix reading depth logging part 1 T207423 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2018-10-22T18:36:49Z] <jforrester@deploy1001> Synchronized php-1.32.0-wmf.26/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssuesLogger.js: SWAT Fix reading depth logging part 2 T207423 (duration: 00m 46s)

Change is live in 1.32.0-wmf.26.
I'm still seeing errors when I run kafkacat but this is likely to be the case because of cached JS :-(.

Let's keep an eye on https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1&from=now-2d&to=now-5m

If the error rates are not lowering (e.g. the cache isn't clearing) we'll have to get creative to shut this up.

Once the firefighting is done, could someone please spell out the implications (if any) of this bug for the ReadingDepth data, in particular in context of the currently ongoing Page Issues A/B test?

@Nuria @Milimetric this is deployed but alas if clients have cached old javascript they will continue to trigger these events.

Is there anyway we can suppress these errors in the mean time?

Nuria added a comment.Oct 23 2018, 4:21 PM

@Jdlrobson When was this deployed to all wikis? there is no apparent reduction of errors so either all clients have cached the Js (seems unlikely but maybe the deployment just happened) or the code fix might have an issue and it is not truly fixing what was intended, can we look into this second possibility?

Once the firefighting is done, could someone please spell out the implications (if any) of this bug for the ReadingDepth data, in particular in context of the currently ongoing Page Issues A/B test?

@Jdlrobson and I had a quick chat about this during standup - it should not affect the existing valid events that are being sent (AIUI, those new invalid events were basically being generated previously already but suppressed, and the bug caused that suppression to fail).

Change 469243 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469243

Change 469244 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@wmf/1.32.0-wmf.26] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469244

There was a problem with the original SWAT.
I've marked this as a deployment blocker @twentyafterfour

We should get this fixed in the old branch 1.32.0-wmf.26 as well as the new branch 1.33.0-wmf.1

Change 469244 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@wmf/1.32.0-wmf.26] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469244

Change 469254 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@wmf/1.33.0-wmf.1] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469254

Mentioned in SAL (#wikimedia-operations) [2018-10-23T19:30:59Z] <twentyafterfour@deploy1001> Synchronized php-1.32.0-wmf.26/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssuesLogger.js: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/469244/ refs T207423 (duration: 00m 48s)

Change 469243 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469243

Change 469254 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@wmf/1.33.0-wmf.1] Remove wikimedia.event.ReadingDepthSchema.enable track call

https://gerrit.wikimedia.org/r/469254

Nuria closed this task as Resolved.Oct 23 2018, 9:53 PM
Aklapper edited projects, added Analytics-Radar; removed Analytics.Jun 10 2020, 6:43 AM