Page MenuHomePhabricator

tons of errors on eventlogging events
Closed, ResolvedPublic

Description

There is a spike of errors on eventlogging events that just triggered our alarms:

https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=EventError&from=1564601916938&to=1564688316938

From brief inspection of log I see a lot of errors in TestSearchSatisfaction2, they will be present on events database on eventlogging on the eventerror schema table.

Ana example:

{"event": {"code": "validation", "message": "'isForced' is a required property", "rawEvent": "{\"dt\": \"2019-08-01T19:34:17Z\", \"event\": {\"action\": \"searchResultPage\", \
"articleId\": 2071, \"autocompleteType\": \"comp_suggest\", \"hitsReturned\": 10, \"inputLocation\": \"header\", \"msToDisplayResults\": 133, \"mwSessionId\": \"d5df422719645f8
fcee5\", \"pageViewId\": \"52c859335371401bf7fdjyt2vcjd\", \"query\": \"an\", \"scroll\": false, \"searchSessionId\": \"810b3490786ceebcdd79jyt29c47\", \"searchToken\": \"aoo4p
weec4h6vt11ofsgj35sq\", \"source\": \"autocomplete\", \"uniqueId\": \"30272db87b9aa4b3970ajyt2vd6h\"}, \"recvFrom\": \"cp1089.eqiad.wmnet\", \"revis
ion\": 19240637, \"schema\": \"TestSearchSatisfaction2\", \"seqId\": 61322129,
\"uuid\": \"94581eb6f5755cc8b910d57e17dec492\", \"webHost\": \"es.wikipedia.org\", \"wiki\": \"eswiki\"}", "revision": 19240637, "schema": "TestSearchSatisfaction2"}, "recvFro
m": "eventlog1002.eqiad.wmnet", "revision": 14035058, "schema": "EventError", "timestamp": 1564688057, "uuid": "59f3a124b49311e99e021418775b0d42", "wiki": ""}
{"event": {"code": "validation", "message": "'isForced' is a required property", "rawEvent": "{\"dt\": \"2019-08-01T19:34:16Z\", \"event\": {\"action\": \"click\", \"articleId\
": 54291073, \"mwSessionId\": \"849b7a0361159fe36652\", \"pageViewId\": \"8a8d4f95552440257936jyt2urk1\", \"position\": 1, \"scroll\": false, \"searchSessionId\": \"a6a6c7dc5ff
9d38c9f0ejyt18l0e\", \"searchToken\": \"arl87dzemdxoudbbyohwznia0\", \"source\": \"autocomplete\", \"uniqueId\": \"de6379cc02e7b4339e37jyt2uvcm\"}, \"ip\": \"47.197.5.246\", \"
recvFrom\": \"cp1083.eqiad.wmnet\", \"revision\": 19240637, \"schema\": \"TestSearchSatisfaction2\",

Event Timeline

Change 527210 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Bump TSS2 eventlogging schema

https://gerrit.wikimedia.org/r/527210

Change 527210 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Bump TSS2 eventlogging schema

https://gerrit.wikimedia.org/r/527210

Screen Shot 2019-08-01 at 2.57.55 PM.png (1×2 px, 369 KB)

{"event": {"code": "validation", "message": "u'8' is not of type 'number'", "rawEvent": "{\"dt\": \"2019-08-01T21:55:54Z\", \"event\": {\"action\": \"checkin\", \"checkin\": 20, \"mwSessionId\": \"75f6e66b883c87d28528\", \"pageViewId\": \"f65e083a1b9e36e3f016jyt7x15e\", \"sampleMultiplier\": \"8\", \"scroll\": false, \"searchSessionId\": \"cfed1a696d8d45ecef68jyt7kxmv\", \"source\": \"autocomplete\", \"uniqueId\": \"3d7587eea16a560876b3jyt7xglx\"}, \"ip\": \", \"recvFrom\": \"cp3032.esams.wmnet\", \"revision\": 19240637, \"schema\": \"TestSearchSatisfaction2\", \"seqId\": 25478075, \"userAgent\""uuid\": \"b50434c582c55bcd8bcbeee041194bcc\", \"webHost\": \"commons.wikimedia.org\", \"wiki\": \"commonswiki\"}", "revision": 19240637, "schema": "TestSearchSatisfaction2"}, "recvFrom": "eventlog1002.eqiad.wmnet", "revision": 14035058, "schema": "EventError", "timestamp": 1564696555, "uuid": "2305ba76b4a711e98ef61418775b0d42", "wiki": ""}

More errors will be visible on eventerror table on events database

Change 527231 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.16] Bump TSS2 eventlogging schema

https://gerrit.wikimedia.org/r/527231

Change 527231 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.16] Bump TSS2 eventlogging schema

https://gerrit.wikimedia.org/r/527231

Mentioned in SAL (#wikimedia-operations) [2019-08-01T22:17:42Z] <ebernhardson@deploy1001> Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/extension.json: T229614: Update eventlogging schema version to resolve eventlogging errors in wmf.16 (duration: 00m 47s)

Change 527249 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Eventlogging requires numbers to be numbers

https://gerrit.wikimedia.org/r/527249

Change 527252 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.16] Eventlogging requires numbers to be numbers

https://gerrit.wikimedia.org/r/527252

Change 527252 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.16] Eventlogging requires numbers to be numbers

https://gerrit.wikimedia.org/r/527252

Mentioned in SAL (#wikimedia-operations) [2019-08-01T23:10:28Z] <ebernhardson@deploy1001> Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T229614: Pass proper types to eventlogging to resolve eventlogging errors in wmf.16 (duration: 00m 47s)

Change 527249 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Eventlogging requires numbers to be numbers

https://gerrit.wikimedia.org/r/527249

Backfill was done as follows, with guidance from https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Backfilling#Backfilling_a_kafka_eventlogging_%3CSchema%3E_topic

  • Load eventlogging input data for correct time frame and write fixed input data (from SWAP notebook w/ pyspark): P8862
  • Feed that file into eventlogging-processor as documented on wikitech backfilling page
  • After camus has imported the data, go to https://hue.wikimedia.org and sequentially re-run the appropriate search_satisfaction-daily and search_satisfaction-druid-daily workflows in their respective coordinators.