
Schema:CitationUsage improvements
Closed, Resolved (Public)

Description

Initial data collection revealed the following errors:

Additionally, we'd like to make the following changes:

  • Add 'pageLoad' action.

A/C

  • Fix the above issues and make additional adjustments.

See T199807#4431440

Event Timeline

bmansurov triaged this task as High priority. Jul 17 2018, 2:44 PM
bmansurov created this task.

Change 446329 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: add 'pageLoad' action

https://gerrit.wikimedia.org/r/446329

bmansurov moved this task from Staged to In Progress on the Research board.
bmansurov added a comment. Edited Jul 17 2018, 6:15 PM

Debugging errors

Running kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -t eventlogging_EventError -o beginning returned 564 errors. The earliest error occurred at 2018-07-06T15:17:59.200Z and the latest at 2018-07-09T19:20:07.279Z. Counting whole days (July 6 through July 9, ignoring the time of day), we received 11,768,758 valid events in the same period, according to the query below. That is about 1 error per 20,867 valid events.

SELECT COUNT(*) FROM event.citationusage WHERE year = 2018 AND month = 7 AND day >= 6 AND day <= 9;
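The error rate above follows directly from the two counts. Here is a minimal sketch of the arithmetic, using only the numbers quoted in this comment (the variable names are illustrative, not part of any codebase):

```javascript
// Counts taken from this comment.
const errorCount = 564;       // messages read from eventlogging_EventError
const validCount = 11768758;  // rows returned by the SQL query for July 6-9, 2018

// "1 in N" form: how many valid events there are per error.
const validPerError = validCount / errorCount;

// The same figure expressed as a percentage of valid events.
const errorPercent = (errorCount / validCount) * 100;

console.log(`about 1 error per ${Math.round(validPerError)} valid events`);
console.log(`about ${errorPercent.toFixed(4)}% of valid events`);
```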

"'link_occurrence' is a required property"

This is the most frequent error: it accounts for 480 of the 564 errors.

Case 1

The error occurs with the following data:

{"event": {"dom_interactive_time": 1530893259458, "revision_id": 848599675, "page_id": 27433448, "page_title": "Air India Express Flight 812", "namespace_id": 0, "page_token": "c10543d647ae516c", "session_token": "db74a79cfc242872", "referrer": "https://www.google.com/", "skin": "vector", "mode": "desktop", "in_infobox": false, "link_text": "", "action": "fnHover", "event_offset_time": 7645}, "revision": 18051472, "schema": "CitationUsage", "webHost": "en.wikipedia.org", "wiki": "enwiki"}

Notice that link_text is empty. When I visit the article and look for references with empty text, using the same expression as the instrumentation code, I don't find any links with empty text. Here's the code:

$('sup.reference a').each(function (i, l) {
    console.log($(l).text().trim().replace( /\s+/g, ' ' ));
});

So the underlying problem is that the browser, for some reason, is not able to get the link text. As a result, link_occurrence is missing from the event.
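To make the check above repeatable outside the browser console, the same normalization can be pulled into a plain function. This is only a sketch: normalizeLinkText and findEmptyLinkTexts are hypothetical names, not functions from WikimediaEvents, and the sample strings are made up to stand in for what $(l).text() might return.

```javascript
// Hypothetical helper: mirrors the normalization in the snippet above
// (trim, then collapse runs of whitespace into a single space).
function normalizeLinkText(rawText) {
    return rawText.trim().replace(/\s+/g, ' ');
}

// Hypothetical helper: returns the indexes of links whose normalized text is empty.
function findEmptyLinkTexts(rawTexts) {
    const emptyIndexes = [];
    rawTexts.forEach(function (rawText, i) {
        if (normalizeLinkText(rawText) === '') {
            emptyIndexes.push(i);
        }
    });
    return emptyIndexes;
}

// Made-up sample strings standing in for $(l).text() results.
const sampleTexts = ['[1]', '  [ 2 ]\n', '', ' \t '];
console.log(findEmptyLinkTexts(sampleTexts));
```

On the live article this kind of check finds nothing, which is why the empty link_text in the logged event points at the client rather than at the page content.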

Case 2

{"event": {"dom_interactive_time": 1530890279200, "revision_id": 842830569, "page_id": 430413, "page_title": "Choe Yeong", "namespace_id": 0, "page_token": "7371522b60cb3037", "session_token": "12a1a6ce7db1316c", "referrer": "https://en.wikipedia.org/wiki/Main_Page", "skin": "vector", "mode": "desktop", "section_id": "External_links", "in_infobox": false, "link_text": "General Choi,  Young Shrine", "link_url": "http://www.invil.org/english/tourism/themeTour/assetsTemple/contents.jsp?con_no=850100&page_no=1", "freely_accessible": false, "action": "extClick", "event_offset_time": 386321}

This is a different case where link_text is correctly parsed. However, I cannot reproduce the missing link_occurrence property even on the same OS and browser as the user.

None is not of type 'integer'

This type of error occurred 11 times.

Here's an example:

{"event":{"dom_interactive_time":null,"revision_id":849056190,"page_id":32924090,"page_title":"Boeing 737 MAX","namespace_id":0,"page_token":"79a9792c8bd69b7c","session_token":"9df3112e4cd963a9","referrer":"","skin":"minerva","mode":"mobile","in_infobox":true,"link_text":"[6]","link_url":"https://en.m.wikipedia.org/wiki/Boeing_737_MAX#cite_note-prices-6","link_occurrence":1,"action":"fnHover","event_offset_time":19417},"revision":18051472,"schema":"CitationUsage","webHost":"en.m.wikipedia.org","wiki":"enwiki"}

As we can see, dom_interactive_time is null. This means the browser reports that it supports window.performance.timing.domInteractive when in fact it doesn't. We already check for this support in the code. We could add a further check, but given the low number of such errors, I think it's safe to ignore them. After all, these events fail validation and are dropped, which is the same outcome we would get by changing the code.
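For reference, the additional check mentioned above could look roughly like the following. This is a sketch under assumptions, not the actual instrumentation: getDomInteractiveTime is a hypothetical helper, the performance object is passed in as a parameter so the code also runs outside a browser, and treating 0 as "not available" is my assumption.

```javascript
// Hypothetical helper: returns domInteractive only when it is a usable integer,
// otherwise undefined so the caller can decide not to log the event at all.
// `perf` stands in for window.performance.
function getDomInteractiveTime(perf) {
    const timing = perf && perf.timing;
    const value = timing ? timing.domInteractive : undefined;
    // Assumption: null, non-integer, and 0 values are all treated as unavailable.
    return Number.isInteger(value) && value > 0 ? value : undefined;
}

// A browser that reports a real timestamp.
console.log(getDomInteractiveTime({ timing: { domInteractive: 1530893259458 } }));
// A browser that exposes the property but leaves it null, as in the example above.
console.log(getDomInteractiveTime({ timing: { domInteractive: null } }));
// A browser without performance.timing at all.
console.log(getDomInteractiveTime({}));
```

Either way the result for the affected clients is that no event is recorded, which matches what validation already does today.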

'referrer' is a required property

This type of error occurred twice.

According to the documentation, document.referrer should always return a string. For some reason, some browsers, such as Android Browser on an LG L41C running Android KitKat, do not return a string. Given the low number of such errors, I don't think we should support browsers that don't comply with the standards.

some_link is not one of ['extClick', 'upClick', 'fnClick', 'fnHover']

This type of error occurred 30 times. It looks like some kind of browser extension is hijacking our event listeners.

Extra data: line 1 column ...

This kind of error occurred 27 times. It seems the back-end could not parse the payload as JSON.

No JSON object could be decoded

This kind of error occurred 13 times. Here too, it seems the back-end could not parse the payload as JSON.
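To put the categories side by side, here is a small sketch that tallies the counts quoted in this comment and prints each category's share of the 564 errors. The object and variable names are illustrative only. Note that the quoted counts add up to 563, so one error is not covered by the categories above.

```javascript
// Error counts per message, as quoted in this comment.
const errorCounts = {
    "'link_occurrence' is a required property": 480,
    "None is not of type 'integer'": 11,
    "'referrer' is a required property": 2,
    'some_link is not one of [...]': 30,
    'Extra data: line 1 column ...': 27,
    'No JSON object could be decoded': 13
};
const reportedTotal = 564;

// Sum of the categorized errors, to compare against the reported total.
const categorizedTotal = Object.values(errorCounts).reduce((sum, n) => sum + n, 0);

for (const [message, count] of Object.entries(errorCounts)) {
    const share = ((count / reportedTotal) * 100).toFixed(1);
    console.log(`${share}%  ${count}  ${message}`);
}
console.log(`categorized: ${categorizedTotal} of ${reportedTotal}`);
```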

Conclusion

Given how few errors there are relative to valid events, and that most of them stem from browser quirks, I suggest we make no front-end code changes. Separately, we may want to look into the back-end errors, which relate to EventLogging in general.

Change 446329 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: add 'pageLoad' action

https://gerrit.wikimedia.org/r/446329

bmansurov updated the task description. Jul 26 2018, 7:25 PM
bmansurov moved this task from In Progress to Done (current quarter) on the Research board.

Change 448111 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: update revision

https://gerrit.wikimedia.org/r/448111

@leila the pageLoad action has been added to the codebase. Let me know when we should turn on data collection. Thanks.

DarTar closed this task as Resolved. Jul 28 2018, 2:11 AM
DarTar edited projects, added Research-Archive; removed Research.
DarTar moved this task from Default to Q4-FY18 on the Research-Archive board. Jul 28 2018, 2:13 AM

Change 448111 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: update revision

https://gerrit.wikimedia.org/r/448111

Change 450021 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: do not log fnHover on mobile

https://gerrit.wikimedia.org/r/450021

bmansurov reopened this task as Open. Sep 6 2018, 1:35 PM
bmansurov edited projects, added Research; removed Patch-For-Review.
bmansurov updated the task description.

Change 458835 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: limit some parameter lengths

https://gerrit.wikimedia.org/r/458835

Change 458835 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] CitationUsage: limit some parameter lengths

https://gerrit.wikimedia.org/r/458835

Change 450021 abandoned by Bmansurov:
CitationUsage: do not log fnHover on mobile

Reason:
not needed

https://gerrit.wikimedia.org/r/450021

bmansurov closed this task as Resolved. Oct 9 2018, 1:26 PM

Further improvements will be done as part of T206083.