Page MenuHomePhabricator

Identify and fix data quality problems in the Edit event log
Closed, ResolvedPublic

Description

For full analysis of the problems and supporting statistics, see the pre-improvement analysis notebook on GitHub.

Related Objects

Event Timeline

Neil_P._Quinn_WMF triaged this task as Normal priority.Aug 21 2018, 6:49 PM
Neil_P._Quinn_WMF created this task.
Neil_P._Quinn_WMF moved this task from Triage to Next Up on the Product-Analytics board.
Neil_P._Quinn_WMF renamed this task from Vet the Edit event log to Identify and fix data quality problems in the Edit event log.Aug 27 2018, 8:30 PM
Neil_P._Quinn_WMF updated the task description. (Show Details)
Neil_P._Quinn_WMF moved this task from Next Up to Doing on the Product-Analytics board.
Neil_P._Quinn_WMF updated the task description. (Show Details)

The init platform:other issue is that our tracking code tries to work out the platform via platform: ve.init && ve.init.target && ve.init.target.constructor.static.platformType || 'other',

Unfortunately, almost everywhere that we try to log init from hasn't yet loaded the target. Given that most of these are inside ve.init.mw.DesktopArticleTarget.init.js, I think we could safely just hardcode platform: 'desktop' in those calls. (Look for trackActivateStart in that file.)

init isn't logged at all in WikiEditor. We could add the call into ext.wikiEditor.js or maybe ext.wikiEditor.toolbar.js and have it with a similar-ish meaning to the event in VisualEditor. That said, it might not be a close enough equivalent -- the VE call to it is for "something has happened which is causing us to start working on loading the VE js", and WikiEditor doesn't have quite the same model -- it's entirely "oh, you opened the page? Okay, editor's loading." Given which, you might just want to use the load time from the loaded/ready events instead, I think? @Esanders might remember better whether we deliberately left that out.

Neil_P._Quinn_WMF updated the task description. (Show Details)

init isn't logged at all in WikiEditor. […] @Esanders might remember better whether we deliberately left that out.

Yup. This is because sessions launched by non-JS users won't trigger any JS-land activity (just server-side activity).

Neil_P._Quinn_WMF added a subscriber: Jdlrobson.

Change 455850 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/VisualEditor@master] DesktopArticleTarget.init: Pass platform to ve.track directly

https://gerrit.wikimedia.org/r/455850

Deskana moved this task from Incoming to Code review on the VisualEditor (Current work) board.

The above patch should possibly be in a subtask, unless it resolves the entire set of issues which Neil has described, which I don't think it does. Hmm.

For now, I'll put it in code review, and we can figure that out later.

I think we should do this:

  1. Don't have an 'other' fallback inside the tracking code - make it log a warning (or thrown an exception once we're confident none of our code triggers the path) along the lines of "no target available and no platformType specified"
  2. Require callers to pass in platformType in the data object if there is a chance a target doesn't exist yet (i.e. in every call in DAT.init.js)

Yup. This is because sessions launched by non-JS users won't trigger any JS-land activity (just server-side activity).

@Jdforrester-WMF revealing my ignorance of some of the details of our logging practices... does the server-side activity wind up in the edit schema anywhere, or is it completely absent from the stuff Neil's looking at?

Yup. This is because sessions launched by non-JS users won't trigger any JS-land activity (just server-side activity).

@Jdforrester-WMF revealing my ignorance of some of the details of our logging practices... does the server-side activity wind up in the edit schema anywhere, or is it completely absent from the stuff Neil's looking at?

I think it does, but it's been five years since we put it into practice…

Neil_P._Quinn_WMF updated the task description. (Show Details)

Unusually few abort events recorded for both mobile editors.
On the desktop editors, the combined number of abort and saveAttempt events roughly match the number of ready events, but that isn't the case here.

So, this tracks slightly different things on desktop and mobile.

  • On desktop it's very comprehensive, as it's in logged on any exit path and it's in an onunload handler, which thus should run any time we leave the page for any reason.
  • On mobile there's no onunload handler, so the "the user just closed the tab" case isn't covered.

That said, onunload isn't reliable on mobile browsers, because it doesn't cover the "you switched apps, and later the backgrounded browser unceremoniously dropped the page" case (which I'd speculate is the most-common abandoned-edit route on mobile...), so this is a hard disparity to fix.

@Deskana, @DLynch, @Esanders: I've finished my analysis. The full version is available in the Jupyter notebook linked in the description, but I've summarized the issues I found in the description.

I've created subtasks for the issues we've already discussed and copied that discussion there, but I haven't done it for all of them because it would be so monotonous.

If you start discussing one, I'd suggest going ahead and creating the subtask to keep this more sane :)

Neil_P._Quinn_WMF removed Neil_P._Quinn_WMF as the assignee of this task.Sep 5 2018, 11:29 PM
Neil_P._Quinn_WMF moved this task from Doing to Tracking on the Product-Analytics board.
Neil_P._Quinn_WMF raised the priority of this task from Normal to High.Oct 2 2018, 6:48 PM
matmarex added a subscriber: matmarex.

I don't know the status of this, but there is definitely no code to review.

Neil_P._Quinn_WMF closed this task as Resolved.Feb 27 2019, 6:55 PM
Neil_P._Quinn_WMF claimed this task.

The only subtask that hasn't been done is T206008, but that's not very not important and can float on as its own little thing.