
Develop new schema for editing feature usage
Closed, Resolved, Public

Details

Related Gerrit Patches:
mediawiki/extensions/WikimediaEvents@master: Add VisualEditorFeatureUse schema

Event Timeline

Neil_P._Quinn_WMF triaged this task as High priority. Aug 30 2018, 12:50 AM
Neil_P._Quinn_WMF created this task.
Neil_P._Quinn_WMF added a comment (edited). Sep 8 2018, 1:00 AM

Okay, I've put together a schema draft under the name Schema:VisualEditorFeatureUse. This is very tentative so feel free to propose changes, including to the terminology I use. In other words, bikeshedding welcome, particularly when it helps me conform to VE's engineering terminology 😁.

The data model I'm envisioning has editors using features ("textStyle/Bold", "link", "image"...) by triggering actions ("open", "set", "toggle-selection"...) on targets ("dialog", "annotation").
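To make the feature/action/target model concrete, here is a minimal sketch of one event, assuming target and action are folded into a single enum value (one of the two options weighed in the considerations). The enum values such as "annotation-toggle-selection" and the Python attribute names are hypothetical illustrations, not the contents of Schema:VisualEditorFeatureUse.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allowed values for the combined target-action field; the real
# enum is defined in Schema:VisualEditorFeatureUse and may differ.
ALLOWED_ACTIONS = {
    "dialog-open",
    "annotation-set",
    "annotation-clear",
    "annotation-toggle-selection",
}


@dataclass(frozen=True)
class FeatureUseEvent:
    # Python attribute names here are illustrative, not the schema's.
    editing_session_id: str  # session ID shared with the Edit schema
    feature: str             # e.g. "textStyle/bold", "link", "image"
    action: str              # combined target-action, e.g. "dialog-open"


def is_valid(event: FeatureUseEvent) -> bool:
    """Reject any target-action pair that is not listed in the enum."""
    return event.action in ALLOWED_ACTIONS


def split_action(action: str) -> tuple[str, str]:
    """Split a combined value into (target, action) for grouping by target.

    Assumes target names never contain a hyphen.
    """
    target, _, verb = action.partition("-")
    return target, verb
```

With this sketch, `is_valid(FeatureUseEvent("s1", "link", "dialog-open"))` is true, while an event with `"dialog-clear-all"` is rejected simply because that value never appears in the enum. `split_action("annotation-toggle-selection")` returns `("annotation", "toggle-selection")`, so grouping by target stays possible at analysis time.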

Here are some of my key considerations:

  • To calculate the proportion of edit attempts (aka edit sessions) that use features at all, we somehow need to log attempts that don't use any features. I think the easiest way to do this is to have this schema use the same session ID as the Edit schema, and look to that data for the total number of edit attempts (probably by counting ready events). Ideally, we would sample the same set of sessions for both, but 6.25% sampling for this schema might produce too many events. I need to think about this a bit more and look at the number of sessions logged in the Edit schema.
  • We can log the action and target either in one combined field (e.g. action: "annotation-clear") or in separate fields (target: "annotation", action: "clear"). The benefit of splitting is that you can easily group by target and see, for example, how many actions target annotations and how many target dialogs. The benefit of combining is that you can easily prevent invalid pairs: at least at the moment, you can't tell EventLogging to reject target: "dialog", action: "clear-all", but you can simply leave action: "dialog-clear-all" out of the enum values. I've decided on the latter and combined target and action, but I'm open to doing the reverse too.
  • I can't really think of additional data I want to include since I'm assuming we will be able to join this with the Edit schema on the session ID. If not, I would want to include editor, platform, user class (IP, bot), and edit count (bucketed for privacy).
  • By default, all EventLogging schemas are wrapped in the event capsule, which includes wiki, timestamp, user agent, and IP.
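The proportion described in the first consideration can be sketched as a set computation over the two event streams joined on the session ID. This assumes Edit events carry `editingSessionId` and an `action` field whose value is `"ready"` when the editor loads; the function name and the dict-based row shape are hypothetical.

```python
from typing import Iterable, Mapping


def feature_use_rate(
    edit_events: Iterable[Mapping[str, str]],
    feature_events: Iterable[Mapping[str, str]],
) -> float:
    """Share of edit sessions (those with a "ready" event) that used any feature."""
    # Denominator: every session that reached "ready" in the Edit schema.
    ready_sessions = {
        e["editingSessionId"] for e in edit_events if e["action"] == "ready"
    }
    if not ready_sessions:
        return 0.0
    # Numerator: ready sessions that also appear in the feature-use data.
    # Feature-use sessions with no matching ready event are ignored.
    using = {e["editingSessionId"] for e in feature_events} & ready_sessions
    return len(using) / len(ready_sessions)
```

This only gives a meaningful rate if both schemas sample the same sessions. If the feature-use schema were sampled more sparsely than the Edit schema, `ready_sessions` would first need to be restricted to the sessions eligible for feature-use logging; otherwise the rate is underestimated.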

@DLynch, @Esanders, @Deskana thoughts?

Small note: in our code we use "textStyle/bold" instead of "textStyle/Bold" (and similarly lower case for italic/underline)

Does feature need to be an enum? There are lots of other dialogs that our code would track automatically: template, gallery, math, help, findAndReplace.

What about the non-dialog/non-annotation actions we mentioned, such as copy/paste?

cf @DLynch's patch: https://gerrit.wikimedia.org/r/#/c/VisualEditor/VisualEditor/+/457931/

Perhaps @DLynch can list all the types of events his patch produces and we can go from there?

DLynch added a comment. Oct 3 2018, 4:24 PM

Sure!

Currently it will track:

  • Clipboard interactions
    • copy, paste, cut
  • Simple setting/removing of annotations
    • bold, italic, code, super, sub, etc.
    • Whether this was set on a selection or as the current insertion
    • Currently this doesn't include language, but easily could
  • Anything which involves a dialog popping up
    • link, reference
    • just whether the dialog opened, not whether it was then canceled

Here's a sample from a few seconds of using it:

Neil_P._Quinn_WMF lowered the priority of this task from High to Normal. Oct 4 2018, 9:52 PM
Neil_P._Quinn_WMF raised the priority of this task from Normal to High.
Neil_P._Quinn_WMF moved this task from Next Up to Doing on the Product-Analytics board.
Deskana closed this task as Resolved. Oct 9 2018, 4:42 PM

Based on a discussion just held with @Neil_P._Quinn_WMF and @Esanders, this work is complete—the schema is adequately defined for implementation.

Change 465571 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/WikimediaEvents@master] Add VisualEditorFeatureUse schema

https://gerrit.wikimedia.org/r/465571

Change 465571 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Add VisualEditorFeatureUse schema

https://gerrit.wikimedia.org/r/465571

@Neil_P._Quinn_WMF : I wanted to let you know, and at the same time document, that since the proposed Schema:Edit2 uses snake_case (per recommendations from AE), the editingSessionId property in the Edit schema becomes editing_session_id in the new version. In this (VisualEditorFeatureUse) schema it's camelCase. This might make joins between the two schemas somewhat confusing. Wanted to flag that so you can consider whether to keep it or not.


Good point—thanks for the heads up! Since snake_case is the official recommendation, it's definitely worth keeping it like that in the new Edit schema. At the same time, since the feature use schema is already gathering data, I don't think it's worth changing even if it would make things much more elegant.
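Keeping the camelCase name in the feature-use schema means any join against the snake_case Edit2 data has to map one naming style onto the other. A minimal sketch of that normalization in Python, assuming rows arrive as plain dicts; the function names are hypothetical, and the converter is naive (a key like `userID` would become `user_i_d`).

```python
from __future__ import annotations

import re
from collections import defaultdict


def camel_to_snake(name: str) -> str:
    """Convert a camelCase property name such as editingSessionId to snake_case."""
    # Insert "_" before every uppercase letter that is not the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def join_on_session(
    edit2_rows: list[dict[str, str]],
    feature_rows: list[dict[str, str]],
) -> list[tuple[dict[str, str], dict[str, str]]]:
    """Pair each Edit2 row with the feature-use rows from the same session."""
    # Normalize the camelCase feature-use keys so both sides share one style.
    by_session: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in feature_rows:
        normalized = {camel_to_snake(k): v for k, v in row.items()}
        by_session[normalized["editing_session_id"]].append(normalized)
    return [
        (edit_row, feature_row)
        for edit_row in edit2_rows
        for feature_row in by_session.get(edit_row["editing_session_id"], [])
    ]
```

In a SQL query the same mismatch is handled by naming both columns in the join condition; the cost of the inconsistency is mostly readability, which fits the decision not to rename a schema that is already collecting data.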