|mediawiki/extensions/WikimediaEvents : master||Add VisualEditorFeatureUse schema|
|Resolved||Neil_P._Quinn_WMF||T202132 EPIC: Generate one-off metric snapshots for mobile editing documentation|
|Resolved||Neil_P._Quinn_WMF||T202133 Snapshot: usage of common editing features|
|Resolved||MMiller_WMF||T205754 [EPIC] Growth: Understanding first day|
|Resolved||DLynch||T202148 Instrument editing pipeline to be able to figure out which common editing features are used|
|Resolved||Neil_P._Quinn_WMF||T203136 Develop new schema for editing feature usage|
Okay, I've put together a schema draft under the name Schema:VisualEditorFeatureUse. This is very tentative so feel free to propose changes, including to the terminology I use. In other words, bikeshedding welcome, particularly when it helps me conform to VE's engineering terminology 😁.
The data model I'm envisioning has editors using features ("textStyle/Bold", "link", "image"...) by triggering actions ("open", "set", "toggle-selection"...) on targets ("dialog", "annotation").
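To make the shape of that model concrete, here is a minimal Python sketch of one event as a record with three fields. The class name `FeatureUseEvent` and the field names are illustrative assumptions, not the schema's actual definition. The feature/action/target pairings in the examples are plausible guesses built only from the values quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureUseEvent:
    """One use of an editing feature (hypothetical name and fields)."""
    feature: str  # e.g. "textStyle/Bold", "link", "image"
    action: str   # e.g. "open", "set", "toggle-selection"
    target: str   # e.g. "dialog", "annotation"


# Example events; the pairings are assumptions for illustration only.
events = [
    FeatureUseEvent(feature="link", action="open", target="dialog"),
    FeatureUseEvent(
        feature="textStyle/Bold",
        action="toggle-selection",
        target="annotation",
    ),
]

for e in events:
    print(f"{e.feature}: {e.action} on {e.target}")
```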
Here are some of my key considerations:
- To calculate the proportion of edit attempts (aka edit sessions) that use features in the first place, we somehow need to log attempts that don't use features. I think the easiest way to do this is to have this schema use the same session ID that the Edit schema uses, and look to that data to see the total number of edit attempts (probably by counting the number of ready events). Ideally, we would sample the same set of sessions for both, but 6.25% sampling for this schema might produce too many events. I need to think about this a bit more and look at the number of sessions logged in the Edit schema.
- We can either log the action and target in one combined field (e.g. action: "annotation-clear") or in separate fields (target: "annotation", action: "clear"). The benefit of splitting is that you can easily group by target and see, for example, how many actions target annotations, and how many target dialogs. The benefit of joining is that you can easily validate that incorrect pairs don't occur: at least at the moment, you can't tell EventLogging to reject target: "dialog", action: "clear-all", but you can easily not provide action: "dialog-clear-all" as an enum value. I've decided to join them, combining target and action into a single field, but I'm open to splitting them instead.
- I can't really think of additional data I want to include since I'm assuming we will be able to join this with the Edit schema on the session ID. If not, I would want to include editor, platform, user class (IP, bot), and edit count (bucketed for privacy).
- By default, all EventLogging schemas are wrapped in the event capsule, which includes wiki, timestamp, user agent, and IP.
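The trade-off between combined and separate fields can be sketched in a few lines of Python. This is not EventLogging's validator. The sets below stand in for enum lists in a schema, and every value in them is an illustrative assumption rather than the schema's real enum.

```python
# Separate fields: each field is checked against its own enum,
# so any target can be paired with any action.
SPLIT_TARGETS = {"dialog", "annotation"}
SPLIT_ACTIONS = {"open", "set", "clear", "clear-all", "toggle-selection"}

# Combined field: only the pairs that make sense are listed.
# "dialog-clear-all" is deliberately absent.
COMBINED_ACTIONS = {
    "dialog-open",
    "annotation-set",
    "annotation-clear",
    "annotation-clear-all",
    "annotation-toggle-selection",
}


def valid_split(target: str, action: str) -> bool:
    """Per-field enum check, as with two independent enum properties."""
    return target in SPLIT_TARGETS and action in SPLIT_ACTIONS


def valid_combined(action: str) -> bool:
    """Single enum check on the combined value."""
    return action in COMBINED_ACTIONS


def target_of(combined_action: str) -> str:
    """Recover the target for grouping; assumes targets contain no hyphen."""
    return combined_action.split("-", 1)[0]


print(valid_split("dialog", "clear-all"))        # True: the bad pair slips through
print(valid_combined("dialog-clear-all"))        # False: rejected by the enum
print(target_of("annotation-toggle-selection"))  # annotation
```

With a combined field, grouping by target is still possible at query time by splitting on the first hyphen, provided the naming convention is applied consistently.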
Does feature need to be an enum? There are lots of other dialogs that our code would track automatically: template, gallery, math, help, findAndReplace.
What about the non-dialog/non-annotation actions we mentioned, such as copy/paste?
Currently it will track:
- Clipboard interactions
  - copy, paste, cut
- Simple setting/removing of annotations
  - bold, italic, code, super, sub, etc.
  - Whether this was set on a selection or as the current insertion
  - Currently this doesn't include language, but easily could
- Anything which involves a dialog popping up
  - link, reference
  - just whether the dialog opened, not whether it was then canceled
Here's a sample from a few seconds of using it:
@Neil_P._Quinn_WMF : I wanted to let you know, and at the same time document, that since the proposed Schema:Edit2 uses snake_case (per recommendations from AE), the editingSessionId property in the Edit schema becomes editing_session_id in the new version. In this (VisualEditorFeatureUse) schema it's camelCase. This might make joins between the two schemas somewhat confusing. Wanted to flag that so you can consider whether to keep it or not.
Good point—thanks for the heads up! Since snake_case is the official recommendation, it's definitely worth keeping it like that in the new Edit schema. At the same time, since the feature use schema is already gathering data, I don't think it's worth changing even if it would make things much more elegant.
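As a rough illustration of what the naming mismatch means for analysis, the Python sketch below joins the two event streams on the session ID while reading it from a differently named key on each side. It then computes the share of edit attempts that used any feature. The event dictionaries are made-up sample data. It is also an assumption that the feature use schema's key is spelled `editingSessionId`, that edit attempts are counted via an `action` value of `"ready"`, and that both schemas sample the same sessions.

```python
def sessions_with_ready(edit_events):
    """Session IDs of edit attempts, counted via 'ready' events (snake_case key)."""
    return {
        e["editing_session_id"]
        for e in edit_events
        if e["action"] == "ready"
    }


def sessions_with_features(feature_events):
    """Session IDs that logged at least one feature use (camelCase key)."""
    return {e["editingSessionId"] for e in feature_events}


def feature_use_proportion(edit_events, feature_events):
    """Share of edit attempts in which at least one feature was used."""
    attempts = sessions_with_ready(edit_events)
    if not attempts:
        return 0.0
    used = sessions_with_features(feature_events) & attempts
    return len(used) / len(attempts)


# Made-up sample data: four edit attempts, two of which used a feature.
edit_events = [
    {"editing_session_id": "a", "action": "ready"},
    {"editing_session_id": "b", "action": "ready"},
    {"editing_session_id": "b", "action": "saveSuccess"},
    {"editing_session_id": "c", "action": "ready"},
    {"editing_session_id": "d", "action": "ready"},
]
feature_events = [
    {"editingSessionId": "a", "feature": "link"},
    {"editingSessionId": "a", "feature": "textStyle/Bold"},
    {"editingSessionId": "c", "feature": "reference"},
]

print(feature_use_proportion(edit_events, feature_events))  # 0.5
```

The only cost of the mismatch here is that each side needs its own key name. In a SQL join the same thing shows up as an `ON` clause comparing two differently named columns.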