
Instrument event logging for VE's image search
Open, High · Public

Description

This task is about implementing the necessary event logging the Structured Data team will need to answer the questions defined in T259308. These questions are also listed in the "Application" section below.

Background

The SD team will be integrating its new MediaSearch API into Visual Editor (see T259896). We need to measure the success of this change, and to do so we need image search in VE to be instrumented, which it is not currently (see T259308).

Application

Per T259308, we want to be able to answer the following:

  • What percentage of image searches in VE lead to the subsequent addition of an image to the article?
  • Where within the image grid was the selected image located?
  • How often do people use VE's image search?

Note: If location in the image grid of the image that is chosen is more difficult to measure, that could wait. We'd love to be able to measure this both before and after the change to using the new MediaSearch API.

Requirements

This section should contain an exhaustive list of the events ("Event name") that are being implemented to answer the questions in the "Application" section above, the action(s) ("Trigger action(s)") that should cause those events to be emitted, and the schema in which these events should be logged. [i]

Trigger action(s) | Event name | Schema

Done

  • Structured Data to complete the "Requirements" section above
  • Editing to approve the drafted "Requirements"
  • Structured Data to implement the approved "Requirements"
  • Editing Team to provide code review for newly instrumented events
  • Structured Data to verify the newly-implemented events are being emitted as expected by clients
  • Structured Data/Product Analytics to verify the newly-implemented events are being logged in the database in the ways we expect, once the new instrumentation has landed on production.
  • Structured Data/Product Analytics to update documentation [ii] with the events spec'd in this task once they have been verified to be implemented as expected. Read: clients are emitting events as expected and said events are being logged in the database as expected as well.

i. E.g. https://meta.wikimedia.org/wiki/Schema:VisualEditorFeatureUse
ii. https://www.mediawiki.org/wiki/VisualEditor/FeatureUse_data_dictionary

Event Timeline

ppelberg created this task. Oct 9 2020, 1:21 AM
Restricted Application added a subscriber: Aklapper. Oct 9 2020, 1:21 AM
ppelberg added a subscriber: Ramsey-WMF. Edited Oct 9 2020, 1:28 AM

This task is the amalgamation of the instrumentation request the Structured Data Team filed in T264779 and the Editing Team's process for implementing new instrumentation and verifying that said instrumentation has been implemented correctly.

As such, a resulting question:


cc @DLynch, @JTannerWMF, @MNeisler & @Ryasmeen for your awareness.

Thanks @ppelberg for starting this. I've recently been doing some analytics instrumentation for our new MediaSearch UI so I'll likely be working on implementation here as well.

In terms of the "Requirements" section above, I have a question about schemas. The VisualEditorFeatureUse schema seems pretty open-ended. Would it make sense to use that schema to record data about use of VE's image search feature (potentially with the addition of a few new properties that are specific to image-search, but no breaking changes)? Or would we want to define a new schema just for this feature? Would using more than one schema (FeatureUse for basics plus a dedicated schema to measure image search interactions) make it harder to follow the data?

> Thanks @ppelberg for starting this. I've recently been doing some analytics instrumentation for our new MediaSearch UI so I'll likely be working on implementation here as well.

Awesome and you bet.

> In terms of the "Requirements" section above, I have a question about schemas. The VisualEditorFeatureUse schema seems pretty open-ended. Would it make sense to use that schema to record data about use of VE's image search feature (potentially with the addition of a few new properties that are specific to image-search, but no breaking changes)? Or would we want to define a new schema just for this feature? Would using more than one schema (FeatureUse for basics plus a dedicated schema to measure image search interactions) make it harder to follow the data?

These are good questions that I think @MNeisler and @DLynch are best positioned to answer.

RE timing of the answers to said questions, would it be alright for us to have answers to you sometime next week?

> RE timing of the answers to said questions, would it be alright for us to have answers to you sometime next week?

Sure! That works for me. I wouldn't have time to get started before that anyway.

In addition to figuring out what schema(s) to use, I will probably need help with getting a VE development environment up and running (I've never enabled it locally thus far, and the process seemed tricky the last time I seriously looked). Hopefully it's not too difficult to point the image search tool to Commons so that I can approximate the real-world use of this feature more closely. That's also something that could happen in the next few weeks (no rush).

> In terms of the "Requirements" section above, I have a question about schemas. The VisualEditorFeatureUse schema seems pretty open-ended. Would it make sense to use that schema to record data about use of VE's image search feature (potentially with the addition of a few new properties that are specific to image-search, but no breaking changes)? Or would we want to define a new schema just for this feature? Would using more than one schema (FeatureUse for basics plus a dedicated schema to measure image search interactions) make it harder to follow the data?

@egardner If it's just a few new properties, I think it would make sense to try to use the VisualEditorFeatureUse schema if possible to track this data. The schema is open-ended enough to support the types of additions needed to track the use of image search and the addition of an image using VE. It would also avoid creating multiple schemas tracking similar data and should simplify the analysis. As @DLynch mentioned, the one possible limitation is being able to track "Location in the image grid", as I'm not sure how that would fit into the current instrumentation. Also copying @nettrom_WMF in case he has any preferences in how this data is tracked.

I agree with @MNeisler that using the VisualEditorFeatureUse schema makes sense since we're asking questions about user behaviour around features in VE specifically.

For the question "How often do people use VE's image search?", I'm thinking that would be a search action added to the "media" feature to reflect a user running a search.
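As a rough sketch of how such a search action could then be counted, assuming logged events can be reduced to records with a feature, an action, and a session identifier (the `session_id` field name and the sample rows below are hypothetical and not taken from the schema):

```python
from collections import Counter

# Hypothetical sample of logged events. Field names are assumptions made for
# this sketch; only the "media" feature and "search" action come from the thread.
events = [
    {"session_id": "a", "feature": "media", "action": "search"},
    {"session_id": "a", "feature": "media", "action": "search"},
    {"session_id": "b", "feature": "link", "action": "dialog-done"},
    {"session_id": "c", "feature": "media", "action": "search"},
]


def is_media_search(event):
    """True if the event records a search run from the media feature."""
    return event["feature"] == "media" and event["action"] == "search"


def searches_per_session(events):
    """Count media searches per editing session."""
    return Counter(e["session_id"] for e in events if is_media_search(e))


def share_of_sessions_with_search(events):
    """Fraction of all sessions that ran at least one media search."""
    all_sessions = {e["session_id"] for e in events}
    if not all_sessions:
        return 0.0
    return len(searches_per_session(events)) / len(all_sessions)
```

Whether "how often" is best expressed per session, per user, or per day is a choice for the analysts; the per-session view here is only one option.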

For "What percentage of image searches in VE lead to the subsequent addition of an image to the article?" I think we should define whether "addition" means "adding an image" to the article, or if it can also mean "replacing an existing image". I'm bringing this up because if I interpret the data dictionary correctly, the former maps to the dialog-insert action, while the latter maps to some combination of opening the tool followed by dialog-done, and I want to make sure that @Ramsey-WMF and @CBogen get the measurement they're looking for.

For "Where within the image grid was the selected image located?", I see the point @DLynch brought up in T259308#6519797: if we do search-result-chosen-34 with the last digits being the position, we're creating a bunch of actions. I'm not excited about that idea, but I'm not sure how excited everyone else would be about adding a field to the schema to allow for some kind of flexible action parameter storage, as that might be more work than necessary (both to specify and implement)? Maybe the way forward is to do something like search-result-chosen-34 and not make any modifications to the schema until it's migrated to Event Platform, because this action is very specific to one particular feature? While I prefer a cleaner solution I'm also interested in getting things done, so I wouldn't object to it being implemented that way.
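To make the trade-off concrete, here is a minimal sketch of packing the grid position into the action name and recovering it at analysis time. The helper names are hypothetical, and whether the position is zero- or one-based is left open here, as it is in the discussion.

```python
import re

ACTION_PREFIX = "search-result-chosen"
_CHOICE_PATTERN = re.compile(r"^search-result-chosen-(\d+)$")


def encode_choice(position):
    """Build an action name that carries the chosen result's grid position."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return f"{ACTION_PREFIX}-{position}"


def decode_choice(action):
    """Return the grid position encoded in an action name, or None if absent."""
    match = _CHOICE_PATTERN.match(action)
    return int(match.group(1)) if match else None
```

The cost of this approach is visible in `decode_choice`: every query about grid position has to parse a string rather than read a dedicated field.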

> For "What percentage of image searches in VE lead to the subsequent addition of an image to the article?" I think we should define whether "addition" means "adding an image" to the article, or if it can also mean "replacing an existing image". I'm bringing this up because if I interpret the data dictionary correctly, the former maps to the dialog-insert action, while the latter maps to some combination of opening the tool followed by dialog-done, and I want to make sure that @Ramsey-WMF and @CBogen get the measurement they're looking for.

It would be great to measure both "adding an image" and "replacing an existing image". I think both are important measures of the success of the tool.
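One possible way to compute this from logged events is sketched below. It rests on several assumptions: events arrive in time order, a `session_id` field (hypothetical name) ties them together, `dialog-insert` marks an added image, and `dialog-done` alone is treated as a stand-in for a replaced image, which simplifies the "opening the tool followed by dialog-done" combination described above.

```python
# Assumed mapping from action name to outcome; see the caveats in the lead-in.
OUTCOME_BY_ACTION = {"dialog-insert": "added", "dialog-done": "replaced"}

# Hypothetical, time-ordered sample of media events.
events = [
    {"session_id": "a", "feature": "media", "action": "search"},
    {"session_id": "b", "feature": "media", "action": "search"},
    {"session_id": "a", "feature": "media", "action": "search"},
    {"session_id": "a", "feature": "media", "action": "dialog-insert"},
    {"session_id": "c", "feature": "media", "action": "search"},
    {"session_id": "b", "feature": "media", "action": "dialog-done"},
]


def search_outcomes(events):
    """Label each search 'added', 'replaced', or 'abandoned', in search order.

    A search counts as converted only if an insert/done event follows it in
    the same session before that session's next search.
    """
    outcomes = []
    pending = {}  # session_id -> index in outcomes of the latest open search
    for event in events:
        if event["feature"] != "media":
            continue
        session = event["session_id"]
        if event["action"] == "search":
            outcomes.append("abandoned")
            pending[session] = len(outcomes) - 1
        elif event["action"] in OUTCOME_BY_ACTION and session in pending:
            outcomes[pending.pop(session)] = OUTCOME_BY_ACTION[event["action"]]
    return outcomes


def conversion_rate(events):
    """Fraction of searches that led to an added or replaced image."""
    outcomes = search_outcomes(events)
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o != "abandoned") / len(outcomes)
```

Keeping "added" and "replaced" as separate labels lets both be reported individually or combined into a single rate.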

ppelberg updated the task description. Oct 19 2020, 8:21 PM

Task description update
I've added the following to the task description's "Done" section:

  • Editing Team to provide code review for newly instrumented events
sdkim assigned this task to jlinehan. Nov 2 2020, 4:31 PM
sdkim triaged this task as High priority.
sdkim moved this task from Inbox to Doing on the Product-Infrastructure-Data board.

@jlinehan taking this on for creating the schema specifically. Will hand back off once this is merged

sdkim moved this task from Inbox to Doing on the Better Use Of Data board.
DLynch added a comment. Nov 2 2020, 4:47 PM

> taking this on for creating the schema specifically. Will hand back off once this is merged

I had the impression that using the existing schema had been settled on earlier?

sdkim added a subscriber: Cparle. Nov 2 2020, 5:16 PM

@DLynch Given a recent conversation with @egardner, @Cparle, @jlinehan, and @nettrom_WMF, we will be taking a net-new approach to the schema this image search will use.
Happy to include you as a reviewer once we have the patch up.

Update
As discussed during today's meeting, the next steps as it relates to adding the required instrumentation to VE are as follows:

  1. @CBogen to coordinate with the Structured Data team to write the patch(es) necessary for logging the VE image search events they'd like to track via the VisualEditorFeatureUse schema.
  2. Once step 1 is done, the Editing Team will review said patches.
sdkim reassigned this task from jlinehan to egardner. Thu, Nov 19, 8:46 PM

Passing the baton given @ppelberg's update above. Passing to Eric to facilitate between SD and Editing.