Instrument event logging for VE's image search
Closed, ResolvedPublic

Description

This task is about implementing the necessary event logging the Structured Data team will need to answer the questions defined in T259308. These questions are also listed in the "Application" section below.

Background

The SD team will be integrating its new MediaSearch API into Visual Editor (see T259896). We need to measure the success of this change, and to do so we need image search in VE to be instrumented, which it is not currently (see T259308).

Application

Per T259308, we want to be able to answer the following:

  • What percentage of image searches in VE lead to the subsequent addition of an image to the article?
  • Where within the image grid was the selected image located?
  • How often do people use VE's image search?

Note: If location in the image grid of the image that is chosen is more difficult to measure, that could wait. We'd love to be able to measure this both before and after the change to using the new MediaSearch API.

Requirements

This section should contain an exhaustive list of the events ("Event name") that are being implemented to answer the questions in the "Application" section above, the action(s) ("Trigger action(s)") that should cause said events to be emitted, and the schema in which these events should be logged.[i] Unless a specific schema is mentioned, it is assumed that the events are captured in the VisualEditorFeatureUse schema with feature set to media and action set to the event name.

| Trigger action(s) | Event name | Schema |
| --- | --- | --- |
| User clicks "Insert > Images and Media" in toolbar | window-open-from-tool | |
| User clicks on an existing image and then clicks on "Edit" | window-open-from-context | |
| User enters a search term in the mwMediaDialog search input | search-change-query | |
| User clears a search term in the mwMediaDialog search input | search-clear-query | |
| User expands a result image by clicking on it | search-choose-image | |
| User clicks "use this image" in the mwMediaDialog | search-confirm-image | |
| User clicks back arrow to return to grid of results | search-change-image | |
| User clicks "change image" after they have previously clicked "use this image", taking them back to the results grid | search-change-image | |
| User inserts image into the page, closing the dialog | dialog-insert | |
| User clicks "Apply changes" to close the dialog and update an existing image | dialog-done | |
| User clicks the "X" icon before finishing, closing the dialog | dialog-abort | |
| User hits the ESC key, closing the dialog | dialog-abort | |
| User uploads their own image | search-upload-image | |
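
As a rough illustration of the convention stated above (feature set to media, action set to the event name), the table can be read as a lookup from trigger to event name. This is only a sketch: the trigger keys and the `build_event` helper are hypothetical names invented for this example, not identifiers from VisualEditor, and any other fields the real schema requires are left out.

```python
# Hypothetical trigger keys mapped to the event names listed in the table above.
MEDIA_EVENTS = {
    "toolbar_insert_media": "window-open-from-tool",
    "context_edit_image": "window-open-from-context",
    "search_query_entered": "search-change-query",
    "search_query_cleared": "search-clear-query",
    "result_expanded": "search-choose-image",
    "use_this_image": "search-confirm-image",
    "back_to_results": "search-change-image",
    "change_image": "search-change-image",
    "image_inserted": "dialog-insert",
    "changes_applied": "dialog-done",
    "dialog_closed_x": "dialog-abort",
    "dialog_closed_esc": "dialog-abort",
    "image_uploaded": "search-upload-image",
}


def build_event(trigger: str) -> dict:
    """Return a feature/action pair for a trigger (illustrative only).

    Models just the two fields named in the task description; raises
    KeyError for a trigger that is not in the table.
    """
    return {"feature": "media", "action": MEDIA_EVENTS[trigger]}
```

Note that two pairs of triggers deliberately share an event name (`search-change-image` and `dialog-abort`), so the trigger itself cannot be recovered from the logged action.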

Done

  • Structured Data to complete the "Requirements" section above
  • Editing to approve the drafted "Requirements"
  • Structured Data to implement the approved "Requirements"
  • Editing Team to provide code review for newly instrumented events
  • Structured Data to verify the newly-implemented events are being emitted as expected by clients
  • Structured Data/Product Analytics to verify the newly-implemented events are being logged in the database in the ways we expect, once the new instrumentation has landed on production.
  • Structured Data/Product Analytics to update documentation [ii] with the events spec'd in this task once they have been verified to be implemented as expected. Read: clients are emitting events as expected and said events are being logged in the database as expected as well.

i. E.g. https://meta.wikimedia.org/wiki/Schema:VisualEditorFeatureUse
ii. https://www.mediawiki.org/wiki/VisualEditor/FeatureUse_data_dictionary

Event Timeline

This task is the amalgamation of the instrumentation request the Structured Data Team filed in T264779 and the Editing Team's process for implementing new instrumentation and verifying that said instrumentation has been implemented correctly.

As such, a resulting question:


cc @DLynch, @JTannerWMF, @MNeisler & @Ryasmeen for your awareness.

Thanks @ppelberg for starting this. I've recently been doing some analytics instrumentation for our new MediaSearch UI so I'll likely be working on implementation here as well.

In terms of the "Requirements" section above, I have a question about schemas. The VisualEditorFeatureUse schema seems pretty open-ended. Would it make sense to use that schema to record data about use of VE's image search feature (potentially with the addition of a few new properties that are specific to image-search, but no breaking changes)? Or would we want to define a new schema just for this feature? Would using more than one schema (FeatureUse for basics plus a dedicated schema to measure image search interactions) make it harder to follow the data?

Awesome and you bet.

These are good questions that I think @MNeisler and @DLynch are best positioned to answer.

RE timing of the answers to said questions, would it be alright for us to have answers to you sometime next week?

Sure! That works for me. I wouldn't have time to get started before that anyway.

In addition to figuring out what schema(s) to use, I will probably need help with getting a VE development environment up and running (I've never enabled it locally thus far, and the process seemed tricky the last time I seriously looked). Hopefully it's not too difficult to point the image search tool to Commons so that I can approximate the real-world use of this feature more closely. That's also something that could happen in the next few weeks (no rush).

@egardner If it's just a few new properties, I think it would make sense to try to use the VisualEditorFeatureUse schema if possible to track this data. The schema is open-ended enough to support the types of additions needed to track the use of image search and the addition of an image using VE. It would also avoid creating multiple schemas tracking similar data and should simplify the analysis. As @DLynch mentioned, the one possible limitation is being able to track "Location in the image grid", as I'm not sure how that would fit into the current instrumentation. Also copying @nettrom_WMF in case he has any preferences in how this data is tracked.

I agree with @MNeisler that using the VisualEditorFeatureUse schema makes sense since we're asking questions about user behaviour around features in VE specifically.

For the question "How often do people use VE's image search?", I'm thinking that would be a search action added to the "media" feature to reflect a user running a search.

For "What percentage of image searches in VE lead to the subsequent addition of an image to the article?" I think we should define whether "addition" means "adding an image" to the article, or if it can also mean "replacing an existing image". I'm bringing this up because if I interpret the data dictionary correctly, the former maps to the dialog-insert action, while the latter maps to some combination of opening the tool followed by dialog-done, and I want to make sure that @Ramsey-WMF and @CBogen get the measurement they're looking for.

For "Where within the image grid was the selected image located?", I see the point @DLynch brought up in T259308#6519797: if we do search-result-chosen-34 with the last digits being the position, we're creating a bunch of actions. I'm not excited about that idea, but I'm not sure how excited everyone else would be about adding a field to the schema to allow for some kind of flexible action parameter storage, as that might be more work than necessary (both to specify and implement)? Maybe the way forward is to do something like search-result-chosen-34 and not make any modifications to the schema until it's migrated to Event Platform, because this action is very specific to one particular feature? While I prefer a cleaner solution I'm also interested in getting things done, so I wouldn't object to it being implemented that way.
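
To make the trade-off concrete, here is a minimal sketch of what analysis code would have to do if the position were encoded in the action name as floated above. It assumes the `search-result-chosen-<position>` pattern from this comment; that pattern is one option under discussion, not a confirmed part of the instrumentation, and `chosen_position` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed naming pattern from the discussion: "search-result-chosen-<position>".
_CHOSEN = re.compile(r"^search-result-chosen-(\d+)$")


def chosen_position(action: str) -> Optional[int]:
    """Return the grid position encoded in an action name, or None.

    Any action that does not match the assumed pattern yields None.
    """
    match = _CHOSEN.match(action)
    return int(match.group(1)) if match else None
```

Every consumer of the data would need this kind of string parsing, which is the cost of avoiding a schema change.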

It would be great to measure both "adding an image" and "replacing an existing image". I think both are important measures of the success of the tool.
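
A minimal sketch of how both outcomes could be counted from logged actions, under stated assumptions: each session is an ordered list of action names, `search-change-query` (from the Requirements table) marks a search, `dialog-insert` after a search counts as an addition, and `dialog-done` after a search counts as a replacement. That last reading is an assumption, since `dialog-done` might also fire for edits that do not swap the image. The function names are hypothetical.

```python
def session_outcome(actions: list[str]) -> str:
    """Classify one media-dialog session from its ordered action names.

    Returns "added", "replaced", or "none". An outcome only counts if a
    search query was entered earlier in the same session.
    """
    searched = False
    for action in actions:
        if action == "search-change-query":
            searched = True
        elif action == "dialog-insert" and searched:
            return "added"
        elif action == "dialog-done" and searched:
            return "replaced"
    return "none"


def conversion_rate(sessions: list[list[str]]) -> float:
    """Share of sessions containing a search that ended in an add or replace."""
    with_search = [s for s in sessions if "search-change-query" in s]
    if not with_search:
        return 0.0
    converted = sum(session_outcome(s) != "none" for s in with_search)
    return converted / len(with_search)
```

Keeping "added" and "replaced" as separate outcomes lets the two measures be reported individually or combined.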

Task description update
I've added the following to the task description's "Done" section:

  • Editing Team to provide code review for newly instrumented events
sdkim triaged this task as High priority.
sdkim moved this task from Inbox to Doing on the Product-Data-Infrastructure board.

@jlinehan taking this on for creating the schema specifically. Will hand back off once this is merged

I had the impression that using the existing schema had been settled on earlier?

@DLynch Given recent conversation with @egardner @Cparle @jlinehan @nettrom_WMF we will be taking a net new approach to the schema this image search will be using.
Happy to include you to review once we have the patch up

Update
As discussed during today's meeting, the next steps as it relates to adding the required instrumentation to VE are as follows:

  1. @CBogen to coordinate with the Structured Data team to write the patch(es) necessary for logging the VE image search events they'd like to track via the VisualEditorFeatureUse schema.
  2. Once "1." is done, the Editing Team will review said patches.

Passing batons given @ppelberg's update above. Passing to Eric to facilitate between SD and Editing.

egardner updated the task description.
egardner updated the task description.

I've updated the table above with my first stab at listing all the user activities we care about. If nothing important seems missing here I will attempt to translate these into new entries in the VE Feature Use Data Dictionary.

I imagine that some of these activities might already be getting captured in a more generic way and have noted accordingly (like when the user closes the currently open modal), but maybe not? If not I can add new definitions for those actions as well.

Question: do we care about any periodic "check-in" event like what search satisfaction does? Is anything like this happening automatically in VE already?

Finally, some of these user activities will need additional data included in the event logging payload: for example, when the user clicks an image in a grid, we want to record the position that result occupies in the sequence of results (0-indexed, one-dimensional "position"; we're not going to keep track of rows/columns here since that will change based on viewport size). How should these additional bits of data be defined?
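
A small sketch of why a one-dimensional position is the stable choice. The helpers below are hypothetical and only model the arithmetic; they do not answer how the value should be carried in the payload.

```python
def flat_position(row: int, column: int, columns_per_row: int) -> int:
    """Convert a 0-indexed (row, column) grid cell to a 0-indexed flat position."""
    if not 0 <= column < columns_per_row:
        raise ValueError("column must be within the row")
    return row * columns_per_row + column


def grid_cell(position: int, columns_per_row: int) -> tuple[int, int]:
    """Inverse mapping: the (row, column) a flat position lands on for a given width."""
    return divmod(position, columns_per_row)
```

For example, `grid_cell(9, 4)` is `(2, 1)` while `grid_cell(9, 3)` is `(3, 0)`: the same result sits in a different cell depending on how many columns the viewport fits, but its flat position stays 9.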

egardner updated the task description.
egardner updated the task description.

@egardner : the way the VisualEditorFeatureUse schema is set up these events will all have feature="media" set, because they're happening in the media dialogue. How about we remove the "media-" prefix in the events?

I noticed that the "change image" button is there after a user has chosen "Use this image", as well as when a user has clicked "Edit" on an existing image. In the first context, it makes sense that it's "back to results", but in the other context it's more of a "change image" event. How about we use change-image in both cases? I'm open to other naming suggestions too, but I think I'd like to have a single event for both contexts, so we don't end up tracking context.

I made a couple of edits to the task description, let me know if you have questions about any of them!

@Nettrom sounds good to me, I'll update the list accordingly. And I'm happy to just use change-image for now. In the UI code there are a few different actions which take the user back to the starting point in the dialog (grid of results) – "change", "back", "cancelchoose", etc, so if we need to distinguish more precisely in the future we will be able to.
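
The idea of collapsing several UI actions into one logged event can be sketched as follows. The three UI action names are the ones quoted in this comment; the logged name is written as `search-change-image`, as listed in the Requirements table, and `event_for_ui_action` is a hypothetical helper rather than anything in the VE codebase.

```python
from typing import Optional

# UI action names quoted in the comment above; treating them as one set is a sketch.
BACK_TO_GRID_ACTIONS = {"change", "back", "cancelchoose"}

# Event name as listed in the Requirements table.
CHANGE_IMAGE_EVENT = "search-change-image"


def event_for_ui_action(ui_action: str) -> Optional[str]:
    """Return the single logged event for any back-to-grid UI action, else None."""
    return CHANGE_IMAGE_EVENT if ui_action in BACK_TO_GRID_ACTIONS else None
```

If finer distinctions are needed later, the set can be split into separate mappings without touching the call sites.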

egardner updated the task description.

Change 649748 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[mediawiki/extensions/VisualEditor@master] Instrument media search interactions in MWMediaDialog

https://gerrit.wikimedia.org/r/649748

egardner updated the task description.

Change 649748 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Instrument media search interactions in MWMediaDialog

https://gerrit.wikimedia.org/r/649748

I'm late to reviewing this (I'm sorry about that). Before marking this as resolved, though: @egardner, are you able to update the VE Feature Use Data Dictionary?

...it looks like the page hasn't been updated with the changes you alluded to in T265101#6690050.

Also, I'm going to assume that y'all have verified the new instrumentation is working as you expect it to. If this is not the case, please say so in a comment.

@egardner @CBogen I chatted with @ppelberg and the Editing team, and wanted to follow up on this. We're hoping you can provide an update on the following, ideally by 2021/02/25:

  1. Are you able to update the VE Feature Use Data Dictionary?
  2. Has the new instrumentation been verified to work as you expect it to?

I recognize this simply reiterates the above, albeit in a format I hope clarifies what the Editing team would like your input on. :)

Re #2, yes, the new instrumentation has been verified to work as expected. @nettrom_WMF is working on using it to create a dashboard and all is going well.

@egardner can help answer #1.

I can update the VE feature dictionary later today.

Wonderful. Thank you, Carly.

Thank you, Eric.