
Measure usage of image search in Visual Editor
Closed, Resolved, Public

Description

The Structured Data team is planning to integrate the Media Search backend into the image search in Visual Editor. Before we do so, we'd like to have some baseline statistics on the usage of the existing image search in Visual Editor, so that we can measure the success of the Media Search integration.

Things we'd like to measure:

  • What percentage of image searches in VE lead to the subsequent addition of an image to the article? This will help us understand whether the search results are useful.
  • The location in the image grid of the image that is chosen. This will help us understand the impact that the grid has on the success of a search, and we may choose to change the grid in response.
  • How often image search in VE is used. This will help us know the importance of focusing on this feature in the future, and also whether changing the results leads to increased usage.
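A minimal sketch of how these three numbers could be computed once search events exist. The record layout, the action names `search` and `result-chosen`, and the `position` field are all hypothetical; nothing like them is logged today. Conversion is approximated per editing session rather than per individual search.

```python
from collections import Counter

# Hypothetical event records: field names and action names are assumptions,
# not part of any existing schema.
events = [
    {"session": "s1", "action": "search"},
    {"session": "s1", "action": "result-chosen", "position": 2},
    {"session": "s2", "action": "search"},
    {"session": "s3", "action": "search"},
    {"session": "s3", "action": "result-chosen", "position": 0},
]


def baseline_metrics(events, total_sessions):
    """Compute the three baseline numbers from a list of event dicts."""
    searched = {e["session"] for e in events if e["action"] == "search"}
    chosen = {e["session"] for e in events if e["action"] == "result-chosen"}
    # Sessions that searched and then picked a result.
    converted = searched & chosen
    # How often each grid position was the one picked.
    positions = Counter(
        e["position"] for e in events if e["action"] == "result-chosen"
    )
    return {
        "conversion_rate": len(converted) / len(searched) if searched else 0.0,
        "position_counts": dict(positions),
        "usage_rate": len(searched) / total_sessions if total_sessions else 0.0,
    }


# total_sessions is the number of VE editing sessions in the same period
# (a made-up value here).
metrics = baseline_metrics(events, total_sessions=10)
```

The same function run before and after the integration would give comparable numbers, provided the logging itself does not change in between.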

Note: We've also thought about measuring this by using an edit tag that indicates that an image was added via image search in VE; that would be an additional useful metric but is separate from this task.

Event Timeline

LGoto triaged this task as Medium priority. Aug 4 2020, 5:12 PM
LGoto moved this task from Triage to Needs Investigation on the Product-Analytics board.

@nettrom_WMF I put this task in the MediaSearch-Beta milestone - it would be great to have these numbers by the end of September, before we start the work on integration in October.

I've dug into this a bit to get an understanding of what data is available through the VisualEditorFeatureUse schema. I also met with @MNeisler on the Product Analytics team to get a check on whether my understanding of the data was correct, and it appears to be.

Based on the data and the data dictionary it appears that the "media" feature is where we'd find this kind of data, and from looking at events logged during September this feature logs various ways it was opened (command/context/tool) and closed (abort/done/insert). One issue I've run into with this data is that the open & close counts don't line up (there are approximately 15,500 open events but 30,000 close events for the "media" feature), which appears to come from it being possible to open this dialog in indirect ways.
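For reference, the open/close comparison can be sketched roughly as follows. The short action names below mirror the labels used in this comment (command/context/tool, abort/done/insert) and are placeholders; the exact strings in the logged data may differ, and the sample events are made up.

```python
from collections import Counter

# Placeholder action names taken from the labels in the comment above;
# the real logged strings are an assumption.
OPEN_WAYS = {"command", "context", "tool"}
CLOSE_WAYS = {"abort", "done", "insert"}


def open_close_counts(events, feature="media"):
    """Count open and close events for one feature."""
    counts = Counter()
    for e in events:
        if e["feature"] != feature:
            continue
        if e["action"] in OPEN_WAYS:
            counts["open"] += 1
        elif e["action"] in CLOSE_WAYS:
            counts["close"] += 1
    return counts


# Made-up sample: two opens and three closes for "media".
sample = [
    {"feature": "media", "action": "tool"},
    {"feature": "media", "action": "done"},
    {"feature": "media", "action": "done"},
    {"feature": "media", "action": "context"},
    {"feature": "media", "action": "abort"},
    {"feature": "link", "action": "tool"},
]
counts = open_close_counts(sample)
```

If every dialog were opened and closed exactly once, the two counts would match; a large gap points to unlogged opens, duplicated closes, or both.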

There doesn't appear to be instrumentation of media searches in VE, thus it seems impossible to answer any of the three questions. If these are key questions that we need a baseline for prior to adding MediaSearch, my recommendation is to have an engineer inspect the instrumentation code and patch it to start logging the information we need. Another possibility might be to A/B test the two searches, removing the need to wait for a baseline.

I'll also add @DLynch and @nshahquinn-wmf as subscribers, because they might know something about the instrumentation that I'm missing.

Moving this to the "Needs Review" column on our kanban board for now, so @CBogen and @Ramsey-WMF can review.

I agree with @nettrom_WMF: there's currently no special instrumentation for media, so it just logs whatever is automatically logged by the generic hooks in the dialog/context-item systems. In practice, that means that we don't have any particular information about what someone did inside the dialog while it was open -- you could tell the difference between editing an existing image on the page and adding a new one, but I don't think you can see whether that new one is from search or upload.

It'd be pretty simple to add a new event logging when search is used / when a search result is picked. "Location in the image grid" doesn't really fit into current schema neatly, however -- unless you're okay with a potentially infinite number of actions in the form search-result-chosen-34 for each offset.

there are approximately 15,500 open events but 30,000 close events for the "media" feature

Looking at that right now... I think we may have a logging regression around dialog close events. I tried Media, Template, and Cite, and all of them double-recorded their close event:

image.png (414×932 px, 94 KB)
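If close events really are being recorded twice, one analysis-side workaround (not a fix for the regression itself) might be to collapse a close event that immediately repeats the previous action for the same session and feature. This is only a sketch: the `session`, `feature`, and `action` field names and the short action labels are assumptions, and the events are assumed to be sorted by time.

```python
# Placeholder close labels; the real logged strings may differ.
CLOSE_WAYS = {"abort", "done", "insert"}


def collapse_repeated_closes(events):
    """Drop a close event that directly repeats the previous action
    for the same (session, feature) pair. Input must be time-ordered."""
    last_action = {}
    kept = []
    for e in events:
        key = (e["session"], e["feature"])
        if e["action"] in CLOSE_WAYS and last_action.get(key) == e["action"]:
            continue  # same close logged again, skip it
        last_action[key] = e["action"]
        kept.append(e)
    return kept


# Made-up sample: each close is logged twice.
raw = [
    {"session": "s1", "feature": "media", "action": "tool"},
    {"session": "s1", "feature": "media", "action": "done"},
    {"session": "s1", "feature": "media", "action": "done"},
    {"session": "s1", "feature": "media", "action": "tool"},
    {"session": "s1", "feature": "media", "action": "insert"},
    {"session": "s1", "feature": "media", "action": "insert"},
]
cleaned = collapse_repeated_closes(raw)
```

This would also hide genuine back-to-back closes with no logged open in between, so it is only suitable for a rough estimate until the logging is fixed.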

Thanks @nettrom_WMF and @DLynch.

There doesn't appear to be instrumentation of media searches in VE, thus it seems impossible to answer any of the three questions. If these are key questions that we need a baseline for prior to adding MediaSearch, my recommendation is to have an engineer inspect the instrumentation code and patch it to start logging the information we need. Another possibility might be to A/B test the two searches, removing the need to wait for a baseline.

I think we need to instrument this before we move forward with the integration, so that we can answer T260254. We can still do an A/B test rather than waiting for a baseline - once instrumented, how long do we need to wait for a valid baseline? 30 days?

It'd be pretty simple to add a new event logging when search is used / when a search result is picked. "Location in the image grid" doesn't really fit into current schema neatly, however -- unless you're okay with a potentially infinite number of actions in the form search-result-chosen-34 for each offset.

I'm assuming the editing team needs to add this event logging - what might be the projected timeline for getting this done?

I think we can live without location in the image grid for now, although this would be really nice to have. What's the impact of having a potentially infinite number of actions in the form search-result-chosen-34 for each offset?

I think we can live without location in the image grid for now, although this would be really nice to have. What's the impact of having a potentially infinite number of actions in the form search-result-chosen-34 for each offset?

Mainly running the queries to analyze the data would be more complicated, I assume. @MNeisler or @nshahquinn-wmf could speak to that more than I could.
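To illustrate the extra step, here is a rough sketch of what analysis code would need to do if the offset were encoded in the action name. The `search-result-chosen-N` format is the one floated above; it is not implemented anywhere, and the sample actions are made up.

```python
import re
from collections import Counter

# Matches the proposed "search-result-chosen-<offset>" action format.
CHOSEN = re.compile(r"^search-result-chosen-(\d+)$")


def offset_histogram(actions):
    """Count how often each grid offset was chosen, ignoring other actions."""
    hist = Counter()
    for action in actions:
        match = CHOSEN.match(action)
        if match:
            hist[int(match.group(1))] += 1
    return hist


# Made-up sample of logged action strings.
hist = offset_histogram([
    "search-result-chosen-0",
    "search-result-chosen-34",
    "search-result-chosen-0",
    "dialog-done",
])
```

With a dedicated numeric field the parsing step disappears and a plain group-by on that field is enough, which is the main trade-off here.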

I'm closing this as resolved, as we're also tracking this work in T260254.