Measure usage of image search in Visual Editor
Closed, Resolved (Public)
Authored By

	CBogen
	Jul 30 2020, 9:11 PM

Description

The Structured Data team is planning to integrate the Media Search backend into the image search in Visual Editor. Before we do so, we'd like to have some baseline statistics on the usage of the existing image search in Visual Editor, so that we can measure the success of the Media Search integration.

Things we'd like to measure:

What percentage of image searches in VE lead to the subsequent addition of an image to the article? This will help us understand whether the search results are useful.
The location in the image grid of the image that is chosen. This will help us understand the impact that the grid has on the success of a search, and we may choose to change the grid in response.
How often image search in VE is used. This will help us know the importance of focusing on this feature in the future, and also whether changing the results leads to increased usage.
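As a rough illustration of how these three measurements could be computed once events exist, here is a minimal Python sketch. The record layout (`session`, `action`, `position`) and the action names `search` and `insert` are hypothetical and do not come from any existing VisualEditor schema. It also treats a session as the unit of analysis, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical event records; field names and action values are assumptions
# made for this sketch, not part of an existing schema.
events = [
    {"session": "a", "action": "search"},
    {"session": "a", "action": "insert", "position": 2},
    {"session": "b", "action": "search"},
    {"session": "c", "action": "search"},
    {"session": "c", "action": "insert", "position": 0},
    {"session": "d", "action": "search"},
]


def summarize(events):
    """Return usage, insert rate, and chosen grid positions per session."""
    # Sessions in which the image search was used at least once.
    searched = {e["session"] for e in events if e["action"] == "search"}
    # Insert events that happened in a session that also searched.
    inserts = [
        e for e in events
        if e["action"] == "insert" and e["session"] in searched
    ]
    inserted = {e["session"] for e in inserts}
    rate = len(inserted) / len(searched) if searched else 0.0
    return {
        "search_sessions": len(searched),                         # how often search is used
        "insert_rate": rate,                                      # share of searches leading to an insert
        "positions": dict(Counter(e["position"] for e in inserts)),  # grid location chosen
    }


summary = summarize(events)
print(summary)
```

With real data, the same three numbers could be compared before and after the Media Search integration.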

Note: We've also thought about measuring this by using an edit tag that indicates that an image was added via image search in VE; that would be an additional useful metric but is separate from this task.

Related Objects
Status	Assigned	Task
Resolved	nettrom_WMF	T260254 Measure usage of Media Search integration in Visual Editor
Resolved	nettrom_WMF	T259308 Measure usage of image search in Visual Editor
Resolved	nettrom_WMF	T265761 Update Media Search measurement specification with Visual Editor measurements
Resolved	egardner	T265101 Instrument event logging for VE's image search

Event Timeline

CBogen created this task. Jul 30 2020, 9:11 PM

Iniquity subscribed. Jul 30 2020, 9:17 PM

CBogen moved this task from Triage to Tracking on the Structured-Data-Backlog board. Aug 3 2020, 5:33 PM

LGoto triaged this task as Medium priority. Aug 4 2020, 5:12 PM

LGoto moved this task from Triage to Needs Investigation on the Product-Analytics board.

@nettrom_WMF I put this task in the MediaSearch-Beta milestone - it would be great to have these numbers by the end of September, before we start the work on integration in October.

CBogen mentioned this in T260254: Measure usage of Media Search integration in Visual Editor. Aug 12 2020, 3:47 PM

CBogen moved this task from Tracking to Analytics on the Structured-Data-Backlog board. Aug 25 2020, 8:10 PM

CBogen mentioned this in T262271: Activate mediasearch profile without requiring an explicit flag. Oct 2 2020, 2:17 PM

nettrom_WMF edited projects, added Product-Analytics (Kanban); removed Product-Analytics. Oct 5 2020, 8:52 PM

I've dug into this a bit to get an understanding of what data is available through the VisualEditorFeatureUse schema. I also met with @MNeisler on the Product Analytics team to get a check on whether my understanding of the data was correct, and it appears to be.

Based on the data and the data dictionary, it appears that the "media" feature is where we'd find this kind of data. Looking at events logged during September, this feature logs the various ways the dialog was opened (command/context/tool) and closed (abort/done/insert). One issue I've run into with this data is that the open and close counts don't line up (there are approximately 15,500 open events but 30,000 close events for the "media" feature), which appears to come from it being possible to open this dialog in indirect ways.
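The open/close comparison described above can be sketched as a simple tally. This is only an illustration: the record layout and the short action labels below mirror the wording of this comment (command/context/tool, abort/done/insert) and are assumptions, not the exact values stored in the VisualEditorFeatureUse schema.

```python
from collections import Counter

# Assumed labels, taken from the wording of the comment rather than the schema.
OPEN_KINDS = {"command", "context", "tool"}
CLOSE_KINDS = {"abort", "done", "insert"}


def open_close_counts(events, feature="media"):
    """Count open and close events for one feature."""
    counts = Counter()
    for event in events:
        if event["feature"] != feature:
            continue
        if event["action"] in OPEN_KINDS:
            counts["open"] += 1
        elif event["action"] in CLOSE_KINDS:
            counts["close"] += 1
    return counts


# Hypothetical sample in which every close event appears twice.
sample = [
    {"feature": "media", "action": "tool"},
    {"feature": "media", "action": "insert"},
    {"feature": "media", "action": "insert"},
    {"feature": "media", "action": "context"},
    {"feature": "media", "action": "abort"},
    {"feature": "media", "action": "abort"},
    {"feature": "link", "action": "tool"},
]

counts = open_close_counts(sample)
print(counts["open"], counts["close"])
```

A close count well above the open count, as in this sample, is the kind of mismatch reported for September.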

There doesn't appear to be any instrumentation of media searches in VE, so it seems impossible to answer any of the three questions. If these are key questions that we need a baseline for prior to adding MediaSearch, my recommendation is to have an engineer inspect the instrumentation code and patch it to start logging the information we need. Another possibility might be to A/B test the two searches, which would remove the need to wait for a baseline.

I'll also add @DLynch and @nshahquinn-wmf as subscribers, because they might know something about the instrumentation that I'm missing.

Moving this to the "Needs Review" column on our kanban board for now, so @CBogen and @Ramsey-WMF can review.

I agree with @nettrom_WMF: there's currently no special instrumentation for media. It just logs whatever is automatically logged by the generic hooks in the dialog/context-item systems. In practice, that means we don't have any particular information about what someone did inside the dialog while it was open -- you could tell the difference between editing an existing image on the page and adding a new one, but I don't think you can see whether that new one came from search or upload.

It'd be pretty simple to add new event logging for when search is used / when a search result is picked. "Location in the image grid" doesn't really fit neatly into the current schema, however -- unless you're okay with a potentially infinite number of actions of the form search-result-chosen-34, one for each offset.

> there are approximately 15,500 open events but 30,000 close events for the "media" feature

Looking at that right now... I think we may have a logging regression around dialog close events. I tried Media, Template, and Cite, and all of them double-recorded their close event.

DLynch mentioned this in T264690: Double-logging of dialog close events. Oct 5 2020, 11:39 PM

ppelberg subscribed. Oct 5 2020, 11:51 PM

Thanks @nettrom_WMF and @DLynch.

> There doesn't appear to be any instrumentation of media searches in VE, so it seems impossible to answer any of the three questions. If these are key questions that we need a baseline for prior to adding MediaSearch, my recommendation is to have an engineer inspect the instrumentation code and patch it to start logging the information we need. Another possibility might be to A/B test the two searches, which would remove the need to wait for a baseline.

I think we need to instrument this before we move forward with the integration, so that we can answer T260254. We can still do an A/B test rather than waiting for a baseline - once instrumented, how long do we need to wait for a valid baseline? 30 days?

> It'd be pretty simple to add new event logging for when search is used / when a search result is picked. "Location in the image grid" doesn't really fit neatly into the current schema, however -- unless you're okay with a potentially infinite number of actions of the form search-result-chosen-34, one for each offset.

I'm assuming the editing team needs to add this event logging - what might be the projected timeline for getting this done?

I think we can live without location in the image grid for now, although this would be really nice to have. What's the impact of having a potentially infinite number of actions in the form search-result-chosen-34 for each offset?

> I think we can live without location in the image grid for now, although this would be really nice to have. What's the impact of having a potentially infinite number of actions in the form search-result-chosen-34 for each offset?

Mainly, I assume the queries needed to analyze the data would be more complicated. @MNeisler or @nshahquinn-wmf could speak to that more than I could.
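To give a sense of the extra work on the analysis side, here is a minimal Python sketch that recovers the grid offset from action names of the form search-result-chosen-34. The `search-result-chosen-` prefix is the one floated in this thread; it was not an implemented action at the time, and the sample values (including `dialog-done` as a non-matching action) are hypothetical.

```python
import re
from collections import Counter

# Matches the action format proposed in this thread; not an existing action.
CHOSEN = re.compile(r"^search-result-chosen-(\d+)$")


def offset_histogram(actions):
    """Count how often each grid offset was chosen, ignoring other actions."""
    histogram = Counter()
    for action in actions:
        match = CHOSEN.match(action)
        if match:
            histogram[int(match.group(1))] += 1
    return histogram


# Hypothetical sample of logged action names.
actions = [
    "search-result-chosen-0",
    "search-result-chosen-34",
    "dialog-done",
    "search-result-chosen-0",
]

histogram = offset_histogram(actions)
print(dict(histogram))
```

Every query touching these actions would need a similar parsing step, whereas a dedicated numeric field for the offset would avoid it.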