
Analyze Media Search A/B test
Closed, Resolved · Public

Description

Now that the A/B test in T254388 is complete, we need to analyze the results to determine whether we can move forward with using the new MediaSearch results.

The plan is to use the simple analysis of preference from interleaved A/B tests described here: Estimating Preference For Ranking Functions With Clicks On Interleaved Search Results.
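As a rough illustration of that method, here is a minimal sketch in R. The column names (`session_id`, `ranker`) and the `clicks` data frame are illustrative assumptions, not the SearchSatisfaction schema; the real analysis lives in the notebook linked further down. Each session "votes" for the ranker whose results it clicked more often, ties are dropped, and the share of sessions preferring Media Search is tested against a 50/50 no-preference null.

```
library(dplyr)
library(tidyr)

# Hypothetical input: `clicks` has one row per click, with columns
# `session_id` and `ranker` ("control" or "media_search").
session_votes <- clicks %>%
  count(session_id, ranker) %>%
  pivot_wider(names_from = ranker, values_from = n, values_fill = 0) %>%
  mutate(vote = case_when(
    media_search > control ~ "media_search",
    control > media_search ~ "control",
    TRUE ~ NA_character_          # ties carry no preference signal
  )) %>%
  filter(!is.na(vote))

# Two-sided binomial test of the share of sessions preferring Media Search
# against the no-preference null of 50%.
binom.test(sum(session_votes$vote == "media_search"),
           nrow(session_votes),
           p = 0.5)
```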

Event Timeline

Note that this analysis was originally part of T254388 and was already on @nettrom_WMF's radar to finish up when he returns from vacation on September 8, but we decided to break the analysis into a separate ticket since the A/B test itself is complete.

LGoto triaged this task as High priority.Sep 8 2020, 5:07 PM
LGoto edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

Moving this out of "Doing": we've discovered a bug in the data gathering that makes it impossible to determine which algorithm produced a clicked result when interleaving occurred. We'll pick up the analysis again once the second iteration of the test has been completed. And yes, we'll QA the data after relaunch to make sure it's working correctly.

nettrom_WMF added subscribers: Ramsey-WMF, mpopov.

The analysis has been done and can be found in this Jupyter/R notebook. We find a slight preference for the control condition (legacy search) over Media Search.

I'd like to extend a thank you to @mpopov for reviewing this work! :)

@CBogen & @Ramsey-WMF: let me know what questions you might have about this.

We're unsure if the finding is trustworthy. I'm moving this back to "Doing" to dig further into this.

A huge thanks to @mpopov for doing a lot of work on this, improving the data processing code and figuring out ways to massage the data from SearchSatisfaction to pull out the insights!

I've updated the notebook on GitHub with the improved analysis. We've extensively QAed this notebook as well as the old processing code in order to understand where things work and where they break. As far as I can tell, this is as good as we can get it for now: extracting more data would require a lot more time. Instead, I think we should call this good. If we run additional tests, I recommend changing the instrumentation code to explicitly store which team/algorithm produced a clicked/visited result, which removes the challenge of mapping each click/visit back to a SERP in post-processing (see the sketch below).
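To make that recommendation concrete, here is a hypothetical sketch of why it helps. The table and column names (`serp_results`, `search_id`, `position`, `ranker`) are made up for illustration and do not reflect the actual SearchSatisfaction events.

```
library(dplyr)

# Current post-processing: attribute each click by joining it back to the
# interleaved SERP and looking up which ranker supplied the clicked position.
# (`serp_results`: hypothetical table, one row per search_id/position/ranker.)
attributed <- clicks %>%
  inner_join(serp_results, by = c("search_id", "position")) %>%
  select(session_id, ranker)

# With the recommended instrumentation change, the click event itself records
# the ranker, so attribution is just reading a column and no join is required:
# attributed <- clicks %>% select(session_id, ranker)
```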

The conclusion changes in the new notebook: we find a strong preference for the new Media Search algorithm.

Now that the subtask is resolved and the notebook is accessible, I'm closing this task as well.