Run A/B test on the search suggester to measure zero results rate, starting on 2015-09-08
Closed, Resolved · Public

Assigned To

Authored By

	• Deskana
	Sep 1 2015, 5:18 PM

Description

The Discovery Department's Q1 goal is to reduce the zero results rate. We've built out a new API (T105746) as an experiment. Initial tests of the API (T109729) were promising. Now let's test it more thoroughly.

Area: Search bar at the top right on desktop on English Wikipedia
Bucketing: 0.01% control, 0.01% experimental group
Measuring: zero results rate, hoping it will decrease with the experimental group
Start date: 2015-09-08
Duration: two weeks
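A minimal sketch of how the bucketing above could be expressed, assuming each unit (for example, a page load) gets one uniform random draw. The function names and the draw-based approach are hypothetical illustrations, not the code that was deployed.

```python
import random

# Rates taken from the task description: 0.01% per bucket.
CONTROL_RATE = 0.0001
EXPERIMENT_RATE = 0.0001


def bucket_for_draw(draw: float) -> str:
    """Map a uniform draw in [0, 1) to a bucket name (hypothetical helper)."""
    if draw < CONTROL_RATE:
        return "control"
    if draw < CONTROL_RATE + EXPERIMENT_RATE:
        return "experiment"
    return "none"  # not part of the test


def assign_bucket(rng: random.Random) -> str:
    """Assign one unit (e.g. a page load) to a bucket at random."""
    return bucket_for_draw(rng.random())


if __name__ == "__main__":
    rng = random.Random()
    print(assign_bucket(rng))
```

With these rates, roughly 1 in 10,000 units lands in each bucket, and everything else is left out of the test.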

Details

	Subject	Repo	Branch	Lines +/-
	Enable experiment with experimental completion suggester	operations/mediawiki-config	master	+7 -0


Related Objects

Status	Assigned	Task
Resolved	mpopov	T111858 Analyze results of A/B test on suggester (on or after 2015-09-22)
Resolved	mpopov	T112813 Verify that data from A/B test on suggester is coming through correctly (on or after 2015-09-17) redux
Resolved	EBernhardson	T112585 Fix CompletionSuggestion data collection and re-start the test.
Resolved	mpopov	T111857 Verify that data from A/B test on suggester is coming through correctly (on or after 2015-09-09)
Resolved	EBernhardson	T105743 Test ElasticSearch suggester to see if it meets user needs better than PrefixSearch
Resolved	EBernhardson	T111078 Run A/B test on the search suggester to measure zero results rate, starting on 2015-09-08
Resolved	EBernhardson	T111091 Allow extensions to change the method used to get suggestion results
Resolved	EBernhardson	T111137 Override core suggester in AB test between current suggestions and the experimental cirrus-suggest api

Event Timeline

• Deskana created this task.Sep 1 2015, 5:18 PM

• Deskana raised the priority of this task to High.

• Deskana updated the task description.

• Deskana added projects: Discovery-ARCHIVED, Discovery-Search (Current work).

• Deskana subscribed.

Restricted Application added a subscriber: Aklapper.Sep 1 2015, 5:18 PM

• Deskana renamed this task from Run A/B test on the search suggester to measure zero results rate to Run A/B test on the search suggester to measure zero results rate, starting on 2015-09-08.Sep 1 2015, 5:20 PM

• Deskana set Security to None.

mpopov added subscribers: Ironholds, mpopov.Sep 1 2015, 5:25 PM

Science details for transparency: we agreed we want to be able to detect that the experimental group is at least 1.5 times more likely to get results than the control group, with 99% power to detect this effect and 95% confidence (meaning that a p-value less than 0.05 will be called significant). We guessed that 65% of the control group is going to get some results, which means that our sample size should be...

R> wmf::sample_size_odds(odds_ratio = 1.5, p_control = 0.65, power = 0.99, conf_level = 0.95, sample_ratio = 1)

2133 observations, although that's with a guess of 65% (= 35% zero results rate) prevalence within the control group. The final number will change as we figure out a more accurate, enwiki-specific prevalence.

For now, we are going with a sampling rate of 0.01% and then we are going to sub-sample down to the actual sample size after we collect the data.
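For readers without the `wmf` R package, here is a rough Python sketch of one standard asymptotic sample-size formula for detecting an odds ratio between two groups. It is an assumption that `wmf::sample_size_odds` uses this formula; the function name below and the reading of `sample_ratio` as control size divided by experimental size are hypothetical. Under those assumptions, the inputs from the comment above give a total of 2133.

```python
import math
from statistics import NormalDist


def total_sample_size_for_odds_ratio(odds_ratio, p_control, power=0.99,
                                     conf_level=0.95, sample_ratio=1.0):
    """Total observations needed to detect `odds_ratio` (two-sided test).

    sample_ratio is assumed to mean n_control / n_experiment.
    """
    # Convert the control proportion and odds ratio to the experimental proportion.
    odds_control = p_control / (1.0 - p_control)
    odds_experiment = odds_ratio * odds_control
    p_experiment = odds_experiment / (1.0 + odds_experiment)

    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    # Approximate variance of the log odds ratio, per experimental observation.
    variance_term = (1.0 / (sample_ratio * p_control * (1.0 - p_control))
                     + 1.0 / (p_experiment * (1.0 - p_experiment)))

    n_experiment = variance_term * ((z_alpha + z_beta) / math.log(odds_ratio)) ** 2
    return math.ceil(n_experiment * (1.0 + sample_ratio))


if __name__ == "__main__":
    print(total_sample_size_for_odds_ratio(odds_ratio=1.5, p_control=0.65))
```

Because the required size depends on `p_control`, re-running the calculation with a measured, enwiki-specific prevalence would change the total.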

• ksmith added a project: OKR-Work.Sep 1 2015, 6:03 PM

mpopov mentioned this in T109344: Perform a power analysis to figure out sample size for next A/B test.Sep 1 2015, 6:20 PM

• Deskana added a parent task: T105743: Test ElasticSearch suggester to see if it meets user needs better than PrefixSearch .Sep 1 2015, 6:36 PM

• Deskana updated the task description.

EBernhardson added a subtask: T111091: Allow extensions to change the method used to get suggestion results.Sep 1 2015, 6:57 PM

Change 235345 had a related patch set uploaded (by EBernhardson):
[WIP] A/B test for experimental suggestions api

https://gerrit.wikimedia.org/r/235345

gerritbot added a project: Patch-For-Review.Sep 1 2015, 7:56 PM

Typically, when opting users into a user test, we do it either on a per-page basis or on a longer session basis to provide a good experience for the user. I think in this case getting different suggestions on a different page is acceptable, but within a single page load the user will see either the new suggestions or the old ones. Additionally, since this is search-as-you-type, there will be multiple events per user who is opted into the test. Will that work, or do I need to adjust things?

I've got the following sketched out as the schema: https://meta.wikimedia.org/wiki/Schema:CompletionSuggestions

There is something else that may or may not affect the test. At a minimum, it's something we don't consider when measuring our zero results rate:

There isn't any throttling of the suggestions in MediaWiki. Each time you type a new letter, the old request is canceled and a new request is issued. When we log these on the backend, we don't know that a request was canceled; PHP always runs to the end of the request even if the client has cut it off.

Naively logging in the frontend, we would only log the responses that are shown to the user. This would be consistent across the test, just not with our other measurements. It seems sane; I just wanted to make it known.
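To make the difference concrete, here is a small sketch of how the zero results rate can diverge between backend-style logging (every request counted) and frontend-style logging (only responses shown to the user). The `SuggestEvent` type, its field names, and the sample events are made up for illustration and are not taken from the CompletionSuggestions schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuggestEvent:
    bucket: str        # hypothetical field: "control" or "experiment"
    result_count: int  # hypothetical field: number of suggestions returned
    shown: bool        # hypothetical field: False if the request was canceled


def zero_results_rate(events: List[SuggestEvent], bucket: str,
                      shown_only: bool) -> Optional[float]:
    """Share of events in `bucket` with no results; None if there are no events."""
    relevant = [e for e in events
                if e.bucket == bucket and (e.shown or not shown_only)]
    if not relevant:
        return None
    zero = sum(1 for e in relevant if e.result_count == 0)
    return zero / len(relevant)


# Illustrative only: one user typing three letters. The first two requests
# are canceled by later keystrokes; only the last response is displayed.
toy_events = [
    SuggestEvent("control", result_count=5, shown=False),
    SuggestEvent("control", result_count=0, shown=False),
    SuggestEvent("control", result_count=0, shown=True),
]

backend_rate = zero_results_rate(toy_events, "control", shown_only=False)
frontend_rate = zero_results_rate(toy_events, "control", shown_only=True)
```

In this toy case the backend-style rate is 2/3 while the frontend-style rate is 1.0, which is why the frontend numbers are comparable between the two test buckets but not with existing backend measurements.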

EBernhardson added a subtask: T111137: Override core suggester in AB test between current suggestions and the experimental cirrus-suggest api.Sep 2 2015, 2:38 AM

Change 235391 had a related patch set uploaded (by EBernhardson):
Enable experiment with experimental completion suggester

https://gerrit.wikimedia.org/r/235391

EBernhardson claimed this task.Sep 2 2015, 2:46 AM

EBernhardson moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.

• Deskana updated the task description.Sep 2 2015, 2:56 AM

It looks like opensearch is cached by Varnish, but I'm not sure that the new API will be cached. Will this affect the test?

• Deskana closed subtask T111091: Allow extensions to change the method used to get suggestion results as Resolved.Sep 8 2015, 5:06 PM

Change 235391 merged by jenkins-bot:
Enable experiment with experimental completion suggester

https://gerrit.wikimedia.org/r/235391

EBernhardson mentioned this in rOMWC896672ba7690: Enable experiment with experimental completion suggester.Sep 8 2015, 11:31 PM

• Deskana closed subtask T111137: Override core suggester in AB test between current suggestions and the experimental cirrus-suggest api as Resolved.Sep 9 2015, 2:23 AM