
Turn on A/B test varying confidence threshold and smoothing
Closed, Resolved · Public

Description

This test will run for all types of search requests on the cluster regardless of source.

Sample of users

10% of users will take part in the test. Half of those will be our control group, half of them will be our test group.

Parameters to change

  • Reduce confidence from 2.0 to 1.0
  • Change smoothing algorithm from "stupid_backoff" to "laplace"
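For illustration only, the two parameter sets could be written out as below. This is a sketch that assumes the suggester is Elasticsearch's phrase suggester, whose `confidence` and `smoothing` options take this shape; the function name `suggester_settings` is hypothetical, and the `discount` and `alpha` values are assumed defaults, not values taken from this task or from the actual CirrusSearch configuration.

```python
def suggester_settings(in_test_group):
    """Return phrase-suggester options for the control or test group.

    Hypothetical helper: only the confidence values and the smoothing
    model names come from this task. The discount/alpha numbers are
    assumptions made for the sake of a complete example.
    """
    if in_test_group:
        # Test group: lower confidence, laplace smoothing.
        return {
            "confidence": 1.0,
            "smoothing": {"laplace": {"alpha": 0.5}},  # alpha is assumed
        }
    # Control group: the settings in place before the test.
    return {
        "confidence": 2.0,
        "smoothing": {"stupid_backoff": {"discount": 0.4}},  # discount is assumed
    }
```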

Rationale

"reducing confidence and increasing smoothing of the suggester. lowering confidence brings in more suggestions, smoothing tries to prefer the better ones"

Expected User Experience

Some search users who would have received zero results and no suggestion will now receive suggestion(s) along with the corresponding results.

Logging destination

Per-execution logs delivered to fluorine via CirrusSearchUserTesting channel

Details

Related Gerrit Patches:
operations/mediawiki-config : master · Start CirrusSearch AB test on suggestion confidence

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a subscriber: EBernhardson.
Restricted Application added a subscriber: Aklapper. · Aug 5 2015, 9:17 PM
Deskana updated the task description. · Aug 5 2015, 9:19 PM
Deskana set Security to None.
Deskana updated the task description. · Aug 5 2015, 9:24 PM
Deskana added a subscriber: Deskana.

I've updated the details on this task based on our discussions on Tuesday. The actual change still needs to be defined here.

Deskana updated the task description. · Aug 5 2015, 9:32 PM

Can we add something like this to the task description:

Expected User Experience

Some search users who would have received zero results and no suggestion will now receive suggestion(s) along with the corresponding results.

Deskana updated the task description. · Aug 6 2015, 1:13 AM

@ksmith Agreed. Done.

dcausse added a subscriber: dcausse. · Aug 6 2015, 9:43 AM
Deskana renamed this task from Run AB test varying confidence threshold and smoothing to Turn on A/B test varying confidence threshold and smoothing. · Aug 6 2015, 4:38 PM

Change 229462 had a related patch set uploaded (by EBernhardson):
Start CirrusSearch AB test on suggestion confidence

https://gerrit.wikimedia.org/r/229462

Change 229462 merged by jenkins-bot:
Start CirrusSearch AB test on suggestion confidence

https://gerrit.wikimedia.org/r/229462

Joe added a subscriber: Joe. · Aug 7 2015, 7:53 AM

Can I have some details on how this is being implemented? How do we "extract" users? Where do we store that information?

EBernhardson updated the task description. · Aug 7 2015, 2:43 PM
EBernhardson added a comment. · Edited · Aug 7 2015, 2:46 PM

The users are sampled and bucketed by hashing a string of (testName, IP, User-Agent, X-Forwarded-For) and converting that to a probability between 0 and 1. This ensures that when we see a user from the same browser they will have the same experience. If, for example, we have 1 in 10 sampling and 2 buckets, then users that hash to a value from 0-0.05 will be in bucket a, 0.05-0.1 will be in bucket b, and 0.1-1.0 are not members of the test.
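A minimal sketch of that scheme, for readers who want to see the arithmetic. It is not the CirrusSearch implementation: the function names, the `|` separator, and the use of SHA-256 are all assumptions, since the comment above does not say which hash is used or how the string is assembled.

```python
import hashlib


def hash_to_probability(test_name, ip, user_agent, x_forwarded_for):
    """Map the request fingerprint to a stable value in [0, 1).

    Assumption: SHA-256 over the fields joined with "|". The real hash
    function and string layout are not specified in this task.
    """
    key = "|".join([test_name, ip, user_agent, x_forwarded_for])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 48 bits so the quotient is exactly representable
    # as a float and can never round up to 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def assign_bucket(probability, sample_rate=0.1, buckets=("a", "b")):
    """Return the bucket name, or None if the user is not in the test."""
    if probability >= sample_rate:
        return None  # outside the sampled slice, e.g. 0.1-1.0
    width = sample_rate / len(buckets)  # 0.05 per bucket for 1 in 10, 2 buckets
    index = min(int(probability / width), len(buckets) - 1)
    return buckets[index]
```

Because the hash depends only on the request fields, calling `assign_bucket(hash_to_probability(...))` twice for the same browser yields the same bucket, with no cookie or stored state required. Including the test name in the hashed string means a different test would sample a different slice of users.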

The results of the test are being written to fluorine. A new channel, CirrusSearchUserTesting, was defined and is being written to now (work is in progress to shift this from fluorine to Kafka, though).

Deskana closed this task as Resolved. · Aug 7 2015, 8:20 PM
Deskana claimed this task.
Deskana triaged this task as Normal priority.

This was deployed.

Joe added a comment. · Aug 8 2015, 8:03 AM

@EBernhardson while this is more than OK for me (no new cookie added, no Varnish rule, maybe some code rot in the backend but nothing horrible), is this OK for the future, or is it a one-off opportunity? I think this could get us into serious trouble, as in general we cannot rely on client IP stability to identify a user (mobile users may change their IP frequently if they are moving).

Is A/B testing this way (cookieless, IP/UA dependent) going to be OK in general?

For this particular test the experience only needs to stay the same for a short period of time, so a changing IP address shouldn't be a big problem. For tests that need longer-term stability we will have to reevaluate, perhaps only running them in browsers with active local storage, as the current search satisfaction schema does.

mpopov added a subscriber: mpopov. · Aug 10 2015, 6:21 PM