An A/B test was recently run on enwiki for T187148 from 2018-03-02 through 2018-03-16. The data is stored in HDFS under the SearchSatisfaction schema (hdfs://analytics-hadoop/wmf/data/raw/eventlogging/eventlogging_SearchSatisfaction/hourly/2018/03/*).
We ran 3 standard buckets:
control: the currently deployed ML ranker
classic: the classic non-ML ranker
explorer: the new ranker under test
We also ran 2 interleaved buckets:
control-explorer-i: control in A, explorer in B
classic-explorer-i: classic in A, explorer in B
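
As a rough illustration of how an interleaved bucket can be scored, the sketch below assumes each search session has already been reduced to a list of team labels ("A" or "B"), one per clicked result. That input shape and the function names are hypothetical, not part of the SearchSatisfaction schema, and the statistic shown (share of sessions won by B, with ties counted as half, minus 0.5) is one common choice rather than necessarily the one used in this analysis.

```python
from collections import Counter


def session_winner(clicked_teams):
    """Return 'A', 'B', or 'tie' for one session.

    clicked_teams is a list of team labels, one per click (hypothetical input shape).
    A session with no clicks counts as a tie.
    """
    counts = Counter(clicked_teams)
    if counts["A"] > counts["B"]:
        return "A"
    if counts["B"] > counts["A"]:
        return "B"
    return "tie"


def preference_for_b(sessions):
    """Share of sessions won by B (ties count as half), centered on zero.

    Positive values favor B (explorer in both interleaved buckets),
    negative values favor A, and 0.0 means no preference.
    """
    winners = Counter(session_winner(s) for s in sessions)
    total = sum(winners.values())
    if total == 0:
        return 0.0
    return (winners["B"] + 0.5 * winners["tie"]) / total - 0.5


# Four made-up sessions: B wins two, A wins one, one is tied.
example_sessions = [["A", "B", "B"], ["B"], ["A"], ["A", "B"]]
print(preference_for_b(example_sessions))
```

With explorer always assigned to B, a positive value in control-explorer-i would suggest users click explorer's results more often than control's.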
The main question is whether explorer is better than control. Offline testing suggests a strong improvement. Classic was also included to see how much improvement we have gained over the current fiscal year by working on ML ranking.