
Initial BM25 A/B Test
Closed, Resolved · Public

Description

We've been hard at work preparing a new query scoring method, BM25. Run a test with the user satisfaction schema to see if it has any effect on user behavior.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 22 2016, 6:10 PM
mpopov added a subscriber: mpopov. · Aug 22 2016, 8:25 PM
debt triaged this task as Medium priority. · Aug 25 2016, 10:09 PM
debt moved this task from needs triage to Current work on the Discovery-Search board.
debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

One thing I just noticed from this test: the latency profile is much different with BM25. Before a full push to production we will need to run some load tests to get an idea of the expected impact on available server resources.

https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=44&fullscreen

tl;dr (using a 500-point moving average):

dc      p50   p75   p95
eqiad    30    55   120
codfw   205   290   390
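
For reference, a trailing moving average like the one used to smooth these numbers can be sketched as below. This is only an illustration: the function name and the choice of a simple trailing mean are assumptions, and the smoothing actually applied by the dashboard may differ.

```python
from collections import deque


def moving_average(values, window=500):
    """Yield the mean of the last `window` points seen so far.

    Until `window` points have arrived, the mean covers only the
    points available (an assumption; other tools may emit nothing).
    """
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest point so the sum covers exactly `window` values.
            total -= buf.popleft()
        yield total / len(buf)


if __name__ == "__main__":
    # Hypothetical latency samples, smoothed with a tiny window for readability.
    print(list(moving_average([30, 40, 50, 60], window=2)))
```

A larger window trades responsiveness for stability, which is why short spikes barely move a 500-point average.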
EBernhardson added a comment. · Edited · Sep 2 2016, 3:21 PM

Pulled some more specific numbers from hive.

This first one compares the total PHP execution time between buckets:

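-- Per-bucket p50/p75/p95 of total PHP execution time (tookms) for one day.
-- Only request sets assigned to exactly one test bucket and containing a
-- full_text query are counted.
select backendusertests[0],
       percentile(cast(tookms as int), 0.5),
       percentile(cast(tookms as int), 0.75),
       percentile(cast(tookms as int), 0.95)
  from CirrusSearchRequestSet
 where year=2016 and month=8 and day=31
   and size(backendusertests) = 1
   and array_contains(requests.querytype, 'full_text')
 group by backendusertests[0];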

test                        p50     p75      p95
bm25:bm25_inclinks          520.5   628.0    936.1999999999998
bm25:bm25_inclinks_pv       523.0   625.25   955.25
bm25:bm25_allfield          487.5   559.0    883.1499999999999
bm25:control                222.5   267.75   719.6999999999999
bm25:bm25_inclinks_pv_rev   554.5   673.25   1086.5999999999995

This is a more specific query, summing the time Elasticsearch reported spending on the queries in each request set:

-- Brickhouse supplies sum_array, which sums the elements of a numeric array.
add jar /home/ebernhardson/brickhouse-0.7.1-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION sum_array AS 'brickhouse.udf.timeseries.SumArrayUDF';

-- Per-bucket p50/p75/p95 of the Elasticsearch-reported time (elastictookms),
-- summed over every request in a request set.
select backendusertests[0],
       percentile(cast(sum_array(requests.elastictookms) as int), 0.5),
       percentile(cast(sum_array(requests.elastictookms) as int), 0.75),
       percentile(cast(sum_array(requests.elastictookms) as int), 0.95)
  from CirrusSearchRequestSet
 where year=2016 and month=8 and day=31
   and size(backendusertests) = 1
   and array_contains(requests.querytype, 'full_text')
 group by backendusertests[0];

bucket                      p50     p75      p95
bm25:bm25_inclinks          130.5   211.0    411.4499999999999
bm25:bm25_inclinks_pv       134.0   201.5    379.25
bm25:bm25_allfield          83.0    139.25   354.04999999999995
bm25:control                65.0    100.0    159.0499999999999
bm25:bm25_inclinks_pv_rev   159.5   229.0    409.84999999999997
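
As a rough illustration of what the query above computes, here is a Python sketch of the same steps: filter request sets, sum the per-request Elasticsearch time, then take percentiles per bucket. The record layout mirrors the column names in the query, but the sample data is hypothetical. The linear interpolation between neighbouring ranks is an assumption about how the percentile is calculated; it is consistent with the fractional values in the results, which come from integer inputs.

```python
import math


def percentile(values, p):
    """Percentile with linear interpolation between the nearest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])
    return (hi - pos) * ordered[lo] + (pos - lo) * ordered[hi]


def elastic_time_percentiles(request_sets, ps=(0.5, 0.75, 0.95)):
    """Per bucket, percentiles of elastictookms summed over each request set."""
    by_bucket = {}
    for rs in request_sets:
        tests, requests = rs["backendusertests"], rs["requests"]
        if len(tests) != 1:
            continue  # size(backendusertests) = 1
        if not any(r["querytype"] == "full_text" for r in requests):
            continue  # array_contains(requests.querytype, 'full_text')
        total = int(sum(r["elastictookms"] for r in requests))
        by_bucket.setdefault(tests[0], []).append(total)
    return {b: [percentile(v, p) for p in ps] for b, v in by_bucket.items()}


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real table.
    sample = [
        {"backendusertests": ["bm25:control"],
         "requests": [{"querytype": "full_text", "elastictookms": 40},
                      {"querytype": "near_match", "elastictookms": 10}]},
        {"backendusertests": ["bm25:control"],
         "requests": [{"querytype": "full_text", "elastictookms": 70}]},
    ]
    print(elastic_time_percentiles(sample))
```

Note that every request in a qualifying set contributes to the sum, not only the full_text one, which matches how `sum_array(requests.elastictookms)` is applied to the whole array.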

Those values are so different that it makes me wonder what could be taking all that extra time if not Elasticsearch. Here is one more query, showing percentiles of the time taken per request as seen by PHP. This should (I think?) just be the Elasticsearch time from the query above plus network latency and a very small amount of PHP processing.

The results here seem to point to network latency, but I'm not sure how that could be the case.

add jar /home/ebernhardson/brickhouse-0.7.1-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION sum_array AS 'brickhouse.udf.timeseries.SumArrayUDF';

-- Same shape as the previous query, but summing requests.tookms: the time
-- each request took as seen by PHP.
select backendusertests[0],
       percentile(cast(sum_array(requests.tookms) as int), 0.5),
       percentile(cast(sum_array(requests.tookms) as int), 0.75),
       percentile(cast(sum_array(requests.tookms) as int), 0.95)
  from CirrusSearchRequestSet
 where year=2016 and month=8 and day=31
   and size(backendusertests) = 1
   and array_contains(requests.querytype, 'full_text')
 group by backendusertests[0];

bucket                      p50     p75      p95
bm25:bm25_inclinks          423.0   513.75   696.9000000000001
bm25:bm25_inclinks_pv       421.0   513.0    688.75
bm25:bm25_allfield          384.5   456.0    688.3999999999996
bm25:control                126.5   163.0    229.0
bm25:bm25_inclinks_pv_rev   440.0   538.5    785.9499999999998
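
One way to see where the extra time sits is to subtract the medians of the three measurements from each other. The sketch below uses the p50 columns from the three tables in this comment; the variable and function names are made up for illustration. Keep in mind that a difference of medians is not the median of the per-request differences, so this is only a rough guide.

```python
# p50 values in ms, copied from the three result tables in this comment.
PHP_TOTAL_P50 = {      # tookms: total PHP execution time
    "bm25:bm25_inclinks": 520.5, "bm25:bm25_inclinks_pv": 523.0,
    "bm25:bm25_allfield": 487.5, "bm25:control": 222.5,
    "bm25:bm25_inclinks_pv_rev": 554.5,
}
REQUEST_P50 = {        # sum of requests.tookms: request time as seen by PHP
    "bm25:bm25_inclinks": 423.0, "bm25:bm25_inclinks_pv": 421.0,
    "bm25:bm25_allfield": 384.5, "bm25:control": 126.5,
    "bm25:bm25_inclinks_pv_rev": 440.0,
}
ELASTIC_P50 = {        # sum of requests.elastictookms: time reported by Elasticsearch
    "bm25:bm25_inclinks": 130.5, "bm25:bm25_inclinks_pv": 134.0,
    "bm25:bm25_allfield": 83.0, "bm25:control": 65.0,
    "bm25:bm25_inclinks_pv_rev": 159.5,
}


def gap(outer, inner):
    """Per-bucket difference between two sets of medians, in ms."""
    return {bucket: outer[bucket] - inner[bucket] for bucket in outer}


if __name__ == "__main__":
    print("request - elastic:", gap(REQUEST_P50, ELASTIC_P50))
    print("php - request:    ", gap(PHP_TOTAL_P50, REQUEST_P50))
```

On these medians the PHP-side overhead (total minus request time) is about 96 to 115 ms in every bucket, control included. The gap between request time and Elasticsearch-reported time is 61.5 ms for control but roughly 280 to 300 ms for the BM25 buckets, so that is where the unexplained time shows up.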
Deskana closed this task as Resolved. · Oct 28 2016, 4:24 AM
Deskana claimed this task.
Deskana added a subscriber: Deskana.

I love it when an epic comes together.