We've been hard at work preparing a new query scoring method, BM25. Run a test with the user satisfaction schema to see if it has any effect on user behavior.
Description
| Status | Assigned | Task |
|---|---|---|
| Resolved | Deskana | T143585 Initial BM25 A/B Test |
| Resolved | mpopov | T143587 Verify data pipeline for bm25 AB test |
| Resolved | debt | T143586 Turn on BM25 test |
| Resolved | EBernhardson | T143590 Reindex codfw search cluster for the bm25 AB test |
| Resolved | debt | T143588 Turn off BM25 AB test |
| Resolved | mpopov | T143589 Analyze results of BM25 AB test |
| Resolved | debt | T147008 Outcome of BM25 A/B test - our next steps on using BM25 |
Event Timeline
One thing I just noticed from this test: the latency profile is very different with BM25. Before doing a full push to production we will need to run some load tests to get an idea of the expected impact on available server resources.
https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=44&fullscreen
tl;dr (using a 500-point moving average; latencies in ms):

| dc | p50 | p75 | p95 |
|---|---|---|---|
| eqiad | 30 | 55 | 120 |
| codfw | 205 | 290 | 390 |
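The smoothing behind those numbers is done by Grafana, but for reference, a trailing moving average like the 500-point one above can be sketched as (a minimal illustration, not Grafana's implementation):

```python
def moving_average(samples, window=500):
    """Average each point over the trailing `window` samples (fewer at the start)."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        chunk = samples[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

With a window this wide, short latency spikes are flattened out, which is why the per-DC percentile lines separate so cleanly.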
Pulled some more specific numbers from Hive.
This first one compares the total PHP execution time (tookms, in ms) between buckets:
select backendusertests[0],
percentile(cast(tookms as int), 0.5),
percentile(cast(tookms as int), 0.75),
percentile(cast(tookms as int), 0.95)
from CirrusSearchRequestSet
where year=2016 and month=8 and day=31
and size(backendusertests) = 1
and array_contains(requests.querytype, 'full_text')
group by backendusertests[0];

| test | p50 | p75 | p95 |
|---|---|---|---|
| bm25:bm25_inclinks | 520.5 | 628.0 | 936.2 |
| bm25:bm25_inclinks_pv | 523.0 | 625.25 | 955.25 |
| bm25:bm25_allfield | 487.5 | 559.0 | 883.15 |
| bm25:control | 222.5 | 267.75 | 719.7 |
| bm25:bm25_inclinks_pv_rev | 554.5 | 673.25 | 1086.6 |
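For anyone unfamiliar with how Hive's `percentile()` produces non-integer values from integer inputs: as I understand it, it linearly interpolates between the two closest ranks. A rough pure-Python equivalent (illustrative sketch, not Hive's actual UDAF code):

```python
def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    xs = sorted(values)
    k = (len(xs) - 1) * p          # fractional rank of the p-th percentile
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    frac = k - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac
```

That interpolation is why a p50 over integer millisecond values can come out as 520.5.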
This is a more specific query, summing the time Elasticsearch itself reported spending executing the queries:
add jar /home/ebernhardson/brickhouse-0.7.1-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION sum_array AS 'brickhouse.udf.timeseries.SumArrayUDF';
select backendusertests[0],
percentile(cast(sum_array(requests.elastictookms) as int), 0.5),
percentile(cast(sum_array(requests.elastictookms) as int), 0.75),
percentile(cast(sum_array(requests.elastictookms) as int), 0.95)
from CirrusSearchRequestSet
where year=2016 and month=8 and day=31
and size(backendusertests) = 1
and array_contains(requests.querytype, 'full_text')
group by backendusertests[0];

| bucket | p50 | p75 | p95 |
|---|---|---|---|
| bm25:bm25_inclinks | 130.5 | 211.0 | 411.45 |
| bm25:bm25_inclinks_pv | 134.0 | 201.5 | 379.25 |
| bm25:bm25_allfield | 83.0 | 139.25 | 354.05 |
| bm25:control | 65.0 | 100.0 | 159.05 |
| bm25:bm25_inclinks_pv_rev | 159.5 | 229.0 | 409.85 |
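To clarify what the brickhouse `sum_array` UDF is doing in that query: each CirrusSearchRequestSet row carries an array of per-request Elasticsearch timings (`requests.elastictookms`), and `sum_array` collapses each array to a single per-row total before the percentile runs. A sketch with hypothetical timing arrays:

```python
def sum_array(xs):
    """Mimics brickhouse.udf.timeseries.SumArrayUDF: total of one row's array."""
    return sum(xs)

# Hypothetical elastictookms arrays, one per request set (values in ms)
rows = [[30, 45], [80], [12, 12, 41]]
totals = [sum_array(r) for r in rows]   # one total per request set
```

So the percentiles in this table are over the total Elasticsearch time per request set, not per individual backend request.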
Those values are so different that it makes me wonder what could be taking all that extra time if not Elasticsearch. Here is one more query that shows percentiles of time taken as seen by PHP. This should (I think?) just be the above query plus network latency and a very small amount of PHP processing.
The results here seem to indicate network latency, but I'm not sure how that could be the case.
add jar /home/ebernhardson/brickhouse-0.7.1-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION sum_array AS 'brickhouse.udf.timeseries.SumArrayUDF';
select backendusertests[0],
percentile(cast(sum_array(requests.tookms) as int), 0.5),
percentile(cast(sum_array(requests.tookms) as int), 0.75),
percentile(cast(sum_array(requests.tookms) as int), 0.95)
from CirrusSearchRequestSet
where year=2016 and month=8 and day=31
and size(backendusertests) = 1
and array_contains(requests.querytype, 'full_text')
group by backendusertests[0];

| bucket | p50 | p75 | p95 |
|---|---|---|---|
| bm25:bm25_inclinks | 423.0 | 513.75 | 696.9 |
| bm25:bm25_inclinks_pv | 421.0 | 513.0 | 688.75 |
| bm25:bm25_allfield | 384.5 | 456.0 | 688.4 |
| bm25:control | 126.5 | 163.0 | 229.0 |
| bm25:bm25_inclinks_pv_rev | 440.0 | 538.5 | 785.95 |
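Subtracting the Elasticsearch-reported p50s (previous table) from the PHP-observed p50s (this table) gives a rough size for the unexplained overhead per bucket. Caveat: percentiles don't subtract exactly, so this is only indicative; the short keys below are my own labels for the bm25:* buckets:

```python
# p50 values copied from the two tables above (ms)
php_p50 = {"control": 126.5, "inclinks": 423.0, "allfield": 384.5}
es_p50  = {"control": 65.0,  "inclinks": 130.5, "allfield": 83.0}

# Rough per-bucket overhead: PHP-observed time minus ES-reported time
overhead = {k: php_p50[k] - es_p50[k] for k in php_p50}
```

The control bucket's gap is around 60 ms, while the BM25 buckets sit near 290-300 ms, so whatever the overhead is, it scales with the BM25 variants rather than being a fixed network cost.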