
Re-tune query weights for elasticsearch 5
Closed, Declined (Public)

Description

Elasticsearch 5 does away with coordination factors (coord), which changes how our scoring works. As one example from our browser tests:

Search for: Relevancyclosetest Foo
Result: 'Relevancyclosetest Foô'
es2: 0.5 (coord) * (24.794167 (title) + 9.484357 (suggest) * 0.5 (coord) + 4.4098306 (text)) + 8.483211 (phrase) = 25.45 * 10 (lang) = 254.5
es5: 25.698328 (title) + 10.148193 (suggest) + 4.9270577 (text) + 9.486414 (phrase) = 50.25 * 10 (lang) = 502.5

On es2, the suggest score had the coord factor applied twice: the original suggest score of 9.484357 was cut in half to 4.74, and then, after the parts were summed, the total was cut in half again before the phrase score, which has no coord, was added.
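The arithmetic above can be replayed with a small sketch. It only re-adds the component scores quoted in this task; the function and constant names are made up for illustration and do not correspond to anything in the search code, and the 0.5 coord and 10x language multiplier are taken from this one example rather than from any general rule.

```python
# Component scores copied from the example in this task.
ES2 = {"title": 24.794167, "suggest": 9.484357, "text": 4.4098306, "phrase": 8.483211}
ES5 = {"title": 25.698328, "suggest": 10.148193, "text": 4.9270577, "phrase": 9.486414}

COORD = 0.5       # coord factor seen in the es2 explain output for this query
LANG_BOOST = 10   # language multiplier from the example

def es2_score(s, coord=COORD, lang=LANG_BOOST):
    """es2: coord hits suggest once on its own, then again on the summed clause."""
    inner = s["title"] + s["suggest"] * coord + s["text"]
    return (coord * inner + s["phrase"]) * lang

def es5_score(s, lang=LANG_BOOST):
    """es5: no coord anywhere, so the parts are simply summed."""
    return (s["title"] + s["suggest"] + s["text"] + s["phrase"]) * lang

# Share of the pre-boost total that comes from the phrase clause.
es2_phrase_share = ES2["phrase"] * LANG_BOOST / es2_score(ES2)
es5_phrase_share = ES5["phrase"] * LANG_BOOST / es5_score(ES5)

print(f"es2 total: {es2_score(ES2):.2f}, phrase share: {es2_phrase_share:.0%}")
print(f"es5 total: {es5_score(ES5):.2f}, phrase share: {es5_phrase_share:.0%}")
```

Beyond the roughly doubled total, the sketch shows that the phrase clause makes up a noticeably smaller share of the final score on es5, since it was the only part that escaped coord on es2. That shift in relative weight is what a re-tune would have to compensate for.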

We won't be able to reproduce the es2 scoring exactly in es5, because the coord factor was determined by the number of query tokens. Before we can re-tune these weights, we need to get es 5.x on the relforge cluster.

Event Timeline

May not be necessary, as the prod bm25 configuration doesn't use coordination factors.

EBernhardson added a subscriber: dcausse.

Per discussion with @dcausse, this is not necessary, as prod has already disabled coordination factors with the bm25 update.