
Enable BM25 by default in cirrus and evaluate its impact with relcomp on relforge servers
Closed, Resolved, Public

Description

Enabling BM25 is very simple: we just need to change profiles/SimilarityProfiles.php and reindex.
Tuning the BM25 params is a tougher task, and the way the similarity is implemented in Elasticsearch makes it hard to use in an optimization process (changing a setting may require closing the index or even reindexing).
Depending on our needs, we could implement our own similarity: a kind of Super BM25 similarity whose settings could be changed on the fly. It's not clear yet whether that's possible, and it would be a nasty hack in Elasticsearch for sure, but it could be enabled only on the relforge servers if it helps.
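
For reference, a minimal sketch of what a named BM25 similarity looks like in stock Elasticsearch index settings, built here as a Python dict. The name `bm25_low_b` and the helper function are made up for illustration, and this is not the format used in profiles/SimilarityProfiles.php; the defaults shown (k1 = 1.2, b = 0.75) are the Elasticsearch defaults.

```python
import json


def bm25_similarity_settings(name, k1=1.2, b=0.75):
    """Return an index-settings fragment that defines a named BM25 similarity.

    `name` is an arbitrary label (hypothetical here); a field opts in by
    setting its `similarity` mapping parameter to that label.
    """
    return {"index": {"similarity": {name: {"type": "BM25", "k1": k1, "b": b}}}}


# Example: a similarity with weaker length normalization, e.g. for array fields.
settings = bm25_similarity_settings("bm25_low_b", b=0.3)
print(json.dumps(settings, indent=2))
```

Because these values live in index settings, each k1/b candidate means another settings change on the index, which is what makes an automated parameter sweep awkward.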

Because we don't know whether it's possible or easy, I'd suggest:

  1. Enable BM25, possibly with a lower b for array fields
  2. Run relcomp and evaluate the magnitude of change relative to ClassicSimilarity
  3. Tweak some BM25 settings and run a second evaluation
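
One rough way to put a number on "magnitude of change" between two runs is top-k overlap per query. This is only a sketch: it is not how relcomp computes its report, and the function names and input shape (lists of result IDs per query) are assumptions.

```python
def topk_overlap(baseline, candidate, k=10):
    """Jaccard overlap of the top-k result IDs from two rankings."""
    a, b = set(baseline[:k]), set(candidate[:k])
    if not a and not b:
        return 1.0  # two empty result sets count as identical
    return len(a & b) / len(a | b)


def mean_change(runs, k=10):
    """Average amount of change (1 - overlap) over (baseline, candidate) pairs."""
    runs = list(runs)
    if not runs:
        raise ValueError("need at least one query to compare")
    return 1.0 - sum(topk_overlap(x, y, k) for x, y in runs) / len(runs)


# Example: one query unchanged, one query with entirely different results.
print(mean_change([(["a", "b"], ["a", "b"]), (["a", "b"], ["c", "d"])]))
```

A value near 0 means the new similarity barely moves the top results; a value near 1 means almost every query returns a different top k.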

We could then determine whether a change in the k1 & b values leads to interesting results, and decide accordingly whether it's worth spending some time on this Super BM25 similarity.
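
To make the role of k1 and b concrete, here is a small sketch of the per-term BM25 score in its textbook form, using the idf variant that Lucene uses. The function name and the sample numbers are illustrative only; Lucene's real implementation also encodes document length lossily, so actual scores will differ slightly.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document with BM25.

    k1 controls how quickly repeated occurrences of the term saturate;
    b controls how strongly long documents are penalized (0 = not at all).
    """
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# A field ten times longer than average, scored with two values of b.
for b in (0.75, 0.3):
    score = bm25_term_score(tf=2, doc_len=1000, avg_doc_len=100,
                            n_docs=1000, doc_freq=10, b=b)
    print(b, round(score, 3))
```

With a lower b the long field is penalized less, which is the motivation for using a lower b on array fields: their length reflects the number of entries rather than verbosity.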

Event Timeline

debt triaged this task as Medium priority. Jul 19 2016, 10:08 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.

Erik ran some evaluations:

  • Discernatron data (P3859) shows a small preference for BM25.
  • Unfortunately, PaulScore seems to prefer the current results.
debt subscribed.

Closing as resolved; we're continuing to monitor the results.