
Communicate to community the upcoming BM25 release
Closed, Resolved (Public)


Let's work out how to tell the community about our rollout plan for BM25.

Event Timeline

Email sent to wikitech-l, wikitech ambassadors, and discovery email lists:

Latest search updates
After several months of extensive testing of a new search query scoring method called BM25 (Best Matching) [1], we recently completed a limited production release for the following top languages: English, German, Spanish, Russian, Portuguese, French, Italian, Polish, Dutch and Arabic. This release replaces the older scoring method, tf-idf (term frequency-inverse document frequency) [2].
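For readers unfamiliar with the two scoring methods, here is a minimal sketch of how they differ for a single query term. It uses the textbook forms of tf-idf and Okapi BM25, not necessarily the exact formulas running in production. The function names are hypothetical, and the defaults `k1=1.2` and `b=0.75` are commonly used values assumed here for illustration.

```python
import math


def tfidf_score(tf, num_docs, doc_freq):
    """Textbook tf-idf: grows linearly with term frequency."""
    return tf * math.log(num_docs / doc_freq)


def bm25_idf(num_docs, doc_freq):
    """A common smoothed IDF variant used with BM25 (an assumption here)."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 for one term.

    k1 controls how quickly repeated occurrences stop adding score;
    b controls how strongly long documents are penalized.
    """
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(num_docs, doc_freq) * tf * (k1 + 1) / (tf + length_norm)


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, term appears in 10 of them.
    for tf in (1, 10, 100):
        print(tf, tfidf_score(tf, 1000, 10), bm25_score(tf, 100, 100, 1000, 10))
```

The practical difference is that the tf-idf score keeps growing as a term is repeated, while the BM25 score levels off, and BM25 also adjusts for document length relative to the average.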

We have additional testing to do [3,4] to determine whether BM25 will work for languages that don't use spaces between words, e.g. Japanese and Chinese.

The Discovery team announces much of our completed work in weekly status updates [5, 6], but some of that work isn't obvious to anyone who uses our search engine. That is because it isn't actually 'live' until a complete re-index of the servers occurs. We've created a recurring ticket in Phabricator [7] to keep track of the work that goes live in production after a re-index, such as the one we've just completed. A few highlights of the recent re-index are ASCII-folding for French and fixes for several bugs affecting French ÿ and Russian 'Е' and 'Ё' when those characters are entered in a search query.
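As a rough illustration of what folding means, the sketch below strips diacritics using Unicode decomposition from Python's standard library. This is a stand-in technique, not the search engine's own folding filter, which may rely on an explicit character mapping and behave differently. The function name `fold_diacritics` is hypothetical.

```python
import unicodedata


def fold_diacritics(text):
    """Approximate folding: decompose characters, then drop combining marks.

    Assumption: this only illustrates the idea; it does not reproduce the
    behavior of any particular search analyzer.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("café", "ÿ", "Ёлка"):
        print(word, "->", fold_diacritics(word))
```

With folding of this kind, a query typed without accents can match indexed text that contains them, and vice versa. Note that this approach maps Cyrillic 'Ё' to Cyrillic 'Е', not to an ASCII letter.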

Cheers from the Discovery Search Team!