
Test MLR models for zhwiki, jawiki and kowiki
Open, MediumPublic


zhwiki and jawiki because of T219533 and kowiki because of T216738

Per @EBernhardson, new models are automatically built and shipped to prod; they just need to be tested.

Testing consists of running A/B tests.

First, deploy a config patch (see example here); this will wrap the config object with our custom values. The new models also need to be added to wgCirrusSearchMLRModel in that same InitialiseSettings.php file.
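For illustration, a minimal sketch of what the InitialiseSettings.php addition might look like. The per-wiki array with a 'default' key follows the usual wmf-config convention, but the model identifiers below are placeholders, not the actual automatically built models:

```php
// Hypothetical sketch only: the real model identifiers come from the
// automatically built and shipped models, not these placeholder names.
'wgCirrusSearchMLRModel' => [
	'default' => false,
	'zhwiki' => 'zhwiki-placeholder-model',
	'jawiki' => 'jawiki-placeholder-model',
	'kowiki' => 'kowiki-placeholder-model',
],
```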

Then, turn on the A/B tests; example here:

Related Objects

Event Timeline

Restricted Application added subscribers: Stang, revi, Aklapper.
Mstyles updated the task description.
Mstyles added a subscriber: Gehel.
Mstyles renamed this task from Train MLR models for zhwiki, jawiki and kowiki to Test MLR models for zhwiki, jawiki and kowiki. Dec 18 2019, 5:23 PM

Change 559614 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[operations/mediawiki-config@master] Add new MLR models

Change 572063 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[mediawiki/extensions/WikimediaEvents@master] Deploy MLR2020 test

Change 559614 merged by jenkins-bot:
[operations/mediawiki-config@master] Add new MLR models

Mentioned in SAL (#wikimedia-operations) [2020-02-13T19:17:28Z] <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T219534 Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)

Config changes have been deployed, but due to a configuration conflict (jawiki uses the default for wgCirrusSearchFullTextQueryBuilderProfile), some further changes will have to be made to have the model name show up.

Reviewing the history, I think the primary concern related to bm25 and spaceless languages was:

@dcausse predicted the ZRR would go down because, if I recall correctly, we aren't searching for phrases anymore, so any page that randomly has all the same characters as the query in it will be returned.

Essentially, in spaceless languages without a good analysis pipeline we fall back to character n-grams, and search is only reasonable using phrase search. We previously evaluated a language-specific analysis chain for Japanese, but the community review of that (I think it was done by putting an index with the new configuration on relforge and letting users query it from an MW instance in cloud?) didn't find it to be an improvement, so we are still on 2-grams.
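To illustrate why phrase search matters with a 2-gram fallback, here is a small sketch (not the actual Elasticsearch analyzer, just an approximation of what a character 2-gram tokenizer emits). A page can contain every bigram of the query without containing the query as a phrase, so bag-of-ngrams matching without a phrase constraint returns pages that merely share characters:

```python
def char_ngrams(text, n=2):
    """Character n-grams, roughly what a 2-gram analyzer emits for spaceless text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

query = "東京都"            # intended as one phrase ("Tokyo Metropolis")
page = "京都から東京へ"     # "from Kyoto to Tokyo": no phrase match

q_grams = char_ngrams(query)            # ['東京', '京都']
p_grams = set(char_ngrams(page))

# Every query bigram appears in the page, so a non-phrase query matches
# fully even though the page never contains the phrase "東京都".
all_grams_match = all(g in p_grams for g in q_grams)   # True
phrase_match = query in page                           # False
```

This is the behavior behind the ZRR prediction above: dropping the phrase constraint surfaces more pages, but many of them only coincidentally share the query's characters.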

Additionally, we cannot switch between tf/idf and BM25 with a simple query-time selection. While we do have a custom hack in the search-extra plugin that does exactly that, we've only used it at development time and never in production. The official method to switch is to change the analysis pipeline and re-index the wiki. This means A/B testing the analysis pipeline requires reindexing one of the two search clusters and then splitting A/B traffic between them. Fake latency also has to be injected on the eqiad side to account for round trips between the app servers and the codfw cluster in the other bucket.

That all sounds like a bunch of additional work that is probably not justified. The remaining options are either to adjust the old query builder to be able to generate LTR queries, or to drop jawiki from the A/B test. I haven't had a chance to find the exact code, but I suspect the old query builder not generating LTR queries is a bug of some sort; might be worth looking into.

Testing the Japanese analyzer (Kuromoji) was hard because we weren't set up to test tokenization, and there isn't necessarily a canonical tokenization for some sentences, so my automated testing corpus was not 100% reliable. Using RelForge was also hard because throwing people at a blank search box is not the best search evaluation plan. My write-up is here.

It is possible that Kuromoji has improved in the last 2.5 years, and I'm sure my ability to test it has improved. Korean went much better, and it was easier to identify and even fix some of the problems (by unpacking and customizing the analyzer). I've wanted to look at Kuromoji again for a while, but just hadn't gotten around to it. We could think about doing it in Q4 or Q1 and look at this again after that.

On another note, it would be possible to use RelForge to set up a BM25 index of jawiki and do a comparison of the impact it has on results, even if you can't assess the quality of the change (and it might be possible to get a human to review some of the changes). It may still not be worth the effort, though.


Talked about this today. Short term: Investigate why the classic query building isn't generating an LTR query, it almost certainly should. If that isn't fruitful we should ship the test without jawiki. Longer term: Elasticsearch is deprecating (turning into a noop) the phrase query we use here in 6.8, and removing it in 7.x. We need to re-evaluate the kuromoji analysis chain and hopefully move ja onto a proper analysis chain before upgrading to 6.8. The other spaceless languages are probably not big enough to need specific support, we can perhaps use shingles since the text content is minimal (outside jawiki).

@dcausse looked into the rescore building: RescoreBuilder::isProfileSyntaxSupported is rejecting the profile containing the LTR query because the query_string search query syntax is reported. Essentially this says the search query needs further parsing on the Elasticsearch side, and since our LTR query can't be modified to apply that parsing, it is rejected outright. Most likely we could add a regex to check how simple the query string is and allow through all queries that are plain search strings containing no syntax that would be handled by the query_string query.
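A hypothetical sketch of what such a check could look like (the real check would live in CirrusSearch's PHP RescoreBuilder, and this operator list is illustrative, not exhaustive): reject any query containing characters or keywords that query_string would interpret as syntax, and allow the LTR profile only for plain search strings.

```python
import re

# Characters and keyword operators with special meaning to Elasticsearch's
# query_string syntax. Hypothetical sketch of the allow-list check; the
# production list would need to match what query_string actually parses.
QUERY_STRING_SYNTAX = re.compile(r'[+\-=&|><!(){}\[\]^"~*?:\\/]|\b(AND|OR|NOT)\b')

def is_simple_query(query: str) -> bool:
    """True if the query looks like a plain search string with no
    query_string syntax, so the LTR rescore profile could be allowed."""
    return QUERY_STRING_SYNTAX.search(query) is None
```

Note this is deliberately conservative: a hyphenated word like "foo-bar" would be rejected too, which only means that query falls back to the non-LTR path, not that it fails.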

Instead of filtering the query_string queries, we want to move spaceless languages off of query_string and onto the full text simple match query builder. This will help when we upgrade Elasticsearch and no longer use query strings. To make this move, the FTSM query builder has to be tested in relforge for Japanese. There's currently an upgrade going on with relforge, so this task will be paused until the python upgrade for relforge is complete.

Change 595019 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[operations/mediawiki-config@master] Update ML models

Change 595019 merged by jenkins-bot:
[operations/mediawiki-config@master] Update ML models

Mstyles triaged this task as Medium priority.
Mstyles added a project: Discovery-Search.
Mstyles subscribed.

Change 572063 abandoned by Mstyles:
[mediawiki/extensions/WikimediaEvents@master] Deploy MLR2020 test