As a maintainer of CirrusSearch, I would like to reduce the usage of the query_string query so that I can reduce technical debt and verify that the fixes made for T262845 are valid.
The continued use of the default query builder and the classic rescore window is a leftover of the switch to BM25, for which the tests were inconclusive (https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/).
I think it would make sense to re-assess this by running another A/B test because many components related to these languages have changed since then:
- auto_generate_phrase_queries is no longer available and was probably the cause of the low recall on such languages: T219267
- there were no dedicated analyzers for Chinese, Japanese and Thai at the time (T158203, T166731, T151743)
I suggest testing:
- wgCirrusSearchFullTextQueryBuilderProfile: perfield_builder
- wgCirrusSearchRescoreProfile: wsum_inclinks
on all Wikipedias using spaceless languages:
- bowiki, dzwiki, ganwiki, jawiki, kmwiki, lowiki, mywiki, thwiki, wuuwiki, zhwiki, zh_classicalwiki, yuewiki, zh_yuewiki, bugwiki, cdowiki, crwiki, hakwiki, jvwiki, nanwiki, zh_min_nanwiki
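To make the proposed setup easier to review, here is a minimal sketch of the test plan as data. It is only an illustration, not the actual CirrusSearch A/B test configuration: the bucket names `control` and `test` and the helper `overrides_for` are hypothetical. Only the two settings and the wiki list come from this task.

```python
# Wikis named in this task (Wikipedias using spaceless languages).
SPACELESS_WIKIS = [
    "bowiki", "dzwiki", "ganwiki", "jawiki", "kmwiki", "lowiki", "mywiki",
    "thwiki", "wuuwiki", "zhwiki", "zh_classicalwiki", "yuewiki",
    "zh_yuewiki", "bugwiki", "cdowiki", "crwiki", "hakwiki", "jvwiki",
    "nanwiki", "zh_min_nanwiki",
]

# Settings proposed in this task for the test bucket.
TEST_OVERRIDES = {
    "wgCirrusSearchFullTextQueryBuilderProfile": "perfield_builder",
    "wgCirrusSearchRescoreProfile": "wsum_inclinks",
}


def overrides_for(wiki: str, bucket: str) -> dict:
    """Return the config overrides for a wiki/bucket pair.

    Hypothetical helper: "control" keeps the wiki's current settings
    (no overrides), "test" applies the two proposed profiles.
    """
    if wiki not in SPACELESS_WIKIS:
        raise ValueError(f"{wiki} is not part of this test")
    if bucket == "control":
        return {}
    if bucket == "test":
        return dict(TEST_OVERRIDES)  # copy so callers cannot mutate the plan
    raise ValueError(f"unknown bucket: {bucket}")
```

For example, `overrides_for("jawiki", "test")` returns both proposed profiles, while `overrides_for("jawiki", "control")` returns an empty dict so that the control bucket keeps whatever the wiki uses today.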
AC:
- run an A/B test on these wikis
- provide data confirming that the fixes for T262845 are valid