Back-test effectiveness of multi-language search by importing relevant indexes to the hypothesis-testing cluster and running zero-result queries from enwiki logs.
Closed, Duplicate (Public)

Description

I don't think we can do a full end-to-end test: getting the enwiki index along with the other language indexes into our hypothesis-testing cluster is probably a bit too much to ask of it. We probably can, though, detect the language of zero-result queries from enwiki and import the 2-4 most relevant indexes.

So basically:

  • Extract a sample of zero-result queries from the enwiki request logs (ideally large enough to include a useful number of foreign-language queries)
  • Run all of those queries through the language detection plugin and come up with a list of the most relevant indexes to import
  • Import the relevant indexes into the hypothesis-testing cluster
  • Run the queries through the language detector again, this time running them against the suggested indexes, and report the results.

Or something like that, adjust as needed.
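
A rough sketch of steps 2 and 4, in case it helps pin down what gets reported. Everything here is an assumption for illustration: `detect` stands in for a call to the language detection plugin, `search` stands in for a query against the hypothesis-testing cluster that returns a hit count, and the `<lang>wiki` index naming is a guess at how the imported indexes would be named.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stand-ins: neither signature comes from the real plugin or cluster.
Detector = Callable[[str], Optional[str]]   # query -> language code, or None
Searcher = Callable[[str, str], int]        # (index, query) -> number of hits


def top_indexes(queries: Iterable[str], detect: Detector,
                limit: int = 4, home_lang: str = "en") -> List[str]:
    """Step 2: count detected languages and pick the most common ones to import."""
    counts: Counter = Counter()
    for query in queries:
        lang = detect(query)
        # Skip undetected queries and queries detected as the home wiki's language.
        if lang and lang != home_lang:
            counts[lang] += 1
    # Assumed naming scheme: language code followed by "wiki".
    return [f"{lang}wiki" for lang, _ in counts.most_common(limit)]


def rescue_report(queries: Iterable[str], detect: Detector, search: Searcher,
                  indexes: Iterable[str]) -> Dict[str, int]:
    """Step 4: rerun each query against its suggested index and tally outcomes."""
    available = set(indexes)
    report = {"total": 0, "routed": 0, "rescued": 0}
    for query in queries:
        report["total"] += 1
        lang = detect(query)
        index = f"{lang}wiki" if lang else None
        if index not in available:
            continue  # no imported index matches the detected language
        report["routed"] += 1
        if search(index, query) > 0:
            report["rescued"] += 1  # a zero-result query now returns something
    return report
```

The three counters separate "the detector pointed at an index we imported" from "that index actually returned results", which should make it easier to tell detector misses apart from index misses when reading the report.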