To decide which analysers to start with, we're relying on our intuition and on data from previous tests. For example, our recent tests of BM25 on the Chinese-language wiki showed that the Chinese language analyser performs very poorly, so we can research a new analyser for that language. We could also prioritise other analysers, but it's hard for us to know where to start, since the Search Team speaks only a limited number of languages.
It would be great to do some outreach to find out which communities would benefit from a new language analyser. We'll have to word the questions we ask them carefully: feedback such as "X query gives Y bad results" is not helpful in this case, since we're interested specifically in problems caused by poor language analysis.