We can do a better, more balanced assessment of the new language models (T118287) to decide which ones perform genuinely poorly (e.g., probably Igbo) and which are merely inappropriate for enwiki (e.g., hopefully French and German).
The obvious approach is to create a "fair" evaluation test set with an equal number of examples per language (say, 100 random queries for each language, manually reviewed to confirm they are in the target language), then evaluate performance on that set.
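A minimal sketch of scoring models against such a balanced set. The `identify` callback and the `(query, true_lang)` test-set format are assumptions for illustration, not an existing API:

```python
from collections import defaultdict

def per_language_accuracy(test_set, identify):
    """Compute per-language accuracy on a balanced evaluation set.

    test_set: iterable of (query, true_lang) pairs.
    identify: callable mapping a query string to a language code
              (a stand-in for whichever model is being evaluated).
    Returns {lang: fraction of that language's queries identified correctly}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for query, true_lang in test_set:
        total[true_lang] += 1
        if identify(query) == true_lang:
            correct[true_lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}
```

Because every language contributes the same number of queries, the per-language scores are directly comparable, which is the point of the "fair" set.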
Randomly sample ~1000 queries from a given wiki, randomize their order, and discard non-target-language queries (junk, names, DOIs, obvious bot traffic, other languages) until 100 good queries remain. Repeat for the top N languages by query volume.
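The sampling step above can be sketched as follows. The `passes_review` callback is a hypothetical stand-in for the manual review that would actually be done by hand:

```python
import random

def build_eval_set(queries, passes_review, n_sample=1000, n_keep=100):
    """Sample queries and keep the first n_keep that survive review.

    queries:       list of raw query strings from one wiki.
    passes_review: callable returning True if a query is in the target
                   language (stand-in for manual review of junk, names,
                   DOIs, bot traffic, and other languages).
    """
    # random.sample both picks the subset and randomizes its order.
    sample = random.sample(queries, min(n_sample, len(queries)))
    good = []
    for q in sample:
        if passes_review(q):
            good.append(q)
        if len(good) == n_keep:
            break
    return good
```

If a wiki's sample yields fewer than 100 good queries, that is itself a useful signal about how noisy its query stream is.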
From T118287, the next 20 languages by volume after English are Italian (though known to have many duplicates due to cross-wiki searches), German, Spanish, French, Russian, Japanese, Portuguese, Indonesian, Arabic, Chinese, Dutch, Polish, Czech, Turkish, Farsi, Korean, Swedish, Vietnamese, Ukrainian, and Hebrew. (Sorting by "filtered queries" from T118287 swaps Hebrew for Finnish and gives a slightly different order—except for Italian, which drops to 9th.)
The estimate is 1 hour for fresh data extraction and setup, plus 1-2 hours per language for most languages.