After reindexing (T342444) is complete, analyze the impact of the changes on various samples of wiki queries.
I have general samples from over 100 wikis. Many of them also have task-specific sub-samples containing queries that should be affected by apostrophe normalization, camelCase handling, acronym handling, updating word_break_helper, and enabling the icu_tokenizer with icu_tokenizer_repair.
I've been running daily regression tests with these samples since before reindexing began, so I should be able to detect changes from the day of reindexing, and compare that to typical day-to-day changes.
We are generally looking for increased recall in the task-specific sub-samples, to see how many of the languages that have examples of a given phenomenon show an improvement. I will also take a quick look at changes in the general samples for a sense of the overall impact of these harmonization efforts.
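
As a rough illustration of comparing the reindexing-day change with typical day-to-day variation, here is a minimal Python sketch. It is not the actual regression-test tooling: the function names, the choice of metric (number of sample queries returning at least one result), the z-score style comparison, the threshold, and the counts in the usage example are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def daily_deltas(counts):
    """Day-over-day change in a daily metric (hypothetical: queries with results)."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def reindex_effect(counts, reindex_day, threshold=3.0):
    """Compare the change arriving on reindex_day with earlier daily changes.

    counts      -- one value per daily test run, oldest first
    reindex_day -- 0-based index of the first run after reindexing
    threshold   -- assumed cutoff, in baseline standard deviations
    """
    deltas = daily_deltas(counts)
    # deltas[i] is the change from day i to day i + 1, so the change that
    # shows up on reindex_day is deltas[reindex_day - 1].
    baseline = deltas[:reindex_day - 1]
    if len(baseline) < 2:
        raise ValueError("need at least three runs before reindexing")
    observed = deltas[reindex_day - 1]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # No baseline variation at all: any departure from it stands out.
        score = 0.0 if observed == mu else float("inf")
    else:
        score = abs(observed - mu) / sigma
    return {
        "observed": observed,
        "baseline_mean": mu,
        "score": score,
        "exceeds_baseline": score > threshold,
    }


# Invented counts for one sub-sample; reindexing lands on the run at index 5.
example_counts = [812, 815, 811, 814, 813, 871, 870]
example_result = reindex_effect(example_counts, reindex_day=5)
```

Running something like this once per wiki and per sub-sample, and then counting how many languages show a positive change that exceeds the baseline, would give the "how many languages improved" tally described above.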