Test and analyze Vietnamese language analyzer
I'll spend a bit of time looking around for alternatives to the Elastic-recommended Vietnamese analyzer (and keep an eye out for any Japanese alternatives, since T166731 didn't pan out).

So, unless there is another serious contender, the plan is to test the Vietnamese Analysis Plugin, analyze the results to see whether it is actually an improvement, set up a test instance in labs, and post to the Village Pump for feedback.

If all goes well, we'll file tasks to deploy the new analyzer and re-index the relevant wikis.

Unfortunately, I don't think this plugin is mature enough to use, especially on our projects, where foreign words are both common and important.

We should definitely check back in a few versions and see what improvements have been made.

More details on what I found are in my write-up.

I moved this back to the backlog. The plugin author responded to my bug reports and made several critical fixes. Definitely worth taking another look.

Another round of analysis of the plugin is on MediaWiki.

Potential show stoppers:

  • Certain whitespace conditions cause runtime exceptions.
  • Analysis with the plugin is very slow.
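
For anyone who wants to re-check these two show stoppers against a later plugin version, here is a minimal sketch of a probe harness in Python. It is not the method used for the analysis above. The names `WHITESPACE_CASES`, `probe`, and `fragile_analyzer` are hypothetical, and the listed inputs are illustrative guesses, since the whitespace conditions that actually triggered the exceptions are not spelled out in this task. The `analyze` callable is a placeholder: in practice it would wrap a call to the test instance's analysis endpoint, and the stand-in here only simulates a failure so the sketch runs on its own.

```python
import time
from typing import Callable, Dict, List

# Hypothetical whitespace edge cases (assumption: the real failing inputs
# are not listed in this task).
WHITESPACE_CASES: List[str] = [
    "",
    " ",
    "\t",
    "\n",
    "Việt  Nam",      # doubled space between words
    " Việt Nam ",     # leading and trailing space
    "Việt\u00a0Nam",  # no-break space
]


def probe(analyze: Callable[[str], List[str]], texts: List[str]) -> Dict[str, object]:
    """Run analyze() over each text, recording exceptions and total wall time."""
    failures = []
    start = time.perf_counter()
    for text in texts:
        try:
            analyze(text)
        except Exception as exc:  # record the failure and keep going
            failures.append((text, type(exc).__name__))
    elapsed = time.perf_counter() - start
    return {"failures": failures, "seconds": elapsed}


def fragile_analyzer(text: str) -> List[str]:
    """Stand-in for the real analyzer: raises on whitespace-only input."""
    if text and not text.strip():
        raise ValueError("whitespace-only input")
    return text.split()


if __name__ == "__main__":
    report = probe(fragile_analyzer, WHITESPACE_CASES)
    print(report["failures"])
    print(f"{report['seconds']:.6f} s for {len(WHITESPACE_CASES)} inputs")
```

Running the same `probe` call with a baseline analyzer and with the plugin on an identical sample of article text would give a rough speed comparison. A serious benchmark would need a much larger sample and repeated runs.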

There are also still a number of tokenization errors and inconsistencies.

@debt, I've moved this to Done because we aren't going forward with it. If there are future updates, we can re-open this or start a new ticket.

Thanks for all the work on this @TJones :)