Before considering some form of Zawgyi detection and transliteration for Myanmar-language wikis, we should:
- get a sense of the frequency of Zawgyi-encoded queries
- get a sense of the accuracy of Google’s detection library on short (i.e., query-length) strings
- evaluate available transliteration tools and transliteration complexity
- possibly evaluate other detection tools that would be more convenient to implement (such as TextCat)
- evaluate detection and transliteration on non-Myanmar text, too

I've also written up more details, adapted from a previous email conversation about this, in my notes on MediaWiki.
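
As a rough sketch of how the accuracy check on query-length strings could be run: the snippet below groups labeled samples by string length and reports detector accuracy per length bucket, so that any drop-off on short strings is visible. Everything here is an assumption for illustration: `accuracy_by_length`, the `threshold` and `bucket_size` parameters, and the shape of the labeled data are hypothetical, and `detector` stands in for whatever tool is being evaluated (it is assumed to return a score between 0 and 1, where higher means "more likely Zawgyi").

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def accuracy_by_length(
    samples: Iterable[Tuple[str, bool]],
    detector: Callable[[str], float],
    threshold: float = 0.5,
    bucket_size: int = 10,
) -> Dict[int, float]:
    """Return detector accuracy per length bucket.

    samples:   (text, is_zawgyi) pairs with known labels (hypothetical data).
    detector:  any callable returning a Zawgyi score in [0, 1] (assumed API).
    threshold: scores at or above this count as a "Zawgyi" prediction.
    Buckets are keyed by their lower bound, e.g. 0 covers lengths 0-9.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, is_zawgyi in samples:
        # Length is counted in code points, not grapheme clusters.
        bucket = (len(text) // bucket_size) * bucket_size
        predicted = detector(text) >= threshold
        total[bucket] += 1
        if predicted == is_zawgyi:
            correct[bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in sorted(total)}
```

The detector is passed in as a plain callable so that the same harness could be pointed at Google's library, TextCat, or anything else under consideration without changing the evaluation code. The same labeled set could include non-Myanmar text (labeled as not Zawgyi) to see how often it is misdetected.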