Could the Wikidata frontend display a warning when users enter strings in Zawgyi encoding?
Background: In Myanmar, the majority of users use legacy keyboards that don’t produce proper Unicode. Instead, they produce a pseudo-Unicode encoding that is almost like Unicode; only the Myanmar codepoints have non-standard semantics. Web browsers transmit such strings to web servers as structurally valid Unicode text, typically in UTF-8 encoding. On computer systems that implement Unicode according to spec (i.e., almost all computer systems outside Myanmar), Zawgyi-encoded text appears as illegible garbage. This is a notorious problem for all major web services, and sadly it won’t go away anytime soon (complicated story). The Unicode Myanmar FAQ recommends catching Zawgyi as early as possible and storing all text in proper Unicode inside the backend database. To my knowledge (which might be wrong), the Burmese Wikipedia has a group of users who manually look for Zawgyi and correct it in Wikipedia articles.
For Wikidata, my proposal would be to detect and discourage Zawgyi in the Wikidata user interface. When a user enters a string, Wikidata would run a Zawgyi detector. If the text has a high Zawgyi likelihood, the UI would then display a warning symbol, perhaps similar to constraint violations. Unfortunately, Zawgyi can’t be 100% reliably detected, especially not on very short strings. So users should still be able to enter a string that gets flagged. But a warning would help; users can then switch to a proper Unicode keyboard.
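To make the "warn, but don't block" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `check_input` and `InputCheck`, the threshold value, and the warning text are made up, and the detector is passed in as a plain function that returns a Zawgyi likelihood between 0 and 1. It is not how Wikidata works today.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cutoff; a real deployment would have to tune this value.
ZAWGYI_WARNING_THRESHOLD = 0.95


@dataclass
class InputCheck:
    accepted: bool          # always True here: a flag never blocks saving
    warning: Optional[str]  # message for the UI to show, or None


def check_input(text: str, zawgyi_probability: Callable[[str], float]) -> InputCheck:
    """Flag text that is likely Zawgyi, but still accept it.

    `zawgyi_probability` stands in for whatever detector is used; it is
    assumed to return a likelihood in the range 0.0 to 1.0.
    """
    score = zawgyi_probability(text)
    if score >= ZAWGYI_WARNING_THRESHOLD:
        return InputCheck(
            accepted=True,
            warning="This text looks like Zawgyi encoding. "
                    "Consider switching to a Unicode keyboard.",
        )
    return InputCheck(accepted=True, warning=None)
```

Since detection is unreliable on short strings, the result is only ever a warning; `accepted` stays true in both branches so a false positive never prevents a save.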
Zawgyi detection would only need to run on strings that contain characters from the Unicode Myanmar block (U+1000 to U+109F). This check is very fast, so the latency impact would be effectively zero for most users.
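A rough sketch of that pre-check in Python, to show how cheap it is. The function name `contains_myanmar` is made up for this example, and I'm assuming that only the main Myanmar block matters here; the Myanmar Extended blocks are deliberately left out.

```python
# Main Unicode Myanmar block. Assumption: the extended blocks are not
# relevant for deciding whether to run the Zawgyi detector.
MYANMAR_BLOCK_START = 0x1000
MYANMAR_BLOCK_END = 0x109F


def contains_myanmar(text: str) -> bool:
    """Return True if any character lies in the Myanmar block."""
    return any(
        MYANMAR_BLOCK_START <= ord(ch) <= MYANMAR_BLOCK_END for ch in text
    )
```

This is a single pass over the string that stops at the first Myanmar character. Strings without any Myanmar characters would skip the detector entirely.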