
Error with Vietnamese input when using search bar in new Vector (due to local AVIM gadget)
Closed, Resolved · Public · Bug Report

Description

Hi,

My community just reported an error with the Vietnamese input when we type in the search bar in the new Vector skin.

Vietnamese Wikipedia has an input method for the language called "AVIM", and it appears to trigger an error in the search bar: when you type a word, it is automatically "corrected", which is of course not desirable. Note that not every word you type is auto-corrected, just some of them.

I think the cause is an incompatibility between AVIM and the new Vector, because when I tried typing in English or using another Vietnamese input method (after turning AVIM off), everything was fine. It also works fine with Legacy Vector.

I've made 2 short videos demonstrating this error and attached them here, please take a look. Thanks!


Event Timeline

Hi, does this also happen in the search bar in legacy/old Vector? What are exact steps to reproduce, expected outcome, and actual outcome? Thanks!

Hi @Aklapper, as I mentioned, it's still working fine with the old Vector.

  • Steps to reproduce:
    • (1) switch on the new Vector;
    • (2) turn on the Vietnamese input method called AVIM that Vietnamese Wikipedia provides (in the right sidebar, see picture below). You can tick either Telex or VNI (or auto); each of them has the error. To turn AVIM on, you have to activate it in Preferences/Gadgets by ticking the AVIM box and the box right below it.
    • (3) type something with tone marks (like "thần tượng" instead of "than tuong"). It can be quite difficult to type like this, so please take a quick look at this wikihow page to grasp the idea.
      Capture.PNG (190×152 px, 6 KB)
  • Expected outcome: when you type "thần tượng" in the search bar, the text should appear as "thần tượng", whether or not we have articles for it
  • Actual outcome: the text gets auto-corrected to "thần tượ" instead of "thần tượng", or to "ngoại bât" instead of "ngoại bất" (notice that the tone mark is gone)
PPham updated the task description.

In short, type these keys in order:

  • T h a a n f <space> t u o n g w j (Telex)

or:

  • T h a n 6 2 <space> t u o n g 7 5 (VNI)

If I understand correctly there is a gadget that is incompatible with the new search version?
I can't replicate this with all gadgets disabled: https://vi.wikipedia.org/wiki/FC_Barcelona?safemode=1

A fix will be needed in AVIM. Who maintains the AVIM gadget?

Screen Shot 2021-07-19 at 11.17.33 AM.png (134×452 px, 13 KB)

Aklapper renamed this task from Error with Vietnamese input when using search bar in the new Vector to Error with Vietnamese input when using search bar in new Vector (due to local AVIM gadget). Jul 19 2021, 10:05 PM

@Jdlrobson Literally, no one. The developers of this gadget retired a long time ago.

Looking at https://vi.wikipedia.org/wiki/Đặc_biệt:GadgetUsage , this seems to be a default gadget (so we have no idea how many people use it?).
https://vi.wikipedia.org/wiki/MediaWiki:Gadget-AVIM.js and https://vi.wikipedia.org/wiki/MediaWiki:Gadget-AVIM_portlet.js imply it is some 13 year old "Vietnamese Input Method". It would help a lot to know/understand if modern operating systems have solved some/most/all of what this gadget offers, and potentially disable it?

Most modern operating systems support common Vietnamese input methods such as Telex and VNI.

But most people still use software like UniKey, EVKey, Labankey... because the operating systems' default input methods lack many convenient features.

As for AVIM, I think this gadget is too outdated, with many errors and no maintenance activity. Thus, it is best to disable it.

But if possible, I hope that WMF will integrate Vietnamese input methods (Telex and VNI) into the Universal Language Selector extension, as an alternative to AVIM.

In T286863#7225232, @MikePlantilla wrote:

But if possible, I hope that WMF will integrate Vietnamese input methods (Telex and VNI) into the Universal Language Selector extension, as an alternative to AVIM.

That would be T65465: Add Vietnamese (vi) input method. I assume patches are welcome (as this does not depend on WMF doing work but anyone volunteering).

Thanks, I got it. I will add Vietnamese input methods into jquery.ime. But, I don't know Gerrit, I only know GitHub: https://github.com/wikimedia/jquery.ime. Can I create a pull request on this GitHub repository?

I stand corrected; that repository seems to be hosted on GitHub instead of Gerrit (sigh), indeed.

Looking at https://vi.wikipedia.org/wiki/Đặc_biệt:GadgetUsage , this seems to be a default gadget (so we have no idea how many people use it?).
https://vi.wikipedia.org/wiki/MediaWiki:Gadget-AVIM.js and https://vi.wikipedia.org/wiki/MediaWiki:Gadget-AVIM_portlet.js imply it is some 13 year old "Vietnamese Input Method". It would help a lot to know/understand if modern operating systems have solved some/most/all of what this gadget offers, and potentially disable it?

Every operating system nominally comes with a Vietnamese input method, and some come with decent ones, but all system-level solutions suffer from some common usability problems, even third-party packages. Operating systems provide APIs for keyboard layouts or true IMEs. A keyboard layout doesn’t support typing tone marks at the end of the word, while an IME requires the user to explicitly “commit” a composition, so one missed or mistyped tone mark means you have to retype the whole word. The only way to get around these issues is to install a browser extension or rely on the website to provide its own IME; both approaches have access to the DOM in order to provide a better experience.

The AVIM gadget comes from a time when Vietnamese-language websites were expected to provide their own IMEs because operating system support was much worse, browser extensions didn’t exist yet, and people in Vietnam used Internet cafés that disallowed installing native IMEs or extensions. I think we could get away with disabling AVIM by default, but I would caution against uninstalling it completely. So we need to find a way to make it work when it’s enabled.

In T286863#7225232, @MikePlantilla wrote:

As for AVIM, I think this gadget is too outdated, with many errors and no maintenance activity. Thus, it is best to disable it.

But if possible, I hope that WMF will integrate Vietnamese input methods (Telex and VNI) into the Universal Language Selector extension, as an alternative to AVIM.

AVIM is older than 13 years, maybe old enough to drive. The fact that it doesn’t see much change doesn’t necessarily mean it’s abandoned, just that it’s mature for what it’s designed to do and not flexible enough to accommodate much in the way of enhancements.

Back in 2011, I split AVIM into two gadgets: AVIM is a standalone IME that’s largely the same as the third-party script we adopted, while AVIM portlet is what integrates the IME into various MediaWiki skins. At the time, a number of websites were known to directly hotlink the AVIM gadget for their own IME needs, but AVIM portlet only uses the core business logic in it.

I split the gadget in two with the expectation that the official third-party AVIM script would change significantly over time and that the portlet code would eventually need to be overhauled as MediaWiki skins evolved. Evidently the latter happened first. However, I have occasionally made changes to both gadgets, based on a Firefox extension that I maintain on the side. For example, I fixed AVIM to work with VisualEditor in 2013 (which I think was before jquery.ime became compatible with VisualEditor).

I would like to see Vietnamese integrated into the ULS UI at some point. However, as I looked into T65465, I found jquery.ime to be extremely constraining since it relies on a declarative list of patterns. Vietnamese requires true input methods, not simpler keyboard layouts like, say, some Indic languages. AVIM has some of the thorniest JavaScript code I’ve ever come across, right up there with obfuscated code, but it works much better than other Vietnamese IME scripts like Mudim that took the same approach as jquery.ime. So if jquery.ime were to gain Vietnamese support, it would basically be bolted onto the side and share nothing in common with the other languages. I think it would be more practical to rewrite the AVIM portlet gadget to add AVIM’s input methods to the ULS UI and ignore jquery.ime for now.
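To make the constraint above concrete, here is a minimal TypeScript sketch of a declarative pattern table. The rule format, the two rules, and the `typeKeys` helper are hypothetical and are not jquery.ime's actual API; the sketch only assumes that each rule is matched against the text immediately before the cursor.

```typescript
// Hypothetical rule format: [pattern anchored at the cursor, replacement].
type Rule = [RegExp, string];

const rules: Rule[] = [
  [/aa$/, "\u00e2"],      // aa -> â (Telex circumflex)
  [/\u00e2f$/, "\u1ea7"], // âf -> ầ (Telex grave tone typed right after the vowel)
];

// Feeds keys one at a time and applies the first matching rule after each key.
function typeKeys(keys: string): string {
  let text = "";
  for (const key of keys.split("")) {
    text += key;
    for (const [pattern, replacement] of rules) {
      if (pattern.test(text)) {
        text = text.replace(pattern, replacement);
        break;
      }
    }
  }
  return text;
}

console.log(typeKeys("thaaf"));  // tone key typed right after the vowel
console.log(typeKeys("thaanf")); // tone key typed at the end of the word
```

With these two rules the first call produces "thầ", while the second leaves the f as a literal letter ("thânf") because no pattern reaches back across the n. Supporting a tone key at the end of the word would need additional patterns or logic beyond this simple table.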

Are the errors specific to the new Vector skin, or are there other errors? Having already fixed several site gadgets to work with the new Vector skin, I’d imagine it’s just one or two minor issues causing a slew of cascading errors. Even if migrating to jquery.ime is the right long-term solution, we shouldn’t toss out the whole functionality just because of a bug. I noticed for instance that the AVIM Firefox extension, which is based on this gadget, has no problem with the new Vector search bar.

This change fixes the errors reported above by synthesizing an input event, based on this change in an old version of the AVIM Firefox extension. I tested it in Firefox 56, Firefox 91, Chrome 94, and Safari 13.1 while logged in. (I don’t know of a way to test the new Vector improvements while logged out.)

There was a race condition in the typeahead search functionality. If I attempted to type “Việt” using Vie^.t in VIQR at a normal typing speed, the text field ended up saying “Vie”, matching the results below. What happened was that typing Vie kicked off a search request, but I typed ^.t before the search results came in asynchronously. Once the search results became available, typeahead search replaced the contents of the text box to match the query that originally kicked off the search request.
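The race described above can be reproduced in a small simulation. This TypeScript sketch is not Vector's typeahead code: `SearchBox`, `makeTypeahead`, and the `dropStale` guard are hypothetical names, and responses are delivered by hand instead of over the network so that the ordering is explicit.

```typescript
interface SearchBox {
  value: string;
  suggestions: string[];
}

// Simulated typeahead: requests are queued and "arrive" when deliverNext() is called.
function makeTypeahead(box: SearchBox, dropStale: boolean) {
  const pending: Array<() => void> = [];
  return {
    // Called when the widget notices new text and starts a request.
    startSearch(): void {
      const query = box.value;
      pending.push(() => {
        // Guard (an assumption, not the actual fix): ignore responses for old text.
        if (dropStale && box.value !== query) {
          return;
        }
        box.value = query; // the behavior described above: reset the box to the query
        box.suggestions = ["result for " + query];
      });
    },
    // Stand-in for the network delivering the oldest outstanding response.
    deliverNext(): void {
      const deliver = pending.shift();
      if (deliver) {
        deliver();
      }
    },
  };
}

function simulate(dropStale: boolean): string {
  const box: SearchBox = { value: "", suggestions: [] };
  const typeahead = makeTypeahead(box, dropStale);
  box.value = "Vie";        // user types V, i, e
  typeahead.startSearch();  // request for "Vie" goes out
  box.value = "Vi\u1ec7t";  // the IME turns the text into "Việt" before the response
  typeahead.deliverNext();  // late response for "Vie" arrives
  return box.value;
}

console.log(simulate(false));
console.log(simulate(true));
```

Without the guard the box ends up as "Vie", as in the report; with the guard the late response is dropped and the box keeps "Việt". Dropping stale responses is only one possible mitigation and is shown here purely to illustrate the ordering problem.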

Typeahead search normally debounces search requests when characters are entered into the text field. However, in AVIM or any Vietnamese IME that supports “free typing” (gõ tự do), characters such as ^ and . are dead keys, as are letters such as n that appear after diacritics. AVIM intercepts these characters and uses preventDefault to keep any other site script from interpreting the keystroke before AVIM has a chance to manipulate the text field’s value.
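The effect of intercepting keystrokes can be modeled without a browser. In this TypeScript sketch, `FakeField`, `imeKeystroke`, and `lastQuerySeen` are hypothetical stand-ins rather than AVIM or MediaWiki code. The model assumes that a cancelled keystroke produces no notification to other scripts unless the IME sends one itself, which in a real page would roughly correspond to dispatching an `input` event on the field.

```typescript
type Listener = (value: string) => void;

// Minimal stand-in for a text field; not a real DOM element.
class FakeField {
  value = "";
  private listeners: Listener[] = [];

  onInput(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Stand-in for an "input" event reaching the site's scripts.
  notify(): void {
    for (const listener of this.listeners) {
      listener(this.value);
    }
  }

  // Default behavior for an unhandled key: insert it, then notify.
  typeNatively(key: string): void {
    this.value += key;
    this.notify();
  }
}

// Hypothetical IME step: the key's default action is cancelled and the IME
// writes the composed text itself, so nothing is notified unless asked.
function imeKeystroke(field: FakeField, composed: string, synthesize: boolean): void {
  field.value = composed;
  if (synthesize) {
    field.notify();
  }
}

// Returns the last text the search widget was told about.
function lastQuerySeen(synthesize: boolean): string {
  const field = new FakeField();
  let seen = "";
  field.onInput((value) => { seen = value; });
  for (const key of ["V", "i", "e"]) {
    field.typeNatively(key);
  }
  imeKeystroke(field, "Vi\u00ea", synthesize);  // ^ composes ê
  imeKeystroke(field, "Vi\u1ec7", synthesize);  // . composes ệ
  imeKeystroke(field, "Vi\u1ec7t", synthesize); // t is intercepted as well
  return seen;
}

console.log(lastQuerySeen(false));
console.log(lastQuerySeen(true));
```

When nothing is synthesized, the listener last saw "Vie" even though the field holds "Việt"; when each IME rewrite is followed by a notification, the listener sees "Việt".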

A lot of reactive Web applications have this problem, so I’m not surprised the Vue.js typeahead search does too. Back in 2016, the AVIM Firefox extension got lots of reports of a similar issue when Facebook rolled out a React-based version of Messenger, since worked around and later fixed more robustly using extension-specific APIs. The issue still affects popular system-level Vietnamese IMEs, which don’t have access to those APIs or the DOM.

@PPham, can you verify that the issue is fixed? Thanks!

PPham claimed this task.

@mxn Thanks, it seems that it's back to normal now even if I turn AVIM on! I'll mark this task as resolved.

Thanks @mxn for fixing AVIM.

I actually made a Vietnamese IME based on jquery.ime. It uses a list of special regex patterns (unlike the patterns others use) and some special functions to support the “free typing” feature, along with other features like "spelling correction". It supports Telex, VNI, and VIQR in only 400 lines of code.

I will show a demo at the Wikipedia:Thảo luận (Village pump) page when the software is complete. I hope you can review it.

Thanks @MikePlantilla, that looks promising indeed! I look forward to taking the implementation for a spin. Let’s continue the conversation over in T65465.