
Detect large IME insertions as pastes
Open, Needs Triage, Public

Description

Some mobile IMEs offer a paste button above the keyboard which inserts text into VE without firing any paste events.

We should detect insertions above a certain threshold of words or characters (to exclude IMEs that insert whole words while typing, e.g. by swiping or autocomplete) and annotate those as pastes.
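A minimal sketch of the proposed check, assuming the inserted text is already available as a string. The function names and the threshold values below are hypothetical placeholders, not taken from the patch. Characters are counted as Unicode code points, and words are counted by splitting on whitespace. That word count is naive: scripts written without spaces (e.g. Chinese) come out as a single "word", so the character threshold has to catch those cases.

```typescript
// Hypothetical thresholds; the values actually used by VisualEditor are not stated here.
interface PasteThresholds {
  minChars: number;
  minWords: number;
}

const DEFAULT_THRESHOLDS: PasteThresholds = { minChars: 100, minWords: 15 };

// Count Unicode code points rather than UTF-16 code units,
// so characters outside the BMP (e.g. emoji) count once.
function countChars(text: string): number {
  return Array.from(text).length;
}

// Naive whitespace-based word count (see caveat above about unspaced scripts).
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Treat a single insertion as a likely IME paste if it reaches either threshold.
// A single swiped or autocompleted word stays well below both.
function isLikelyImePaste(
  insertedText: string,
  thresholds: PasteThresholds = DEFAULT_THRESHOLDS
): boolean {
  return (
    countChars(insertedText) >= thresholds.minChars ||
    countWords(insertedText) >= thresholds.minWords
  );
}

console.log(isLikelyImePaste("autocomplete")); // false
```

Using "either threshold" means long unspaced text is still flagged by character count even though its word count stays at 1.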

Event Timeline

Change #1194907 had a related patch set uploaded (by Esanders; author: Esanders):

[VisualEditor/VisualEditor@master] Treat detected insertions above thresholds as IME pastes

https://gerrit.wikimedia.org/r/1194907

Hmm, this is an intriguing idea. However, some IMEs "paste" a whole sentence at once. Below is an example with a Chinese text input method. Many voice input methods do this too, I think. Therefore the thresholds would need to be high.

Could we exclude observations that happen during the composition event cycle?

Unfortunately, composition events are not at all consistent across IMEs (some don't even issue them at all). I think it would be better to go with a higher limit, say 250 characters. Coincidentally, that would also help to exclude other pastes that are likely to be short and reasonable, e.g. the official long name of an institution, where copyright law or Wikipedia policy is unlikely to be relevant.
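To make the suggested 250-character limit concrete, here is a small sketch of a character-only rule. The constant and function names are hypothetical and not part of the patch. It counts Unicode code points and checks that a typical long institution name falls far below the limit.

```typescript
// Hypothetical character-only limit, following the 250-character suggestion above.
const IME_PASTE_MIN_CHARS = 250;

// Returns true if the insertion has at least `limit` code points.
// Array.from iterates by code point, so surrogate pairs are not double-counted.
function exceedsPasteLimit(insertedText: string, limit: number = IME_PASTE_MIN_CHARS): boolean {
  return Array.from(insertedText).length >= limit;
}

const institutionName = "Massachusetts Institute of Technology";
console.log(Array.from(institutionName).length); // 37
console.log(exceedsPasteLimit(institutionName)); // false
```

A single limit like this is simpler than tracking composition events, at the cost of missing shorter pastes.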

Here's an example with voice input running well over 100 characters in a single "paste". I think for a mobile user writing a paragraph from scratch, that would be pretty feasible.

We should decide whether or not the issues mentioned amount to a showstopper.

That seems like a pretty rare edge case. Voice input is probably rare, and using voice input to dictate multiple sentences is rarer still. Other voice IMEs and languages will insert the text in smaller chunks as well.

ppelberg subscribed.

Would it be accurate for me to think the issue here is that someone using an IME could, in essence, bypass Paste Check because the paste button within it does not emit the kind of paste event that Paste Check depends on?

If so, how might we estimate how prevalent this sort of thing is?

In the meantime, I'm going to move this to the backlog.