Accept CAPTCHA responses with diacritics removed
Open, Medium, Public, Feature

Assigned To

None

Authored By

	mxn
	Mar 28 2014, 8:36 AM

Description

With a fix for bug 5309, such as the one discussed at https://gerrit.wikimedia.org/r/121255/, it’s entirely possible that a user might get a CAPTCHA with illegible diacritics. Diacritics in Latin alphabets can look identical to one another when distorted, for example i í ì ỉ, or ó ơ.

For better usability, ConfirmEdit should display a CAPTCHA containing diacritics but require the user to enter the characters without diacritics. There’s a third-party module called Unidecode that does a decent job of accent folding.
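
As a rough illustration of accent folding, the sketch below uses only Python's standard `unicodedata` module instead of Unidecode: it decomposes each character (NFD) and drops the combining marks. The helper name `fold_diacritics` is made up for this example and is not part of ConfirmEdit. Letters without a canonical decomposition, such as đ, pass through unchanged, which is one reason a dedicated library goes further.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks from text (hypothetical helper, not ConfirmEdit code)."""
    # NFD splits a precomposed letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # The look-alike groups from the description.
    print(fold_diacritics("i í ì ỉ"))
    print(fold_diacritics("ó ơ"))
```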

One tradeoff would be that such CAPTCHAs might be easier for a bot to crack. There’s also the issue that a character like Ê might be considered a base letter in one language (as in Vietnamese) but a letter with a diacritic in another (Portuguese).
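
One way to picture the language dependence is a folding function that takes the source language into account. The sketch below assumes, purely for illustration, that for Vietnamese only the five tone marks are removed (so Ế folds to Ê), while for any other language every combining mark is removed (so Ê folds to E). The name `fold_for_language` and the set of tone marks are assumptions of this example, not an agreed design.

```python
import unicodedata

# Combining marks that encode Vietnamese tones: grave, acute, tilde,
# hook above, dot below. Circumflex, breve and horn are left alone because
# they form separate base letters (Ê, Ă, Ơ, ...). Illustrative assumption.
VIETNAMESE_TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def fold_for_language(text: str, lang: str) -> str:
    """Fold diacritics, keeping letters that count as base letters in lang."""
    decomposed = unicodedata.normalize("NFD", text)
    if lang == "vi":
        kept = [ch for ch in decomposed if ch not in VIETNAMESE_TONE_MARKS]
    else:
        kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    # Recompose so that E + circumflex becomes the single letter Ê again.
    return unicodedata.normalize("NFC", "".join(kept))


if __name__ == "__main__":
    print(fold_for_language("Ế", "vi"))  # tone removed, circumflex kept
    print(fold_for_language("Ê", "pt"))  # circumflex treated as a diacritic
```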

Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63217
https://github.com/mitsuhiko/babel/issues/89

Details

Reference: bz63216

Related Objects

Status	Subtype	Assigned	Task
Open	Feature	None	T65216 Accept CAPTCHA responses with diacritics removed
Open		None	T7309 Localize captcha images
Declined		None	T34695 Implement, Review and Deploy Wikicaptcha
Declined		jayvdb	T94186 Security review of Wikicaptcha

Event Timeline

• bzimport raised the priority of this task to Medium. Nov 22 2014, 2:53 AM

• bzimport added projects: ConfirmEdit (CAPTCHA extension), I18n.

• bzimport set Reference to bz63216.

• bzimport added a subscriber: Unknown Object (MLST).

mxn created this task. Mar 28 2014, 8:36 AM

I'm not sure about the "only" part: for usability it's better if the system is completely agnostic to details, or I may correctly enter all diacritics and have my solution rejected for no reason.

When implementing this we're probably going to use some standard Unicode solution for case folding and diacritics/accent folding.
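
A minimal sketch of what "agnostic" could mean in practice: fold both the expected word and the user's response the same way before comparing, so that a response with all diacritics, with none, or in a different case is accepted alike. It uses Python's built-in `str.casefold` and `unicodedata`. The function names are hypothetical and this is not ConfirmEdit's actual comparison logic.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Case-fold and strip combining marks (hypothetical helper)."""
    # casefold() is the Unicode-aware, more aggressive form of lower().
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def answers_match(expected: str, response: str) -> bool:
    """Compare two answers after folding both sides identically."""
    return normalize_answer(expected) == normalize_answer(response.strip())


if __name__ == "__main__":
    for response in ("Thành", "thanh", "THÀNH", "thanb"):
        print(response, answers_match("Thành", response))
```

Because both sides go through the same folding, a user who does enter every diacritic correctly is never rejected for it.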

Yes, this is absolutely necessary. Not only might the diacritics be illegible, but some users may not have a keyboard that can enter them.

I am not sure how to implement the folding, and it may even be language-dependent. For example, users may enter 'ö' as 'o' or as 'oe', or 'đ' as 'đ', 'ð', 'd' or 'dj'. A possibility is to simply avoid words with diacritics, which should be possible for most languages.

In the future, when non-Latin captchas are implemented, the same should apply to alphabets (for example, it should be possible to enter a Cyrillic captcha in the Latin alphabet).

In T65216#653257, @Nikola_Smolenski wrote:

A possibility is to simply avoid words with diacritics, which should be possible for most languages.

That probably will help with Western European languages. Unfortunately it’d also exclude the vast majority of Vietnamese, leaving a word list too short to serve as an effective CAPTCHA word list.

Libraries like ICU have diacritic-folding facilities that should make it possible to accept o for ở and both o and oe for ö, as long as the source language is known (which it is in this case). Other scripts would be supported to some extent, though transliteration is a far messier problem than diacritic folding.
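
The idea of accepting several spellings once the source language is known can be sketched without ICU. The example below does not use ICU's API: it combines plain mark stripping from Python's standard library with a small hand-written table of extra per-language spellings. The table `EXTRA_SPELLINGS`, its contents, and the function names are all assumptions made for illustration. A real implementation would presumably take this data from ICU or CLDR.

```python
import unicodedata
from itertools import product

# Hypothetical per-language spellings accepted on top of plain folding.
EXTRA_SPELLINGS = {
    "de": {"ö": ["oe"], "ä": ["ae"], "ü": ["ue"]},
    "vi": {"đ": ["d"]},
}


def strip_marks(ch: str) -> str:
    """Remove combining marks from a single character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accepted_answers(word: str, lang: str) -> set[str]:
    """Enumerate every spelling of word that would be accepted for lang."""
    extras = EXTRA_SPELLINGS.get(lang, {})
    options = []
    for ch in unicodedata.normalize("NFC", word.casefold()):
        # Each letter may be typed as-is, with marks stripped, or via an
        # extra language-specific spelling.
        choices = {ch, strip_marks(ch)}
        choices.update(extras.get(ch, []))
        options.append(sorted(choices))
    return {"".join(parts) for parts in product(*options)}


def is_accepted(word: str, response: str, lang: str) -> bool:
    """Check a user's response against the accepted spellings."""
    typed = unicodedata.normalize("NFC", response.strip().casefold())
    return typed in accepted_answers(word, lang)


if __name__ == "__main__":
    print(sorted(accepted_answers("schön", "de")))
    print(is_accepted("ở", "o", "vi"))
```

Enumerating every combination grows quickly with word length, so this only suits short CAPTCHA words. It also shows the tradeoff from the description: each extra accepted spelling gives a bot one more valid answer.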

Restricted Application added a subscriber: Florian. Aug 28 2016, 9:59 PM

In T65216#2589692, @mxn wrote:

Libraries like ICU have diacritic-folding facilities that should make it possible to accept o for ở and both o and oe for ö

Yes please. (Can we update the task summary? That "only" makes me shudder.)

mxn mentioned this in T7309: Localize captcha images. Feb 13 2017, 4:21 AM

Platonides renamed this task from "Only accept CAPTCHA responses with diacritics removed" to "Accept CAPTCHA responses with diacritics removed". Apr 18 2017, 12:21 AM

Liuxinyu970226 mentioned this in T65217: Augment our AntiSpoof normalization data with Unicode/CLDR data. Jun 22 2017, 10:48 PM

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:23 PM

Aklapper removed a subscriber: • wikibugs-l-list.
