
Generate new Captcha word list for prod
Open, MediumPublic

Description

Following on from T204611: Generate beta captchas... I was thinking maybe it's time to consider adding more words to the word list we use? Maybe?

Event Timeline

I kind of think maybe we should just go with random letters. I don't think the combining-two-words thing helps users very much, since the words are usually weird enough that the result isn't identifiable as a word. But it does probably help attackers quite a bit.

[a-zA-Z0-9]? How long? 10 characters?

Hmm, http://www.123seminarsonly.com/Seminar-Reports/008/47584359-captcha.pdf has some advice about eliminating characters that look alike (e.g. 1 and l)

I have no idea how long. I would feel weird about anything less than 6, but beyond that, it feels like picking numbers out of a hat. I guess 10 works. Maybe 8? I don't really know.
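To put rough numbers on the length question, here is a minimal Python sketch of generating random captcha text from [a-zA-Z0-9] with look-alike characters removed. The `AMBIGUOUS` set and the names `random_captcha_text` and `keyspace_bits` are assumptions made for illustration, not anything taken from ConfirmEdit. The bit count only measures resistance to blind guessing; it says nothing about how well OCR can read the rendered image.

```python
import math
import secrets
import string

# Assumed set of look-alike characters to drop (e.g. 1 and l).
# The exact list is a guess for illustration only.
AMBIGUOUS = set("0OoIl1")

# Letters and digits minus the ambiguous ones.
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS
)


def random_captcha_text(length: int = 8) -> str:
    """Return `length` characters drawn uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def keyspace_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Bits of entropy of a uniformly random string of the given length."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    # Compare the lengths floated in the discussion: 6, 8 and 10.
    for n in (6, 8, 10):
        print(n, random_captcha_text(n), round(keyspace_bits(n), 1))
```

Each extra character adds the same number of bits, so the choice between 6, 8 and 10 is mostly a trade-off between user effort and guessing resistance.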

A bit out of scope for this task, but have we ever considered creating alternative captchas (math, image classifying, etc?)

sbassett triaged this task as Medium priority. Dec 6 2018, 3:21 PM

We have implementations, but they often come with Google-type dependencies...

A bit out of scope for this task, but have we ever considered creating alternative captchas (math, image classifying, etc?)

I don't know about math, but I do know image classifying has been discussed before. ConfirmEdit has the ability to do math captchas already, BTW.

We've even discussed stuff like using machine learning of mouse click timing and such (T158909) or reCAPTCHA-like micro edits (T34695). There are plenty of other tasks in ConfirmEdit (CAPTCHA extension) that you might look through too.

I suspect a math captcha is probably about the same as having an "I am not a bot" checkbox (a regular one, not the fancy Google kind): computers are good at math, and parsing the problem isn't likely to be hard. Not that our current captcha is all that great; T141490 says it can be read by off-the-shelf OCR software from 2014.
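As a rough illustration of how little work parsing such a problem takes, here is a minimal Python sketch. It assumes the challenge is available as plain text in a hypothetical "<int> <op> <int>" format; that format and the `solve` function are made up for this example and do not describe how ConfirmEdit presents its math captchas. If the challenge were rendered as an image, an OCR step would be needed first.

```python
import operator
import re

# Hypothetical challenge format: "<int> <op> <int>", e.g. "7 + 5".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
PATTERN = re.compile(r"\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*")


def solve(challenge: str) -> int:
    """Parse a two-operand arithmetic challenge and return its answer."""
    match = PATTERN.fullmatch(challenge)
    if match is None:
        raise ValueError(f"unrecognised challenge: {challenge!r}")
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


if __name__ == "__main__":
    print(solve("7 + 5"))
```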

Image classifying captchas need a large corpus of classified images. Since MediaWiki is open source, an attacker could probably just download our corpus.

Also amusingly relevant: https://xkcd.com/1897/ and https://xkcd.com/810/ ;)

Interesting - thanks for the context and history, @Anomie.