
Deploy improved FancyCaptcha
Open, LowestPublic

Assigned To
None
Authored By
tstarling
Jul 27 2016, 10:13 PM
Referenced Files
F8617553: Screen Shot 2017-07-04 at 16.23.51.png
Jul 4 2017, 3:39 PM
F8604324: threshold.png
Jul 3 2017, 10:49 AM
F4313223: new.png
Jul 27 2016, 10:13 PM
F4313219: old.png
Jul 27 2016, 10:13 PM
Tokens: "Like" token, awarded by MarcoAurelio; "Like" token, awarded by Bawolff.

Description

In 2014, I investigated FancyCaptcha's resistance to OCR. I found that it had essentially no resistance: it could be trivially broken by open-source software without any image preprocessing or OCR engine configuration.

In two changes, I then implemented tweaks which were confirmed to defeat such naïve OCR attacks. Specifically, I adjusted the tunable parameters to increase distortion of the baseline, and added low-spatial-frequency noise and a gradient to defeat thresholding.
These changes were never deployed to WMF production. I propose now doing so.

Here is some representative output:

Old: old.png (929×235 px, 74 KB)
New: new.png (1×237 px, 173 KB)

The procedure to regenerate the captcha image set is documented at https://wikitech.wikimedia.org/wiki/Generating_CAPTCHAs
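For illustration only (an editor's sketch in NumPy, not the actual captcha.py implementation), the anti-thresholding technique described above amounts to overlaying a fixed intensity gradient and coarse, low-spatial-frequency noise on the greyscale image:

```python
import numpy as np

def add_gradient_and_noise(img, seed=0):
    """Overlay a fixed horizontal gradient and low-spatial-frequency noise
    on a greyscale image (values 0-255). Because the gradient spans a large
    part of the intensity range, no single global threshold cleanly
    separates glyph pixels from background across the whole image."""
    h, w = img.shape
    # Fixed left-to-right gradient covering roughly half the value range.
    gradient = np.linspace(0, 128, w)[np.newaxis, :]
    # Low-frequency noise: a coarse random grid upscaled by block repetition.
    rng = np.random.default_rng(seed)
    coarse = rng.uniform(-40, 40, (h // 16 + 1, w // 16 + 1))
    noise = np.kron(coarse, np.ones((16, 16)))[:h, :w]
    return np.clip(img + gradient + noise, 0, 255).astype(np.uint8)
```

The exact values here (gradient span, noise amplitude and scale) are assumptions for the sketch; the real tunables live in captcha.py.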

Event Timeline


Is it still necessary to specify --blacklist, now that one is provided with the extension and used by default? Is the word list still located at /home/aaron/words?

Yes, the blacklist that the WMF uses has more words in it.

They are in that location for now, but ops have put them into the private puppet repo, and as part of T150029 they will be staged on disk at /etc/fancycaptcha/words and /etc/fancycaptcha/badwords

Hi guys. Today we've got a nasty bunch of spambots registering. Can we move forward with this? Thanks!

> Hi guys. Today we've got a nasty bunch of spambots registering. Can we move forward with this? Thanks!

AFAIK we're still waiting for "the community" to decide we can deploy this improved version

Not sure offhand which ticket had the discussion, or if it was onwiki or whatever

FWIW, it's easy enough to switch over, and I don't have a problem actioning it. The question is who needs to give the sign off

I've locked 200 spambots myself today. What's the best way to achieve consensus for this? RfC linked in tech news?

[ - acl* ; public task, sorry ]

I'd say if @Bawolff / Security-Team and @tstarling are okay, we could deploy it. We aren't sure whether it will make any difference, but even if it does not, that may give us a hint.

> I'd say if @Bawolff / Security-Team and @tstarling are okay, we could deploy it. We aren't sure whether it will make any difference, but even if it does not, that may give us a hint.

So in the past, there's been some disagreement over:

  • Whether community consensus (i.e. an RFC on Meta) is needed to deploy the new changes
  • What the effect would be on registration of real human users, and more importantly, whether we can effectively measure it
  • What the effect would be on spambot registration, and whether we can measure it
  • What the effect would be on the percentage of successful solves by real humans, and whether we can measure it

Maybe a short term deployment would be in order - deploy for say a week and see how that affects spam bots, see if users complain that captchas are harder to solve, and after one week re-evaluate.

In any case, I don't really have strong opinions and @Reedy is the member of Security-Team who knows the most about captcha stuff, so I defer to him.

The stats are crappy, see T157735 and some vague numbers in T152219

Since today we've been literally flooded again with spambots, I suggest that we deploy them for one or two weeks and see if anything bad happens. That will also allow us to gather some stats/numbers and see whether they have any effect on counter-spam activities. Once deployed, I suggest we inform User-notice so people are aware. Does that sound right? Could we have it deployed in today's SWAT or earlier? Regards.

Waiting for users to complain is not a good strategy - power users can deal with it and non-power users don't complain, just leave quietly. Once I accidentally deployed a bug to a major product aimed at non-power users which completely broke it on enwiki in a major browser (10%+ user share) and it took a week to receive the first complaint. It will be much worse for something that specifically targets new user registrations.

Re stats, made a dashboard for convenience: https://grafana.wikimedia.org/dashboard/db/captcha-failure-rates

Is it really immune to thresholding?

threshold.png (1×237 px, 22 KB)
looks pretty readable to me after a simple threshold filter in GIMP. Untested, as I don't have any OCR software installed to test it against.
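For context, the "simple threshold filter" being described is just global binarization at a fixed cutoff. A minimal NumPy sketch (the function and the cutoff value are illustrative, not what GIMP does internally):

```python
import numpy as np

def threshold(img, cutoff=60):
    """Binarize a greyscale image (0-255): pixels darker than `cutoff`
    become black (0), everything else becomes white (255)."""
    return np.where(img < cutoff, 0, 255).astype(np.uint8)
```

An attacker typically tries a range of cutoffs and feeds each binarized image to the OCR engine, keeping the best-scoring result.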

> Is it really immune to thresholding?
>
> threshold.png (1×237 px, 22 KB)
> looks pretty readable to me after a simple threshold filter in GIMP. Untested, as I don't have any OCR software installed to test it against.

What options etc. was that? In Tim's original post...

new.png (1×237 px, 173 KB)

was proposed to be how they'd look. Which looks quite a bit different to what you've posted :)

It's probably worth noting that Tim made the changes in September 2014, so nearly 3 years ago. OCR software will have improved too in that time...

https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit/commits/master/captcha.py

> Since today we've been literally flooded again with spambots, I suggest that we deploy them for one or two weeks and see if anything bad happens. That will also allow us to gather some stats/numbers and see whether they have any effect on counter-spam activities. Once deployed, I suggest we inform User-notice so people are aware. Does that sound right? Could we have it deployed in today's SWAT or earlier? Regards.

I'd advise against running it in a SWAT window. It takes a long time to run the generation script, though it should be somewhat quicker after Florian's fix for T157734.

I could do it in the Security deploy window, as we have a longer window tonight.

@Reedy that's just the threshold set at ~60

> deploy for say a week and see how that affects spam bots

+1: a short test may tell us whether it's already useless, although it won't be able to tell us whether the spambots' OCR will be adapted in a few days more.

CAPTCHA is useful, somewhat. I still remember that not so long ago it was disabled as a test on mediawiki.org and had to be switched back within hours due to the sudden increase in spambot registration. It is better to keep it for now until a better solution is found. I know you'll wave hands at me, but maybe we should implement a system like reCAPTCHA, which seems to be working well (at least, most sites I visit have been switching from old systems to that new one, so it might indicate some success). I know our privacy policy won't allow us to use reCaptcha directly unless it is possible not to submit user data to Google, so maybe we could work on creating a MediaWiki extension or update what we currently have?

Maybe Milimetric could help gather accurate stats for the test period, so if we see strange peaks of captcha failures we can investigate them?

> Is it really immune to thresholding?
>
> threshold.png (1×237 px, 22 KB)
> looks pretty readable to me after a simple threshold filter in GIMP. Untested, as I don't have any OCR software installed to test it against.

It's not that readable to a computer. Adding the gradient took tesseract success rate from ~10% to <0.1%. I tried preprocessing the images with various thresholds before feeding them to tesseract, and it could generally only get a few of the letters, in the region of the image where the threshold happened to be optimal.

OOI, was that on a recent version of tesseract? Or was that when your changes were made in 2014?

Just thinking if we should be looking to make further tweaks before trying to use it, and similarly, if newer versions have a better success rate than ones from 3 years ago

Note that OTRS volunteers already receive messages of people who can't even read the current captcha.

> OOI, was that on a recent version of tesseract? Or was that when your changes were made in 2014?
>
> Just thinking if we should be looking to make further tweaks before trying to use it, and similarly, if newer versions have a better success rate than ones from 3 years ago

It was the packaged version of whatever Ubuntu I was using in 2014, presumably Trusty, in which case it was Tesseract 3.03. The current stable version is 3.05, just a minor update. The current git master is termed "4.0 alpha" and includes a "new neural network system based on LSTMs, with major accuracy gains", so that may indeed produce different results.

Note that the new FancyCaptcha can be broken with the old Tesseract with a few minutes' work, by just subtracting the gradient (which is fixed), or by using edge detection instead of thresholding. The point is to require those few minutes' work. There's a fair chance the spammers have already done something along those lines, I'm not guaranteeing that this will work.

I'm somewhat interested in setting up a honeypot and using it to test some new ideas against real spambots. I think distorted text is a dead end, not a long-term development direction, so I'm not really interested in doing further tweaks to FancyCaptcha.
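To make the two attack ideas above concrete, here is a hedged NumPy sketch (an editor's illustration, assuming the attacker has recovered the fixed gradient, e.g. by averaging many captcha images; the helper names are made up):

```python
import numpy as np

def subtract_fixed_gradient(img, gradient):
    """Remove a known, fixed gradient overlay before thresholding.
    The gradient can be estimated by averaging many captcha images,
    since the text varies between images but the gradient does not."""
    return np.clip(np.rint(img.astype(float) - gradient), 0, 255).astype(np.uint8)

def edge_magnitude(img):
    """Alternative attack: crude edge detection via horizontal and vertical
    pixel differences. Sharp glyph boundaries give a strong response, while
    a smooth gradient gives almost none, so the gradient is ignored."""
    img = img.astype(float)
    dx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    dy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    return np.clip(dx + dy, 0, 255).astype(np.uint8)
```

Either output can then be fed to an ordinary thresholding step, which is exactly why the gradient only buys "a few minutes' work" rather than real security.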

> Note that OTRS volunteers already receive messages of people who can't even read the current captcha.

I would be fine with just turning it off. But it seemed pretty pointless to deter only the humans, and allow the bots, we should at least be able to deter both, right?

Real people can be added to the captcha-exempt global group temporarily to let them pass the captchas until they are no longer required to solve them. I never received many complaints in that sense, though.

> Real people can be added to the captcha-exempt global group temporarily to let them pass the captchas until they are no longer required to solve them. I never received many complaints in that sense, though.

That's ridiculous. a) You can't add unregistered users to that group, which is the main case where the system requires a captcha. b) You are saying "if you wish to edit and can't read the captcha, you must find somebody to do it for you the first time and apply for an exemption", which will make users say "in that case I do not want to edit at all" (because they do not love Wikipedia at that moment).

I've received complaints from Wikimedia Czech Republic's instructor for seniors' courses. In a significant number of cases the instructor must solve the captcha instead of the trainee, because they simply can't read it. I really do not think that this will a) decrease the number of spambots or b) decrease the number of users stopped by the captcha.

I do not think it is a good idea to add a new captcha which is less readable than the current one.

From my experience, Wikipedia's captcha is one of the hardest to solve.

> I know our privacy policy won't allow us to use reCaptcha directly unless it is possible not to submit user data to Google, so maybe we could work on creating a MediaWiki extension or update what we currently have?

See T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA) .

> That's ridiculous. a) You can't add unregistered users to that group, which is the main case where the system requires a captcha. b) You are saying "if you wish to edit and can't read the captcha, you must find somebody to do it for you the first time and apply for an exemption", which will make users say "in that case I do not want to edit at all" (because they do not love Wikipedia at that moment).
>
> I've received complaints from Wikimedia Czech Republic's instructor for seniors' courses. In a significant number of cases the instructor must solve the captcha instead of the trainee, because they simply can't read it. I really do not think that this will a) decrease the number of spambots or b) decrease the number of users stopped by the captcha.
>
> I do not think it is a good idea to add a new captcha which is less readable than the current one.
>
> From my experience, Wikipedia's captcha is one of the hardest to solve.

Plus, the account request workflow is clunky, at best, for people who can't (for accessibility issues etc) complete our captchas as it is

Nice to have someone with the opinion that they're unreadable, so worth digging into it a bit (I should point out that I'm vaguely neutral about it)

Just to be clear, it's the lack of readability, rather than a lack of localised words? (i.e. in English, not in Czech. Which would seem strange when so many Czech people speak English pretty well; not sure if reading is quite so much of an issue.)

And by "seniors" you presumably mean senior citizens? Which I guess is a common group to potentially have vision issues? Not that that should detract from any reasoning etc.

FWIW I don't find this hard to solve/read, but I do have good vision.

Screen Shot 2017-07-04 at 16.23.51.png (166×508 px, 36 KB)

I'm not saying they are totally unreadable (or unsolvable). I'm saying that they are solvable if you have good vision, and that they are a significant accessibility issue which should be solved. What about QuestionCaptcha? It will take some time to think up some easy questions and we should switch them too, but this will stop spambots totally, I think. Or at least significantly decrease the number of spambots.

Yeah, I mean senior citizens.

> It will take some time to think up some easy questions and we should switch them too, but this will stop spambots totally, I think. Or at least significantly decrease the number of spambots.

It will stop non-targeted spambots. But unless you make, say, over a million questions, it won't stop people intentionally targeting Wikipedia. Additionally, the questions have to be easy enough that everyone can answer them, including non-English speakers.

That's true. We can localise them, as well as the interface.

> That's true. We can localise them, as well as the interface.

Not without publicly disclosing what the questions/answers are. Which may work for a spambot not specifically targeting us, but if they are trying to target us, then they would just take all the questions.

Ultimately, the current captcha is the worst possible compromise: it's hard to read for humans, easy to read for machines. We could go in two possible directions: make the captchas easier for humans, since bots can already read them, or make them harder so bots can't read them.

Google's NoCaptcha implementation seems to be getting popular. No idea how successful it is.

The other options being the "select all the dogs" type photo ones... Which would be nice with some way of feeding back the data for categorisation usage or similar on Commons.

This seems like a great option. There is one known problem: our current privacy policy prohibits just using NoCaptcha. Maybe we can create our own alternative?

> I would be fine with just turning it off. But it seemed pretty pointless to deter only the humans, and allow the bots, we should at least be able to deter both, right?

Maybe instead of making the captcha harder to read, we could make it easier and see if we can find something that still deters the simplistic spambots it deters now, with less collateral damage?

> That's ridiculous. a) You can't add unregistered users to that group, which is the main case where the system requires a captcha.

No, it is not. We receive from time to time requests to create accounts for people who cannot read the CAPTCHA. We create the accounts for them and add them to the global group temporarily. I guess people from the enwiki account creation team and their UTRS tool could provide some stats about how many account creation requests are made with that rationale (addendum: the confirmed local group can be used to exempt new users from solving the captcha). I concede it is not optimal, but what else can we do for now?

What is really ridiculous is to have volunteers' time absorbed exclusively by locking spambots, and doing so for years.

https://grafana.wikimedia.org/dashboard/db/authentication-metrics should show the direct effects on registration/login.

How interesting. The data from yesterday's API failures around 00:00 UTC matches the quiet period we've seen in the abuse and spam blacklist logs, and also matches more or less the time at which I locked 200 accounts and a similar number of IP addresses in a batch detected by our systems. This confirms our suspicion that these are automated programs. Maybe strengthening CAPTCHA on API requests could be an option as well? We aren't talking simply about registration; they sometimes find a non-blacklisted domain or get around a filter to post actual spam to the wikis. Thanks.

> I would be fine with just turning it off. But it seemed pretty pointless to deter only the humans, and allow the bots, we should at least be able to deter both, right?

> Maybe instead of making the captcha harder to read, we could make it easier and see if we can find something that still deters the simplistic spambots it deters now, with less collateral damage?

I think if we don't go in the harder direction, we should go in the easier direction. I somewhat suspect (but don't know) that the current captcha is just as effective as plain writing on an image with no distortion would be. I still think it's worth deploying this new, harder version, even if only for a short time, in order to determine whether it would actually be effective. We have very little information on how effective our various options are, and we aren't going to find out unless we try.

> The other options being the "select all the dogs" type photo ones... Which would be nice with some way of feeding back the data for categorisation usage or similar on Commons.

That's not exactly future-proof either; image recognition APIs like Google Vision are pretty accurate at telling what the thing in an image is. Also, we would need a secret source of image labels, and our projects are not really meant to provide secret things.

> Maybe strengthening CAPTCHA on API requests could be an option as well?

That would still affect the official Android/iOS apps at least. And it might or might not affect spambots (they don't necessarily use the API).

Nobody has explained how to actually interpret the metrics we are collecting. If we deploy this and the failure rate goes up, what is the conclusion? Is it stopping bot edits or deterring humans? Everything is mixed together.

EDIT: I'm doing some more analysis myself and putting it on T152219.

>> Note that OTRS volunteers already receive messages of people who can't even read the current captcha.

> I would be fine with just turning it off. But it seemed pretty pointless to deter only the humans, and allow the bots, we should at least be able to deter both, right?

> CAPTCHA is useful, somewhat. I still remember that not so long ago it was disabled as a test on mediawiki.org and had to be switched back within hours due to the sudden increase in spambot registration.

This was https://gerrit.wikimedia.org/r/177494 and https://gerrit.wikimedia.org/r/177708 from December 2014. There are some related notes here: https://www.mediawiki.org/wiki/Extension:ConfirmEdit/FancyCaptcha_experiments.

> Nobody has explained how to actually interpret the metrics we are collecting. If we deploy this and the failure rate goes up, what is the conclusion? Is it stopping bot edits or deterring humans? Everything is mixed together.
>
> EDIT: I'm doing some more analysis myself and putting it on T152219.

The primary metric I wanted to look at is the number of newly registered accounts globally locked for being spambots. This would be a very direct measure of success in the short term. As you said somewhere else (not sure where), it would be difficult to get meaningful metrics on how readable captchas are unless we had a known pool of real humans, so I don't know about that side of it.

@Bawolff Do the metrics at T125132#3339987 help you in any way (note: they need to be adjusted to pull the last months)? Those showed the number of "spam-only account: spambot" locks on global accounts. Many of them are pretty new, although it is not strange for a spambot to register and lie dormant for some time before it tries to spam and gets caught by SpamBlacklist/AbuseFilter.

Ping. Status please?

Same as before

Looks like T186244: Deploy AICaptcha data collection is getting some movement though

@Reedy What about a test on the deployment.wikimedia beta cluster? That wiki only gets spambot registrations. We could test there whether the new FancyCaptcha is of any help.

Only if we have a way of measuring it.. Otherwise it's just guessing.

Also, the word lists for beta are much more limited... So is it a fair test?

Do we know if captchas are even being regenerated on beta? Is the cronjob deployed to do it? I'm guessing by there being 949... Probably not?

There's no deployment-terbium.. And deployment-tin's www-data user crontab doesn't have anything for regenerating captchas...

So presumably resolving that should be a pre-requisite?

I run maintenance scripts on deployment-tin absent a better place...

The AICaptcha data might help differentiating between human and bot captcha failures, although right now the data does not include captcha success status (but that could be improved). It's not working reliably in beta though (EventLogging seems to be flaky there), plus I doubt that you get many human registrations on deploymentwiki, or even the whole of beta.

So this has been stuck for a while. I decided to look into what else we can do with hopefully less contention.

I came up with https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ConfirmEdit/+/446489 . In my testing it has similar resistance to tesseract as Tim's (in my test, Tim's seemed to be about 6% where this was 6.5%), but I think it will be a lot less controversial.

Example images can be found at https://tools.wmflabs.org/bawolff/captcha/setG/

https://deployment.wikimedia.beta.wmflabs.org/wiki/Special:RecentChanges should be a good place to start testing this new, improved captcha system. Absolutely all the accounts you see there are spambots (you can tell by the pattern). I don't think there would be any issues if we deployed these to deploymentwiki only to check how good they are (metrics, etc.).

Yes, but:

> check how good they are

We can only test the effectiveness against generic MediaWiki spambots, e.g. whether a) it's true that such spambots use tesseract or other OCR, and b) such new FancyCaptchas would make life harder for them. You can't really test how well they'd work in reality, because real spammers will presumably just solve all our captchas however hard they are (in the recent flood T212667, nearly 100% of captchas were solved correctly according to authentication metrics).

Reedy changed the task status from Open to Stalled.Nov 1 2019, 11:35 AM
Reedy triaged this task as Medium priority.
Tgr added a parent task: Restricted Task.Jan 8 2020, 8:38 AM

@Reedy: Hmm, what exactly is this task stalled on?

Lack of any consensus. Lack of decent enough metrics to even deploy the change and see what happens. It's been open 4 years at this point, and no clear path of moving forward for deploying any of the improvements proposed.

It'll probably get declined at some point

Mainly lack of metrics; lack of consensus is a consequence. So the path forward would be T255208: Catalog and evaluate methods of analysis for Wikimedia captcha performance, or something similar, and eventually some reporting mechanism that gives reasonably reliable numbers on how well the captcha performs against humans vs. against bots. It's a few weeks of work, IMO; it has just been hot-potatoed around between various departments.

Aklapper changed the task status from Stalled to Open.Nov 3 2020, 9:56 AM
Aklapper lowered the priority of this task from Medium to Lowest.

> Lack of any consensus. Lack of decent enough metrics to even deploy the change and see what happens. It's been open 4 years at this point, and no clear path of moving forward for deploying any of the improvements proposed.

That doesn't make it stalled by definition but low priority :)

I might be stating the obvious but a change like this shouldn't take 10 years to be deployed (with the last comment being more than three years ago). Is there anything I can do to get this off the ground? Should I just make the patches and get it deployed and check some (*waves at the air*) metrics? Should I just get T255208 done first? Anyone willing to help?

> I might be stating the obvious but a change like this shouldn't take 10 years to be deployed (with the last comment being more than three years ago). Is there anything I can do to get this off the ground? Should I just make the patches and get it deployed and check some (*waves at the air*) metrics? Should I just get T255208 done first? Anyone willing to help?

Given the amount of time that has passed and the lack of conclusive data on how this might improve FancyCaptcha's efficacy while not further hindering accessibility, I'm not sure a simple code-cleanup and deploy would be the best option here. We definitely wouldn't want to introduce a worse captcha experience for project users at this point.

AIUI the fundamental blocker here is not being able to differentiate between an increased captcha failure rate for bots vs. an increased captcha failure rate for humans. So we'd need captcha success rate stats (which we sort of have, but not very good ones) plus some sort of bot detection.

Also IMO the patch proposed here is a non-starter because it makes the captcha a lot harder to read, while the security gains would be limited at best. @Bawolff had some ideas for captchas which at least at first glance don't look harder for a human (see comments starting at T125132#4432800).

I feel like we're over-emphasizing perfect stats here. It's not like the original captcha had extensive testing. Surely we can test how readable any given new captcha proposal is by asking a bunch of volunteers from the community to solve some captchas and seeing how they do.