Automatically detect spambot registration using machine learning (like invisible reCAPTCHA)
Open, Needs Triage, Public
Assigned To

None

Authored By

	Tgr
	Feb 23 2017, 9:11 PM

Description

Wikimedia's captchas are fundamentally broken: they keep users away but allow robots in. While they can filter out the most stupid spambots, they are easily breakable with off-the-shelf tools. (T141490) At the same time, they take significant effort and often multiple tries for a human to solve (research), and are especially bad for people with visual impairments (T6845) and those who don't speak English or don't even use Latin script (T7309). Our captcha stats (T152219) show a failure rate of around 30% (and that does not count users who don't even submit the form; there is about one captcha submission per hundred captcha displays, but we don't know to what extent that's crawlers/spambots).

AI could help to build something like reCAPTCHA (that does not violate our privacy policy): a two-tier system where users are given a trivial test (click the button - could even be integrated into clicking the usual button), the system collects as much information (timing, mouse movements, browser details etc) as possible and makes a judgement; suspicious users are given a harder test (which could just be a regular captcha, but if we can generate questions based on image recognition or other hard-for-robots-easy-for-humans tasks, even better). Maybe even make the first test invisible, like Google does with invisible reCAPTCHA (where the easy test is basically just clicking the registration button).
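A minimal sketch of the two-tier flow described above, under loud assumptions: the signal names, weights and threshold are hypothetical, and the hand-written score is only a stand-in for whatever trained model would actually make the judgement.

```python
def suspicion_score(seconds_on_form: float, mouse_events: int, key_events: int) -> float:
    """Toy stand-in for a trained model; returns a score between 0 and 1.

    All three inputs are hypothetical client-side signals, and the weights
    below are made up for illustration.
    """
    score = 0.0
    if seconds_on_form < 2.0:  # form submitted implausibly fast
        score += 0.4
    if mouse_events == 0:      # no pointer activity at all
        score += 0.3
    if key_events == 0:        # fields filled in without any typing
        score += 0.3
    return min(score, 1.0)


def choose_test(seconds_on_form: float, mouse_events: int, key_events: int,
                threshold: float = 0.5) -> str:
    """Tier 1: let the user through. Tier 2: show a harder test."""
    score = suspicion_score(seconds_on_form, mouse_events, key_events)
    return "hard_captcha" if score >= threshold else "pass"


if __name__ == "__main__":
    print(choose_test(0.5, 0, 0))      # instant submit, no input events
    print(choose_test(30.0, 120, 40))  # slow, interactive submit
```

A real system would have to allow for legitimate users who produce no mouse events (touch screens, keyboard-only navigation), which is one reason to prefer a model trained on real data over fixed rules like these.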

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T158909 Automatically detect spambot registration using machine learning (like invisible reCAPTCHA)
		Declined		None	T273572 Implement SliderCaptcha in ConfirmEdit

Event Timeline

Change 383714 had a related patch set uploaded (by Sam0410; owner: Sam0410):
[mediawiki/core@master] [DO NOT MERGE] Outreachy Task T158909

https://gerrit.wikimedia.org/r/383714

gerritbot added a project: Patch-For-Review. Oct 11 2017, 9:47 PM

@tkasarla Hello, and thanks for your interest in participating in Outreachy. Remember that you have less than two weeks before the application deadline, and you may not have enough time to go through all the application steps: https://www.mediawiki.org/wiki/Outreachy/Participants#Application_process_steps. If you want to be ambitious and give it a shot, great! If not, and you are interested in contributing to Wikimedia projects, please see https://www.mediawiki.org/wiki/New_Developers. Also, not having contributed to an open source project before does not matter :)

Tgr updated the task description. Oct 15 2017, 12:52 AM

Hi all!

If you are already working on a microtask/application: the deadline is October 23, and you have to finalize your application by then, and finish at least one microtask, but you can continue working on the other microtasks afterwards (probably for one week or so). Keep that in mind when prioritizing what to work on.

If you haven't started yet and are still in the process of looking for a project: this one is a bit crowded already. In the list of projects some have the comment "Needs More Applicants"; you'll probably have an easier time with those.

Tgr updated the task description. Oct 16 2017, 3:59 AM

Tgr updated the task description.

SAM0410 unsubscribed. Oct 17 2017, 12:42 PM

Groovier subscribed. Oct 17 2017, 5:22 PM

Hi,
I am Vinitha. I heard about the Outreachy program recently. I am excited to get involved with the MediaWiki project (spambot registration detection). I have started working on microtasks and will post updates soon :)

@Groovier Hello and thanks for your interest! Check @Tgr's comment

In T158909#3685721, @Tgr wrote:

If you haven't started yet and are still in the process of looking for a project: this one is a bit crowded already. In the list of projects some have the comment "Needs More Applicants"; you'll probably have an easier time with those.

If you would like to contribute to Wikimedia projects, check our New Developers guide https://www.mediawiki.org/wiki/New_Developers.

Change 384753 had a related patch set uploaded (by Groovier1; owner: Groovier1):
[mediawiki/core@master] T158909 Automatically detect spambot registration using machine learning: Tracking mouse click position on the create your account button

https://gerrit.wikimedia.org/r/384753

I have submitted a patch for code review at https://gerrit.wikimedia.org/r/384753. Please take a look.

Tgr updated the task description. Oct 17 2017, 9:44 PM

Tgr added a subscriber: SAM0410.

Tgr updated the task description. Oct 18 2017, 2:21 AM

Great job getting started on fixing this!!!

Awesome_Aasim awarded a token. Oct 18 2017, 4:17 AM

Hi @Tgr ,

Thanks for the comments. I have made the changes and uploaded a new patch set. Kindly review.

@UpsandDowns1234 Thank you :)

@Tgr I have incorporated the changes from the review. The lint tests are failing on my current code. I will push the corrected version immediately. Sorry for the trouble.

Tgr updated the task description. Oct 18 2017, 6:10 AM

Tgr updated the task description. Oct 18 2017, 9:10 AM

Tgr updated the task description. Oct 18 2017, 9:13 AM

Hi @srishakatux, @Tgr

I have taken quite some time to set up vagrant and solve the proxy issues and hence the delay in doing the tasks. Now since I have set up the basic requirements and have loved this work, I am keen to continue working on this. I have worked on the first task and I understand that I have to work on more tasks to gain more insight about this project.

I think my next step would be to focus on solving the other tasks and, in parallel, to read and research more about the issue and possible solutions. Is there anything else I should be taking care of?

I have filled in the application form (only eligibility part) so that I can confirm my eligibility. I am not sure how to complete the detailed questions in the application form as of now. Is it ok for me to fill that eventually?

Thank you

Tgr updated the task description. Oct 18 2017, 8:19 PM

Tgr updated the task description. Oct 18 2017, 8:26 PM

SAM0410 unsubscribed. Oct 18 2017, 9:01 PM

Tgr updated the task description. Oct 19 2017, 2:18 AM

Tgr updated the task description.

Tgr added a subscriber: SAM0410.

Tgr removed a subscriber: SAM0410.

Hi all, reminder that you have to finish your application / project proposal, and publish the non-eligibility-related part of it as a Phabricator task, until the application deadline (Oct 23). See application step #9.

We'll mostly look at the Phabricator version of your proposals (the outreachy.org forms are a pain to read, have no rich text or change tracking) so make sure everything you consider important is present there. (Except for the eligibility-related information; that's only needed in the outreachy.org form.)

You can work on microtasks a little longer if you want (I'd guess until end of October but I don't know the exact date).

@Groovier see above, you should focus on the application for now.

Tgr updated the task description. Oct 19 2017, 6:47 PM

Tgr updated the task description. Oct 21 2017, 7:27 AM

Hello @Tgr, I just had a quick question!
Should the proposal published as a Phabricator task follow the application template mentioned in application step #9, or the non-eligibility part of the Outreachy application template?
I ask because the questions differ in some respects.

In T158909#3695830, @Tgr wrote:

Hi all, reminder that you have to finish your application / project proposal, and publish the non-eligibility-related part of it as a Phabricator task, until the application deadline (Oct 23). See application step #9.

We'll mostly look at the Phabricator version of your proposals (the outreachy.org forms are a pain to read, have no rich text or change tracking) so make sure everything you consider important is present there. (Except for the eligibility-related information; that's only needed in the outreachy.org form.)

You can work on microtasks a little longer if you want (I'd guess until end of October but I don't know the exact date).

@Nehagup when in doubt, go with the Wikimedia application form; it was made specifically for this purpose. The questions there seem pretty similar to the ones in the Outreachy form to me, though.

Tgr updated the task description. Oct 22 2017, 12:02 AM

Tgr updated the task description. Oct 23 2017, 7:27 PM

Tgr updated the task description. Oct 23 2017, 7:41 PM

Tgr updated the task description. Oct 27 2017, 12:43 AM

Tgr updated the task description. Oct 28 2017, 10:03 AM

Tgr updated the task description. Oct 29 2017, 8:05 PM

Tgr moved this task from Next to Pending on the User-Tgr board. Oct 31 2017, 12:01 AM

Thanks all for contributing! The selection process has ended; the results will be published on Nov 9. If you would like to continue working on any open tasks, or contribute code in some other way, you are welcome to do so and I will provide code review if time permits, but it will not influence the selection.

If you *don't* want to finish a gerrit patch, please use the Abandon button so it's not marked as needing review anymore.

Tgr mentioned this in T179635: Allow captchas to be stacked. Nov 3 2017, 5:07 AM

HakanIST subscribed. Nov 6 2017, 3:42 PM

Thank you so much @Tgr and @awight. This means a lot to me. :) Looking forward to working closely with you all.

Change 384753 abandoned by Gergő Tisza:
[DO NOT MERGE] Outreachy micotask T158909

Reason:
Abandoning all Outreachy microtask related changesets; the application period is over. For contributing outside Outreachy, see https://www.mediawiki.org/wiki/New_Developers and https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker .

https://gerrit.wikimedia.org/r/384753

Archiving information related to the Outreachy application period into a comment.

Outreachy information

Skills needed: basic PHP/JS (for collecting data / integrating with the machine learning system), Python, machine learning
Mentors: @Tgr, @awight
Microtasks:

Please see these portals for more about how to apply to work on a MediaWiki project through Outreachy:
https://www.mediawiki.org/wiki/Outreachy/Round_15
https://www.mediawiki.org/wiki/Outreachy/Participants

user	eligibility	task1	task2	task3	task4	CI whitelist	proposal
@Groovier	link	c384753	c385845	github in progress	c387080 in progress	added	T178463
@Kamsuri5	link	c377044 in progress		github in progress			T178814
@Nehagup	link	c379990	c381787	github in progress	c382974 in progress	added	T178565
@Sagorika1996	link			github in progress
SAM0410	link	c382842 in progress	c383714 in progress	github in progress		in progress
@Smarita	link	c380466	c383765	github in progress	c383299 in progress	added	T178697
@Sofmonk	link	c382155 in progress	c382717 in progress	github in progress
@Veenasankar		c377031 in progress

Congrats @Groovier on being accepted and thanks everyone else for participating!

Tgr updated the task description. Nov 11 2017, 8:27 PM

Change 383714 abandoned by Gergő Tisza:
[DO NOT MERGE] Outreachy Task T158909

https://gerrit.wikimedia.org/r/383714

Tgr mentioned this in AICaptcha. Dec 3 2017, 6:23 AM

Tgr edited projects, added AICaptcha; removed Patch-For-Review.

Tgr mentioned this in T181952: Requesting access to EventLogging data for Vinitha. Dec 8 2017, 9:42 PM

Tgr updated the task description. Jan 1 2018, 9:20 PM

Krinkle awarded a token. Jan 5 2018, 12:16 AM

Krinkle subscribed.

Mentioned in SAL (#wikimedia-cloud) [2018-01-11T17:50:28Z] <tgr> added Groovier1 to project members for T158909

Kaartic awarded a token. Jan 15 2018, 10:06 AM

Harej moved this task from Incoming to Confirmed Extension Requests on the MediaWiki-extension-requests board. Jan 29 2018, 8:50 PM

ToBeFree awarded a token. Mar 20 2018, 3:40 AM

ToBeFree subscribed.

Tgr updated the task description. Apr 1 2018, 2:03 PM

Tgr updated the task description. Apr 1 2018, 2:08 PM

The Outreachy project has concluded (a presentation of the results is available). The original goal was not doable in three months: as it turns out, most spambots do not even try to emulate the keyboard / mouse, and the remaining ones are too few to produce enough data in a couple of weeks. Still, we learned a couple of useful things about spambots and have some longer-term plans on how to address them. This task will live on as a volunteer project. Thanks Vinitha for taking it so far!

Tgr mentioned this in T178463: Automatically detect spambot registration using machine learning like invisible reCAPTCHA (Vinitha V S). Apr 1 2018, 2:11 PM

Good to hear the results. Congratulations Vinitha on the research done.

srishakatux awarded a token. Apr 2 2018, 8:36 PM

Lofhi subscribed. Apr 3 2018, 2:51 PM

@Tgr I'm guessing this task should not still live under Outreach-Programs-Projects? I am boldly removing the tag as we are cleaning up this workboard and planning on killing Possible-Tech-Projects.

ToBeFree rescinded a token. Jul 7 2018, 4:21 AM

ToBeFree awarded a token.

Capankajsmilyo subscribed. Oct 31 2018, 1:18 PM

Anomie mentioned this in T204615: Generate new Captcha word list for prod. Dec 6 2018, 4:02 PM

Volker_E awarded a token. Feb 21 2019, 9:53 PM

Tgr moved this task from Pending to Backlog on the User-Tgr board. Feb 23 2019, 7:21 AM

MarcoAurelio mentioned this in T230304: Ongoing spambot attack 2019-08-{10,11,.*}. Aug 11 2019, 11:33 PM

AntiCompositeNumber subscribed. Aug 20 2019, 12:14 PM

awight mentioned this in T6845: CAPTCHA doesn't work for people with visual impairments. Oct 21 2019, 10:35 AM

sbassett mentioned this in T241921: Fix Wikimedia captchas. Jan 6 2020, 9:45 PM

Aklapper mentioned this in T183869: Collect AICaptcha data from WikimediaEvents extension. Apr 5 2020, 8:45 AM

Crossposting from T183869#6029899 :
It looks like the AICaptcha Phabricator project tag was created for T158909 as an Outreachy Round-15 project.
There are a bunch of open tasks on the AICaptcha workboard, some "Done", some "Doing". That board and project look neglected.
What should happen with these tasks?
If some tasks should remain open, which codebase are these tasks about if they have no other project tag assigned?
And should the artificial-intelligence tag be added in addition?
Thanks.

Here is a CAPTCHA idea that we could maybe implement on Wikimedia (for visual Captcha).

I have seen this on sites like TikTok and Genshin Impact.

Take an image from Wikimedia Commons and take a puzzle piece out. Then we have to drag the cut out puzzle piece with a slider and slide the puzzle piece into the slot. Something like this:

https://www.jqueryscript.net/images/image-puzzle-slider-captcha.jpg

This is something we could implement potentially with a small margin of error.
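To make the "small margin of error" concrete, here is a rough sketch of the server-side check such a slider captcha might need. Everything in it is an assumption for illustration: the function names, the pixel units and the 5-pixel tolerance are hypothetical and not taken from ConfirmEdit or any existing slider library.

```python
import random


def pick_slot_x(image_width: int, piece_width: int, rng: random.Random) -> int:
    """Pick a random horizontal position (in pixels) for the cut-out slot."""
    # Keep the slot fully inside the image, and away from the left edge
    # where the loose piece starts out.
    return rng.randint(piece_width, image_width - piece_width)


def slider_solved(submitted_x: float, slot_x: float, tolerance_px: float = 5.0) -> bool:
    """Accept the answer if the piece lands within tolerance_px of the slot."""
    return abs(submitted_x - slot_x) <= tolerance_px


if __name__ == "__main__":
    rng = random.Random()
    slot = pick_slot_x(image_width=320, piece_width=40, rng=rng)
    print(slider_solved(slot + 3, slot))   # close enough
    print(slider_solved(slot + 20, slot))  # too far off
```

The slot position would have to stay on the server, so that the client only submits where the piece was dropped. A position check alone is also easy for a script to brute-force, so it would probably need rate limiting or to be combined with other signals.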

Awesome_Aasim added a subtask: T273572: Implement SliderCaptcha in ConfirmEdit. Feb 2 2021, 1:58 AM