Automatically detect spambot registration using machine learning (like invisible reCAPTCHA)
Open, Needs Triage, Public

Assigned To
None
Authored By
Tgr
Feb 23 2017, 9:11 PM
Referenced Files
None
Tokens
"Barnstar" token, awarded by Volker_E. "Barnstar" token, awarded by ToBeFree. "Love" token, awarded by srishakatux. "Like" token, awarded by Kaartic. "Orange Medal" token, awarded by Krinkle. "Yellow Medal" token, awarded by Awesome_Aasim. "Love" token, awarded by MarcoAurelio.

Description

Wikimedia's captchas are fundamentally broken: they keep users away but allow robots in. While they can filter out the most stupid spambots, they are easily breakable with off-the-shelf tools. (T141490) At the same time, they take significant effort and often multiple tries for a human to solve (research), and are especially bad for people with visual impairments (T6845) and those who don't speak English or don't even use Latin script (T7309). Our captcha stats (T152219) show a failure rate of around 30% (and that does not count users who don't even submit the form; there is about one captcha submission per hundred captcha displays, but we don't know to what extent that's crawlers/spambots).

AI could help to build something like reCAPTCHA (that does not violate our privacy policy): a two-tier system where users are given a trivial test (click the button - could even be integrated into clicking the usual button), the system collects as much information (timing, mouse movements, browser details etc) as possible and makes a judgement; suspicious users are given a harder test (which could just be a regular captcha, but if we can generate questions based on image recognition or other hard-for-robots-easy-for-humans tasks, even better). Maybe even make the first test invisible, like Google does with invisible reCAPTCHA (where the easy test is basically just clicking the registration button).
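The two-tier flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the signal names, weights and threshold are all hypothetical, and a hand-written score stands in for whatever trained model would actually make the judgement.

```python
from dataclasses import dataclass


@dataclass
class RegistrationSignals:
    """Hypothetical signals gathered while the user fills in the form."""
    seconds_on_form: float   # time between page load and submit
    mouse_move_events: int   # number of mouse movement events observed
    key_events: int          # number of key press events observed
    has_javascript: bool     # whether the client ran the collection script


def suspicion_score(s: RegistrationSignals) -> float:
    """Return a score between 0 and 1; higher means more bot-like.

    The weights are made up for illustration; a real system would
    learn them from labelled registrations.
    """
    score = 0.0
    if not s.has_javascript:
        score += 0.4
    if s.seconds_on_form < 3.0:
        score += 0.3
    if s.mouse_move_events == 0:
        score += 0.2
    if s.key_events == 0:
        score += 0.1
    return min(score, 1.0)


def choose_challenge(s: RegistrationSignals, threshold: float = 0.5) -> str:
    """Tier 1 is the trivial/invisible test; tier 2 is the harder one."""
    return "hard_challenge" if suspicion_score(s) >= threshold else "pass"


human = RegistrationSignals(42.0, 180, 35, True)
bot = RegistrationSignals(0.8, 0, 0, True)
print(choose_challenge(human))  # pass
print(choose_challenge(bot))    # hard_challenge
```

Only users who land in the "hard_challenge" branch would see a regular captcha (or a better hard-for-robots test); everyone else just clicks the registration button.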

Event Timeline

Change 383714 had a related patch set uploaded (by Sam0410; owner: Sam0410):
[mediawiki/core@master] [DO NOT MERGE] Outreachy Task T158909

https://gerrit.wikimedia.org/r/383714

@tkasarla Hello! And thanks for your interest in participating in Outreachy. Remember that you have less than two weeks before the application deadline and you may not have enough time to go through all the application steps https://www.mediawiki.org/wiki/Outreachy/Participants#Application_process_steps. If you want to be ambitious and give it a shot, great! If not, and you are interested in contributing to Wikimedia projects, please see https://www.mediawiki.org/wiki/New_Developers. Also, it does not matter if you have not contributed to an open source project before :)

Hi all!

If you are already working on a microtask/application: the deadline is October 23, and you have to finalize your application by then, and finish at least one microtask, but you can continue working on the other microtasks afterwards (probably for one week or so). Keep that in mind when prioritizing what to work on.

If you haven't started yet and are still in the process of looking for a project: this one is a bit crowded already. In the list of projects some have the comment "Needs More Applicants"; you'll probably have an easier time with those.

Tgr updated the task description.

Hi,
I am Vinitha. I heard about the Outreachy program recently. I am excited to get involved with the MediaWiki project (spambot registration detection). I have started working on microtasks and will post updates soon :)

@Groovier Hello and thanks for your interest! Check @Tgr's comment:

> If you haven't started yet and are still in the process of looking for a project: this one is a bit crowded already. In the list of projects some have the comment "Needs More Applicants"; you'll probably have an easier time with those.

If you would like to contribute to Wikimedia projects, check our New Developers guide https://www.mediawiki.org/wiki/New_Developers.

Change 384753 had a related patch set uploaded (by Groovier1; owner: Groovier1):
[mediawiki/core@master] T158909 Automatically detect spambot registration using machine learning: Tracking mouse click position on the create your account button

https://gerrit.wikimedia.org/r/384753

Tgr added a subscriber: SAM0410.

Great job getting started on fixing this!!!

Hi @Tgr ,

Thanks for the comments. I have made the changes and raised a fresh code review. Kindly review.

@UpsandDowns1234 Thank you :)

@Tgr I have incorporated the changes in the review. The lint tests are failing on my current code. I will push the corrected one immediately. Sorry for the trouble.

Hi @srishakatux, @Tgr

I have taken quite some time to set up vagrant and solve the proxy issues and hence the delay in doing the tasks. Now since I have set up the basic requirements and have loved this work, I am keen to continue working on this. I have worked on the first task and I understand that I have to work on more tasks to gain more insight about this project.

  1. I think my next step would be to focus on solving other tasks and, in parallel, read and research more about the issue and solutions. Is there anything else I should be taking care of?
  2. I have filled in the application form (only the eligibility part) so that I can confirm my eligibility. I am not sure how to complete the detailed questions in the application form as of now. Is it OK for me to fill those in later?

Thank you

Tgr updated the task description.
Tgr added a subscriber: SAM0410.
Tgr removed a subscriber: SAM0410.

Hi all, reminder that you have to finish your application / project proposal, and publish the non-eligibility-related part of it as a Phabricator task, by the application deadline (Oct 23). See application step #9.

We'll mostly look at the Phabricator version of your proposals (the outreachy.org forms are a pain to read, have no rich text or change tracking) so make sure everything you consider important is present there. (Except for the eligibility-related information; that's only needed in the outreachy.org form.)

You can work on microtasks a little longer if you want (I'd guess until end of October but I don't know the exact date).

@Groovier see above, you should focus on the application for now.

Hello @Tgr, I just had a quick question!
Should the application proposal published as a Phabricator task follow the application template mentioned in application step #9, or the non-eligibility part of the Outreachy proposal application template?
I ask because the questions differ in some respects.

@Nehagup when in doubt go with the Wikimedia application form, it was made specifically for this purpose. The questions there seem to me pretty similar to the ones in the Outreachy form, though.

Thanks all for contributing! The selection process has ended; the results will be published on Nov 9. If you would like to continue working on any open tasks, or contribute code in some other way, you are welcome to do so and I will provide code review if time permits, but it will not influence the selection.

If you *don't* want to finish a gerrit patch, please use the Abandon button so it's not marked as needing review anymore.

Thank you so much @Tgr and @awight. This means a lot to me. :) Looking forward to working closely with you all.

Change 384753 abandoned by Gergő Tisza:
[DO NOT MERGE] Outreachy micotask T158909

Reason:
Abandoning all Outreachy microtask related changesets; the application period is over. For contributing outside Outreachy, see https://www.mediawiki.org/wiki/New_Developers and https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker .

https://gerrit.wikimedia.org/r/384753

Archiving information related to the Outreachy application period into a comment.

Outreachy information

Skills needed: basic PHP/JS (for collecting data / integrating with the machine learning system), Python, machine learning
Mentors: @Tgr, @awight
Microtasks:

Please see these portals for more about how to apply to work on a MediaWiki project through Outreachy:
https://www.mediawiki.org/wiki/Outreachy/Round_15
https://www.mediawiki.org/wiki/Outreachy/Participants

Columns: user, eligibility, task1, task2, task3, task4, CI whitelist, proposal
@Groovier: link; c384753; c385845; github in progress; c387080 in progress; added; T178463
@Kamsuri5: link; c377044 in progress; github in progress; T178814
@Nehagup: link; c379990; c381787; github in progress; c382974 in progress; added; T178565
@Sagorika1996: link; github in progress
SAM0410: link; c382842 in progress; c383714 in progress; github in progress; in progress
@Smarita: link; c380466; c383765; github in progress; c383299 in progress; added; T178697
@Sofmonk: link; c382155 in progress; c382717 in progress; github in progress
@Veenasankar: c377031 in progress

Congrats @Groovier on being accepted and thanks everyone else for participating!

Change 383714 abandoned by Gergő Tisza:
[DO NOT MERGE] Outreachy Task T158909

https://gerrit.wikimedia.org/r/383714

Mentioned in SAL (#wikimedia-cloud) [2018-01-11T17:50:28Z] <tgr> added Groovier1 to project members for T158909

The Outreachy project has concluded (a presentation of the results is available). The original goal was not doable in three months: as it turns out, most spambots do not even try to emulate the keyboard / mouse, and the remaining ones are too few to produce enough data in a couple of weeks. But we learned a couple of useful things about spambots and have some longer-term plans on how to address them. This task will live on as a volunteer project. Thanks Vinitha for taking it so far!

Good to hear the results. Congratulations Vinitha on the research done.

@Tgr I'm guessing this task should not still live under Outreach-Programs-Projects? I am boldly removing the tag as we are cleaning up this workboard and planning on killing Possible-Tech-Projects.

Crossposting from T183869#6029899 :
It looks like the AICaptcha Phabricator project tag was created for T158909 as an Outreachy Round-15 project.
There are a bunch of open tasks on the AICaptcha workboard, some "Done", some "Doing". That board and project look neglected.
What should happen with these tasks?
If some tasks should remain open, which codebase are these tasks about if they have no other project tag assigned?
And should the artificial-intelligence tag be added in addition?
Thanks.

Here is a CAPTCHA idea that we could maybe implement on Wikimedia (for visual Captcha).

I have seen this on sites like TikTok and Genshin Impact.

Take an image from Wikimedia Commons and take a puzzle piece out. Then we have to drag the cut out puzzle piece with a slider and slide the puzzle piece into the slot. Something like this:

https://www.jqueryscript.net/images/image-puzzle-slider-captcha.jpg

This is something we could implement potentially with a small margin of error.
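A rough sketch of the server-side check for such a slider puzzle is shown below. It assumes the "margin of error" means a pixel tolerance on where the piece is dropped; the function names, the tolerance value and the image dimensions are hypothetical and not taken from any existing MediaWiki code.

```python
import random


def pick_slot_x(image_width: int, piece_width: int, rng: random.Random) -> int:
    """Choose the left edge of the cut-out slot.

    The slot starts at least one piece width from the left (so it does
    not overlap the piece's starting position) and stays inside the image.
    """
    return rng.randint(piece_width, image_width - piece_width)


def slider_solved(target_x: int, dropped_x: int, tolerance_px: int = 5) -> bool:
    """Accept the answer if the piece was dropped close enough to the slot."""
    return abs(dropped_x - target_x) <= tolerance_px


rng = random.Random()
slot_x = pick_slot_x(image_width=300, piece_width=40, rng=rng)
print(slider_solved(slot_x, slot_x + 3))   # True: within the tolerance
print(slider_solved(slot_x, slot_x + 20))  # False: too far from the slot
```

The slot position would have to be kept on the server and never sent to the client; otherwise a script could read it and submit the right offset directly.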

@Awesome_Aasim: How is that related to automatically detecting spambot registration by using machine learning?