Wikimedia's captchas are fundamentally broken: they keep users away but let robots in. They filter out only the most primitive spambots and are easily broken with off-the-shelf tools (T141490). At the same time, they take significant effort and often multiple tries for a human to solve ([[https://meta.wikimedia.org/wiki/Research:Account_creation_UX/CAPTCHA|research]]), and are especially hard on people with visual impairments (T6845) and on those who don't speak English or don't use Latin script (T7309). Our captcha stats (T152219) show a failure rate of around 30%. That figure does not count users who never submit the form: there is about one captcha submission per hundred captcha displays, but we don't know how much of that gap is due to crawlers and spambots.
AI could help us build something like [[https://www.google.com/recaptcha/intro/|reCAPTCHA]] that does not violate our privacy policy: a two-tier system. First, users are given a trivial test (clicking a button, which could even be the usual submit button). Meanwhile, the system collects as much information as possible (timing, mouse movements, browser details, etc.) and makes a judgement. Suspicious users are then given a harder test. This could just be a regular captcha, but questions based on image recognition or other hard-for-robots, easy-for-humans tasks would be even better. The first test could even be invisible, as in Google's [[https://developers.google.com/recaptcha/docs/invisible|invisible reCAPTCHA]], where the easy test is basically just clicking the registration button.
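The two-tier flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Interaction` fields, the hand-picked weights, and the thresholds are all hypothetical placeholders, and a real system would replace `suspicion_score` with a model trained on collected data.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Interaction:
    """Signals gathered while the form is open (hypothetical field names)."""
    seconds_to_submit: float
    mouse_path: List[Tuple[float, float]]  # sampled (x, y) pointer positions


def path_straightness(path: List[Tuple[float, float]]) -> float:
    """Straight-line distance divided by travelled distance (1.0 = perfect line)."""
    if len(path) < 2:
        return 1.0
    travelled = sum(
        hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(path, path[1:])
    )
    if travelled == 0:
        return 1.0
    direct = hypot(path[-1][0] - path[0][0], path[-1][1] - path[0][1])
    return direct / travelled


def suspicion_score(interaction: Interaction) -> float:
    """Toy heuristic standing in for a trained classifier (weights are assumptions)."""
    score = 0.0
    if interaction.seconds_to_submit < 2.0:
        score += 0.5  # form submitted implausibly fast
    if len(interaction.mouse_path) < 3:
        score += 0.3  # almost no pointer activity recorded
    elif path_straightness(interaction.mouse_path) > 0.99:
        score += 0.3  # pointer moved in a near-perfect straight line
    return score


def choose_challenge(interaction: Interaction, threshold: float = 0.5) -> str:
    """Tier 1 is the trivial click; escalate to the hard test if suspicious."""
    return "hard" if suspicion_score(interaction) >= threshold else "easy"


human = Interaction(12.0, [(0, 0), (5, 8), (12, 9), (20, 20), (31, 18)])
bot = Interaction(0.3, [])
print(choose_challenge(human), choose_challenge(bot))
```

Keeping the score separate from the escalation decision means the threshold can be tuned, or the heuristic swapped for a learned model, without touching the form integration.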
== Outreachy information
Skills needed: basic PHP/JS (for collecting data / integrating with the machine learning system), Python, machine learning
Mentors: @Tgr, @awight
Microtasks:
* {T175330}
* {T175331}
* {T177033}
* {T177034}
Please see these portals for more about how to apply to work on a MediaWiki project through Outreachy:
https://www.mediawiki.org/wiki/Outreachy/Round_15
https://www.mediawiki.org/wiki/Outreachy/Participants
|user|eligibility|task1|task2|task3|task4|CI whitelist|
| @Kamsuri5 | [[https://outreachy.gnome.org/?q=view_projects&prg=9&p=1847|link]] in progress | [[https://gerrit.wikimedia.org/r/#/c/377044/|c377044]] in progress | | [[https://github.com/kamsuri/Mouse_data_extraction-Task-T177033-|github]] in progress | | |
| @Nehagup | | [[https://gerrit.wikimedia.org/r/#/c/379990/|c379990]] {icon check-circle color=green} | [[https://gerrit.wikimedia.org/r/#/c/381787|c381787]] in progress | [[https://github.com/nehagup/wikimedia_microtask3/blob/master/features_preprocessing.ipynb|github]] in progress | [[https://gerrit.wikimedia.org/r/#/c/382974/|c382974]] in progress | added |
| @Sagorika1996 | [[https://outreachy.gnome.org/?q=view_projects&prg=9&p=1844|link]] in progress | | | [[https://github.com/sagorika1996/SpamBot-Project|github]] in progress | | |
| @SAM0410 | | [[https://gerrit.wikimedia.org/r/#/c/382842/|c382842]] in progress | | [[https://github.com/sam0410/ML-on-mouse-movements|github]] in progress | | |
| @Smarita | | [[https://gerrit.wikimedia.org/r/#/c/380466/|c380466]] {icon check-circle color=green} | | | | added |
| @Sofmonk | | [[https://gerrit.wikimedia.org/r/#/c/382155/|c382155]] in progress | [[https://gerrit.wikimedia.org/r/#/c/382717/|c382717]] in progress | [[https://github.com/sofmonk/outreachy_2017/blob/master/microtask.ipynb|github]] in progress | | |
| @Veenasankar | | [[https://gerrit.wikimedia.org/r/#/c/377031/|c377031]] in progress | | | | |