Wikimedia's captchas are fundamentally broken: they keep humans out while letting robots in. They filter out only the least sophisticated spambots and are easily broken with off-the-shelf tools (T141490). At the same time, they take significant effort, and often multiple attempts, for a human to solve (research), and they are especially bad for people with visual impairments (T6845) and for those who don't speak English or don't use the Latin script at all (T7309). Our captcha stats (T152219) show a failure rate of around 30% (and that does not count users who never submit the form; there is about one captcha submission per hundred captcha displays, though we don't know what share of those displays are crawlers or spambots).
AI could help us build something like reCAPTCHA (one that does not violate our privacy policy): a two-tier system where users are first given a trivial test (clicking a button, which could even be folded into clicking the usual submit button) while the system collects as much information as possible (timing, mouse movements, browser details, etc.) and makes a judgement; suspicious users are then given a harder test. The harder test could just be a regular captcha, but if we can generate questions based on image recognition or other hard-for-robots-easy-for-humans tasks, even better. We could even make the first test invisible, as Google does with invisible reCAPTCHA (where the easy test is essentially just clicking the registration button).
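The two-tier flow above could be sketched roughly as follows: a scorer combines client-side signals into a risk score, low-risk users pass with the trivial test, and high-risk users are escalated to a harder captcha. This is only an illustrative sketch; the signal names, weights, and threshold are all hypothetical, not part of any existing implementation.

```python
# Hypothetical sketch of the two-tier decision. Signal names, weights,
# and the threshold are illustrative assumptions, not a real design.
from dataclasses import dataclass


@dataclass
class ClientSignals:
    time_to_submit_ms: int    # time between form render and submit
    mouse_events: int         # number of mouse-move events observed
    has_webdriver_flag: bool  # e.g. navigator.webdriver reported by the client


def risk_score(s: ClientSignals) -> float:
    """Combine signals into a 0..1 risk score (higher = more bot-like)."""
    score = 0.0
    if s.time_to_submit_ms < 1000:  # humans rarely submit in under a second
        score += 0.5
    if s.mouse_events == 0:         # headless bots often emit no mouse events
        score += 0.3
    if s.has_webdriver_flag:        # automation frameworks set this flag
        score += 0.4
    return min(score, 1.0)


def choose_test(s: ClientSignals, threshold: float = 0.5) -> str:
    """Return which tier of test to present to this user."""
    return "hard_captcha" if risk_score(s) >= threshold else "trivial_click"
```

In a real system the hand-written rules would be replaced by a trained classifier (as in the AICaptcha work), but the control flow stays the same: score first, escalate only the suspicious cases.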
See also
- the Outreachy round 15 AICaptcha project, where initial work for this task was done: T178463: Automatically detect spambot registration using machine learning like invisible reCAPTCHA (Vinitha V S)
- research page