
[Timebox: 4hrs] Investigate spam account creation on a Wiki
Closed, Resolved (Public)

Description

When we resolved https://phabricator.wikimedia.org/T322665, we realized that a lot of spam accounts were being created on wikis even with a Captcha in place and an up-to-date MW version.
We've seen reports of close to 2k spam accounts created per wiki per month (see comments). This creates significant overhead for Wikibase admins, who have to sift through the spam while managing their Wikibase and community and handling legitimate account requests.

Helpful context around the Captcha: https://www.google.com/recaptcha/admin/site/577576057
Perhaps adding oil to the flame: T301243

Question we want to answer: Are the people creating these spam accounts able to get around the Captcha somehow (is the Captcha not working?), or is this done manually, and if so, how do we stop it from happening?

AC:

  • Stress test the Captcha
  • Over a randomly chosen timeframe (the spam occurs on any day), check how many legitimate vs. spam accounts were created
  • Define a way to prevent the spam accounts from getting through

Event Timeline

@GreenReaper Hey there, we created this ticket based on a heads-up from you that you were receiving a lot of spam account requests (see linked ticket as well). Since there's a Captcha in place and we're now up to date with the latest MW versions again, we wanted to check whether you're still experiencing this issue. Thanks!

It appears so. There are ~50 spam requests in the last day. Many of them are marked as having confirmed email addresses. In total there were 1751 in the last month (showing as open requests), which suggests the rate has not changed recently.

I am not actively running it as a public wiki with expected new users, but if I were, it would be tricky to review them.

Evelien_WMDE renamed this task from StopForumSpam improvements to cover account creation request spam to [Timebox: 4hrs] Investigate spam account creation on a Wiki. Jun 8 2023, 1:13 PM
Evelien_WMDE updated the task description.

@Evelien_WMDE Should we remove "Stress test the captcha" from the AC now that https://phabricator.wikimedia.org/T335769 exists?

@Fring These are 2 different things: T335769 refers to the sign up on https://www.wikibase.cloud/ as a Wikibase owner, whereas this ticket refers to the sign up on a Wikibase as a contributor. The first one is preventive in the context of the open beta, whereas this ticket is an already known problem

When looking at the reCAPTCHA console there is a warning (unfortunately I will see the German copy no matter what)

Wir haben festgestellt, dass von deiner Website weniger als 50 % der mit reCAPTCHA übergebenen Lösungen überprüft werden. Dies könnte auf ein Problem bei der Integration von reCAPTCHA hinweisen. Weitere Informationen findest du auf unserer Entwicklerwebsite.

Google Translate:

We found that your site verifies less than 50% of the solutions submitted with reCAPTCHA. This could indicate a problem with the integration of reCAPTCHA. Visit our developer website for more information.

which sounds as if there might be a problem with our integration. Has anyone seen this before and, if so, investigated the problem somehow? This is the documentation mentioned in the warning: https://developers.google.com/recaptcha/docs/verify#api-request
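
For reference, the server-side check that the linked documentation describes can be sketched as below. This is a minimal illustration in Python, not the code ConfirmEdit actually runs. The endpoint and the `secret`, `response`, and `remoteip` parameters come from the linked reCAPTCHA documentation; the function names and the `post` injection point are assumptions made for this sketch.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the form-encoded POST body for the siteverify endpoint."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional according to the docs
    return urllib.parse.urlencode(fields).encode("ascii")


def verify_token(secret, token, remote_ip=None, post=None):
    """Return True only if the verification response reports success.

    `post` is a hypothetical hook, (url, body) -> bytes, so the function
    can be exercised without network access.
    """
    if post is None:
        def post(url, body):
            with urllib.request.urlopen(url, data=body, timeout=10) as resp:
                return resp.read()
    body = build_verify_request(secret, token, remote_ip)
    payload = json.loads(post(VERIFY_URL, body))
    return payload.get("success") is True
```

If some form submissions never reach a call like `verify_token`, the console would see solved challenges that were never verified, which is one possible reading of the warning above.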

We discussed this in the technical refinement session and started talking about a solution. A first step would be to figure out whether reCAPTCHA is even working, e.g. whether it is active on wiki account creation or not. However, even if reCAPTCHA is being used correctly, spam bots have already broken it, so finding a mitigating solution is a secondary objective, e.g. limiting IPs. Another potential solution is to try a different config, e.g. hCaptcha.
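
To make "limiting IPs" concrete, here is a rough sketch of a sliding-window limit on account requests per IP address. It is only an illustration in Python; the class name, the default thresholds, and the in-memory storage are assumptions for this sketch and not part of any existing MediaWiki or Wikibase Cloud setup.

```python
import time
from collections import defaultdict, deque


class SignupRateLimiter:
    """Allow at most `max_requests` signups per IP within `window_seconds`."""

    def __init__(self, max_requests=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A limit like this only helps when requests are concentrated on a small number of addresses; it does little against traffic spread across many IPs.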

reCAPTCHA Enterprise purports to offer more, but I get a sneaking feeling it might also be a way to track users now that Google's been told it can't do it through other means, especially if you add it to other pages like they want.

I looked into this, but haven't really found anything obvious:

  • As of now the Captcha injected by ConfirmEdit seems to work as intended. I could get myself blocked by giving bad answers as well as pass by giving correct answers
    • I could not find any obvious loopholes, neither from testing, nor from reading the code in the extension
    • Contrary to what the message shown in the reCAPTCHA console suggests, the default codepath does validate the response
  • Spam accounts are indeed being created a lot for furry.wikibase.cloud:
    • 562 in June 2023, 201 of them with a confirmed email address
    • 952 account requests in total as of now
    • Those 952 accounts were created from 790 distinct IPs using real-world user agents on Mac and Windows
  • Up until this January, a lot of spam accounts were created successfully and somehow passed the queue, e.g. on furry.wikibase.cloud there are currently 4714 user accounts

Unfortunately I don't have any brilliant ideas about how to improve the situation other than trying a different Captcha and hoping that it confuses the scripted sign-ups for a little while.

Fring removed Fring as the assignee of this task. Jul 17 2023, 1:30 PM
Fring moved this task from Doing to In Review on the Wikibase Cloud (Kanban board Q3 2023) board.

Regarding "why the behaviour stopped": we discovered that Special:CreateAccount was not being disabled, which was fixed in January.

Unfortunately, my experience is that much spam is created from botnets and proxies, although some server ranges might be blockable. As you say, a different captcha may help. It might also be worth reaching out to WMF to see if they have any stuff related to e.g. SpamBlacklist email regexes or DNSBLs.
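
As background on the DNSBL idea, a lookup for an IPv4 address conventionally reverses the octets, appends the blocklist zone, and resolves the resulting name; an answer in 127.0.0.0/8 means "listed", while a failed lookup means "not listed". The sketch below illustrates this in Python. The zone `dnsbl.example.org` is a placeholder, not a real blocklist, and the function names and the `resolve` hook are assumptions made for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blocklist zone answers with a 127.x.x.x address.

    `resolve` is a hypothetical hook so the lookup can be tested offline.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: the address is not listed
    return answer.startswith("127.")
```

Example call, using the placeholder zone: `is_listed("192.0.2.99", "dnsbl.example.org")`.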

> It might also be worth reaching out to WMF to see if they have any stuff related to e.g. SpamBlacklist email regexes or DNSBLs.

This sounds reasonable to me. From what I understand, ConfirmEdit does work as intended in our setup, it's just not powerful enough.

I agree there is no super obvious path forward. It looks like we are following the general best practices from https://www.mediawiki.org/wiki/Manual:Combating_spam etc. We could certainly change the captcha to questy (T342143) or hcaptcha (T342142). I've made tickets for these options for us to discuss with @Evelien_WMDE.

Tarrow subscribed.
Tarrow claimed this task.