Fight incoming spam in Wikimini (Jan 2025)
Open, High · Public · 2 Estimated Story Points

Description

Reported by Laurent today. Wikimini is flooded by spammers, more than usual.

https://fr.wikimini.org/wiki/Special:RecentChanges

Maybe it would be better to (re?)activate a CAPTCHA or something.

Available persons:

@valerio.bozzolan: I can maybe invest 2 volunteer hours before Thursday, but not today
@ValerioBoz-WMCH: I can work on this during working hours, this Thursday and Friday
WikiValley?
......

Event Timeline

I've applied the change in this file:

WikiminiSettings/ExtensionSettings/ExtensionsFrench.php

Toggling this setting, which was previously false:

$wgCaptchaTriggers['createaccount'] = true;

I've also executed the "spam obliviater" (T308969), without finding matches. The community has probably already cleaned things up.

OK, that user registered on 2025-01-15, so before the CAPTCHA was enabled. Please block that user on-wiki if possible.

OK, I've found more than 1 user registered after the captcha.

It's probably a good idea to set this for a while:

$wgEmailConfirmToEdit = true;

https://www.mediawiki.org/wiki/Manual:$wgEmailConfirmToEdit

Green light from @Lorangeo?

Yes, but only as a very temporary solution, as it's difficult to require email confirmation from primary school students.

OK. Lorangeo, maybe we can assume that European FR schools are not open before 7 AM CET or after 7 PM, so we could activate $wgEmailConfirmToEdit only outside this time interval, to minimize disruption of most school activities in our region. Is this complete nonsense, or maybe useful?

Also, if we do the first thing, we can slightly raise the permissions needed to edit late at night. By late night I mean after 1:00 AM and before 6:00 AM CET, imposing the "autopatrolled" permission for that period. Again, this may be total nonsense (since French is spoken worldwide and not only in France), or it may help mitigate disruption a bit in the most active areas.

Feel free to say "Uh, OK! Let's try" or "Naaah".

I propose this because it would be very simple to implement and possibly effective, until better ideas are adopted.
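
A minimal sketch of the time-window idea for $wgEmailConfirmToEdit, as it might look in a settings file. The `Europe/Paris` timezone, the hour boundaries and the `$wikimini...` variable names are assumptions made for illustration, not existing Wikimini configuration:

```php
// Hypothetical sketch for a settings file such as ExtensionsFrench.php.
// Assumption: school hours are 07:00-19:00 in the Europe/Paris timezone.
$wikiminiNow  = new DateTime( 'now', new DateTimeZone( 'Europe/Paris' ) );
$wikiminiHour = (int)$wikiminiNow->format( 'G' ); // hour of day, 0-23

// Require a confirmed email address only outside the assumed school hours.
$wgEmailConfirmToEdit = ( $wikiminiHour < 7 || $wikiminiHour >= 19 );
```

Because the settings file is evaluated on each request, the switch would follow the clock without any cron job.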

P.S. there is a bug in the current MediaWiki or in the current theme: if we activate $wgEmailConfirmToEdit then the homepage for some reason shows the error "Please verify your email", with a link to the preferences, but the preferences cannot be seen, since that page shows the same confusing error message. Gulp.

Hi all!

A mystery first

Last Saturday (or thereabouts), I edited ExtensionsFrench.php in order to completely enable CAPTCHAs on account creation. Beforehand, the CAPTCHA was only shown to users who tried to create an account AND matched a specific condition. After this change, I really think the CAPTCHA was always on for account creation; I even tested it.
Thus, I'm surprised Valerio actually had to enable CAPTCHAs for sign-up, since I thought I saw it was already on (even if ineffective) o_O. Perhaps I was sleepy or something. In any case, thanks for clearing things up.

Attempted solution

Anyway, now it's enabled, but it did not change anything. The CAPTCHA is a QuestyCaptcha, so it uses a custom set of questions and answers. It apparently tends to be strong against bots (I imagine especially against those with only default captcha-bypass techniques), but once cracked it's useless, which is probably what happened.

So, as a test, I've put up a new set of questions: we will see if it slows down the wave, or if these bots either automatically pass them or the attackers are determined enough to update their answer base. @Lorangeo, as suggested by the documentation, I've avoided giving direct hints of the answer in the question itself. You can see the questions I've put on by trying to create an account (while being logged out), or directly in the server configuration. I tried to make the questions answerable for children. Please feel free to tell me what you think!

Ideas

If QuestyCaptcha isn't enough, a couple of ideas:

  • Be sure there aren't any other ways to create an account without having to bother with the CAPTCHA.
  • Rely on things like TitleBlacklist and AbuseFilter.
  • Make CAPTCHAs more aggressive (with more questions or other types of CAPTCHA perhaps) but also more selective to avoid annoying real users: use things like wgCaptchaTriggersOnNamespace and wgCaptchaRegexes, enable on all edits that contain a link… see https://www.mediawiki.org/wiki/Extension:ConfirmEdit#Configuration. Also, perhaps rate-limiting CAPTCHA failures might be a good idea (need to enable cache).
  • As for forcing email confirmation, perhaps do it only if the account is suspicious (same idea as above), but I don't have any clue how to do this at the moment. Using a time range will be useful in order to sleep at night, but please note that bots come all day long.
  • Lorangeo used in the past something called IPQualityScore. It is currently off, perhaps enable it again if it's a good service?
  • Manually reviewing edits before they go live.
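
The "more selective" CAPTCHA idea in the list above could be sketched roughly as follows. The setting names come from the ConfirmEdit configuration page linked above and from MediaWiki's rate limiter; the namespace, the regular expression and the numbers are hypothetical values chosen only for illustration:

```php
// Illustrative values only; adjust to the wiki's needs.

// Ask for a CAPTCHA when an edit adds a new external link.
$wgCaptchaTriggers['addurl'] = true;

// Hypothetical example: also ask on every edit in the user-page namespace.
$wgCaptchaTriggersOnNamespace[NS_USER]['edit'] = true;

// Hypothetical pattern: ask when the edited text matches this expression.
$wgCaptchaRegexes[] = '/casino|viagra/i';

// Throttle failed CAPTCHA answers (needs a working object cache).
// Assumed limit: 10 failures per hour per IP address.
$wgRateLimits['badcaptcha']['ip'] = [ 10, 3600 ];
```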

Notice

@ValerioBoz-WMCH

imposing the "autopatrolled" permission for that period.

Since no one has the autopatrol permission except sysops, this will fully lock the wiki down. This may be what we want, since no one is around at night, but perhaps you meant autoconfirmed? Otherwise, reviewing the permission scheme can also work.

OK. Lorangeo, maybe we can assume that European FR schools are not open before 7 AM CET or after 7 PM, so we could activate $wgEmailConfirmToEdit only outside this time interval, to minimize disruption of most school activities in our region. Is this complete nonsense, or maybe useful? [...]

It sounds like a good idea, but we have quite a few schools participating outside of France (Quebec, Africa, private schools in various countries, etc.). The need to confirm their email would be really complicated for the students, especially the younger ones. But if there’s no other solution, why not… or at least temporarily?

P.S. there is a bug in the current MediaWiki or in the current theme: if we activate $wgEmailConfirmToEdit then the homepage for some reason shows the error "Please verify your email", with a link to the preferences, but the preferences cannot be seen, since that page shows the same confusing error message. Gulp.

With the Wikimini theme, we intentionally hid certain sections of the 'Preferences' menu to make it less complicated. There were certainly better ways to do it.

[...]

@Lorangeo, as suggested by the documentation, I've avoided giving direct hints of the answer in the question itself. You can see the questions I've put on by trying to create an account (while being logged out), or directly in the server configuration. I tried to make the questions answerable for children. Please feel free to tell me what you think!

I still can’t connect to the server, but I tried creating an account on the front-end to see the new captcha questions. They seem a bit difficult to me, not only to solve but also to 'read' for younger children. There is also now too much text on this page (message addressed to spambots, etc.). The previous questions were simpler thanks to the hint, and perhaps also a bit more 'fun' to solve. I could come up with some new ones if needed. And for the hint, maybe we could use a special font to trick the bots. For example, '𝓟*𝓷𝓭𝓪' instead of 'P*nda'.

But are you sure these bots are actually solving these captchas? @Raphoraph created new questions and it had absolutely NO EFFECT. A large number of new spambot accounts have been created and continue to be created!

If QuestyCaptcha isn't enough, a couple of ideas:

  • Be sure there aren't any other ways to create an account without having to bother with the CAPTCHA.

Maybe we could check this by creating an impossible captcha to solve. For example: 2 + 2 (and put 'sun' as the answer). We'll see if the bots solve it or not. Intuitively, I have the impression that they are using another means.
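
As a rough illustration of that test, the question set could temporarily be reduced to a single entry whose stored answer no honest visitor would give. This assumes the same question-to-answer array format used for $wgCaptchaQuestions elsewhere in this task; the question wording is made up:

```php
// Temporary diagnostic only: real users cannot register while this is active.
// If new accounts still appear, the bots are not going through this CAPTCHA.
$wgCaptchaQuestions = [
    'Combien font 2 + 2 ?' => 'sun',
];
```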

  • Rely on things like TitleBlacklist and AbuseFilter.
  • Make CAPTCHAs more aggressive (with more questions or other types of CAPTCHA perhaps) but also more selective to avoid annoying real users: use things like wgCaptchaTriggersOnNamespace and wgCaptchaRegexes, enable on all edits that contain a link… see https://www.mediawiki.org/wiki/Extension:ConfirmEdit#Configuration. Also, perhaps rate-limiting CAPTCHA failures might be a good idea (need to enable cache).

Why not. However, since registration is required to edit on Wikimini, I think it's better to focus on protecting the registration page.

  • As for forcing email confirmation, perhaps do it only if the account is suspicious (same idea as above), but I don't have any clue how to do this at the moment. Using a time range will be useful in order to sleep at night, but please note that bots come all day long.
  • Lorangeo used in the past something called IPQualityScore. It is currently off, perhaps enable it again if it's a good service?

Yes, that's true, I had forgotten about this. Another option would be to use Cloudflare and set some restrictive rules on this page.

I still haven’t been able to connect to the server. But if you find any way to stop these new registrations, please do so. They are very numerous! However, I think it’s worth checking if these bots are really solving our captchas. I don’t believe they are. And it has proven to be a very good system so far.

Maybe they are bypassing the captcha by using the Wikimini API directly. We should probably force registrations through the registration page only.

Maybe with htaccess as a very quick fix?

RewriteEngine On

# Block API requests with action=createaccount
# (QUERY_STRING only covers URL parameters, not parameters sent in a POST body)
RewriteCond %{REQUEST_URI} ^/w/api\.php$
RewriteCond %{QUERY_STRING} (^|&)action=createaccount(&|$)
RewriteRule ^ - [F,L]

Would that work? (no access to the server to test it)

I've tested Wikimini's API and (un)fortunately, it is properly protected. The API returns an error if the CAPTCHA was not properly answered, and moreover fails to even properly give the CAPTCHA question, making it virtually impossible to answer.

Maybe they are bypassing the captcha by using the Wikimini API directly. We should probably force registrations through the registration page only.

For the record:

Maybe with htaccess as a very quick fix?

A quick permission fix to disable write access to the API for unregistered users (thus blocking account creation) would also work. Granted, it's less radical than a direct 403 Forbidden ^^

$wgGroupPermissions['*']['writeapi'] = false;
$wgGroupPermissions['user']['writeapi'] = true;

I have reactivated Cloudflare on Wikimini.org and added a strict security rule on the account creation page. I hope this will be effective. Perhaps we could even consider restoring the previous question system (which was simpler and only appeared under certain conditions).

Hum... I get some "unknown error" when publishing new content and the Special:RecentChanges doesn't seem to show the latest changes anymore, even with CloudFlare caching rules disabled (dev mode). Can someone confirm?

I've set up a logging system and unfortunately, it seems so. Bots (or human farms) actually give the right answer.

However, it also seems that while CAPTCHAs with math questions get completely steamrolled, the questions now in place are cracked less often. But it might just be an impression.

On 20/01/2025 at 19:35, Lorangeo wrote:

But, *are you sure these bots are actually solving these captchas?*

I've set up a logging system and unfortunately, it seems so. Bots (or human farms) actually give the right answer.

That's actually good news! So the bots are not using any security holes.

Tomorrow I can work on an improvement of QuestyCaptcha that will:

  • allow preparing some future questions
  • allow archiving "old" questions
  • log something very, very scary if somebody answers an "old" question
    • if this happens twice, fail2ban bans the IP for some minutes
    • if this happens multiple times, fail2ban bans it for even longer
  • log something if somebody gives an invalid answer to a current question
    • if this happens twice, fail2ban bans the IP for a few minutes
    • if this happens multiple times, fail2ban bans the IP for more minutes

Still to be decided, but that's actually the "minimum code, maximum impact" thing that I can think of.
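
The classification step of that plan could be sketched roughly as below. This is not QuestyCaptcha code: the function names, the status strings and the log-line format are all hypothetical, and a fail2ban filter would have to be written separately to match whatever line is finally logged.

```php
<?php
/**
 * Hypothetical helper: classify a submitted CAPTCHA answer.
 *
 * @param string $question The question the answer was submitted for
 * @param string $answer   The answer the visitor submitted
 * @param array  $current  Active questions (question => answer)
 * @param array  $archived Retired questions (question => answer)
 * @return string 'ok', 'invalid', 'old' or 'unknown'
 */
function classifyCaptchaAnswer( string $question, string $answer, array $current, array $archived ): string {
    if ( array_key_exists( $question, $archived ) ) {
        // Retired questions are no longer shown, so any answer is suspicious.
        return 'old';
    }
    if ( !array_key_exists( $question, $current ) ) {
        return 'unknown';
    }
    // Compare case-insensitively, ignoring surrounding whitespace.
    $expected = strtolower( trim( $current[$question] ) );
    return strtolower( trim( $answer ) ) === $expected ? 'ok' : 'invalid';
}

/**
 * Hypothetical log-line format (an assumption) for a fail2ban filter to match.
 */
function captchaLogLine( string $status, string $ip ): string {
    return "questycaptcha status=$status ip=$ip";
}
```

The 'old' and 'invalid' statuses are kept separate so that answers to retired questions can be banned faster or for longer than plain mistakes, as in the list above.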

Fun fact, we can use https://duck.ai with this prompt:

Generate an array of questions and answers for children, in french, in an associative array $wgCaptchaQuestions. The key is the question, the value is the answer. Questions should involve simple answers without accents.

Result

$wgCaptchaQuestions = [
    'Quelle est la couleur du ciel?' => 'bleu',
    'Combien de pattes a un chien?' => 'quatre',
    'Quel animal dit "meuh"?' => 'vache',
    'Quel fruit est jaune et courbe?' => 'banane',
    'Combien de jours y a-t-il dans une semaine?' => 'sept',
    'Quel est le nom du petit de l\'oiseau?' => 'oiseau',
    'Quel est le contraire de "grand"?' => 'petit',
    'Quel est le premier mois de l\'annee?' => 'janvier',
    'Quel animal est connu pour sa lenteur?' => 'tortue',
    'Quel est le son que fait un chat?' => 'miaou'
];

Not really impressive, but it saves some time and needs only a few manual changes.

Unfortunately, I had to disable Cloudflare's special proxy features due to some strange issues when editing pages (likely related to SSL or caching), which resulted in unknown errors. I also noticed that changes were no longer being logged on the recent changes page and other similar pages, even with caching disabled. It's a shame because Cloudflare could have been a strong and simple solution to combat spambots and reduce server load.

Just two minutes after disabling the Cloudflare proxy features, the spambots are back :-(

I added some hidden text (display: none) to the questions to try to deceive the bots, because there were still a lot of new account creations. Let's see what happens...
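
For readers wondering what such a question might look like, here is a guess at the general shape, not the actual Wikimini configuration. It assumes the question text is rendered as HTML on the sign-up form; the question and the "XXX" filler are placeholders:

```php
// Assumption: QuestyCaptcha outputs the question as HTML.
// "XXX" is filler that people never see (display:none) but that a script
// reading the raw markup will find in the middle of the question.
$wgCaptchaQuestions = [
    'Quel <span style="display:none">XXX</span>animal dit "meuh" ?' => 'vache',
];
```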

I read the comments there. A few suggestions/ideas:

  • In the past I recommended preparing 5-10 QuestyCaptcha questions related to the wiki, e.g. for Wiki-Rennes "What is the river crossing Rennes from east to west?" or perhaps for Wikimini "What is the name of the country where Wikimini is hosted?"; in the age of AI or possibly underpaid workers I’m no longer sure this is sufficiently resistant.
  • An option is to install ConfirmAccount, where users request an account, then an administrator allows or denies the account creation; this may be quite some work for the administrators and it may delay the account creation for the user.
  • Possibly define $wgAccountCreationThrottle = [ [ 'count' => 30, 'seconds' => 30*86400, ] ]; // 30 accounts per IP and per month, it should be enough even if pupils from the same classroom create an account the same day
  • On Wikimedia wikis, creation is restricted to 6 new accounts per IP and per day (ref, search "wgAccountCreationThrottle"). There is also a user group, Account creators (with the right 'noratelimit'), whose membership is given to trusted persons so they can mass-create accounts during editathons. Alternatively, an exception to the throttle may be granted to an IP (or IP range) before the event thanks to the extension ThrottleOverride (I just tested it, it seems to work fine).
  • Else an idea of a still non-existent feature: would it be practical if some secret password were distributed to teachers, and each teacher then distributed it to their pupils, so that a pupil can freely create an account with this password requested during account creation? With a mechanism to limit a given password to e.g. 40 account creations.
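
Put together, the throttle and the exemption mentioned in the list above might look like this in the settings. The throttle value is the one suggested above; the `accountcreator` group name is a hypothetical example, and whether Wikimini wants such a group is an open question:

```php
// At most 30 account creations per IP address per 30 days.
$wgAccountCreationThrottle = [
    [ 'count' => 30, 'seconds' => 30 * 86400 ],
];

// Hypothetical group for trusted people (e.g. teachers) who create accounts
// on behalf of a class; 'noratelimit' exempts them from the throttle.
$wgGroupPermissions['accountcreator']['noratelimit'] = true;
```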

Hi, thank you for the new suggestions and ideas. We may keep them in mind for the future in case of new bot registrations. For now, I’m happy to report that the issue has been fully resolved, thanks to the new QuestyCaptcha questions (big thanks to @Raphoraph!) AND the addition of hidXXXden chXXXaractXXXers that mXXXake the quXXXestiXXXons "appear" differently to bots. (Haha, I’m really proud of this clever little trick!)

Hello, it seems that spambots have found their way onto Wikimini SV since last month. Could someone enable the Questy Captcha on Wikimini SV and perhaps automatically delete the spam from the past few weeks?

Thanks @Seb35 for your extra help in this if you can 👍

Hello,

Spambots have now arrived on Wikimini EN. Questy Captcha should be activated there as soon as possible. I must admit that if FTP access were provided, I’d gladly take care of it myself.