Feature summary:
- Make the BETA cluster beta.wmflabs.org available whenever a human being is apparently trying to access a page.
- Use an HTTP cookie not obvious to external bots.
Use case(s):
- Anybody attempting to merely read a page in the BETA cluster, not only to edit one, might currently be blocked as a defence against AI crawlers.
- Bots, most probably sent to feed AI LLMs, are accessing pages at tremendous frequency, some 100,000 attempts per day per bot.
- A quick and easy way to distinguish bots from humans is needed, otherwise the BETA cluster cannot be used any longer.
Benefits:
- Everybody developing things via BETA testing benefits.
- Currently most of them are excluded.
- Some days ago I learnt that Vodafone devices are blocked. There are 400 million contracts around the planet; this is ridiculous.
- AI crawlers are bounced back by many websites due to their intrusive behaviour. They are now using botnets, hiding behind regular end-user accounts on compromised devices.
- See T420833 and T226688 etc.
- Naturally, this approach is not bullet-proof.
- However, crawlers are trying to read from all websites around the planet. Their implementation is generic, not WMF-specific.
- On BETA there are only a few meaningless test pages, used for Lua, JavaScript, CSS or advanced template development. It is no large site and has no interesting content. No human crawler developer is likely to provide such cookies for this one rare domain.
Last remedy:
- The common approaches have become useless. IP blocking causes more collateral damage than it identifies bad guys.
- The IP ranges in use belong to regular general-purpose ISPs.
- User agents pretend to be contemporary human browsers: Firefox, Edge, Android etc.
- There is no other way left to distinguish good and bad attempts.
- Even if it works for only a couple of months, the first bot showing up with the cookie can be blocked again by changing the cookie name, or by changing the value to two or 42 or whatever.
- Since no other defence is known and this approach does not take much implementation effort, it should be tried.
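As an illustration of how little implementation this needs, here is a minimal sketch of such a check in JavaScript. It is not taken from any existing BETA configuration: the function names parseCookieHeader and passesGate and the gate object are hypothetical, and the real check would live wherever the blocking is done today. With acceptedValues set to null only the existence of the cookie is tested; rotating the name or the accepted values is a one-line change.

```javascript
// Hypothetical gate settings. Rotate `name` or `acceptedValues` once a bot
// starts sending the cookie. `acceptedValues: null` means "existence only".
const gate = { name: "GoodFriend", acceptedValues: null };

// Split a raw "Cookie:" request header into a Map of name -> value.
// The first occurrence of a name wins; malformed parts are skipped.
function parseCookieHeader(header) {
  const cookies = new Map();
  if (!header) {
    return cookies;
  }
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) {
      continue;
    }
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (name && !cookies.has(name)) {
      cookies.set(name, value);
    }
  }
  return cookies;
}

// Return true if the request should be let through.
function passesGate(header, config) {
  const cookies = parseCookieHeader(header);
  if (!cookies.has(config.name)) {
    return false;
  }
  if (config.acceptedValues === null) {
    return true;
  }
  return config.acceptedValues.includes(cookies.get(config.name));
}
```

For example, passesGate("foo=bar; GoodFriend=1", gate) lets the request through, while a request without the cookie is rejected. After a rotation to { name: "GoodFriend", acceptedValues: ["42"] } the old value 1 no longer passes.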
Cookie details:
name: GoodFriend
value: 1
domain: beta.wmflabs.org
path: /
expires: Thu, 31 Dec 2026 23:59:59 GMT
- In the first stage only the existence of such a cookie needs to be tested.
- If the cookie is compromised, the value can be changed and the new value announced through back channels.
- Users might enter the cookie manually via the browser's own cookie forms.
- Alternatively, they could run the following JavaScript in the browser console:
document.cookie = "GoodFriend=1; domain=beta.wmflabs.org; path=/; expires=Thu, 31 Dec 2026 23:59:59 GMT";
- Note that the mw. object is not available on the blocking page; therefore neither mw.loader.using() nor mediawiki.cookie can be used.
- The expiry date is up to the user.
Advertising:
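Since the expiry date is left to the user, the cookie string can also be assembled from a chosen date. The following is a small sketch in plain JavaScript that needs no mw. object; the function name buildGoodFriendCookie is hypothetical, while the cookie attributes are the ones listed above.

```javascript
// Build the cookie string for a user-chosen expiry date.
// `expires` must be a valid Date; `value` defaults to "1".
function buildGoodFriendCookie(expires, value = "1") {
  if (!(expires instanceof Date) || Number.isNaN(expires.getTime())) {
    throw new TypeError("expires must be a valid Date");
  }
  return [
    "GoodFriend=" + value,
    "domain=beta.wmflabs.org",
    "path=/",
    "expires=" + expires.toUTCString(),
  ].join("; ");
}

// Example: expire at the end of 2026 (months are zero-based, so 11 = December).
const endOf2026 = new Date(Date.UTC(2026, 11, 31, 23, 59, 59));

// Only set the cookie when running in a browser console.
if (typeof document !== "undefined") {
  document.cookie = buildGoodFriendCookie(endOf2026);
}
```

With the date from the example this yields the same string as the one-line snippet above.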
- The blocking message mentions the page wikitech:Beta/Blocked. No hint should be given there, in case AI or human crawler developers detect and follow that URL.
- mw: or WP:BETA might communicate technical details.
- The enWP technical village pump, or its counterpart at Commons, might give developers a small hint recommending that they look at this section.
- User talk pages might receive a note via mass mail, based on known BETA development accounts or bd808 requests.
- The bot maintainers are unknown and offer no feedback pages. Unless a human developer here tells them, they are unlikely to learn that this particular cookie must be provided.