
Add rate limit class for accounts that are in a local bot group on any wiki
Open, Needs Triage, Public

Description

We want to be able to allow higher rate limits for accounts that are in a local bot group on any wiki, even when accessing a different wiki. (@daniel or @JTweed-WMF may be able to add some context here).

In order to do so efficiently, without having to query local groups on every wiki, we need to sync this to a central place. There are two reasonable ways to do it:

a) $wgCentralAuthAutomaticGlobalGroups

(recently added for T376315)

  • In CommonSettings.php, configure $wgCentralAuthAutomaticGlobalGroups so that adding a user to a local bot group anywhere also adds them to the local-bot global group (group removal and expiry are also handled)
  • In the rate limit class code (WikimediaEventsHooks.php), add something like:
			elseif ( in_array( 'local-bot', $centralUser->getGlobalGroups() ) ) {
				$jwtData['rlc'] = 'local-bot';
			}

The local-bot group needs to be created as a normal user group first. It will be visible on pages like Special:GlobalGroupPermissions, Special:Log/gblrights and Special:GlobalUserRights, and it needs a localisable label.

We may need to write and run a maintenance script to initially populate the new group (it seems this was not needed for T376315, as all the groups there were new and initially empty?); afterwards it will be updated automatically.
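As a rough illustration, the CommonSettings.php configuration for option (a) might look something like the sketch below. This is an assumption only: the exact shape of $wgCentralAuthAutomaticGlobalGroups should be verified against the CentralAuth changes made for T376315.

```php
// Hypothetical sketch for CommonSettings.php. The key/value structure
// of $wgCentralAuthAutomaticGlobalGroups is assumed here and should be
// checked against the CentralAuth documentation for T376315.
$wgCentralAuthAutomaticGlobalGroups = [
	// Membership in the local 'bot' group on any wiki grants the
	// 'local-bot' global group; removal and expiry are synced too.
	'bot' => [ 'local-bot' ],
];
```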

b) Rely on caching in getLocalGroups()

(recently added for T410878)
Simply add something like:

			elseif ( in_array( 'bot', $centralUser->getLocalGroups() ) ) {
				$jwtData['rlc'] = 'local-bot';
			}

In many ways this is simpler to do; however, the fallback when the cached data is not available is very expensive (the worst case requires querying hundreds of wikis and can take several seconds, and it would happen in the middle of generating a JWT token).

Event Timeline

Note you may want to check for (any group with) the "bot" right instead of the "bot" group - some (but not all) wikis have other bot-related groups such as "botadmin" (which may be assigned instead of bot) or "flood" (though that is not intended to be assigned permanently).

That's a good point, but may not be possible, because we don't have a concept of "global permissions". Maybe it could be done by mapping a local permission to a global group. But wouldn't that be confusing?

			elseif ( in_array( 'local-bot', $centralUser->getGlobalGroups() ) ) {
				$jwtData['rlc'] = 'local-bot';
			}

I would prefer if we could decouple the names of mediawiki groups from the names of ratelimit classes. Also, hard-coding group names seems like a bad idea in general.

I was thinking we could have a configurable mapping, such as

$wgWikimediaGlobalGroupToRateLimitClass = [
  'local-bot' => 'approved-bot',
  'global-bot' => 'approved-bot',
];

That way, we wouldn't have to duplicate the rate limit config for approved-bot for each group, and again for bots approved using a different mechanism (e.g. WMCS account approval).

Instead of the if/then/else chain, we'd then have a loop like this:

foreach ( $mappings as $group => $rlc ) {
  if ( in_array( $group, $globalGroups ) ) {
	$jwtData['rlc'] = $rlc;
  }
}
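Putting the proposal together, the hook code could look roughly like the sketch below. The config variable name follows the suggestion above; the `$config` accessor, the surrounding context, and the precedence behavior (the last matching group in the mapping wins) are all assumptions, not a definitive implementation.

```php
// Illustrative sketch of applying the configurable mapping inside the
// JWT-generation code (e.g. WikimediaEventsHooks.php). Names follow the
// proposal in this discussion and are not an actual API.
$mappings = $config->get( 'WikimediaGlobalGroupToRateLimitClass' );
$globalGroups = $centralUser->getGlobalGroups();
foreach ( $mappings as $group => $rlc ) {
	if ( in_array( $group, $globalGroups, true ) ) {
		// With this loop, the last matching group in the config
		// wins, so order the mapping from lowest to highest priority.
		$jwtData['rlc'] = $rlc;
	}
}
```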

Agreed with @daniel, I think that abstraction is useful here, as it keeps the complexity of the authorisation logic inside MediaWiki. I don't see us needing a distinction between true "global" bots and "local" bots in terms of limiting, as the signal we're looking for is that a given user/application has undergone some form of human review and been deemed useful by community process.

At some point in the future we might want to approve e.g. specific OAuth apps for higher rate limits, without involving a bot account (which is more about how the account interacts with patrollers than how it interacts with infrastructure). So an extra level of abstraction is definitely the way to go.

CentralUser::getLocalGroups() is probably too slow to be called when refreshing JWT cookies (this happens during login but also on arbitrary pageviews when the cookie is about to expire), so I think option A is much better.

Change #1234537 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaMessages@master] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1234537

Change #1234538 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[operations/mediawiki-config@master] Configure rate limit class for local and global bots

https://gerrit.wikimedia.org/r/1234538

Change #1234539 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaEvents@master] Allow configuring rate limit classes for global groups

https://gerrit.wikimedia.org/r/1234539

Change #1234540 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] Add maintenance script to update automatic global group membership

https://gerrit.wikimedia.org/r/1234540

Deployment plan – please review:

  1. Document what the 'local-bot' group does, somewhere on Meta-Wiki (unless we can just link to this task?)
  2. Deploy the WikimediaMessages patch
  3. Create the 'local-bot' global group using https://meta.wikimedia.org/wiki/Special:GlobalGroupPermissions (grant it 'read' permission only)
  4. Deploy the operations/mediawiki-config patch
  5. Deploy the WikimediaEvents patch – this will start using the configured rate limit class ('approved-bot') instead of the previously hardcoded one ('global-bot'). It's not clear to me whether we're using these for anything already; if we do, we should keep using 'global-bot' even though it's inaccurate and change it later.
  6. Deploy the CentralAuth patch (this is just the maintenance script)
  7. Run the CentralAuth:UpdateAutomaticGlobalGroupMembership maintenance script on each wiki

I think this is the right order to avoid any temporary problems during the deployment.

It's worth confirming with @daniel, but I'm pretty sure that the answer to (5) is that nothing is relying on this and we should change it to approved-bot asap so that we can update the Envoy config to look for it.

Nothing in the API Gateway or REST Gateway is relying on it. And since the rlc field is very new, and not part of any documented interface, I would be very surprised if anything else relied on it. If it turns out that the change does break something, we can always change the config to emit global-bot again for now.

As discussed today, I sent a note to the Stewards asking for comments: https://meta.wikimedia.org/wiki/Stewards'_noticeboard#Planned_addition_of_a_global_user_group:_'local-bot' and proposed a message in the next Tech News: https://meta.wikimedia.org/wiki/Tech/News/2026/07.

I will create a documentation page at https://meta.wikimedia.org/wiki/Local_bots (this will probably be a redirect to a section on https://meta.wikimedia.org/wiki/Bot, unless someone suggests a better place).

What permissions, exactly, will this group contain? I assume "higher rate limits" means apihighlimits and not noratelimit. If it's just the former, then wouldn't apihighlimits-requestor suffice? Several bots have been granted that group already.

Plus, how often does a local bot query data from another wiki that is not Commons or Wikidata? The principle of least privilege applies here.

Finally, until T397224: Improve automatic assigning of IP viewer global group is fixed, I'm against using the auto-assign mechanism to do anything else. Let bot owners request the group manually at SRP or SRB if they want it.

> What permissions, exactly, will this group contain? I assume "higher rate limits" means apihighlimits and not noratelimit. If it's just the former, then wouldn't apihighlimits-requestor suffice? Several bots have been granted that group already.

No permissions at all. The rate limits in question will apply to "low-level" operations such as making web requests to our servers, which are handled outside of MediaWiki's permission system, and which are enforced by our web servers instead of MediaWiki.

These rate limits are aimed at crawlers overusing our resources (see https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/), and the goal is to exempt our good bots from running into these rate limits, not to give them any new permissions. We don't want the bot operators to have to ask for additional permissions before they can continue doing what they're already doing.

(At least one permission has to be assigned in order to create the group, and I was going to use the 'read' permission, which is already granted to all users.)

> Plus, how often does a local bot query data from another wiki that is not Commons or Wikidata? The principle of least privilege applies here.

Probably not very often, but the aforementioned web server configuration is shared across all wikis, so we need a central place to control the permissions. A global group seemed like a reliable and transparent way to do that.

> Finally, until T397224: Improve automatic assigning of IP viewer global group is fixed, I'm against using the auto-assign mechanism to do anything else. Let bot owners request the group manually at SRP or SRB if they want it.

I read that task and it seems to me that only one of the concerns applies to this new group: it won't be possible to override the automatic assignment. I don't really understand why that would be desirable; can you explain?

For what it's worth, it would be possible to forgo the automatic assignment and have bot operators ask for it manually, but this seemed to us to be an unnecessary nuisance for the operators and the stewards who would have to review such requests.

> I read that task and it seems to me that only one of the concerns applies to this new group: it won't be possible to override the automatic assignment. I don't really understand why that would be desirable; can you explain?

Personally, I feel the mechanism hasn't been battle-tested enough to be relied on. I know it's easy to say after the fact, but those problems could have been discovered early on with better tests, especially the no-op change and log summary ones.

Perhaps I'm in the minority. Regardless, here's one !vote against auto-assigning. Neutral regarding creating the global group.

As the operator of a bot affected by this, I'm also confused as to what exactly is being achieved here. Can you give concrete examples of requests that would be impacted if that right isn't there? My feeling is that most bots don't need it.

For context, Wikimedia sites (like pretty much all other websites) are suffering from increased web scraper traffic. Everyone is trying to build their own AI language model these days, and so there are tons of robots roaming the internet trying to collect all kinds of text for AI training, or for reselling to AI trainers. This is extra problematic for Wikimedia sites, which rely very heavily on caching; since scrapers exhibit different patterns of behavior from humans, their requests don't cache well. The more server resources are consumed by scrapers, the less is left for readers and the editor community. This has led to performance degradation, and in extreme cases to outages.

So the WMF is trying to build a system for automated detection and throttling of scraper traffic. You can differentiate between scrapers and browsers based on the characteristics of the web requests, but this doesn't always work for community bots, which use similar low-level tooling to scrapers. So we want to identify requests coming from known bots (defined as accounts with a bot right on at least one wiki) to avoid accidentally identifying them as scrapers and banning/rate limiting their traffic, or unnecessarily alerting sysadmins because of it.

For technical reasons I won't go into here, it's a lot harder to answer "is this user in a given group on *any* wiki?" than "is this user in a given global group?", so we are turning the harder problem into the easier one by syncing the local "bot" group with a global group created for this purpose. The global group won't be used in the normal way (to assign permissions); it's just a system property that's easier to access in software code.

> Personally, I feel the mechanism hasn't been battle-tested enough to be relied on.

You can't battle-test something without sending it into battle. You might want to avoid using a new feature for critical tasks until it gets more testing, but this aspect of scraper detection is (for a while at least) not critical (since there are many other aspects of it that are also still being tested), so that should be fine. If anything, it's good for testing the global group auto-assignment functionality before it gets used for something more sensitive.

@matmarex It's a good time to start drafting for Tech News; how should I word it?

Change #1234537 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1234537

Change #1236811 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaMessages@wmf/1.46.0-wmf.13] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1236811

Change #1236812 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaMessages@wmf/1.46.0-wmf.14] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1236812

Change #1236811 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@wmf/1.46.0-wmf.13] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1236811

Change #1236812 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@wmf/1.46.0-wmf.14] Add messages for 'local-bot' global group

https://gerrit.wikimedia.org/r/1236812

Mentioned in SAL (#wikimedia-operations) [2026-02-04T21:07:49Z] <dancy@deploy2002> Started scap sync-world: Backport for [[gerrit:1236811|Add messages for 'local-bot' global group (T415588)]], [[gerrit:1236812|Add messages for 'local-bot' global group (T415588)]]

Mentioned in SAL (#wikimedia-operations) [2026-02-04T21:34:13Z] <dancy@deploy2002> matmarex, dancy: Backport for [[gerrit:1236811|Add messages for 'local-bot' global group (T415588)]], [[gerrit:1236812|Add messages for 'local-bot' global group (T415588)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-02-04T21:47:49Z] <dancy@deploy2002> Finished scap sync-world: Backport for [[gerrit:1236811|Add messages for 'local-bot' global group (T415588)]], [[gerrit:1236812|Add messages for 'local-bot' global group (T415588)]] (duration: 40m 00s)

Regarding Tech News, I would prefer to say that this allows higher traffic on wikis other than the bot's home wiki. The current phrasing about not adding permissions is confusing about the effects, even though it is not about MediaWiki permissions.

I created the group: https://meta.wikimedia.org/wiki/Special:GlobalGroupPermissions/local-bot

And started the documentation page: https://meta.wikimedia.org/wiki/Local_bots (redirects to https://meta.wikimedia.org/wiki/Bot#Local_and_global_bots)

The group is not automatically populated yet, as I would like to resolve bugs T416541 and T416542 first (thanks to @NguoiDungKhongDinhDanh and @Johannnes89 for pointing out these problems). I hope that will be finished next week.

> Regarding Tech News, I would prefer to say that this allows higher traffic on wikis other than the bot's home wiki. The current phrasing about not adding permissions is confusing about the effects, even though it is not about MediaWiki permissions.

I tried a different phrasing: https://meta.wikimedia.org/w/index.php?diff=30031758 I hope this is better, but feel free to edit the draft further.

Change #1238075 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaCustomizations@master] Allow configuring rate limit classes for global groups

https://gerrit.wikimedia.org/r/1238075

Change #1238075 merged by jenkins-bot:

[mediawiki/extensions/WikimediaCustomizations@master] Allow configuring rate limit classes for global groups

https://gerrit.wikimedia.org/r/1238075

Change #1234539 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Move rate limit classes code to WikimediaCustomizations

https://gerrit.wikimedia.org/r/1234539

Change #1238432 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[operations/mediawiki-config@master] Configure rate limit class for local bots (and local-bot global group)

https://gerrit.wikimedia.org/r/1238432

Updated plan, since things got a bit more complex with the added dependencies, and a migration from WikimediaEvents to WikimediaCustomizations:

  1. (This week) Deploy "Configure rate limit class for global bots" to avoid changing existing behavior
  2. (Next week) Wait for all of the patches to roll out with the train
  3. Deploy "Configure rate limit class for local bots (and local-bot global group)"
  4. Run the CentralAuth:UpdateAutomaticGlobalGroupMembership maintenance script on each wiki

Change #1234538 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure rate limit class for global bots

https://gerrit.wikimedia.org/r/1234538

Mentioned in SAL (#wikimedia-operations) [2026-02-11T21:17:27Z] <cjming@deploy2002> Started scap sync-world: Backport for [[gerrit:1234538|Configure rate limit class for global bots (T415588)]], [[gerrit:1235499|Remove the wgGlobalWatchlistWikibaseSite variable values (T415440)]]

Mentioned in SAL (#wikimedia-operations) [2026-02-11T21:19:42Z] <cjming@deploy2002> cjming, matmarex, ikhitron: Backport for [[gerrit:1234538|Configure rate limit class for global bots (T415588)]], [[gerrit:1235499|Remove the wgGlobalWatchlistWikibaseSite variable values (T415440)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-02-11T21:25:11Z] <cjming@deploy2002> Finished scap sync-world: Backport for [[gerrit:1234538|Configure rate limit class for global bots (T415588)]], [[gerrit:1235499|Remove the wgGlobalWatchlistWikibaseSite variable values (T415440)]] (duration: 07m 43s)