API rate limits: define tiers for logged-in (browser) users
Closed, Resolved (Public)

Description

We need to define tiers of logged-in (browser) users, so we don't disrupt power users' work while still preventing scrapers from evading limits by simply creating accounts.

Context:
Power users often use Gadgets and other features that can, for a while, generate a large number of requests when the user performs certain tasks. If we enforce a rate limit of e.g. 5000 API requests per hour, that would only affect a small fraction of logged-in users (less than 0.1%), about 20 in absolute numbers for a given hour. However, these users are very active community members performing important tasks, and letting them hit an API rate limit will prevent them from working efficiently and may break the site for them in unpredictable and confusing ways.

Ideas and considerations:

  • temp accounts should have a separate user class and the same limit as anons
  • whether or not the client is a browser should not impact the rate limiting, but is useful for metrics. Do we need to duplicate all classes?
  • New users should be distinguished from established users that have undergone some level of community scrutiny (similar to autoconfirmed status).
  • the daily limit of an established user perhaps doesn't have to be higher than the effective daily limit of a regular user or simple logged-in bot. But their hourly limit needs to be higher.
  • The global edit count (perhaps together with account age and confirmed email) could be a simple and good-enough signal for considering a user "established" (no longer new). This would be the global equivalent of newbie vs autoconfirmed.
  • Do we need a "power user" tier beyond new user and established user? Based on wmfGetGlobalgroups? That's slow, but has the right signal.
  • We could detect if the user is blocked anywhere, but temporary blocks happen quite frequently for operational reasons. Probably not a good signal.

Event Timeline

My current thinking is that we want two tiers:

  • new user: temp account or low global edit count (<1000) or recent account creation (<7 days). Maybe also if there is no email set. Use the same limits as for anon (or slightly higher).
  • confirmed user: everyone else. Use very high hourly limits (100k?) but medium daily limits (240k, 10k per hour on average). The restrictive daily limit should be fine for organic human traffic but deter bots and scrapers.
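The interplay of the two windows can be sketched as follows. This is only an illustration of the reasoning, not the gateway's implementation: the `Limits` type and both helper functions are hypothetical, and the numbers are the ones floated above for the confirmed tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    per_hour: int  # burst budget within any one hour
    per_day: int   # total budget within one day


# Figures floated in the comment above for the "confirmed user" tier.
CONFIRMED = Limits(per_hour=100_000, per_day=240_000)


def is_allowed(limits: Limits, used_this_hour: int, used_today: int) -> bool:
    """A request passes only if both windows still have budget left."""
    return used_this_hour < limits.per_hour and used_today < limits.per_day


def hours_until_daily_cap(limits: Limits, requests_per_hour: int) -> float:
    """Hours of sustained traffic before the daily budget is used up."""
    effective_rate = min(requests_per_hour, limits.per_hour)
    return limits.per_day / effective_rate


# A short burst is fine: 90k requests in the first hour of the day.
print(is_allowed(CONFIRMED, used_this_hour=90_000, used_today=90_000))
# Sustained traffic at the hourly maximum uses up the daily budget in 2.4 hours.
print(hours_until_daily_cap(CONFIRMED, 100_000))
# Only at 10k per hour (the stated average) does the budget last a full day.
print(hours_until_daily_cap(CONFIRMED, 10_000))
```

The high hourly cap tolerates organic bursts, while the daily cap is what constrains a client that keeps up the same rate around the clock.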

The new user classification would take precedence over other classifications (WMCS, known client, etc). The confirmed user would be overruled by other classifications.

Ideally, we would allow more complex rules for combining classes - some groups should be applied only if they grant better access, other classes should be applied to restrict access. That's however tricky since the classification code doesn't have access to the actual limits. That could be changed, but would introduce quite a bit of complexity.
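A fixed precedence order like the one described above could look roughly like this. The class labels and the `resolve_class` helper are hypothetical; the point is only that this approach never needs to know the actual limits, unlike "apply only if it grants better access" rules.

```python
from typing import Sequence

# Hypothetical labels standing in for the two tiers discussed above.
NEW_USER = "new-user"
CONFIRMED_USER = "confirmed-user"


def resolve_class(is_new_user: bool, other_matches: Sequence[str]) -> str:
    """Pick one rate limit class from the classifications that matched.

    other_matches holds the non-tier classifications that matched
    (e.g. "wmcs", "known-client"), in the classifier's preferred order.
    """
    if is_new_user:
        # Restrictive tier: wins even if a more generous class also matched.
        return NEW_USER
    if other_matches:
        # Any other classification overrules the generic confirmed tier.
        return other_matches[0]
    return CONFIRMED_USER


print(resolve_class(True, ["wmcs"]))
print(resolve_class(False, ["wmcs"]))
print(resolve_class(False, []))
```

Comparing classes by how generous they are would require passing the limit values into the classifier, which is the extra complexity mentioned above.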

New users should be distinguished from established users that have undergone some level of community scrutiny (similar to autoconfirmed status).

extendedconfirmed, rollback, autopatrolled, sysop, … should be a good signal that the user has gone through some level of community checks. Also, WMF Annual plan 2025-2026 has done the hard work of defining extended rights, which can be a good starting point.

Do we need a "power user" tier beyond new user and established user? Based on wmfGetGlobalgroups? That's slow, but has the right signal.

At the very least, global-rollback, global-sysop, staff, stewards would benefit from even more extended limits.

Change #1258190 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/WikimediaCustomizations@master] Introduce WMCRateLimitClassConditions

https://gerrit.wikimedia.org/r/1258190

Current plan: introduce a mechanism that assigns the rate limit class based on conditions similar to what we support for $wgAutopromote. As a starting point, support global group membership, global edit count, account age, and email confirmed status. Support for more conditions can be added later.

That would cover global groups. For users who have extended rights on some wiki, we'll have to decide if we want to grant automatic global groups, e.g. "sysop-somewhere".

So, we want to:

  • replace authed-browser with unprivileged-user (browser only fallback).
  • introduce autoconfirmed-user for clients with edit_count >= 1000 and account_age >= 7 days (browser only? that would be tricky)
  • introduce extended-rights-user for clients with global-rollback, global-sysop, staff, stewards global groups (browser or not)
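As a rough sketch of condition-based classification in the spirit of $wgAutopromote, the plan above could be expressed as an ordered rule list where the first match wins. This is not the code from the patch: the `Client` type, `RULES`, and `classify` are hypothetical, the group names are copied from the list above, and the open "browser only?" question is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Client:
    global_edit_count: int
    account_age_days: float
    global_groups: FrozenSet[str] = frozenset()


# Global groups named in the plan above.
EXTENDED_GROUPS = frozenset({"global-rollback", "global-sysop", "staff", "stewards"})

# Ordered rules: the first condition that holds decides the class.
RULES: List[Tuple[str, Callable[[Client], bool]]] = [
    ("extended-rights-user", lambda c: bool(c.global_groups & EXTENDED_GROUPS)),
    (
        "autoconfirmed-user",
        lambda c: c.global_edit_count >= 1000 and c.account_age_days >= 7,
    ),
]

FALLBACK = "unprivileged-user"


def classify(client: Client) -> str:
    """Return the rate limit class for a logged-in client."""
    for name, condition in RULES:
        if condition(client):
            return name
    return FALLBACK


print(classify(Client(global_edit_count=0, account_age_days=1)))
print(classify(Client(global_edit_count=2500, account_age_days=400)))
print(classify(Client(0, 1, frozenset({"staff"}))))
```

Keeping the conditions as data makes it easy to add further ones later, as the plan anticipates.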

Change #1260724 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/WikimediaCustomizations@master] Set rate limit class based on edit count and account age

https://gerrit.wikimedia.org/r/1260724

Change #1258190 abandoned by Daniel Kinzler:

[mediawiki/extensions/WikimediaCustomizations@master] Introduce WMCRateLimitClassConditions

Reason:

I0c63af11de3aa2ac58a21b461e0ea0b0708c7028

https://gerrit.wikimedia.org/r/1258190

Change #1260774 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[operations/deployment-charts@master] rest-gateway: add values for auth-newuser rate limiting class for feature patch

https://gerrit.wikimedia.org/r/1260774

Change #1260763 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[operations/deployment-charts@master] rest-gateway: Refactor request classification for readability

https://gerrit.wikimedia.org/r/1260763

Updated proposal:

  • authed-user: default for (new) logged-in users, like authed-browser.
  • autoconfirmed-user: for clients with edit_count >= 1000 and account_age >= 7 days
  • highlimits-user: for clients with global-rollback, global-sysop, staff, stewards; also for apihighlimits-requestor - that can be assigned on request, it doesn't provide any other rights.
  • approved-bot: (already exists) for clients in the global-bot and local-bot groups.
  • autoconfirmed-user: for clients with edit_count >= 1000 and account_age >= 7 days

That's really confusing; "autoconfirmed" typically requires 0-10 edits, maybe 50 on a few wikis. Suggest naming it something else, perhaps mediumlimits-user or auto-mediumlimits-user.

Also, 1000 is really high. How many scrapers, realistically, are going to create an account and make (say) 50 edits just to continue scraping, when it's far easier just to rotate IPs? I suspect they can be dealt with manually. If I'm wrong, the limits aren't set in stone.

It seems better to start with an easy requirement and make it more strict iff needed, rather than start with a strict requirement and hope you aren't disrupting the community of good-faith users. That's what we do with edit filters. I assume the goal is to reduce the impact of scrapers, not stop every last one. This isn't "hard" security. If a few slip through, no one's privacy is violated, no database is corrupted, no accounts are compromised, etc.

  • highlimits-user: for clients with global-rollback, global-sysop, staff, stewards; also for apihighlimits-requestor - that can be assigned on request, it doesn't provide any other rights.

This should also apply to people with local apihighlimits rights, e.g. sysop. If a local admin is making "too many" requests, your first assumption should be that they have a very good reason. If it turns out that, no, some badly designed script is stuck in an infinite loop, that can, again, be dealt with manually (e.g. by fixing the script, or asking someone to).

  • autoconfirmed-user: for clients with edit_count >= 1000 and account_age >= 7 days

That's really confusing; "autoconfirmed" typically requires 0-10 edits, maybe 50 on a few wikis. Suggest naming it something else, perhaps mediumlimits-user or auto-mediumlimits-user.

Right, I confused it with the extendedconfirmed limits (500 edits in most places). I was trying to pick something that is familiar and meaningful to users, but you are right that this isn't a great choice.

Also, 1000 is really high. How many scrapers, realistically, are going to create an account and make (say) 50 edits just to continue scraping, when it's far easier just to rotate IPs?

If you only have to do it once, making 50 edits (or 1000) is much easier and cheaper than constantly rotating IPs. We are not talking about big tech, we are talking about startups mostly.

1000 is really high for a single wiki, but the count is global, so it can be just one edit per wiki. That is easy to automate, and still so low you'd likely fly under the radar. So from that perspective, it's too low to ensure human scrutiny...

It seems better to start with an easy requirement and make it more strict iff needed, rather than start with a strict requirement and hope you aren't disrupting the community of good-faith users. That's what we do with edit filters. I assume the goal is to reduce the impact of scrapers, not stop every last one. This isn't "hard" security. If a few slip through, no one's privacy is violated, no database is corrupted, no accounts are compromised, etc.

Yes, but for edits, there is another level of oversight (regular edit patrolling). We don't have that for API load. We only notice when Things Go Bad, and then we can go and investigate. There is no early warning. That is perhaps something we could try to build, but it's unfortunately not trivial.

I think deciding on the threshold has to be done relative to the limits you get before you reach that threshold. The limits we are currently considering would affect 0.1% of all users. We are planning to deploy code next week that will allow us to see what percentage of new users (below the threshold) it would affect - my guess is that this will be a lot lower still, probably less than 0.01%, maybe even 0.001%. Basically, we have to decide what percentage of affected new users is acceptable. That number should be very low, but it cannot be 0 (false positives should ideally be 0, but not all positives are false). We can then tweak the two numbers (the limit and the threshold) accordingly.
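To make the trade-off between the two numbers concrete, here is a minimal sketch of the kind of calculation involved. The `affected_share` helper and the sample data are made up for illustration; real figures would come from the metrics mentioned above.

```python
from typing import Iterable, Tuple


def affected_share(
    users: Iterable[Tuple[int, int]], hourly_limit: int, edit_threshold: int
) -> float:
    """Share of new users (below the edit threshold) who exceed the limit.

    users yields (global_edit_count, peak_requests_per_hour) pairs.
    """
    new_users = [(edits, peak) for edits, peak in users if edits < edit_threshold]
    if not new_users:
        return 0.0
    hit = sum(1 for _, peak in new_users if peak > hourly_limit)
    return hit / len(new_users)


# Entirely made-up sample: (global edit count, peak requests per hour).
sample = [
    (0, 120),
    (3, 6000),
    (15, 40),
    (250, 900),
    (999, 5200),
    (1200, 80000),
    (50000, 20000),
]

# Vary the threshold and the limit to see how the affected share moves.
print(affected_share(sample, hourly_limit=5000, edit_threshold=1000))
print(affected_share(sample, hourly_limit=5000, edit_threshold=50))
print(affected_share(sample, hourly_limit=10000, edit_threshold=1000))
```

Lowering the threshold moves users out of the restricted tier, while raising the limit reduces how many of the remaining new users are affected.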

The elevated (medium) limit you'd get after meeting the threshold is so high that I have not seen it triggered once by a logged-in browser user. Even so, highlimits-user would be allowed to go beyond that.

  • highlimits-user: for clients with global-rollback, global-sysop, staff, stewards; also for apihighlimits-requestor - that can be assigned on request, it doesn't provide any other rights.

This should also apply to people with local apihighlimits rights, e.g. sysop. If a local admin is making "too many" requests, your first assumption should be that they have a very good reason. If it turns out that, no, some badly designed script is stuck in an infinite loop, that can, again, be dealt with manually (e.g. by fixing the script, or asking someone to).

That would be ideal, but unfortunately rather complicated, both technically and conceptually. There is no efficient way to check whether a user has a right somewhere; the system was not designed for that. Also, some groups only exist on certain wikis, and certain rights are more or less impactful on some projects, making it hard to interpret them in a global context.

We are trying to strike a balance between effectiveness and simplicity. We have two mechanisms available:

  1. global groups, plus the (somewhat hackish) ability to automatically assign a global group when a user is added to a local group. We do that for the "bot" group, which implies the global "local-bot" group. We could do the same for sysop if needed. Other privileged groups are tricky because they are not universal across wikis.
  2. global conditions, like global edit count and account age. We are assuming that in the vast majority of cases, well-trusted users will have more than 1000 edits globally.
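The first mechanism can be pictured as a small mapping from local groups to implied global groups. The sketch below is an assumption-laden illustration: `IMPLIED_GLOBAL_GROUPS` and `effective_global_groups` are hypothetical names, the "sysop" entry is only an idea raised earlier in this thread, and the real mechanism assigns the global group when the user is added to the local group rather than scanning wikis on demand.

```python
from typing import Dict, Iterable, Set

# Local group -> global group granted automatically.
# "bot" -> "local-bot" is the existing case described above;
# "sysop" -> "sysop-somewhere" is only a possibility, not current behaviour.
IMPLIED_GLOBAL_GROUPS: Dict[str, str] = {
    "bot": "local-bot",
    "sysop": "sysop-somewhere",
}


def effective_global_groups(
    explicit: Iterable[str], local_groups_by_wiki: Dict[str, Iterable[str]]
) -> Set[str]:
    """Combine explicit global groups with those implied by local groups."""
    groups = set(explicit)
    for local_groups in local_groups_by_wiki.values():
        for group in local_groups:
            implied = IMPLIED_GLOBAL_GROUPS.get(group)
            if implied is not None:
                groups.add(implied)
    return groups


print(sorted(effective_global_groups(
    ["apihighlimits-requestor"],
    {"dewiki": ["bot"], "enwiki": ["rollbacker"]},
)))
```

Groups without an entry in the mapping (like "rollbacker" here) contribute nothing, which mirrors the point that non-universal local groups are hard to interpret globally.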

While it would be nice to support "has the apihighlimits right somewhere" as a criterion, the complexity of making that happen doesn't seem justified. The metrics tell us that it would just be super rare for someone to a) have that right somewhere b) not pass the threshold and c) have a need to make so many API requests. And when that happens, "ask a steward to add you to the global apihighlimits-requestor group" seems a reasonable approach.

It actually seems much more likely that people who don't have a bunch of edits and need to make so many requests also don't have apihighlimits on any wiki - that's the typical "university research project" case. So we need the "ask WMF or the community for super high limits" mechanism anyway.

Change #1260763 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: Refactor request classification for readability

https://gerrit.wikimedia.org/r/1260763

Change #1260774 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: add values for new rate limiting classes

https://gerrit.wikimedia.org/r/1260774

Change #1266237 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/deployment-charts@master] rest gateway: define authed-user class

https://gerrit.wikimedia.org/r/1266237

Change #1266237 merged by jenkins-bot:

[operations/deployment-charts@master] rest gateway: define authed-user class

https://gerrit.wikimedia.org/r/1266237

Change #1260724 merged by jenkins-bot:

[mediawiki/extensions/WikimediaCustomizations@master] Set rate limit class based on edit count and account age

https://gerrit.wikimedia.org/r/1260724

Change #1270765 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] API rate limits: add highlimits-user class

https://gerrit.wikimedia.org/r/1270765

Change #1270765 merged by jenkins-bot:

[operations/mediawiki-config@master] API rate limits: add highlimits-user class

https://gerrit.wikimedia.org/r/1270765

Mentioned in SAL (#wikimedia-operations) [2026-04-16T14:59:21Z] <daniel@deploy1003> Started scap sync-world: Backport for [[gerrit:1270765|API rate limits: add highlimits-user class (T419796)]]

Mentioned in SAL (#wikimedia-operations) [2026-04-16T15:01:44Z] <daniel@deploy1003> daniel: Backport for [[gerrit:1270765|API rate limits: add highlimits-user class (T419796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-04-16T15:10:08Z] <daniel@deploy1003> Finished scap sync-world: Backport for [[gerrit:1270765|API rate limits: add highlimits-user class (T419796)]] (duration: 10m 47s)

Change #1275410 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] api rate limits: use global apihighlimits-requestor group.

https://gerrit.wikimedia.org/r/1275410

Change #1276363 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/deployment-charts@master] redioscope: add more histogram buckets

https://gerrit.wikimedia.org/r/1276363

Change #1275410 merged by jenkins-bot:

[operations/mediawiki-config@master] api rate limits: use global apihighlimits-requestor group.

https://gerrit.wikimedia.org/r/1275410

Mentioned in SAL (#wikimedia-operations) [2026-04-23T10:08:44Z] <daniel@deploy1003> Started scap sync-world: Backport for [[gerrit:1275410|api rate limits: use global apihighlimits-requestor group. (T419796)]]

Mentioned in SAL (#wikimedia-operations) [2026-04-23T10:10:23Z] <daniel@deploy1003> daniel: Backport for [[gerrit:1275410|api rate limits: use global apihighlimits-requestor group. (T419796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Change #1276363 merged by jenkins-bot:

[operations/deployment-charts@master] redioscope: add more histogram buckets

https://gerrit.wikimedia.org/r/1276363

Mentioned in SAL (#wikimedia-operations) [2026-04-23T10:16:21Z] <daniel@deploy1003> Finished scap sync-world: Backport for [[gerrit:1275410|api rate limits: use global apihighlimits-requestor group. (T419796)]] (duration: 07m 37s)