
Decide what the rate limit should be for temporary account creations
Closed, ResolvedPublic

Description

Background

The number of accounts created from the same IP address is rate limited, using the $wgAccountCreationThrottle config.
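For reference, a minimal sketch of how this throttle might be set in LocalSettings.php. The figure of 6 creations per IP per day is the value cited later in this task; the fragment is illustrative and is not copied from the production configuration.

```php
// Illustrative LocalSettings.php fragment (assumed values, not the production config).
// Allow at most 6 account creations per IP address per day.
$wgAccountCreationThrottle = [
	[ 'count' => 6, 'seconds' => 86400 ],
];
```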

The number of temporary accounts created is not currently rate limited. This task is to determine whether we should apply a rate limit, and if so what that limit should be.

Why we might want rate limiting

It can be relatively easy to block cookies for a given website, in which case each edit is assigned to a new temporary account, even within a browser session.

Q: How much do we expect this to happen, and how much would this need to happen in order to create database storage problems or difficulties for patrollers?
A:

  • KH: we should take the worst case scenario as the guideline for this, which is scripted temporary account creation, and that is currently bounded by the limit of 8 edits per minute per IP. We can work out a more constricted rate limit in T357771: Analyze how many distinct devices edit per day from a given IP address
  • KH: we can also consider a tiered system for the rate limits; we have a hard limit that denies the edit/temp account creation entirely after it's been tripped, and a soft limit that prompts for CAPTCHA completion. That would allow for limiting scripted abuse while still allowing good faith users from a particular IP to make an initial edit.
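The tiered idea above could be sketched roughly as follows. The function name, the thresholds, and the returned strings are all hypothetical and are not part of MediaWiki; the sketch only shows how a soft limit (CAPTCHA) and a hard limit (deny) would interact.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper, not a MediaWiki API: decide what to do with an edit
 * that would create a temporary account, given how many such creations the
 * IP has already made in the current window.
 */
function tempAccountCreationOutcome( int $recentCreations, int $softLimit, int $hardLimit ): string {
	if ( $softLimit > $hardLimit ) {
		throw new InvalidArgumentException( 'Soft limit must not exceed hard limit' );
	}
	if ( $recentCreations >= $hardLimit ) {
		// Hard limit tripped: deny the edit and the temp account creation.
		return 'deny';
	}
	if ( $recentCreations >= $softLimit ) {
		// Soft limit tripped: allow, but only after a CAPTCHA is completed.
		return 'captcha';
	}
	return 'allow';
}

// Assumed thresholds: CAPTCHA from the 3rd creation, deny from the 7th.
echo tempAccountCreationOutcome( 0, 2, 6 ), "\n"; // allow
echo tempAccountCreationOutcome( 2, 2, 6 ), "\n"; // captcha
echo tempAccountCreationOutcome( 6, 2, 6 ), "\n"; // deny
```

With this shape, a good-faith user on a busy IP is slowed down by a CAPTCHA before being refused outright.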

Why we might not want rate limiting

Some IPs are shared by a large number of people, e.g. covering a large geographical area. Rate limiting could significantly harm the ability of people using these IPs to edit.

Q: If we decide to implement rate limiting, how can we monitor to what extent this is happening?
A: T357763: [Epic] Create a temporary accounts initiative Grafana dashboard would provide insight into this

Other considerations

Q: Is the rate limit on anon edits still applied correctly with temp account creation enabled?
A: Yes, $wgRateLimits['edit']['ip'] applies to edits that result in temp account creation.

Event Timeline

@Madalina In our roadmap, this is assigned to Product for Nov/Dec 2023. Did @Niharika share any updates with you?

> It can be relatively easy to block cookies for a given website, in which case each edit is assigned to a new temporary account, even within a browser session.

If we don't rate limit temporary account creations per IP address, then we run the risk of a malicious actor using a script to rapidly edit pages. Each edit would create a new temporary account.

I would propose that we start out with some limits per IP address, and we could scale this up or down as needed.

It's true that users who are behind an IP address that serves thousands of people could be negatively impacted. This is somewhat mitigated in that those users would be able to create an account and edit.

Alternatively, we could think about showing a CAPTCHA to gate temporary account creation, but that brings other issues.

That should be fairly easy to do since we already show a captcha sometimes during editing. (Not sure if there are other actions which should create a temp account but don't support captcha? Flow?) It would make temp account creation almost as hard as normal account creation (where the captcha is the most disruptive step, given how easy password management is in today's browsers) but then the only alternative would be to send the user to normal account creation, so probably still a win?

That said,

> If we don't rate limit temporary account creations per IP address, then we run the risk of a malicious actor using a script to rapidly edit pages. Each edit would create a new temporary account.

is that actually a problem? I mean, a bigger problem than someone making lots of scripted edits (which is not harder today than it would be with temp accounts)? The DB cost of temp accounts isn't huge.

I don't know. It's not just DB cost, but also all the hooks that are invoked and other knock-on effects of creating an account.

For scripted edits, I assume (maybe wrongly) at some point we start showing a CAPTCHA? At some point, scripted edits would hit a rate limit for an IP address. But if an attacker wants to create a few thousand temp accounts, and store the cookies for later use, they'll be able to use those thousands of temp accounts multiplied by whatever rate limit is set for temp user edits per hour to generate a lot of vandalism. If those scripted edits are then not all made from the same IP address, they'll be difficult to clean up.

> For scripted edits, I assume (maybe wrongly) at some point we start showing a CAPTCHA? At some point, scripted edits would hit a rate limit for an IP address.

I don't think we throttle edits via captcha; we show it when someone tries to add a URL. We do use per-IP throttling. But any defenses we have against mass edits apply just as well to mass edits which would create temp users. And from the other direction, preventing temp account creation means preventing the edit, since there is not much else we could do with it. So I don't think there's a meaningful difference between an anonymous edit throttle (doable via $wgRateLimits['edit']) and a temp account creation throttle, unless there is some different, less conspicuous way of creating temp accounts.

> If those scripted edits are then not all made from the same IP address, they'll be difficult to clean up.

Not very different from making the original temp-account-creating edits from different IP addresses, I think?

> So I don't think there's a meaningful difference between an anonymous edit throttle (doable via $wgRateLimits['edit']) and a temp account creation throttle, unless there is some different, less conspicuous way of creating temp accounts.

In theory temp accounts could be created from any action listed in $wgAutoCreateTempUser['actions'], though we only currently support edit. We're investigating whether temp accounts may need to be created on actions other than edit (i.e. which workflows try to create an IP actor) in T349219.

I suppose one would hope that any action that an anon user can do is already rate limited. Though the actual rate limit on temp account creations would be roughly the sum of the rate limits for all the different $wgAutoCreateTempUser['actions'], so if the list of actions is huge or one of the actions has a very high rate limit then it would be worth rate limiting temp account creations separately.

That's probably an unlikely scenario for production, but perhaps there's enough uncertainty here that it's worth supporting temp account creation rate limits.
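The "sum of the rate limits" argument can be made concrete with a small sketch. The helper below is hypothetical, and so are the 'upload' action and its limit; only the 8 edits per 60 seconds figure comes from this discussion.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper: upper bound on temp account creations per minute from
 * one IP when each listed action has its own [ count, seconds ] rate limit.
 */
function maxTempCreationsPerMinute( array $actions, array $limitsByAction ): float {
	$total = 0.0;
	foreach ( $actions as $action ) {
		if ( !isset( $limitsByAction[$action] ) ) {
			// An action with no limit makes the bound unlimited.
			return INF;
		}
		[ $count, $seconds ] = $limitsByAction[$action];
		// Normalise each limit to a per-minute rate before summing.
		$total += $count * 60 / $seconds;
	}
	return $total;
}

// 'edit' at 8 per 60s is from this discussion; 'upload' and its limit are made up.
$limits = [ 'edit' => [ 8, 60 ], 'upload' => [ 4, 120 ] ];
echo maxTempCreationsPerMinute( [ 'edit' ], $limits ), "\n";           // 8
echo maxTempCreationsPerMinute( [ 'edit', 'upload' ], $limits ), "\n"; // 10
```

As long as 'edit' is the only supported action, the bound is simply the edit rate limit, which matches the point made earlier in the thread.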

My concern here is that if we don't apply a rate limit it would be very easy for someone to get a new temporary account username per edit.

This is an issue because it makes it hard to see a pattern of abuse for users without the ability to see temporary account IPs. One temporary account that does lots of vandalism even after warnings is easy to block, but multiple temporary accounts performing this vandalism could be seen as separate people and then not appropriately blocked.

For example:

  • With rate limiting, the user uses one (or a few) temporary account usernames to make one vandalism edit to pages A, B, C, D, E, F, G, H, .... The small number of usernames that were used makes it easier to link them together and block them.
  • With a rate limit equal to the rate limit for editing, a new temporary account could be created per edit (by clearing the session). None of the accounts would then have more than one edit, so on their own they are not blockable because no pattern of abuse is visible. Because there is no overlap, it is also difficult to link the temporary accounts together and treat them as one user for the purposes of blocking. I would argue that the community may not look up the IP for a one-edit temporary account that could just be warned, and may only look up the IP (and find the other temporary account usernames) once one of the temporary accounts shows a pattern of abuse. As such, IP access is not a way around this. Furthermore, only a subset of the users who report vandalism will be able to see temporary account IP addresses.
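A separate, lower limit on temp account creations per IP could be sketched as a simple fixed-window counter. The class below is hypothetical, keeps its state in memory only, and assumes PHP 8.0 or later; it is not how MediaWiki implements throttling.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical in-memory fixed-window limiter keyed by IP address.
 * A real deployment would need shared storage across web servers.
 */
final class TempAccountCreationLimiter {
	/** @var array<string, array{start:int, count:int}> */
	private array $windows = [];

	public function __construct( private int $limit, private int $windowSeconds ) {
	}

	/** Record an attempt at time $now; return true if it is within the limit. */
	public function attempt( string $ip, int $now ): bool {
		$window = $this->windows[$ip] ?? null;
		if ( $window === null || $now - $window['start'] >= $this->windowSeconds ) {
			// Start a fresh window for this IP.
			$window = [ 'start' => $now, 'count' => 0 ];
		}
		if ( $window['count'] >= $this->limit ) {
			$this->windows[$ip] = $window;
			return false;
		}
		$window['count']++;
		$this->windows[$ip] = $window;
		return true;
	}
}

// Assumed limit: 6 temp account creations per IP per day.
$limiter = new TempAccountCreationLimiter( 6, 86400 );
for ( $i = 1; $i <= 7; $i++ ) {
	echo $i, ': ', $limiter->attempt( '192.0.2.1', 1000 + $i ) ? 'ok' : 'throttled', "\n";
}
```

In this run the first six attempts print "ok" and the seventh prints "throttled", so a user who clears their session on every edit would be held to a handful of usernames per window.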

FWIW, currently we have an 8/min throttle on anonymous edits (a bit higher on Commons) and 6/day on regular account creations. What would be the new settings? You can create a temp account without solving a captcha, so presumably we'd still want to limit all temp accounts using the same IP to max 8 edits per min (even if technically they aren't "anonymous" anymore), right?

Note: there are multiple 8/min throttles (see https://www.mediawiki.org/wiki/Manual:$wgRateLimits). If the user does not have the noratelimit user right and is not using an IP listed in $wgRateLimitsExcludedIPs:

  • all users (registered or not) without the autoconfirmed user right on a single IP are limited to 8 edits/minute across all WMF wikis (the ip part)
  • each registered user without the autoconfirmed user right (from any IP) is limited to 8 edits/minute on one site (the newbie part)

On Commons:

  • users (registered or not) without the autoconfirmed user right cannot edit if there have been 120 non-autoconfirmed edits across all WMF wikis from their IP in the last 5 minutes
  • a user without the autoconfirmed user right cannot edit if that user (from any IP) has made 120 edits on Commons in the last 5 minutes
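Expressed as configuration, the throttles described above would look roughly like this. The numbers are the ones quoted in this comment; the fragment is a sketch and has not been checked against the production settings.

```php
// Illustrative fragment only; values are taken from the figures quoted above.
// Format: [ number of actions, period in seconds ].
$wgRateLimits['edit']['ip'] = [ 8, 60 ];      // per IP, for users without autoconfirmed
$wgRateLimits['edit']['newbie'] = [ 8, 60 ];  // per registered user without autoconfirmed

// Commons, per the comment above: 120 edits per 5 minutes.
// $wgRateLimits['edit']['ip'] = [ 120, 300 ];
// $wgRateLimits['edit']['newbie'] = [ 120, 300 ];
```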

> You can create a temp account without solving a captcha

At the moment, yes, but I'd like to consider requiring a CAPTCHA at least for the first edit that would result in temp account creation.

One issue with the implementation in the patch for T342770 (Can't edit any page via visual editor while not logged into an account or a temporary account) is that getUserForPermissions returns a temp user placeholder name (*Unregistered *) for rate limit checks:

ApiEditPage.php
private function getUserForPermissions() {
	$user = $this->getUser();
	if ( $this->tempUserCreator->shouldAutoCreate( $user, 'edit' ) ) {
		return $this->userFactory->newUnsavedTempUser(
			$this->tempUserCreator->getStashedName( $this->getRequest()->getSession() )
		);
	}
	return $user;
}

Which means that multiple attempts for account creation from a single IP will be checked against the rate limits for *Unregistered *, but we probably want to first check the rate limit for editing by IP address.

Summarizing a discussion @kostajh and I just had about this:

  • Example log seen when the rate limit is tripped in this way:
ratelimit.INFO: User::pingLimiter: User tripped rate limit {"action":"edit","limit":8,"period":60,"count":8,"key":"ip","name":"*Unregistered *","ip":"127.0.0.1"}
  • The rate limit that's being tripped is $wgRateLimits['edit']['ip'], which is the limit for all IP/newbie/temp users editing from the same IP address - so that's working as it should be
  • Seeing *Unregistered * in the logs is a little confusing
  • Since T336187#9341749, we expect permissions checks to be performed against * rather than a placeholder temp account, so we may not need to do this any more. Filed as T355210

kostajh updated the task description.
kostajh renamed this task from "Decide whether temporary account creations should be rate limited" to "Decide whether temporary account creations should have more restrictive rate limit than default IP edit rate limit". Feb 16 2024, 2:17 PM
kostajh renamed this task from "Decide whether temporary account creations should have more restrictive rate limit than default IP edit rate limit" to "Decide what the rate limit should be for temporary account creations".

My proposal is:

My 2c that you can easily ignore (I think you mentioned it): I really like this idea: First edit that would trigger a new temp account should require a captcha, the subsequent edits by the same temp account shouldn't (unless the usual case of adding external links which is automatically enforced by ConfirmEdit extension). It would put up a decent-ish barrier against large-scale abuse.

> First edit that would trigger a new temp account should require a captcha,

Thanks. That is covered in T357778: Provide ability to require logged-out users to complete a CAPTCHA on temporary account creations in certain circumstances

> the subsequent edits by the same temp account shouldn't (unless the usual case of adding external links which is automatically enforced by ConfirmEdit extension)

There's a proposal in T357779: Provide ability to require temporary account users to complete a CAPTCHA in certain circumstances to sometimes require temp account users to fill out a CAPTCHA on subsequent edits after creating an account.

kostajh added a subscriber: jwang.

https://phabricator.wikimedia.org/T357771#9648033 found that p99 values are 3 per IP per day and p75 is 2 per IP per day. The current value of 6 account creations per IP per day (controlled via $wgAccountCreationThrottle) is probably fine, so we can resolve this task. This is something we could monitor as part of rollout. (cc @Madalina @Niharika @jwang to think about in health metrics for rollout.)
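During rollout, a check along these lines could compare a proposed limit against observed per-IP daily counts. Both helpers are hypothetical and the sample array is made up for illustration; it is not the data from T357771.

```php
<?php
declare( strict_types=1 );

/** Nearest-rank percentile of a non-empty list of integers. */
function percentile( array $values, float $p ): int {
	sort( $values );
	$rank = (int)ceil( $p / 100 * count( $values ) );
	return $values[ max( $rank, 1 ) - 1 ];
}

/** Share of IPs whose daily count exceeds the limit. */
function shareOverLimit( array $values, int $limit ): float {
	$over = count( array_filter( $values, static fn ( int $v ): bool => $v > $limit ) );
	return $over / count( $values );
}

// Made-up sample of distinct-device counts per IP per day; not data from T357771.
$sample = [ 1, 1, 1, 1, 1, 2, 2, 2, 3, 9 ];
echo percentile( $sample, 75 ), "\n";    // 2
echo shareOverLimit( $sample, 6 ), "\n"; // 0.1
```

A rising share of IPs over the limit would be a signal to revisit the value of 6 per day.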