Page MenuHomePhabricator

Add rate limit for creating Item IDs
Closed, ResolvedPublic8 Estimated Story Points

Description

Problem:
We want to prevent people from allocating a large number of unassigned Item IDs because we do not want to skip too many Item IDs.

If users try to generate too many IDs – regardless of what exactly is causing them to generate so many IDs – they should get a rate limit error, and no further IDs should be generated until the rate limit expires. The rate limit should be the same as the existing limit for page creation. See also T268625#6651847.

In order to do that we want to add a rate limit in front of the ID generator.

BDD
GIVEN an account creating a large number of new Items
WHEN generating too many new Item IDs over a short period of time
THEN the account should be rate limited
AND get an error for further requests

Acceptance criteria:

  • Item ID assignment is rate limited
  • there is one limit and it applies to all entity types
  • the limit is configured to be the same as the current page creation rate limit
  • the error shown should be the same as the one shown by the page creation rate limit

Tips to reproduce it
Set the creation rate limit really low e.g. 1 per minute, that would raise an API error from the second Item created within the minute
https://www.mediawiki.org/wiki/Manual:$wgRateLimits

Event Timeline

It sounds like a slightly different way of wording this would be to ensure that the existing page creation limit check happens before IDs are created to be assigned?

I suppose that would also be an option. I feel like it would be more difficult to implement, but maybe not.

Lydia_Pintscher updated the task description. (Show Details)
Lydia_Pintscher moved this task from Incoming to Unconnected Stories on the Wikidata-Campsite board.

I think the entity ID generation currently happens in EntitySavingHelper::loadEntity() (called by ModifyEntity, the parent class of EditEntity and others; calls createEntity(), which calls EntityStore::assignFreshId()), while the “create” rate limit is checked later in MediawikiEditEntity::attemptSave() (calls checkRateLimits()).

I would suggest implementing this using a new IdGenerator implementation that wraps another generator and checks the rate limit before delegating to it, similar to the LoggingIdGenerator we introduced for the investigation.

darthmon_wmde updated the task description. (Show Details)
darthmon_wmde set the point value for this task to 8.

Change 660864 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Add wikibase-idgenerator rate limit

https://gerrit.wikimedia.org/r/660864

For verifying this locally:
add the following to your LocalSettings.php:
$wgWBRepoSettings['idGeneratorRateLimiting'] = true;
$wgRateLimits['wikibase-idgenerator']['user'] = [ 0, 60 ]; 1 per minute
$wgRateLimits['wikibase-idgenerator']['anon'] = [ 0, 60 ];
1 per minute
$wgMainCacheType = CACHE_DB;

(the rate limit for 'anon' is probably enough but I have both so pasted both here)
Open a private window and try to create an item without logging in. you should receive an error.

Change 660864 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add wikibase-idgenerator rate limit

https://gerrit.wikimedia.org/r/660864

Alright, the rate limit is merged but disabled by default. @Lydia_Pintscher do you think this needs an announcement as a significant change, or can we turn it on in production as soon as the code is rolled out?

I think we can turn it on and is not a significant change. I think it does deserve a mention at least in the weekly summary though.

Alright. Suggested summary:

Later this week, we will enable a rate limit for assigning new item IDs. Bots and users who successfully create items should notice no change, since the rate limit is equal to the existing limit on all edits. However, bots that often fail to create items may start to see different error messages than usual. We hope that this will reduce the problem of skipped item IDs.

(Wasn’t sure if that would belong under Other Noteworthy Stuff or Development, so I’m putting it here.)

@Lucas_Werkmeister_WMDE looking at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/660864 the options.md docs are not updated.
Looking at the current options we choose to document there this should probably have an entry

What would you expect in options.md – the rate limit itself, or the idGeneratorRateLimiting option? I’d expect the latter to be temporary.

The idGeneratorRateLimiting option. Aah, if this is only planned to be a temporary option then I wouldn't bother.
In the past we have prefixed similar temporary options with tmp or mentioned it in the doc block.

True, I probably should’ve done that for those options.

Which reminds me, we should have a task for removing both idGeneratorLogging and idGeneratorRateLimiting. (Probably two tasks.) I’ll go and create them.

Edit: T274156 and T274157

The code for this should now be rolled out, so we can enable it in production. Do we need to wait for anything (e.g. announcement) or can we go ahead?

Let's go ahead and mention it in the next weekly summary.

Alright, I’ve added it to Other Noteworthy Stuff for now – feel free to edit it and/or move it around.

Change 664507 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/mediawiki-config@master] Enable Wikibase Repo ID generator rate limiting on Test Wikidata

https://gerrit.wikimedia.org/r/664507

Change 664507 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable Wikibase Repo ID generator rate limiting on Test Wikidata

https://gerrit.wikimedia.org/r/664507

Mentioned in SAL (#wikimedia-operations) [2021-02-16T12:38:35Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:664507|Enable Wikibase Repo ID generator rate limiting on Test Wikidata (T272032)]] 1/2 (duration: 01m 12s)

Mentioned in SAL (#wikimedia-operations) [2021-02-16T12:39:54Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/Wikibase.php: Config: [[gerrit:664507|Enable Wikibase Repo ID generator rate limiting on Test Wikidata (T272032)]] 2/2 (duration: 01m 06s)

Tested using a script to send a burst of 150 bad item ID creation requests (P14376).

Output before deploying the change:

Bot username: Lucas Werkmeister (WMDE)@T272032_test
Bot password: 
Q214286
{'not-recognized-array': 150}
Q214437

The only error I got was an error about the bad labels array I passed in, and the item ID counter was increased by 150 between the two good “marker” items, one per failed request.

Output after deploying the config change:

Bot username: Lucas Werkmeister (WMDE)@T272032_test
Bot password: 
Q214438
{'not-recognized-array': 89, 'no-automatic-entity-id': 61}
Traceback (most recent call last):
  File "./floodids.py", line 49, in <module>
    print(marker_item())
  File "./floodids.py", line 19, in marker_item
    return session.post(action='wbeditentity',
  File "/home/luwe/.local/lib/python3.8/site-packages/mwapi/session.py", line 340, in post
    return self.request('POST', params=params, auth=auth,
  File "/home/luwe/.local/lib/python3.8/site-packages/mwapi/session.py", line 170, in request
    return self._request(method, params=normal_params, auth=auth,
  File "/home/luwe/.local/lib/python3.8/site-packages/mwapi/session.py", line 126, in _request
    raise APIError.from_doc(doc['error'])
mwapi.errors.APIError: no-automatic-entity-id: Cannot automatically assign ID: As an anti-abuse measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes. -- See https://test.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.

After the last error, I manually created a new item on the website and it got the ID Q214528, so this time the item ID counter was only increased by 89 between the marker items, and the remaining 61 requests got a no-automatic-entity-id error without further increasing the counter. That would match the rate limit of 90/minute.

Change 664593 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/mediawiki-config@master] Enable Wikibase Repo ID generator rate limiting on Wikidata

https://gerrit.wikimedia.org/r/664593

If the rate limit as it stands is not enough, we could maybe consider something like this:

diff --git a/repo/includes/Api/ModifyEntity.php b/repo/includes/Api/ModifyEntity.php
index d040105732..4a0d083544 100644
--- a/repo/includes/Api/ModifyEntity.php
+++ b/repo/includes/Api/ModifyEntity.php
@@ -357,28 +357,38 @@ private function loadEntityFromSavingHelper( ?EntityId $entityId ): EntityDocume
 		$entity = $this->entitySavingHelper->loadEntity( $entityId, EntitySavingHelper::NO_FRESH_ID );
 
 		if ( $entity->getId() === null ) {
 			// Make sure the user is allowed to create an entity before attempting to assign an id
 			$permStatus = $this->permissionChecker->getPermissionStatusForEntity(
 				$this->getUser(),
 				EntityPermissionChecker::ACTION_EDIT,
 				$entity
 			);
 			if ( !$permStatus->isOK() ) {
 				$this->errorReporter->dieStatus( $permStatus, 'permissiondenied' );
 			}
 
 			$entity = $this->entitySavingHelper->loadEntity( $entityId, EntitySavingHelper::ASSIGN_FRESH_ID );
+			// since we assigned a fresh entity ID, “punish” the user if an API error happens after this point
+			// (which means the ID was wasted, since no entity will be saved)
+			// by increasing the same rate limit which RateLimitingIdGenerator also checks, but with a larger increment
+			$this->getHookContainer()->register(
+				'ApiMain::onException',
+				function () {
+					$this->getUser()->pingLimiter( 'wikibase-idgenerator', 10 );
+					// result is ignored, we only bump the limit for the next check in RateLimitingIdGenerator
+				}
+			);
 		}
 
 		return $entity;
 	}

(This is just a sketch, I haven’t tested it yet. I also don’t know if adding a hook at runtime like this is considered acceptable or not.)

Change 664593 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable Wikibase Repo ID generator rate limiting on Wikidata

https://gerrit.wikimedia.org/r/664593

Mentioned in SAL (#wikimedia-operations) [2021-02-17T13:19:47Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:664593|Enable Wikibase Repo ID generator rate limiting on Wikidata (T272032)]] (duration: 01m 11s)