
Prevent creation of items having the same sitelinks (duplicates) using memcached and database locks
Closed, ResolvedPublic

Description

Problem:
Concurrent edits can create multiple items with the same sitelink, which isn’t supposed to be possible.

Suggested solution:
For each sitelink to be added, take a database lock and set a key in memcached. See T44325#6981283.

Steps to reproduce:
See T44325#6981283.

Event Timeline

Apparently prior to MySQL 5.7 / MariaDB 10.0.2, a connection can only hold one lock at a time, and a GET_LOCK() call will release the previous lock. MediaWiki core currently claims to support MySQL 5.5.8+, so I guess if someone runs Wikibase on an old MySQL version, the changes we make here won’t fully protect them from duplicate items (though they should still work for single sitelinks). We can probably live with that, though. (In production, we run a sufficiently new MariaDB version: 10.4.15, according to Special:Version.)
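
For illustration (a hedged sketch; the lock names are made up, and IDatabase::lock() is MediaWiki’s wrapper around GET_LOCK() on MySQL/MariaDB):

$dbw->lock( 'wikibase.sitelink:dewiki:Berlin', __METHOD__ );
$dbw->lock( 'wikibase.sitelink:enwiki:Berlin', __METHOD__ );
// On MySQL < 5.7 / MariaDB < 10.0.2 the second lock() call implicitly releases the
// first lock, so only the last sitelink of a multi-sitelink edit stays protected.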

Proposed procedure, refined from @hoo’s comment (a rough PHP sketch follows the list):

  • for each sitelink:
    • get the database lock, with the site + title as name
      • if we fail to acquire it, report conflict (with unknown other item)
    • getWithSetCallback the memcached “lock”, with the site + title as key, the item ID as the value, and a low TTL (60 seconds?)
      • if the returned value is not the item ID, load the other item (from the primary DB); if it indeed has a sitelink conflict, report conflict
    • on transaction commit, free the database lock
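
A rough PHP sketch of this procedure (not the actual patch; the lock/key naming scheme and the reportConflict() helper are made up for illustration):

use Wikimedia\Rdbms\IDatabase;

function checkSiteLinkConflicts( IDatabase $dbw, BagOStuff $cache, string $itemId, array $sitelinks ): void {
	foreach ( $sitelinks as $siteId => $title ) {
		// 1. database lock, named after site + title
		$lockName = "wikibase.sitelink.$siteId:$title";
		if ( !$dbw->lock( $lockName, __METHOD__, 2 ) ) {
			reportConflict( $siteId, $title, null ); // conflict with unknown other item
			continue;
		}
		// 2. memcached “lock”: site + title as key, our item ID as value, low TTL
		$key = $cache->makeKey( 'wikibase-sitelink', $siteId, $title );
		$otherId = $cache->getWithSetCallback( $key, 60, static fn () => $itemId );
		if ( $otherId !== $itemId ) {
			// load $otherId from the primary DB; report a conflict only if that item
			// really has this sitelink
			reportConflict( $siteId, $title, $otherId );
		}
		// 3. free the database lock when the main transaction commits
		$dbw->onTransactionCommitOrIdle(
			static function () use ( $dbw, $lockName ) {
				$dbw->unlock( $lockName, __METHOD__ );
			},
			__METHOD__
		);
	}
}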

Rationale:

  • The database lock is the main protection. If the connection drops while we hold the lock, MediaWiki will not automatically reconnect – so if the connection drops right after we get the lock, then the save will fail, which is good because otherwise we might save a conflicting item. However, we don’t want this to happen during the secondary data updates (where items_per_site is updated), which is why we release the lock on transaction commit.
  • The memcached “lock” bridges the gap between the commit of the main transaction and the secondary data update that writes the sitelinks to items_per_site (at which point the regular conflict detection can pick them up, see SqlSiteLinkConflictLookup). The case that two requests concurrently get the key from memcached, see that it’s free, and each write their item ID to memcached, is mostly prevented by the database lock.
  • We write the item ID to memcached, and check the mentioned item before reporting a conflict, to avoid a scenario where a sitelink is quickly added, removed, and then added to another item (we want to allow it on the other item, not detect that as a conflict).

A tiny race condition is still possible with this scheme, I believe:

  • request A gets the database lock
  • request A’s database connection dies, releasing the lock
  • request A gets the value from memcached, sees it’s empty
  • request B gets the database lock
  • request B gets the value from memcached, sees it’s empty
  • request B writes its item ID to memcached
  • request B proceeds to save the item and release the database lock
  • request A writes its item ID to memcached
  • request A tries to save the item, fails because the connection died (and won’t be automatically restored, because a lock was held)
  • request C gets the database lock
  • request C gets the value from memcached, sees request A’s item ID
  • request C tries to get request A’s item, fails because request A never managed to save its item
  • request C believes there is no conflict, saves the item
  • undetected conflict between request B and C

I can’t think of a realistic way to solve this, and I think we can probably just ignore it: in practice, it should be very unlikely. (If we didn’t write the item ID to memcached, and instead assumed that the presence of any key means a conflict, that would prevent this issue, but block more legitimate edits than we want, I believe.)

Hm, unless we really ignore the value in the memcached key, but also remove the memcached entries after successfully writing to items_per_site, as Marius said?

In that case, a failed request A as in the above example would block saves for that sitelink until the TTL expires, but that’s alright. And the “sitelink is quickly added, removed, and then added to another item” case still works, as long as the “added to another item” happens after the first request’s deferred updates have finished (and removed the sitelink from memcached). Maybe that’s actually better.

Yeah, actually, this seems nicely symmetrical. We get the database lock, and release it on main transaction commit; and we get the memcached “lock” (set the key), and “release” (delete) it on secondary updates commit.

Hm, apparently memcached even has an add command that atomically sets an entry only if it doesn’t exist. (This is exposed in BagOStuff too; in Redis it corresponds to the NX flag.) In that case, do we even need the database lock at all? Previously, the database lock prevented two requests from simultaneously getting the entry, seeing that it doesn’t exist, creating it, and then both successfully saving (with the lock, that sequence is only possible if one request drops the lock, at which point it will no longer successfully save). But with add it seems like we might not need that at all?

(Of course, it’s possible that the memcached entry is evicted for whatever reason, but that’s true regardless of the database lock, and the database lock also doesn’t protect against this if the eviction happens after we release the lock on main transaction commit.)
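
As a hedged sketch (key name and TTL illustrative), the add()-based variant would look roughly like this:

// BagOStuff::add() sets the key only if it doesn’t exist yet and returns false otherwise
// (memcached “add”, Redis SET NX).
$key = $cache->makeKey( 'wikibase-sitelink', $siteId, $title );
if ( !$cache->add( $key, $itemId, 60 ) ) {
	// some other request holds the “lock” (or recently saved this sitelink): report a conflict
}
// … later, once the secondary data update has written the sitelink to items_per_site:
$cache->delete( $key );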

Hmm, but if I wire this up with SiteLinkUniquenessValidator (as I had intended), then this will always be used to check all sitelinks of an item, not just added or modified sitelinks. And this would mean that, as long as an item has any sitelinks at all, a second edit to the item would only be possible after the first edit’s secondary data updates have finished. While the item has been edited but the secondary updates haven’t finished yet, the keys for all its sitelinks will exist in memcached, and any other edit to the item will be detected as a conflict (because we’re not using the values of those keys, we have no idea that it’s a “self-conflict”).

Is this a problem? Is it usual for another edit to an item to start before the secondary updates have finished?

And if it is a problem… do we need to go back to the version again where we write the item ID to the memcached key?

Hm, if we write the item ID to memcached, but assume that any other item ID means a conflict (rather than trying to load that item and making sure), then we avoid that “tiny race condition” from earlier. This should be more okay now than in the proposal of that comment, because now we’re removing the memcached entry when the secondary data update finishes, so it’s okay that we only use the item ID to prevent self-conflicts.
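
In sketch form (reportConflict() again being a stand-in), that check would then become:

if ( !$cache->add( $key, $itemId, 60 ) ) {
	$otherId = $cache->get( $key );
	if ( $otherId !== false && $otherId !== $itemId ) {
		// any other item ID means a conflict, without loading that item;
		// our own item ID just means our previous edit’s secondary updates haven’t finished yet
		reportConflict( $siteId, $title, $otherId );
	}
}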

Change 722410 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] WIP: Prevent creation of true duplicates

https://gerrit.wikimedia.org/r/722410

So the above change doesn’t work locally, and I think the reason is the database isolation level. I assumed earlier that it’s safe to delete the key from memcached after the transaction writing to items_per_site has committed – but this assumes that the other transaction (trying to find conflicts in items_per_site) will see committed rows (READ COMMITTED). At the default REPEATABLE READ level, if that transaction started before the writeback committed, the conflict won’t be seen in the database.

Yeah, the transaction isolation is the problem. The conflict is detected when I use

// in SqlSiteLinkConflictLookup::getConflictsForItem():
// $dbr = $this->db->connections()->getWriteConnectionRef();
$dbr = $this->db->connections()->getWriteConnection( ILoadBalancer::CONN_TRX_AUTOCOMMIT );

to get a separate connection without a running transaction.
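
For illustration (simplified, not the exact lookup code; field names as in Wikibase’s wb_items_per_site table): on the CONN_TRX_AUTOCOMMIT connection, each query sees the latest committed rows instead of the REPEATABLE READ snapshot taken by the request’s main transaction:

$conflictingItemIds = $dbr->selectFieldValues(
	'wb_items_per_site',
	'ips_item_id',
	[ 'ips_site_id' => $siteId, 'ips_site_page' => $title ],
	__METHOD__
);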

Here’s a version of the reproduction commands from T44325#7355099 for easier testing: first, setup once…

const params = {
	action: 'wbeditentity',
	new: 'item',
	data: JSON.stringify( {
		sitelinks: {
			[ mw.config.get( 'wgDBname' ) ]: {
				site: mw.config.get( 'wgDBname' ),
				title: mw.config.get( 'wgTitle' )
			}
		}
	} )
};
const api = new mw.Api();

and then repeat the following snippet as often as you want:

await Promise.all( [ api.postWithEditToken( params ), api.postWithEditToken( params ) ].map( promise => promise.then( response => response.entity.id ).then( itemId => {
  console.log( itemId );
  return api.postWithEditToken( { action: 'delete', title: `Item:${itemId}` } );
} ) ) )

This will try to create two items with the same sitelink, and either log a success (if both were created, i.e. the conflict was not detected, bad) or an error (failed-save indicates the conflict was detected, good). Either way, any created item is also deleted afterwards, so you can immediately try again.

Using CONN_TRX_AUTOCOMMIT for the conflict lookup has an unexpected side effect, though. I just got this error message from the API:

The link <a rel="nofollow" class="external text" href="http://localhost/wiki1/index.php/True_duplicate">wiki1:True duplicate</a> is already used by Item <a href="/wiki1/index.php?title=Item:Q2546&action=edit&redlink=1" class="new" title="Item:Q2546 (page does not exist)">Q2546</a>. You may remove it from <a href="/wiki1/index.php?title=Item:Q2546&action=edit&redlink=1" class="new" title="Item:Q2546 (page does not exist)">Q2546</a> if it does not belong there or merge the Items if they are about the exact same topic.

The conflict lookup correctly detected the conflict with Q2546, but the rest of MediaWiki was working in a transaction that started before that item had been committed, so it thought the item didn’t exist and the link became a redlink. I don’t think there’s anything we can do about this – the message is rendered wikitext, the link is a wikilink, we can’t just replace a call to LinkRenderer::makeLink() with makeKnownLink() and call it a day.

On the other hand, I’m pretty sure this error was already possible (albeit unlikely) before: if request A deletes an item, commits, request B starts with an attempt to create another item with the same sitelink, and request A’s secondary data updates only run afterwards (deleting the item’s sitelinks from the database), then request B will still detect a conflict, but will already see that item as deleted, so it should render a redlink as well.

Change 722924 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/722924

Change 722925 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Allow unknown item ID in SiteLinkConflictLookup

https://gerrit.wikimedia.org/r/722925

Change 722924 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/722924

Change 722925 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Allow unknown item ID in SiteLinkConflictLookup

https://gerrit.wikimedia.org/r/722925

Change 724370 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.1] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/724370

Change 724371 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.2] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/724371

Change 724370 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.1] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/724370

Mentioned in SAL (#wikimedia-operations) [2021-09-28T11:54:22Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.38.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: [[gerrit:724370|Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377)]] (duration: 00m 57s)

The wmf.1 backport seems to work well on Test Wikidata at the moment, I tried it on https://test.wikidata.org/wiki/Wikidata:T291377_test_page. The wmf.2 backport is on hold, see T281166#7383319. But until the train reaches group1 tomorrow evening, true duplicates should now be less likely on Wikidata.

Change 724371 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.2] Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup

https://gerrit.wikimedia.org/r/724371

Mentioned in SAL (#wikimedia-operations) [2021-09-29T11:43:58Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: [[gerrit:724371|Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377)]] (duration: 01m 07s)

Change 722410 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Prevent creation of true duplicates

https://gerrit.wikimedia.org/r/722410