Page MenuHomePhabricator

Race condition in DatabaseBlockStore::acquireTarget()
Closed, ResolvedPublic

Description

The original implementation of conflict detection in the block_target table was faulty, with a race condition that we are hitting in production, causing multiple block_target rows for some users.

The query sequence was UPDATE/SELECT/INSERT. The update query acquires a shared lock and so is not serialized with another update query. However, the sequence 1.update, 2.update, 1.insert, 2.insert fails safely with a deadlock error.

The insert acquires an exclusive lock. If the first thread does both an update and insert before the second thread does its update, the second update will wait for the first thread to commit. The sequence is then 1.update, 1.insert, 2.update(wait), 1.commit, 2.update(complete), 2.select, 2.insert.

The 2.update had a condition bt_count=0 and so returned affectedRows=0 despite having waited for the insert to be committed.

The insert was conditional on the result of the select. Unfortunately, the select was non-locking. A locking select always returns the updated row, however a non-locking select often returns an empty result set despite the thread having waited for the new row to be committed.

The committed fix uses GET_LOCK() to synchronise externally, and this does help to avoid deadlock errors. That commit also added FOR UPDATE to the select, which would have been enough by itself to avoid duplicate rows.

Adding a unique index to the block_target table and using upsert is not feasible since an upsert on a table with multiple unique indexes is unsafe for replication. Also, nullable fields make it difficult to force the rows to collide, since any null value causes an aggregate index like (bt_user, bt_auto, bt_address) to compare non-equal.

Event Timeline

Change #1128065 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Add a unique index to the block_target table

https://gerrit.wikimedia.org/r/1128065

From schema point of view, since the table is small. Adding more indexes should be fine. I don't have any objections but I don't know the schema and internals of blocking storage well enough to fully approve the change. I leave it to someone from TSP or CommTech to handle it. Hope that's fine with people.

Change #1128561 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] rdbms: Require SQLite 3.31+

https://gerrit.wikimedia.org/r/1128561

Change #1128585 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] rdbms: Require SQLite 3.31+

https://gerrit.wikimedia.org/r/1128585

Change #1128561 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Require SQLite 3.24+

https://gerrit.wikimedia.org/r/1128561

tstarling renamed this task from Add a unique index to the block_target table to Race condition in DatabaseBlockStore::acquireTarget().Apr 7 2025, 7:35 AM

Change #1134635 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134635

Change #1134635 merged by jenkins-bot:

[mediawiki/core@master] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134635

Change #1134665 had a related patch set uploaded (by Reedy; author: Tim Starling):

[mediawiki/core@REL1_43] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134665

Change #1134671 had a related patch set uploaded (by Reedy; author: Tim Starling):

[mediawiki/core@REL1_42] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134671

Change #1134671 merged by jenkins-bot:

[mediawiki/core@REL1_42] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134671

Change #1134665 merged by jenkins-bot:

[mediawiki/core@REL1_43] block: Fix DBS::acquireTarget() race using GET_LOCK()

https://gerrit.wikimedia.org/r/1134665

Change #1128065 abandoned by Tim Starling:

[mediawiki/core@master] Add a unique index to the block_target table

Reason:

doesn't work

https://gerrit.wikimedia.org/r/1128065

tstarling updated the task description. (Show Details)

Change #1128585 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Require SQLite 3.31+

https://gerrit.wikimedia.org/r/1128585