Page MenuHomePhabricator

Lock wait timeout exceeded in SqlIdGenerator::generateNewId
Closed, ResolvedPublic

Description

2018-05-09 17:47:48 [WvM-1NApAMEMAADtSlpwAAADA] mw1232 wikidatawiki 1.32.0-wmf.2 DBQuery ERROR: Wikibase\SqlIdGenerator::generateNewId  10.64.48.26     1205    Lock wait timeout exceeded; try restarting transaction (10.64.48.26)    SELECT  id_value  FROM `wb_id_counters`    WHERE id_type = 'wikibase-item'  LIMIT 1   FOR UPDATE {"db_server":"10.64.48.26","db_name":"wikidatawiki","db_user":"wikiuser","method":"Wikimedia\\Rdbms\\Database::makeQueryException","errno":1205,"error":"Lock wait timeout exceeded; try restarting transaction (10.64.48.26)","sql1line":"SELECT  id_value  FROM `wb_id_counters`    WHERE id_type = 'wikibase-item'  LIMIT 1   FOR UPDATE","fname":"Wikibase\\SqlIdGenerator::generateNewId"}

This is happening occasionally, 0 to 5 times a day.

Patch-For-Review / TODO:

Details

Related Gerrit Patches:
mediawiki/extensions/Wikibase : masterRetry SELECT FOR UPDATE in SqlIdGenerator once
mediawiki/extensions/Wikibase : masterIdGenerator, Introduce UpsertSqlIdGenerator
operations/mediawiki-config : masterWikibase, use mysql-upsert on all Wikibases
operations/mediawiki-config : masterWikibase, use mysql-upsert id generator on testwikidatawiki
operations/mediawiki-config : masterBETA: Wikibase, use mysql-upsert id generator
operations/mediawiki-config : masterWikibase: prepare to set $wgWBRepoSettings['idGenerator']
mediawiki/extensions/Wikibase : masterAdd config to choose which IdGenerator to use
mediawiki/extensions/Wikibase : masterInject IdGenerator into SqlStore
mediawiki/extensions/Wikibase : masterWIP SqlIdGenerator, Use upsert for ID Generation

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron added a comment.Nov 2 2018, 6:28 AM

openConnection is badly named and still reuses connections. You'd probably want getConnection with CONN_TRX_AUTO

I hate this hack. This may *still* re-use connections, if anything else used CONN_TRX_AUTO. We should have CONN_NEW.

Reuse is fine here, it's just confusing for something called "openConnection". CONN_TRX_AUTO will just reuse other CONN_TRX_AUTO connections.

@aaron yea, but using CONN_TRX_AUTO in places where we want a different connection, but don't care at all about the TRX mode, is just bad. It obfuscates the code. If you read something like this for the first time, you can end up spending hours trying to understand why CONN_TRX_AUTO is relevant here - and perhaps decide it isn't, and remove it. We should really have a flag that describes the actual use case, e.g. CONN_NEW if we want to force a new connection, and CONN_AUX if all we need is "not the default" connection.

For context, note that we are not using an auto-increment field, we are doing the increment in code, and write it back to the database.

Why not do something like this?

INSERT INTO sequence_table ('sequence_name', 'sequence_value') VALUES (?, 1)
ON DUPLICATE KEY UPDATE sequence_value = LAST_INSERT_ID(sequence_value + 1);
SELECT LAST_INSERT_ID();

Wouldn't that at least reduce the time to hold a lock while still being atomic?

Addshore added a comment.EditedNov 16 2018, 2:43 PM

That will indeed work:

INSERT INTO wb_id_counters (id_type, id_value) VALUES ('wikibase-item', 1) ON DUPLICATE KEY UPDATE id_value = LAST_INSERT_ID(id_value + 1);
SELECT LAST_INSERT_ID();

Although interestingly for the initial insertion of the row the SELECT LAST_INSERT_ID(); will return 0

Change 474274 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] WIP SqlIdGenerator, Use upsert for ID Generation

https://gerrit.wikimedia.org/r/474274

Change 474278 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] IdGenerator, Introduce UpsertSqlIdGenerator

https://gerrit.wikimedia.org/r/474278

That will indeed work:

INSERT INTO wb_id_counters (id_type, id_value) VALUES ('wikibase-item', 1) ON DUPLICATE KEY UPDATE id_value = LAST_INSERT_ID(id_value + 1);
SELECT LAST_INSERT_ID();

This solution is specific to MySQL, right?

I guess that's ok, since Wikibase doesn't support anything else. Do we check on install?

Yeah it's definitely a mysql-specific technique but it's atomic, fast and recommended in the mysql documentation.

Addshore added a comment.EditedNov 19 2018, 1:48 PM

I guess that's ok, since Wikibase doesn't support anything else. Do we check on install?

The implementation in the patch currently uses the mediawiki abstraction for upsert, so this should work for all db types mediawiki supports.
However LAST_INSERT_ID is indeed mysql only.

I was planning on making the id generator configurable to begin with anyway, so we can
leave the old implementation there too for people to use (it will be needed for sqlite).

Although it looks like we could make it work for sqlite too? https://sqlite.org/c3ref/last_insert_rowid.html

Change 474725 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Inject IdGenerator into SqlStore

https://gerrit.wikimedia.org/r/474725

Change 474731 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Add config to choose which IdGenerator to use

https://gerrit.wikimedia.org/r/474731

Change 474274 abandoned by Addshore:
WIP SqlIdGenerator, Use upsert for ID Generation

Reason:
See https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/ /474731/

https://gerrit.wikimedia.org/r/474274

This comment was removed by Ladsgroup.

Change 474278 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] IdGenerator, Introduce UpsertSqlIdGenerator

https://gerrit.wikimedia.org/r/474278

Change 474725 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Inject IdGenerator into SqlStore

https://gerrit.wikimedia.org/r/474725

Change 474731 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add config to choose which IdGenerator to use

https://gerrit.wikimedia.org/r/474731

hoo added a comment.Dec 11 2018, 10:52 AM

Given the lock timeouts happen on SELECT, I don't really see how using a single query instead of SELECT+UPDATE/INSERT would greatly help us here. While that's definitely nice to have performance wise (less queries, less roundtrips), the main problem seems to me that something is getting a new id and then keeping the transaction (=> the lock) open to do other things before commit/rollback.

While I definitely think that using UpsertSqlIdGenerator is an improvement and will greatly help us here once the contention around this table grows even more… I don't think it can fix the problem at hand. We wont be able to solve the problem as long as we don't commit immediately after UPDATE/INSERT, I fear.

The new ID generator should be deployed to Test Wikidata within the next few hours – do we want to test it afterwards, or wait until the train reaches Wikidata and test there?

The new ID generator should be deployed to Test Wikidata within the next few hours – do we want to test it afterwards, or wait until the train reaches Wikidata and test there?

It's behind a config variable and should be turned off for now. That's my understanding of it.

Yes – what I meant is: do we want to write and deploy a config change to enable it on Test Wikidata separately, or go ahead with Wikidata directly (and only test on the debug server before deploying it everywhere)?

Addshore added a comment.EditedDec 12 2018, 11:19 AM

Yes – what I meant is: do we want to write and deploy a config change to enable it on Test Wikidata separately, or go ahead with Wikidata directly (and only test on the debug server before deploying it everywhere)?

We want to do beta, then test, then prod :)

And then also leave it enabled on all three, I assume.

Should we attach the necessary config changes to this task or create a subtask?

Addshore removed Addshore as the assignee of this task.Dec 14 2018, 12:24 PM

And then also leave it enabled on all three, I assume.
Should we attach the necessary config changes to this task or create a subtask?

I think we could move this back to TODO and create the config patches here for it :)
It looks like the code is already deployed so we could look at deploying this next week.?

Addshore updated the task description. (Show Details)Dec 14 2018, 12:24 PM
Addshore updated the task description. (Show Details)

Change 480774 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Wikibase: prepare to set $wgWBRepoSettings['idGenerator']

https://gerrit.wikimedia.org/r/480774

Change 480775 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] BETA: Wikibase, use mysql-upsert id generator

https://gerrit.wikimedia.org/r/480775

Change 480777 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Wikibase, use mysql-upsert id generator on testwikidatawiki

https://gerrit.wikimedia.org/r/480777

Change 480778 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Wikibase, use mysql-upsert on all Wikibases

https://gerrit.wikimedia.org/r/480778

Change 480774 merged by jenkins-bot:
[operations/mediawiki-config@master] Wikibase: prepare to set $wgWBRepoSettings['idGenerator']

https://gerrit.wikimedia.org/r/480774

Change 480775 merged by jenkins-bot:
[operations/mediawiki-config@master] BETA: Wikibase, use mysql-upsert id generator

https://gerrit.wikimedia.org/r/480775

Mentioned in SAL (#wikimedia-operations) [2018-12-19T17:04:40Z] <addshore@deploy1001> Synchronized wmf-config: Wikibase: prepare to set $wgWBRepoSettings idGenerator, T194299 (duration: 00m 53s)

Change 480777 merged by jenkins-bot:
[operations/mediawiki-config@master] Wikibase, use mysql-upsert id generator on testwikidatawiki

https://gerrit.wikimedia.org/r/480777

Mentioned in SAL (#wikimedia-operations) [2018-12-19T17:13:31Z] <addshore@deploy1001> Synchronized wmf-config: Wikibase: testwikidatawiki upsert idGenerator, T194299 (duration: 00m 52s)

Change 480778 merged by jenkins-bot:
[operations/mediawiki-config@master] Wikibase, use mysql-upsert on all Wikibases

https://gerrit.wikimedia.org/r/480778

Mentioned in SAL (#wikimedia-operations) [2018-12-19T17:22:39Z] <addshore@deploy1001> Synchronized wmf-config: Wikibase: wikidatawiki upsert idGenerator, T194299 (duration: 00m 52s)

Addshore updated the task description. (Show Details)Dec 19 2018, 5:27 PM

So the new IdGenerator has just been deployed to all production wikibases.
We should be able to check logstash @ https://logstash.wikimedia.org/goto/b2a28cc42a9203bfe1d796929696fe55 and make sure that the old error stops occouring and no new errors occour with the new ID generator.

From yesterday:

Wikibase\UpsertSqlIdGenerator::upsertId	10.64.48.26	1205	Lock wait timeout exceeded; try restarting transaction (10.64.48.26)	INSERT INTO `wb_id_counters` (id_type,id_value) VALUES ('wikibase-item','1') ON DUPLICATE KEY UPDATE id_value = LAST_INSERT_ID(id_value + 1)

So we will need another different approach in order to make this stop happening as the small optimization didn't cut it.

The next thing would be T194299#4713654
Looking at T194299#4715287 it sounds like right now we should be aiming to use CONN_TRX_AUTO still.
@aaron is that correct? We can just add a note next to the use stating that we don't actually care about the TRX mode.

Addshore moved this task from Incoming to Needs Work on the Wikidata-Campsite board.
Addshore updated the task description. (Show Details)
Addshore updated the task description. (Show Details)
Addshore updated the task description. (Show Details)
Addshore updated the task description. (Show Details)
aaron added a comment.Dec 21 2018, 6:10 PM

That sounds right.

Addshore removed Addshore as the assignee of this task.Jan 9 2019, 9:37 AM
Addshore updated the task description. (Show Details)
Addshore updated the task description. (Show Details)Feb 4 2019, 9:52 AM
Addshore closed this task as Resolved.Feb 22 2019, 11:23 AM
Addshore claimed this task.
Addshore moved this task from Found longer ago to Resolved on the Wikimedia-production-error board.

This should now be resolved.
I don't see any errors in the log since the deployment.

Change 432156 abandoned by Lucas Werkmeister (WMDE):
Retry SELECT FOR UPDATE in SqlIdGenerator once

Reason:
This was superseded by other efforts, mainly I4fd46adae1, I believe.

https://gerrit.wikimedia.org/r/432156

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM