
New item fails (Special and WEF tool)
Closed, ResolvedPublic

Description

When trying to create a new item through Special:NewItem it fails with the message

A database query error has occurred. This may indicate a bug in the software.
[WjosPApAEKsAAEHeEl8AAAAA] 2017-12-20 09:24:28: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

I had previously tried to create the item with the WEF tool, which had also failed a few times.

Event Timeline

Tried a separate PC and got the same failure.

After a big gap in RecentChanges there is a trickle of item creations now occurring.

I was now able to create an item.

I see lots of database errors like this in logstash:

A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  page_latest  FROM `page`    WHERE page_id = '47348009' AND page_namespace = '0' AND page_title = 'Q46210030'  LIMIT 1   FOR UPDATE
Function: WikiPage::lockAndGetLatest
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.25)

#0 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(991): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#1 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1374): Wikimedia\Rdbms\Database->query(string, string)
#2 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1200): Wikimedia\Rdbms\Database->select(string, string, array, string, array, array)
#3 /srv/mediawiki/php-1.31.0-wmf.12/includes/page/WikiPage.php(2994): Wikimedia\Rdbms\Database->selectField(string, string, array, string, array)
#4 /srv/mediawiki/php-1.31.0-wmf.12/includes/page/WikiPage.php(1741): WikiPage->lockAndGetLatest()
#5 /srv/mediawiki/php-1.31.0-wmf.12/includes/page/WikiPage.php(1659): WikiPage->doModify(Wikibase\ItemContent, integer, User, string, array)
#6 /srv/mediawiki/php-1.31.0-wmf.12/extensions/Wikibase/repo/includes/Store/Sql/WikiPageEntityStore.php(292): WikiPage->doEditContent(Wikibase\ItemContent, string, integer, integer, User)
#7 /srv/mediawiki/php-1.31.0-wmf.12/extensions/Wikibase/repo/includes/Store/Sql/WikiPageEntityStore.php(189): Wikibase\Repo\Store\WikiPageEntityStore->saveEntityContent(Wikibase\ItemContent, string, User, integer, integer)
#8 /srv/mediawiki/php-1.31.0-wmf.12/extensions/Wikibase/repo/includes/EditEntity.php(712): Wikibase\Repo\Store\WikiPageEntityStore->saveEntity(Wikibase\DataModel\Entity\Item, string, User, integer, integer)
#9 /srv/mediawiki/php-1.31.0-wmf.12/extensions/Wikibase/repo/includes/Api/EntitySavingHelper.php(351): Wikibase\EditEntity->attemptSave(Wikibase\DataModel\Entity\Item, string, integer, string)
#10 /srv/mediawiki/php-1.31.0-wmf.12/extensions/Wikibase/repo/includes/Api/CreateClaim.php(102): Wikibase\Repo\Api\EntitySavingHelper->attemptSaveEntity(Wikibase\DataModel\Entity\Item, string)
#11 /srv/mediawiki/php-1.31.0-wmf.12/includes/api/ApiMain.php(1582): Wikibase\Repo\Api\CreateClaim->execute()
#12 /srv/mediawiki/php-1.31.0-wmf.12/includes/api/ApiMain.php(544): ApiMain->executeAction()
#13 /srv/mediawiki/php-1.31.0-wmf.12/includes/api/ApiMain.php(515): ApiMain->executeActionWithErrorHandling()
#14 /srv/mediawiki/php-1.31.0-wmf.12/api.php(94): ApiMain->execute()
#15 /srv/mediawiki/w/api.php(3): include(string)
#16 {main}

In SAL, I see depooling of some s5 hosts, and the lag doesn't look good. I think the DBAs need to take a look.

That lag was from a depooled host.

If this is something that may recur, may I ask for a more informative error message, or something that tells me to stop and come back soon?

The host that was complaining was 10.64.48.25 which is s5 master.
I was doing some SELECTs on the master as part of T161294 (which roughly matches those times). However, those SELECTs were on change_tag and tag_summary.
However the SELECTs are pretty fast (compare.py)

Might be related to T183341#3851064 - looks like it never happened again after that period of time, no?

The LIMIT 1 FOR UPDATE (plus what marostegui comments) indicates that this is not a lag problem but a contention problem (Error: 1205 Lock wait timeout exceeded): many requests wanting to lock the same rows at the same time. There is nothing for the DBAs to do here; the code should avoid contention (that is, avoid taking many locks or holding locks for a long time). If many items have to be edited at the same time, for some reason, locking or coordination between threads should happen somehow outside of the database. The database is doing exactly what it was asked: "lock this row exclusively and wait if it is already locked".
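The contention described above can be sketched with a toy model. This is illustrative Python only (the real stack is PHP against MariaDB), with the timeout values made tiny so it runs in milliseconds; `acquire(timeout=...)` stands in for InnoDB's lock wait timeout:

```python
import threading
import time

# Toy model of the contention: every writer needs the same exclusive lock
# (like SELECT ... FOR UPDATE on one wb_id_counters row). A failed
# acquire(timeout=...) plays the role of "Error 1205: Lock wait timeout
# exceeded" (InnoDB's default innodb_lock_wait_timeout is 50s).

row_lock = threading.Lock()      # the single contended row
LOCK_WAIT_TIMEOUT = 0.05         # seconds, scaled down for the demo
HOLD_TIME = 0.02                 # how long each "transaction" holds the lock
results = []

def create_item(i):
    if row_lock.acquire(timeout=LOCK_WAIT_TIMEOUT):
        try:
            time.sleep(HOLD_TIME)        # work done while the row is locked
            results.append((i, "ok"))
        finally:
            row_lock.release()
    else:
        results.append((i, "lock wait timeout"))

threads = [threading.Thread(target=create_item, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

timeouts = sum(1 for _, status in results if status == "lock wait timeout")
print(f"{len(results) - timeouts} succeeded, {timeouts} timed out")
```

With 20 near-simultaneous writers, a hold time of 0.02s and a wait timeout of 0.05s, only the first few writers get the lock before the rest time out, which is exactly the failure mode in the error log above.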

@Billinghurst In non-technical terms, it seems to be an overload of some kind: something that normally doesn't happen, but that the code should prevent in any case (e.g. it could be an unrelated bot creating items more quickly than expected). I will ask the developers to have a look at it and see if it can be avoided, so that at least it doesn't happen so frequently.

If at some point in the future you find problems like this, try again; it is normal for 1 in a million requests to fail. However, if it happens consistently or repeatedly, please do as you did and report it here. Thank you for the report!

@Ladsgroup After looking at the database logs, I can confirm there is something really bad with the wikidata new-item logic: each request has to lock the table, which essentially serializes creations, puts a limit on the number of items that can be created per second (in a bad way, consuming resources, not in a good, controlled way), and creates huge contention. See the top 2 queries of the past 10 hours (the spike happened at the time of the report). Please escalate; this is a quite ugly bug and a real performance hog:

Hits 	Tmax 	Tavg 	Tsum 	Hosts 	Users 	Schemas
292	15	9	2,694	db1070	wikiuser	wikidatawiki
SELECT /* Wikibase\SqlIdGenerator::generateNewId */ id_value FROM `wb_id_counters` WHERE id_type = 'wikibase-item' LIMIT 1 FOR UPDATE /* ed91e42b47c2b64365221161239e8e4b db1070 wikidatawiki 3s */
55	16	10	569	db1070	wikiuser	wikidatawiki
SELECT /* WikiPage::lockAndGetLatest */ page_latest FROM `page` WHERE page_id = '923832' AND page_namespace = '0' AND page_title = 'Q974144' LIMIT 1 FOR UPDATE /* 3dafdbceffd65e9535ff1dcc6f43281b db1070 wikidatawiki 11s */
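The `generateNewId` pattern visible in the first query can be sketched as follows. This is a minimal Python/SQLite reconstruction of the pattern, not the actual Wikibase PHP code; SQLite has no `FOR UPDATE`, so `BEGIN IMMEDIATE` stands in for the exclusive row lock:

```python
import sqlite3

# Sketch (not the actual Wikibase code) of the id-allocation pattern behind
# "SELECT id_value FROM wb_id_counters ... FOR UPDATE": read the single
# counter row under an exclusive lock, increment it, and commit. Every
# caller must pass through this one row, so item creation is serialized.

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE wb_id_counters (id_type TEXT PRIMARY KEY, id_value INTEGER)")
conn.execute("INSERT INTO wb_id_counters VALUES ('wikibase-item', 0)")

def generate_new_id(conn, id_type="wikibase-item"):
    conn.execute("BEGIN IMMEDIATE")      # exclusive lock, like FOR UPDATE
    try:
        (current,) = conn.execute(
            "SELECT id_value FROM wb_id_counters WHERE id_type = ?", (id_type,)
        ).fetchone()
        new_id = current + 1
        conn.execute(
            "UPDATE wb_id_counters SET id_value = ? WHERE id_type = ?",
            (new_id, id_type),
        )
        conn.execute("COMMIT")           # the lock is held until here
        return new_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

print([generate_new_id(conn) for _ in range(3)])  # → [1, 2, 3]
```

The longer the transaction keeps the lock between the `SELECT ... FOR UPDATE` and the `COMMIT` (e.g. while also saving the page), the worse the serialization gets, which matches the query digest above.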

fwiw the phabricator ticket came on the seventh failure in however many minutes: 3 using the tool, then 2 using create item; I asked in IRC, got the response to try again through Special:NewItem on 2 PCs. So call that at least a 10-15 minute period of occurrence, and that would only be the minimum if I started in coincidence with it starting.

all unscientific timing

@Billinghurst I was actually praising you for the report, for taking the time, and saying sorry it happened :-)

@Ladsgroup Without looking, I imagine a bot overloaded the creation, something that should be easy to check on recentchanges. Of course, that doesn't take away from avoiding or reducing the lock as much as possible.

Yep, also thought that you were asking something else though, clearly not.

mark triaged this task as High priority.Dec 20 2017, 1:34 PM
mark subscribed.

Yeah, it seems a bot is making 70 items per minute: https://www.wikidata.org/w/index.php?title=Special:NewPages&dir=prev&offset=20171220133138&limit=500
70 edits per minute is okay-ish, but I guess when it comes to creating a page it might cause trouble. There are two ways I can think of to reduce the load: 1) move updating the tables to a deferred update and unlock the wb_id_counters table as soon as possible; 2) move away from using the wb_id_counters table and use concurrency-supporting solutions; I read about those a while ago, and I need to read them again and see what I can do.

In the mean time, I should give "User:Research Bot" a notice to slow down.

Yes, the slowdown should avoid issues until a fix is conceived.
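For idea (2) above, one common concurrency-friendlier scheme is a hi/lo (range) allocator. This is a hypothetical sketch of that idea, not what Wikibase necessarily implemented: each worker reserves a whole batch of ids in one short transaction, so the contended counter lock is taken once per batch instead of once per item:

```python
import sqlite3
import threading

# Hypothetical hi/lo range allocator: instead of locking the counter row
# once per item, a worker grabs BATCH ids in one short transaction and then
# hands them out from memory. The contended db lock is hit 1/BATCH as often.

BATCH = 100

class RangeIdAllocator:
    def __init__(self, conn):
        self.conn = conn
        self.lock = threading.Lock()   # guards the in-memory range
        self.next_id = 0
        self.limit = 0                 # exclusive upper bound of our range

    def _reserve_range(self):
        # One short transaction per BATCH ids; the db lock is held briefly.
        self.conn.execute("BEGIN IMMEDIATE")
        (current,) = self.conn.execute(
            "SELECT id_value FROM wb_id_counters WHERE id_type='wikibase-item'"
        ).fetchone()
        self.conn.execute(
            "UPDATE wb_id_counters SET id_value=? WHERE id_type='wikibase-item'",
            (current + BATCH,),
        )
        self.conn.execute("COMMIT")
        self.next_id, self.limit = current + 1, current + BATCH + 1

    def generate_new_id(self):
        with self.lock:
            if self.next_id >= self.limit:
                self._reserve_range()
            new_id = self.next_id
            self.next_id += 1
            return new_id

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE wb_id_counters (id_type TEXT PRIMARY KEY, id_value INTEGER)")
conn.execute("INSERT INTO wb_id_counters VALUES ('wikibase-item', 0)")
alloc = RangeIdAllocator(conn)
print([alloc.generate_new_id() for _ in range(5)])  # → [1, 2, 3, 4, 5]
```

The trade-off is that unused ids in a reserved range are lost when a worker restarts, leaving gaps in the id sequence, which is only acceptable if ids don't need to be dense.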

Perhaps we should enforce rate limits for editing and creation on wikidata. The entry in $wgRateLimits that affects edits on all sites is currently:

		'edit' => [
			// 8 ed./min per each non-autoconfirmed, or group thereof from same IP
			'ip' => [ 8, 60 ],
			'newbie' => [ 8, 60 ],
		],

So, for regular users (and bots), there is no limit on editing (and no separate limit is enforced on creation; core doesn't even check such a limit, but Wikibase does).
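Each $wgRateLimits entry is a [count, seconds] pair. As a rough sketch of those semantics (hypothetical class; MediaWiki's real enforcement is User::pingLimiter(), backed by the object cache rather than in-process memory):

```python
import time

# Rough sketch of the [count, seconds] semantics of a $wgRateLimits entry,
# e.g. 'user' => [100, 60] meaning 100 edits per 60-second window.
# Hypothetical helper class, not MediaWiki's actual implementation.

class RateLimiter:
    def __init__(self, count, seconds):
        self.count, self.seconds = count, seconds
        self.buckets = {}   # key -> (window_start, hits)

    def ping(self, key, now=None):
        """Return True if the action is allowed, False if rate-limited."""
        now = time.monotonic() if now is None else now
        start, hits = self.buckets.get(key, (now, 0))
        if now - start >= self.seconds:     # window expired: start a new one
            start, hits = now, 0
        if hits >= self.count:
            return False                    # over the limit for this window
        self.buckets[key] = (start, hits + 1)
        return True

# 'user' => [100, 60]: the 101st edit inside one minute is refused.
limiter = RateLimiter(100, 60)
allowed = [limiter.ping("user:SomeBot", now=0.0) for _ in range(101)]
print(allowed.count(True), allowed.count(False))  # → 100 1
```

Once the 60-second window has elapsed, the counter resets and the same key is allowed to edit again.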

I propose to change this for wikidata.org to the following:

	'+wikidatawiki' => [
		'edit' => [
			// 8 ed./min per each non-autoconfirmed, or group thereof from same IP
			'ip' => [ 8, 60 ],
			'newbie' => [ 8, 60 ],
			// 100 ed./min for regular users (including bots and sysops)
			'user' => [ 100, 60 ],
		],

		'create' => [
			// 4 creations/min per each non-autoconfirmed, or group thereof from same IP
			'ip' => [ 4, 60 ],
			'newbie' => [ 4, 60 ],
			// 10 creations/min for regular users (including bots and sysops)
			'user' => [ 10, 60 ],
		],
	],

A limit of 10 may be too restrictive though. Perhaps it would make sense to introduce separate rate limits for bots. But creating more than one item per second seems excessive in any case.

For non-bot users I think a limit of 15 per minute would be okay. It's a way to encourage people who make more edits to seek bot approval. I would have to time it, but in some cases 10 might be reached by fast humans.

Could there be an option where a limit only kicks in when there's no free server capacity? It seems to me like we need a limit for bots at times when no free server capacity is available, but don't need it at other times.

Lydia_Pintscher claimed this task.

I have created T184948 for the limit. I am closing this ticket as I think there is nothing else to do here.