Page MenuHomePhabricator

excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1
Closed, DuplicatePublicPRODUCTION ERROR

Description

Error

Request ID: AWanh-6v00on8STvSoRR

message
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.15)
trace
 	#0 /srv/mediawiki/php-1.33.0-wmf.1/includes/libs/rdbms/database/Database.php(1468): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.33.0-wmf.1/includes/libs/rdbms/database/Database.php(1228): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.33.0-wmf.1/includes/libs/rdbms/database/Database.php(1695): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.33.0-wmf.1/includes/libs/rdbms/database/Database.php(1786): Wikimedia\Rdbms\Database->select(string, string, array, string, array, array)
#4 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/SqlIdGenerator.php(71): Wikimedia\Rdbms\Database->selectRow(string, string, array, string, array)
#5 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/SqlIdGenerator.php(46): Wikibase\SqlIdGenerator->generateNewId(Wikimedia\Rdbms\DatabaseMysqli, string)
#6 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/WikiPageEntityStore.php(120): Wikibase\SqlIdGenerator->getNewId(string)
#7 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/lib/includes/Store/TypeDispatchingEntityStore.php(59): Wikibase\Repo\Store\WikiPageEntityStore->assignFreshId(Wikibase\DataModel\Entity\Item)
#8 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Api/EntitySavingHelper.php(269): Wikibase\Lib\Store\TypeDispatchingEntityStore->assignFreshId(Wikibase\DataModel\Entity\Item)
#9 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Api/EntitySavingHelper.php(197): Wikibase\Repo\Api\EntitySavingHelper->createEntity(string, NULL)
#10 /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/repo/includes/Api/ModifyEntity.php(277): Wikibase\Repo\Api\EntitySavingHelper->loadEntity()
#11 /srv/mediawiki/php-1.33.0-wmf.1/includes/api/ApiMain.php(1570): Wikibase\Repo\Api\ModifyEntity->execute()
#12 /srv/mediawiki/php-1.33.0-wmf.1/includes/api/ApiMain.php(531): ApiMain->executeAction()
#13 /srv/mediawiki/php-1.33.0-wmf.1/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#14 /srv/mediawiki/php-1.33.0-wmf.1/api.php(87): ApiMain->execute()
#15 /srv/mediawiki/w/api.php(3): include(string)
#16 {main}

Impact

  • high error rate in logstash.
  • train rolled back to group0

Notes

Event Timeline

mmodell triaged this task as Unbreak Now! priority.Oct 24 2018, 7:27 PM
mmodell updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2018-10-24T19:31:50Z] <twentyafterfour> the errors were all coming from wmf.26 but the error rate skyrocketed after deploying 1.33.0-wmf.1 to group1 so there is some query in the new branch which is holding a lock. T207881

Change 469518 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Use a pre-commit hook for change_tag_def count updates

https://gerrit.wikimedia.org/r/469518

Change 469518 merged by jenkins-bot:
[mediawiki/core@master] Use a pre-commit hook for change_tag_def count updates

https://gerrit.wikimedia.org/r/469518

Banyek lowered the priority of this task from Unbreak Now! to Medium.Oct 25 2018, 8:30 AM
jcrespo raised the priority of this task from Medium to Unbreak Now!.Oct 25 2018, 8:32 AM

Change 469653 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@wmf/1.33.0-wmf.1] Use a pre-commit hook for change_tag_def count updates

https://gerrit.wikimedia.org/r/469653

Change 469653 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.1] Use a pre-commit hook for change_tag_def count updates

https://gerrit.wikimedia.org/r/469653

I'm seeing a lot of occurrences of this with Function: Wikibase\SqlIdGenerator::generateNewId

and also a lot with

Function: User::loadFromDatabase

This is happening even more now that I did a full scap with wmf.1 still on group0 only.

This is for the past 1 hour, during which time I ran a full scap sync:

Screenshot from 2018-10-25 15-00-48.png (595×1 px, 21 KB)

I've also seen periodic alerts for apache and hhvm:

PROBLEM - Apache HTTP on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
14:55:29	<+icinga-wm>	PROBLEM - HHVM rendering on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
14:56:29	<+icinga-wm>	RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.038 second response time
14:56:30	<+icinga-wm>	RECOVERY - HHVM rendering on mw1270 is OK: HTTP OK: HTTP/1.1 200 OK - 74206 bytes in 0.103 second response time

and just now:

+icinga-wm>	PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen

Promoted group1 to wmf.1 again and now even more lock wait timeouts for Function: Wikibase\SqlIdGenerator::generateNewId

mmodell lowered the priority of this task from Unbreak Now! to High.Oct 26 2018, 12:53 AM

This shouldn't be a blocker for the train, we gets spikes of these sometimes when people try to create too many entities in quick succession.
At some point in the future we will have to look at solving this

Change 470300 had a related patch set uploaded (by Paladox; owner: Aaron Schulz):
[mediawiki/core@REL1_32] Use a pre-commit hook for change_tag_def count updates

https://gerrit.wikimedia.org/r/470300

Change 470300 abandoned by Krinkle:
Use a pre-commit hook for change_tag_def count updates

Reason:
Unexplained backport proposal.

https://gerrit.wikimedia.org/r/470300

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:08 PM