
MessageGroupStats caused database query issues
Closed, Declined | Public | PRODUCTION ERROR

Description

There appears to have been database contention between 16:10 and midnight UTC on the 26th of June:

https://logstash.wikimedia.org/goto/3bcfe493d8d19739bca928d1b413d4e9

Several MessageGroupStats-related errors (such as "Wikimedia\Rdbms\DatabaseMysqlBase::lock failed to acquire lock 'MessageGroupStats:updates'") appear to have occurred on meta and mediawiki.org during that window. On commonly used database rows, lock failures like these would suggest overload; on a less commonly used code path like this one, they more likely point to a bug.
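To illustrate how a burst of concurrent jobs can produce this message, here is a minimal Python sketch. It assumes DatabaseMysqlBase::lock behaves like MySQL's GET_LOCK(name, timeout), returning false when the timeout elapses. NamedLockRegistry, run_stats_update and simulate are hypothetical names, and an in-process threading.Lock stands in for the database lock, so this models only the contention pattern, not the actual Translate extension code.

```python
import threading
import time
from collections import defaultdict


class NamedLockRegistry:
    """Hypothetical in-process stand-in for MySQL named locks (GET_LOCK / RELEASE_LOCK)."""

    def __init__(self):
        self._guard = threading.Lock()           # protects the name -> lock map
        self._locks = defaultdict(threading.Lock)

    def _get(self, name):
        with self._guard:
            return self._locks[name]

    def lock(self, name, timeout):
        # Assumed GET_LOCK-like semantics: True if acquired, False once the timeout elapses.
        return self._get(name).acquire(timeout=timeout)

    def unlock(self, name):
        self._get(name).release()


def run_stats_update(registry, name, work_seconds, timeout, failures):
    """One hypothetical stats-update job: take the shared lock, do work, release."""
    if not registry.lock(name, timeout):
        failures.append(f"failed to acquire lock '{name}'")
        return
    try:
        time.sleep(work_seconds)  # simulated statistics recomputation while holding the lock
    finally:
        registry.unlock(name)


def simulate(workers, work_seconds, timeout):
    """Start `workers` jobs at once against one lock name; return how many timed out."""
    registry = NamedLockRegistry()
    failures = []
    threads = [
        threading.Thread(
            target=run_stats_update,
            args=(registry, "MessageGroupStats:updates", work_seconds, timeout, failures),
        )
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures)


if __name__ == "__main__":
    # Work that outlasts the lock timeout: most concurrent jobs should fail to get the lock.
    print(simulate(workers=5, work_seconds=0.5, timeout=0.01))
```

When each job holds the lock longer than the others are willing to wait, all but one of a simultaneous burst typically report the failure, which is consistent with many log entries clustering in a short window rather than with a steady trickle.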

Potentially related to T53410

Event Timeline

55 in mediawiki.org from /rpc/RunJobs.php?wiki=mediawikiwiki&type=TranslationsUpdateJob&maxtime=30&maxmem=300M
~4k in metawiki from /rpc/RunJobs.php?wiki=metawiki&type=MessageGroupStatesUpdaterJob&maxtime=60&maxmem=300M
~14k in metawiki from /w/api.php
~100 in metawiki from /wiki/Learning_and_Evaluation/...

Sounds like these were caused by large translatable page moves, perhaps related to T168591: Complex page move leaves some translation-related pages behind.

Krinkle subscribed.

Zero hits in Logstash in at least 30 days for either "failed to acquire lock" or "TranslationsUpdateJob".

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:10 PM