Page MenuHomePhabricator

Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file
Closed, DeclinedPublic

Description

While trying to undelete a file on Commons (File:Soldier Sikh LI INF of the Indian Army practices aim during MALABAR 2006.jpg), I got this error:

To avoid creating high replication lag, this transaction was aborted because the write duration (3.8845181560516) exceeded the 3 second limit. If you are changing many items at once, try doing multiple smaller operations instead.

[WrIzDQpAICoAABGltIQAAACW] 2018-03-21 10:25:27: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError"

A similar message is mentioned on T63122, and T150702 seems to be related.

Event Timeline

Yann created this task.Mar 21 2018, 10:31 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 21 2018, 10:31 AM
jcrespo added a subscriber: jcrespo.

Adding performance for triaging purposes. It could be later sent to multimedia or someone else more familiar with that code.

Aklapper renamed this task from Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" to Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to delete a file.Mar 21 2018, 12:18 PM
Yann renamed this task from Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to delete a file to Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file.Mar 21 2018, 12:39 PM

Definitely not a storage problem (swift), if you want to guess the triage and not wait for performance to do it, this is a multimedia or editing team issue (probably undeleting needs batching or smaller transactions).

Krinkle triaged this task as Unbreak Now! priority.EditedMar 21 2018, 9:58 PM
Krinkle added a subscriber: Krinkle.

Details from Logstash (produced 3 entries in mediawiki-errors for the same web request)

Logstash
timestamp: 2018-03-21T10:25:27
method: POST
type:mediawiki channel:exception
url: /w/index.php?title=Special:Undelete&action=submit

Wikimedia\Rdbms\DBTransactionSizeError
Transaction spent 3.8845181560516 second(s) in writes, exceeding the limit of 3.
[...]
#9 /srv/mediawiki/php-1.31.0-wmf.25/includes/MediaWiki.php(602): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string, array)
#10 /srv/mediawiki/php-1.31.0-wmf.25/includes/MediaWiki.php(575): MediaWiki::preOutputCommit(RequestContext, Closure$MediaWiki::main;328)
#11 /srv/mediawiki/php-1.31.0-wmf.25/includes/MediaWiki.php(877): MediaWiki->doPreOutputCommit(Closure$MediaWiki::main;328)
#12 /srv/mediawiki/php-1.31.0-wmf.25/includes/MediaWiki.php(524): MediaWiki->main()
#13 /srv/mediawiki/php-1.31.0-wmf.25/index.php(42): MediaWiki->run()
#14 /srv/mediawiki/w/index.php(3): include(string)

It seems user Yann on Commons was ultimately able to succeed the action when trying again, 9 minutes later (per Specal:Log).

More from Logstash:

  • before March 20, DBTransactionSizeError exceptions happened 5-15x per 3 hours.
  • a spike started around 20 March 13:00, peaked around 18:00 at 500x per 3 hours.
  • came back down, stabalised at 20-30x per 3 hours since 21 March 03:00 (UTC).


So apart from the spike, it seems consistently 2-3x more common now than 2 days ago.

Tagging as UBN regression against wmf.26 until we know more.

Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · View Herald TranscriptMar 21 2018, 9:58 PM
Krinkle lowered the priority of this task from Unbreak Now! to High.Mar 26 2018, 8:23 PM
Restricted Application added a project: Commons. · View Herald TranscriptMar 26 2018, 8:46 PM
Ramsey-WMF moved this task from Untriaged to Tracking on the Multimedia board.Mar 27 2018, 9:38 PM
Imarlier assigned this task to Krinkle.Jun 21 2018, 11:18 AM
Imarlier added a subscriber: Imarlier.

@Krinkle Check if this is still happening, followup if so.

Still seeing ~ 200 hits per day in Logstash for (DBTransactionSizeError) just within wiki:commonswiki AND method:POST.

Krinkle reassigned this task from Krinkle to aaron.Jun 23 2018, 12:21 AM
aaron added a subscriber: Tgr.Jul 2 2018, 11:28 AM
aaron added a comment.Jul 2 2018, 11:34 AM

Still seeing ~ 200 hits per day in Logstash for (DBTransactionSizeError) just within wiki:commonswiki AND method:POST.

I see about 3/hour or 24/day via << +wiki:commonswiki +http_method:POST +"DBTransactionSizeError" >>.

Change 443452 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: give TransactionProfiler write time and row count for ROLLBACK

https://gerrit.wikimedia.org/r/443452

Change 443472 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Change LOCK IN SHARE MODE to FOR UPDATE for local file restoration

https://gerrit.wikimedia.org/r/443472

aaron removed aaron as the assignee of this task.Jul 16 2018, 8:37 PM
aaron added a subscriber: aaron.

Change 443472 abandoned by Aaron Schulz:
Change LOCK IN SHARE MODE to FOR UPDATE for local file restoration

https://gerrit.wikimedia.org/r/443472

Restricted Application added a project: Multimedia. · View Herald TranscriptJul 16 2018, 8:54 PM

@tstarling I tagged CPT as it seems Aaron's patch wasn't right according to @Anomie. Could use a fresh perspective on this.

On second thought, while I see plenty of DBTransactionSizeError AND wiki:commonswiki in Logstash, they're with urls /w/api.php, Special:Wachlist or Special:MovePage. I don't see any from Special:Undelete in the last 30 days, which is what this task was about. I'm closing this specific task as such.

  • Separate task for watchlist issue: T171898.
  • Separate task for general undelete transaction size problem: T198176.
Krinkle closed this task as Declined.Jul 17 2018, 2:24 AM
Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptJul 17 2018, 2:24 AM

Change 443452 merged by jenkins-bot:
[mediawiki/core@master] rdbms: give TransactionProfiler write time and row count for ROLLBACK

https://gerrit.wikimedia.org/r/443452