Internal database error while saving translations
Closed, Resolved · Public

Description

In the event logging data I see ~150 instances of 'internal_api_error_DBQueryError'.

Some samples to help with debugging are given below:

en->ta
tawiki
timestamp: 20160410071417
Source title: Laminar armour
{"servedby":"mw1115","error":{"code":"internal_api_error_DBQueryError","info":"[1d159c53da9eb0eeb620cb98] Database query error"},"errorCode":"internal_api_error_DBQueryError"}

en->ml
mlwiki
timestamp: 20160502174123
Source: Teacup
{"servedby":"mw1143","error":{"code":"internal_api_error_DBQueryError","info":"[VyeRQgpAEHsAAD5Z@8gAAAAV] Database query error"},"errorCode":"internal_api_error_DBQueryError"}

santhosh created this task. May 3 2016, 9:54 AM
Restricted Application added subscribers: Zppix, Aklapper. May 3 2016, 9:54 AM

The errors did not prevent translators from completing their translations. Both of the above translations were successfully published.

I can see two issues with investigating this bug further:

  1. As far as I can see, the trace doesn't say anything substantial except "internal_api_error_DBQueryError". Is the error itself or the line that triggered it logged elsewhere?
  2. What's the right way to measure the frequency of these errors? I currently do this by running https://gerrit.wikimedia.org/r/#/c/282228/7/bash/sorted_save_events.py , which groups by event_session. Should I group by something else?

As far as I can see, the trace doesn't say anything substantial except "internal_api_error_DBQueryError". Is the error itself or the line that triggered it logged elsewhere?

Logstash. Also see T129462: Internal API error for translation save API

What's the right way to measure the frequency of these errors? I currently do this by running https://gerrit.wikimedia.org/r/#/c/282228/7/bash/sorted_save_events.py , which groups by event_session. Should I group by something else?

Group by unique translations. A unique translation is identified by the combination of source language, target language and source title.
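A minimal Python sketch of that grouping, for illustration only. The event field names (`sourceLanguage`, `targetLanguage`, `sourceTitle`, `session`) and the sample events are assumptions, not the actual event logging schema or the logic of sorted_save_events.py:

```python
from collections import defaultdict


def group_by_translation(events):
    """Group events by (source language, target language, source title)."""
    groups = defaultdict(list)
    for event in events:
        # Hypothetical field names; adjust to the real event schema.
        key = (
            event["sourceLanguage"],
            event["targetLanguage"],
            event["sourceTitle"],
        )
        groups[key].append(event)
    return dict(groups)


# Hypothetical events, loosely modelled on the samples in the description.
events = [
    {"sourceLanguage": "en", "targetLanguage": "ta",
     "sourceTitle": "Laminar armour", "session": "a"},
    {"sourceLanguage": "en", "targetLanguage": "ta",
     "sourceTitle": "Laminar armour", "session": "b"},
    {"sourceLanguage": "en", "targetLanguage": "ml",
     "sourceTitle": "Teacup", "session": "c"},
]

grouped = group_by_translation(events)
for key, items in sorted(grouped.items()):
    print(key, len(items))
```

Grouped this way, repeated failures for the same translation across several sessions count as one affected translation rather than several.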

Amire80 triaged this task as "Normal" priority. May 19 2016, 5:27 PM
Amire80 moved this task from Backlog to CX9 on the ContentTranslation board.
Arrbee raised the priority of this task from "Normal" to "High".

I located an error in Logstash. The details are as follows:

ContentTranslation\TranslationStorageManager::{closure} 10.64.16.20 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.16.20) SELECT * FROM cx_corpora WHERE cxc_translation_id = '178280' AND cxc_section_id = 'mwIg' AND cxc_origin = 'user' ORDER BY cxc_timestamp DESC LIMIT 1 FOR UPDATE

backtrace:

{"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/Database.php","line":901,"function":"reportQueryError","class":"DatabaseBase","type":"->","args":["string","integer","string","string","boolean"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/Database.php","line":1234,"function":"query","class":"DatabaseBase","type":"->","args":["string","string"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/Database.php","line":1293,"function":"select","class":"DatabaseBase","type":"->","args":["string","string","array","string","array","array"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/extensions/ContentTranslation/includes/TranslationStorageManager.php","line":127,"function":"selectRow","class":"DatabaseBase","type":"->","args":["string","string","array","string","array"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/extensions/ContentTranslation/includes/TranslationStorageManager.php","line":94,"function":"doFind","class":"ContentTranslation\\TranslationStorageManager","type":"::","args":["DatabaseMysqli","array","array","string"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/Database.php","line":2567,"function":"Closure$ContentTranslation\\TranslationStorageManager::save","args":["DatabaseMysqli","string"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/DBConnRef.php","line":39,"function":"doAtomicSection","class":"DatabaseBase","type":"->","args":["string","Closure$ContentTranslation\\TranslationStorageManager::save;363061839"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/db/DBConnRef.php","line":437,"function":"__call","class":"DBConnRef","type":"->","args":["string","array"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/extensions/ContentTranslation/includes/TranslationStorageManager.php","line":101,"function":"doAtomicSection","class":"DBConnRef","type":"->","args":["string","Closure$ContentTranslation\\TranslationStorageManager::save;363061839"]}, 
{"file":"/srv/mediawiki/php-1.28.0-wmf.8/extensions/ContentTranslation/api/ApiContentTranslationSave.php","line":216,"function":"save","class":"ContentTranslation\\TranslationStorageManager","type":"::","args":["ContentTranslation\\TranslationUnit"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/extensions/ContentTranslation/api/ApiContentTranslationSave.php","line":59,"function":"saveTranslationUnits","class":"ApiContentTranslationSave","type":"->","args":["array"]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/api/ApiMain.php","line":1373,"function":"execute","class":"ApiContentTranslationSave","type":"->","args":[]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/api/ApiMain.php","line":469,"function":"executeAction","class":"ApiMain","type":"->","args":[]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/includes/api/ApiMain.php","line":440,"function":"executeActionWithErrorHandling","class":"ApiMain","type":"->","args":[]}, {"file":"/srv/mediawiki/php-1.28.0-wmf.8/api.php","line":83,"function":"execute","class":"ApiMain","type":"->","args":[]}, {"file":"/srv/mediawiki/w/api.php","line":3,"function":"include","args":["string"]}
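The error message above (MySQL error 1213) itself suggests restarting the transaction. As a generic illustration of that pattern, here is a minimal Python sketch of a retry wrapper. It is not what any patch on this task does: `DeadlockError`, `run_with_deadlock_retry` and `flaky_save` are hypothetical names, and a real caller would need to map the database driver's deadlock error onto the exception.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL code 1213."""


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.05,
                            sleep=time.sleep):
    """Run transaction(); if it deadlocks, back off briefly and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Linear backoff plus jitter so competing writers desynchronise.
            sleep(base_delay * attempt + random.uniform(0, base_delay))


# Simulated save that deadlocks twice before succeeding.
attempts = []


def flaky_save():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "saved"


result = run_with_deadlock_retry(flaky_save, sleep=lambda _seconds: None)
print(result, len(attempts))
```

Retrying only masks contention. Avoiding the conflicting lock pattern in the first place, as the title of the change below indicates, is the more durable fix.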

Change 305004 had a related patch set uploaded (by Aaron Schulz):
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305004

Change 305004 merged by jenkins-bot:
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305004

Change 305188 had a related patch set uploaded (by KartikMistry):
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305188

Change 305190 had a related patch set uploaded (by KartikMistry):
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305190

Change 305190 merged by jenkins-bot:
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305190

Change 305188 merged by jenkins-bot:
Avoid deadlock patterns in cx_corpora updates

https://gerrit.wikimedia.org/r/305188

Mentioned in SAL [2016-08-17T15:59:39Z] <thcipriani@tin> Synchronized php-1.28.0-wmf.15/extensions/ContentTranslation: SWAT: [[gerrit:305190|Avoid deadlock patterns in cx_corpora updates (T134245)]] (duration: 00m 52s)

Mentioned in SAL [2016-08-17T16:02:46Z] <thcipriani@tin> Synchronized php-1.28.0-wmf.14/extensions/ContentTranslation: SWAT: [[gerrit:305188|Avoid deadlock patterns in cx_corpora updates (T134245)]] (duration: 00m 50s)

santhosh closed this task as "Resolved". Aug 22 2016, 3:52 AM
santhosh claimed this task.

After this deployment, the issue disappeared.

Deadlock issue occurrence graph from Logstash