The commit https://github.com/wikimedia/mediawiki/commit/29d80335f9060427ed79ddca21bff5a5addd05ae removed the qqq message api-error-was-deleted; as a result, core tests started failing, e.g. https://integration.wikimedia.org/ci/job/mediawiki-core-npm-node-4.3/623/console
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add stupidly simple check to alert if JobQueue is not running | translatewiki | master | +3 -2
Unbreak tests | mediawiki/core | master | +1 -0
Event Timeline
@Nemo_bis There is T48833: Do not override additions in mediawiki extension export. Is that what you mean by the general case?
In this case there was a long time between the addition and the removal, so it shouldn't have happened. It should only be a risk for commits merged between a twn import and the following twn export, which I think are usually less than an hour apart.
@Raymond do you recall anything special about this case?
@Nikerabbit No idea what happens here or why. The new message was added on April 5th with f3b2f2023a5978bc2585b7f9bcd9ccd41409be3a, but neither en (https://translatewiki.net/wiki/MediaWiki:Api-error-was-deleted/en) nor qqq (https://translatewiki.net/wiki/MediaWiki:Api-error-was-deleted/qqq) was imported into twn. Therefore our scripts removed qqq in the next export run: a logical, but wrong, follow-up.
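The failure mode can be reduced to a set difference. The sketch below is a simplified model under a stated assumption: an export writes out only the messages twn knows about, so any key present in git but never imported into twn disappears. It is not the Translate extension's actual export code, and the message contents other than api-error-was-deleted are hypothetical placeholders.

```python
# Simplified model (an assumption, not the Translate extension's export code):
# an export writes out only the messages translatewiki.net (twn) knows about,
# so any key that exists in git but was never imported into twn disappears.


def removed_by_export(git_messages: dict, twn_messages: dict) -> set:
    """Return the keys an export would drop from a git message file."""
    return set(git_messages) - set(twn_messages)


# Hypothetical qqq.json contents: git already has the new key ...
git_qqq = {
    "upload": "(existing documentation)",
    "api-error-was-deleted": "API error message that can be used for client side localisation of API errors.",
}
# ... but twn never imported it, so twn's copy lacks that key.
twn_qqq = {"upload": "(existing documentation)"}

print(removed_by_export(git_qqq, twn_qqq))  # {'api-error-was-deleted'}
```

Once the import has happened (twn's copy equals git's), the difference is empty and nothing is dropped.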
Original change merged: the change was successfully merged into the git repository on 2016-04-06 at 05:06 UTC.
My autosync, which should not touch mediawiki at all, runs:
[06/Apr/2016 04:50:56 +0000] /betawiki (ProcessMessageChanges) /srv/mediawiki/targets/production/extensions/Translate/scripts/processMessageChanges.php --safe-import --group=* --skipgroup=ext-*,core,mediawiki*
Raimond's autosync runs:
[06/Apr/2016 06:06:41 +0000] /betawiki (ProcessMessageChanges) /srv/mediawiki/targets/production/extensions/Translate/scripts/processMessageChanges.php --name mediawiki --group=core,ext-*,mediawiki* --quiet
Raimond runs the export:
[06/Apr/2016 20:32:52 +0000] raymond/raymond (CommandlineExport) /srv/mediawiki/targets/production/extensions/Translate/scripts/export.php --target . --group core --lang=* --skip test,aeb,be-x-old,crh,dk,en,fiu-vro,gan,gom,got,hif,kbd,kk,kk-cn,iu,kk-kz,kk-tr,ko-kp,ku,ku-arab,no,ruq,simple,sr,tg,tp,tt,ug,zh,zh-classical,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-mo,zh-my,zh-tw,zh-yue,bbc,ady --threshold 18 --hours 200
Patch uploaded and merged: uploaded 2016-04-06 20:35, updated 2016-04-06 20:56.
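To check the timing argument (a commit is only at risk if it lands between a sync and the following export), here is a small sketch using the timestamps from the log lines above. It treats Raimond's autosync at 06:06:41 as the last sync before the export; the Gerrit merge time is only known to the minute, so seconds are set to zero. The helper names are illustrative only.

```python
from datetime import datetime, timezone


def utc(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' as a UTC timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def in_risk_window(merged: datetime, last_sync: datetime, export: datetime) -> bool:
    """True if a commit landed after the last sync but before the export,
    i.e. the export could not have known about the commit's new messages."""
    return last_sync < merged < export


merged = utc("2016-04-06 05:06:00")     # Gerrit merge time (minute precision only)
last_sync = utc("2016-04-06 06:06:41")  # Raimond's autosync for core,ext-*,mediawiki*
export = utc("2016-04-06 20:32:52")     # export.php --group core

print(in_risk_window(merged, last_sync, export))  # False
print(export - merged)                            # 15:26:52
```

The merge predates the sync, so by timing alone the new messages should have reached twn well before the export, which is why this case needed a different explanation.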
```
twn:/resources/caches/translatewiki.net$ TZ=utc ls *mediawiki*cdb* -l --full-time | tail -n 4
-rw-rw-r--+ 1 betawiki users 3567 2016-04-05 20:07:44.629254000 +0000 messagechanges.mediawiki.cdb-1459886901
-rw-rw-r--+ 1 betawiki users 4598 2016-04-06 06:12:48.537999000 +0000 messagechanges.mediawiki.cdb-1459924058
-rw-rw-r--+ 1 betawiki users 5273 2016-04-06 11:50:47.969999000 +0000 messagechanges.mediawiki.cdb-1459943470
-rw-rw-r--+ 1 betawiki users 2848 2016-04-07 05:56:37.033999000 +0000 messagechanges.mediawiki.cdb-1460008622
twn:/resources/caches/translatewiki.net$ TZ=utc ls *mediawiki*cdb* -l --full-time --time=ctime | tail -n 4
-rw-rw-r--+ 1 betawiki users 3567 2016-04-05 20:08:21.321254000 +0000 messagechanges.mediawiki.cdb-1459886901
-rw-rw-r--+ 1 betawiki users 4598 2016-04-06 06:27:38.121999000 +0000 messagechanges.mediawiki.cdb-1459924058
-rw-rw-r--+ 1 betawiki users 5273 2016-04-06 11:51:10.289999000 +0000 messagechanges.mediawiki.cdb-1459943470
-rw-rw-r--+ 1 betawiki users 2848 2016-04-07 05:57:02.617999000 +0000 messagechanges.mediawiki.cdb-1460008622
```
From here, messagechanges.mediawiki.cdb-1459924058 looks to be the file where the changes were supposed to be. The Unix timestamp in its name is 2016-04-06 06:27:38 UTC, which is the time when the file was processed and is the same as the ctime. The date in the first listing (the mtime) should be when the script wrote the file out.
grep confirms this is the case:
```
grep api-error-was-deleted *mediawiki*cdb*
Binary file messagechanges.mediawiki.cdb-1459924058 matches
```
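The filename suffix can be decoded directly to confirm the ctime reading. This sketch assumes, as the comment above does, that the suffix after the last `-` is a Unix timestamp; `processed_at` is an illustrative helper name.

```python
from datetime import datetime, timezone


def processed_at(filename: str) -> str:
    """Decode the Unix-time suffix of a messagechanges.*.cdb-<time> file name."""
    suffix = filename.rsplit("-", 1)[1]
    when = datetime.fromtimestamp(int(suffix), tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M:%S")


for name in [
    "messagechanges.mediawiki.cdb-1459886901",
    "messagechanges.mediawiki.cdb-1459924058",
    "messagechanges.mediawiki.cdb-1459943470",
    "messagechanges.mediawiki.cdb-1460008622",
]:
    print(name, processed_at(name))
```

Each decoded time (2016-04-05 20:08:21, 2016-04-06 06:27:38, 2016-04-06 11:51:10, 2016-04-07 05:57:02) matches the ctime column of the second listing to the second, while every mtime in the first listing is earlier.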
I used the following script to dump the file:
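```php
// firstkey() returns the first key itself, so loop from it rather than discarding it
$x = \Cdb\Reader::open( 'messagechanges.mediawiki.cdb-1459924058' );
for ( $k = $x->firstkey(); $k !== false; $k = $x->nextkey() ) {
	var_dump( $k, unserialize( $x->get( $k ) ) );
}
```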
The output is:
```
string(4) "core"
array(3) {
  ["en"]=> array(1) {
    ["addition"]=> array(1) {
      [0]=> array(2) {
        ["key"]=> string(21) "api-error-was-deleted"
        ["content"]=> string(74) "A file of this name has been previously uploaded and subsequently deleted."
      }
    }
  }
  ["tl"]=> array(1) {
    ["change"]=> array(1) {
      [0]=> array(2) {
        ["key"]=> string(6) "upload"
        ["content"]=> string(22) "Mag-upload ng talaksan"
      }
    }
  }
  ["qqq"]=> array(1) {
    ["addition"]=> array(1) {
      [0]=> array(2) {
        ["key"]=> string(21) "api-error-was-deleted"
        ["content"]=> string(78) "API error message that can be used for client side localisation of API errors."
      }
    }
  }
}
string(17) "ext-googlegeocode"
[SNIP]
string(16) "ext-kartographer"
[SNIP]
```
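The dump has a fixed shape: group, then language, then change type ("addition", "change"), then a list of key/content entries. A minimal sketch of searching that shape for one key, assuming the unserialized PHP arrays map onto nested dicts and lists as below (only the "core" group is reproduced; the other groups were snipped in the dump):

```python
# Nested structure as printed by var_dump for the "core" group
# (other groups were snipped in the dump and are omitted here).
changes = {
    "core": {
        "en": {"addition": [{"key": "api-error-was-deleted",
                             "content": "A file of this name has been previously uploaded and subsequently deleted."}]},
        "tl": {"change": [{"key": "upload", "content": "Mag-upload ng talaksan"}]},
        "qqq": {"addition": [{"key": "api-error-was-deleted",
                              "content": "API error message that can be used for client side localisation of API errors."}]},
    }
}


def find_key(changes: dict, key: str) -> list:
    """List (group, language, change type) for every entry touching `key`."""
    hits = []
    for group, languages in changes.items():
        for language, by_type in languages.items():
            for change_type, entries in by_type.items():
                hits.extend((group, language, change_type)
                            for entry in entries if entry["key"] == key)
    return hits


print(find_key(changes, "api-error-was-deleted"))
# [('core', 'en', 'addition'), ('core', 'qqq', 'addition')]
```

Both the en and qqq additions are present, so the change detection step did its job; the loss happened later.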
So we can see that the change was picked up by the system. What happened after that is still a mystery.
https://translatewiki.net/wiki/MediaWiki:Api-error-was-deleted/en still doesn't exist, and neither does the qqq page.
Okay, JobQueue has been broken for a while:
```
timetamp=20160406045841 (id=4554446,timestamp=20160406045841) t=0 good

Database error - translatewiki.net
Sorry! This site is experiencing technical difficulties.
(Cannot access the database: Can't connect to MySQL server on '127.0.0.1' (111) (127.0.0.1:3306))

Internal error - translatewiki.net
[2bb61ce83432259b8f7dbed4] [no req] JobQueueConnectionError from line 742 of /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobQueueDB.php: DBConnectionError:DB connection error: Can't connect to MySQL server on '127.0.0.1' (111) (127.0.0.1:3306)
Backtrace:
#0 /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobQueueDB.php(595): JobQueueDB->getSlaveDB()
#1 /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobQueue.php(640): JobQueueDB->doGetSiblingQueuesWithJobs(array)
#2 /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobQueueGroup.php(313): JobQueue->getSiblingQueuesWithJobs(array)
#3 /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobQueueGroup.php(197): JobQueueGroup->getQueuesWithJobs()
#4 /srv/mediawiki/tags/2016-03-31_19:10:56/includes/jobqueue/JobRunner.php(154): JobQueueGroup->pop(integer, integer, array)
#5 /srv/mediawiki/tags/2016-03-31_19:10:56/maintenance/runJobs.php(93): JobRunner->run(array)
#6 /srv/mediawiki/tags/2016-03-31_19:10:56/maintenance/doMaintenance.php(111): RunJobs->execute()
#7 /srv/mediawiki/tags/2016-03-31_19:10:56/maintenance/runJobs.php(127): include(string)
#8 {main}
```
```
sudo service mw-jobrunner status
mw-jobrunner stop/waiting
```
It is not shown in service --status-all at all.
Nothing in the logs; is upstart really not logging service failures anywhere?
Anyway, I started the jobrunner manually. It looks like we need some kind of monitoring to prevent this from happening in the future.
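A minimal sketch of such a check, based on the `stop/waiting` status shown above. It is not the check that was actually merged in change 282485; it assumes upstart's status format `<job> <goal>/<state>[, process <pid>]` and borrows the Nagios exit-code convention (0 = OK, 2 = CRITICAL) as a design choice. The function names are hypothetical.

```python
import subprocess
import sys


def jobrunner_is_running(status_output: str) -> bool:
    """Interpret upstart's status line for a job.

    Assumed format: "<job> <goal>/<state>", optionally followed by
    ", process <pid>"; only "start/running" counts as up.
    """
    parts = status_output.strip().split(None, 1)
    if len(parts) != 2:
        return False
    goal_state = parts[1].split(",", 1)[0].strip()
    return goal_state == "start/running"


def check_jobrunner(service: str = "mw-jobrunner") -> int:
    """Return a Nagios-style exit code: 0 if running, 2 (critical) otherwise."""
    result = subprocess.run(["service", service, "status"], capture_output=True, text=True)
    if jobrunner_is_running(result.stdout):
        return 0
    print(f"CRITICAL: {service} is not running: {result.stdout.strip()}", file=sys.stderr)
    return 2
```

For example, `jobrunner_is_running("mw-jobrunner stop/waiting")` is `False`, while a line such as `mw-jobrunner start/running, process 1234` (hypothetical PID) gives `True`. Parsing the status line separately from running the command keeps the logic testable without a live upstart.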
Change 282485 had a related patch set uploaded (by Nikerabbit):
Add stupidly simple check to alert if JobQueue is not running
Change 282485 merged by jenkins-bot:
Add stupidly simple check to alert if JobQueue is not running
This issue should not happen again. The general issue still exists, but it has a separate task.
Tests are broken again today. This time it is because userjsispublic and usercssispublic have gone missing in rMWb3108317000f: Localisation updates from https://translatewiki.net.
Fixed now
https://gerrit.wikimedia.org/r/#/c/306499/1/languages/i18n/qqq.json, with the latest regular export.