Page MenuHomePhabricator

Rebuilding l10n cache fails for train
Closed, ResolvedPublicPRODUCTION ERROR

Description

When I attempted to deploy train to testwikis today, it failed. The command I ran was:

scap sync "testwikis to 1.36.0-wmf.9"

The end of the output is below.

This is preventing train from moving anywhere, which is why I'm marking it Unbreak-Now.

08:05:48 Bootstrapping l10n cache for 1.36.0-wmf.9
08:05:48 Last output:
MWException from line 512 of /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/localisation/LocalisationCache.php: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.
#0 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/localisation/LocalisationCache.php(370): LocalisationCache->initLanguage('en')
#1 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/localisation/LocalisationCache.php(311): LocalisationCache->loadItem('en', 'namespaceGender...')
#2 /srv/mediawiki-staging/php-1.36.0-wmf.9/languages/Language.php(721): LocalisationCache->getItem('en', 'namespaceGender...')
#3 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/title/MediaWikiTitleCodec.php(102): Language->needsGenderDistinction()
#4 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/title/MediaWikiTitleCodec.php(139): MediaWikiTitleCodec->getNamespaceName(8, 'Gadgets-definit...')
#5 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/title/MediaWikiTitleCodec.php(261): MediaWikiTitleCodec->formatTitle(8, 'Gadgets-definit...', '', '')
#6 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/LinkCache.php(257): MediaWikiTitleCodec->getPrefixedDBkey(Object(Title))
#7 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Title.php(3139): LinkCache->addLinkObj(Object(Title))
#8 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Revision/RevisionStore.php(1620): Title->getArticleID()
#9 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Revision/RevisionStore.php(1543): MediaWiki\Revision\RevisionStore->ensureRevisionRowMatchesTitle(Object(stdClass), Object(Title))
#10 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Revision/RevisionStore.php(1425): MediaWiki\Revision\RevisionStore->newRevisionFromRowAndSlots(Object(stdClass), NULL, 0, Object(Title), false)
#11 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Revision/RevisionStore.php(2371): MediaWiki\Revision\RevisionStore->newRevisionFromRow(Object(stdClass), 0, Object(Title))
#12 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/Revision/RevisionStore.php(1143): MediaWiki\Revision\RevisionStore->loadRevisionFromConds(Object(Wikimedia\Rdbms\DBConnRef), Array, 0, Object(Title))
#13 /srv/mediawiki-staging/php-1.36.0-wmf.9/extensions/Gadgets/includes/MediaWikiGadgetsDefinitionRepo.php(142): MediaWiki\Revision\RevisionStore->getRevisionByTitle(Object(Title))
#14 /srv/mediawiki-staging/php-1.36.0-wmf.9/extensions/Gadgets/includes/MediaWikiGadgetsDefinitionRepo.php(109): MediaWikiGadgetsDefinitionRepo->fetchStructuredList()
#15 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/libs/objectcache/wancache/WANObjectCache.php(1533): MediaWikiGadgetsDefinitionRepo->{closure}(false, 86400, Array, NULL, Array)
#16 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache->fetchOrRegenerate('testwiki:gadget...', 86400, Object(Closure), Array, Array)
#17 /srv/mediawiki-staging/php-1.36.0-wmf.9/extensions/Gadgets/includes/MediaWikiGadgetsDefinitionRepo.php(116): WANObjectCache->getWithSetCallback('testwiki:gadget...', 86400, Object(Closure), Array)
#18 /srv/mediawiki-staging/php-1.36.0-wmf.9/extensions/Gadgets/includes/MediaWikiGadgetsDefinitionRepo.php(32): MediaWikiGadgetsDefinitionRepo->loadGadgets()
#19 /srv/mediawiki-staging/php-1.36.0-wmf.9/extensions/Gadgets/includes/GadgetHooks.php(204): MediaWikiGadgetsDefinitionRepo->getGadgetIds()
#20 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/HookContainer/HookContainer.php(330): GadgetHooks::registerModules(Object(ResourceLoader))
#21 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook('ResourceLoaderR...', Array, Array, Array)
#22 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/HookContainer/HookRunner.php(3248): MediaWiki\HookContainer\HookContainer->run('ResourceLoaderR...', Array, Array)
#23 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/ServiceWiring.php(1008): MediaWiki\HookContainer\HookRunner->onResourceLoaderRegisterModules(Object(ResourceLoader))
#24 /srv/mediawiki-staging/php-1.36.0-wmf.9/vendor/wikimedia/services/src/ServiceContainer.php(447): Wikimedia\Services\ServiceContainer->{closure}(Object(MediaWiki\MediaWikiServices))
#25 /srv/mediawiki-staging/php-1.36.0-wmf.9/vendor/wikimedia/services/src/ServiceContainer.php(416): Wikimedia\Services\ServiceContainer->createService('ResourceLoader')
#26 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/MediaWikiServices.php(1109): Wikimedia\Services\ServiceContainer->getService('ResourceLoader')
#27 /srv/mediawiki-staging/php-1.36.0-wmf.9/maintenance/rebuildLocalisationCache.php(109): MediaWiki\MediaWikiServices->getResourceLoader()
#28 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/localisation/LocalisationCache.php(1060): RebuildLocalisationCache->{closure}()
#29 /srv/mediawiki-staging/php-1.36.0-wmf.9/includes/cache/localisation/LocalisationCacheBulkLoad.php(100): LocalisationCache->recache('en')
#30 /srv/mediawiki-staging/php-1.36.0-wmf.9/maintenance/rebuildLocalisationCache.php(192): LocalisationCacheBulkLoad->recache('en')
#31 /srv/mediawiki-staging/php-1.36.0-wmf.9/maintenance/rebuildLocalisationCache.php(148): RebuildLocalisationCache->doRebuild(Array, Object(LocalisationCacheBulkLoad), false)
#32 /srv/mediawiki-staging/php-1.36.0-wmf.9/maintenance/doMaintenance.php(107): RebuildLocalisationCache->execute()
#33 /srv/mediawiki-staging/php-1.36.0-wmf.9/maintenance/rebuildLocalisationCache.php(211): require_once('/srv/mediawiki-...')
#34 /srv/mediawiki-staging/multiversion/MWScript.php(101): require_once('/srv/mediawiki-...')
#35 {main}
08:05:48 Finished l10n-update (duration: 04m 07s)
08:05:48 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 341, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 650, in main
    return super(Scap, self).main(*extra_args)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 86, in main
    self._after_sync_common()
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 673, in _after_sync_common
    version, wikidb, self.verbose, self.config)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 402, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 626, in update_localization_cache
    wikidb, cache_dir, use_cores, lang='en', quiet=True)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 543, in _call_rebuildLocalisationCache
    'quiet': '--quiet' if quiet else ''
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 402, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 497, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
CalledProcessError: Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en  --quiet' returned non-zero exit status 1
08:05:48 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 11m 10s)
Impact
  1. Notes

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptTue, Sep 15, 8:48 AM
LarsWirzenius triaged this task as Unbreak Now! priority.Tue, Sep 15, 8:48 AM

Per @hashar, we should have a test that ensures that this circular dependency isn't possible.

There haven't been any changes to Gadgets (https://www.mediawiki.org/wiki/MediaWiki_1.36/wmf.9/Changelog#Gadgets) so that seems unlikely.

hashar added a comment.EditedTue, Sep 15, 9:35 AM

rebuildLocalisationCache triggers the onResourceLoader hook which is registered by Gadgets. It eventually loads the content from https://test.wikipedia.org/wiki/MediaWiki:Gadgets-definition

There was a change on September 12 which made Twinkle gadget to be loaded for unconfirmed users, maybe that is related. https://test.wikipedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=449206&oldid=447054

The cache should be set by the time it starts message blob store purge in includes/cache/localisation/LocalisationCache.php(1060). Then the ResourceLoader service is constructed to do the purge, and that runs a hook, that does bunch of stuff that ends up requiring data from LocalisationCache, but now LocalisationCache claims it does not have data.

Just guessing: maybe writing the to cache fails, or there is another instance of the service with stale data.

Note, Scap has been upgraded on deploy1001 today:

/var/log/apt/history.log
Start-Date: 2020-09-15  11:17:27
Upgrade: scap:amd64 (3.14.0-1, 3.15.0-1)

Thought the train deploy failed before that (around 8:00 UTC).

Scap was upgraded long after train failed. The train failure is unrelated.

brennen moved this task from Backlog to Logs/Train on the User-brennen board.
brennen added a subscriber: brennen.
daniel added a subscriber: daniel.Tue, Sep 15, 8:08 PM

Are there known steps to reproduce this locally?

eprodromou assigned this task to daniel.Tue, Sep 15, 8:52 PM

Tentatively a repro case could be:

  • ssh to deploy1001
  • change testwiki version to 1.36.0-wmf.9 in /srv/mediawiki-staging/wikiversion.json (or maybe that should be /srv/mediawiki.
  • run the script writing to "/tmp/scap_l10n_$USER":
sudo -u www-data /usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_$USER" --store-class=LCStoreCDB --threads=30 --lang en  --quiet

When I looked at deploy1001, /srv/mediawiki-staging/php-1.36.0-wmf.9/cache/l10n/ was empty (and still is).

Scap invokes that command to have at least the en l10n cache generated with a comment stating mergeMessageFileList.php needs a l10n file. Once l10n_cache-en.cdb has been generated it merges the files and then trigger a recache.

Tentatively a repro case could be:

  • ssh to deploy1001
  • change testwiki version to 1.36.0-wmf.9 in /srv/mediawiki-staging/wikiversion.json (or maybe that should be /srv/mediawiki.

This won't work, you'd need to change /srv/mediawiki/wikiversion.php if you want a hack.

  • run the script writing to "/tmp/scap_l10n_$USER":
sudo -u www-data /usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_$USER" --store-class=LCStoreCDB --threads=30 --lang en  --quiet

When I looked at deploy1001, /srv/mediawiki-staging/php-1.36.0-wmf.9/cache/l10n/ was empty (and still is).

Scap invokes that command to have at least the en l10n cache generated with a comment stating mergeMessageFileList.php needs a l10n file. Once l10n_cache-en.cdb has been generated it merges the files and then trigger a recache.

I tried to reproduce this on mwdeploy1001 but it didn't work. I changed /srv/mediawiki/wikiversions.php and then ran

sudo -u www-data strace -e trace=open php /srv/mediawiki-staging/multiversion/MWScript.php rebuildLocalisationCache.php --wiki="testwiki" --outdir=/tmp/T262900-test --store-class=LCStoreCDB --lang en --force

Running it under strace let me confirm that it was running on the right version of MediaWiki. It was apparently successful. But the problem is pretty obvious from the stack trace so maybe we don't really need reproduction.

I was able to reproduce it by first erasing the gadgets-definition cache:

$ sudo -u www-data php /srv/mediawiki-staging/multiversion/MWScript.php eval.php --wiki=testwiki
> print $IP
/srv/mediawiki-staging/php-1.36.0-wmf.9
> $woc = MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache()
> $woc->delete($woc->makeKey('gadgets-definition',9,2));

$ sudo -u www-data php /srv/mediawiki-staging/multiversion/MWScript.php rebuildLocalisationCache.php --wiki="testwiki" --outdir=/tmp/T262900-test --store-class=LCStoreCDB --lang en --force

This hits the expected exception.

This is very similar to T231866. The fix for that bug was committed in December 2019 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/553170 . But that fixed the code in ServiceWiring.php without fixing the duplicated code in rebuildLocalisationCache.php. So I will apply the same fix to rebuildLocalisationCache.php.

Change 627641 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627641

Reproduced locally and confirmed the fix by faking part of the call stack:

diff --git a/includes/ServiceWiring.php b/includes/ServiceWiring.php
index 1106d54052..97e2fd5361 100644
--- a/includes/ServiceWiring.php
+++ b/includes/ServiceWiring.php
@@ -1006,6 +1006,7 @@ return [
 		$rl->register( $modules );
 		$hookRunner = new HookRunner( $services->getHookContainer() );
 		$hookRunner->onResourceLoaderRegisterModules( $rl );
+		$services->getLocalisationCache()->getItem( 'en', 'namespaceGenderAliases' );
 
 		$msgPosterAttrib = $extRegistry->getAttribute( 'MessagePosterModule' );
 		$rl->register( 'mediawiki.messagePoster', [

Change 627641 merged by jenkins-bot:
[mediawiki/core@master] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627641

Change 627617 had a related patch set uploaded (by Jforrester; owner: Tim Starling):
[mediawiki/core@wmf/1.36.0-wmf.9] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627617

Is this something likely to hit others? Should we back-port to REL1_35?

It's noteworthy that this is also called by the refreshMessageBlobs.php maintenance script which is called by scap as a final step for any sync where we rebuild the localisation cache. I think the message blob clearing inside rebuildLocalisationCache is superfluous except where that script is called independently of scap.

Change 627617 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.9] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627617

I have fetched https://gerrit.wikimedia.org/r/c/mediawiki/core/+/627617 on the deployment server and rebased the security patches on top of it. So we should be able to deploy.

LarsWirzenius closed this task as Resolved.Wed, Sep 16, 10:15 AM

I managed to deploy this week's train to testwikis, which rebuilds the l10n cache and failed yesterday. I declare this fixed now.

daniel reassigned this task from daniel to tstarling.Wed, Sep 16, 3:50 PM

Change 627790 had a related patch set uploaded (by Reedy; owner: Tim Starling):
[mediawiki/core@REL1_35] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627790

Reedy added a subscriber: Reedy.Wed, Sep 16, 3:58 PM

Is this something likely to hit others? Should we back-port to REL1_35?

Reading Tim's reply...

This is very similar to T231866. The fix for that bug was committed in December 2019 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/553170 . But that fixed the code in ServiceWiring.php without fixing the duplicated code in rebuildLocalisationCache.php. So I will apply the same fix to rebuildLocalisationCache.php.

I'm going with Yes. 553170 will be in REL1_35 already... And I backported it to REL1_34 too it seems, so it should probably get backported there to...

Change 627791 had a related patch set uploaded (by Reedy; owner: Tim Starling):
[mediawiki/core@REL1_34] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627791

Change 627791 merged by jenkins-bot:
[mediawiki/core@REL1_34] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627791

Change 627790 merged by jenkins-bot:
[mediawiki/core@REL1_35] Fix failure of rebuildLocalisationCache.php due to RL hook

https://gerrit.wikimedia.org/r/627790