Scap deployments are not purging MessageBlobStore (was: Stale localized messages)
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	jeblad
	May 5 2019, 3:45 PM

Description

It seems like nowiki does not use w:no:MediaWiki:Rcfilters-show-new-changes, compare nowiki with enwiki.

To clarify; this is enwiki with English sysmsg (note that it has a dtg) and this is enwiki with Norwegian Bokmål sysmsg (note no dtg).

Details

Subject	Repo	Branch	Lines +/-
Revert "Disable MessageBlobStore::clear() via hook"	operations/mediawiki-config	master	+0 -6
clear_message_blobs: call once	mediawiki/tools/scap	master	+8 -2
MWScript.php: Mark refreshMessageBlobs.php as a global script	operations/mediawiki-config	master	+1 -0
resourceloader: Make MessageBlobStore global check key actually global	mediawiki/core	wmf/1.34.0-wmf.5	+9 -4
resourceloader: Make MessageBlobStore global check key actually global	mediawiki/core	wmf/1.34.0-wmf.6	+9 -4
resourceloader: Make MessageBlobStore global check key actually global	mediawiki/core	master	+9 -4
Clear MessageBlobStore after syncing i18n data	mediawiki/tools/scap	master	+16 -0

Customize query in gerrit

Related Objects
Search...

		Status	Subtype	Assigned	Task
		Resolved		thcipriani	T222539 Scap deployments are not purging MessageBlobStore (was: Stale localized messages)
		Resolved		jijiki	T224915 Deploy scap 3.10.0-1

Event Timeline

jeblad created this task.May 5 2019, 3:45 PM

Restricted Application added a project: Growth-Team. · View Herald TranscriptMay 5 2019, 3:45 PM

Restricted Application added subscribers: Danmichaelo, Aklapper. · View Herald Transcript

Cannot reproduce. Your link shows Vis de nyeste endringene, but it takes a long time until it is displayed.
Does the problem also happen with safemode?

Screenshot from 2019-05-05 18-51-32.png (466×960 px, 100 KB)

Actually, you did reproduce the bug.

The sysmsg is Vis nye endringer siden $1, but the shown message is Vis de nyeste endingene. If you open the same special page at enwiki you will get the correct message.

Note that using uselang=qqx will give (Rcfilters-show-new-changes:…

Tried to go to enwiki, which has the timestamp, and then change the English language to Norwegian Bokmål, and the timestamp goes away.

jeblad updated the task description. (Show Details)May 5 2019, 6:49 PM

Restricted Application added a subscriber: jhsoby. · View Herald TranscriptMay 5 2019, 6:49 PM

@Nikerabbit Could this be an error due to the translate update being turned off for a short time?

Aklapper renamed this task from Special page recent changes failes to use MediaWiki:Rcfilters-show-new-changes to Special:RecentChanges on no.wp ignores localized MediaWiki:Rcfilters-show-new-changes message.May 5 2019, 7:39 PM

Aklapper changed the task status from Stalled to Open.

Premeditated subscribed.May 5 2019, 9:32 PM

As I wrote as clarification, this also happen on enwiki, i follows the user language. Just click on the links and wait a few minutes. Most likely the message cache is somehow stuck in an outdated state.

It also exists at several other languages; da, nn, sv, fi…

It's not translatewiki.net: https://github.com/wikimedia/mediawiki/blob/wmf/1.34.0-wmf.3/languages/i18n/nb.json#L1451

It's unlikely to be stale MessageCache, because the message is not overridden in the wiki.

It could be stale LocalisationCache, but it is not:

nikerabbit@deploy1001:~$ mwscript eval.php nowiki
> echo wfMessage( 'rcfilters-show-new-changes' )->plain();
Vis nye endringer siden $1

So it is likely stale resource loader blob cache (or whatever is the correct name for it). We have had this before I think.

mw.loader.implement( "mediawiki.rcfilters.filters.ui@0hljuy5", {
...
{...,"rcfilters-show-new-changes":"Vis de nyeste endringene",...} );
}

Restricted Application added a project: Performance-Team. · View Herald TranscriptMay 6 2019, 6:53 AM

For some reason, this particular message has failed to update on every wiki that it's been deployed to. I'll force-update it in about half an hour, and will investigate why this happened after.

• kchapman moved this task from Inbox, needs triage to Radar on the Performance-Team board.May 6 2019, 7:53 PM

• kchapman edited projects, added Performance-Team (Radar); removed Performance-Team.

Krinkle moved this task from Inbox to Backlog on the MediaWiki-ResourceLoader board.May 6 2019, 7:53 PM

@Catrope We're watching this on the radar for now, but after your investigation feel free to ping or move back to Inbox for re-triage if we need to patch/CR something.

It turns out MessageBlobStore's normal update mechanism for when messages change was disabled in the WMF config by rOMWC7f1a3bc742ed: Disable MessageBlobStore::clear() via hook, because of T29320: MessageBlobStore::clear() causes scaling problems on multi-server setups with CDB l10ncache. However, that bug might be moot since MessageBlobStore no longer uses the database. I think this probably broke when we stopped running LocalisationUpdate, because the only thing that appears to run refreshMessageBlobs.php (referenced in the comment in that config patch) is l10nupdate-1, which I don't believe we run anymore.

Mentioned in SAL (#wikimedia-operations) [2019-05-06T22:43:05Z] <RoanKattouw> Running refreshMessageBlobs.php on all wikis for T222539

Since the reason that clearing the MessageBlobStore from LocalisationCache was originally disabled is now moot (MBS isn't DB-based anymore), and since MessageBlobStore::clear() is just a touchCheckKey() call nowadays, I think we can safely revert rOMWC7f1a3bc742ed: Disable MessageBlobStore::clear() via hook. I will upload a patch to do that now.

Wow, nice catch!

Indeed, refreshMessageBlobs.php normally ran as part of the localisation update cron, which is disabled at the moment.

However, while we don't update localisation cache from cron anymore, we do still update it as part of Scap deployments (naturally, to update localisation caches). But, we don't run refreshMessageBlobs.php in Scap. We should either run it there, too, or (my recommendation) re-enable $blobStore->clear() and remove support for disabling it.

One thing to keep in mind there is timing. We need to make sure the purge starts after (or lasts until) all servers have been updated as they might otherwise repopulate from stale files. As such, it might still be worth re-using the maintenance script instead for that purpose as last step in Scap. Or, we can check whether there is a tombstone involved and whether it lasts long-enough, but I'm not sure we can rely on that (~ 11 seconds, last I checked, which seems risky).

Krinkle renamed this task from Special:RecentChanges on no.wp ignores localized MediaWiki:Rcfilters-show-new-changes message to Scap deployments are not purging MessageBlobStore (was: Stale localized messages).May 6 2019, 11:23 PM

Krinkle edited projects, added Regression; removed Edit-Review-Improvements-RC-Page.

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.May 6 2019, 11:27 PM

As I was talking this through with @Krinkle on IRC, he pointed out that in WMF production, we only run rebuildLocalisationCache.php once, on the deployment server, and then distribute the CDB files it generates to the application servers(*). rebuildLocalisationCache.php doesn't run on each app server. Together with $wgLocalisationCacheConf['manualRecache'] = true, this means that LocalisationCache::recache() only runs once, on the deployment server, many seconds (even minutes) before any of the app servers have the new CDB files. Reverting 7f1a3bc742 will cause MessageBlobStore::clear() to be called at this time, but that's not very useful. The MBS cache will be invalidated, but then it'll just be repopulated with stale data.

So instead, we need scap to run refreshMessageBlobs.php near the deployment process, after the new CDB files are in place. Since the MBS cache is shared in WANCache, it only needs to be run once, and running it on the deployment server is OK; it just needs to be run after all the app servers have the new CDB files in place.

(*) Technically what happens is that the generated CDB files get converted to JSON, rsynced to all the app servers, and then converted back to CDB locally on each app server, but for our purposes this is equivalent to just syncing the CDB files directly.

Jdforrester-WMF added projects: Scap, Release-Engineering-Team.May 7 2019, 12:00 AM

Change 508488 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

gerritbot added a project: Patch-For-Review.May 7 2019, 1:25 AM

Catrope claimed this task.May 7 2019, 1:27 AM

Catrope removed projects: Growth-Team, MediaWiki-Recent-changes.

Change 508476 had a related patch set uploaded (by Catrope; owner: Catrope):
[operations/mediawiki-config@master] Revert "Disable MessageBlobStore::clear() via hook"

https://gerrit.wikimedia.org/r/508476

Catrope mentioned this in rMSCA7b59f5790092: Clear MessageBlobStore after syncing i18n data.May 7 2019, 3:43 AM

Catrope mentioned this in rMSCAe437dd68cec0: Clear MessageBlobStore after syncing i18n data.

Catrope mentioned this in rMSCA02b384d24ebe: Clear MessageBlobStore after syncing i18n data.

Change 508488 merged by jenkins-bot:
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

In T222539#5169146, @gerritbot wrote:

Change 508488 merged by jenkins-bot:
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

Merged in scap master, will have to create a new version of the scap deb and upload it to get it to production. I can do that next week if no one beats me to it.

Thanks to everyone! :D

jeblad awarded a token.May 9 2019, 8:17 AM

In T222539#5169148, @thcipriani wrote:

In T222539#5169146, @gerritbot wrote:

Change 508488 merged by jenkins-bot:
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

Merged in scap master, will have to create a new version of the scap deb and upload it to get it to production. I can do that next week if no one beats me to it.

This version of scap is currently deployed to beta. I noticed that the build time trend ticked up by a bit there -- 30 seconds on top of a 2 minute (most of the time) job:

beta-scap-eqiad-time.png (432×514 px, 15 KB)

I then ran this command in prod (https://tools.wmflabs.org/sal/log/AWq8jYzXEHTBTPG-CVWR) -- I started my stop watch a little after I started the command, I stopped the clock right when the command ended. It took 11m 21s +- 10-15 seconds where I was fumbling with my watch. scap sync for a new wikiversion (worse-case as far as time is concerned) takes about 35 minutes (https://tools.wmflabs.org/sal/production?p=0&q=%22finished+scap%22&d=) so an extra 11 minutes would be a significant increase in the amount of time it takes to sync a new version.

Also, this is meant to run when you run scap pull which currently takes about 30 seconds -- 11 minutes would make SWAT quite a hassle.

Is it possible to refreshMessageBlobs in parallel for each wiki rather than in serial? Is there any other way to avoid the 11 minute penalty?

Yes, it can go in parallel.
Also, I can improve core so that we don't need to run it foreach-wiki in the first place. Literally all this script does is issue one single memcached command. The overhead is all in initialising the PHP process and switching between wikis. The actual command, while it only takes a millisecond to make and send, is currently per-wiki. But doesn't even need to be that. I'll patch that up for next week.

Krinkle claimed this task.May 16 2019, 1:05 AM

Krinkle triaged this task as Medium priority.

Krinkle edited projects, added Performance-Team; removed Performance-Team (Radar).May 20 2019, 9:31 PM

Krinkle moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.May 20 2019, 9:34 PM

Change 511611 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/511611

Krinkle moved this task from Doing (old) to Blocked (old) on the Performance-Team board.May 20 2019, 11:02 PM

Change 512046 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.34.0-wmf.6] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/512046

Change 511611 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/511611

Change 512048 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.34.0-wmf.5] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/512048

Change 512046 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.6] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/512046

Change 512048 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.5] resourceloader: Make MessageBlobStore global check key actually global

https://gerrit.wikimedia.org/r/512048

Mentioned in SAL (#wikimedia-operations) [2019-05-22T21:47:49Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: T222539 / rMW3cb01cc73ce9 (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2019-05-22T21:51:06Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: T222539 / rMW734b3d84f752 (duration: 00m 56s)

Change 512219 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[mediawiki/tools/scap@master] clear_message_blobs: call once per wikiversions

https://gerrit.wikimedia.org/r/512219

Krinkle reassigned this task from Krinkle to thcipriani.May 24 2019, 2:35 PM

Restricted Application edited projects, added Release-Engineering-Team (Kanban); removed Release-Engineering-Team. · View Herald TranscriptMay 24 2019, 2:35 PM

Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.May 24 2019, 2:35 PM

ReleaseTaggerBot added a project: MW-1.34-notes (1.34.0-wmf.7; 2019-05-28).May 25 2019, 2:01 AM

Thanks for cleaning this up, @Krinkle and @thcipriani. I heard about this problem during the hackathon, but forgot to work on it and then went on vacation. The way it was solved is exactly what I had in mind,

Change 513174 had a related patch set uploaded (by Catrope; owner: Catrope):
[operations/mediawiki-config@master] MWScript.php: Mark refreshMessageBlobs.php as a global script

https://gerrit.wikimedia.org/r/513174

Change 513174 merged by jenkins-bot:
[operations/mediawiki-config@master] MWScript.php: Mark refreshMessageBlobs.php as a global script

https://gerrit.wikimedia.org/r/513174

Change 512219 merged by jenkins-bot:
[mediawiki/tools/scap@master] clear_message_blobs: call once

https://gerrit.wikimedia.org/r/512219

Deployed on beta, the old method took ~25 seconds, new method takes ~5 seconds. I'll prep a new prod scap release.

jijiki closed subtask T224915: Deploy scap 3.10.0-1 as Resolved.Jun 25 2019, 1:45 PM

greg edited projects, added Release-Engineering-Team-TODO (201907); removed Release-Engineering-Team (Kanban).Jul 1 2019, 9:25 PM

greg moved this task from INBOX to Ready on the Release-Engineering-Team-TODO (201907) board.Jul 1 2019, 9:27 PM

greg edited projects, added Release-Engineering-Team-TODO; removed Release-Engineering-Team-TODO (201907).Jul 6 2019, 4:33 AM

greg moved this task from Should be empty (use Release-Engineering-Team) to Next on the Release-Engineering-Team-TODO board.Jul 6 2019, 5:39 AM

This should now be resolved in production as of Tue, Jun 25, 7:45 AM when the new scap version (3.10.0-1) was released.

thcipriani moved this task from Ready to Completed on the Release-Engineering-Team-TODO (201907) board.Jul 8 2019, 3:22 PM

thcipriani mentioned this in T228328: 'scap pull' stopped working on appservers ?.Jul 17 2019, 7:04 PM

Change 508476 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Disable MessageBlobStore::clear() via hook"

https://gerrit.wikimedia.org/r/508476

Mentioned in SAL (#wikimedia-operations) [2019-09-30T21:40:08Z] <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: T222539 Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)

• dduvall mentioned this in T260827: l10n process for MW containerization.Oct 1 2020, 7:21 PM

	F28932543: Screenshot from 2019-05-05 18-51-32.png
	May 5 2019, 4:53 PM

Scap deployments are not purging MessageBlobStore (was: Stale localized messages)Closed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Scap deployments are not purging MessageBlobStore (was: Stale localized messages)
Closed, ResolvedPublic
Actions

Related Objects
Search...