
Completely port l10nupdate to scap
Closed, Declined · Public

Description

Although scap builds the localization cache, it is not used for the nightly l10nupdate process.

  • l10nupdate and scap duplicate a lot of code.
  • l10nupdate is fragile and hard to debug.

Eventually we need to clean up this mess.

Event Timeline

https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/files/l10nupdate-1

Will still need to shell out to extensions/WikimediaMaintenance/refreshMessageBlobs.php, extensions/LocalisationUpdate/update.php, and rebuildLocalisationCache.php.

But it'd still be a massive improvement
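A rough idea of what that shelling-out could look like from a scap-style Python command (a minimal sketch only, not scap's actual API; the run_maintenance helper, the mwscript invocation, and the per-wiki flow are assumptions to illustrate the shape of the port):

    import subprocess

    # Hypothetical helper: run one MediaWiki maintenance script for one wiki.
    # Assumes a WMF-style mwscript wrapper that accepts "--wiki <dbname>";
    # the real scap command may invoke these scripts differently.
    def run_maintenance(script, wiki):
        subprocess.check_call(['mwscript', script, '--wiki', wiki])

    def update_l10n_for_wiki(wiki):
        # Pull new translations from master into the checked-out branch.
        run_maintenance('extensions/LocalisationUpdate/update.php', wiki)
        # Rebuild the CDB localization cache from the updated message files.
        run_maintenance('rebuildLocalisationCache.php', wiki)
        # Refresh the message blob cache so the updated strings are served.
        run_maintenance('extensions/WikimediaMaintenance/refreshMessageBlobs.php', wiki)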

I'm not sure that there is any massive code/functionality duplication. /usr/local/bin/l10nupdate-1 calls the scap operations wikiversions-inuse and sync-l10n. Most of the rest of its work is updating git clones, which scap doesn't do, and running several different maintenance scripts.

I don't have any objection to converting the l10nupdate scripts to Python and making them proper scap commands, but I don't see that there is as much duplicate code or fragility as the task suggests. The larger problem with l10nupdate is that the job it performs is not well understood, and most of the real logic lives in the extensions/LocalisationUpdate/update.php maintenance script, which few have read.
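For reference, the overall flow described above could be approximated in Python roughly like this (a sketch under stated assumptions: it presumes scap wikiversions-inuse prints the active branch versions whitespace-separated, and the staging directory layout and bare git pull are placeholders for the real script's clone handling):

    import subprocess

    def active_versions():
        # Ask scap which MediaWiki branch versions are currently in use.
        # Assumes whitespace-separated output like "1.28.0-wmf.22 1.28.0-wmf.23".
        out = subprocess.check_output(['scap', 'wikiversions-inuse'])
        return out.decode('utf-8').split()

    def nightly_l10nupdate(staging='/srv/mediawiki-staging'):
        for version in active_versions():
            branch_dir = '%s/php-%s' % (staging, version)
            # Stand-in for the script's git clone/update handling.
            subprocess.check_call(['git', '-C', branch_dir, 'pull'])
            # ... run the maintenance scripts here, then push the rebuilt
            # localization cache out to the cluster.
            subprocess.check_call(['scap', 'sync-l10n', '--version', version])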

With the train running at a weekly cadence the benefit of l10nupdate runs is also arguable. There was a time when it took many weeks for new translations from master to reach the production wikis, but today the typical delta is 7 days which doesn't seem a horrible burden for the translators or the communities they serve.

> With the train running at a weekly cadence the benefit of l10nupdate runs is also arguable. There was a time when it took many weeks for new translations from master to reach the production wikis, but today the typical delta is 7 days which doesn't seem a horrible burden for the translators or the communities they serve.

I tend to agree and have said so ever since we got a regular (bi-)weekly schedule. And for situations in which a translation must go out (which is probably rare), we can always do it via SWAT.

I don't share these opinions about the lack of benefit in updating our UI localisation daily from master. A translation could contain a typo, a clumsy formulation, or something ambiguous. A lot of translation happens on TranslateWiki, where translators may not even know, or have any way of knowing, that there is a SWAT process for urgent changes.

So we currently have a working solution that allows the interface to be fixed the next day, without any need to add patches to SWAT.

Furthermore, a full scap is required for l10nupdate, so these patches wouldn't be a good fit.

And finally, some extensions aren't on the weekly cycle: CentralNotice, for example, uses a wmf_deploy branch and doesn't follow the train.

> I don't share these opinions. A translation could contain a typo or a clumsy formulation. We currently have a working solution that allows the interface to be fixed the next day, without any need to add patches to SWAT.

How is that any more urgent than other user-facing bugs that can wait for the train?

> Furthermore, a full scap is required for l10nupdate, so these patches wouldn't be a good fit.

Full scaps during SWAT can happen, and they do.

Full scap of a new branch is ~30 minutes. Updating localization without syncing a new branch is probably less than 15 minutes and could easily be done at the end of a SWAT.

> Full scap of a new branch is ~30 minutes. Updating localization without syncing a new branch is probably less than 15 minutes and could easily be done at the end of a SWAT.

scap l10n-update && scap sync-l10n --version 1.28.0-wmf.$N

  • scap l10n-update rebuilds the cdb cache files for all active branches, but it would be pretty easy to give it a --version switch
  • scap sync-l10n --version $VERSION is what the l10nupdate cron job does after it has built new cdb files, in a different way than scap's l10n-update does: the former merges in new message changes from master, while the latter just uses whatever is in the cloned branch (but doesn't undo changes made by the former script). Not clobbering the cdb files and recreating them from scratch actually makes cdb generation much slower, because it introduces a lot of stat system calls to figure out whether the messages in the cdb or the messages in the source files are newer (see the sketch after this list).
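To make that stat overhead concrete, here is a toy illustration of that kind of freshness check (purely illustrative; it is not MediaWiki's LocalisationCache code, and the function and its arguments are made up for the example):

    import os

    def needs_rebuild(cache_mtime, source_files):
        # One stat per source file just to decide whether the cached entry
        # is stale; across every message file in every extension and branch
        # this adds up, which is why the incremental path is slower than
        # rebuilding the cache from scratch.
        for path in source_files:
            if os.stat(path).st_mtime > cache_mtime:
                return True
        return False

Rebuilding from scratch skips the per-file comparison entirely, which is the point being made above.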