RFC: Reevaluate LocalisationUpdate extension for WMF
Closed, Declined
Public

Assigned To: None
Authored By: Reedy, Feb 16 2017, 11:17 PM
Referenced Files: None
Tokens:
  • "Heartbreak" token, awarded by MarcoAurelio.
  • "The World Burns" token, awarded by Liuxinyu970226.
  • "Heartbreak" token, awarded by Kghbln.
  • "Heartbreak" token, awarded by Nemo_bis.

Event Timeline

Krinkle renamed this task from "RFC: Disabling LocalisationUpdate on WMF wikis" to "RFC: Reevaluate LocalisationUpdate extension for WMF". Apr 26 2017, 9:04 PM
Krinkle updated the task description.

Per @demon's comment, we've re-added this to the TechCom-RFC board. While one of the proposed solutions isn't very cross-cutting in its implementation (the solution involving scap creating git commits that will be deployed normally), we acknowledge that it still impacts users and developers. We can therefore still help facilitate this proposal and work towards an approved solution that involves the different stakeholders.

  • Users and translators: No updates on the weekend.
  • Developer processes: Possible conflicts in wmf branches?
  • Operations: Where is this process going to run? How will the commits be submitted to Gerrit?
  • Release-Engineering (or Language-Engineering): How, when, and by whom will they be deployed?

See updated RFC at: https://www.mediawiki.org/wiki/Requests_for_comment/Reevaluate_LocalisationUpdate_extension

Another data point is this bug report: T163671: LocalisationUpdate not working since 2017-04-11

As you can see from the title, l10nupdate hadn't been working since 2017-04-11. That task was reported on 2017-04-24, almost 2 full weeks later, which, I believe, re-opens the "what's wrong with weekly (at least) updates (with updates during a SWAT on an as-needed basis)?" question. I won't say more on that topic now, though.

One change that should be uncontroversial is to have l10nupdate run only during the work week. It currently runs at 2am UTC. That means we can probably have it run Mon-Fri at that time without much issue. In the Pacific timezone that's Sunday night through Thursday night. Given the situation with l10nupdate today, I recommend we make this change ASAP. A change this small, which brings it in line with our standard operating procedures for all deploys, should not require the rest of this RFC process to complete.
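The weekday mapping above can be checked with a little date arithmetic. The sketch below is illustrative only: the crontab line assumes the job is scheduled with a standard five-field cron entry (how l10nupdate is actually scheduled is not stated in this thread), and the fixed UTC-8 and UTC-7 offsets stand in for Pacific standard and daylight time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a standard five-field crontab entry; 1-5 means Monday to Friday.
HYPOTHETICAL_CRON = "0 2 * * 1-5"

# Fixed offsets standing in for Pacific standard (UTC-8) and daylight (UTC-7) time.
PACIFIC_OFFSETS = (timezone(timedelta(hours=-8)), timezone(timedelta(hours=-7)))


def pacific_weekdays(monday_utc):
    """Map five consecutive 02:00 UTC runs (Mon-Fri) to their Pacific weekday names."""
    names = []
    for offset in PACIFIC_OFFSETS:
        week = []
        for day in range(5):
            run = monday_utc + timedelta(days=day)
            week.append(run.astimezone(offset).strftime("%A"))
        names.append(week)
    return names


# 2017-05-01 was a Monday; the run time is 02:00 UTC.
first_run = datetime(2017, 5, 1, 2, 0, tzinfo=timezone.utc)
for week in pacific_weekdays(first_run):
    print(week)
```

With either offset the five runs land on Sunday through Thursday evening, Pacific time.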

As you can see from the title, l10nupdate hadn't been working since 2017-04-11. That task was reported on 2017-04-24, almost 2 full weeks later, which, I believe, re-opens the "what's wrong with weekly (at least) updates (with updates during a SWAT on an as-needed basis)?" question. I won't say more on that topic now, though.

It may be that it wasn't reported sooner because translators are used to complications with the updating process, and it's hard to tell whether it's broken, there's yet another temporary outage, or the delay to the daily update is intentional because of a message definition update or some other special condition related to a certain message or repo. This time I looked into it a little and reported it, but usually I'm just patient. I do expect interface texts to be updated in a timely manner, though. In any case, thanks for fixing it.

I did notice that messages aren't updated, and assumed that it's related to the reduction of deployments related to the dc switch, and there weren't any particularly important updates that I wanted to get fixed urgently.

I did report particular LU failures numerous times in the past.

I did notice that messages aren't updated, and assumed that it's related to the reduction of deployments related to the dc switch, and there weren't any particularly important updates that I wanted to get fixed urgently.

But LU runs even when we're not doing deployments (unless we explicitly cut it off). If the default assumption of "no updates" is "we aren't deploying code right now" -- how does this differ from letting things go with the train?

I did report particular LU failures numerous times in the past.

Yes, they do get reported. There tends to be a bit of a lag between failure & them getting noticed and filed. Not blaming you though, just my general observation.

Since it is hard to detect when LU is or isn't working, we could add one dummy message key to MediaWiki core which contains the timestamp of last export in a given language. Then one could just look at the timestamp (and compare to what is in git if necessary).
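A rough sketch of how such a check might look follows. The message key name `l10n-last-export` and the two-day threshold are hypothetical, made up for illustration. The dates are the ones from T163671 mentioned earlier in this thread, with the 02:00 UTC time of day filled in as an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical message key, made up for this sketch.
TIMESTAMP_KEY = "l10n-last-export"


def export_age(messages, now):
    """Return how long ago the deployed messages were exported."""
    last_export = datetime.fromisoformat(messages[TIMESTAMP_KEY])
    return now - last_export


def is_stale(messages, now, max_age=timedelta(days=2)):
    """True when the deployed export is older than max_age (an arbitrary threshold)."""
    return export_age(messages, now) > max_age


# Deployed messages as they might have looked while updates were not landing.
deployed = {TIMESTAMP_KEY: "2017-04-11T02:00:00+00:00"}
checked_at = datetime.fromisoformat("2017-04-24T02:00:00+00:00")
print(export_age(deployed, checked_at).days, is_stale(deployed, checked_at))
```

Anyone who can view the rendered message could make the same comparison by eye; the code only spells out the arithmetic.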

It's not hard at all: it logs success/failure in the SAL after each run.

It's not hard at all: it logs success/failure in the SAL after each run.

It takes two weeks to check the logs for failures??? ;)

Exactly my point: the information is there for those who care to see it. Because we've had multiple occasions where it was multiple weeks for people to notice...

Exactly my point: the information is there for those who care to see it. Because we've had multiple occasions where it was multiple weeks for people to notice...

Frankly, I do not believe that it is the task of the Wikipedia community to notice and report. On high-frequency wikis they just add the translations manually, which has happened multiple times in the past, and on low-frequency wikis people are more concerned with other issues. How can they be supported to get their translations? I guess this is what this task is all about.

Exactly my point: the information is there for those who care to see it. Because we've had multiple occasions where it was multiple weeks for people to notice...

Frankly I do not believe that it is the task of the wikipedia community to notice and report.

I never said any such thing :)

As stated above, there is no clear owner of the l10nupdate code and process. For us (RelEng) to consider it something we would take ownership of we would require some basic changes, like update frequency.

How can they be supported to get their translations? I guess this is what this task is all about.

I respectfully disagree. This task is about the fact that no one owns l10nupdate at the Foundation and thus it is not taken care of. Nowhere in any of the proposals would any wiki community lose support for getting their translations.

Surely it falls within the purview of the language team to make sure things are running correctly? I'm not saying they should necessarily be the ones to fix it, but they could at least check on it and file tasks as appropriate.

Anyone is of course welcome to add improved logging, notifications, Ganglia or similar checks, etc., as deemed appropriate.

How can they be supported to get their translations? I guess this is what this task is all about.

And no one has said we're going to stop exporting from translatewiki on a daily basis, nor that the changes wouldn't ride the train every week like they currently do.

The task is about the automated deployment of these translations: how and when they happen.

Exactly my point: the information is there for those who care to see it. Because we've had multiple occasions where it was multiple weeks for people to notice...

Well, a failure logged in the SAL isn't the only reason why translations sometimes don't go through in a timely manner. As noted above, sometimes the problem is elsewhere, or for certain interface texts the delay could be intentional (I usually suspect the latter). Most translators probably don't know where to check any of that. And of course, if the interface text isn't highly visible, then its translation not coming through may go unnoticed by translators too, since they are not expected to come back each time they have translated something to see whether the update went smoothly.

Nonetheless, regardless of the reason why it sometimes takes time to report failures, I believe it's fair to expect that the interface looks neat at any given time, and that users see as few untranslated texts as possible, for as short a time as possible.

It's not hard at all: it logs success/failure in the SAL after each run.

The Language team has received reports multiple times about LU being broken. What I proposed could help investigate the root cause of the user reports by eliminating a lot of uncertainty. The presence or absence of the log entry in the SAL alone doesn't tell much. The full logs of the runs are not visible as far as I know, so my suggestion would also make more information available to non-sysadmins.

But I'll bring up the larger question of how to do localisation updates. We might choose to do things one way for now, with the assumption that we will improve it later. Or, if we are certain there are no plans to work on this area in the near future, then we should aim for a good-enough solution that avoids as many of the drawbacks listed in the discussion as possible.

I'm linking some ideas on work on this area:

What I am currently missing from the wiki page is a list of requirements/wishes for how things should work to be painless for deployments. My guess would be something like: reliable, fast, human-initiated. But should it go through git? Should it be like regular updates to i18n files, or can we just drop in an optimized blob that would be fast to deploy?

The Language team has received reports multiple times about LU being broken. What I proposed could help investigate the root cause of the user reports by eliminating a lot of uncertainty. The presence or absence of the log entry in the SAL alone doesn't tell much. The full logs of the runs are not visible as far as I know, so my suggestion would also make more information available to non-sysadmins.

AFAIK they only exist on disk at tin:/var/log/l10nupdatelog (or swap tin for whatever is the current deployment host in whichever datacenter). I don't think that they make it into logstash at all

greg triaged this task as Medium priority. Nov 27 2018, 7:12 PM
greg moved this task from INBOX to Epics (ARCHIVED) on the Release-Engineering-Team board.

Change 677325 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Disable LocalisationUpdate, part I

https://gerrit.wikimedia.org/r/677325

Change 677326 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Disable LocalisationUpdate, part II

https://gerrit.wikimedia.org/r/677326

Change 677327 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Disable LocalisationUpdate, part III

https://gerrit.wikimedia.org/r/677327

Change 677385 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [BETA CLUSTER] Disable LocalisationUpdate

https://gerrit.wikimedia.org/r/677385

Change 677385 merged by jenkins-bot:

[operations/mediawiki-config@master] [BETA CLUSTER] Disable LocalisationUpdate

https://gerrit.wikimedia.org/r/677385

Change 677325 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable LocalisationUpdate, part I

https://gerrit.wikimedia.org/r/677325

Mentioned in SAL (#wikimedia-operations) [2021-05-10T18:25:27Z] <jforrester@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:677325|Disable LocalisationUpdate, part I (T158360)]] (duration: 00m 58s)

Change 790355 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/tools/release@master] Stop branching LocalisationUpdate for Wikimedia production

https://gerrit.wikimedia.org/r/790355

Change 677326 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable LocalisationUpdate, part II

https://gerrit.wikimedia.org/r/677326

Mentioned in SAL (#wikimedia-operations) [2022-05-18T13:06:18Z] <jforrester@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:677326|Disable LocalisationUpdate, part II (T158360)]] (duration: 00m 52s)

Change 677327 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable LocalisationUpdate, part III

https://gerrit.wikimedia.org/r/677327

Mentioned in SAL (#wikimedia-operations) [2022-05-18T13:08:56Z] <jforrester@deploy1002> Synchronized wmf-config/extension-list: Config: [[gerrit:677327|Disable LocalisationUpdate, part III (T158360)]] (duration: 00m 53s)

Change 790355 merged by jenkins-bot:

[mediawiki/tools/release@master] Stop branching LocalisationUpdate for Wikimedia production

https://gerrit.wikimedia.org/r/790355

I'm going to say at this point that this is Declined.