
Localization failure on edit tabs, article history, and contribution history for English Wikipedia
Open, Medium, Public

Description

Author: kwwilliams

Description:
At approximately 18:50 UTC on March 25, 2013, localization on English Wikipedia began to break down on several fields. All my examples are in Dutch, but I tested with the interface language set to Japanese and observed parallel behaviour.

  1. On the contribution history for an editor, "rollback" now displays as "rollback" rather than "terugdraaien". It still displays as "terugdraaien" on a watchlist.
  2. On the history for an article, "rollback" now displays as "rollback" rather than "terugdraaien". It still displays as "terugdraaien" on a watchlist.
  3. At roughly the same time, the edit tabs (for Monobook and Vector) shifted as well: "pagina" became "article", "bewerken" became "edit this page", and "+" became "add new section".

I've checked, and at this time, all of this still works on Commons.

Some fields remain localized: the dates on the contribution history remain in Dutch. The tooltips shown when hovering over the English fields remain in Dutch. All "history" tabs remain as "history".


Version: unspecified
Severity: normal

Details

Reference
bz46579

Event Timeline

bzimport raised the priority of this task from to High (Nov 22 2014, 1:20 AM).
bzimport set Reference to bz46579.

kwwilliams wrote:

Last sentence should read

All "history" tabs remain as "geschiedenis".

kwwilliams wrote:

After more testing, the issue seems to be that the order of interface selection is wrong. By creating http://en.wikipedia.org/wiki/MediaWiki:Rollbacklinkcount/nl I was able to restore the Dutch text "terugdraaien" for rollback. It seems the problem is the existence of the customized English text: some change yesterday made the customized English text take precedence over the default Dutch text, which is inappropriate when the language preference is set to "nl". Fortunately, it seems that a custom Dutch message is still a higher priority than a custom English message, so I can selectively override these things if I want.

(In reply to comment #2)

Fortunately, it seems that a custom Dutch message is still a higher priority than a custom English message, so I can selectively override these things if I want.

I would very much recommend against doing that, because the override would stay in effect basically forever, until the local page is removed.

This is a major issue for users of wikis with many local customisations that use a UI in a different language than the content language.

This is related to gerrit 44224, gerrit 52434 and gerrit 55816 that aim to resolve bug 1495. It may be best to revert all of these as soon as possible and deploy the reverts on Wikimedia.

I've added those involved in the aforementioned patch sets to this bug's CC list.

kwwilliams wrote:

Let me know when this is fixed, and I will remove my local overrides. That hodgepodge of Dutch and English controls was extremely irritating.

(In reply to comment #4)

Let me know when this is fixed, and I will remove my local overrides. That hodgepodge of Dutch and English controls was extremely irritating.

You'll basically have to override all Dutch messages that are overridden in English. See Special:AllMessages?uselang=en for a list that will make you cry.

I did a quick check on Special:Preferences, and saw plenty in English that I'd expected to see in Dutch, so you can expect this to happen everywhere.

We're expecting to make a decision and a fix if deemed necessary before the weekend.

(In reply to comment #5)

We're expecting to make a decision and a fix if deemed necessary before the weekend.

So you are breaking the interface for many people, and it will be "before the weekend".

It's clearly broken and should be reverted ASAP.

mwalker wrote:

When I originally wrote the base patch I had a couple of different ideas for how this would work. This particular option was chosen, in part, due to the belief that if the local community had overridden the default message then they had done so for a reason -- and that reason should be communicated to all users regardless of locale. This does mean that more messages will have to be updated, which is extremely annoying right now, but it doesn't mean it's not the correct choice.

One case where I find the current behavior especially compelling is when someone has modified a message to add warning text, usage guidelines, or something similar.

The other option I thought would be correct, if locale-specific messaging is considered more important than community-inspired messaging, is to alternate between on-wiki and localization cache messages, i.e.: look in /pt-br on wiki, then in CDB; then /pt on wiki, then in CDB; then /en on wiki, then in CDB. This would require a bit of metadata in the CDB saying which locale each message came from.
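The alternating scheme only works if the CDB remembers where each flattened entry came from. A minimal Python sketch, assuming a hypothetical model in which on-wiki messages are keyed by (key, language) and the flattened CDB stores a (text, source_locale) pair per key; none of these names are MediaWiki's actual API, and the placeholder texts are made up.

```python
from typing import Optional

# Hypothetical data model (not MediaWiki's real classes):
#   wiki:     on-wiki messages keyed by (message key, language code)
#   cdb_flat: the requested language's flattened CDB, one entry per key,
#             stored as (text, source_locale) -- the extra metadata
def interleaved_lookup(key: str, chain: list[str],
                       wiki: dict[tuple[str, str], str],
                       cdb_flat: dict[str, tuple[str, str]]) -> Optional[str]:
    """For each language in turn, try the on-wiki message, then the CDB."""
    cached = cdb_flat.get(key)
    for lang in chain:
        if (key, lang) in wiki:
            return wiki[(key, lang)]
        # The cached text is accepted only at the step for the locale it came
        # from; without the source_locale metadata this check is impossible.
        if cached is not None and cached[1] == lang:
            return cached[0]
    return None
```

With chain pt-br, pt, en and only an English on-wiki customisation, a CDB entry that originated in pt wins; a CDB entry that is really the English default loses to the on-wiki English text.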

mwalker wrote:

(In reply to comment #6)

So you are breaking the interface for many people, and it will be "before the weekend".

Siebrand and most of the rest of the i18n team are not based in SF -- this decision was punted so that the correct response could be taken when the key team members are awake.

It's clearly broken and should be reverted ASAP.

That depends on your definition of broken. Many people thought the old behaviour was broken as well. The fact that this is creating discussion is a good thing -- I followed WP:BOLD and now we require some consensus on what is the best path forward.

kwwilliams wrote:

I can understand your perspective, Matt, but the problem I'm finding is that the majority of the overrides on English Wikipedia are trivial. Why would changes like the ones seen at http://en.wikipedia.org/w/index.php?title=Special%3AAllMessages&prefix=rollback&filter=modified&lang=en&limit=500 warrant overriding a different language's interface? Some admin getting annoyed with the placement of a colon in English shouldn't change the interface in other languages.

(In reply to comment #8)

That depends on your definition of broken. Many people thought the old behaviour was broken as well.

Breaking a long-term norm (even if that norm is itself "broken") and the user interface for users is still breakage, and the change should be reverted until proper consensus can be achieved on the correct path forward.

mwalker wrote:

(In reply to comment #9)

The problem I'm finding is that the majority of the overrides on English Wikipedia are trivial.

That's somewhat surprising to me, and then I'm surprised that I'm surprised :p However, the example you bring to the table illustrates my point quite well: there's a template embedded in the 'rollback-success' message. The current method exposes that; the old method would not. Is this important? If not, is it trivial in all cases?

(In reply to comment #10)
It's a decent argument. How many users are affected by this, though? Clearly there are at least two. How many users are getting an improved experience from this? I don't know. How many readers are getting an improved experience from this? Potentially quite a few, especially if we go ahead and start respecting a reader's browser-requested locale by default and/or start using ULS more heavily. I personally think the benefits outweigh the drawbacks.

(In general)
I don't actually have deploy authority, so I could not revert the change even if I wanted to; the language team will make up its mind when it wakes up in a couple of hours and then effect what they think is the most appropriate change.

In the meantime; I will leave you with the three options I gave the language team when I wrote the patch (there could be more) -- Niklas, possibly without fully understanding the ramifications, recommended option C. The other two are just as technically feasible. I await your thoughts on what would be the most correct way forward.

  • Option A - Use the message cache fallback chain at the requested language

      lookup on wiki for message
      if no message:
          lookup via cdb cache
      if no message:
          for each language:
              lookup on wiki for the message

  • Option B - Use message cache fallback chain only at the wiki source language

      for each language:
          lookup on wiki for the message
          if no message and language is wgLanguageCode:
              lookup via cdb cache w/ original requested language
          if message:
              break

  • Option C - Always prefer on wiki messages

      for each language:
          get on wiki message
      if no message:
          get message from cdb

My comments at the time:
"It seems to me that option B might be more correct because we will get every possible on-wiki customization (up to the native language of the wiki), then every possible cached version (up to English), then any on-wiki customization (from the native language to English).

Option A seems like it would give us English far too soon, i.e. it would only give us on-wiki replacements for fallbacks if the language cache had no entry for the message at all."
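The three options listed above can be transcribed into Python to compare what each returns. This is a sketch under stated assumptions, not the patch's code: `wiki` is a hypothetical dictionary of on-wiki messages keyed by (key, language), `cdb` is a hypothetical callable returning the flattened CDB text for the requested language, and `chain` is the requested language followed by its fallbacks, ending in the content language (en here).

```python
from typing import Callable, Optional

# Hypothetical types for this sketch:
Wiki = dict[tuple[str, str], str]          # (key, lang) -> on-wiki text
Cdb = Callable[[str, str], Optional[str]]  # (key, requested lang) -> flattened CDB text

def option_a(key: str, requested: str, chain: list[str], wiki: Wiki, cdb: Cdb) -> Optional[str]:
    # On-wiki page for the requested language, then the CDB, then on-wiki fallbacks.
    msg = wiki.get((key, requested))
    if msg is None:
        msg = cdb(key, requested)
    if msg is None:
        for lang in chain:
            msg = wiki.get((key, lang))
            if msg is not None:
                break
    return msg

def option_b(key: str, requested: str, chain: list[str], content_lang: str,
             wiki: Wiki, cdb: Cdb) -> Optional[str]:
    # On-wiki pages along the chain; the CDB is consulted only when the walk
    # reaches the content language (wgLanguageCode), for the requested language.
    for lang in chain:
        msg = wiki.get((key, lang))
        if msg is None and lang == content_lang:
            msg = cdb(key, requested)
        if msg is not None:
            return msg
    return None

def option_c(key: str, requested: str, chain: list[str], wiki: Wiki, cdb: Cdb) -> Optional[str]:
    # Any on-wiki page along the chain beats any CDB text.
    for lang in chain:
        msg = wiki.get((key, lang))
        if msg is not None:
            return msg
    return cdb(key, requested)
```

With the setup from the bug report (interface language nl, a local English customisation of the rollback message), option A returns the Dutch CDB text, while options B and C, as transcribed here, return the English customisation.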

I feel like if I set "Russian" as my interface language (for example), the interface should be in Russian. I don't think there's an issue with a customized English message and a stock Russian message. Surely having the interface in Russian is going to be closer to what's expected and desired than having the interface in English.

Since both behaviors are considered broken, it looks like the only proper fix is to separate the two use cases: customisations that should override standard translations must be kept separate from those that shouldn't.

I can think of two ways to do this with the currently unused /en page:
Assuming the content language is 'en' and the interface language is 'fit':

Option 1:
UI: A/fit A/fi A/en CDB A
Content: A/en A CDB

In this case customisations that should override standard translations should be moved to /en. Tweaks would stay.

This would mostly restore original behavior.

Option 2:
UI: A/fit A/fi A CDB
Content: A/en A CDB

In this case customisations that should override standard translations should be kept. Tweaks should be moved to /en.

The breakage would stay until messages are migrated.

Imho both seem a bit illogical and hard to understand depending on how you look at it. If thinking of fallbacks, having tweaks at A makes sense. But that also makes the order of A and CDB inconsistent depending on the language. Thoughts?
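The two UI orders differ only in where the base page (A) and the /en subpage sit relative to the CDB. A small Python sketch that walks either order; the step strings mirror the notation above, while `resolve`, the sample key, and the placeholder texts are hypothetical, and the CDB text stands for the flattened translation for the interface language.

```python
from typing import Optional

OPTION_1_UI = ["A/fit", "A/fi", "A/en", "CDB", "A"]
OPTION_2_UI = ["A/fit", "A/fi", "A", "CDB"]

def resolve(order: list[str], key: str, wiki: dict[str, str],
            cdb_text: Optional[str]) -> Optional[str]:
    """Return the first hit. "A" is the base page MediaWiki:<key>, "A/xx" a
    subpage, and "CDB" the cached translation for the interface language."""
    for step in order:
        if step == "CDB":
            if cdb_text is not None:
                return cdb_text
        else:
            title = key + step[1:]   # "A/fi" -> "<key>/fi", "A" -> "<key>"
            if title in wiki:
                return wiki[title]
    return None
```

Under option 1 a tweak on the base page no longer hides the CDB translation, and an override has to move to /en; under option 2 the base page still wins, so tweaks would have to move to /en instead.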

(In reply to comment #3 by Siebrand)

This is a major issue for users of wikis with many local customisations that use a UI in a different language than the content language.

This is related to Gerrit change #44224, Gerrit change #52434 and Gerrit change #55816 that aim to resolve bug 1495. It may be best to revert all of these as soon as possible and deploy the reverts on Wikimedia.

Tentatively setting as blocking bug 38865 as per Siebrand's comment.

kwwilliams wrote:

As a pragmatic compromise, would it be reasonable to test if the difference between the stock message and the local customization is solely punctuation, capitalization, and white space? If so, overriding the selected language is completely unreasonable, because the change is not impacting the meaning. If it's a larger change, it's at least possible that the argument in favor of overriding the selected interface language has merit.

I still don't think the selected interface language should ever be overridden, but this would at least get rid of most of the problems this change has caused.
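The compromise above can be made concrete with a normalization check. A sketch, assuming "solely punctuation, capitalization, and white space" means the two strings match after case-folding, deleting ASCII punctuation (Python's `string.punctuation`), and removing all white space; it would not catch non-ASCII punctuation, and the function names are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(text: str) -> str:
    """Case-fold, drop ASCII punctuation, and remove all white space."""
    return "".join(text.casefold().translate(_STRIP_PUNCT).split())

def is_trivial_tweak(default: str, custom: str) -> bool:
    """True if the local override differs from the stock message only in
    punctuation, capitalization, and white space."""
    return normalize(default) == normalize(custom)
```

For example, "Rollback" versus "roll back" or "Edit this page:" versus "edit this page" would count as trivial, while an override that adds words or a link would not.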

A few days ago, I intended to report this exact bug when I spotted the behavior change on Commons (where the [Read] tab has changed from Czech to English just because somebody decided it would suit Commons better to have [View] instead of [Read]). Then, I noticed this was _exactly_ the change requested by bug 1495, which was now “fixed”. In my commit, I just fixed the change so that it at least works as designed. (There are further, lesser technical problems with the change, but I won’t go into them here.)

I have started to write an opinion on the issue itself, but it is getting too long and complicated, so I believe the best course of action would be to revert the change(s) for now, because the current state of affairs is worse than it was. Then, initiate a discussion/RFC to find the best way of implementing all the conflicting requirements and use cases.

(In reply to comment #13)

UI: A/fit A/fi A/en CDB A
Content: A/en A CDB.

In this case customisations that should override standard translations should be moved to /en. Tweaks would stay.

This would mostly restore original behavior.

Thanks for the options, Niklas. I discussed with Matt. We both agree your option (1) is the preferred option, as it has the smallest behavior change, and still provides a lot of options.

Matt will work on a patch, and submit it in the next 8 hours or so, meaning that it's going to be available for us in the European morning. I've communicated with Greg to get us a deployment window in which we can deploy the fix, around Thursday 10:00 UTC. See email for details.

So if the fix Matt writes works out, we'll be deploying it soon.

mwalker wrote:

OK: My first stab at a solution to the problem https://gerrit.wikimedia.org/r/#/c/56345/1

As said, we tried to get gerrit 56345 ready for deployment. We didn't make it. Niklas has reviewed Matt's patch and made some fixes. It's not ready for deployment yet.

At this point, I propose we revert the original change, and also remove it from the Wikimedia deployment. I've prepared a patch for that as gerrit 56380. This reverts the original patch set by Matt and a follow-up by Mormegil, as well as a very small part of a patch by Tyler updating some PHPDoc.

This should allow us time to prepare an improved change set.

Potentially related: bug 46612.

Gerrit 56380 was merged. I don't know if and how this can be deployed to the active Wikimedia code.

mwalker wrote:

Katie will be deploying 56380 shortly.

mwalker wrote:

Reverted for the moment.

mwalker wrote:

(In reply to comment #4)

Let me know when this is fixed, and I will remove my local overrides. That hodgepodge of Dutch and English controls was extremely irritating.

This patch has been reverted and the fix that we're debating over will not require you to have these overrides. You can go ahead and remove them. Thanks for your patience.

  • Bug 46657 has been marked as a duplicate of this bug.

(In reply to comment #23)

Reverted for the moment.

Is this bug resolved? I see bug 1495 has been re-opened.

(In reply to comment #26)

Is this bug resolved? I see bug 1495 has been re-opened.

Siebrand: Do you know by any chance?

This bug should be resolved, as previous behavior has been reinstated.

It looks like this problem is back, at least at nl.wikipedia.org: since yesterday some items in the navigation box and interaction box are not translated ("Find an article", "Featured content", "Tutorial").

Siebrand/Matt/Niklas: Any thoughts on this recurrence?

Change 72867 had a related patch set uploaded by Parent5446:
Complete usage of message fallback chain

https://gerrit.wikimedia.org/r/72867

(In reply to comment #32)

Complete usage of message fallback chain
https://gerrit.wikimedia.org/r/72867

Patch still awaiting review...

There is a special case that is broken by this bug. It would be fixed by the solution proposed in comment 17, but it can also be fixed with a smaller change, and it would be nice if it got fixed soon (which doesn't seem likely for the general case).

There are certain messages whose default English text is exactly '-' and they are never translated in the i18n files (they are only present in the English version). MessageCache::get() used to have special handling for these: if there was no translation for the current language, it preferred the English DB translation to the CDB. Current behavior is that CDB takes precedence if there is no DB translation in the current language.

These messages are often used as configuration, e.g. MediaWiki:Licenses lists the licenses that are permitted for uploading. As such, they are mostly language-independent; when the CDB's '-' overrides the English on-wiki version, things break.

Until there is a proper fix for the lookup order, what do you think about restoring the special behavior? That is, if the CDB lookup was successful and the text is exactly '-', treat it as unsuccessful and continue to walk the fallback chain (the original behavior was to use directly the English DB message, but that seems inferior).
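The proposed special case can be sketched as a walk over lookup candidates in whatever order MessageCache uses. The (source, text) candidate list is a hypothetical simplification, not MediaWiki's API, and returning '-' when nothing else is found is an assumption the comment does not specify.

```python
from typing import Optional

DISABLED = "-"  # default text of the English-only "configuration" messages

def first_usable(candidates: list[tuple[str, Optional[str]]]) -> Optional[str]:
    """Walk (source, text) candidates in lookup order; source is "DB" or "CDB".

    A CDB hit whose text is exactly '-' is treated as unsuccessful, so the
    walk continues to later candidates (e.g. the English on-wiki message)."""
    skipped_dash = False
    for source, text in candidates:
        if text is None:
            continue
        if source == "CDB" and text == DISABLED:
            skipped_dash = True
            continue
        return text
    # Assumption: if nothing else turns up, the '-' default still applies.
    return DISABLED if skipped_dash else None
```

An on-wiki '-' is still honoured; only the CDB '-' is skipped.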

So unfortunately I'm having a bit of a hard time following exactly what kind of fallback sequence is wanted here. "A/fit A/fi A/en CDB A" does not mean too much because CDB can be the CDB for any language.

Assuming the user language is fit and the content language is en, the current sequence in the master branch code right now is:

DB/fit -> CDB/fit -> CDB/fi -> DB/fi -> DB/ -> CDB/en

And correct me if I'm wrong, but the "option 1" you're discussing is to change the sequence to:

DB/fit -> DB/fi -> DB/en -> CDB/fit -> CDB/fi -> DB/ -> CDB/en

This sequence makes no sense, as it seems to cause the exact problem this bug was reported for: when /en message overrides take precedence over the CDB for the native language.

Can somebody please clarify exactly what sequence is desired and why? The patch I have in gerrit right now makes the sequence this:

DB/fit -> CDB/fit -> DB/fi -> CDB/fi -> DB/ -> CDB/en

This sequence makes actual sense. If there is some functionality this sequence does not fulfill, somebody please explain it to me and I will suggest a saner solution that fixes the problem.

(In reply to comment #35)

Assuming the user language is fit and the content language is en, the current sequence in the master branch code right now is:

DB/fit -> CDB/fit -> CDB/fi -> DB/fi -> DB/ -> CDB/en

I thought the current sequence is

DB/fit -> CDB/fit -> CDB/fi -> CDB/en -> DB/fi -> DB/

since CDB caches the full fallback chain lookup? That also seems consistent with the behavior mentioned in bug 55473, where the English-only language file value overrides the root DB value.

(In reply to comment #36)

I thought the current sequence is

DB/fit -> CDB/fit -> CDB/fi -> CDB/en -> DB/fi -> DB/

since CDB caches the full fallback chain lookup? That also seems consistent with the behavior mentioned in bug 55473, where the English-only language file value overrides the root DB value.

It does cache the full fallback chain, but en is not part of the fallback chain. fit falls back on fi, and fi has no fallback. What happens is that if the message cannot be found anywhere in the requested language, it proceeds to check the site language.

Here is the exact process:

  1. Check the database for the requested language
  2. Check the CDB cache for the requested language (which includes fallbacks)
  3. Check the database for each of the requested language fallbacks
  4. Check the database for the site language
  5. Check the CDB cache for the site language (which includes fallbacks)
  6. Check the database for the site language fallbacks

And here is the process as proposed by the patch:

For each language in a list of the requested language and its fallbacks followed by the site language and its fallbacks:

  1. Check the database for that language
  2. Check the CDB cache for that language

Note: the actual implementation is a bit different since the CDB cache is only checked once, but it has that effect.
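To check the two descriptions against the sequences quoted earlier, both can be generated from the language chains. A sketch, assuming each chain lists a language followed by its fallbacks, that a flattened CDB lookup expands to one label per language in the chain, and that the site language's database check is the base page (written "DB/"); the function names are hypothetical.

```python
def current_order(user_chain: list[str], site_chain: list[str]) -> list[str]:
    """The six-step process described above, as labels."""
    order = [f"DB/{user_chain[0]}"]                      # 1. DB, requested language
    order += [f"CDB/{lang}" for lang in user_chain]      # 2. CDB, flattened
    order += [f"DB/{lang}" for lang in user_chain[1:]]   # 3. DB, requested fallbacks
    order += ["DB/"]                                     # 4. DB, site language (base page)
    order += [f"CDB/{lang}" for lang in site_chain]      # 5. CDB, flattened
    order += [f"DB/{lang}" for lang in site_chain[1:]]   # 6. DB, site fallbacks
    return order

def proposed_order(user_chain: list[str], site_chain: list[str]) -> list[str]:
    """One DB check then one CDB check per language, requested chain first."""
    order = []
    for lang in dict.fromkeys(user_chain + site_chain):  # de-duplicate, keep order
        db = "DB/" if lang == site_chain[0] else f"DB/{lang}"
        order += [db, f"CDB/{lang}"]
    return order
```

For user chain fit, fi and site chain en, `current_order` yields DB/fit -> CDB/fit -> CDB/fi -> DB/fi -> DB/ -> CDB/en and `proposed_order` yields DB/fit -> CDB/fit -> DB/fi -> CDB/fi -> DB/ -> CDB/en, matching the two sequences given earlier.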

English is included in the CDB files.

You are proposing the following kind of logic:
for x in language_fallback_chain:
    use DB/x if it exists
    use CDB/x if it exists

The problem with this is that it removes the ability to override site messages.

I can't remember now whether we chose "DB/x" (x is the content language) or "DB" (the MediaWiki page without a language code) to be the place for site overrides. Whichever it is, let's call it OVERRIDE. We must ensure that if OVERRIDE is present, CDB translations are ignored. We have tried to achieve this through the order in which we check messages. Another option would be a simple if clause: if OVERRIDE exists, do not use CDB.

I would be okay with this kind of logic:
for x in language-fallback-chain:
    use DB/x if it exists
    use CDB/x if it exists unless OVERRIDE exists

This would be a complementary solution, usable by wiki admins, to my solution for server admins: https://gerrit.wikimedia.org/r/#/c/98078/ . Both achieve the same thing: not using incompatible translations if the message content semantics have changed.

If you are still with me: one of the reasons for the current state of affairs is that CDB flattens the language fallback chain, so we do not know which language we actually get from it. It looks like this is going to change with https://gerrit.wikimedia.org/r/#/c/72866/ . After that it will be possible to implement the logic above.
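The second loop above could look like this in Python. It assumes a per-language (unflattened) CDB, which the comment notes only becomes possible after change 72866, and takes OVERRIDE to be the content-language subpage, which must then appear in the chain for the override itself to be returned; all names and texts are hypothetical.

```python
from typing import Optional

def lookup_with_override(key: str, chain: list[str], db: dict[str, str],
                         cdb: dict[tuple[str, str], str],
                         override_title: str) -> Optional[str]:
    """db: on-wiki page title -> text; cdb: (key, lang) -> that language's own text."""
    has_override = override_title in db
    for lang in chain:
        title = f"{key}/{lang}"
        if title in db:                  # use DB/x if it exists
            return db[title]
        # use CDB/x if it exists, unless OVERRIDE exists
        if not has_override and (key, lang) in cdb:
            return cdb[(key, lang)]
    return None
```

A Dutch user then sees the Dutch CDB text unless the site has an override, in which case any on-wiki /nl translation of the override still takes precedence.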

(In reply to comment #38)

English is included in the CDB files.

Ah yes you are right. I forgot about that piece of code in LocalisationCache.

You are proposing the following kind of logic:
for x in language_fallback_chain:
    use DB/x if it exists
    use CDB/x if it exists

The problem with this is that it removes the ability to override site messages.

What do you mean "override site messages"? As in you want a message that, if set, overrides all languages and takes ultimate precedence?

I can't remember now whether we chose "DB/x" (x is the content language) or "DB" (the MediaWiki page without a language code) to be the place for site overrides. Whichever it is, let's call it OVERRIDE. We must ensure that if OVERRIDE is present, CDB translations are ignored. We have tried to achieve this through the order in which we check messages. Another option would be a simple if clause: if OVERRIDE exists, do not use CDB.

Right now, the page DB/x, where x is the content language, does absolutely nothing. MessageCache will never check the language-specific subpage for the content language.

I would be okay with this kind of logic:
for x in language-fallback-chain:
    use DB/x if it exists
    use CDB/x if it exists unless OVERRIDE exists

I'm still a little bit unclear about exactly what you want the OVERRIDE message to do. You're saying if it exists the CDB is ignored, but does this message ever get chosen as the actual message?

Correct me if I'm wrong, but here's what I think you're proposing (for full demonstration purposes, let's assume that the user language is fit with fallback fi and the site language is de_at with fallback de):

If OVERRIDE exists:
DB/fit -> DB/fi -> DB/en -> OVERRIDE
If OVERRIDE does not exist:
DB/fit -> CDB/fit -> DB/fi -> CDB/fi -> DB/en -> CDB/en -> DB/de_at -> CDB/de_at -> DB/de -> CDB/de -> DB/en -> CDB/en

(In reply to comment #39)

What do you mean "override site messages"? As in you want a message that, if set, overrides all languages and takes ultimate precedence?

Yes. In comment 13 it was called "customisations that should override standard translations" as opposed to "tweaks". Simple example of override: [[m:MediaWiki:Histlegend]] stores links to external tools and used to do so in all languages (overriding default translations). Example of tweak: someone in en.wiki wants to replace "rollback" with "roll back" in English, no reason for this to hide all translations.

(In reply to comment #40)

Yes. In comment 13 it was called "customisations that should override standard translations" as opposed to "tweaks". Simple example of override: [[m:MediaWiki:Histlegend]] stores links to external tools and used to do so in all languages (overriding default translations). Example of tweak: someone in en.wiki wants to replace "rollback" with "roll back" in English, no reason for this to hide all translations.

OK, so the functionality you are looking for is you want to change the way the base page is treated.

The language subpages are just tweaks and have priority right before the CDB of that language, and not higher. And then the base page is treated as a complete override, taking precedence over all CDB messages regardless of language code.

My last question is this: which do you want to have a higher priority, overrides or language tweaks? In other words, if requesting a message in fit, and DB/ and DB/fit both exist, which takes precedence? Another way to word this question is should the base page take precedence over *all* translations, or only over CDB translations?

(In reply to comment #41)

OK, so the functionality you are looking for is you want to change the way the base page is treated.

No. It doesn't need to be the base page. The solution agreed upon is that the override would be in the content language subpage, MediaWiki:<key>/en on a wiki with en as wiki language.

The language subpages are just tweaks and have priority right before the CDB of that language, and not higher. And then the base page is treated as a complete override, taking precedence over all CDB messages regardless of language code.

Yes, this would be option 2 in comment 13 if I understand correctly what you're saying.

My last question is this: which do you want to have a higher priority,
overrides or language tweaks? In other words, if requesting a message in fit,
and DB/ and DB/fit both exist, which takes precedence? Another way to word
this
question is should the base page take precedence over *all* translations, or
only over CDB translations?

Assuming our "DB/" is "A/en" of comment 13 and "DB/fit" is "A/fit", in the example of comment 13 and assuming they both exist, DB/fit takes precedence: in option 1 DB/ comes later and in option 2 it's not even checked.

This makes sense because otherwise customisations would be untranslatable, while (in my example) one may well want to translate the MediaWiki:Histlegend customisation too, on a multilingual wiki. They may get out of sync on the wiki but it's their responsibility to fix. Both options also satisfy Liangent's requirement in bug 1495 comment 34 (after the /zh-* subpages are deleted).

OK, I think I finally understand. Basically you want to assign a special value to the subpage that corresponds with the site language, so that if something is on that page, it overrides all translations unless there is a more specific language override. In other words, the exact sequence becomes:

wgLang == fit (with fallbacks fi and en)
wgContLang == de_at (with fallbacks de and en)

  1. DB/fit
  2. CDB/fit (only if DB/de_at does not exist)
  3. DB/fi
  4. CDB/fi (only if DB/de_at does not exist)
  5. DB/en
  6. CDB/en (only if DB/de_at does not exist)
  7. DB/de_at
  8. DB/
  9. CDB/de_at
  10. DB/de
  11. CDB/de
  12. DB/en
  13. CDB/en

I constructed the above sequence of fallbacks based on the following assumptions:

  1. Each language takes precedence over its fallbacks
  2. The requested language takes precedence over the site language
  3. A non-content-language subpage override takes precedence over its CDB version
  4. A base page override takes precedence over the content language CDB
  5. A content-language subpage override takes precedence over all CDB versions

If any of these assumptions are incorrect, let me know.
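The 13-step sequence and its "only if DB/de_at does not exist" conditions can be generated and walked mechanically. A Python sketch built from the five assumptions listed above; the `(source, lang, conditional)` step tuples, the sample key, and the texts are hypothetical, and lang "" stands for the base page DB/.

```python
from typing import Optional

Step = tuple[str, str, bool]  # (source, lang, skipped_if_site_subpage_exists)

def build_steps(user_chain: list[str], site_chain: list[str]) -> list[Step]:
    site = site_chain[0]
    steps: list[Step] = []
    for lang in user_chain:                          # steps 1-6
        steps += [("DB", lang, False), ("CDB", lang, True)]
    steps += [("DB", site, False), ("DB", "", False), ("CDB", site, False)]  # 7-9
    for lang in site_chain[1:]:                      # steps 10-13
        steps += [("DB", lang, False), ("CDB", lang, False)]
    return steps

def label(step: Step) -> str:
    source, lang, _ = step
    return f"{source}/{lang}"

def resolve(key: str, user_chain: list[str], site_chain: list[str],
            db: dict[str, str], cdb: dict[tuple[str, str], str]) -> Optional[str]:
    site_override = f"{key}/{site_chain[0]}" in db
    for source, lang, conditional in build_steps(user_chain, site_chain):
        if conditional and site_override:
            continue                                 # CDB hidden by DB/<site lang>
        if source == "DB":
            title = f"{key}/{lang}" if lang else key
            if title in db:
                return db[title]
        elif (key, lang) in cdb:
            return cdb[(key, lang)]
    return None
```

Note that under these rules a base-page tweak no longer hides a CDB translation, while a content-language subpage does.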

Just to be clear, if the above description is correct, I disagree with it. It doesn't make any sense conceptually that DB/de_at (the content language subpage) would be the global override page. That should be what the base page is for.

I agree with the proposal and the amendment. It will just mean that lots of minor wording fixes on our wikis will need to be moved to DB/en (or whatever is the content language) to avoid breaking the fallback chain.

The patch has been updated to reflect the amended workflow.

I reviewed the patch against what I understood to be the agreed-upon sequence and found a few issues, which I noted.

As for the sequence itself, consider for example:

wgLang = fit (fallbacks fi, en)
wgContLang = fi (fallback en)

10: DB/fit
20: CDB/fit (if DB missing)
30: DB/fi
40: CDB/fi (if DB missing)
50: DB/en
60: CDB/en (if DB missing)
70: DB
80: false (site language and all fallbacks already checked)

It doesn't seem right that, if DB were created and then DB/en were created as a translation of DB, DB/en would override DB. It shouldn't be necessary to duplicate DB at DB/fi.

So perhaps it would make sense to add a "25: DB": if DB/fi is in the sequence, then DB would be checked immediately beforehand.
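The difference between the sequence above and the suggested "25: DB" step can be sketched in Python. This assumes "(if DB missing)" refers to the base page, as in the amended workflow, and that steps 70 and 80 amount to a final base-page check followed by returning nothing; names and texts are hypothetical.

```python
from typing import Optional

def build_order(user_chain: list[str], site_lang: str,
                base_first: bool = False) -> list[tuple[str, str]]:
    """(source, lang) steps; lang "" stands for the base page DB."""
    steps: list[tuple[str, str]] = []
    for lang in user_chain:
        if base_first and lang == site_lang:
            steps.append(("DB", ""))         # suggested "25: DB"
        steps += [("DB", lang), ("CDB", lang)]
    if ("DB", "") not in steps:
        steps.append(("DB", ""))             # "70: DB"
    return steps

def label(step: tuple[str, str]) -> str:
    source, lang = step
    return f"{source}/{lang}" if lang else source

def resolve(key: str, steps: list[tuple[str, str]], db: dict[str, str],
            cdb: dict[tuple[str, str], str]) -> Optional[str]:
    base_exists = key in db
    for source, lang in steps:
        if source == "CDB":
            if not base_exists and (key, lang) in cdb:   # "(if DB missing)"
                return cdb[(key, lang)]
        else:
            title = f"{key}/{lang}" if lang else key
            if title in db:
                return db[title]
    return None                              # "80: false"
```

With a base page and an English translation of it, a fit user gets the English page under the original order but the base page once DB is checked right before DB/fi.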

That would be a bit difficult, because en is considered part of the fallback chain proper. I'll see what I can do.

[obviously not blocking bug 38865; removing]

Aklapper lowered the priority of this task from High to Medium (Feb 16 2021, 11:11 AM).
Aklapper edited projects, added Patch-Needs-Improvement; removed Patch-For-Review.

Obviously not high priority given that this has been open for many years without any news.
For the record, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/72866 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/72867/ are still open and both need rebasing.