Preferences and lang codes should distinguish "English" from "American English"/"U.S. English"
Open, Low, Public

Description

Right now our preferences list shows "en - English", "en-CA - Canadian English", and "en-GB - British English". However, in reality "en - English" is en-US ("American English" or "U.S. English").

We should update the preferences system and lang output to accurately reflect state:

  • Special:Preferences should list en-US instead of 'en' and call it by a proper name.
  • When 'en' is used as the user language, lang="" should output en-US, as our 'en' i18n is en-US.
  • When the content lang is 'en' we should respect this as we don't know what locale the wiki's content actually uses, and lang="" for content should output 'en'.
  • When a user's preference is set to flat 'en', the preferences list should show the 'en-US' entry as the selected entry.
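
The rules in the list above can be sketched as three small functions. This is a minimal Python illustration, not MediaWiki code: the function names and the `AMERICAN_ENGLISH` constant are hypothetical and chosen only to mirror the bullets.

```python
# Hypothetical names throughout; this only mirrors the rules proposed above.
AMERICAN_ENGLISH = "en-US"


def html_lang_for_interface(user_lang: str) -> str:
    """Value of lang="" for interface text shown in the user's language."""
    # The 'en' interface messages are written in American English,
    # so a bare 'en' is reported as en-US.
    return AMERICAN_ENGLISH if user_lang == "en" else user_lang


def html_lang_for_content(content_lang: str) -> str:
    """Value of lang="" for wiki content."""
    # The locale of the wiki's content is unknown, so 'en' is left as 'en'.
    return content_lang


def selected_preference_entry(stored_pref: str) -> str:
    """Entry to mark as selected in the preferences list."""
    # A stored flat 'en' maps onto the en-US entry.
    return AMERICAN_ENGLISH if stored_pref == "en" else stored_pref
```

Only the user-language paths rewrite 'en'; the content-language path passes the code through unchanged, which is the asymmetry the description asks for.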

See also:
T154589: evaluate creation of en-us for Wikidata monolingual strings

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 11:49 PM
bzimport set Reference to bz31874.
bzimport added a subscriber: Unknown Object (MLST).
brion added a comment. Dec 9 2011, 1:09 PM
Bug 32889 has been marked as a duplicate of this bug.

We shouldn't be so quick to throw away "en". There is such a thing as International English, after all, so "en" doesn't necessarily have to refer to American English. Also, if we only have en-US, en-GB and en-CA, it doesn't leave any other category for other Englishes, of which there are quite a few. I imagine quite a few Australians may prefer "en" over "en-GB", for example, even though the spelling may be closer in the latter. Also, we shouldn't forget dialects like Indian English and Singlish. Perhaps English speakers of those dialects could get by with en-GB, but perhaps not; more investigation is needed, I think.

Krenair added a subscriber: Krenair. Jan 2 2015, 2:39 PM

We shouldn't be so quick to throw away "en". There is such a thing as International English, after all, so "en" doesn't necessarily have to refer to American English. Also, if we only have en-US, en-GB and en-CA, it doesn't leave any other category for other Englishes, of which there are quite a few. I imagine quite a few Australians may prefer "en" over "en-GB", for example, even though the spelling may be closer in the latter. Also, we shouldn't forget dialects like Indian English and Singlish. Perhaps English speakers of those dialects could get by with en-GB, but perhaps not; more investigation is needed, I think.

Our i18n files' en is not international English, it is written specifically in American English.

en - English isn't really being thrown out at all. We'd probably have a quiet alias so $wgLanguageCode = 'en'; will still work.

And for other English variations, no one said we had to have "only" en-US, en-GB, and en-CA. In fact, originally we didn't even have en-CA; I had it created.

If anyone wants Australian English, Indian English, Singlish, or any other English dialect all they need to do is find someone willing to write the message changes and have the new dialect created on TWN.

How about creating "en-US" in addition to "en", instead of replacing it?

en is already en-US, there's no point confusing people by having them both in preferences.

He7d3r added a comment. Jan 3 2015, 1:54 PM

FYI: there seems to be an analogous situation for Portuguese, as discussed on
https://pt.wikipedia.org/wiki/Project:Esplanada/propostas/Uso_do_portugu%C3%AAs_de_Portugal,_pt-PT_%284mar2012%29
In that context, my understanding is that we have:

  • pt-BR for Portuguese from Brazil
  • pt (in theory) for Portuguese from Portugal

However, the content language of Portuguese Wikipedia is set to pt and it seems to be common to have Brazilian expressions in the local pt translations in that wiki (i.e., replacing the ones from Translatewiki). So, for ptwiki readers, while pt-BR contains only Brazilian Portuguese translations, pt is a mix of pt-PT and pt-BR translations (which is probably unwanted by readers from Portugal).

Our i18n files' en is not international English, it is written specifically in American English.

en is already en-US, there's no point confusing people by having them both in preferences.

That's not true. en i18n should be written in international English and avoid locale-specific variation as much as possible.

Fomafix added a subscriber: Fomafix. Jan 9 2018, 8:15 AM

https://en.wikipedia.org/w/index.php?title=Metre&oldid=817655372#cite_note-3 states:

Thus, the spelling metre is referred to as the "international spelling"; the spelling meter, as the "American spelling".

Currently the system message exif-subjectdistance-value uses:

en-ca.json:	"exif-subjectdistance-value": "$1 metres",
en-gb.json:	"exif-subjectdistance-value": "$1 metres",
en.json:	"exif-subjectdistance-value": "$1 meters",

An international spelling would be:

en-ca.json:	"exif-subjectdistance-value": "$1 metres",
en-gb.json:	"exif-subjectdistance-value": "$1 metres",
en-us.json:	"exif-subjectdistance-value": "$1 meters",
en.json:	"exif-subjectdistance-value": "$1 metres",
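
A minimal Python sketch of how lookups would resolve under the proposed layout. It assumes that every English variant falls back to plain en when it has no message of its own; the `MESSAGES` table and `get_message` helper are hypothetical stand-ins for the JSON files above, not MediaWiki's API, and `en-au` is used only as an example of a code with no file.

```python
# Hypothetical in-memory stand-in for the four proposed JSON files.
KEY = "exif-subjectdistance-value"
MESSAGES = {
    "en": {KEY: "$1 metres"},
    "en-us": {KEY: "$1 meters"},
    "en-gb": {KEY: "$1 metres"},
    "en-ca": {KEY: "$1 metres"},
}


def get_message(lang: str, key: str, value: str) -> str:
    """Look up a message, falling back to 'en' (assumed fallback chain)."""
    for code in (lang, "en"):
        text = MESSAGES.get(code, {}).get(key)
        if text is not None:
            # Substitute the single $1 placeholder.
            return text.replace("$1", value)
    raise KeyError(key)


print(get_message("en-us", KEY, "5"))  # American spelling from en-us
print(get_message("en", KEY, "5"))     # international spelling from en
print(get_message("en-au", KEY, "5"))  # no en-au table here, falls back to en
```

Under this assumption, a variant without its own file would inherit the international spelling rather than the American one.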

Change 412337 had a related patch set uploaded (by Fomafix; owner: Fomafix):
[mediawiki/core@master] Distinguish between International English (en) and American English (en-us)

https://gerrit.wikimedia.org/r/412337

Do the language codes in MediaWiki's list match ICU locale codes? They certainly appear to, but then we are overriding things like date formatting, so perhaps they shouldn't be thought of as strictly the same thing.

If these are actual locale names, then en without any country or variant code looks very much the same as en_US (e.g. short dates are M/d/yy). Also, I'm not sure: is en_001 "English (World)" the same as International English?

Change 412337 abandoned by Fomafix:
Distinguish between International English (en) and American English (en-us)

Reason:
The exif-* messages do not exist anymore. The words meter/metre do not currently appear anywhere else.

https://gerrit.wikimedia.org/r/412337