Rename Serbo-Croatian Wikipedia and Wiktionary from sh.wiki* to hbs.wiki*
Closed, Invalid (Public)

Description

Rename sh.wikipedia to hbs.wikipedia, and rename sh.wiktionary to hbs.wiktionary.

sh has not been a valid ISO 639-1 code since 2000 (sic!) and has been replaced by [[ http://www-01.sil.org/iso639-3/documentation.asp?id=hbs | hbs ]]

Event Timeline

"sh" is still valid. It has not been reassigned or deleted, only "deprecated" (for bad reasons: only because it is a macrolanguage, just as "zh" is, though "zh" has been reassigned as an alias of "cmn" for BCP 47). Renaming is not necessary; "sh" is still valid for BCP 47.

This code was deprecated in 2000 because there were separate language codes for each individual language represented (Serbian, Croatian, and then Bosnian was added). It was published in a revision of ISO 639-1, but never was included in ISO 639-2. It is considered a macrolanguage (general name for a cluster of closely related individual languages) in ISO 639-3. Its deprecated status was reaffirmed by the ISO 639 JAC in 2005.

It does not matter: BCP 47 supports all languages and macrolanguages. "sh" and "hbs" are aliased in the IANA database, just like "zh" with "cmn" or "en" with "eng", and BCP 47 favors the shortest codes.

It was "deprecated" only in ISO 639-1 because this very old standard (a bad mix) did not correctly handle the difference between isolated languages, macrolanguages, and language families (this was also true of ISO 639-2). That was finally fixed in ISO 639-3, which added the necessary data, including for old codes from ISO 639-1 and ISO 639-2, and also made it more compliant with BCP 47, the (stable) standard effectively used.

ISO 639 has never been stable and has its own contradictions. In fact it had absolutely no policy at all: it is just a random collection of codes that had been used in libraries. It proved largely insufficient and unclear about the scope of some codes (inclusive or exclusive); this was fixed only several decades later, with an add-on to ISO 639-3. We should not rely on it, but only on BCP 47 (and on the classification of languages, macrolanguages and families added with ISO 639-3), which has much clearer rules.

The status in ISO 639-1 *only* does not matter; we do not depend on it any more than on ISO 639-2 or ISO 639-3. ISO 639 also still lacks information for dialects and families, whereas BCP 47 already has the framework for it.

Our source is then the IANA database for BCP47 (which also fixes many errors or caveats, also found in ISO 639-1, ISO 639-2, ISO 639-3, ISO 15924, and UN M.49 codes, which were *partly* imported).

Please consider that "deprecated" even in ISO 639-1 does NOT mean "invalid". It is only deprecated for use as an isolated language, yes, but this is ALSO true for "hbs" !

So renaming "sh" to "hbs" would actually not change anything, and would give absolutely no benefit.

We are already compliant with BCP 47 and with ISO 639-1 when we use "sh" for the macrolanguage.

Aklapper triaged this task as Lowest priority. Feb 22 2016, 2:55 PM

I just want to say that I completely agree with Aklapper putting this at the lowest priority. There is not just a Serbo-Croatian Wiktionary, but a Wikipedia, too. They have a community, and the community should be consulted; if they agree, both projects should be moved.

However, I see no point in spending energy and time on this issue, as it does not collide with anything and is not likely to in the foreseeable future.

T127679 for shwiki; unfortunately, none of the shwiki sysops can be found on Phabricator.

Liuxinyu970226 changed the task status from Open to Stalled. Apr 30 2017, 5:13 AM

per krinkle

This request should not be "stalled" but should long since have been "closed" as completely unnecessary (a bad request).
"sh" is perfectly valid in BCP 47 and is still recommended over "hbs", which is exactly the same thing (with exactly the same status with regard to ISO 639).

The entry about "sh" in the normative IANA database for BCP47 is extremely clear, it is a standard macrolanguage (just like "zh" for Chinese, which was mapped in the IANA database at the same date!).

%%
Type: language
Subtag: sh
Description: Serbo-Croatian
Added: 2005-10-16
Scope: macrolanguage
Comments: sr, hr, bs are preferred for most modern uses

[...]

%%
Type: language
Subtag: zh
Description: Chinese
Added: 2005-10-16
Scope: macrolanguage
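
As a rough illustration of how records like the two quoted above can be read programmatically, here is a minimal Python sketch. The function name `parse_records` and the `SAMPLE` string are hypothetical, and the parser is simplified: it treats `%%` lines as record separators, joins indented continuation lines, and keeps only the last value when a field is repeated. It is not a full implementation of the registry format.

```python
# Sample text copied from the two records quoted in this comment.
SAMPLE = """\
%%
Type: language
Subtag: sh
Description: Serbo-Croatian
Added: 2005-10-16
Scope: macrolanguage
Comments: sr, hr, bs are preferred for most modern uses
%%
Type: language
Subtag: zh
Description: Chinese
Added: 2005-10-16
Scope: macrolanguage
"""


def parse_records(text):
    """Split registry-style text into a list of field dictionaries."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.strip() == "%%":
            # A "%%" line closes the record collected so far.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0].isspace() and last_key:
            # Indented line: continuation of the previous field's value.
            current[last_key] += " " + line.strip()
        elif ": " in line:
            last_key, value = line.split(": ", 1)
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


records = parse_records(SAMPLE)
scopes = {r["Subtag"]: r.get("Scope") for r in records}
```

With the sample above, `scopes` maps both `sh` and `zh` to `macrolanguage`, which is the parallel this comment draws.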

In summary, this request is invalid: it would mean we would have to rename zh to zho, en to eng, fr to fra, and de to deu to match what is incorrectly expected, which is not the best practice (BCP 47) but is based only on an equivalence in ISO 639 (which is not stable and not usable for localisation purposes; it is now used only for some bibliographic purposes by librarians, many of whom have abandoned it in favor of BCP 47, which is more precise and stable).
If one says that "sh" should not be used because it is a macrolanguage referring to multiple individual languages, the same is true of Chinese and in fact of English, French, and German, which have multiple variants. So this is in effect a request to delete "sh" and to ignore the fact that it has an active community and that many linguists still think the separation of the "individual" languages making up "sh" is completely artificial: sr-Latn, hr and bs are actually the same language, and evidence shows that the political attempt to divide it by country is failing in many domains, has not served those communities, and has not helped develop their use internationally, but has only created a political division that we should probably have avoided in Wikimedia.
But now we have distinct communities not working together as they should, and the "sh" community exists and has created useful content that is much less politically oriented than that of the separate sr, hr, and bs communities. I doubt any one of these 4 communities will accept the deletion of any of the 4.
But there is absolutely no need to rename one of them whose encoding is completely standard (and the target "hbs" is not standard). This request seems to be only a political attack against one of the 4 communities, and will help nobody.

Liuxinyu970226 renamed this task from "Rename sh.wiktionary to hbs.wiktionary" to "Rename Serbo-Croatian Wikipedia and Wiktionary from sh.wiki* to hbs.wiki*". Jun 12 2018, 2:16 AM
Liuxinyu970226 updated the task description.

Once again, the deletion from ISO 639-1 is not relevant: the code still conforms to BCP 47, so there is no need at all to rename it (and "sh" cannot be reallocated to any other language). We just want to conform to BCP 47, which is stable. ISO 639 is not stable and is in fact ambiguous in many other cases. ISO 639-1 is no longer a normative source for BCP 47; the informative reference exists for historical purposes only, and we do not conform to ISO 639 and will never be able to. All the web standards are based on BCP 47, whose RFC explains the difference and why not all ISO 639 codes are accepted: ISO 639 contains numerous classification errors and is inconsistent. ISO 639 remains in use only for old bibliographic purposes (libraries of printed books and old legal archives), not for technical tagging; many libraries have stopped using ISO 639 and have converted to BCP 47, which is consistent, stable, and much more precise.

Background info:

The ISO 639-1 language code sh was deprecated, leaving the ISO 639-3 code hbs as the only undeprecated code.
However, language tags are defined by BCP 47, which is based on ISO 639 but is not identical to it.
In particular: BCP 47 does not register multiple tags for the same language, which means that the ISO 639-3 code hbs is not a valid BCP 47 language tag, since sh already exists. BCP 47 also does not remove tags, which means sh remains a valid BCP 47 language tag, even though it was deprecated in ISO 639-1.
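
The rule described above can be sketched as a lookup against the registry's language subtags. This is a minimal Python sketch under stated assumptions: `REGISTERED_LANGUAGE_SUBTAGS` is a tiny hand-copied sample standing in for the full IANA registry, and the helper names are hypothetical. It checks only the primary language subtag, not the full tag grammar.

```python
# Assumption: a small hand-picked sample of registered primary language
# subtags. A real check would load the full IANA Language Subtag Registry.
REGISTERED_LANGUAGE_SUBTAGS = {"sh", "sr", "hr", "bs", "zh", "en"}


def primary_subtag(tag: str) -> str:
    """Return the lowercased first subtag of a language tag such as 'sr-Latn'."""
    return tag.split("-", 1)[0].lower()


def has_registered_language(tag: str) -> bool:
    """True if the tag's primary subtag appears in the sample registry."""
    return primary_subtag(tag) in REGISTERED_LANGUAGE_SUBTAGS


results = {tag: has_registered_language(tag) for tag in ("sh", "hbs", "sr-Latn")}
```

Under these assumptions `sh` and `sr-Latn` pass while `hbs` does not, because the three-letter ISO 639-3 code was never registered alongside the existing two-letter subtag.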

Validator link: https://validator.w3.org/nu/?doc=https%3A%2F%2Fincubator.wikimedia.org%2Fwiki%2FWy%2Fhbs%2FGlavna_stranica
which reports
Bad value hbs for attribute lang on element div: The language subtag hbs is not a valid ISO language part of a language tag.

A language tag validator: https://r12a.github.io/app-subtags/?check=hbs vs https://r12a.github.io/app-subtags/?check=sh

Winston_Sung moved this task from Untriaged to Language names and codes on the I18n board.

Based on the discussion (in Discord):

I believe BCP 47 / the IANA Language Subtag Registry should be the standard that takes precedence over ISO 639, as we're using these language codes for websites.
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

I'll close this task as invalid (until hbs becomes one of the valid language codes in BCP 47 / IANA Language Subtag Registry).