
Split sh language code to sh-Cyrl-ekavsk, sh-Cyrl-ijekavsk, sh-Latn-ekavsk and sh-Latn-ijekavsk
Closed, Invalid · Public

Description

On sh.wiki, we reached consensus to split the current sh language code into four variants. All currently translated messages should be moved to the sh-Latn-ijekavsk code, which would be the primary one.

List of new codes
  • sh
  • sh-Cyrl
  • sh-Latn
  • sh-ekavsk
  • sh-ijekavsk
  • sh-Cyrl-ekavsk
  • sh-Cyrl-ijekavsk
  • sh-Latn-ekavsk
  • sh-Latn-ijekavsk
List of codes with fallbacks
Code              Fallback
sh                sh-Latn
sh-Cyrl           sh-Cyrl-ekavsk
sh-Latn           sh-Latn-ijekavsk
sh-ekavsk         sh-Cyrl-ekavsk
sh-ijekavsk       sh-Latn-ijekavsk
sh-Cyrl-ekavsk    (none)
sh-Cyrl-ijekavsk  (none)
sh-Latn-ekavsk    (none)
sh-Latn-ijekavsk  (none)
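
The fallback table can be read as a chain: a code falls back to its target, which may fall back again. Here is a minimal Python sketch of that lookup, assuming the table in this task is the whole configuration. The names FALLBACKS and fallback_chain are hypothetical, and this is not a description of how MediaWiki itself stores or resolves fallbacks.

```python
# Fallback table copied from the task description.
# Codes without an entry have no fallback listed.
FALLBACKS = {
    "sh": "sh-Latn",
    "sh-Cyrl": "sh-Cyrl-ekavsk",
    "sh-Latn": "sh-Latn-ijekavsk",
    "sh-ekavsk": "sh-Cyrl-ekavsk",
    "sh-ijekavsk": "sh-Latn-ijekavsk",
}


def fallback_chain(code):
    """Return the code followed by each fallback, in lookup order."""
    chain = [code]
    while chain[-1] in FALLBACKS:
        nxt = FALLBACKS[chain[-1]]
        if nxt in chain:  # guard against an accidental cycle in the table
            break
        chain.append(nxt)
    return chain
```

With this table, fallback_chain("sh") walks sh, sh-Latn, sh-Latn-ijekavsk, which is consistent with the description naming sh-Latn-ijekavsk as the primary code.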

Similar task: T117845

Event Timeline

@Fomafix, I guess you can help out with this considering that you've already initiated the same process for sr-ec and sr-el language codes.

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry defines the subtags ekavsk and ijekavsk only for sr, not for sh. The registry entry for sh reads:

Type: language
Subtag: sh
Description: Serbo-Croatian
Added: 2005-10-16
Scope: macrolanguage
Comments: sr, hr, bs are preferred for most modern uses

Is it intended to have the four languages sh-Cyrl-ekavsk, sh-Cyrl-ijekavsk, sh-Latn-ekavsk and sh-Latn-ijekavsk as user interface languages, as content languages, or both?

Is it possible to implement an automatic language converter between the four languages?

Sorry for being imprecise. I was thinking of the user interface language.

As for the automatic language converter, I don't believe it would be possible to make a good converter between Ijekavian and Ekavian. Currently, it's only possible to make a converter Latin<->Cyrillic. But I would create a separate task for that later.
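
To illustrate why the script direction is the tractable one, here is a minimal Python sketch of a Cyrillic-to-Latin pass using the standard one-to-one letter mapping of the Serbian alphabets. The names CYR_TO_LAT and to_latin are hypothetical, and this is not MediaWiki's language converter. No comparable letter table exists for Ijekavian versus Ekavian, since that difference depends on the word.

```python
# Serbian Cyrillic to Latin, lowercase letters only; uppercase is derived below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic letters; leave everything else untouched."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # not a Serbian Cyrillic letter: keep as is
        elif ch.isupper():
            out.append(lat.capitalize())  # Љ becomes Lj, Џ becomes Dž
        else:
            out.append(lat)
    return "".join(out)
```

The sketch title-cases digraphs, so all-caps words would need extra handling. The reverse direction also needs more care, because lj, nj and dž are two-character sequences in Latin.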

sh is a macrolanguage with the comment "sr, hr, bs are preferred for most modern uses". sr is part of the macrolanguage sh. Is there a difference between sr-Cyrl-ekavsk and sh-Cyrl-ekavsk?

There are no big differences, but I know some users who would rather use sh-Cyrl-ekavsk instead of sr-Cyrl-ekavsk. Besides, the community supported the introduction of four interface variants.

It's obvious that the IANA registry is not complete. For example, cnr-Latn and cnr-Cyrl are not specified although the Montenegrin language uses both the Cyrillic and Latin alphabets.

https://www.loc.gov/standards/iso639-2/php/code_changes.php notes for sh:

This code was deprecated in 2000 because there were separate language codes for each individual language represented (Serbian, Croatian, and then Bosnian was added). It was published in a revision of ISO 639-1, but never was included in ISO 639-2. It is considered a macrolanguage (general name for a cluster of closely related individual languages) in ISO 639-3. Its deprecated status was reaffirmed by the ISO 639 JAC in 2005.

Shouldn't the individual language codes be used?

Like sr, cnr has no Suppress-Script field. https://www.rfc-editor.org/rfc/rfc5646.html#section-3.1.9 writes:

The lack of a 'Suppress-Script' might indicate that the language is customarily written in more than one script [...].

cnr-Cyrl and cnr-Latn are valid, just like sr-Cyrl and sr-Latn. sr-Cyrl and sr-Latn are additionally listed as separate entries in the registry, but this is redundant.

Well, if sh won't do, there's hbs to go with. The hbs code should still be alive and kicking, as shown on the page of one of the ISO 639 Registration Authorities:

Identifier: hbs
Language Name(s): Serbo-Croatian
Status: Active
Code Sets: 639-3
Equivalent(s): 639-1: sh (deprecated)
Scope: Macrolanguage
Language Type: Living
Denotations: Ethnologue, Glottolog, Multitree, Wikipedia

But even if we go with the infamous sh, it might still be totally valid. I can understand how its use appears to pose a problem, yet maybe there is none. Its status is deprecated, not obsolete: it is discouraged, but it is not abandoned yet.

I didn't know the sh code is deprecated... I guess we can then go with the bhs code and the corresponding subcodes (bhs-Cyrl-ekavsk, bhs-Cyrl-ijekavsk, bhs-Latn-ekavsk, and bhs-Latn-ijekavsk). Is that okay?

bhs is a typo of hbs. hbs is the ISO 639-3 (three-letter) equivalent of sh. sh is not really deprecated; it is considered a macrolanguage for bs/bos (Bosnian), hr/hrv (Croatian), sr/srp (Serbian) and cnr (Montenegrin). The variants ekavsk and ijekavsk are only defined for sr, sr-Cyrl and sr-Latn. sh-Cyrl-ekavsk leads to a validation error: https://validator.w3.org/nu/?useragent=Validator.nu%2FLV+http%3A%2F%2Fvalidator.w3.org%2Fservices&acceptlanguage=&doc=https%3A%2F%2Fsh.wikipedia.org%2Fwiki%2FPosebno%3AVerzija%3Fuselang%3Dsh-Cyrl-ekavsk

Bad value sh-Cyrl-ekavsk for attribute lang on element html: Variant ekavsk lacks recommended prefix. Use one of sr, sr, sr, or latn instead.

The recommendation in the error message is a bit funny and should be sr, sr-Cyrl or sr-Latn.
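
The rule the validator is applying can be sketched in Python as follows. This is a simplified check, assuming only the two variants discussed in this task and the three prefixes named above (sr, sr-Cyrl, sr-Latn). VARIANT_PREFIXES and variant_prefix_ok are hypothetical names, and this is not the validator's actual implementation.

```python
# Registered prefixes for the two variants discussed in this task,
# lowercased because language tags compare case-insensitively.
VARIANT_PREFIXES = {
    "ekavsk": ("sr", "sr-latn", "sr-cyrl"),
    "ijekavsk": ("sr", "sr-latn", "sr-cyrl"),
}


def variant_prefix_ok(tag):
    """Return False if a known variant follows a prefix it is not registered for."""
    subtags = tag.lower().split("-")
    for i, sub in enumerate(subtags):
        prefixes = VARIANT_PREFIXES.get(sub)
        if prefixes is None:
            continue  # not one of the variants this sketch knows about
        head = "-".join(subtags[:i])  # everything before the variant
        if not any(head == p or head.startswith(p + "-") for p in prefixes):
            return False
    return True
```

Under these assumptions, sr-Cyrl-ekavsk passes and sh-Cyrl-ekavsk fails, which matches the validator output quoted above.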

Yes, and if you put sh-latn with either ijekavsk or ekavsk, it substitutes latn with cyrl in the error message.

Anyway, my reading of the message

Error: Bad value hbs for attribute lang on element html: The language subtag hbs is not a valid ISO language part of a language tag.

from that same W3C validator is that the hbs code isn't particularly liked.
On the other hand, the validator behaves similarly with eng, which should be the valid ISO code for English (see the output of https://validator.w3.org/nu/?useragent=Validator.nu%2FLV+http%3A%2F%2Fvalidator.w3.org%2Fservices&acceptlanguage=&doc=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3AVersion%3Fuselang%3Deng).

Actually, I'd say that three-letter codes in general seem to have issues: ara gives an error, but arb and arz appear OK.

https://www.rfc-editor.org/rfc/rfc5646.html#page-10 (part of BCP 47) writes:

When languages have both an ISO 639-1 two-character code and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is defined in the IANA registry.

This explains why BCP 47 accepts only the two-letter code for languages that have one, such as en rather than eng, or sh rather than hbs.
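
The quoted rule also accounts for the validator results reported earlier in this thread: eng, ara and hbs each have a two-letter equivalent, while arb and arz do not. Here is a minimal Python sketch of the rule, using a small illustrative subset of code pairs rather than the full registry. TWO_LETTER, registry_form and is_registry_form are hypothetical names.

```python
# Three-letter codes that also have an ISO 639-1 two-letter code.
# Illustrative subset taken from the codes mentioned in this task.
TWO_LETTER = {
    "eng": "en",
    "ara": "ar",
    "hbs": "sh",
    "srp": "sr",
    "hrv": "hr",
    "bos": "bs",
}


def registry_form(code):
    """Return the two-letter code if one exists, otherwise the code itself."""
    code = code.lower()
    return TWO_LETTER.get(code, code)


def is_registry_form(code):
    """True if the code is already the form this rule puts in the registry."""
    return registry_form(code) == code.lower()
```

This only models the two-letter preference. It does not check whether a code exists in the registry at all.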

Marking this as invalid since the variants ekavsk and ijekavsk are only defined for sr, per Fomafix.