Steps to replicate the issue (include links if applicable):
- Open a page on the Incubator for the HBS Wikivoyage, e.g. https://incubator.wikimedia.org/wiki/Wy/hbs/Glavna_stranica
- Inspect the HTML
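The inspection step can also be done offline. The following is a minimal sketch, not a definitive tool: it collects every `lang` attribute from an HTML string using Python's standard `html.parser`. The `SAMPLE_HTML` string is a made-up stand-in for the saved page source (only the `div` with `lang="hbs"` mirrors what the validator reports), and the names `LangCollector` and `collect_lang_attributes` are hypothetical.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Records (element name, lang value) for every start tag carrying a lang attribute."""

    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are already lowercased.
        for name, value in attrs:
            if name == "lang" and value:
                self.langs.append((tag, value))


def collect_lang_attributes(html):
    """Return all (element, lang) pairs found in the given HTML string."""
    parser = LangCollector()
    parser.feed(html)
    parser.close()
    return parser.langs


# Hypothetical, heavily shortened stand-in for the saved Incubator page source.
SAMPLE_HTML = (
    '<html lang="en"><body>'
    '<div class="mw-content-ltr" lang="hbs" dir="ltr">Glavna stranica</div>'
    "</body></html>"
)

if __name__ == "__main__":
    for element, lang in collect_lang_attributes(SAMPLE_HTML):
        print(element, lang)
```

Feeding the real page source into `collect_lang_attributes` instead of `SAMPLE_HTML` should list the elements whose `lang` value needs checking.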
What happens?:
The page HTML uses hbs as the value of the lang attribute, which is not a valid BCP 47 language tag.
What should have happened instead?:
The page HTML should use sh for the language tag.
Software version (skip for WMF-hosted wikis like Wikipedia):
Other information (browser name/version, screenshots, etc.):
There are several things that could be done here:
- Revisit the decision to use hbs and move the Incubator pages to sh, so that the language tag in the HTML no longer needs to differ from the project's language code.
- Add a mapping from hbs to sh in MediaWiki (in LanguageCode.php I assume).
- Make the Incubator extension only automatically set the page language from the language code if the language code is also a valid language tag.
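To illustrate how the second and third options could fit together, here is a rough sketch in Python rather than MediaWiki's PHP. Everything in it is an assumption made for illustration: the function `html_lang_for`, the mapping table, and the tiny set of known subtags (a hand-picked excerpt, not the real IANA Language Subtag Registry). It does not reflect how LanguageCode.php or the Incubator extension are actually implemented.

```python
# Hypothetical mapping from wiki language codes to BCP 47 language tags
# (option 2: map hbs to sh).
NONSTANDARD_TO_BCP47 = {
    "hbs": "sh",
}

# Illustrative excerpt only; a real check would consult the full
# IANA Language Subtag Registry.
KNOWN_LANGUAGE_SUBTAGS = {"bs", "en", "hr", "sh", "sr"}


def html_lang_for(wiki_code):
    """Return the lang value to emit for a wiki language code, or None.

    None means "do not set the page language automatically"
    (option 3: only use the code if it is a valid language tag).
    """
    code = wiki_code.lower()
    tag = NONSTANDARD_TO_BCP47.get(code, code)
    # Only the primary language subtag is checked in this sketch.
    primary = tag.split("-")[0]
    return tag if primary in KNOWN_LANGUAGE_SUBTAGS else None


if __name__ == "__main__":
    for code in ("hbs", "sh", "not-a-code"):
        print(code, "->", html_lang_for(code))
```

With this shape, a code such as hbs would still be usable as the project's language code while the HTML carries sh, and an unknown code would fall back to the wiki's default instead of producing an invalid lang attribute.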
Background info:
The ISO 639-1 language code sh was deprecated, leaving the ISO 639-3 code hbs as the only undeprecated code.
However, language tags are defined by BCP 47, which is based on ISO 639 but is not identical to it.
In particular, BCP 47 does not register multiple tags for the same language: since sh already exists, the ISO 639-3 code hbs is not a valid BCP 47 language tag. BCP 47 also never removes tags, so sh remains a valid BCP 47 language tag even though it was deprecated in ISO 639-1.
Validator link: https://validator.w3.org/nu/?doc=https%3A%2F%2Fincubator.wikimedia.org%2Fwiki%2FWy%2Fhbs%2FGlavna_stranica
which reports
Bad value hbs for attribute lang on element div: The language subtag hbs is not a valid ISO language part of a language tag.
A language tag validator: https://r12a.github.io/app-subtags/?check=hbs vs https://r12a.github.io/app-subtags/?check=sh
There was a discussion on the Langcom list earlier this year about which code to use, where @Amire80 said '"hbs" is cleaner according to the standards'. That is only true for ISO 639, however. With the exception of a few historical invented codes, Wikimedia uses BCP 47 codes even for wiki subdomains (e.g. all two-letter codes, nds-nl, be-tarask), so sh would actually be the more consistent choice.
All other Serbo-Croatian projects use sh: