
Language tag on Incubator pages for HBS Wikivoyage
Open, Needs Triage, Public, Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:

The page HTML uses hbs for the language tag, which is not valid.

What should have happened instead?:

The page HTML should use sh for the language tag.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

There are several things that could be done here:

  • Revisit the decision to use hbs and move the Incubator pages to sh, so that the language tag in the HTML no longer needs to differ from the project's language code.
  • Add a mapping from hbs to sh in MediaWiki (presumably in LanguageCode.php).
  • Make the Incubator extension set the page language automatically from the language code only if that code is also a valid language tag.
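The second and third options can be combined. The following is a minimal sketch in Python rather than MediaWiki's PHP, and every name in it (`HTML_LANG_OVERRIDES`, `REGISTERED_SUBTAGS`, `html_lang_for`) is hypothetical; the subtag set is a tiny stand-in for the real IANA registry, and only the hbs to sh entry comes from this report.

```python
from typing import Optional

# Hypothetical override table (option 2): project code -> BCP 47 tag.
# Only the hbs -> sh entry comes from this report.
HTML_LANG_OVERRIDES = {"hbs": "sh"}

# Hypothetical stand-in for the registered primary language subtags;
# a real implementation would load the IANA Language Subtag Registry.
REGISTERED_SUBTAGS = {"sh", "sr", "hr", "bs", "en", "de"}


def html_lang_for(project_code: str) -> Optional[str]:
    """Return a tag for the HTML lang attribute, or None to leave it unset."""
    code = project_code.lower()
    # Option 2: map project codes that are not BCP 47 tags to ones that are.
    tag = HTML_LANG_OVERRIDES.get(code, code)
    # Option 3: only emit the tag automatically if it is actually registered.
    return tag if tag in REGISTERED_SUBTAGS else None


print(html_lang_for("hbs"))  # sh
print(html_lang_for("xyz"))  # None
```

Returning None lets the caller omit the attribute (so the element inherits its parent's language) instead of emitting an invalid value.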

Background info:

The ISO 639-1 language code sh was deprecated, leaving the ISO 639-3 code hbs as the only undeprecated code.
However, language tags are defined by BCP 47, which is based on ISO 639 but is not identical to it.
In particular, BCP 47 does not register multiple tags for the same language: when a language has both a two-letter ISO 639-1 code and a three-letter code, only the two-letter code is used. This means the ISO 639-3 code hbs is not a valid BCP 47 language tag, since sh already exists. BCP 47 also does not remove tags, so sh remains a valid BCP 47 language tag even though it was deprecated in ISO 639-1.
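The "two-letter code wins" rule can be illustrated with a small lookup. This is a sketch assuming a hand-written table: the hbs/sh pair is the one discussed here, and the Serbian, Croatian and Bosnian pairs are their standard ISO 639 codes. `ISO639_3_TO_1` and `primary_subtag` are hypothetical names.

```python
# Hypothetical table of ISO 639-3 codes that also have an ISO 639-1 code.
ISO639_3_TO_1 = {
    "hbs": "sh",  # Serbo-Croatian
    "srp": "sr",  # Serbian
    "hrv": "hr",  # Croatian
    "bos": "bs",  # Bosnian
}


def primary_subtag(iso639_3: str) -> str:
    """BCP 47 uses the two-letter code when one exists; otherwise the
    three-letter code. Codes missing from the table are returned unchanged,
    which is only correct if they really lack a two-letter code."""
    code = iso639_3.lower()
    return ISO639_3_TO_1.get(code, code)


print(primary_subtag("hbs"))  # sh
print(primary_subtag("nds"))  # nds (Low German has no ISO 639-1 code)
```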

Validator link: https://validator.w3.org/nu/?doc=https%3A%2F%2Fincubator.wikimedia.org%2Fwiki%2FWy%2Fhbs%2FGlavna_stranica
which reports:

Bad value hbs for attribute lang on element div: The language subtag hbs is not a valid ISO language part of a language tag.
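The validator's complaint concerns validity, not syntax. A rough syntax check, assuming a simplified subset of the RFC 5646 `langtag` grammar (no extlang, extensions, private-use or grandfathered tags), accepts hbs as well-formed; it fails only because the subtag is not registered. `LANGTAG` and `is_well_formed` are hypothetical names.

```python
import re

# Simplified subset of the RFC 5646 "langtag" syntax: language, optional
# script, optional region, then variants. Extlang, extensions, private use
# and grandfathered tags are deliberately left out.
LANGTAG = re.compile(
    r"[a-z]{2,3}"                                 # language
    r"(?:-[a-z]{4})?"                             # script, e.g. Latn
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                # region, e.g. NL or 419
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*",  # variants, e.g. tarask
    re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """Syntax only: says nothing about whether the subtags are registered."""
    return LANGTAG.fullmatch(tag) is not None


for tag in ("hbs", "sh", "sr-Latn", "nds-NL", "be-tarask", "sr_Latn"):
    print(tag, is_well_formed(tag))
```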

A language tag validator: https://r12a.github.io/app-subtags/?check=hbs vs https://r12a.github.io/app-subtags/?check=sh
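Validity means the subtag is listed in the IANA Language Subtag Registry. A minimal sketch of that lookup follows, assuming an abridged excerpt copied into the code (only the Type, Subtag and Description fields of four records) rather than the full registry file; `REGISTRY_EXCERPT` and `registered_languages` are hypothetical names.

```python
from typing import Dict

# Abridged excerpt in the IANA Language Subtag Registry record format:
# records separated by "%%" lines, each field written as "Name: value".
# Real records have more fields (Added, Scope, Comments, ...), omitted here.
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: bs
Description: Bosnian
%%
Type: language
Subtag: hr
Description: Croatian
%%
Type: language
Subtag: sh
Description: Serbo-Croatian
%%
Type: language
Subtag: sr
Description: Serbian
"""


def registered_languages(registry_text: str) -> Dict[str, str]:
    """Map each primary language subtag to its first Description."""
    subtags: Dict[str, str] = {}
    for record in registry_text.split("%%"):
        fields: Dict[str, str] = {}
        for line in record.splitlines():
            # Folded continuation lines start with whitespace; skipped here.
            if not line or line[0].isspace():
                continue
            name, sep, value = line.partition(":")
            if sep:
                fields.setdefault(name.strip(), value.strip())
        if fields.get("Type") == "language" and "Subtag" in fields:
            subtags[fields["Subtag"].lower()] = fields.get("Description", "")
    return subtags


languages = registered_languages(REGISTRY_EXCERPT)
print("sh" in languages, "hbs" in languages)  # True False
```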

There was a discussion on the Langcom list earlier this year about which code to use, where @Amire80 said "'hbs' is cleaner according to the standards". However, that is only true for ISO 639. With the exception of a few historical invented codes, Wikimedia uses BCP 47 codes even for wiki subdomains (e.g. all two-letter codes, nds-nl, be-tarask), so sh would actually be the more consistent choice.

Serbo-Croatian projects use sh:

Event Timeline

If the decision here is to follow BCP 47 and use sh for the subdomain, T127680 should be rejected.

Dear @Nikki, it is a pity to see this only now (and by accident), and I am a bit disappointed that you did not consult me as the initiator, or the growing community of contributors to this project. The choice of HBS is not only about language code standards (or which one is more adequate for Wikimedia); it is also a social and political choice in the post-war, still conflict-ridden region of the former SFR Yugoslavia. HBS is a spectrum, and it could also explicitly include Montenegrin, as in Bosnian-Croatian-Montenegrin-Serbian (BCMS)...

If the language committee or Wikimedia technical admins want to reduce this to a purely technical matter and make a decision separately from the community of contributors, please say so now. We started the work on the premise that HBS was approved, and a change could affect our ambitions for this new project. Our aim is not to follow SH (the former official standard), but to accommodate the full spectrum of old and new standards, and even to stay flexible with deviations and hybrids (as we think it is beneficial for Wikivoyage to be more flexible and less formal than the overly politicized new language standards in the region).

As a matter of fact, ideally I would not use any of the technical language codes as the first option (as there is certainly no perfect one that leaves no one in this region estranged), but rather DobarPut.Wikivoyage.org, as I elaborated in my January 14th proposal: https://w.wiki/5$Fj

Best,
Z. Blace

I'm not really sure what I could consult you about - using hbs in HTML language tags isn't valid, regardless of how anyone feels about it, and MediaWiki's function for outputting valid BCP 47 language tags has a bug if it's outputting something which isn't a valid BCP 47 language tag.

Langcom revisiting their decision is one potential option, since not having a mismatch between the two codes would avoid the issue, but it's up to Langcom whether they want to do that. If they do, I would expect them to discuss it with the community before coming to a new decision.

OK, I do not understand all the technical options, but switching the subdomain from HBS to SH would be undesirable. If the subdomain can be decoupled from the language code used in the HTML, that could be a compromise for both sides.

It has been more than 3 months since this bug report was made, and I have tried to get feedback on my proposal, sending email reminders five times over the past months. Initially I thought the holiday season was slowing things down, but now I see that other issues are being addressed and this one is not. Is this issue a hot potato, or is something else the problem? (I know language politics are hard to navigate with South Slavic languages, but ignoring the issue erodes our trust and commitment.)

@Zblace Sorry for the lack of reply! I think the solution of using a different language tag in the HTML and for the subdomain should work, but there are a few technical hurdles in the way that I am not sure how to solve. I don't think there is any rush in fixing this for the time being though, it would be more of an issue once the Wikivoyage is approved.

> I'm not really sure what I could consult you about - using hbs in HTML language tags isn't valid, regardless of how anyone feels about it, and MediaWiki's function for outputting valid BCP 47 language tags has a bug if it's outputting something which isn't a valid BCP 47 language tag.

This specific issue, the hbs code being used in the lang attribute on Incubator, is a feature of the WikimediaIncubator extension. It outputs any two- or three-letter code used in a prefix without any further validation. That should probably be improved; the relevant code is https://github.com/wikimedia/mediawiki-extensions-WikimediaIncubator/blob/master/includes/WikimediaIncubator.php#L904-L910
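The missing validation step can be sketched as follows. This is an approximation of the behaviour described in the comment above, not the extension's actual PHP code: the title pattern, `INCUBATOR_TITLE`, `page_lang` and the `is_valid_tag` callback are all hypothetical, and the pattern only accepts two- or three-letter codes, as the comment describes.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern for Incubator titles such as "Wy/hbs/Glavna_stranica":
# a project prefix ("Wy" for Wikivoyage), a two- or three-letter language code,
# then an optional page name.
INCUBATOR_TITLE = re.compile(r"W[a-z]/([a-z]{2,3})(?:/.*)?")


def page_lang(title: str, is_valid_tag: Callable[[str], bool]) -> Optional[str]:
    """Return the prefix's language code for the lang attribute, but only if
    it passes is_valid_tag; otherwise return None so nothing invalid is emitted."""
    match = INCUBATOR_TITLE.fullmatch(title)
    if match is None:
        return None
    code = match.group(1)
    # The extra validation step: without it, "hbs" would be emitted as-is.
    return code if is_valid_tag(code) else None


# Hypothetical validity check backed by a tiny set of registered subtags.
registered = {"bs", "hr", "sh", "sr"}
print(page_lang("Wy/hbs/Glavna_stranica", registered.__contains__))  # None
print(page_lang("Wy/sh/Glavna_stranica", registered.__contains__))   # sh
```

When the result is None, the caller could leave the attribute unset or fall back to an explicitly mapped tag, which is the decoupling of subdomain and HTML language tag suggested earlier in the thread.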

@jhsoby, thank you for your reply. I hope that (internal) technical requirements and consistency will not force Wy HBS (the wider spectrum) to fall back to SH (the historic spectrum of the Serbian and Croatian standards from before the breakup of Yugoslavia), as HBS is what we actually write, and we want to accommodate the widest spectrum possible (including new standards and local non-standard language use). It is essential for this project to take a bottom-up approach to language (politics) and not just perpetuate top-down language antagonisms in the region. I hope this is clear and acceptable; if not, let's have a conversation about it now rather than wait for the stress and shock when we try to leave the Incubator. OK?

P.S.
I fear that Wikipedians have a less-than-amazing history of having global aspirations and following the formalism of 'authorities', even at the expense of innovation and their own communities of contributors. It would be terrible to see that happen here, as it would run against this effort.

Zblace renamed this task from Invalid language tag on Incubator pages for Serbo-Croatian Wikivoyage to Language tag on Incubator pages for HBS Wikivoyage (Feb 27 2023, 7:03 AM).
Zblace updated the task description.
Zblace added subscribers: Millodarka, millosh, DVrandecic.