MediaWiki has a mapping for language codes in includes/language/LanguageCode.php. Wikibase has its own mapping in repo/config/Wikibase.default.php.
Some are the same:
Code | MediaWiki and Wikibase |
---|---|
de-formal | de-x-formal |
es-formal | es-x-formal |
hu-formal | hu-x-formal |
map-bms | jv-x-bms |
nl-informal | nl-x-informal |
simple | en-simple |
Some are different:
Code | MediaWiki | Wikibase |
---|---|---|
cbk-zam | cbk | cbk-x-zam |
crh | crh (not changed) | crh-Latn |
nrm | nrf | fr-x-nrm |
roa-tara | nap-x-tara | it-x-tara |
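To make the divergence easy to check, here is a minimal sketch that transcribes the two tables above into plain dictionaries and lists the codes where the results disagree. The dictionary and function names are made up for illustration; this is not how either codebase stores the data, and a code with no entry is simply treated as unchanged.

```python
# Entries that are identical in MediaWiki and Wikibase (first table).
SHARED = {
    "de-formal": "de-x-formal",
    "es-formal": "es-x-formal",
    "hu-formal": "hu-x-formal",
    "map-bms": "jv-x-bms",
    "nl-informal": "nl-x-informal",
    "simple": "en-simple",
}

# Entries that differ (second table). MediaWiki has no entry for "crh".
MEDIAWIKI = {**SHARED, "cbk-zam": "cbk", "nrm": "nrf", "roa-tara": "nap-x-tara"}
WIKIBASE = {
    **SHARED,
    "cbk-zam": "cbk-x-zam",
    "crh": "crh-Latn",
    "nrm": "fr-x-nrm",
    "roa-tara": "it-x-tara",
}


def divergent_codes(mediawiki, wikibase):
    """Return {code: (mediawiki_result, wikibase_result)} where the two disagree."""
    result = {}
    for code in sorted(set(mediawiki) | set(wikibase)):
        mw = mediawiki.get(code, code)  # unmapped codes pass through unchanged
        wb = wikibase.get(code, code)
        if mw != wb:
            result[code] = (mw, wb)
    return result


if __name__ == "__main__":
    for code, (mw, wb) in divergent_codes(MEDIAWIKI, WIKIBASE).items():
        print(f"{code}: MediaWiki={mw} Wikibase={wb}")
```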
As far as I can tell, the Wikibase mapping is only used for sitelinks in RDF. Elsewhere in RDF, language codes are not converted (the ticket for that is T243428). When displaying entities, the HTML lang attributes use the MediaWiki mapping. As a result, the same language code is standardised in different ways depending on where it is output.
For example, on https://www.wikidata.org/wiki/Q5296, the roa-tara.wikipedia.org sitelink has lang="nap-x-tara" and hreflang="nap-x-tara" in the HTML, and on https://roa-tara.wikipedia.org/ the <html> element has lang="nap-x-tara". The RDF, however, has schema:inLanguage "it-x-tara" and schema:name "Pagene Prengepále"@it-x-tara.
These describe the same text/page, and HTML and RDF both use the same standard for language codes (BCP 47), so the language code should be the same in both places.
The function which uses Wikibase's mapping (in repo/includes/Rdf/RdfVocabulary.php) already uses LanguageCode::bcp47 (which uses MediaWiki's mapping), so perhaps Wikibase doesn't need its own mapping at all. If it needs to be possible to customise the mapping, it would probably make more sense for the MediaWiki list to be customisable.
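A rough model of the lookup described above, to show why dropping the Wikibase mapping would make the RDF agree with the HTML. This is a sketch under assumptions, not the actual code in RdfVocabulary.php: the function names are hypothetical, the "Wikibase entry wins, otherwise fall back to MediaWiki" ordering is inferred from the observed output, and `mediawiki_code` stands in for LanguageCode::bcp47 reduced to the table lookup only.

```python
# Subset of the two mappings, taken from the tables in this ticket.
MEDIAWIKI_MAPPING = {"roa-tara": "nap-x-tara", "nrm": "nrf"}
WIKIBASE_OVERRIDES = {"roa-tara": "it-x-tara", "nrm": "fr-x-nrm"}


def mediawiki_code(code):
    """Stand-in for LanguageCode::bcp47 (table lookup only, an assumption)."""
    return MEDIAWIKI_MAPPING.get(code, code)


def rdf_sitelink_code(code, overrides):
    """Hypothetical RDF sitelink path: Wikibase entry first, then MediaWiki."""
    if code in overrides:
        return overrides[code]
    return mediawiki_code(code)


if __name__ == "__main__":
    # With the Wikibase mapping in place, RDF and HTML disagree.
    print(rdf_sitelink_code("roa-tara", WIKIBASE_OVERRIDES), mediawiki_code("roa-tara"))
    # With an empty Wikibase mapping, both paths give the same code.
    print(rdf_sitelink_code("roa-tara", {}), mediawiki_code("roa-tara"))
```

In this model, passing an empty override table is all it takes for the sitelink code to match what the HTML lang attribute uses.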