
Add language codes sr-cyrl and sr-latn on Wikidata
Open, Medium, Public

Description

sr-cyrl and sr-latn should be added as language codes in Wikidata for labels, monolingual text and lexemes.

The existing codes sr-ec and sr-el are Wikimedia inventions and there is work being done to eventually switch everything to using the correct codes sr-cyrl and sr-latn (T125073, T117845).

Making sr-cyrl and sr-latn available in Wikidata now would be a good idea because:

  • It would allow us to start migrating the data already. There is a lot, so it will take some time.
  • It would resolve the issue described in T262269 where it's not possible for people to add Serbian (Latin script) or Serbian (Cyrillic script) to their termbox, because the data uses sr-el and sr-ec but the language codes in the Babel box are normalised to sr-latn and sr-cyrl.
  • It would be a proper way to resolve the inconsistency in which language codes are being used for lexemes, with some people using sr-el and sr-ec and others using sr-x-Q2839566 and sr-x-Q829464.
  • The RDF output would use valid language codes even before T243428 is fixed.
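The code correspondence described above can be sketched as a small lookup. This is only an illustration in Python: the names `LEGACY_TO_STANDARD` and `normalise_language_code` are hypothetical and not part of Wikibase or MediaWiki, and the lexeme-only codes (sr-x-Q2839566, sr-x-Q829464) are deliberately left out because this task does not state which script each one denotes.

```python
# Hypothetical mapping from the Wikimedia-specific codes to the standard ones.
LEGACY_TO_STANDARD = {
    "sr-ec": "sr-cyrl",  # Serbian, Cyrillic script
    "sr-el": "sr-latn",  # Serbian, Latin script
}


def normalise_language_code(code: str) -> str:
    """Return the standard code for a legacy one.

    The lookup is case-insensitive; any code that is not in the
    mapping is returned exactly as it was passed in.
    """
    return LEGACY_TO_STANDARD.get(code.lower(), code)
```

With this, `normalise_language_code("sr-el")` gives `"sr-latn"`, while a code such as `"sr"` passes through untouched.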

Data that needs migrating:

Event Timeline

This seems like a well-reasoned proposal to me, and given the amount of data that needs migrating, I think it would be good to do this as soon as possible.
Are there any special considerations that need to be made? Would it be possible to change the language codes automatically somehow (e.g. via a maintenance script), or should it be done by the community (e.g. by bot)?

LangCom has no objections to using standard language codes, of course. :-)

Good to go from my side as well. Let's do it.
I fear the data migration needs to be done by the community via a bot.
As for how the migration happens: Should we add the new ones, migrate all the data, and then remove the old ones?

> Good to go from my side as well. Let's do it.
> I fear the data migration needs to be done by the community via a bot.

That's fine. We have plenty of bots (and non-bots) mass editing labels already.

> As for how the migration happens: Should we add the new ones, migrate all the data, and then remove the old ones?

We still don't have a way to disable language codes (see T51024, T284808, T320887, etc.) but otherwise yes.
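
For whoever ends up writing the bot, here is a rough sketch of the "add new, migrate, remove old" order applied to a single entity's labels. It is plain Python over a dict of language code to label text and does not call any real Wikibase or bot-framework API; `migrate_labels` and the conflict handling are assumptions made for illustration, not an agreed procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping; only the two pairs named in this task.
LEGACY_TO_STANDARD = {"sr-ec": "sr-cyrl", "sr-el": "sr-latn"}


def migrate_labels(labels: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Move legacy-coded labels to the standard codes.

    Returns the migrated labels plus the legacy codes that were left
    in place because the standard code already holds a different
    value (those would need a human to look at them).
    """
    migrated = dict(labels)  # work on a copy, leave the input untouched
    conflicts: List[str] = []
    for old, new in LEGACY_TO_STANDARD.items():
        if old not in migrated:
            continue
        if new in migrated and migrated[new] != migrated[old]:
            conflicts.append(old)  # keep both values, flag for review
            continue
        migrated[new] = migrated[old]  # add under the new code
        del migrated[old]              # then remove the old code
    return migrated, conflicts
```

Skipping conflicting entries instead of overwriting them is a design choice in this sketch, so that a bot run never silently discards a label someone already entered under the new code.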