sr-cyrl and sr-latn should be added as language codes in Wikidata for labels, monolingual text and lexemes.
The existing codes sr-ec and sr-el are Wikimedia inventions, and work is under way to eventually switch everything to the correct codes sr-cyrl and sr-latn (T125073, T117845).
Making sr-cyrl and sr-latn available in Wikidata now would be a good idea because:
- It would allow us to start migrating the data already. There is a lot of it, so the migration will take some time.
- It would resolve the issue described in T262269 where it's not possible for people to add Serbian (Latin script) or Serbian (Cyrillic script) to their termbox, because the data uses sr-el and sr-ec but the language codes in the Babel box are normalised to sr-latn and sr-cyrl.
- It would properly resolve the inconsistency in the language codes used for lexemes, where some people use sr-el and sr-ec and others use sr-x-Q2839566 and sr-x-Q829464.
- The RDF output would use valid language codes even before T243428 is fixed.
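As a rough illustration of what migrating one entity's terms could involve, here is a minimal sketch. It assumes labels or descriptions are held as a dict keyed by language code, with each entry shaped like `{"language": ..., "value": ...}`. The function name `migrate_terms` and the conflict handling (keep the legacy entry and flag it for review when the target code already holds a different value) are assumptions made for illustration, not an agreed migration procedure.

```python
# Legacy Wikimedia codes mapped to the standard codes proposed in this task.
LEGACY_TO_STANDARD = {"sr-ec": "sr-cyrl", "sr-el": "sr-latn"}


def migrate_terms(terms):
    """Return (migrated_terms, conflicting_legacy_codes).

    `terms` is assumed to map language code -> {"language": code, "value": text}.
    The input dict is not modified.
    """
    migrated = {}
    conflicts = []

    # First copy every entry that does not use a legacy code.
    for code, term in terms.items():
        if code not in LEGACY_TO_STANDARD:
            migrated[code] = dict(term)

    # Then move legacy entries to their standard code where that is safe.
    for code, term in terms.items():
        new_code = LEGACY_TO_STANDARD.get(code)
        if new_code is None:
            continue
        existing = migrated.get(new_code)
        if existing is not None and existing["value"] != term["value"]:
            # Hypothetical policy: never overwrite; keep the legacy entry
            # in place and report it so a human can decide.
            migrated[code] = dict(term)
            conflicts.append(code)
            continue
        migrated[new_code] = {"language": new_code, "value": term["value"]}

    return migrated, conflicts


if __name__ == "__main__":
    labels = {
        "en": {"language": "en", "value": "Belgrade"},
        "sr-ec": {"language": "sr-ec", "value": "Београд"},
        "sr-el": {"language": "sr-el", "value": "Beograd"},
    }
    print(migrate_terms(labels))
```

Aliases, monolingual text values and lexeme data are structured differently and would need their own handling; the sketch only covers the simplest case.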
Data that needs migrating:
- sr-ec labels/aliases: 750,000
- sr-el labels/aliases: 1.5 million
- sr-ec descriptions: 24.6 million
- sr-el descriptions: 23.3 million
- Wikidata monolingual text statements, qualifiers and references: unknown, because it's not possible to search for them
- Lexemes: A handful of lemmas, glosses and forms
- sr-el captions on Commons: 6,500
- sr-ec captions on Commons: 3,000
- SDC monolingual text statements, qualifiers and references: unknown, because it's not possible to search for them
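To give a sense of the overall scale, the known figures above can be tallied with a few lines of Python. This is only a lower bound under the assumption that the listed counts are approximate snapshots; the unknown monolingual text categories and the handful of lexeme entries are left out, and the dictionary name `KNOWN_COUNTS` is just an illustrative label.

```python
# Approximate counts taken from the list above; unknown categories are omitted.
KNOWN_COUNTS = {
    "sr-ec labels/aliases": 750_000,
    "sr-el labels/aliases": 1_500_000,
    "sr-ec descriptions": 24_600_000,
    "sr-el descriptions": 23_300_000,
    "sr-el captions on Commons": 6_500,
    "sr-ec captions on Commons": 3_000,
}


def known_total(counts=KNOWN_COUNTS):
    """Sum the known counts to get a lower bound on entries to migrate."""
    return sum(counts.values())


if __name__ == "__main__":
    print(f"{known_total():,}")  # prints 50,159,500
```

Descriptions account for nearly all of the total, so the migration time will depend mostly on how quickly those can be moved.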