Decide how and where to configure Wikidata/Wikibase monolingual text languages
Open, Needs Triage, Public

Description

New languages for monolingual text values on Wikidata can be defined in two places: in the wmgExtraLanguageNames setting in the Wikimedia production config repository, which makes them available for terms (labels/descriptions/aliases) as well, or in WikibaseContentLanguages::getDefaultMonolingualTextLanguages(), which doesn’t affect terms but is also effective on third-party Wikibase installations.

Occasionally, language codes migrate from being available only in monolingual text to being available in terms as well (e.g., T220118), which means adding them to wmgExtraLanguageNames in wikimedia-config. In the past, we’ve removed such codes from the default monolingual text languages afterwards (though often with some delay), since they’re now redundant on Wikidata. However, this means that third-party installations lose support for these codes, even if they might (in theory) already have data using them: the codes are no longer in the default monolingual text languages, and these installations don’t use the Wikimedia configuration where the codes are now listed instead.

Should we do anything about this? I see several possible ways forward:

  • Don’t change anything, continue to remove the redundant language codes. If third-party installations are unlikely to actually use those language codes, there’s no problem. If they do use them, they can still configure $wgExtraLanguageNames in their own settings.
  • Fully duplicate wmgExtraLanguageNames in the default monolingual text languages, so that they’re supported on all installations. This very slightly bloats Wikibase, but it’s not really a big deal.
  • Remove the hard-coded extra default monolingual text languages (i.e. this list) from the Wikibase source code repository and instead move them to the Wikimedia configuration repository as well, so they’re only effective on Wikidata and other Wikimedia installations, not on third-party installations. (Though I think the required configuration variable would have to be introduced first.) Once both lists are configured in the same place, we can remove a language code from the monolingual text languages every time we add it to wmgExtraLanguageNames, even in the same commit.
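Under the first option, a third-party installation that still needs a code dropped from the default monolingual text languages would re-add it in its own LocalSettings.php. A minimal sketch, assuming the usual code-to-display-name mapping of MediaWiki's $wgExtraLanguageNames; the code `xx-example` and its name are placeholders, not real entries:

```php
// Fragment for a third-party wiki's LocalSettings.php (sketch).
// $wgExtraLanguageNames maps language codes to display names. Per the
// description above, listing a code here (the equivalent of what
// wmgExtraLanguageNames does on Wikimedia wikis) should make it available
// for terms as well as monolingual text.
// 'xx-example' is a hypothetical placeholder code.
$wgExtraLanguageNames['xx-example'] = 'Example language';
```

Note that this also exposes the code for labels, descriptions and aliases, which goes beyond what the removed monolingual-only entry provided.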

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 12 2019, 9:50 AM
jhsoby added a subscriber: jhsoby. Apr 12 2019, 9:54 AM

It sounds like we shouldn't really be removing things from WikibaseContentLanguages::getDefaultMonolingualTextLanguages().
For wikidata.org we should have some configuration which can add language codes to that list.
If we want them to be available for all wikibases then of course they should be added to the main list.

A month on, is there a decision yet? :)

> For wikidata.org we should have some configuration which can add language codes to that list.

That sounds like the configuration option mentioned in the third option above, but without the assumption that we’ll move all the hard-coded extra languages there?

> If we want them to be available for all wikibases then of course they should be added to the main list.

I’m not sure if these extra languages are likely to be useful to other Wikibases… my impression is “no, just Wikidata”, but I don’t really know enough third-party Wikibases.

I guess a way forward for now would be to introduce the extra configuration option, and then next time we want to add a new language code for monolingual text, we can think whether it belongs in the hard-coded list or in the new config variable…?
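The proposed option does not exist yet, so any name for it is a guess. A sketch of how it might look in the Wikimedia configuration, assuming a hypothetical repo setting `extraMonolingualTextLanguages` under `$wgWBRepoSettings`; neither that key nor the code `xx-example` is a real Wikibase setting or language code:

```php
// HYPOTHETICAL: Wikibase does not define this setting. It illustrates the
// proposed Wikidata-only list that would be merged with the hard-coded
// WikibaseContentLanguages::getDefaultMonolingualTextLanguages().
$wgWBRepoSettings['extraMonolingualTextLanguages'] = [
	'xx-example', // placeholder code, not a real entry
];
```

With this list living next to wmgExtraLanguageNames, promoting a code to terms could delete it here and add it there in the same commit, while the hard-coded list stays untouched for third-party Wikibases.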

> If we want them to be available for all wikibases then of course they should be added to the main list.

> I’m not sure if these extra languages are likely to be useful to other Wikibases… my impression is “no, just Wikidata”, but I don’t really know enough third-party Wikibases.

I think that assessment is correct. And if it is easy enough to add a new one in a config it doesn't really matter anyway if we ship a few more or not.

> I guess a way forward for now would be to introduce the extra configuration option, and then next time we want to add a new language code for monolingual text, we can think whether it belongs in the hard-coded list or in the new config variable…?

Sounds good to me!

Yupik added a subscriber: Yupik. Jun 27 2019, 10:30 AM