
Import all languages from wikidata
Open, Medium, Public

Description

Important: any such import should come with a bot or query able to keep all of those language pages up to date.

We currently have fewer than 600 languages available on Lingua Libre, and only administrators can add new ones, using LinguaImporter, which just asks for the Wikidata Qid of the language and adds it automatically. There are more than 10,000 languages described on Wikidata.
I wonder why we don't just import all languages from Wikidata, since it can be done automatically. This would allow users to record words on Lingua Libre in their language without having to ask an admin to add it. It is possible that some users wanted to record in their language and gave up when they saw that it was not available.
We could synchronize the databases periodically, so that Lingua Libre gets the labels that were added on Wikidata since the last synchronization, and vice versa (if labels were added on Lingua Libre and not on Wikidata).
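To make the two-way synchronization idea concrete, here is a minimal sketch of the comparison step only. It assumes that the labels of one language item have already been fetched from each site as a plain mapping from interface-language code to label; the function name `diff_labels` and that data shape are hypothetical, not part of LinguaImporter or any existing bot.

```python
def diff_labels(lingualibre, wikidata):
    """Compare the labels of one language item on both sites.

    Both arguments map an interface-language code (e.g. "fr") to a label.
    Returns three dicts:
      - labels to copy to Lingua Libre (present only on Wikidata),
      - labels to copy to Wikidata (present only on Lingua Libre),
      - conflicts (present on both sites with different text), left for a human.
    """
    to_lingualibre = {
        code: label for code, label in wikidata.items() if code not in lingualibre
    }
    to_wikidata = {
        code: label for code, label in lingualibre.items() if code not in wikidata
    }
    conflicts = {
        code: (lingualibre[code], wikidata[code])
        for code in lingualibre.keys() & wikidata.keys()
        if lingualibre[code] != wikidata[code]
    }
    return to_lingualibre, to_wikidata, conflicts


if __name__ == "__main__":
    ll = {"en": "French", "fr": "français"}
    wd = {"en": "French", "de": "Französisch", "fr": "langue française"}
    print(diff_labels(ll, wd))
```

Conflicting labels are reported rather than overwritten, since neither site is obviously the authority when both already have a value.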

Event Timeline

As a side question: why do we even need to import languages from Wikidata?

I don't feel comfortable with a massive import, mainly because the elements classed as languoids are a mix of language families, dialects, varieties, etc., and I wonder whether we have the means to manage all of this. So I agree with importing a list of "official" languages, but I am skeptical about a massive import.

Clearly, importing *all* languages (lato sensu) from Wikidata doesn't seem to be a good idea. On the other hand, it would make sense to import some "important" missing languages. Between 600 and 10,000, there is probably a sensible middle ground.

The important question here is: which "important" languages is Lingua Libre missing? Which ones are most likely to get records?

PS: with a SPARQL query, I found that there are 595 languages with an item in Lingua Libre right now, but only 164 of them have at least one record (including 68 languages with fewer than 10 records), so 431 languages have an item that is currently unused. I'm not sure how much of a blocker the lack of an item really is; do we have actual feedback?
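For reference, the arithmetic behind those figures can be checked in a few lines. The three input numbers are the ones quoted in the comment above; the variable names and the percentages derived from them are only an illustration.

```python
# Figures quoted in the comment above.
languages_with_item = 595
languages_with_records = 164
languages_under_10_records = 68

# Languages that have an item but no record at all.
unused = languages_with_item - languages_with_records

# Languages with at least 10 records.
well_covered = languages_with_records - languages_under_10_records


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(unused, share(unused, languages_with_item))
    print(well_covered, share(well_covered, languages_with_item))
```

So roughly 72% of the existing language items have no record, and about 16% have 10 records or more.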

As a side question: why do we even need to import languages from Wikidata?

What solution do you propose instead?

[...] So I agree with importing a list of "official" languages, but I am skeptical about a massive import. [...]

I agree, there are unwanted elements among the 10k elements; we have to find a middle ground, as Vigneron said.
Here is a list of living languages with an ISO 639-3 code on Wikidata: https://w.wiki/3Ugk (Edit: thanks @VIGNERON for the correction 😉). This is probably not precise enough, but it can be a starting point. What do you think of it?
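As a rough illustration of how such a list could feed an import script, here is a sketch that asks the Wikidata Query Service for every item with an ISO 639-3 code (property P220) and maps each code to its Qid. It does not reproduce the query behind the short link above: in particular, the "living language" filter is left out, and the function names and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# P220 is the Wikidata property "ISO 639-3 code".
# Assumption: no filter on living vs. extinct languages is applied here.
QUERY = """
SELECT ?language ?iso WHERE {
  ?language wdt:P220 ?iso .
}
"""


def fetch_bindings(query=QUERY, endpoint=WDQS_ENDPOINT):
    """Run the query and return the rows of the SPARQL JSON result."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical User-Agent; use one that identifies your own tool.
            "User-Agent": "LinguaLibreLanguageImportSketch/0.1 (example)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def qids_by_iso(bindings):
    """Map each ISO 639-3 code to the Qid taken from the entity URI."""
    result = {}
    for row in bindings:
        qid = row["language"]["value"].rsplit("/", 1)[-1]
        result[row["iso"]["value"]] = qid
    return result


if __name__ == "__main__":
    languages = qids_by_iso(fetch_bindings())
    print(len(languages), "languages with an ISO 639-3 code")
```

The resulting Qids are the same kind of input that LinguaImporter asks for, so a list like this could be reviewed by hand before anything is imported.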

[...] PS: with a SPARQL query, I found that there are 595 languages with an item in Lingua Libre right now, but only 164 of them have at least one record (including 68 languages with fewer than 10 records), so 431 languages have an item that is currently unused. I'm not sure how much of a blocker the lack of an item really is; do we have actual feedback?

Every language deserves to be recorded on Lingua Libre, and IMO users should not have to ask admins to add a language: the language should already be there, or at least they should be able to add it themselves. In terms of user experience, I find it very inconvenient to have to ask a human to add one's language, which can take up to a few days.

Isn’t it possible to use only the Qid from Wikidata, as we do for locations?

Yug triaged this task as High priority. Jul 6 2022, 11:24 AM
Yug updated the task description.
Yug lowered the priority of this task from High to Medium. Jul 20 2022, 9:57 AM