Lingua Libre Bot cannot add recordings on some languages
Open, Low · Public · BUG REPORT

Description

Hi, Lingua Libre Bot came by recently but did not add the pronunciations in Salentin, Central-Southern Calabrian and Southern Cilentine that I recorded two days ago. I guess this is because these three languages do not have an ISO code. Is there a way to fix that?

Event Timeline

This is a rather important topic because minor languages without an ISO 639-3 code are currently not supported by Lingua Libre Bot, so their recordings are not added to Wikimedia projects.

One example of a word in such a language is Q734852, in Central-Southern Calabrian.

For such languages, the only identifier that can be used is the Wikidata Q-ID.

So, one idea for handling such languages in Lingua Libre would be to map the Wikidata Q-ID to the identifier used on the Wiktionary for this language. For example, the French Wiktionary uses "calabrais centro-méridional" to identify this language/dialect.

The question is where to create this map. It could be done project by project (for example in frwiktionary.py), but I think it would be better to have a central place for this map, to avoid replicating the same information everywhere (in each individual project file).
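To make the "central map" idea concrete, here is a minimal sketch in Python. It is only an illustration, not Lingua Libre Bot's actual code: the names `LANGUAGE_MAP` and `wiktionary_identifier` are hypothetical, and `Q00000000` is a placeholder rather than the real Wikidata Q-ID of the language.

```python
# Hypothetical central map: Wikidata Q-ID -> {project name -> identifier used there}.
# "Q00000000" is a placeholder, NOT the real Q-ID of Central-Southern Calabrian.
LANGUAGE_MAP = {
    "Q00000000": {
        "frwiktionary": "calabrais centro-méridional",
    },
}


def wiktionary_identifier(qid, project):
    """Return the identifier a project uses for a language, or None if unmapped."""
    return LANGUAGE_MAP.get(qid, {}).get(project)


print(wiktionary_identifier("Q00000000", "frwiktionary"))
```

With a single shared map like this, each project file would only need to pass its own project name instead of keeping a private copy of the language list.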

@Lepticed7 @Poslovitch any thoughts?

Hi. Is it not possible on wikidata, with a specific property?

In T329568#8613533, @Lepticed7 wrote:

Hi. Is it not possible on wikidata, with a specific property?

I am not sure what you mean. Do you have an example?

On the Wikidata item for Calabrian, maybe we can add a property like "identifier in Wiktionary", with the value "calabrais centro-méridional" and a qualifier "wiki" with the value "French Wiktionary".

Hmmm, I am not sure the Wikidata community will accept such a very specific property. I mean, I do not know how to write a property proposal when the only need for now is Lingua Libre Bot's.

This query currently returns 46 Lingua Libre languages that do not have an ISO 639-3 code.

For now, I will create a JSON file to map all these languages to the Wiktionary codes.
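A JSON file of this kind could be consumed roughly as follows. This is a sketch under assumptions: the file layout, the placeholder `Q00000000`, and the helper names `load_mapping` and `resolve_code` are all hypothetical, and the JSON is inlined as a string so the example is self-contained.

```python
import json

# Hypothetical content of the JSON mapping file (inlined here for the example).
# "Q00000000" is a placeholder Q-ID, not a real one.
MAPPING_JSON = """
{
  "Q00000000": {"frwiktionary": "calabrais centro-méridional"}
}
"""


def load_mapping(text):
    """Parse the JSON mapping text into a dict keyed by Wikidata Q-ID."""
    return json.loads(text)


def resolve_code(iso_code, qid, project, mapping):
    """Prefer the ISO 639-3 code; fall back to the JSON map when there is none."""
    if iso_code:
        return iso_code
    return mapping.get(qid, {}).get(project)


mapping = load_mapping(MAPPING_JSON)
print(resolve_code(None, "Q00000000", "frwiktionary", mapping))
```

Languages that already have an ISO 639-3 code keep their current behaviour in this sketch; only the languages without one go through the JSON lookup.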

In T329568#8613730, @Lepticed7 wrote:

On the Wikidata item for Calabrian, maybe we can add a property like "identifier in Wiktionary", with the value "calabrais centro-méridional" and a qualifier "wiki" with the value "French Wiktionary".

Thinking about it a bit more: as I said, I do not think the Wikidata community will accept such a property. But what about adding this information to the Lingua Libre Wikibase (instead of the Wikidata Wikibase)?

I think that we should get rid of the LiLi Wikibase, so I don’t think that this is a good idea.

Not using the Lingua Libre wikibase anymore? Why not, but I don't think it's going to happen tomorrow. By the way, are there any public discussions (a Phabricator ticket or something) where you can read about the state of thinking on the subject?
So in the meantime, I think adding a property to the Lingua Libre wikibase is not such a bad idea.

By the way, are there any public discussions (a Phabricator ticket or something) where you can read about the state of thinking on the subject?

All tickets are at https://phabricator.wikimedia.org/tag/lingua_libre/, as you know.
I searched the "database" group and found nothing.
Migrating the whole Wikibase to the Commons Wikibase is such a large umbrella issue that I'm not sure we created any ticket for it. We just brainstormed it, mostly Lepticed7 & Poslovitch. We may have a very early document somewhere, though. (Pad? Wiki?)

So I went ahead and created a new property. See an example here. I think we can use the "Wikimedia language code" property to avoid creating a new one.

@Pamputt Thanks! I added two aliases (cf. diff) because they are listed here.

I reverted because only the "main" language code should be listed. Aliases are for Wiktionary's internal purposes and must not be used by LLbot. If we really want to add them to the Lingua Libre Wikibase, then we should mark these values as deprecated.
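As an illustration of why the deprecated rank matters, a bot could skip deprecated values when reading the property. The sketch below assumes statements in the standard Wikibase JSON shape (a `rank` field and `mainsnak.datavalue.value`); the function name `main_language_codes` and the sample values are hypothetical, and this is not a description of what LLbot actually does.

```python
def main_language_codes(statements):
    """Return values of non-deprecated statements, preferred rank first."""
    order = {"preferred": 0, "normal": 1}
    # Keep only statements whose rank is "preferred" or "normal".
    kept = [s for s in statements if s.get("rank", "normal") in order]
    # Stable sort: preferred statements come before normal ones.
    kept.sort(key=lambda s: order[s.get("rank", "normal")])
    return [s["mainsnak"]["datavalue"]["value"] for s in kept]


# Hypothetical sample data: one main code and one alias marked as deprecated.
statements = [
    {"rank": "deprecated", "mainsnak": {"datavalue": {"value": "alias-code"}}},
    {"rank": "normal", "mainsnak": {"datavalue": {"value": "main-code"}}},
]
print(main_language_codes(statements))
```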

@XANA000 does Lingua Libre Bot now add pronunciations in the languages you have listed above?

@Pamputt No. For example, I recorded the word "barcuni" (item on Lingua Libre) in "Calabrais centro-méridional", but as you can see here, there is no audio file in this section.

OK, I will check Lingua Libre Bot, and once the "new language code system" is supported I will run the bot on the French Wiktionary for those languages only.