LinguaLibreBot : Improve handling of Wikidata Lexeme
Open, Medium, Public, Feature

Assigned To

None

Authored By

	Pamputt
	May 24 2019, 7:44 PM

Description

Audio pronunciations have to be added to every form
Most pronunciation-related values are stored as qualifiers of the pronunciation property (P7243) (see details)
~~P407 has to be used as qualifier of P443~~ (P407 should NOT be used for Lexemes)
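As a minimal sketch of the data shape these points describe, the hypothetical helper below (not LinguaLibreBot code) builds one statement in the Wikibase JSON data model: pronunciation (P7243) as the main property, the Commons audio file as a P443 qualifier, and deliberately no P407 qualifier. The `text` argument is a placeholder, since this task does not say what value the P7243 statement itself should hold; the file name follows the usual Lingua Libre pattern but is made up.

```python
import json

# Hypothetical helper (not LinguaLibreBot code): builds one statement in the
# Wikibase JSON data model for a Lexeme form. Pronunciation (P7243) is the
# main property and the Commons audio file goes in a P443 qualifier.
# Deliberately no P407 (language of work or name) qualifier.


def pronunciation_statement(text: str, lang_code: str, audio_file: str) -> dict:
    """Return a P7243 statement dict with a single P443 qualifier.

    `text` is a placeholder for whatever value the P7243 statement should hold.
    """
    mainsnak = {
        "snaktype": "value",
        "property": "P7243",
        "datavalue": {
            "value": {"text": text, "language": lang_code},
            "type": "monolingualtext",
        },
    }
    audio_qualifier = {
        "snaktype": "value",
        "property": "P443",
        # commonsMedia values are plain strings (the Commons file name)
        "datavalue": {"value": audio_file, "type": "string"},
    }
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": mainsnak,
        "qualifiers": {"P443": [audio_qualifier]},
        "qualifiers-order": ["P443"],
    }


if __name__ == "__main__":
    stmt = pronunciation_statement("chat", "fr", "LL-Q150 (fra)-Example-chat.wav")
    print(json.dumps(stmt, ensure_ascii=False, indent=2))
```

Several accents could then be represented as several such statements on the same form, one per audio file.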

Create a lexeme when the trio language + form + POS does not exist on Wikidata. See https://ordia.toolforge.org/language/ : French has only 10,000 lexemes on Wikidata, versus about 50k forms on LL.
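The existence check this item implies could be sketched as a SPARQL ASK query against the Wikidata Query Service. The helper names are hypothetical, Q150 (French) and Q1084 (noun) are only example values, and the query compares the lemma only, not the other forms of the lexeme.

```python
# Hypothetical helpers: build a SPARQL ASK query for the Wikidata Query
# Service that answers whether a Lexeme with this language, lexical category
# and lemma already exists. Only the lemma is compared, not other forms.


def sparql_literal(text: str, lang_code: str) -> str:
    """Quote `text` as a language-tagged SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang_code}'


def trio_exists_query(language_qid: str, category_qid: str,
                      lemma: str, lang_code: str) -> str:
    """Return an ASK query; dct:, wikibase: and wd: are predefined on WDQS."""
    return (
        "ASK {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        f"          wikibase:lexicalCategory wd:{category_qid} ;\n"
        f"          wikibase:lemma {sparql_literal(lemma, lang_code)} .\n"
        "}"
    )


if __name__ == "__main__":
    # Q150 = French, Q1084 = noun (example values)
    print(trio_exists_query("Q150", "Q1084", "chat", "fr"))
```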

See https://lingualibre.org/wiki/LinguaLibre:Chat_room/Archives/2019#Feature_request:_ask_to_reuse_existing_identical_audio_if_available_.28part_2.29 and https://lingualibre.org/wiki/LinguaLibre:Chat_room/Archives/2019#Feature_request:_add_language_qualifier_to_lexeme_form_pronunciation_audio_statement for more details.

EDIT: https://lingualibre.org/wiki/LinguaLibre:Chat_room/Archives/2019#Adding_sounds_to_the_pronunciation_claim_in_Wikidata -> request to update the bot code to take into account the new pronunciation property.

EDIT2: https://lingualibre.fr/wiki/LinguaLibre:Chat_room#Wikidata -> request to add several pronunciations on one Wikidata item so that there are pronunciations with different accents

Related Objects

		Status	Subtype	Assigned	Task
		Open	Feature	None	T224312 LinguaLibreBot : Improve handling of Wikidata Lexeme
		Resolved	Feature	Poslovitch	T274667 Add an option to get list of Lexemes and/or Forms from Wikidata lexicographical data

Event Timeline

Pamputt created this task. May 24 2019, 7:44 PM

P407 isn't needed on lexemes. In fact, I asked them not to add it.

Esc3300 added a project: Wikidata Lexicographical data. May 25 2019, 8:19 PM

Restricted Application added a project: Wikidata. May 25 2019, 8:19 PM

Indeed, for the second point, P443 indicates (in its Constraints section) that lexemes have to be excluded, which makes sense. So Lingua Libre Bot should not add P407. That said, it is not clear why the exclamation mark appears in Strom.

Currently, I don't think the constraint system allows limiting a constraint's scope to items only.

The exception there doesn't go beyond the entity https://www.wikidata.org/wiki/Q51885771

Pamputt updated the task description. Oct 13 2019, 3:34 PM

Yurik updated the task description. Oct 13 2019, 6:44 PM

Yurik updated the task description. Oct 13 2019, 6:49 PM

Theklan subscribed. Oct 25 2019, 4:47 PM

Pamputt updated the task description. May 23 2020, 6:31 AM

Pamputt changed the subtype of this task from "Task" to "Feature Request". Oct 6 2020, 7:12 PM

Ainali subscribed. Dec 12 2020, 11:17 AM

Wouldn't it be better to make these edits through the API rather than through a bot? The tool already asks for OAuth. This has a few advantages: I get credit/blame more directly, and I can inspect the edit immediately (if I have to wait for the bot, I might forget about it).

Pamputt updated the task description. Dec 12 2020, 12:29 PM

Pamputt updated the task description. Dec 12 2020, 12:38 PM

I took over the bot's code. I see there have been discussions here, and the main task "body" might not have been updated to reflect them. Could someone please summarize what I need to do to get this done?

Poslovitch claimed this task. Feb 11 2021, 11:18 AM

Difficult to answer precisely. I would say the best approach would be to ask on Wikidata talk:Lexicographical data what the "Wikidata lexicographers" expect from LinguaLibreBot. I think what is listed in the description is still valid, but it's worth asking the current community. @VIGNERON may have an opinion about that since he is quite involved in lexicographical data.

The simplest solution that would still be hugely valuable would be to feed LinguaLibre with forms of lexemes (preferably through a query), get to record the forms in the regular interface, and then after they are uploaded the files get added with pronunciation audio (P443) on the respective form.
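Feeding LinguaLibre with forms "through a query" could look roughly like the sketch below: a Wikidata Query Service query listing forms of one language's Lexemes that have no P443 yet, plus a helper turning the returned entity URIs into form IDs. Both functions are hypothetical, and the form ID in the example is purely illustrative.

```python
# Hypothetical helpers (not part of Lingua Libre): list forms of Lexemes in
# one language that do not have any pronunciation audio (P443) yet, so they
# could be fed to the recording interface as a word list.


def forms_without_audio_query(language_qid: str, limit: int = 100) -> str:
    """Return a WDQS SELECT query for forms lacking P443."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?form ?representation WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        "          ontolex:lexicalForm ?form .\n"
        "  ?form ontolex:representation ?representation .\n"
        "  FILTER NOT EXISTS { ?form wdt:P443 ?audio . }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def form_id_from_uri(uri: str) -> str:
    """Turn an entity URI such as .../entity/L2330-F1 into 'L2330-F1'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    print(forms_without_audio_query("Q150", limit=50))  # Q150 = French
    print(form_id_from_uri("http://www.wikidata.org/entity/L2330-F1"))
```

Keeping the form ID next to each word in the list is what would later let the upload step target the right form.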

There used to be a way to add Wikidata queries, get a list of lexemes or forms, record them and automatically upload to Commons and Wikidata Lexemes, but this functionality seems to be gone. If we could get that working, the change would be huge.

This would be nice indeed. Actually, I do not remember how it worked before, because the ability to get a list of lexemes or forms would have to be implemented in the Record Wizard (Details step). I've opened a new ticket (T274667) to ask for this feature.

It can certainly be done as a subtask as it would be valuable even without the following addition. But I guess this task is dependent on such a query to be able to know the exact forms to add the files to, so it needs to be connected as a blocker.

That's it. Getting a list of items and recording them is not difficult. Having the bot upload them to the correct form is the tricky part.
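Assuming the form ID from the word list is kept alongside each recording (a hypothetical `Recording` record, not an existing Lingua Libre structure), the upload step reduces to one `wbcreateclaim` call per file that adds a P443 statement to the form, as in the earlier comment. The sketch below only builds the request parameters; the CSRF token would come from the API (`action=query&meta=tokens`), and the same parameters could be sent by the bot or through the user's OAuth session.

```python
import json
import re
from dataclasses import dataclass

# Form IDs look like L<lexeme number>-F<form number>.
FORM_ID = re.compile(r"^L[1-9][0-9]*-F[1-9][0-9]*$")


@dataclass
class Recording:
    """Hypothetical record: assumes the form ID was kept from the word list."""
    file_name: str
    form_id: str


def create_claim_params(rec: Recording, csrf_token: str) -> dict:
    """Build (but do not send) wbcreateclaim parameters adding P443 to a form."""
    if not FORM_ID.match(rec.form_id):
        raise ValueError(f"not a Lexeme form ID: {rec.form_id!r}")
    return {
        "action": "wbcreateclaim",
        "entity": rec.form_id,
        "property": "P443",
        "snaktype": "value",
        # The value parameter is JSON: a commonsMedia value is a JSON string.
        "value": json.dumps(rec.file_name, ensure_ascii=False),
        "token": csrf_token,
        "format": "json",
    }


if __name__ == "__main__":
    rec = Recording("LL-Q150 (fra)-Example-tour.wav", "L2330-F1")  # illustrative
    print(create_claim_params(rec, csrf_token="<token>"))
```

Rejecting anything that is not a form ID keeps the bot from silently attaching audio to a whole Lexeme when the form is unknown.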

Ainali added a subtask: T274667: Add an option to get list of Lexemes and/or Forms from Wikidata lexicographical data. Feb 15 2021, 2:12 PM

Yug moved this task from Query services to Bots and data management on the Lingua-Libre-Legacy board. Feb 15 2021, 3:06 PM

Yug updated the task description. Feb 21 2021, 4:02 PM

@Yug creating lexemes would be very nice but quite difficult. "language + form" is not enough; at least the lexical category is mandatory to create a Lexeme.
Other data are needed to determine whether the lexeme exists and is the same one (for instance for cases like "fils" (threads, L10371) and "fils" (son, L15917), or "tour" (L2330) and "tour" (L2331)).
How could we solve these problems?

See also this diff where the bot confused fils and fils...
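Resolving homographs needs more data than the bot has, but a minimal safeguard against edits like this diff can be sketched: index candidate Lexemes by written representation and accept a match only when exactly one Lexeme has that spelling, leaving the rest for a human. This is a hypothetical matching step, not the bot's current logic; the lexeme IDs are the ones cited above.

```python
from collections import defaultdict

# Hypothetical matching step: index candidate forms by their written
# representation and only accept a match when exactly one Lexeme has it.
# Homographs such as "fils" (L10371, L15917) are left for a human instead of
# being guessed.


def index_by_representation(pairs):
    """Map each representation to the set of lexeme IDs that have it."""
    index = defaultdict(set)
    for lexeme_id, representation in pairs:
        index[representation].add(lexeme_id)
    return index


def resolve(word, index):
    """Return the only matching lexeme ID, or None if absent or ambiguous."""
    matches = index.get(word, set())
    if len(matches) == 1:
        return next(iter(matches))
    return None


if __name__ == "__main__":
    idx = index_by_representation([
        ("L10371", "fils"), ("L15917", "fils"),
        ("L2330", "tour"), ("L2331", "tour"),
    ])
    print(resolve("fils", idx), resolve("tour", idx))  # None None
```

This works at the Lexeme level only: a single Lexeme can also have several forms with the same spelling (French "fils" is both singular and plural), so the same one-match rule would be needed again at the form level.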

@VIGNERON: Thank you! This is exactly the kind of thing we don't know but need to know.
I do have a dataset of 3,000 Chinese lexemes with Hans and Hant (simplified and traditional characters), pinyin (with tones), POS, and French translations.

Also, shouldn't this task be split? It is really vague and is becoming confusing.

Sadly I know the problem, not the solution...

And yes, we should have subtasks for each specific and independent issue.

Yug updated the task description. Feb 21 2021, 4:45 PM

Lea_Lacroix_WMDE subscribed. Apr 12 2021, 9:50 AM

After the recovery from the OVH fire... do we have any news on this? Thanks!

I have recovered the bot and it's now running on Toolforge.
The description of this task still remains a bit unclear to me; I'd appreciate it if someone could split it into smaller-scoped feature requests.

Thanks @Poslovitch! I have one request, and it is in the subtask. There used to be an option to get a list of lexemes and forms and, after recording them, upload the audio automatically to the corresponding form/lexeme. Now this option is gone, so we can add a list of words, but the recordings won't be added to Wikidata lexemes.

Yes, thanks a lot @Poslovitch !

Regarding "Create a lexeme when the trio language + form + POS does not exist on Wikidata", I would abandon that because 1. it would generate almost empty lexemes, and 2. we don't have the POS (part of speech = lexical category) on LinguaLibre, do we? @Pamputt if there is no objection, I will strike that.
A better solution would be to generate a to-do list of recordings that have no corresponding form in a lexeme (a list that humans could then use to carefully create lexemes).
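Such a to-do list could be produced with something as simple as the sketch below: keep the recordings whose (language, word) pair matches no known Lexeme form and write them as CSV. The input shapes are assumptions (the set of known forms might come from a query service result), and the sample data is made up.

```python
import csv
import io

# Hypothetical to-do list generator: keep recordings whose (language, word)
# pair matches no known Lexeme form, and write them as CSV for humans.


def todo_list(recordings, known_forms):
    """recordings: (language_code, word, file_name) tuples;
    known_forms: set of (language_code, representation) pairs."""
    return [r for r in recordings if (r[0], r[1]) not in known_forms]


def to_csv(rows):
    """Serialize rows with a header line; csv quotes fields only if needed."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["language", "word", "file"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    recordings = [  # made-up sample data
        ("fr", "fils", "LL-Q150 (fra)-Example-fils.wav"),
        ("fr", "chat", "LL-Q150 (fra)-Example-chat.wav"),
    ]
    known = {("fr", "fils")}
    print(to_csv(todo_list(recordings, known)), end="")
```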

Poslovitch closed subtask T274667: Add an option to get list of Lexemes and/or Forms from Wikidata lexicographical data as Resolved. May 27 2021, 7:16 AM

Theklan reopened subtask T274667: Add an option to get list of Lexemes and/or Forms from Wikidata lexicographical data as Open. May 27 2021, 7:50 AM

Theklan closed subtask T274667: Add an option to get list of Lexemes and/or Forms from Wikidata lexicographical data as Resolved. May 27 2021, 10:53 AM

Lepticed7 removed a subtask: T283802: Words generator: words from Wikidata Lexemes. Sep 15 2021, 10:15 AM

Yug renamed this task from Improve LinguaLibreBot on Wikidata Lexeme to LinguaLibreBot : Improve handling of Wikidata Lexeme. Jul 6 2022, 1:12 PM

Yug triaged this task as Medium priority. Jul 6 2022, 10:19 PM

@Poslovitch: Removing task assignee as this open task has been assigned for more than two years - See the email sent to task assignee on February 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!
