QuickStatements has partially supported Lexemes since October 2018 (see announcement). Now that we're moving forward with organizations that would like to import data, it would be very helpful to have the full feature ready for big imports.
@Magnus is there anything that is missing from the Wikidata side (API, etc.) to complete the work? Any support you may need from us or volunteers in order to get this done? :)
The patch that resulted in the current support is surprisingly small: https://phabricator.wikimedia.org/R2010:d4bbd816e688d910d28617acf22a8ecc2a725dc5
This makes me think that it is a relatively small task to support Lexeme creation?
I have some JS and Guile experience, but my Python knowledge is very limited, so I guess I'm not best suited for the task. It would be really nice if somebody got it done, though.
Also, the WD API seems to support glosses fine, since it is what powers MachtSinn, which saves glosses using this code: https://github.com/Nudin/makesense/blob/master/app.py#L160
It uses LexData, written in Python by the same author: https://github.com/Nudin/LexData
LexData supports lexeme creation as well as glosses, forms and grammatical forms. That means it supports all the important features of lexemes! Hooray! :)
I recommend adding LexData to QS as it is well written and works in MachtSinn without problems. Anyone up for the task?
The desire to create lexemes via QuickStatements was raised by a FactGrid user in the Wikibase Community User Group mailing list.
It seems there are still relatively few tools for this, most lexicographical data scripts being oriented around improving existing Wikidata content.
With QuickStatements installed on Wikibase.cloud and JS scripts currently forbidden, it'd also be nice to have it as an option that doesn't require, say, WikibaseIntegrator. (Cradle support is another idea, but not within this task.)
Can we clarify what is still missing? Is it only the creation of Lexemes? The dev team had asked for that to be held off a bit initially to take things a bit slowly when Lexemes were introduced. I think by now this is no longer an issue and shouldn't block anything.
To make things clearer, and if I'm not mistaken, this is what is still missing:
- create lexeme
- create form
- create sense
- on lexeme level: edit lemma, lexical category, and language
- on form level: edit representation and grammatical feature
- on sense level: edit gloss and its language
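For anyone picking this up, here is a rough sketch of the JSON data the three "create" steps would need to send. It assumes the Wikibase API modules `wbeditentity` (with `new=lexeme`), `wbladdform` and `wbladdsense`, and payload shapes as I understand them from tools like LexData; the helper names are made up for illustration and nothing is sent over the network.

```python
import json


def lexeme_payload(language_qid, category_qid, lemma_lang, lemma):
    """Data for action=wbeditentity&new=lexeme (shape is an assumption)."""
    return {
        "type": "lexeme",
        "language": language_qid,
        "lexicalCategory": category_qid,
        "lemmas": {lemma_lang: {"language": lemma_lang, "value": lemma}},
    }


def form_payload(rep_lang, representation, feature_qids):
    """Data for action=wbladdform (shape is an assumption)."""
    return {
        "representations": {
            rep_lang: {"language": rep_lang, "value": representation}
        },
        "grammaticalFeatures": list(feature_qids),
    }


def sense_payload(gloss_lang, gloss):
    """Data for action=wbladdsense (shape is an assumption)."""
    return {"glosses": {gloss_lang: {"language": gloss_lang, "value": gloss}}}


if __name__ == "__main__":
    # Placeholder values only; print the JSON that would go in the `data` parameter.
    print(json.dumps(lexeme_payload("Q12107", "Q147276", "br", "Montroulez")))
    print(json.dumps(form_payload("br", "Montroulez", ["Q110786"])))
    print(json.dumps(sense_payload("fr", "commune française"), ensure_ascii=False))
```

The edit operations in the list (lemma, lexical category, language, representation, grammatical feature, gloss) would presumably reuse the same shapes against an existing entity ID.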
IIRC full Lexeme support was postponed so we can start Lexemes on Wikidata "clean", and not as a mass import from Wiktionary or some copyright-dubious source. If we agree that Lexemes have reached critical mass, I can add Lexeme support to QuickStatements, unless you want to wait for that rewrite (who was doing this? Brasil?)
Not sure if it was the main or only reason but probably...
If we agree that Lexemes have reached critical mass, I can add Lexeme support to QuickStatements, unless you want to wait for that rewrite (who was doing this? Brasil?)
That would be great! There are now more Lexemes than there are entries on any Wiktionary (depending on how you count; lexeme forms are more or less the same as Wikt entries...), so I guess we can say we have "reached critical mass".
Yes, @ACorrea-WMB (and others) worked on QS3. But IIRC, QS3 works with the REST API, which does not fully work on Lexemes, so maybe QS2 could be the solution for creating and improving Lexemes more efficiently (and indeed a lot of them need it).
FWIW, the specific issue here is covered in T329096 and as I understand it relates to the desire of the Wikibase team to first rewrite WikibaseLexeme to adhere to the precepts of Hexagonal Architecture, such work needing to be prioritised against other tasks.
Let's do it! :)
(And yeah Wikimedia Brasil. We'll work on adding Lexeme support to the REST API so it can later be integrated in Quickstatements 3. But until then it doesn't hurt to enable it in Quickstatements 2.)
Yay!! As for us at Wikimedia Brasil, as soon as Lexemes are available in the Wikibase REST API we'll quickly work to support it in QuickStatements 3. We already have most of the syntax prepared.
I have added the required Lexeme fixes to both the PHP and the (back-end) Rust version. Specifics are here (not yet in the official help page). Can someone more familiar with Lexemes give it a whirl please, before I announce it to the general population?
First, thanks a lot @magnusmanske ! (you may just have destroyed my future free time but I'm very glad about it ;) ).
I just did a quick try with the following code:
CREATE_LEXEME Q12107 Q147276 br:"Montroulez"
LAST P12846 "m/montroulez/"
LAST ADD_FORM br:"Montroulez" Q110786
LAST ADD_SENSE fr:"commune française"
LAST P5137 Q202368
See what I've got:
It mostly works well, see L1560547. Apparently the last 2 lines didn't work (from what I understand, it tried to add the sense to the form? Or maybe I misunderstood the syntax?).
When redoing the two last lines with the actual Lexeme id, it worked fine:
L1560547 ADD_SENSE fr:"commune française"
LAST P5137 Q202368
Also, less importantly, as visible in the screenshot, it seems that the interface doesn't know these new commands yet: I see "UNKNOWN COMMAND".
@VIGNERON The LAST after ADD_FORM referred to the FORM not to the Lexeme. This is indeed confusing, so I changed it to use LAST to refer to the last Lexeme created. Please try again (with a new Lexeme), it should work now. I also changed the docs accordingly.
I don't think it's true yet that an entire lexeme can be created in one go if form and sense IDs have to be calculated when preparing a QuickStatements batch. "Item for this sense" statements, as the property name suggests, go on senses; they do not go on lexemes as the command examples given show. There are lots of other statements that can go on senses (images, semantic genders, external IDs) and forms (pronunciation, morphological context, external IDs) as well, and what I fear will happen with the current setup is that someone decides to write up a batch thinking, naturally, that LAST refers to the last entity (whether lexeme, form, or sense) created, or that they will simply omit all statements on forms/senses due to an unwillingness to calculate form/sense IDs.
Perhaps some commands LAST_SENSE and LAST_FORM might be introduced to allow statements on those (in addition to grammatical features, form representations, and sense glosses) to be added? e.g.
CREATE_LEXEME Q12107 Q147276 br:"Montroulez"
LAST P12846 "m/montroulez/"
LAST ADD_FORM br:"Montroulez" Q110786
LAST_FORM P898 "[mɔ̃tˈʁuːles]"
LAST_FORM P443 "Br-Montroulez.ogg"
LAST ADD_SENSE fr:"commune française"
LAST_SENSE P5137 Q202368
LAST_SENSE P18 "Vue_de_Morlaix.JPG"
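To make the intended semantics concrete, here is a toy sketch of how such placeholders could be resolved. This is not how QuickStatements does it; `resolve_last` is a hypothetical helper, the commands are passed in already split into fields, and it assumes that forms and senses on a freshly created lexeme are numbered sequentially from F1 and S1.

```python
def resolve_last(commands, new_lexeme_ids):
    """Replace LAST / LAST_FORM / LAST_SENSE in the subject field.

    commands: list of field lists, one per command line.
    new_lexeme_ids: IDs the (simulated) CREATE_LEXEME calls would return.
    Assumption: F/S numbering starts at 1 on each newly created lexeme.
    """
    ids = iter(new_lexeme_ids)
    last = {"LAST": None, "LAST_FORM": None, "LAST_SENSE": None}
    counters = {"ADD_FORM": 0, "ADD_SENSE": 0}
    resolved = []
    for fields in commands:
        if fields[0] == "CREATE_LEXEME":
            # A new lexeme resets everything the placeholders point at.
            last = {"LAST": next(ids), "LAST_FORM": None, "LAST_SENSE": None}
            counters = {"ADD_FORM": 0, "ADD_SENSE": 0}
            resolved.append(list(fields))
            continue
        subject = fields[0]
        if subject in last:
            if last[subject] is None:
                raise ValueError(f"{subject} used before it was defined")
            subject = last[subject]
        if len(fields) > 1 and fields[1] in counters:
            # Track the ID of the form or sense this command would create.
            counters[fields[1]] += 1
            if fields[1] == "ADD_FORM":
                last["LAST_FORM"] = f"{subject}-F{counters['ADD_FORM']}"
            else:
                last["LAST_SENSE"] = f"{subject}-S{counters['ADD_SENSE']}"
        resolved.append([subject] + list(fields[1:]))
    return resolved


if __name__ == "__main__":
    batch = [
        ["CREATE_LEXEME", "Q12107", "Q147276", 'br:"Montroulez"'],
        ["LAST", "ADD_FORM", 'br:"Montroulez"', "Q110786"],
        ["LAST_FORM", "P443", '"Br-Montroulez.ogg"'],
        ["LAST", "ADD_SENSE", 'fr:"commune française"'],
        ["LAST_SENSE", "P5137", "Q202368"],
    ]
    # "L123" is a placeholder ID, not a real creation result.
    for line in resolve_last(batch, ["L123"]):
        print(" ".join(line))
```

With this reading, LAST always means the lexeme, so nobody has to calculate form or sense IDs when preparing a batch.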
I have implemented the idea of @Mahir256 in both PHP and Rust, and put everything live. Please test. Note that I might not be able to reply until tomorrow.
I did not use the new syntax (LAST_SENSE and LAST_FORM), just a simple "add" of a value for a property on ~800 lexeme forms.
The temporary batch was working fine (see https://www.wikidata.org/w/index.php?title=Lexeme:L1523693&oldid=2475322672 as an example). But then it started erroring with the following API response in the Firefox dev tools:
{
  "status": "OK",
  "command": {
    "action": "add",
    "item": "L1522609-F1",
    "property": "P7481",
    "what": "statement",
    "new_statement": 0,
    "datavalue": {
      "type": "wikibase-entityid",
      "value": {
        "entity-type": "item",
        "id": "Q138786802"
      }
    },
    "meta": {
      "message": "",
      "status": "RUN",
      "id": 0
    },
    "summary": "#temporary_batch_1774469814385",
    "status": "error",
    "message": "Item L1522609-F1 is not available"
  },
  "last_item": "",
  "last_form": "",
  "last_sense": ""
}
I could somehow get it working again by starting another temporary batch, but it would start failing again after ~100 forms.
Now, even if I try a simple batch with a single line, it fails with a similar "Item Lxxx-Fxx is not available" error.
Attached is the initial, full set of QSv1 commands.
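Note that the response reports "status": "OK" at the top level while the failure is only visible inside "command". For anyone scripting against this, here is a small sketch that checks both levels. The field names are taken from the response quoted above; whether other failure modes use the same fields is an assumption, and `command_error` is a hypothetical helper.

```python
import json


def command_error(response_text):
    """Return an error message if the command failed, else None.

    Checks the top-level status first, then the per-command status,
    since the top level can be "OK" while the command itself failed.
    """
    response = json.loads(response_text)
    if response.get("status") != "OK":
        return str(response.get("status"))
    command = response.get("command", {})
    if command.get("status") == "error":
        return command.get("message") or "unknown error"
    return None


if __name__ == "__main__":
    # Trimmed copy of the response quoted above.
    sample = """
    {
      "status": "OK",
      "command": {
        "action": "add",
        "item": "L1522609-F1",
        "property": "P7481",
        "status": "error",
        "message": "Item L1522609-F1 is not available"
      },
      "last_item": ""
    }
    """
    print(command_error(sample))
```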
Hi,
I had the same "Item LXXX is not available" problem when trying to create L1560879
The original code started with
CREATE_LEXEME Q12107 Q147276 br:"Douarnenez"
LAST P11068 "douarnenez"
The second line failed and then I tried
L1560879 P11068 "douarnenez"
which also failed...
