
[EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites
Open, Needs Triage, Public

Description

Now that we have lexicographical data on Wikidata, it should be possible to reuse it on Wiktionaries through Lua functions.

Open questions for the communities:

  • what kind of data do you want to access?
  • what are the use cases you could imagine?
  • what Lua functions would be helpful for you?

Event Timeline

deryckchan renamed this task from [EPIC] Access to Wikidata's lexicographical data from Wiktionaries to [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites. Jan 26 2019, 11:09 PM

Hello @deryckchan, and thanks for creating this task!
We're actually considering this for the coming year, but before we start developing anything, we need to better understand what people would like to do with the data, how they would like to display it on their wiki, and what kind of Lua functions they would need.
If you already have some ideas, or use cases, feel free to share :)

As I wrote on T213941:

I would envisage this being done using parser functions, similar to {{#property:}} for Q-items.

As the first step, we should make {{#statements:}} and {{#property:}} work for Lexemes too. For example, we should make these work:

  • {{#statements:P5974|from=Q4115189}} (i.e. domain:item; datatype:Lexeme/Sense/Form) currently outputs the Lexeme-Sense ID "L123-S2". It should output the lemma or gloss
  • {{#statements:P5974|from=L123-S2}} (i.e. domain:Lexeme/Sense/Form; datatype:anything) throws a parser error at the moment

This will address migration blocks like https://www.wikidata.org/wiki/Wikidata:Properties_for_deletion#Property:P2521, where the lack of a feature to call Lexemes in Wikipedias is blocking the migration of a property.
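
To make the first bullet concrete, here is a minimal Python sketch of the lookup that would turn an ID such as "L123-S2" into a lemma or gloss. It is not the parser function's implementation. The `LEXEME` dictionary is hand-written placeholder data in the general shape of Wikibase's lexeme JSON, and `display_text` is a hypothetical helper name.

```python
import re

# Hypothetical, hand-written lexeme in the general shape of Wikibase's
# lexeme JSON. The ID L123 and all values are placeholders, not real data.
LEXEME = {
    "id": "L123",
    "lemmas": {"en": {"language": "en", "value": "example"}},
    "senses": [
        {"id": "L123-S1",
         "glosses": {"en": {"language": "en", "value": "first placeholder gloss"}}},
        {"id": "L123-S2",
         "glosses": {"en": {"language": "en", "value": "second placeholder gloss"}}},
    ],
    "forms": [
        {"id": "L123-F1",
         "representations": {"en": {"language": "en", "value": "examples"}}},
    ],
}

# Matches "L123", "L123-S2" or "L123-F1"; group 2 is "S", "F" or None.
LEXEME_ID = re.compile(r"^(L\d+)(?:-([SF])\d+)?$")


def display_text(lexeme, entity_id, lang):
    """Return the lemma, gloss or representation for an ID, else the raw ID."""
    match = LEXEME_ID.match(entity_id)
    if match is None or match.group(1) != lexeme["id"]:
        return entity_id  # not part of this lexeme: leave the ID untouched
    kind = match.group(2)
    if kind is None:
        term = lexeme["lemmas"].get(lang)
    else:
        key, field = ("senses", "glosses") if kind == "S" else ("forms", "representations")
        sub = next((s for s in lexeme[key] if s["id"] == entity_id), None)
        term = sub[field].get(lang) if sub else None
    # Fall back to the raw ID when there is no text in the requested language.
    return term["value"] if term else entity_id
```

Falling back to the raw ID keeps today's output for anything the lookup cannot resolve, so pages would not get worse while coverage grows.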

My suggestion would be to simply mirror the functions that are currently available for Q-items, either by duplicating the code that does that and changing "Q" to "L" or, better, by generalizing it so that it works for all of Wikibase's namespaces (P/Q/L/M/...). Then it can be built upon on-wiki as needed (e.g., through Module:WikidataIB). That would also help structured data on Commons, and future projects using Wikibase.
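
As a rough illustration of the "generalize rather than duplicate" option, the Python sketch below classifies an entity ID by its prefix so that one code path could serve every entity type. The prefix table reflects the letters mentioned above (P/Q/L/M) plus the sense and form suffixes. The function name `entity_type` is hypothetical, and other Wikibase installations may use different prefixes.

```python
import re

# Prefix-to-type table. These prefixes are an assumption based on the
# P/Q/L/M letters discussed in this task; they are not read from any config.
# Sub-entity patterns come first so "L123-S2" is not mistaken for a lexeme.
ENTITY_PATTERNS = [
    ("sense", re.compile(r"^L[1-9]\d*-S[1-9]\d*$")),
    ("form", re.compile(r"^L[1-9]\d*-F[1-9]\d*$")),
    ("lexeme", re.compile(r"^L[1-9]\d*$")),
    ("item", re.compile(r"^Q[1-9]\d*$")),
    ("property", re.compile(r"^P[1-9]\d*$")),
    ("mediainfo", re.compile(r"^M[1-9]\d*$")),
]


def entity_type(entity_id):
    """Return the entity type name for an ID, or None if it is not recognized."""
    for name, pattern in ENTITY_PATTERNS:
        if pattern.match(entity_id):
            return name
    return None
```

With a table like this, supporting a new entity type means adding one row instead of copying a whole set of functions.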

Pamputt added a subscriber: Pamputt. Jun 4 2019, 8:25 PM

For the French Wiktionary, I do not know what the community will decide, but if we decide one day to use the Lexeme data from Wikidata, it will most probably be for the Forms (conjugation, inflection, declension, etc.). I think we will never use the Senses. So what Mike Peel proposed just above makes sense, as it gives full flexibility.

This will address migration blocks like https://www.wikidata.org/wiki/Wikidata:Properties_for_deletion#Property:P2521, where the lack of a feature to call Lexemes in Wikipedias is blocking the migration of a property.

Note that the discussion has been archived. It is now available here: https://www.wikidata.org/wiki/Wikidata:Requests_for_deletions/Archive/2019/Properties/1#female_form_of_label_(P2521)

RexxS added a subscriber: RexxS. Jun 4 2019, 9:53 PM

I'd like to have a complete collection of API calls exposed to Scribunto. I should be able to get the following:

getEntity - the whole object (probably expensive, but would mostly be used to look at structures)
getLanguage - entity ID like Q1860 for 'English'
getLexicalCategory - entity ID like Q24905 for 'verb'
getStatements - table
getSenses - table
getForms - table (each value is an entity ID along with qualifiers 'Grammatical features', a table of entity IDs like Q110786 for 'singular', etc.)

That would be enough, in my opinion, for me to write almost any Scribunto code that the folks at the Wiktionaries and other sites could ask for (until you start changing the structure of the lexemes, of course). If all of these returned values are normal q-numbers (entity IDs), I already have plenty of code to handle getting labels, sitelinks, etc. to display in the local or preferred language, so we probably wouldn't need to worry about further internationalisation.
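
To show what the accessors listed above might return, here is a small Python model of them. It is only a sketch: the real functions would be Lua calls exposed to Scribunto, the snake_case names are Python stand-ins for the proposed names, and `LEXEME` is placeholder data in the general shape of Wikibase's lexeme JSON that reuses the example entity IDs from the list.

```python
# Placeholder lexeme. L123 and the text values are hypothetical; Q1860,
# Q24905 and Q110786 are the example entity IDs from the list above.
LEXEME = {
    "id": "L123",
    "language": "Q1860",          # 'English'
    "lexicalCategory": "Q24905",  # 'verb'
    "claims": {},
    "senses": [
        {"id": "L123-S1",
         "glosses": {"en": {"language": "en", "value": "placeholder gloss"}},
         "claims": {}},
    ],
    "forms": [
        {"id": "L123-F1",
         "representations": {"en": {"language": "en", "value": "placeholder"}},
         "grammaticalFeatures": ["Q110786"],  # 'singular'
         "claims": {}},
    ],
}


def get_entity(lexeme):
    """Whole object; mainly useful for inspecting the structure."""
    return lexeme


def get_language(lexeme):
    """Entity ID of the lexeme's language."""
    return lexeme["language"]


def get_lexical_category(lexeme):
    """Entity ID of the lexical category."""
    return lexeme["lexicalCategory"]


def get_statements(lexeme, property_id=None):
    """All statements, or only those for one property ID."""
    claims = lexeme.get("claims", {})
    return claims if property_id is None else claims.get(property_id, [])


def get_senses(lexeme):
    """List of sense objects."""
    return list(lexeme.get("senses", []))


def get_forms(lexeme):
    """List of form IDs, each with its grammatical features as entity IDs."""
    return [
        {"id": form["id"],
         "grammaticalFeatures": list(form.get("grammaticalFeatures", []))}
        for form in lexeme.get("forms", [])
    ]
```

Because every accessor returns plain entity IDs or tables of them, existing label and sitelink code written for Q-items could be applied to the results unchanged.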

We have a bunch of words and forms uploaded in Basque, probably at least 5,000 of them, and as euwikt is quite dead, this could be a good boost to the project.

If someone wants to use the Basque Wiktionary for testing purposes, let's talk about it.

Iniquity added a subscriber: Iniquity.