
[Wikidata] Provide a feature to link Item labels to Lexemes
Open, Needs Triage, Public, Feature

Description

Feature summary:
I would like to be able to link Wikidata Item labels to corresponding lexemes.

For instance, Item Q467 has the English label "woman" and the Hebrew label אישה. These should be linked to corresponding lexemes L3338 and L63925.

For multi-word labels, like "Eiffel Tower" of Q243: ideally, all words of the label should be linked to their corresponding lexemes, but if that is not possible, at least the head word should be linked to a lexeme, e.g. L12700.
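
To make the request concrete, here is a minimal sketch of what a label annotated with lexeme links could look like as a data structure. It is only an illustration, not an existing Wikibase data model: the class names `LabelToken` and `AnnotatedLabel` and their fields are hypothetical, and it assumes that L12700 is the lexeme for the head word "Tower".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabelToken:
    """One word of a label, optionally linked to a lexeme (hypothetical model)."""
    text: str
    lexeme_id: Optional[str] = None  # None when no lexeme link is available
    is_head: bool = False            # marks the head word of a multi-word label


@dataclass
class AnnotatedLabel:
    """An Item label in one language, enriched with per-word lexeme links."""
    item_id: str
    language: str
    tokens: List[LabelToken]

    def text(self) -> str:
        # The plain label, as shown today, is recoverable from the tokens.
        return " ".join(token.text for token in self.tokens)

    def head_lexeme(self) -> Optional[str]:
        # Minimal requirement from the description: the head word is linked.
        for token in self.tokens:
            if token.is_head:
                return token.lexeme_id
        return None

    def linked_lexemes(self) -> List[str]:
        return [t.lexeme_id for t in self.tokens if t.lexeme_id is not None]


# Single-word label: Q467 "woman" linked to L3338.
woman = AnnotatedLabel("Q467", "en", [LabelToken("woman", "L3338", is_head=True)])

# Multi-word label: only the head word is linked (assumed here to be L12700).
eiffel_tower = AnnotatedLabel(
    "Q243",
    "en",
    [LabelToken("Eiffel"), LabelToken("Tower", "L12700", is_head=True)],
)
```

In this sketch, fully linking a multi-word label just means filling in `lexeme_id` for every token, so the "head word only" case and the "all words" case share one structure.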

Use case(s):

It is currently not possible to navigate from an item to a corresponding lexeme, without using "reverse link lookup", e.g. clicking on the "What links here" link in the graphical UI.

In Scribunto, the WikibaseLexeme API does not provide such a reverse lookup, and thus it is not possible to go from an item to a corresponding lexeme (the reverse is possible, since Lexemes may have an "Item for this sense" property).

Benefits:

  • Implementing this would allow for easy navigation from items to corresponding lexemes, increasing the informational value associated with the current "label" annotations.
  • In particular, it would allow linking from items to lexemes in the Scribunto development environment.
  • This would facilitate the development of the Abstract Wikipedia NLG system.
  • It would possibly make properties such as "female form of label" and "male form of label" redundant, as such information could be given in the linked lexeme.

Event Timeline

AGutman renamed this task from [Wikidata] Linking items labels to lexemes to [Wikidata] Linking Item labels to Lexemes. Oct 7 2022, 2:35 PM
AGutman created this task.

The items will have many links then. For water (Q29053744) there are currently 849 language links (see https://ordia.toolforge.org/Q29053744). I think that will clutter things too much.

Ordia provides the reverse lookup:

https://ordia.toolforge.org/Q467 (woman)

https://ordia.toolforge.org/Q243 (Eiffel Tower)

Do you mean that this will clutter the UI or the database itself? If the former, this can be solved by selectively showing these links in the UI. If you are referring to cluttering the database itself, I agree this would require extra capacity, but I don't think it is unmanageable.

The problem with reverse links is twofold. First, reverse lookup is not available in all APIs (specifically, the Scribunto API). Second, a reverse lookup is slow, since you need to traverse the entire lexeme database, unless you have a reverse index. But having a reverse index is equivalent to having these back-links in the main database anyhow.
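
A rough sketch of the latency argument, using toy in-memory data rather than any real Wikibase API. The sense IDs (such as `L3338-S1`) and the function names are hypothetical and only serve the illustration; the mapping stands in for "Item for this sense" statements on lexeme senses.

```python
from collections import defaultdict

# Toy forward data: sense ID -> Item ID, as an "Item for this sense"
# statement would record it. The sense IDs are made up for this example.
SENSE_TO_ITEM = {
    "L3338-S1": "Q467",
    "L63925-S1": "Q467",
    "L33929-S1": "Q39715",
    "L33929-S2": "Q1409761",
}


def senses_by_scan(item_id, sense_to_item):
    """Reverse lookup without an index: every sense is visited per query."""
    return sorted(s for s, q in sense_to_item.items() if q == item_id)


def build_reverse_index(sense_to_item):
    """One pass over all senses, storing the back-links Item -> senses."""
    index = defaultdict(list)
    for sense_id, item_id in sense_to_item.items():
        index[item_id].append(sense_id)
    return dict(index)


def senses_by_index(item_id, index):
    """Reverse lookup with the index: a single dictionary access."""
    return sorted(index.get(item_id, []))


REVERSE_INDEX = build_reverse_index(SENSE_TO_ITEM)
```

Note that `REVERSE_INDEX` holds exactly the Item-to-sense back-links, which is why keeping such an index amounts to storing the links on the Item side.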

I mean both the UI and the database. It is a question of how serious this is: there are already labels and descriptions for a large number of languages.

As said, both issues can be solved. The issue is that, as currently construed, the labels/descriptions are not really machine-readable: currently they are usable mostly for human consumption.

Having only multi-lingual labels in ontologies, without backing linguistic information, is known to have limitations: see "A Review of Multilingualism in and for Ontologies" by Gillis-Webber and Keet: https://arxiv.org/pdf/2210.02807.pdf

Jdforrester-WMF renamed this task from [Wikidata] Linking Item labels to Lexemes to [Wikidata] Provide a feature to link Item labels to Lexemes. Oct 11 2022, 6:40 PM

Two issues:

  • Would one be interested in linking senses rather than lexemes? A lexeme does not necessarily correspond to one Q-item. For instance, the Danish lexeme https://ordia.toolforge.org/L33929 could be a lighthouse (Q39715) or a heating unit (Q1409761). Is the sense the more appropriate level?
  • Could the feature be implemented as just a Wikidata property that would link the lexeme/sense?
Yes, you're right, it makes more sense to link to a sense.

However, it may prove problematic for items with multi-word labels (which should thus link to multiple lexemes).
From a knowledge-representation point of view, I think it makes more sense to represent this as annotations on the labels, as it enriches the labels themselves with extra value, and we don't need to validate coherence between the labels and the extra property.

As a stop-gap solution, I'm suggesting we use the literal translation property to link items to senses. As an example of its usage, I've linked Q467 to Hebrew L63925. This seems to work well for cases where the item corresponds to a single lexeme (sense).

For multi-word labels, I tried to use this property qualified by the combines lexemes property, but it doesn't seem to work so well, as the UI doesn't allow adding multiple lexemes in this context.

There is a discussion of my stop-gap solution on the item where I added the literal translation property: https://www.wikidata.org/wiki/Talk:Q467#Lexemes

To remedy this, I have created a property proposal, Verbalization by lexeme, which can be used for this purpose.