Investigation: How to make it possible to format a Lexeme used in a statement
Closed, Resolved (Public)

Description

The goal of this investigation is to identify possible ways to format a lexeme used in a statement, and to evaluate those possibilities.

The outcome of the investigation would be:

  • A description of the changes needed, which could then be turned into an implementation (depending on the case, this could be prose, experimental code, etc.).
  • A decision made by the developers on the possibilities, in the form of a comment on this ticket, and a follow-up task, or tasks, created based on the solution/approach chosen.

Event Timeline

WMDE-leszek triaged this task as Normal priority. Jan 19 2018, 4:57 PM
WMDE-leszek raised the priority of this task from Normal to High.
WMDE-leszek created this task.
WMDE-leszek moved this task from Backlog to In Progress on the Wikidata-Sprint-2018-01-17 board.

Change 406306 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/WikibaseLexeme@master] [DNM] Display lexemes in statements

https://gerrit.wikimedia.org/r/406306

Change 406608 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/WikibaseLexeme@master] [DNM] Move the lexeme "displaying" logic to the LexemePresenter

https://gerrit.wikimedia.org/r/406608

WMDE-leszek added a comment. Edited Jan 29 2018, 5:47 PM

Two proof-of-concept patches uploaded. They are intentionally oversimplified and not meant to be merged in any way.

https://gerrit.wikimedia.org/r/406306 adds a formatter class to display lexemes as HTML.
https://gerrit.wikimedia.org/r/406608 moves the "logic" of how the lexeme is to be displayed (i.e. which of its data to use, etc.) out to a service. The service could then be used elsewhere, e.g. in non-HTML formatters. Note: the "custom" HTML formatter is going to be needed anyway, given the specific HTML display (i.e. the title attribute of the link).
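
To illustrate the split described above, here is a minimal sketch in Python (not the PHP of the actual patches). All class names, the reduced Lexeme shape, and the "/wiki/Lexeme:" URL pattern are assumptions made for the example. A presenter service decides which lexeme data to show, and both an HTML formatter and a plain-text formatter reuse it; only the HTML formatter knows about the link and its title attribute.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Lexeme:
    """Hypothetical, heavily reduced stand-in for a lexeme entity."""
    id: str
    lemma: str
    language: str
    lexical_category: str


class LexemePresenter:
    """Decides WHICH lexeme data is shown; knows nothing about output formats."""

    def label(self, lexeme: Lexeme) -> str:
        return lexeme.lemma

    def description(self, lexeme: Lexeme) -> str:
        return f"{lexeme.language} {lexeme.lexical_category}"


class LexemeHtmlFormatter:
    """HTML-specific part: a link whose title attribute carries the description."""

    def __init__(self, presenter: LexemePresenter) -> None:
        self._presenter = presenter

    def format(self, lexeme: Lexeme) -> str:
        label = escape(self._presenter.label(lexeme))
        title = escape(self._presenter.description(lexeme))
        # The URL pattern is illustrative only.
        href = escape(f"/wiki/Lexeme:{lexeme.id}")
        return f'<a href="{href}" title="{title}">{label}</a>'


class LexemePlainTextFormatter:
    """Non-HTML consumer of the same presenter."""

    def __init__(self, presenter: LexemePresenter) -> None:
        self._presenter = presenter

    def format(self, lexeme: Lexeme) -> str:
        label = self._presenter.label(lexeme)
        description = self._presenter.description(lexeme)
        return f"{label} ({description})"


presenter = LexemePresenter()
example = Lexeme("L1", "hard", "English", "adjective")
html_output = LexemeHtmlFormatter(presenter).format(example)
text_output = LexemePlainTextFormatter(presenter).format(example)
```

Changing what is displayed (for example, adding the language code) would then only touch the presenter, while the two formatters stay as they are.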

WMDE-leszek removed WMDE-leszek as the assignee of this task. Jan 29 2018, 5:47 PM
WMDE-leszek moved this task from In Progress to Review on the Wikidata-Sprint-2018-01-17 board.

Change 406612 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] [DNM] Add EntityPresenter interface and simple implementation

https://gerrit.wikimedia.org/r/406612

WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement to Investigation: How to make it possible to format a Lexeme used in a statement (days: 1). Jan 30 2018, 2:00 PM
WMDE-leszek moved this task from Backlog to Review on the Wikidata-Sprint-2018-01-31 board.
WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement (days: 1) to Investigation: How to make it possible to format a Lexeme used in a statement (days: 2). Jan 31 2018, 11:23 AM
WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement (days: 2) to Investigation: How to make it possible to format a Lexeme used in a statement (days: 3). Feb 1 2018, 11:13 AM
WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement (days: 3) to Investigation: How to make it possible to format a Lexeme used in a statement (days: 4). Feb 2 2018, 11:21 AM

In addition to a few more specific comments I left on the Gerrit patches, here is what I think about the different approaches:

LexemeIdHtmlFormatter

https://gerrit.wikimedia.org/r/406306 registers a formatter via the existing registry infrastructure Wikibase provides (namely "formatter-factory-callback"). This formatter can exclusively turn LexemeId objects into HTML strings. This formatter is automatically used in almost all cases we care about (LexemeId references in statements, special pages, even summary lines, I believe).
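
As a rough illustration of the registry idea, the following Python sketch shows a factory callback being registered per data type and asked for a formatter for a given output format. This is an analogy only, not Wikibase's actual PHP definitions: the registry class, the "wikibase-lexeme" key, the "text/html" format string, and the URL pattern are all assumptions made for the example.

```python
from typing import Callable, Dict

# A formatter turns an entity ID string into output text.
Formatter = Callable[[str], str]
# A factory callback receives the requested output format and returns a formatter.
FormatterFactory = Callable[[str], Formatter]


class FormatterRegistry:
    """Hypothetical registry mapping a data type to a formatter factory callback."""

    def __init__(self) -> None:
        self._factories: Dict[str, FormatterFactory] = {}

    def register(self, data_type: str, factory: FormatterFactory) -> None:
        self._factories[data_type] = factory

    def get_formatter(self, data_type: str, output_format: str) -> Formatter:
        try:
            factory = self._factories[data_type]
        except KeyError:
            raise LookupError(f"no formatter registered for {data_type!r}") from None
        return factory(output_format)


def lexeme_formatter_factory(output_format: str) -> Formatter:
    """Returns an HTML formatter for HTML output, a plain ID formatter otherwise."""
    if output_format == "text/html":
        # Illustrative URL pattern; a real formatter would also look up a label.
        return lambda lexeme_id: f'<a href="/wiki/Lexeme:{lexeme_id}">{lexeme_id}</a>'
    return lambda lexeme_id: lexeme_id


registry = FormatterRegistry()
registry.register("wikibase-lexeme", lexeme_formatter_factory)

html_formatter = registry.get_formatter("wikibase-lexeme", "text/html")
plain_formatter = registry.get_formatter("wikibase-lexeme", "text/plain")
```

Because callers only go through the registry, every place that formats values of this data type picks up the new formatter without being changed itself.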

Note that there was already a LexemeIdHtmlFormatter. The patch just replaces it with another one that does not hard-code so much knowledge, but relies on a message instead.

I like this approach very much because it is so self-contained. I suggest implementing this no matter how we are going to implement derived labels (see T175030) later. The only detail we need to think about is a proper caching layer for this formatter (including prefetching), to avoid fetching possibly hundreds of Lexemes one by one via an EntityLookup, which is what the code in the patch currently does.
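
The caching and prefetching concern could be addressed along the following lines. This is a minimal Python sketch under stated assumptions, not code from the patch: the class name, the batch-fetch callback, and the fallback to the raw ID are all hypothetical. The idea is that the page collects all lexeme IDs it is about to render, resolves them in one batch, and the formatter then reads from the cache.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical batch backend: takes a list of IDs, returns labels for those it knows.
BatchFetch = Callable[[List[str]], Dict[str, str]]


class PrefetchingLabelLookup:
    """Caches lexeme labels and resolves many IDs in a single backend call."""

    def __init__(self, fetch_many: BatchFetch) -> None:
        self._fetch_many = fetch_many
        self._cache: Dict[str, str] = {}

    def prefetch(self, lexeme_ids: Iterable[str]) -> None:
        # De-duplicate while preserving order, and skip IDs that are already cached.
        missing = [i for i in dict.fromkeys(lexeme_ids) if i not in self._cache]
        if not missing:
            return
        fetched = self._fetch_many(missing)
        for lexeme_id in missing:
            # Unknown IDs fall back to the ID itself, so they are not re-fetched.
            self._cache[lexeme_id] = fetched.get(lexeme_id, lexeme_id)

    def get_label(self, lexeme_id: str) -> str:
        if lexeme_id not in self._cache:
            # Cache miss: degrade to a single-ID fetch.
            self.prefetch([lexeme_id])
        return self._cache[lexeme_id]


# Usage with a fake backend that records how often it is called.
calls: List[List[str]] = []


def fake_fetch_many(ids: List[str]) -> Dict[str, str]:
    calls.append(list(ids))
    store = {"L1": "hard", "L2": "soft"}
    return {i: store[i] for i in ids if i in store}


lookup = PrefetchingLabelLookup(fake_fetch_many)
lookup.prefetch(["L1", "L2", "L1", "L9"])
labels = [lookup.get_label(i) for i in ["L1", "L2", "L9"]]
```

In this sketch the backend is hit once for the whole page, regardless of how many statements reference lexemes.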

EntityPresenter

The approach presented in https://gerrit.wikimedia.org/r/406608 is not fundamentally different; it builds on top of the first. It still is just a "formatter-factory-callback" utilizing the existing ….datatypes.php infrastructure.

It introduces the concept of "primary" and "secondary labels" that come in pairs. The patch starts using this concept in a single place in the LexemeIdHtmlFormatter. At this point this is not much more than some formatter logic factored out into a separate class. The only visible difference is that the messages translators are going to see are constructed in another, more modular way.

As the patch is now, it does not even need the interfaces in Wikibase. This can be seen in https://gerrit.wikimedia.org/r/406612: the code introduced there is not used in Wikibase. I believe the "primary/secondary label" concept is meant to be used in more places. But at this point I cannot tell where.


What the patches don't cover (possibly because of this task's wording) are:

  • How to put derived Lexeme labels, descriptions, and "aliases" a.k.a. "secondary labels" (whatever that is on a Lexeme, maybe the Forms?) into wb_terms and search indexes?
  • Is an EntityIdFormatter (possibly a plain text one) enough to have proper Lexeme labels in RDF exports?

What the patches don't cover (possibly because of this task's wording) are:

  • How to put derived Lexeme labels, descriptions, and "aliases" a.k.a. "secondary labels" (whatever that is on a Lexeme, maybe the Forms?) into wb_terms and search indexes?
  • Is an EntityIdFormatter (possibly a plain text one) enough to have proper Lexeme labels in RDF exports?

This is indeed not done. Are these necessary to get lexemes displayed in statements on a lexeme (or item) page?
I suspect wb_terms would be related to searching for the lexeme when adding it to a statement. But searching is a different thing, which we are actually going to jump on after solving the display part.
But I might easily have missed something, so if there are more requirements, they should indeed be added.

I'm bringing this up because this ticket is linked to the "derived labels" ticket T175030, but I do not think this ticket here will help in any way with that. Having a better LexemeIdFormatter will probably not help much when it comes to the search integration. And RDF is even unrelated to HTML, so having a better HTML formatter can't help there either.

This is not an issue we need to care about now. What the first patch proposes is totally fine. I think at this point we should only care about T184997 and get it done (ideally with proper prefetching in place, which is currently missing).

WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement (days: 4) to Investigation: How to make it possible to format a Lexeme used in a statement (days: 5). Feb 5 2018, 11:51 AM
WMDE-leszek removed a project: Patch-For-Review.

This investigation is done. Outcome:

  • We will go with the approach of moving the "displaying the entity" logic to a service, i.e. something along the lines of the approach outlined in https://gerrit.wikimedia.org/r/#/c/406608/. To keep in mind: the service would probably generate the "display labels" in some structured way, not as primitive strings like in the proof of concept.
  • For scalability and performance reasons, there should be a way to get the data needed for displaying the lexeme without needing to load the "whole lexeme" (open question: does it need a separate investigation?).
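
The two points above could look roughly like the following Python sketch. It is an assumption-laden illustration rather than the agreed design: the field names, the lookup interface, and the in-memory implementation are hypothetical. The display label is a small structured value, and the lookup returns only that value, so a caller never has to load the full lexeme.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class LexemeDisplayLabel:
    """Structured display data; which fields belong here is an assumption."""
    lemma: str
    language: str
    lexical_category: str


class LexemeDisplayLookup(Protocol):
    """Returns only the display data, never the whole lexeme."""

    def get_display_label(self, lexeme_id: str) -> Optional[LexemeDisplayLabel]:
        ...


class InMemoryDisplayLookup:
    """Stand-in for a lightweight store (e.g. a dedicated table or cache)."""

    def __init__(self, rows: Dict[str, LexemeDisplayLabel]) -> None:
        self._rows = dict(rows)

    def get_display_label(self, lexeme_id: str) -> Optional[LexemeDisplayLabel]:
        return self._rows.get(lexeme_id)


def to_plain_text(lexeme_id: str, lookup: LexemeDisplayLookup) -> str:
    """One possible rendering; an HTML formatter could use the same fields differently."""
    label = lookup.get_display_label(lexeme_id)
    if label is None:
        # Fall back to the bare ID when no display data is available.
        return lexeme_id
    return f"{label.lemma} ({label.language}, {label.lexical_category})"


display_lookup = InMemoryDisplayLookup(
    {"L1": LexemeDisplayLabel("hard", "English", "adjective")}
)
known = to_plain_text("L1", display_lookup)
unknown = to_plain_text("L2", display_lookup)
```

Keeping the fields separate leaves each output format free to arrange them as it needs, which a pre-built string would not allow.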

@WMDE-leszek is going to close this ticket once the further tasks are created.

WMDE-leszek renamed this task from Investigation: How to make it possible to format a Lexeme used in a statement (days: 5) to Investigation: How to make it possible to format a Lexeme used in a statement. Feb 6 2018, 10:30 AM
He7d3r added a subscriber: He7d3r. Feb 6 2018, 3:27 PM
WMDE-leszek closed this task as Resolved. Feb 7 2018, 8:31 AM
WMDE-leszek claimed this task.

The next step is the scalability-related investigation: T186606.

Change 406608 abandoned by WMDE-leszek:
[DNM] Move the lexeme "displaying" logic to the LexemePresenter

https://gerrit.wikimedia.org/r/406608

Change 406306 abandoned by WMDE-leszek:
[DNM] Display lexemes in statements

https://gerrit.wikimedia.org/r/406306

Change 406612 abandoned by WMDE-leszek:
[DNM] Add EntityPresenter interface and simple implementation

https://gerrit.wikimedia.org/r/406612