Decide whether a Lexeme's lemma is a single Term, or a TermList (multi-variant).
Closed, Resolved (Public)

Description

For many words in many languages, different representations (spellings, scripts) exist. We need to decide how to model this fact in Lexemes. Options include:

  • a separate Lexeme for each spelling
  • a lemma for each spelling, all on the same Lexeme
  • a single Lemma, with spellings represented by Forms
  • a single Lemma, with spellings represented by Lexeme-level Statements
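To make the difference between a single-Term lemma and a TermList lemma concrete, here is a minimal sketch of the two shapes. The class names, the `set_lemma` helper, and the example language codes are hypothetical and chosen only for illustration; this is not the Wikibase data model API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Term:
    """One spelling, tagged with a language or variant code."""
    language: str  # illustrative code such as "en-gb"
    text: str


@dataclass
class SingleLemmaLexeme:
    """Hypothetical single-Term shape: exactly one lemma.

    Other spellings would have to be stored elsewhere,
    for example as Forms or as Lexeme-level Statements.
    """
    lemma: Term


@dataclass
class MultiVariantLexeme:
    """Hypothetical TermList shape: one lemma per language/variant code."""
    lemmas: Dict[str, Term] = field(default_factory=dict)

    def set_lemma(self, term: Term) -> None:
        # At most one lemma per code; a later term replaces an earlier one.
        self.lemmas[term.language] = term


# Both spellings live on the same Lexeme.
lexeme = MultiVariantLexeme()
lexeme.set_lemma(Term("en-gb", "colour"))
lexeme.set_lemma(Term("en-us", "color"))
```

In this sketch, a Lexeme with only one spelling is simply a `MultiVariantLexeme` whose `lemmas` mapping has a single entry.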

Decision matrix:
https://docs.google.com/spreadsheets/d/1PtGkt6E8EadCoNvZLClwUNhCxC-cjTy5TY8seFVGZMY/edit?ts=5834219d#gid=0

Mailing list discussion:
https://lists.wikimedia.org/pipermail/wikidata-tech/2016-November/001057.html

See also:
T151626: Investigate and decide the representation of languages and variants in Lexeme entities

Event Timeline

daniel added subscribers: aude, hoo.

At the engineering/product meeting with @Lydia_Pintscher @thiemowmde @WMDE-leszek @Ladsgroup @aude @hoo and me, it was decided to go with the multi-variant approach. @Denny had previously said on the mailing list that he now also favors this approach. Arguments:

  • the multi-variant approach is more in line with the Lemon model
  • the multi-variant approach offers better forward compatibility (no breaking change for consumers if we later move to a single-lemma model).
  • the multi-variant model is more expressive, and thereby leads to more concise, easier-to-manage instance data.
  • the logic needed to select a variant based on user preferences already exists for Item labels.
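The last argument, selecting a variant based on user preferences, could look roughly like the sketch below. It is a simplified illustration, not the actual Wikibase language fallback code: the function name `select_lemma`, the plain-dict representation of the lemma list, and the alphabetical last-resort rule are all assumptions.

```python
from typing import Dict, List, Optional


def select_lemma(lemmas: Dict[str, str], preferred: List[str]) -> Optional[str]:
    """Return the lemma text for the first preferred code that has a lemma.

    `lemmas` maps a language/variant code to a spelling.
    `preferred` is the user's ordered preference chain.
    """
    for code in preferred:
        if code in lemmas:
            return lemmas[code]
    # Assumed last resort: pick the lemma with the alphabetically
    # first code, so that the result is deterministic.
    if lemmas:
        return lemmas[min(lemmas)]
    return None


lemmas = {"en-gb": "colour", "en-us": "color"}

# No "en-ca" lemma exists, so the next preference, "en-gb", is used.
chosen = select_lemma(lemmas, ["en-ca", "en-gb", "en"])
```

A Lexeme with a single lemma goes through the same function unchanged, which is one way to read the forward-compatibility argument above.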