
Decide whether a Lexeme's lemma is a single Term, or a TermList (multi-variant).
Closed, Resolved; Public


For many words in many languages, different representations (spellings, scripts) exist. We need to decide how to model this fact in Lexemes. Options include:

  • a separate Lexeme for each spelling
  • a lemma for each spelling, all on the same Lexeme
  • a single lemma, with spellings represented by Forms
  • a single lemma, with spellings represented by Lexeme-level Statements
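
To make the difference between the single-Term and TermList options concrete, here is a minimal sketch of the two data shapes. All class and field names (`Term`, `SingleLemmaLexeme`, `MultiVariantLexeme`, `set_lemma`) are hypothetical and chosen for illustration only; they are not the actual Wikibase Lexeme classes, and the "colour"/"color" data is an assumed example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Term:
    """One spelling, tagged with a language or variant code (hypothetical type)."""
    language: str  # e.g. "en-gb"
    text: str      # e.g. "colour"


@dataclass
class SingleLemmaLexeme:
    """Single-Term option: exactly one lemma per Lexeme.

    Other spellings would have to live elsewhere (Forms or Statements).
    """
    lemma: Term


@dataclass
class MultiVariantLexeme:
    """TermList option: at most one lemma per language/variant code."""
    lemmas: Dict[str, Term] = field(default_factory=dict)

    def set_lemma(self, term: Term) -> None:
        # Keyed by variant code, so setting the same code twice overwrites.
        self.lemmas[term.language] = term


# Illustrative data only.
single = SingleLemmaLexeme(lemma=Term("en-gb", "colour"))

multi = MultiVariantLexeme()
multi.set_lemma(Term("en-gb", "colour"))
multi.set_lemma(Term("en-us", "color"))
```

In this sketch a multi-variant Lexeme that happens to hold only one entry carries the same information as the single-lemma shape, which is one way to see why the two models can coexist for consumers.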

Decision matrix:

Mailing list discussion:

See also:
T151626: Investigate and decide the representation of languages and variants in Lexeme entities

Event Timeline

daniel added subscribers: aude, hoo.

At the engineering/product meeting with @Lydia_Pintscher @thiemowmde @WMDE-leszek @Ladsgroup @aude @hoo and me, it was decided to go with the multi-variant approach. @Denny had previously said on the mailing list that he now also favors this approach. Arguments:

  • the multi-variant approach is more in line with the Lemon model
  • the multi-variant approach offers better forward compatibility (no breaking change for consumers when going to a single-lemma model).
  • the multi-variant model is more expressive, which leads to more concise and easier-to-manage instance data.
  • the logic needed to select a variant based on user preferences already exists for Item labels.
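
As a rough illustration of the last argument, selecting a lemma variant from user preferences could look like the sketch below. This is not the existing label-selection code; the function name `select_lemma`, the plain dict representation, and the "first available variant" last resort are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence


def select_lemma(lemmas: Dict[str, str], preferred: Sequence[str]) -> Optional[str]:
    """Return the lemma text for the first preferred variant code that exists.

    `lemmas` maps a language/variant code to a spelling (hypothetical shape).
    `preferred` is the user's fallback chain, most preferred first.
    """
    for code in preferred:
        if code in lemmas:
            return lemmas[code]
    # Assumed last resort: any available variant, or None if there are no lemmas.
    return next(iter(lemmas.values()), None)


# Illustrative data only.
lemmas = {"en-gb": "colour", "en-us": "color"}
chosen = select_lemma(lemmas, ["en-ca", "en-us", "en"])
```

Here `chosen` is "color", because "en-ca" is not present and "en-us" is the next code in the chain. A consumer that only cares about one spelling can pass a single-element chain and treat the result as if the Lexeme had a single lemma.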