
Add a key to Z60 for the Wikidata item for a language
Open, Low, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

An instance of Z60 should contain not only a language code (Z60K1, most likely compliant with BCP 47) and aliases thereof (Z60K2), but also an identifier for the Wikidata item for that language (most likely as Z60K3).
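As a rough illustration of the proposed shape, here is a minimal Python sketch. The class `LanguageObject`, its field names, and the flat dictionary layout are assumptions made for this example only; the item identifier `Q-PUNJABI` is a placeholder, not a real Wikidata ID, and Z60K3 is the key proposed here, not an existing one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageObject:
    """Hypothetical in-memory view of a Z60 instance."""
    code: str                                          # Z60K1: language code
    aliases: List[str] = field(default_factory=list)   # Z60K2: aliases of the code
    wikidata_item: Optional[str] = None                # proposed Z60K3: Wikidata item

    def to_zobject(self) -> dict:
        """Serialize to a simplified ZObject-like dict (layout is an assumption)."""
        zobject = {"Z1K1": "Z60", "Z60K1": self.code}
        if self.aliases:
            zobject["Z60K2"] = list(self.aliases)
        if self.wikidata_item is not None:
            zobject["Z60K3"] = self.wikidata_item
        return zobject


# Placeholder item ID; a real object would carry an actual Wikidata Q-identifier.
punjabi_gurmukhi = LanguageObject(code="pa", wikidata_item="Q-PUNJABI")
print(punjabi_gurmukhi.to_zobject())
```

Omitting Z60K3 when no item is known keeps existing Z60 instances valid under this sketch.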

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Keys into multilingual Z12/texts and Z32/stringsets on Wikifunctions are represented by ZObject identifiers. Mapping such identifiers to Wikidata items indirectly, via language codes, is prone to error in many circumstances: language codes are not equally available in different parts of the Wikimedia universe, and where the same codes are available in several parts, they are not always used in the same way. Making the mapping from language objects to Wikidata items explicit within those objects would therefore alleviate such errors.

A Wikidata item key would be especially helpful where several language objects share the same set of Wikidata lexemes, since a reference to the same item on each of those objects would establish a link between them. Specific scenarios include:

  • different script variants of a language, which may have divergent language codes (e.g. Z1657/pa and Z1083/pnb for Punjabi);
  • different romanization standards for a language (e.g. those represented by the Wikidata items Q559173 and Q56929 for Z1221/nan, in addition to Z1647/nan-hani); and
  • different regional variants for a language (e.g. Z1003/es, Z1127/es-150, Z1547/es-419, and Z1133/es-mx).
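The linking effect in the scenarios above can be sketched as follows. The ZIDs and language codes are the ones named in the list; the item identifiers `Q-PUNJABI` and `Q-SPANISH` are placeholders rather than real Wikidata IDs, and the function `group_by_item` is a hypothetical helper, not part of Wikifunctions.

```python
from collections import defaultdict

# ZID -> (language code, item from the proposed Z60K3 key).
# Item values are placeholders standing in for real Wikidata Q-identifiers.
LANGUAGE_ITEMS = {
    "Z1657": ("pa", "Q-PUNJABI"),
    "Z1083": ("pnb", "Q-PUNJABI"),
    "Z1003": ("es", "Q-SPANISH"),
    "Z1127": ("es-150", "Q-SPANISH"),
    "Z1547": ("es-419", "Q-SPANISH"),
    "Z1133": ("es-mx", "Q-SPANISH"),
}


def group_by_item(languages):
    """Group language ZIDs by the Wikidata item they reference."""
    groups = defaultdict(list)
    for zid, (_code, item) in languages.items():
        groups[item].append(zid)
    return dict(groups)


for item, zids in group_by_item(LANGUAGE_ITEMS).items():
    print(item, zids)
```

With an explicit item key, the relationship between Z1657 and Z1083 no longer has to be inferred from the unrelated codes `pa` and `pnb`.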

Benefits (why should this be implemented?):

Functions that operate with respect to particular languages can, for example, choose lexemes based on the Wikidata item field and choose representations on those lexemes' forms based on the code field.
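A minimal sketch of that two-step selection, assuming lexemes are available as plain dictionaries with a `language` item and per-form `representations` keyed by language code. The lexeme IDs, item IDs, and representation strings below are all placeholders, and `representations_for` is a hypothetical helper.

```python
# Placeholder lexeme data: each lexeme has a language item and forms whose
# representations are keyed by language code.
LEXEMES = [
    {
        "id": "L-EXAMPLE-1",
        "language": "Q-PUNJABI",
        "forms": [{"representations": {"pa": "gurmukhi-form", "pnb": "shahmukhi-form"}}],
    },
    {
        "id": "L-EXAMPLE-2",
        "language": "Q-SPANISH",
        "forms": [{"representations": {"es": "spanish-form"}}],
    },
]


def representations_for(lexemes, item, code):
    """Pick lexemes by Wikidata item, then form representations by code."""
    result = []
    for lexeme in lexemes:
        if lexeme["language"] != item:
            continue  # the item field selects the lexeme
        for form in lexeme["forms"]:
            text = form["representations"].get(code)
            if text is not None:
                result.append(text)  # the code field selects the representation
    return result


print(representations_for(LEXEMES, "Q-PUNJABI", "pnb"))
```

Both language objects for Punjabi would select the same lexeme here and differ only in which representation they read.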

(tfsl currently deals with languages as code-item pairs, and code in Ninai and Udiron uses the existence of these pairs extensively.)

Event Timeline

Consider making this a Z2 key so that any persistent object can reference a corresponding Wikidata item.

This would effectively be a secondary identity key, which we'd previously hand-waved-for-now and said we weren't yet supporting. Not sure of the implications in our orchestrator code, if any.

Consider making this a Z2 key so that any persistent object can reference a corresponding Wikidata item.

I think we'd rather implement this via identity keys with specific semantic intent and a one-to-one mapping, rather than a catch-all "roughly the same as". For instance, Q32043 ("addition") would be the target of many Functions – one that takes two Integers, one that takes five, one that takes two floats, one that takes a vector of hypercomplex numbers, etc.

@Jdforrester-WMF Hmmm… “specific semantic intent” is good, but a link at the Z2 level is a semantic copula. It specifically asserts identity, not “roughly the same as”. Accordingly, it is much more likely to be appropriate for types and identities than for functions, implementations and tests.

I’m not sure I recall the specific hand-waving around secondary identities. In my view, Wikidata is the natural repository for these, so I would expect support for them to grow out of Wikidata integration. However, I consider Wikidata identifiers to be a special case, not least because their status as secondary identifiers is debatable at the WMF level (although they are clearly secondary outside of Wikidata, in the physical sense).

The Z60 case is a special special (special?) case, of course… but one way or another, I continue to agree with @Mahir256.