
Use standard language codes in RDF output
Open, Medium, Public

Description

As an RDF consumer (linked data interface, query service, or dumps), I want to work with standard language codes, not custom MediaWiki ones.

Problem:
We currently export the internal MediaWiki language codes directly to RDF (in terms and monolingual text values): for example, simple instead of en-simple, and de-formal instead of de-x-formal.

Note that we do map the language codes already for sitelinks (schema:inLanguage, schema:name).
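
To illustrate the intended behavior, here is a minimal sketch in Python. It is not the actual Wikibase code: the names NONSTANDARD_TO_STANDARD, to_standard_language_code and rdf_literal are hypothetical, and the table contains only the two pairs mentioned in this task. Any other code is assumed to be standard already and is passed through unchanged.

```python
# Hypothetical lookup table. It holds only the two pairs named in this task;
# a real implementation would need the complete list of non-standard codes.
NONSTANDARD_TO_STANDARD = {
    "simple": "en-simple",
    "de-formal": "de-x-formal",
}


def to_standard_language_code(internal_code: str) -> str:
    """Return the standard code for an internal one, or the input unchanged."""
    return NONSTANDARD_TO_STANDARD.get(internal_code, internal_code)


def rdf_literal(text: str, internal_code: str) -> str:
    """Format a Turtle-style language-tagged literal using the mapped code."""
    # Escape backslashes first, then double quotes, so the literal stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{to_standard_language_code(internal_code)}'


if __name__ == "__main__":
    # A label stored internally under "de-formal" is exported as "de-x-formal".
    print(rdf_literal("Sandkasten", "de-formal"))
```

Under these assumptions, the same function would be applied wherever a language tag is written for terms and monolingual text values, matching what is already done for sitelinks.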

Example:
The sandbox item currently has a label in “de-formal”; in the results of this query, that label should appear as “de-x-formal”, but currently it doesn’t.

Acceptance criteria:

  • the language tag of labels, descriptions and aliases uses standard language codes
  • the language tag of monolingual text values uses standard language codes
  • the language tag of the schema:name of a sitelink uses standard language codes (already the case, should not regress)
  • the schema:inLanguage of a sitelink uses standard language codes (already the case, should not regress)

Open questions:

NOTE: This constitutes a breaking change and should be announced in accordance with our Stable Interface Policy.

Event Timeline

As far as I can tell, this was already implemented for monolingual text values and sitelinks in I92533ca968, part of T105430: [Task] Ensure that language tags generated in RDF output are standard language names. However, the monolingual text part then got lost during a refactoring (I6d9b99d657, T118500), apparently by accident (I see no mention of it in the review comments). For terms, this was never implemented, as far as I can tell.

Gehel triaged this task as Medium priority. Sep 15 2020, 8:00 AM

None of the language codes currently given in the description of this ticket (simple or de-formal) should be in termboxes at Wikidata. Eventually, they should be disabled (see T284808).