
New serialization code needs to support language fallback
Closed, Resolved (Public)

Description

The serializer needs to be able to represent language fallback; that is, the key used for a label or description term can differ from the actual language of the term. For example, consider https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&languages=ii&format=json&languagefallback: the requested language is ii, but no label is known in ii. Because zh-cn is defined as a fallback language for ii, we get back the zh-cn label under the ii key:

"labels":{
  "ii":{
    "language":"zh-cn",
    "value":"\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"
  }
}

We will need to support at least this behavior, either in the serializer or in the model.
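
To make the mismatch concrete, here is a minimal Python sketch (an illustration only, not Wikibase code) that scans a term map shaped like the response above and reports every key whose term is actually in another language. The helper name `find_fallback_terms` is hypothetical.

```python
import json

# Response fragment copied from the example above.
response = json.loads(r'''
{
  "labels": {
    "ii": {
      "language": "zh-cn",
      "value": "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"
    }
  }
}
''')


def find_fallback_terms(terms):
    """Return {requested_key: actual_language} for terms served via fallback.

    Hypothetical helper: a term counts as a fallback term when the key it is
    stored under differs from its own "language" field.
    """
    return {
        key: term["language"]
        for key, term in terms.items()
        if term["language"] != key
    }


fallbacks = find_fallback_terms(response["labels"])
print(fallbacks)  # {'ii': 'zh-cn'}
```

A consumer that ignores the inner "language" field would wrongly treat the value as an ii label, which is why the key and the term language have to be tracked separately.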

Note: According to bug 72038, "ii" should also be present in a separate field called "for-language". In cases where transliteration is involved, a third language (the original language) may also be involved.
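
As a rough sketch of what the note describes, the following Python function builds a single term serialization carrying up to three languages. Only "for-language" is named in this task; the field name "source-language", the function `serialize_term`, and the choice to emit the extra fields only when they differ from "language" are assumptions made for illustration.

```python
def serialize_term(requested_language, actual_language, value, source_language=None):
    """Serialize one term, adding fallback metadata only when it applies."""
    serialization = {"language": actual_language, "value": value}

    if actual_language != requested_language:
        # Field name taken from the note above (bug 72038).
        serialization["for-language"] = requested_language

    if source_language is not None and source_language != actual_language:
        # ASSUMPTION: hypothetical field name for the original language of
        # a transliterated term; this task does not name the field.
        serialization["source-language"] = source_language

    return serialization


print(serialize_term("ii", "zh-cn", "example label"))
```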


Version: unspecified
Severity: normal
Whiteboard: u=dev c=backend p=0 s=2014-11-11
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=73308
https://bugzilla.wikimedia.org/show_bug.cgi?id=72038

Details

Reference
bz72183

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 3:45 AM
bzimport set Reference to bz72183.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task. Oct 17 2014, 1:48 PM

After a discussion today we agreed to the following:

  • The data model has to have knowledge of the fact that an entity can have a term for a language other than the language the term is actually in
  • The data model serializers and deserializers have to have knowledge of the fact that there are language fallbacks

Premises:

  • We want to provide a view on our data that includes, for example, language fallbacks (for the API, for wbEntity, …)
  • We want to enable users (for example the JavaScript frontend code) to work with these views
  • Data model deserializers should return data model objects
  • Data model deserializers should not lose information

Necessary steps:

  • Make TermList a TermMap
  • Make TermMap::__construct respect the keys of its parameter
  • Make TermMap::setTerm expect a language parameter (adapt callers in DM)
  • Make EntityDeserializer::deserializeValuePerLanguageSerialization respect and pass the keys
  • Make EntityDeserializer::setAliasesFromSerialization respect and pass the keys
  • Make EntityDeserializer::assertIsValidValueSerialization assert on the key
  • Make FingerprintSerializer::serializeValuePerLanguageArray (and every caller above it) aware of the fact that there are different ways to serialize a map of terms (with and without keys, with and without fallback terms included)
  • Write high-level, implementation-independent documentation on this decision
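
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Wikibase DataModel implementation: the real classes are PHP, and the method names `set_term`, `to_serialization`, and `from_serialization`, as well as the `include_fallback_terms` flag, are hypothetical stand-ins for the methods listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    language: str  # language the text is actually written in
    text: str


class TermMap:
    """Maps a requested language code to a Term that may be in another language."""

    def __init__(self, terms=None):
        # Respect the keys of the parameter instead of re-deriving them
        # from each term's own language.
        self._terms = dict(terms or {})

    def set_term(self, language, term):
        # The key is passed explicitly, so it may differ from term.language.
        self._terms[language] = term

    def to_serialization(self, include_fallback_terms=True):
        # One of several possible serializations: keyed by requested
        # language, optionally dropping terms that came from fallback.
        return {
            key: {"language": term.language, "value": term.text}
            for key, term in self._terms.items()
            if include_fallback_terms or key == term.language
        }

    @classmethod
    def from_serialization(cls, serialization):
        # Keep the keys so that deserializing loses no information.
        return cls({
            key: Term(entry["language"], entry["value"])
            for key, entry in serialization.items()
        })


labels = TermMap()
labels.set_term("en", Term("en", "Douglas Adams"))
labels.set_term("ii", Term("zh-cn", "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"))

full = labels.to_serialization()
stored_only = labels.to_serialization(include_fallback_terms=False)
print(sorted(full))         # ['en', 'ii']
print(sorted(stored_only))  # ['en']
```

Because the key is kept separately from `Term.language`, a round trip through `from_serialization` and `to_serialization` reproduces the input, which matches the premise that deserializers should not lose information.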

Additional requirement: We want a facility that can test whether an object contains inferred information, such as language fallback, and therefore should not be written to the database. That way we can enforce this at runtime instead of relying only on code review.
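
One possible shape for such a runtime check, sketched in Python over term maps in the serialized form shown in the description. The names `has_inferred_terms`, `save_labels`, and `InferredTermsError` are hypothetical, and the plain dict stands in for the database.

```python
class InferredTermsError(Exception):
    """Raised when an object containing fallback terms is about to be stored."""


def has_inferred_terms(terms_by_language):
    # A term is inferred when the key it is filed under differs from the
    # language the term is actually in.
    return any(
        key != term["language"]
        for key, term in terms_by_language.items()
    )


def save_labels(labels, storage):
    """Write labels to storage, refusing anything that carries fallback terms."""
    if has_inferred_terms(labels):
        raise InferredTermsError("refusing to store terms produced by language fallback")
    storage.update(labels)


storage = {}  # stand-in for the database
save_labels({"en": {"language": "en", "value": "Douglas Adams"}}, storage)

try:
    save_labels({"ii": {"language": "zh-cn", "value": "example label"}}, storage)
except InferredTermsError as error:
    print("rejected:", error)
```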

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
adrianheine closed this task as Resolved. Jan 29 2015, 8:25 AM
adrianheine added a subscriber: adrianheine.