
[RFC] Decide on a mechanism for supporting derived values during serialization
Closed, Resolved, Public

Description

Proposed solutions (distilled from discussions like the one at WikibaseDataModelSerialization/pull/171):

  1. Inject the derived data into the data model before serialization: extend the (PHP) data model to include the derived value, and extend the serializers to output it (optionally, extend the deserializers to process it as well).
  2. Inject additional information during serialization, based on lookup services injected into the serializers.
  3. Inject the derived data after serialization, into the nested array structure output by the serializer.

Assessment from the discussion on 2015-09-15 (Bene*, Thiemo, Jan Z, Jonas, Jeroen, Daniel):

Ad 1:

  • Several options: a) subclass, b) wrap/delegate, c) fork/specialize, d) interfaces/views
  • Much effort. Possibly a big breaking change (depending on which sub-option we pick)
  • Serializers need to be aware of the "extended" model. But they need to dispatch based on the type of sub-structure anyway.
  • Difficulty: derived values should never go into the DB and should not be accepted as input for editing
  • Advantage: Can be used in client code and formatter code (our own, and 3rd party's). Works symmetrically with deserialization.
  • Currently used to represent Terms with language fallback information
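To make sub-option (1b) more concrete, here is a minimal Python sketch of the wrap/delegate approach, using the SiteLink URL case from this discussion. All names (`SiteLink`, `SiteLinkWithUrl`, `serialize_site_link`, `deserialize_site_link`, the `allow_derived` flag) and the JSON keys are hypothetical and do not reflect the actual Wikibase PHP API. The sketch assumes the serializer already dispatches on the type of each sub-structure, and that a deserializer used on the storage or editing path is told to drop derived values.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SiteLink:
    """Basic data model value: the only thing stored and edited."""
    site_id: str
    page_name: str


@dataclass(frozen=True)
class SiteLinkWithUrl:
    """Hypothetical (1b) wrapper: delegates to a SiteLink, adds a derived URL."""
    site_link: SiteLink
    url: str


def serialize_site_link(value: Any) -> Dict[str, str]:
    # The serializer dispatches on the type of the sub-structure anyway,
    # so the wrapper is just one more branch.
    if isinstance(value, SiteLinkWithUrl):
        data = serialize_site_link(value.site_link)
        data["url"] = value.url
        return data
    if isinstance(value, SiteLink):
        return {"site": value.site_id, "title": value.page_name}
    raise TypeError(f"cannot serialize {type(value).__name__}")


def deserialize_site_link(data: Dict[str, str], allow_derived: bool) -> Any:
    # Derived values must never reach the DB or be accepted as edit input,
    # so callers on the storage/editing path pass allow_derived=False.
    link = SiteLink(data["site"], data["title"])
    if allow_derived and "url" in data:
        return SiteLinkWithUrl(link, data["url"])
    return link


wrapped = SiteLinkWithUrl(SiteLink("enwiki", "Berlin"),
                          "https://en.wikipedia.org/wiki/Berlin")
print(serialize_site_link(wrapped))
# {'site': 'enwiki', 'title': 'Berlin', 'url': 'https://en.wikipedia.org/wiki/Berlin'}
```

In this sketch the basic `SiteLink` never carries the URL, so nothing derived can be persisted by accident; only consumers that opt in (a client or a formatter, say) get the wrapper back from deserialization, which is the symmetry with deserialization listed as an advantage above.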

Ad 2:

  • Serializer needs to know about types of derived values, and which lookups to use to get the required info.
  • Advantage: keep the data model clean (but 1b and 1d also do this).
  • Used in the older serialization code to generate URLs for SiteLinks
  • No support for deserialization. No support for client side usage.
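Option (2) can be illustrated with a minimal Python sketch of a serializer that receives a lookup service and computes the derived URL while serializing. The `SiteLinkSerializer` class, the `SiteUrlLookup` type, the `$1` placeholder convention for URL patterns and the JSON keys are all assumptions made for illustration; only spaces are escaped, with no further URL encoding.

```python
from typing import Callable, Dict, Optional

# Hypothetical lookup service: maps a site ID to a URL pattern in which
# "$1" stands for the page name, or returns None for unknown sites.
SiteUrlLookup = Callable[[str], Optional[str]]


class SiteLinkSerializer:
    """Option (2): derived values are computed during serialization."""

    def __init__(self, url_lookup: SiteUrlLookup) -> None:
        # The lookup is injected; the data model itself stays unchanged.
        self._url_lookup = url_lookup

    def serialize(self, site_id: str, page_name: str) -> Dict[str, str]:
        data = {"site": site_id, "title": page_name}
        pattern = self._url_lookup(site_id)
        if pattern is not None:
            # The serializer has to know that site links get a "url" and
            # which lookup provides it. Only spaces are escaped here.
            data["url"] = pattern.replace("$1", page_name.replace(" ", "_"))
        return data


patterns = {"enwiki": "https://en.wikipedia.org/wiki/$1"}
serializer = SiteLinkSerializer(patterns.get)
print(serializer.serialize("enwiki", "New York City"))
# {'site': 'enwiki', 'title': 'New York City', 'url': 'https://en.wikipedia.org/wiki/New_York_City'}
```

Nothing on the deserialization side knows where the `"url"` key came from, which is why this option offers no support for deserialization or client-side usage.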

Ad 3:

  • Advantage: keep the data model clean (but 1b and 1d also do this).
  • Currently used to inject URLs for SiteLinks
  • Post-processing may involve deserializing parts of the structure again (in particular EntityIds, for use with the lookups)
  • Needs knowledge of the "paths" in the JSON structure
  • No support for deserialization. No support for client side usage.
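A minimal Python sketch of option (3) shows why the post-processor needs to know paths into the serialized structure. The entity layout (`"sitelinks"` keyed by site ID, each with a `"title"`), the `inject_site_link_urls` function and the `$1` URL pattern convention are assumptions for illustration, and the entity ID and titles are example values. Site IDs are plain strings here; a lookup keyed on something like an entity ID would first have to parse that ID back out of the JSON.

```python
import copy
from typing import Any, Callable, Dict, Optional

# Hypothetical lookup: site ID -> URL pattern with "$1" for the page name.
SiteUrlLookup = Callable[[str], Optional[str]]


def inject_site_link_urls(serialized: Dict[str, Any],
                          url_lookup: SiteUrlLookup) -> Dict[str, Any]:
    """Option (3): add derived URLs to an entity that is already serialized.

    The post-processor must know the "path" of site links in the nested
    structure (assumed here: serialized["sitelinks"][site_id]["title"]).
    """
    result = copy.deepcopy(serialized)  # leave the serializer's output as is
    for site_id, link in result.get("sitelinks", {}).items():
        pattern = url_lookup(site_id)
        if pattern is not None:
            link["url"] = pattern.replace("$1", link["title"].replace(" ", "_"))
    return result


entity = {"id": "Q64",
          "sitelinks": {"enwiki": {"site": "enwiki", "title": "Berlin"}}}
patterns = {"enwiki": "https://en.wikipedia.org/wiki/$1"}
enriched = inject_site_link_urls(entity, patterns.get)
print(enriched["sitelinks"]["enwiki"]["url"])  # https://en.wikipedia.org/wiki/Berlin
```

If the serializer ever changes where site links live in the output, this post-processor silently stops adding URLs, which is the coupling the "paths" bullet refers to.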

Since options 2 and 3 do not offer support for deserialization (and thus for client-side usage), one important question was which use cases we have for derived values. The following use cases for derived data were identified (among others):

  • ingestion by third parties (dumps)
  • use by gadgets (API)
  • use via Lua (php data model / JSON)
  • use by the formatter (php data model)

This led to the conclusion that support for representation in the data model would be useful not only for 3rd parties, but also for our own code (formatters, Lua). It followed that we would end up doing option (1) to some extent anyway. Using options (2) and (3) for serialization while implementing (1) for deserialization would likely cause inconsistencies and code duplication.

NOTE: Conclusion: we want option (1), full representation of derived values in the data model.

Further thought and discussion are needed to decide which sub-option to choose. (1a) was agreed to be the least desirable, since it leads to a "pollution" of the basic data model. (1b) and (1d) were favored during the discussion; (1c) was seen as hard to maintain and prone to duplication. See T112550: [RFC] How to represent derived values in the data model, and allow for deferred deserialization.
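For comparison with (1b), here is a minimal Python sketch of sub-option (1d), interfaces/views, applied to the language-fallback case mentioned under option 1. Serializers and formatters consume a read-only `TermView` protocol; the stored `Term` satisfies it as-is, and a separate `FallbackTermView` adds derived fallback information without touching `Term`. All names, the JSON keys and the `de-ch` to `de` fallback are illustrative assumptions, not the Wikibase data model.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class TermView(Protocol):
    """Read-only interface that serializers and formatters depend on."""

    @property
    def language_code(self) -> str: ...

    @property
    def text(self) -> str: ...


@dataclass(frozen=True)
class Term:
    """Stored term: the only thing persisted and accepted for editing."""
    language_code: str
    text: str


@dataclass(frozen=True)
class FallbackTermView:
    """Derived view: a term shown for one language but taken from another."""
    term: Term
    requested_language_code: str

    @property
    def language_code(self) -> str:
        return self.requested_language_code

    @property
    def text(self) -> str:
        return self.term.text

    @property
    def source_language_code(self) -> str:
        return self.term.language_code


def serialize_term(term: TermView) -> Dict[str, str]:
    data = {"language": term.language_code, "value": term.text}
    # Serializers dispatch on the concrete type to emit derived extras.
    if isinstance(term, FallbackTermView):
        data["source"] = term.source_language_code
    return data


stored = Term("de", "Berlin")
shown = FallbackTermView(stored, "de-ch")
print(serialize_term(stored))  # {'language': 'de', 'value': 'Berlin'}
print(serialize_term(shown))   # {'language': 'de-ch', 'value': 'Berlin', 'source': 'de'}
```

Code that only needs the interface (a formatter, say) works with either object, while the storage path can keep accepting plain `Term` values only.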

Event Timeline

daniel created this task.Sep 14 2015, 4:23 PM
daniel raised the priority of this task to Normal.
daniel updated the task description.
daniel set Security to None.
daniel closed this task as Resolved.Sep 14 2015, 5:07 PM
daniel claimed this task.
daniel updated the task description.

See conclusion in the description

daniel added subscribers: thiemowmde, Jonas, aude, Bene.
daniel added a subscriber: JeroenDeDauw.
daniel updated the task description. Sep 14 2015, 5:19 PM
daniel updated the task description. Sep 14 2015, 5:28 PM
aude added a comment.Sep 16 2015, 3:39 PM

Agree with the recommended approach of including derived values in the data model; this is the most sane and reasonable option.

This is blocked by T112550, which is in Proposed, while this one is in Review. Huh?

Tobi_WMDE_SW closed this task as Resolved.Sep 29 2015, 2:19 PM
Tobi_WMDE_SW moved this task from Review to Done on the Wikidata-Sprint-2015-09-15 board.
Tobi_WMDE_SW added a subscriber: Tobi_WMDE_SW.

This was resolved how?

By the discussion you attended. It's documented in the description. At the bottom it says:

Conclusion: we want option (1), full representation of derived values in the data model.