- Objective: include derived information in JSON output
- Problem: the (abstract) Wikibase data model does not support derived values, and the data model serialization component does not support including additional information in its output
Proposed solutions (distilled from discussions like the one at WikibaseDataModelSerialization/pull/171):
- Inject the derived data into the data model before serialization. Extend the (PHP) data model to include the derived value. Extend the serializers to output it. (Extend deserializers to process it, optionally).
- Inject additional information during serialization, based on lookup services injected into the serializers
- Inject the derived data after serialization, into the nested array structure output by the serializer.
Assessment from the discussion on 2015-09-15 (Bene*, Thiemo, Jan Z, Jonas, Jeroen, Daniel):
Ad 1:
- Several options: a) subclass, b) wrap/delegate, c) fork/specialize, d) interfaces/views
- Much effort. Possibly big breaking change (depending on which option we pick)
- Serializers need to be aware of "extended" model. But they need to dispatch based on the type of sub-structure anyway.
- Difficulty: Derived values should never go into the DB, should not be accepted as input for editing
- Advantage: Can be used in client code and formatter code (our own, and 3rd party's). Works symmetrically with deserialization.
- Currently used to represent Terms with language fallback information
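A minimal sketch of sub-option (1b), wrap/delegate, may make the trade-offs above more concrete. It is written in Python for brevity rather than in the PHP of the actual data model, and every name in it (`Term`, `TermWithFallback`, `serialize_term`, the `requested_language` key) is a hypothetical stand-in, not the real Wikibase API. The basic `Term` stays untouched, the wrapper carries the derived fallback information, and the serializer dispatches on the type of the sub-structure. The `include_derived` flag illustrates one way to keep derived values out of what is written to the DB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Plain data model object: what is stored and edited (hypothetical)."""
    language: str
    text: str


@dataclass(frozen=True)
class TermWithFallback:
    """Wraps a Term and adds derived language fallback info (hypothetical)."""
    term: Term
    requested_language: str

    # Delegate the basic accessors to the wrapped term, so client and
    # formatter code can treat the wrapper like a plain Term.
    @property
    def language(self) -> str:
        return self.term.language

    @property
    def text(self) -> str:
        return self.term.text


def serialize_term(term, include_derived: bool = True) -> dict:
    """Serialize a term, dispatching on the type of the sub-structure."""
    out = {"language": term.language, "value": term.text}
    # Derived values are only emitted on request, so the same serializer
    # can produce storage output (without them) and API output (with them).
    if include_derived and isinstance(term, TermWithFallback):
        out["requested_language"] = term.requested_language
    return out
```

Calling `serialize_term(wrapped, include_derived=False)` yields the same structure as serializing the plain `Term`, which is the property needed for storage and for rejecting derived values as edit input.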
Ad 2:
- Serializer needs to know about types of derived values, and which lookups to use to get the required info.
- Advantage: keep the data model clean (but 1b and 1d also do this).
- Used in the older serialization code to generate URLs for SiteLinks
- No support for deserialization. No support for client side usage.
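Option 2 can be sketched roughly as follows, again in Python and with purely hypothetical names (`SiteLinkSerializer`, `url_lookup`, the `site`/`title`/`url` keys). The point it illustrates is that the serializer itself has to know which derived value exists and which lookup service provides it, while the data model object stays unchanged.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: (site_id, page_title) -> URL or None.
UrlLookup = Callable[[str, str], Optional[str]]


class SiteLinkSerializer:
    """Serializer with an optional lookup service injected (hypothetical)."""

    def __init__(self, url_lookup: Optional[UrlLookup] = None):
        self._url_lookup = url_lookup

    def serialize(self, site_id: str, page_title: str) -> dict:
        out = {"site": site_id, "title": page_title}
        # The derived value is computed during serialization; it never
        # becomes part of the data model object itself.
        if self._url_lookup is not None:
            url = self._url_lookup(site_id, page_title)
            if url is not None:
                out["url"] = url
        return out
```

Because the URL only ever exists in the output array, code working on the data model (formatters, Lua bindings) cannot see it, which matches the "no support for client side usage" point above.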
Ad 3:
- Advantage: keep the data model clean (but 1b and 1d also do this).
- Currently used to inject URLs for SiteLinks
- Post-processing may involve de-serializing parts of the structure again (in particular EntityIds, for use with the lookups)
- Needs knowledge of "paths" in JSON
- No support for deserialization. No support for client side usage.
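For comparison, option 3 might look like the sketch below: a post-processing step that walks the nested array produced by the serializer. The structure (`sitelinks` keyed by site id, each entry with `site` and `title`) and the function name are assumptions made for illustration only. The hard-coded traversal shows why this approach needs knowledge of paths in the output.

```python
import copy
from typing import Callable, Optional


def add_sitelink_urls(
    serialization: dict,
    url_lookup: Callable[[str, str], Optional[str]],
) -> dict:
    """Return a copy of the serialized entity with URLs injected (hypothetical)."""
    result = copy.deepcopy(serialization)
    # The post-processor must know the path "sitelinks" -> <site id> and the
    # keys of each entry; a change in the serialization format breaks it.
    for link in result.get("sitelinks", {}).values():
        url = url_lookup(link["site"], link["title"])
        if url is not None:
            link["url"] = url
    return result
```

Here the lookup works on plain strings. For derived values keyed by EntityIds, the post-processor would first have to turn the serialized id back into an object, which is the partial de-serialization mentioned above.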
Since options 2 and 3 do not offer support for deserialization (and thus client side usage), one important question was which use cases we have for derived values. The following use cases for derived data were identified (among others):
- ingestion by third parties (dumps)
- use by gadgets (API)
- use via Lua (PHP data model / JSON)
- use by the formatter (PHP data model)
This led to the conclusion that support for representation in the data model would be useful not only for 3rd parties, but also for our own code (formatters, Lua). From that followed the conclusion that we would end up doing option (1) to some extent anyway. Going for option (2) or (3) for the serialization, while implementing (1) for deserialization, seems likely to cause inconsistencies and code duplication.
Further thought and discussion is needed to decide which sub-option to choose. (1a) was agreed to be the least desirable, since it leads to a "pollution" of the basic data model. (1b) and (1d) were favored during the discussion, (1c) was seen as hard to maintain and prone to duplication. See T112550: [RFC] How to represent derived values in the data model, and allow for deferred deserialization