Maniphest T112547

[RFC] Decide on a mechanism for supporting derived values during serialization
Closed, Resolved, Public

Description

Objective: include derived information in JSON output
- see T112548: [Epic] Provide derived values for re-use by third parties (as well as for internal use)
- see T93747: [Story] Create infrastructure for optionally putting normalized values into JSON
Problem: the (abstract) Wikibase data model does not support derived values, and the data model serialization component does not support including additional information in its output.

Proposed solutions (distilled from discussions like the one at WikibaseDataModelSerialization/pull/171):

1. Inject the derived data into the data model before serialization. Extend the (PHP) data model to include the derived value. Extend the serializers to output it. (Optionally, extend the deserializers to process it.)
2. Inject additional information during serialization, based on lookup services injected into the serializers.
3. Inject the derived data after serialization, into the nested array structure output by the serializer.

Assessment from the discussion on 2015-09-15 (Bene*, Thiemo, Jan Z, Jonas, Jeroen, Daniel):

Ad 1:

Several options: a) subclass, b) wrap/delegate, c) fork/specialize, d) interfaces/views
High effort. Possibly a big breaking change (depending on which option we pick).
Serializers need to be aware of the "extended" model. But they need to dispatch based on the type of sub-structure anyway.
Difficulty: Derived values should never go into the DB, should not be accepted as input for editing
Advantage: Can be used in client code and formatter code (our own, and 3rd party's). Works symmetrically with deserialization.
Currently used to represent Terms with language fallback information
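
To make the wrap/delegate variant (1b) more concrete, here is a minimal sketch in Python rather than the PHP of the actual data model. All names (`Label`, `LabelWithFallback`, `serialize_label`, the `requested-language` key) are hypothetical and are not taken from Wikibase. It only illustrates a wrapper that delegates to the plain value, and a serializer that dispatches on the type and can leave out derived fields, for instance when writing to the DB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Plain data-model value (hypothetical): a label in one language."""
    language: str
    text: str


@dataclass(frozen=True)
class LabelWithFallback:
    """Hypothetical wrapper: delegates to a Label and adds one derived field."""
    label: Label
    requested_language: str  # derived: the language the caller asked for

    @property
    def language(self) -> str:
        return self.label.language  # delegate to the wrapped value

    @property
    def text(self) -> str:
        return self.label.text  # delegate to the wrapped value


def serialize_label(label, include_derived: bool = True) -> dict:
    """Serialize either kind of label, dispatching on its type.

    With include_derived=False the output equals that of the plain label,
    which is what storage would need.
    """
    out = {"language": label.language, "value": label.text}
    if include_derived and isinstance(label, LabelWithFallback):
        out["requested-language"] = label.requested_language
    return out
```

Because the wrapper exposes the same attributes as the plain value, code that only reads `language` and `text` works with both, while the plain `Label` class stays untouched.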

Ad 2:

Serializer needs to know about types of derived values, and which lookups to use to get the required info.
Advantage: keep the data model clean (but 1b and 1d also do this).
Used in the older serialization code to generate URLs for SiteLinks
No support for deserialization. No support for client side usage.
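
As a rough illustration of option 2, the sketch below injects a lookup into a serializer. It is written in Python, and every name in it (`SiteLinkSerializer`, `example_url_lookup`, the base-URL table) is an assumption made for this example, not the older serialization code mentioned above.

```python
from typing import Callable, Optional


class SiteLinkSerializer:
    """Hypothetical serializer that takes an optional injected URL lookup."""

    def __init__(self, url_lookup: Optional[Callable[[str, str], str]] = None):
        self._url_lookup = url_lookup

    def serialize(self, site_id: str, page_name: str) -> dict:
        out = {"site": site_id, "title": page_name}
        # The serializer itself must know that a "url" exists and which
        # lookup provides it; the data model is left unchanged.
        if self._url_lookup is not None:
            out["url"] = self._url_lookup(site_id, page_name)
        return out


def example_url_lookup(site_id: str, page_name: str) -> str:
    """Stand-in lookup with a hard-coded table, for illustration only."""
    base_urls = {"enwiki": "https://en.wikipedia.org/wiki/"}
    return base_urls[site_id] + page_name.replace(" ", "_")
```

The derived URL exists only in the output array, so nothing here helps a deserializer or client code that works on data-model objects.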

Ad 3:

Advantage: keep the data model clean (but 1b and 1d also do this).
Currently used to inject URLs for SiteLinks
Post-processing may involve de-serializing parts of the structure again (in particular EntityIds, for use with the lookups)
Needs knowledge of "paths" in JSON
No support for deserialization. No support for client side usage.
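
Option 3 can be sketched as a post-processing step over the serializer's output. The Python below assumes a hypothetical output shape with a top-level `sitelinks` map whose entries carry a `title`; the function name and the `url_lookup` callable are likewise assumptions for illustration.

```python
import copy
from typing import Callable


def inject_sitelink_urls(serialized: dict, url_lookup: Callable[[str, str], str]) -> dict:
    """Return a copy of the serialized entity with a "url" added to each site link.

    The "path" into the structure (sitelinks -> site id -> url) is hard-coded
    here, which is the kind of knowledge about the JSON layout noted above.
    """
    result = copy.deepcopy(serialized)  # leave the serializer output untouched
    for site_id, link in result.get("sitelinks", {}).items():
        link["url"] = url_lookup(site_id, link["title"])
    return result
```

The function works on plain strings taken from the array. A lookup that needs richer objects, such as entity IDs, would first have to rebuild them from those strings.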

Since options 2 and 3 do not offer support for deserialization (and thus client side usage), one important question was about the use cases we have for derived values. The following use cases for derived data were identified (among others):

ingestion by third parties (dumps)
use by gadgets (API)
use via Lua (php data model / JSON)
use by the formatter (php data model)

This led to the conclusion that support for representation in the data model would be useful not only for 3rd parties, but also for our own code (formatters, Lua). From that followed the conclusion that we would end up doing option (1) to some extent anyway. Going for options (2) and (3) for serialization, while implementing (1) for deserialization, seems likely to cause inconsistencies and code duplication.

NOTE: Conclusion: we want option (1), full representation of derived values in the data model.

Further thought and discussion is needed to decide which sub-option to choose. (1a) was agreed to be the least desirable, since it leads to a "pollution" of the basic data model. (1b) and (1d) were favored during the discussion, (1c) was seen as hard to maintain and prone to duplication. See T112550: [RFC] How to represent derived values in the data model, and allow for deferred deserialization

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.

Status	Assigned	Task
		· · ·
Open	None	T73992 [Story] JSON should (optionally) contain full URIs for referenced external entities
Invalid	None	T93747 [Story] Create infrastructure for optionally putting normalized values into JSON
Resolved	daniel	T112547 [RFC] Decide on a mechanism for supporting derived values during serialization
Resolved	daniel	T112550 [RFC] How to represent derived values in the data model, and allow for deferred deserialization
		· · ·

Event Timeline

daniel created this task. Sep 14 2015, 4:23 PM

daniel raised the priority of this task from to Medium.

daniel updated the task description.

daniel added projects: Wikidata, MediaWiki-extensions-WikibaseRepository.

daniel added a project: Wikibase-DataModel-Serialization.

daniel set Security to None.

daniel added subscribers: JanZerebecki, Aklapper, Lydia_Pintscher, daniel.

daniel added a subtask: T93747: [Story] Create infrastructure for optionally putting normalized values into JSON. Sep 14 2015, 4:37 PM

daniel removed a subtask: T93747: [Story] Create infrastructure for optionally putting normalized values into JSON. Sep 14 2015, 4:40 PM

daniel added a parent task: T93747: [Story] Create infrastructure for optionally putting normalized values into JSON.

See conclusion in the description

re-opening until the sub-task is resolved.

daniel moved this task from incoming to needs discussion or investigation on the Wikidata board. Sep 14 2015, 5:17 PM

daniel added subscribers: thiemowmde, Jonas, aude, Bene.

daniel added a subscriber: JeroenDeDauw.

daniel updated the task description. Sep 14 2015, 5:19 PM

daniel updated the task description. Sep 14 2015, 5:28 PM

Lydia_Pintscher added a project: Wikidata-Sprint-2015-09-01. Sep 14 2015, 6:09 PM

Lydia_Pintscher moved this task from Backlog to Review on the Wikidata-Sprint-2015-09-01 board.

Lydia_Pintscher moved this task from needs discussion or investigation to in progress on the Wikidata board. Sep 15 2015, 10:09 AM

Tobi_WMDE_SW added a project: Wikidata-Sprint-2015-09-15. Sep 15 2015, 1:02 PM

Tobi_WMDE_SW moved this task from Backlog to Review on the Wikidata-Sprint-2015-09-15 board.

I agree with the recommended approach of including derived values in the data model; it is the most sane and reasonable option.

Lydia_Pintscher added a project: Proposal. Sep 25 2015, 2:33 PM

Tobi_WMDE_SW added a project: Wikidata-Sprint-2015-09-29. Sep 28 2015, 1:19 PM

Tobi_WMDE_SW moved this task from Proposed to Review on the Wikidata-Sprint-2015-09-29 board.

This is blocked by T112550, which is on proposed, while this is on review. Huh?

Tobi_WMDE_SW moved this task from Review to Done on the Wikidata-Sprint-2015-09-29 board. Sep 29 2015, 12:56 PM

Tobi_WMDE_SW closed this task as Resolved. Sep 29 2015, 2:19 PM

Tobi_WMDE_SW moved this task from Review to Done on the Wikidata-Sprint-2015-09-15 board.

Tobi_WMDE_SW removed a project: Wikidata-Sprint-2015-09-29.

Tobi_WMDE_SW subscribed.

This was resolved how?

In T112547#1690481, @JeroenDeDauw wrote:

This was resolved how?

By the discussion you attended. It's documented in the description. At the bottom it says:

Conclusion: we want option (1), full representation of derived values in the data model.

daniel mentioned this in T118860: [RFC] Use Role Object Pattern to represent derived data in the data model. Nov 17 2015, 4:24 PM

daniel closed subtask T112550: [RFC] How to represent derived values in the data model, and allow for deferred deserialization as Resolved. Nov 17 2015, 4:43 PM
