(Description updated after a discussion between Jan, Katie, Thiemo, and Daniel 2015-05-11)
Wikibase API modules currently use the old external serialization code in WikibaseLib, together with some code in ResultBuilder, to represent Entities and associated information in API results. If we want to ditch the old serialization code, we need to make sure we can cover the old functionality based on the new WikibaseDataModelSerialization module.
Requirements for feature parity are:
- include the property data type in Property(Value)Snaks. The main issue here is avoiding circular dependencies between the serializer and the service interfaces. See T98850.
- grouped vs. ungrouped output of Statements, Qualifiers, and Reference Snaks. Alternatively, we could decide to ditch the ungrouped representation; see T78653.
- forcing (some) maps to lists (T98857)
  - we may no longer need this for the XML output, if "indexTagName" now overrides explicit array keys in ApiResult
  - currently, there is an option to force maps to objects - that should probably always be done, to avoid issues with the representation of empty maps in JSON (T98860)
- injecting "indexTagName" and other meta-info for the MediaWiki API serializer. Hard-coded knowledge about these markers could be kept out of the serializer by using some sort of callback logic. See T78652.
- maintain ordered maps: JSON does not specify the order of entries in a map, so we need to make that order explicit; it should at least be stable. (T98861)
- inject revision info (revision id, timestamp, page title). This is MediaWiki-specific; the serializer shouldn't know the meaning of such meta-data fields. (T98862)
- deletion markers for deleted elements in API responses. We may want to implement this by representing deletions in the DataModel somehow. See T98863.
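To make the grouped/ungrouped and map/list requirements concrete, here is a minimal sketch of the two transformations as post-processing steps on an already serialized array structure. It is written in Python purely for illustration; the function names `ungroup_snaks` and `map_to_list` and the sample data are hypothetical and do not exist in Wikibase.

```python
def ungroup_snaks(grouped, order=None):
    """Flatten {property_id: [snak, ...]} into a single list of snaks.

    `order` is an explicit list of property ids. Properties missing from
    `order` are appended in sorted order, so the result stays stable.
    """
    if order is None:
        property_ids = sorted(grouped)
    else:
        property_ids = list(order) + sorted(p for p in grouped if p not in order)
    return [snak for pid in property_ids for snak in grouped.get(pid, [])]


def map_to_list(mapping, key_name):
    """Turn {key: {...}} into [{key_name: key, ...}], sorted by key."""
    return [{key_name: key, **value} for key, value in sorted(mapping.items())]


grouped = {
    "P2": [{"property": "P2", "snaktype": "novalue"}],
    "P1": [{"property": "P1", "snaktype": "somevalue"}],
}
flat = ungroup_snaks(grouped, order=["P1", "P2"])

labels = {"en": {"value": "Berlin"}, "de": {"value": "Berlin"}}
label_list = map_to_list(labels, "language")
```

Passing the order explicitly, rather than relying on the map's own key order, is what keeps the output stable across JSON round trips.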
Additional features we want to support in the future, but should already consider when thinking about a solution for the issues above:
- Derived values in Snaks (normalized quantities, expanded external IDs, etc; see T89005). Could be represented in the model in a way similar to TermFallback: ExtendedPropertyValueSnak would extend PropertyValueSnak (and perhaps implement a marker interface, to safeguard against serialization into the database). Note: derived values should not be supported as input to EditEntity, etc.
- Filtered representations (terms filtered by language, sitelinks filtered by family or language, etc). Filtering can be done before serialization, but should be marked in the data model and in the serialization, to avoid data loss during round trips. (T73512)
- Represent incoming redirects as "secondary ids" in the serialized entity data structure (T98039)
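The marker-interface idea for derived values could look roughly like the sketch below. It is Python for illustration only: `ExtendedPropertyValueSnak` is the name proposed above, while `DerivedData`, `serialize_for_api`, `serialize_for_storage`, and the field names are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyValueSnak:
    property_id: str
    value: str


class DerivedData:
    """Marker type: instances carry derived data and must not be stored."""


@dataclass
class ExtendedPropertyValueSnak(PropertyValueSnak, DerivedData):
    # Derived values keyed by kind, e.g. a normalized quantity.
    derived_values: dict = field(default_factory=dict)


def serialize_for_api(snak):
    """Output serialization: derived values are included if present."""
    data = {"property": snak.property_id, "value": snak.value}
    if isinstance(snak, ExtendedPropertyValueSnak):
        data["derived"] = dict(snak.derived_values)
    return data


def serialize_for_storage(snak):
    """Storage serialization: refuse anything marked as derived."""
    if isinstance(snak, DerivedData):
        raise TypeError("derived data must not be written to the database")
    return {"property": snak.property_id, "value": snak.value}


plain = PropertyValueSnak("P1", "12 cm")
extended = ExtendedPropertyValueSnak("P1", "12 cm", {"normalized": "0.12 m"})
```

Here `serialize_for_storage(plain)` succeeds while `serialize_for_storage(extended)` raises, which is the safeguard the marker is meant to provide. The same check could guard input to EditEntity.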
For implementing the features above, there are several options:
1. Add knowledge to the serializer
2. Wrap another set of serializers around the basic serializers
3. Post-process the resulting array structure to apply the necessary changes
4. Represent them in the data model explicitly
5. Layer another representation model on top of the basic data model
For practical reasons, things that only change the way the data is represented should probably be implemented inside the serializer (option 1 or 2, possibly 3 if there is a very good reason). Features that add or remove information (such as filtering or language fallback) should be represented in the model (option 4 or 5). Care must be taken to avoid serializing such derivative data into the database, or accepting it as input of edits.
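As a rough illustration of option 2, the sketch below wraps a basic serializer and applies injected callbacks to its output, so that MediaWiki-specific knowledge such as revision info stays outside the basic serializer. It is Python for illustration only; `serialize_entity`, `DecoratingSerializer`, `add_revision_info`, and the output keys are hypothetical stand-ins, not Wikibase code.

```python
def serialize_entity(entity):
    """Stand-in for the basic serializer: knows only the data model."""
    return {"id": entity["id"], "labels": dict(entity["labels"])}


class DecoratingSerializer:
    """Wraps a basic serializer and applies callbacks to its output."""

    def __init__(self, inner, decorators):
        self._inner = inner
        self._decorators = list(decorators)

    def serialize(self, entity):
        data = self._inner(entity)
        for decorate in self._decorators:
            data = decorate(data)
        return data


def add_revision_info(revision_id, timestamp):
    """Build a callback that injects revision meta-data into the result."""
    def decorate(data):
        # Return a new dict so the basic serializer's output is not mutated.
        return {**data, "lastrevid": revision_id, "modified": timestamp}
    return decorate


entity = {"id": "Q64", "labels": {"en": "Berlin"}}
serializer = DecoratingSerializer(
    serialize_entity,
    [add_revision_info(12345, "2015-05-11T00:00:00Z")],
)
result = serializer.serialize(entity)
```

The basic serializer never sees the revision fields; only the caller that builds the callback list knows what they mean. The same mechanism could carry other API-specific markers.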