Change Details

(Description updated after a discussion between Jan, Katie, Thiemo, and Daniel 2015-05-11)

WikibaseDataModelSerialization does not currently support all features that the old serializers in WikibaseLib support. In particular:

* include the property data type in Property(Value)Snaks. The main issue here is avoiding circular dependencies between the serializer and the service interfaces.
* grouped vs ungrouped output of Statements, Qualifiers, and Reference Snaks. Or decide we want to ditch the ungrouped representation.
* forcing (some) maps to lists, T78653
** we may no longer need this for the XML output, if "indexTagName" now overrides explicit array keys in ApiResult
** currently, there is an option to force maps to objects - that should probably always be done, to avoid issues with the representation of empty maps in JSON.
* injecting "indexTagName" and other meta-info for the MediaWiki API serializer. Hard-coded knowledge about these markers could be kept out of the serializer by using some sort of callback logic. See T78652
* maintain ordered maps: JSON does not specify the order of entries in a map, so we need to make that order explicit; it should at least be stable.
* inject revision info (revision id, timestamp, page title). This is MediaWiki-specific; the serializer shouldn't know the meaning of such meta-data fields.
* deletion markers for deleted elements in API responses. We may want to implement this by representing deletions in the DataModel somehow.

Additional features we want to support in the future, but should already consider when thinking about a solution for the issues above:

* Derived values in Snaks (normalized quantities, expanded external IDs, etc). Could be represented in the model in a way similar to TermFallback: ExtendedPropertyValueSnak would extend PropertyValueSnak (and perhaps implement a marker interface, to safeguard against serialization into the database). Note: derived values should not be supported as input to EditEntity, etc.
* Filtered representations (terms filtered by language, sitelinks filtered by family or language, etc). Filtering can be done before serialization, but should be marked in the data model and in the serialization, to avoid data loss during round trips. (T73512)
* Represent incoming redirects as "secondary ids" in the serialized entity data structure (T98039)

For implementing the features above, there are several options:

# Add knowledge to the serializer
# Wrap another set of serializers around the basic serializers
# Post-process the resulting array structure to apply the necessary changes
# Represent them in the data model explicitly
# Layer another representation model on top of the basic data model

For practical reasons, things that only change the way the data is represented should probably be implemented inside the serializer (option 1 or 2, possibly 3 if there is a very good reason). Features that add or remove information (such as filtering or language fallback) should be represented in the model (options 4 or 5). Care must be taken to avoid serializing such derivative data into the database, or accepting it as input of edits.

(Description updated after a discussion between Jan, Katie, Thiemo, and Daniel 2015-05-11)

WikibaseDataModelSerialization does not currently support all features that the old serializers in WikibaseLib support. In particular:

* include the property data type in Property(Value)Snaks. The main issue here is avoiding circular dependencies between the serializer and the service interfaces.
* grouped vs ungrouped output of Statements, Qualifiers, and Reference Snaks. Or decide we want to ditch the ungrouped representation, see T78653.
* forcing (some) maps to lists
** we may no longer need this for the XML output, if "indexTagName" now overrides explicit array keys in ApiResult
** currently, there is an option to force maps to objects - that should probably always be done, to avoid issues with the representation of empty maps in JSON.
* injecting "indexTagName" and other meta-info for the MediaWiki API serializer. Hard-coded knowledge about these markers could be kept out of the serializer by using some sort of callback logic. See T78652
* maintain ordered maps: JSON does not specify the order of entries in a map, so we need to make that order explicit; it should at least be stable.
* inject revision info (revision id, timestamp, page title). This is MediaWiki-specific; the serializer shouldn't know the meaning of such meta-data fields.
* deletion markers for deleted elements in API responses. We may want to implement this by representing deletions in the DataModel somehow.

Additional features we want to support in the future, but should already consider when thinking about a solution for the issues above:

* Derived values in Snaks (normalized quantities, expanded external IDs, etc). Could be represented in the model in a way similar to TermFallback: ExtendedPropertyValueSnak would extend PropertyValueSnak (and perhaps implement a marker interface, to safeguard against serialization into the database). Note: derived values should not be supported as input to EditEntity, etc.
* Filtered representations (terms filtered by language, sitelinks filtered by family or language, etc). Filtering can be done before serialization, but should be marked in the data model and in the serialization, to avoid data loss during round trips. (T73512)
* Represent incoming redirects as "secondary ids" in the serialized entity data structure (T98039)

For implementing the features above, there are several options:

# Add knowledge to the serializer
# Wrap another set of serializers around the basic serializers
# Post-process the resulting array structure to apply the necessary changes
# Represent them in the data model explicitly
# Layer another representation model on top of the basic data model

For practical reasons, things that only change the way the data is represented should probably be implemented inside the serializer (option 1 or 2, possibly 3 if there is a very good reason). Features that add or remove information (such as filtering or language fallback) should be represented in the model (options 4 or 5). Care must be taken to avoid serializing such derivative data into the database, or accepting it as input of edits.
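
To make option 2 a bit more concrete, here is a rough sketch of a wrapping serializer that adds the property data type through a callback, so the basic serializer never depends on a service interface. It is written in Python purely for illustration (the real code is PHP), and every name in it (`BasicSnakSerializer`, `DataTypeInjectingSerializer`, the `lookup` callback, the dict-based snak) is a hypothetical stand-in, not an existing Wikibase class or API.

```python
from typing import Any, Callable, Dict, Optional


class BasicSnakSerializer:
    """Hypothetical basic serializer: knows only the data model."""

    def serialize(self, snak: Dict[str, Any]) -> Dict[str, Any]:
        return {"snaktype": snak["type"], "property": snak["property"]}


class DataTypeInjectingSerializer:
    """Hypothetical wrapper (option 2): decorates the basic output.

    The data type lookup is passed in as a plain callback, so neither
    serializer needs to know about a lookup service.
    """

    def __init__(
        self,
        inner: BasicSnakSerializer,
        lookup: Callable[[str], Optional[str]],
    ) -> None:
        self._inner = inner
        self._lookup = lookup

    def serialize(self, snak: Dict[str, Any]) -> Dict[str, Any]:
        # Copy so the inner serializer's result is never mutated.
        out = dict(self._inner.serialize(snak))
        data_type = self._lookup(snak["property"])
        if data_type is not None:
            # Only add the field when the callback knows the property.
            out["datatype"] = data_type
        return out


# Usage: a dict's .get method stands in for a real lookup service.
known_types = {"P31": "wikibase-item"}
serializer = DataTypeInjectingSerializer(BasicSnakSerializer(), known_types.get)
result = serializer.serialize({"type": "value", "property": "P31"})
```

The same shape would probably work for injecting "indexTagName" or revision info: the wrapper receives the extra data (or a callback producing it) and the basic serializer stays unaware of it.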

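
For comparison, option 3 (post-processing the resulting array structure) could look roughly like the sketch below, which turns a map into a list so that the order of entries becomes explicit and survives JSON round trips. Again this is Python for illustration only; `map_to_list`, the `key_name` parameter, and the sample label data are assumptions made for the example, not part of any existing serializer.

```python
import json
from typing import Any, Dict, List


def map_to_list(mapping: Dict[str, Dict[str, Any]], key_name: str) -> List[Dict[str, Any]]:
    """Hypothetical post-processing step: force a map to a list.

    Each entry keeps its former map key under `key_name`, and the list
    preserves the insertion order of the map (Python 3.7+ dicts are
    ordered), so the order is explicit in the output.
    """
    return [{**value, key_name: key} for key, value in mapping.items()]


# Usage with made-up label data.
labels = {
    "en": {"value": "Berlin"},
    "de": {"value": "Berlin (de)"},
}
as_list = map_to_list(labels, "language")

# JSON arrays are ordered, so the order survives a round trip.
round_tripped = json.loads(json.dumps(as_list))
```

An empty map simply becomes an empty list here, which sidesteps the empty-map representation issue for this case, but whether consumers expect a list or an object in that position would still need to be decided.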