This tracking ticket governs the implementation of derived (secondary) information in the Wikibase data model. Examples of derived data:
- a Snak's datatype
- a SiteLink's URL
- a Badge's thumbnail URL
- an external ID's URI
- a quantity's normalized value
- a Statement's constraint violations
In addition to the above, the data model should be able to support deferred deserialization. The mechanism for secondary information should at least not obstruct the implementation of deferred deserialization.
Design
We are looking for a mechanism that allows arbitrary information to be "glued onto" Wikibase data model objects, for use by specialized code when rendering, serializing, or exporting (parts of) the data model. After some investigation (see T112550 and T112547), we have already ruled out some options. In particular, we decided that we want to represent the extra information inside the data model, for symmetry with client-side code, for usability of the code base, and because of the complexity the alternative solutions would require for some of the edge cases. For representation inside the data model, we found that specialized read models for every purpose cannot really be combined, and mean a lot of duplicate code to maintain. Putting knowledge about all possible extra information into the core model makes the model bloated and inflexible.
Some research turned up the Role Object Pattern as a good solution to our problem: alternative "views" of a component can be attached dynamically, simply by adding an object that implements the desired "role". This pattern has been described several times in the literature:
- The Portland Pattern Wiki (hosted by Cunningham & Cunningham) describes the RoleObjectPattern.
- Martin Fowler apparently coined the term "role object", see his paper On Roles.
- Another well known paper describing this pattern is The Role Object Pattern by Dirk Bäumer, Dirk Riehle, Wolf Siberski, and Martina Wulf. (Interestingly, Dirk Riehle is a well known wiki researcher, see for instance his paper How and Why Wikipedia Works)
- In section 6.4 of the CORBA component model, such roles are known as "facets", which are also described at FacetPattern on the Portland Pattern Wiki.
- There is also a paper called Facet: A pattern for dynamic interfaces by Eric Crahen (the author of the ZThreads library), which explores this concept for a more concrete use case.
- Another rather similar concept is the Extension Interface as described by Douglas C. Schmidt. Another good description of this is The Extension Objects Pattern by Erich Gamma.
The definitions and implementations differ slightly, but the general idea is always to allow a component (object) to offer different interfaces for different tasks by somehow attaching other objects that implement these interfaces at runtime. For Wikibase, the role object pattern could be used as follows:
```php
interface RoleEnabled {
    function getRoleManager();
}

interface RoleManager {
    function hasRole( $name );
    function getRoleObject( $name, $type = null );
    function addRoleObject( $name, $object );
}

interface LinkTargetProvider {
    function getLinkTargetUrl();
}

interface DataTypeProvider {
    function getDataType();
}

interface DerivedValueProvider {
    function hasDerivedValue( $name );
    function getDerivedValue( $name );
}
```
The relevant classes in the data model would then each need to implement RoleEnabled.
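To make these interfaces more concrete, here is a minimal sketch of two role objects for derived data listed at the top of this ticket (a SiteLink's URL and a quantity's normalized value). The class names StaticLinkTargetProvider and ArrayDerivedValueProvider, the "normalized" key, and throwing OutOfBoundsException for unknown names are assumptions for illustration only. The values are assumed to be computed elsewhere; the role objects merely carry them.

```php
<?php

// Interfaces as proposed above, repeated so the sketch is self-contained.
interface LinkTargetProvider {
    public function getLinkTargetUrl();
}

interface DerivedValueProvider {
    public function hasDerivedValue( $name );
    public function getDerivedValue( $name );
}

// Hypothetical role object holding a precomputed URL (e.g. a SiteLink's URL).
class StaticLinkTargetProvider implements LinkTargetProvider {
    private $url;

    public function __construct( $url ) {
        $this->url = $url;
    }

    public function getLinkTargetUrl() {
        return $this->url;
    }
}

// Hypothetical role object holding a name => value map of precomputed
// derived values (e.g. a quantity's normalized value).
class ArrayDerivedValueProvider implements DerivedValueProvider {
    private $values;

    public function __construct( array $values ) {
        $this->values = $values;
    }

    public function hasDerivedValue( $name ) {
        return array_key_exists( $name, $this->values );
    }

    public function getDerivedValue( $name ) {
        // Assumption: asking for an unknown derived value is an error.
        if ( !$this->hasDerivedValue( $name ) ) {
            throw new OutOfBoundsException( "No derived value named '$name'" );
        }
        return $this->values[$name];
    }
}
```

A role object like this knows nothing about the model object it is attached to; it only implements the interface that a consumer asks for.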
For performance reasons, the RoleManager instance should be created on demand. The code for maintaining the RoleManager instance would then be duplicated in each relevant data model class.
```php
class PropertyValueSnak implements RoleEnabled {
    private $roleManager = null;

    public function getRoleManager() {
        // Create the RoleManager lazily, on first access.
        if ( $this->roleManager === null ) {
            $this->roleManager = new SimpleRoleManager();
        }
        return $this->roleManager;
    }
}

class Term implements RoleEnabled { ... }

class Statement implements RoleEnabled { ... }
```
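The snippet above relies on a SimpleRoleManager class that this ticket does not define. Below is a minimal array-backed sketch. Its behaviour (returning null when a role is missing, or when it is not an instance of the requested $type) is an assumption, and ExampleSnak is a hypothetical stand-in for PropertyValueSnak.

```php
<?php

interface RoleEnabled {
    public function getRoleManager();
}

interface RoleManager {
    public function hasRole( $name );
    public function getRoleObject( $name, $type = null );
    public function addRoleObject( $name, $object );
}

// Hypothetical array-backed RoleManager.
class SimpleRoleManager implements RoleManager {
    /** @var object[] role name => role object */
    private $roles = [];

    public function hasRole( $name ) {
        return isset( $this->roles[$name] );
    }

    // Assumption: returns null if the role is missing or, when $type (a class
    // or interface name) is given, if the role object is not an instance of it.
    public function getRoleObject( $name, $type = null ) {
        $object = $this->roles[$name] ?? null;
        if ( $type !== null && !( $object instanceof $type ) ) {
            return null;
        }
        return $object;
    }

    public function addRoleObject( $name, $object ) {
        $this->roles[$name] = $object;
    }
}

// Hypothetical stand-in for PropertyValueSnak: the manager is created lazily.
class ExampleSnak implements RoleEnabled {
    private $roleManager = null;

    public function getRoleManager() {
        if ( $this->roleManager === null ) {
            $this->roleManager = new SimpleRoleManager();
        }
        return $this->roleManager;
    }
}
```

Because the manager is only allocated on the first getRoleManager() call, objects that never receive a role carry nothing but a null field. Roles can be attached at any point after construction.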
Note that roles can be added freely over the lifetime of an object; not all roles will be known when the object is instantiated.
The design outlined above provides quite a bit of flexibility without forcing any code to know about the new "role" concept. All support for roles is completely optional:
- When deserializing from the database, no role objects (and not even RoleManager objects) get instantiated.
- When deserializing API input, extra information can easily be ignored or rejected.
- When deserializing API output on the client, extra information can be handled by a "role deserializer" facility, and attached to the appropriate model objects.
- Roles may be defined outside Wikibase: the WikibaseQuality extension may define a ConstraintViolationProvider role, and attach ConstraintViolationProvider role objects to the model when and where desired.
- Formatters and other export mappings can make use of optional information in a type-safe way.
- The model can easily be extended to accept callbacks for instantiating role objects on demand.
- Role objects may implement RoleEnabled themselves, and may maintain a "backlink" to the original object (making the question of which object is the "main" object rather academic).
One major concern here is performance: the infrastructure for managing role objects means allocating extra objects (the RoleManager and probably an array inside the RoleManager) in addition to the actual role objects. This can be avoided at the cost of some of the abstraction:
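The points about type-safe formatters and on-demand callbacks can be sketched together: a RoleManager that also accepts factory callbacks, and a formatter that only uses a role if it is present and has the expected type. LazyRoleManager, addRoleFactory(), formatViolations(), the "constraintViolations" role name, and the getConstraintViolations() method are all hypothetical; the ticket names the ConstraintViolationProvider role but does not define its interface.

```php
<?php

interface RoleManager {
    public function hasRole( $name );
    public function getRoleObject( $name, $type = null );
    public function addRoleObject( $name, $object );
}

// Hypothetical method; only the role name comes from the ticket.
interface ConstraintViolationProvider {
    /** @return string[] */
    public function getConstraintViolations();
}

// Hypothetical RoleManager that also accepts factory callbacks; a role object
// is only instantiated the first time it is requested.
class LazyRoleManager implements RoleManager {
    private $roles = [];
    private $factories = [];

    public function hasRole( $name ) {
        return isset( $this->roles[$name] ) || isset( $this->factories[$name] );
    }

    public function getRoleObject( $name, $type = null ) {
        if ( !isset( $this->roles[$name] ) && isset( $this->factories[$name] ) ) {
            $this->roles[$name] = call_user_func( $this->factories[$name] );
            unset( $this->factories[$name] );
        }
        $object = $this->roles[$name] ?? null;
        // Only hand out the role object if it has the requested type.
        if ( $type !== null && !( $object instanceof $type ) ) {
            return null;
        }
        return $object;
    }

    public function addRoleObject( $name, $object ) {
        $this->roles[$name] = $object;
        unset( $this->factories[$name] );
    }

    public function addRoleFactory( $name, callable $factory ) {
        $this->factories[$name] = $factory;
        unset( $this->roles[$name] );
    }
}

// Hypothetical formatter: uses the optional role if present, degrades gracefully if not.
function formatViolations( RoleManager $roles ) {
    $provider = $roles->getRoleObject( 'constraintViolations', ConstraintViolationProvider::class );
    return $provider === null ? '' : implode( '; ', $provider->getConstraintViolations() );
}
```

The formatter never needs to know whether the role was attached eagerly, created from a callback, or is absent altogether.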
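```php
// Roles are stored as dynamic properties named "_role_<name>".
#[\AllowDynamicProperties] // required on PHP 8.2+, inherited by subclasses
class DataModelObject implements RoleManager {
    function hasRole( $name ) {
        return isset( $this->{'_role_' . $name} );
    }
    function getRoleObject( $name, $type = null ) {
        $object = $this->{'_role_' . $name} ?? null;
        return ( $type === null || $object instanceof $type ) ? $object : null;
    }
    function addRoleObject( $name, $object ) {
        $this->{'_role_' . $name} = $object;
    }
}
```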
Data model objects would then use DataModelObject as their base class:
```php
class PropertyValueSnak extends DataModelObject { ... }
class Term extends DataModelObject { ... }
class Statement extends DataModelObject { ... }
```
This is admittedly a bit hacky, but it avoids instantiating any extra objects, and roles are still accessed via a well-defined interface.
Breakdown
TBD