[RFC] Design of provider/holder interfaces in Wikibase's data model
Closed, ResolvedPublic
Actions

Assigned To

None

Authored By

	Bene
	Feb 7 2016, 10:35 AM

Description

The current interfaces to access terms (LabelsProvider, DescriptionsProvider and AliasesProvider) all state in their contract: "It is not guaranteed that this method returns the original object." This is a problem because with that sentence we cannot rely on the fact that changes made to the TermList returned by the method are visible to the provider.

We have three options so solve this issue:

We can either create a second list of interfaces called LabelsHolder, DescriptionsHolder and AliasesHolder. This is how we solved the issue for Fingerprint and StatementList. The disadvantages are that we get the overhead of three additional types and that we have to call the setter everytime we want to change someting. Code will always be like $var = $obj->getFoo(); $var->setBar( 'baz' ); $obj->setFoo( $var );.
Another option is to change the contract of the interfaces so that they always return a reference to the original object in the provider. We can then be sure that all changes are visible to the provider. However, there is no option to replace the entire object in case we want to clear the list (for example).
Third option would be to merge all the "provider" and "holder" interfaces. We than don't have the choice to provide immutable providers but we don't have the overhead of three (+two) additional interfaces.

It would be great to get some opinions on this question so that we can make a decision on the architecture of our data model.

Related Objects
Search...

Status	Subtype	Assigned	Task
Declined		dchen	T118706 Conduct heuristic evaluation of image upload and insert flow in VisualEditor
Open		None	T115858 Design improvements for mw.ForeignStructuredUpload.BookletLayout
Open		None	T115865 Insert image in content immediately after it's uploaded, skipping the "General settings" step
Duplicate		None	T115864 Figure out if the description of the image can be used as the caption on-wiki
Open	Feature	None	T53032 When inserting an image, set its caption by default to be the Commons image description
Duplicate		None	T39535 Wikimedia Commons should support filtering by color
Open	Feature	None	T39534 Wikimedia Commons should support searching by color
Resolved		None	T19503 Provide metadata support on Wikimedia Commons
Resolved		None	T51662 VisualEditor: Use Multimedia/Wikidata's proposed rich structured meta-data in the image insertion dialog
Resolved		None	T68108 [Epic] Store media information for files on Wikimedia Commons as structured data
Duplicate		None	T66288 basic support for structured data on mediawiki files
Invalid		Lydia_Pintscher	T76012 make use of new entity type for multimedia / structured data of media files
Resolved		daniel	T134473 Create MediaInfo / Linking multimedia files to their mediainfo entity (second prototype)
Resolved		Tobi_WMDE_SW	T125822 [Epic] Basic first prototype for structured data support for Commons
Resolved		None	T128467 [Task] Request project on gerrit for WikibaseMediaInfo
Resolved		Luke081515	T128469 [Task] Request phabricator project for WikibaseMediaInfo extension
Resolved		Bene	T128470 [Task] Create serializers for MediaInfo entity type
Resolved		Bene	T125824 [Task] Create a new media info entity type
Resolved		Bene	T128471 [Task] Create an alpha setup on labs for the MediaInfo extension
Resolved		adrianheine	T129275 [Task] Create js widget for media info entity type
Resolved		adrianheine	T128472 [Task] Create entity view for media info entity type
Resolved		Bene	T127191 [Task] Allow extensions to register additonal EntityViews
Resolved		Bene	T129380 [Task] Create MediaWiki content handler for mediainfo entity/content type
Resolved		thiemowmde	T132442 [Task] Introduce differ for MediaInfo
Duplicate		None	T133043 [Task] EntityFactory::newEmpty does not work for MediaInfo
Declined		adrianheine	T133044 [Task] Handle cases where assiging a fresh ID to a new entity expects numeric entity IDs
Resolved		adrianheine	T133767 Add EntityFactory callback to MediaInfo entity type definition
Resolved		daniel	T132964 Allow EntityFactory to work in Wikibase for more entity types
Resolved		adrianheine	T134264 [Task] Add EntityId builder pair to MediaInfo entity type definition
Resolved		adrianheine	T134265 [Task] Add EntityDifferStrategy builder to MediaInfo entity type definition
Resolved		adrianheine	T134873 Make EntityRevisionLookup not rely on entity_per_page
Declined		None	T134874 Don't insert media info into entity_per_page
Resolved		adrianheine	T135593 [Task] Add default permissions for MediaInfo
Resolved		daniel	T134261 [Task] Transclude mediainfo entity in file-page
Resolved		daniel	T134260 [Task] Link from MediaInfo entity to file description page
Resolved		daniel	T134259 [Task] Create mediainfo entity on demand when saving the first time after editing labels, descriptions and aliases
Resolved		daniel	T134258 [Task] Show empty mediainfo for mediainfo ids that correspond to existing file-pages
Resolved		daniel	T140760 [Task] Create mediainfo entity on demand when adding a statement the first time
Resolved		Lydia_Pintscher	T134756 [Task] Update demo-system to show mediainfo entity
Open		None	T109579 [Epic] Give more sister projects access to Wikidata
Open		None	T187900 There is no way to reference a specific quote on Wikiquote
Stalled		None	T71753 [Story] Wikibase / Wikidata support on Wikiquote
Open		None	T67626 [Epic] Support for queries on-wiki (automated list generation)
Resolved		Addshore	T76019 [Story] Support new types of Entities in Wikibase Client
Resolved		thiemowmde	T135650 [Task] Migrate PropertySuggester away from assuming all entities are numeric
Resolved		Addshore	T75496 [Epic] Support new types of Entities in Wikibase Repository
Resolved		thiemowmde	T78298 [Task] Split Diff/Patching code to allow constructing other Differs/Patchers
Resolved		JeroenDeDauw	T45326 Cleanup EntityDiff
Resolved		thiemowmde	T128468 [Task] Release DataModel 6.0
Declined		adrianheine	T133861 Stop using numeric ids in wb_entity_per_page
Resolved		Bene	T128378 Release DataModel 5.1
Resolved		Bene	T128362 [Task] Deprecate and remove Holder interfaces
Resolved		Bene	T128363 [Task] Add clear methods to list classes
Resolved		None	T126152 [RFC] Design of provider/holder interfaces in Wikibase's data model

Event Timeline

Bene created this task.Feb 7 2016, 10:35 AM

Bene raised the priority of this task from to Needs Triage.

Bene updated the task description. (Show Details)

Bene added projects: Wikidata, Wikibase-DataModel, Wikidata-Sprint-2016-02-02.

Bene added subscribers: Bene, thiemowmde, JeroenDeDauw, daniel.

Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptFeb 7 2016, 10:35 AM

Bene moved this task from Proposed to Review on the Wikidata-Sprint-2016-02-02 board.Feb 7 2016, 10:36 AM

It's generally good to design interfaces from the perspective of their users. This is done by looking at each user and determining what would be best for them, as opposed to trying to come up with an interface that suits all users (including those that do not exist yet). The existing provider interfaces serve their users well, and have optimal interface segregation (their users do not end up depending on stuff they do not need). If there is a new class of users, namely those where immutability is not desired, then creating a new interface is typically better than making the interface less ideal for both users and implementers.

I think that looking at the number of interfaces one introduces is a very bad way of gauging complexity and maintenance cost. These come from the new functionality (the mutability) and the coupling caused by it.

I'm a bit worried about the contract of the existing method though. It says no guarantee of mutability is provided, which means no guarantee of immutability is provided either. Users might use mutability they should not rely upon without this being noticed. After all, tests will pass, and the violation is not obvious, esp if the programmers are thinking about a specific concrete implementation where mutability is guaranteed. They have have to notice the docs in the interface.

One way around this later concern is to have a dedicated interface for the case where users do not need mutability and implementers might not be able to provide it. In this interface the contract would be guaranteed lack of mutability. Users that would need mutability would use a different interface that does not, and this is key, derive from the other one. This interface would guarantee mutability.

I'm not aware of which users you are trying to cater to, or what kind of implementations you have in mind. It depends on these what the best solutions are.

It is not guaranteed that this method returns the original object.

Weird contract, as there presumably is no guarantee there is such a thing as an original object. (the return value can be created on the fly)

I'm a bit worried about the contract of the existing method though.

I agree that this is not optimal for the reasons you pointed out. It would indeed be nicer if immutability could be provided. However, to achieve that we either have to make TermList, StatementList etc. immutable or we have to do a full object clone everytime we access the object which sounds quite expensive.

I'm not aware of which users you are trying to cater to, or what kind of implementations you have in mind. It depends on these what the best solutions are.

The concrete users of these interfaces are the ones that currently use FingerprintHolder. We cannot longer bind terms to a fingerprint because not all entities have the combination a fingerprint provides (eg. the media type will only have labels and descriptions). Therefore, we need a way to tell a LabelsProvider that some label should change.

Regarding mutability: most of our data model objects are mutable. The idea is that we want to be able to build or modify domain objects simply and efficiently. For the same reason, the semantics for mutable objects returned from getters should always be that any changes to that object will be reflected in the "parent object".

The "mutable copy" semantics is something I would generally like to avoid. It's inefficient and confusing. Either our model objects are mutable, or they aren't. I see no use case for modifying a term list or sitelinks list or whatever in which we would not want that change to be reflected in the entity. Well, ok, maybe for filtering during output, though that can also be done on the fly. But if we want a copy, we should explicitly create a copy, not rely on the getter to always implicitly provide a copy, even if none is needed because no modifications are done.

• Johsthao closed this task as a duplicate of T126250: <spam>.Feb 8 2016, 6:24 PM

JEumerus reopened this task as Open.Feb 8 2016, 6:29 PM

JEumerus set Security to None.

Where do we currently need mutability of our data model objects?

Tobi_WMDE_SW triaged this task as Medium priority.Feb 9 2016, 11:45 AM

Tobi_WMDE_SW added a project: Proposal.

@adrianheine we use the mutable interface to build the entity object during deserialization, but more importantly, we use it when applying ChangeOps to perform edits. I'm not sure we use it for filtering - I think we construct new instances in that case. But I could be wrong.

I suppose ChangeOps could easily work with immutable objects by having setters return new objects. For deserialization, this would probably produce a lot of objects, though. Can't we deserialize bottom-up to avoid this?

@adrianheine If you want to set a sitelink, the setter should return a new Item? That sounds rather awkward and inefficient.

We could re-design the data model to be immutable. It would mean rewriting a lot of code. I don't see that happening.

Generally: Manipulating complex data structures efficiently in a purely functional style means using "zippers" of some kind, which as far as I understand need good reference handling, and good memoization. Both would be a pain to do in PHP.

Why would that be awkward? It's another item. Also, you don't need the old one afterwards. I think in ChangeOps it would not look worse. Also, what would be inefficient about that? We are neither doing a lot of changes in ChangeOps nor are we working a lot with the entities after having changed them, so both immediate and lazy manipulations should be efficient enough.

If I get this ticket right we'll probably have to rewrite a lot of code anyway, and we can think upfront about what it should look like afterwards.

In T126152#2021847, @adrianheine wrote:

Why would that be awkward?

For one thing, because the SiteLinkList (and ReferenceList, etc) would have to know how to construct an entity, and which concrete type of entity to construct, and how. I see no good way to do that.

Or, if you spread that over multiple levels, you end up with something like

$statements = $entity->getStatementList();
$statement = $statements->getStatement( $hash );
$references = $statement->setReferenceList();
$entity = $entity->setStatementList(
  $statements->updateStatement(
    $hash,
    $statement->setReferenceList(
      $references->addReference( $reference ) ) ) );

Reads like lisp. I'd call it awkward compared to

$entity->getStatementList()
  ->getStatement( $hash )
    ->getReferenceList()
      ->addReference( $reference )

It's another item. Also, you don't need the old one afterwards.

If you don't need the old one afterwards, why create a new one instead of recycling the old one?

I think in ChangeOps it would not look worse. Also, what would be inefficient about that? We are neither doing a lot of changes in ChangeOps nor are we working a lot with the entities after having changed them, so both immediate and lazy manipulations should be efficient enough.

Duplicating the entire data structure for every change of any part of it seems rather wasteful. But you are right that it would be acceptable for edits. It wouldn't be acceptable for filtering or augmenting a data structure for output.

If I get this ticket right we'll probably have to rewrite a lot of code anyway, and we can think upfront about what it should look like afterwards.

I don't see that, actually. We are adding interfaces that will mostly just declare methods that already exist in the implementations. If we change contracts, we'll have to check where that may cause problems, but this is a far cry from changing the paradigm from "manipulate a data structure using methods" to "construct derived data structure using pure functions".

In T126152#2021971, @daniel wrote:

In T126152#2021847, @adrianheine wrote:

Why would that be awkward?

For one thing, because the SiteLinkList (and ReferenceList, etc) would have to know how to construct an entity, and which concrete type of entity to construct, and how. I see no good way to do that.

Yeah, well, that would be awkward.

Or, if you spread that over multiple levels, you end up with something like

$statements = $entity->getStatementList();
$statement = $statements->getStatement( $hash );
$references = $statement->setReferenceList();
$entity = $entity->setStatementList(
  $statements->updateStatement(
    $hash,
    $statement->setReferenceList(
      $references->addReference( $reference ) ) ) );

That's basically what I would suggest. We could add some delegation methods (Item::getStatement( $hash )), but it boils down to this.

It's another item. Also, you don't need the old one afterwards.

If you don't need the old one afterwards, why create a new one instead of recycling the old one?

Because it's much easier to reason about immutable data structures, and because immutable data structures can usually be implemented more efficiently.

I think in ChangeOps it would not look worse. Also, what would be inefficient about that? We are neither doing a lot of changes in ChangeOps nor are we working a lot with the entities after having changed them, so both immediate and lazy manipulations should be efficient enough.

Duplicating the entire data structure for every change of any part of it seems rather wasteful. But you are right that it would be acceptable for edits. It wouldn't be acceptable for filtering or augmenting a data structure for output.

That's where lazy manipulations could be used (FilteredEntity(Entity, EntityFilter) implements Entity). Quite difficult without generics.

If I get this ticket right we'll probably have to rewrite a lot of code anyway, and we can think upfront about what it should look like afterwards.

I don't see that, actually. We are adding interfaces that will mostly just declare methods that already exist in the implementations. If we change contracts, we'll have to check where that may cause problems, but this is a far cry from changing the paradigm from "manipulate a data structure using methods" to "construct derived data structure using pure functions".

We will have to switch stuff from fingerprint to individual term manipulation. It's probably not that much, though.

@daniel I think you're not getting what Adrian means.

class Item {

    public function withSiteLink( SiteLink $link ) {
        return new Item( /* stuff with new link added */ );
    }

}

You'd not get dependencies from the smaller objects onto the bigger ones.

Making all model objects immutable is not what this issue is about though. This is however something that we've been "oh that would be nice" at since the start of the project (long before we had a DataModel component), though something that while nice in theory, does not seem practical in PHP.

@JeroenDeDauw I did get that, but that only works for things that exist on the Entity level. Or you need to have setters for *every* part of the model on the Entity level. Item::addBadgeToSiteLink( $siteId, $badge ), Item::addReferenceToStatement( $hash, $reference ), etc. Which in turn means that everything that modifies badges would have to operate on Items, even though it shouldn't have to know about entities at all.

Yes, possible, but awkward.

@adrianheine I basically agree with you that formally, that would be really nice. But practically, with PHP and the existing code base, it's pretty much out of the question.

thiemowmde added a project: Wikidata-Sprint-2016-02-16.Feb 16 2016, 2:05 PM

thiemowmde moved this task from Proposed to Review on the Wikidata-Sprint-2016-02-16 board.

Bene added a parent task: T75496: [Epic] Support new types of Entities in Wikibase Repository.Feb 16 2016, 7:33 PM

I vote for mutability. All provider interfaces (including the existing FingerprintProvider and possibly more) should guarantee they do nothing but returning what they have. If an implementation can not simply return an Object as it is but has to construct it whenever the getter is called, it should not use this interface. It should not be a requirement that a getter must do expensive cloning before returning anything. This should be a separate thing, which we just recently discussed, see https://github.com/wmde/WikibaseDataModel/pull/626. This also caused trouble in Wikibase-DataModel-JavaScript, see https://github.com/wmde/WikibaseDataModelJavaScript/pull/47 and much more patches there.

From my point of view this was one of the top reasons why we introduced "trivial" array wrappers like StatementList and so on. You can do …

$fingerprint = $item->getFingerprint();
$fingerprint->setLabel( … );

… and be fine. This is not possible with arrays. A user calling a setter like in the example above should not be enforced to call an other setter. This feels like calling setForReal(). I find this confusing and can tell from experience that this is error-prone.

I consider calling setFingerprint a mistake in all cases. We do have some calls to this method in our code base because we had somewhat "broken" versions of Wikibase-DataModel that returned a new Fingerprint in getFingerprint. All these setters should be removed.
I consider type hints against a …Holder interface a mistake and code smell when no setter is called.
We currently "misuse" setStatements( new StatementsList() ) to remove all statements from an entity. This can easily be solved by introducing StatementList::clear.

I basically agree with what @thiemowmde wrote.

Does that include we should get rid of all ...Holder interfaces and instead introduce clear methods on StatementList and TermList? We can still have setStatements in Item and Property for tests.

In my opinion deprecating the interfaces and not using them (thus finally removing them) is the cleanest solution and avoids that they are misused as described above.

If there aren't any objections against this, I think we can close this RFC, deprecate the ...Holder interfaces in the next release of Wikibase-DataModel and not use them in Wikibase.

Lydia_Pintscher moved this task from incoming to in progress on the Wikidata board.Feb 18 2016, 12:10 PM

Bene closed this task as Resolved.Feb 29 2016, 10:34 AM

Bene added a parent task: T128362: [Task] Deprecate and remove Holder interfaces.Feb 29 2016, 10:47 AM

Bene added a parent task: T128363: [Task] Add clear methods to list classes.

Bene moved this task from Review to Done on the Wikidata-Sprint-2016-02-16 board.Feb 29 2016, 10:50 AM

[RFC] Design of provider/holder interfaces in Wikibase's data modelClosed, ResolvedPublicActions

Description

Related ObjectsSearch...

Event Timeline

[RFC] Design of provider/holder interfaces in Wikibase's data model
Closed, ResolvedPublic
Actions

Related Objects
Search...