Page MenuHomePhabricator

Allow the EditEntity API module to work with all types of entities
Closed, ResolvedPublic

Description

Currently, Wikibase\Repo\Api\EditEntity::getChangeOps hard-codes what kind of information can exist in an entity, along with knowledge about how that information is represented in the input data.

To allow entity types that have additional fields, we need some sort of plugin mechanism that allows EditEntity::getChangeOps to construct change-ops for the non-standard parts of the entity.

Outline of how the plugin mechanism should look like as of 06.01.2017:

  • there should be a ChangeOpDeserializer interface introduced for services that process serialized change ops and return ChangeOp instances. Initial version of the interface has been introduced in https://gerrit.wikimedia.org/r/#/c/329376/ (already merged), with a bit different approach proposed in https://gerrit.wikimedia.org/r/#/c/330698/,
  • API\EditEntity should not know any details of how to generate change ops from the json-array-like input. It should be done by ChangeOpDeserializer instances.
  • Entity type definitions should provide a new property (work-in-progress name "changeop-deserializer-callback") that would allow EditEntity (and potentially other classes) to instantiate ChangeOp objects relevant for the particular entity type.
  • All item- and property-specific code currently in Api\EditEntity::getChangeOps would be moved to relevant ChangeOpDeserializers and to item and property type definitions.
  • entity type definitions of new entity types (MediaInfo, Lexeme) would declare instantiation of their own specific ChangeOp objects, no longer being bound to labels, descriptions etc.
  • Note: initially we have been considering that entity type definition would expose some callback to Api\EditEntity. That would a function getting in serialized entity change data s provided in API request, and return a ChangeOp instance. Given how ChangeOpDeserializer is modelled now maybe it would make more sense to have a property in entity type definition called "changeop-deserializer" (without a "callback" prefix) that would instantiate the actual ChangeOpDeserializer. That might be cleaner and easier to analyze?

Patches for review:

Related Objects

StatusSubtypeAssignedTask
OpenFeatureNone
OpenFeatureNone
OpenFeatureNone
OpenFeatureNone
OpenNone
OpenNone
ResolvedLydia_Pintscher
ResolvedLydia_Pintscher
ResolvedLydia_Pintscher
ResolvedLydia_Pintscher
ResolvedLydia_Pintscher
ResolvedLydia_Pintscher
ResolvedWMDE-leszek
Resolvedthiemowmde
ResolvedJakob_WMDE
ResolvedLadsgroup
Declinedthiemowmde
Declinedthiemowmde
Resolvedthiemowmde
Resolvedhoo

Event Timeline

Okay, I checked a little, It seems we have two options:

  • Register a hook: We can add a hook to end of Wikibase\Repo\Api\EditEntity::getChangeOps so WikibaseLexeme can patch and send it up.
  • We can add a key to entitytypes array called ApiPatchCallbacks and make Wikibase\Repo\Api\EditEntity::getChangeOps use these patching callbacks.

The former is easy to implement but we might run into trouble if two wikibase extensions use the hook and try to patch same things (e.g. we have two wikibase extensions that add lemmas). The latter is probably a royal pain.

I haven't looked at the problem in detail yet but I am sorry to say it seems to me that having some kind of definition in entitytypes is a better option here. Using hooks seems a bit hacky here, while entity type definitions exist for dealing with such issues as I got it. I can imagine this is not going to be an easy task to change it all to work, that's a shame.

I talked briefly with @daniel about this and he also doesn't like the hook option much. I guess he will chime in here with some more comments soon.

Yes, as Leszek said: hooks should be avoided whenever possible. They are easy to introduce, but they rely on global state. Because of that, they hide information flow, and make testing hard.

So, EditEntity::getChangeOps should use a callback defined in the entity type definition to construct ChangeOps from JSON. The current hard-coded logic in EditEntity should be moved accordingly. I don't think all of that should go into a callback, we'll probably want something like a ChangeOpDeserializer interface (think of the JSON as serialized ChangeOps). The callback would return a list ChangeOpDeserializer appropriate for a given entity type. Ideally, we wouldn't inject an untyped callback, but we'd have a ChangeOpDeserializerFactory interface... but that may be overkill.

The above model is just off the top of my head. There may be better ways. There is already a ChangeOpFactoryProvider and several XyzChangeOpFactory classes that will have to fit into this somehow. The important factors are:

  • EditEntity should not know how to construct ChangeOps
  • the entity type definition array should provide a callback that somehow provides the knowledge of how to construct a list of ChangeOps from the JSON input to EditEntity.
  • that callback (or something that wraps it, or something that was constructed by it) needs to be injected into EditEntity.

Note that EditEntity currently uses pseudo-injection by accessing global state in the constructor, and it does not provide an overrideServices method. Changing this to using proper injection would also be nice. To achieve this, the API module can be registered using a callback instead of a class name.

When working on this, please file the appropriate tickets first.

Change 328357 had a related patch set uploaded (by Ladsgroup):
Make EditEntity injectable for getChangeOps

https://gerrit.wikimedia.org/r/328357

I made https://gerrit.wikimedia.org/r/#/c/328357 before your comment @daniel (I forgot to put Bug: T### properly in the commit message), but it's practically the same. Please give me your feedback and we continue based on your feedback

@Ladsgroup I had a quick look at your patch, and it seems a good first step. It would be nice to do some refactoring to make this a bit cleaner, but it should work as it is.

Too bad PHP's type system doesn't support function signatures, that would remove the need for classes/interfaces to make this kind of thing type safe.

Anyway, I'll have a closer look later. Perhaps make a patch in which MediaInfo uses the new mechanism, so we can see this in operation.

daniel triaged this task as High priority.Jan 5 2017, 4:13 PM

This blocks Lexemes and MediaInfo, so setting to high

@daniel @Ladsgroup @Jakob_WMDE: I've edited a task description adding an outline of how we (or at least I) imagine the mechanism would work. It has not been written down before but I believe we need this. Please go ahead and change the description if I consider something wrong, or move it some other place if there is a better one.

Also pinging @Aleksey_WMDE and @thiemowmde as they have been involved in the discussion on ChangeOpDeserializer interfance.

Vote for dropping "callback" suffix.

@Aleksey it seems a bit redundant here, but we use it for other callbacks in the entity type definition. I vote to keep in in the name of consistency.

We could make it optional for all fields in the entity type definition. But let's have a separate discussion about that.

The more important question I think I failed to make more promiment is whether entity type definition should be providing the ChangeOpDeserializer instance or just some generic callback returning ChangeOp instance. Former seems better to me, regardless how the actual property/field in the type definition will be called. Having callback for consistency seems reasonable IMO.

I just made a new patchset with this following: 1- The definition in array is "changeop-deserializer-callback" for consistency 2- The method in EntityTypeDefinition is "getChagneOpDeserializerCallbacks" also for consistency 3- The attribute in EditEntity API class is "changeOpsDeserializers"

As soon as we don't need to wrap it in a closure I don't see a reason to put callback in the name - it will only bring confusion.
If we need to wrap it in a closure I don't understand why do we need an interface in the first place?

AFAIK it's the plan to wrap it up in a closure (it'll be added here)

If we need to wrap it in a closure I don't understand why do we need an interface in the first place?

The callback is the factory for the actual deserializer object. The entity type definition file functions as a DI wiring file.

It is of course possible to use the callbacks directly. And if PHP has function types, that's what I would suggest doing. But since PHP doesn't have that, it's nicer to have an interface and objects, to provide at least some type safety.

Change 331609 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
More generic "@return ChangeOp" type for EditEntity::getChangeOps

https://gerrit.wikimedia.org/r/331609

Change 328357 merged by jenkins-bot:
Use ChangeOpDeserializer callbacks in Api\EditEntity

https://gerrit.wikimedia.org/r/328357

Change 331609 merged by jenkins-bot:
More generic "@return ChangeOp" type for EditEntity::getChangeOps

https://gerrit.wikimedia.org/r/331609

I think this should be resolved because It's pluggable now (probably we should add some docs too) and we should continue working on lexeme in T155699: Integrate WikibaseLexeme with EditEntity API module

What documentation do you have in mind? Is T154286 enough?

Ladsgroup removed a project: User-Ladsgroup.

I did some parts (subtasks) but I shouldn't get credit for all of it

WMDE-leszek claimed this task.

There might be a need to work further on this but it seems EditEntity API could now work with extension-registered entity types. Therefore I am closing this task.