Page MenuHomePhabricator

Rethink and streamline Lexeme class constructors
Closed, DeclinedPublic

Description

With https://gerrit.wikimedia.org/r/345292 I attempted to:

  1. simply remove code that already started hurting us because of easy to confuse parameters, and
  2. work towards a higher goal of having a set of a few very simple, clean, and easy to use constructors.

I believe the frustration in the patch comments are partly caused by me failing to make 2. transparent. I did not expected the need to explain 2. in advance. I imagined 1. to be enough, and 2. only being relevant in future patches.

What I did now is counting all current usages of the Lexeme constructor. I found only one non-empty constructor in production code (in the deserializer). All other constructor calls are exclusively used in tests, with these combinations of parameters:

  • ID only: 9 usages
  • Lemmas only: 3
  • Lemmas + ID: 4
  • Lexical category only: 1
  • Lexical category + ID: 2
  • Language only: 1
  • Language + ID: 2
  • Lexical category + language only: 3
  • Lexical category + language + ID: 7
  • All: 3 usages

I suggest to reduce the default constructor to a single, still optional ItemId $id = null parameter. This covers the two most relevant use-cases. Currently all entity types have a nullable ID parameter as the first one. I find it nice to keep this symmetry, and would even expect it.

We may add one or two more optional parameters to the default constructor, but this is already highly disputable. Which set of parameters would that be?

  • ID + forms + senses might make sense because this is the rough, basic structure of a Lexeme without going into any detail.
  • ID + lemmas appear to make sense because a Lexeme without at least one lemma is useless in production, similar to an Item with no label. But this is barely needed in tests.
  • ID + statements feels like it doesn't make much sense, because forms and senses have statements too.

So I suggest to stick to ID only for now, and possibly add parameters for forms and senses when both are implemented.

I suggest two static constructors with different guarantees:

  • newFromLanguageAndLexicalCategory( ItemId $language, ItemId $lexicalCategory )
  • newFromIdLanguageAndLexicalCategory( ItemId $id, ItemId $language, ItemId $lexicalCategory )

The second one guarantees that the Lexeme is completely initialized and LexemeContent::isValid will consider it valid. The first is the same, but with no ID for convenience in tests. I suggest to not merge these two use-cases into one with an optional ID parameter. Note that no parameter in this suggestion is nullable/optional. Also note that this suggestion includes changing the order of language and lexical category.

I suggest to use setters for everything else, including the one use-case in production.

Event Timeline

Change 345292 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Remove $lexicalCategory and $language from Lexeme constructor

https://gerrit.wikimedia.org/r/345292

I would extend this to constructors of all Entities. And I think removing all parameters from all the constructors would be best. We can perhaps keep the optional ID parameter, but even that isn't very useful.

If we do this, the EntityDocument interface should get an isFullyInitialized method, which we should check as a postcondition in EnityDeserializers, and as a precondition in EntitySerializers.

"Fully" initialized might be a confusing term we want to avoid.

For now we have Lexeme::isSufficientlyInitialized, which could act as the counter-part to one of the static constructors from the proposal above. We can think of naming one newSufficientlyInitialized or similar. This constructor will give the guarantee that isSufficientlyInitialized is true (and always will be, because none of the critical fields can be set back to null, including the ID).

However, I strongly suggest to not make this discussion more complicated by considering all entity types. There is no technical need to make them identical, and no need to add more guarantees to the EntityDocument interface. The Item and Property constructors are like they are for a long time now, and we never felt the urge to change them. This is different for Lexeme because it is new, and going to have a lot more parameters.

I haven't digested this proposal, and going even more further as @daniel outlined in his comment at https://gerrit.wikimedia.org/r/#/c/345292/, thus I don't feel ready to comment on the whole proposal.

My two cents so far:

  • I like the idea of aiming to have no optional parameters in constructors/static constructor-like methods.
  • it also seems more consistent to me if we say there shall be no constructor arguments at all. I could see why ID might be somewhat preferred to be allowed to be specified. On the other hand, one could argue that from what the entity stands for (ie. what it models), ID is probably least important. So it does more like "none or all" opposition to me.
  • regarding names: "isSufficientlyInitialized" is not very nice name, same "isFullyInitialized", Maybe we could just say "isInitialized"? It should be clear that if e.g. language is missing from the Lexeme it is not "sufficiently" initialized, thus it is not really "initialized" yet.
  • It is a valid point to say that limiting the change to the Lexeme as new entity type is going to make it more simple. OTOH it seems to me that it is "core" Wikibase (ie. the one with items and properties) that actually makes it pretty much required to instantiate incomplete entity objects. Therefore if we say this change is going to "improve" the situation, once could argue that items and properties should also be "fixed" in a similar way (and possibly they should be fixed in the first place). The point on longer parameter list of Lexeme's constructor is a valid one, though. On the other hand, currently Lexeme's constructor has one parameter more than Item's (with possible ways to wrap some of arguments to a thing like a Fingerprint for items/properties, which could reduce the parameter count - I am not saying this would be a good idea). I know there would be more parameters possibly coming (form and sense list, thus 2 params more). So there is some significant difference, I don't argue that.
daniel closed this task as Declined.EditedMay 31 2017, 4:35 PM

After some discussion, we agreed to resolve the problem of constructor parameters for Entities in a different way, see T166701 and T166694.

Note that this doesn't make everything proposed in this ticket invalid. We might still add static constructors for convenience.

Note that this doesn't make everything proposed in this ticket invalid. We might still add static constructors for convenience.

Yes, of course!

Change 345292 abandoned by Thiemo Mättig (WMDE):
Remove $lexicalCategory and $language from Lexeme constructor

Reason:
Should use an actual builder class. I'm not happy with the one currently merged in this code base, and want to continue working on it. Whatever the outcome will be, it's much easier to recreate this patch here than to rebase it.

https://gerrit.wikimedia.org/r/345292