Page MenuHomePhabricator

[RFC] Possible redesign of the EntityTermsProvider interface
Closed, DeclinedPublic

Description

This is an alternative proposal to T78286.

https://github.com/wmde/WikibaseDataModel/pull/343 (rename Fingerprint to EntityTerms) is a first step towards this.

  • The idea is to refactor the EntityTerms class in a way that it represents 1 label, 1 description and 1 set of aliases that all have the same (base a.k.a. requested) language. Note that everything in that class should be represented as Term objects to support language fallbacks.
  • AliasGroup will either be dropped in favor of EntityTerms or refactored to contain a list of Term objects, not a list of strings (this is a bit unrelated to this proposal, see https://github.com/wmde/WikibaseDataModel/pull/272#discussion_r23175733 for the reasoning).
  • The EntityTermsProvider interface will have a single method, getEntityTermsForLanguage( $languageCode ).
  • TermList and AliasGroupList will be dropped.

This proposal will drop the possibility to get a list of all labels in all languages. This is on purpose since I think this use case is not relevant. If it turns out to be relevant we can add EntityTermsProvider::getEntityTerms, which returns an associative array of language code => EntityTerms objects.

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

thiemowmde raised the priority of this task from to Medium.
thiemowmde updated the task description. (Show Details)
thiemowmde added a project: Wikibase-DataModel.
thiemowmde subscribed.

Should this be reflected in the serialization? If so, how? If not, how does this work for (De)Serializers?

This proposal changes the code path from "Entity > Fingerprint > TermType > Language" to "Entity > Language > TermType". Which is, as far as I can see, the code path that's way, way more common in our code base.

The serialization should, in my opinion, stay as it is. Which is "Entity > TermType > Language", right? Why change it, if it isn't broken? We didn't changed it when "Fingerprint" was introduced, right? So why change it now?

The question how this will affect the (de)serializers is, indeed, a very good one. I think it won't be a problem. We will replace the current FingerprintSerializer with something like a EntityTermSetSerializer which will do more or less the same. Most important difference: If we ever want to (for example) deserialize labels without deserializing descriptions and aliases, this will still be possible, but different. Instead of returning a set of Term objects for all labels, this deserializer will return a set of EntityTerm objects where the description and aliases part is either null or somehow marked as "not deserialized" (could use false as a special value for that, for example). This will require more memory (for the two false properties).

First off: Renaming in the UI has already happened. The reason behind that change was that, in the layout, the "sub-component" of a Fingerprint (label, description, aliases in one language) needs to be encapsulated in a dedicated container/widget. Using a construct like FingerprintSegment would have worsen the situation even more to what it is already. The result are entityterms*view and entitytermsforlanguage*view widgets that are introduced along the implementation of the new layout header.

While renaming Fingerprint does ease the situation, I would support that being just the first step. Personally, apart from having a flexible component for UI rendering, I do not see a proper reason for encapsulating labels, descriptions and aliases in a dedicated object as that encapsulation generates a quite unintuitive structure. Fingerprint appears to be an optimization that misses usability because users of the component have to understand a rather more than less sole internal concept. I have read the arguments on http://www.bn2vs.com/blog/page/2/#post-1306 multiple times during the last months and assume to understand the reasons but, to me, those do not outweigh the increase in logical (mental) structure: An Item has a label vs. an Item has a Fingerprint that has a label. Why would you want a user to understand what a Fingerprint is? Fingerprint seems like a container acting as a dump to get the broken concept of label, description and aliases not being Statements out of sight.

Back to the specifics of the RFC: I do not see huge improvement of EntityTerms being reduced to a collection of label, description and aliases in one language (which is what is represented in entitytermsforlanguageview in the front-end) as one, still, would need to understand the concept of that container class and, to me, that appears like shifting the problem. What would be the name for all EntityTerms (in all languages) then? Removing AliasesGroup* (MultiTerm* in JS) might be a good step towards simplifying the data model's logical structure. However, if the container Fingerprint was just removed, there probably should be a dedicated MultiTerm (AliasesGroup) class. (Apart from that aliases should actually be represented as a Statement as well.)

Thanks all for outlining your thoughts on the subject.

I have to take issue with the basic approach taken here. It's the classical

  • I need some code that solves this use case nice
  • There already is code that deals with the relevant subject, so I can use it
  • Actually, it does not work well for me, and would be better in a somewhat different form
  • We should change the code into a more suitable form

The fallacy here is thinking that because code is not suited to a specific use case, it should be changed. Instead one can write new code that handles this use case well. This way existing users of the code are not affected. Changing the responsibility of code after it's established is a workflow smell, often done without a real need for it. In big systems it is typically also not possible to have a single good representation for a concept, since the interpretation of this concept is differs subtly between contexts, making it harmful to strive for one.

I can certainly see how you might not want to organize the data into the structure of a Fingerprint in a UI, and how there might be a more suitable organization there. Likewise I can see the use of having an object that collects terms for a specific language, and an interface returning such an object. I don't object at all against introducing such new code, providing it is well designed etc. And in fact I much prefer this over existing code, be it Fingerprint or something else, being used for something it is not designed for.

What I object against is changing or removing code that is serving existing users well, because it is not useful for a particular type of tasks. We can address issues for individual use cases or types of tasks without having to discuss if they are more important than another set of use cases. This can involve removing concepts that where inappropriately introduced somewhere, which seems to be what Henning is mostly concerned about. And it can involve replacing objects and interfaces with new ones that are more suitable for particular use cases, which is part of what Thiemo is suggesting.

Given this, I'm -2 to this proposal as a whole. This does not mean I disagree with the motivations outlined, or have a problem with things such as "EntityTerms" being created. I hope I've made it clear that I see argumentation about why some specific use case is important or why some other one is not is besides the point, and is not going to make my change my position.

While I have no issue with something along the lines of the proposed EntityTerms and associated interface being introduced, I'm concerned about the name.

Consider this:

  • There is no guarantee all types of entities have terms
  • Different types of entities can have different types of terms
  • Entities defined outside of DataModel can have new types of terms, also defined outside of DataModel

This means that a collection of labels, descriptions and aliases is something that while it applies to Items and Properties, by no means applies to entities in general. The name also does not suggest it's a specific combination of different types of terms. It looks like either a collection (ie a list of Term) or perhaps a container that holds terms of all types that apply to entities (which is a non-sensible concept, either being none or all (all not being possible dependency wise)).

What would be the name for all EntityTerms (in all languages) then?

No name. In my proposal, this is an associative array of language code => EntityTerms objects. Note that the array key is optional, it repeats the base language from the EntityTerms object. That base language is redundant in cases where the Term objects inside the EntityTerms object are all of the same language, but relevant when dealing with language fallbacks and such.

if [...] Fingerprint was just removed, there probably should be a dedicated MultiTerm (AliasesGroup) class.

This is more or less what I propose here, with two small additions:

  • The class I'm proposing does contain all the aliases in one language (what is called AliasesGroup now and what you call MultiTerm) plus one label and one description in the same language.
  • All elements (the label, the description and all aliases) are Term objects. Note that this also solves this issue which needs further refactoring of the fallback classes anyway.

fallacy [...] workflow smell [...] existing users [...]

There is one question I can't get out of my head: What made the deprecation of all the existing term related methods in Entity so different from the proposed deprecation of Fingerprint? Or the deprecation of the newEmpty methods, where we had the exact same discussion? Since when is backwards compatibility across versions a dealbreaker and why? It wasn't that important half a year ago. What changed?

not useful for a particular type of tasks.

Fingerprint is barely useful for any task. I re-checked now and we type hint against Fingerprint in methods in 10 classes outside of tests and outside of DataModel:

  • FingerprintView: Misuse of Fingerprint, what it really needs is an EntityTermsForLanguage object for a single language.
  • 3 ChangeOp classes: Same, needs one language only.
  • 4 FingerprintValidator classes: Can easily be refactored to use a set of EntityTerms objects instead of a Fingerprint. From what I see this would even allow for further refactoring, like restricting the set of EntityTerms objects instead of providing the full Fingerprint and a list of languages.
  • EntitySearchTextGenerator and TermSqlIndex: Both are for indexing, both need all terms in all languages anyway, so neither nor is better, but very easy to refactor.

That's it.

This can involve removing concepts that where inappropriately introduced [...] And it can involve replacing objects and interfaces with new ones [...] Given this, I'm -2 to this proposal as a whole.

This confuses me. So you are ok with every detail in this proposal but not with the proposal as a whole?

I'm concerned about the name.

All your arguments about the name are valid (especially the fact that other entities will need other combinations of terms), even if they equally apply to Fingerprint (plus all the other arguments already given, like not even being a fingerprint as you can see when looking at the FingerprintValidator classes). Please suggest a better naming scheme or architecture (e.g. the T78286 proposal or similar) that suits these use cases.

Following Thiemo's analysis, I think we don't gain much by artificially grouping the terms properties and items have. The fact that we are not able to come up with a sane name for this adds up to the feeling that this is arbitrary and does not really correspond to anything from the domain we try to model. Thus, I agree that we should phase out Fingerprint and EntityTerms in the data models. The view code can, imho, continue to use ›entity terms‹ as a term, as long as we all know that neither all entities have terms nor that all entities have these kinds of terms.

Like I have written before, I totally support completely removing the concept of Fingerprint. The idea of renaming "Fingerprint" to "EntityTerms" was meant to be some kind of compromise that, eventually, no one really seems to be happy with.

All arguments have been expressed and since the data model is a public interface, it is time for PM to decide on how to proceed in that subject. I do not see a reason for having this stall for weeks. Being a dispute about the fundamental core component, it is more than about time to get that off the table.

Current situation:

  • Data model implementations use the class name Fingerprint.
  • PHP and JavaScript front-end code uses the term EntityTermsView / entitytermsview.
  • Special page is named SetLabelDescriptionAliases.

Calling @Lydia_Pintscher: Should the concept of Fingerprint be kept or should it get ditched? If it should be kept, should the name be kept or altered?

No update since two months now. What is the current state of this discussion. Did somebody finally make a decision?

This turns out to be a major blocker for doing more refactoring away from Entity usages. The way to go is currently to use a FingerprintProvider instead but since the concept of Fingerprint is put into question here, introducing new usages is strongly discouraged by some team members (see 1 and 2).

If we want to do any progress in favour of supporting new entity types etc. we should find a solution for this issue in reasonable time.

I talked to @Snaterlicious the day after his last comment yet forgot to write anything down here. We talked about the unsuitability of using Fingerprint in certain cases, and how that simply means that one should not use it in these cases rather than incentive to remove it. As I understood it, Henning no longer was intent on having it removed after this discussion, though I'll leave clarification up to him.

You get the same reaction for "Snak". Especially if you via tone of voice, formulation of sentence, body language, or some other mechanism, already indicate you think the name is stupid. Also, this issue says "redesign", not "rename". There are separate things.

thiemowmde renamed this task from RFC: Possible redesign of the EntityTermsProvider interface to [RFC] Possible redesign of the EntityTermsProvider interface.Sep 10 2015, 4:20 PM
thiemowmde removed a project: Patch-For-Review.
thiemowmde removed a subscriber: Abraham.

Not doing it in favor of label/description/aliases providers as discussed in sprint start today