
Dedicated section on Wikidata Item and Property pages for classifying Properties
Open, Needs Triage, Public

Description

Main components:

  • Wikidata configuration

User story:
As a new Wikidata editor, I would like to quickly see which parts of an Item define the Wikidata ontology.

Problem:
As is, our users are often not aware that some of their edits on Items are modifying the Wikidata ontology. This unintentionally leads to an overly complex and inconsistent ontology. We need to improve this situation.

Solution:
A first step would be to introduce a dedicated section on Item and Property pages for classifying properties. This section is named "Classification" and it will live above the statements section. This dedicated section will make it more obvious that some properties modify the ontology.

List of properties:
The proposed section will contain the following properties:

Origin: https://www.wikidata.org/wiki/Wikidata:Item_classification

Origin: https://www.wikidata.org/wiki/MediaWiki:Wikibase-SortedProperties#Classification_(2)

Notes:

  • This first step will only involve the display of statements after saving. We might consider making deeper changes to the UI and recommendation system at a later time, based on user experience with this change.

Acceptance criteria:

  • Wikidata is reconfigured in a way so that the described properties will show up as a dedicated section above the statements section on Item and Property pages.

Open questions:

  • Should some properties be removed from this dedicated section?
  • Should some properties be added to this dedicated section?

Community communication:
We would like to get community feedback via the Project Chat before implementing this change. If possible, we should also ping the community members that originally discussed this idea.

Original:
This idea came up at Wikidata Data Quality Days, WikidataCon pre-conference, and the WikidataCon session on ontology issues.

Event Timeline

For the list of properties, we could look at MediaWiki:Wikibase-SortedProperties § Classification and § Classification (2) for inspiration, though I don’t think we need to include all of those properties.

The only UI element we need to define is the heading (a message in WikimediaMessages).

Manuel updated the task description.
Manuel renamed this task from "Dedicated section on Wikidata Item pages for classifying properties" to "Dedicated section on Wikidata Item and Property pages for classifying Properties". Mon, Nov 15, 11:26 AM
Manuel updated the task description.

It's a great idea! I just believe P2860 (cites work) should be removed from the list. It's definitely not a classifying property.

We have already discussed this at the Data Engineering and Semantics Research Unit, University of Sfax. In our view, the section should contain the basic and taxonomic properties.

By Basic, we mean elementary and generic relation types.

By Taxonomic, we mean non-symmetric relation types.

If the relation type passes these two criteria, we can include it in Classification Part.

For example, "different from" is symmetric. If X is "different from" Y, then Y is also "different from" X. So, it cannot be included in Classification Part.

For example, "cites work" is not a generic relation. In fact, it cannot be assigned to an item about a football game or a portrait. So, it cannot be included in Classification Part.

To clarify, I give an example of a Wikidata property that can be included there according to our discussion: "Instance of".

  • If X is an instance of Y, Y cannot be an instance of X. So, the property is taxonomic.
  • Any item can be the subject of "instance of". So, this property is basic.
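The two criteria above can be sketched as a simple predicate. This is only an illustration: the `RelationType` class and its hand-assigned flags are assumptions made for the example, not data read from Wikidata. In particular, the comment does not say whether "different from" is generic or whether "cites work" is symmetric; the values below are guesses that do not affect the outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """Hypothetical record describing a relation type (not a Wikibase class)."""
    label: str
    is_generic: bool    # "basic": can be stated on any kind of item
    is_symmetric: bool  # X rel Y implies Y rel X


def belongs_in_classification(rel: RelationType) -> bool:
    """Apply both criteria: basic (generic) and taxonomic (non-symmetric)."""
    return rel.is_generic and not rel.is_symmetric


# Flags assigned by hand from the examples in the comment (assumptions).
EXAMPLES = [
    RelationType("instance of", is_generic=True, is_symmetric=False),
    RelationType("different from", is_generic=True, is_symmetric=True),
    RelationType("cites work", is_generic=False, is_symmetric=False),
]


def classify_all(relations):
    """Map each relation label to whether it passes both criteria."""
    return {rel.label: belongs_in_classification(rel) for rel in relations}
```

Under these assumptions only "instance of" passes; "different from" fails the taxonomic test and "cites work" fails the basic test.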

Hmm - I agree with the above that P2860 should not be on this list. If we are including the "partitive" properties like P361 and P527 (taxonomic in the sense that they group parts of something with the whole), what about P355 (subsidiary) and P749 (parent organization), which are used that way for organizations, or other properties of that sort?
On the other hand your list does not include the truly taxonomic property P171 (parent taxon) - which is explicitly a subproperty of P279. P10019 (term in higher taxon) seems to also be a subproperty of P279.

It's always a huge red flag, when assessing a proposed change, to read a user story which is self-evidently detached from reality. This and other klaxons of doom are found in this proposal.

  1. It's at best a polite fiction to suggest there are any users who "would like to quickly see which parts of an Item define the Wikidata ontology". A truer user story is: "as an experienced WD user I'd like other people to stop breaking the ontology". All that flows from a fictitious user story is water from a poisoned well.
  2. The key identified problem is that people break parts of ontologies from time to time. In this proposal, there's some sort of [miracle occurs here] process implied, wherein changing the order & providing a heading, "Classification", for a set of property statements in an item, will make users who are sufficiently incautious as to break ontologies, and who may indeed not know what an ontology is, not break things.
  3. "This dedicated section will make it more obvious that some properties modify the ontology." Only in your dreams.
  4. Per ArthurPSmith, not even The Most Basic analysis of what is an ontological property (e.g. ?item wdt:P1647* wd:P279. ) has been made by those proposing this change. Real HeadDesk stuff.
  5. Given the problem issue, conspicuously absent from the proposal is any suggestion for evaluating the effect of the change. afaics, we're going to evaluate the outcome based on feels - ("user experience with this change") - rather than, for instance, metrics on changes to the incidence of reverted ontology changes.
  6. Come to that, absent from the proposal is any measurement of the incidence of ontology-breaking, or an analysis of the edit count of the breakers. The whole thing is predicated on some sort of faith and belief.
  7. Absent from this proposal are any community member proposers. ("If possible, we should also ping the community members that originally discussed this idea.") For whatever reason - Parkinson's Law, presumably - WMDE has decided this change is a good thing (proposed by WMDE, promoted on Chat by WMDE).
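The property path quoted in the list above (?item wdt:P1647* wd:P279) selects P279 itself plus everything reachable from it through "subproperty of" (P1647) links. A minimal sketch of that traversal follows, using a hand-written toy graph. The two edges are taken from claims made earlier in this thread (one of them hedged there as "seems to"), not from a live query, and the function name is hypothetical.

```python
from collections import deque

# Toy data (assumption): child property -> direct parents via P1647.
SUBPROPERTY_OF = {
    "P171": ["P279"],    # parent taxon, as stated in the thread
    "P10019": ["P279"],  # term in higher taxon, "seems to" per the thread
}


def subproperties_of(root, subproperty_of):
    """Return root plus every property that reaches root via zero or more P1647 steps."""
    # Invert the child -> parents mapping so we can walk downwards from root.
    children = {}
    for child, parents in subproperty_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    seen = {root}  # the '*' in the path also matches zero steps
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

On the toy graph, `subproperties_of("P279", SUBPROPERTY_OF)` yields P279, P171 and P10019; a property with no P1647 path to P279, such as P2860, is not included.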

So to sum up, the proposal is an ill-analysed, wishful thinking, cargo cult. The main expected outcome is that a bunch of property statements will be promoted in prominence on the item UI, delighting ontologists and, for instance, depressing geographers and biographers who will see their preferred statements sink beneath the fold. New incautious users will continue to bork things (see also: scorpion/frog koan).

It does not matter much if you do rearrange the deckchairs as you propose, so to that extent, knock yourself out. It's not a good substitute for getting on with other more pressing problems, and it is emblematic of the capricious approach to prioritisation and resource utilisation we are so used to with WMDE.

I have commented elsewhere, on the fact that it appears to me to be a very difficult task to collect "classifying Properties", but I'd like to comment here on the technical side:

Currently, in the entity view, we group statements based on the datatype of their property values: Most everything goes into Statements, but identifiers go into Identifiers. This proposal appears to be about adding a third group, "Classification", to the top of the list.

If we decide that is a good idea, the next question would be how to select which statements go into Classification. The two options I see are to provide a fixed list of such properties in the Wikibase configuration file, or to derive the list of properties from the Wikidata information itself, by classifying properties as "appears in Classification", "appears in Statements", and possibly others.

I've looked into the second option and it's straightforward to implement. We need a new StatementGrouper that looks up the Property, finds an "appears in" statement attached to it, and uses it to determine where the statement appears.

I think that's a good idea as a generic Wikibase feature. It's a few extra lines of code.
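To make the second option concrete, here is a rough sketch of the lookup such a grouper would perform. It is written in Python for brevity and does not use the actual Wikibase interfaces; the property ID `P_APPEARS_IN`, the group names, and the data shapes are all hypothetical. As a simplification, identifier properties carry an explicit "appears in" value here instead of being grouped by datatype as they are today.

```python
# Hypothetical ID of an "appears in" property stated on Property pages.
APPEARS_IN = "P_APPEARS_IN"

# Display order of the groups; "statements" is the fallback group.
GROUP_ORDER = ("classification", "statements", "identifiers")
DEFAULT_GROUP = "statements"


def group_for(property_id, property_claims):
    """Look up the Property and read its "appears in" value, if any."""
    claims = property_claims.get(property_id, {})
    return claims.get(APPEARS_IN, DEFAULT_GROUP)


def group_statements(statements, property_claims):
    """Sort (property_id, value) pairs into display groups, keeping input order."""
    grouped = {name: [] for name in GROUP_ORDER}
    for property_id, value in statements:
        group = group_for(property_id, property_claims)
        # An unexpected group name is appended after the known groups.
        grouped.setdefault(group, []).append((property_id, value))
    return grouped


# Usage with placeholder values (assumptions, not real Wikidata content).
example_claims = {
    "P31": {APPEARS_IN: "classification"},
    "P214": {APPEARS_IN: "identifiers"},
}
example_statements = [("P569", "value-a"), ("P31", "value-b"), ("P214", "value-c")]
example_groups = group_statements(example_statements, example_claims)
```

With this data, P31 lands in "classification", P214 in "identifiers", and P569, which has no "appears in" statement, falls back to "statements".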

As for the first option, that would require a strong, immutable consensus of which parts of the Wikidata ontology are "Classifying" and which aren't, strong enough to justify putting it into the WikiBASE configuration and not deriving it from the WikiDATA database. I don't think such a consensus exists.

I think the first option is a bad idea.

(My vague intuition is that what people want to see as Classification is statements that are true "by definition". That's easy enough to do for some statements: elephants are mammals by taxonomical definition. It's impossible to do for mathematical objects which often have different, nontrivially equivalent definitions (worse, whether or not they're equivalent may depend on the precise flavor of mathematics in use))

Good to see this problem being addressed. Some remarks:

  • As far as I am aware, we do not fail the classification job completely. It's the P279/subclass-of hierarchy, which some refer to as the "Wikidata ontology", that is problematic, because it is generic in topic, global in reach, and does not closely resemble any other ontology from elsewhere, so we cannot strictly build it on sources. I suggest limiting the modifications to P279 claims.
  • Main reasons for the poor P279 ontology, from daily Wikidata editing experience over several years:
    • Requires high level of knowledge and experience. We leave editors pretty much alone to learn the necessary skills.
    • Poor tooling; simple edits in the P279 hierarchy can have severe adverse effects that are difficult to project even for experienced users.
    • Lack of awareness; editors often modify P279 claims to fix something else, e.g. a constraint violation in another item (it would be better to fix the item, to leave the constraint violation there for others to fix, or sometimes to fix the constraint definition).
    • Also: often there is not a clear "correct" or "incorrect" approach when classifying data items, and some situations are arguably not easy to resolve. This needs more community discussion and probably also an explicit definition of the term "Wikidata ontology", its purposes, and its design principles.
  • In general, I think we should rather restrict the ability to add, modify, or remove P279 main values by introducing a new user group "ontologist" (or so). This would be similar to "property creator", which is another user group based on technical skills and experience in a certain field. The community could then elect or assign the right to interested, qualified users. My only concern is that this might not scale well.

Good points from @MisterSynergy and others above. One other case I often run into is problems caused by item merges; if both original items had P279 statements this can cause significant trouble (for example it is a common source of subclass loops).
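As an illustration of how a merge can introduce a subclass loop, here is a small sketch over a toy P279 graph. The item names A, B and C and both helper functions are hypothetical; real merges involve redirects and more than P279 statements.

```python
def merge_items(subclass_of, keep, drop):
    """Return a new P279 graph in which item `drop` has been merged into `keep`."""
    merged = {}
    for item, parents in subclass_of.items():
        source = keep if item == drop else item
        targets = merged.setdefault(source, set())
        for parent in parents:
            targets.add(keep if parent == drop else parent)
    return merged


def find_subclass_loop(subclass_of, start):
    """Return a path start -> ... -> start along P279 edges, or None if there is none."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        for parent in subclass_of.get(node, ()):
            if parent == start:
                return path + [start]
            if parent not in visited:
                visited.add(parent)
                stack.append((parent, path + [parent]))
    return None


# Toy hierarchy: A is a subclass of B, and B is a subclass of C.
before = {"A": {"B"}, "B": {"C"}}
# Merging C into A turns "B subclass of C" into "B subclass of A".
after = merge_items(before, keep="A", drop="C")
```

Before the merge `find_subclass_loop(before, "A")` finds nothing; afterwards it reports the loop A, B, A. A check of this kind could in principle be run before a merge is saved, though whether that is feasible at Wikidata scale is not something this sketch addresses.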

In general, I think we should rather restrict the ability to add,

Another pattern, used in iNaturalist, is to let several people confirm an observation before it is marked as Research Grade.

Maybe it could be used in Wikidata for P279?