Sort claims of a property in meaningful way
Open, LowPublic

Description

Marking the most recent claim of a property as preferred can be a nightmare when a statement has many claims. One example is checking that the most recent Elo rating of a chess player is the preferred one. In this example there are more than 100 different Elo ratings, each from a different point in time. Checking for missing months, or even finding the most recent rating to set it as preferred, is a PITA.

This also happens with the population of entities.

Sorting these claims by the value of the property P585 (point in time) would make things easier.

Event Timeline

thiemowmde added a project: patch-welcome.
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.

We might need to sort on more qualifiers than just P585. This is becoming an issue for P39 claims, where it's not unknown for someone to have 10-20 claims or even more if they had a long career (especially in politics where they might hold a lot of different official posts and elected offices).

These tend to use P580/P582 (start/end) rather than point in time, so it would be good if this sorting could also take advantage of those dates. Other properties that require sorting might possibly use P571/P576 (inception/dissolution), though I can't offhand think of any examples that use those on a large scale.

Probably the best approach here is to sort all claims by the P585, P580 and P571 qualifiers in some way - start dates are more likely to be available than end dates, especially if something is "current", and it makes conceptually a bit more sense to sort by when something began rather than when it ended.
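
As a rough illustration of that idea, here is a minimal Python sketch. It assumes a simplified, hypothetical claim shape (a dict with a "qualifiers" mapping from property ID to a list of zero-padded ISO date strings), not the real Wikibase JSON, and the names `DATE_QUALIFIERS`, `first_date` and `sort_by_date` are made up for the example. Placing undated claims last is also just one possible choice.

```python
from typing import Any, Dict, List, Optional

# Date qualifiers to try, in order of preference (an assumption based on the
# suggestion above: point in time, then start time, then inception).
DATE_QUALIFIERS = ("P585", "P580", "P571")

# Hypothetical simplified claim shape, not the real Wikibase JSON:
# {"id": "...", "qualifiers": {"P585": ["2015-03-01"]}}
Claim = Dict[str, Any]


def first_date(claim: Claim) -> Optional[str]:
    """Return the earliest value of the first date qualifier that is present."""
    qualifiers = claim.get("qualifiers") or {}
    for pid in DATE_QUALIFIERS:
        values = qualifiers.get(pid)
        if values:
            # Zero-padded ISO dates compare correctly as plain strings.
            return min(values)
    return None


def sort_by_date(claims: List[Claim]) -> List[Claim]:
    """Sort claims chronologically; claims without any date qualifier go last."""
    def key(claim: Claim):
        date = first_date(claim)
        return (date is None, date or "")

    # sorted() is stable, so undated claims keep their original relative order.
    return sorted(claims, key=key)
```

Because the sort is stable, claims that share the same date stay in the order they were entered.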

One other sorting method that might be useful - P1545 qualifiers on P2093 claims (used for huge numbers of scholarly articles) or P735 claims (starting to become common for people with first and middle names). It looks like we will need a general solution for sorting properties by qualifier...
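
For the ordinal case, a sketch in Python could look like the following. It again assumes a simplified, hypothetical claim shape where "qualifiers" maps a property ID to a list of strings; `ordinal_key` and `sort_by_ordinal` are illustrative names. Ordinals are compared as integers so that "10" sorts after "2", and anything without exactly one numeric ordinal is pushed to the end, which is an assumption rather than an agreed rule.

```python
from typing import Any, Dict, List, Tuple


def ordinal_key(claim: Dict[str, Any]) -> Tuple[int, int]:
    """Build a sort key from the P1545 (series ordinal) qualifier."""
    ordinals = (claim.get("qualifiers") or {}).get("P1545", [])
    if len(ordinals) == 1:
        try:
            # Compare numerically so that "10" sorts after "2".
            return (0, int(ordinals[0]))
        except ValueError:
            pass  # Non-numeric ordinal such as "ii": treat as unsortable.
    # Missing, repeated or non-numeric ordinals go after the sortable claims.
    return (1, 0)


def sort_by_ordinal(claims: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort claims (e.g. P2093 author name strings) by their series ordinal."""
    return sorted(claims, key=ordinal_key)
```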

Related to this task, I’ve created a script to sort the values of P⁠348 (versions of a software), see https://www.wikidata.org/wiki/User:Seb35/sortValues.js (and talk page). It could be adapted for other properties.

But it would be better if this were integrated into Wikibase, perhaps by defining on the property page a specially-recognised property 'Ordering' that says whether to sort by the value of the property (using some specially-recognised item), by some qualifier, or by rank. E.g. on Property:P⁠580 we could have such a property to sort the values of this property directly by their values:

Ordering (PXX) = Value (QYY)

And on Property:P⁠1087 (Elo rating) there could be a property saying to sort by P⁠585 (date):

Ordering (PXX) = Date (P585)

Defining the order on the property page allows the community to discuss which order should be prioritised: e.g. for P⁠1087, should we sort by value, by the qualifier "publication date", or by rank? The order defined by the community would be the default order; scripts could also be created to define alternative orders for specific needs.
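
To make the proposal more concrete, here is a small Python sketch of how a per-property 'Ordering' setting might drive the sort. Everything in it is hypothetical: the `ORDERING` table stands in for statements that would live on the property pages, the three modes ("value", "qualifier", "rank") mirror the options listed above, and claims use a simplified dict shape rather than the real Wikibase JSON.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical per-property settings, standing in for an "Ordering" statement
# on each property page. The modes and this table are assumptions.
ORDERING: Dict[str, Tuple[str, Optional[str]]] = {
    "P1087": ("qualifier", "P585"),  # Elo rating: sort by point in time
    "P580": ("value", None),         # sort directly by the claim's own value
}

RANK_ORDER = {"preferred": 0, "normal": 1, "deprecated": 2}


def sort_group(property_id: str, claims: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort one statement group according to its property's ordering setting."""
    mode, qualifier_id = ORDERING.get(property_id, ("none", None))
    if mode == "value":
        key = lambda c: c["value"]
    elif mode == "qualifier":
        # Claims lacking the qualifier get "" and therefore sort first.
        key = lambda c: min((c.get("qualifiers") or {}).get(qualifier_id) or [""])
    elif mode == "rank":
        key = lambda c: RANK_ORDER[c.get("rank", "normal")]
    else:
        # No ordering defined for this property: keep the stored order.
        return list(claims)
    return sorted(claims, key=key)
```

A user script wanting an alternative order could call the same function with its own table instead of the community default.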

In the longer term:

  • a specially-recognised qualifier could be added to this property to define sub-orders: when two values are identical, the second, then the third (etc.) order would be used. E.g. sort by P⁠580 (start date), then by P⁠582 (end date),
  • special collations could be added, e.g. for software versions, semantic versioning needs some special attention,
  • where should the special values "no value" and "unknown value" be sorted?
  • should we deal with references, and if so how? Should we allow sorting by (e.g.) P⁠813 publication date from the reference?
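
The first two points can be sketched in Python as follows. This is only an illustration under stated assumptions: claims use a simplified, hypothetical dict shape with ISO date strings, a missing qualifier sorts after a present one (so an open-ended entry follows the ones that have ended), and `version_key` is a deliberately naive collation that only compares the numeric parts of a version string. It does not implement the full semantic versioning precedence rules (pre-release tags, build metadata).

```python
import re
from typing import Any, Dict, List, Sequence, Tuple


def qualifier_key(claim: Dict[str, Any], qualifier_ids: Sequence[str]) -> Tuple:
    """Build a composite key: first order, then second order, and so on."""
    qualifiers = claim.get("qualifiers") or {}
    parts = []
    for pid in qualifier_ids:
        values = qualifiers.get(pid) or []
        # (missing?, value): a missing qualifier sorts after a present one.
        parts.append((not values, min(values) if values else ""))
    return tuple(parts)


def sort_with_suborders(
    claims: List[Dict[str, Any]],
    qualifier_ids: Sequence[str] = ("P580", "P582"),
) -> List[Dict[str, Any]]:
    """Sort by start date, falling back to end date when start dates tie."""
    return sorted(claims, key=lambda c: qualifier_key(c, qualifier_ids))


def version_key(version: str) -> Tuple[int, ...]:
    """Naive version collation: compare the numeric parts as integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))
```

With `version_key`, "1.10" sorts after "1.9", which a plain string comparison would get wrong.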

PS: in this post, to prevent P⁠348 from being interpreted by Phabricator as a link to a paste, and given that I didn’t find a way to avoid this behaviour in the Remarkup syntax help, I used the Unicode character U+2060 (zero-width word joiner) between the P and the number. A bit hacky, but it works.

Lydia_Pintscher renamed this task from Sort claims of a property by point in time to Sort claims of a property in meaningful way. Dec 21 2018, 7:05 PM
Lydia_Pintscher added a subscriber: SJu.

So from my side, sorting of claims within one statement group is indeed something we should look into more. There are the following things to keep in mind:

  • How do ranks play into this? We also have a task for making ranks more useful and visible. One of the options we have there is ordering by rank and then collapsing anything but the best ranked statements.
  • We have a lot of other properties that can be used in qualifiers. How do they interact with time-based ones when it comes to sorting?

I'd love to see some more experimenting with @Seb35's script to figure this out.

I can't comment on actual implementation/feasibility, but conceptually I think it would make sense to:

  • Have an editable list of qualifiers by which claims within the same statement group should be sorted (ordered by priority, i.e. if P585 is present sort by it, then sort the remaining claims by P580, etc.)
  • Sort claims that have several values for the same sortable qualifier, or no sortable qualifier at all, to the top (as these need attention)

and in the interests of keeping life simple:

  • Don't sort by rank; indicate rank in some other clear way. That way it's easy to spot if a claim somewhere in the order has wrongly been given a rank
  • Don't sort by snaktype. novalue and somevalue claims can take qualifiers just like any other and be ordered that way; otherwise they will go to the top thanks to point 2
  • Don't sort by claims in the sources. This doesn't really make sense, since the sources don't necessarily indicate anything about the order of the claims
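
The rules above could be sketched in Python roughly like this. The claim shape is a simplified, hypothetical dict (not the real Wikibase JSON), `SORT_QUALIFIERS` stands in for the editable list, and `attention_key` and `sort_statement_group` are illustrative names. Rank, snaktype and references are deliberately ignored by the key, as suggested.

```python
from typing import Any, Dict, List, Tuple

# Stand-in for the editable, priority-ordered list of sortable qualifiers.
SORT_QUALIFIERS = ["P585", "P580"]

# Key shared by every claim that needs attention and is shown at the top.
NEEDS_ATTENTION: Tuple[int, int, str] = (0, 0, "")


def attention_key(claim: Dict[str, Any]) -> Tuple[int, int, str]:
    """Sort key: attention cases first, then by qualifier priority and value."""
    qualifiers = claim.get("qualifiers") or {}
    # Several values for the same sortable qualifier: ambiguous, goes to the top.
    if any(len(qualifiers.get(pid, [])) > 1 for pid in SORT_QUALIFIERS):
        return NEEDS_ATTENTION
    for priority, pid in enumerate(SORT_QUALIFIERS):
        values = qualifiers.get(pid, [])
        if values:
            # Claims with P585 come first, then the remaining claims by P580.
            return (1, priority, values[0])
    # No sortable qualifier at all: also goes to the top.
    return NEEDS_ATTENTION


def sort_statement_group(claims: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort one statement group; rank and snaktype play no part in the key."""
    return sorted(claims, key=attention_key)
```

Since the sort is stable, the claims that need attention stay in their stored order at the top of the group.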