[Curious Facts] Provide a mode with equiprobability among constraints
Closed, Resolved (Public)

Description

Problem:
We don't have a mode that returns results with equal probability across constraints, so the same type of violation keeps being repeated.

Example:
Constraint A has 10000 violations, whereas B has 10. Then we should return results from A and B in a 1:1 ratio rather than 1000:1. However, if there are 1000 violations, it may not appear as a "curious fact" at all.
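
To make the ratios concrete, here is a minimal sketch of the per-draw probabilities under the two schemes, using the counts from the example above. The names `counts`, `proportional` and `equiprobable` are made up for illustration and are not taken from the Curious Facts code.

```python
from fractions import Fraction

# Violation counts from the example above (A: 10000, B: 10).
counts = {"A": 10000, "B": 10}
total = sum(counts.values())

# Proportional: the chance of drawing from a constraint scales with its size.
proportional = {name: Fraction(n, total) for name, n in counts.items()}

# Equiprobable: every constraint is equally likely, regardless of its size.
equiprobable = {name: Fraction(1, len(counts)) for name in counts}

print(proportional["A"] / proportional["B"])  # 1000
print(equiprobable["A"] / equiprobable["B"])  # 1
```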

Event Timeline

Lydia_Pintscher subscribed.

We should think a bit more about whether 1:1 is the right ratio.

@amy_rc @Lydia_Pintscher

The current sampling of anomalies that would be presented to a user of Curious Facts is random and proportional to the size of the respective anomaly set.

I have also read that comment on the project's talk page, but I am not sure I agree that it should be implemented. Namely, any user would then think that anomaly type A and anomaly type B are equally present in Wikidata, which is not the case empirically.

However, your call. Let me know if you want to switch to uniform random sampling across the anomaly types and I will implement it.
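
For reference, a rough sketch of the two sampling schemes under discussion. This is not the Curious Facts implementation: the function names and the data layout (`anomalies` as a dict mapping an anomaly type to its list of findings) are assumptions made purely for illustration.

```python
import random

def sample_proportional(anomalies, rng=random):
    """Pick one anomaly; a type is chosen with probability proportional to its set size."""
    kinds = [kind for kind, items in anomalies.items() if items]
    weights = [len(anomalies[kind]) for kind in kinds]
    kind = rng.choices(kinds, weights=weights)[0]
    return kind, rng.choice(anomalies[kind])

def sample_equiprobable(anomalies, rng=random):
    """Pick an anomaly type uniformly at random, then one anomaly within that type."""
    kinds = [kind for kind, items in anomalies.items() if items]
    kind = rng.choice(kinds)
    return kind, rng.choice(anomalies[kind])
```

With a large type A and a small type B, `sample_proportional` returns A on almost every draw, while `sample_equiprobable` returns each type about half of the time. The trade-off is the one raised above: the equiprobable variant no longer reflects how common each anomaly type actually is.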

An alternative option is to have the dashboard re-organized so that controls would be included for users to select among different types of anomalies, suitably described in non-technical terms.

Also, I was thinking about implementing a search option (for properties and items: list all anomalies that have to do with a particular item, class, or property). But that's another story.

As for this sampling idea... I am not sure. Let me know what you think.

On the grounds presented in T286277#7200018:

The current sampling of anomalies that would be presented to a user of Curious Facts is random and proportional to the size of the respective anomaly set.

and the fact that no one showed interest in this discussion since the ticket was opened, I will decline this ticket.

Please re-open if you have any arguments why the currently implemented random and proportional sampling of anomalies in the Qurator Curious Facts system should be replaced by equiprobable sampling.

So I still believe this is the right way to go and it is important.
The current system gets boring quickly because it shows you the same type of issue on the same type of property in an overwhelming percentage of runs. This is not a good user experience. This is also the feedback we've been getting repeatedly.

@Lydia_Pintscher

Well, then we have a go for a uniform sampling of anomalies.
To be implemented very soon. Thank you.

@Lydia_Pintscher

Ooops - amid tons of tickets I must have somehow forgotten that I have already implemented an equiprobable sampling of the anomaly type in Qurator Curious Facts... please go check.

Change 724994 had a related patch set uploaded (by GoranSMilovanovic; author: GoranSMilovanovic):

[analytics/wmde/WD/WikidataAnalytics@master] T286277

https://gerrit.wikimedia.org/r/724994

Change 724994 merged by GoranSMilovanovic:

[analytics/wmde/WD/WikidataAnalytics@master] T286277

https://gerrit.wikimedia.org/r/724994

Yes! This feels much better :)
\o/