
Estimate how many Wikidata items have low/no ORES score
Closed, Resolved · Public

Description

As a user, if there is catastrophic Wikidata loss, I want to keep items with (good) ORES scores, as they may be more useful to me in querying.

  • How many items do not have ORES scores?
  • What is the distribution of ORES scores for Wikidata items?
  • Is there an optimal separation between high/low ORES scores?
    • If so, what is it?

This ticket is a part of WDQS disaster planning, and reflects research into mitigation strategies for catastrophic failure of Blazegraph: specifically in the case that the Wikidata graph becomes too big for Blazegraph to continue supporting. This is not a commitment to a long term state of WDQS or Wikidata, but part of the disaster mitigation playbook in a worst case scenario.

Event Timeline

ACraze added a subscriber: ACraze.

Hi @AKhatun_WMF, just following up on your message in IRC (#wikimedia-ml):

On reading the docs, I think ORES scores item quality on every revision (https://www.wikidata.org/wiki/Wikidata:Item_quality). I am wondering what the ORES score displayed for each item then means (https://www.wikidata.org/wiki/Wikidata:Item_quality#ORES). How do I get the item quality of a Wikidata item in its present state?

I believe this is correct: the ORES model scores at the 'revision' level, so getting the item quality of a Wikidata item in its present state should just be a matter of scoring the most recent revision of that item.
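Something along these lines should work (untested sketch; it assumes the ORES v3 scores endpoint and the 'itemquality' model name for wikidatawiki, and uses the MediaWiki API to find the latest revision):

```
import requests

def latest_itemquality_score(qid):
    """Untested sketch: fetch the ORES 'itemquality' score for the most
    recent revision of a Wikidata item (endpoint/response shape assumed)."""
    # 1. Look up the item's latest revision id via the MediaWiki API.
    pages = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "titles": qid,
            "rvprop": "ids",
            "rvlimit": 1,
            "format": "json",
        },
    ).json()["query"]["pages"]
    revid = next(iter(pages.values()))["revisions"][0]["revid"]

    # 2. Ask ORES to score that revision with the item quality model.
    scores = requests.get(
        "https://ores.wikimedia.org/v3/scores/wikidatawiki",
        params={"models": "itemquality", "revids": revid},
    ).json()
    return scores["wikidatawiki"]["scores"][str(revid)]["itemquality"]["score"]

print(latest_itemquality_score("Q42"))  # e.g. {'prediction': 'A', 'probability': {...}}
```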

@ACraze Indeed! I was confusing the models for revision (item quality) with edits (damaging/good faith). The latest revision is all I will need. Thank you!

@MPhamWMF Hi, could you please clarify the question "Is there an optimal separation between high/low ORES scores?" Separation in what respect? What comes to my mind is separating items according to the subgraph each one belongs to.

cc: @JAllemandou @dcausse

@AKhatun_WMF, sorry, it's been a while since I wrote this, but I think what I meant by "optimal separation" is: given some distribution of ORES scores (e.g. a normal distribution), is there a clear threshold for what qualifies as a "high" vs. "low" score, e.g. anything over 0.75 is a high score? But that assumes the scores are continuous. I guess it's moot if they're binary (I don't actually know).

If this isn't a sensible way of thinking about the issue, let me know if there's a better way.


Ah, that I believe is already solved by the output of the model. Basically, we get probabilities for 5 classes (A to E) indicating how good an item is, where A is the best and E is the worst. The score is then calculated as 5*P(A) + 4*P(B) + 3*P(C) + 2*P(D) + 1*P(E), where P(X) is the probability of class X. But we can definitely define our own thresholds as well.
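In code, that weighted score works out to something like this (the probabilities below are made up for illustration, and the 3.5 cut-off is just an arbitrary example of a threshold we could pick):

```
# Collapse the per-class probabilities into a single 1-5 score,
# weighting the best class (A) highest and the worst (E) lowest.
WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def weighted_score(probabilities):
    return sum(WEIGHTS[cls] * p for cls, p in probabilities.items())

# Illustrative probabilities only, not real model output.
example = {"A": 0.05, "B": 0.10, "C": 0.20, "D": 0.40, "E": 0.25}
score = weighted_score(example)   # 2.3 on a scale from 1 (worst) to 5 (best)
is_high_quality = score >= 3.5    # the threshold itself is a choice we would make
```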

The analysis is done here: Wikidata_Item_ORES_Score_Analysis

I will be doing a bit more to get the scores per subgraph and will add it here as well.

Oh cool, no need to reinvent the wheel! We can just use the current solution then.

Yeah, I think the underlying question we came to with this was whether it would make sense to consider kicking out the low-quality Items from the Query Service for disaster planning. The more I think about it, the less I think we should, because the query service is such an important piece of infrastructure for the workflows that get exactly these low-quality Items improved. The approach now chosen seems better to me.

@AKhatun_WMF: You mention on the wiki that some Items don't have an ORES score. All Items should have one 😬 Do you have an example of one that does not?
Also: overview for subgraphs would be fantastic.

/cc @Manuel because this is probably of interest for him as well.

@AKhatun_WMF: You mention on the wiki that some Items don't have an ORES score. All Items should have one 😬 Do you have an example of one that does not?

Oh, it's not that they don't have a score per se. They're just not in the event data table, so I could not get a score for them to analyze. I will clarify that!
If we could run an event for all existing items, we could get scores for all items. The way the table is populated at present, it only gets scores for the latest revisions, I believe.
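For reference, the per-item selection is roughly this shape (sketch only; the event table name and its columns are assumptions and the real schema may differ):

```
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()

# Assumed table/column names standing in for the event data table mentioned above.
scores = spark.table("event.mediawiki_revision_score").where("database = 'wikidatawiki'")

# Keep one row per item: the score attached to its most recent revision in the table.
latest = Window.partitionBy("page_title").orderBy(F.col("rev_id").desc())
per_item = (
    scores
    .withColumn("rank", F.row_number().over(latest))
    .where("rank = 1")
    .drop("rank")
)
```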

Ahh makes sense. Probably not worth bothering then.

Yeah, I think the underlying question we came to with this was whether it would make sense to consider kicking out the low-quality Items from the Query Service for disaster planning. The more I think about it, the less I think we should, because the query service is such an important piece of infrastructure for the workflows that get exactly these low-quality Items improved.

+1

The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis

I really appreciate your work. Thanks!