
Produce dump files for ORES scores
Open, Lowest, Public

Description

Ideas for what to dump:

  • enwiki-20181107-ores-scores.jsonlines.bz2 - Full dump of all ORES scores available for the wiki.
  • enwiki-20181107-ores-scores-editquality.jsonlines.bz2 - Dump of all scores for a specific model.
  • wikidata-20181107-ores-scores-itemquality-history-monthly.jsonlines.bz2 - Dump of all historical scores for a content quality model, but sampled down to one revision per page per month.
  • wikidata-20181107-ores-scores-itemquality-current.jsonlines.bz2 - Much smaller dump that only has scores for the latest revision of each page.
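The `.jsonlines.bz2` suffix in the file names above suggests bzip2-compressed files with one JSON object per line. A minimal sketch of writing and reading such a file with the Python standard library follows. The helper names and the record fields used in the usage note are hypothetical; the task does not define a schema.

```python
import bz2
import json


def write_jsonlines_bz2(path, records):
    """Write an iterable of dicts as bzip2-compressed JSON Lines."""
    with bz2.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            # One JSON document per line, newline-terminated.
            f.write(json.dumps(record) + "\n")


def read_jsonlines_bz2(path):
    """Yield one dict per non-empty line of a bzip2-compressed JSON Lines file."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `write_jsonlines_bz2("sample.jsonlines.bz2", [{"rev_id": 1, "model": "itemquality"}])` followed by `list(read_jsonlines_bz2("sample.jsonlines.bz2"))` round-trips the records. Streaming line by line keeps memory use flat, which matters for a full-wiki dump.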

These will require some processing to create; a MapReduce query should be fine.
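The reduction behind the "monthly" and "current" dumps can be sketched in plain Python as a map step that assigns each record a key and a reduce step that keeps one record per key. This is an illustration only, not the job that would actually run. It assumes hypothetical fields `page_id`, `rev_id`, and `timestamp` (an ISO 8601 UTC string, so string comparison matches time order), and it assumes the revision kept for each month is the last one in that month, which the task does not specify.

```python
def month_key(record):
    """Map step for the monthly dump: key by (page, "YYYY-MM")."""
    return (record["page_id"], record["timestamp"][:7])


def page_key(record):
    """Map step for the current dump: key by page only."""
    return record["page_id"]


def latest_per_key(records, key_func):
    """Reduce step: keep the record with the greatest timestamp for each key."""
    latest = {}
    for record in records:
        key = key_func(record)
        kept = latest.get(key)
        # ISO 8601 strings in one fixed format sort chronologically.
        if kept is None or record["timestamp"] > kept["timestamp"]:
            latest[key] = record
    # Sort for a stable, readable output order.
    return sorted(latest.values(), key=lambda r: (r["page_id"], r["timestamp"]))
```

Calling `latest_per_key(records, month_key)` gives one revision per page per month, and `latest_per_key(records, page_key)` gives only the latest revision of each page. The same key and reduce functions would carry over to a distributed MapReduce job if the data does not fit on one machine.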

The last time we made a dump like this, it was created manually. About 300 people have downloaded it and it has been cited by 2 papers, so it seems to be useful.
https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800

Event Timeline

awight updated the task description.
Ladsgroup raised the priority of this task from High to Needs Triage.
Ladsgroup moved this task from Unsorted to New development on the Machine-Learning-Team board.

What sort of help do you need from my end, or is it too early?

Harej triaged this task as Lowest priority. Mar 26 2019, 9:19 PM