Paste P7129

Productionize monthly article quality prediction datasets
Active, Public

Authored by Halfak on May 15 2018, 10:44 AM.
The monthly article quality predictions have proven very useful. However, re-generating the data for new dumps is a time-consuming and highly manual process. There should be a job that runs periodically on the Analytics Cluster to keep this dataset up to date.
Here's the one-off dataset:
https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800
Here's an example of some fun research that is based on this data:
https://commons.wikimedia.org/wiki/File:Demonstrating_the_Keilana_Effect_(OpenSym%2717).pdf