|Resolved||Halfak||T145332 Formal publication of article quality score dataset|
|Resolved||Ladsgroup||T135684 Generate recent article quality scores for English Wikipedia|
|Resolved||DarTar||T146708 Ask Figshare to remove file upload limit for Article Quality Score dataset|
|Resolved||Halfak||T146709 Mini blogpost for Article Quality Score dataset|
I was thinking the AQ score dataset would be good material for a short blog post, to drive more attention to it (roping in @Nettrom with a couple of quotes maybe?) @Halfak, @Ladsgroup: what do you guys think? I don't want to add a lot more work (and I can help with this task) but I feel the data release deserves a more visible announcement than just lists+wikiresearch.
For all our major data releases we create a registry entry in figshare, which adds a few benefits over just a static dump on datasets.wikimedia.org:
- it assigns the dataset a DOI (making it citable)
- it stores metadata (which gets propagated) making it more easily discoverable
- it includes additional mirroring of the dataset for long term preservation
See for example:
I think there are two AQ datasets going around. One is the one @Ladsgroup pointed to, which I believe @Halfak gathered, and which is used for ORES training and evaluation. The second is the one I used to do some additional training to improve the wikiclass library, and that's already on figshare: https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406. This second dataset was gathered by following the process described in our 2015 CSCW paper, which is referenced in the figshare description.