Formal publication of article quality score dataset
Closed, Resolved · Public

Description

Create a figshare entry with metadata and basic documentation on article quality score data (T135684) and announce it.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 11 2016, 4:11 PM

I was thinking the AQ score dataset would be good material for a short blog post, to drive more attention to it (roping in @Nettrom with a couple of quotes maybe?). @Halfak, @Ladsgroup: what do you guys think? I don't want to add a lot more work (and I can help with this task), but I feel the data release deserves a more visible announcement than just lists+wikiresearch.

Forgive me if my question is a little bit stupid, but it's already on datasets.wikimedia.org (see T135684#2622793). So what exactly should we do to consider it published? I was thinking that having a table in labs (T106278) would be really nice. Do you mean this?

For all our major data releases we create a registry entry in figshare, which adds a few benefits over just a static dump on datasets.wikimedia.org:

  • it assigns the dataset a DOI (making it citable)
  • it stores metadata (which gets propagated) making it more easily discoverable
  • it includes additional mirroring of the dataset for long term preservation

See for example:

@Ladsgroup I realize I should document the process (and benefits) somewhere on wikitech.

> I was thinking that having a table in labs (T106278) would be really nice.

That would be fantastic to have too, although it's a separate task.

I think there are two AQ datasets going around. One is the one @Ladsgroup pointed to, which I believe @Halfak gathered, and which is used for ORES training and evaluation. The second is the one I used to do some additional training to improve the wikiclass library, and that's already on figshare: https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406. This second dataset was gathered by following the process described in our 2015 CSCW paper, which is referenced in the figshare description.

@Nettrom agreed, we should definitely reference the 2015 one (maybe cross-link the two entries).

Oh, thanks :)

Halfak renamed this task from Publish article quality score dataset to Formal publication of article quality score dataset. · Sep 15 2016, 3:06 PM
Halfak closed this task as Resolved. · Nov 3 2016, 8:45 PM
Restricted Application added a project: artificial-intelligence. · Jul 17 2017, 6:34 PM