
Generate monthly article quality dataset
Closed, Resolved · Public

Description

  • enwiki
  • frwiki
  • ruwiki

Event Timeline

Halfak created this task. Sep 14 2016, 3:33 PM
Restricted Application added a subscriber: Aklapper. Sep 14 2016, 3:33 PM

Checking in. I regret turning on verbose mode (it is far too verbose) for our monthly article quality extraction process. Without it, I'd be able to read the INFO log lines to see how far we've progressed through the dump files.
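For illustration only, here is a minimal sketch of how per-file progress could stay readable at INFO level while the per-revision detail is confined to DEBUG. This is not the extraction code used for this task: the logger name, `build_logger`, `process_dumps`, and the fake per-revision loop are all hypothetical, and only Python's standard `logging` module is assumed.

```python
import io
import logging
import sys


def build_logger(stream, verbose=False):
    """Return a logger that writes INFO lines, plus DEBUG lines only if verbose."""
    logger = logging.getLogger("extractor_sketch")  # hypothetical logger name
    logger.handlers.clear()        # avoid duplicate handlers on repeated calls
    logger.propagate = False       # keep output out of the root logger
    logger.setLevel(logging.DEBUG)  # let the handler decide what is shown
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def process_dumps(paths, logger, revisions_per_file=3):
    """Stand-in for the real extraction: one DEBUG line per item, one INFO line per file."""
    total = 0
    for i, path in enumerate(paths, start=1):
        for rev in range(revisions_per_file):  # placeholder for real per-revision work
            logger.debug("scored revision %d in %s", rev, path)
            total += 1
        logger.info("finished %s (%d/%d files, %d assessments so far)",
                    path, i, len(paths), total)
    return total


if __name__ == "__main__":
    log = build_logger(sys.stderr, verbose=False)
    process_dumps(["dump-part1.xml", "dump-part2.xml"], log)
```

With `verbose=False`, only the one-line-per-file INFO summaries are written, so progress through the dump files can be read directly from the log.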

Right now, I can only say that we've got 291M article quality assessments. We might end up with 360M if my conservative estimate is about right.
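As a rough illustration of what those two figures imply, the arithmetic can be sketched as below. It takes the 291M count and the 360M estimate at face value; `fraction_done` is a hypothetical helper, not part of the extraction tooling.

```python
def fraction_done(done, estimated_total):
    """Share of the estimated total that has been produced so far."""
    return done / estimated_total


done = 291_000_000             # assessments so far, from the comment above
estimated_total = 360_000_000  # the conservative estimate, from the comment above

print(f"{fraction_done(done, estimated_total):.1%} done, "
      f"{estimated_total - done:,} remaining")
# prints: 80.8% done, 69,000,000 remaining
```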

Stat1003 got a reboot, so I'm trying to pick up where I left off.

I just started up the frwiki extractor.

I just uploaded the cleaned and compressed enwiki dataset to figshare.

frwiki is up to 66,833,544 article/month assessments.

Halfak updated the task description. Oct 3 2016, 8:49 AM

OK. Done with French. Starting up Russian.

Halfak updated the task description.

All datasets are here: https://datasets.wikimedia.org/public-datasets/all/wp10/20160801/

I'm traveling, so it's hard to upload to figshare. I'll do that upload when I'm on a better connection.

Halfak closed this task as Resolved. Oct 11 2016, 11:51 PM