Publish scraper results on figshare
Open, Needs Triage, Public

Description

To make our HTML dump scraper data more usable, we had planned to publish the metadata on figshare, a data repository used by many wiki researchers. The process was already worked out for an earlier edition of our data, which we unfortunately had to redact. See T341751: Publish dump scraper reports for the final text and more information.

The raw data is already published and can remain hosted on WMF infrastructure: https://analytics.wikimedia.org/published/datasets/one-off/html-dump-scraper-refs/

Event Timeline

awight renamed this task from "Publish scraper results on figshare, publicize dataset at conference" to "Publish scraper results on figshare". Mon, May 6, 9:22 AM