Publish scraper results on figshare
Closed, Resolved · Public

Assigned To

Authored By

	awight
	Apr 29 2024, 9:43 AM

Description

In order to make our HTML dump scraper data more usable, we had planned to publish the metadata on figshare, which is a data repository chosen by many wiki researchers. The process has already been worked out for an earlier edition of our data, which we unfortunately had to redact. See T341751: Publish dump scraper reports for the final text and more information.

Raw data itself is already published and can remain hosted on WMF infrastructure: https://analytics.wikimedia.org/published/datasets/one-off/html-dump-scraper-refs/

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		awight	T366144 Run HTML dump scraper (June 2024)
		Resolved		awight	T363675 Publish scraper results on figshare

Event Timeline

awight created this task. Apr 29 2024, 9:43 AM

Restricted Application added a subscriber: Aklapper. Apr 29 2024, 9:43 AM

awight claimed this task. May 3 2024, 7:06 AM

awight moved this task from Sprint Backlog to Doing on the WMDE-TechWish-Sprint-2024-04-24 board.

awight renamed this task from "Publish scraper results on figshare, publicize dataset at conference" to "Publish scraper results on figshare". May 6 2024, 9:22 AM

awight moved this task from Doing to Watching / Epic / Stalled on the WMDE-TechWish-Sprint-2024-04-24 board.

awight moved this task from Incoming to In progress on the WMDE-References-FocusArea board. May 8 2024, 12:48 PM

awight removed awight as the assignee of this task. May 29 2024, 9:23 AM

awight added a parent task: T366144: Run HTML dump scraper (June 2024).

awight closed this task as Resolved. Jun 10 2024, 12:23 PM

awight claimed this task.

awight moved this task from In progress to Done on the WMDE-References-FocusArea board. Oct 23 2024, 7:05 AM