[Abstract Wikipedia data science] Create parser for list of all existing wikis
Closed, Resolved · Public

Description

To fetch the Scribunto modules from _all_ Wikimedia wikis, the first step is to get a list of those wikis. That can be done by parsing an existing page on Meta-wiki. For later use, the parsed list should be saved in a text file.
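
A minimal sketch of this step, assuming the SiteMatrix API on Meta-wiki (action=sitematrix) rather than scraping a rendered page; the response keys used here follow the documented SiteMatrix output format:

```python
# Sketch: fetch the list of all Wikimedia wikis and save one URL per line.
# Assumes the action=sitematrix module; its output groups wikis by language
# under numeric keys (each with a "site" list) plus a flat "specials" list.
import requests

API_URL = "https://meta.wikimedia.org/w/api.php"

def fetch_wiki_urls():
    params = {"action": "sitematrix", "format": "json"}
    data = requests.get(API_URL, params=params, timeout=30).json()
    urls = []
    for key, group in data["sitematrix"].items():
        if key == "count":
            continue
        # Language groups keep their wikis under "site"; "specials" is a flat list.
        sites = group.get("site", []) if isinstance(group, dict) else group
        for site in sites:
            if "closed" not in site:  # skip closed wikis
                urls.append(site["url"])
    return urls

if __name__ == "__main__":
    with open("wikis.txt", "w") as f:
        f.write("\n".join(fetch_wiki_urls()) + "\n")
```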

Update (15.12.2020)

To fetch additional information we also need to know the wikis' database names. So it is reasonable to switch to fetching this information from the 'meta' wiki database copy, as described here.
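
A sketch of such a query, assuming the Toolforge Wiki Replicas' meta_p.wiki table; the host name, the credentials file, and the column names (dbname, url, is_closed) are taken from the standard Toolforge setup and may differ:

```python
# Sketch: read wiki database names and URLs from the meta_p.wiki table
# on the Wiki Replicas and dump them to CSV. Host, credentials file, and
# column set are assumptions based on the standard Toolforge setup.
import csv
import os
import pymysql

conn = pymysql.connect(
    host="meta.analytics.db.svc.wikimedia.cloud",  # assumed replica host
    database="meta_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT dbname, url FROM wiki WHERE is_closed = 0")
        rows = cur.fetchall()
finally:
    conn.close()

with open("wikis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["dbname", "url"])
    writer.writerows(rows)
```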

But these tables don't have an update-time property: they are just copies, which are not updated in place; new copies are simply loaded to replace the old ones. So it makes sense to look at a table's creation time to check for updates.
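
A sketch of that "last update" check, assuming a MySQL/MariaDB replica where information_schema.TABLES exposes CREATE_TIME (connection details as in the previous sketch):

```python
# Sketch: use the table's creation time in information_schema as a proxy
# for "last update", since the replica copies carry no update timestamp.
import os
import pymysql

def table_created_at(conn, schema, table):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT CREATE_TIME FROM information_schema.TABLES "
            "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s",
            (schema, table),
        )
        row = cur.fetchone()
    return row[0] if row else None

conn = pymysql.connect(
    host="meta.analytics.db.svc.wikimedia.cloud",  # assumed replica host
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
try:
    print(table_created_at(conn, "meta_p", "wiki"))
finally:
    conn.close()
```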

Tasks
  • Parse the existing wiki links from the page
  • Save them in a text file, one link per line
  • Add checks for page parsing (page unavailable, page changed, ...)
  • Add a check for whether the page has been updated recently (API request?)
  • Move to fetching the info from the database copies
  • Save the query results as CSV
  • Make a "last update" checker
  • Put the updater on cron (see the example below)
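
For the last item, a hypothetical crontab entry (script path and schedule are placeholders; on Toolforge the command would normally be submitted through the job framework instead of raw cron):

```
# Hypothetical entry: refresh the wiki list daily at 03:00.
0 3 * * * python3 /path/to/update_wiki_list.py >> /path/to/update.log 2>&1
```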