Import endangered alphabets in Wikidata
Open, Medium, Public

Assigned To

None

Authored By

	Pintoch
	Nov 15 2019, 7:57 PM

Description

The Atlas of Endangered Alphabets could be linked to Wikidata (the linkage could be done like this).

This would be a good OpenRefine exercise. The challenge here is to do the reconciliation: many items will not be classified as writing systems / languages / scripts, so we need to surface them. But the list is quite short so manual reconciliation should be doable too.

The dataset can be found here: http://pintoch.ulminfo.fr/e140c8b9bf/alphabets.tsv
(it is also a good scraping exercise to create this table directly from the website).
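The "surfacing" step described above could be sketched roughly as follows: sort reconciliation candidates into buckets by their P31 (instance of) values, so that untyped items are kept for manual review rather than dropped. This is only an illustration, not the workflow used at the workshop. The `Candidate` class, the `triage` function and the bucket names are hypothetical, and the QIDs in the two class sets are assumptions that should be checked against Wikidata before use.

```python
from dataclasses import dataclass

# Assumed QIDs, to be verified on Wikidata before relying on them.
WRITING_SYSTEM_CLASSES = {"Q8192", "Q9779"}  # assumed: writing system, alphabet
LANGUAGE_CLASSES = {"Q34770"}                # assumed: language


@dataclass
class Candidate:
    """A reconciliation candidate (hypothetical structure for this sketch)."""
    qid: str
    label: str
    p31: frozenset = frozenset()  # QIDs of the item's "instance of" values


def triage(candidate):
    """Sort a candidate into a bucket based on its P31 values."""
    if candidate.p31 & WRITING_SYSTEM_CLASSES:
        return "writing-system"   # typed as expected, safest match
    if not candidate.p31:
        return "unclassified"     # under-developed item, needs a human look
    if candidate.p31 & LANGUAGE_CLASSES:
        return "language"         # probably the wrong item to link to
    return "other"


def surface(candidates):
    """Order candidates so the most plausible ones come first."""
    order = {"writing-system": 0, "unclassified": 1, "other": 2, "language": 3}
    return sorted(candidates, key=lambda c: order[triage(c)])
```

Keeping the "unclassified" bucket ahead of the "language" one reflects the concern that a suitable item may exist but simply lack its P31 statement.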

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Pintoch	T236038 WORKSHOP: OpenRefine (10.30 - 12.15)
		Open		None	T238441 Import endangered alphabets in Wikidata

Event Timeline

Pintoch triaged this task as Medium priority. Nov 15 2019, 7:57 PM

Pintoch created this task.

Pintoch moved this task from Backlog to SPARQLstation on the Wiki-Techstorm-2019 board.

Pintoch moved this task from Backlog to Community data imports with OpenRefine on the OpenRefine board.

Ecritures moved this task from SPARQLstation to Backlog on the Wiki-Techstorm-2019 board. Nov 15 2019, 9:53 PM

I would love it if somebody would take this on. I think it's a really worthwhile resource to link to, and a really significant topic to improve our data for.

But please be very careful to link to items that actually are for writing systems, alphabets, etc. (although the items might be quite under-developed, and could be missing the P31 to say so), _not_ to any more general items that might exist for languages, peoples, cultures, etc. Thanks!!

Created a catalog on MixNMatch: https://tools.wmflabs.org/mix-n-match/#/catalog/3042

All the languages in this list are now matched (manually as well as with OpenRefine).

I also see an issue with the fact that some items just use the name of the writing system, while other labels include the qualifier itself: script, alphabet, or writing system... Doesn't that mean that all of them should be reviewed?

Actually, it is confusing, because some languages have only a "[NAME]" item, which is often described as a language but could also be qualified as a writing system, while others have a "[NAME] language" item that may or may not refer to some "[NAME] script/alphabet/writing system", and there is no generic "[NAME]" item to use... Difficult to determine a consistent approach :(
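One way to make the labelling inconsistency visible before a review might be to group item labels by their base name, with the qualifier split off. A minimal sketch, assuming the qualifier always appears as an English suffix; the `split_label` and `group_by_base` helpers and the qualifier list are hypothetical, and the labels in the usage note are purely illustrative.

```python
import re
from collections import defaultdict

# Qualifiers mentioned in the discussion above; "writing system" is listed
# first only for readability, since the pattern is anchored at the end.
QUALIFIERS = ("writing system", "script", "alphabet", "language")
_SUFFIX = re.compile(
    r"\s+(?:%s)$" % "|".join(re.escape(q) for q in QUALIFIERS),
    re.IGNORECASE,
)


def split_label(label):
    """Return (base name, qualifier or None) for an item label."""
    label = label.strip()
    match = _SUFFIX.search(label)
    if match:
        return label[:match.start()], match.group(0).strip().lower()
    return label, None


def group_by_base(labels):
    """Group labels sharing a base name, so variants can be compared."""
    groups = defaultdict(list)
    for label in labels:
        base, qualifier = split_label(label)
        groups[base.casefold()].append((label, qualifier))
    return dict(groups)
```

For example, `group_by_base(["Foo", "Foo language", "Foo alphabet"])` puts all three labels under the key `"foo"`, which shows at a glance which names have a bare item, a language item, a script item, or some combination.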

Pintoch moved this task from Backlog to Done on the Wiki-Techstorm-2019 board. Nov 23 2019, 11:16 AM
