Import endangered alphabets in Wikidata
Open, Medium, Public

Description

The Atlas of Endangered Alphabets could be linked to Wikidata (the linkage could be done like this).

This would be a good OpenRefine exercise. The challenge here is to do the reconciliation: many items will not be classified as writing systems / languages / scripts, so we need to surface them. But the list is quite short so manual reconciliation should be doable too.

The dataset can be found here: http://pintoch.ulminfo.fr/e140c8b9bf/alphabets.tsv
(it is also a good scraping exercise to create this table directly from the website).
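As a starting point for the reconciliation, here is a minimal sketch that parses the TSV and builds one Wikidata label-search URL per row using the `wbsearchentities` API action. The column name `name` and the sample row are assumptions for illustration; the real header of `alphabets.tsv` may differ. The helper names `read_rows` and `search_url` are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def read_rows(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def search_url(label, language="en", limit=5):
    """Build a wbsearchentities URL that looks up candidate items by label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


# Hypothetical sample; the column names in the real file are an assumption.
sample = "name\turl\nAdlam\thttps://example.org/adlam\n"
rows = read_rows(sample)
urls = [search_url(row["name"]) for row in rows]
```

The search only returns candidates by label, so each hit still has to be checked by hand or in OpenRefine before it is accepted as a match.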

Event Timeline

Pintoch triaged this task as Medium priority. Nov 15 2019, 7:57 PM
Pintoch created this task.
Pintoch moved this task from Backlog to SPARQLstation on the Wiki-Techstorm-2019 board.

I would love it if somebody would take this on. I think it's a really worthwhile resource to link to, and a really significant topic to improve our data for.

But please be very careful to link to items that actually are for writing systems, alphabets, etc. (although the items might be quite under-developed, and could be missing the P31 to say so), _not_ to any more general items that might exist for languages, peoples, cultures, etc. Thanks!!
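One way to surface risky matches is to ask the Wikidata Query Service whether each matched item is an instance of (P31) some subclass of "writing system". The sketch below only builds the query text and sorts the answers. It assumes Q8192 is the item for "writing system", which should be verified before use, and the function names are hypothetical. Because items may be missing P31, an unconfirmed item is not necessarily wrong; it just needs a manual look.

```python
WRITING_SYSTEM = "Q8192"  # assumed QID for "writing system"; verify before use


def build_check_query(qids, root=WRITING_SYSTEM):
    """Build a SPARQL query flagging which items are typed under `root`."""
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?item ?isScript WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"  BIND(EXISTS {{ ?item wdt:P31/wdt:P279* wd:{root} }} AS ?isScript)\n"
        "}"
    )


def partition(results):
    """Split {qid: is_script} into (confirmed, needs_review) sorted lists."""
    confirmed = sorted(q for q, ok in results.items() if ok)
    needs_review = sorted(q for q, ok in results.items() if not ok)
    return confirmed, needs_review


# Placeholder QIDs, not real matches from the dataset.
query = build_check_query(["Q1", "Q2"])
confirmed, needs_review = partition({"Q1": True, "Q2": False})
```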

All the languages in this list are now matched (manually as well as with OpenRefine).

I also see an issue with the fact that some items just use the name of the writing system, while other labels include a qualifier themselves: script, alphabet, or writing system. Doesn't that mean that all of them should be reviewed?

Actually, it is confusing, because some languages have only a "[NAME]" item, which is often described as a language but could also be qualified as a writing system, while others have a "[NAME] language" item that may or may not refer to some "[NAME] script/alphabet/writing system", and there is no generic "[NAME]" to use... Difficult to determine a consistent approach :(
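To make the labelling inconsistency easier to review, one option is to split each label into a base name and a trailing qualifier, then group the matched items by base name. This is a rough sketch under the assumption that qualifiers appear as an English suffix; the qualifier list and function names are hypothetical, and the labels in the example are illustrative rather than taken from the dataset.

```python
QUALIFIERS = ("writing system", "script", "alphabet", "language")


def split_label(label):
    """Return (base_name, qualifier); qualifier is None for a bare name."""
    text = " ".join(label.split())  # collapse stray whitespace
    lowered = text.lower()
    for qualifier in QUALIFIERS:
        if lowered.endswith(" " + qualifier):
            return text[: -len(qualifier) - 1], qualifier
    return text, None


def group_by_base(labels):
    """Map each lower-cased base name to the list of qualifiers seen for it."""
    groups = {}
    for label in labels:
        base, qualifier = split_label(label)
        groups.setdefault(base.lower(), []).append(qualifier)
    return groups


# Illustrative labels only.
groups = group_by_base(["Adlam", "Adlam script", "Foo language"])
```

A base name that appears only with the `language` qualifier, or only as a bare name, is a candidate for the manual review described above.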