
Harvest Wikidata item into the monuments database from the linked article
Open, Needs Triage · Public

Description

T140795 is about harvesting the Wikidata item when it is indicated in the monuments list.

Many lists link to Wikipedia articles, from which we could look up the Q item and store it.
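A minimal sketch of such a lookup, assuming the MediaWiki API's `pageprops` query is used to read the `wikibase_item` page property of the linked article. The function names are hypothetical and not part of the harvester, and the parsing assumes the API's default JSON layout, where `pages` is keyed by page id.

```python
from urllib.parse import urlencode


def pageprops_url(site, title):
    """Build an API URL asking for the Wikidata item of one article."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # follow redirects to the target article
        "titles": title,
        "format": "json",
    }
    return "https://{}/w/api.php?{}".format(site, urlencode(params))


def extract_q_item(response):
    """Return the Q item from a decoded API response, or None if absent."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item:
            return item
    return None
```

Fetching the URL and decoding the JSON is left out on purpose; any HTTP client would do.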

This could of course be done dynamically as well. Not sure what’s best.

Thoughts @Lokal_Profil @Multichill ?

Event Timeline

JeanFred created this task. · Oct 24 2016, 5:15 PM
Restricted Application added a subscriber: Aklapper. · Oct 24 2016, 5:15 PM

First we need to track what property to use. Let's take the good old Rijksmonumenten. We use the template at https://nl.wikipedia.org/wiki/Sjabloon:Tabelrij_rijksmonument (with Wikidata support) and the property is https://www.wikidata.org/wiki/Property:P359 .

On https://nl.wikipedia.org/wiki/Lijst_van_rijksmonumenten_in_Haarlem_Centrum the first entry is "Welgelegen", id 15966. So we want the Wikidata item that has P359 == 15966. SPARQL can give us that: https://query.wikidata.org/#SELECT%20%3Fitem%20WHERE%7B%20%3Fitem%20wdt%3AP359%20%2215966%22%20%7D%20%20 -> http://www.wikidata.org/entity/Q17186772
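For illustration, the same query can be built programmatically. This is only a sketch: the helper names are made up, and it assumes the public endpoint at `https://query.wikidata.org/sparql` with `format=json` for machine-readable results.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def item_by_id_query(prop, value):
    """Return a SPARQL query for items whose `prop` equals `value`."""
    # Escape backslashes and double quotes so the value stays a valid literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return 'SELECT ?item WHERE { ?item wdt:%s "%s" }' % (prop, escaped)


def query_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(item_by_id_query("P359", "15966"))
# prints: SELECT ?item WHERE { ?item wdt:P359 "15966" }
```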

Doing this one by one would be a pain. I've been cross-referencing quite a few sources on Wikidata. Generating a lookup table is a very inexpensive operation. See for example https://github.com/multichill/toollabs/blob/master/bot/wikidata/biografisch_finder.py#L58
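A rough sketch of what such a lookup table could look like, not taken from the linked script: one query fetches every item that has the property, and the JSON result is folded into a dict from monument id to Q item. The function names are hypothetical; the `results`/`bindings` layout is the standard SPARQL JSON results format.

```python
def lookup_table_query(prop):
    """Return a SPARQL query listing every item with its value for `prop`."""
    return "SELECT ?item ?id WHERE { ?item wdt:%s ?id }" % prop


def build_lookup_table(results):
    """Map monument id -> Q item from decoded SPARQL JSON results."""
    table = {}
    for row in results["results"]["bindings"]:
        monument_id = row["id"]["value"]
        # The item comes back as a full entity URI; keep only the Q id.
        q_item = row["item"]["value"].rsplit("/", 1)[-1]
        # If two items share an id, the later row silently wins here.
        table[monument_id] = q_item
    return table
```

With the table in memory, the harvester would only need a dict lookup per list row, e.g. `table.get("15966")`.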

Shouldn't be too hard to add this to the harvester.

@Multichill What you describe sounds rather like T138668?

Here I really meant

Oh right, I thought about that and realized that quite often multiple entries link to the same article...

It might be OK for the lists which have a separate article parameter, but for the ones where the links are extracted from the name field we risk pulling in all kinds of things.
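One conservative way to handle the shared-article case, sketched under the assumption that each harvested row boils down to a (monument id, article title) pair: only keep articles that exactly one entry links to. The function name and the input shape are hypothetical, not the harvester's actual data model.

```python
from collections import Counter


def unambiguous_articles(entries):
    """Return {monument_id: article} for articles linked by one entry only.

    `entries` is an iterable of (monument_id, article_title) pairs, where
    article_title is None or empty when the row has no linked article.
    """
    entries = list(entries)
    counts = Counter(title for _, title in entries if title)
    return {
        monument_id: title
        for monument_id, title in entries
        if title and counts[title] == 1
    }
```

This only filters out the obvious collisions; it does not tell whether a uniquely linked article is really about the same thing as the list entry.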

That said, I know that for e.g. the Swedish lists there is not a perfect overlap between the article value and Wikidata, since the list deals with complexes of buildings while the articles are normally about an individual building in that complex (example: the list entry is for the church and the separate bell tower, but the article is just about the church).