Batch add WWII war memorials to Wikidata
Open, Needs Triage, Public

Description

It would be great if one of you could help me out with the following:

The Dutch Network War Resources (netwerkoorlogsbronnen) wants to make its collected information on Dutch war memorials and lieux de mémoire available through Wikidata.
From my beginner's point of view on Wikidata, that would involve adding new Wikidata items for the places that don't have an ID yet, and adding war memorial properties to existing places.

Please see the attached file (in Dutch) to get an idea of the data involved.

Any help with getting this info onto Wikidata, or first steps on how to go about this, is much appreciated!

Cheers and thanks so much in advance,

Michelle (SIryn)

Event Timeline

SIryn created this task. Oct 24 2018, 10:03 AM
Pintoch claimed this task. Oct 24 2018, 2:41 PM
Pintoch added a subscriber: Pintoch.

I would be interested in helping with this - I can guide you through the uploading process with OpenRefine.
If you want to prepare for this, feel free to download OpenRefine and have a look at tutorials, like these:

The videos at http://openrefine.org/ are also useful to get an idea of what OpenRefine does (with no reference to Wikidata).

Fedfant removed a subscriber: Fedfant.

We are willing to help out with this. Is anyone working on this yet?
Fricke en Lois

Footech moved this task from Backlog to Doing on the Wikistorm board. Oct 26 2018, 5:57 PM

864 items in total.

Proposal: We want to start with a pilot project: uploading all Aalten monuments (15 items).

  1. Check how many are already in Wikidata.
  2. Manually upload 3 items to get a feel of the required properties.
  3. Reorder the data in Excel to more easily align the columns with Wikidata properties.
  4. Select the columns to use for Wikidata.
  5. Upload using OpenRefine.

Is the original prompter here? We would like to discuss our proposal and your data: for example, what the heck are those numbers?!?! 0_o

Find us in the hacking room!
Lois and Fricke

We made an export of all Dutch war memorials in Wikidata and compared them with our list (Monumenten - Gelderland). 37 memorials already have a page.

Awesome! \o/ Actually OpenRefine could potentially help you already at that stage to do the matching - let me know if you want a quick demo :)

Hi Pintoch,

A demo would be great. It's just me (Fricke) so far, but I'm expecting three teammates.

We will also have to convert some GPS coordinates in bulk. Do you have a tip on which tool/site we could use for this?

Footech added a comment. Edited Oct 27 2018, 12:46 PM

We decided to ditch the Excel file. Instead, Maarten scraped Wikipedia for all war memorials in the Netherlands. We cleaned up the data and then used OpenRefine to reconcile and upload it. We excluded the following items:
- Items without a name
- Items without an ID

All items were specified as 'instance of' war memorial. This could be specified further in the future, with the help of the attached Excel file.

Some of the following properties were used:
Item name
P276 Location
P3638 Oorlogsmonument ID
P131 located in the
P969 located at street address
P281 Postal code
P625 Coordinate location
P571 Inception

SPARQL for Michelle:
SELECT ?oorlogsmonument ?Oorlogsmonument_identificatiecode ?oorlogsmonumentLabel WHERE {
  # instance of (P31) war memorial (Q575759)
  ?oorlogsmonument wdt:P31 wd:Q575759.
  # country (P17) is the Netherlands (Q55)
  ?oorlogsmonument wdt:P17 wd:Q55.
  # the Oorlogsmonument ID (P3638) is optional, so items without one are listed too
  OPTIONAL { ?oorlogsmonument wdt:P3638 ?Oorlogsmonument_identificatiecode. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 1000000
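
For anyone who would rather pull this list from a script than from the query service web page, here is a minimal Python sketch using only the standard library. It assumes the public endpoint at https://query.wikidata.org/sparql and its JSON output; the function names and the User-Agent string are made up for illustration. As far as I know, [AUTO_LANGUAGE] is only filled in by the web interface, so the sketch asks for "nl,en" labels explicitly.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed here; not named in this task).
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?oorlogsmonument ?Oorlogsmonument_identificatiecode ?oorlogsmonumentLabel WHERE {
  ?oorlogsmonument wdt:P31 wd:Q575759.
  ?oorlogsmonument wdt:P17 wd:Q55.
  OPTIONAL { ?oorlogsmonument wdt:P3638 ?Oorlogsmonument_identificatiecode. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
"""


def build_request(query):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with something that identifies you.
    headers = {"User-Agent": "war-memorial-sketch/0.1 (example)"}
    return urllib.request.Request(ENDPOINT + "?" + params, headers=headers)


def rows_from_response(payload):
    """Flatten SPARQL JSON results into plain dicts of variable name -> value."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def fetch_rows(query=QUERY):
    """Run the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return rows_from_response(json.load(response))
```

Calling fetch_rows() returns one dict per memorial; items without a P3638 value simply lack the Oorlogsmonument_identificatiecode key.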

Latest version of file

The 6 test items uploaded well, but these were the first 6 IDs (1, 2, 3, 4, 5, 6), and after upload they were added in decimal format with ".0" after the number. OpenRefine is great when it works as expected... Meanwhile, multichill has also added some images to existing items. The Listeria list is here for now; a war memorial WikiProject should probably be made to park this stuff more permanently: https://www.wikidata.org/wiki/User:Jane023/oorlogsmonumenten

If we get so much detailed information into Wikidata, we should try to use a Wikidata-driven template to show this information on Wikipedia. Will we use a special template for this, or update the template [https://nl.wikipedia.org/wiki/Sjabloon:Infobox_beeld]? Not all monuments are sculptures.
Should we make a subtask or a separate task for this?

RonnieV added a comment. Edited Oct 28 2018, 5:39 PM

The spreadsheet which SIryn made available also has a description of each monument. If this information is available under the right license, we could use the spreadsheet to create pages for the monuments, and use the infobox and these descriptions to start the articles at a reasonable level. A script that uses a CSV to create and fill these pages was already written yesterday. I'll upload it to GitHub.

We were planning to upload the whole list in one go, but it required too much manual checking. I ended up adding a first batch of 100 items (see attachment).

Footech added a comment. Edited Oct 28 2018, 9:12 PM

There is still one issue: OpenRefine read the identifier as an integer and added ,0 to it. I tried to correct this by setting the column to text, but now the identifier is duplicated. Does anyone know how to solve this? See attachment.

https://www.wikidata.org/wiki/Q57907750

Good to see that you found a solution for the strange addition which changes integers to floats. New imports will be fine, so it's just a one-time clean-up for 100 records. Manual handling is a quick solution; writing a script might be nicer and help in other cases.

I figured there might already be a script available. Basically there are a lot of items that have no known suggestion. These could be automatically set to 'new item' (Note: this isn't foolproof. For example the Anne Frank house doesn't get a suggestion, if you try to add it as a monument, but of course it's already in Wikidata under a different, and probably more suitable label).

A lot of other items do get suggestions for possible Wikidata matches. Since we already did a check in advance, these are usually wrong. The monuments we're working with all have a unique identifier, so it might be possible to exclude these false suggestions, but I'm not sure how to go about it.
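
One way this could work, sketched in Python outside of OpenRefine: decide each row purely on the unique identifier and ignore label-based suggestions. This assumes you have a mapping from identifier to Wikidata item for the memorials that already carry one (for instance from an export); the field names, sample rows and the "Q_EXAMPLE" item ID below are placeholders for illustration, not real data.

```python
def decide_matches(rows, qid_by_identifier):
    """Match rows to existing items by identifier; everything else becomes a new item."""
    decisions = []
    for row in rows:
        ident = str(row["identifier"]).strip()
        # Only an exact identifier hit counts as a match; label suggestions are ignored.
        qid = qid_by_identifier.get(ident)
        decisions.append(
            {
                "identifier": ident,
                "name": row["name"],
                "match": qid if qid else "new item",
            }
        )
    return decisions


# Hypothetical sample data, not taken from the real list.
rows = [
    {"identifier": "12", "name": "Monument A"},
    {"identifier": "13", "name": "Monument B"},
]
qid_by_identifier = {"12": "Q_EXAMPLE"}  # placeholder, not a real item ID

decisions = decide_matches(rows, qid_by_identifier)
for decision in decisions:
    print(decision["identifier"], decision["name"], "->", decision["match"])
```

This still isn't foolproof for cases like the Anne Frank House, where the item exists but has no memorial identifier yet, so a manual look at the "new item" rows remains useful.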

Added another batch: {F26968548}

Just to let you know that the problem with the ".0" will be solved in the next version of OpenRefine.
In the meantime, you can solve the issue by transforming your column with the following expression: value.toString().replace(".0",""). Hope it helps!
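
If the clean-up is done outside OpenRefine (for example on an exported CSV), the same idea could look like the Python sketch below. The function name is made up for illustration. Unlike a plain replace of ".0", which I believe would also touch a ".0" in the middle of a value, this only strips a trailing ".0" or ",0".

```python
import re


def clean_identifier(value):
    """Return the identifier as text, dropping a trailing '.0' or ',0' left by a float conversion."""
    text = str(value).strip()
    # Anchor on the end of the string so values like '10.05' are left alone.
    return re.sub(r"[.,]0$", "", text)


# Hypothetical sample values showing the cases that matter.
for raw in [1.0, "2.0", "103,0", "1044", "10.05"]:
    print(repr(raw), "->", clean_identifier(raw))
```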

SIryn added a comment. Nov 2 2018, 7:13 AM

The spreadsheet which SIryn made available also has a description of each monument. If this information is available under the right license, we could use the spreadsheet to create pages for the monuments, and use the infobox and these descriptions to start the articles at a reasonable level. A script that uses a CSV to create and fill these pages was already written yesterday. I'll upload it to GitHub.

I'll get on with making this happen. My colleague told me that the descriptions are actually freely licensed, but we do not have a written statement to back that up. It would be great if we could use the descriptions to make articles. I'll let you know how it works out.

SIryn added a comment. Nov 2 2018, 7:15 AM

Good to see that you found a solution for the strange addition which changes integers to floats. New imports will be fine, so it's just a one-time clean-up for 100 records. Manual handling is a quick solution; writing a script might be nicer and help in other cases.

Would writing a script be the way to go, given that the problem won't occur in the future? If it is too much work for just 100 uploads, I'd be happy to adjust them manually.