
Import the Museum Data Files in Wikidata
Open, Medium, Public

Description

The Museum Data Files (MDF), updated in 2018, are a set of three files that provide information about museums and related organizations in the United States.

This data could be imported into Wikidata. @Alicia_Fagerving_WMSE has already worked out how the columns of this dataset map to Wikidata properties, and has written a SPARQL query that shows the existing museums with the corresponding IDs on a map. Only 266 museums show up there, while the dataset contains thousands, so there is probably room to import many more of them!

I think this would be a good task for anyone who attends the OpenRefine workshop at WikiTechStorm.

@Alicia_Fagerving_WMSE is it fine with you if others get involved in this import? Is there anything they should know when working on this?

Event Timeline

Pintoch moved this task from Backlog to Data imports on the OpenRefine board. Nov 14 2019, 6:19 PM
Ecritures assigned this task to Pintoch. Nov 14 2019, 6:20 PM
Ecritures triaged this task as Medium priority.
Ecritures moved this task from Backlog to SPARQLstation on the Wiki-Techstorm-2019 board.
Ecritures removed a subscriber: Pintoch.

@Pintoch – I don't have any immediate plans to do anything concrete with this dataset, and I'm glad if my notes can be of any use.

I think the biggest risk is duplication caused by the poor quality of existing items. For example, here is a set of items linked to English Wikipedia (enwp) articles about American museums that have no P31 (instance of) statement at all, so reconciliation might be tricky.
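To gauge how many such items exist before reconciling, one could flag items that have an English Wikipedia sitelink but no P31 statement. The sketch below is a minimal illustration, assuming the entities have already been downloaded as Wikidata entity JSON (the `claims` and `sitelinks` objects served by Special:EntityData). The function name and the sample item IDs (`Q_A`, `Q_B`, `Q_C`) are hypothetical placeholders, not real museum items.

```python
from typing import Any


def enwiki_items_without_p31(entities: dict[str, dict[str, Any]]) -> list[str]:
    """Return IDs of items that have an enwiki sitelink but no P31 (instance of).

    `entities` maps an item ID to its entity JSON, in the shape served by
    Special:EntityData (with "claims" and "sitelinks" objects).
    """
    flagged = []
    for qid, entity in entities.items():
        has_enwiki = "enwiki" in entity.get("sitelinks", {})
        has_p31 = bool(entity.get("claims", {}).get("P31"))
        if has_enwiki and not has_p31:
            flagged.append(qid)
    return flagged


# Hypothetical sample: placeholder IDs, not real museum items.
sample = {
    "Q_A": {"sitelinks": {"enwiki": {"title": "Example Museum"}}, "claims": {}},
    "Q_B": {
        "sitelinks": {"enwiki": {"title": "Other Museum"}},
        "claims": {"P31": [{"mainsnak": {"property": "P31"}}]},
    },
    "Q_C": {"sitelinks": {}, "claims": {}},
}
print(enwiki_items_without_p31(sample))  # ['Q_A']
```

Items flagged this way are candidates for adding P31 before reconciliation, so that type-constrained matching has something to work with.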

Pintoch removed Pintoch as the assignee of this task. Nov 15 2019, 4:30 PM
Pintoch added a subscriber: Pintoch.
Pintoch moved this task from Backlog to Done on the Wiki-Techstorm-2019 board. Nov 23 2019, 11:26 AM

I have reconciled the dataset only for children's museums (discipline code CMU), based on their common names.
For the matches, I added the official name to Wikidata, as well as the URL for the unambiguous ones.
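One way to prepare such edits in bulk is a QuickStatements (v1) batch. The following is a minimal sketch, not necessarily how these edits were made: it assumes the matched rows carry a `Qid` column plus hypothetical `LegalName` and `WebURL` columns, and it uses P1448 (official name, monolingual text, here assumed to be English) and P856 (official website). Values containing double quotes, tabs or newlines are skipped rather than escaped.

```python
import csv
import io


def _clean(value):
    """Trim a cell; return '' if it holds characters this sketch does not escape."""
    value = (value or "").strip()
    return "" if any(c in value for c in '"\t\n') else value


def quickstatements_for_matches(rows):
    """Build QuickStatements v1 lines for rows that already carry a Qid.

    P1448 = official name (monolingual text), P856 = official website.
    The 'LegalName' and 'WebURL' column names are assumptions.
    """
    lines = []
    for row in rows:
        qid = _clean(row.get("Qid"))
        if not qid:
            continue  # unmatched rows are not edited here
        name = _clean(row.get("LegalName"))
        url = _clean(row.get("WebURL"))
        if name:
            lines.append(f'{qid}\tP1448\ten:"{name}"')
        if url:
            lines.append(f'{qid}\tP856\t"{url}"')
    return "\n".join(lines)


# Hypothetical two-row extract; Q_EXAMPLE is a placeholder item ID.
sample_csv = (
    "Qid,CommonName,LegalName,WebURL\n"
    "Q_EXAMPLE,Example Kids Museum,Example Children's Museum Inc,https://example.org\n"
    ",Unmatched Museum,,\n"
)
print(quickstatements_for_matches(csv.DictReader(io.StringIO(sample_csv))))
```

Deciding which URLs are unambiguous is left to whoever reviews the generated batch before running it.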

MH0042 added a subscriber: MH0042. Edited Nov 23 2019, 2:42 PM

I took file 1 (of the three) and loaded it into OpenRefine.

I first wanted to find out which museums were already in Wikidata and which were not, so I reconciled on the CommonName column. Then I went manually through the items that had not been matched automatically. When OpenRefine suggested several options, I decided which one was correct, or that a new item should be created in Wikidata.
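The three-way decision in this workflow (one candidate, several candidates to review by hand, or none) can be illustrated with a small sketch. OpenRefine's reconciliation service actually ranks candidates with scores; this sketch swaps that for exact matching on a normalized name against a local label table, an assumption made only to show the decision logic. The function names, labels and IDs (`Q_A`, `Q_B`, `Q_C`) are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, drop accents and apostrophes, collapse other punctuation to spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_apostrophes = no_accents.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", " ", no_apostrophes).strip()


def build_index(labels: dict[str, str]) -> dict[str, list[str]]:
    """Map each normalized label to every item ID that carries it."""
    index: dict[str, list[str]] = defaultdict(list)
    for qid, label in labels.items():
        index[normalize(label)].append(qid)
    return dict(index)


def classify(common_name: str, index: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Return ('single', ids), ('review', ids) or ('none', []) for one CommonName."""
    candidates = index.get(normalize(common_name), [])
    if len(candidates) == 1:
        return "single", candidates  # still worth a manual spot check
    if candidates:
        return "review", candidates  # several options: pick one or create a new item
    return "none", []


# Placeholder IDs and fictitious labels for illustration only.
index = build_index({
    "Q_A": "Example Children's Museum",
    "Q_B": "Springfield Museum",
    "Q_C": "Springfield Museum",
})
print(classify("Example Childrens Museum", index))  # ('single', ['Q_A'])
print(classify("Springfield Museum", index))        # ('review', ['Q_B', 'Q_C'])
print(classify("Nowhere Museum", index))            # ('none', [])
```

Normalizing before comparing keeps trivial spelling differences (apostrophes, accents, punctuation) from turning an existing item into a false "new" museum.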

I have not finished this yet. When I have, I will leave three files (matched, new, not matched) so that anybody else can use them to put the data into Wikidata. I have not uploaded anything to Wikidata yet.

A complication is that Vincy.Lacroix turned out to be working on this task as well and is uploading children's museums. The reconciliation should therefore probably be redone, because there will be new matches that did not exist yet when I did mine.

I have attached three files. They are the result of my work on the first museum file, "MuseumFile2018_File1_Nulls.csv".

As mentioned in my previous comment, I reconciled this file with Wikidata in OpenRefine, using the CommonName column. This resulted in three subsets:

  1. MuseumFile2018_File1_Nulls-csv matched.xlsx. This file contains the museums that are already in Wikidata. They were either recognised automatically by OpenRefine's reconciliation, or I matched them by going through OpenRefine's suggestions. I added a column Qid with the Q-number of each museum's Wikidata item. In many cases these items contain little information, so it would be useful to add information from the Museum Data File to them; the Q-number makes it possible to attach that information to the right item.
  2. MuseumFile2018_File1_Nulls-csv new.xlsx. This file contains museums that are not yet in Wikidata. The reconciliation process suggested matches for these museums, but on checking them manually I determined that none of the suggestions was correct. These museums should be added to Wikidata as new items.
  3. MuseumFile2018_File1_Nulls-csv not matched.xlsx. This file contains museums for which reconciliation found no candidate at all. These museums should also be added to Wikidata as new items.
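The split into these three subsets can be sketched as a simple partition of the reconciled rows. This is a minimal illustration under stated assumptions: each row is assumed to have a `Qid` column (filled in when a match was accepted) and a hypothetical `HadCandidates` column recording whether reconciliation proposed anything. The files in this task are .xlsx; the sketch writes CSV text only to stay within the Python standard library.

```python
import csv
import io


def split_reconciled(rows):
    """Sort reconciled rows into the 'matched', 'new' and 'not matched' subsets.

    Assumes a 'Qid' column (set when a match was accepted) and a hypothetical
    'HadCandidates' column set to 'yes' when reconciliation proposed a candidate.
    """
    subsets = {"matched": [], "new": [], "not matched": []}
    for row in rows:
        if (row.get("Qid") or "").strip():
            subsets["matched"].append(row)
        elif row.get("HadCandidates") == "yes":
            subsets["new"].append(row)  # candidates existed but were rejected by hand
        else:
            subsets["not matched"].append(row)  # reconciliation proposed nothing
    return subsets


def to_csv(rows, fieldnames):
    """Serialize one subset as CSV text, ignoring columns not in fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows; Q_A is a placeholder item ID.
rows = [
    {"CommonName": "Museum A", "Qid": "Q_A", "HadCandidates": "yes"},
    {"CommonName": "Museum B", "Qid": "", "HadCandidates": "yes"},
    {"CommonName": "Museum C", "Qid": "", "HadCandidates": "no"},
]
print(to_csv(split_reconciled(rows)["matched"], ["CommonName", "Qid"]))
```

Keeping the Qid column in the "matched" output is what later lets new statements be attached to the right existing item.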

Unfortunately, I do not have time to work further on this task. I hope somebody else can use these files to add the museum information to Wikidata.

I will take on the task of uploading the data from 'MuseumFile2018_File1_Nulls-csv new.xlsx' to Wikidata.
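Creating the new items could, for example, be done with a QuickStatements (v1) batch using its `CREATE` / `LAST` commands. The sketch below is one possible shape, not the uploader's actual method: it uses the `CommonName` column as the English label, an assumed `WebURL` column for P856 (official website), and states P31 = Q33506 (museum) and P17 (country) = Q30 (United States). A more specific class than "museum" may fit many rows better. Names containing double quotes, tabs or newlines are skipped for manual handling.

```python
def _clean(value):
    """Trim a cell; return '' if it holds characters this sketch does not escape."""
    value = (value or "").strip()
    return "" if any(c in value for c in '"\t\n') else value


def quickstatements_create(rows):
    """Emit QuickStatements v1 commands creating one item per new museum.

    Uses 'CommonName' as the English label; 'WebURL' is an assumed column
    name. Q33506 = museum, Q30 = United States.
    """
    commands = []
    for row in rows:
        name = _clean(row.get("CommonName"))
        if not name:
            continue  # skipped rows need manual handling
        commands += [
            "CREATE",
            f'LAST\tLen\t"{name}"',
            "LAST\tP31\tQ33506",
            "LAST\tP17\tQ30",
        ]
        url = _clean(row.get("WebURL"))
        if url:
            commands.append(f'LAST\tP856\t"{url}"')
    return "\n".join(commands)


print(quickstatements_create([{"CommonName": "Example Museum", "WebURL": "https://example.org"}]))
```

Because other participants were creating children's museum items in parallel, re-reconciling these rows just before running such a batch would reduce the risk of duplicates. QuickStatements also accepts S-prefixed reference columns, which could be used to cite the Museum Data Files as the source.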

Ecritures moved this task from Done to Backlog on the Wiki-Techstorm-2019 board. Dec 2 2019, 10:32 AM