
Enriching Wikidata entities with MARC data from libraries
Open, Needs Triage, Public

Description

We propose a pywikibot script that imports MARCXML data, as used by libraries, into Wikidata.

MARC is a standard used by many libraries to describe the items they catalogue, including people, places, institutions, etc.

Wikimedia Israel got permission from the National Library of Israel to obtain and use its authority database (the XML dump is about 2.3 GB of text), and we have a starting point for coding with a small subset of the data. We expect that other libraries may want to share their data in a similar way in the future.

  1. A rough draft of the project: https://github.com/idoivri/marc_to_wikidata
  2. Mapping of Wikidata properties to MARC: https://docs.google.com/spreadsheets/d/1lXxIe1vYFbUaTGUWTFi7Gh9zZzUcKZNSjPCk65qgm2A/edit#gid=0
  3. A subset of the large XML file: https://github.com/eranroz/marc_to_wikidata/blob/master/marcxml_example.xml
  4. A short primer on reading the MARCXML file: https://docs.google.com/document/d/1BVZZd9cJrAwd_dzfX7abB79dbGrQoQGpJJc_seabaPI/edit
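
As a rough illustration of the reading side of such a script, here is a minimal sketch using only the Python standard library. It streams records one at a time, which matters for a dump of this size, and collects candidate values keyed by Wikidata property. Everything named here is an assumption made for illustration: the helper names (`iter_records`, `record_id`, `extract_claims`, `parse`), the sample record, and the tiny `FIELD_TO_PROPERTY` table, which is not taken from the mapping spreadsheet above. It also assumes the file uses the standard MARC21 "slim" XML namespace, and it does not write anything to Wikidata.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the dump uses the standard MARCXML ("slim") namespace.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical mapping for illustration only: (MARC tag, subfield code) -> property.
# The real mapping should come from the spreadsheet linked in the task.
FIELD_TO_PROPERTY = {
    ("046", "f"): "P569",  # date of birth
    ("046", "g"): "P570",  # date of death
}


def iter_records(source):
    """Yield <record> elements one at a time, freeing each after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MARC_NS + "record":
            yield elem
            elem.clear()  # drop the record's children to keep memory flat


def record_id(record):
    """Return the text of control field 001, or None if it is missing."""
    for field in record.iter(MARC_NS + "controlfield"):
        if field.get("tag") == "001":
            return field.text
    return None


def extract_claims(record):
    """Collect mapped subfield values as {property id: [values]}."""
    claims = {}
    for field in record.iter(MARC_NS + "datafield"):
        tag = field.get("tag")
        for sub in field.iter(MARC_NS + "subfield"):
            prop = FIELD_TO_PROPERTY.get((tag, sub.get("code")))
            if prop and sub.text:
                claims.setdefault(prop, []).append(sub.text.strip())
    return claims


def parse(source):
    """Return [(record id, claims)] for every record in the source."""
    return [(record_id(r), extract_claims(r)) for r in iter_records(source)]


# Made-up sample record, not taken from the library's data.
SAMPLE = b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">000000001</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Example, Person</subfield>
    </datafield>
    <datafield tag="046" ind1=" " ind2=" ">
      <subfield code="f">1900</subfield>
      <subfield code="g">1980</subfield>
    </datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    for rec_id, claims in parse(io.BytesIO(SAMPLE)):
        print(rec_id, claims)
```

For the full dump, `parse` would take an open file or a path instead of the in-memory sample. Matching records to existing Wikidata items and writing the claims are deliberately left out of this sketch.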

Event Timeline

eranroz created this task. Mar 31 2016, 11:23 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Mar 31 2016, 11:23 PM

Thanks, @eranroz! :)
Awesome stuff...

We are in the Caesarea room, in the north-east part of the Hansen building.

johl added a subscriber: johl. Apr 1 2016, 8:54 AM
putnik added a subscriber: putnik. Apr 1 2016, 8:55 AM

Wikimedia RU currently has a preliminary agreement with the Russian State Library (one of the largest Russian libraries) on exporting its MARC database (several million records). I hope that we will settle the copyright question in the coming days.

Therefore, such a script would be of interest to us.

Awesome :)

Alleycat80 updated the task description. Apr 1 2016, 9:19 AM
Alleycat80 updated the task description. Apr 1 2016, 9:21 AM

Great, join us in the Caesarea room!

johl removed a subscriber: johl. Mar 22 2017, 12:13 PM