We propose writing a pywikibot script that imports MARCXML data, as used by libraries, into Wikidata.
MARC is a standard that many libraries use to describe the items they catalogue. Its authority records also cover people, places, institutions, etc.
Wikimedia Israel has received permission from the National Library of Israel to obtain and use its authority database (the XML dump is about 2.3 GB of text), and we have a starting point for coding with a small subset of the data. We expect that other libraries may want to share their data in a similar way in the future.
- An initial draft of the project: https://github.com/idoivri/marc_to_wikidata
- Mapping of Wikidata properties to MARC: https://docs.google.com/spreadsheets/d/1lXxIe1vYFbUaTGUWTFi7Gh9zZzUcKZNSjPCk65qgm2A/edit#gid=0
- A subset of the large XML file: https://github.com/eranroz/marc_to_wikidata/blob/master/marcxml_example.xml
- A short primer on reading the MARCXML file: https://docs.google.com/document/d/1BVZZd9cJrAwd_dzfX7abB79dbGrQoQGpJJc_seabaPI/edit
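
To illustrate the reading side of such a script, here is a minimal sketch in Python that streams records out of a MARCXML file using only the standard library. It is not the code from the repository above. The sample records, the function names (`iter_records`, `parse_record`, `first_subfield`) and the choice of fields are assumptions made for illustration; only the MARCXML namespace and the `record` / `controlfield` / `datafield` / `subfield` element names come from the MARCXML format itself. Writing the result to Wikidata with pywikibot is left out.

```python
import io
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Made-up sample data for illustration only (not taken from the real dump).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">000000001</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Example, Person</subfield>
      <subfield code="d">1900-1980</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">000000002</controlfield>
    <datafield tag="151" ind1=" " ind2=" ">
      <subfield code="a">Example Place</subfield>
    </datafield>
  </record>
</collection>
"""


def parse_record(record):
    """Convert one <record> element into a plain dict."""
    control = {}
    fields = []
    for child in record:
        local_name = child.tag.rsplit("}", 1)[-1]  # strip the namespace
        if local_name == "controlfield":
            control[child.get("tag")] = (child.text or "").strip()
        elif local_name == "datafield":
            subfields = [(sf.get("code"), (sf.text or "").strip()) for sf in child]
            fields.append((child.get("tag"), subfields))
    return {"control": control, "fields": fields}


def iter_records(source):
    """Yield parsed records one at a time, so a multi-GB dump is never
    held in memory as a whole tree."""
    record_tag = f"{{{MARC_NS}}}record"
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        yield parse_record(elem)
        elem.clear()  # drop the record's children once it has been consumed


def first_subfield(record, tag, code):
    """Return the first value of subfield `code` in field `tag`, or None."""
    for field_tag, subfields in record["fields"]:
        if field_tag != tag:
            continue
        for sub_code, value in subfields:
            if sub_code == code:
                return value
    return None


if __name__ == "__main__":
    for rec in iter_records(io.BytesIO(SAMPLE)):
        name = first_subfield(rec, "100", "a") or first_subfield(rec, "151", "a")
        print(rec["control"].get("001"), name)
```

`iter_records` also accepts a file path, so the same loop could be pointed at the example subset linked above. Each extracted value would then be looked up in the property mapping spreadsheet to decide which Wikidata property, if any, it should populate.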