
Consult the Wikidata, Commons and GLAM communities about the feasibility and practical organization of GLAM metadata and ontology mapping for Structured Commons
Closed, Resolved (Public)

Description

A first request for feedback from the Commons, Wikidata and GLAM communities around work on GLAM metadata schemes and ontologies, and how we can properly map them to Structured Commons.

By 4 May 2018 we'd like to know if the communities are interested in working on this, and how they'd generally like to organize the process.

Event Timeline

The consultation is closed now, but comments are still very welcome. I'll provide a summary and suggestions for next steps around the end of May.

Repeating the summary here as well. Skipping the hyperlinks ;-) - for those, see the on-wiki summary.

Hi everyone! My warmest thanks to everyone who contributed to this discussion. With some delay, here's an attempt at (very briefly) summarizing what I (Sandra) read in your comments above, adding some of my own thoughts and reflections to the mix. Feel free to comment!

Mapping GLAM metadata schemes and ontologies to structured data on Commons - is this a worthy undertaking?

  1. Several people seem interested in working on this and think that a common and centralized effort makes sense, although the scope for this work needs to be better defined (see below).
  2. It is probably good to start by taking a step back first: what impact do we want to achieve? More GLAM contributions to Commons? Better contributions? Less frustrating upload and contribution processes? ...
  3. Interestingly: during a recent Wikidata workshop day at the Europeana offices, I (Sandra) got very clear feedback from some GLAM participants that they don't think we should put an enormous amount of effort into mapping GLAM metadata schemes in great detail; it would (according to them) be much better if we instead worked towards crystal-clear, well-documented and findable instructions, and towards standardized ways in which GLAMs should model their own data for Commons. Although it comes from a small group, I find this interesting input which I'd like to verify more broadly.

Better focus and prioritization needed

  1. The original proposal was way too broadly and vaguely defined and seems to be very unclear to people not familiar with GLAM metadata and ontologies.
    1. It is important to distinguish between ontologies and vocabularies, as these are very different things.
    2. Looking at 'vocabularies', it's probably also good to distinguish between
      1. Thematic / topical data (subjects, concepts - example 'oak tree')
      2. Person names (photographers, artists, depicted people)
      3. Organization names (both organizations that contribute files to Commons, such as GLAMs, and other organizations that may be involved in the production of our media files, such as publishers, photo studios, etc.)
    3. We also must clearly distinguish between metadata used for the description of artworks (as GLAMs do in their collection management systems, and which in our case will probably mostly be used to describe artworks on Wikidata) and of media files (as GLAMs do in digital asset management systems, and which in our case will be used to describe Commons files).
  2. We need to prioritize our efforts: it is probably most worthwhile to work first on those metadata schemes and vocabularies that are very widely adopted in the GLAM sector.
  3. I read some consensus that working on this will not produce a magic bullet, and converting GLAM metadata to Commons will always be painful. (While this is true, I think it's a worthy undertaking to work towards a process that makes it as painless as possible.)

On Wikidata, we have already started working on some GLAM ontologies and vocabularies.

Some insights from that perspective:

  1. We are mapping many vocabularies on Wikidata, including thesauri. We might want to include more information on Wikidata about the hierarchical relations in those, and we might want to work on mapping the SKOS format to Wikidata.
  2. The Commons category system is also a hierarchical structure with a wealth of data that we don't want to get lost.
    1. I (Sandra) recommend that everyone read the findings about categories in the context of GLAM uploads from the GLAM research earlier this year, where participants report having difficulties finding the right categories. From my own experience with GLAM uploads since 2012, both performing them and re-using files from them, I also notice that these files tend to be under-categorized, often with a sub-optimal selection of categories.
  3. Inside GLAM vocabularies and inside our own projects, there are still major knowledge gaps!
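
As a rough illustration of the hierarchical relations mentioned in point 1, here is a minimal sketch of walking "broader" links in a thesaurus, in the spirit of SKOS `skos:broader`. The thesaurus fragment and the function name `broader_chain` are made up for this example; it does not reflect any agreed mapping to Wikidata properties.

```python
# Hypothetical thesaurus fragment: each pair is (narrower concept, broader concept),
# which is the kind of relation skos:broader expresses.
BROADER = [
    ("oak tree", "tree"),
    ("tree", "plant"),
]


def broader_chain(term, pairs):
    """Return the list of broader concepts above `term`, nearest first.

    Assumes each concept has at most one broader concept (a simplification;
    real thesauri can be polyhierarchical).
    """
    parents = dict(pairs)
    chain = []
    seen = {term}
    while term in parents:
        term = parents[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        chain.append(term)
    return chain


print(broader_chain("oak tree", BROADER))
```

Keeping this kind of information alongside a mapped term is what would let a search for a broad concept also surface files described with a narrower one.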

The longer term

We need to think about the longer term: maintainability and constant updates to mappings.

Technical integration

We need to think carefully about how such mappings (if we work on them) are integrated into the technical infrastructure. It's probably not a good idea to statically 'bake' them into APIs. Perhaps code libraries make more sense, and we might want to encourage specific tool development in this direction? This also needs further investigation.
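
To make the "code library" idea a bit more concrete, here is a minimal sketch, under several assumptions, of a mapping kept as plain data that a small library function applies to a record. The field names (Dublin Core style), the choice of target properties, and the names `DC_TO_WIKIBASE`, `Statement` and `map_record` are illustrative only; this is not an agreed mapping or an existing tool.

```python
from dataclasses import dataclass

# Illustrative mapping table, kept as data so it can be versioned and updated
# without changing any API. The pairings are assumptions for this sketch,
# not a community-agreed mapping.
DC_TO_WIKIBASE = {
    "dc:creator": "P170",  # creator
    "dc:date": "P571",     # inception
    "dc:subject": "P180",  # depicts
    "dc:rights": "P275",   # copyright license
}


@dataclass(frozen=True)
class Statement:
    property_id: str
    value: str


def map_record(record, mapping=DC_TO_WIKIBASE):
    """Split a flat metadata record into mapped statements and unmapped fields."""
    statements = []
    unmapped = []
    for field, value in record.items():
        prop = mapping.get(field)
        if prop is None:
            unmapped.append(field)  # surface gaps instead of silently dropping data
        else:
            statements.append(Statement(prop, value))
    return statements, unmapped


statements, unmapped = map_record({"dc:creator": "Jane Doe", "dc:format": "image/tiff"})
print(statements, unmapped)
```

Because the table is passed in as a parameter, a tool could load an updated mapping without any code change, which speaks to the maintainability concern raised above.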

Follow up in June 2018 and beyond:

  1. I (Sandra) have the feeling we need more input from GLAMs themselves, and I'm now thinking about how to do this: whether this can be done in an informal survey or another type of consultation, and which questions need to be asked. Please let me know if you have ideas or suggestions here.
  2. Categories. The core team working on Structured Data on Commons needs all currently allocated time and funds to give its full focus to the basic functionalities of structured data itself; extra work in technical support for transitioning categories is out of scope within the current timeline and budgets. I myself also can't give extra attention to category conversion from a practical perspective. Conversion of category data to structured data is, like data modelling and conversion itself, a task for the communities.
  3. It would be helpful to make it easy for more people to contribute to the process of mapping GLAM metadata to Wikidata and Commons. More help is certainly welcome, and needed. I (Sandra) can try to support this by creating a set of GLAM info pages, as part of the Structured Commons information site, including a better structured set of 'landing pages' on GLAM vocabularies and ontologies. These will point towards existing work in Wikidata's WikiProjects, be extensible by anyone, offer a first attempt at prioritization, and point to documentation where it exists. Help is welcome here!
  4. Several members of the Structured Commons team will be present at the Wikimania hackathon, which is a good opportunity to talk to volunteer developers about ideas for technical integration of GLAM metadata mapping. It is probably quite relevant to tools that (will) support GLAM uploads to Commons and Wikidata, for instance Pattypan and OpenRefine 3.0.