
Create a dataset of past GLAM-Wiki collaborations
Open, Needs Triage, Public


Create a structured dataset of past documented GLAM-Wiki collaborations in the Wikimedia movement.

This data will help us understand the diversity and the needs of GLAM-Wiki collaborations. It is currently spread across various sources such as the GLAM Newsletter, grant reports, affiliate reports, and Meta-Wiki.

Event Timeline

I am very interested in this task, as it would help with tracking instances of Open GLAM data releases on Wikimedia Commons.

In the Open GLAM survey, we currently record such instances through a manual, word-of-mouth process. It would be fantastic if one could query Open GLAM releases of media on Wikimedia Commons in a more structured and automated way.

Will follow with interest and am happy to discuss and assist!

@Douglaskmccarthy Indeed, it would be really awesome if we are able to query this data.

Here is the page on Meta-Wiki about this research project:

Hopefully, I will be sharing more happy news in the coming months.

Here's a Wikidata query of institutions that are described in the OpenGLAM Survey, with their Commons categories:

Please note that a Commons category does not necessarily mean that content from that institution has been uploaded to Wikimedia Commons; it may just be a small category with photographs of the building taken by Wikimedians. The data may also be incomplete: an institution shown without a Commons category may actually have one that has not been added to Wikidata yet.

In order to be able to query for actual uploads from the institution, we'd need to dig deeper. Having that data as structured data + being able to retrieve it through a SPARQL endpoint for Wikimedia Commons would be super helpful. See T221921: Provision search endpoint for SDC. Requirements from Product Team.
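As a rough illustration of the kind of Wikidata query described above, here is a minimal Python sketch that only builds the SPARQL text. It assumes that survey institutions are linked to the survey item through "described by source" (P1343), which is a guess about the data model and not something confirmed in this thread; P373 is the Commons category property. The survey's item ID is not given here, so `survey_qid` is a placeholder the caller has to supply, and `build_survey_query` is a hypothetical helper name.

```python
from string import Template

# Assumption: institutions in the survey carry "described by source" (P1343)
# pointing at the survey item. The survey's QID is NOT known from this thread,
# so it is a parameter rather than a hard-coded value.
QUERY_TEMPLATE = Template("""\
SELECT ?institution ?institutionLabel ?commonsCategory WHERE {
  ?institution wdt:P1343 wd:$survey_qid .
  # Commons category (P373) is optional, because many items lack it.
  OPTIONAL { ?institution wdt:P373 ?commonsCategory . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?institutionLabel
""")


def build_survey_query(survey_qid: str) -> str:
    """Return SPARQL text listing institutions and their Commons categories."""
    # Reject anything that is not shaped like a Wikidata item ID (Q + digits).
    if not (survey_qid.startswith("Q") and survey_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {survey_qid!r}")
    return QUERY_TEMPLATE.substitute(survey_qid=survey_qid)
```

The resulting text can be pasted into the Wikidata Query Service. Rows with an empty `?commonsCategory` are the institutions for which no category is recorded, with the caveats about completeness noted above.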

@David_Haskiya_WMSE pinging you as this is related to what we discussed earlier today.

Linked data maturity is of interest. In the dataset it would be nice to see the quality of the metadata, as Europeana tries to measure it in the Metadata Quality Framework by Péter Király.

My understanding is that Europeana has big problems with the metadata quality of delivered material, so they have created a Metadata Quality Assurance Framework for Europeana.

In Sweden we see that all of the museum material uploaded to Europeana is missing "same as" and "coordinates". It would therefore also be of interest to see the metadata quality we get in GLAM-Wiki collaborations, to understand what help those institutions need and what added value the Wiki community contributes regarding depicts statements in pictures, metadata, and linked data.

If you check what SOCH delivers from Sweden in the Metadata Quality Assurance Framework for Europeana, you can see that they deliver 1 130 565 objects, but no coordinates or "same as" links.
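To make the idea of measuring this concrete, here is a minimal Python sketch of a per-field completeness score: the share of records in which a field is actually filled in. It is not the Metadata Quality Assurance Framework itself, only an illustration of one measurement of that kind. The field names `same_as` and `coordinates` and the function name are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


def field_completeness(
    records: Iterable[Mapping[str, Any]], fields: Sequence[str]
) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    records = list(records)
    if not records:
        # No records delivered: report zero completeness for every field.
        return {field: 0.0 for field in fields}

    filled = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat a missing key, None, an empty string and an empty
            # list or tuple as "not delivered".
            if value not in (None, "", [], ()):
                filled[field] += 1
    return {field: filled[field] / len(records) for field in fields}
```

A delivery like the one described above would score 0.0 for both `same_as` and `coordinates`, however many objects it contains. Tracking the same score for GLAM-Wiki uploads would show where the community adds value.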


The consequence is that the Europeana enrichment adds errors: it guesses that everyone called "Carl Larsson" is the same as Wikidata Q187310, but a Swedish museum also has 1 million photos by a person named Carl Larsson who is the same as Q5937128.
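The name collision can be shown with a small Python sketch. It is not how the Europeana enrichment is implemented. It only contrasts grouping records by a name string with grouping them by an identifier. The record layout, the object IDs and the function name are made up for illustration; the two QIDs are the ones mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def group_objects(records: Iterable[Mapping[str, str]], key: str) -> Dict[str, Set[str]]:
    """Group object IDs by the value of `key`, e.g. creator name or creator QID."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        groups[record[key]].add(record["object_id"])
    return dict(groups)


# Two hypothetical objects whose creators share a name but not an identity.
records = [
    {"object_id": "object-a", "creator_name": "Carl Larsson", "creator_qid": "Q187310"},
    {"object_id": "object-b", "creator_name": "Carl Larsson", "creator_qid": "Q5937128"},
]

by_name = group_objects(records, "creator_name")  # one group: both people merged
by_qid = group_objects(records, "creator_qid")    # two groups: people kept apart
```

Matching on the name yields a single "Carl Larsson" holding both objects, while matching on the identifier keeps the two people separate. The identifier only helps if it survives the export.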

The sad thing is that the museum has good metadata, but when it is exported to Europeana, the people doing the export convert linked data to text, and we get a mess.

An example of the cost of bad metadata is that en:Wikipedia refused to have links to Europeana and deleted the template. You can also follow the link for Carl Larsson to Europeana and see that it is a mix of Q187310 and Q5937128.


  • Blog post about this: "Carl Larsson who is that - sadly Europeana doesnt know --> #Metadatadebt"

My guess is that when Europeana has so much trouble asking one museum to state that this artist is the same as an artist at another museum, this will never scale to entity management for what a picture depicts using linked data, unless we start to communicate the quality of delivered metadata and entity management. The challenge is that sending text strings is easy, whereas entity management needs a totally different skill set and management that we don't see in most GLAM institutions today.

I have worked with international bank transactions, and no one there would think of sending something as text strings with no unique identifier and then matching bank accounts on the name: "We give all money to one person called Carl Larsson". For me it is an indication of a lack of maturity in the network when no one reacts, people don't know how to report errors, and no one gives us an action plan for how and when it will be fixed. Most of the components are missing, such as getting a helpdesk ticket and an easy way of seeing its status.

#Linkeddata needs #linkedpeople. If that parameter can be measured, then we at least understand better what actions are needed.

@SGill Is there still activity going on in this task or is it wrapped up/abandoned?

Aklapper added a subscriber: SGill.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 06th 2022 (and T295729).

Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.

If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".

Also see for tips how to best manage your individual work in Phabricator.