
Goal: Establish a framework to engage with data engineers and open data organizations
Closed, Resolved, Public

Description

Part of T101100: Engineering Community quarterly goals for July-September 2015

Result

Partially completed

Wiki Loves Open Data offers a framework and is the result of a collaboration with the Wikidata team and community, including some chapters and projects.

However, although WMF Strategic Partnerships and some chapters are in talks with organizations, we cannot count those talks as “ongoing projects” yet.

  • @SVentura (Strategic Partnerships) started promising talks with World Bank, OECD, and others, but the requirement for CC0 licensing is the main obstacle for quick collaborations.
  • The involvement of @Lydia_Pintscher (Wikidata), @Wittylama (Europeana), @Susannaanas (WMFI), @Yair_rand (volunteer developer) and @johl (WMDE) among others has been very valuable and puts this first step in a promising direction inspired by the GLAM precedent.

The problem (previous description)

We are missing a community framework allowing Wikidata content and tech contributors, data engineers, and open data organizations to collaborate effectively on this use case:

  • Open data organization has a subset of interesting data that could be used to improve Wikimedia wikis after being added to Wikidata.

What are the specific problems that Wikimedia volunteers and/or interested open data organizations are facing?

  • What makes an open data organization? We need a technical definition of "open data" compatible with Wikimedia, useful for organizations to check whether it applies to them or not.
  • What types of contributions are welcomed? Open data orgs might want to dump all of their data somewhere. Wikidata might accept just a few data types. How can they know what is possible and what is useful?
  • How does the process work? Everybody knows Wikipedia, fewer know Wikidata, even fewer know how to contribute to Wikidata, and fewer still know how to show that data in Wikipedia, which frequently is the ultimate goal of the open data owners.
  • How do licensing and attribution work? Which licenses are allowed in Wikidata, how are attributions shown, and what happens when others reuse that data?
  • How is the data updated? Are there any expectations on open data organizations to update the data they contributed? Also, what happens with changes made by other contributors to that data? How can changes be upstreamed? How can we avoid them being overwritten by the next update?
  • How to contribute resources? How to make effective use of the resources available, e.g. a team in an open data organization is willing to work on a Wikidata project, or there is a possibility to fund a data engineer in residence or to organize workshops and sprints (datathons?).
  • What precedents and ongoing projects are there? Especially at the beginning, any previous / ongoing experience is going to be very useful for new open data organizations and Wikimedia contributors willing to get involved.
  • How to track all of the relevant conversations about a particular objective over time? Contrary to what is suggested above, not everyone is familiar with Wikipedia's social structures and tools. Use of watchlists, etc. can be taught, and tools could be improved.
  • How to visualize a data model that spans multiple entity types within the context of Wikidata?
  • What else?

Out of scope

Use cases that we are NOT pursuing in this quarterly goal:

  • Open data organization has a humongous amount of data to be imported in its entirety into Wikidata.
  • Wikidata contributors start compiling a directory with the Sum of all Data.
  • Wikidata/Commons contributors start building a Wikimedia version of http://datahub.io/

The solution

Wiki Loves Open Data: a basic framework, agreed with the Wikidata community and documented, offering a process that addresses the questions asked above. Imagine the GLAM framework applied to data.

GLAM has created documentation, campaigns, tools, success stories, a network of volunteers and cultural organizations, and even some new jobs. Today, a Wikimedian living in a place with an interesting gallery/library/archive/museum (or someone working in any of these institutions) has a framework and a support network to learn how to establish a first contact and organize a first activity. Let's try to build a framework allowing an easy start for open data projects.

This framework needs to be tested and improved with real collaborations with some alpha-testing open data organizations that will need to be very patient and understanding with us. For that we will need experienced Wikidata contributors and consolidated Wikimedia teams able to handle the relationship with these organizations and work (or find the resources to work) on the technical solutions to their problems.

This goal aims to start walking in the right direction until reaching a first milestone that we can be happy about.

Precedents

  • ProteinBoxBot: bots for populating Wikidata with trusted biomedical information and for using that information to drive applications such as Wikipedia.

Organizations interested

Open data organization | Wikimedia team mediating | Contact person(s)
multiple | Wikimedia Belgium | @Romaine
DBpedia | | @Hjfocs
World University and School | | @Scott_WUaS
Add yourselves

Communication

Measurement of success

  • Publication of basic documentation and community processes for open data engineers and organizations willing to contribute to Wikidata.
  • Ongoing projects with 1 open data org.

Dependencies

  • Wikidata team
  • Wikidata community
  • Strategic Partnerships team

ETA

DevRel-September-2015

Relation with WMF Call to Action

EXPERIMENT: support innovation & new knowledge

Event Timeline

Qgil claimed this task.
Qgil raised the priority of this task from to Medium.
Qgil updated the task description.
Qgil added subscribers: Qgil, SVentura, Lydia_Pintscher and 2 others.
Qgil set Security to None.

Let me be the first to say that Europeana wants to be part of this. We are already publishing everything as open data (which has provided much of the seed information for the 'sum of all paintings' Wikidata project) and have a strong organisational interest in being able to query Wikidata to integrate it into Europeana's database. This requires, among other things, converting Europeana's current hooks to DBpedia, and a production-grade SPARQL system in Wikidata.

For now, I will keep the discussion in the Wikidata mailing list thread, and I will document the progress in the description of this task.

@Wittylama, I'm very excited to read your comment! Very encouraging, and it just came moments after announcing this initiative. Thank you very much.

Romaine updated the task description.
Romaine updated the task description.

We are already doing more than imagining GLAM applied to Wikidata; we are applying GLAM datasets to Wikidata at Wikidata:WikiProject sum of all paintings
https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
We not only have Maarten who is busy adding painting items from the world's greatest collections, we also have collections who are eager to share data with us and "see what happens". The Rijksmuseum has given us their old catalog codes (from before 1976) in order to make matchups to art catalogs possible. They have also shared with us their iconclass codes on all of the artworks for which we have Wikidata items. These have been added in the property "P1256 Depicts iconclass notation".

Qgil updated the task description.

@Wittylama - This is very exciting to read; Europeana is a perfect fit. It's not always obvious to the organizations I've spoken with that these partnerships are a two-way benefit: they feed into the projects and can pull from them. I'd love to learn more about the Europeana project, as that would inform my conversations with other orgs.

Jane023 - Great to read about Wikidata:WikiProject sum of all paintings - wonderful project!

I'd like to announce our project and ask for assistance with it. Wikimedia Finland is starting a collaborative project to learn Wikidata with several Finnish organizations and the WP community. The starting point is GLAM data, but we have extended the scope. The partners/datasets are:

  • the Finnish Broadcasting Company: migrating from Freebase to Wikidata
  • Laji.fi: updating Finnish species names
  • Linked Data Finland: historical place names
  • the Association of Finnish Local and Regional Authorities: basic facts about Finnish municipalities
  • Open science and research
  • National Gallery artist and artwork database
  • GIS database of the National Board of Antiquities
  • the National Library of Finland, Finto service

We are looking for lecturers / instructors for our workshops taking place after summer!

The SoaP project augments existing Wikidata items about paintings, and creates items in a structured way for paintings that exist in the real world but are not yet on Wikidata. Many paintings have items because some Wikipedia somewhere has an article about the painting, or because it is in one of the metadata runs that Maarten has been doing. As he creates a body of items based on top museums, starting with the GLAMs who have already donated to Commons, I have experimented with two artists to include their body of work as documented by art historians. This is effective as a way to measure the way we model painting items on Wikidata, but also as a way of testing the "findability" of items (I have merged many doubles) and discovering collections for Maarten to "datamine". I started with Frans Hals, and applied the same concept to Pieter de Hooch a few months ago. The result for Hooch is a list on Wikipedia (en/fr/nl) built with the assistance of "Listeria" that links directly to Wikipedia painting articles or Wikidata painting items (in that order of preference) here: https://en.wikipedia.org/wiki/List_of_paintings_by_Pieter_de_Hooch

Thanks for the clarification that "Wiki Loves Open Data [:)] should guide organizations through the process of contributing content to Wikidata" (https://phabricator.wikimedia.org/T104701) ... and World University looks forward to learning how this will work. I also recently posted my email of yesterday to you, Quim, and Jane here - http://scott-macleod.blogspot.com/2015/07/impatiens-species-structuring-world.html - in terms of WUaS's ~ 10 main foci/areas, as possible contributions to Wikidata, and establishing a framework to engage with data engineers and open data organizations.

There is a list of open data and data science institutions at Meta:Innovation

Quim and All, Where does this project stand now? How might we best develop it further? Thank you, Scott

@Rogol_Domedonfors, do you or someone you know have any contact with any of these organizations? Since all the organizations listed there so far are based in the UK, do you know whether someone at WMUK would be willing to act as a common contact with open data orgs, along the lines of what WMFI is doing?

As explained in https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data, first we need to look for organizations that are willing to test and learn with us. We are not yet ready to reach out to multiple organizations without established contacts.

I may be able to help with some of the UK organisations: email me. I expect to be at the Wikipedia Science Conference in London, 2-3 September, which would be a suitable venue for a discussion.

@I9606, you added to the description

How to visualize a data model that spans multiple entity types within the context of wikidata?

Can you or someone else explain this, please?

@Richard_Pinch, hi! I'm not joining the Wikipedia Science Conference, but if other Wikimedians loving open data go, please do meet.

Now that the very basic documentation is in place, I'm a bit uncertain about the best next steps. I made a call for feedback and testers on the Wikidata mailing list.

Apart from the fine-tuning of these pages, what we need are more use cases and some organizations willing to go through their first steps to test this framework and better identify their needs. Wikimedia Finland, Wikimedia Belgium, and others have hinted at ongoing initiatives to collaborate with open data organizations. Can we help? Who else?

Qgil updated the task description.

This task is being resolved as a quarterly goal. The framework is in place for others to use it and improve it, based on real-life experiences with open data organizations. I have updated the description with some details.

Thank you to everybody involved!