Maniphest T101950

Goal: Establish a framework to engage with data engineers and open data organizations
Closed, Resolved, Public

Assigned To: Qgil
Authored By: Qgil, Jun 10 2015, 9:31 AM

Description

Part of T101100: Engineering Community quarterly goals for July-September 2015

Result

Partially completed

Wiki Loves Open Data offers a framework and is the result of a collaboration with the Wikidata team and community, including some chapters and projects.

However, although WMF Strategic Partnerships and some chapters are in talks with organizations, we cannot count those talks as “ongoing projects” yet.

@SVentura (Strategic Partnerships) started promising talks with World Bank, OECD, and others, but the requirement for CC0 licensing is the main obstacle for quick collaborations.
The involvement of @Lydia_Pintscher (Wikidata), @Wittylama (Europeana), @Susannaanas (WMFI), @Yair_rand (volunteer developer) and @johl (WMDE) among others has been very valuable and puts this first step in a promising direction inspired by the GLAM precedent.

The problem (previous description)

We are missing a community framework allowing Wikidata content and tech contributors, data engineers, and open data organizations to collaborate effectively on this use case:

Open data organization has a subset of interesting data that could be used to improve Wikimedia wikis after being added to Wikidata.

What are the specific problems that Wikimedia volunteers and/or interested open data organizations are facing?

What makes an open data organization? We need a technical definition of "open data" compatible with Wikimedia, useful for organizations to check whether it applies to them or not.
What types of contributions are welcomed? Open data orgs might want to dump all of their data somewhere. Wikidata might accept only a few data types. How can they know what is possible and what is useful?
How does the process work? Everybody knows Wikipedia, fewer know Wikidata, even fewer know how to contribute to Wikidata, and fewer still know how to show that data in Wikipedia, which frequently is the ultimate goal of the open data owners.
How do licensing and attribution work? Which licenses are allowed in Wikidata, how are attributions shown, and what happens when others reuse that data?
How is the data updated? Are there any expectations on open data organizations to update the data they contributed? Also, what happens with changes made by other contributors to that data? How are changes sent upstream? How can we avoid them being overwritten by the next update?
How to contribute resources? How to make effective use of the resources available, e.g. a team in an open data organization willing to work on a Wikidata project, the possibility of funding a data engineer in residence, or of organizing workshops and sprints (datathons?).
What precedents and ongoing projects are there? Especially at the beginning, any previous / ongoing experience is going to be very useful for new open data organizations and Wikimedia contributors willing to get involved.
How to track all of the relevant conversations about a particular objective over time? Contrary to above, not everyone is familiar with Wikipedia social structures and tools. Use of watchlists, etc. can be taught, tools could be improved.
How to visualize a data model that spans multiple entity types within the context of wikidata?
What else?

Out of scope

Use cases that we are NOT pursuing in this quarterly goal:

Open data organization has a humongous amount of data to be injected entirely to Wikidata.
Wikidata contributors start compiling a directory with the Sum of all Data.
Wikidata/Commons contributors start building a Wikimedia version of http://datahub.io/

The solution

Wiki Loves Open Data: a basic framework agreed with the Wikidata community and documented, offering a process that addresses the questions asked above. Imagine the GLAM framework applied to data.

GLAM has created documentation, campaigns, tools, success stories, a network of volunteers and cultural organizations, and even some new jobs. Today, a Wikimedian living in a place with an interesting gallery/library/archive/museum (or someone working in any of these institutions) has a framework and a support network to learn how to establish a first contact and organize a first activity. Let's try to build a framework allowing an easy start for open data projects.

This framework needs to be tested and improved with real collaborations with some alpha-testing open data organizations that will need to be very patient and understanding with us. For that we will need experienced Wikidata contributors and consolidated Wikimedia teams able to handle the relationship with these organizations and work (or find the resources to work) on the technical solutions to their problems.

This goal aims to start walking in the right direction until reaching a first milestone that we can be happy about.

Precedents

ProteinBoxBot: bots for populating Wikidata with trusted biomedical information and for using that information to drive applications such as Wikipedia.

Organizations interested

Open data organization	Wikimedia team mediating	Contact person(s)
multiple	Wikimedia Belgium	@Romaine
DBpedia		@Hjfocs
World University and School		@Scott_WUaS
Add yourselves

Communication

Initial announcement in Wikidata mailing list
Echo of the announcement in Wikidata project chat
Planned by the end of July: https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data (see T104701)

Measurement of success

Publication of basic documentation and community processes for open data engineers and organizations willing to contribute to Wikidata.
Ongoing projects with 1 open data org.

Dependencies

Wikidata team
Wikidata community
Strategic Partnerships team

ETA

DevRel-September-2015

Relation with WMF Call to Action

EXPERIMENT: support innovation & new knowledge

Related Objects

Status	Assigned	Task
Resolved	Qgil	T101099 Developer Relations Roadmap
Resolved	Qgil	T113030 Developer Relations quarterly review Jul-Sep 2015
Resolved	Qgil	T101100 Engineering Community quarterly goals for July-September 2015
Resolved	Qgil	T101950 Goal: Establish a framework to engage with data engineers and open data organizations
Resolved	Qgil	T104701 Create https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data project page (or equivalent)
Resolved	• DannyH	T107584 Enable Flow for Wiki Loves Open Data discussion page
Resolved	Qgil	T107625 Wiki Loves Open Data subpages for organizations with continuous relationship
Resolved	• ezachte	T107613 Request for data: sites traffic by topics/ subject areas and geographies

Event Timeline

Qgil created this task. Jun 10 2015, 9:31 AM

Qgil claimed this task.

Qgil raised the priority of this task from to Medium.

Qgil updated the task description.

Qgil added projects: Developer-Advocacy, Wikidata, ECT-July-2015, ECT-August-2015, DevRel-September-2015.

Qgil added subscribers: Qgil, • SVentura, Lydia_Pintscher and 2 others.

Restricted Application added a subscriber: Aklapper. Jun 10 2015, 9:31 AM

Qgil mentioned this in T101100: Engineering Community quarterly goals for July-September 2015. Jun 10 2015, 9:31 AM

Qgil added a parent task: T101100: Engineering Community quarterly goals for July-September 2015. Jun 10 2015, 10:02 AM

Qgil mentioned this in T101099: Developer Relations Roadmap. Jun 10 2015, 12:15 PM

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Jun 11 2015, 12:54 PM

Qgil mentioned this in T99945: Create a Phabricator project for 'Partnerships'. Jun 22 2015, 10:12 PM

Qgil moved this task from To triage to July-September 2015 on the Developer-Advocacy board. Jun 28 2015, 9:58 PM

Qgil updated the task description. Jul 1 2015, 3:16 PM

Qgil set Security to None.

Goal shared with the Wikidata community at https://lists.wikimedia.org/pipermail/wikidata/2015-July/006581.html

Wittylama subscribed. Jul 1 2015, 3:57 PM

Romaine subscribed. Jul 1 2015, 4:18 PM

Smalyshev subscribed. Jul 1 2015, 4:19 PM

Multichill subscribed. Jul 1 2015, 4:24 PM

Susannaanas subscribed. Jul 1 2015, 4:31 PM

Let me be the first to say that Europeana wants to be part of this. We are already publishing everything as open data (which has provided much of the seed information for the 'sum of all paintings' Wikidata project) and have a strong organisational interest in being able to query Wikidata to integrate it into Europeana's database (which requires, among other things, converting Europeana's current hooks to DBpedia, and a production-grade SPARQL system in Wikidata).

• Abraham subscribed. Jul 1 2015, 5:59 PM

Qgil updated the task description. Jul 1 2015, 9:11 PM

For now, I will keep the discussion in the Wikidata mailing list thread, and I will document the progress in the description of this task.

@Wittylama, I'm very excited to read your comment! Very encouraging, and it just came moments after announcing this initiative. Thank you very much.

Qgil updated the task description. Jul 1 2015, 9:15 PM

Qgil updated the task description. Jul 1 2015, 9:22 PM

Romaine updated the task description. Jul 1 2015, 10:00 PM

Romaine updated the task description.

Qgil updated the task description. Jul 1 2015, 10:05 PM

Scott_WUaS subscribed. Jul 1 2015, 11:49 PM

I9606 updated the task description. Jul 2 2015, 4:26 AM

I9606 subscribed.

Jane023 subscribed. Jul 2 2015, 8:49 AM

We are already doing more than imagining GLAM applied to Wikidata: we are applying GLAM datasets to Wikidata at Wikidata:WikiProject sum of all paintings
https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
Not only do we have Maarten, who is busy adding painting items from the world's greatest collections, we also have collections that are eager to share data with us and "see what happens". The Rijksmuseum has given us their old catalog codes (from before 1976) in order to make matchups to art catalogs possible. They have also shared with us their iconclass codes on all of the artworks for which we have Wikidata items. These have been added in the property "P1256 Depicts iconclass notation".

Hjfocs updated the task description. Jul 2 2015, 1:36 PM

Hjfocs subscribed.

Qgil updated the task description. Jul 2 2015, 1:40 PM

Qgil updated the task description.

Hjfocs updated the task description. Jul 2 2015, 1:49 PM

Qgil updated the task description. Jul 2 2015, 1:55 PM

@Wittylama - This is very exciting to read, Europeana is a perfect fit. It's not always obvious to organizations I've spoken with that these partnerships are a two-way benefit: they feed into the projects and can pull from them. I'd love to learn more about the Europeana project, as that would inform my conversations with other orgs.

@Jane023 - Great to read about Wikidata:WikiProject sum of all paintings - wonderful project!

I'd like to announce our project and ask for assistance with it. Wikimedia Finland is starting a collaborative project to learn Wikidata with several Finnish organizations and the WP community. The starting point is GLAM data, but we have extended the scope. The partners/datasets are:

Finnish Broadcasting Company: migrating from Freebase to Wikidata
Laji.fi: updating Finnish species names
Linked Data Finland: historical place names
Association of Finnish Local and Regional Authorities: basic facts about Finnish municipalities
Open science and research
National Gallery: artist and artwork database
National Board of Antiquities: GIS database
National Library of Finland: Finto service

We are looking for lecturers / instructors for our workshops taking place after the summer!

Scott_WUaS updated the task description. Jul 2 2015, 10:12 PM

Qgil updated the task description. Jul 3 2015, 8:15 AM

Qgil updated the task description. Jul 3 2015, 1:12 PM

Qgil mentioned this in T104701: Create https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data project page (or equivalent). Jul 3 2015, 1:34 PM

Qgil updated the task description. Jul 3 2015, 1:37 PM

Ricordisamoa subscribed. Jul 3 2015, 6:48 PM

The SoaP project augments existing Wikidata items about paintings, and creates items in a structured way for paintings that exist in the real world but are not yet on Wikidata. Many paintings have items because some Wikipedia somewhere has an article about the painting, or because it is in one of the metadata runs that Maarten has been doing. As he creates a body of items based on top museums, starting with the GLAMs that have already donated to Commons, I have experimented with two artists to include their body of work as documented by art historians. This is effective as a way to measure the way we model painting items on Wikidata, but also as a way of testing the "findability" of items (I have merged many duplicates) and discovering collections for Maarten to "datamine". I started with Frans Hals, and applied the same concept to Pieter de Hooch a few months ago. The result for de Hooch is a list on Wikipedia (en/fr/nl), built with the assistance of "Listeria", that links directly to Wikipedia painting articles or Wikidata painting items (in that order of preference): https://en.wikipedia.org/wiki/List_of_paintings_by_Pieter_de_Hooch

Thanks for the clarification that "Wiki Loves Open Data [:)] should guide organizations through the process of contributing content to Wikidata" (https://phabricator.wikimedia.org/T104701) ... and World University looks forward to learning how this will work. I also recently posted my email of yesterday to you, Quim, and Jane here - http://scott-macleod.blogspot.com/2015/07/impatiens-species-structuring-world.html - in terms of WUaS's ~ 10 main foci/areas, as possible contributions to Wikidata, and establishing a framework to engage with data engineers and open data organizations.

Qgil moved this task from Backlog to Doing on the ECT-July-2015 board. Jul 8 2015, 9:57 AM

Ilario subscribed. Jul 13 2015, 9:48 AM

There is a list of open data and data science institutions at Meta:Innovation

Quim and All, Where does this project stand now? How might we best develop it further? Thank you, Scott

Qgil added a subtask: T107584: Enable Flow for Wiki Loves Open Data discussion page. Jul 31 2015, 3:13 PM

Qgil added a subtask: T107613: Request for data: sites traffic by topics/ subject areas and geographies. Jul 31 2015, 9:18 PM

• ezachte subscribed. Aug 5 2015, 10:41 AM

• DarTar added a subscriber: Daniel_Mietchen. Aug 6 2015, 2:29 PM

There are suggestions at https://meta.wikimedia.org/wiki/Innovation#Partnerships

@Rogol_Domedonfors, do you or someone you know have any contact with any of these organizations? Since all the organizations listed there so far are based in the UK, do you know whether someone at WMUK would be willing to act as a common contact with open data orgs, along the lines of what WMFI is doing?

As explained in https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data, first we need to look for organizations that are willing to test and learn with us. We are not yet ready to reach out to multiple organizations without established contacts.

• SVentura closed subtask T107613: Request for data: sites traffic by topics/ subject areas and geographies as Resolved. Aug 14 2015, 5:47 PM

I may be able to help with some of the UK organisations: email me. I expect to be at the Wikipedia Science Conference in London, 2-3 September, which would be a suitable venue for a discussion.

@I9606, you added to the description

How to visualize a data model that spans multiple entity types within the context of wikidata?

Can you or someone else explain this, please?

@Richard_Pinch, hi! I'm not joining the Wikipedia Science Conference, but if other Wikimedians loving open data go, please do meet.

Qgil closed subtask T104701: Create https://www.wikidata.org/wiki/Wikidata:Wiki_Loves_Open_Data project page (or equivalent) as Resolved. Aug 17 2015, 1:53 PM

Qgil moved this task from Backlog to Doing on the ECT-August-2015 board. Aug 18 2015, 3:56 PM

Qgil closed subtask T107625: Wiki Loves Open Data subpages for organizations with continuous relationship as Resolved. Aug 21 2015, 12:22 PM

Now that the very basic documentation is in place, I'm a bit uncertain about the best next steps. I made a call for feedback and testers on the Wikidata mailing list.

Apart from fine-tuning these pages, what we need are more use cases: some organizations willing to go through their first steps to test this framework and better identify their needs. Wikimedia Finland, Wikimedia Belgium, and others have hinted at ongoing initiatives to collaborate with open data organizations. Can we help? Who else?

• johl subscribed. Aug 21 2015, 12:43 PM

• DannyH closed subtask T107584: Enable Flow for Wiki Loves Open Data discussion page as Resolved. Aug 22 2015, 12:07 AM

• RobLa-WMF subscribed. Aug 22 2015, 12:17 AM

Qgil mentioned this in T108935: Update Master Project List for Engineering Community. Aug 26 2015, 10:28 AM

Lokal_Profil subscribed. Sep 2 2015, 8:49 PM

Qgil moved this task from Backlog to Doing on the DevRel-September-2015 board. Sep 29 2015, 6:19 PM

This task is being resolved as a quarterly goal. The framework is in place for others to use and improve, based on real-life experiences with open data organizations. I have updated the description with some details.

Thank you to everybody involved!

• Phabricator_maintenance added a project: Goal. Aug 13 2016, 8:39 PM

Aklapper mentioned this in T926: Engage with established technical communities. Oct 5 2016, 1:25 PM
