
GSoC 2026: Gamifying constraint violation fixes on Wikidata
Open, Needs Triage, Public

Description

Project title:

Gamifying constraint violation fixes on Wikidata

Description of project:

Develop a game to make edits to Wikidata that fix constraint violations. To count as a game here, the tool must not only be easy to use but must also have methods that keep users engaged, such as scores, leaderboards, collaborations, and challenges. Some games may include aspects of community collaboration.
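As an illustration of the scoring and leaderboard mechanics mentioned above, here is a minimal in-memory sketch. The decision names ("fix", "confirm_ok", "skip"), the point values, and the `Leaderboard` class are all hypothetical; a real game would persist scores in a database and tune the rewards.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical point values per player decision; a real game would tune these.
POINTS = {"fix": 10, "confirm_ok": 2, "skip": 0}


@dataclass
class Leaderboard:
    """In-memory scoreboard keyed by username (illustrative only)."""
    scores: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, user: str, decision: str) -> int:
        """Add the points for one decision and return the user's new total.

        Unknown decisions are worth 0 points.
        """
        self.scores[user] += POINTS.get(decision, 0)
        return self.scores[user]

    def top(self, n: int = 10) -> list[tuple[str, int]]:
        """Return the n highest scorers, ties broken alphabetically by name."""
        return sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    board = Leaderboard()
    board.record("Alice", "fix")
    board.record("Bob", "confirm_ok")
    board.record("Alice", "fix")
    print(board.top(2))  # [('Alice', 20), ('Bob', 2)]
```

Weighting actual fixes above confirmations is one possible design choice; challenges or team scores could be layered on the same structure.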

There is a long-open Phabricator ticket to make more Wikidata games
https://phabricator.wikimedia.org/T165167

Constraints in Wikidata:

Wikidata has several kinds of constraints. One kind is the property constraint, a simple soft or hard rule on how a property should be used. For example, the values of subclass of (P279) have to be Wikibase items, i.e., not data values like integers, and father (P22) should have only one "best" value. For more information on Wikidata property constraints see https://www.wikidata.org/wiki/Help:Property_constraints_portal. Other kinds of constraints in Wikidata come from the intended meaning of some of the items in Wikidata. For example, Wikidata has disjoint unions, so an individual that is an instance of two disjoint classes in Wikidata violates the intended meaning of disjoint union.
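The three example constraints above can be checked locally with a short sketch. It assumes a simplified data model: statements for one property are `(value, rank)` pairs, item IDs are strings like `"Q5"`, and the subclass graph and disjoint class pairs are supplied as plain Python structures with hypothetical class names. "Best" values follow Wikidata's ranking rule: preferred statements if any exist, otherwise normal ones; deprecated statements are never best.

```python
import re

ITEM_ID = re.compile(r"Q[1-9]\d*")


def best_values(statements):
    """Values of the best-ranked statements for one property."""
    preferred = [v for v, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [v for v, rank in statements if rank == "normal"]


def violates_single_best_value(statements):
    """True if a property such as father (P22) has more than one best value."""
    return len(best_values(statements)) > 1


def violates_item_value_type(values):
    """True if any value of a property such as subclass of (P279) is not an item ID."""
    return any(not (isinstance(v, str) and ITEM_ID.fullmatch(v)) for v in values)


def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass-of edges, including cls itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(instance_of, subclass_of, disjoint_pairs):
    """Disjoint class pairs that an individual is (transitively) an instance of."""
    classes = set()
    for cls in instance_of:
        classes |= superclasses(cls, subclass_of)
    return [pair for pair in disjoint_pairs if set(pair) <= classes]
```

For instance, with the hypothetical graph `{"human": ["living_thing"]}` and the disjoint pair `("living_thing", "abstract_object")`, an individual that is an instance of both `"human"` and `"abstract_object"` is reported as a violation, because the check follows subclass-of links rather than only direct classes.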

Violations of constraints are thus places that show evidence of a potential breakdown in the way data should be modeled in Wikidata, and they are often the result of incorrect information being put into Wikidata. Reducing the number of constraint violations thus improves the quality of Wikidata.

Because there are so many constraint violations in Wikidata, it is hard for small groups of editors to tackle the problem. Enlisting the efforts of more people via a game has the possibility of making significant reductions in the number of constraint violations.

Expected outcomes:

By the end of this project, the contributor will have developed a game, either based on currently existing software like the Distributed Game (https://wikidata-game.toolforge.org/distributed/) or inspired by it. The game will be designed to fix particular kinds of constraint violations in Wikidata. It will be possible to slightly modify the game to fix other issues that exist in Wikidata. The game may be used later on in outreach campaigns.
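A game aimed at a particular kind of violation needs a supply of candidate problems. One possible source, sketched here under stated assumptions, is the Wikidata Query Service (WDQS): `wdt:` triples there carry only best-rank values, so items with more than one `wdt:P22` value violate the single-best-value constraint on father. The function names, the default limit, and the User-Agent string are illustrative, and such aggregate queries may time out on heavily used properties.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def multiple_best_values_query(prop: str, limit: int = 50) -> str:
    """SPARQL for items with more than one best-rank (truthy) value of prop."""
    # Validate the property ID so it cannot inject extra SPARQL.
    if not re.fullmatch(r"P[1-9]\d*", prop):
        raise ValueError(f"not a property ID: {prop!r}")
    return f"""SELECT ?item (COUNT(?value) AS ?count) WHERE {{
  ?item wdt:{prop} ?value .
}}
GROUP BY ?item
HAVING (COUNT(?value) > 1)
LIMIT {int(limit)}"""


def fetch_candidates(query: str, user_agent: str) -> list[dict]:
    """Run a query against WDQS and return the JSON result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Usage might look like `fetch_candidates(multiple_best_values_query("P22"), "ConstraintGameSketch/0.1 (contact address)")`, with each returned item becoming one game task. WDQS expects a descriptive User-Agent.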

Bonus outcomes:

A great tool for introducing people to Wikidata.
A tool to help communities not currently engaged with Wikidata get involved in editing it.

Skills required:

Understanding of Wikidata
Game Design
SPARQL proficiency
Proficiency in Python or another scripting language

Skills preferred:

Database management
Conducting reliability tests on user inputs
Ranking users according to skill
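The two preferred skills above, reliability tests on user inputs and ranking users by skill, can be sketched together. This assumes (hypothetically) that the game mixes in known-answer "gold" tasks. Each user's accuracy on them is scored by the lower bound of the Wilson score interval, so a user with few answers is not ranked above a user with a long, accurate record. A proposed edit is then accepted only if voters holding enough of the total weight agree. The threshold and weights are illustrative.

```python
import math
from collections import defaultdict
from typing import Optional


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a user's accuracy."""
    if total == 0:
        return 0.0
    p = correct / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def rank_users(gold_results: dict[str, tuple[int, int]]) -> list[str]:
    """Order users by a conservative accuracy estimate on known-answer tasks.

    gold_results maps username -> (correct answers, total gold tasks answered).
    """
    return sorted(gold_results, key=lambda u: wilson_lower_bound(*gold_results[u]), reverse=True)


def accept_decision(votes: dict[str, str], weights: dict[str, float],
                    threshold: float = 0.7) -> Optional[str]:
    """Return the decision backed by at least `threshold` of the total voter weight, else None."""
    totals: dict[str, float] = defaultdict(float)
    for user, decision in votes.items():
        totals[decision] += weights.get(user, 0.0)
    overall = sum(totals.values())
    if overall == 0:
        return None
    decision, weight = max(totals.items(), key=lambda kv: kv[1])
    return decision if weight / overall >= threshold else None
```

Using the Wilson lower bound rather than raw accuracy is one common choice. A user with 5 out of 5 correct is ranked below one with 95 out of 100, because five answers say little about reliability.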

Bonus skills:

Community management
Graphic design

Possible mentor(s):
Peter F. Patel Schneider
David Martin

Size of project:

350 hours to complete

Rating of difficulty for the project:

Medium - The project requires the contributor to understand existing gamifying attempts at Wikidata, and either emulate them or build upon them. It also requires many different skills, such as database management, game design, and reliability analysis for proposed changes.

Microtasks:

  1. Understanding what the Distributed Game does, as evidenced by a short report on how it works, is a great introduction to what games can do for Wikidata.
  2. Updating an existing game for the Distributed Game is an excellent task to get up to speed in game design. For example, the New Wikiquote article and category matches game (https://wikidata-game.toolforge.org/distributed/#game=90) could be updated to remove distracting bracketing in text.
  3. Making a minor update to the Distributed Game is a good task to understand how games can work internally.
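For microtask 2, a lightweight regex approach to removing bracketing might look like the sketch below. It assumes that "bracketing" means wikilink markup such as `[[Target|Label]]`; the actual game text may contain other markup. `strip_wikilinks` is a hypothetical helper. For more complex wikitext, the parser library mwparserfromhell, whose `parse(text).strip_code()` removes markup more thoroughly, is an alternative.

```python
import re

# Matches [[Target]] and [[Target|Label]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_wikilinks(text: str) -> str:
    """Replace wikilink markup with its visible text: the label if present, else the target."""
    return WIKILINK.sub(
        lambda m: m.group(2) if m.group(2) is not None else m.group(1), text
    )
```

For example, `strip_wikilinks("Quotes by [[Douglas Adams]] and [[Mark Twain|Twain]]")` returns `"Quotes by Douglas Adams and Twain"`.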

Why are you proposing this project?

There are so many violations of Wikidata constraints (including both from Wikidata property constraints and constraints inferrable from the intended meaning of core Wikidata constructs like disjointness) that it is difficult for small groups of editors to keep up, or indeed even survey the extent of the problem for some kinds of violations. Games are one mechanism for encouraging large numbers of participants to fix problems like constraint violations. A game to fix constraint violations would help to reduce the number of constraint violations in Wikidata.

What is the expected impact?

The immediate impact would be the creation of a new game aimed at fixing problems in Wikidata. The desired eventual impact is a reduction in the number of constraint violations in Wikidata of the kind addressed by the game. Another desired eventual impact is the continued creation of similar games, which would expand the range of constraint violations being addressed.

Any other additional information that the interns should know about:

Closed Phabricator ticket that could be used as an example of an update to the Distributed Game https://phabricator.wikimedia.org/T252935

Open Phabricator tickets related to the Distributed Game
https://phabricator.wikimedia.org/T258067
https://phabricator.wikimedia.org/T253956
https://phabricator.wikimedia.org/T253839
https://phabricator.wikimedia.org/T210635

Event Timeline

Hi @Pfps thank you for submitting this proposal! A couple things:

  • Could you please add the Phabricator usernames for the mentors?
  • For the microtasks, are there any Phab tasks or specific examples on the Distributed Game that you can link?
Pfps updated the task description.

I added links to the Phabricator pages of the mentors.
I added pointers to several Phabricator tasks related to the Distributed Game. These links can be used to find games implemented in the Distributed Game and other information that would be useful in the microtasks and throughout the project.

Hi @Pfps in case you missed the update on the parent task:
Unfortunately, the Wikimedia Foundation will not be participating in this year's Google Summer of Code program. We look forward to coming back in 2026. In the meanwhile, we will be participating in Outreachy round 30 this summer. You can propose a project here. Thank you for your patience and support.

Hi @Pfps, @LGoto and @DMartin-WMF,

I'm a student developer and I've made contributions to Scribe-Data. During my work there, I identified some issues in its codebase that caused data duplication and unreliable results. After discussion with the maintainers, I migrated it from a static Wikidata dump to the live SPARQL query service to solve them.
I'm really interested in this project and the challenge of reliability analysis. Since it was originally drafted for GSoC 2025, I wanted to check if you’re planning to bring it back as a 350h project for GSoC 2026?
In the meantime, I'm setting up the Distributed Game environment to start working on the microtasks.

Edit: Fixed the bugs in T210635 (which was the only open ticket). The code is on GitHub.

This project is being revived for the 2026 GSoC.

Pfps renamed this task from GSoC 2025: Gamifying constraint violation fixes on Wikidata to GSoC 2026: Gamifying constraint violation fixes on Wikidata (Jan 12 2026, 2:46 PM).
Pfps updated the task description.

Hi @Pfps thank you for the proposal. A couple questions:

  • Could you briefly explain in layman's terms 1.) what is a constraint violation and 2.) why is it important to fix?
  • I'm not sure that microtask #2 is actually a small enough task, could you consider breaking this down a bit? I'd also encourage you to create the microtasks as subtasks of this one, so that candidates can contribute directly to them.

Hi @LGoto, @Pfps, @DMartin-WMF,

I am a full stack developer at Infosys Limited and a technical community organizer in the Wikimedia ecosystem. I am very interested in this project for GSoC 2026. I possess the required and preferred skills for this project, and I am particularly drawn to the game design challenge of making it engaging for users.
I am currently setting up the environment to work on the microtasks. Since there are no open Phabricator tickets, after completing the environment setup I will make a minor update to the Distributed Game to better understand how the game system works.

@Hridyesh_Gupta Thank you for your interest. It might be a bit early to start on the microtasks, as the period where potential contributors interact with mentors isn't for a while. As well, the topic will be getting some updates over the next little while.

Hi @Pfps @DMartin-WMF Reminder to please see my previous comment and update your task description. Thank you!

@Pfps Thanks for letting me know this. I'll wait for the interaction period to start and for your further updates on this project.

Thanks again!


@LGoto I added a section on constraints and modified microtask 2 to remove the "create new game" component, which I agree was too big. I also added an example update for this microtask. I expect to communicate with potential contributors to select this or another similar update.

@Pfps Thank you for the updates, much appreciated!

Hi, I run the "New Wikiquote article and category matches", "New Wikipedia article and category matches", and "Commons category matches" Wikidata games (connected to Pi bot activities). It's great to see this project taking place, the games definitely need some love and expansion!

I just got a pull request from @Harikrshnaa (I think?) at https://github.com/mpeel/wikicode/pull/17 to remove the brackets from the Wikiquote game, should I accept this, or leave this example open since it's listed in the project description?

If there's anything I can do to help with this project, do let me know. I'd love to see the three games I run expanded to cover more languages, for example - or a 'Commons category matches' version that uses extra checks (precise name matching, LLM pre-evaluation, etc.) to increase the likelihood of the suggested matches being good.

Yes, @Mike_Peel, I submitted that PR to address the bracketing issue.

After submitting, I’ve been considering whether a dedicated parser like mwparserfromhell would be a better fix than the current regex approach. Would you prefer keeping it lightweight with regex, or should I follow up with a version using the parser library?


I would say that you should go ahead if you think that this is a good change to make. I can either find a different simple change to a different game or point at the PR as an example of a change that has been made.

Hey @Pfps, I have been working on a game called "Pick the Discovery site" as an addition to the Distributed Game, and I've finally tested it and added it permanently. I've also included the report for the microtasks in the GitHub repo.

Link to game: Pick the Discovery site
Link to GitHub: GitHub