GSoC 2026: Gamifying constraint violation fixes on Wikidata
Open, Needs Triage, Public

Description

Project title:

Gamifying constraint violation fixes on Wikidata

Description of project:

Develop a game to make edits to Wikidata to fix constraint violations. The requirement for being a game here is not only that the tool is easy to use but also that it has methods that keep users engaged, such as scores, leaderboards, collaborations, and challenges. There may be aspects of community collaboration in some games.

There is a long-open Phabricator ticket to make more Wikidata games
https://phabricator.wikimedia.org/T165167

Constraints in Wikidata:

Wikidata has several kinds of constraints. One kind is the property constraint, a simple soft or hard rule on how a property should be used. For example, the values of subclass of (property 279) have to be Wikibase items, i.e., not data values like integers, and father (property 22) should have only one "best" value. For more information on Wikidata property constraints see https://www.wikidata.org/wiki/Help:Property_constraints_portal. Other kinds of constraints in Wikidata come from the intended meaning of some of the items in Wikidata. For example, Wikidata has disjoint unions, so an individual that is an instance of two disjoint classes in Wikidata violates the intended meaning of disjoint union.
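
To make the two kinds of constraint concrete, here is a minimal sketch in Python over a toy in-memory dataset. It is only an illustration under stated assumptions: the item IDs, the `statements` dictionary, the `disjoint_pairs` set, and both helper names are hypothetical, ranks ("best" values) and subclass reasoning are ignored, and a real tool would read statements and constraint definitions from Wikidata itself.

```python
from itertools import combinations

# Toy statements: item -> property -> list of values.
# All item IDs are made up; P22 is "father" and P31 is "instance of".
statements = {
    "Q_alice": {"P22": ["Q_bob", "Q_carl"], "P31": ["Q_human"]},
    "Q_thing": {"P31": ["Q_animal", "Q_mineral"]},
    "Q_dave": {"P22": ["Q_bob"], "P31": ["Q_human"]},
}

# Assumed pairs of classes that are declared disjoint.
disjoint_pairs = {frozenset({"Q_animal", "Q_mineral"})}


def single_value_violations(data, prop):
    """Return items that have more than one value for a single-valued property."""
    return sorted(
        item for item, props in data.items() if len(props.get(prop, [])) > 1
    )


def disjointness_violations(data, pairs, instance_prop="P31"):
    """Return (item, class_a, class_b) for items in two disjoint classes."""
    found = []
    for item, props in data.items():
        classes = props.get(instance_prop, [])
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in pairs:
                found.append((item, a, b))
    return sorted(found)
```

Calling `single_value_violations(statements, "P22")` flags the item with two fathers, and `disjointness_violations(statements, disjoint_pairs)` flags the item that is an instance of both disjoint classes. Each flagged item is the sort of thing a game could present to a player for a decision.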

Violations of constraints are thus places that show evidence of a potential breakdown in the way data should be modeled in Wikidata, and often are the result of incorrect information being put into Wikidata. Reducing the number of constraint violations thus improves the quality of Wikidata.

Because there are so many constraint violations in Wikidata, it is hard for small groups of editors to tackle the problem. Enlisting the efforts of more people via a game has the possibility of making significant reductions in the number of constraint violations.

Expected outcomes:

By the end of this project, the contributor will have developed a game, either based on currently existing software like the Distributed Game (https://wikidata-game.toolforge.org/distributed/) or inspired by it. The game will be designed to fix particular kinds of constraint violations in Wikidata. It will be possible to slightly modify the game to fix other issues that exist in Wikidata. The game may be used later on in outreach campaigns.

Bonus outcomes:

A great tool for introducing people to Wikidata.
A tool to help get communities not currently engaged with Wikidata involved in editing it.

Skills required:

Understanding of Wikidata
Game Design
SPARQL proficiency
Proficiency in Python or another scripting language

Skills preferred:

Database management
Conducting reliability tests on user inputs
Ranking users according to skill

Bonus skills:

Community management
Graphic design

Possible mentor(s):
Peter F. Patel Schneider
David Martin

Size of project:

350 hours to complete

Rating of difficulty for the project:

Medium - The project requires the contributor to understand existing gamifying attempts at Wikidata, and either emulate them or build upon them. It also requires many different skills, such as database management, game design, and reliability analysis for proposed changes.

Microtasks:

  1. Understanding what the Distributed Game does, as evidenced by a short report on how it works, is a great introduction to what games can do for Wikidata.
  2. Updating an existing game for the Distributed Game is an excellent task to get up to speed in game design. For example, the New Wikiquote article and category matches game (https://wikidata-game.toolforge.org/distributed/#game=90) could be updated to remove distracting bracketing in text.
  3. Making a minor update to the Distributed Game is a good task to understand how games can work internally.

Why are you proposing this project?

There are so many violations of Wikidata constraints (including both from Wikidata property constraints and constraints inferrable from the intended meaning of core Wikidata constructs like disjointness) that it is difficult for small groups of editors to keep up, or indeed even survey the extent of the problem for some kinds of violations. Games are one mechanism for encouraging large numbers of participants to fix problems like constraint violations. A game to fix constraint violations would help to reduce the number of constraint violations in Wikidata.

What is the expected impact?

The immediate impact would be the creation of a new game aimed at fixing problems in Wikidata. The desired eventual impact is a reduction in the number of constraint violations in Wikidata of the kind addressed by the game. Another desired eventual impact is the continued creation of similar games, which would expand the range of constraint violations being addressed.

Any other additional information that the interns should know about:

Closed Phabricator ticket that could be used as an example of an update to the Distributed Game https://phabricator.wikimedia.org/T252935

Open Phabricator tickets related to the Distributed Game
https://phabricator.wikimedia.org/T258067
https://phabricator.wikimedia.org/T253956
https://phabricator.wikimedia.org/T253839
https://phabricator.wikimedia.org/T210635

Event Timeline

I added links to the Phabricator pages of the mentors.
I added pointers to several Phabricator tasks related to the Distributed Game. These links can be used to find games implemented in the Distributed Game and other information that would be useful in the microtasks and throughout the project.

Hi @Pfps in case you missed the update on the parent task:
Unfortunately, the Wikimedia Foundation will not be participating in this year's Google Summer of Code program. We look forward to coming back in 2026. In the meanwhile, we will be participating in Outreachy round 30 this summer. You can propose a project here. Thank you for your patience and support.

Hi @Pfps, @LGoto and @DMartin-WMF,

I'm a student developer and I've made contributions to Scribe-Data. During my work there, I identified some issues in their codebase which resulted in duplicated and unreliable data. After discussion with the maintainers, I migrated it from a static Wikidata dump to the live SPARQL query service to solve this.
I'm really interested in this project and the challenge of reliability analysis. Since it was originally drafted for GSoC 2025, I wanted to check if you’re planning to bring it back as a 350h project for GSoC 2026?
In the meantime, I'm setting up the Distributed Game environment to start working on the microtasks.

Edit: Fixed the bugs in T210635 (it was the only open ticket). Code is on GitHub.

This project is being revived for the 2026 GSoC.

Pfps renamed this task from "GSoC 2025: Gamifying constraint violation fixes on Wikidata" to "GSoC 2026: Gamifying constraint violation fixes on Wikidata". Jan 12 2026, 2:46 PM
Pfps updated the task description.

Hi @Pfps thank you for the proposal. A couple questions:

  • Could you briefly explain in layman's terms 1.) what a constraint violation is and 2.) why it is important to fix?
  • I'm not sure that microtask #2 is actually a small enough task. Could you consider breaking it down a bit? I'd also encourage you to create the microtasks as subtasks of this one, so that candidates can contribute directly to them.

Hi @LGoto, @Pfps, @DMartin-WMF,

I am a full stack developer at Infosys Limited and a technical community organizer in the Wikimedia ecosystem. I am very interested in this project for GSoC 2026. I possess the required and preferred skills for this project and I am particularly drawn to the game design challenge of making it engaging for users.
I am currently setting up the environment to work on the microtasks. Since there are no open Phabricator tickets, after completing the environment setup I will make a minor update to the Distributed Game to better understand how the game system works.

@Hridyesh_Gupta Thank you for your interest. It might be a bit early to start on the microtasks, as the period where potential contributors interact with mentors isn't for a while. As well, the topic will be getting some updates over the next little while.

Hi @Pfps @DMartin-WMF Reminder to please see my previous comment and update your task description. Thank you!

@Pfps Thanks for letting me know this. I'll wait for the interaction period to start and for your further updates on this project.

Thanks again!

@LGoto I added a section on constraints and modified the microtask to remove the "create new game" component, which I agree was too big. I also added an example update for this microtask. I expect to communicate with potential contributors to select this or another similar update.

@Pfps Thank you for the updates, much appreciated!

Hi, I run the "New Wikiquote article and category matches", "New Wikipedia article and category matches", and "Commons category matches" Wikidata games (connected to Pi bot activities). It's great to see this project taking place, the games definitely need some love and expansion!

I just got a pull request from @Harikrshnaa (I think?) at https://github.com/mpeel/wikicode/pull/17 to remove the brackets from the Wikiquote game, should I accept this, or leave this example open since it's listed in the project description?

If there's anything I can do to help with this project, do let me know. I'd love to see the three games I run expanded to cover more languages, for example - or a 'Commons category matches' version that uses extra checks (precise name matching, LLM pre-evaluation, etc.) to increase the likelihood of the suggested matches being good.

Yes, @Mike_Peel, I submitted that PR to address the bracketing issue.

After submitting, I’ve been considering if using a dedicated parser like mwparserfromhell would be a better fix than the current regex approach. I wanted to get your advice on whether you'd prefer keeping it lightweight with regex or if I should follow up with a version using the parser library?
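
For comparison, a lightweight regex approach could look like the sketch below. This is not the code in the pull request; it assumes the distracting bracketing is ordinary wiki-link markup such as `[[Target|label]]`, and the function name `strip_wikilinks` is hypothetical.

```python
import re

# Matches [[Target]] or [[Target|label]] and captures the visible text.
# Links with more than one pipe (for example file links) are left untouched.
_WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def strip_wikilinks(text: str) -> str:
    """Replace simple wiki links with their visible text."""
    return _WIKILINK.sub(r"\1", text)
```

The regex stays small but only covers the simple cases. Templates, nested links, and links with several pipes pass through unchanged, which is where a dedicated parser would be the more robust choice.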

I would say that you should go ahead if you think that this is a good change to make. I can either find a different simple change to a different game or point at the PR as an example of a change that has been made.

@Pfps, I've been contributing to the Distributed Game to really understand how gamification works and I have 7 merged PRs so far. Going forward, I'm thinking of working on this issue: adding user text input support to the Distributed Game framework. Right now, the game only accepts predefined buttons, which limits what violations can be fixed. With an input type, the game could fix format constraint violations such as the malformed postal codes I identified (P281, around 1970 violations). I'd implement the feature and build a demo game alongside it.

Does this direction sound valuable? Happy to adjust scope based on your feedback!
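
As a rough illustration of how free-text input could be checked before it is proposed as a fix, here is a minimal sketch. The pattern is an assumption chosen for the example (five digits with an optional four-digit suffix), not the actual format constraint on P281, and the names `violates_format` and `propose_fix` are hypothetical.

```python
import re

# Hypothetical pattern for illustration only; the real format constraint
# for a property would be read from Wikidata rather than hard-coded.
ASSUMED_FORMAT = r"\d{5}(-\d{4})?"


def violates_format(value: str, pattern: str = ASSUMED_FORMAT) -> bool:
    """True if the whole value does not match the expected pattern."""
    return re.fullmatch(pattern, value) is None


def propose_fix(raw_input: str, pattern: str = ASSUMED_FORMAT):
    """Return the trimmed user text if it satisfies the pattern, else None."""
    candidate = raw_input.strip()
    if violates_format(candidate, pattern):
        return None
    return candidate
```

Rejecting input that still violates the pattern means the game never replaces one malformed value with another.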

After discussion with the org admins and mentors, we’re pleased to share that early contributions are more than welcome for this project. There’s no need to wait until March 16 to start working on code or submitting your microtask patches; feel free to dive in whenever you’re ready.

We look forward to seeing your ideas and contributions. Happy coding!

Hello,

I would really like to be a contributor for this project. I believe that this sort of game is what really helps the community join in efforts to do systematic fixes.

My approach would be like the following:

  1. The user first sees explanations and examples of the situation that needs resolution.
  2. The user then sees dummy tests that have no effect, and needs to choose the correct options.
  3. After this process ends, the user starts giving actual answers that are recorded somewhere.
  4. In order not to bloat Wikidata with bad answers, the answers are only committed if a few people independently reached the same results. (Things that were controversial are sent to a different set, in which experts deal with those issues.)

So this type of game could not only be useful for mundane tasks, but even more complicated tasks. I hope to be a contributor to this project so that I can implement this system.
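
Step 4 above could be sketched roughly as follows. The thresholds and the function name `decide` are assumptions made for illustration; the right values would need to be tuned and agreed on.

```python
from collections import Counter

MIN_VOTES = 3        # assumed minimum number of independent answers
MIN_AGREEMENT = 0.8  # assumed share of votes the top answer must reach


def decide(answers):
    """Classify a task from a mapping of user -> answer.

    Keying by user means each person counts once. Returns one of
    ("pending", None), ("commit", answer) or ("expert_review", None).
    """
    if len(answers) < MIN_VOTES:
        return ("pending", None)
    top_answer, top_count = Counter(answers.values()).most_common(1)[0]
    if top_count / len(answers) >= MIN_AGREEMENT:
        return ("commit", top_answer)
    return ("expert_review", None)
```

With these values, three matching answers lead to a commit, while a two-to-one split is routed to the expert set instead of being written to Wikidata.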

@Pfps, just checking in on the ideas list doc. I wanted to make sure you have everything needed from me. Happy to clarify anything. No rush at all!

I'm working through the backlog. Expect responses shortly.

Sounds good, looking forward to your thoughts.

Hello!
I'm Raushan. I have been an active contributor to the Wikimedia ecosystem with 4 patches already merged on Gerrit. I am very interested in the 'Gamifying Constraint Violation Fixes on Wikidata' project.
I have reviewed the requirements and am currently familiarizing myself with the codebase. I would love to hear your thoughts on the best microtask to tackle to better understand the technical environment and demonstrate my readiness before I draft my initial proposal for this project.

I'm participating in the https://www.mediawiki.org/wiki/Wikimedia_Hackathon_Northwestern_Europe_2026 right now, and in that, I made a small tutorial page for a tool called Depictor. I'm sharing it here now: https://huggingface.co/spaces/egezort/TutorialHackathon. This could be considered a microtask for this.

So, for the technical issues that I'd want users to understand well, there would be more questions in this type of tutorial. I assume that people would understand class order, disjointness, etc. if they successfully solved 15-20 questions that teach these. If I'm selected, I'll build tutorials like these as part of my work.

@Pfps, could you please provide any feedback on the ideas list document I shared before?

Sorry all. Due to some other issues, I have not been adequately responsive. But there is light at the end of the tunnel. I'll get through all the backlog today.

Also, I would like to talk this week to anyone who wants to put in a proposal. It is in your best interest to talk to me so that I can better understand you. I particularly want to find out your background and interest in Wikidata. I expect the calls to be about 1/2 hour, but you could ask for more time. I'm available most times between 6am and 10pm EDT (UTC-4). Send me email at pfpschneider@gmail.com and suggest a couple of times.

Hi:

Thanks for the bug fix. I looked at the change that you made. It shows good understanding of the game.

@Pfps, I've drafted an initial idea list document for this project. Could you please take a look?

I'd appreciate any feedback you have on the ideas I outlined.

Document link

@Harikrshnaa I took a good look at the ideas. This looks to be what is needed. The major issue I have is scope - this might be too ambitious. See below for some specific comments.

You describe several property constraints and two other kinds of constraints. If you are going to have multiple mechanics I would say to just start with some of the property constraints. The others would be later additions. It is probably more difficult to create a good interface for class order violations so that would be last.

I suggest providing a road map of what will be done first, either something like multiple kinds of property constraints with legacy pick, or one or two kinds of property constraints with legacy plus one other interaction. That gives a path where a useful result is obtained without doing everything.

The idea of only making changes with some sort of consensus is the right way to go.

I think it might be useful to award two kinds of points - one for raw submission volume and one for consensus. Maybe even several kinds of consensus points - one just for consensus and one for sufficient consensus that a change is made. There probably also should be a separate interface that allows non-game vetting of changes by trusted operators.
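
One possible shape for such a points scheme is sketched below. The `Score` fields, the task dictionary layout, and the function name `score_user` are all hypothetical, chosen only to show volume points and the two kinds of consensus points side by side.

```python
from dataclasses import dataclass


@dataclass
class Score:
    volume: int = 0     # every answer submitted
    consensus: int = 0  # answers that matched the consensus answer
    applied: int = 0    # consensus answers that led to an actual change


def score_user(user, tasks):
    """Tally points for one user.

    Each task is assumed to be a dict with 'answers' (user -> answer),
    'consensus' (the agreed answer, or None) and 'applied' (bool).
    """
    score = Score()
    for task in tasks:
        mine = task["answers"].get(user)
        if mine is None:
            continue
        score.volume += 1
        if task["consensus"] is not None and mine == task["consensus"]:
            score.consensus += 1
            if task["applied"]:
                score.applied += 1
    return score
```

Keeping the three counters separate lets a leaderboard weight them differently, for example rewarding applied changes more than raw volume.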

The problem is whether consensus would require a complete new code base.

I would make the collaboration part an addon, not something done initially. Template creation is also something to be done only if the first parts succeed.

The Wikidata Academy idea is interesting but probably out of scope.

@Egezort This approach is definitely along the right lines. A proposal that fleshes out these ideas would fit into the project. This project has multiple places where there are choices to be made and there is not a single success criterion. You should try to write a proposal that can be considered a success without having everything in the proposal completed. Your proposal should say what the various success points could be.

The guidance from Google in https://google.github.io/gsocguides/student/writing-a-proposal is to have deliverables and timelines in your proposals. This is a good idea, but it is possible to go too far in this area. What we are looking for is a sense that you can break down the overall project into several pieces, each probably including design, coding, and documentation. What is important is a plan to have something that can be evaluated at the midpoint. That doesn't have to be a full system, but there should be some coding involved.
This project has areas where you can decide how much or how little to do and still have a working result. That may make it different from other GSoC projects. It is a good idea to make pieces of your proposal optional, so that at the end you (and we) can claim victory even if not everything in the proposal is implemented.
In the end, a big part of GSoC is to get people interested in open-source projects. In my view it's a win for GSoC if you end up doing significant open-source work in the future, even if not all of your proposal ends up being implemented.

As far as I am concerned, the minimal deliverable at the end of the project is a playable game that implements fixes to some kinds of constraint violations. It is in your interest to have a proposal that can support this minimal deliverable, and also has significant optional parts.

I'm not looking for a daily or even weekly timeline in your proposal. What I am looking for is a breakdown of the overall task into a few pieces that can be tracked. I am also looking for one or more pieces that can be done by the midterm of the coding period. If your proposal is selected we will be using the familiarization period to further refine the work to be done.

Thank you so much for the detailed feedback,

Actually, I wrote the ideas list as a compilation of all the research I did and the ideas I had come up with. It was not intended as a commitment to implement everything. My actual proposal will be focused around a core playable game with some constraint types, and more ambitious features marked as optional add-ons which can be worked on if time allows.

I'll structure the draft proposal around midterm and final deliverables, and send some time slots to discuss this further on a call.

Hello! I am Meghana Madathanapalli, a 3rd year B.Tech Computer Science student specializing in Cybersecurity and Quantum Technology at SPMVV, Tirupati, India. I am interested in applying for this GSoC 2026 project.
I have experience with Python and I find the idea of gamifying Wikidata constraint fixes very interesting — combining data quality improvement with engagement mechanics like leaderboards and scores is a creative approach I would love to work on.
I would like to know: what would be the best first contribution I can make to get familiar with the codebase? I am also happy to discuss my approach for the proposal.
Thank you for your time!
— MeghanaMadathanapalli

Hi Meghana: The deadline for proposals is next week so you are starting very late in the process. By now you should have tried out some of the suggestions in the initial comment. If you want to proceed we can set up a call as described in my earlier comment.

Hello! My name is Arina, and I’ve just submitted a GSoC 2026 proposal for the project "Gamifying constraint violation fixes on Wikidata".

I’m very interested in this idea and especially excited about combining Wikidata editing with game mechanics such as scoring, challenges, and collaboration.

I have experience with Python and I am currently deepening my knowledge of SPARQL and Wikidata. I would also be happy to contribute ideas on how to make the system both engaging for users and useful for improving data quality.

I would greatly appreciate any feedback on my proposal, and I’m ready to iterate and improve it if needed.

Thank you for your time!

Best regards,
Arina

Hi Arina: By this time you should have been interacting with us, the potential mentors, for quite some time and have done some of the microtasks. We will evaluate your proposal even so, but iteration at this late date is not likely.

Hi! Thank you for your feedback — I understand that I joined late.

Even so, I’m very motivated to contribute and I’m actively exploring Wikidata constraints and the Query Service to understand the types of tasks involved.

I would still really appreciate any small suggestions or pointers on where I could focus, even if briefly.

Thank you for your time!

Hi Pfps, thank you for your response! I apologize for the late start. I have been studying the Wikidata constraint violations and SPARQL queries. I would love to set up a call as you mentioned. I am available any time this week. Please let me know a convenient time.
— MeghanaMadathanapalli

Hi! Thanks for your feedback.

I understand that I joined a bit late, but I’ve been actively exploring Wikidata items, including how statements like "instance of" and "subclass of" are structured, and how related entities are connected.

I’ve also started looking into constraint violations and how they might be resolved in practice. I’m continuing to learn and would really appreciate any guidance on where I can contribute most effectively at this stage.

Hi Meghana: If you email me at pfpschneider@gmail.com we can set up a meeting in the afternoon, US East Coast time.

Hello pfps, I am ready to submit my proposal, but the GSoC dashboard lists Medium as 175 hours and Large as 350 hours, while this project is specified as 350 hours and medium. I was wondering which to choose.

This project is medium difficulty and large size.

This project is medium difficulty and large (350 hours) in scale.

@Aklapper Is it possible that there is a mistake on the GSoC dashboard?

Hello,
I saw the discussion about the project size and possible mistake on the GSoC dashboard.
Could you please confirm if the correct size is 175 hours or 350 hours?
Thank you!

@Pfps I'm not sure why you ask me; I have nothing to do with running GSoC :)

Oops, sorry.

@Pfps @RaushanCode @arina_salenko Hi all, apologies for the confusion with the GSoC dashboard. Can someone show me where you are seeing the size and hours? I'm unable to find this to check the details. Thanks!

Hi @LGoto, I picked Large on the dashboard. I was just following the GSoC guide's breakdown (350h for Large / 175h for Medium), and @Pfps confirmed that Large fits what we’re planning for the project. I also put '350 hours' on the PDF just to be safe! Link for ref: https://google.github.io/gsocguides/student/time-management-for-students