Profile
Name: Ege Atacan Doğan
Email: egeatacandogan@gmail.com
Wikimedia username: Egezort
Web Page: https://github.com/EgeAtacanDogan
User Page: https://www.wikidata.org/wiki/User:Egezort
Resume: https://drive.google.com/file/d/1HPd9v35Ggqy4p-VU7CQZUrbYFi8dyf75/view?usp=sharing
Location: Würzburg, Germany
Time Zone: UTC+1 (CET)
Typical working hours: 3:00 PM to 9:00 PM CET (flexible)
Synopsis
Note: Anthropic’s Claude Opus 4.6 was used extensively in drafting this proposal. I have checked and edited all of its content, and I bear full responsibility for everything stated here.
Wikidata is one of the largest knowledge graphs on the web, and it has many inconsistencies in how entities are classified. The distinction between individuals, first-order classes, and metaclasses is frequently blurred: items appear simultaneously as instances and subclasses of the same class, metaclass order is routinely misassigned, and the problem is too large for a small group to inspect manually.
This project implements a gamified, web-based crowdsourcing tool in which users classify Wikidata entities as individuals, classes, or metaclasses. Through a weighted consensus mechanism, high-agreement classifications are automatically committed to Wikidata, while cases that lack significant consensus, or that users have specifically flagged for expert review, are routed to a structured "Expert Backlog" for review by experienced ontologists. The project emerges directly from my work with the Ontology Cleaning Task Force, where I have been collaborating with mentor Peter F. Patel-Schneider since January 2024.
Mentors: Peter F. Patel-Schneider, David Martin
Contacted? Yes, discussed in detail with Peter F. Patel-Schneider.
About the Project
The Problem
The Wikidata ontology is built collaboratively by thousands of contributors with varying levels of ontological expertise. This produces systematic classification errors:
Instance-and-Subclass Confusion: An item is marked as both an instance of (P31) and a subclass of (P279) the same class.
Metaclass Order Errors: Classes are not correctly typed with their ontological order. A class whose instances are all individuals should be an instance of first-order class (Q104086571), but this is often missing or wrong.
Excessive Depth: Chains of instance of (P31) relationships cross metaclass boundaries multiple times, creating third-order or higher metaclasses where none should exist.
Concept Proliferation: Items like concept (Q151885) appear as high-level metaclasses in many subclass/instance trees, creating spurious metaclass levels (e.g., champagne → wine → liquid → fundamental state of matter → concept).
These errors break SPARQL queries, corrupt reasoning engines, produce misleading results in knowledge-graph-powered applications, and undermine trust in Wikidata. Brasileiro et al. (2016) identified these anti-patterns formally in "Applying a Multi-Level Modeling Theory to Assess Taxonomic Hierarchies in Wikidata," and the Cleaning Task Force has confirmed they remain widespread.
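The instance-and-subclass anti-pattern described above could be detected roughly as follows. This is a minimal sketch, not existing Task Force tooling: it pairs a SPARQL query string for the Wikidata Query Service with an equivalent in-memory check. The function name `find_instance_subclass_overlap`, the input shape, and the placeholder IDs (`Q_A`, `Q_X`, and so on) are assumptions made for illustration.

```python
# SPARQL sketch: items that are both instance of (P31) and subclass of (P279)
# the same class. The LIMIT is an arbitrary value chosen for illustration.
INSTANCE_AND_SUBCLASS_QUERY = """
SELECT ?item ?class WHERE {
  ?item wdt:P31 ?class ;
        wdt:P279 ?class .
}
LIMIT 100
"""


def find_instance_subclass_overlap(statements):
    """Return {item: sorted classes} for items with a class in both P31 and P279.

    `statements` maps an item ID to {"P31": [...], "P279": [...]}
    (a hypothetical shape, not a Wikidata API response format).
    """
    overlaps = {}
    for item, props in statements.items():
        shared = set(props.get("P31", [])) & set(props.get("P279", []))
        if shared:
            overlaps[item] = sorted(shared)
    return overlaps


# Placeholder IDs only; these are not real Wikidata items.
sample = {
    "Q_A": {"P31": ["Q_X"], "P279": ["Q_X", "Q_Y"]},
    "Q_B": {"P31": ["Q_X"], "P279": ["Q_Z"]},
}
print(find_instance_subclass_overlap(sample))  # {'Q_A': ['Q_X']}
```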
Proposed Solution
Core Classification Game
The game presents users with a Wikidata entity and its contextual information: label, description, existing P31/P279 statements, and a sample of its instances or subclasses. The user classifies the entity into one of three categories:
Individual: A concrete entity that is not itself a class (e.g., Douglas Adams, Q42).
First-Order Class: A class whose instances are individuals (e.g., human, Q5).
Metaclass: A higher-order class whose instances are themselves classes (e.g., ship type, Q2235308).
Each entity is presented to five independent users. The outcome depends on agreement:
5/5 unanimous: The classification is automatically applied to Wikidata via the API, with a bot-flagged edit summary referencing this tool.
4/5 majority: The classification is automatically applied, and the user who cast the dissenting vote is notified. They can then either accept the majority decision or escalate the case to the experts.
3/5 slight majority: An Expert Ticket is created in the Expert Backlog for review by experienced ontologists.
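The agreement rules above can be sketched as a small routing function. This is an unweighted illustration under stated assumptions: the names `Category`, `Outcome`, and `route` are hypothetical, trust-score weighting is left out because its formula is not yet fixed, and sending splits with no majority (for example 2/2/1) to the Expert Backlog is an assumption based on the backlog being meant for cases lacking consensus.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    INDIVIDUAL = "individual"
    FIRST_ORDER_CLASS = "first-order class"
    METACLASS = "metaclass"


class Outcome(Enum):
    AUTO_APPLY = "auto-apply"
    AUTO_APPLY_NOTIFY_DISSENTER = "auto-apply and notify dissenter"
    EXPERT_TICKET = "expert ticket"


def route(votes):
    """Map exactly five votes to (outcome, winning category or None)."""
    if len(votes) != 5:
        raise ValueError("exactly five votes are expected")
    winner, count = Counter(votes).most_common(1)[0]
    if count == 5:
        return (Outcome.AUTO_APPLY, winner)
    if count == 4:
        return (Outcome.AUTO_APPLY_NOTIFY_DISSENTER, winner)
    # 3/5 creates an Expert Ticket; treating splits without a majority
    # the same way is an assumption, and no winner is reported for them.
    return (Outcome.EXPERT_TICKET, winner if count == 3 else None)


votes = [Category.METACLASS] * 4 + [Category.FIRST_ORDER_CLASS]
outcome, category = route(votes)
print(outcome.value, "->", category.value)
```

Keeping the routing in one pure function would make it easy to unit-test the thresholds before any edit reaches Wikidata.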
Expert Backlog
The Expert Backlog is a structured review queue for entities that lack sufficient consensus or that users have manually flagged. The experts will be a small (but ideally growing) group of people who understand the issue well enough to make decisions.
Everyone will have access to the discussions on Expert Tickets, but committing a decision will be up to the experts.
Gamification
Leaderboards: Ranked by total classifications, accuracy rate, and Expert Tickets resolved.
Achievements: Milestone badges.
Streaks: Daily participation counters.
Educational Feedback: Brief explanations after each classification reinforce ontological concepts.
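As one illustration of the gamification features, a daily streak counter could be computed from the set of days on which a user classified at least one entity. This is a sketch only: the function name `current_streak` is hypothetical, and the grace rule (a streak still counts if the user has not played yet today but did play yesterday) is an assumption, not a settled design decision.

```python
from datetime import date, timedelta


def current_streak(active_days, today):
    """Count consecutive active days ending today, or yesterday if today is still empty."""
    days = set(active_days)
    # Assumed grace rule: do not reset the streak before the day is over.
    cursor = today if today in days else today - timedelta(days=1)
    streak = 0
    while cursor in days:
        streak += 1
        cursor -= timedelta(days=1)
    return streak


played = [date(2026, 6, 1), date(2026, 6, 2), date(2026, 6, 3)]
print(current_streak(played, date(2026, 6, 3)))  # 3
```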
Technical Architecture (tentative plan)
Component: Technology
Frontend: React.js + TypeScript; responsive for desktop and mobile
Backend: Python (Flask or FastAPI); RESTful API
Database: PostgreSQL (users, votes, trust scores); Redis (cache, rate limiting)
Wikidata I/O: Wikidata REST API + SPARQL; wbeditentity for automated edits
Authentication: OAuth 2.0 via Wikimedia SUL
Deployment: Wikimedia Toolforge; Docker for local dev
Entity Pipeline: SPARQL candidate queries from WikiProject Ontology/Problems
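To make the automated-edit path more concrete, the sketch below builds the POST parameters for a `wbeditentity` call (a module of the MediaWiki Action API) that adds "instance of (P31): first-order class (Q104086571)" to an item. It assumes that this statement is how a first-order-class verdict would be recorded. The function name and the edit summary text are hypothetical, and the parameter names and claim JSON should be checked against the current Wikibase API documentation before use. Nothing is sent over the network here.

```python
import json

FIRST_ORDER_CLASS_NUMERIC_ID = 104086571  # first-order class (Q104086571)


def build_first_order_class_edit(item_id, csrf_token):
    """Build (but do not send) parameters for an Action API wbeditentity request."""
    claim = {
        "mainsnak": {
            "snaktype": "value",
            "property": "P31",
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": FIRST_ORDER_CLASS_NUMERIC_ID,
                },
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }
    return {
        "action": "wbeditentity",
        "id": item_id,
        "data": json.dumps({"claims": [claim]}),
        # Hypothetical summary text; the real wording is still to be decided.
        "summary": "Classified as first-order class via the classification game",
        "bot": "1",
        "token": csrf_token,
        "format": "json",
    }


params = build_first_order_class_edit("Q5", "EXAMPLE_TOKEN")
print(params["action"], params["id"])
```

Separating payload construction from the HTTP call would let the edit format be tested offline, with the OAuth-authenticated request added around it later.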
Timeline
Period: Task
Apr 30 – May 25: Community bonding.
Week 1–2 (May 25 – Jun 8): Learning about Toolforge. Setting up the architecture of the game.
Week 3–4 (Jun 8 – Jun 22): Creation of the tutorial for the Class Order.
Week 5–6 (Jun 22 – Jul 6): Fetching problematic items and producing QS commands.
Week 7–8 (Jul 6 – Jul 20): Implementing the Expert Backlog Ticket System.
Jul 14 – Jul 18: Midterm evaluation. A tool comparable to Depictor is online and playable. Gamification (scores and other player incentives) is not set up yet; the rest of the project will consist of adding it or fixing issues from earlier phases.
Week 9–10 (Jul 20 – Aug 3): Introducing a point system. Tentative plans for achievement badges.
Week 11–12 (Aug 3 – Aug 18): User experience trials, either with real users or simulated by me.
Aug 25 – Sep 1: Finalisation and publicisation.
Every week I will post progress updates on my Wikidata User Page and get my work reviewed by mentors. This will help catch issues early before moving to the next phase.
Deliverables
Working web application deployed on Wikimedia Toolforge.
Entity selection pipeline using SPARQL queries from WikiProject Ontology/Problems.
Expert Backlog system with QuickStatements integration.
(Optional) Gamification system (leaderboards, badges, streaks, educational feedback).
Complete documentation: user guide, developer setup guide, API reference.
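For the QuickStatements integration listed among the deliverables, expert decisions could be rendered as QuickStatements V1 commands, which are tab-separated lines of item, property, and value. The sketch below assumes decisions arrive as simple triples; the function name `to_quickstatements` and the ID validation are illustrative choices, not a fixed design.

```python
import re

ITEM_ID = re.compile(r"Q\d+")
PROPERTY_ID = re.compile(r"P\d+")


def to_quickstatements(decisions):
    """Render (item, property, value) triples as tab-separated QuickStatements V1 lines."""
    lines = []
    for item, prop, value in decisions:
        # Reject malformed IDs early rather than emitting a broken batch.
        if not (ITEM_ID.fullmatch(item) and PROPERTY_ID.fullmatch(prop)
                and ITEM_ID.fullmatch(value)):
            raise ValueError(f"malformed triple: {(item, prop, value)!r}")
        lines.append("\t".join((item, prop, value)))
    return "\n".join(lines)


# human (Q5) is a first-order class, so it should be an instance of Q104086571.
commands = to_quickstatements([("Q5", "P31", "Q104086571")])
print(commands)
```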
Participation
Weekly progress reports on my Wikidata User Page (User:Egezort).
Active daily on Wikidata and on Wikidata-related Telegram channels.
Weekly video calls with Peter F. Patel-Schneider.
About Me
I am Ege Atacan Doğan, based in Würzburg, Germany. I graduated from Sabancı University (Istanbul, Turkey) in 2025 with a degree in Computer Science and a minor in Gender Studies (2019–2025). I joined Wikidata's WikiProject Ontology in January 2024 without knowing much about ontology. After attending several Cleaning Task Force meetings, I became deeply invested.
Experience and Contributions
All of my relevant experience is outlined in my Wikidata user page: https://www.wikidata.org/wiki/User:Egezort
Other Commitments
I will have some academic overlap during the coding period, approximately 10–20 hours per week of coursework. I will treat GSoC as my primary commitment and dedicate at least 25–30 hours per week to the project. My typical working hours are 15:00–21:00 CET, with flexibility to extend. I have no other internships, jobs, or vacation commitments during this period.
The second iteration of the Wikidata Ontology Course (co-run with Peter) and some of the Mereology Task Force work will overlap with the early GSoC period, but I expect this to complement the project rather than compete with it.
Post-GSoC
Toolforge deployment ensures zero-cost hosting within Wikimedia infrastructure.
Comprehensive documentation enables community maintenance and extension.
The entity selection pipeline is extensible: new SPARQL queries for newly discovered anti-patterns can be added without code changes.
The Expert Backlog creates a persistent record, which will be maintained.
I will continue maintaining the game and working with the Cleaning Task Force. This is not only a summer project for me; it is a continuation of work I have been doing for over two years.