Extending the WikidataComplete plugin for enabling data donations, recommendations, and gamification
Open, Needs Triage, Public

Description

Open data collections like Wikidata are created and maintained by volunteers and thrive on community knowledge. Missing or new information must be added by community members; otherwise, the knowledge base will dry out and become obsolete over time. Changing and completing a knowledge base is therefore a crucial task within any community-driven initiative. Consequently, we consider helping people integrate their knowledge into a knowledge base to be essential and have defined it as the long-term goal of our initiative.

Editors can use the Wikibase editor to change the data. However, most information is published in unstructured textual form: the main sources for new or updated facts are Wikipedia and other HTML pages such as news portals. It is not easy for Wikidata editors to identify these pieces of information, and even when they do, integrating the discovered information is time-consuming. We value the time of Wikidata editors highly, so our goal is to increase the efficiency of the editing process.

In the past, other approaches have tried to tackle these obstacles. For example, the Wikibase plugin Recoin [1,2] was developed to suggest to editors, while they browse an entity, properties that are typically present in similar entities but missing from the current one. In our earlier research [3] we showed that it is possible to find outliers in graph-based knowledge bases that need to be checked by experts to ensure data quality. The WikidataComplete fact extraction implementation [4,5] shows how facts can be extracted from text documents and offered to users for validation (outside of Wikidata). Recently we published the WikidataComplete plugin [6], which was developed during a Google Summer of Code 2021 project. With it, it is already possible to point Wikidata editors to new facts and to prepare the forms within the Wikidata UI for almost instant evaluation and confirmation (cf. the presentation at [7] and the YouTube tutorial at [8]). The plugin has already helped to integrate many new facts into Wikidata.

Screenshot 2022-02-12 at 17.56.47.png (1×3 px, 444 KB)

As the community has reached out to us regarding additional data sources for facts, we propose an extension of the WikidataComplete plugin towards an infrastructure where data donations can be stored and then provided to Wikidata editors. This WikidataComplete ecosystem will consist of Web service interfaces for pushing data from different sources, so that researchers and industry can easily integrate them into their systems and efficiently help to extend the Wikidata knowledge base. While opening the raw data collection process to additional contributors, we should also integrate an improved recommendation process that points an editor to facts they are likely capable of approving (because they fall within the editor’s scope of expertise). Additionally, we should use the approval information to show appreciation to active editors by establishing a leaderboard and other gamification extensions.
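
To make the recommendation idea more concrete, here is a minimal sketch of one possible ranking heuristic. It is not part of the existing plugin: the `CandidateFact` structure, the `edit_history` mapping (property ID to number of past edits by the editor), and the scoring rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFact:
    """A fact awaiting approval (hypothetical structure, not the plugin's)."""
    entity_id: str    # Wikidata item ID, e.g. "Q42"
    property_id: str  # Wikidata property ID, e.g. "P69"
    value: str


def expertise_score(fact, edit_history):
    """Share of the editor's past edits that used the fact's property.

    edit_history maps property IDs to edit counts (an assumed input).
    """
    total = sum(edit_history.values())
    if total == 0:
        return 0.0
    return edit_history.get(fact.property_id, 0) / total


def recommend(facts, edit_history, limit=5):
    """Return up to `limit` facts, those with the most familiar property first."""
    ranked = sorted(
        facts,
        key=lambda fact: expertise_score(fact, edit_history),
        reverse=True,
    )
    return ranked[:limit]
```

An editor who mostly edits one property would see candidate facts for that property first. A real engine would likely also consider item classes, languages, and past approval accuracy.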

Goals
global: Improve the editing process for changing/extending Wikidata facts
Implement Web service interfaces for data donations, i.e., sets of approvable facts (which should also contain links to evidence, etc.)
optional: Implement a recommendation engine that suggests to Wikidata editors approvable facts that are likely to be in their area of expertise.
optional: Extend the gamification features to show appreciation to Wikidata editors, e.g., a badge Web service allowing users to integrate their score/rank into their profiles (e.g., on their Wikidata user page, GitHub profile, or LinkedIn profile) to show their dedication and motivate other users.
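
As an illustration of what a data donation could look like on the receiving side, the following sketch validates a batch of donated facts before they would be offered to editors. The field names (`entity_id`, `property_id`, `value`, `evidence_urls`) and the validation rules are assumptions, not an agreed interface; the only fixed convention used is that Wikidata item and property IDs have the form `Q<number>` and `P<number>`.

```python
import re
from dataclasses import dataclass, field

# Wikidata item and property IDs: "Q" or "P" followed by a positive integer.
ENTITY_RE = re.compile(r"Q[1-9]\d*")
PROPERTY_RE = re.compile(r"P[1-9]\d*")


@dataclass
class DonatedFact:
    """One approvable fact in a donation (hypothetical schema)."""
    entity_id: str
    property_id: str
    value: str
    evidence_urls: list = field(default_factory=list)


def validate_fact(fact):
    """Return a list of problems; an empty list means the fact is acceptable."""
    problems = []
    if not ENTITY_RE.fullmatch(str(fact.entity_id)):
        problems.append("entity_id must look like Q42")
    if not PROPERTY_RE.fullmatch(str(fact.property_id)):
        problems.append("property_id must look like P31")
    if not str(fact.value).strip():
        problems.append("value must not be empty")
    urls = fact.evidence_urls if isinstance(fact.evidence_urls, list) else []
    has_evidence = any(
        isinstance(url, str) and url.startswith(("http://", "https://"))
        for url in urls
    )
    if not has_evidence:
        problems.append("at least one http(s) evidence URL is required")
    return problems


def validate_donation(raw_facts):
    """Split a donation (a list of dicts) into accepted facts and rejected entries."""
    accepted, rejected = [], []
    for raw in raw_facts:
        try:
            fact = DonatedFact(**raw)
        except TypeError as exc:  # missing or unexpected fields
            rejected.append((raw, [str(exc)]))
            continue
        problems = validate_fact(fact)
        if problems:
            rejected.append((raw, problems))
        else:
            accepted.append(fact)
    return accepted, rejected
```

A Web service endpoint (for example in Flask or Spring Boot) could call `validate_donation` on the parsed request body and report the rejected entries back to the donor.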

Impact
Data donations are possible and at the same time, an efficient approval process is supported.
More fact ingestions with the help of the Wikimedia community.
Faster integration of new facts into Wikidata.

The long-term impact of our project is a growing number of editors and better integration of additional data sources, which will ultimately lead to a stronger Wikidata knowledge base. Hence, the quality and quantity of the open data offered by Wikidata will increase.

Warm-up tasks
Activate the WikidataComplete plugin [6,7] in your Wikibase account to work with the current implementation
Select 3 Wikidata entities and manually find missing facts based on external data sources
Check out the Primary Sources Tool: https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
Understand the WikidataComplete UI and APIs [4,5], which currently serve as the (default) data donor
Get familiar with the data structure available in Wikidata
Set up a MediaWiki development environment
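
For the warm-up task of finding missing facts, a small helper like the one below may be useful. It is only a sketch: it builds a request URL for the Wikidata `wbgetentities` API action and, given a JSON response that has already been fetched and parsed, lists which of a set of expected properties have no statement on the entity. Which properties count as "expected" is an assumption the caller must supply.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def entity_claims_url(entity_id):
    """Build the wbgetentities URL that returns the claims of one entity."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "claims",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def missing_properties(api_response, entity_id, expected):
    """Return the expected property IDs that have no claim on the entity.

    api_response is the parsed JSON of a wbgetentities call; claims are
    keyed by property ID under entities -> <entity_id> -> claims.
    """
    claims = api_response["entities"][entity_id].get("claims", {})
    missing = set(expected) - set(claims)
    # Sort numerically so that P19 comes before P569.
    return sorted(missing, key=lambda pid: int(pid[1:]))
```

For example, fetching `entity_claims_url("Q42")` and passing the parsed response together with `["P31", "P19", "P569"]` lists the properties from that set that still lack a statement.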

Mentors
Dennis Diefenbach
Andreas Both
Kunpeng Guo
Aleksandr Perevalov

The project can follow either the medium (~175 hours) or the large (~350 hours) format. However, we prefer the large format as it provides more opportunities to increase the impact.

Keywords
Data Editing, Data Quality, Data Curation, Knowledge Graph completion, Recommendation Engine, Gamification, Machine Learning, Natural Language Processing, Open Data

[1] https://www.wikidata.org/wiki/Wikidata:Recoin
[2] Vevake Balaraman, Simon Razniewski, Werner Nutt. Recoin: Relative Completeness in Wikidata. Wiki Workshop at The Web Conference 2018
[3] Didier Cherix, Ricardo Usbeck, Andreas Both, and Jens Lehmann (2014). Lessons learned—the case of crocus: Cluster-based ontology data cleansing. In European Semantic Web Conference (pp. 14-24). Springer, Cham.
[4] https://wikidatacomplete.org/
[5] Bernhard Kratzwald, Guo Kunpeng, Stefan Feuerriegel, and Dennis Diefenbach. IntKB: A Verifiable Interactive Framework for Knowledge Base Completion. International Conference on Computational Linguistics (COLING), 2020
[6] https://www.wikidata.org/wiki/Wikidata:WikidataComplete
[7] Presentation at Wikidata Quality days 2021: https://docs.google.com/presentation/d/1Hb5q5a2CC2XgXk_lvTkAQrc9SBtvi0MhBUwwAXbn5m0/edit?usp=sharing
[8] WikidataComplete Plugin Tutorial: https://www.youtube.com/watch?v=Ju2ExZ_khxQ

Event Timeline

@DD063520 Thank you for creating this task and for being willing to mentor :) Would it be possible for you to also add the details of the project here https://www.mediawiki.org/wiki/Google_Summer_of_Code/2022#Ideas_for_projects? Thanks!

Hello!
What are the skills required for this GSOC 2022 project?
@srishakatux @DD063520

We are looking for a full-stack profile. The backend can be written either in Java using Spring Boot or in Python using Flask. For the front end we prefer React. We are not necessarily looking for someone with exactly these skills, but for someone able to adapt quickly to these frameworks.

@DD063520
I am interested in this project.
I know JavaScript, React, and Python. I will learn Flask in 2-3 days.
So, can you add me to the community channel (Slack or Zulip) where the discussion related to this project is going on?

Hi all! Our mentoring org application is under review. We are being asked by Google to provide additional information on projects - expected size of the project: 175 or 350 hours and difficulty rating: easy, medium, or hard. Based on my analysis, I have added two data points to each project on the GSoC 2022 page on MediaWiki.org. If you disagree with it, feel free to make additional changes.

Hello, I am interested in contributing to this project for GSoC 2022. I have knowledge of frontend as well as backend development and have worked with Java as well as Python (Flask) for the backend in many of my projects. How can I start contributing? Is there any test or prerequisite for the project?

Hi Everyone,

we are not starting the project yet, but are following the Google Summer of Code process. If you are interested, apply for it : )

Salut
D06320

Hi @DD063520, it's Lalit from India. I have been working with Python/Django for some time.

I read that Wikidata maintains a large amount of structured data, which is also used by Siri and Alexa; this sparked my interest in this project.

I am currently exploring Wikidata and found the WikidataComplete plugin helpful for editors. As the description suggests, extending it to work with more sources, such as researchers and industry, will be great for the Wikidata knowledge base.

Also, the gamification part sounds like an excellent idea to me. It will be very encouraging for contributors.

Hi @DD063520, I am done with the warm-up tasks.
I am comfortable with both Flask and React and have been developing different web apps using them.
Example using React: https://infallible-goldstine-a881bf.netlify.app/
Example using Flask: https://farm-cyield.herokuapp.com/
I am a student at IIIT Pune and want to contribute to the Wikimedia Foundation. If there's anything more I can do to prove my capability for this project, please reply. Thanks in advance :)

Hi everyone, if the project is approved, we will contact you for a short interview. Express your interest here. Sorry for not being more proactive, but this is the process.

@srishakatux: anything we still need to do from an organisational point of view?

Sure @DD063520, I am interested in this project. Do you have any dates in mind for the interviews?

Dear GSoC interested parties,

we are happy about your interest in the project.

If you haven’t contacted us directly yet, please join this Slack Channel.

There, create a new channel and invite the four people mentioned above.

We will look at your questions about the project and the application process individually for each of you.

Looking forward to discussing with you,
Andreas et al.

Hi @AnBo-de,
There's a problem when I try to join the Slack channel you mentioned above.

Screenshot (4).png (768×1 px, 104 KB)

please guide me

Hi @AnBo-de, it is showing the same error to me: sutharlalit.97@gmail.com doesn’t have an account on this workspace.

@srishakatux: anything we still need to do from an organisational point of view?

Everything looks good! One thing, though, is that in addition to the warm-up tasks, it might be helpful to add more concrete tasks that are easy, self-contained and small for which folks can make code contributions. You can check out other GSoC projects for reference.

Sorry for having published the wrong link. Please use https://join.slack.com/t/gsoc2022exten-baz6139/shared_invite/zt-16nfejcmr-hRT3vsbm9ZuekYHDTzIpYA to join.

No problem, the new link worked fine thanks :)

Hi! I am very interested in this project. I have learned Python, JavaScript, and React. I know some Flask and am currently learning more, since my course requires it.
So, can you add me to the community channel of this project? Thank you in advance!

Hi @Carina920, you can join the Slack workspace for this project through https://join.slack.com/t/gsoc2022exten-baz6139/shared_invite/zt-16nfejcmr-hRT3vsbm9ZuekYHDTzIpYA.
Also, you can get in touch with the Wikimedia community through https://wikimedia.zulipchat.com/ to keep yourself updated on announcements related to GSoC.