
GSOC 2021 Proposal: Gamified Knowledge Base Completion Plugin for Wikibase/Wikidata
Open, Needs Triage, Public

Description

Profile Information

Name: Dhairya Khanna
GitHub: Dhairya3124
LinkedIn: Dhairya Khanna
Zulip: Dhairya Khanna
Location: New Delhi, India
Typical working hours: 12 PM to 3 AM (UTC+5:30)

Synopsis

  • Short summary describing your project and how it will benefit Wikimedia projects

Wikidata is a central storage repository that can be accessed by others, such as the wikis maintained by the Wikimedia Foundation.

The Gamified Knowledge Base Completion Plugin for Wikibase/Wikidata will help many new and existing users contribute more to knowledge base completion. When I tried completing data manually for some items, I found it difficult to gather facts for their properties. The Recoin plugin shows which important properties of an item are still empty, but it does not recommend values or facts for them from other knowledge bases or the internet.
A plugin that finds candidate data and asks users whether it should be saved will make completing any item easier and more efficient. Through this project, Wikidata can attract a large number of people to update facts and information regularly, increasing the number of contributions.
Additionally, a badge service will act as a reward system that encourages more users to edit the Wikibase, so that the collected information stays regularly updated and accurate. It will also help other wikis (such as Wikipedia) keep their data up to date, and it will make editing data more fun.

  • Have you contacted your mentors already?

Yes, I have contacted them through Phabricator comments.

Deliverables

Describe the timeline of your work with deadlines and milestones, broken down week by week. Make sure to include time you are planning to allocate for investigation, coding, deploying, testing and documentation.
We can implement the gamified Wikibase completion using the Recoin and Wikidata Complete APIs. Recoin fetches its data from a database on the Toolforge servers, which returns the properties that are empty for an item together with their frequencies; the empty properties are obtained through SPARQL endpoints.
How the Recoin plugin works:

[Image: Recoin Working.jpg]

Recoin levels according to the frequencies:
[Image: Recoin Levels.jpg]
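
The frequency idea behind these levels can be sketched roughly as follows. This is only an illustration and not Recoin's actual code: the function name `missing_properties`, the `min_share` parameter, and the sample property sets are all hypothetical, and the sketch assumes the property sets of comparable items are already available locally instead of being fetched over SPARQL.

```python
from collections import Counter


def missing_properties(target, peers, min_share=0.0):
    """Rank properties that comparable items use but the target item lacks.

    target:    set of property IDs already present on the item
    peers:     list of property-ID sets, one per comparable item
    min_share: hypothetical cut-off; drop properties used by fewer peers
    Returns (property_id, share_of_peers) pairs, most frequent first.
    """
    if not peers:
        return []
    counts = Counter()
    for props in peers:
        counts.update(set(props))  # count each property once per peer
    total = len(peers)
    ranked = [
        (prop, count / total)
        for prop, count in counts.items()
        if prop not in target and count / total >= min_share
    ]
    # Highest share first; ties broken by property ID for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Illustrative data only: made-up property sets for four comparable items.
peers = [
    {"P31", "P178", "P277"},
    {"P31", "P178", "P856"},
    {"P31", "P277", "P856"},
    {"P31", "P178"},
]
target = {"P31", "P856"}
suggestions = missing_properties(target, peers)
print(suggestions)
```

A plugin could then show only the top few entries of such a ranking, so that users are first asked about the properties most comparable items already have.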

After fetching this data, we can use a KB completion algorithm to find missing facts for a specific property of an item using NLP (KB completion from text corpora and link completion). The plugin then asks the user whether each suggested fact should be saved.
If the user says Yes, the request to change the data is sent back through SPARQL to complete the database. If the user says No, the property is left unchanged.
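
The Yes/No step above could look roughly like this. It is a minimal sketch under stated assumptions: the `Suggestion` class, the `review_suggestion` function, and the `save` callback are hypothetical names, the sample values are made up, and the actual write to the database is deliberately left behind the callback because the proposal does not fix how it will be done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A candidate fact found for an empty property (hypothetical shape)."""
    item_id: str
    property_id: str
    value: str
    source: str


def review_suggestion(suggestion, accepted, save):
    """Save the suggestion only if the user answered Yes.

    save is a caller-supplied function that performs the real update;
    here it stands in for whatever write mechanism the plugin ends up using.
    Returns True if the suggestion was saved, False otherwise.
    """
    if accepted:
        save(suggestion)
        return True
    return False


# Illustrative run: an in-memory list stands in for the database.
saved = []
first = Suggestion("Q15967387", "P277", "example value", "example source")
second = Suggestion("Q15967387", "P856", "example value", "example source")

first_saved = review_suggestion(first, accepted=True, save=saved.append)
second_saved = review_suggestion(second, accepted=False, save=saved.append)
print(first_saved, second_saved, len(saved))
```

Keeping the write behind a callback means the confirmation logic can be tested without touching a live Wikibase.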

For the badge service, I’ll use the user's Wikidata account to count their contributions to items in Wikidata. We can use
Extension Auth to calculate a user's total Wikidata contributions and then award badges based on that count; for example, 0 to 15 contributions earn the Beginner badge.
A badge.js file will then assign the badge that matches the number of contributions.
Contributions Count: Badge Assigned
0-15: Beginner
16-50: Intermediate
51+: Expert
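
The thresholds in this table can be expressed as a small lookup. The sketch below is written in Python only to illustrate the logic; the proposal plans to put it in badge.js, and the function name `badge_for` is hypothetical. It assumes the contribution count has already been obtained from the user's account.

```python
def badge_for(contributions: int) -> str:
    """Map a contribution count to the badge named in the table above."""
    if contributions < 0:
        raise ValueError("contribution count cannot be negative")
    if contributions <= 15:
        return "Beginner"
    if contributions <= 50:
        return "Intermediate"
    return "Expert"


# Boundary values taken from the table: 0-15, 16-50, 51+.
for count in (0, 15, 16, 50, 51):
    print(count, badge_for(count))
```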
TIMELINE:

  • March 30, 2021 to May 16, 2021:
    1. Learn more about SPARQL queries, how to use them to save data in the required databases, and how they work with the MediaWiki environment.
    2. Study the Wikidata Complete APIs in more depth and look for any other efficient way to complete data for Wikidata with them.
  • May 17, 2021 to June 7, 2021 (Community Bonding Period)
    1. Learn about the Wikimedia organisation.
    2. Start working on the implementation and discuss the project with the mentors.
    3. Complete blog posts for the Community Bonding Period.
  • June 7, 2021 to June 14, 2021 (Week 1) and June 15, 2021 to June 22, 2021 (Week 2)
    1. Start working on the wikicompletion.js file, which implements the data completion.
    2. Work on the PHP scripts for the plugin's user interface.
    3. Write test cases for the plugin.
  • June 23, 2021 to June 30, 2021 (Week 3) and July 1, 2021 to July 7, 2021 (Week 4)
    1. Connect to the APIs that complete data using KB algorithms, by generating SPARQL endpoints.
    2. Connect the databases to the plugin so that they can be updated using SPARQL queries.
    3. Work on updating data in the databases.
    4. Testing and documentation
  • July 8, 2021 to July 14, 2021 (Week 5)
    1. Phase 1 evaluations
    2. Fix bugs and errors.
    3. Documentation
    4. Update my blog posts on the Bi-weekly reports page.
  • July 15, 2021 to July 21, 2021 (Week 6) and July 22, 2021 to July 29, 2021 (Week 7)
    1. Start working on the badge service.
    2. Fetch data from the Wikidata account and count the number of contributions.
    3. Test and fix the bugs encountered.
  • July 30, 2021 to August 6, 2021 (Week 8) and August 7, 2021 to August 14, 2021 (Week 9)
    1. Complete the remaining tasks and implement the final badge service.
    2. Code evaluation with mentors
    3. Documentation
    4. Testing
    5. Update my blog posts on the Bi-weekly reports page.
  • August 16, 2021 to August 23, 2021 (Week 10)
    1. Correct the code and documentation after evaluation by mentors.
    2. Final phase
  • August 23, 2021 to August 30, 2021 (Week 11)
    1. Final mentor evaluation
  • August 31, 2021 (Week 12)
    1. Results announced

Participation

  • I have joined the Zulip chat for Wikimedia.
  • I’ll regularly update my progress on Phabricator.
  • I’ll regularly communicate with my mentors and ask for help as required.
  • I’ll submit my periodic blog posts on the Bi-weekly reports page.
  • I will publish the final summary of my work at the end of my coding period on my blog.
  • I’m going to be online on Zulip and active on Gerrit during working hours.

About Me

Tell us about a few:

  • Your education (completed or in progress)

I am currently pursuing a Bachelor of Technology in Computer Science at Maharaja Agrasen Institute of Technology, New Delhi, India, and I’m in my sophomore year.

  • How did you hear about this program?

I heard about this program at an “Introduction to Open Source” workshop at my college. One of my college seniors talked about participating in this program and in open source, which inspired me to explore the world of open source.

  • Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?

I’ll be able to dedicate 40 hours per week to this project, and more if required.
Since this semester is fully online and my college classes are early in the morning, I can dedicate the rest of my time to this project. During my summer vacation, I’ll dedicate all my time to this project only.

  • We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

I’m only applying to Google Summer of Code, with one project under the Wikimedia organisation.

  • What does making this project happen mean to you?

This project means a lot to me and to the whole Wikimedia community, which is a growing community. I have seen many people form communities to discuss facts and the completion of data in Wikidata. The project I’m working on will fetch resources from external websites, which will make data completion easier and more efficient and will also attract a larger audience.
The badge system will act as a reward system for the people who verify whether a suggested fact is true, which will bring more people into this great community. This project will be life-changing for me because it adds two major features to Wikidata, and building them will be great real-life experience.

Open Source Contribution and Past Experience:

I have worked with JavaScript, C++, Python, C, Azure applications, the Git version control system, Linux, and SQL databases.
I have also contributed to the Python Packaging Authority (PyPA):

https://github.com/pypa/warehouse/issues/9174

Warm Up Tasks Completed:

  • Get familiar with data structures available in Wikidata

I have gone through all the tutorials on Wikidata data structures and understood them completely.

  • Select 3 Wikidata entities and manually find missing facts based on external data sources

All the contributions were made by finding missing facts from external sources and with Recoin:
Pandas: https://www.wikidata.org/wiki/Q15967387
Tensorflow: https://www.wikidata.org/wiki/Q21447895
Matplotlib: https://www.wikidata.org/wiki/Q2985668
After Recoin extracts the data, it goes to the Wikidata Complete APIs, which fetch new facts using the property URL.
This generates external facts. If the user says yes, the database is updated; otherwise it is not.
The badge service can be implemented through the Wikidata account by counting how many contributions a user has made.

  • Set up the MediaWiki development environment

I have set up the MediaWiki development environment in both ways (https://www.mediawiki.org/wiki/MediaWiki-Docker and https://github.com/wmde/wikibase-docker/blob/master/README-compose.md).

Blog Posts

Google Proposal Link

Event Timeline

Hey @Dhairya3124

Thanks for showing your interest in participating in Google Summer of Code with the Wikimedia Foundation! Please make sure to upload a copy of your proposal to Google's program site as well, in whatever format is expected of you, and include in it this public proposal on Phabricator before the deadline, i.e. April 13th. Good luck :)

@Gopavasanth I have also uploaded the final pdf for submission. :)

GSoC application deadline has passed. If you have submitted a proposal on the GSoC program website, please visit https://phabricator.wikimedia.org/project/view/5104/ and then drag your own proposal from the "Backlog" to the "Proposals Submitted" column on the Phabricator workboard. You can continue making changes to this ticket on Phabricator and have discussions with mentors and community members about the project. But, remember that the decision will not be based on the work you did after but during and before the application period. Note: If you have not contacted your mentor(s) before the deadline and have not contributed a code patch before the application deadline, you are unfortunately not eligible. Thank you!

Mentors, I have started a blog post series about what I have learned about SPARQL queries. Please share your reviews and feedback on it :).