
[Outreachy 2020 - 2021 Proposal] Analyze community authored functions that build Wikipedia infoboxes and more
Closed, Declined · Public

Description

Profile Information

Name : Udokaku Ugochukwu
Email: UdokakuUgochukwu@gmail.com
Github: UdokaVrede
IRC nickname on Freenode : Udokaku Ugochukwu
Location (country or state) : Rivers, Nigeria
Time Zone : (UTC + 01:00) West Central Africa
Typical working hours (include your timezone): 10 AM - 6 PM (UTC + 01:00) West Central Africa

Project Task: T263678

Synopsis

Wikipedia’s vision is a world in which everyone can share in the sum of all knowledge. This goal has not been fully achieved, because it requires content to be available in many language editions of Wikipedia. More contributors would mean more content for more people in their own languages.
At the time of writing, Wikipedia has about 54 million articles written in 309 different languages, but information diverges between the different language editions, which could be due to:

  • The ideological bias of contributors.
  • Communities with fewer contributors and more content to handle, which need more time to provide content.

Wikipedia proposed an architecture for a system that would bring solutions to these existing problems by breaking down its goal into 2 projects:

  • Abstract Wikipedia: a project that provides natural-language-independent encyclopedic content for items in Wikipedia in abstract notation. It allows for creating, editing and maintaining content.
  • Wikilambda: a project that houses the functions, algorithms and linguistic knowledge required for encoding abstract content from Abstract Wikipedia into natural language. Abstract Wikipedia is therefore built on Wikilambda.

The creation of these projects will lead to the creation of an open, widely usable, well tested natural language generation library on the Web, covering many different languages.
The goal of Abstract Wikipedia is to let more people share more knowledge in more languages. Abstract Wikipedia is an extension of Wikidata. It requires functions that take abstract content as input and return natural language text as output. These functions are expected to work across many languages, which requires a rich environment for creating and maintaining them, while allowing more people to contribute knowledge and reach more readers with their contributions, whatever their language background.
Wikipedia info-boxes are tools placed at the top-right corner of an article in desktop view, and at the top in mobile view, that give an overview or summary of the information in the article.
On Wikipedia, an info-box is transcluded into an article by enclosing its name and attribute–value pairs within a double set of braces. The MediaWiki software on which Wikipedia operates then parses the document, and the info-box and other templates are processed by a template processor. This is a template engine that produces a web document and a style sheet used for presentation of the document.
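
To make the double-brace syntax concrete, here is a minimal Python sketch that splits a flat info-box invocation into its name and attribute–value pairs. It is an illustration only, not how MediaWiki's template processor works: the function name `parse_infobox` and the sample wikitext are hypothetical, and nested templates or links containing `|` are deliberately not handled.

```python
from typing import Dict, Tuple


def parse_infobox(wikitext: str) -> Tuple[str, Dict[str, str]]:
    """Split a flat {{Name | key = value | ...}} invocation into name and parameters.

    Simplifying assumption: no nested templates and no '|' inside values.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template invocation")
    # Drop the enclosing double braces, then split on the parameter separator.
    parts = [part.strip() for part in text[2:-2].split("|")]
    name = parts[0]
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # keep only named (key = value) parameters
            params[key.strip()] = value.strip()
    return name, params


# Hypothetical sample wikitext, made up for this illustration.
sample = """{{Infobox person
| name = Example Person
| occupation = Editor
}}"""

name, params = parse_infobox(sample)
print(name)
print(params)
```

A production analysis would more likely rely on a dedicated wikitext parser, since real info-boxes routinely nest other templates inside their values.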

This project aims to analyze the community-authored functions that build Wikipedia info-boxes and to centralize them. This serves the purpose of Abstract Wikipedia: to create and maintain the content of Wikipedia for all languages only once, then render it in the reader's preferred natural language using Wikilambda functions. This would lead to more coverage and accuracy across articles, and make more knowledge from more contributors available to readers. To achieve this, I would:

  1. Fetch the different community authored functions on the wikis, and determine their usage in articles and how many page views use each community authored function.
  2. Analyze the similarity between the community authored functions hosted across different projects, highlighting redundant or similar code.
  3. Determine whether there are segments of code that can be turned into pure functions in the wiki of functions.
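
As a rough illustration of step 2, the sketch below compares module sources line by line with Python's standard `difflib.SequenceMatcher` and reports pairs above a similarity threshold. Everything here is an assumption made for the example: the helper names, the 0.8 threshold, the sample Lua sources and the wiki-prefixed keys are hypothetical, and the comment stripping is naive (it ignores Lua block comments and `--` inside strings).

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(source):
    """Return non-empty code lines with Lua '--' line comments removed (naive)."""
    lines = []
    for line in source.splitlines():
        code = line.split("--", 1)[0].strip()
        if code:
            lines.append(code)
    return lines


def similarity(source_a, source_b):
    """Line-based similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(source_a), normalize(source_b)).ratio()


def similar_pairs(modules, threshold=0.8):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for (name_a, src_a), (name_b, src_b) in combinations(sorted(modules.items()), 2):
        score = similarity(src_a, src_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical module sources, made up for this illustration.
modules = {
    "enwiki:Module:Example": "local p = {}\nfunction p.hello()\n  return 'Hello'\nend\nreturn p\n",
    "dewiki:Modul:Example": "local p = {}\n-- copied from enwiki\nfunction p.hello()\n  return 'Hello'\nend\nreturn p\n",
    "enwiki:Module:Other": "local q = {}\nfunction q.sum(a, b)\n  return a + b\nend\nreturn q\n",
}

for name_a, name_b, score in similar_pairs(modules):
    print(name_a, name_b, round(score, 2))
```

Comparing every pair is quadratic in the number of modules, so a real run over all wikis would probably need a cheaper pre-filter, for example hashing normalized sources to find exact copies first.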
Current State

Currently, community-authored programming functions are used on different language editions of Wikipedia. To reuse one, users must go through the tedious process of searching for the function and then copying and pasting it into their own language edition. This process is error-prone and can lead to code duplication. Worse, improvements to a function on one language edition may never make their way to other language editions: they are reflected only in the improved copy, not automatically across all Wikipedias.

Why is your proposed solution an improvement to the current solution?

The redundant code that currently exists can reduce reliability and maintainability across functions, and might affect performance, mostly in cases of dead code. This solution, which covers analyzing and refactoring duplicate code, would improve software metrics such as lines of code and cyclomatic complexity, and lead to shorter compilation times, lower cognitive load, fewer human errors from copy-pasting code, and less dead code.
The proposed solution will:

  • Refactor the community functions into a harmonious whole, provide more efficient ways of contributing to and maintaining Wikipedia functions and, in conjunction with the Wikilambda functions, deliver high-quality content globally, as opposed to the current solution with the challenges stated in the Current State section above.
  • Reduce the burden on contributors, especially in communities with fewer contributors, as they would no longer have to go through the manual process of finding, copying and pasting functions to provide content.
  • Eliminate divergence of information across Wikipedia content.
  • Provide centralized functions for use across Wikipedia content.
  • Provide higher-quality, consistent and accurate articles for more people around the world.
  • Help in achieving a significant part of the goal of Abstract Wikipedia.

How will the proposed project benefit Wikimedia projects?

This is a work-in-progress project being developed to achieve a Wikipedia in which everyone can share in the sum of all knowledge. Once developed, it will help ensure high-quality, consistent and accurate content across Wikimedia projects, making more content available to more people around the world regardless of their language background. Functions would be accessed from a central point by all Wikimedia projects, which would make all community-authored functions easier to maintain and accurate content easier to access.

Do you see any risks/concerns involved in implementing the planned features?

Centralizing community-authored functions creates a single point of failure, in place of hundreds of independent Wikipedia functions.
Given the number of functions we need to gather information from, the major problem I see has to do with latency.

About Me

I have a National Diploma in Computer Science from Federal Polytechnic Nekede, Owerri, Nigeria.
My journey in tech started during the lockdown, when I applied for a training with She Code Africa in April 2020. As a requirement, I created a GitHub account for the first time and started familiarizing myself with the environment. In May 2020, I was accepted into the She Code Africa program to learn Python for data science. This was my first try and it was a turning point for me: understanding more about tech, listening to people share their success stories and writing simple programs piqued my interest and challenged me to go further. The program lasted 3 months, during which I went through a series of trainings and attended workshops and webinars. During the program, I was introduced to W.O.S.C.A, a women's initiative that introduces women in Africa to open source. At the time, I could not participate in their activities as I was short of resources. By mid-August I had the resources to kick-start my open source journey, and by August 24th I had made my first open source contribution to Layer 5, an organization centered around cloud native computing.
The training was a complete rebranding process for me, from which I learnt a lot. I believe that the knowledge gained from this training, personal projects and other open source contributions will be beneficial to this project, and I look forward to being accepted as an intern for it.

Past Experience
  • Datasist: a Python package providing a fast, abstracted interface to popular and frequently used functions or techniques relating to data analysis, visualization, data exploration, feature engineering, Computer, NLP, Deep Learning, modeling and model deployment. Here is a link to my merged contribution on this project.
  • I also wrote a simple calculator using Python and tkinter in the JamBot3000 repository on GitHub.
  • I participated in a hackathon, on a project for getting information about countries, nearby events and their weather conditions. My contribution was writing a script to fetch weather data from an API, and here is a link to my code.
  • Contributions I made to some other open source projects can be viewed here.

Aside from making open source contributions, I am a Python developer and technical writer with experience in HTML, CSS, Python and Git, and I am currently acquiring skills in data science.
Birthdays are some of my favorite days, which motivated me to design a desktop application, “CREST”, that keeps track of birthdays. In designing this application, I made use of Python, SQLite3 for the database and the tkinter library for the GUI.
Currently, I am a member of the Eddie Jaoude community, a community that gives people an excellent hand-holding onboarding experience into open source.
I am a member of Open-source Community Africa (OSCA), Facebook Developers Community(DevC) and Andela Learning Community (ALC).

Deliverables
PERIOD | TASK
November 1st to November 23rd | Within this period, I intend to continue contributing to tasks on Wikimedia. I would also continue honing my skills in Python and data analysis; my progress on this could be tracked here.
December 1st - December 7th | Community Bonding period. Get to know more about the community and also implement the guidelines for the community bonding period on MediaWiki.
December 8th - December 15th | Get more information on the project and how the tasks are expected to be done. Access the community-authored functions. List the community-authored functions to be analyzed. Ask questions about the confusing parts of the features to be implemented and get information on any additional task that might be needed. Lay out plans and designs for the final implementation and share them with my mentor to get feedback.
December 16th - January 5th | Fetch the different community-authored functions on the wikis that build the info-boxes, and other community functions. Identify their usage in articles. Analyze the functions for redundant and dead code.
January 6th - January 19th | Work on the feature to count the page views that use each community-authored function and send in my report for review.
January 20th - February 9th | Analyze the similarity between the community-authored functions hosted across different projects on Wikipedia.
February 10th - February 23rd | Following the guide on building Wikipedia bots, build a bot that highlights and returns redundant or similar code within the community-authored functions, then seek feedback from my mentor on the next steps to take with my findings if redundant code exists.
February 24th - March 2nd | Analyze and determine segments of code to be turned into pure functions in the wiki of functions. Finalize, review, organize, get feedback and wrap up the project.
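
One possible way to approach the page view counting task is sketched below in Python, assuming the public Wikimedia Pageviews REST API (per-article endpoint) is used as the data source. The proposal does not say which data source would be used, so this is an assumption. The helper names, the usage mapping and all view counts are hypothetical, and the HTTP request itself is left out so that the example stays self-contained.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, title, start, end):
    """Build a per-article URL for monthly views by human users (dates as YYYYMMDD)."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{project}/all-access/user/{article}/monthly/{start}/{end}"


def total_views(response):
    """Sum the 'views' field over all items of a decoded JSON response."""
    return sum(item["views"] for item in response.get("items", []))


def views_per_module(usage, views_by_article):
    """Aggregate article views for each module, given module -> article titles."""
    return {
        module: sum(views_by_article.get(title, 0) for title in titles)
        for module, titles in usage.items()
    }


# Made-up numbers, for illustration only.
sample_response = {"items": [{"views": 3}, {"views": 4}]}
usage = {"Module:Infobox": ["Article A", "Article B"], "Module:Other": ["Article B"]}
views_by_article = {"Article A": 10, "Article B": 5}

print(pageviews_url("en.wikipedia", "Albert Einstein", "20201001", "20201031"))
print(total_views(sample_response))
print(views_per_module(usage, views_by_article))
```

Requesting one URL per article would be slow for widely used modules, which ties in with the latency concern raised earlier in this proposal.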
Other Deliverables
  • Blog post on my progress every week
  • Regular communication with my mentor(s) and other community members
  • Attend program-related meetings.
  • Follow any guidelines to get involved with the community and process.
Participation
  • I will be online on IRC in my working hours (10 am - 6 pm UTC+1) to collaborate with mentors and community members.
  • I will use Phabricator as well as GitHub for managing bugs and subtasks.
  • I will be available on Gmail to be contacted when needed in the non-working hours.
  • I would keep effective communication with my mentor during the internship period.
  • Wikimedia contributions

Microtask

  • Fetched all of the source code on English Wikipedia in the "Module:" namespace.
  • Generated a summary report.
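
For readers unfamiliar with this kind of fetch, the Python sketch below shows one way to enumerate pages in the Module namespace (namespace ID 828) through the MediaWiki Action API's `list=allpages` query, following the API's continuation tokens. It is not the code written for the microtask, which is not shown on this page. The function names are hypothetical, and the HTTP call is injected as a `fetch` callable so the paging logic can be exercised with a stub.

```python
def allpages_params(namespace=828, limit=500, cont=None):
    """Query parameters for list=allpages; 828 is the Module namespace."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": limit,
    }
    if cont:
        params.update(cont)  # carry over the API's continuation tokens
    return params


def iter_module_titles(fetch):
    """Yield page titles, following 'continue' blocks until the listing ends.

    `fetch` is any callable that takes the parameter dict and returns the
    decoded JSON response as a dict.
    """
    cont = None
    while True:
        data = fetch(allpages_params(cont=cont))
        for page in data["query"]["allpages"]:
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break


# Stub standing in for the HTTP request, returning two made-up batches.
def fake_fetch(params):
    if "apcontinue" not in params:
        return {
            "query": {"allpages": [{"title": "Module:A"}]},
            "continue": {"apcontinue": "B", "continue": "-||"},
        }
    return {"query": {"allpages": [{"title": "Module:B"}]}}


titles = list(iter_module_titles(fake_fetch))
print(titles)
```

Against the live site, `fetch` could wrap an HTTP GET to `https://en.wikipedia.org/w/api.php` with these parameters. Retrieving each module's source would take a further request per page or batch.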
Other Wikimedia Contributions

Pywikibot: T265128 - Replaced all occurrences of “basestring” with “str” and migrated from epytext type hints to annotation type hints.

Pywikibot: T264721 - Rewrite scripts using new option handler.

What does making this project happen mean to you?

Wikimedia aims to make content available to everyone around the globe. I look forward to being accepted to this internship round with this Wikimedia project. Contributing to it would mean doing my part in making content available to people around the world. It would be an elating and rewarding experience to know that I helped create solutions that provide valuable content to people around the world and advance the Wikimedia vision.
Career-wise, this project would:

  • Expose me to a well-structured organizational procedure and to different patterns for achieving goals.
  • Let me meet and build a strong network with more experienced software engineers, broadening both my knowledge and my skills.
  • Accelerate my growth and experience in the technology industry.

Details

Due Date
Sat, Oct 31, 4:00 PM

Event Timeline

Restricted Application added a subscriber: Aklapper. (Oct 23 2020, 11:57 AM)
Udoka_Ugo set Due Date to Sat, Oct 31, 12:00 AM. (Oct 23 2020, 11:59 AM)
Udoka_Ugo updated the task description.
Udoka_Ugo changed Due Date from Sat, Oct 31, 12:00 AM to Sat, Oct 31, 4:00 PM.
Udoka_Ugo removed a subscriber: Aklapper.
Udoka_Ugo updated the task description. (Oct 23 2020, 12:04 PM)
Udoka_Ugo updated the task description. (Oct 28 2020, 2:35 PM)
Udoka_Ugo renamed this task from [Outreachy 2021 Proposal] Analyze community authored functions that build Wikipedia infoboxes and more to [Outreachy 2020 - 2021 Proposal] Analyze community authored functions that build Wikipedia infoboxes and more. (Fri, Oct 30, 2:14 PM)
Udoka_Ugo updated the task description. (Fri, Oct 30, 4:07 PM)
Udoka_Ugo updated the task description.
Udoka_Ugo updated the task description. (Sat, Oct 31, 2:20 PM)
Gopavasanth closed this task as Declined. (Thu, Nov 26, 6:05 AM)
Gopavasanth added a subscriber: Gopavasanth.

@Udoka_Ugo We are sorry to say that we could not allocate a slot for you this time. Please do not consider the rejection to be an assessment of your proposal. We received over 28 quality applications, and we could only accept 7 interns. We were not able to give all applicants a slot that would have deserved one, and these were some very tough decisions to make. Please know that you are still a valued member of our community and we by no means want to exclude you. Many interns who we did not accept in 2019 have become Wikimedia maintainers, contractors and even Outreachy interns and mentors this year!

Your ideas and contributions to our projects are still welcome! As a next step, you could consider finishing up any pending pull requests or inform us that someone has to take them over. Here is the recommended place for you to get started as a newcomer: https://www.mediawiki.org/wiki/New_Developers.

If you would still be eligible for Outreachy next year, we look forward to your participation!