
Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia
Closed, Declined · Public

Description

Profile Information

Name: Ekemini Okpongkpong
GitHub: https://github.com/KemmieKemy
Medium: https://medium.com/@kemmie
Location (State, Country): Akwa Ibom, Nigeria
Time Zone: UTC+01:00 (West Central Africa)
Typical working hours (include timezone): 9 AM to 4 PM (UTC+01:00)

Synopsis
Wikimedia is a global movement whose mission is to bring free educational content to the world, via Wikipedia and other projects.
The aim of the project is to create Machine Learning datasets to measure content reliability on Wikipedia, with the following objectives:

  • Research, gather and process Wikipedia-related data about article content reliability.
  • Detect the crowd-generated tags or labels that Wikipedia editors and developers currently use to signal problems with content integrity to other editors.
  • Characterize the tagged content and select the most relevant tags.
  • Create machine-readable datasets that could allow ML systems to detect problematic content, potentially automatically.
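To illustrate the template-detection and dataset-creation objectives, here is a minimal Python sketch, assuming the wikitext of each article section has already been downloaded. The regular expression, the helper names (`normalize`, `find_templates`, `label_section`) and the template list are hypothetical choices made for illustration, not an existing tool or the final project method; real wikitext would likely need a proper parser such as mwparserfromhell.

```python
import re
from collections import Counter

# Hypothetical list of content-reliability templates, for illustration only.
# The real list would come out of the template-exploration step of the timeline.
RELIABILITY_TEMPLATES = {"citation needed", "unreliable source?", "pov"}

# Matches the opening of a template call such as "{{Citation needed|date=May 2020}}"
# and captures the template name up to the first "|" or the closing "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def normalize(name):
    """Normalize a template name. Lowercasing the whole name is a simplification:
    MediaWiki only treats the first letter as case-insensitive."""
    return name.replace("_", " ").strip().lower()


def find_templates(wikitext):
    """Count how often each (normalized) template name appears in the wikitext."""
    return Counter(normalize(m.group(1)) for m in TEMPLATE_RE.finditer(wikitext))


def label_section(title, wikitext):
    """Build one dataset row: the reliability templates found in a section
    and a binary label (1 if at least one of them is present)."""
    counts = find_templates(wikitext)
    flags = sorted(name for name in counts if name in RELIABILITY_TEMPLATES)
    return {"section": title, "templates": flags, "label": int(bool(flags))}


if __name__ == "__main__":
    text = "Paris is large.{{Citation needed|date=May 2020}} See {{Main|France}}."
    print(label_section("History", text))
    # {'section': 'History', 'templates': ['citation needed'], 'label': 1}
```

The binary label is only one possible design; keeping the list of matched templates in each row would also allow per-template labels later.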

Outreachy Internship Project Timeline

| Period | Task |
| --- | --- |
| November 24th to December 1st | Community Bonding Period |
| December 2nd to December 9th | Gather and process Wikipedia-related data about article content reliability |
| December 10th to December 17th | Understand the concept of semi-automated methods |
| December 18th to December 24th | Explore the space of templates related to content integrity using semi-automated methods |
| December 25th to January 2nd | Document progress in the first month |
| January 3rd to January 10th | Download and process articles and sections with the found templates |
| January 11th to January 18th | Download and process articles and sections with the found templates |
| January 19th to January 26th | Test the datasets by running different ML algorithms |
| January 27th to February 2nd | Document progress in the second month |
| February 3rd to February 10th | Analyze the data to summarize the main statistics |
| February 11th to February 18th | Produce the statistics and data properties |
| February 19th to February 26th | Visualize the statistics and data properties |
| February 27th to March 2nd | Finalize and review the project, document overall Outreachy experience |

About Me
I graduated from Akwa Ibom State University, Nigeria with a Bachelor of Science (B.Sc) degree in Computer Science.
My open source journey began this year during the Women of Open Source Community Africa (WOSCA) Launch Practical Session, where I made my first open source contribution on GitHub to the First Contributions repository. Then, during Hacktoberfest, I successfully completed the Hacktoberfest Challenge by making four pull requests, and I also made one of my GitHub repositories open source so that other developers could contribute to it.
I first learnt Data Science online by following courses on Dataquest and DataCamp. I also participated in the first cohort of She Code Africa's Mentoring Program (Data Science track), where I was paired with a mentor working professionally in the field, who gave me weekly tasks and resources. I also write articles on my Medium blog.
I believe that this knowledge and these skills will be valuable to this project.

Event Timeline

Hi and welcome @KemmieKemy! Please see and follow https://www.mediawiki.org/wiki/Outreachy/Participants - thanks!
Also, Outreachy round 20 ended two months ago. I assume this is about Outreachy round 21? Is this a proposal for T263860?

Thank you, Sir. Yes, it's a draft of the proposal for T263860.
I am still working on it.

KemmieKemy renamed this task from Outreachy '20 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia to Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia. (Oct 26 2020, 12:27 PM)

Thanks; setting as subtask in that case. (PS: No "sir" or other assumptions on gender needed here.)

@KemmieKemy thanks for submitting. You are making great progress.

My main recommendation (I will add this on the main task too) is to keep the Notebook as easy to read as possible. That means:

  • Explain each piece of code that you are running. The idea is to make the notebook easy to understand by any reader.
  • Describe your motivation and conclusions for every statistic you show. For example, why are you plotting variable X or Y, and what are your takeaways/conclusions?
  • Avoid long/repetitive code outputs that don't give information. For example, if you are applying a model that runs 1000 epochs, avoid printing 1000 lines, one per epoch, because that makes the notebook difficult to read. If you think that there is relevant information in those outputs, think about how to show it in a way that is compact and easy to understand (for example, a plot).

@diego Thank you for this recommendation, but I really wanted to know if the timeline is okay so that I can make my final submission on the Outreachy website.

Got you. Yes, looks good, please add it to the Outreachy application.

Okay. Thank you, I will.

For more details on the timeline recommendations please check Isaac's comment here: T263874#6589856

Gopavasanth subscribed.

@KemmieKemy We are sorry to say that we could not allocate a slot for you this time. Please do not consider the rejection to be an assessment of your proposal. We received over 28 quality applications, and we could only accept 7 interns. We were not able to give a slot to every applicant who would have deserved one, and these were some very tough decisions to make. Please know that you are still a valued member of our community and we by no means want to exclude you. Many applicants we did not accept in 2019 have become Wikimedia maintainers, contractors and even Outreachy interns and mentors this year!

Your ideas and contributions to our projects are still welcome! As a next step, you could consider finishing up any pending pull requests or inform us that someone has to take them over. Here is the recommended place for you to get started as a newcomer: https://www.mediawiki.org/wiki/New_Developers.

If you are still eligible for Outreachy next year, we look forward to your participation!