Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.
Closed, ResolvedPublic

Description

Brief summary

The project consists of researching, gathering, and processing Wikipedia data about the reliability of article content, and of identifying the crowd-generated tags or labels that Wikipedia editors and developers currently use to signal content integrity problems to other editors. Many Wikipedia templates and tools are used today to label potentially bad content, but they are usually not machine friendly. In this project, we will characterize these templates and labels, select the most relevant ones, and create machine-readable datasets that could allow ML systems to detect problematic content automatically. During the project, we will also test those datasets by running different ML algorithms, which will serve as baselines for future researchers.

Skills required

  • Python, SQL
  • Basic data analysis skills
  • Plus: data visualization skills

Possible mentor(s)

@Miriam @diego

How to Apply?

Please check the instructions in the following task: T263874. Detailed submission instructions are included in the task as well.

Event Timeline

Miriam changed the edit policy from "All Users" to "Outreachy Mentors (Project)". Sep 25 2020, 4:39 PM
Aklapper changed the visibility from "Public (No Login Required)" to "Outreachy Mentors (Project)". Sep 25 2020, 4:48 PM
Aklapper changed the edit policy from "Outreachy Mentors (Project)" to "All Users".
Miriam renamed this task from Insert project title here to Outeachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.. Sep 25 2020, 4:50 PM
Miriam renamed this task from Outeachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia. to Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.. Sep 28 2020, 5:15 PM

FYI -- you can direct them to T263874 for the microtask

Isaac changed the visibility from "Outreachy Mentors (Project)" to "Public (No Login Required)". Oct 7 2020, 4:30 PM

Hey! My name is Liz, an Outreachy applicant. I am excited to get started on this project. What do I need to set up and where can I start?

Hello @Miriam. I am pleased to be here. I am an Outreachy applicant. Can you kindly guide me to where I can start contributing?

Hello, I'm Bimie, really excited to be here as an Outreachy applicant. Where do I start, please?

Hi @Bimie_babs! I am an Outreachy applicant as well. There are instructions in this task.

Hi everyone, I'm excited to be here as an Outreachy applicant. Please how do I start contributing?

Hey @KemmieKemy, please refer to the link in the comment above. :>
Basically we'll have to complete microtask T263874 in order to be considered for this project.

Hello, I am Lisa and I am super excited to be an applicant. Please show me where to start with the contributions.

Hi @Lisasiziba
Please check the instructions here T263874.

Good day! My name is Thulie. I'm new to open source and an Outreachy applicant. I am excited to start learning and make my first ever contribution. I hope to learn and interact with you 🤗

Hi and welcome! Please see the task description and follow https://www.mediawiki.org/wiki/Outreachy/Round_21 - thanks!

Hello everyone, I'm Abhipsha, an Outreachy applicant in this cohort. I was really excited to see an ML project in the projects list, so I look forward to contributing, interacting with everyone, and learning a lot about open source. Cheers! 😄

Hi everyone! I am Tanya, an Outreachy applicant. I look forward to contributing, interacting with everyone, and learning about open source.

Hello everybody! This is Divya, an Outreachy applicant. I am really excited to work with open source projects and am looking forward to gaining ample experience with the Wikimedia community.

Hi everyone, I am Ashmita, an Outreachy applicant. I am a data science student and am looking forward to contributing and learning.

Hi! I am an Outreachy applicant, I am really glad to be here and excited to get started.

Hi everyone, I'm Anna, an Outreachy applicant. Nice to meet you all

Hi everyone! I'm Jocelyne, an Outreachy applicant. I am looking forward to making contributions and learning a lot throughout this project. Pleased to meet everyone!

Hello everyone and fellow Outreachy applicants.
My name is Sébastien and I am excited to learn along with y'all.

Regarding task T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data. If, like me, you are having problems downloading the notebook by appending ?format=raw as explained in the instructions, just copy and paste the raw text into a text file and save it as a .ipynb file. Then, in your PAWS notebook, press upload and select that file.

Ciao guys. Looking forward

Hey, my name is Onesha Sappleton. I am an Outreachy applicant and I would like to contribute. However, I am not seeing the task that needs to be completed so that I can be accepted, so I am asking for assistance if that is possible. I need a little guidance on where I should start with the project. Thank you.

Hi @Miriam, I am an Outreachy applicant, excited to be a part of this project. Are there any additional steps before I start with the notebooks?
Also, I see that T263874 is the same for both the "inferring country" project and this one. Can I be given some clarification as to which project I will be working on when completing the subtask? Or are they part of the same project?

Hi @Sappleton101 , many thanks for reaching out! Please check the instructions in the following task: T263874. Detailed submission instructions are included in the task as well.

Hi @tanny411 thanks for your interest in this project! This project and the "inferring country" one are 2 separate projects. Only the application task is shared by both projects, as all of us work in the same team, and the skills required for both projects are similar. When you complete the task, you will be submitting your task as part of your application to this (or the other) project. If you apply to this project, @diego and I will be reviewing your task as part of your application. Does that make sense?

Hey,
My name is Liz, an Outreachy applicant. Has anyone here been able to work with the page table dump without running out of memory? I would appreciate some tips :)
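
On the memory question above, one common approach is to stream the compressed dump line by line instead of loading it whole. The following is only a minimal sketch under assumptions: the dump is taken to be a gzip-compressed, MySQL-style SQL file whose data rows sit on lines starting with `INSERT INTO`, and `count_insert_statements` is a hypothetical helper that is not part of the application task.

```python
import gzip


def count_insert_statements(path, prefix="INSERT INTO"):
    """Stream a gzip-compressed SQL dump and count lines starting with `prefix`.

    Only one line is held in memory at a time, so the full dump is never
    loaded at once. The prefix is an assumption about the dump format.
    """
    count = 0
    # errors="replace" avoids crashing on stray bytes that are not valid UTF-8.
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(prefix):
                count += 1
    return count
```

The same loop can be extended to process each line and keep only the fields needed, so that memory use stays roughly constant regardless of the dump size.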

@Miriam Yes, it does. So in essence (I'll just repeat to make sure that I understood correctly), if an applicant applies to both projects, the task will be reviewed by both sets of mentors.

Hello everybody!

I have a general recommendation to all of you: Keep the notebook easy to read. That means:

  • Explain each piece of code that you are running. The idea is to make the notebook easy to understand. Don't make the reader have to guess what you were trying to do.
  • Describe your motivation and conclusions for every statistic you show. For example, why are you plotting variable X or Y, and what is your takeaway?
  • Avoid long or repetitive code outputs that don't provide relevant information. For example, if you are applying a model that runs for 1000 epochs, avoid printing 1000 lines, one per epoch, because that makes the notebook difficult to read. If you think there is relevant information in those outputs, think about how to show it in a way that is compact and easy to understand (for example, a plot).
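
As an illustration of the last point, here is a minimal sketch of one way to keep training output compact: print only every N-th epoch plus the final one. The `summarize_losses` helper and the synthetic loss values are hypothetical and only serve the example; a plot of the full loss curve would be another reasonable option.

```python
def summarize_losses(losses, every=100):
    """Return (epoch, loss) pairs for every `every`-th epoch plus the last one."""
    picked = [(i + 1, loss) for i, loss in enumerate(losses) if (i + 1) % every == 0]
    # Always include the final epoch so the end state is visible.
    if losses and (not picked or picked[-1][0] != len(losses)):
        picked.append((len(losses), losses[-1]))
    return picked


# Hypothetical training history: 1000 epochs with a steadily shrinking loss.
losses = [1.0 / (epoch + 1) for epoch in range(1000)]

# Four short lines instead of 1000.
for epoch, loss in summarize_losses(losses, every=250):
    print(f"epoch {epoch:4d}  loss {loss:.4f}")
```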

@diego thank you for the thoughtful suggestions!

Is everything in this project task planned for Outreachy (Round 21) completed? If yes, please consider closing this and other related tasks as resolved. If bits and pieces are remaining, you could consider creating a new task and moving them there.

@srishakatux project finished successfully, more details here: T260566