Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.
Closed, Resolved, Public

Assigned To

None

Authored By

	Miriam
	Sep 25 2020, 4:39 PM

Description

Brief summary

The project consists of researching, gathering, and processing Wikipedia data about the reliability of article content, and of detecting the crowd-generated tags and labels that Wikipedia editors and developers currently use to signal content integrity problems to other editors. Many Wikipedia templates and tools are used today to label potentially bad content, but they are usually not machine-friendly. In this project, we will characterize these signals, select the most relevant ones, and create machine-readable datasets that will allow ML systems to detect problematic content, potentially automatically. During the project, we will also test those datasets by running different ML algorithms, which will serve as baselines for future researchers.
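As a rough illustration of what "machine readable" could mean here, the sketch below scans an article's wikitext for a few maintenance templates and emits one labeled row per page. The template list, the function names, and the row layout are assumptions made for this example only; they are not the project's actual pipeline or final label set.

```python
import re

# Illustrative subset of maintenance templates (an assumption for this sketch).
RELIABILITY_TEMPLATES = {"citation needed", "pov", "unreferenced", "disputed"}

# Matches {{Name}} or {{Name|params}}. Templates that contain other templates
# in their parameters are not matched as a whole by this simple pattern.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def find_reliability_templates(wikitext):
    """Return the sorted names of known maintenance templates in wikitext."""
    found = set()
    for match in TEMPLATE_RE.finditer(wikitext):
        # Normalize case and underscores so "Citation_needed" matches too.
        name = match.group(1).strip().lower().replace("_", " ")
        if name in RELIABILITY_TEMPLATES:
            found.add(name)
    return sorted(found)


def to_row(page_title, wikitext):
    """Build one dataset row (hypothetical layout) for a page."""
    tags = find_reliability_templates(wikitext)
    return {"page_title": page_title, "templates": tags, "has_issue": bool(tags)}


sample = "{{POV}}\nThe sky is green.{{Citation needed|date=May 2020}}"
print(to_row("Example", sample))
```

Rows like these could then be written to CSV or JSON lines and used as labels for baseline classifiers.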

Skills required

Python, SQL
Basic data analysis skills
Plus: data visualization skills

Possible mentor(s)

@Miriam @diego

How to Apply?

Please check the instructions in the following task: T263874. Detailed submission instructions are included in the task as well.

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		None	T263860 Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.
		Declined		None	T266426 Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia

Event Timeline

Miriam created this task.Sep 25 2020, 4:39 PM

Restricted Application added a subscriber: Aklapper.Sep 25 2020, 4:39 PM

Miriam changed the edit policy from "All Users" to "acl*outreachy-mentors (Project)".Sep 25 2020, 4:39 PM

Aklapper changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)".Sep 25 2020, 4:48 PM

Aklapper changed the edit policy from "acl*outreachy-mentors (Project)" to "All Users".

Miriam renamed this task from Insert project title here to Outeachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..Sep 25 2020, 4:50 PM

srishakatux moved this task from Backlog to Featured Projects on the Outreachy (Round 21) board.Sep 25 2020, 8:28 PM

Pavithraes subscribed.Sep 26 2020, 10:06 AM

Miriam renamed this task from Outeachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia. to Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..Sep 28 2020, 5:15 PM

FYI -- you can direct them to T263874 for the microtask

Miriam updated the task description.Oct 7 2020, 4:00 PM

Isaac changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)".Oct 7 2020, 4:30 PM

He7d3r subscribed.Oct 7 2020, 4:49 PM

Lisasiziba subscribed.Oct 7 2020, 6:00 PM

Countess_Olufunmi subscribed.Oct 7 2020, 8:18 PM

Hey! My name is Liz, an Outreachy applicant. I am excited to get started on this project. What do I need to set up and where can I start?

Hello @Miriam. I am pleased to be here. I am an Outreachy applicant. Can you kindly guide me to where I can start contributing?

AlexGP subscribed.Oct 7 2020, 9:21 PM

Hello, I'm Bimie, really excited to be here as an Outreachy applicant. Please, where do I start?

Hi @Bimie_babs ! I am an Outreachy applicant as well. There are instructions in this task

0xkaywong subscribed.Oct 8 2020, 3:44 AM

Sidrah_M_Siddiqui subscribed.Oct 8 2020, 4:36 AM

Hi everyone, I'm excited to be here as an Outreachy applicant. Please how do I start contributing?

In T263860#6527375, @LiviaCavalcanti wrote:

Hi @Bimie_babs ! I am an Outreachy applicant as well. There are instructions in this task

Hey @KemmieKemy, please refer to the link in the comment above. :>
Basically we'll have to complete microtask T263874 in order to be considered for this project.

@0xkaywong Okay, Thank you.

Hello, I am Lisa and I am super excited to be an applicant. Please show me where to start with the contributions.

Hi @Lisasiziba
Please check the instructions here T263874.

@diego thank you

Sakshi_Priya subscribed.Oct 8 2020, 10:25 AM

Miriam updated the task description.Oct 8 2020, 11:24 AM

Miriam updated the task description.Oct 8 2020, 11:28 AM

Perfect day! My name is Thulie. I'm new to open source and an Outreachy applicant. I am excited to start learning and to make my first ever contribution, and I hope to learn and interact with you 🤗

Hi and welcome! Please see the task description and follow https://www.mediawiki.org/wiki/Outreachy/Round_21 - thanks!

Thanks

Hello everyone, I'm Abhipsha, an Outreachy applicant in this cohort. I was really excited to see an ML project in the projects list, so I look forward to contributing, interacting with everyone, and learning a lot about open source. Cheers! 😄

Miriam updated the task description.Oct 8 2020, 2:55 PM

Seen

Isaac mentioned this in T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.Oct 8 2020, 3:00 PM

Hi everyone! I am Tanya, an Outreachy applicant. I look forward to contributing, interacting with everyone, and learning about open source.

Divvya24 subscribed.Oct 8 2020, 7:45 PM

Divvya24 unsubscribed.

Divvya24 subscribed.

Hello everybody! This is Divya, an Outreachy applicant. I am really excited to work with open source projects and am looking forward to gaining ample experience with the Wikimedia community.

Hi everyone, I am Ashmita, an Outreachy applicant. I am a data science student and am looking forward to contributing and learning.

Hi! I am an Outreachy applicant, I am really glad to be here and excited to get started.

Thulieblack added a comment.Oct 9 2020, 1:43 PM

This comment was removed by Thulieblack.

Hi everyone, I'm Anna, an Outreachy applicant. Nice to meet you all

Hi everyone! I'm Jocelyne, an Outreachy applicant. I am looking forward to making contributions and learning a lot throughout this project. Pleased to meet everyone!

Hello everyone and fellow Outreachy applicants.
My name is Sébastien and I am excited to learn along with y'all.

Regarding task T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data: if, like me, you are having problems downloading the notebook by appending ?format=raw as explained in the instructions, just copy and paste the raw text into a text file and save it as a .ipynb file. Then, in PAWS, press Upload and select that file.

Ciao guys. Looking forward

Hey, my name is Onesha Sappleton. I am an Outreachy applicant and I would like to contribute; however, I am not seeing the task that needs to be completed so I can get accepted. I am therefore asking for assistance, if that is possible. I need a little guidance on where I should start with the project. Thank you.

Hi @Miriam, I am an Outreachy applicant, excited to be a part of this project. Are there any additional steps before I start with the notebooks?
Also, I see that T263874 is the same for both the inferring country project and this project. Can I be given some clarification as to which project I will be working on when completing the subtask? Or are they part of the same project?

SIBGHAsheikh subscribed.Oct 13 2020, 4:35 AM

In T263860#6537677, @Sappleton101 wrote:

Hey, my name is Onesha Sappleton. I am an Outreachy applicant and I would like to contribute; however, I am not seeing the task that needs to be completed so I can get accepted. I am therefore asking for assistance, if that is possible. I need a little guidance on where I should start with the project. Thank you.

Hi @Sappleton101 , many thanks for reaching out! Please check the instructions in the following task: T263874. Detailed submission instructions are included in the task as well.

In T263860#6537687, @tanny411 wrote:

Hi @Miriam, I am an Outreachy applicant, excited to be a part of this project. Are there any additional steps before I start with the notebooks?
Also, I see that T263874 is the same for both the inferring country project and this project. Can I be given some clarification as to which project I will be working on when completing the subtask? Or are they part of the same project?

Hi @tanny411 thanks for your interest in this project! This project and the "inferring country" one are 2 separate projects. Only the application task is shared by both projects, as all of us work in the same team, and the skills required for both projects are similar. When you complete the task, you will be submitting your task as part of your application to this (or the other) project. If you apply to this project, @diego and I will be reviewing your task as part of your application. Does that make sense?

Thanks @Miriam, makes sense :D

Hey
My name is Liz, an Outreachy applicant. Has anyone here been able to work with the page table dump without running out of memory? I would appreciate some tips :)

YemiKifouly subscribed.Oct 19 2020, 12:34 AM

mrlucasrib subscribed.Oct 20 2020, 2:09 PM

Isaac mentioned this in T266180: Request increased quota for wmf-research-tools Cloud VPS project.Oct 21 2020, 6:47 PM

In T263860#6538338, @Miriam wrote:

In T263860#6537687, @tanny411 wrote:

Hi @Miriam, I am an Outreachy applicant, excited to be a part of this project. Are there any additional steps before I start with the notebooks?
Also, I see that T263874 is the same for both the inferring country project and this project. Can I be given some clarification as to which project I will be working on when completing the subtask? Or are they part of the same project?

Hi @tanny411 thanks for your interest in this project! This project and the "inferring country" one are 2 separate projects. Only the application task is shared by both projects, as all of us work in the same team, and the skills required for both projects are similar. When you complete the task, you will be submitting your task as part of your application to this (or the other) project. If you apply to this project, @diego and I will be reviewing your task as part of your application. Does that make sense?

Yes it does. So in essence, (I'll just repeat to make sure that I understood well) if an applicant applies to both projects, the task will be reviewed by both sets of mentors.

@Tambe correct!

Aklapper mentioned this in T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia.Oct 26 2020, 11:23 AM

Aklapper added a subtask: T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia.Oct 26 2020, 3:52 PM

Precillieo subscribed.Oct 29 2020, 10:48 AM

Hello everybody!

I have a general recommendation for all of you: keep the notebook easy to read. That means:

Explain each piece of code that you run. The idea is to make the notebook easy to understand. Don't make the reader guess what you were trying to do.
Describe your motivation and conclusions for every statistic you show. For example, why are you plotting variable X or Y, and what are your takeaways or conclusions?
Avoid long or repetitive code outputs that don't provide relevant information. For example, if you are training a model for 1000 epochs, avoid printing 1000 lines, one per epoch, because that makes the notebook difficult to read. If you think those outputs contain relevant information, think about how to show it in a way that is compact and easy to understand (for example, a plot).
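A minimal sketch of the last point, assuming a toy model (gradient descent on a one-parameter quadratic) in place of a real one. The function names train and plot_losses and the log_every parameter are made up for this illustration and are not part of the application task. The loop records the loss at every epoch but prints only every 100th, and the full history is summarized in a single plot.

```python
def train(num_epochs=1000, log_every=100):
    """Toy gradient descent on f(w) = (w - 3)^2, recording the loss per epoch."""
    w, learning_rate = 0.0, 0.01
    losses = []
    for epoch in range(1, num_epochs + 1):
        loss = (w - 3.0) ** 2
        losses.append(loss)
        # Gradient of (w - 3)^2 with respect to w is 2 * (w - 3).
        w -= learning_rate * 2.0 * (w - 3.0)
        # Print a short progress line only every `log_every` epochs.
        if epoch % log_every == 0:
            print(f"epoch {epoch:4d}  loss {loss:.6f}")
    return losses


def plot_losses(losses):
    """Show the whole training history as one compact figure."""
    import matplotlib.pyplot as plt  # imported here so train() has no plotting dependency

    plt.plot(range(1, len(losses) + 1), losses)
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.yscale("log")
    plt.show()


losses = train()
```

Calling plot_losses(losses) in the next notebook cell, followed by a sentence stating the takeaway, replaces the 1000 lines of output with ten progress lines and one figure.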

@diego thank you for the thoughtful suggestions!

mrlucasrib unsubscribed.Oct 30 2020, 11:46 AM

Sebaucillon unsubscribed.Oct 30 2020, 4:32 PM

Gopavasanth closed subtask T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia as Declined.Nov 26 2020, 6:03 AM

diego mentioned this in T274400: Request creation of research-collaborations-api VPS project.Feb 10 2021, 6:49 PM

Is everything in this project task planned for Outreachy (Round 21) completed? If yes, please consider closing this and other related tasks as resolved. If bits and pieces are remaining, you could consider creating a new task and moving them there.

@srishakatux project finished successfully, more details here: T260566

Outcome: https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia

diego closed this task as Resolved.Apr 3 2021, 1:10 AM
