
Not Safe for Work (NSFW) media Classifier for Wikimedia Commons
Closed, Resolved · Public

Description

Brief summary

Wikimedia Commons is an online repository of free-use images, sounds, other media, and JSON files. Anyone can upload media to the Commons portal, and uploads are currently moderated manually by community volunteers. This project aims to build a classifier that can flag NSFW images/media for review.

Upon successful completion of this internship, the intern would have designed, implemented, and tested a machine learning model able to classify image and video media as SFW or NSFW with high accuracy. They would also have the opportunity to deploy the model to Wikimedia test and production servers. In addition, they would build a data processing pipeline and an API for the model.
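
As a rough illustration of what the data processing pipeline could look like, the sketch below builds a PyTorch loader over a local directory of labelled training images and attaches a two-class head to a pretrained backbone. The directory layout, transforms, and choice of backbone are illustrative assumptions, not decisions made in this task.

```
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet-style preprocessing; the exact transforms are placeholders.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: data/sfw/*.jpg and data/nsfw/*.jpg.
# ImageFolder maps each subdirectory name to a class label.
dataset = datasets.ImageFolder("data", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

# One common starting point: a pretrained backbone with a fresh 2-class head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
```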

Since this is a from-scratch project, applicants are required to do some initial research: a basic comparison of existing NSFW classifiers along with their computational requirements. All applicants are expected to read relevant research papers and draw comparisons between them.
They are expected to produce a report detailing their research, the options available for implementing the model, and what they propose to do if selected. The report should also detail implementation methods and procedures.

Skills required

  • Python
  • Tensorflow / PyTorch for model creation
  • Flask / Django / FastAPI for the API (see the serving sketch below)
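
To make the API deliverable concrete, here is a minimal serving sketch using FastAPI and PyTorch. The checkpoint path, endpoint name, class index, and review threshold are placeholders chosen for illustration only.

```
import io

import torch
from fastapi import FastAPI, File, UploadFile  # file uploads also need python-multipart
from PIL import Image
from torchvision import models, transforms

app = FastAPI()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical fine-tuned 2-class checkpoint saved to model.pt.
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

@app.post("/classify")
async def classify(file: UploadFile = File(...)):
    # Decode the uploaded image and run a single forward pass.
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    # Index 1 is assumed to be the NSFW class; flag for human review above 0.8.
    nsfw_score = probs[1].item()
    return {"nsfw_score": nsfw_score, "flag_for_review": nsfw_score > 0.8}
```

With the file saved as app.py, the service could be started with `uvicorn app:app` and queried by POSTing an image to /classify.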

Possible mentor(s)

@Abbasidaniyal @Chtnnh

Micro Tasks

These are the tasks that Outreachy aspirants for Round 21 are expected to complete during the contribution period. Applicants are required to complete at least one of the tasks in order to be eligible for selection.

After successfully recording contributions, applicants are required to submit a Final Application on the Outreachy website. This should contain details of your research on the problem statement, your proposed implementation, and a descriptive project timeline. The proposed implementation, along with the timeline, should go in the "Outreachy internship project timeline" section.

Repository for Micro Tasks

Wikimedia NSFW Classifier Reports
Follow the README.md of this repository for more details.

Internship tasks

These are the tasks that the selected intern is expected to do during the course of their internship.

See also

T214201: Implement NSFW image classifier using Open NSFW

Event Timeline

srishakatux changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Sep 30 2020, 1:57 AM
srishakatux changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Oct 7 2020, 6:19 PM

Hi everyone,

I'm an Outreachy intern and looking forward to contributing to this project. Just to clarify, in order to complete the micro tasks, I would need to compare all existing NSFW classifiers/datasets for Wikimedia Commons in the form of a report @Chtnnh @Abbasidaniyal ?

Many thanks

Angela

Question answered in Zulip.

Hi everyone, I'm an Outreachy intern and looking forward to contributing to this project. Just to clarify, in order to complete the micro tasks, I would need to compare all existing NSFW classifiers/datasets for Wikimedia Commons?

Not necessarily all, but it will be nice to see comparisons between the widely used ones.

The answer from Zulip for everyone's reference.

What community standards are going to be used in the scoring system for this? What is considered "safe for work" varies widely throughout the world.

Under discussion at Commons VP. Overwhelmingly negative.

This proposal should not go ahead unless there is some reasonable consensus as to what the desired use case is, and why it would benefit the project collections rather than harm them or be used to disrupt and harm the community. Various forms of NSFW categorization, classification, and censorship have been discussed since Wikimedia Commons started; it would be worth researching the VP discussion archives.

For those unaware of the history, as an example of how controversial this issue is, Jimmy Wales abandoned making contributions to Commons in 2014 because of his views about nudity/sexuality content and has made one edit since. Inviting interns to jump into these muddy waters would be, frankly, foolish.

I replied there; hopefully what I said was accurate. But in a nutshell, this all sounds much more controversial than it really is. I think renaming the project to something other than NSFW should be a priority, as this is a frequent source of confusion.

@Fae: Did you read the last comment before yours in this task?

Yes, this task needs to be withdrawn.

A renamed NSFW filtering system for nudity would remain an NSFW filtering system, which has no community consensus on Commons, as others have spelt out in different words in T214201.

Whichever task this ends up in, it would be productive to address the Commons VP issues raised by several members of our community, as recommended in Phabricator Etiquette (criticize ideas, not people). The most significant issue is that consensus already exists against implementing any automated NSFW-type filter for Commons.

You are welcome to comment directly at the Commons VP rather than relying on any interlocutors, or to raise a proposal on Commons in the conventional way, rather than keeping discussion and rebuttals effectively hidden inside Phabricator task comments, which the vast majority of the active Commons community will never engage with.

Thanks

Yes

In that case, you commented intentionally in the wrong place. Please bring up personal opinions and arguments in better-suited places. Thanks.

There does seem to be a misunderstanding. This task is labelled as being for the "Internship tasks" related to NSFW classifiers "for Wikimedia Commons". If it's not this, perhaps someone could describe it more clearly. Thanks

Is everything in this project task planned for Outreachy (Round 21) completed? If yes, please consider closing this and other related tasks as resolved. If bits and pieces are remaining, you could consider creating a new task and moving them there.

Chtnnh assigned this task to Harshineesriram.

Hello @srishakatux, thank you for the nudge. There are some aspects of the task yet to be completed. We will take your suggestion, move them to a new task, and mark the Outreachy task resolved.

For reference, the new task is T279416. I have added all the subscribers here to that task as well. The task description will be updated soon. Feel free to edit the task description and continue the discussion there, @Fae @MusikAnimal @Aklapper.