
Not Safe for Work (NSFW) media Classifier for Wikimedia Commons
Closed, Resolved · Public

Description

Brief summary

Wikimedia Commons is an online repository of free-use images, sounds, other media, and JSON files. Anyone can upload media to the Commons portal, and uploads are currently moderated manually by community volunteers. This project aims to build a classifier that can flag NSFW images/media for review.

Upon successful completion of this internship, the intern would have designed, implemented, and tested a machine learning model able to classify image and video media as SFW or NSFW with high accuracy. They would also have the opportunity to deploy the model to Wikimedia test and production servers. In addition, they would build a data processing pipeline and an API for the model.
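
As a rough illustration of what the data processing pipeline could look like, the sketch below builds a PyTorch loader over a local directory of labelled training images and attaches a two-class head to a pretrained backbone. The directory layout, transforms, and choice of backbone are illustrative assumptions, not decisions made in this task.

```
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet-style preprocessing; the exact transforms are placeholders.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: data/sfw/*.jpg and data/nsfw/*.jpg.
# ImageFolder maps each subdirectory name to a class label.
dataset = datasets.ImageFolder("data", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

# One common starting point: a pretrained backbone with a fresh 2-class head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
```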

Since this is a from-scratch project, applicants are required to do some initial research: a basic comparison of existing NSFW classifiers along with their computational requirements. All applicants are expected to read relevant research papers and draw comparisons between them.
They are expected to produce a report detailing their research, the options available for implementing the model, and what they propose to do if selected. The report should also detail implementation methods and procedures.

Skills required

  • Python
  • Tensorflow / PyTorch for model creation
  • Flask / Django / FastAPI for the API (see the serving sketch below)
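
To make the API deliverable concrete, here is a minimal serving sketch using FastAPI and PyTorch. The checkpoint path, endpoint name, class index, and review threshold are placeholders chosen for illustration only.

```
import io

import torch
from fastapi import FastAPI, File, UploadFile  # file uploads also need python-multipart
from PIL import Image
from torchvision import models, transforms

app = FastAPI()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical fine-tuned 2-class checkpoint saved to model.pt.
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

@app.post("/classify")
async def classify(file: UploadFile = File(...)):
    # Decode the uploaded image and run a single forward pass.
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    # Index 1 is assumed to be the NSFW class; flag for human review above 0.8.
    nsfw_score = probs[1].item()
    return {"nsfw_score": nsfw_score, "flag_for_review": nsfw_score > 0.8}
```

With the file saved as app.py, the service could be started with `uvicorn app:app` and queried by POSTing an image to /classify.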

Possible mentor(s)

@Abbasidaniyal @Chtnnh

Micro Tasks

These are the tasks that Outreachy aspirants for Round 21 are expected to complete during the contribution period. Applicants are required to complete at least one of the tasks in order to be eligible for selection.

After successfully recording contributions, applicants are required to submit a Final Application on the Outreachy website. This should contain details of your research on the problem statement, your proposed implementation, and a descriptive project timeline. The proposed implementation, along with the timeline, should go in the "Outreachy internship project timeline" section.

Repository for Micro Tasks

Wikimedia NSFW Classifier Reports
Follow the README.md of this repository for more details.

Internship tasks

These are the tasks that the selected intern is expected to do during the course of their internship.

See also

T214201: Implement NSFW image classifier using Open NSFW

Event Timeline

srishakatux changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Sep 30 2020, 1:57 AM
srishakatux changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Oct 7 2020, 6:19 PM

Hi everyone,

I'm an Outreachy intern and looking forward to contributing to this project. Just to clarify, in order to complete the micro tasks, I would need to compare all existing NSFW classifiers/datasets for Wikimedia Commons in the form of a report @Chtnnh @Abbasidaniyal ?

Many thanks

Angela

Question answered in Zulip.

Hi everyone, I'm an Outreachy intern and looking forward to contributing to this project. Just to clarify, in order to complete the micro tasks, I would need to compare all existing NSFW classifiers/datasets for Wikimedia Commons?

Not necessarily all, but it will be nice to see comparisons between the widely used ones.

The answer from Zulip for everyone's reference.

What community standards are going to be used in the scoring system for this? What is considered "safe for work" varies widely throughout the world.

Under discussion at Commons VP. Overwhelmingly negative.

This proposal should not go ahead unless there is some reasonable consensus as to what the desired use case is, and why it would benefit the project collections rather than harm them or be used to disrupt and harm the community. Various forms of NSFW categorization, classification, and censorship have been discussed since Wikimedia Commons started; it would be worth researching the VP discussion archives.

For those unaware of the history, as an example of how controversial this issue is, Jimmy Wales abandoned making contributions to Commons in 2014 because of his views about nudity/sexuality content and has made one edit since. Inviting interns to jump into these muddy waters would be, frankly, foolish.

I replied there; hopefully what I said was accurate. But in a nutshell, this all sounds much more controversial than it really is. I think renaming the project to something other than NSFW should be a priority, as this is a frequent source of confusion.

@Fae: Did you read the last comment before yours in this task?

Yes, this task needs to be withdrawn.

A renamed NSFW filtering system for nudity would remain an NSFW filtering system, which has no community consensus on Commons, as others have spelt out in different words in T214201.

Whichever task this ends up in, it would be productive to address the Commons VP issues raised by several members of our community, as recommended in Phabricator Etiquette (criticize ideas, not people). The most significant issue is that consensus already exists against implementing any automated NSFW-type filter for Commons.

You are welcome to comment directly at the Commons VP rather than relying on any interlocutors, or to raise a proposal on Commons in the conventional way, rather than keeping discussion and rebuttals effectively hidden inside Phabricator task comments, which the vast majority of the active Commons community will never engage with.

Thanks

Yes

In that case, you commented intentionally in the wrong place. Please bring up personal opinions and arguments in better-suited places. Thanks.

There does seem to be a misunderstanding. This task is labelled as being for the "Internship tasks" related to NSFW classifiers "for Wikimedia Commons". If it's not this, perhaps someone could describe it more clearly. Thanks

Is everything in this project task planned for Outreachy (Round 21) completed? If yes, please consider closing this and other related tasks as resolved. If bits and pieces are remaining, you could consider creating a new task and moving them there.

Chtnnh assigned this task to Harshineesriram.

Hello @srishakatux, thank you for the nudge. There are some aspects of the task yet to be completed. We will take your suggestion, move them to a new task, and mark the Outreachy task resolved.

For reference, the new task is T279416. I have added all the subscribers here to that task as well. The task description will be updated soon. Feel free to edit the task description and continue the discussion there, @Fae @MusikAnimal @Aklapper.