Wikimedia Commons is an online repository of free-use images, sounds, other media, and JSON files. Anyone can upload media to the Commons portal. Uploads are currently moderated manually by members and volunteers of the foundation. This project aims to build a classifier that can flag NSFW images/media for review.
Upon successful completion of this internship, the intern will have designed, implemented, and tested a machine learning model that classifies image and video media as SFW or NSFW with high accuracy. They will also be given the chance to deploy the model to Wikimedia test and production servers. Further, they will build a data processing pipeline and an API for the model.
Since this project starts from scratch, applicants are required to do some initial research. A basic comparison of the existing NSFW classifiers, along with their computational requirements, is required. All applicants are expected to read the relevant research papers and draw comparisons between them.
They are expected to produce a report detailing their research, the options available for implementing the model, and what they propose to do if they are selected. This report should also detail implementation methods and procedures.
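As a rough illustration of how such a comparison could be made reproducible, the sketch below measures accuracy and mean per-sample latency for one candidate classifier on a small labelled set. Everything in it is an assumption for illustration: `evaluate_classifier`, `toy_predict`, and the sample data are hypothetical, and a real comparison would wrap actual pretrained models and also record memory or GPU usage.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def evaluate_classifier(
    predict: Callable[[object], bool],
    samples: Iterable[Tuple[object, bool]],
) -> Dict[str, float]:
    """Return accuracy and mean per-sample latency for one candidate classifier.

    `predict` takes one input and returns True when it considers it NSFW.
    `samples` yields (input, is_nsfw) pairs with ground-truth labels.
    """
    correct = 0
    total = 0
    elapsed = 0.0
    for item, is_nsfw in samples:
        start = time.perf_counter()
        prediction = predict(item)
        elapsed += time.perf_counter() - start  # time only the prediction call
        correct += int(prediction == is_nsfw)
        total += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return {
        "accuracy": correct / total,
        "mean_latency_s": elapsed / total,
        "n": total,
    }


# Hypothetical stand-in for a real model: thresholds a precomputed score.
def toy_predict(score: float) -> bool:
    return score >= 0.5


# Hypothetical labelled data: (input, ground-truth is_nsfw).
labelled = [(0.9, True), (0.2, False), (0.6, False), (0.1, False)]
report = evaluate_classifier(toy_predict, labelled)
print(report)
```

Running the same function over each candidate with the same labelled set gives directly comparable rows for the report.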
- TensorFlow / PyTorch for model creation
- Flask / Django / FastAPI for the API
These are the tasks that Outreachy aspirants for Round 21 are expected to do during the contribution period. Applicants are required to complete at least one of the tasks in order to be eligible for selection.
After successfully recording contributions, applicants are required to submit a Final Application on the Outreachy website. This should contain details about your research on the problem statement, proposed implementations with details as well as a descriptive project timeline. The proposed implementation along with the timeline should be in the "Outreachy internship project timeline" section.
Repository for Micro Tasks
Wikimedia NSFW Classifier Reports
Follow the README.md of this repository for more details.
These are the tasks that the selected intern is expected to do during the course of their internship.
- T264049: Model Development for NSFW Classifier
- T264052: API for NSFW Classifier
- T264050: Video Processing Module for NSFW Classifier
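The tasks above do not specify how video would be handled. One possible approach, sketched here purely as an assumption, is to sample every n-th frame, score each sampled frame with the image model, and flag the video for review when a large enough fraction of sampled frames exceeds a threshold. The names `sample_indices` and `flag_video`, and all default values, are hypothetical; the per-frame scores are assumed to come from the image classifier.

```python
from typing import List, Sequence


def sample_indices(n_frames: int, every_n: int) -> List[int]:
    """Return the indices of frames to score, taking one frame out of every `every_n`."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    return list(range(0, n_frames, every_n))


def flag_video(
    frame_scores: Sequence[float],
    every_n: int = 10,
    frame_threshold: float = 0.8,
    min_flagged_fraction: float = 0.1,
) -> bool:
    """Decide whether a video should be flagged for human review.

    `frame_scores` holds one NSFW probability per frame (assumed to be
    produced by the image model). Only sampled frames are inspected.
    """
    indices = sample_indices(len(frame_scores), every_n)
    if not indices:
        return False  # an empty video has nothing to flag
    flagged = sum(1 for i in indices if frame_scores[i] >= frame_threshold)
    return flagged / len(indices) >= min_flagged_fraction


# Hypothetical scores: 100 mostly benign frames with one high-scoring frame.
scores = [0.1] * 100
scores[50] = 0.95
print(flag_video(scores))
```

Sampling keeps the cost proportional to video length divided by `every_n`, at the risk of missing short segments that fall between sampled frames; the thresholds would need tuning against real data.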