
Implement NSFW image classifier using Open NSFW
Open, High, Public

Description

Library: https://github.com/rahiel/open_nsfw--

It uses Python and the Caffe ("coffee") model serialization format. We could probably include this in ORES without too much trouble, depending on what memory usage looks like.

Intended use: Flag edits to Wikipedia articles that add NSFW images/media for review.
Strategy:

  1. See if ORES can even handle this.
  2. Try hosting a model in labs.
  3. Look into tools that might help reviewers who are trying to catch vandalism that follows this pattern.

Background:

Some vandalism comes in the form of edits that inappropriately add NSFW images to articles. The clear intention behind this type of vandalism is to shock an innocent reader. By flagging edits that add NSFW media for secondary review, we can help patrollers track these types of edits.

For clarity, there is no plan to use this to filter images from articles for which they are appropriate. Many NSFW images are welcome on Commons and in various Wikipedia articles. The intention is instead to help patrollers deal with a specific type of vandalism.

Diagram:

nsfw.png (614×799 px, 45 KB)

Related Objects

Mentioned In
rMSNS mediawiki-services-open-nsfw
T371035: Archive Gerrit repo mediawiki/services/open-nsfw
T336682: operations/docker-images/production-images contains references to non-existent image python3
T312792: Gather documentation about past efforts / approaches / community concerns on NSFW detection
T279416: Deploy Image content filtration model for Wikimedia Commons
T264049: Model Development for NSFW Classifier
T264045: Not Safe for Work (NSFW) media Classifier for Wikimedia Commons
T250110: New Service Request 'open_nsfw'
T247891: Vandalism on Structured Data tasks
T247614: Proposal (GSoC 2020): Implement an NSFW image classifier with open_nsfw
T198550: Allow hiding certain (NSFW etc) images by default and letting users explicitly expand them
T227347: Create a diagram of the machine vision middleware architecture
T225664: Update open_nsfw-- for Wikimedia production deployment
T224751: Include NSFW likelihood scores or filter out images classified as likely NSFW from image caption edit suggestions
T223131: Translate image captions in “Suggested edits”
Mentioned Here
T264045: Not Safe for Work (NSFW) media Classifier for Wikimedia Commons
T258524: Design and plan Outreachy round 21 with a focus on data science and engineering projects
T250110: New Service Request 'open_nsfw'
T225664: Update open_nsfw-- for Wikimedia production deployment
T96384: Integrate file revisions with description page history
T28741: Migrate file tables to a modern layout (image/oldimage; file/file_revision; add primary keys)

Event Timeline


Scoring revisions of the file page instead of revisions of the image is a somewhat dirty hack, but it doesn't really have disadvantages other than maybe performance (there are more page revisions than file revisions), since a new page revision is created every time a file is uploaded or reuploaded. Some care would have to be taken in how changes are detected (sometimes the revisions are null revisions; sometimes they aren't sent to recentchanges), but other than that it's viable, and it probably saves work that isn't necessary long term, as eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

Interesting. Is there a Phab task or wiki page describing this?

T28741 is about having sane keys and a revision-like structure. (Revisions are currently in the revision table, keyed by an autoincremented ID, moved to archive on page deletion, and flagged on revision deletion. For images, the current version is in the image table, moved to oldimage when a new version is uploaded, and to filearchive when the image or a revision of it gets deleted, which results in a lot of cross-table copying; and they are keyed by upload timestamp, resulting in various hacks to deal with two images being uploaded at the same time - you'll find a sleep(1) somewhere in the codebase as a last-ditch defense.) The MCR task is T96384: Integrate file revisions with description page history. It would be a huge refactoring (and even MCR is not finished yet), so it's not something to expect soon.

open_nsfw is basically a (publicly available) snapshot of a neural network, not something we'd train ourselves, right? Which means that while it's definitely a good stopgap measure, a clever attacker could probably easily defeat it using an adversarial network.

...a clever attacker could probably easily defeat it using an adversarial network.

I think that's probably true, however, we are more likely trying to stop the drive-by vandal than a serious coordinated attack. If we could reduce by 10% the amount of inappropriate images admins had to deal with, we'd be saving hundreds of hours of people's time.

The technical approach may have flaws, but the real-world application could have great benefits.

Mholloway moved this task from Epics to Tracking on the MachineVision board.

With all that said, I'm interested in trying to cram this into volunteer time and will give you updates when I manage to make some progress.

@Halfak would now be a good time to bring this back up?

After going through the links mentioned in the task description, I have been able to run a Docker instance of the model and get output on a few test images. The results are satisfactory, and the memory footprint seems manageable considering my system is not very high end.

I think this can be implemented successfully. I am claiming this task and will be working on it with @Halfak.

In T225664, @Mholloway developed an API implementation of open_nsfw based on open_nsfw--. Using this implementation, hosted at nsfw.wmflabs.org, I wrote a Python script to benchmark the processing speed of the model.

from urllib.request import urlopen
from bs4 import BeautifulSoup

import requests
import re
import time

# Fetch the Special:NewFiles listing of the 500 most recently uploaded files on Commons.
html = urlopen('https://commons.wikimedia.org/w/index.php?title=Special:NewFiles&offset=&limit=500&user=&mediatype%5B0%5D=BITMAP&mediatype%5B1%5D=ARCHIVE&mediatype%5B2%5D=DRAWING&mediatype%5B3%5D=MULTIMEDIA&mediatype%5B4%5D=UNKNOWN&start=&end=&wpFormIdentifier=specialnewimages')

bs = BeautifulSoup(html, 'html.parser')
timeSum = 0.0
count = 0

# Collect the JPEG thumbnails from the listing and time a scoring request for each one.
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})
for image in images:
    url = image['src']
    start = time.time()
    req = requests.post("https://nsfw.wmflabs.org", data={"url": url})
    # print(req.text)
    timeSum += time.time() - start
    count += 1

avgTime = timeSum / count
print(timeSum)  # total time spent waiting on the API
print(avgTime)  # average time per image
print(count)    # number of images scored

Output:

801.2340695913221
1.804581237815277
444

This script scrapes the Special:NewFiles listing on Commons for the 500 most recently uploaded images, passes each image URL to the model hosted on Labs, and times every request.

The output shows only 444 images processed, which may be because the model is not able to handle very large images. This is a potential complexity that will need to be addressed in this task.
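
One way to narrow this down (a rough diagnostic sketch, reusing the same endpoint and the URL list collected by the script above; the grouping by failure cause is an assumption, not observed behaviour) would be to record the HTTP status or exception for every URL that fails:

import requests

def diagnose(urls, endpoint="https://nsfw.wmflabs.org"):
    """Post each image URL to the classifier and group failures by apparent cause."""
    failures = {}
    for url in urls:
        try:
            resp = requests.post(endpoint, data={"url": url}, timeout=30)
            if resp.status_code != 200:
                # Group by HTTP status; e.g. a cluster of 413s would point at a size limit.
                failures.setdefault(resp.status_code, []).append(url)
        except requests.RequestException as exc:
            failures.setdefault(type(exc).__name__, []).append(url)
    return failures

If very large files are indeed the cause, the failing URLs should cluster around large originals.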

On average it takes about 1.8 s per image, which is good performance generally speaking, but it may become a roadblock if we want upload time classification.

I also ran the API locally to test compute and memory usage, and the model is very efficient in both regards.


TL;DR

The model is already hosted at nsfw.wmflabs.org and serves API requests that POST image URLs.
On testing the model with the 500 latest images uploaded to Commons, it successfully processed 444 of them.

Issues:

  1. Why aren't all images processed?
  2. How to ensure all images are processed?
  3. If we cannot make the model faster, how do we achieve upload time classification?
  4. Should we just score images when they are being added to a wiki and ask the user to wait for the scoring to complete before confirming the edit?

@MusikAnimal Some follow up questions after the preliminary work I have done on the task:

  1. What is the maximum threshold of time we can wait for the model to process the image to allow upload time classification?
  2. Are we open to custom implementations of a Deep Neural Net that can be trained in wmflabs and then deployed into production?
  3. Can you shed some light on how I could begin integrating nsfw.wmflabs.org instance to track RC and alert AbuseFilter?

Answers to these would really get the task moving. Thank you so much!

Love to all for all this hard work ❤❤❤❤❤❤❤💓💓💓💓💓💓💓💕💕💕💕💖💖💖💖💖🌍🌎🌐🗺🌞🌞🌞🌞✨✨✨✨💡💡💡💡🕯🕯🕯🕯✨✨✨

After speaking with @Daimona, I have been informed that we require a MediaWiki extension if we wish to communicate with AbuseFilter. Is it possible to use the MachineVision extension for this, or should I write a custom extension for this task?

Please correct me if I have made any wrong assumptions.

Thank you

  1. What is the maximum threshold of time we can wait for the model to process the image to allow upload time classification?

If you are scoring at upload time, I think you could do it in a deferred update so that it doesn't block giving a response to the user. So the scoring just sort of happens in the background and won't slow anything down for the user. This does mean it's possible the score won't immediately be available as soon as the user adds the image to an article, but in theory we're only talking about a 1-2 second window, which I think is acceptable. Hopefully that answers the question.
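
In MediaWiki itself this would be a PHP deferred update; the following is only a language-agnostic sketch of the "respond first, score afterwards" pattern, in which score_and_store, save_score and handle_upload are hypothetical names rather than anything that exists today:

from concurrent.futures import ThreadPoolExecutor

import requests

# Background workers standing in for MediaWiki's deferred-update queue.
executor = ThreadPoolExecutor(max_workers=4)

def save_score(image_url, score_json):
    """Hypothetical persistence helper; in practice this would write to a database table."""
    print(image_url, score_json)

def score_and_store(image_url):
    """Runs after the response has been sent: fetch a score and persist it."""
    resp = requests.post("https://nsfw.wmflabs.org", data={"url": image_url}, timeout=60)
    save_score(image_url, resp.text)

def handle_upload(image_url):
    """Upload handler: queue the scoring work, then return to the user immediately."""
    executor.submit(score_and_store, image_url)
    return "upload accepted"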

  2. Are we open to custom implementations of a Deep Neural Net that can be trained in wmflabs and then deployed into production?

Sure! I don't know how much work that involves, but I've no opposition. All I can say is open_nsfw works like a charm out of the box.

  3. Can you shed some light on how I could begin integrating nsfw.wmflabs.org instance to track RC and alert AbuseFilter?

There's no "alerting" AbuseFilter. Rather, when an edit is made that adds an image, AbuseFilter would look up the score for it, and be able to disallow the edit if it matches some threshold. Basically for this project, I don't think we need to be concerned with AbuseFilter, rather just making it possible for AbuseFilter to use the data. The solution there (as I understand it) is to simply store the image scores at upload time in a database table. Later, AbuseFilter can be updated to fetch these scores.
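
As a sketch of that storage idea (the table name, the file-SHA1 key and the use of SQLite are assumptions for illustration only; production would presumably use a proper MediaWiki database table), scores would be written once at upload time and read back later by whatever consumes them:

import sqlite3

conn = sqlite3.connect("nsfw_scores.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS image_score ("
    " file_sha1 TEXT PRIMARY KEY,"  # hypothetical key: SHA-1 of the uploaded file
    " score REAL NOT NULL"          # NSFW probability returned by the classifier
    ")"
)

def store_score(file_sha1, score):
    """Called once scoring completes at upload time."""
    conn.execute(
        "INSERT OR REPLACE INTO image_score (file_sha1, score) VALUES (?, ?)",
        (file_sha1, score),
    )
    conn.commit()

def lookup_score(file_sha1):
    """What an AbuseFilter-style consumer would call when an edit adds this image."""
    row = conn.execute(
        "SELECT score FROM image_score WHERE file_sha1 = ?", (file_sha1,)
    ).fetchone()
    return row[0] if row else None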

I think you could do it in a deferred update so that it doesn't block giving a response to the user.

Could you please elaborate on how I would go about doing that?

All I can say is open_nsfw works like a charm out of the box.

If that is the case, I surely don't mind going forward with it, as that would significantly reduce the scope and time of this task.

The solution there (as I understand it) is to simply store the image scores at upload time in a database table. Later, AbuseFilter can be updated to fetch these scores.

So at this point, the scope of this task would only be to implement the scoring mechanism and the DB to store the given scores, if I understand correctly?

Thank you so much for the clarifications!

I also had some additional questions after doing some more work on the task:

  1. Since production cannot rely on services that are hosted on wmflabs.org, do you need me to create a phabricator task to bring the API implementation to production first?
  2. After my discussion with @Daimona , I understand that allowing AbuseFilter to retrieve the scores in the DB is a nontrivial task. Do you want me to file a phabricator task for that as well?

Thank you!

Since production cannot rely on services that are hosted on wmflabs.org, do you need me to create a phabricator task to bring the API implementation to production first?

That is correct. Be prepared, though: the standards for services deployed in production are significantly higher than on Toolforge, so it will probably require a fair amount of work to polish your service. Anyway, you'd need to file a ticket following the process described here - don't bother with the rest of that page; it's for Node.js services, so it probably wouldn't apply to yours.

As for integration with MediaWiki - I strongly suggest talking with @Mholloway and trying to integrate your new classifier into the MachineVision extension - what you're going to do seems like a very good fit there.

Thank you so much for your review @Pchelolo,

For now, I will wait for approval from @MusikAnimal and @Mholloway before filing the ticket. Is there anything else you would like to add, or should I proceed with filing it?

The new service request has been filed (T250110); if you have any input on it, please feel free to comment on that task.

FYI, there is some talk about WMF deploying an NSFW classifier for Commons in the 2020-2021 year.

Ahem, I think it got removed from the planning, because I no longer see it.

Hello everyone.
I wanted to propose an image moderation project for Outreachy round 21 (T258524). I had a chat with @Chtnnh and he pointed me to this project. Since the theme for this round is 'data science and engineering', I believe this would be an ideal project for the internship. We would like to build upon this implementation. We have penned down a few tasks which can be covered in the scope of the internship.

Tasks:

  • Model performance benchmarking
  • Dockerizing the project
  • Video processing module for directly processing videos, performing frame segmentation and then passing these frames to the model (see the sketch after this list).
  • Use a different API framework
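
For the video processing item above, a rough sketch of the frame-segmentation half (OpenCV usage and the sampling interval are assumptions; how frames would be handed to the model is deliberately left open, since the current labs endpoint accepts URLs rather than raw bytes):

import cv2  # OpenCV, assumed available for this sketch

def sample_frames(video_path, every_n=30):
    """Yield every Nth frame of a video, encoded as JPEG bytes."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if index % every_n == 0:
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                yield buf.tobytes()
        index += 1
    cap.release()

# Scoring each sampled frame is left as a placeholder, since the model-facing
# interface (raw bytes vs. a hosted URL) is still an open design question.
def score_frame(jpeg_bytes):
    raise NotImplementedError("hypothetical: depends on the chosen model API")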

In addition to these, we also plan on incorporating the issues mentioned above.

We would love to know what everyone here thinks about this. 😃

Hi all! @Chtnnh and @Abbasidaniyal are planning to propose this project in Outreachy Round 21 and are interested in being mentors. If you have any concerns about moving forward with this idea, please share them.

@calbon Any reason we are removing the Machine Learning Platform (Research) tag from this task?

Might be in scope for Machine-Learning-Team instead (but that's a team project, so it's up to that team)?

@Aklapper the ML team (cc @calbon) will be working on the platform to deploy ML in the near term, not on specific projects. My 2 cents here: for an Outreachy project it is probably of interest to work on the model in the cloud environment, but I would not spend energy coming up with a way to deploy it to production or on dockerization.

@Aklapper @Chtnnh @Nuria For the next 9 months the team will be focused on building model training, deployment, and management infrastructure. We can deploy models other folks create, but we don't have the bandwidth to tackle this ourselves.

Hi!
As @Nuria suggested, @Chtnnh and I decided to go forward with working on the model. We have submitted a proposal on the Outreachy website as well. The proposal submitted is as follows:

Project title
Not Safe for Work (NSFW) media Classifier for Wikimedia Commons

Project Description
Wikimedia Commons is an online repository of free-use images, sounds, other media, and JSON files. Anyone can upload media to the Commons portal. The uploads are moderated manually by members and volunteers of the foundation. This project aims to build a classifier that can flag NSFW images/media for review.

Upon successful completion of this internship, the intern would have designed, implemented and tested a machine learning model able to classify image and video media as SFW or NSFW with high accuracy. They would also be given the chance to deploy the model to Wikimedia test and production servers. Further, they would build a data processing pipeline and an API for the model.

How to make contributions / Selection criteria
Since this is a project starting from scratch, applicants are required to do some initial research. A basic comparison of the existing NSFW classifiers, along with their computational requirements, is required. All applicants are expected to read various research papers and draw comparisons between them.
They are expected to come up with a report detailing their research, the various options that could be used to implement the model, and what they propose to do if selected. This report should also detail implementation methods and procedures.

Project Goals

  • Model Development
  • Model performance benchmarking
  • Video Processing Module
  • API Setup

Skills
Tensorflow / PyTorch for model creation
Flask / Django / FastAPI for the API
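
To make the "API Setup" goal concrete, here is a minimal sketch using FastAPI (one of the frameworks listed above); the route name, request shape and the placeholder classify() are hypothetical, not a committed design:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ScoreRequest(BaseModel):
    url: str  # URL of the image to score

def classify(url: str) -> float:
    # Placeholder for whichever model ends up being trained and deployed.
    raise NotImplementedError

@app.post("/score")
def score(req: ScoreRequest):
    """Return an NSFW probability for the image at the given URL."""
    return {"url": req.url, "score": classify(req.url)}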

Do let us know if you have any suggestions or improvement to this proposal. Thanks! 😄

Two issues to watch out for in this project. The first is that if you are going to flag images as NSFW, you need a module that works equally well across all skin colours. I know this should be assumed, but there are some big organisations whose flesh-colour definition was built from a non-global sample, and they unintentionally wound up with a racist product.

The second is your definition of NSFW - this is culturally variable and will differ greatly around the world. Either you need to support different NSFW settings, which could be complex, or you need to decide which particular culture's definition of NSFW you are working to, which of course does not fit with community values and WMF strategy.

What community standards are going to be used in some sort of scoring system for this? What is considered "safe for work" varies widely throughout the world.

Regarding concerns about the definition of "NSFW" being culturally dependent: from my interpretation of the task description, it seems "NSFW" here means "image with a tendency to be added to unrelated articles by vandals for shock value", which seems both culture-independent and easier to calculate.

It is, however, not what is commonly understood as "NSFW". If you take a word that means A and use it to name concept B, you invariably end up with people assuming that it is about A.

True. I propose a rename to "Machine learning filter for shock image vandalism" or something like that. There are probably better terms than "shock image" too, but at least I would like to get rid of the term "NSFW". I'm not gonna do it unilaterally, since there's an Outreachy task for this, so it would involve renaming lots of stuff, or at least cause lots of confusion.

Agree with others above: using different words to describe an NSFW classifier still leaves it an NSFW classifier.

The consensus on Wikimedia Commons is that not only is this not needed, but it would also be unhelpfully divisive and would feed the anti-nudity and anti-sexuality warriors who are frequently a source of disruptive vandalism. Contributors here may enjoy addressing the points raised at the Commons Village Pump discussion for this task by responding on Wikimedia Commons.

If anyone would care to more formally assess the Commons consensus before investing scarce resources into this controversial task, then COM:VPC is the right place to do it.

Chtnnh added a subscriber: Harshineesriram.

@Harshineesriram reassigning to you

@Fae We hear your concerns and agree that this discussion could have been more proactive from our end. To resolve this, we have extended the deadline of the Outreachy internship and are going to discuss the possibilities and use cases for such a model in all relevant forums, including the ones you have mentioned. We are also open to revisiting our definition of what the model should do and how it does it, if needed.

We hope you can understand that we are trying to capture the meaning the community intends and to find a solution to a problem (vandalism) that exists within Commons.

Thank you for the pointers; we will take them into account and work towards rectifying our missteps.

In addition, we hope you can join us in capturing that meaning by guiding us through the process of reaching out to and discussing this with the Commons community. Your help would be highly appreciated.

Thank you @Xaosflux for your question. We are trying to capture the general consensus of the Wikimedia Commons community on the matter of image vandalism, and we plan to be wiki-specific in the future. I hope that answers your question.

Thank you @APerson for the suggestion. We are considering it and will keep you informed of the changes we make to the title of the project. Rest assured, it will not contain the term NSFW.

@JEumerus You make a valid point, and I hope it has been addressed in the statement above.

Any further pointers/concerns/suggestions from any of you are more than welcome.

The Foundation really shouldn't be assigning intern/outreach tasks that impact the community when there is a known or likely risk of the project receiving hostility or rejection from the community.
Note that the Commons discussion received UNANIMOUSLY negative community reception, with the only positive comments coming from a WMF staff member trying to defend the project.

This project, narrowly construed as non-filtering and offering only advisory information, might in theory be tolerable to the community. However the community never asked for this, and there are concerns that this will ultimately be harmful to our work. If anyone wants to open further discussion with the community, that's swell. But work on this project clearly should not proceed when the community response has been negative. Especially not when the response has been unanimously negative.

https://github.com/infinitered/nsfwjs might be interesting for implementation too (it can be run on Node)

@Harshineesriram: Per emails from Sep18 and Oct20 and https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup , I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action...Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!