
Implement NSFW image classifier using Open NSFW
Open, High, Public

Description

Library: https://github.com/rahiel/open_nsfw--

It uses Python and the Caffe model serialization format. We could probably include this in ORES without too much trouble, depending on what memory usage looks like.
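For a rough sense of what embedding this would involve, below is a minimal sketch of loading the Caffe snapshot and scoring one image with pycaffe; the file names, preprocessing values, and the "prob" output layer are assumptions based on the upstream open_nsfw examples rather than verified details.

import caffe
import numpy as np

def nsfw_score(image_path):
    # Sketch only: file names, mean values, and layer names are assumed from the
    # upstream open_nsfw examples and may differ in practice.
    net = caffe.Net('deploy.prototxt', 'resnet_50_1by2_nsfw.caffemodel', caffe.TEST)
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))             # HWC -> CHW
    transformer.set_mean('data', np.array([104, 117, 123]))  # assumed BGR mean
    transformer.set_raw_scale('data', 255)                   # [0, 1] -> [0, 255]
    transformer.set_channel_swap('data', (2, 1, 0))          # RGB -> BGR
    img = caffe.io.load_image(image_path)                    # float HxWx3 in [0, 1]
    net.blobs['data'].data[...] = transformer.preprocess('data', img)
    out = net.forward()
    return float(out['prob'][0][1])  # probability that the image is NSFW

Memory usage would mostly come from holding the network weights resident, which is the main thing to measure before deciding whether ORES can host it.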

Intended use: Flag edits to Wikipedia articles that add NSFW images/media for review.
Strategy:

  1. See if ORES can even handle this.
  2. Try hosting a model in labs.
  3. Look into tools that might help reviewers who are trying to catch vandalism that follows this pattern.

Background:

Some vandalism comes in the form of edits that inappropriately add NSFW images to articles. The clear intention behind this type of vandalism is to shock an innocent reader. By flagging edits that add NSFW media for secondary review, we can help patrollers track these types of edits.

For clarity, there is no plan to use this to filter images from articles for which they are appropriate. Many NSFW images are welcome on Commons and in various Wikipedia articles. The intention is instead to help patrollers deal with a specific type of vandalism.

Diagram:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Jan 18 2019, 8:36 PM

Given previous controversy around this sort of thing, we should be extremely careful about how such a classifier gets used.

#wikimedia-ai logs:
2019-01-18 21:31:50 <halfak> harej, we have a specific request. It's not about removing NSFW content from commons, but flagging when NSFW content is added to an article -- for review.
2019-01-18 21:32:17 <harej> Ooh, that's interesting. Do you know more about the use case?
2019-01-18 21:32:57 <halfak> I'll CC you on the thread :)

It might be a better idea to have the details come here instead of a private email thread.

Uh. That's IRC and this is a new task :) Thank you for copying them though.

Yes. I believe that when opening a task around a subject that may involve controversy, it can be helpful to provide any context showing how this is not the same thing as was argued over last time, and that transparency is highly valued around here :)

lcawte added a subscriber: lcawte. · Jan 18 2019, 10:32 PM
Halfak updated the task description. (Show Details) · Jan 18 2019, 10:36 PM
Halfak updated the task description. (Show Details) · Jan 18 2019, 10:44 PM
Halfak updated the task description. (Show Details)

I added some details to the task description that should at least make the intention clear. This task is still very much in our "backlog"; it's sitting in the "research & analysis" column because we're not sure what to do with it yet. Still, I made it clear in the description what we are and are not considering using this model for.

Halfak changed the visibility from "Public (No Login Required)" to "Custom Policy". · Jan 22 2019, 3:14 PM

Why has an anti-vandalism ticket like this become private? It appears to go even further than standard security: this is actually #acl*security_team instead of Security?

I should probably have just done Security. I'll change that.

But I made this private because @MusikAnimal was worried about talking publicly about the vandalism attack strategies that are getting more and more clever.

Halfak changed the visibility from "Custom Policy" to "Custom Policy". · Jan 22 2019, 4:33 PM

Given the number of random members of the public that've emailed OTRS about such attacks I don't think it's a particularly well-kept secret at this point.

Given the number of random members of the public that've emailed OTRS about such attacks I don't think it's a particularly well-kept secret at this point.

Indeed, the problem we are facing is by no means a secret. It even made the news! https://www.theverge.com/tldr/2018/11/22/18108195/apple-siri-iphone-donald-trump-penis-wikipedia-fail-vandalism-editing

What we're doing to stop it is a different matter. In the email chain with Halfak, I was illustrating the M.O. of the vandal(s) and our current, sub-par strategies for identifying them, as I felt this was relevant to how we might implement our image classifier. The CheckUsers/admins have been pretty diligent about keeping this information away from public view, especially details around how our AbuseFilters work. If they are private elsewhere, I figure they should be here too.

The concept of a NSFW image classifier of course doesn't need to be private. If you want I can continue to use email for the sensitive information, and keep this task open? I've no strong feelings, I just don't want to tip the vandals off! :)

Halfak changed the visibility from "Custom Policy" to "Custom Policy". · Jan 22 2019, 5:41 PM

As a first step, I'm happy to help with creating a Toolforge tool that provides an API to get a NSFW score on-demand, using the existing open_nsfw library (though I've no experience with Python). This should be fine as it would allow for automation to take action just after the edit is made. It's worth noting however that depending on the extent of the vandalism and the popularity of the tool, there could be a LOT of requests. On English Wikipedia, currently my plan would be to have a bot monitor for specific AbuseFilters to be tripped (e.g. new user adding an image), and then query to see if it is NSFW. This would make the request rate fairly low, but obviously other wikis may be interested too, and the vandalism could become a more widespread trend, etc.
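To make the scoring step of that bot concrete, here is a minimal Python sketch assuming a hypothetical scoring tool that accepts a POSTed image URL and returns a bare score; the endpoint URL, request/response format, and threshold are all placeholders.

import requests

NSFW_ENDPOINT = "https://nsfw.wmflabs.org"  # hypothetical tool endpoint
NSFW_THRESHOLD = 0.8                        # arbitrary example threshold

def looks_nsfw(image_url):
    # Ask the (hypothetical) scoring tool for an NSFW probability for this image.
    resp = requests.post(NSFW_ENDPOINT, data={"url": image_url}, timeout=30)
    resp.raise_for_status()
    return float(resp.text) >= NSFW_THRESHOLD  # assumes the tool returns a bare score

The bot would call looks_nsfw() for each image added by an edit that tripped one of the relevant AbuseFilters, and queue matches for human review.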

As for ORES, the same applies: if we have an API to get the score for just the NSFW model, we are in good shape :)

Longer-term, it would be awesome to stop this vandalism preemptively. I mentioned over email that it may be better to introduce a new system to score the images themselves (mapped to the MediaWiki image table), as they are uploaded/updated, rather than revisions that add the images. I think this would make it possible to create an AbuseFilter variable, such that we could disallow addition of the image if needed. I assume it would also speed up the ORES processing, since you wouldn't need to re-score the images with each edit. But, that sounds like a lot more work -- not just for ORES, but also AbuseFilter, so maybe let's side-step that idea for now.

What we're doing to stop it is a different matter. In the email chain with Halfak, I was illustrating the M.O. of the vandal(s) and our current, sub-par strategies for identifying them, as I felt this was relevant to how we might implement our image classifier. The CheckUsers/admins have been pretty diligent about keeping this information away from public view, especially details around how our AbuseFilters work. If they are private elsewhere, I figure they should be here too.

The concept of a NSFW image classifier of course doesn't need to be private. If you want I can continue to use email for the sensitive information, and keep this task open? I've no strong feelings, I just don't want to tip the vandals off! :)

While CheckUsers may decide to keep information private, that isn't how Wikimedia software development works. Please see https://www.mediawiki.org/wiki/Technical_Collaboration_Guidance/Principles:

  1. Transparent and responsible - Development of a product should be in the open, with public feedback loops whenever possible. Decisions should be fully accounted for, and should be clearly explained and well-documented.

There is a provision for private planning, but that's limited to "early technical specifications". Whether the current work falls under that early stuff, I can't really tell. I do expect that once it passes that stage this task will be made public, and decisions that go into whatever software we run are publicized, just like the code itself.

MusikAnimal added a comment. Edited · Jan 23 2019, 8:10 PM

While CheckUsers may decide to keep information private, that isn't how Wikimedia software development works. Please see https://www.mediawiki.org/wiki/Technical_Collaboration_Guidance/Principles:

  1. Transparent and responsible - Development of a product should be in the open, with public feedback loops whenever possible. Decisions should be fully accounted for, and should be clearly explained and well-documented.

There is a provision for private planning, but that's limited to "early technical specifications". Whether the current work falls under that early stuff, I can't really tell. I do expect that once it passes that stage this task will be made public, and decisions that go into whatever software we run are publicized, just like the code itself.

Yes, of course :) Sorry to confuse you all. What I said over email was mostly examples detailing how the vandal could evade our current mitigation tactics. What needed to be said for planning (from my perspective) is at T214201#4899971. The bot will still need to go through a public request for approval, the API will be publicly exposed, etc., no secrets here. Feel free to make this task public :)

Barring objection, I think I'll take a stab at creating a Toolforge tool for https://github.com/rahiel/open_nsfw-- hopefully in the coming days. I will likely enlist some Python experts to help me.

Legoktm changed the visibility from "Custom Policy" to "Public (No Login Required)". · Jan 23 2019, 8:19 PM

Feel free to make this task public :)

Thanks, and done.

Miriam added a subscriber: Miriam. · Jan 24 2019, 1:48 AM
JJMC89 added a subscriber: JJMC89. · Feb 3 2019, 8:18 PM
aezell added a subscriber: aezell. · Mar 14 2019, 1:26 PM
MusikAnimal added a comment. Edited · Mar 21 2019, 9:14 PM

The saga continues.

@Halfak and other kind, smart people... I'm pleading that we get this show on the road. I have set up the open_nsfw-- app on VPS at http://nsfw.wmflabs.org but it is not working properly. Installing Caffe seemed really complicated, so I instead tried to use the Docker container to build the environment and route traffic to it. I believe that's the only issue here -- exposing the app within Docker to incoming traffic. If you think you can help with this, let me know and I will add you as a maintainer.

The web app is great as a sort of trial, and I could author a bot task to make use of it, too. However, we really need these NSFW scores in ORES, and in a dedicated table that relates to the image table. I don't think revision-based scoring is that useful here, as it can't preemptively stop the vandalism. Sometimes every second counts. If we score images as they are uploaded, we should be able to create an AbuseFilter variable that gets the scores of all images added with an edit. With that we can use various heuristics to tell if the image is appropriate for the article it's being added to, and if not, disallow the edit altogether. That would be absolutely amazing... :) Also, I assume going by images rather than revisions would be better in terms of storage, since there are a lot more revisions than images.

I understand this will probably have to be a volunteer effort. I want you to know I'm here to help every step of the way. I just need some guidance. If you think adding a separate ORES table for image scores is a good idea, I can probably handle creating the schema, doing tests, etc., and chaperoning it through DBA approval. Integrating the NSFW classifier into ORES is beyond me, though.

A long list of volunteers and I have been battling this vandalism, every day, for almost a year now. This is not limited to the person behind T210192. It has grown into a general trend that's impossible to predict and adequately control. Our existing AbuseFilters only go so far, regularly produce false positives, and still involve a significant amount of manual intervention. We need machine learning 🙏🙏🙏

Sorry to leave you hanging, @MusikAnimal. The Scoring-platform-team is currently understaffed, and we're doing our best to stay on top of our current obligations. We'd love to pick up this work, but we are struggling and likely won't be able to start on it this quarter.

Normally, we might at least start experimenting with a model we can't commit to in a given quarter, but this is a different type of model than we generally work with. Regretfully, we don't have any specific facilities to *implement* an image classifier, so it would require some refactoring to get the system to work at all.

With all that said, I'm interested in trying to cram this into volunteer time and will give you updates when I manage to make some progress.

Harej triaged this task as High priority. · Apr 9 2019, 9:07 PM

[... ]it may be better to introduce a new system to score the images themselves [...] as they are uploaded/updated, rather than revisions that add the images. I think this would make it possible to create an AbuseFilter variable, such that we could disallow addition of the image if needed. I assume it would also speed up the ORES processing, since you wouldn't need to re-score the images with each edit.

I don't think revision-based scoring is that useful here, as it can't preemptively stop the vandalism. Sometimes every second counts. If we score images as they are uploaded, we should be able to create an AbuseFilter variable that gets the scores of all images added with an edit. With that we can use various heuristics to tell if the image is appropriate for the article it's being added to, and if not, disallow the edit altogether. That would be absolutely amazing... :) Also, I assume going by images rather than revisions would be better in terms of storage, since there are a lot more revisions than images.

I strongly agree with this thinking. Scoring included images for NSFW characteristics on wiki page revisions might make sense as a sort of bridge while we're getting the existing corpus of images scored, but we should really be doing this scoring at upload time.

Mholloway moved this task from Backlog to NSFW scoring on the MachineVision board. · Jul 5 2019, 4:19 PM
Mholloway updated the task description. (Show Details) · Jul 8 2019, 9:11 PM
Tgr added a subscriber: Tgr. · Jul 17 2019, 3:06 PM

Scoring revisions of the file page instead of revisions of the image is a somewhat dirty hack, but it doesn't really have disadvantages other than maybe performance (there are more page revisions than file revisions), since a new file page revision is created every time a file is uploaded or reuploaded. Some care would have to be taken in how changes are detected (sometimes the revisions are null revisions; sometimes they aren't sent to recentchanges), but other than that it's viable, and it probably avoids work that won't be necessary long term, since eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

Interesting. Is there a Phab task or wiki page describing this?

Tgr added a comment. · Jul 18 2019, 5:17 PM

T28741 is about having sane keys and a revision-like structure. (Revisions are currently in the revision table, keyed by an autoincremented ID, moved to archive on page deletion, and flagged on revision deletion. For images, the current version is in the image table, moved to oldimage when a new version is uploaded, and to filearchive when the image or a revision of it gets deleted, which results in a lot of cross-table copying; and they are keyed by upload timestamp, resulting in various hacks to deal with two images being uploaded at the same time - you'll find a sleep(1) somewhere in the codebase as a last-ditch defense.) The MCR task is T96384: Integrate file revisions with description page history. It would be a huge refactoring (and even MCR is not finished yet), so not something to expect soon, though.

Tgr added a comment. · Jul 19 2019, 10:40 AM

open_nsfw is basically a (publicly available) snapshot of a neural network, not something we'd train ourselves, right? Which means that while it's definitely a good stopgap measure, a clever attacker could probably easily defeat it using an adversarial network.

...a clever attacker could probably easily defeat it using an adversarial network.

I think that's probably true; however, we are more likely trying to stop the drive-by vandal than a serious coordinated attack. If we could reduce the number of inappropriate images admins have to deal with by even 10%, we'd be saving hundreds of hours of people's time.

The technical approach may have flaws, but the real-world application could have great benefits.

Mholloway moved this task from NSFW scoring to Backlog on the MachineVision board. · Aug 1 2019, 8:24 PM
Mholloway moved this task from Backlog to Epics on the MachineVision board. · Aug 1 2019, 9:30 PM
Mholloway moved this task from Epics to Tracking on the MachineVision board.
Restricted Application added a project: Structured-Data-Backlog. · View Herald Transcript · Feb 6 2020, 11:48 AM
Chtnnh added a subscriber: Chtnnh. · Mar 11 2020, 8:59 PM

With all that said, I'm interested in trying to cram this into volunteer time and will give you updates when I manage to make some progress.

@Halfak would now be a good time to bring this back up?

Chtnnh claimed this task. · Mar 19 2020, 8:59 PM

After going through the links mentioned in the task description, I have been able to run a Docker instance of the model and get output on a few test images. The results are satisfactory, and the memory footprint seems manageable considering my system is not very high-end.

I think this can be implemented successfully. I am claiming this task and will be working on it with @Halfak.

In T225664, @Mholloway developed an API implementation of open_nsfw based on open_nsfw--. Using this implementation, hosted at nsfw.wmflabs.org, I wrote a Python script to benchmark the processing speed of the model.

from urllib.request import urlopen
from bs4 import BeautifulSoup

import requests
import re
import time

# Fetch the Special:NewFiles listing of the 500 most recently uploaded files.
html = urlopen('https://commons.wikimedia.org/w/index.php?title=Special:NewFiles&offset=&limit=500&user=&mediatype%5B0%5D=BITMAP&mediatype%5B1%5D=ARCHIVE&mediatype%5B2%5D=DRAWING&mediatype%5B3%5D=MULTIMEDIA&mediatype%5B4%5D=UNKNOWN&start=&end=&wpFormIdentifier=specialnewimages')

bs = BeautifulSoup(html, 'html.parser')
timeSum = 0.0
count = 0

# Collect the thumbnail URLs of the JPEG images in the listing.
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})
for image in images:
    url = image['src']
    start = time.time()
    # POST the image URL to the classifier and time the request.
    req = requests.post("https://nsfw.wmflabs.org", data={"url": url})
    # print(req.text)
    timeSum += time.time() - start
    count += 1

avgTime = timeSum / count
print(timeSum)  # total time spent scoring
print(avgTime)  # average time per image
print(count)    # number of images scored

Output:

801.2340695913221
1.804581237815277
444

This script scrapes the Commons Special:NewFiles page for the 500 most recently uploaded files, passes each image URL to the model hosted on labs, and times each request.

The output shows only 444 images processed, which may be because the model cannot handle very large images. This is a potential complexity that will need to be addressed in this task.

On average it takes about 1.8 s per image, which is decent performance generally speaking, but may be a roadblock if we want upload-time classification.

I also ran the API locally to test compute and memory usage, and the model is efficient in both regards.


TL;DR

The model is already hosted at nsfw.wmflabs.org and serves API requests that POST image URLs.
When tested with the 500 latest images uploaded to Commons, the model successfully processed 444 of them.

Issues:

  1. Why aren't all images processed?
  2. How to ensure all images are processed?
  3. If we cannot make the model faster, how do we achieve upload time classification?
  4. Should we just score images when they are being added to a wiki and ask the user to wait for the scoring to complete before confirming the edit?
Chtnnh added a comment. · Apr 3 2020, 1:50 PM

@MusikAnimal Some follow up questions after the preliminary work I have done on the task:

  1. What is the maximum threshold of time we can wait for the model to process the image to allow upload time classification?
  2. Are we open to custom implementations of a Deep Neural Net that can be trained in wmflabs and then deployed into production?
  3. Can you shed some light on how I could begin integrating nsfw.wmflabs.org instance to track RC and alert AbuseFilter?

Answers to these would really get the task moving. Thank you so much!

Love to all for all this hard work ❤❤❤❤❤❤❤💓💓💓💓💓💓💓💕💕💕💕💖💖💖💖💖🌍🌎🌐🗺🌞🌞🌞🌞✨✨✨✨💡💡💡💡🕯🕯🕯🕯✨✨✨

Chtnnh added a comment. · Apr 7 2020, 3:40 PM

After speaking with @Daimona, I have been informed that we require a MediaWiki extension if we wish to communicate with AbuseFilter. Is it possible to use the MachineVision extension for this, or should I write a custom extension for this task?

Please correct me if I have made any wrong assumptions

Thank you

  1. What is the maximum threshold of time we can wait for the model to process the image to allow upload time classification?

If you are scoring at upload time, I think you could do it in a deferred update so that it doesn't block giving a response to the user. So the scoring just sort of happens in the background and won't slow anything down for the user. This does mean it's possible the score won't immediately be available as soon as the user adds the image to an article, but in theory we're only talking about a 1-2 second window, which I think is acceptable. Hopefully that answers the question.
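As a rough, language-agnostic illustration of that idea (in MediaWiki itself this would be a deferred update rather than a Python thread pool), here is a sketch where scoring is handed to a background worker so the upload response returns immediately; the endpoint and the score store are placeholders.

from concurrent.futures import ThreadPoolExecutor
import requests

executor = ThreadPoolExecutor(max_workers=4)
scores = {}  # stand-in for a real database table

def score_and_store(image_url):
    # Runs in the background; the uploader never waits on this request.
    resp = requests.post("https://nsfw.wmflabs.org", data={"url": image_url}, timeout=60)
    scores[image_url] = float(resp.text)  # assumes the tool returns a bare score

def handle_upload(image_url):
    # Respond to the uploader right away; the score may lag the upload by a
    # second or two, as noted above.
    executor.submit(score_and_store, image_url)
    return "upload accepted"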

  2. Are we open to custom implementations of a Deep Neural Net that can be trained in wmflabs and then deployed into production?

Sure! I don't know how much work that involves, but I've no opposition. All I can say is open_nsfw works like a charm out of the box.

  3. Can you shed some light on how I could begin integrating nsfw.wmflabs.org instance to track RC and alert AbuseFilter?

There's no "alerting" AbuseFilter. Rather, when an edit is made that adds an image, AbuseFilter would look up the score for it, and be able to disallow the edit if it matches some threshold. Basically for this project, I don't think we need to be concerned with AbuseFilter, rather just making it possible for AbuseFilter to use the data. The solution there (as I understand it) is to simply store the image scores at upload time in a database table. Later, AbuseFilter can be updated to fetch these scores.
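As a sketch of that flow, under the assumption of a hypothetical image_nsfw_score table and a scoring endpoint that returns a bare score (the table name, columns, and use of SQLite are purely illustrative; a real deployment would use a production schema):

import sqlite3
import requests

conn = sqlite3.connect("nsfw_scores.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS image_nsfw_score (
        img_name   TEXT PRIMARY KEY,  -- file name, analogous to image.img_name
        nsfw_score REAL NOT NULL,     -- 0.0 (safe) .. 1.0 (NSFW)
        scored_at  TEXT NOT NULL      -- when the score was computed
    )
""")

def score_at_upload(img_name, image_url):
    # Called (conceptually) when a file is uploaded: fetch and persist the score.
    resp = requests.post("https://nsfw.wmflabs.org", data={"url": image_url}, timeout=60)
    conn.execute(
        "INSERT OR REPLACE INTO image_nsfw_score VALUES (?, ?, datetime('now'))",
        (img_name, float(resp.text)),
    )
    conn.commit()

def lookup_score(img_name):
    # The kind of lookup an AbuseFilter variable could eventually be backed by.
    row = conn.execute(
        "SELECT nsfw_score FROM image_nsfw_score WHERE img_name = ?", (img_name,)
    ).fetchone()
    return row[0] if row else None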

Chtnnh added a comment. · Apr 7 2020, 7:43 PM

I think you could do it in a deferred update so that it doesn't block giving a response to the user.

Could you please elaborate on how I would go about doing that?

All I can say is open_nsfw works like a charm out of the box.

If that is the case, I surely don't mind going forward with it, as that would significantly reduce the scope and time of this task.

The solution there (as I understand it) is to simply store the image scores at upload time in a database table. Later, AbuseFilter can be updated to fetch these scores.

So at this point, the scope of this task would only be to implement the scoring mechanism and the DB to store the given scores, if I understand correctly?

Thank you so much for the clarifications!

I also had some additional questions after doing some more work on the task:

  1. Since production cannot rely on services that are hosted on wmflabs.org, do you need me to create a phabricator task to bring the API implementation to production first?
  2. After my discussion with @Daimona , I understand that allowing AbuseFilter to retrieve the scores in the DB is a nontrivial task. Do you want me to file a phabricator task for that as well?

Thank you!

He7d3r added a subscriber: He7d3r. · Apr 7 2020, 8:28 PM

Since production cannot rely on services that are hosted on wmflabs.org, do you need me to create a phabricator task to bring the API implementation to production first?

That is correct. Be prepared, though: the standards for services deployed within production are significantly higher than on Toolforge, so it will probably require a fair amount of work to polish your service. Anyway, you'd need to file a ticket following the process described here - don't bother with the rest of that page; it's for Node.js services, so it probably wouldn't apply to yours.

As for integration with MediaWiki - I strongly suggest talking with @Mholloway and trying to integrate your new classifier into the MachineVision extension - what you're going to do seems like a very good fit there.

Thank you so much for your review, @Pchelolo.

For now I will wait for approval from @MusikAnimal and @Mholloway before filing the ticket. Is there anything else you would like to add, or should I proceed with filing it?

The new service request has been filed (T250110); if you have any input on it, please feel free to comment on that task.

Imtiaz added a subscriber: Imtiaz. · Apr 30 2020, 11:44 AM
SHEKH added a subscriber: SHEKH. · Sun, May 3, 8:16 AM