
Implement NSFW image classifier using Open NSFW
Open, High, Public

Description

Library: https://github.com/rahiel/open_nsfw--

It uses Python and the Caffe model serialization format. We could probably include this in ORES without too much trouble, depending on what memory usage looks like.
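For reference, open_nsfw-- is essentially a thin wrapper around a pretrained Caffe network, so scoring one image looks roughly like the sketch below. The file names, the 'prob' output layer, and the mean values are taken from the upstream yahoo/open_nsfw example and may not match this fork exactly; treat it as a sketch rather than a drop-in script.

```
import caffe
import numpy as np

# Load the pretrained network. File names and the 'prob' layer follow the
# upstream yahoo/open_nsfw layout (assumption); adjust if the fork differs.
net = caffe.Net('nsfw_model/deploy.prototxt',
                'nsfw_model/resnet_50_1by2_nsfw.caffemodel',
                caffe.TEST)

# Standard Caffe preprocessing: HWC -> CHW, RGB -> BGR, mean subtraction.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104, 117, 123]))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image('example.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)
outputs = net.forward()

# outputs['prob'][0] is assumed to be [sfw_probability, nsfw_probability].
nsfw_score = outputs['prob'][0][1]
print('NSFW score: %.3f' % nsfw_score)
```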

Intended use: Flag edits to Wikipedia articles that add NSFW images/media for review.
Strategy:

  1. See if ORES can even handle this.
  2. Try hosting a model in labs.
  3. Look into tools that might help reviewers who are trying to catch vandalism that follows this pattern.

Background:

Some vandalism comes in the form of edits that inappropriately add NSFW images to articles. The clear intention behind this type of vandalism is to shock an innocent reader. By flagging edits that add NSFW media for secondary review, we can help patrollers track these types of edits.

For clarity, there is no plan to use this to filter images from articles for which they are appropriate. Many NSFW images are welcome on Commons and in various Wikipedia articles. The intention is instead to help patrollers deal with a specific type of vandalism.


Event Timeline

Halfak created this task. Jan 18 2019, 8:36 PM
Restricted Application added a subscriber: Aklapper. Jan 18 2019, 8:36 PM

Given previous controversy around this sort of thing, we should be extremely careful about how such a classifier gets used.

#wikimedia-ai logs:
2019-01-18 21:31:50 <halfak> harej, we have a specific request. It's not about removing NSFW content from commons, but flagging when NSFW content is added to an article -- for review.
2019-01-18 21:32:17 <harej> Ooh, that's interesting. Do you know more about the use case?
2019-01-18 21:32:57 <halfak> I'll CC you on the thread :)

It might be a better idea to have the details come here instead of a private email thread.

Uh. That's IRC and this is a new task :) Thank you for copying them though.

Yes. I believe that when opening a task around a subject that may involve controversy, it can be helpful to provide context showing how this is not the same thing that was argued over last time, and that transparency is highly valued around here :)

lcawte added a subscriber: lcawte. Jan 18 2019, 10:32 PM
Halfak updated the task description. Jan 18 2019, 10:36 PM
Halfak updated the task description. Jan 18 2019, 10:44 PM
Halfak updated the task description.

I added some details to the task description that should at least make the intention clear. This task is still very much in our backlog and is sitting in the "research & analysis" column because we're not sure what to do with it yet. Still, I made it clear in the description what we are and are not considering using this model for.

Halfak changed the visibility from "Public (No Login Required)" to "Custom Policy". Jan 22 2019, 3:14 PM

Why has an anti-vandalism ticket like this become private? It appears to go even further than standard security: it's actually #acl*security_team instead of Security?

I should probably have just done Security. I'll change that.

But I made this private because @MusikAnimal was worried about talking publicly about the vandalism attack strategies that are getting more and more clever.

Halfak changed the visibility from "Custom Policy" to "Custom Policy". Jan 22 2019, 4:33 PM

Given the number of random members of the public that've emailed OTRS about such attacks, I don't think it's a particularly well-kept secret at this point.


Indeed, the problem we are facing is by no means a secret. It even made the news! https://www.theverge.com/tldr/2018/11/22/18108195/apple-siri-iphone-donald-trump-penis-wikipedia-fail-vandalism-editing

What we're doing to stop it is a different matter. In the email chain with Halfak, I was illustrating the M.O. of the vandal(s) and our current, sub-par strategies to identify them, as I felt this was relevant to how we might implement our image classifier. The CheckUsers/admins have been pretty diligent about keeping this information away from public view, especially details around how our AbuseFilters work. If they are private elsewhere, I figure they should be here too.

The concept of an NSFW image classifier of course doesn't need to be private. If you want, I can continue to use email for the sensitive information and keep this task open? I've no strong feelings; I just don't want to tip the vandals off! :)

Halfak changed the visibility from "Custom Policy" to "Custom Policy". Jan 22 2019, 5:41 PM

As a first step, I'm happy to help with creating a Toolforge tool that provides an API to get an NSFW score on demand, using the existing open_nsfw library (though I've no experience with Python). This should be fine, as it would allow automation to take action just after an edit is made. It's worth noting, however, that depending on the extent of the vandalism and the popularity of the tool, there could be a LOT of requests. On English Wikipedia, my current plan would be to have a bot monitor for specific AbuseFilters being tripped (e.g. a new user adding an image) and then query the API to see if the image is NSFW. This would keep the request rate fairly low, but obviously other wikis may be interested too, the vandalism could become a more widespread trend, etc.
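For concreteness, a minimal version of such an on-demand scoring API might look like the sketch below. It assumes Flask and requests are available; score_image() is a hypothetical wrapper around the open_nsfw model (e.g. the Caffe snippet earlier in this task), not an existing function in the library.

```
import tempfile

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/score')
def score():
    """Return an NSFW score for the image at ?url=..."""
    image_url = request.args.get('url')
    if not image_url:
        return jsonify(error='missing url parameter'), 400
    # Fetch the image, then run it through the classifier.
    resp = requests.get(image_url, timeout=10)
    resp.raise_for_status()
    with tempfile.NamedTemporaryFile(suffix='.jpg') as tmp:
        tmp.write(resp.content)
        tmp.flush()
        nsfw = score_image(tmp.name)  # hypothetical wrapper around the Caffe model
    return jsonify(url=image_url, nsfw=round(float(nsfw), 4))


if __name__ == '__main__':
    app.run()
```

A bot watching for AbuseFilter hits could then call /score?url=... for each image added by a flagged edit and act only on scores above some threshold.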

As for ORES, the same applies: if we have an API to get the score for just the NSFW model, we are in good shape :)

Longer-term, it would be awesome to stop this vandalism preemptively. I mentioned over email that it may be better to introduce a new system to score the images themselves (mapped to the MediaWiki image table), as they are uploaded/updated, rather than revisions that add the images. I think this would make it possible to create an AbuseFilter variable, such that we could disallow addition of the image if needed. I assume it would also speed up the ORES processing, since you wouldn't need to re-score the images with each edit. But, that sounds like a lot more work -- not just for ORES, but also AbuseFilter, so maybe let's side-step that idea for now.

What we're doing to stop it is a different matter. In the email chain with Halfak, I was illustrating the M.O. of the vandal(s) and our current, sub-par strategies to identify them, as I felt this was relevant to how we might implement our image classifier. The CheckUsers/admins have been pretty diligent about keeping this information away from public view, especially details around how our AbuseFilters work. If they are private elsewhere, I figure they should be here too.
The concept of an NSFW image classifier of course doesn't need to be private. If you want, I can continue to use email for the sensitive information and keep this task open? I've no strong feelings; I just don't want to tip the vandals off! :)

While CheckUsers may decide to keep information private, that isn't how Wikimedia software development works. Please see https://www.mediawiki.org/wiki/Technical_Collaboration_Guidance/Principles:

  1. Transparent and responsible - Development of a product should be in the open, with public feedback loops whenever possible. Decisions should be fully accounted for, and should be clearly explained and well-documented.

There is a provision for private planning, but that's limited to "early technical specifications". Whether the current work falls under that early stuff, I can't really tell. I do expect that once it passes that stage this task will be made public, and decisions that go into whatever software we run are publicized, just like the code itself.

MusikAnimal added a comment (edited). Jan 23 2019, 8:10 PM

While CheckUsers may decide to keep information private, that isn't how Wikimedia software development works. Please see https://www.mediawiki.org/wiki/Technical_Collaboration_Guidance/Principles:

  1. Transparent and responsible - Development of a product should be in the open, with public feedback loops whenever possible. Decisions should be fully accounted for, and should be clearly explained and well-documented.

There is a provision for private planning, but that's limited to "early technical specifications". Whether the current work falls under that early stuff, I can't really tell. I do expect that once it passes that stage this task will be made public, and decisions that go into whatever software we run are publicized, just like the code itself.

Yes, of course :) Sorry to confuse you all. What I said over email was mostly examples detailing how the vandal could evade our current mitigation tactics. What needed to be said for planning (from my perspective) is at T214201#4899971. The bot will still need to go through a public request for approval, the API will be publicly exposed, etc., so no secrets here. Feel free to make this task public :)

Barring objection, I think I'll take a stab at creating a Toolforge tool for https://github.com/rahiel/open_nsfw-- hopefully in the coming days. I will likely enlist some Python experts to help me.

Legoktm changed the visibility from "Custom Policy" to "Public (No Login Required)". Jan 23 2019, 8:19 PM

Feel free to make this task public :)

Thanks, and done.

Miriam added a subscriber: Miriam. Jan 24 2019, 1:48 AM
JJMC89 added a subscriber: JJMC89. Feb 3 2019, 8:18 PM
aezell added a subscriber: aezell. Mar 14 2019, 1:26 PM
MusikAnimal added a comment (edited). Mar 21 2019, 9:14 PM

The saga continues.

@Halfak and other kind, smart people... I'm pleading that we get this show on the road. I have set up the open_nsfw-- app on VPS at http://nsfw.wmflabs.org, but it is not working properly. Installing Caffe seemed really complicated, so I instead tried to use the Docker container to build the environment and route traffic to it. I believe that's the only issue here -- exposing the app within Docker to incoming traffic. If you think you can help with this, let me know and I will add you as a maintainer.

The web app is great as sort of a trial, and I could author a bot task to make use of it, too. However, we really need these NSFW scores in ORES, and in a dedicated table keyed to the image table. Revision-based scoring I don't think is that useful here, as we can't be preemptive about stopping the vandalism. Sometimes every second counts. If we score images as they are uploaded, we should be able to create an AbuseFilter variable that gets the scores of all images added with an edit. With that we can use various heuristics to tell if the image is appropriate for the article it's being added to, and if not it can disallow the edit altogether. That would be absolutely amazing... :) Also I assume going by images rather than revisions would be better in terms of storage, since there are a lot more revisions than images.
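As a rough illustration of the upload-time idea (not a concrete proposal for how ORES or AbuseFilter would implement it), a small watcher could listen to the public recentchange EventStream, pick out upload log events, score each new file version, and store the score keyed by file name. score_image_url() below is hypothetical, standing in for whatever scorer ends up existing (e.g. the Toolforge API above), and the SQLite table is only a stand-in for a real image-score table.

```
import json
import sqlite3

from sseclient import SSEClient  # pip install sseclient-py

STREAM_URL = 'https://stream.wikimedia.org/v2/stream/recentchange'

db = sqlite3.connect('nsfw_scores.db')
db.execute('CREATE TABLE IF NOT EXISTS image_score '
           '(wiki TEXT, title TEXT, nsfw REAL, PRIMARY KEY (wiki, title))')

for event in SSEClient(STREAM_URL):
    if not event.data:
        continue
    change = json.loads(event.data)
    # Uploads and reuploads surface as log events with log_type "upload".
    if change.get('log_type') != 'upload':
        continue
    title = change['title']  # e.g. "File:Example.jpg"
    score = score_image_url(change['wiki'], title)  # hypothetical scorer
    db.execute('INSERT OR REPLACE INTO image_score VALUES (?, ?, ?)',
               (change['wiki'], title, score))
    db.commit()
```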

I understand this will probably have to be a volunteer effort. I want you to know I'm here to help every step of the way. I just need some guidance. If you think adding a separate ORES table for image scores is a good idea, I can probably handle creating the schema, doing tests, etc., and chaperoning it through DBA approval. Integrating the NSFW classifier into ORES is beyond me, though.

A long list of volunteers and I have been battling this vandalism, every day, for almost a year now. This is not limited to the person behind T210192. It has grown into a general trend that's impossible to predict and adequately control. Our existing AbuseFilters only go so far, regularly produce false positives, and still involve a significant amount of manual intervention. We need machine learning 🙏🙏🙏

Sorry to leave you hanging, @MusikAnimal. The Scoring-platform-team is currently understaffed, and we're doing our best to stay on top of our current obligations. We'd love to pick up this work, but we are struggling and likely won't be able to start on it this quarter.

Normally, we might at least start experimenting with a model we can't commit to in a given quarter, but this is a different type of model than we generally work with. Regrettably, we don't have any specific facilities to *implement* an image classifier, so it would require some refactoring to get the system to work at all.

With all that said, I'm interested in trying to cram this into volunteer time and will give you updates when I manage to make some progress.

Harej triaged this task as High priority. Apr 9 2019, 9:07 PM

[...] it may be better to introduce a new system to score the images themselves [...] as they are uploaded/updated, rather than revisions that add the images. I think this would make it possible to create an AbuseFilter variable, such that we could disallow addition of the image if needed. I assume it would also speed up the ORES processing, since you wouldn't need to re-score the images with each edit.

Revision-based scoring I don't think is that useful here, as we can't be preemptive about stopping the vandalism. Sometimes every second counts. If we score images as they are uploaded, we should be able to create an AbuseFilter variable that gets the scores of all images added with an edit. With that we can use various heuristics to tell if the image is appropriate for the article it's being added to, and if not it can disallow the edit altogether. That would be absolutely amazing... :) Also I assume going by images rather than revisions would be better in terms of storage, since there are a lot more revisions than images.

I strongly agree with this thinking. Scoring included images for NSFW characteristics on wiki page revisions might make sense as a sort of bridge while we're getting the existing corpus of images scored, but we should really be doing this scoring at upload time.

Mholloway moved this task from Backlog to NSFW scoring on the Machine vision board. Jul 5 2019, 4:19 PM
Mholloway updated the task description. Jul 8 2019, 9:11 PM
Tgr added a subscriber: Tgr. Jul 17 2019, 3:06 PM

Scoring revisions of the file page instead of revisions of the image is a somewhat dirty hack, but it doesn't really have disadvantages other than maybe performance (there are more page revisions than file revisions), since a new revision of the file page is created every time a file is uploaded or reuploaded. Some care would have to be taken in how changes are detected (sometimes the revisions are null revisions; sometimes they aren't sent to recentchanges), but other than that it's viable, and it probably avoids work that won't be necessary long term, as eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

eventually the image table is supposed to be merged into the revision table, with images becoming MCR slots.

Interesting. Is there a Phab task or wiki page describing this?

Tgr added a comment. Jul 18 2019, 5:17 PM

T28741 is about having sane keys and a revision-like structure. Revisions are currently stored in the revision table, keyed by an auto-incremented ID, moved to archive on page deletion, and flagged on revision deletion. For images, the current version lives in the image table, is moved to oldimage when a new version is uploaded, and to filearchive when the image or a revision of it gets deleted, which results in a lot of cross-table copying; and they are keyed by upload timestamp, resulting in various hacks to deal with two images being uploaded at the same time -- you'll find a sleep(1) somewhere in the codebase as a last-ditch defense. The MCR task is T96384: Integrate file revisions with description page history. It would be a huge refactoring (and even MCR is not finished yet), so it's not something to expect soon.
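Until something like that lands, the action API already abstracts over the image/oldimage/filearchive split, so a scorer can enumerate every version of a file without touching those tables directly. A rough sketch against the Commons API, using standard prop=imageinfo parameters:

```
import requests

API = 'https://commons.wikimedia.org/w/api.php'


def file_versions(title):
    """Yield (timestamp, sha1, url) for every uploaded version of a file."""
    params = {
        'action': 'query',
        'prop': 'imageinfo',
        'titles': title,
        'iiprop': 'timestamp|sha1|url',
        'iilimit': 'max',
        'format': 'json',
        'formatversion': 2,
    }
    data = requests.get(API, params=params).json()
    for info in data['query']['pages'][0].get('imageinfo', []):
        yield info['timestamp'], info['sha1'], info['url']


# Each version could be scored once, keyed by SHA-1, so reuploads of
# identical bytes aren't scored twice.
for ts, sha1, url in file_versions('File:Example.jpg'):
    print(ts, sha1, url)
```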

Tgr added a comment. Jul 19 2019, 10:40 AM

open_nsfw is basically a (publicly available) snapshot of a neural network, not something we'd train ourselves, right? Which means that while it's definitely a good stopgap measure, a clever attacker could probably easily defeat it using an adversarial network.

...a clever attacker could probably easily defeat it using an adversarial network.

I think that's probably true; however, we are more likely trying to stop the drive-by vandal than a serious, coordinated attack. If we could reduce the number of inappropriate images admins have to deal with by 10%, we'd be saving hundreds of hours of people's time.

The technical approach may have flaws, but the real-world application could have great benefits.

Mholloway moved this task from NSFW scoring to Backlog on the Machine vision board. Aug 1 2019, 8:24 PM
Mholloway moved this task from Backlog to Epics on the Machine vision board. Aug 1 2019, 9:30 PM
Mholloway moved this task from Epics to Tracking on the Machine vision board.