
Gather documentation about past efforts / approaches / community concerns on NSFW detection
Closed, Resolved · Public

Description

Building ML models to support tasks such as NSFW detection for imagery associated with the Wikimedia projects is neither a new task / request nor a straightforward one (there are many different approaches that one might take). This task will cover the work to gather information around the following aspects and make it easily accessible (likely on meta) for this project or any future investigations:

  • Past attempts to build similar models
  • Concerns raised in the past about the potential models and how they might be applied
  • Approaches taken for this modeling on other platforms and issues that have arisen
  • Different potential taxonomies -- e.g., binary yes/no vs. more nuanced types of imagery

Event Timeline

First pass on the history of this sort of modeling for Wikimedia:

Adding a few more:

I did a quick analysis of an existing tool on English Wikipedia that helps flag potential NSFW imagery being added so that patrollers can quickly check the edits and respond. Summary based on three years of data (see the sketch after this list for how such monthly aggregates might be computed):

  • total # images added that were flagged as possible NSFW: 4305
    • average images per month: 124
    • median images per month: 113
  • unique # images added that were flagged as possible NSFW: 2019
    • average unique images per month: 71
    • median unique images per month: 75
  • total # users adding images: 2302
    • average users per month: 68
    • median users per month: 66
  • based on a sample of 50 images and inspecting their titles (very coarse):
    • % unique images that probably were reverted as NSFW: 46%
    • % total images that probably were reverted as NSFW: 56%
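For reference, here is a minimal sketch of how per-month aggregates like these could be computed. It assumes a hypothetical CSV export with `timestamp`, `image_title`, and `user` columns; the file name and column names are illustrative and are not the actual dataset or script used for the numbers above.

```python
# Minimal sketch (hypothetical input format, not the original analysis script):
# aggregate flagged image additions into per-month totals, unique-image counts,
# and user counts, then report averages and medians across months.
import csv
from collections import defaultdict
from statistics import mean, median

monthly_images = defaultdict(list)   # "YYYY-MM" -> list of flagged image titles added that month
monthly_users = defaultdict(set)     # "YYYY-MM" -> set of users who added flagged images

with open("flagged_image_additions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        month = row["timestamp"][:7]              # "YYYY-MM" from an ISO 8601 timestamp
        monthly_images[month].append(row["image_title"])
        monthly_users[month].add(row["user"])

totals_per_month = [len(titles) for titles in monthly_images.values()]
uniques_per_month = [len(set(titles)) for titles in monthly_images.values()]
users_per_month = [len(users) for users in monthly_users.values()]

all_titles = [t for titles in monthly_images.values() for t in titles]
all_users = set().union(*monthly_users.values()) if monthly_users else set()

print("total flagged images:", len(all_titles))
print("  avg / median per month:", round(mean(totals_per_month)), "/", median(totals_per_month))
print("unique flagged images:", len(set(all_titles)))
print("  avg / median unique per month:", round(mean(uniques_per_month)), "/", median(uniques_per_month))
print("total users adding flagged images:", len(all_users))
print("  avg / median users per month:", round(mean(users_per_month)), "/", median(users_per_month))
```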

Altogether, this supports the conclusion that NSFW vandalism on English Wikipedia is a consistent issue facing patrollers and that a tool can help identify quite a bit of vandalism even though the range of images is quite large. Spot-checking a few images in the dataset indicated that they had since been deleted for vandalism, so this also suggests that a relatively static blocklist of image files is not necessarily sufficient and that a tool able to evaluate arbitrary imagery has additional value in flagging potentially problematic edits. It also shows that these tools are not perfect, so they definitely need an editor who can evaluate the specific image / context to decide how to handle each case.

Aklapper renamed this task from Gather documentation about past efforts / approaches / community concerns to Gather documentation about past efforts / approaches / community concerns on NSFW detection. Jul 15 2022, 7:25 PM

I'll attempt to summarize the community's general approach and some key concerns that have been raised.

First, "NSFW" should be replaced with "controversial content". This is the term used by the community. The "NSFW" issue is intertwined with a variety of other objectionable content, and it turns out that NSFW doesn't even top that list.(New York Times)

The key to the community's approach is that *we host educational content*. Our approach is to judge content on its educational value and purpose. Our position is that it is invalid and inappropriate to target content based on "NSFW" or other claims of offensiveness. Such content can only be disputed if it lacks legitimate purpose, or if you are proposing an equal or better replacement to fulfill that same purpose.

The biggest problem the community faces, anywhere and in any way, is the disruptive stress and wasted time of unproductive argument. Wasted time directly subtracts from constructive volunteer labor elsewhere. Stress causes volunteers to burn out and quit - either temporarily or permanently. This underlies the concerns below:

The issue of content filtering and censorship has been a recurring and stressful time-sink. The issue has been repeatedly debated in depth and repeatedly resolved. It is especially stressful and disruptive due to frequent crusaders who continue to war beyond reasonable and accepted bounds for resolution.

One of the reasons the community has rejected filtering/rating is that it unavoidably raises endlessly disruptive arguments, both about rating criteria in general and about how individual content is rated. This is a matter of subjective cultural opinion. We want people writing new articles and improving existing articles, not wasting time on utterly unproductive, utterly hopeless, and grossly disruptive arguments over whether content is rated correctly.

A closely related concern, and a key concern here, is that *technical specifications are irrelevant* when the mere existence of a rating system results in disruptive social dynamics. A rating system that does not filter anything may, *in theory*, be acceptable. However, the mere existence of a rating system will invariably result in an endlessly disruptive stream of people demanding that it "easily and obviously" be applied as a filter. The mere existence of a rating system is viewed as inherently disruptive to our work. This is why various experienced editors immediately reacted with derision or hostility when a non-filtering rating system was suggested on Commons. They know what the social dynamics would be if such a system were built, and dread the pain and disruption it would bring.

Another prime concern, nearly unanimous in the 2011 image filter referendum, was cultural neutrality. This is already too long, so I'll just say that the 2011 initiative was abandoned in part because the cultural-neutrality requirement was impossible to satisfy. A machine-learning system imposing a rating standard is not remotely culturally neutral. I think almost everyone in the machine-learning field is acutely aware of that fact.

I'm assuming that this task is resolved. If not, please reopen and add an active project tag so this task can be found on some workboard. Thanks.