
Gather documentation about past efforts / approaches / community concerns on NSFW detection
Closed, Resolved · Public

Description

Building ML models to support tasks such as NSFW detection for imagery associated with the Wikimedia projects is neither a new task / request nor a straightforward one (there are many different approaches that one might take). This task will cover the work to gather information around the following aspects and make it easily accessible (likely on meta) for this project or any future investigations:

  • Past attempts to build similar models
  • Concerns raised in the past about the potential models and how they might be applied
  • Approaches taken for this modeling on other platforms and issues that have arisen
  • Different potential taxonomies -- e.g., binary yes/no vs. more nuanced types of imagery

Event Timeline

First pass on the history of this sort of modeling for Wikimedia:

Adding a few more:

I did a quick analysis of an existing tool on English Wikipedia that helps flag potential NSFW imagery being added so that patrollers can quickly check the edits and respond. Summary based on three years of data (see the sketch after this list for how such monthly aggregates might be computed):

  • total # images added that were flagged as possible NSFW: 4305
    • average images per month: 124
    • median images per month: 113
  • unique # images added that were flagged as possible NSFW: 2019
    • average unique images per month: 71
    • median unique images per month: 75
  • total # users adding images: 2302
    • average users per month: 68
    • median users per month: 66
  • based on a sample of 50 images and inspecting their titles (very coarse):
    • % unique images that probably were reverted as NSFW: 46%
    • % total images that probably were reverted as NSFW: 56%
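For reference, here is a minimal sketch of how per-month aggregates like these could be computed. It assumes a hypothetical CSV export with `timestamp`, `image_title`, and `user` columns; the file name and column names are illustrative and are not the actual dataset or script used for the numbers above.

```python
# Minimal sketch (hypothetical input format, not the original analysis script):
# aggregate flagged image additions into per-month totals, unique-image counts,
# and user counts, then report averages and medians across months.
import csv
from collections import defaultdict
from statistics import mean, median

monthly_images = defaultdict(list)   # "YYYY-MM" -> list of flagged image titles added that month
monthly_users = defaultdict(set)     # "YYYY-MM" -> set of users who added flagged images

with open("flagged_image_additions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        month = row["timestamp"][:7]              # "YYYY-MM" from an ISO 8601 timestamp
        monthly_images[month].append(row["image_title"])
        monthly_users[month].add(row["user"])

totals_per_month = [len(titles) for titles in monthly_images.values()]
uniques_per_month = [len(set(titles)) for titles in monthly_images.values()]
users_per_month = [len(users) for users in monthly_users.values()]

all_titles = [t for titles in monthly_images.values() for t in titles]
all_users = set().union(*monthly_users.values()) if monthly_users else set()

print("total flagged images:", len(all_titles))
print("  avg / median per month:", round(mean(totals_per_month)), "/", median(totals_per_month))
print("unique flagged images:", len(set(all_titles)))
print("  avg / median unique per month:", round(mean(uniques_per_month)), "/", median(uniques_per_month))
print("total users adding flagged images:", len(all_users))
print("  avg / median users per month:", round(mean(users_per_month)), "/", median(users_per_month))
```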

Altogether, this supports the conclusion that NSFW vandalism on English Wikipedia is a consistent issue facing patrollers and that a tool can help identify quite a bit of vandalism even though the range of images is quite large. Spot-checking a few images in the dataset indicated that they had since been deleted for vandalism, so this also suggests that a relatively static blocklist of image files is not necessarily sufficient and that a tool able to evaluate arbitrary imagery has additional value in flagging potentially problematic edits. It also shows that these tools are not perfect, so they definitely need an editor who can evaluate the specific image / context to decide how to handle each case.

Aklapper renamed this task from Gather documentation about past efforts / approaches / community concerns to Gather documentation about past efforts / approaches / community concerns on NSFW detection. Jul 15 2022, 7:25 PM

I'll attempt to summarize the community's general approach and some key concerns that have been raised.

First, "NSFW" should be replaced with "controversial content". This is the term used by the community. The "NSFW" issue is intertwined with a variety of other objectionable content, and it turns out that NSFW doesn't even top that list.(New York Times)

The key to the community's approach is that *we host educational content*. Our approach is to judge content on its educational value and purpose. Our position is that it is invalid and inappropriate to target content based on "NSFW" or other claims of offensiveness. Such content can only be disputed if it lacks legitimate purpose, or if you are proposing an equal or better replacement to fulfill that same purpose.

The biggest problem the community faces, anywhere and in any way, is the disruptive stress and wasted time of unproductive argument. Wasted time directly subtracts from constructive volunteer labor elsewhere. Stress causes volunteers to burn out and quit - either temporarily or permanently. This underlies the concerns below:

The issue of content filtering and censorship has been a recurring and stressful time-sink. The issue has been repeatedly debated in depth and repeatedly resolved. It is especially stressful and disruptive due to frequent crusaders who continue to war beyond reasonable and accepted bounds for resolution.

One of the reasons the community has rejected filtering/rating is that it unavoidably raises endlessly disruptive arguments, both about rating criteria in general and about how individual content is rated. This is a matter of subjective cultural opinion. We want people writing new articles and improving existing articles, not wasting time on utterly unproductive, utterly hopeless, and grossly disruptive arguments over whether content is rated correctly.

A closely related concern, and a key concern here, is that *technical specifications are irrelevant* when the mere existence of a rating system results in disruptive social dynamics. A rating system that does not filter anything may, *in theory*, be acceptable. However, the mere existence of a rating system will invariably result in an endlessly disruptive stream of people demanding that it "easily and obviously" be applied as a filter. The mere existence of a rating system is viewed as inherently disruptive to our work. This is why various experienced editors immediately reacted with derision or hostility when a non-filtering rating system was suggested on Commons. They know what the social dynamics would be if such a system were built, and dread the pain and disruption it would bring.

Another prime concern, nearly unanimous in the 2011 image filter referendum, was cultural neutrality. This is already too long, so I'll just say that the 2011 initiative was abandoned in part because the cultural-neutrality requirement was impossible to satisfy. A machine-learning system imposing a rating standard is not remotely culturally neutral. I think almost everyone in the machine-learning field is acutely aware of that fact.

I'm assuming that this task is resolved. If not, please reopen and add an active project tag so this task can be found on some workboard. Thanks.