
[Spike] Investigate the effect of external links on the likelihood of deletion of an image
Closed, Resolved (Public)

Description

If an image is uploaded and the user adds a link to the site where they found it (the source field in the {{Information}} template), this might be a good signal for whether the image is likely to be deleted, for example if the link is to Google or Facebook.

Event Timeline

AUgolnikova-WMF renamed this task from Investigate the effect of external links on the likelihood of deletion of an image to [Spike] Investigate the effect of external links on the likelihood of deletion of an image. Jul 4 2024, 1:10 PM

Uploaded in 2023

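| source contains a link to | deleted | not deleted | proportion deleted |
| --- | --- | --- | --- |
| facebook | 2868 | 360 | 0.89 |
| google | 835 | 804 | 0.51 |
| instagram | 1361 | 9 | 0.99 |
| pinterest | 228 | 24 | 0.90 |
| youtube | 2026 | 83 | 0.96 |
| gettyimages | 193 | 17 | 0.92 |
| reddit | 155 | 4 | 0.97 |
| shutterstock | 20 | 1 | 0.95 |
| alamy | 44 | 13 | 0.77 |
| istockphoto | 23 | 1 | 0.96 |
| tiktok | 20 | 1 | 0.96 |
| fbcdn | 0 | 0 | - |
| cdninstagram | 3 | 0 | 1 |
| amazon | 98 | 11 | 0.90 |
| media-amazon | 34 | 0 | 1 |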

Ignoring Google, because it's only ~50/50, gives us 7597 files uploaded in 2023 that we can identify as having a ~93% chance of being deleted.

Note that's around 7k deletions in 2023, compared to ~8.5k for logos.
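
For anyone wanting to re-check those totals, here is a minimal sketch that recomputes them from the counts in the 2023 table above. The `counts` dictionary and the `summarize` helper are illustrative names introduced here, not part of the original query.

```python
# Counts transcribed from the 2023 table above: site -> (deleted, not deleted).
counts = {
    "facebook": (2868, 360),
    "google": (835, 804),
    "instagram": (1361, 9),
    "pinterest": (228, 24),
    "youtube": (2026, 83),
    "gettyimages": (193, 17),
    "reddit": (155, 4),
    "shutterstock": (20, 1),
    "alamy": (44, 13),
    "istockphoto": (23, 1),
    "tiktok": (20, 1),
    "fbcdn": (0, 0),
    "cdninstagram": (3, 0),
    "amazon": (98, 11),
    "media-amazon": (34, 0),
}


def summarize(counts, exclude=()):
    """Return (deleted, total, proportion deleted) over all sites not excluded."""
    deleted = sum(d for site, (d, _) in counts.items() if site not in exclude)
    kept = sum(k for site, (_, k) in counts.items() if site not in exclude)
    total = deleted + kept
    proportion = deleted / total if total else None
    return deleted, total, proportion


# Hypothetical helper call: drop google because its split is roughly 50/50.
deleted, total, proportion = summarize(counts, exclude={"google"})
print(deleted, total, round(proportion, 2))  # 7073 7597 0.93
```

So the 7597 figure is the total number of files across the remaining sites, of which 7073 (the "around 7k") were deleted.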

Could you add the age of the account to the above query? You could also match not only a link but also a plain mention, e.g. "source=Google".

@Yann thanks for the suggestion! Yes, we will be able to match a mention in the wikitext as well. Over the coming month we will be working on integrating this into the flow and providing a search capability to moderators as part of https://phabricator.wikimedia.org/T375264, if you want to follow the work. As for adding the age of the account, it's a bit more work, but maybe we can consider it afterwards.
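
To make the link-versus-mention distinction above concrete, here is a rough sketch of how a source value could be classified. It is not the query used for the table or the implementation planned in T375264. The `SITES` list is taken from the table, while `classify_source`, the URL pattern, and the assumption that the source field has already been extracted from the wikitext are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Site names from the table above; matched against hostname labels.
SITES = (
    "facebook", "google", "instagram", "pinterest", "youtube",
    "gettyimages", "reddit", "shutterstock", "alamy", "istockphoto",
    "tiktok", "fbcdn", "cdninstagram", "amazon", "media-amazon",
)

# Assumed, deliberately simple URL pattern; stops at wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\]\[|}<>\"]+", re.IGNORECASE)


def classify_source(source_text):
    """Map each listed site found in the source text to 'link' or 'mention'."""
    found = {}
    # 1. Links: compare each hostname label with the site names.
    for url in URL_RE.findall(source_text):
        host = (urlparse(url).hostname or "").lower()
        labels = host.split(".")
        for site in SITES:
            if site in labels:
                found[site] = "link"
    # 2. Mentions: look for the bare site name outside of any URL.
    remainder = URL_RE.sub(" ", source_text).lower()
    for site in SITES:
        pattern = rf"(?<![\w-]){re.escape(site)}(?![\w-])"
        if site not in found and re.search(pattern, remainder):
            found[site] = "mention"
    return found


print(classify_source("https://www.facebook.com/photo/123"))
print(classify_source("Google"))
```

Matching on whole hostname labels keeps `amazon` and `media-amazon` separate, as they are in the table. A plain mention is a weaker signal than a link, so it is reported separately rather than merged.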

People often upload scans of public domain works to Facebook that aren't available anywhere else on the internet; this is true for a lot of the websites on the list. Another example is YouTube: there are public domain silent films on YouTube that shouldn't be excluded.

The best course of action would be adding an automatic "needs human review" template, but then there is also the question of whether this is an issue with new users vs. experienced users.

Also, Google (and other search engines) sometimes cache public domain images that were deleted at the original domain. I'm not sure we need to add extra clicks again. These links should be tagged for review *after* upload; adding unnecessary warning templates before uploading will prevent people from uploading free images because they think "oh no, I don't want to be a spammer".

Sometimes museums upload images on social media that can be downloaded from social media but not their own websites.

Thanks @DonTrung. Right now we're working on T375264 where we just add structured data to the upload based on the source field in UploadWizard - this will make it possible for moderators to find uploads via UploadWizard from Facebook (or wherever) but won't affect the uploader's flow.

We have a separate ticket T377443 for warning users as they're uploading. Obviously, if we make it difficult to upload then we limit our supply of good files, while if we make it too easy then our moderators get overwhelmed. Getting the balance right is tricky, so it's not scheduled for implementation at the minute. We'll see how T375264 goes first, and probably have some community discussion around it too.

@DonTrung As @Cparle mentioned, yes, the UI warning is out of scope for the main ticket https://phabricator.wikimedia.org/T375264 that we are working on. As for the UI warning, we asked on the project page whether it could be useful and suggested a draft design, but it is not planned, as we would like to get community feedback. So please feel free to comment there as well. And thanks so much for your feedback!