
Large amounts of unwanted files (mostly copyvios) uploaded via cross-wiki upload tool (A/B test of different upload interfaces)
Closed, Invalid (Public)

Description

Commons is seeing a large number of files uploaded via the cross-wiki upload dialog that are copyright violations. In order to fix this, we could...

  • On clicking "Upload", bold the license requirement, and add another message saying "Be sure, now"
  • Require the user to type the string "I understand that I must be the owner of the copyright on this image in order to upload it" into a text box to confirm
  • Clarify what "copyright owner" means, something like "Did you take this picture? Did you draw this drawing?" etc. etc.

Beware of copying the UploadWizard process. Too bulky.

Related:

Context on wizards:


We ran an A/B test of four different interface options. They weren't very successful. https://www.mediawiki.org/wiki/Multimedia/December_2015_cross-wiki_upload_A/B_test

We're working on a second round now.

Related Objects

Event Timeline

TheDJ added a comment. May 9 2016, 7:33 PM

I think the assessment of Matmarex is fair, but to make sure that we don't forget that there is a problem here, I have opened T134802: Improve the curator workflow for reviewing new files, which to me is the core of many of the complaints being raised here.

which to me is the core of much of the complaints being ushered here

Not really.

@matmarex: Thx for your feedback.

"(...) and you're not proposing any new solutions."

I can't propose any new solutions because I am still collecting some data (and that's a lone wolf job) but, of course, if I would go hard core now: deactivate cross-wiki uploads from all wikis via VisualEditor.

Ok, what we have:

First:
We have A LOT of untagged copyvios etc... not only from cross-wiki uploads. This is a generic problem of Commons = understaffed. This makes any quick analysis useless. If you check my contributions I would say: around 70 % of my edits since 15.12.2015 are somehow related to cross-wiki uploads. The other % --> 5 % for the Bangladesh Facebook Case + 2 % for the "FBMD"-case (33.348 images most likely grabbed from Facebook since 11.2014) + 23 % for other tasks/random edits.

Second:
For single wikis we have definitely high bad ratios --> ptwiki: currently around 85 % including 12.2015 — and 01.2016 (without further checks) is already at 64 %. I presume similar results from eswiki. And you know that we have wikis which "traditionally" don't care/don't know about copyrights. On the other hand, sure, we have a few wikis, where the term "copyright" is mostly NOT an empty phrase...

Third:
I am currently running a randomly selected day of ALL cross-wiki uploads = Cross-wiki upload from *.wikipedia.org (08.03.2016) (living & tagged & deleted). 08.03.2016 was a Tuesday.

See also User:Gunnex/Cross-wiki uploads 08.03.2016:

On 1st execution date of the query (10.05.2016) we had:
1.289 uploads on 08.03.2016

  • 387 deleted files
  • 87 files pending (no-permission, deletion requests, etc.)

--> 474 affected files

On 2nd execution date of the query (12.05.2016) we had:
1309 uploads on 08.03.2016 (no clue why this number increased)

  • 463 deleted files
  • 180 files pending (no-permission, deletion requests, etc.)

--> 643 affected files

On 3rd execution date of the query (15.05.2016) we had:
1317 uploads on 08.03.2016 (no clue why this number increased)

  • 504 deleted files
  • 252 files pending (no-permission, deletion requests, etc.)

--> 756 affected files

Currently, I checked +/- 50 % of the files, so... more to come. But potentially, we already reached here a +50 % bad ratio — from all wikis.

I do not understand...

  • why a ptwiki user needs an account registered for at least 30 days and 500 valid edits before he can upload images in scope of local image policy (some kind of fair use derivative)

... if he could just game the system, uploading files (faking "own" works) via Visual Editor, with no restrictions.

  • for enwiki, it appears to be 10 autoconfirmed (valid) edits before the user can upload under fair use...

... if he could just game the system, uploading files (faking "own" works) via Visual Editor, with no restrictions.
etc.

and, of course, it's just (via Visual Editor) a quick click on a check field (what did it say? No matter...) to upload a file...

So, considering all this...
"(...) and you're not proposing any new solutions."

Well, a quick fix could be: I would urgently recommend some configurable restriction (e.g. autoconfirmed user account with 50/100/500 valid edits on local wiki) before allowing a cross-wiki upload.

I am on wiki-break: 16.05.2016 to 20.05.2016.

Update 29.05.2016 for User:Gunnex/Cross-wiki uploads 08.03.2016 = https://quarry.wmflabs.org/query/9633:

4th run on 29.05.2016: 1291 rows. 643 living (97 pending deletion), 648 deleted --> more to come...
(bad ratio > 50 % — including the pending deletions = 58 %)

Btw I: of all cross-wiki uploads on 08.03.2016, 406 files were deleted only since 01.05.2016 --> obviously mostly as a result of my current check. That means --> extremely low tag ratio at Commons (which I already knew) --> imagine this for all days of cross-wiki uploads since 10.2015 = Commons is flooded by untagged, critical files.

Btw II: No answer is also an answer.

Btw III: What am I doing this for?

Btw IV:

So, considering all this...
"(...) and you're not proposing any new solutions."
Well, a quick fix could be: I would urgently recommend some configurable restriction (e.g. autoconfirmed user account with 50/100/500 valid edits on local wiki) before allowing a cross-wiki upload.

Another solution: Deactivate cross-wiki uploads ASAP.

Another solution: Deactivate cross-wiki uploads ASAP.

That makes sense to me. They can be reactivated when

  • interface issues are fixed (e.g. T135917) and
  • the data analysis is completed to really show that each kind of users (e.g. "first time uploaders") doesn't make significantly more copyright mistakes than with Special:UploadWizard/Special:Upload .

Another solution: Deactivate cross-wiki uploads ASAP.

+1. Cross-wiki uploads have become a problem instead of a benefit on Commons. The number of copyright violations has increased since cross-wiki uploads were introduced.

In T120867#2338229, @Pokefan95 wrote:

Another solution: Deactivate cross-wiki uploads ASAP.

+1. Cross-wiki uploads have become a problem instead of a benefit on Commons. The number of copyright violations has increased since cross-wiki uploads were introduced.

Summarized: Commons does not have enough people to deal with an increase in the amount of contributions by untrusted sources. We saw the same with Mobile and I'm not sure why WMF was expecting anything else this time round.

Please just have a discussion, with a consensus to point to, so that it can be deactivated.. because this will just lead to an endless back and forth. I'm pretty sure about what the end result will be, but that's not gonna convince the WMF to take any action.

Also, I would like to point out to the Commons community that keeping away everyone that hasn't invested into learning everything about copyright seems like a generally VERY bad long term onboarding process to me. As a community that wants to succeed, someone better start thinking about how to fix this problem in a scalable way, because currently I see a smaller core of people managing increasingly more files. It's not healthy and will eventually lead towards community collapse.

Gunnex added a comment. Edited Jun 2 2016, 11:15 AM

(...)

  • the data analysis is completed to really show that each kind of users (e.g. "first time uploaders") doesn't make significantly more copyright mistakes than with Special:UploadWizard/Special:Upload .

"first time uploaders" --> see update at User:Gunnex/Cross-wiki uploads 08.03.2016

In T120867#2338071, Gunnex wrote:

(...)
(...) 4th run on 29.05.2016: 1291 rows. 643 living (97 pending deletion), 648 deleted --> more to come...
(bad ratio > 50 % — including the pending deletions = 58 %)

5th run on 01.06.2016: 1299 rows. 522 living (146 pending deletion), 777 deleted
(bad ratio: 59,82 % — including the pending deletions = 71 %)

Introducing the new column "UserRegistrationDate" = uploader account creation date on Commons:

  • 794 uploads were made by uploaders who had their account created on Commons on same upload date: 08.03.2016
  • 176 uploads were made by uploaders who had their account created on Commons from 01.03.2016 — 07.03.2016

    = 970 uploads from uploaders who had their account created on Commons in March 2016 = 74,67 % of total uploads on 08.03.2016 are coming from "fresh" registered users
  • 194 uploads were made by uploaders who had their account created on Commons from 01.01.2016 — 29.02.2016
  • 135 uploads were made by uploaders who had their account created on Commons in 2015 or earlier (the oldest one dates back to 2008)

    = 1299 total uploads (control sum)

Of the 794 uploads made by users registered on the upload date 08.03.2016, 488 were deleted (ignoring so far the 146 pending deletions):
= 61,46 % bad ratio.

Of the 176 uploads made by users registered from 01.03.2016 to 07.03.2016, 105 were deleted (ignoring so far the 146 pending deletions):
= 59,66 % bad ratio

Or, of the 970 uploads (794+176) from "fresh users" registered in March 2016, 593 (105+488) were deleted (ignoring so far the 146 pending deletions):
= 61,13 % bad ratio
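The ratios above can be re-derived from the raw counts. The following is only a minimal sketch for checking the arithmetic; the helper name `bad_ratio` is made up for illustration, and the counts are the ones quoted from the 5th run (01.06.2016).

```python
def bad_ratio(deleted: int, total: int) -> float:
    """Deleted uploads as a percentage of all uploads, rounded to two decimals."""
    return round(100 * deleted / total, 2)


# Counts quoted from the 5th run (01.06.2016) of the 08.03.2016 query.
TOTAL_UPLOADS = 1299
DELETED = 777
PENDING = 146

# (uploads, deleted) per account-creation group.
same_day = (794, 488)      # account created on 08.03.2016
prior_week = (176, 105)    # account created 01.03.2016 to 07.03.2016

fresh_uploads = same_day[0] + prior_week[0]   # 970
fresh_deleted = same_day[1] + prior_week[1]   # 593

print("same-day accounts:   ", bad_ratio(same_day[1], same_day[0]))
print("prior-week accounts: ", bad_ratio(prior_week[1], prior_week[0]))
print("fresh accounts:      ", bad_ratio(fresh_deleted, fresh_uploads))
print("overall (deleted):   ", bad_ratio(DELETED, TOTAL_UPLOADS))
print("overall (+ pending): ", bad_ratio(DELETED + PENDING, TOTAL_UPLOADS))
print("share of fresh users:", bad_ratio(fresh_uploads, TOTAL_UPLOADS))
```

Note that the sketch treats pending deletions as "bad" only in the line marked "+ pending", as in the comment.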

What we have:

  • the "cross-wiki upload" feature via Visual Editor definitely attracts users to upload media to Commons. They may be "freshly registered" at Commons, but may already have been registered on local wikis for a longer time.
  • the bad ratio of 61,13 % based on uploads only by users registered on Commons in March 2016 is (IMHO) indisputably too high. The overall bad ratio of currently 59,82 % (or 71 % including the pending deletions) is just not acceptable.
  • we have A LOT of (1-upload) users who spontaneously want to illustrate their area of interest (home town, football club, music band, artists etc.) with a photo, just grabbing it from the Internet and ignoring any advice ("is this file really your own work?") or check boxes, out of ignorance, "I don't care" or lack of education. I guess: in most cases they don't know or understand what "Commons" stands for. "Freely licensed" or "public domain" = wth? Most likely also influenced by social media where copyrights of shared media are (well...) more or less irrelevant.
  • we have A LOT of users who think that their local wiki is some kind of social media-thing, creating a user page and uploading user images --> mostly grabbed from Facebook etc. and then: nothing more, no local edits etc. We have also A LOT of user images just uploaded for no purpose.
  • we have also A LOT of users who are (spontaneously) spamming with images for their own projects (Youtube channels, companies, websites, products, etc. = ego-spammers)
  • we have "good" wikis and we have "bad" wikis. See also ongoing analysis for User:Gunnex/Cross-wiki uploads from pt.wikipedia.org --> bad ratio so far: > 87 %. As already said above: it depends on culture/education level/etc.
  • the Quarry Cross-wiki upload from *.wikipedia.org (08.03.2016) (living & tagged & deleted) started with a 1st run on 10.05.2016 with 902 files living and 387 deleted. Now, on 01.06.2016, we have 522 files living and (so far) 777 deleted. It's again an indication that Commons is short of people checking the uploads and is NOT ABLE to handle all uploaded files.
  • of the (so far) 777 deleted files, 538 files were deleted only in May/June 2016 --> mostly in connection with this special analysis. That points to a high "Not-Yet-Tagged-ratio", approaching or exceeding 50 %.
  • more or less 10 % of all uploads on 08.03.2016 were logos.

Ok, that's only data from 08.03.2016.

Surprisingly, Quarry managed to execute also a query for all cross-wiki uploads in March 2016 (I thought, the query would break the 30 min limit) --> Cross-wiki upload from *.wikipedia.org (March 2016) (living & tagged & deleted).

Here we currently have:

  • 36.478 uploads
  • 10.871 deleted files
  • 25.607 living files (410 pending deletion)

    = the bad ratio already reached 30 % (without further checks & analysis, see above: high "Not-Yet-Tagged-Ratio") and I am convinced that if we also dig deeper into these 25.607 living files, we will accumulate a similar bad ratio as shown for the 08.03.2016.

Btw, I can only endorse the last comment by @TheDJ...

As a community that wants to succeed, someone better start thinking about how to fix this problem in scalable way

The scalable method so far adopted is to make users go through wizards which explain the copyright requirements of Wikimedia Commons: https://commons.wikimedia.org/wiki/Commons:Upload and https://commons.wikimedia.org/wiki/Special:UploadWizard . The cross-wiki upload feature intentionally circumvented the existing scalable solutions, even though there is (still) no reason to think they're worse than nothing.

the "cross-wiki upload" feature via Visual Editor

@Gunnex, please take a look at this screenshot:

The cross-wiki upload tool is available to all editors in both editing systems. Many of these inexperienced people are using the wikitext editor to upload images to Commons.

Many of these inexperienced people are using the wikitext editor to upload images to Commons.

Yes, and probably they don't even mean to upload anything, they just were confused by the buttons: T135917: "Insert media" dialog offers confusing insert/upload alternative buttons.

Gunnex added a comment. Jun 4 2016, 9:29 PM

the "cross-wiki upload" feature via Visual Editor

@Gunnex, please take a look at this screenshot:
(...)
The cross-wiki upload tool is available to all editors in both editing systems. Many of these inexperienced people are using the wikitext editor to upload images to Commons.

Well, do we really know that? As a freshly registered user you get (clicking somewhere on "edit") some kind of popup alert offering the options: use the Visual Editor or Standard (just tested it with a freshly created account). Btw, the "wikitext editor" (I can't remember if I ever used this) is only a wiki code provider. You can't select a file from your hard drive in the "Filename" field or search for existing files, which was a bit confusing (even for me). It just inserts the code you specified in the fields into the wiki entry you are editing... A click on "upload" leads to the questionable upload wizard also used by the Visual Editor and (btw) ignores any prefilled values like "Filename".

I presume, that most of the users are switching to the "Visual Editor" because (even lacking details from the popup info)... it sounds more... sophisticated/intuitive/WYSIWYG (or whatever...).

Anyway, the "problem" with the upload feature of the Visual Editor (or "wikitext editor") is IMHO: it is fast, it is uncomplicated (just an "X" on a check box) and you get the result instantly in the article. In other words: similar (despite the "X") to uploading an image to Facebook & Co. Or better: it is too fast and it is too simple.

And with its simplicity, the wizard is open to all kinds of abuse as demonstrated above, because, as mentioned by @Nemo_bis, the uploader isn't informed about all aspects of potential copyright related problems and isn't overwhelmed by warnings/alerts/tutorials from the normal upload wizards, which try to appeal to ethics: is the file you are planning to upload really your own work? Well, appealing to moral values is, concerning media files in times of social media (Facebook, Instagram & Co.), a critical thing, has its limits and is "washed-out", considering the flood of (re-)shared photos on the Internet. And, as I demonstrated above, most of the uploads (74,67 %) were uploaded by fresh users who (presumably) never got in touch with copyrights before uploading files. Btw, I guess that most of the fresh users aren't even aware (or do not understand --> the upload wizard gives no more info) that their file is originally stored at "Wikimedia Commons" and they probably don't know what's behind it...

And yes; I still find it strange that a ptwiki user needs an account registered for at least 30 days and 500 valid edits before he can upload files in scope of local image policy (some kind of fair use derivative). For enwiki: accounts more than four days old with at least 10 edits are considered autoconfirmed. A typical fresh cross-wiki uploader has 1 edit in Commons and 1-5 edits at the local wiki - on the same day, often within some minutes/hours.
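To make the contrast concrete, here is a minimal sketch of what a threshold check of this kind looks like. It is not MediaWiki code; the names `UploadThreshold`, `LOCAL_UPLOAD_THRESHOLDS` and `may_upload` are hypothetical, and the ptwiki/enwiki numbers are simply the ones quoted above (the enwiki "more than four days" rule is approximated as "at least four days").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadThreshold:
    """Hypothetical per-wiki requirement for local uploads."""
    min_account_age_days: int
    min_edits: int


# Values as quoted in the comment above; not read from any real configuration.
LOCAL_UPLOAD_THRESHOLDS = {
    "ptwiki": UploadThreshold(min_account_age_days=30, min_edits=500),
    "enwiki": UploadThreshold(min_account_age_days=4, min_edits=10),
}


def may_upload(threshold: UploadThreshold, account_age_days: int, edit_count: int) -> bool:
    """Return True only if both the account age and the edit count meet the threshold."""
    return (
        account_age_days >= threshold.min_account_age_days
        and edit_count >= threshold.min_edits
    )


# A "typical fresh cross-wiki uploader": account created today, up to 5 local edits.
for wiki, threshold in LOCAL_UPLOAD_THRESHOLDS.items():
    print(wiki, may_upload(threshold, account_age_days=0, edit_count=5))
```

Under either local rule the typical fresh uploader described above would be refused a local upload, while the cross-wiki dialog applies no comparable check.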

Btw, I managed to import the March 2016-uploads to Commons, splitting the query data into 3 files:

Feel free to engage there. I personally will randomly check some files over the next weeks and months but I certainly will NOT repeat the "lone-wolf"-job that I did for the 08.03.2016-query. For checking 25.607 living files we need, if requested in case of doubt, a team.

He7d3r added a comment. Jun 5 2016, 1:50 AM

(...)
Btw, I guess that most of the fresh users aren't even aware (or do not understand --> the upload wizard gives no more info) that their file is originally stored at "Wikimedia Commons" and they probably don't know what's behind it...

Indeed, that is probably true, given the kind of questions I have seen on a local wiki about their image, after they were deleted on Commons...

Steinsplitter added a comment. Edited Jun 5 2016, 10:29 AM

Summarized: Commons does not have enough people to deal with an increase in the amount of contributions by untrusted sources. We saw the same with Mobile and I'm not sure why WMF was expecting anything else this time round.

It is not mainly a question about people, we need a system for automatically detecting copyvios (by sha1 checks with img search engines) etc.

Even if we have a lot of people working in that area, we will waste people's time.

Please just have a discussion, with a consensus to point to, so that it can be deactivated.. because this will just lead to an endless back and forth. I'm pretty sure about what the end result will be, but that's not gonna convince the WMF to take any action.

The tool was enabled without consensus and we need consensus to disable it? This makes no sense to me.

Also, I would like to point out to the Commons community that keeping away everyone that hasn't invested into learning everything about copyright seems like a generally VERY bad long term onboarding process to me. As a community that wants to succeed, someone better start thinking about how to fix this problem in scalable way, because currently I see a smaller core of people managing increasingly more files. It's not healthy and will eventually lead towards community collapse.

I don't think this should be discussed on Phabricator (offtopic), and people should know the basics of copyright law. It is not a policy, it is applicable law.

As a community that wants to succeed, someone better start thinking about how to fix this problem in scalable way

The scalable method so far adopted is to make users go through wizards which explain the copyright requirements of Wikimedia Commons: https://commons.wikimedia.org/wiki/Commons:Upload and https://commons.wikimedia.org/wiki/Special:UploadWizard . The cross-wiki upload feature intentionally circumvented the existing scalable solutions, even though there is (still) no reason to think they're worse than nothing.

Agree :-)

we need a system for automatically detecting copyvios (by sha1 checks with img search engines) etc.

See T125459.

we need a system for automatically detecting copyvios (by sha1 checks with img search engines) etc.

See T125459.

Thanks for the link, interesting. But as far as I can see the tool/bot does not support Commons (even years ago).

Poyekhali added a comment. Edited Jun 7 2016, 12:54 AM

we need a system for automatically detecting copyvios (by sha1 checks with img search engines) etc.

See T125459.

Thanks for the link, interesting. But as far as I can see the tool/bot does not support Commons (even years ago).

I can use Earwig's tool on Commons for copyvio hunting. However, it can only search for text, not files. Maybe another good feature for Earwig's tool.

TheDJ added a comment. Edited Jun 7 2016, 12:36 PM

It is not mainly a question about people, we need a system for automatically detecting copyvios

Different side of the same labor coin. The amount of labor executed should ideally be increasing. The amount of people to execute the labor is however currently limited. There is a gap between those two. The gap can be bridged with automation and/or humans, and probably best with both. Lacking automation AND people, there is no way that we can increase the amount of labor, since it will degrade the quality. People are just the 'canary in a coal mine' of labor.

BTW. we would be wise to note the enormous amount of false positives and negatives that Youtube has with automated copyright violation detection and the disruption this can create. Automating interpretation of copyright is not easy.

we will waste people's time.

Have you ever known people who edit and maintain Foursquare, OpenStreetMap, Facebook Places or Google Maps, or who do wiki vandal fighting, to describe their time spent as "wasting time"?

A part of the Commons community's attitude seems more and more: "I don't want to curate anyone else's work". There are plenty of people willing to do that and it has been proven many times. Our communities are built on gnome work. The problem is that people are either unwilling to do this for Commons, not finding their way to Commons, or we have made it too complicated for them (and likely all three), not that these people don't exist.

Instead of fixing those problems, we choose to ever more narrow the door to make sure we have fewer contributors.

The scalable method so far adopted is to make users go through wizards which explain the copyright requirements

We think we are stopping the 'bad' contributors, but realistically we are likely to stop a sizable amount of good actors as well, thereby creating a vicious circle.

Most likely also influenced by social media where copyrights of shared media are — well... — more or less irrelevant.

and people should know the basics of copyright law. It is not a policy, it is applicable law.

When reality doesn't match theory, then you choose to ignore reality? This is not an academic exercise, we need to be practical. Copyright is something that people (with few exceptions) DON'T understand and don't WANT to understand. Companies might want you to understand it, but it is as realistic as telling people they should understand the maintenance state of their car engine because it is a critical part of a potentially deadly vehicle that they use on the road daily. They will drive regardless and trust a mechanic instead. As a community, Commons should be advocates and mechanics, not politicians and businessmen. We need to guarantee road safety without stopping all traffic but our own cars.
Everything in your design of tooling should anticipate that people will not care about this. Throwing flyers in their face with: "You should care !" is not gonna make the big difference that we are looking for.

The tool was enabled without consensus and we need consensus to disable it? This makes no sense to me.

As a volunteer, I just care about ending the back and forth. An on-wiki consensus discussion has been a proven way to do so over the years.

Gunnex added a comment. Edited Jun 7 2016, 8:14 PM

FYI I:
Commons scope + upload wizard --> a discussion at Commons started with: "We have a serious problem with junk uploads. Wikimedia Zero in particular is terrible for reasons we've already established.[4] But in fact all uploads coming from new accounts have this problem. (...)"

which also touches on the ongoing problem with cross-wiki uploads and tries to suggest some improvements (instead of just turning off the wizard).

FYI II:
I did a 6th update on User:Gunnex/Cross-wiki uploads 08.03.2016.
I started to check these files on 10.05.2016 with

  • 902 files living, 387 files deleted.

Today:

  • 446 files living, 833 files deleted + 73 files pending deletion.

That points to a "Not-Yet-Tagged-Ratio" of > 50 % --> an indicator which I already reached in other cross-wiki checks. That means, if I assume 30.000 cross-wiki uploads monthly since 11.2015 (ignoring 10.2015: the tool was introduced on 21.10.2015) = 210.000 (7x 30k) uploads till May 2016 --> 105.000 (+/- 50% bad ratio) --> 52.500 not-yet-tagged living files (copyvios/no-permission-source-etc/out of PS)...
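The back-of-the-envelope estimate above can be written out as follows. This is just a sketch of the arithmetic; the function name `estimate_backlog` is invented for illustration, and the inputs (30.000 uploads per month, 7 months, 50 % bad ratio, 50 % not yet tagged) are the assumptions stated in the comment, not measured values.

```python
def estimate_backlog(monthly_uploads: int, months: int,
                     bad_share: float, untagged_share: float) -> tuple[int, int, int]:
    """Return (total uploads, estimated bad uploads, estimated bad uploads not yet tagged)."""
    total = monthly_uploads * months
    bad = round(total * bad_share)
    untagged = round(bad * untagged_share)
    return total, bad, untagged


# Assumptions from the comment: 11.2015 through 05.2016 = 7 months.
total, bad, untagged = estimate_backlog(
    monthly_uploads=30_000,
    months=7,
    bad_share=0.5,       # roughly 50 % bad ratio
    untagged_share=0.5,  # roughly 50 % of the bad files not yet tagged
)
print(total, bad, untagged)
```

Changing `bad_share` or `untagged_share` shows how sensitive the 52.500 figure is to the two 50 % assumptions.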

Gunnex incidentally linked the foregoing Commons Village Pump discussion about the very same subject just above and more or less suggested that the following suggestions may be copied here from there. Fellow contributor Magog the Ogre suggested some kind of communicating interface. I worked on his ideas and hence want to suggest the following changes / improvements to be shown in the cross-wiki upload dialogue:

What kind of images may you upload?

  • Not OK Pictures of you and your friends.
  • OK Pictures of people, places, and things which are relevant to the whole world.
  • Not OK Logos of companies, sports clubs.
  • OK Old logos whose copyright has expired, or which are too simple for copyright.
  • Not OK Pictures you found on the internet.
  • OK Photographs you took yourself.
    • The selfie thing is pretty straightforward: how many selfies landing in scope are to be expected?
    • Logos are often quite hard to assess; better to leave them to experienced Commons editors. Not having a logo does not harm the repository aim of Commons too much. I'd even strike the "old logo" point.
    • This wording gives the needed clarity here by avoiding any hinting to possible exceptions like "most pictures". We have enough experienced Flickr (Panoramio...) uploaders that can use internet sources.
    • Using the word with "photographs" instead of "pictures" or "images" may avoid the problem of screenshots here.

From a technical viewpoint, disallowing all cross-wiki JPEG and TIFF uploads without camera data or EXIFDateShot field would be a sensible way of filtering potential copyvios as those seldom exhibit a meaningful set of EXIF.
The cross-wiki feature is unequivocally an entry point for massed instead of "quality" uploads, so we'll need to deal with the sheer amount of files, as such, I think that we could disregard any need that may be felt to care about exceptions here (allowable webfinds, logos, self-promoting people that indeed fall in project scope, FOP stuff, etc.). I would even go as far to suggest barring PNG and GIF images from cross-wiki (due to their lack of EXIF), as those may most often just be logos.
Just follow the KISS principle here for the user frontend!
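A rough sketch of what such a filter could look like, under loud assumptions: it only checks whether a JPEG carries an Exif APP1 segment at all (it does not look for camera fields or a shot date), it does not handle TIFF, and the function names `has_exif_segment` and `allow_cross_wiki_upload` are hypothetical, not part of MediaWiki or any upload extension.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if the JPEG byte string contains an APP1 segment starting with 'Exif'."""
    if not data.startswith(b"\xff\xd8"):      # missing SOI marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                 # malformed marker sequence
            return False
        marker = data[pos + 1]
        if marker == 0xFF:                    # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):            # start of scan / end of image: header is over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:                        # length field includes its own two bytes
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


def allow_cross_wiki_upload(filename: str, data: bytes) -> bool:
    """Hypothetical policy: bar PNG/GIF, require an Exif segment for JPEG."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in ("png", "gif"):
        return False
    if ext in ("jpg", "jpeg"):
        return has_exif_segment(data)
    return True  # other formats (including TIFF) are not covered by this sketch
```

A real filter would need to inspect the EXIF fields themselves, since an empty or stripped-down Exif segment would pass this check.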

(just some info: First acted-upon DMCA of an image uploaded via cross-wiki upload: https://commons.wikimedia.org/wiki/Commons:Office_actions/DMCA_notices#Takedown_of_House-Ant1.jpg)

Link to the DMCA request https://wikimediafoundation.org/wiki/DMCA_House-Ant1.jpg

Note that this was a blatantly clear copyright violation: the authorship and copyright claim were available in the EXIF data (it's unfortunate that it wasn't detected by the community before the author had to go the DMCA path).

@Gunnex, maybe you could also list the creation date of the global account? I suspect not only their commons account will be brand new but that in many cases their "wikipedia account" will be quite recent, too.

@Gunnex, maybe you could also list the creation date of the global account? I suspect not only their commons account will be brand new but that in many cases their "wikipedia account" will be quite recent, too.

The uploader is Cdwichlaz, per log

Gunnex added a subscriber: Srittau. Jun 10 2016, 7:29 PM

@Gunnex, maybe you could also list the creation date of the global account? I suspect not only their commons account will be brand new but that in many cases their "wikipedia account" will be quite recent, too.

I do not think it is possible (multiple databases) and I fear also the 30 min. limit of Quarry to execute the query — but I am not a Quarry specialist. If somebody has an idea --> just fork https://quarry.wmflabs.org/query/9633.

As I already mentioned:

In T120867#2355854, @Gunnex wrote on 04.06.2016:

A typical fresh cross-wiki uploader has 1 edit in Commons and 1-5 edits at the local wiki - on same day, often within some minutes/hours.

In addition to @Grand-Duc's comment above, further proposals were made there (at Commons scope + upload wizard), like:

  • Improvement of the upload form and dialog

See the thread, multiple issues raised

  • "Quarantine"

Concerning the idea of some kind of quarantine (="(...) all cross-wiki stuff lands in a special Wiki or a special Commons namespace (...)", by @Grand-Duc): this may be especially interesting (instead of simply turning off the wizard) because (and the following is some kind of "thinking out loud") it interrupts the "Facebook-style"/"instant success-feeling"/WYSIWYG-effect of the upload wizard (just a random-"I don't care" "X" at the form, store the wiki entry, and ready) and may decrease the ratio of spontaneous uploads (note that nearly 80 % of cross-wiki uploads are made by "fresh" users). But they must be informed about that before the upload (and maybe that makes them just abort their upload). Nevertheless: we have > 30.000 uploads monthly (in 03.2016: 36.478) via cross-wiki uploads and I fear that (considering the constantly understaffed Commons "crew") we will not be able to handle this amount of files (which may be lower after the improvement, but will still run into the thousands).

  • Accessibility only for a certain user group

(that topic I already raised above...)

"Also, cross-wiki uploads only for auto confirmed users, if this is not the case yet." (by @Srittau aka "Sebari")

"Additionally, Sebari's idea of restricting the usage of the cross-wiki feature is a good idea, but auto-confirmed may be too low. There should be some edits on-wiki or a membership in groups as "editors" or users of the flaggedrevision extension before getting access to this tool." (by @Grand-Duc)

"Maybe we can keep the upload wizard as it is for the autoconfirmed users and for the newcomers a tool with more confirmation to give and with more check boxes with all the options, or more, described by Magog the Ogre, for educational purposes" (by @Christian_Ferrer)

  • And my humble opinion

Repeating: Deactivate the cross-wiki uploads ASAP and start from 0. Wikipedia is NOT Facebook & Co. Forget the approach to make this thing social-media-like because in the medium to long term Commons (and the rest) will lose its credibility. +85 % bad ratio from pt.wikipedia.org from 10.2015 to 03.2016... +70 % bad ratio from cross-wiki uploads of 08.03.2016 + probably 50-70 % bad ratio from all 230.000 cross-wiki uploads since 10.2015 + a constantly understaffed Commons team which leads to a "Not-Yet-Tagged-ratio" of critical files (copyvios etc.), reaching almost 50%.

Improve the form, limit the upload feature only for a certain user group (and yes: I follow @Grand-Duc when he says: "auto-confirmed may be too low") + X + Y + Z + and do not ignore the weak Commons userbase which is paying the price / resp. try to hit the middle of the feasibility and balancing act between "what-is-Commons-not" and "Commons-is-a-file-hoster".

Perfect last words in this case? Probably not, but I am on wiki break till 15.06.2016.

Nemo_bis updated the task description. (Show Details) Jun 11 2016, 1:27 PM

The complaints by multiple users, the deletion vs keep ratio and the recent DMCA takedown request speak for themselves; the underlying facts speak for themselves.

Imho the tool must be widely improved or switched off, in the style of bonus pater familias. :-)

revi added a subscriber: revi.Jun 12 2016, 12:35 PM
Yann added a comment.Jun 14 2016, 5:34 PM

Needless to say, but I agree 100% with Steinsplitter's comment above. I think the tool must be disabled, and eventually enabled again later after improvements.

With change https://gerrit.wikimedia.org/r/#/c/293355/ being merged it will be now actually possible to disable this tool. Some additional work in WikiEditor and VisualEditor needs to be done to hide the non-functional buttons to launch the dialog if it is disabled (right now they would just pop up an error message).

Personally, I think this is a great feature that should stay enabled. It was a huge step towards moving the editor closer to the 21st century, as even the lamest website today that lets you post stuff with images also lets you upload the images directly from the editor. But I concede that the Multimedia team is understaffed and unable to support this feature with secondary features that turned out to be required, even if it was clear what exactly those features are, and it isn't.

We've put a lot of effort into enabling you to review the uploads (the effect of this is that files are now patrollable, and that Special:NewFiles can limit the results to unpatrolled files), I'm sad that it did not suffice and disappointed that the efforts seem to have been entirely unrecognized by the Commons community.

the "cross-wiki upload" feature via Visual Editor

The cross-wiki upload tool is available to all editors in both editing systems. Many of these inexperienced people are using the wikitext editor to upload images to Commons.

Well, do we really know it? As a freshly registered user you get (clicking somewhere on "edit") some kind of popup alert offering the options: use the Visual Editor or Standard (just tested it with a freshly created account). Btw, the "wikitext editor" (I can't remember if I ever used this) is only a wiki code provider. You can't select a file in the field "Filename" from your hard drive or search for existing files, which was (even for me) a bit confusing. It just inserts the code you specified in the fields into the wiki entry you are editing... A click on "upload" leads to the questionable upload wizard used also by the Visual Editor and (btw) ignores some possibly prefilled values like "Filename".

We don't actually know whether people are using the tool more in VisualEditor or in WikiEditor. Learning that would be one of the goals of T133306: Add EventLogging instrumentation to the upload dialog (another project we don't have time for).

@matmarex: Being a software developer myself, I can understand your frustration when people do not appreciate the work you have poured a lot of sweat into. That said, there is unfortunately a long history of the WMF forcing features onto Commons that are a huge pain to us, especially the admins. This is just another not very well researched project that causes a big increase of workload, especially of the "unsexy" kind: patrolling for copyvios is not very fulfilling. Please keep in mind that we are talking about unpaid volunteers here.

I feel this was one more misguided attempt where communication beforehand would have been necessary. The last fiasco - mobile uploads - is still fresh in our minds, and this project unfortunately makes all the same mistakes again. This could easily have been avoided by actually asking the Commons community, whether they think this is a good idea.

As a software developer I of course see the benefits of cross-wiki uploads to Wikipedia users, and I wish there was an easy solution, but as someone working on Commons for more than 10 years, I know that the main challenge to such a feature is not technical.

There is a huge disconnect and mistrust between the WMF and the users, and the way this project was handled is unfortunately not going to improve the situation.

P.S.: That said, personally I appreciate the will to improve the project for our users. And I also know that it is sometimes very hard to cause change, because the status quo is deeply entrenched and people are afraid of change.

Nemo_bis added a comment.EditedJun 15 2016, 8:57 PM

This could easily have been avoided by actually asking the Commons community, whether they think this is a good idea.

Not really: WMF already knew. https://www.mediawiki.org/wiki/Talk:Wikimedia_Engineering/2015-16_Q2_Goals#Custom_uploading_interface

This is rather the usual issue of wheel reinvention. Everybody thinks their own solution will be the ultimate one, then what really happens is the eternal recurrence of the same e.g. the reinvention of a wizard interface to explain copyright.

The main risk of this initiative is that at the end we may have 3 "standard" upload wizards for Commons: the cross-wiki upload, UploadWizard and Commons:Upload (in addition to all the local uploaders, of course). Hopefully we manage to kill at least one of the three (i.e. merge it with the others). https://xkcd.com/927/

I feel this was one more misguided attempt where communication beforehand would have been necessary. The last fiasco - mobile uploads - is still fresh in our minds, and this project unfortunately makes all the same mistakes again. This could easily have been avoided by actually asking the Commons community, whether they think this is a good idea.

While we didn't ask for opinions before we developed the tool (we should have), I announced that it is being deployed (https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=175802601#Cross-wiki_uploads_to_Commons_from_the_visual_editor) and no one had foreseen what the effect is going to be. Or if they did, they kept it to themselves, or only left hidden clues, like Nemo with his unclear comment on a talk page nobody reads. I wish that getting opinions from any Wikimedia wiki's community was as simple as just asking them.

matmarex removed Prtksxna as the assignee of this task.Jun 15 2016, 10:34 PM

(Prateek is not really working on this, we shelved the plans for another A/B test since everyone hates this feature anyway.)

This seems like a pretty good idea, I left a longer note there: https://commons.wikimedia.org/wiki/Commons:Administrators'_noticeboard#Queries_and_data. Currently AbuseFilter limitations make it impossible to implement that filter for cross-wiki uploads only, I'll be working on T89252 to resolve this. I also filed T137841 about making the upload dialog react better to AbuseFilter preventing the uploads.

Change 276136 abandoned by Prtksxna:
mw.ForeignStructuredUpload.BookletLayout: E/F test of 2 different interfaces

https://gerrit.wikimedia.org/r/276136

Gunnex added a comment.EditedJul 26 2016, 12:53 AM

This seems like a pretty good idea, I left a longer note there: https://commons.wikimedia.org/wiki/Commons:Administrators'_noticeboard#Queries_and_data. Currently AbuseFilter limitations make it impossible to implement that filter for cross-wiki uploads only, I'll be working on T89252 to resolve this. I also filed T137841 about making the upload dialog react better to AbuseFilter preventing the uploads.

I already expressed my concerns against "cross-wiki-uploads" several times (see above)... but.... okay. Here we go again:

AbuseFilter 153 entered into action on 14.07.2016.

https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#AbuseFilter_for_cross-wiki_uploads was dearchived and updated by @Nemo_bis on 24.07.2016 (check for news).

Analyzing cross-wiki-uploads from 15.07.2016–18.07.2016 via Quarry = User:Gunnex/Cross-wiki uploads 15.07.2016–18.07.2016, spending my last +/- 3.500 contributions on Commons almost exclusively on this:

  • 1st run on 22.07.2016: 1.845 rows. 1.581 living (355 pending deletion), 264 deleted
  • 2nd run on 24.07.2016: 1.866** rows. 1.468 living (432 pending deletion), 398 deleted
  • 3rd run on 25.07.2016: 1.903** rows. 1.480 living (584 pending deletion), 423 deleted
  • 4th run on 26.07.2016: 1.930** rows. 1.419 living (679 pending deletion), 511 deleted
  • --> **) no clue why the total of rows changes, probably due to double entries

--> reaching (counting also the pending deletion) already an overall bad ratio of 62,00 % (copyvios/PS/perm./source/etc.) for this period.

Btw, I reached currently files beginning with "S", so more to come...

What we "got" so far:

  • Low tag ratio --> Commons: understaffed - multiple times mentioned above (and I am constantly noticing technical defects in detecting copyvios from social media). Probably 60-70 % of the above tags (copyvios/DRs/etc.) are/were signed by me.
  • 196 logos --> only per file title = 10 % + (around) 60-80 (perceived) logos hidden in other file titles (around 15 %)
  • 659 .png-files --> 34 % --> 470 living, 189 already deleted, unknown number of pending deletions. png-files are mostly spam and/or (social media) copyvios, so (as suggested by @Nemo_bis) activate png in the AbuseFilter... no... deactivate the whole thing...

Technically, the AbuseFilter is working almost fine and dropped the total amount of cross-wiki-uploads significantly. Before 14.07.2016, it was around 1.200 daily uploads (see also https://quarry.wmflabs.org/query/11251 for 13.07.2016 = without further checks 1.137 rows, 777 living, 29 pending deletion, 360 deleted, bad ratio so far: 34,00 %). Now we have 1.930 files for 4 days.

But the bad ratio remains (as already mentioned above) too high. IMHO, indisputably too high...

I will conclude this analysis to the end and will post final results... but... I already wasted countless hours and megabytes (see also here) for this issue and this will be most likely my last commitment in this matter. In other words: It's turning pointless for me.

Finished now the analysis of cross-wiki-uploads from 15.07.2016–18.07.2016 via Quarry = User:Gunnex/Cross-wiki uploads 15.07.2016–18.07.2016

  • 1st run on 22.07.2016: 1.845 rows. 1.581 living (355 pending deletion), 264 deleted
  • 2nd run on 24.07.2016: 1.866** rows. 1.468 living (432 pending deletion), 398 deleted
  • 3rd run on 25.07.2016: 1.903** rows. 1.480 living (584 pending deletion), 423 deleted
  • 4th run on 26.07.2016: 1.930** rows. 1.419 living (679 pending deletion), 511 deleted
  • 5th run on 26.07.2016: 1.965** rows. 1.410 living (752 pending deletion), 555 deleted

--> **) including some double entries due to bug T140522

...reaching (counting also the pending deletion) an overall bad ratio of 66,51 % (copyvios/PS/perm./source/etc.) for this period.
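
The 66,51 % figure can be reproduced from the 5th run, assuming the bad ratio is defined as (deleted + pending deletion) / total rows. This is a minimal sketch; the function name `bad_ratio` is illustrative and not part of the Quarry query.

```python
def bad_ratio(deleted: int, pending: int, total: int) -> float:
    """Percentage of uploads that are deleted or tagged for deletion."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (deleted + pending) / total


# 5th run on 26.07.2016: 1.965 rows, 752 pending deletion, 555 deleted
ratio_5th_run = bad_ratio(deleted=555, pending=752, total=1965)
print(f"{ratio_5th_run:.2f} %")
```

Note that the total still contains the double entries caused by T140522, so the true ratio may differ slightly.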

Do whatever you want with the following numbers, but make a decision:

The (currently) 555 deleted files were deleted, because (per info available at column "DeletionReason"):

| DeletionReason | copyvios | no-permission | no-source | no-license | deletion requests* | others** | total |
| Files | 346 | 36 | 5 | 0 | 139 | 29 | 555 |

The (currently) 752 pending deletions are based on (per info available at column "Template"):

| Template | copyvios | no-permission | no-source | no-license | deletion requests* | others** | total |
| Files | 6 | 190 | 25 | 0 | 531 | 0 | 752 |

--> *) multiple issues (copyvios/permission/sources/project scope)
--> **) multiple issues (duplicates, attack images, user errors, project scope, etc.)

Info: 40–50 % of the deletion requests are related to files out of project (spam, unused user pics, etc.)

Regarding user registration we have:

Uploaded files, depending on registration date:

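| Registration date | 18.07.2016 | 17.07.2016 | 16.07.2016 | 15.07.2016 | 14.07.2016–01.01.2016 | 2015 | 2014 | 2013 | 2012 |
| Files | 361 | 211 | 260 | 299 | ignored | 240 | 114 | 55 | 37 |
| Deleted | 78 | 53 | 87 | 76 | ignored | 106 | 38 | 13 | 9 |
| Pending deletion | 182 | 75 | 89 | 118 | ignored | 77 | 44 | 21 | 16 |
| Bad ratio (%) | 72,02 | 60,66 | 67,69 | 64,88 | ignored | 76,25 | 71,93 | 60,00 | 67,57 |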

1.131 uploads (57,56 %) by fresh users registered 18.07.–15.07.2016 account for 758 deleted files and pending deletions = 38,58 % of all uploads --> that means that roughly every 2,59th uploaded file is a probably "bad" one coming from these users alone.
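
The fresh-user figures can be re-derived from the registration table. This is a sketch, assuming the flattened table rows split into (files, deleted, pending deletion) per registration day as listed below, and that the percentages are taken against the 1.965 rows of the 5th run. The variable names are illustrative.

```python
# (files, deleted, pending deletion) per registration day, 15.07.-18.07.2016
fresh_cohorts = {
    "18.07.2016": (361, 78, 182),
    "17.07.2016": (211, 53, 75),
    "16.07.2016": (260, 87, 89),
    "15.07.2016": (299, 76, 118),
}
TOTAL_ROWS = 1965  # 5th run, still including double entries

fresh_files = sum(files for files, _, _ in fresh_cohorts.values())
fresh_bad = sum(deleted + pending for _, deleted, pending in fresh_cohorts.values())

share_of_uploads = 100 * fresh_files / TOTAL_ROWS  # share of all uploads
share_bad = 100 * fresh_bad / TOTAL_ROWS           # bad fresh-user files vs. all uploads
uploads_per_bad = TOTAL_ROWS / fresh_bad           # one bad fresh-user file per N uploads

print(fresh_files, fresh_bad)
print(f"{share_of_uploads:.2f} % / {share_bad:.2f} % / every {uploads_per_bad:.2f}th file")
```

Both percentages use all uploads as the denominator, which is why the 38,58 % is lower than the per-day bad ratios in the table.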

Btw: Across all registration years, there is no significant change in the bad ratios for the checked period. It appears that even users who registered before 2016, and who may be more familiar with the policies of Wikipedia, are using cross-wiki-upload because they...

  • ignored it (the policies) all over the years and are just fulfilling personal, spontaneous interests
  • don't have a clue about "copyrights"
  • [option 3: feel free to insert text here]

So, in other words: the cross-wiki-upload tool is in the vast majority a perfect tool for users who – quickie-like – want to illustrate/promote/etc. something spontaneous on Wikipedia, ignoring further concerns about copyrights. Just grab it from the Internet. It is obvious that WMF is trying to establish a somehow social media-like thing, imitating Facebook & Co... – which is going completely wrong.

You can now imagine, before introducing the AbuseFilter, what kind of mostly copyrighted/out of project/spam/etc. works were uploaded to Commons since 10.2015 and which are remaining as (mostly) undetected (silent) copyright violations. The numbers? Probably around +1.000 daily uploads since 21.10.2015 till 14.07.2016 = 267 days x 1.000 uploads = +267.000 uploads --> bad ratio (see above) around 40–50 % = (45 %) 120.150 uploads --> low "not-yet-tagged-ratio" (around 40 %) --> 48.060 copyvios/out of project/spam/etc. still living at Commons (probably higher).
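
The back-of-the-envelope estimate above can be written out step by step. This is only a sketch: every input (1.000 uploads per day, 45 % bad ratio, 40 % not yet tagged) is the rough guess from the comment, not measured data, and the variable names are illustrative.

```python
from datetime import date

# Period in which the tool ran without the AbuseFilter
days = (date(2016, 7, 14) - date(2015, 10, 21)).days

UPLOADS_PER_DAY = 1000      # rough guess from the comment
BAD_RATIO_PERCENT = 45      # midpoint of the assumed 40-50 %
NOT_TAGGED_PERCENT = 40     # assumed share of bad files not yet tagged

estimated_uploads = days * UPLOADS_PER_DAY
estimated_bad = estimated_uploads * BAD_RATIO_PERCENT // 100
estimated_untagged = estimated_bad * NOT_TAGGED_PERCENT // 100

print(days, estimated_uploads, estimated_bad, estimated_untagged)
```

Integer arithmetic is used so the result matches the hand calculation exactly; changing any of the three assumptions shifts the final number proportionally.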

So, repeating again: deactivate cross-wiki-uploads ASAP. It is destructive. It is a fail. It is jeopardizing Commons's goal to be a database of freely usable media files. It's... just an invitation for (mostly) copyvios (etc.) ...

Yann added a comment.Jul 26 2016, 10:48 PM

Thanks @Gunnex for this great analysis. I can only support him in his conclusion: deactivate cross-wiki-uploads ASAP.

If you want it disabled, file a configuration change request in Wikimedia-Site-requests to set $wgForeignUploadTargets = []; across all wikis (I implemented the ability to do this in eccf8dd01). If no one listens, make an abuse filter on Commons to block all of those uploads (I implemented the ability to do this per T138273). At the very least that will get you somebody's attention.

I have cared about this issue as long as I could and did all I could to resolve it, and now, six months later, I literally can not care any longer. The fate of the cross-wiki upload tool, no matter all the effort I put into it, no longer bothers me. I am, quite simply, all out of damn to give.

@Gunnex Yes, it is already known that this is an issue. Your analysis is nice, and probably no one reads it. I'm sorry but I think your work on it, apart from cleaning up existing bad uploads, is in vain.

Wow. Looks like the mobile uploads disaster all over again (in case you don't remember, see Mobile upload needing check#Background). I agree with @Yann: This has to stop immediately. Requesting wiki configuration changes says we need community consensus, so I suppose we should start an RFC at Commons:Village pump/Proposals?

@matmarex: thx for your honest answer. I followed your attempts to make this thing working and I respect the work you dedicated for it.

Let's do the same analysis as before, now for cross-wiki-uploads from 08.03.2016 via Quarry = User:Gunnex/Cross-wiki uploads 08.03.2016, because I worked on that data already intensively since May 2016.

Here, we have less uploads and only 1 day to check (may be influenced also by (un)lucky circumstances), but just to find out a "trend". I posted here already some interim results before but we have now an almost stable data situation.

  • 1st run on 10.05.2016. 1.289** rows. 902 living (87 pending deletion), 387 deleted.
  • 2nd run on 12.05.2016. 1.309** rows. 846 living (180 pending deletion), 463 deleted.
  • 3rd run on 15.05.2016: 1.317** rows. 813 living (252 pending deletion), 504 deleted.
  • 4th run on 29.05.2016: 1.291** rows. 643 living (97 pending deletion), 648 deleted.
  • 5th run on 01.06.2016: 1.299** rows. 522 living (146 pending deletion), 777 deleted.
  • 6th run on 07.06.2016: 1.279** rows. 446 living (73 pending deletion), 833 deleted.
  • 7th run on 27.07.2016: 1.276** rows, 363 living (0 pending deletion), 913 deleted.

--> **) including some double entries due to bug T140522

...reaching an overall bad ratio of 71,55 % (copyvios/PS/perm./source/etc.) for this period.

The 913 deleted files were deleted, because (per info available at column "DeletionReason"):

| DeletionReason | copyvios | no-permission | no-source | no-license | deletion requests* | others** | total |
| Files | 407 | 82 | 6 | 0 | 401 | 17 | 913 |

--> *) multiple issues (copyvios/permission/sources/project scope)
--> **) multiple issues (duplicates, attack images, user errors, project scope, etc.)

Regarding user registration we have:

Uploaded files, depending on registration date:

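| Registration date | 08.03.2016 | 07.03.2016 | 06.03.2016 | 05.03.2016 | 04.03.2016–01.01.2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010–2008 | total |
| Files | 778 | 114 | 11 | 1 | 239 | 63 | 18 | 20 | 3 | 19 | 10 | 1.276 |
| Deleted | 569 | 95 | 7 | 1 | 157 | 48 | 14 | 10 | 2 | 7 | 3 | 913 |
| Bad ratio (%) | 73,14 | 83,33 | 63,64 | 100,00 | 65,69 | 76,19 | 77,78 | 50,00 | 66,67 | 36,84 | 30,00 | |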

904 uploads (70,85 %) by fresh users registered 08.03. – 05.03.2016 are standing for 672 deleted files = 73,60 % --> that means, that each 1,36 file uploaded only by these users were "bad".

Again, older accounts (which may be more familiar with the policies of Wikipedia) also present high bad ratios, but here we have a smaller data base to compare.

As you can see via User:Gunnex/Cross-wiki uploads from pt.wikipedia.org I am checking (with some back log) especially cross-wiki uploads from my (ex-) home wiki: ptwiki. It was not surprising to find out that I got here around a 85,00 % bad ratio (10.2015 – 04.2016) for cross-wiki uploads from pt.wikipedia.org

Ok, ptwiki is one of the "bad wikis" – but who else?
I managed to adapt the Quarry for 08.03.2016 with a new column, indicating the origin of the cross-wiki upload + did the same for 15.07.–18.07.2016 for comparison.

The numbers:
(ignoring wikis with lower than 10 uploads)

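| | 08.03.2016 | | | 15.07.–18.07.2016 | | | |
| wiki | Uploads | Deleted | Bad ratio (%) | Uploads | Deleted | Pending deletion | Bad ratio (%)* |
| arwiki | 16 | 14 | 87,50 | 50 | 17 | 25 | 84,00 |
| cswiki | 12 | 4 | 33,33 | 17 | 5 | 2 | 41,18 |
| dewiki | 56 | 29 | 51,79 | 118 | 33 | 43 | 64,41 |
| elwiki | 13 | 3 | 23,08 | 12 | 5 | 6 | 91,67 |
| enwiki | 549 | 415 | 75,59 | 700 | 260 | 209 | 67,00 |
| eswiki | 119 | 96 | 80,67 | 146 | 52 | 54 | 72,60 |
| frwiki | 95 | 58 | 61,05 | 84 | 23 | 33 | 66,67 |
| huwiki | 40 | 24 | 60,00 | 38 | 0 | 10 | 26,32 |
| itwiki | 38 | 31 | 81,59 | 60 | 17 | 22 | 65,00 |
| jawiki | 11 | 8 | 72,73 | 14 | 4 | 1 | 35,71 |
| mnwiki | 23 | 23 | 100,00 | na | na | na | na |
| nlwiki | 16 | 13 | 81,25 | 21 | 6 | 7 | 61,91 |
| plwiki | 3 | 1 | 33,33 | 52 | 14 | 18 | 61,54 |
| ptwiki | 33 | 28 | 84,85 | 77 | 28 | 29 | 74,03 |
| ruwiki | 58 | 46 | 79,31 | 105 | 25 | 37 | 59,05 |
| srwiki | 10 | 9 | 90,00 | 1 | 0 | 0 | 0,00 |
| svwiki | 10 | 5 | 50,00 | 12 | 7 | 4 | 91,67 |
| trwiki | 21 | 15 | 71,43 | 39 | 23 | 14 | 94,87 |
| ukwiki | 29 | 10 | 34,48 | 29 | 18 | 2 | 68,97 |
| zhwiki | 14 | 11 | 78,57 | 24 | 12 | 2 | 58,33 |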

--> *) taking into account also the pending deletion
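
As a sketch of how the starred column is derived, the snippet below recomputes the 15.07.–18.07.2016 bad ratio for four of the wikis in the table and ranks them. The (uploads, deleted, pending deletion) triples are taken from the table rows as read here, and the helper name is illustrative.

```python
# (uploads, deleted, pending deletion) for 15.07.-18.07.2016
per_wiki = {
    "dewiki": (118, 33, 43),
    "enwiki": (700, 260, 209),
    "eswiki": (146, 52, 54),
    "ptwiki": (77, 28, 29),
}


def ratio_with_pending(uploads: int, deleted: int, pending: int) -> float:
    """Bad ratio in percent, counting pending deletions as bad."""
    return round(100 * (deleted + pending) / uploads, 2)


# Highest bad ratio first
ranked = sorted(
    ((wiki, ratio_with_pending(*row)) for wiki, row in per_wiki.items()),
    key=lambda item: item[1],
    reverse=True,
)
for wiki, ratio in ranked:
    print(wiki, ratio)
```

Wikis with very few uploads in the period (for example srwiki with a single file) produce ratios that are not meaningful, so a minimum-uploads cut-off would be needed before ranking the full table.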

Well, eswiki was already on my "watch-list" before due to a comparable cultural (let's say...) "spontaneity" as ptwiki (in other words: they don't care - and they may have heard about "copyrights" but they are ignoring it), and e.g. arwiki is probably a typical case of "I don't care + I never heard of copyrights". On the other hand there is dewiki, which has the merit of being probably the most reliable wiki-version, but which is also "equipped" with some users who are falling into the group of "I don't care/I heard about it, but.../I don't know" – like also other "big" wikis like fr/it/nl/ru/etc.

Or in other words: all wikis are somehow "bad" and/or "not so good" – some more, some less. And the bad ratios from cross-wiki uploads from enwiki made by users around the world is a quite representative cross-section from user behaviour worldwide (and they confirm the bad ratios mentioned above). So, deactivating the cross-wiki upload tool only on some "critical" wikis (which could be also an option) most likely does not solve the whole mess (and...well... the users may also switch to a wiki with activated tool, gaming the system).

So, citing myself:

So, in other words: the cross-wiki-upload tool is in the vast majority a perfect tool for users who – quickie-like – want to illustrate/promote/etc. something spontaneous on Wikipedia, ignoring further concerns about copyrights. Just grab it from the Internet. It is obvious that WMF is trying to establish a somehow social media-like thing, imitating Facebook & Co... – which is going completely wrong.

And that's a global problem.

(...) Requesting wiki configuration changes says we need community consensus, so I suppose we should start an RFC at Commons:Village pump/Proposals?

Probably yes (but not by me).

Gunnex added a comment.EditedJul 28 2016, 10:43 PM

(...) Requesting wiki configuration changes says we need community consensus, so I suppose we should start an RFC at Commons:Village pump/Proposals?

@Gunnex wrote:
Probably yes (but not by me).

Well, I have some kind of conflict of interest here, because I probably provided the most content and data AGAINST this tool... so it would be nice if someone else would open the RFC. Please... soon (Commons is currently flooded by around 500 cross-wiki uploads daily)

Btw, if you have time, go through Cross-wiki upload from *.wikipedia.org (190716—270716) (living & tagged & deleted & from where) = User:Gunnex/Cross-wiki uploads 19.07.2016–27.07.2016 – which is the sequel of User:Gunnex/Cross-wiki uploads 15.07.2016–18.07.2016.

I made two runs of the query:

  • 1st run on 16:39, 28. Jul. 2016‎: 4.690 rows. 4.025 living (428 pending deletion), 665 deleted --> bad ratio: 23,31 %
  • (5 hours later)
  • 2nd run on 21:37, 28. Jul. 2016: 4.716 rows**, 4.043 living (489 pending deletion), 673 deleted --> bad ratio: 24,64 %
  • **) The total of rows is changing due to some double entries, caused by bug T140522

I have worked since 16:39, 28. Jul. 2016 for around 2 hours exclusively on these uploads, and the 61 additional pending deletions (copyvios/DR/no-permission/etc.) + 8 deleted files (copyvios) are probably the result of my work... again an indicator for the low "non-tagged-ratio" at Commons, understaffed, etc...

PS: I am out of office from 31.07. – 04.08.2016

(...) Requesting wiki configuration changes says we need community consensus, so I suppose we should start an RFC at Commons:Village pump/Proposals?

@Gunnex wrote:
Probably yes (but not by me).

Well, I have some kind of conflict of interest here, because I probably provided the most content and data AGAINST this tool... so it would be nice if someone else would open the RFC. Please... soon (Commons is currently flooded by around 500 cross-wiki uploads daily)

Done at Commons:Village_pump/Proposals#Rfc: Should we request a configuration change to shut down cross-wiki uploads?

Gunnex added a comment.Aug 9 2016, 7:49 PM

Btw, if you have time, go through Cross-wiki upload from *.wikipedia.org (190716—270716) (living & tagged & deleted & from where) = User:Gunnex/Cross-wiki uploads 19.07.2016–27.07.2016 – which is the sequel of User:Gunnex/Cross-wiki uploads 15.07.2016–18.07.2016.
I made two runs of the query:

  • 1st run on 16:39, 28. Jul. 2016‎: 4.690 rows. 4.025 living (428 pending deletion), 665 deleted --> bad ratio: 23,31 %
  • (5 hours later)
  • 2nd run on 21:37, 28. Jul. 2016: 4.716 rows**, 4.043 living (489 pending deletion), 673 deleted --> bad ratio: 24,64 %
  • **) The total of rows is changing due to some double entries, caused by bug T140522

I have worked since 16:39, 28. Jul. 2016 for around 2 hours exclusively on these uploads, and the 61 additional pending deletions (copyvios/DR/no-permission/etc.) + 8 deleted files (copyvios) are probably the result of my work... again an indicator for the low "non-tagged-ratio" at Commons, understaffed, etc...

Update:

  • 3rd run on 08.08.2016: 4.775 rows**, 3.447 living (557 pending deletion) , 1.328 deleted

**) The total of rows is changing due to some double entries, caused by bug T140522 (see above)

(...)

Cross-wiki upload from *.wikipedia.org (190716—270716) (living & tagged & deleted & from where) = User:Gunnex/Cross-wiki uploads 19.07.2016–27.07.2016

Checks:

  • 1st run on 16:39, 28. Jul. 2016‎: 4.690 rows. 4.025 living (428 pending deletion), 665 deleted
  • 2nd run on 21:37, 28. Jul. 2016‎: 4.716 rows**, 4.043 living (489 pending deletion) 673 deleted
  • 3rd run on 08.08.2016: 4.775 rows**, 3.447 living (557 pending deletion) , 1.328 deleted

Update

  • 4th run on 13.08.2016: 4.772 rows**, 3.166 living (533 pending deletion), 1.606 deleted

Cross-wiki upload from *.wikipedia.org (190716—270716) (living & tagged & deleted & from where) = User:Gunnex/Cross-wiki uploads 19.07.2016–27.07.2016

Checks:

  • 1st run on 16:39, 28. Jul. 2016‎: 4.690 rows. 4.025 living (428 pending deletion), 665 deleted
  • 2nd run on 21:37, 28. Jul. 2016‎: 4.716 rows**, 4.043 living (489 pending deletion), 673 deleted
  • 3rd run on 08.08.2016: 4.775 rows**, 3.447 living (557 pending deletion), 1.328 deleted
  • 4th run on 13.08.2016: 4.772 rows**, 3.166 living (533 pending deletion), 1.606 deleted

Update

  • 5th run on 20.08.2016: 4.723 rows**, 2.659 living (345 pending deletion), 2.064 deleted
Teles added a subscriber: Teles.Aug 20 2016, 11:51 PM

I did some queries to see how the abuse filter is doing, and it looks like it's doing nice – see T144992: Find out how new anti-copyvio abusefilters are affecting uploads for details, but the short version is: it definitely decreased the number of uploads (by around 75%) and it probably also improved the quality of the remaining uploads (but it's difficult to tell for sure until we wait a bit longer, there is just a month of data).

So, I think the abuse filters by @Steinsplitter and the improvements to AbuseFilter that made them possible are a sufficient solution to this task. I'm going to mark this task as resolved in a couple of days unless somebody convinces me otherwise.

As a side note, the discussion in the RFC started by @El_Grafo (https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Rfc:_Should_we_request_a_configuration_change_to_shut_down_cross-wiki_uploads.3F) is still ongoing. It looks like the effect of the filters was only mentioned very recently and very briefly; I'm not sure if this is going to change anyone's opinion there.

Yann added a comment.Sep 21 2016, 6:25 PM

I really want to see a big change in the percentage and number of copyvios before approving this.

IMHO this task can be closed; the main problem left now is whether this additional interface is wanted at all (a topic which I've addressed many times).

An interactive visualization of the CSV files that @matmarex shared - https://prtksxna.github.io/upload-stats/. Clicking around will show that bad uploads are correlated with new users no matter the tool. Hope it helps.

@Yann, using this very cool visualization and taking only the period between July 31st-Aug 20th, it would seem that the percentage of "bad" uploads coming through the crosswiki uploader has indeed gone down significantly after the filters were put in place.

@Nemo_bis, in the big scheme of things (i.e. assuming infinite review capacity and/or near-perfect uploaders) the answer is clearly yes. But we're not in that situation, so perhaps a discussion is indeed needed. Is there a wiki page or phab item that is used for that discussion?

Is there a wiki page or phab item that is used for that discussion?

COM:VP/P. The majority is supporting to switch off.

It might also be appropriate to have such a discussion at Meta. This decision affects editors at all projects, not just Commons users.

It might also be appropriate to have such a discussion at Meta. This decision affects editors at all projects, not just Commons users.

It's Commons that has to deal with the mess, and Commons that has the power to (effectively) turn it off. It's Commons' business, however hurt the WMF might be about its fancy new toy being turned off, or however mildly inconvenienced some users on other projects may be.

It's Commons that has to deal with the mess, and Commons that has the power to (effectively) turn it off. It's Commons' business, however hurt the WMF might be about its fancy new toy being turned off, or however mildly inconvenienced some users on other projects may be.

Next time somebody asks why I believe Commons is anything but a friendly place I'll refer her to this comment. It sums up perfectly all the "we're busy, leave us alone to do whatever we want" attitude that can be seen in every and all interactions on Commons.

Commons' main purpose is to serve as a free image repository for other projects (either from the Wikimedia world or from elsewhere). By itself, it has little purpose for anyone but a few photographers that enjoy having their images marked as featured.

Nemo_bis closed this task as Invalid.Sep 23 2016, 10:58 PM

Please move discussions on Commons' purpose to another place.

Per above, I'm closing this task because we've long passed the initial scope (mostly "A/B test of different upload interfaces") and other discussions belong elsewhere on the wikis, while a new task can be filed for specific actionable items.

It might also be appropriate to have such a discussion at Meta. This decision affects editors at all projects, not just Commons users.

Quote: "Note: This is only the first step, intended to get feedback from the Commons community alone. Depending on the outcome, a consultation of the global Wikimedia community on meta may follow. --El Grafo (talk) 12:23, 4 August 2016 (UTC)"

Next time somebody asks why I believe Commons is anything but a friendly place I'll refer her to this comment. It sums up perfectly all the "we're busy, leave us alone to do whatever we want" attitude that can be seen in every and all interactions on Commons.

Are you expecting us to have the time to process all the files from everywhere friendlily, when Commons doesn't even have a fifth of the admin manpower of English Wikipedia (counting # of admins)?

Are you expecting us to have the time to process all the files from everywhere friendlily, when Commons doesn't even have a fifth of the admin manpower of English Wikipedia (counting # of admins)?

Yes, certainly. I won't go into the studies that the WMF did about the editor decline (I'm sure you can find them on meta), but if people are the bottleneck, perhaps it's time for an automated approach. Just as the Collaboration Team worked on 2 vandalism robots for en.wp this year, they can just as well help you with automated DRs for newly uploaded files with lots of matches in Tineye (or any other feature that the commons admins feel it would help clear the backlog). Think about what you need and ask for features in the November community consultation.

Yann added a comment.Sep 25 2016, 11:47 AM

@Strainu Sorry, but you don't know what you are talking about, and your comment is not appropriate. Commons has its share of issues, but expecting Commons volunteers to work overtime for the benefit of others who don't care about the project's daily management is not acceptable.

Strainu added a comment.EditedSep 25 2016, 12:08 PM

@Yann, you're twisting my words. I never said the community should work more, what I said is that they should be friendly (or at the very least polite) with uploaders no matter how they upload, that's all. I even offered an alternative way to have more new images covered.

Saying that only the Commons community should decide how the users can or cannot upload is inappropriate, contrary to the movement's purpose and generally discouraging to users. What if the community decides next that the API should be disabled?

Yann added a comment.Sep 28 2016, 5:30 PM

@Strainu: The Commons community is not a closed club. You are welcome to be part in it.

But yes, the Commons community should decide what can be uploaded (copyright rules) and how it could be. The volunteers who deal with cleaning the upload queue should have the last word about it. It has been like this for the last 12 years, and so far, it has always made good decisions on these issues. I don't see why it would change.

It has been like this for the last 12 years, and so far, it has always made good decisions on these issues.

That's where our opinions diverge. I can show you countless examples of bad, hasty and misinformed decisions taken over the years, plus the usual abuses of otherwise sound rules. And that is why more and more experienced editors prefer to upload their pictures on local Wikipedias and let somebody else handle the copyright issues. You are certainly not the only ones dealing with cleaning the upload queues, so you should not be the only ones taking decisions on that matter.

I think we have polluted this bug long enough. If you want to discuss this further, feel free to contact me on commons (same username)

Base added a subscriber: Base.Dec 21 2016, 1:08 PM