
Disable FileExporter on all Arabic projects per Commons community request
Open, Needs Triage, Public

Description

We have been told that certain communities are not allowed to mass-import files to Commons via bot. The following has been suggested:

  • Either do not promote file transfers via FileImporter to these communities, e.g. by not showing the "Export to Wikimedia Commons" button.
  • Or make it easier for the Commons community to identify such transfers, e.g. by marking all files with a [[Category:Unchecked import from …]].

So far only the Arabic communities have been mentioned (as an example?).

Quote from https://www.mediawiki.org/wiki/Topic:Uyydhk03bpmnxcm4:

Two projects […] potentially problematic with respect to copyrights are Arabic Wikipedia and Arabic Wikisource. We had a declined bot request for transferring files from those two projects to Commons early last year, which revealed the following quotes: "Different Wikipedias have different attitude toward copyrights and its enforcement." by EugeneZelenko; "Seems to be an overall lack of consensus on Arabian Wikipedia about the process for how this bot would work. One commented that they don't trust Commons because there files got deleted for no convincing reason, another mentioned copyright issues and perhaps images not being transferable due to licenses." by ~riley; and my "I have issues with whoever is selecting these files for transfer, and that person's attention to detail (or lack thereof)." 60% of the files selected as examples for transfer (which should have been the best ones) had copyright problems which made them unsuitable for Commons or any WMF project. Admittedly, five is a very small sample size, but 60% is a huge error rate.

Event Timeline

Restricted Application added a subscriber: alanajjar. May 3 2019, 12:20 PM

So is only Commons community consensus needed, or is Arabic Wikipedia consensus needed as well?

@alanajjar, we are looking for input from all affected communities. Additional ideas beyond the two I briefly described in this task's description are very much welcome as well!

Hmm, I'm a bit unsure whether the problem with automatic bot imports is comparable to what FileImporter does.

FileImporter does not allow importing files that are not explicitly labeled on the source wiki with a suitable license, so mass imports of random files with unclear status are not easily possible in the first place.

This obviously depends on how license templates are handled on the source wiki. If files are readily given a CC0 or CC BY license even when that is not appropriate, FileImporter cannot control that.

Meno25 added a comment. Edited May 12 2019, 7:57 PM

There is a simple solution to this problem: allow only users in the following user groups on Arabic projects to use the tool, and disable it for everyone else:

  • sysop (26 users on arwiki alone, not counting other Arabic projects)
  • rollbacker (188 users on arwiki)
  • editor (886 users on arwiki)

  • All of these user rights are granted only after a thorough review by the community. Unlike autoconfirmed, they are not granted automatically but require manual review, so users in these groups can be trusted.
  • FileExporter is a wonderful tool, and it surely took Wikimedia developers a lot of time and effort to build. I don't believe that disabling it for all users of a large project (Arabic Wikipedia is the 17th-largest Wikipedia by number of articles) is the right solution here; it would waste the developers' efforts.

Hi! As @Meno25 mentioned, the tool can be limited to specific user groups (sysops, rollbackers, fileimporters and editors). Also, by default, we can add the existing category [[Category:Files moved from ar.wikipedia to Commons requiring review]] to those imported files.
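The gating rule proposed by Meno25, combined with the default review category suggested here, could look roughly like the following minimal Python sketch. It is not how FileExporter actually decides whether to show its link; the group names come from this discussion, `fileimporter` is an assumed internal name for the "fileimporters" group, and `may_export` and `with_review_category` are hypothetical helpers.

```python
# Hypothetical sketch of the proposal above, not FileExporter's actual logic.
# Group names are taken from the discussion; "fileimporter" is an assumed
# internal name for the "fileimporters" group.
TRUSTED_GROUPS = frozenset({"sysop", "rollbacker", "editor", "fileimporter"})

# Existing Commons category named in the comment above.
REVIEW_CATEGORY = (
    "[[Category:Files moved from ar.wikipedia to Commons requiring review]]"
)


def may_export(user_groups):
    """Show the export link only if the user is in at least one trusted group."""
    return not TRUSTED_GROUPS.isdisjoint(user_groups)


def with_review_category(wikitext):
    """Append the review category to imported wikitext unless already present."""
    if REVIEW_CATEGORY in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + REVIEW_CATEGORY + "\n"


if __name__ == "__main__":
    # autoconfirmed alone is not a trusted group, so this prints False
    print(may_export({"user", "autoconfirmed"}))
    print(may_export({"user", "autoconfirmed", "rollbacker"}))
    print(with_review_category("{{Information}}"))
```

Membership in any single trusted group is enough, which is why the check is a set-disjointness test rather than a requirement to hold all of the listed rights.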

Pikne added a subscriber: Pikne. Edited May 13 2019, 6:51 AM

Blacklisting a Wikimedia project doesn't seem right to me as long as Wikimedia Commons is supposed to be a shared file repository for all Wikimedia projects. Files could also still be moved to Commons by other means; disabling FileExporter would only prevent the file history from being copied properly.

Based on the comments so far, it is unclear how large this problem with file exports from Arabic Wikipedia actually is, and whether FileExporter has a significant effect on it. The three examples of deleted files mentioned above were in fact moved before FileImporter/FileExporter were enabled. If the export link should be available based on user group, do we actually know that the problematic moves are (mostly) made by users outside the selected groups?

If the root of this problem is that particular projects are lax about copyright, then in my opinion the proper solution would be for these projects to tackle that actual problem. For instance, they could mark all files, or all files with certain problematic licenses, for individual review. (Similar to the Bild-PD-Schöpfungshöhe template on de.wikipedia, a "NoCommons" tag would then block the import until the particular file has been reviewed as a suitable candidate for moving to Commons.) Then you could perhaps also disable FileExporter (hopefully as a temporary solution) and, instead of the tab link, make the export link available in the review tag after a positive review of the particular file.

As for categorizing all imported files as "unchecked" or "requiring review" per project, it's unclear to me what the purpose would be. Bots and some older tools used the BotMoveToCommons tag (which includes a tracking category), either because bot moves may not have been reviewed during the file transfer or because the tools allowed only limited editing of the wikitext, so further cleanup was needed after the transfer. This is not the case for FileImporter, which asks users to "check the page in detail before importing" and allows cleaning up everything during the import. So far there has been no particular review procedure, and everyone has been able to check their own file transfers. So I don't see the point of submitting all files for re-review, at least not without first developing a clear review procedure.

Yeah, blacklisting a language because of copyright violations found in a small sample is problematic. Tools should not incorporate unhealthy community practices such as "stop all transfers"; rather, they should incorporate best practices such as Wikidata copyright statements.

Hello,
Three ideas were suggested in this discussion:

  1. Disable the "Export to Wikimedia Commons" button for all users on a certain project.
  2. Show the "Export to Wikimedia Commons" button only to users with certain rights.
  3. Make it easier for the Commons community to identify from which wiki file transfers are coming.

Disabling the export feature for all users on a project would mean that no Commons-compatible files could be transferred from there either, which is why the Technical Wishes team would refrain from doing this.

The team can investigate whether options 2 (T232480) and 3 (T232481) are feasible, though. Could you discuss with the Commons community whether these changes are wanted, and which aspects we should consider if we implemented them?

Thanks a lot,
Johanna

4nn1l2 added a subscriber: 4nn1l2. Sep 14 2019, 12:10 AM
WMDE-Fisch added a comment. Edited Tue, May 19, 11:50 AM

Hello again,
We're currently picking up some of the leftovers in the Move-Files-To-Commons area and came back to this issue and the possible options to help. We are focusing on ways to better identify imports from specific wikis so that they can be filtered for potential cleanup. We currently have two options in mind.

  • (1) use the existing comment added to the wikitext and the import log entries
    • there is a comment in the format <!--This file was moved here using FileImporter from //en.wikipedia.org/wiki/[filename]--> added to each import
    • imports show up in the import log with a similar message as the summary line: Imported with FileImporter from //en.wikipedia.org/wiki/[filename]
    • imports are tagged with the fileimporter tag [1]
    • the comment could be used with CirrusSearch with insource: to identify uploaded images from specific projects [2]
    • both could in theory be used to trigger bots that are watching recent changes or uploads
  • (2) add a specific category to uploads done with FileImporter that identifies the source wiki
    • for example, it could say something like Imported from en.wikipedia.org
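Option (1) can already be used from a script. The following minimal Python sketch assumes the wikitext comment keeps exactly the format quoted above; it extracts the source wiki from a file page's wikitext, filters pages by source wiki, and builds the search and import-log URLs from [1] and [2]. The helper names are hypothetical, and the sketch only builds URLs without sending any requests; `list=logevents` with `letype` and `letag` are standard MediaWiki Action API parameters.

```python
import re
from urllib.parse import urlencode

# Assumes the exact comment format quoted in this task:
# <!--This file was moved here using FileImporter from //en.wikipedia.org/wiki/[filename]-->
FILEIMPORTER_COMMENT = re.compile(
    r"<!--This file was moved here using FileImporter from "
    r"//(?P<host>[^/\s]+)/wiki/(?P<title>.+?)-->"
)


def source_of(wikitext):
    """Return (source host, source title) if the FileImporter comment is present."""
    match = FILEIMPORTER_COMMENT.search(wikitext)
    if match is None:
        return None
    return match.group("host"), match.group("title")


def files_from(pages, host):
    """Titles from a {title: wikitext} mapping whose comment names `host`."""
    return [
        title
        for title, text in pages.items()
        if (src := source_of(text)) is not None and src[0] == host
    ]


def insource_search_url(host):
    """Commons search for pages whose wikitext contains the comment for `host` (cf. [2])."""
    query = f'insource:"This file was moved here using FileImporter from //{host}"'
    return "https://commons.wikimedia.org/wiki/Special:Search?" + urlencode(
        {"search": query}
    )


def import_log_api_url(limit=50):
    """Action API query for import log entries tagged "fileimporter" (cf. [1])."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "import",
        "letag": "fileimporter",
        "lelimit": limit,
        "format": "json",
    }
    return "https://commons.wikimedia.org/w/api.php?" + urlencode(params)
```

For example, `files_from(pages, "ar.wikipedia.org")` keeps only the titles whose comment points at Arabic Wikipedia; a bot watching recent uploads could apply the same `source_of` check to each new file page.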

The main question is whether the current wikitext comment and import log entry are enough to support potential cleanup, or whether it would be better to add further information such as the specific categories mentioned above.

[1] https://commons.wikimedia.org/wiki/Special:Log?type=import&tagfilter=fileimporter
[2] https://commons.wikimedia.org/wiki/Special:Search?search=insource:"This+file+was+moved+here+using+FileImporter+from+//en.wikipedia"

@Meno25 @Helmoony @Jeff_G @Slowking4, it would be really great to get some feedback on the questions above. We are currently trying to figure out what additional data would help the communities deal better with situations like this. Thanks! :-)

Slowking4 added a comment. Edited Mon, May 25, 1:08 PM

Rather than restricting imports or stopping all of them, you should flag potential problems and put them in a maintenance category, e.g. https://commons.wikimedia.org/wiki/Category:Image_overwrites_by_Jan_Arkesteijn_for_independent_review
(Data searches are fine but indecipherable to some community members.)

However, you will then need to recruit volunteers to work through your quality improvement backlog (this last step is frequently missing). My quality improvement days on Commons are over, so you will have to look elsewhere.
But using technical tools to stop a process is a symptom of the dysfunctional community at Commons, and I would not want to enable that unhealthy behavior.