Disable FileExporter on all Arabic projects per Commons community request
Closed, Declined · Public

Description

We are told certain communities are not allowed to do mass-imports to Commons via bot. It is suggested to:

  • Either not promote file-transfers via FileImporter to these communities, e.g. by not showing the "Export to Wikimedia Commons" button.
  • Or make it easier for the Commons community to identify such transfers, e.g. by marking all files with a [[Category:Unchecked import from …]].

So far only the Arabic communities have been mentioned (as an example?).

Quote from https://www.mediawiki.org/wiki/Topic:Uyydhk03bpmnxcm4:

Two projects […] potentially problematic with respect to copyrights are Arabic Wikipedia and Arabic Wikisource. We had a declined bot request for transferring files from those two projects to Commons early last year, which revealed the following quotes: "Different Wikipedias have different attitude toward copyrights and its enforcement." by EugeneZelenko; "Seems to be an overall lack of consensus on Arabian Wikipedia about the process for how this bot would work. One commented that they don't trust Commons because there files got deleted for no convincing reason, another mentioned copyright issues and perhaps images not being transferable due to licenses." by ~riley; and my "I have issues with whoever is selecting these files for transfer, and that person's attention to detail (or lack thereof)." 60% of the files selected as examples for transfer (which should have been the best ones) had copyright problems which made them unsuitable for Commons or any WMF project. Admittedly, five is a very small sample size, but 60% is a huge error rate.

Event Timeline

So is only Commons community consensus needed, or Arabic Wikipedia consensus as well?

@alanajjar, we are looking for input from all affected communities. Ideas other than the two I briefly described in this task's description are very much welcome as well!

Hmm, I'm a bit unsure whether the problem with the automatic bot imports is comparable to what the FileImporter does.

The FileImporter does not allow imports of files that are not explicitly labeled on the source wiki as having a suitable license. So mass imports of random files with unclear status are per se not easily possible.

The above statement obviously depends on how license templates are handled on the source. If files easily get a CC-0 or CC-by license although it's not appropriate, the FileImporter can't control that.

We have a simple solution for the problem here, which is to allow only users who are in the following user groups on Arabic projects:

  • sysop (26 users on arwiki alone, not counting other Arabic projects)
  • rollbacker (188 users on arwiki)
  • editor (886 users on arwiki)

to use the tool and disable it for other users.

  • All these user rights are granted to users after a thorough review by the community. (Note: these user rights are not granted automatically, as is the case with autoconfirmed; they require manual review, so we can trust the users in these groups.)
  • FileExporter is a wonderful tool and it surely took a lot of time and effort from Wikimedia developers to code it, so I don't believe that disabling it for all users in a large project (Arabic Wikipedia is the 17th Wikipedia by number of articles) is the solution here, as it would waste the developers' efforts.
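
The group-based restriction proposed above can be illustrated with a minimal sketch. This is not FileExporter's actual code or configuration: the function name and the constant are hypothetical, and only the three group names come from the comment above.

```python
# Hypothetical illustration only; FileExporter's real permission handling may differ.
# The group names are the ones listed in the proposal above.
ALLOWED_GROUPS = frozenset({"sysop", "rollbacker", "editor"})


def may_see_export_button(user_groups):
    """Return True if the user belongs to at least one manually reviewed group."""
    # isdisjoint() is True when the two collections share no element,
    # so negating it means "at least one allowed group is present".
    return not ALLOWED_GROUPS.isdisjoint(user_groups)
```

For example, a user in the groups `autoconfirmed` and `editor` would see the button, while a user who is only `autoconfirmed` would not.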

Hi! As @Meno25 mentioned, the tool can be limited to specific user groups (sysops, rollbackers, fileimporters and editors). Also, by default, we can add the existing category [[Category:Files moved from ar.wikipedia to Commons requiring review]] to those imported files.

Blacklisting a Wikimedia project doesn't seem right to me as long as Wikimedia Commons is supposed to be a shared file repository for all Wikimedia projects. Also, it would still be possible to move files to Commons; by disallowing FileExporter you'd only impede copying the file history properly.

Based on the current comments it's rather unclear what the actual scale of this problem with file exports on Arabic Wikipedia is, and whether FileExporter has a significant effect on it. The three examples of deleted files above were in fact moved before FileImporter/FileExporter were enabled. If the export link is to be made available based on user group, do we actually know that problematic moves are (mostly) made by users outside of these selected user groups?

If the root of this problem is that particular projects are lax on copyright, then the proper solution, in my opinion, would be for these projects to tackle this actual problem. For instance, they could mark all files, or files with some problematic licenses, for individual review. (Then, similar to the Bild-PD-Schöpfungshöhe template on de.wikipedia, a "NoCommons" tag would block the import until the particular file is reviewed as a suitable candidate for a move to Commons.) Then maybe you could also disable FileExporter (hopefully as a temporary solution) and, instead of the tab link, make the export link available in the review tag after a positive review of the particular file.

As for categorizing all imported files as "unchecked" or "requiring review" per project, it's unclear to me what the purpose would be. Bots and some older tools used the BotMoveToCommons tag (which includes a tracking category) because bot moves may not have been reviewed during file transfer, or because the tools allowed only limited editing of wikitext and further cleanup was needed after transfer. This is not the case for FileImporter, which asks users to "check the page in detail before importing" and allows cleaning up everything during import. So far there has been no particular review procedure and everyone has been able to check their own file transfers. So I don't see the point of submitting all files for re-review, at least not without developing a clear review procedure beforehand.

Yeah, blacklisting a language because of copyvios found in a small sample is problematic. Tools should not incorporate unhealthy community practices such as "stop all transfers". Rather, tools should incorporate best practices such as Wikidata copyright statements.

Hello,
three ideas were suggested in this discussion:

  1. Disable the "Export to Wikimedia Commons" button for all users on a certain project.
  2. Show the "Export to Wikimedia Commons" button only to users with certain rights.
  3. Make it easier for the Commons community to identify from which wiki file transfers are coming.

Disabling the export feature for all users on a project would mean that no Commons-compatible files could be transferred from there either, which is why the Technical Wishes team would refrain from doing this.

The team can investigate if options 2 (T232480) and 3 (T232481) are doable, though. Could you discuss in the Commons community if these changes are wanted, and which aspects would be important to consider if we implemented these changes?

Thanks a lot,
Johanna

Hello again,
we're currently picking up some of the leftovers in the Move-Files-To-Commons realm and came back to this issue and possible options to help. We are trying to focus on ways to better identify imports from specific wikis so they can be filtered for potential cleanups. We currently have two options for that in mind.

  • (1) use the existing comment added to the wikitext and the import log entries
    • there is a comment in the format <!--This file was moved here using FileImporter from //en.wikipedia.org/wiki/[filename]--> added to each import
    • imports show up in the import log with a similar message as the summary line: Imported with FileImporter from //en.wikipedia.org/wiki/[filename]
    • imports are tagged with the fileimporter tag [1]
    • the comment could be used with CirrusSearch's insource: keyword to identify uploaded images from specific projects [2]
    • both could in theory be used to trigger bots that are watching recent changes or uploads
  • (2) add a specific category to uploads done by the FileImporter that communicates the source wiki
    • these could for example say something like Imported from en.wikipedia.org

The main question is whether the current wikitext comment and import log entry are enough to support potential cleanup needs, or whether it would be better to add additional information such as the specific categories mentioned above.

[1] https://commons.wikimedia.org/wiki/Special:Log?type=import&tagfilter=fileimporter
[2] https://commons.wikimedia.org/wiki/Special:Search?search=insource:"This+file+was+moved+here+using+FileImporter+from+//en.wikipedia"
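
As a rough sketch of how the search in [2] could be built for other source wikis: the snippet below only assembles a URL from the wikitext comment quoted above, and the function and constant names are made up for illustration.

```python
from urllib.parse import urlencode

# Special:Search on Commons, as used in link [2] above.
SEARCH_ENDPOINT = "https://commons.wikimedia.org/wiki/Special:Search"
# Fixed part of the wikitext comment that FileImporter adds to each import.
COMMENT_PREFIX = "This file was moved here using FileImporter from //"


def insource_search_url(source_host_prefix):
    """Build a search URL for files imported from hosts starting with the given prefix.

    `source_host_prefix` is something like "en.wikipedia" or "ar.wikisource".
    """
    query = f'insource:"{COMMENT_PREFIX}{source_host_prefix}"'
    # urlencode() percent-encodes the quotes, colon and slashes and turns spaces into "+".
    return f"{SEARCH_ENDPOINT}?{urlencode({'search': query})}"
```

Calling `insource_search_url("ar.wikipedia")` gives the equivalent of link [2] for Arabic Wikipedia. As noted in the discussion, this relies on page content that can be edited during or after the import, so the results may be incomplete.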

@Meno25 @Helmoony @Jeff_G @Slowking4 it would be really great if we got some feedback on the above questions. We are currently trying to figure out what additional data would be helpful so that the communities can better deal with situations like this. Thanks! :-)

rather than restricting importing, or stopping all imports, you should elevate potential problems and put them in a maintenance category, e.g. https://commons.wikimedia.org/wiki/Category:Image_overwrites_by_Jan_Arkesteijn_for_independent_review
(data searches are fine but undecipherable to some community members)

however,
you will then need to recruit volunteers to work your quality improvement backlog. (this last task is frequently missing) my quality improvement days on commons are over, so you will have to look elsewhere,
but using technical tools to stop a process is a symptom of the dysfunctional community at commons, would not want to enable that unhealthy behavior,

A few thoughts about general options to track imports. I think ideally the import log could have a field which allows filtering imports by source (maybe by wildcarding the interwiki prefixes that are already present in log entries of page imports in order to link to the source page). Imports could then be found by source via a simple interface, and in a more reliable manner, without depending on page content that can be changed during or after import.

However, for a start it may be enough to amend the FileImporter documentation by adding a few examples of "insource:" searches and/or Quarry queries based on the edit comment (which presumably is more reliable). For a start we could also rely on the community technicians who already run bots to generate a wide variety of regular reports on Commons (e.g. if needed, someone can probably run a bot to generate pages with chronological lists of imports from a particular source). Then, based on what kind of queries or reports are more popular, you might get a better idea of what changes would be worthwhile software-wise.
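
A per-source report of the kind described above might start from something like the following sketch. It assumes the summaries have already been fetched (e.g. by a bot or a Quarry query) and follow the format "Imported with FileImporter from //en.wikipedia.org/wiki/[filename]" quoted earlier; the function name and the sample titles used to exercise it are hypothetical.

```python
import re
from collections import defaultdict

# Matches the import summary format quoted earlier in this task and
# captures the source host and the page title that follows /wiki/.
SUMMARY_PATTERN = re.compile(
    r"Imported with FileImporter from //([^/\s]+)/wiki/(\S+)"
)


def group_imports_by_source(summaries):
    """Group imported page titles by source host, keeping input order per host."""
    by_source = defaultdict(list)
    for summary in summaries:
        match = SUMMARY_PATTERN.search(summary)
        if match:  # summaries that don't come from FileImporter are skipped
            host, title = match.groups()
            by_source[host].append(title)
    return dict(by_source)
```

If the summaries are passed in chronological order, each per-host list is chronological too, which is roughly the "chronological lists of imports from particular source" idea.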

Hey @Pikne and thanks for the input. We have now decided to provide a configurable template that will be automatically appended to the file info of imported files. The Commons community can then further define how this template should be used to better surface the fact that the file was imported. See the ticket and the discussion around it here: T256205: Add configurable template to imported file info

Adding examples of how insource: could be used to the documentation is nevertheless also a good idea. I'll forward that in the form of a ticket.

I think it would be better to add additional information like the specific categories mentioned above.

I like FileImporter and I think it is a bad idea to block it on some wikis. If good users verify that files are good, then it should be possible to move those files to Commons.

I think the solution is to make the problematic wikis clean up. In the worst case it should be possible to start a discussion on Meta and perhaps give the wiki a final chance to clean up; if they don't, then do a mass delete of all files and close the wiki for local uploads.

I have 2 questions:

  1. If the template is set up on the configuration page, what prevents users from removing it from the configuration page or from the page on Commons?
  2. Why is a template better than a category?

Thanks for looking into this!

  1. It's true, users can edit the configuration pages. However, there is a bit of protection as users need to be autoconfirmed to do this, see T202071#6520637.
  2. As far as FileImporter is concerned, there is no difference between templates and categories. I believe the question is about T256205: Add configurable template to imported file info. There is a brief description of the feature at https://meta.wikimedia.org/wiki/WMDE_Technical_Wishes/Move_files_to_Commons#How_it_works. The message https://commons.wikimedia.org/wiki/MediaWiki:Fileimporter-post-import-revision-annotation is currently empty, but can be used to add either a category or a template to every imported file on Commons. One advantage of a template is that you can change it without touching the message in the MediaWiki: namespace.

In that case I prefer that we add information during import (template or category) instead of blocking all imports.

I agree that a template has some advantages so it would be okay with me to add one.

Good to hear that! However, from this point on it's up to the Commons community to do this and make use of https://commons.wikimedia.org/wiki/MediaWiki:Fileimporter-post-import-revision-annotation.

awight added a subscriber: awight.

Closing this because we've decided not to disable FileExporter; instead, we've made it possible to monitor exports from specific wikis. Please feel free to continue the discussion here. We're still interested in finding an outcome that works for everyone.