
Investigate how to move files with history from wikipedia to commons
Closed, Resolved · Public · 13 Estimated Story Points

Description

Task:
Please find out to what extent a solution for technically correct moving of files from Wikipedia to Commons could be integrated into MediaWiki.
This includes:

  • How many images on de-wiki would be better placed on Commons? At what rate does this number increase over time?
  • Where would this feature live and what could it look like?
  • Would this feature be needed anywhere other than for Wikipedia -> Commons?
  • What is the estimated work to build a feature that moves a file from Wikipedia to Commons while keeping all the information Wikipedia has about that file intact, and that also records who moved the file to Commons?
  • What would be the next steps? (Ideally, after this ticket we could make a decision on how to proceed and Lea would know exactly which tickets need to be written.)

There is a request from 2006 here: T8071

Background:
There is a wish on the 2013 German-speaking community wishlist, requesting a technically correct option to move files from Wikipedia to Commons, keeping the file history and the user name history intact: T140462

Event Timeline

Lea_WMDE renamed this task from "PLACEHOLDER: Investigate what should be done to allow moving files with history" to "Investigate how to move files with history from wikipedia to commons". Aug 3 2016, 10:24 AM
Lea_WMDE updated the task description.
Addshore moved this task from Backlog to Doing on the TCB-Team-Sprint-2016-08-02 board.
Addshore added a project: User-Addshore.
Addshore moved this task from Unsorted 💣 to Back Burner 🏛️ on the User-Addshore board.

Current tools to move files to commons from other projects:

CommonsHelper, http://tools.wmflabs.org/commonshelper/
ForTheCommonGood, https://github.com/atlight/ForTheCommonGood, http://en.wikipedia.org/wiki/WP:FTCG

The German wish @ https://de.wikipedia.org/wiki/Wikipedia:Technische_W%C3%BCnsche/Topw%C3%BCnsche#Technisch_sauberes_Verschieben_von_Dateien_nach_Commons_unter_Beibehaltung_der_Versionsgeschichte_und_des_Benutzernamens_.5BUmfrage_2013.5D

Translated into English: Technically clean moving of files to Commons while maintaining the version history and user name (2013 survey, 15 points)

Main points:
Requires global account (All WMF accounts are now global)
Users should have the same privileges as the XML-Ex / Import (can be done)
Make use of the Import / Export Feature (Needs fixing up)

The only way to maintain the link to the user account would be to have some sort of integrated solution (extension) rather than a tool or external program.
Otherwise random users would end up being able to attribute random edits to other users (not themselves)

Currently, Special:Export returns all revisions of the file description page, along with the correct users etc.
The only thing missing is the versions of the image itself. These can be retrieved using the API with action=query and prop=imageinfo, which returns the URLs of the files.
An extension could combine the two to import the whole history of the description page as well as the image versions, while keeping the correct global users attached to the edits.
It should also be noted that, although Special:Export can be used, all of the information needed could also be gathered using API requests.
Also, in the case of the WMF, where every site's DB can be accessed by the other sites, the API could be avoided altogether.
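
As a rough illustration of the API route, the sketch below builds an action=query / prop=imageinfo request URL and orders the returned file versions oldest first, which is the order an importer would presumably want to replay them in. The function names, the chosen iiprop fields and the sample response are assumptions made for this example; nothing here is taken from an existing extension, and no network request is made.

```python
import json
from urllib.parse import urlencode


def imageinfo_url(api_url, file_title):
    """Build a query URL asking for every stored version of one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "titles": file_title,
        # Fields an importer would plausibly need (an assumption, not a spec).
        "iiprop": "timestamp|user|comment|url|sha1",
        "iilimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def file_versions(response_text):
    """Return the imageinfo entries of a response, oldest version first."""
    data = json.loads(response_text)
    versions = []
    for page in data.get("query", {}).get("pages", []):
        versions.extend(page.get("imageinfo", []))
    # ISO 8601 timestamps sort correctly as plain strings.
    return sorted(versions, key=lambda v: v["timestamp"])


# Made-up sample in the formatversion=2 shape; the API lists newest first.
SAMPLE_RESPONSE = json.dumps({
    "query": {"pages": [{
        "title": "File:Example.png",
        "imageinfo": [
            {"timestamp": "2015-06-01T12:00:00Z", "user": "B",
             "url": "https://upload.example.org/new.png"},
            {"timestamp": "2012-01-15T08:30:00Z", "user": "A",
             "url": "https://upload.example.org/old.png"},
        ],
    }]},
})
```

Calling `imageinfo_url("https://de.wikipedia.org/w/api.php", "File:Example.png")` gives the URL to fetch, and `file_versions(...)` on the response body gives the uploads in chronological order together with the uploading user of each version.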

Note: While discussing this with people, it was mentioned that the WMF might want to move the files within Swift (the file storage backend) rather than downloading and re-uploading them (which of course makes sense but adds some complexity).

To me it looks like the solution would be in an extension doing roughly what is described above.

The file description page is also a bit of an issue, as different templates are used on different sites. It may make sense to have some sort of translator within a possible extension that converts the templates. An alternative would be to implement this within an external tool.
The extension could also have a whitelist of templates that are known to work, and only allow transferring files whose description pages use those templates. It could also have a built-in mechanism where, upon finding a template it does not know how to translate, it lets the user specify how to translate it.
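
The whitelist idea could look roughly like the sketch below: a mapping from source-wiki template names to Commons template names, plus a check that reports every template the mapping does not know. The mapping entries are illustrative placeholders, the regular expression only handles simple, non-nested template calls, and template parameters are left untranslated, so this is a starting point under those assumptions rather than a working translator.

```python
import re

# Hypothetical whitelist: source template name -> Commons template name.
TEMPLATE_MAP = {
    "Information": "Information",
    "Bild-CC-by-sa/3.0/de": "Cc-by-sa-3.0-de",
}

# Matches the opening of a template call and captures its name.
_TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=\||\}\})")


def translate_templates(wikitext, mapping=TEMPLATE_MAP):
    """Rename known templates; return (new_wikitext, unknown_template_names)."""
    unknown = []

    def replace(match):
        name = match.group(1)
        if name in mapping:
            return "{{" + mapping[name]
        unknown.append(name)
        return match.group(0)  # leave unknown templates untouched

    return _TEMPLATE_RE.sub(replace, wikitext), unknown


def can_transfer(wikitext, mapping=TEMPLATE_MAP):
    """A file is transferable only if every template is whitelisted."""
    return not translate_templates(wikitext, mapping)[1]
```

The list of unknown names is what a user-facing step would present when asking how an unrecognised template should be translated; the answer could then be added to the mapping.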

I digress.... back to the initial questions:

How many images in de-wiki should better be in commons? To what rate does this increase over time?

As far as I can tell by https://de.wikipedia.org/wiki/Kategorie:Datei:Commonsf%C3%A4hig there are currently just under 400 files to move (most of which are logos)
The English wikipedia has a much larger backlog, https://en.wikipedia.org/wiki/Category:Copy_to_Wikimedia_Commons currently displays 259,901 images to be moved (this number is decreasing and the history can be seen on that page).

Where would this feature be and how could it look like?

See the block of text above!

Would this feature be needed anywhere else than for wikipedia -> commons?

I imagine all projects would find a use for it, although as described above if developed as an extension this may only need to be deployed on commons.

What is the estimation of work to build a feature that moves the file from wikipedia to commons keeping all information wikipedia has about that file in place? And that also contains the info who was the person moving the file to commons?

The extension looks like a fair bit of effort; also translating templates to the Commons versions (or some other conversion step) would increase the estimate quite a lot.

What would be the next steps to do?

  • Decide if the extension is the right way forward.
  • Decide exactly how to get the data: Special:Export? API? DB? Swift movement?
  • Decide what to do about the translate step / conversion step.
  • Decide if this extension should be specific to the WMF or should also work in other situations (probably relates to how we get the data)
  • What should happen once the image has been copied over? Do we need to worry about that at all?
  • I'm sure there are other things that have also been missed in this investigation that we need to consider.

All comments welcome...

As far as I can tell by https://de.wikipedia.org/wiki/Kategorie:Datei:Commonsf%C3%A4hig there are currently just under 400 files to move (most of which are logos)

It has subcategories, so the number will be slightly larger, but of a similar order of magnitude. Also, I think some of the files there might not be suitable for Commons for legal reasons, such as this one. So I guess we can't move them over in an automated way; users would have to decide for each file.

I imagine all projects would find a use for it, although as described above if developed as an extension this may only need to be deployed on commons.

It would have to be deployed on each wiki that contains the source files, right?


Well, if the extension simply uses the API, the DB or Special:Export to get the data, I see no reason why such an extension would have to be on the wiki that the image is coming from, only on the wiki the image is going to (so Commons for us).

I imagine there would be a config for the extension which lists wikis and how to get the data for each of them (what their DB is, where their API is). This could also be done using some magic and the current site list.
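
To make the config idea a little more concrete, here is one possible shape for such a list, written in Python purely for illustration (a real MediaWiki extension would express this in its own PHP configuration). The names `SourceWiki`, `SOURCE_WIKIS` and `access_method`, as well as the `example` entry, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceWiki:
    """How the importing wiki can reach one source wiki."""
    api_url: str
    db_name: Optional[str] = None  # set only when direct DB access exists


# Hypothetical configuration keyed by wiki ID.
SOURCE_WIKIS = {
    "dewiki": SourceWiki("https://de.wikipedia.org/w/api.php", db_name="dewiki"),
    "enwiki": SourceWiki("https://en.wikipedia.org/w/api.php", db_name="enwiki"),
    # A third-party wiki that can only be reached over its API.
    "example": SourceWiki("https://wiki.example.org/w/api.php"),
}


def access_method(wiki_id):
    """Prefer direct DB access where configured, otherwise fall back to the API."""
    wiki = SOURCE_WIKIS[wiki_id]  # raises KeyError for unconfigured wikis
    return "db" if wiki.db_name else "api"
```

Keeping the API URL mandatory and the DB name optional reflects the point made above: the DB shortcut only applies where sites share database access, as within the WMF, while the API route would also work elsewhere.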

To summarize: Going for an extension looks rather straightforward. The extension is probably going to be part of Commons, but called from elsewhere (e.g. Wikipedia) to do the work. This might happen through a gadget on the Wikipedias, or as part of the extension itself. What remains open is exactly what the integration should look like, i.e. where and how users can interact with it.