
Upload images to Commons using description provided in spreadsheet.
Closed, Declined (Public)

Description

Through a GLAM collaboration, Wikimedia Spain has obtained access to hundreds of images of old coins and their catalogue information under a CC-BY-SA license. A workflow is already in place to download the images from the collaborator's website, process them, categorize them and automatically create a Commons-format description of each file using spreadsheet functions. The files are then uploaded to Commons one by one, because the description has to be manually copied for each file into the appropriate field of the upload form. This last step is currently the bottleneck of the entire process.

We would like to have an automated system to upload batches of JPG files to Commons using as description the text provided in a spreadsheet. In addition, for good housekeeping, we would like the automated system to add a third column to the spreadsheet with the URL of the file in Commons.

As a test, we are providing a ZIP file with 25 of the images we would like to upload along with a spreadsheet in Open Office format.

Event Timeline

Hispalois raised the priority of this task from to Needs Triage.
Hispalois updated the task description.
Hispalois subscribed.

> We would like to have an automated system to upload batches of JPG files to Commons using as description the text provided in a spreadsheet

Are you aware of anybody volunteering to work on this?

And could you elaborate what is specifically currently missing in https://www.mediawiki.org/wiki/Extension:GWToolset ?

> And could you elaborate what is specifically currently missing in https://www.mediawiki.org/wiki/Extension:GWToolset ?

The ability to read from spreadsheets (T60510 for a start).

Also...

> we would like the automated system to add a third column to the spreadsheet with the URL of the file in Commons.

this is obviously not going to happen via GWToolset.

I would pick pywikibot: it can upload files, it is well documented, many volunteers are already familiar with it, and it is trivial to integrate with Python libraries for manipulating CSV/Excel/whatever.

Note that Phabricator is probably not the best place to find a volunteer who will actually do this. Maybe the bot request page on Commons?
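
A minimal sketch of the pywikibot route suggested above, not a tested bot. It assumes the spreadsheet has been exported to a headerless CSV with two columns (file name, description), that the images sit in a local directory, and that pywikibot is already installed and configured with a Commons account. The function names, the upload summary and the CSV layout are hypothetical. The upload step is passed in as a parameter so the spreadsheet handling can be tried out without touching Commons.

```python
import csv
from pathlib import Path


def upload_with_pywikibot(image_path, description):
    """Upload one file to Commons and return the URL of its file page.

    Assumes pywikibot is installed and configured (user-config.py).
    """
    import pywikibot  # imported lazily so the CSV logic works without it

    site = pywikibot.Site("commons", "commons")
    page = pywikibot.FilePage(site, "File:" + Path(image_path).name)
    site.upload(
        page,
        source_filename=str(image_path),
        comment="Batch upload from spreadsheet",  # hypothetical summary
        text=description,
    )
    return page.full_url()


def upload_batch(csv_in, csv_out, image_dir, upload=upload_with_pywikibot):
    """Upload every file listed in csv_in and write csv_out with a URL column.

    Assumed input layout: column 1 = file name, column 2 = description,
    no header row. The output repeats both columns and adds the URL
    returned by the uploader as a third column.
    """
    with open(csv_in, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    results = []
    for row in rows:
        filename, description = row[0], row[1]
        url = upload(Path(image_dir) / filename, description)
        results.append([filename, description, url])

    with open(csv_out, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(results)
    return results
```

Calling `upload_batch("coins.csv", "coins_with_urls.csv", "images")` would then attempt the real uploads; passing a stub as `upload` gives a dry run.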

@Hispalois @Tgr Meanwhile, https://commons.wikimedia.org/wiki/User:Wmigda/Anuta_upload_script could help to process uploads from a spreadsheet. It reads a CSV file in a specific format, like the ones generated by https://commons.wikimedia.org/wiki/User:Nichalp/Upload_script.

It's pretty trivial to convert a spreadsheet to the format GWToolset likes. One could still use GWToolset with a little data manipulation.
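
The kind of data manipulation meant here might look like the sketch below. It assumes GWToolset takes an XML metadata file whose elements are then mapped to template fields, and that the spreadsheet has been exported to a headerless two-column CSV (file name, description). The element names `records`, `record`, `filename` and `description` are made up for illustration, and a real run would presumably also need an element holding each image's source URL.

```python
import csv
import xml.etree.ElementTree as ET


def csv_to_metadata_xml(csv_path, xml_path):
    """Write one <record> element per CSV row and return the root element.

    Assumed input layout: column 1 = file name, column 2 = description,
    no header row. Element names are illustrative only.
    """
    root = ET.Element("records")
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            record = ET.SubElement(root, "record")
            ET.SubElement(record, "filename").text = row[0]
            ET.SubElement(record, "description").text = row[1]

    # ElementTree escapes &, < and > in the text for us.
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return root
```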

Nemo_bis changed the task status from Open to Stalled. Jun 29 2015, 11:19 AM