Allow Commons to Accept Books in Zip format
Closed, Invalid · Public


It's common to upload a book as a single zip file, because a book scan consists of hundreds, if not thousands, of page images that belong together as one unit. Furthermore, the copyright status of all the images depends on a single image in the set. Commons should allow books to be uploaded as a zip file.

Event Timeline

Nope nope nope, land of 10,000 nopes.

We've had enough problems with people trying to slip malicious/copyvio ZIPs into other files, we don't need to let them upload ZIPs directly.

I understand your skepticism and hesitation, but I think that this can be mitigated through two steps.

  1. An automatic review of the zip contents to ensure that only images are present.
  2. Triaging the file into an admin-only section until it is cleared.
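
A minimal sketch of what step 1 could look like, assuming a Python check run on the uploaded archive. The allow-list `IMAGE_SIGNATURES` and the function names `sniff_image_type` and `non_image_members` are hypothetical and not part of MediaWiki or Commons. It inspects only the leading magic bytes of each member, so it would not catch polyglot files or verify that an image is well-formed; it only illustrates the "images only" rule.

```python
import io
import zipfile

# Hypothetical allow-list: magic-byte prefixes for a few common scan formats.
IMAGE_SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "tiff": (b"II*\x00", b"MM\x00*"),
}


def sniff_image_type(header):
    """Return the allowed format name matching the header bytes, or None."""
    for name, signatures in IMAGE_SIGNATURES.items():
        if header.startswith(signatures):
            return name
    return None


def non_image_members(zip_source):
    """Return the names of archive members that do not look like allowed images."""
    rejected = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to check
            with archive.open(info) as member:
                header = member.read(8)  # longest signature above is 8 bytes
            if sniff_image_type(header) is None:
                rejected.append(info.filename)
    return rejected
```

Under this sketch, an archive would move on to the admin-only triage in step 2 only if `non_image_members` returns an empty list.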

Uploading zip files matters because it would let us upload book scans directly to Commons for use in Wikisource. While most of the books in Wikisource come from IA, many also come from HathiTrust or other sources that provide books as individual image files. Currently, if we want to add such a book to Wikisource, we need to upload it to IA, wait several days, and then have it imported into Commons via IA. This request is just meant to simplify the steps of adding a book to Wikisource. If IA can figure out a system for dealing with scanned books in individual image files stored in a zip, which technical barriers are preventing Commons from doing the same?

If you have a ZIP file with images and don't know how to produce an appropriate multi-page document like DjVu or PDF, I recommend using the Internet Archive.

It's where most of the action is nowadays:

If IA can figure out a system for dealing with scanned books in individual image files stored in a zip, which technical barriers are preventing Commons from doing the same?

"If Ford can make a car to go from A to B, why can't Fincantieri build a ship to go from Portland to Las Vegas?" IA is a sort of open archive, Wikimedia Commons is a wiki. Different things. We don't necessarily need to do everything they're already doing fine enough.

You may want to comment on the various ongoing projects like ebook export and IA-upload:

This report is too vague so I'm closing it as invalid.

I'm sorry, how is this vague? I'm asking for an extremely specific task: the ability to upload a set of images representing the scan of one physical book stored in a zip container. If we don't want to handle zip files, then the existing rate limit of 380 uploads per 72 minutes would prevent the books from being opened.

@Nemo_bis Why did you close this task?

Aklapper changed the task status from Invalid to Declined. Mar 19 2021, 10:39 PM

The report is too vague because it talks about all sorts of different use cases and issues, for which it proposes one solution. So it ends up being a duplicate of half a dozen other existing tickets, or none of them. I could inundate you with dozens of previous reports but I think it's better that you follow the three links above and participate there first.

Also, if the feature were to be described as:

ability to upload a set of images representing the scan of one physical book stored in a zip container

this would already be solved: we currently store such ZIP files on the Internet Archive, and then proceed on Commons and Wikisource with what we really need (images for proofreading etc.).

P.S.: It's not "declined" in the sense that the things you're asking about are declined, although it would be "declined" if it were to be interpreted as a configuration request for Wikimedia Commons. Hence I prefer to consider it "Invalid". A better-formulated version of this report could be either declined or a duplicate of an existing valid feature request.

Aklapper changed the task status from Declined to Invalid. Mar 22 2021, 7:46 PM