
Disallow uploading of uncompressed tiff files
Open, Needs Triage, Public

Description

or files that have been compressed at a very low compression level. This would protect us from the issues described in T427949, including but not limited to a lot of money spent on hardware.

An even better approach would be for us to compress files on the fly during upload, but I haven't looked at the code yet to make a decision on it.

Event Timeline

I was looking into how to implement this, and it looks like you would have to check something like a compression ratio.
One file you mentioned as being a problem, https://commons.wikimedia.org/wiki/File:M_3209762_sw_14_060_20200929.tif, is reported as compressed with Adobe Deflate in its Exif metadata. So the code couldn't just check whether the metadata says the file is compressed.

If we have to compress the file to see how much it compresses, we can simply save the compressed version then :D
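A minimal sketch of what such a ratio check could look like, assuming we recompress the uploaded bytes with zlib and compare sizes. The function names and the 0.5 threshold are hypothetical, not anything from the upload code, and a real check would have to be tuned against actual files:

```python
import zlib


def residual_ratio(data: bytes, level: int = 6) -> float:
    """Return recompressed_size / original_size for the given bytes.

    A value close to 1.0 means zlib could not shrink the data much further,
    so it is probably already well compressed. A small value means a lot of
    redundancy is still left in the file.
    """
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def looks_poorly_compressed(data: bytes, threshold: float = 0.5) -> bool:
    """Heuristic only: flag data that zlib shrinks below `threshold`.

    The threshold is an assumed placeholder, not a measured value.
    """
    return residual_ratio(data) < threshold
```

For example, `looks_poorly_compressed(open(path, "rb").read())` would flag a file that is mostly raw pixel data, regardless of what its metadata claims. For very large uploads it would probably make more sense to sample a few chunks instead of reading the whole file.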

Note: TIFF supports both lossy and lossless compression. I assume lossless compression is what is being discussed here.
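To tell the two apart, the declared scheme can be read from the TIFF Compression tag (tag 259) in the first IFD. A rough sketch, assuming a classic (non-BigTIFF) file; the function name is hypothetical and the table lists only a few common values. As noted above, the tag only says which scheme was declared, not how well it actually compressed:

```python
import struct

COMPRESSION_TAG = 259

# A few common values of the Compression tag; not an exhaustive list.
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW (lossless)",
    7: "JPEG (lossy)",
    8: "Adobe Deflate (lossless)",
    32773: "PackBits (lossless)",
}


def read_tiff_compression(data: bytes) -> int:
    """Return the Compression tag value from the first IFD of a classic TIFF."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF")
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF (bad byte-order mark)")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == COMPRESSION_TAG:
            # A single SHORT value is stored left-justified in the value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1  # the tag defaults to 1 (no compression) when absent
```

`COMPRESSION_NAMES.get(read_tiff_compression(data), "unknown")` then gives a readable label for the declared scheme.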

Why is the compression not done at the file system level? I think most archives and backup systems do not compress individual files but compress at the file system level. This would also enable deduplication, which is not possible when compressing single files.

I have looked at this a bit. TL;DR: it's not really possible, and on top of that it breaks the abstraction between the storage infrastructure and the applications relying on it. Also, filesystem compression is not as efficient as compression at the file format level.

If it is not possible to do this at the general file system level, some mechanism for server-side compression during upload and decompression during download would be better. Making this the uploader's responsibility is not a good solution. The main complaint from potential contributors is that contributing is too complicated.

There is one somewhat related topic when discussing storage limitations: there are many discussions about having more video content on Commons. This will require massive amounts of storage even though video is already heavily compressed.