Can't upload PDF / ODF Hybrid
Open, Normal, Public


Author: fun-stuff

The new LibreOffice supports exporting PDFs in a hybrid ODF / PDF file format. When trying to upload such a file MediaWiki reports that the ZIP file is ambiguous or has been damaged. (I have a German installation so I can't tell you the exact error message.)

I already took ZIP files out of the MediaWiki Blacklist and added the file extensions PDF, ODT and ZIP.

I think includes/ZIPDirectoryReader.php checks the file and throws the error because it doesn't know the new file format yet.

The new PDF / ODF hybrid format makes it easy for everyone to open documents while keeping them editable, which might also be a great thing for Wikipedia. Therefore, this is a major bug for me.
Please fix this and thanks for the great software.


Version: 1.18.x
Severity: normal

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz28188.
bzimport created this task. Mar 22 2011, 3:55 PM

Can you attach a sample file or provide a link to it?

fun-stuff wrote:

Test PDF/ODF (ODT) Hybrid Document

Can be edited with LibreOffice 3.3 Writer and viewed with any PDF viewer. But it cannot be uploaded in MediaWiki 1.18alpha.

Attached: test_pdf_odf_hybrid.pdf

The cause is "ZipDirectoryReader: Fatal error: trailing bytes after the end of the file comment".

In simple words, we expect zip files to be... zip files and not contain something scary. We need to hack our detector to handle zips embedded in something known.
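To illustrate the kind of check that produces this error, here is a minimal Python sketch. It is not MediaWiki's ZipDirectoryReader code; the function names `trailing_bytes_after_zip` and `make_zip` are hypothetical, and the "hybrid-like" sample is just a ZIP wrapped in PDF-looking bytes, not a real LibreOffice export. It locates the ZIP end-of-central-directory record, reads the comment length, and counts any bytes left over after the comment.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record marker
EOCD_FIXED_SIZE = 22            # record size without the trailing comment


def trailing_bytes_after_zip(data: bytes) -> int:
    """Return how many bytes follow the ZIP comment, or -1 if no record is found."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos == -1 or pos + EOCD_FIXED_SIZE > len(data):
        return -1
    # The last two bytes of the fixed-size record hold the comment length.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    return len(data) - (pos + EOCD_FIXED_SIZE + comment_len)


def make_zip() -> bytes:
    """Build a tiny in-memory ZIP archive to test against (hypothetical sample)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc/>")
    return buf.getvalue()


if __name__ == "__main__":
    plain = make_zip()
    # Assumption: a ZIP surrounded by PDF-looking bytes stands in for a hybrid file.
    hybrid_like = b"%PDF-1.4\n" + plain + b"\n%%EOF\n"
    print(trailing_bytes_after_zip(plain))
    print(trailing_bytes_after_zip(hybrid_like))
```

A strict reader would reject any file for which this count is non-zero, which is why a ZIP embedded inside another container trips the check.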

fun-stuff wrote:

Thanks for the clarification and for taking care of the problem so quickly. I hope you can fix this bug in the near future.

From comments in triage:

"Workaround: 'don't save your PDF that way'. (Problem with workaround: if someone else made the file, you might not know how to re-save it.)"

So, we thought about dealing with it: "This presents same security threats as a PDF file.... need to check security model, probable threats."

"Our security checks are working as intended by detecting that the files have been smashed together unexpectedly. Might be possible to tweak it to consider 'oh that's ok' but not sure how much we want to. If not careful might accidentally allow all sorts of evil appended to a PDF file."

fun-stuff wrote:

Thanks for the comments.

I can imagine that deciding whether this is an 'OK' PDF file saved as hybrid ODF or not is difficult to code. However, I think it would be a great loss if this wasn't implemented as this format is so versatile.

dovijacobs wrote:

Hi, I asked about this problem here (and was referred to this bug):

The embedded PDF is an extremely useful file format, and one of the best features in the open source LibreOffice project. It is becoming extremely popular and is already being used in hundreds of millions of files around the world.

Therefore, I'd like to reiterate the comment before mine, which was made nearly two years ago: "I think it would be a great loss if this wasn't implemented as this format is so versatile."

If that was true two years ago, it is far more true today. I hope it can be made a basic part of PDF support in Wikimedia projects.

dovijacobs wrote:

In the meantime I've been uploading classic texts and educational materials at Internet Archive instead of at the Commons:

This is extremely inconvenient for proper use at Wikimedia projects. I hope this will be taken care of eventually.

Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. Sep 4 2015, 6:33 PM
Restricted Application added subscribers: Steinsplitter, Matanya, Aklapper. Sep 4 2015, 6:33 PM
