Whitelist OASIS OpenDocument file format
Open, Public

Description

Author: leercontainer-bugzilla

Description:
Currently (as far as I'm aware) you can upload OpenOffice.org 1.x files, at
least with the extension ".sxw". OpenOffice.org 2.x uses the new OASIS file
format (see link).

The file upload whitelist should be extended to also include at least ".odt"
(writer), and possibly also ".odp", ".odg", ".odb".

OpenOffice-documents are useful for providing presentations and promotions.


Version: unspecified
Severity: enhancement
URL: https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=43151
https://bugzilla.wikimedia.org/show_bug.cgi?id=40504
https://bugzilla.wikimedia.org/show_bug.cgi?id=71954

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz2089.
bzimport created this task.Via LegacyMay 6 2005, 11:00 AM
bzimport added a comment.Via ConduitMay 7 2005, 1:49 PM

jeluf wrote:

The trouble with file formats is MSIE. It tries to autodetect the file format of
a file that it downloads. If it *looks* like HTML, MSIE will display it -
executing the JavaScript that is in the file.

To add a file format, there must be a way to check that the file really is using
that format.
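A rough sketch of the kind of check this implies: look at the start of an upload for byte patterns that a content-sniffing browser might take for HTML. The marker list, the 1024-byte window and the name `looks_like_html` are assumptions made for illustration; they are not MSIE's actual sniffing rules, nor anything MediaWiki is known to ship.

```python
# Hypothetical marker list: byte patterns that suggest HTML to a sniffer.
SUSPICIOUS_MARKERS = (
    b"<html", b"<head", b"<body", b"<script", b"<title", b"<a href", b"<img",
)


def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Return True if the first `window` bytes contain an HTML-like marker."""
    head = data[:window].lower()
    return any(marker in head for marker in SUSPICIOUS_MARKERS)
```

A file with a valid image header can still trip this check if markup follows the header, which is exactly the case described above.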

bzimport added a comment.Via ConduitSep 3 2005, 3:38 PM

iztok.jeras wrote:

I am using OpenOffice.org to create figures on Wikibooks and there are many
figures for a single book. I would like to upload .odg source files for those
images, so that contributors could modify them.

About the MSIE problem: OpenDocument files are compressed (they do not look like
HTML). Just unzip them, and if there are no errors the file should be OK. If you
are still concerned about the archive content, it can be scanned for viruses...
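The "just unzip them" test could look roughly like the following sketch, which uses Python's standard `zipfile` module. The function name `zip_is_intact` is made up for this example. Note that an intact ZIP is not necessarily an OpenDocument file, so this only covers the "no errors" part of the suggestion.

```python
import io
import zipfile


def zip_is_intact(data: bytes) -> bool:
    """Return True if `data` is a readable ZIP whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```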

Raymond added a comment.Via ConduitJun 30 2006, 8:05 PM

It would be great to enable OASIS-fileformats for Commons at least. Storing all
kind of documents that can be updated later would be a great benefit.

daniel added a comment.Via ConduitFeb 9 2007, 1:54 PM

I would suggest to *not* allow *any* styled text or presentation files on
commons. text on wmf projects should generally be wikitext. That being said,
having presentations and promotional material in those formats may make sense
for meta, wikimediafoundation.org, wikimania sites, etc.

just my 2p

Raymond added a comment.Via ConduitFeb 28 2007, 8:17 AM
  • Bug 9127 has been marked as a duplicate of this bug.
bzimport added a comment.Via ConduitFeb 28 2007, 8:23 AM

christof.hahn wrote:

I'm an author from Wikibooks Germany. Over the last two months I have started a
project for schoolbooks. The intent of this project is to develop materials for
teachers and pupils. What we need is support for the ODF format and a
place where we can upload raw material in any format, so that teachers can
donate their existing learning materials. In addition, we need the
possibility to upload 7-Zip archives to bundle learning material.

bzimport added a comment.Via ConduitAug 21 2007, 10:47 PM

jeroenvrp wrote:

I think it's a good idea to disallow it on Commons, but to enable those file formats by default and let the individual projects (e.g. the Wikipedias) decide for themselves whether they want to allow those formats.

Also don't forget the .ods-files (spreadsheets).

Jeroenvrp

bzimport added a comment.Via ConduitSep 3 2007, 1:52 AM

mail wrote:

In includes/mime.types the line

application/zip zip jar xpi sxc stc sxd std sxi sti sxm stm sxw stw

must be changed to

application/zip zip jar xpi sxc stc sxd std sxi sti sxm stm sxw stw odt ods odp odg odf

This really should be the default MediaWiki configuration. Not being able to upload the only standardised Office file format to the most common Wiki software is kind of strange...
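To make the effect of that one-line change concrete, here is a small sketch of how a line in this "type followed by extensions" layout maps extensions to a MIME type. The parser is a stand-in written for this illustration (`parse_mime_types_line` is a made-up name); it is not MediaWiki's own reader for includes/mime.types.

```python
def parse_mime_types_line(line: str) -> dict:
    """Map each extension on a 'type ext ext ...' line to its MIME type."""
    mime_type, *extensions = line.split()
    return {extension: mime_type for extension in extensions}


mapping = parse_mime_types_line(
    "application/zip zip jar xpi sxc stc sxd std sxi sti sxm stm sxw stw "
    "odt ods odp odg odf"
)
```

With the extended line, `mapping["odt"]` resolves to `application/zip`, the same as the older `sxw` entry.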

brion added a comment.Via ConduitMar 28 2008, 12:14 AM

Just a note -- the old StarOffice formats were disabled some time ago. OpenOffice (ODF) formats are enabled on our private/internal wikis, but not on the general wikis or in the MediaWiki default configuration.

An additional note is we have no current way to validate uploaded files as being ODF.

bzimport added a comment.Via ConduitMar 28 2008, 12:28 PM

rmh wrote:

(In reply to comment #9)

Just a note -- the old StarOffice formats were disabled some time ago.
OpenOffice (ODF) formats are [...]

Note that the division is not really StarOffice/OpenOffice. Both were using the old formats before, and use OpenDocument now (along with a lot of other apps, since that was the point of standardising it).

An additional note is we have no current way to validate uploaded files as
being ODF.

As long as the check is filename-based, this problem isn't introduced by adding "odt ods odp odg odf" to the list, since you can pass any kind of ZIP file as *.zip already.

brion added a comment.Via ConduitMar 28 2008, 7:24 PM

We don't allow .zip files. :)

Platonides added a comment.Via ConduitMar 28 2008, 10:38 PM

(In reply to comment #9)

An additional note is we have no current way to validate uploaded files as
being ODF.

That's not a reason for not enabling odf, we don't really validate many types (bug 10823)
<spam>I am independently validating commons uploads at #commons-image-uploads2 (it isn't
so hard)</spam> and by allowing odf (at big projects), it wouldn't be that hard,
just like pdfs: you need to manually review all of them and delete almost everyone.

brion added a comment.Via ConduitMar 28 2008, 11:02 PM

Currently allowed formats on Commons are:

png, gif, jpg, jpeg, xcf, pdf, mid, ogg, svg, djvu

I'm fairly certain we do at least magic-number signature validation on all of those now. PNG, GIF, and JPEG are run through a simple header sanity check. SVG is checked for XML well-formedness. DJVU is I believe checked for metadata validity, though I don't recall the details.
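For readers unfamiliar with the term, magic-number validation amounts to comparing the first bytes of a file with the signature its extension promises. The sketch below uses the well-known signatures for some of the formats listed; the table and the name `matches_signature` are illustrative assumptions, not MediaWiki's actual checks. SVG is left out because it is XML and has no fixed signature.

```python
# Leading byte signatures for a few of the formats mentioned above.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "jpg": (b"\xff\xd8\xff",),
    "jpeg": (b"\xff\xd8\xff",),
    "pdf": (b"%PDF-",),
    "ogg": (b"OggS",),
    "mid": (b"MThd",),
    "djvu": (b"AT&TFORM",),
}


def matches_signature(extension: str, data: bytes) -> bool:
    """Return True if `data` starts with a signature known for `extension`."""
    prefixes = SIGNATURES.get(extension.lower())
    if prefixes is None:
        return False  # unknown extension: nothing to compare against
    return data.startswith(prefixes)
```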

bzimport added a comment.Via ConduitMar 28 2008, 11:52 PM

reschke.michael wrote:

Well, at the German Wikiversity we would need OASIS-files to upload editable documents and presentations. OASIS-Files at Commons would make our work much easier.

bzimport added a comment.Via ConduitMar 29 2008, 10:33 AM

rmh wrote:

You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

$ odt2txt hello.odt

Hello

$ echo $?
0
$ zip test.zip hello.odt
updating: hello.odt (deflated 19%)
$ odt2txt test.zip
Can't read from test.zip: Is it an OpenDocument Text?
$ echo $?
1

It appears to work with the other types as well:

$ odt2txt hello.odp

Hello

This is a text

HTH
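A similar distinction can be drawn without an external tool. OpenDocument packages normally carry a member named `mimetype` as the first ZIP entry, holding the document's MIME type as plain text. The sketch below reads that member and compares it with the type expected for the extension; the function names are made up for this example, and passing this check says nothing about macros or extra files in the package.

```python
import io
import zipfile

ODF_MIME_TYPES = {
    "odt": "application/vnd.oasis.opendocument.text",
    "ods": "application/vnd.oasis.opendocument.spreadsheet",
    "odp": "application/vnd.oasis.opendocument.presentation",
    "odg": "application/vnd.oasis.opendocument.graphics",
}


def declared_odf_type(data: bytes):
    """Return the type declared by the leading `mimetype` member, or None."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            if not names or names[0] != "mimetype":
                return None
            return archive.read("mimetype").decode("ascii", "replace").strip()
    except zipfile.BadZipFile:
        return None


def extension_matches(extension: str, data: bytes) -> bool:
    """Return True if the package declares the type expected for `extension`."""
    expected = ODF_MIME_TYPES.get(extension.lower())
    return expected is not None and declared_odf_type(data) == expected
```

As with the odt2txt session above, an .odt wrapped inside another ZIP fails the check, because the outer archive has no leading `mimetype` member.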

bzimport added a comment.Via ConduitMar 29 2008, 10:54 AM

michael.frey wrote:

(In reply to comment #15)

You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

That only verifies that there is text; it doesn't warn about macros or about other files that are included in the archive but unrelated to the ODT file.

(Otherwise someone could have the genius idea to upload an ODT file that contains a macro virus or pictures with forbidden content, and use the WMF servers to share them. Users who know about the hidden content can simply rename and extract the file to get at it; other users don't see it and think it's a normal text, but still get the pictures with forbidden content.)

bzimport added a comment.Via ConduitMar 29 2008, 11:30 AM

rmh wrote:

(In reply to comment #16)

Or someone could use a program featuring steganography techniques (http://en.wikipedia.org/wiki/Steganography#Implementations) to embed forbidden content in a PNG.

As for the macro virus, proper sandboxing is expected to be present. If it isn't, that's an implementation bug.

bzimport added a comment.Via ConduitMay 2 2008, 1:37 PM

ingo.thies wrote:

(In reply to comment #4)

I would suggest to *not* allow *any* styled text or presentation files on
commons. text on wmf projects should generally be wikitext. That being said,
having presentations and promotional material in those formats may make sense
for meta, wikimediafoundation.org, wikimania sites, etc.

Please keep in mind that OpenOffice.org file types also include spreadsheets (*.ods) that can be used not only for presentation but also as an interactive calculation tool. The author defines a "user area" within a sheet where the user can enter parameters, based on which calculations on a scientific topic are done. For example, you can write a sheet that calculates, tabulates and/or plots the pressure, temperature and density of the atmosphere at a user-defined altitude in the standard atmosphere, or orbital parameters of satellites, or any other kind of scientific or technical stuff. In contrast to most other file types (and as far as I know all file types currently allowed on Commons) spreadsheets can be used *interactively*, which can be a great improvement for many science-related Wikipedia articles. Furthermore, I do not really see a reason for *not* allowing any styled content. The existence of wikitext IMHO does not imply that all other text formats are invalid. Please also remember that ODF is now an ISO standard.

Therefore I would strongly suggest to allow Open Document Format in general, but at least Open Document Sheets (*.ods).

bzimport added a comment.Via ConduitMay 2 2008, 6:38 PM

rmh wrote:

(In reply to comment #18)

Please keep in mind that OpenOffice.org file types [...]

Please, try to avoid confusing ODF with OpenOffice.org. There are many applications supporting ODF independently, and OpenOffice.org is just one of them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

[...]. Please also
remember that ODF is now an ISO standard.

which unfortunately doesn't mean much anymore. Even OOXML which not even Microsoft themselves (http://www.fanaticattack.com/2008/ooxml-questions-microsoft-cannot-answer-in-geneva.html#comment-220) have implemented can get its own ISO stamp.

IMHO, what's important is that any vendor can implement ODF, and the wide availability of ODF support in applications:

http://en.wikipedia.org/wiki/OpenDocument_software#Current_support

bzimport added a comment.Via ConduitMay 29 2008, 1:20 PM

ingo.thies wrote:

(In reply to comment #19)

Please, try to avoid confusing ODF with OpenOffice.org. There are many
applications supporting ODF independently, and OpenOffice.org is just one of
them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

You are right, I sometimes mix them up, because I am using ODF mainly via OpenOffice.org.

IMHO, what's important is that any vendor can implement ODF, and the wide
availability of ODF support in applications:

http://en.wikipedia.org/wiki/OpenDocument_software#Current_support

That's fully true. But as mentioned above, the major benefit for Wikipedia (where formatted content seems to be frowned upon unless the format is Wikitext and Wikitable etc.) would be the ability of interactive use at least for Open Document Spreadsheets (ODS). Allowing the upload of self-written source codes in common programming languages would also have the effect of allowing interactivity, but ODS allows interactivity in a very transparent and easy-to-use way. The following example (a zipped Excel spreadsheet, however) might explain what I mean:

http://nuclearweaponarchive.org/Library/Nukexls.zip

Such sheets, also including graphs, could be used for an interactive illustration of many physical and technical topics (and not only those) without forcing users to type in the formulas themselves.

bzimport added a comment.Via ConduitMay 29 2008, 9:48 PM

cormaggio wrote:

As mentioned above, having greater possibility for interactivity in files would greatly benefit Wikiversity. Particularly for presentations, but also for image files, spreadsheets (data), and others. On opposition to this proposal, are there fears around certain formats on certain sites? If so, perhaps projects could draw up a list of filetypes which would be useful, and provide a rationale for them to be (selectively) whitelisted.

bzimport added a comment.Via ConduitJun 1 2008, 3:33 PM

robert wrote:

(In reply to comment #21)

As mentioned above, having greater possibility for interactivity in files would
greatly benefit Wikiversity. Particularly for presentations, but also for image
files, spreadsheets (data), and others. On opposition to this proposal, are
there fears around certain formats on certain sites? If so, perhaps projects
could draw up a list of filetypes which would be useful, and provide a
rationale for them to be (selectively) whitelisted.

The main issue is that OASIS files can contain malicious content. Letting these be uploaded without validation would be undesirable, and as of yet (AFAIK) there is no OASIS validation interface for MediaWiki.

Platonides added a comment.Via ConduitJun 1 2008, 9:45 PM

Can you elaborate on what malicious content you are referring to? Zip files being uploaded as ODF? Documents with embedded macros?

bzimport added a comment.Via ConduitJun 1 2008, 9:47 PM

robert wrote:

(In reply to comment #23)

Can you elaborate on what malicious content you are referring to? Zip files being
uploaded as ODF? Documents with embedded macros?

Macros are the main issue, they are XML so running it through a basic XML parser would eliminate any Zip issue.

bzimport added a comment.Via ConduitJun 1 2008, 9:49 PM

robert wrote:

(In reply to comment #24)

Macros are the main issue, they are XML so running it through a basic XML
parser would eliminate any Zip issue.

Ignore the zip bit, they can be compressed -- as I have just found out.

Platonides added a comment.Via ConduitJun 1 2008, 10:13 PM

It seems macros live in <script> elements (<text:script>, <office:script>...), so detecting them doesn't look too hard.
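A minimal sketch of that idea: parse one XML member of the package and report every element whose local name is `script`, whatever its namespace prefix. The default member name `content.xml` and the function name `script_elements` are assumptions for illustration, and this does not claim to cover every place a macro can be stored. Python's standard XML parser is also not hardened against maliciously constructed XML, so a real validator would need a safer parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def script_elements(data: bytes, member: str = "content.xml") -> list:
    """Return the qualified tags of all elements whose local name is 'script'."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read(member))
    found = []
    for element in root.iter():
        # ElementTree spells namespaced tags as '{namespace-uri}local-name'.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "script":
            found.append(element.tag)
    return found
```

An empty result means no such element was found in that member; other members would have to be scanned the same way.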

bzimport added a comment.Via ConduitNov 6 2008, 5:16 PM

mandavi wrote:

Sun published the ODF Validator. It "is a tool that validates OpenDocument files and checks them for certain conformance criteria." That sounds like the tool we need.

bzimport added a comment.Via ConduitNov 6 2008, 5:34 PM

rmh wrote:

Unfortunately, with ISO's downfall in the IT sector, being an ISO standard is becoming less and less meaningful. I'm removing the "ISO" bits from the bug title (which IIRC I added myself a while ago).

bzimport added a comment.Via ConduitNov 19 2008, 1:12 PM

lars wrote:

In a posting to wikitech-l, Brion Vibber elaborated on what's needed in an ODF validator,
http://lists.wikimedia.org/pipermail/wikitech-l/2008-November/040246.html

Brion said:

we have a basic file type check to confirm
that the file really thinks its an ODF of the appropriate extension, but
not yet checks to confirm there's not evil Java classes also sitting in
the ZIP etc.) [...]

There's an optional zip extension for PHP which should include support
for listing out the ZIP file directory; however since this isn't
included in PHP by default it might be nice to be able to read the
directory independently without the extension for general MediaWiki
installs. (It shouldn't be necessary to actually decompress anything for
our purposes here -- we're mainly looking for subfiles not expected in
an ODF, particularly Java classes that could be used for a session attack.)
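The directory listing described here can be sketched as follows. Python's standard `zipfile` module stands in for the PHP reader being discussed, and the suffix list and the name `suspicious_members` are assumptions for illustration only. Listing names reads the ZIP central directory and does not decompress any member.

```python
import io
import zipfile

# Hypothetical block list: member names that have no business in an ODF package.
SUSPICIOUS_SUFFIXES = (".class", ".jar")


def suspicious_members(data: bytes) -> list:
    """Return member names that end in one of the suspicious suffixes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # namelist() is built from the central directory; nothing is extracted.
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(SUSPICIOUS_SUFFIXES)
        ]
```

A stricter variant would compare the listing against the members an ODF package is expected to contain instead of matching suffixes.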

Sj added a comment.Via ConduitAug 9 2011, 6:03 PM

Yes, please.

Asking people to use a secondary file-hosting system for materials they are uploading for use with Wikiversity or Wikibooks projects is embarrassing, and gets more so every year.

People who are trying to use Commons (for classes or other collaborative-knowledge projects) commonly work with these standard formats; asking them to convert things to/from PDF is quite difficult considering the scarcity of freely-licensed PDF-editing tools.

Bawolff added a comment.Via ConduitNov 19 2012, 9:27 PM

This bug may have been fixed in the meantime (in particular, I had the impression that Tim did work on allowing zip-based formats to be uploaded safely). Tagging testme. [Note comment 31: fixing this bug and enabling it on Wikimedia are two different things.]

Then again bug 35607 appears to suggest our support for open doc is broken (?).

Krenair added a comment.Via ConduitApr 7 2013, 12:23 AM
  • Bug 46977 has been marked as a duplicate of this bug.
Aschmidt added a comment.Via ConduitApr 7 2013, 12:43 AM

Thanks for including my request for enabling ODF upload for German Wikiversity.

Could you please indicate whether there is any chance of having ODF upload implemented in the near future?

I would like to hand on the message to the German Wikiversity community ASAP.

Thx!

Nemo_bis added a comment.Via ConduitApr 7 2013, 1:07 AM

(In reply to comment #34)

Could you please indicate whether there is any chance of having ODF upload
implemented in the near future?

No. (Although I'd like to say the contrary.)

JeanFred added a comment.Via ConduitAug 28 2013, 9:55 AM

See the links from http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html

And the discussion at https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice

It was stated that, with the resolution of bug 24230, « Uploads of ZIP types, such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader was added which can scan a ZIP file for potentially dangerous Java applets. This allows applets to be blocked specifically, rather than all ZIP files being blocked. »

I have asked the question in several places and answers are both unclear and sometimes contradictory. Some have pointed out that concerns lie still with:

  • Potential embedded macros
  • Validation that it is actually ODF

Are these concerns valid? If not, what is missing to allow ODF upload on projects?

Nemo_bis added a comment.Via ConduitAug 28 2013, 10:46 AM

If nobody comes up with concrete concerns, would it be a valid proposal to just try and see how it goes, fixing problems as they come up?

PDF files are not exempt from problems either, we often have some with viruses; but most of them are deleted quickly and the others we found thanks to your.org running antivirus software on their copy.

Bawolff added a comment.Via ConduitAug 28 2013, 4:19 PM

(In reply to comment #36)

See the links from
http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html

And the discussion at
<https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice>

It was stated that, with the resolution of bug 24230, « Uploads of ZIP types,
such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader
was added which can scan a ZIP file for potentially dangerous Java applets.
This allows applets to be blocked specifically, rather than all ZIP files
being blocked. »

I have asked the question in several places and answers are both unclear and
sometimes contradictory. Some have pointed out that concerns lie still with:

  • Potential embedded macros
  • Validation that it is actually ODF

Are these concerns valid? If not, what is missing to allow ODF upload on projects?

The zip reader prevents someone from uploading an ODF file that's really a Java archive, which was a pretty big security vulnerability. (It would also prevent those hacks where people make combined ODF/PDF files.)

It does not prevent embedded macros, nor does it validate that the file is an ODF file beyond some very superficial checks. It would prevent someone from accidentally uploading another format; it would not prevent someone from intentionally uploading a non-ODF format that they've tweaked to look slightly like an ODF file.

Whether or not this is an acceptable situation is probably a matter that's up for debate. (I consider the macro virus possibility a little scary; Platonides' suggestion in comment 26 may be something we should look into.) I've cc'd Chris Steipp, as he probably has some thoughts on this, and would probably have the final word on whether ODF upload is acceptable.

csteipp added a comment.Via ConduitAug 29 2013, 12:11 AM

The major threats I'm most concerned with are these attachments opening up an XSS by causing the browser to think it's HTML, a Java applet, SWF, etc.

So if it correctly unzips to something that validates as an ODF, and the binary is checked to make sure sniffing won't think it's HTML, or another MIME type, then we can probably enable this. Bawolff, could you confirm that's what it does?

The macro / embedded virus threat is definitely a danger to our users, but we currently do not scan incoming binaries (as Nemo pointed out, we have plenty of pdfs with hostile code already).

Bawolff added a comment.Via ConduitAug 29 2013, 3:46 AM

(In reply to comment #39)

The major threats I'm most concerned with are these attachments opening up
an XSS by causing the browser to think it's HTML, a Java applet, SWF, etc.

So if it correctly unzips to something that validates as an ODF, and the
binary is checked to make sure sniffing won't think it's HTML, or another
MIME type, then we can probably enable this. Bawolff, could you confirm
that's what it does?

I believe that is correct.

Nemo_bis added a comment.Via ConduitSep 10 2013, 7:32 AM

OSM added odp a minute ago: https://trac.openstreetmap.org/ticket/3323#comment:2
So, there are no blockers left here AFAICS, but we also have a guinea pig.

JeanFred added a comment.Via ConduitOct 21 2013, 5:30 PM

So per Chris's and Brian's comments, there are no security concerns blocking this?

If so, the "editorial" question is left on whether we want this on Wikimedia Commons or on Meta. I am more than willing to open a discussion over there, if that’s the last thing missing.

Steinsplitter added a comment.Via ConduitDec 5 2013, 3:30 PM

Community consensus?

Qgil added a comment.Via ConduitDec 5 2013, 5:01 PM

At a MediaWiki community level this looks like a consensus, yes. This report is at a point where no technical/legal obstacles are left.

Now, about deployment in Wikimedia... This is a Commons discussion. That is the right place to decide whether OpenDocument files belong in their domain (just like PDFs) or not. If they agree, then the related file extensions can be enabled there. If they disagree... we can meet here again to discuss the next step.

Does this make sense? If so, can someone familiar with the Commons project share the news there, please?

Nemo_bis added a comment.Via ConduitDec 5 2013, 5:19 PM

(In reply to comment #42)

I am more than willing to open a discussion over there, if that’s the last
thing missing.

Jean-Fred, per Quim it is, so please go ahead, yes.

JeanFred added a comment.Via ConduitDec 5 2013, 5:23 PM

(In reply to comment #45)

(In reply to comment #42)
> I am more than willing to open a discussion over there, if that’s the last
> thing missing.

Jean-Fred, per Quim it is, so please go ahead, yes.

Good. I’ll open a community consultation soonish.

Micru added a comment.Via ConduitMay 19 2014, 7:46 AM
Nemo_bis added a comment.Via ConduitMay 26 2014, 7:43 AM

What's concretely the configuration setting needed here?

(In reply to Jean-Fred from comment #46)

Good. I’ll open a community consultation soonish.

Ping.

(In reply to dacuetu from comment #47)

What was the result of this?

Consensus for OpenDocument has always been taken for granted: in the innumerable discussions about it I don't recall ever finding an opposer. There are for instance a dozen supporters just in https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1 (several sections; nobody bothered to +1 what was obvious).

Nemo_bis added a comment.Via ConduitMay 26 2014, 7:45 AM

Also, https://commons.wikimedia.org/wiki/Commons:Project_scope/Allowable_file_types is a policy and states that the formats are allowed by policy, just blocked for technical reasons: «SXW, SWC, SXD, and SXI (OpenOffice.org 1.x), as well as ODT, ODS, ODG, and ODP (OpenDocument) are theoretically permissible.» Marking this shell; more discussion is always possible but not necessary.

JeanFred added a comment.Via ConduitJul 6 2014, 2:09 PM

(In reply to Nemo from comment #48)

What's concretely the configuration setting needed here?

(In reply to Jean-Fred from comment #46)
> Good. I’ll open a community consultation soonish.

Ping.

Thanks for the ping, I completely forgot about this. This is now opened at https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Support_for_OpenDocument_file_format_upload

(In reply to dacuetu from comment #47)
> What was the result of this?

Consensus for OpenDocument has always been taken for granted: in the
innumerable discussions about it I don't recall ever finding an opposer.
There are for instance a dozen supporters just in
https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1
(several sections; nobody bothered to +1 what was obvious).

Hmmmm, indeed. Well, let's make it super clear :)

jayvdb added a comment.Via ConduitOct 11 2014, 2:34 PM

So the result was 'interested, but no consensus' due to the need to have media preview for this format and concerns that these uploaded documents may contain

  • macros/scripts, which may be malicious
  • embedded typefaces, which may be non-free

https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2014/07#Support_for_OpenDocument_file_format_upload

In that discussion, PDF with OpenDocument embedded was raised as a bug and possible way forward, as we already have PDF preview support, so I have created bug 71954 for that.

We will also need bugs for detecting macros/scripts and embedded typefaces, and bug 17497 probably needs to be solved first.

Nemo_bis awarded a token.Via WebDec 12 2014, 8:22 AM
El_Grafo added a subscriber: El_Grafo.Via WebJan 16 2015, 10:26 AM
