I've been working with the Dia extension and uploading Dia files. When saving a file in Dia, there is a checkbox that asks whether you want to save the file compressed or not. If you do not save it compressed, Dia writes an XML file with xmlns:dia="http://www.lysator.liu.se/~alla/dia/". If you save it compressed, Dia writes the same XML file compressed with gzip. Both files receive the .dia extension, and Dia detects internally whether the file is compressed.
This is the output of "file" for an uncompressed and a compressed file:
$ file *.dia
test-unc.dia: XML document text
test-cmp.dia: gzip compressed data, from Unix
Recently MediaWiki added support for recognizing Dia files. As I understand, if the file is XML, it will parse the file and look for the namespace(?), and it will recognize it as a Dia file if it finds this URL: http://www.lysator.liu.se/~alla/dia/.
The problem is that it does not recognize compressed Dia files. First, it will (as expected) assign the file the MIME type application/x-gzip, and then in "verifyExtension()" it will not match ".dia" to application/x-gzip, and therefore reports that the file is corrupted.
I've been thinking about how to solve this problem. One way would be to recognize that the file is gzipped, then look inside it and, if the contents look like XML, apply the same logic that guesses the type from the XML namespace. However, that seems too complex and too much overhead for this task.
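For reference, the "look inside the gzip" approach could be sketched roughly as follows. This is a Python illustration only, not MediaWiki's PHP code: the function name looks_like_dia and the peek_bytes limit are hypothetical, and it uses a plain substring search for the namespace URL in the first few KB instead of real XML namespace parsing.

```python
import gzip
import io
import zlib

# Namespace URL that identifies a Dia document (taken from the XML root element).
DIA_NAMESPACE = b"http://www.lysator.liu.se/~alla/dia/"
# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_dia(data: bytes, peek_bytes: int = 4096) -> bool:
    """Guess whether data is a Dia file, plain or gzip-compressed.

    Only the first peek_bytes of (decompressed) content are inspected,
    which is an assumption about where the root element appears.
    """
    if data[:2] == GZIP_MAGIC:
        try:
            # Decompress just enough to see the start of the document.
            with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
                head = fh.read(peek_bytes)
        except (OSError, EOFError, zlib.error):
            # Not a valid gzip stream after all.
            return False
    else:
        head = data[:peek_bytes]
    return DIA_NAMESPACE in head
```

Even in this small form, the check has to decompress part of every gzip upload before it can decide anything, which is the overhead mentioned above.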
I still think that, in that particular case, the extension is the easiest way to reliably recognize a Dia file.
So I was thinking about patching MediaWiki to include a new table (like MM_WELL_KNOWN_MIME_TYPES or MM_WELL_KNOWN_MIME_INFO) with information on how to override a MIME type based on the extension of the file. The entry for Dia in this table would be something like this (possibly not exact PHP syntax, I'm not good at PHP):
array( 'extension' => '.dia', 'detected_mime' => array( 'application/x-dia-diagram', 'application/xml', 'application/x-gzip' ), 'override_mime' => 'application/x-dia-diagram' )
What this means is: if, on a file upload, MediaWiki detects that the extension is ".dia" (or more generally, that the extension is in this table), it will check that the detected MIME type of the contents matches one of the items in the array (in the case of Dia, either an XML file or a gzip-compressed file), and if so, it will override the MIME type to application/x-dia-diagram.
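The lookup described above could be sketched like this. Again it is a Python illustration, not MediaWiki code: the names MIME_OVERRIDES and resolve_mime are hypothetical, and the detected MIME type is assumed to come from the existing content-based detection.

```python
import os

# Hypothetical override table: extension -> accepted detected types and the
# MIME type to report instead. Only the Dia entry comes from the proposal.
MIME_OVERRIDES = {
    ".dia": {
        "detected_mime": {
            "application/x-dia-diagram",
            "application/xml",
            "application/x-gzip",
        },
        "override_mime": "application/x-dia-diagram",
    },
}


def resolve_mime(filename: str, detected_mime: str) -> str:
    """Return the override MIME type if the extension is in the table and the
    content-detected type is one of the accepted ones; otherwise return the
    detected type unchanged."""
    ext = os.path.splitext(filename)[1].lower()
    entry = MIME_OVERRIDES.get(ext)
    if entry is not None and detected_mime in entry["detected_mime"]:
        return entry["override_mime"]
    return detected_mime
```

With this table, resolve_mime("test-cmp.dia", "application/x-gzip") yields application/x-dia-diagram, while a ".dia" file whose contents are detected as something outside the list (say image/png) keeps its detected type and would still fail the extension check.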
Now, I know that MediaWiki has tried to move away from detecting the type of the content based on the extension, but I really do not know what else to do with Dia files. Of course I blame the problem on the Dia developers; after all, they should probably not use a bare gzipped file and should use something with a specific header instead. But we already have a legacy of many Dia files, and we will have to handle them one way or another...
So, do you think my idea for a way to solve this is OK? If you do think so, I will work on a patch to do it and submit it to this bug.