
SVG Translate shows misleading "Only SVG files are supported" error on SVG files without XML and DOCTYPE declarations
Open, Needs Triage, Public, BUG REPORT

Description

Steps to Reproduce:
https://tools.wmflabs.org/svgtranslate/File:Diagram_human_cell_nucleus_ru.svg

Actual Results:

Expected Results:
Change the error message: the file is an SVG file, so "Only SVG files are supported." is misleading.

An error message such as the one at https://validator.w3.org/check?uri=https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FSpecial%3AFilepath%2FDiagram_human_cell_nucleus_ru.svg&doctype=Inline&ss=1#source might be easier to understand.

It might be related to the missing declarations:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.1//EN' 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'>

Event Timeline

Aklapper renamed this task from SVG Translate: Only SVG files are supported. on SVG-file to SVG Translate shows misleading "Only SVG files are supported" error on (invalid) SVG file. May 3 2020, 4:20 PM
Aklapper updated the task description.

@Aklapper I'm not sure the file is invalid: if I paste the source code into https://validator.w3.org/#validate_by_input there are only warnings, but no errors (i.e. it is valid).

I don't know why the file is recognized as text/plain and not as image/svg+xml.

@Glrx You are the expert on SVG standardization and XML validation. Can you comment on that?

Aklapper renamed this task from SVG Translate shows misleading "Only SVG files are supported" error on (invalid) SVG file to SVG Translate shows misleading "Only SVG files are supported" error on SVG files without XML and DOCTYE declarations. May 3 2020, 4:53 PM

@JoKalliauer: You are right, thanks for the correction!

Aklapper renamed this task from SVG Translate shows misleading "Only SVG files are supported" error on SVG files without XML and DOCTYE declarations to SVG Translate shows misleading "Only SVG files are supported" error on SVG files without XML and DOCTYPE declarations. May 3 2020, 4:53 PM
Glrx added a comment. May 3 2020, 6:36 PM

I think this falls out from the inner workings of Commons. Commons tells the world the file is plain text, so why should SVG Translate believe otherwise?

The file description on Commons is
https://commons.wikimedia.org/wiki/File:Diagram_human_cell_nucleus_ru.svg

If you access the actual SVG file from a browser, it displays as text:
https://upload.wikimedia.org/wikipedia/commons/7/7c/Diagram_human_cell_nucleus_ru.svg

If you look at the response header from Commons, it has
content-type: text/plain
That's why the browser displays text.

It may also be the reason that SVG Translate says it is not an SVG file. SVG Translate asks Commons for the file, Commons responds with the file saying it is just plain text, and SVG Translate believes Commons. Alternatively, SVG Translate may ask for Accept: image/svg+xml and get a negative response.

IIRC, there was some comment that Commons will serve an XML file as plain text unless the file has an XML processing instruction. So the file would need
<?xml version="1.0" encoding="UTF-8"?>
but it would not need the DOCTYPE.

I'd lay the bug on Commons. If Commons displays the image as an SVG file, then it should serve that file as content-type: image/svg+xml. There may be a security issue here.

As far as the SVG Translate error message goes, it should add that the file might be an SVG file that is missing its XML processing instruction. That would be better for the user. I do not think SVG Translate needs to investigate files that are text/plain to see if they are actually SVG files. That may be a security issue.
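
A more helpful message along those lines could be produced roughly as follows. This is only a sketch: the function names `media_type` and `describe_rejection`, the wording of the extra hint, and the `.svg` file-name check are assumptions made for illustration and are not taken from SVG Translate's code.

```python
from typing import Optional


def media_type(content_type_header: str) -> str:
    """Return the bare media type from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    return content_type_header.split(";", 1)[0].strip().lower()


def describe_rejection(filename: str, content_type_header: str) -> Optional[str]:
    """Return an error message, or None if the file is served as SVG.

    Hypothetical helper: the message text is illustrative only.
    """
    served_as = media_type(content_type_header)
    if served_as == "image/svg+xml":
        return None
    message = "Only SVG files are supported."
    # If the name says SVG but the server disagrees, hint at the likely cause.
    if filename.lower().endswith(".svg"):
        message += (
            f" The file was served as {served_as}; it may be an SVG file"
            ' that is missing its <?xml version="1.0" encoding="UTF-8"?> line.'
        )
    return message
```

For the file in this task, `describe_rejection("Diagram_human_cell_nucleus_ru.svg", "text/plain")` would return the usual message followed by the hint, while a file served as image/svg+xml returns `None`.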

Yep, it seems that it is Commons doing something inconsistent. The tool looks at the Content-Type header of the response for the SVG file, and fails if it's not image/svg+xml (which happens, as you say, when <?xml version="1.0" encoding="UTF-8"?> is missing).

I guess the difference is that MediaWiki does extra checks to see if a file is SVG, but whatever is serving the actual file is not.

Maybe we could fall back to looking for <svg if the Content-Type check fails? Or should these files actually be fixed on Commons?
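
For reference, the fallback idea could be sketched like this. The function name `looks_like_svg`, the 4096-byte sniffing window, and the decision to sniff only text/plain responses are all assumptions for illustration; this is not the tool's actual code. The check is a plain byte search, so it never parses the untrusted file, but it is also crude: it would miss a namespace-prefixed root such as `<svg:svg` and could be fooled by `<svg` appearing inside a comment.

```python
import re

# An opening <svg tag followed by whitespace, ">" or "/".
_SVG_TAG = re.compile(rb"<svg[\s>/]")


def looks_like_svg(content_type_header: str, body: bytes,
                   sniff_limit: int = 4096) -> bool:
    """Accept image/svg+xml, and sniff text/plain bodies for an <svg tag."""
    served_as = content_type_header.split(";", 1)[0].strip().lower()
    if served_as == "image/svg+xml":
        return True
    if served_as != "text/plain":
        # Only the text/plain case from this task gets a second look.
        return False
    # Look only at the start of the body; the root element should be near it.
    return _SVG_TAG.search(body[:sniff_limit]) is not None
```

Whether sniffing like this is acceptable at all is the open question, given the concern above that inspecting text/plain files may be a security issue.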

Glrx added a comment. May 4 2020, 3:11 AM

I vote for fixing the files on Commons. Every XML file should start with an XML processing instruction.

https://www.w3.org/TR/xml/#sec-prolog-dtd states

"Definition: XML documents should begin with an XML declaration which specifies the version of XML being used."

If a file does not start with the XML processing instruction, then it is not XML (and therefore not SVG). I'm OK with being narrow-minded and prescriptive here.

SVG Translate could also refuse to process any encoding that does not say it is UTF-8. Or does SVG Translate's XML parser handle other encodings? The SVG Translate output will have to be UTF-8 for Arabic, Chinese, and many other languages. I think librsvg believes everything is UTF-8, so everything on Commons must be UTF-8 compatible.
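
A check for the declared encoding might look something like the sketch below. The names `declared_encoding` and `is_utf8_compatible` are hypothetical, and two things are assumptions rather than anything SVG Translate is known to do: a file with no encoding declaration is treated as UTF-8, and US-ASCII is accepted because it is a subset of UTF-8.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Captures the encoding name inside a leading XML declaration, if present.
_ENCODING = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the lower-cased encoding named in the XML declaration, or None."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = _ENCODING.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def is_utf8_compatible(data: bytes) -> bool:
    """True if the file declares UTF-8 (or nothing) and decodes as UTF-8."""
    encoding = declared_encoding(data)
    # Assumption: no declared encoding means UTF-8.
    if encoding not in (None, "utf-8", "us-ascii"):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this sketch, a file declaring `encoding="ISO-8859-1"` would be refused, as would a file whose bytes are not valid UTF-8 regardless of what it declares.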

A Commons bot could find and fix SVG files without the appropriate prolog.
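
The fixing step of such a bot could be as small as the sketch below. It covers only the byte-level repair; finding the affected files and re-uploading them to Commons is left out. The function name `ensure_xml_declaration` is hypothetical, and the code assumes the file is already UTF-8 encoded, since it always writes `encoding="UTF-8"`.

```python
import re

XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'
UTF8_BOM = b"\xef\xbb\xbf"


def ensure_xml_declaration(data: bytes) -> bytes:
    """Prepend an XML declaration if the file does not already start with one.

    Assumption: the input is UTF-8 encoded.
    """
    bom = b""
    if data.startswith(UTF8_BOM):
        # Keep the byte order mark in front of the declaration.
        bom, data = UTF8_BOM, data[len(UTF8_BOM):]
    # The declaration must come first, so drop any leading whitespace.
    body = data.lstrip()
    # "<?xml" followed by whitespace is a declaration; "<?xml-stylesheet" is not.
    if re.match(rb"<\?xml\s", body):
        return bom + body
    return bom + XML_DECLARATION + body
```

Running the function on a file that already has a declaration returns it unchanged, so a bot could apply it repeatedly without stacking declarations.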