
RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
Open, Low, Public

Description

I don't know the exact history, but at some point Wikimedia wikis added the ability to support inline SVGs by passing them through librsvg, which takes the SVG code and generates PNGs, as I vaguely understand it.

There are some notes here: https://meta.wikimedia.org/wiki/SVG_image_support.

I can't find any information about which version of librsvg Wikimedia is currently using, but the choice of using librsvg should be re-evaluated, given its rendering issues (cf. other bugs in this bug tracker) and the existence of perhaps better alternatives.

See Also:
T53555: librsvg seems unmaintained
T120746: Improve SVG rendering
T10901: [DO NOT USE] SVG rasterisation and management on Wikimedia sites (tracking)

Details

Reference
bz38010


Event Timeline

Glrx added a comment. Jul 18 2018, 8:28 PM

Thanks for developing the code, and thanks for your comments and advice.

I used the support table

https://razrfalcon.github.io/resvg-test-suite/svg-support-table.html

and it says things like letter-spacing, font-variant, font lists, and overflow are not supported. It has overflow supported for some particular items, but there's a red mark for overflow in general. I'm very happy you provided the support list, but my inspection of it was cursory. I'd expect the table entries to improve over time, and yes, if you use a library with issues, then you inherit those issues and hope the library gets fixed.

We would like to see SVG rendering improve, and resvg is a tantalizing alternative.

Librsvg has problems, but we are living with them. It would be nice to have better rendering, but I don't know if the servers have the margin for a slower but better renderer. Batik takes 50% longer; Inkscape 75% longer, and ImageMagick 100% longer according to some 2009 tests. I don't know how much those numbers hurt. I'd like to see vertical (tb) text work, but is that worth a 50% hit? My guess is the answer is no; if it had been worth it, then we would have switched a long time ago. Resvg may offer improvements without the hit.

Yes, the overflow attribute is marked as unsupported, but only because it's too generic. resvg simply does not yet support overflow on every element that can have it.

@Patrick87: Ah, I did not know that. Thank you for correcting my wrong statement.

tstarling moved this task from P1: Define to Old on the TechCom-RFC board. Jul 18 2018, 8:53 PM
daniel added a subscriber: daniel. Edited Jul 21 2018, 10:22 PM

To quote what @cscott said four years ago on this ticket:

Security is also a consideration. The solution must be able to be sandboxed and have its http fetch neutered. rsvg is designed for embedded solutions and has good controls for this. It would be much harder to sanitize/sandbox some of the other solutions.

Note that this is a security concern for server side rendering.

This comment was removed by daniel.

Batik takes 50% longer; Inkscape 75% longer, and ImageMagick 100% longer according to some 2009 tests.

I'm gonna go out on a limb here and suggest that maybe performance benchmarks from 2009 are not relevant in 2018, there have likely been many software and hardware changes since then.

@Krenair The software is indeed changed but in a bad way. At the moment Inkscape and Batik are extremely slow. At least according to my own benchmarks. Batik can be faster if we use it as a daemon, because JVM startup is very slow, but I'm not sure how much memory it will consume in that way.

Besides, benchmarking SVG rendering is pretty hard. Everything boils down to the specific SVG file itself and to the list of features the rendering library supports. For example, if we have a file with a Gaussian blur, QtSvg will always be faster, because it simply skips it. The same goes for other high-level and expensive operations. For example, Batik doesn't support anti-aliasing during clipPath rendering, which makes it faster but incorrect. Also, Batik doesn't support text shaping, which is also incorrect.

At the moment, librsvg has no real competitor performance wise.

I referred to a different benchmark:
RazrFalcon updates the benchmark more often than Wikimedia updates librsvg (!), therefore I can't see your problem with: https://github.com/RazrFalcon/resvg/#performance
RazrFalcon might have optimised resvg for his own hardware, and therefore the benchmark might be a little different on Wikimedia servers.

Program          Render time    Tests passed
resvg            150 s (best)   221
rsvg 2.42.2      242 s          202 (worst)
Inkscape 0.92.2  1610 s         257
Batik 1.9        3686 s         254
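Normalizing the table to rsvg 2.42.2 makes the gaps easier to compare. This is a small sketch over the numbers exactly as listed; they come from RazrFalcon's test suite on his machine, and the programs pass different numbers of tests, so the ratios are not a like-for-like measure of Wikimedia's workload.

```python
# Render times in seconds, copied from the resvg benchmark table above.
RENDER_TIMES = {
    "resvg": 150.0,
    "rsvg 2.42.2": 242.0,
    "Inkscape 0.92.2": 1610.0,
    "Batik 1.9": 3686.0,
}


def relative_to(baseline: str, times: dict) -> dict:
    """Express every render time as a multiple of the baseline program's time."""
    base = times[baseline]
    return {name: round(seconds / base, 2) for name, seconds in times.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("rsvg 2.42.2", RENDER_TIMES).items():
        print(f"{name:16} {ratio:6.2f}x")
```

Relative to rsvg this gives roughly 0.62x for resvg, 6.65x for Inkscape and 15.23x for Batik.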

The benchmark uses librsvg 2.42.2, Wikimedia uses 2.40.16 (2.40.2 for SVG Checker), and the current version is 2.43.2.

As @Glrx pointed out, we need systemLanguage, which resvg can't handle; therefore, as @RazrFalcon said, we should not switch to resvg now and should stay with librsvg.


But I would like to be able to set a flag (e.g. on the description page) so that specific SVGs are rendered by Inkscape:

  • Checking https://commons.wikimedia.org/wiki/Librsvg_bugs (the most common bugs are reported here)
  • Checking https://commons.wikimedia.org/wiki/Category:Pictures_showing_a_librsvg_bug_(unsolved) (the SVGs for which a workaround is most difficult)
  • Bugs checked by Commons SVG Checker


Because Inkscape is based on librsvg, it is most likely that there would be hardly any (maybe no) new, unreported bugs. (Performance would not be a problem if it were possible to flag specific SVGs to be rendered with Inkscape while all others are rendered by librsvg.)

The 5 most important bugs (all of them would be fixed by Inkscape):

  1. T11420, as already pointed out by @Glrx
  2. T36947, maybe the most reported bug on Commons:Graphics_village_pump
  3. T43424, maybe the best-documented bug on Commons help pages: Help on de, Help on commons, Category:Images_with_SVG_1.2_features, User:JoKalliauer/RepairFlowRoot
  4. T55899, maybe the most common bug on Commons; I repaired more than 250 files (but it would be fixed by the update anyway)
  5. T20463, the cause of most bugs in Category:Pictures_showing_a_librsvg_bug_(unsolved)
RazrFalcon added a comment. Edited Jul 22 2018, 10:43 AM

@JoKalliauer

RazrFalcon might have optimised resvg for his own hardware

There are no hardware-specific optimizations.

Because inkscape is based on librsvg

It's not. Inkscape has its own rendering backend.

Dzahn removed a subscriber: Dzahn. Aug 6 2018, 2:25 PM
Rxy added a subscriber: Rxy. Oct 17 2018, 2:25 AM
mxn added a subscriber: mxn. Nov 10 2018, 10:09 PM

@Glrx Hello again.

Since the last comment, resvg gained two more updates and supports almost everything now, except BIDI. You can see a list of all unsupported features here, and the SVG support table here.
The main focus of the next release will be on textPath, direction, unicode-bidi and writing-mode.

I'm interested in any feedback regarding the supported features. Maybe something should be prioritized and maybe something important is missing.

Glrx added a comment. Jan 4 2019, 7:08 PM

Thank you for the progress update. I'm happy to hear the good news.

I just looked through the tables.

resvg says Chrome only supports 56% of switch. I believe Chrome fixed all its switch-related issues in October (e.g., space-separated instead of comma-separated langtag list; subclass test was reversed; case-sensitive comparison). I viewed a file the other day which made me believe that Chrome also does the SMIL allowReorder clause selection (Firefox does that).
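For reference, SVG 1.1 defines systemLanguage as true when one of the user's languages equals one of the comma-separated tags, or equals a prefix of one such that the next character is "-" (so a user language "en" matches "en-US", not the reverse); a switch renders its first child whose test passes. The sketch below assumes that reading of the spec, compares tags case-insensitively, and models each switch child only by its systemLanguage value. It is an illustration, not resvg's or librsvg's code.

```python
from typing import List, Optional, Sequence


def system_language_matches(attr: str, user_langs: Sequence[str]) -> bool:
    """SVG 1.1 systemLanguage test: exact match, or user language is a prefix followed by '-'."""
    tags = [t.strip().lower() for t in attr.split(",") if t.strip()]
    for user in (u.strip().lower() for u in user_langs):
        for tag in tags:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def select_switch_child(children: List[Optional[str]], user_langs: Sequence[str]) -> Optional[int]:
    """Return the index of the child a <switch> renders, or None if no child qualifies.

    Each child is represented only by its systemLanguage value; None means the
    attribute is absent, which counts as true (the usual untranslated fallback).
    """
    for index, attr in enumerate(children):
        if attr is None or system_language_matches(attr, user_langs):
            return index
    return None
```

For a typical multilingual file with translated children followed by an untranslated default, a request for "en" picks the "en-US" child, while a language with no translation falls through to the default.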

In looking at the support tables, my primary concern is with entries that are unsupported in the resvg column but supported by librsvg. That's where Commons will see differences. If there is such a disparity, the next question is how frequent will the disparity appear. My experience is usually simple diagrams; they often do not use filters. More involved art may use many filters. Another question is how much damage the disparity does. If the image is still presentable, then the disparity would be minor. Failing to blur an object is minor; failing to paint the object would be major.

Fontlist is important. Many Commons users are confused and unhappy with font support, so there has been a push to use font lists such as "Neue Frutiger 55, sans-serif". A font list can give a user his favorite commercial font on his local machine and let Commons render a reasonable facsimile. In many situations, Commons will not have the first font in the list.
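Resolving a font list means walking the comma-separated font-family names and using the first one the server actually has, or the server's font for a generic family such as sans-serif. A minimal sketch, assuming a hypothetical set of installed font names and a hypothetical mapping from generic families to server fonts; the naive comma split does not handle quoted family names that themselves contain commas.

```python
from typing import Iterable, List, Mapping

GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def parse_font_family(value: str) -> List[str]:
    """Split a font-family list; naive, so quoted names containing commas are not supported."""
    names = []
    for part in value.split(","):
        name = part.strip().strip("\"'").strip()
        if name:
            names.append(name)
    return names


def resolve_font(value: str, installed: Iterable[str],
                 generic_map: Mapping[str, str], fallback: str) -> str:
    """Return the first installed family from the list, or the server font for a generic family."""
    by_lower = {name.lower(): name for name in installed}  # family names match case-insensitively
    for name in parse_font_family(value):
        key = name.lower()
        if key in GENERIC_FAMILIES:
            return generic_map.get(key, fallback)
        if key in by_lower:
            return by_lower[key]
    return fallback
```

With "Neue Frutiger 55, sans-serif" and no Frutiger on the server, the walk reaches sans-serif and returns whatever font the server maps it to.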

Embedded fonts are something that Commons discourages. We don't want licensed fonts embedded in otherwise free files. So it's not something high on my priority list. Commons will not want external font definitions chased even if those fonts are allegedly free. CSS font chasing should be disabled.

Nested sub- and superscripts are used, but not often. Many diagrams have typeset math formulas, so there will be formulas such as the normal distribution's e to the x squared. A modest priority.

BIDI priority is high, but librsvg bugs have probably kept the issues down. Most files on Commons that set direction to rtl will have improper SVG because librsvg paints the text the wrong direction; it's OK if those files do not render reasonably; they need to be fixed. I will tolerate "broken" images that are really improper SVG. Most librsvg files will stuff RTL characters into an LTR string. If the string is RTL dominant, it usually works, but put some LTR and neutral characters into the string, and the display may get strange. Any good BIDI implementation would be welcome. BTW, the SVG specification is ambiguous about text chunk placement.

Other issues appear to be desired features rather than necessities. Librsvg does not do textPath, and lack of textPath is probably the second most common reason for converting characters to curves. I want textPath, but it is not so valuable that I'm willing to tolerate a lot of broken images on Commons.

For vertical text, librsvg is a mess, so there are very few instances of vertical text on Commons. Most Commons diagrams will just convert the characters to curves. So vertical text is not a high priority now. Having it will allow us to simplify existing images and do new images correctly. A common workaround is to rotate Chinese strings 90 degrees, but I suspect that annoys Chinese readers.

Attribute selectors are not supported by librsvg, and some users have been frustrated by their absence. It is not important now, but there is some interest. Similarly, lang() pseudo class selectors could be useful. Very low priority.

Any SVG feature that is not supported in Chrome and Firefox is probably irrelevant to Commons.

Thanks for a detailed answer!

I believe Chrome fixed all its switch-related issues in October

I'm using Chromium build that comes with puppeteer.js. Maybe it's a bit outdated.

In looking at the support tables, my primary concern is with entries that are unsupported in the resvg column but supported by librsvg

librsvg supports more filter variants (like feColorMatrix), but its filter support in general isn't that good, as you can see. Also, librsvg supports enable-background, which is used by filters, and resvg doesn't yet. Not sure if there is anything else.

Fontlist is important.

It's supported. It works on Qt backend but fails in cairo. It's a minor bug.

Embedded fonts are something that Commons discourages.

This part I really don't want to implement, because it's pretty big and I'm not sure if someone actually uses it.

Nested sub- and superscripts are used, but not often.

It's doable. Just left it for later.

BIDI priority is high, but librsvg bugs have probably kept the issues down.

The main problem with BIDI is that no one really supports it, so it's hard to tell how it should be rendered in the first place. For example, glyph-orientation-* is only supported by Batik, but Batik doesn't support text kerning, which makes it pretty useless. Also, Firefox doesn't fully support baseline-shift, letter-spacing and word-spacing.

The comparison table is a bit misleading, since I mark a test as passed only if resvg renders it correctly with both backends. And there are a lot of text-related issues that work in the Qt backend but fail in the cairo one.

lack of textPath is probably the second most common reason for converting characters to curves

I didn't look into it so I have no idea how hard it will be to implement, but I don't think that there will be any major problems.

For vertical text, librsvg is a mess

It uses pango's own implementation, kinda, and the problem with that is that Qt doesn't support vertical text at all. So the vertical layout should be implemented manually. And I don't think that simply placing glyphs vertically one after another is a good idea. The only explanation I could find is in CSS Writing Modes.

Attribute selectors

CSS support is partially out of scope. Currently, I'm using my own CSS parser, which is extremely primitive. So first, someone has to write a proper one. And not only a parser (which already exists), but also a resolver.

Any SVG feature that is not supported in Chrome and Firefox is probably irrelevant to Commons.

The one thing I learned after writing resvg is that SVG support in browsers isn't that good. Anyway, almost everything from static SVG subset is already implemented. Just need to polish it a bit.

Joe added a subscriber: Gilles. Apr 11 2019, 6:01 AM

Wikimedia nowadays is using https://github.com/thumbor/thumbor to render all thumbnails, including SVGs. I'm not sure how resvg would fit into it, as I'm completely ignorant about the details of how thumbor uses librsvg. @Gilles do you have any idea of how easy it would be to swap usage between the two libraries?

It's very straightforward to switch to something else, here's the entire logic for SVG processing at the moment: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/thumbor-plugins/+/refs/heads/master/wikimedia_thumbor/engine/svg/svg.py

I can't find the man page for the resvg command-line tool. What it needs to support is rendering to a specific width and the ability to set the language you want rendered (for multilingual SVGs).

I can't find the man page for the resvg command-line tool. What it needs to support is rendering to a specific width and the ability to set the language you want rendered (for multilingual SVGs).

See https://manpages.debian.org/unstable/resvg/rendersvg.1.en.html

It has everything we need, then. If you feel like backporting this to Stretch, I can write the Thumbor engine and tests for it. Then we can test it easily via config override on a specific Thumbor server.

It has everything we need, then. If you feel like backporting this to Stretch, I can write the Thumbor engine and tests for it. Then we can test it easily via config override on a specific Thumbor server.

Given https://phabricator.wikimedia.org/T40010#4432284, I think the most promising next step is to wait a few more months and migrate Thumbor to Buster and the new Rust-based librsvg. I mostly mentioned the availability in Debian because I saw it arriving in the Debian archive and remembered this task.

Given https://phabricator.wikimedia.org/T40010#4432284 I think the most promising next step is to wait a few more months and migrate thumbor to the buster and the rust-based new librsvg

This is pretty outdated. The up-to-date comparison table can be found here.

Jc86035 added a comment. Edited Apr 11 2019, 10:41 AM

Just noting, if resvg doesn't support non-UTF-8 files or files with no size (per the table) then it'll probably break a lot of existing files unless they're all fixed before it's implemented (if ever). UTF-8 didn't become the most popular encoding until around 2009 (per enwiki), and there are still a lot of older SVGs lying around.

@Jc86035 SVG without a specified size is undefined behavior. There are no tests for this yet, so I'm not sure how good librsvg and other implementations are.

As for UTF-8, yes, it's not supported yet and not planned. Not sure what I can do here.

Jc86035 added a comment. Edited Apr 11 2019, 10:54 AM

Just for clarification:

  • Does resvg accept viewBox="0 0 500 500" without other dimension attributes? librsvg doesn't recognize this (example) and reads it as 512 × 512.
  • Does resvg accept SVGs without an XML header (I don't remember any specific examples at the moment) or without any encoding specified (example)? Both work with librsvg.

Does resvg accept viewBox="0 0 500 500" without other dimension attributes?

Yes.

Does resvg accept SVGs without an XML header

Yes.

SVG without a size is something like:

<svg xmlns="http://www.w3.org/2000/svg">
    <rect width="100%" height="100%"/>
</svg>

You have to specify a viewport size in this case, which is not supported yet.
librsvg will render a 1x1px image in this case.

Jc86035 added a comment. Edited Apr 11 2019, 11:10 AM

Oh, okay. If files are only broken when they contain invalid bytes (not merely because they declare encoding="iso-8859-1"), then actual thumbnail errors will probably be a little rarer.

Would this file break? It contains "Ü", but the character is in the <title> element.

Would this file break? It contains "Ü", but the character is in the <title> element.

This file has a UTF-8 encoding, despite the encoding="iso-8859-1" attribute. So it will be rendered correctly.

resvg tries to load a file as UTF-8 string. It doesn't care about the XML encoding attribute.
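Given that behaviour, what matters for a file is whether its raw bytes decode as UTF-8, not what the encoding pseudo-attribute claims. A sketch for checking files ahead of time, assuming ASCII-compatible encodings; the helper names are illustrative.

```python
import re
from typing import Optional

# Encoding pseudo-attribute of an XML declaration at the start of the file
# (an optional UTF-8 byte-order mark is allowed before it).
_XML_DECL_ENCODING = re.compile(
    rb'^(?:\xef\xbb\xbf)?\s*<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, lower-cased, or None."""
    match = _XML_DECL_ENCODING.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def loads_as_utf8(data: bytes) -> bool:
    """True if the raw bytes are valid UTF-8, whatever the declaration says."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The "Ü" file discussed above is the case where declared_encoding reports iso-8859-1 while loads_as_utf8 is still true.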

It isn't necessarily one or the other. It's very conceivable to try to render a file first with resvg, and if the file can't be processed (command errors), try to render it with librsvg.
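A minimal sketch of such a fallback chain, assuming each renderer is wrapped as a callable taking (svg_path, width, png_path). The cli_renderer helper expects the caller to supply the real command line for each tool; no resvg or rsvg-convert flags are assumed here. Only failures the tool reports (non-zero exit status, timeout, missing binary) trigger the fallback; an image that renders without error but looks wrong is not detected.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A renderer takes (svg_path, width, png_path) and raises on failure.
Renderer = Callable[[str, int, str], None]


def cli_renderer(build_argv: Callable[[str, int, str], List[str]],
                 timeout: float = 60.0) -> Renderer:
    """Wrap a command-line tool; build_argv must produce that tool's real arguments."""
    def render(svg_path: str, width: int, png_path: str) -> None:
        # check=True turns a non-zero exit status into CalledProcessError.
        subprocess.run(build_argv(svg_path, width, png_path),
                       check=True, capture_output=True, timeout=timeout)
    return render


def render_with_fallback(renderers: Sequence[Tuple[str, Renderer]],
                         svg_path: str, width: int, png_path: str) -> str:
    """Try each renderer in order and return the name of the first that succeeds."""
    errors = []
    for name, render in renderers:
        try:
            render(svg_path, width, png_path)
            return name
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:
            # OSError also covers FileNotFoundError when the binary is missing.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all renderers failed: " + "; ".join(errors))
```

Ordering the list as resvg first and librsvg second gives the behaviour described in the comment; the order is just configuration.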

Does resvg accept viewBox="0 0 500 500" without other dimension attributes? librsvg doesn't recognize this (example) and reads it as 512 × 512.

It might be better to file specific tasks instead of bringing up several librsvg issues in this very task. For this very issue, the output in more recent librsvg is:

@Aklapper resvg has a prebuilt viewer too. So you can test it right away.

Kghbln added a subscriber: Kghbln. Apr 11 2019, 8:53 PM
Kghbln removed a subscriber: Kghbln. Apr 11 2019, 9:02 PM

A new version of resvg was released. Now it supports textPath (example, example), writing-mode (example), BIDI reordering (example) and better letter-spacing (example).

There is still work to do, but text rendering is far better now.

Glrx added a comment. Edited Jun 20 2019, 6:51 PM

I've skimmed the comments going back about 2.5 years. It looks like most bases are covered.

Commons must have BIDI, markers, and systemLanguage. Too many files use those features. (systemLanguage would not need support if WMF servers l10n'd the files at the server.)

The comments say BIDI and systemLanguage are now supported by resvg.

https://razrfalcon.github.io/resvg-test-suite/svg-support-table.html shows good support for markers. There is a comment that librsvg marker support is poor; I recall doing workarounds for some markers. For Commons, the primary use of markers is for arrowheads on leader lines. SVG allows several options, but Commons probably only needs the auto option to work for the marker-end attribute on lines and paths.
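With orient="auto" on marker-end, the arrowhead is rotated to the direction of the path at its final vertex. For paths made only of straight segments that is the direction of the last segment; the sketch below computes it and, as a simplification of the spec's rules, steps back over zero-length final segments. Curves (where the tangent comes from the last control point) are not handled. SVG's y axis points down, so 90° points downward on screen.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def end_marker_angle(points: Sequence[Point]) -> float:
    """Rotation in degrees for marker-end with orient="auto" on a straight-segment path."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    x1, y1 = points[-1]
    # Walk back past zero-length final segments to find a usable direction.
    for x0, y0 in reversed(points[:-1]):
        if (x0, y0) != (x1, y1):
            return math.degrees(math.atan2(y1 - y0, x1 - x0))
    raise ValueError("all points coincide; direction is undefined")
```

A leader line drawn from (0, 0) to (10, 10) therefore gets an arrowhead rotated by 45°.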

baseline-shift was also an issue for sub- and superscripts. The current support table says that is implemented.

Resvg also offers textPath, and that should be a significant benefit for maps. I'm willing to accept more damage to other files for this feature. The absence of textPath means a lot of files converted their text to curves, and that both increases the file size and makes the file difficult to translate.

I do not see character encoding as a big issue. Yes, it may break SVG files that do not use utf-8 compatible encodings, but that could be fixed with a robot scanning all svg files and updating incompatible encodings to utf-8. Essentially, Commons would be making a requirement that all SVG be utf-8 compatible.
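A sketch of what such a robot could do to one file, under stated assumptions: only ASCII-compatible encodings are handled, and if the bytes already decode as UTF-8 the file is treated as mis-declared UTF-8 (like the "Ü" example discussed earlier in this task) and only its declaration is rewritten. That last step is a heuristic: a genuine ISO-8859-1 file whose bytes happen to form valid UTF-8 would be misread. The function name is illustrative.

```python
import re

# Encoding pseudo-attribute inside the XML declaration (ASCII-compatible encodings only).
_ENCODING_DECL = re.compile(r'(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])([A-Za-z0-9._-]+)\2')


def convert_to_utf8(data: bytes) -> bytes:
    """Re-encode one SVG file as UTF-8 and make its XML declaration say so (heuristic sketch)."""
    head = data[:200].decode("ascii", errors="replace")
    match = _ENCODING_DECL.search(head)
    declared = match.group(3) if match else "utf-8"
    try:
        # Heuristic: bytes that already decode as UTF-8 are treated as mis-declared UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(declared)  # otherwise trust the declared encoding
    text = _ENCODING_DECL.sub(
        lambda m: m.group(1) + m.group(2) + "UTF-8" + m.group(2), text, count=1)
    return text.encode("utf-8")
```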

There may be problem interactions with viewBox, x, y, width, height, and a renderer's notion of default dimensions. Commons has files with inconsistent attributes. In general, most SVG files on Commons should have a viewBox but not the other attributes. A robot could enforce a reasonable viewBox. Maybe somebody could do a quick scan to find how many SVGs are inconsistent.
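Such a scan could classify each file's root width, height and viewBox. The sketch below uses illustrative category names and flags a width/height versus viewBox aspect-ratio difference as suspicious (it is not invalid per se, since preserveAspectRatio then applies). It parses with the standard library's ElementTree; for untrusted uploads a hardened XML parser would be advisable.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

_PX = re.compile(r"^\s*(\d+(?:\.\d+)?|\.\d+)\s*(?:px)?\s*$")


def _px_number(value: Optional[str]) -> Optional[float]:
    """Return a unitless/px length as a float, or None for %, mm, em, etc."""
    if value is None:
        return None
    match = _PX.match(value)
    return float(match.group(1)) if match else None


def classify_size(svg_text: str) -> str:
    root = ET.fromstring(svg_text)
    width, height, view_box = root.get("width"), root.get("height"), root.get("viewBox")
    if view_box is None:
        return "width/height only" if width and height else "no usable size"
    try:
        vb = [float(p) for p in view_box.replace(",", " ").split()]
    except ValueError:
        return "invalid viewBox"
    if len(vb) != 4 or vb[2] <= 0 or vb[3] <= 0:
        return "invalid viewBox"
    if width is None and height is None:
        return "viewBox only"
    w, h = _px_number(width), _px_number(height)
    if w is None or h is None or w <= 0 or h <= 0:
        return "viewBox with partial or non-px width/height"
    vb_ratio = vb[2] / vb[3]
    if abs(w / h - vb_ratio) > 1e-3 * vb_ratio:
        return "aspect ratio mismatch"
    return "consistent"
```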

Switching may alter the rendering of some files that have bad SVG. librsvg doesn't do BIDI correctly, so I suspect there are files that look right with librsvg but look wrong when displayed directly with Chrome. The files are wrong and need to be fixed.

I'm ignorant about SVG filters. If there is a hit, I'm guessing it will be a small one. Furthermore, it could be offset with other benefits such as textPath. JoKalliauer might be able to give more insight.

Right now, the only serious question is CSS selector support. See January 4 comments by RazrFalcon. ("Currently, I'm using my own CSS parser which is extremely primitive.") I'm not sure of the impact here. Inkscape puts style information on each element, so selectors are irrelevant. Adobe Illustrator uses the style element with selectors. Most of the usage will be simple class or id selection that I expect resvg to handle. Some files may employ descendant selectors. IIRC, librsvg does not support attribute or lang selectors. Consequently, CSS selectors may not be a big problem.
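To make "simple selectors" concrete: type, class and id selectors such as rect, .st0, #logo or rect.st0 can be matched against a single element without looking at its ancestors, whereas a descendant selector such as g .st0 cannot. A sketch, assuming the element is described only by its tag, id and class set; this is not resvg's parser, and the universal selector is not handled.

```python
import re
from typing import Optional, Set

_SIMPLE = re.compile(r"^([A-Za-z][\w-]*)?((?:[.#][\w-]+)*)$")


def matches_simple_selector(selector: str, tag: str,
                            element_id: Optional[str], classes: Set[str]) -> bool:
    """Match a compound of type, .class and #id parts against one element."""
    match = _SIMPLE.match(selector.strip())
    if not match:
        raise ValueError(f"not a simple selector: {selector!r}")
    type_name, rest = match.group(1), match.group(2)
    if type_name and type_name != tag:
        return False
    for kind, name in re.findall(r"([.#])([\w-]+)", rest):
        if kind == "#" and name != element_id:
            return False
        if kind == "." and name not in classes:
            return False
    return True
```

Illustrator-style rules like .st0 only need this level of matching, which is consistent with the guess that most Commons files would be unaffected.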

So WMF should consider using libresvg instead of librsvg.

Gilles's comment about first running resvg and then running rsvg if that blows up is an interesting one, but I suspect the major problems would be the visual result rather than an execution fault.

In the long run, I'd like to see small SVG files rendered directly. Or even in the shorter run. That can give WMF the advantage of tool tips, linking, and animation. The choice of librsvg vs libresvg is about static images. WMF fakes some support with imagemap; but similar functionality can be had with directly served SVG.

When the wiki markup processes an SVG file inclusion, it could look at the size of the SVG file. If the size is less than N bytes (say 20 kB), then the HTML embeds an object element with a URL specifying a size-limited SVG. The image server checks the size and usually serves the SVG directly. (In the rare circumstance that someone has uploaded a new, 5 MB, version of the SVG, then the image server declines the request or supplies a default SVG.) If the wiki markup processes an SVG file inclusion that is larger than N bytes, then the HTML is the same as it is today.

In addition, the Commons file page may have some JoKalliauer-style flags. The flags might say serve this file directly even if it is 5*N bytes. That might be reasonable for animations. Consider, for example, the 141 kB animated GIF https://commons.wikimedia.org/wiki/File:Pi-unrolled-720.gif used in the en.WP pi article. WP should be able to serve animated SVG files that do the same.
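The proposed rule boils down to two checks, one when the page is parsed and one at the image server when the SVG is requested. A sketch using the example values from the comment (N = 20 kB, and 5×N for flagged files); the function names and the flag are hypothetical, not an existing MediaWiki feature.

```python
SIZE_LIMIT_BYTES = 20_000  # the "say 20 kB" example value for N; an assumption, not a decided number
FLAG_MULTIPLIER = 5        # the "5*N" example for files flagged on their file page


def delivery_mode(svg_size: int, serve_direct_flag: bool = False) -> str:
    """Parse-time choice: embed the SVG itself, or fall back to today's PNG thumbnail."""
    limit = SIZE_LIMIT_BYTES * FLAG_MULTIPLIER if serve_direct_flag else SIZE_LIMIT_BYTES
    return "svg" if svg_size < limit else "png"


def image_server_decision(current_size: int, size_limit_in_url: int) -> str:
    """Request-time re-check: a newer, larger upload may have replaced the file since parsing."""
    return "serve-svg" if current_size <= size_limit_in_url else "decline"
```

The second check is what covers the 5 MB re-upload case: the page still points at a size-limited URL, but the server refuses to serve the oversized file.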

The situation with librsvg has also been improving. The switch to Rust prevented incorporating some fixes for a while, but I have the impression that WMF can now use the newer Rust code. There is an argument to keep the status quo. If WMF servers are not computationally taxed, then resvg's faster speed may not be a significant benefit. Switching from librsvg will cause some headaches.

On the other hand, I do not see librsvg adding textPath or fixing vertical Chinese anytime soon. For many years, librsvg was not actively maintained.

I'd go for a trial.

CSS support is still pretty bad, yes. Only simple selectors are supported, just like in librsvg.
I plan to rewrite the CSS parser, but I'm not sure when it will be available. Maybe even this year.

I'm ignorant about SVG filters.

librsvg has better filter support, sort of. resvg doesn't support advanced filters like turbulence yet. Also, enable-background isn't supported either.

then resvg's faster speed may not be a significant benefit.

resvg isn't faster than librsvg. Currently, I'm more focused on SVG support than performance.

Krinkle renamed this task from Re-evaluate librsvg as SVG renderer on Wikimedia wikis to RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis. Feb 11 2020, 10:40 PM

For those following along, re-posting my comment from T243893:

Trying resvg in the Beta Cluster does not require an RFC discussion. This can be worked out between the proposer and anyone they might need help from in terms of Beta access.

Note that any comparisons you might want to show (e.g. pick the 1000 most used or random SVGs from Commons, render them side by side and generate SSIM scores, perhaps?) could also be done locally, I think.
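One way to produce such scores without extra dependencies is a global SSIM over two grayscale renders of the same size, using the usual constants C1 = (0.01·L)² and C2 = (0.03·L)² with L the dynamic range. This is a simplification: standard SSIM computes the statistic in local windows and averages it, so this single-window version is only a coarse screen for spotting renders that differ a lot. Loading the PNGs into flat pixel lists (for example with an imaging library) is left out.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM of two equally sized grayscale images given as flat pixel lists."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and have the same number of pixels")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2))
```

Identical renders score 1.0; files scoring well below 1 between librsvg and resvg would be the ones worth inspecting by eye.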

In any event, TechCom has no advice or opposition to trying this. Please report any conclusions about how well it works and/or any issues found on the RFC task at T40010.

Ebe123 added a subscriber: Ebe123. Feb 13 2020, 2:36 AM
JoKalliauer added a comment. Edited Feb 24 2020, 8:57 PM

Re-evaluate librsvg

Clarifying question. If I could run all our SVGs through a checker using resvg to find out what bugs they suffer from, is it possible to write such a checker? If so, we could catalog and get a better sense of how much pain this migration would entail. If not, is there an estimate of how widespread use of malformed files is?

@Milimetric Can you expand your question a bit? What this checker should do and what exactly it should catalog?

If not, is there an estimate of how widespread use of malformed files is?

Depends on what you call a malformed file. SVG is filled with undefined behavior. And since there is no reference implementation, no one knows how it actually should work.

@Milimetric Can you expand your question a bit? What this checker should do and what exactly it should catalog?

If not, is there an estimate of how widespread use of malformed files is?

Depends on what you call a malformed file. SVG is filled with undefined behavior. And since there is no reference implementation, no one knows how it actually should work.

I think the question was mostly directed at @JoKalliauer, who seems to have specific ideas of what bugs existing SVG files on Commons rely on.

Basically, the question we are interested in is "how many files on commons would break if we switched away from rsvg tomorrow? Can we list them?"

@daniel

how many files on commons would break if we switched away from rsvg tomorrow?

Most of the files will actually become correct. There are obviously some bugs in resvg too, but it's far beyond rsvg in SVG support.

Can we list them?

I don't think that this is possible. Yes, an SVG checker could be written that checks files for common rsvg bugs, but it would still be speculation.

JoKalliauer added a comment. Edited Feb 27 2020, 9:50 PM

Clarifying question. If I could run all our SVGs through a checker using resvg to find out what bugs they suffer from, is it possible to write such a checker?

Common librsvg-bugs are known and some of them can be checked by Commons:Commons SVG Checker which are listed here: https://commons.wikimedia.org/wiki/Commons:Commons_SVG_Checker/KnownBugs
But writing a checker that finds files which are malformed in ways that rely on librsvg bugs (I did not know such files exist) is maybe even more difficult than rendering the file correctly, so it is close to impossible. (For a few bugs it may be possible, but imho there is no common one.)
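A rough starting point for such a checker is counting elements tied to librsvg bugs mentioned in this task, such as flowRoot (T43424) and textPath (T11420). As the comments point out, a hit only flags a candidate; it does not prove the thumbnail is wrong. The pattern table is a hand-picked assumption, not the Commons SVG Checker's actual rule set.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hand-picked examples from this task; not the Commons SVG Checker's real rule set.
KNOWN_LIBRSVG_PATTERNS = {
    "flowRoot": "SVG 1.2 flowed text (T43424)",
    "textPath": "text on a path not rendered (T11420)",
}


def scan_svg(svg_text: str) -> Dict[str, int]:
    """Count elements whose local name appears in KNOWN_LIBRSVG_PATTERNS."""
    counts: Dict[str, int] = {}
    for element in ET.fromstring(svg_text).iter():
        tag = element.tag
        if not isinstance(tag, str):
            continue  # skip comments / processing instructions if a parser keeps them
        local = tag[len(SVG_NS):] if tag.startswith(SVG_NS) else tag
        if local in KNOWN_LIBRSVG_PATTERNS:
            counts[local] = counts.get(local, 0) + 1
    return counts
```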

If so, we could catalog and get a better sense of how much pain this migration would entail.

If we consider correct rendering a pain, then we can never update any software, apply any patches, or ship any bug fixes. I consider it a pain for the authors, but the old PNG thumbnails on Commons/Wikipedia don't get updated as long as no one re-uploads the file or uses ?action=purge. (In 2019 I purged solved images which had been rendered in 2016 or earlier.)

If not, is there an estimate of how widespread use of malformed files is?

I checked 126 files in Category:Featured_pictures_on_Wikimedia_Commons_-_vector, because I consider them to be among the highest-quality images on Commons; they are heavily used and generally complex.

  • 8 files were affected by T246014; see the last 8 pictures in https://github.com/RazrFalcon/resvg/issues/223. (The DPI can be changed in rsvg, default 90 dpi, as well as in resvg, default 96 dpi.) This is not a rendering issue but imho an outdated preference (at least nowadays it is an unusual one). Maybe we should block such files, since there is no correct definition of how to render them.
  • 1 file: T246003
  • 1 file: T246001
  • 1 file: T245864
  • I might have missed a few, maybe one or two
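The DPI setting only matters for lengths in absolute units (in, cm, mm, pt, pc); px and unitless values are unaffected. A sketch of the conversion using the defaults stated above (90 dpi for rsvg, 96 dpi for resvg): with these defaults, any file sized in mm or pt comes out 96/90 ≈ 1.067 times larger in resvg. Relative units such as em or % are deliberately rejected here.

```python
import re

# Inches per unit for the absolute length units.
INCHES_PER_UNIT = {"in": 1.0, "cm": 1 / 2.54, "mm": 1 / 25.4, "pt": 1 / 72, "pc": 1 / 6}

_LENGTH = re.compile(r"^\s*([+-]?(?:\d+(?:\.\d+)?|\.\d+))\s*([a-z]*)\s*$")


def length_to_px(value: str, dpi: float) -> float:
    """Convert an SVG length such as '210mm' to pixels at the given DPI."""
    match = _LENGTH.match(value)
    if not match:
        raise ValueError(f"unsupported length: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit in ("", "px"):
        return number  # user units and px do not depend on the DPI setting
    if unit not in INCHES_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")  # e.g. em, ex, %
    return number * INCHES_PER_UNIT[unit] * dpi


if __name__ == "__main__":
    for dpi in (90, 96):
        print(dpi, round(length_to_px("210mm", dpi), 1))
```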

Several files where the behaviour is unspecified:

Sometimes it might be difficult to decide which renderer is more precise (but I do not consider that a bug):

I think in Category:Featured_pictures_on_Wikimedia_Commons_-_vector there were ~3 files with a librsvg bug, all with the same bug:

  • ~3 files with T11420 (path text not rendered)

But in all those files you do not notice anything unless you compare the rendering with a second renderer (so it is not obvious).

So this means that roughly the same number of SVGs are rendered intentionally wrong as are rendered unintentionally wrong by librsvg.

Krinkle moved this task from Old to P1: Define on the TechCom-RFC board. Sep 16 2020, 7:12 PM
Ponor added a subscriber: Ponor. Nov 18 2020, 9:03 AM

Wouldn't it be best to use the tool that created a SVG for conversion to PNG as well, for one would get on commons exactly what they would get at home? No library can beat that, I am afraid.

My experience is that files that look the same in Inkscape, Firefox and Chrome usually do not work as expected on Commons (blurs, text, some clones and hatch fills are usually broken). But how do you fix them other than by trial and error? You upload, check all the elements, go back to Inkscape, remove one fancy feature, upload again, go back to Inkscape, turn text to paths, unclone, etc.

Inkscape slow? How many SVG conversions a day are we talking about? When profiled, was Inkscape tested in its ''shell mode'' (inkscape --shell)? That way export actions could be sent through a pipe and Inkscape would not need to be started for every single conversion.

This comment was removed by Milimetric.

@Aklapper T19012 is imho for attaching source files, but @Ponor is imho talking about converting Inkscape-SVG to PNG by Inkscape and Batik-SVG to PNG by Batik, and Adobe-SVG to PNG by Adobe Illustrator, ....

Wouldn't it be best to use the tool that created a SVG for conversion to PNG as well, for one would get on commons exactly what they would get at home? No library can beat that, I am afraid.

E.g. Adobe Illustrator is proprietary and won't be used by Wikimedia. That is also the reason why fonts like Arial or Times are not allowed.

My experience is that files that look the same in Inkscape, Firefox and Chrome usually do not work as expected on commons (blurs, text, some clones, hatch fills are usually broken). But how do you fix them than by trial and error - you upload, check all the elements, go back to Inkscape, remove one fancy feature, upload again, go back to Inkscape, turn text to paths, unclone, etc.

  1. Inkscape partly uses SVG 1.2 and SVG 2.0 features, which are not part of the current SVG 1.1 standard; Firefox and Chrome support such features, but librsvg does not. Librsvg is the only program that renders them correctly according to the current standard (basically, SVG 1.1 says that an undefined feature should be ignored; that's not what Chrome/Firefox do, they render it according to the SVG 2.0 working draft).
  2. You can upload it to https://commons.wikimedia.org/w/index.php?title=Commons:Commons_SVG_Checker&withJS=MediaWiki:CommonsSvgChecker.js and you will see the results; it will be checked for common librsvg errors.
  3. You can use https://svgworkaroundbot.toolforge.org/ (activate run svgcleaner and activate scour) and it fixes most problems without visual change, some bugfixes are listed at https://commons.wikimedia.org/wiki/User:SVGWorkaroundBot .

Inkscape slow? How many SVG conversions a day are we talking about? When profiled, was Inkscape tested in its ''shell mode'' (inkscape --shell)? That way export actions could be sent through a pipe and Inkscape would not need to be started for every single conversion.

Inkscape is for creating SVGs, not for converting them; it is more than 6 times slower than librsvg and will run into time-outs (see T200866).

According to https://en.wikipedia.org/wiki/File:Commons_Growth.svg there are currently about 20000 files uploaded per day; about 2.8% are SVGs, which is roughly 560 SVG files per day. If you check the newest SVGs at https://commons.wikimedia.org/wiki/Special:NewFiles?mediatype[]=DRAWING&wpFormIdentifier=specialnewimages it seems to be more like 1000 SVG versions per day.

SVGs have on average maybe 2 versions and get rendered in at least 14 different sizes; that's roughly 30 PNGs per SVG.

And adding more software will, in the long run, lead to more problems. Librsvg has made enormous progress since version 2.40 (2016), and most librsvg bugs (I guess 80% of the current problems) are already fixed (see the subtasks of T193352); however, updating seems to be more challenging.

PS: I fixed more than 500 SVG files in Category:Pictures_showing_a_librsvg_bug_(overwritten_with_a_workaround) and other categories, so I can definitely say that a better renderer would have saved many man-hours. But for Wikimedia, I (a volunteer) am cheaper than larger servers.

According to Grafana, eqiad and codfw each get an average of 0.8 queries for new SVGs per second, with spikes up to 4 qps. More than 75% of those requests are handled using 575ms of CPU time on average. For context, there are 8.4 requests per second to eqiad and codfw for filetypes handled by imagemagick, including SVGs, which use 2-4s of CPU time.
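As a rough reading of these figures, the average CPU load is the request rate multiplied by the CPU time per request. The sketch below assumes every new-SVG request costs the 575 ms average (the figure above only covers 75% of them), so it is an approximation, not a capacity plan.

```python
def busy_cores(requests_per_second: float, cpu_seconds_per_request: float) -> float:
    """Average CPU cores kept busy: request rate times CPU time per request."""
    return requests_per_second * cpu_seconds_per_request


# Per data centre (eqiad or codfw), using the figures quoted above and
# assuming every request costs the 575 ms average.
typical = busy_cores(0.8, 0.575)
spike = busy_cores(4.0, 0.575)
print(f"new SVGs: ~{typical:.2f} cores on average, ~{spike:.2f} cores during spikes")
```

Any renderer that is N times slower per request multiplies both numbers by N, which is why per-request rendering time matters more than the daily upload count.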

Ponor added a comment. Nov 20 2020, 7:15 AM

@Ponor is imho talking about converting Inkscape-SVG to PNG by Inkscape and Batik-SVG to PNG by Batik, and Adobe-SVG to PNG by Adobe Illustrator, ....

Not exactly. I was thinking that 'inkscape --shell' should do the Inkscape-SVG conversions. In my random sample on Commons, 24 of 30 files were made with Inkscape. Files from other producers could be converted with either 'inkscape --shell' or whatever is used now.
For, you see, Inkscape will always be the best converter for whatever Inkscape can produce. I hope we can agree on that.

  1. Inkscape partly uses SVG 1.2 and SVG 2.0 features, which are not part of the current SVG 1.1 standard; Firefox and Chrome support such features, but librsvg does not. Librsvg is the only program that renders such files correctly according to the current standard (basically, SVG 1.1 says that a feature not defined in SVG 1.1 should be ignored; Chrome and Firefox don't do that, they render it according to the SVG 2.0 working draft).

Is SVG2.0 forbidden on commons? What happens when I upload a 2.0 file? Again, Inkscape converting its own SVG files to PNG should always work, regardless of SVG version.

  2. You can upload it to https://commons.wikimedia.org/w/index.php?title=Commons:Commons_SVG_Checker&withJS=MediaWiki:CommonsSvgChecker.js and you will see the results; the file will also be checked for librsvg errors known on Commons.

It took some time to discover this, and yes, it helped. But we're doing there what a computer should do without us having to worry.
Then, check this file and how it's rendered at different resolutions: https://commons.wikimedia.org/wiki/File:Scanning_tunneling_microscope_-_tip,_barrier_and_sample_wave_functions.svg. It only half-works, and the above test is of little help.

Inkscape is for creating SVGs, not for converting them; it is more than 6 times slower than librsvg and will run into time-outs (see T200866).

'inkscape --shell' to which actions can be sent through a pipe does not seem that slow at all. I'll post my results.

SVGs have on average maybe 2 versions and get rendered in at least 14 different sizes; that's roughly 30 PNGs per SVG.

What I'm seeing is that they have 1 version on average and are rendered in some 8 sizes (roughly two each around 200, 500, 1000 and 2000 px). Most files uploaded daily on Commons are very simple SVGs under 100 kB. 'inkscape --shell' converts those in less than 0.25 s per PNG (on my little Linux laptop).

And adding more software will, in the long run, lead to more problems. Librsvg has made enormous progress since version 2.40 (2016), and most librsvg bugs (I guess 80% of the current problems) are already fixed (see the subtasks of T193352); however, updating seems to be more challenging.

I'll repeat: 'inkscape --shell' will convert properly everything Inkscape made, now and forever. Not sure how that increases complexity (e.g. here https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/thumbor-plugins/+/refs/heads/master/wikimedia_thumbor/engine/svg/svg.py)

PS: I fixed more than 500 SVG files in Category:Pictures_showing_a_librsvg_bug_(overwritten_with_a_workaround) and other categories, so I can definitely say that a better renderer would have saved many man-hours. But for Wikimedia, I (a volunteer) am cheaper than larger servers.

I think this is doable, even as an option for advanced uploaders. Given how much crap (pardon my French) is getting uploaded each day, a few extra milliseconds for a good SVG-to-PNG conversion is a small price to pay.
C'mon Wikimedia, WP:BEBOLD.

@Ponor: If I don't misunderstand then the argumentation seems mostly about stuff created with Inkscape. What about stuff not created with Inkscape?

Ponor added a comment. Nov 20 2020, 8:08 AM

According to Grafana, eqiad and codfw each get an average of 0.8 queries for new SVGs per second, with spikes up to 4 qps. More than 75% of those requests are handled using 575ms of CPU time on average. For context, there are 8.4 requests per second to eqiad and codfw for filetypes handled by imagemagick, including SVGs, which use 2-4s of CPU time.

Thanks for this info. I did two little tests on my linux laptop. First, I took 8 SVGs from "Category:Featured pictures on Wikimedia Commons - vector" (Inkscape:6, CorelDRAW:1, Illustrator:1; file sizes 100k, 2×150k, 300k, 400k, 700k, 1400k, 2200k) and ran 'inkscape --shell' actions (of this type 'file-open:AntigenicShift_HiRes.svg; export-type:png; export-width:600px; export-do;') for all 8 at once. Got this:
png width 300px: total 5s ⟶ 0.6s/file
png width 600px: total 6s ⟶ 0.8s/file
png width 1200px: total 9s ⟶ 1.1s/file
png width 2400px: total 18s ⟶ 2.2s/file
Not bad, huh?
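For anyone wanting to repeat this kind of measurement, a batch can be fed to a single long-running Inkscape process. This is a minimal sketch, assuming Inkscape 1.x, whose shell mode reads one line of `;`-separated actions at a time from standard input and accepts `quit` to exit. The action names follow the comment above, plus `export-filename`; a bare pixel count is passed to `export-width`. File names containing `;` are not handled, and long sessions may also need to close documents between exports, which this sketch does not do. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def shell_actions(svg: Path, width: int, out_dir: Path) -> str:
    """One shell line that exports `svg` as a PNG of the given pixel width."""
    png = out_dir / f"{svg.stem}-{width}px.png"
    return (f"file-open:{svg}; export-type:png; export-width:{width}; "
            f"export-filename:{png}; export-do")


def batch_script(svgs: list[Path], widths: list[int], out_dir: Path) -> str:
    """Every file/width combination, one per line, then `quit` to leave the shell."""
    lines = [shell_actions(s, w, out_dir) for s in svgs for w in widths]
    lines.append("quit")
    return "\n".join(lines) + "\n"


def run_batch(svgs: list[Path], widths: list[int], out_dir: Path) -> None:
    """Start Inkscape once and pipe the whole batch through its shell mode."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["inkscape", "--shell"],
                   input=batch_script(svgs, widths, out_dir),
                   text=True, check=True)
```

Wrapping `run_batch([Path("a.svg")], [300, 600, 1200, 2400], Path("png"))` in `time.perf_counter()` calls gives per-batch totals like the ones above; the point of the design is that Inkscape's start-up cost is paid once per batch rather than once per PNG.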

Next, I took 30 random files (mostly flags, maps and logos) uploaded to Commons between 2015 and 2020. Mean file size 700 kB; minimum, quartiles and maximum: 1, 13, 103, 915, 9500 kB. Of those, 20+4 were made with Inkscape, 2 with Illustrator, 2 with Batik, 1 with matplotlib and 1 with gnuplot.

width 300px: total 15s ⟶ 0.5s/file
width 600px: total 18s ⟶ 0.6s/file
width 1200px: total 30s ⟶ 1s/file
width 2400px: total 62s ⟶ 2s/file

6 smaller files (26k, 95k, 19k, 2k, 13k, 180k) scaled to 4 widths (300px, 600px, 1200px, 2400px)
total time 6s ⟶ 1s/svg scaled to 4 widths ⟶ 2s/svg scaled to 8 png widths (typical for commons)

6 medium/large files (these are not too frequent) (460k, 900k, 780k, 520k, 1.3M, 1.3M) scaled to 4 widths (300px, 600px, 1200px, 2400px)
total time 50s ⟶ 8s/svg scaled to 4 widths ⟶ 16s/svg scaled to 8 png widths
5 of the last 6 files, excluding the slowest one (Afgewezen ontwerp van het provinciewapen Gelderland, 1893-1941.svg)
total time 21s ⟶ 4s/svg scaled to 4 widths ⟶ 8s/svg scaled to 8 png widths

At 4 s per SVG (converted offline; the uploader does not wait for the conversions to finish) and 1000 SVG files a day, that's 1.1 hours of CPU time for Wikimedia: no big servers, just my laptop.
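The daily estimate is a simple product, reproduced below. It assumes each SVG is converted exactly once in an offline batch, so it does not account for thumbnails requested later at sizes that were not pre-rendered.

```python
def daily_cpu_hours(seconds_per_svg: float, svgs_per_day: int) -> float:
    """CPU hours per day if every new SVG is converted once, offline."""
    return seconds_per_svg * svgs_per_day / 3600


# 4 s per SVG (all sizes) and 1000 new SVG versions per day, as above.
print(f"{daily_cpu_hours(4, 1000):.1f} CPU hours per day")
```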

Gilles added a comment (edited). Nov 20 2020, 8:19 AM

The problem isn't so much the number of SVGs we get per day as the fact that we render thumbnails on demand when they're for a file/size combination never requested before. Any extra rendering time is a penalty for that viewer. The issue compounds if they request a lot of new thumbnails at once, making them more likely to run into throttling limits, resulting in images that fail to load. That can easily happen on galleries that get visited very rarely. But some people's workflows lead them to visit those a lot, and their overall experience becomes terrible.

We prerender the most common sizes at upload time, but there's a very long tail of more exotic thumbnail sizes requested because editors customised the sizes they wanted with wikitext, or the wiki itself has different defaults, etc.

SVG isn't the only format that would benefit from a more time-consuming encoding yielding higher fidelity or a smaller file, what we really need is a pipeline for improving existing thumbnails with better encoding asynchronously. On misses generate a fast, inferior thumbnail so that the first user gets something quickly and spawn an async job that will generate the ideal thumbnail for the next person to view it. That's quite an ambitious project that I can't undertake at the moment. Maybe that can be my next big project after I complete migrating our Thumbor service to Buster/Python3/Thumbor 5/Kubernetes, which is probably going to keep me busy for several months.
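The fast-then-upgrade idea can be sketched as a cache that returns a cheap render on a miss and queues a job for a slower, higher-quality render. This is a single-process illustration under stated assumptions: `fast` and `slow` are hypothetical render callables (for example librsvg and a slower, more faithful renderer), and a plain dict and `queue.Queue` stand in for real thumbnail storage and a real job queue; it is not how Thumbor works today.

```python
import queue
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]  # (file name, thumbnail width in px)


class UpgradingThumbCache:
    """Serve a fast thumbnail on a miss, then upgrade it asynchronously."""

    def __init__(self, fast: Callable[[str, int], bytes],
                 slow: Callable[[str, int], bytes]) -> None:
        self.fast, self.slow = fast, slow
        self.store: Dict[Key, bytes] = {}
        self.jobs: "queue.Queue[Key]" = queue.Queue()

    def get(self, name: str, width: int) -> bytes:
        key = (name, width)
        if key not in self.store:
            # Miss: the first viewer gets the quick render immediately...
            self.store[key] = self.fast(name, width)
            # ...and a better render is scheduled for the next viewer.
            self.jobs.put(key)
        return self.store[key]

    def run_pending(self) -> int:
        """Worker body: replace queued thumbnails with slow renders."""
        done = 0
        while True:
            try:
                name, width = self.jobs.get_nowait()
            except queue.Empty:
                return done
            self.store[(name, width)] = self.slow(name, width)
            done += 1
```

In a real deployment `run_pending` would run in separate workers, and the store would need locking or an atomic object store; the sketch only shows the control flow.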

Ponor added a comment. Nov 20 2020, 8:40 AM

@Ponor: If I don't misunderstand then the argumentation seems mostly about stuff created with Inkscape. What about stuff not created with Inkscape?

I'm focusing on Inkscape because it's free and most often used to make SVGs (80% of all uploads?). Also, with Inkscape we know that every Inkscape-SVG to PNG conversion will work, I mean, it should, and this conversion can be checked by the authors beforehand.
Most files produced by matplotlib, gnuplot, batik (?) are very simple, and png conversion in Inkscape for them should also work, but this would have to be tested; worst case, stay with whichever converter is being used now.
For Illustrator and CorelDRAW files (10% of all uploads max?), I'd say it doesn't really matter, use librsvg or 'inkscape --shell', that's trial and error now, and will be trial and error then.

In short, things can stay the same or 'inkscape --shell' can be used for files generated with other software; 'inkscape --shell' for Inkscape SVGs gives a lot more predictable results (+some nice features that are missing in current converters).

Ponor added a comment. Nov 20 2020, 9:02 AM

The problem isn't so much the number of SVGs we get per day as the fact that we render thumbnails on demand when they're for a file/size combination never requested before.

Thanks for this clarification, quite interesting! I actually thought that you only serve those PNGs that have been cached or stored when the SVG was uploaded, given the fact that PNGs on WP look a bit blurry, unlike SVGs scaled to the same size in the very same browser. It really surprises me that you're generating the exact requested size every time. Why not just send the closest (bigger) PNG and let the client scale it to the exact size (www style)?
But anyway, I was more concerned about predictability of SVG to PNG conversion as someone who sometimes makes and uploads SVGs, and wanted you to (re)consider using 'inkscape --shell' for this conversion, at least for Inkscape-generated files.

For, you see, Inkscape will always be the best converter for whatever Inkscape can produce. I hope we can agree on that.

I prefer Inkscape or resvg to the current rsvg; however, I cannot fully agree with that:
Files should be SVG files, not Inkscape files; otherwise, making derivatives, improvements and translations will be difficult. More information on why Wikimedia only allows free file formats: Commons:File_types

Inkscape SVG files are generally not valid according to the SVG document type definition.
E.g. one of your recent files: https://validator.w3.org/check?uri=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F7%2F73%2FDivergence_of_a_vector_field_in_the_rectangular_coordinate_system_-_derivation.svg&charset=%28detect+automatically%29&doctype=SVG+1.1&ss=1&group=0&user-agent=W3C_Validator%2F1.3+http%3A%2F%2Fvalidator.w3.org%2Fservices

Generally, invalidity is not a problem, but if those invalid attributes significantly affect the rendering result, it is a problem. (It is a problem of the file, not of the renderer.)

Is SVG2.0 forbidden on commons? What happens when I upload a 2.0 file? Again, Inkscape converting its own SVG files to PNG should always work, regardless of SVG version.

Basically, SVG is SVG, so you generally cannot tell the versions apart, and newer standards (generally) support older features. The newest standard is (currently) SVG 1.1; everything else is more or less private extensions that are not guaranteed to work anywhere else. In Category:Images_with_SVG_2.0_features you can see what happens if you use SVG 2.0 features. (Sometimes you have to check the file history.)

Then, check this file and how it's rendered at different resolutions: https://commons.wikimedia.org/wiki/File:Scanning_tunneling_microscope_-_tip,_barrier_and_sample_wave_functions.svg. It only half-works, and the above test is of little help.

You just need to change width="516.3" height="324.3" to width="1032.6" height="648.6" viewBox="0 0 516.3 324.3" to get the image at another resolution, or you can use File:Test.svg.
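The same edit can be scripted. This Python sketch assumes the root `<svg>` has `width` and `height` attributes that are unitless or in `px` (other units and percentages are out of scope). It adds a `viewBox` matching the original size when none exists and then scales `width` and `height`, so the drawing scales with the canvas instead of the canvas merely growing. `rescale` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep SVG as the default namespace on output


def _num(value: str) -> float:
    """Parse a unitless or px length."""
    return float(value[:-2] if value.endswith("px") else value)


def _fmt(x: float) -> str:
    """Format a number without spurious trailing digits."""
    return f"{x:.10g}"


def rescale(svg_text: str, factor: float) -> str:
    """Scale the intrinsic size, adding a viewBox so the drawing scales too."""
    root = ET.fromstring(svg_text)
    w, h = _num(root.get("width")), _num(root.get("height"))
    if root.get("viewBox") is None:
        # Without a viewBox, a larger width/height would only enlarge the canvas.
        root.set("viewBox", f"0 0 {_fmt(w)} {_fmt(h)}")
    root.set("width", _fmt(w * factor))
    root.set("height", _fmt(h * factor))
    return ET.tostring(root, encoding="unicode")
```

With a factor of 2, the example above ends up with `width="1032.6" height="648.6" viewBox="0 0 516.3 324.3"`.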

I'll repeat: 'inkscape --shell' will convert properly everything Inkscape made, now and forever.

SVG is a developing format, and it is difficult to predict the future. For example, in 2014 Inkscape changed its resolution from 90 to 96 dpi, which can lead to rendering issues (wrong borders).
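The size of such a shift follows from the conversion between physical units and pixels: a length in millimetres maps to `mm / 25.4 * dpi` pixels, so the same physical size comes out 96/90, or about 6.7%, larger at 96 dpi than at 90 dpi. A small sketch of that arithmetic (the A4 width of 210 mm is just an example value):

```python
MM_PER_INCH = 25.4


def user_units(length_mm: float, dpi: float) -> float:
    """Pixels (SVG user units) spanned by a physical length at a given dpi."""
    return length_mm / MM_PER_INCH * dpi


old, new = user_units(210, 90), user_units(210, 96)
print(f"{old:.1f} px at 90 dpi vs {new:.1f} px at 96 dpi ({new / old - 1:.1%} larger)")
```

If one part of a file (for example its `width`/`height` in mm) is interpreted at one resolution and another part (for example a `viewBox` written in pixels) was computed at the other, the two disagree by that factor, which is one plausible way the wrong borders arise.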

PNGs on WP look a bit blurry

  1. On Windows, it can be due to display scaling other than 100%; my laptop, for example, had 125% as the default.
  2. Anti-aliasing makes edges blurry, see :w:en:Spatial_anti-aliasing. To disable it, you can try shape-rendering="crispEdges", see https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/shape-rendering (however, I think it is not supported by librsvg).
  3. feGaussianBlur is buggy (generally too strong) in the librsvg version used at Wikimedia, see e.g. https://commons.wikimedia.org/wiki/File:Question_Mark_Icon_-_Blue_Box.svg ; imho it is fixed in the current librsvg version.

@Gilles Is it possible to try resvg and/or Inkscape at https://commons.wikimedia.beta.wmflabs.org ? Where can a test on beta.wmflabs be proposed, and how does one get access?

Glrx added a comment (edited). Nov 20 2020, 10:45 PM

Where to begin?

First, I'll thank Ponor for his measurement data. A mean file size of 700 kB is disheartening; it is just too big; WMF probably does not want to serve 700 kB images. Gilles' exposition is informative as always. Johannes spent significant time on his reply, and yes, Commons should be dealing with SVG files rather than Inkscape, Illustrator, or CorelDraw files.

WMF should start serving SVG files instead of always converting to PNG.

There are many reasons for serving SVG files directly.

A. Better fidelity.

When MW started accepting SVG files, there was no good SVG support in browsers, but there was good PNG support. Browsers have advanced, and current browser support is probably much better than librsvg's. For example, browsers support textPath but librsvg does not. Modern browsers need to offer support around the world, so they have paid more attention to BiDi and to painting vertical Chinese characters. 'librsvg' messes up on BiDi and the vertical spacing of Chinese characters. Consequently, modern browsers do a better job of rendering SVG.

B. Some SVG is more compact than the PNG.

I loaded https://en.wikipedia.org/wiki/Ionization_energy into my browser. That page renders the image https://commons.wikimedia.org/wiki/File:First_Ionization_Energy.svg at 350 px with this URL:

https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/First_Ionization_Energy.svg/350px-First_Ionization_Energy.svg.png

The debugger states the response header content-length is 12543 bytes. A file that is 350 pixels by 140 pixels at 3 bytes/pixel is 147,000 bytes, so the PNG has a compression ratio of about 12:1.

The original SVG file is 36 kB. However, that file is transferred from Commons with GZIP compression, so the network transfer is only 6357 bytes -- about half the PNG file used in the article.
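The size comparison above can be reproduced with a couple of helpers. This sketch assumes 3 bytes per pixel for the uncompressed bitmap, as in the text, and uses Python's `gzip` module at its default level, which may differ from the level a server actually uses, so the gzip figure is only an approximation.

```python
import gzip


def raw_rgb_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed bitmap size, as in the 12:1 estimate above."""
    return width * height * bytes_per_pixel


def gzip_bytes(data: bytes) -> int:
    """Approximate on-the-wire size of a gzip-compressed response body."""
    return len(gzip.compress(data))


raw = raw_rgb_bytes(350, 140)
print(raw, round(raw / 12543, 1))  # 12543 = PNG content-length quoted above
```

For a locally downloaded copy, `gzip_bytes(open("First_Ionization_Energy.svg", "rb").read())` gives a comparable estimate of the SVG's transfer size.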

Ponor's https://commons.wikimedia.org/wiki/File:Scanning_tunneling_microscope_-_tip,_barrier_and_sample_wave_functions.svg is only 21 kB. I do not think it will compress as much; a 300-px PNG image was 12979 bytes.

There are many large SVG files on Commons. Given network bandwidth issues, it would be better for large files to be rendered and cached on the server.

For example, https://en.wikipedia.org/wiki/File:Gibraltar_map-en-edit2.svg is 1.46 MB.

The 290-px PNG https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/Gibraltar_map-en-edit2.svg/290px-Gibraltar_map-en-edit2.svg.png is only 79 kB.

C. librsvg has long term bugs and recent changes will probably break MW.

IIRC, the original 'librsvg' developer went on to other projects, so the code was static for several years. A couple of years ago, some other developers picked it up, but their efforts included converting the C code to Rust. The Rust conversion blocked WMF from updating the code on its servers and getting the benefit of recent bug fixes. There has been no progress on significant issues such as textPath and small-character pixel quantization. T154237 presents further problems: the new version of librsvg wants a locale string (e.g., "en_US") rather than a langtag (e.g., "en-US"). I suspect that implies trouble for Chinese languages and WMF's non-compliant "sr-EC" and "sr-EL" langtags.

We could localize the SVG before handing it to the renderer, but that would not get around the other rendering issues.

D. Problems with librsvg rendering lead many editors to convert text to curves

Many editors draw a nice graphic on their machine, upload it to Commons, and see a horrible result. They try some iterations, throw up their hands, and convert the text to curves. That bloats the file and makes it impossible to subsequently translate the file into another language.

E. No webfont support.

It is technically possible for me to create an SVG file that uses an exotic font. For example, Google is developing a Siddham font for ancient Sanskrit. Using CSS, I can point to a URL definition of a font. Then the SVG will display properly on a modern browser even though the user never installed an exotic font on his machine. WMF currently disallows that technology because it blocks URLs in CSS and xml-stylesheet.

We can debate security issues (Googleapi fonts allow frequent user tracking even though the fonts have year-long cache times), but librsvg (and other contenders?) do not have extensive CSS support.

F. Serving SVG puts the computational burden on the user's device

Exotic effects such as Gaussian blur would be done on the user's device.

That can be a blessing, but it can also be a curse. SVGs can be complex to render, so they can burden the user's device. WMF gets around that problem by using a rendering timeout.

Although directly served SVG would obviate the server doing the rendering, it may have an inordinate cost in server network bandwidth. SVG files are often inordinately large. Most Inkscape users output files that have lots of redundant style information. GZIP may compress a lot of that, but the redundant information should not be in the file in the first place. Inkscape also chooses an unusual grid, so the coordinates in the file look like random numbers (127.5648 rather than 130).

G. Serving SVG would allow text selections, tool tips, and animations

Converting an SVG to a static PNG disables lots of wonderful SVG features. I cannot copy text out of a PNG. SVG files can provide built-in imagemap features: a rect with a title element will display a tooltip. MW animations are primarily GIF files or movie files. This 140 kB GIF animation of pi is very nice:

https://commons.wikimedia.org/wiki/File:Pi-unrolled-720.gif

It could also be done with a directly served SVG file. SVG files can also do animations with user interactions. One could single step a mechanical mechanism.

Path Forward

MW should directly serve small (say < 20 kB) SVG files that are flagged. A flag may be important because many SVG files have fixed width and height.

When the parser reads the page, it can check the file size and emit an HTML tag (object?) with a URL for a scalable SVG file. If the file is too big, then it emits the usual img with a PNG URL.
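The parser-side decision can be sketched as a small function. Everything here is hypothetical: `FileInfo`, `serve_as_svg` (standing in for the proposed flag), the 20,000-byte limit, and the URLs passed in. The proposal leaves the tag open ("object?"); this sketch emits `<img>` in both cases, one reason being that browsers do not run scripts in an SVG loaded through `<img>`.

```python
import html
from dataclasses import dataclass

SVG_SIZE_LIMIT = 20_000  # bytes; "say < 20 kB" in the proposal


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    serve_as_svg: bool  # hypothetical per-file flag


def embed_html(f: FileInfo, width: int, svg_url: str, png_url: str) -> str:
    """Return an <img> tag for the scalable SVG or for the usual PNG thumbnail."""
    use_svg = f.serve_as_svg and f.size_bytes < SVG_SIZE_LIMIT
    src = svg_url if use_svg else png_url
    return f'<img src="{html.escape(src)}" width="{width}" alt="{html.escape(f.name)}">'
```

Only flagged files below the limit get the SVG URL; everything else falls back to the existing PNG path, so unflagged files behave exactly as they do today.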

The image scaler will serve the cached SVG. If the SVG is modified and becomes bloated (e.g., > 20 kB), then the image scaler can substitute a small SVG file that says the page must be rebuilt.

Down the road, the image scaler may want to do language localization to preserve the current systemLanguage semantics. (WMF may want to let the user agent control the rendered language, but that is not what MW does today.) An SVG file may be localized with an XSLT script, similar to how the PHP $lang variable is inserted into the img URL and librsvg is run with that language argument. The same script can strip width and height and make sure that a viewBox exists.
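Localizing before rendering can be sketched in Python instead of XSLT. This illustration keeps, in each `<switch>`, only the first child whose `systemLanguage` matches (or that has no `systemLanguage`), following the SVG 1.1 matching rule: the user's language must equal one of the listed tags, or be a prefix of one followed by `-`. It compares tags case-insensitively, ignores the other conditional attributes (`requiredFeatures`, `requiredExtensions`), and is not MediaWiki's or librsvg's implementation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def lang_matches(user_lang: str, attr: str) -> bool:
    """SVG 1.1 rule: equal to a listed tag, or a prefix of one followed by '-'."""
    user = user_lang.lower()
    for tag in (t.strip().lower() for t in attr.split(",")):
        if tag == user or tag.startswith(user + "-"):
            return True
    return False


def localize(svg_text: str, user_lang: str) -> str:
    """Keep only the child each <switch> would render for user_lang."""
    root = ET.fromstring(svg_text)
    # Collect switches first: the tree must not change while iter() runs.
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        chosen = None
        for child in list(switch):
            cond = child.get("systemLanguage")
            if chosen is None and (cond is None or lang_matches(user_lang, cond)):
                chosen = child
            else:
                switch.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Because a child without `systemLanguage` always matches, a fallback placed last only wins when no earlier language-specific child does.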

@Glrx Hi. As the author of resvg, I would like to point out some limitations of serving SVG directly. Yes, browsers are great, but they are not perfect either. Both Chrome and Firefox have tons of issues with SVG rendering. Even the textPath feature you've mentioned is actually badly supported. The situation with filters is pretty bad in Chrome, and not much better with complex text. No browser supports enable-background (deprecated in SVG 2, though).

As for SVG interactivity: yes, SVG animations are great, but they are badly supported and basically non-existent, mainly because there are no tools to create them (afaik). And the number of people who can write such files by hand is rather small.

Yes, resvg is far from completion, but it's a viable alternative.

Aklapper added a comment (edited). Nov 21 2020, 10:42 AM

Is it possible to try resvg and/or Inkscape at https://commons.wikimedia.beta.wmflabs.org ? Where can a test on beta.wmflabs be proposed, and how does one get access?

Could that be discussed in a separate ticket, please? This one has already become an unmanageable catch-all. :-/

Gilles added a comment (edited). Nov 23 2020, 8:41 AM

My recollection of why we don't serve user-submitted SVGs directly as thumbnails is that the last time this was looked at there was no robust and up-to-date FLOSS SVG sanitiser that could ensure that the SVGs were safe to display directly in the browser.

XML is notoriously hard to sanitise and there are new tricks invented regularly to bypass sanitisation. Essentially, we don't want to deal with the possibility of a badly intentioned actor being able to inject a tracking URL inside an SVG that would let them collect IP addresses of anyone viewing that image in an article, run some arbitrary javascript, or worse, being able to leverage a browser security flaw in SVG parsing.

Furthermore, we would still need to have fallbacks for browsers that either don't render SVG natively or do a terrible job at it.

Glrx added a comment. Nov 25 2020, 11:55 PM

My current understanding is that WMF already does some sanitization of SVG files.

SVG with suspicious DTD subsets are rejected. There was an interesting DTD injection attack. I suspect only entity definitions are allowed. None of that seems to be a big issue. The modern view is that SVG files should not have DOCTYPE processing instructions. (Adobe Illustrator uses DTD subsets to define local entities.)

SVG with obvious Javascript/ECMAscript code is suppressed, I do not know the detection method. Presumably, all script elements could be forbidden and style elements restricted to type="text/css".

Presumably, event attributes such as onclick cause file suppression. Those attributes would allow arbitrary script injection. WMF could not rely on ECMAscript interpreters providing reasonable safety.

SMIL animation attributes were allowed (e.g., begin, dur, and keyTimes). IIRC, these attributes are declarative. (SMIL still has 98% support in browsers; see https://caniuse.com/svg-smil )

I do not recall what MW does with <a> (anchor) elements. Clicking on such an element could take a user to a malicious site. However, allowing anchor elements that link to WMF sites could be a valuable feature. A diagram of an automobile could link to WP articles about tires, wheels, brakes, batteries, and engines. An image of a cell could link to DNA, mRNA, ribosome, and mitochondria.

SVG with xml-stylesheet processing instructions are rejected.

SVG with non-local URL references are rejected. For example, <use xlink:href="http://..." /> is problematic.

Embedded data URLs are limited to JPEG and PNG streams.

CSS with non-local URLs causes file rejection. For example, @font-face { font-family: foo; src: url(...); }. Webfonts present a tracking threat. I'd like to whitelist googleapi fonts, but they have short ET-phone-home cache times. Webfonts also present a DoS threat.

metadata elements with non-local URLs are allowed. RDF is declarative; Creative Commons RDF references arbitrary URLs. SVG agents do not need to follow anything inside metadata elements to display an image.

SVG can do a DoS attack by describing a complicated image. Simple hierarchy can demand the painting of thousands of complex subimages. MW catches those files with a timeout. If we prohibit animations, then MW could verify files render in finite time by passing them through librsvg.

I do not know how the current SVG scanner operates. If the current level of protection is inadequate, then we could run the SVG through a transform that removes DTD subsets and keeps only whitelisted elements and attributes. Scanning CSS could be a problem.
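A whitelist transform of this kind might look like the sketch below. It only illustrates the idea and is not a vetted sanitizer: the allow-lists are tiny and hypothetical, `style` attributes and `<style>` elements are dropped rather than parsed, and any DTD is rejected before parsing. The Python documentation warns that `xml.etree` is not secure against maliciously constructed data, so a real service would need a hardened parser (such as `defusedxml`) and far more review.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

# Deliberately tiny, hypothetical allow-lists for illustration only.
ALLOWED_TAGS = {"svg", "g", "defs", "path", "rect", "circle", "ellipse",
                "line", "polyline", "polygon", "text", "tspan", "use",
                "linearGradient", "radialGradient", "stop", "title"}
ALLOWED_ATTRS = {"id", "d", "x", "y", "x1", "y1", "x2", "y2", "cx", "cy", "r",
                 "rx", "ry", "width", "height", "viewBox", "points", "fill",
                 "stroke", "stroke-width", "transform", "offset", "stop-color",
                 "font-size", "font-family", "href", XLINK_HREF}
URL_REF = re.compile(r"""url\(\s*['"]?([^'")]*)""")


def _allowed_tag(tag: str) -> bool:
    return tag.startswith(f"{{{SVG_NS}}}") and tag.rsplit("}", 1)[-1] in ALLOWED_TAGS


def _local_refs_only(value: str) -> bool:
    """True if every url(...) in the value points inside the document."""
    return all(ref.startswith("#") for ref in URL_REF.findall(value))


def sanitize(svg_text: str) -> str:
    """Keep allow-listed SVG elements/attributes and only same-document links."""
    if "<!DOCTYPE" in svg_text or "<!ENTITY" in svg_text:
        raise ValueError("DTD subsets are rejected outright")
    root = ET.fromstring(svg_text)
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root element must be an SVG <svg>")
    for parent in list(root.iter()):
        for child in list(parent):
            if not _allowed_tag(child.tag):
                parent.remove(child)        # drops <script>, <style>, foreign elements
    for el in root.iter():
        for name in list(el.attrib):
            value = el.attrib[name]
            if name not in ALLOWED_ATTRS:
                del el.attrib[name]         # drops onclick, onload, style, ...
            elif name in ("href", XLINK_HREF) and not value.startswith("#"):
                del el.attrib[name]         # non-local <use> references
            elif not _local_refs_only(value):
                del el.attrib[name]         # e.g. fill="url(http://...)"
    return ET.tostring(root, encoding="unicode")
```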

What security issues does serving SVG present?

At this time, I am unsure whether SVG fallbacks are needed.

https://caniuse.com/?search=SVG

shows Opera Mini and UC Browser for Android (roughly 2% of global usage) as having unknown SVG support. Samsung Internet is also unknown at 0.72%, but a later version has SVG support. Unknown support may not be no support. SVG is significant enough now that I would expect minimal support in every browser.

I do not want to disenfranchise 2% of viewers, but the reason for this topic is that the current SVG to PNG conversion has significant problems now and will probably introduce further problems. Those problems affect not only accurate rendering of the SVG, but also frustrate content creators. SVG files that render on their local machine do not render correctly on WP pages. SVG files that should be easily modified cannot be because graphic artists have used librsvg workarounds.

Here is the big picture.

WMF has been serving SVG files converted to PNG for years. That can still be a reasonable thing to do given that many SVG files on Commons are >400-kB monsters.

librsvg has been the image converter. The program has served WMF well, but it has significant problems. Its track record for fixing problems that are important to WMF has been very slow. It would be easy enough to continue using the current (old) version of librsvg. If we upgrade to a newer version, many MW code modifications may be required. Also, the new version may be incompatible with many switch-translated files. We may be stuck in the past. The newer version would not be a substantial upgrade.

A more recent resvg could be a better alternative. It seems that program is more faithful to the SVG specifications for those features it has implemented. Employing resvg would involve changing a few lines of MW code. It may offer substantial benefits. There are some downsides. I'm not sure how quickly issues would be fixed. Is it ready for prime time?

Starting to serve SVG files directly offers more features over static PNG files. Serving SVG can offer benefits that we do not yet appreciate. It requires more in-house work, but that should be a reasonable expense. There may be errors in rendering SVG files, but those errors are more diverse (one browser might do it right while another does it wrong), but the developer community for those browsers is larger, so the time-to-fix might be much shorter than the 6 years and counting for librsvg.

My current understanding is that WMF already does some sanitization of SVG files.

At upload time, yes. There are plenty of existing SVGs that predate some of the current checks though.

SVG with suspicious DTD subsets are rejected. There was an interesting DTD injection attack. I suspect only entity definitions are allowed. None of that seems to be a big issue. The modern view is that SVG files should not have DOCTYPE processing instructions. (Adobe Illustrator uses DTD subsets to define local entities.)

SVG with obvious Javascript/ECMAscript code is suppressed, I do not know the detection method. Presumably, all script elements could be forbidden and style elements restricted to type="text/css".

Presumably, event attributes such as onclick cause file suppression. Those attributes would allow arbitrary script injection. WMF could not rely on ECMAscript interpreters providing reasonable safety.

That is correct.

SMIL animation attributes were allowed (e.g., begin, dur, and keyTimes). IIRC, these attributes are declarative. (SMIL still has 98% support in browsers; see https://caniuse.com/svg-smil )

I do not recall what MW does with <a> (anchor) elements. Clicking on such an element could take a user to a malicious site. However, allowing anchor elements that link to WMF sites could be a valuable feature. A diagram of an automobile could link to WP articles about tires, wheels, brakes, batteries, and engines. An image of a cell could link to DNA, mRNA, ribosome, and mitochondria.

Currently, they're ignored.

SVG with xml-stylesheet processing instructions are rejected.

SVG with non-local URL references are rejected. For example, <use xlink:href="http://..." /> is problematic.

Embedded data URLs are limited to JPEG and PNG streams.

CSS with non-local URLs causes file rejection. For example, @font-face { font-family: foo; src: url(...); }. Webfonts present a tracking threat. I'd like to whitelist googleapi fonts, but they have short ET-phone-home cache times. Webfonts also present a DoS threat.

Fonts will be the big problem. Many browsers will prevent fonts from being loaded for SVGs in <img> tags. Serving anything from a Wikimedia site that calls back to Google is a hard no.

metadata elements with non-local URLs are allowed. RDF is declarative; Creative Commons RDF references arbitrary URLs. SVG agents do not need to follow anything inside metadata elements to display an image.

SVG can do a DoS attack by describing a complicated image. Simple hierarchy can demand the painting of thousands of complex subimages. MW catches those files with a timeout. If we prohibit animations, then MW could verify files render in finite time by passing them through librsvg.

I do not know how the current SVG scanner operates. If the current level of protection is inadequate, then we could run the SVG through a transform that removes DTD subsets and keeps only whitelisted elements and attributes. Scanning CSS could be a problem.

What security issues does serving SVG present?

It's the same as serving any arbitrary XML for a browser to render. We can't trust that the browser will handle malicious content for us, so we need to make sure that we aren't sending any. Doing so would require expanding the existing SVG checker in MediaWiki or finding one that someone else has built. Right now, the SVG security checker is mostly a "nice to have" minimal protection for those downloading SVGs and the software handling them. The actual security for most readers comes from the server-side rasterization in a restricted environment.

At this time, I am unsure whether SVG fallbacks are needed.

https://caniuse.com/?search=SVG

shows Opera Mini and UC Browser for Android (roughly 2% of global usage) as having unknown SVG support. Samsung Internet is also unknown at 0.72%, but a later version has SVG support. Unknown support may not be no support. SVG is significant enough now that I would expect minimal support in every browser.

I do not want to disenfranchise 2% of viewers, but the reason for this topic is that the current SVG-to-PNG conversion has significant problems now and will probably introduce further problems. Those problems not only affect accurate rendering of the SVG but also frustrate content creators. SVG files that render correctly on a creator's local machine do not render correctly on WP pages. SVG files that should be easy to modify cannot be, because graphic artists have used librsvg workarounds.

SVG support across browsers is inconsistent. At least with server-side rasterization we all see the same bugs. Even if we target browsers directly, there will be "works fine for me" rendering problems. T134410: Evaluate SVG rendering compatibility in browsers has some initial thoughts, but this is something that would have to be thoroughly researched before any decision could be made.

Here is the big picture.

WMF has been serving SVG files converted to PNG for years. That can still be reasonable, given that many SVG files on Commons are monsters larger than 400 kB.

librsvg has been the image converter. The program has served WMF well, but it has significant problems, and it has been very slow to fix the problems that are important to WMF.

That's a strong statement, and I'm not sure it's entirely true. Most delays, at least right now, are blocked on deployment, not upstream development.

It would be easy enough to continue using the current (old) version of librsvg. If we upgrade to a newer version, many MW code modifications may be required. Also, the new version may be incompatible with many switch-translated files. We may be stuck in the past. The newer version would not be a substantial upgrade.

Backporting software because it is not available for the current OS version is one thing. Intentionally running old software when new, stable versions are available in the repos is another, and it goes against good practice.

We'll get a new version of librsvg sooner rather than later, whenever T216815: Upgrade Thumbor to Buster gets done (probably before mid-2021).

I'll also remind you that client-side-rendered SVGs are 100% incompatible with language switching, as far as I'm aware. At best, they would render in the browser language, not the page language.
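The limitation comes from how `<switch>` works. It picks a child by comparing that child's `systemLanguage` attribute against the user's language preferences, and in a browser those preferences are the browser's languages. The Python sketch below emulates the selection using the matching rule that SVG 1.1 describes for `systemLanguage`: a match is either exact, or the user language is a prefix of a listed tag and is followed by `-`. The function names and the `(system_language, label)` representation are hypothetical. Nothing in the sketch knows about the wiki page language.

```python
def language_matches(user_language: str, attribute_language: str) -> bool:
    """SVG 1.1 systemLanguage rule: exact match, or the user's language is a
    prefix of the attribute's language followed by '-' (case-insensitive)."""
    user = user_language.strip().lower()
    attr = attribute_language.strip().lower()
    return attr == user or attr.startswith(user + "-")


def pick_switch_child(children, user_languages):
    """Return the label of the first child a <switch> would render.

    `children` is a list of (system_language, label) pairs in document order;
    system_language is None when the child has no systemLanguage attribute.
    """
    for system_language, label in children:
        if system_language is None:
            return label  # no test attribute, so the test is always true
        candidates = system_language.split(",")
        if any(language_matches(u, c) for u in user_languages for c in candidates):
            return label
    return None  # nothing matched, so the switch renders nothing
```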

A more recent resvg could be a better alternative. It seems that program is more faithful to the SVG specifications for those features it has implemented. Employing resvg would involve changing a few lines of MW code. It may offer substantial benefits. There are some downsides. I'm not sure how quickly issues would be fixed. Is it ready for prime time?

I don't know. So far, no one has tested it against Commons SVG files. That needs to be done before there's any serious thought given to switching renderers.
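Testing against Commons files would need some way to score how far two renderers' outputs differ. Here is a minimal sketch of such a score. It assumes that each file has already been rendered by both programs at the same size and that the PNGs have been decoded to raw RGBA bytes (decoding is not shown). The tolerance of 8 per channel is an arbitrary choice meant to absorb anti-aliasing noise, and `mismatch_ratio` is a hypothetical name.

```python
def mismatch_ratio(a: bytes, b: bytes, tolerance: int = 8) -> float:
    """Fraction of RGBA pixels whose channels differ by more than `tolerance`.

    Assumes both buffers hold the same image size at 4 bytes per pixel.
    """
    if len(a) != len(b) or len(a) % 4:
        raise ValueError("buffers must be equal-length RGBA data")
    pixels = len(a) // 4
    if pixels == 0:
        return 0.0
    differing = 0
    for i in range(0, len(a), 4):
        # A pixel counts as different if any of its four channels is off.
        if any(abs(a[i + c] - b[i + c]) > tolerance for c in range(4)):
            differing += 1
    return differing / pixels
```

Sorting a corpus by this ratio would surface the files where the renderers disagree most, and those are the ones worth inspecting by eye.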

resvg also depends on Rust, and we don't have Rust in Debian Stretch anyway (this is why librsvg hasn't been upgraded). So switching to resvg is also blocked on T216815.
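Consider the claim that employing resvg would mean changing only a few lines of MW code. MediaWiki selects an external rasterizer through command templates in `$wgSVGConverters`, and the Python sketch below mimics that template substitution. The `rsvg` entry is modeled on MediaWiki's default. The `resvg` entry is an assumption about the resvg command line and should be checked against `resvg --help` for whichever version gets packaged. `build_command` is a hypothetical helper, not MediaWiki code.

```python
import shlex
from string import Template

# Command templates in the style of MediaWiki's $wgSVGConverters setting.
# The resvg template is an assumption; verify its flags before relying on it.
CONVERTERS = {
    "rsvg": "$path/rsvg-convert -w $width -h $height -o $output $input",
    "resvg": "$path/resvg -w $width -h $height $input $output",
}


def build_command(converter: str, path: str, width: int, height: int,
                  input_path: str, output_path: str) -> list[str]:
    """Fill in a converter template and split it into an argv list."""
    values = {
        "path": path,
        "width": str(width),
        "height": str(height),
        "input": input_path,
        "output": output_path,
    }
    # Quote every value so file names with spaces or shell metacharacters survive.
    quoted = {key: shlex.quote(value) for key, value in values.items()}
    return shlex.split(Template(CONVERTERS[converter]).substitute(quoted))
```

With the templates kept separate like this, swapping renderers is a one-key change. Getting the binary packaged is the harder part.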

Serving SVG files directly offers more features than static PNG files, and it may offer benefits that we do not yet appreciate. It requires more in-house work, but that should be a reasonable expense. There may be errors in rendering SVG files, and those errors are more diverse (one browser might do it right while another does it wrong). However, the developer community for those browsers is larger, so the time to fix might be much shorter than the 6 years and counting for librsvg.

That sort of discussion is largely outside the scope of this task and belongs more in T134410: Evaluate SVG rendering compatibility in browsers. There's significant groundwork required to get that anywhere close to working.