
SVG to PNG conversion, minimization, sanitization service
Closed, Declined · Public

Description

Following up on the mathoid discussion in T71702, it would be great to set up a simple endpoint / service for SVG to PNG conversion. We could use the rsvg bindings for node, and expose an API that

  1. supports posting an SVG & returning the PNG, or
  2. supports POSTing / GETting a URL that matches a whitelist and returns a PNG for the SVG living at that location.

The second task (whitelist matching and SVG fetching) could also be handled by RESTBase, so maybe we should just do 1) for now, with access restricted to the internal network.
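
For illustration, the whitelist check in option 2 could look roughly like the sketch below (Node.js, no dependencies). The host list, the HTTPS-only rule and the `.svg` suffix check are assumptions made for this example; the task does not define what the whitelist contains, and `isAllowedSvgUrl` is a hypothetical name.

```javascript
// Hypothetical allowlist: the task does not say which hosts would be permitted.
const ALLOWED_HOSTS = ['upload.wikimedia.org'];

// Returns true only for absolute HTTPS URLs on an allowed host whose path ends in ".svg".
function isAllowedSvgUrl(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  let parsed;
  try {
    parsed = new URL(rawUrl); // throws on relative or malformed input
  } catch (err) {
    return false;
  }
  if (parsed.protocol !== 'https:') {
    return false;
  }
  // Exact hostname match, so "upload.wikimedia.org.evil.example" is rejected.
  if (!allowedHosts.includes(parsed.hostname)) {
    return false;
  }
  return parsed.pathname.toLowerCase().endsWith('.svg');
}
```

Matching on the parsed hostname rather than on a string prefix is a deliberate choice here: prefix or substring checks are easy to bypass with look-alike hostnames.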

In a later step, this service could be extended to also support:

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke changed Security from none to None.
GWicke subscribed.
GWicke triaged this task as Medium priority.
GWicke renamed this task from SVG to PNG conversion service to SVG to PNG conversion, minimization, sanitization service. (Apr 23 2015, 9:43 PM)
GWicke updated the task description.

It would be great to base this off https://github.com/wikimedia/service-template-node.

As discussed on IRC, I thought to have a go at this if that is ok using the service-template-node. Certainly the first part anyway. It looks like a fun project. I was thinking on names and not sure if there is a convention to follow. There seem to be a number of mirrors on GitHub. The first names that spring to mind are:

  • mediawiki-services-svg-to-png
  • restbase-svg-to-png

Thanks.

@Jcook , the service should be hosted on WM's Gerrit host. Please take a look at the guide about opening an account and requesting a repository.

I thought to have a go at this if that is ok using the service-template-node.

Yup, definitely. The idea of the template is to get developers started quickly on developing services and let them iterate fast over changes. We can help with any issue you might encounter along the way.

I was thinking on names and not sure if there is a convention to follow

All services having a repository in Gerrit follow the scheme mediawiki/services/<service-name>. As for the concrete service-name here, I'd vote for something simple, such as svg2png, svg-converter or img-convert (to make it perhaps slightly more general).

the service should be hosted on WM's Gerrit host

Eventually, yes. @Jcook, if it's easier for you to start on GitHub then that's fine too. We can import the repo to Gerrit fairly easily later.

Re name: I like the idea of keeping it relatively generic. vectoid sounds cool IMHO, but is somewhat specific to SVG. Another generic option could be picserf.

@Jcook any news, developments on this front? Still interested in taking this on?

@mobrovac - I did make a start on this but it has been going a bit slowly as I've been busy with my day job. I apologise. I would still be interested. Do you need it completed by a certain date/time? Thanks

No worries, @Jcook, was just wondering what's up with this. Great to hear you're on it! We have showcased the service template at the Lyon Hackathon and given a real-world example of an SVG to PNG service - T99861: Building Microservices in Node.JS. Take a look at the slides posted there, they will hopefully allow you to get started pretty quickly.

No strict deadline, but as with everything nowadays - the sooner the better :) Ofc, if you have any questions, let us know.

Yep, Graphoid can totally use it already ))

OK great. Thanks for the info.

Do we still want to do the first part as well, i.e. have an endpoint to post an SVG payload to convert to PNG? The slides focus on the second example I think, the URL matching part etc. Thanks.

Do we still want to do the first part as well, i.e. have an endpoint to post an SVG payload to convert to PNG? The slides focus on the second example I think, the URL matching part etc. Thanks.

Right. The slides focus on obtaining an SVG from an existing MW installation and turning that into a PNG. We still need both parts, though. So, yes, we still need the first part as well.

Also note that the service described in the slides does not actually exist, so we need the part done by it as well.

I think the very first thing we need is an SVG sanitizer - without it, we cannot safely serve service-generated SVGs.

I know it has been a long time since I picked up this issue. But I would still be very keen to contribute. I have some time available to work on this now. I'm sorry for the long radio silence. Thanks.

@Jcook, that would be awesome if you could figure out how to do SVG sanitization - given an SVG, ensure that it is secure, and possibly remove anything that could be exploited. The code would need to run in Node.js. Thanks!

@Jcook, regarding the svg2png service itself, it seems like there is already an effort on that front in the form of Thumbor - a thumbnailing service which should be able to accomplish the conversion itself. Pinging @Gilles : we need an svg2png converter, and from the looks of it (T120204: Thumbor SVG support) Thumbor should be able to provide us that, correct? Does Thumbor provide SVG sanitization facilities as well?

I've written the Thumbor SVG "engine"; it relies on the rsvg binary, which is what MediaWiki uses at the moment. I believe I use it in the same way MediaWiki does. In essence Thumbor just takes media files in and outputs media files, so one can certainly use it as a way to sanitize/lighten files while keeping the original dimensions. However, since I had to write the SVG engine and sanitization wasn't a need for my project, such support would have to be written, in the form of a custom Thumbor filter and/or modifications to the SVG engine.

Relevant code: https://phabricator.wikimedia.org/diffusion/THMBREXT/browse/master/wikimedia_thumbor/engine/svg/svg.py

I think it makes complete sense to write the necessary additions to Thumbor for this task, instead of writing a new service from scratch. That holds even if we end up running it separately from the thumbnailing cluster, for example with a separate Thumbor instance configured only to handle SVG sanitization. So far I've written all my extensions so that we can deploy a Swiss Army knife Thumbor instance that can do everything if we want to, but configuring each instance for a specific file type/task is also possible.

Perhaps this is blocked for now then, if I understood the last comments. Could there be a suggestion for another ticket to pick up? I tried to ask on IRC but no one responded. Thanks.

With T142226 we will have yet another production service offering SVG rasterization, in that case based on Chrome's renderer. Chrome should have more complete coverage especially of HTML-in-SVG. That said, as long as we use rsvg in production, this is also a downside, as it would mean that some things would render in Chrome that don't in rsvg. We would also have to consider potential security issues in features not currently supported by rsvg.

Thumbor is on its way to handling production traffic this quarter. It seems more appropriate to me for this use case, as it will be responsible for the canonical thumbnail rendering. An SVG->PNG service based on Thumbor would ensure identical conversion compared to thumbnails and no duplicated logic.

It would have to be configured and exposed differently if one wants to transform images without them ending up as thumbnails saved in Swift, but that's trivial. Such a transform-only Thumbor instance is also needed by Multimedia for their post-upload rotation/flipping tool.

I can walk someone through the few steps needed to get there.

As for using a different renderer, that's possible, but painful rsvg updates in the past have shown that it's very difficult to achieve that without losing some existing features. Even with rsvg itself, when updating to a new version we gain new features but tend to see some regressions. I imagine it's the same with Electron, only it hasn't had the kind of SVG user exposure yet that would surface those issues as fast as an rsvg update does.

I'm more than happy to switch to a better renderer if one comes along. If we do, it would be a nice little project to come up with a compatibility test suite based on past regression reports on rsvg. However, while Electron seems to have gotten a pass so far on running without a properly secured environment, it's quite reckless to propose switching to it as the main SVG rendering mechanism before that's set up. Like anything XML-based, SVG is a very dangerous format security-wise. Thumbor, like MediaWiki, runs rsvg firejailed. Anything proposed to replace it ought to be able to run firejailed as well.

mobrovac edited projects, added User-mobrovac; removed Blocked-on-Services.

@Gilles so using the Thumbor instance that is about to hit production would not be possible? I haven't looked at the code in detail, but conceptually it seems to me that it shouldn't be that hard to instruct Thumbor not to create a thumbnail and store it, but rather to convert it and/or sanitise the image. If we have an image-manipulation service, shouldn't that instance be the reference point for all image-manipulation-related tasks?

@Gilles, Chrome's SVG renderer is an alternative to rsvg and other web content rasterization tools, but it does not provide any of Thumbor's general image manipulation & storage functionality.

I haven't comprehensively evaluated Chrome's SVG support against rsvg, but have found that several SVGs with embedded HTML, which I created with an open source diagramming tool, rendered fine in Chrome but failed in rsvg. Long term, it seems fairly likely that Chrome's support will be more complete & more consistent with native SVG display in browsers. In any case, it is an option we can consider, and it comes basically for free with Electron.

Just as an update, the Electron / Chrome based render service is now deployed in production, and is used for PDF rendering. As mentioned earlier, it also offers web content rasterization, which could be interesting as part of Thumbor or other services.

Here is an example SVG rasterized on a labs instance: https://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://upload.wikimedia.org/wikipedia/commons/0/02/SVG_logo.svg

That link doesn't work, presumably because I need the real accessKey value?

@Gilles, it just worked for me in Chrome & Firefox. Which kind of error are you seeing?

I didn't check what error it was giving me; I think I was just seeing a blank page. It works now. Does it pick an arbitrary size to render the SVG? It's giving me a 1024x768 image, which sounds like the browser viewport dimensions and not something related to the content itself.

FYI this link you originally had in your comment: https://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://en.wikipedia.org/api/rest_v1/page/html/Barack_Obama&browserHeight=7000&browserWidth=640 kept spinning for a very long time and ultimately nginx gave me a 502.

https://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://en.wikipedia.org/wiki/Main_Page&browserHeight=100000&browserWidth=1024 generates a 1024x3000 image, not a 1024x100000 image. 3000 seems to be a limit for the actual output, which is quite low when you consider SVGs.

https://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://upload.wikimedia.org/wikipedia/commons/0/02/SVG_logo.svg&browserHeight=200&browserWidth=200 just gave me this blank image again, like I had before:

render-1482421560659.png (200×200 px, 276 B)

It's not completely blank; the gray rectangle looks like it might be the browser's scroll control.

Looks like a race condition where the PNG is rendered before the image is actually displayed in the browser.
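
The labs links in this thread all share the same shape: a `/png` endpoint plus `accessKey`, `url` and optional `browserWidth` / `browserHeight` query parameters. A small helper to build such links might look like the sketch below. `buildRenderUrl` is a hypothetical name, and unlike the hand-written links above it percent-encodes the `url` value, which assumes the service decodes query parameters in the usual way.

```javascript
// Base endpoint as it appears in the links quoted in this thread.
const RENDER_ENDPOINT = 'https://pdf-electron.wmflabs.org/png';

// Builds a rasterization link for targetUrl.
// browserWidth / browserHeight are only added when provided.
function buildRenderUrl(targetUrl, options = {}) {
  const { accessKey = 'secret', browserWidth, browserHeight } = options;
  const renderUrl = new URL(RENDER_ENDPOINT);
  renderUrl.searchParams.set('accessKey', accessKey);
  // searchParams.set percent-encodes the nested URL, so its own "?" and "&"
  // cannot be mistaken for parameters of the render request.
  renderUrl.searchParams.set('url', targetUrl);
  if (browserWidth !== undefined) {
    renderUrl.searchParams.set('browserWidth', String(browserWidth));
  }
  if (browserHeight !== undefined) {
    renderUrl.searchParams.set('browserHeight', String(browserHeight));
  }
  return renderUrl.toString();
}
```

For example, `buildRenderUrl('https://upload.wikimedia.org/wikipedia/commons/0/02/SVG_logo.svg', { browserWidth: 200, browserHeight: 200 })` yields a link equivalent to the 200x200 request above.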

The 3000px default limit on raster output is something we can adjust upwards for our own use. I just restarted the labs instance with default request timeout settings, which should kill requests more quickly. Previously it was running with very high timeouts, as I was using it for memory usage testing.

I am also seeing occasional strange behavior when rendering to png or jpg. It does indeed look like a screenshot is taken before the page was fully rendered / re-scaled. This is now tracked in https://github.com/msokk/electron-render-service/issues/25.

Edit: Loading the SVG via a short HTML snippet avoids the timing issue: https://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://people.wikimedia.org/~gwicke/svg.html&browserHeight=102&browserWidth=102
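
The contents of `svg.html` are not shown in this thread. As a guess at what such a wrapper might contain, the sketch below generates a minimal HTML page that embeds an SVG through an `<img>` element. `wrapSvgInHtml` and `escapeAttr` are hypothetical names, and whether this exact markup avoids the race described above has not been verified.

```javascript
// Escapes a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns a minimal HTML document that displays the SVG at svgUrl
// at the given pixel size, with no page margin around it.
function wrapSvgInHtml(svgUrl, width, height) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<body style="margin:0">',
    `<img src="${escapeAttr(svgUrl)}" width="${Number(width)}" height="${Number(height)}">`,
    '</body>',
    '</html>',
  ].join('\n');
}
```

The explicit width and height give the page a fixed layout size, which is an assumption about what helps the renderer here rather than something confirmed in the discussion.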

Pchelolo subscribed.

There has not been any progress or any requests to do this in a while. I think it's safe to assume this is not needed.