
Rethink mathoid's SVG to PNG conversion
Closed, ResolvedPublic

Description

Currently, mathoid uses the node module librsvg to convert SVG images to PNG images. This module is unmaintained, and even its predecessor does not (yet?) support node version 12 or later. I think we should join forces and maintain only one component for SVG to PNG conversion. Maintaining a separate version for math rendering does not seem feasible. Therefore, I propose to

  • understand which software is currently used in production for SVG to PNG conversion
  • discuss how mathoid or math rendering can interface with the existing service
  • test if fonts and symbols are supported (can be done independently)

Deadline: All this has to be completed before production updates from node 10 to a newer version of node.js

Event Timeline


If we do follow through with the plan of switching storage for Mathoid to normal MediaWiki media storage, the need for Mathoid to render its own PNGs will disappear, and we can piggyback on normal MW image thumbnailing for PNG conversions.

Wikimedia uses Thumbor for SVG thumbnailing these days, so the service could call that directly. Although integrating it back into MediaWiki in a thumbnailer-agnostic way would certainly be nicer.

@Tgr, I uploaded a test image (https://www.mediawiki.org/wiki/File:Thumbor_test_for_rendering_of_the_LaTeX_expression_18.svg). While the PNG looks different from the SVG in my browser, it is not worse than the current version. Certainly, Thumbor can handle the path-transform tricks used in the uploaded image. Is there documentation of the API that could be used by the Math Extension?

For example, we would somehow need to convert from
https://wikimedia.org/api/rest_v1/media/math/render/svg/085412b37ef2ac62058f72715866515b3ee71f39
to
https://wikimedia.org/api/rest_v1/media/math/render/png/085412b37ef2ac62058f72715866515b3ee71f39

Thus, a Thumbor API endpoint that fetches an SVG image from a URL and outputs a PNG image would be great. A solution that requires (temporary) files on disk is not possible.
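To make the URL mapping above concrete, here is a minimal sketch in Python. The helper name `swap_render_format` is hypothetical and not part of the Math extension, RESTBase, or Thumbor; it only assumes that the path ends in `.../render/<format>/<hash>`, as in the two example URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_render_format(url: str, new_format: str) -> str:
    """Replace the <format> segment of a .../render/<format>/<hash> URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # The path is expected to end with: render / <format> / <hash>
    if len(segments) < 3 or segments[-3] != "render":
        raise ValueError(f"not a math render URL: {url}")
    segments[-2] = new_format
    return urlunsplit(parts._replace(path="/".join(segments)))


svg_url = (
    "https://wikimedia.org/api/rest_v1/media/math/render/svg/"
    "085412b37ef2ac62058f72715866515b3ee71f39"
)
png_url = swap_render_format(svg_url, "png")
```

This only rewrites the address; something behind the `png` URL would still have to fetch the SVG and rasterize it in memory.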

I think the most generic approach would be to have your own file backend (I think the Math extension does this already?), store SVG files there, and use File::transform() which will call whatever SVG thumbnailer the wiki is using (Thumbor for Wikimedia, rsvg via shell for most other wikis).

@Tgr, the SVG image is just a string. We no longer use files, which caused a lot of problems on privately hosted wikis.

daniel triaged this task as High priority. Aug 13 2020, 9:29 PM

Bumping to high, so we make some decision on where this should go.

Pinging @kchapman, since this sounds like a system architecture issue.

The "storing" backend for mathoid is RESTBase, as @Pchelolo says. In fact, mathoid is fully storage agnostic and one of our true lambda services. The Math:Extension issues requests to RESTBase, which functions as a cache and, if the formula doesn't already have a generated output, calls out to mathoid to generate it (and subsequently stores it). In fact, given that the URLs are unique hashes (see examples at T247697#6089797), it's Content Addressable Storage with the actual storage decoupled from the service doing the CPU-intensive work of rendering (which means we can scale it up or down arbitrarily easily as needed). That being said, RESTBase is being sunset. Related tasks seem to be: T262315 and, more specific to Math, T252389 and T274436.
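The cache-in-front-of-a-stateless-renderer arrangement described above can be sketched roughly as follows. This is an illustration under assumptions, not RESTBase's or mathoid's actual code: the class `RenderCache`, the in-memory dict, the SHA-1 key derivation, and `fake_render` are all hypothetical stand-ins.

```python
import hashlib
from typing import Callable, Dict


class RenderCache:
    """Cache-aside lookup in front of a stateless renderer (hypothetical)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render             # stands in for a call to mathoid
        self._store: Dict[str, str] = {}  # stands in for the real storage
        self.render_calls = 0

    @staticmethod
    def key_for(tex: str) -> str:
        # Assumption: the address is a hash of the input formula.
        return hashlib.sha1(tex.encode("utf-8")).hexdigest()

    def get_svg(self, tex: str) -> str:
        key = self.key_for(tex)
        if key not in self._store:
            # Cache miss: do the CPU-intensive work once, then store it.
            self.render_calls += 1
            self._store[key] = self._render(tex)
        return self._store[key]


def fake_render(tex: str) -> str:
    """Placeholder renderer; a real one would return mathoid's SVG output."""
    return f"<svg><!-- {tex} --></svg>"


cache = RenderCache(fake_render)
first = cache.get_svg("E=mc^2")
second = cache.get_svg("E=mc^2")  # served from the store, no second render
```

Because the renderer holds no state, any number of renderer instances could sit behind the same store.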

All of this is tangentially related to SVG itself, but I wanted to recap the situation a bit.

As I see it, a sane plan would be (these are actual steps below, but it's a very high-level plan):

  1. Not mess much with mathoid, it's working quite fine as is right now
  2. Decouple from RESTBase, storing the generated SVGs on swift (that would mean that the Math:Extension would need to obtain some of the smarts that RESTBase currently has as far as retrieving already rendered formulas) instead. The actual rendering is probably best to happen via the jobqueue asynchronously as it might be an arbitrarily slow operation (depending on the input) and it's best if it isn't on the hot path of the end-user request.
  3. Rely on Thumbor and the generic SVG to PNG renderer it has. Thumbor already fetches from Swift and generates thumbnails, so this is something we already do.
  4. Drop the svg to png functionality from mathoid and the unmaintained dependency along with it.
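As a rough illustration of step 2 (rendering asynchronously, off the hot path of the user request), here is a minimal sketch using a thread and an in-process queue. It is not the MediaWiki job queue; `store`, `jobs`, `request_formula`, and `slow_render` are hypothetical names, and the SHA-1 key derived from the input is an assumption.

```python
import hashlib
import queue
import threading
from typing import Dict

store: Dict[str, str] = {}          # stands in for Swift or another store
jobs: queue.Queue = queue.Queue()   # stands in for the job queue


def slow_render(tex: str) -> str:
    """Placeholder for the potentially slow call to the renderer."""
    return f"<svg><!-- {tex} --></svg>"


def request_formula(tex: str) -> str:
    """Return the storage key immediately; enqueue rendering if needed."""
    key = hashlib.sha1(tex.encode("utf-8")).hexdigest()
    if key not in store:
        jobs.put((key, tex))
    return key


def worker() -> None:
    """Drain the queue, rendering each formula at most once."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        key, tex = job
        if key not in store:
            store[key] = slow_render(tex)
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
key = request_formula("a^2+b^2=c^2")  # returns without waiting for rendering
jobs.join()                           # only for the demo: wait for the worker
jobs.put(None)
thread.join()
```

The request path only computes the key and enqueues work; whoever serves the image later would need a policy for keys that have not been rendered yet.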

An alternative plan, which would be to make mathoid storage aware, either via adding some datastorage e.g. mysql/cassandra/redis support or via some abstracting service like Kask (which already powers some things like sessionstore/echostore) could also be a valid option. Steps would be:

  1. Add kask/SQL/Cassandra/Redis/whatever support to mathoid
  2. Decouple from RESTBase, storing the generated artifacts in mathoid (that would mean mathoid would need to do what the Math:Extension would do in the above plan regarding caching etc)
  3. Change thumbor to do the SVG to PNG for mathoid as well
  4. Drop the svg to png functionality from mathoid and the unmaintained dependency along with it.

The latter has the drawback of needing to amend Thumbor and make it mathoid aware so it makes this the work of multiple teams. It would also make mathoid scale a bit less easily as it would not be a true lambda service anymore but it would become a stateful service. It would also (depending on the choice) add considerable code and dependencies in the project.

I'll note that both of these are very high-level plans and not very well thought out (the devil is in the details). I am more inclined to suggest the first one, as it seems to require less work, keeps things simpler, and follows already trodden paths.

Since Thumbor is being discussed here, I would like to point out a few things about Thumbor's situation and its infrastructure:

  • Current librsvg version is 2.40.21 on Debian Stretch, with no possibility of backporting a newer version due to changes in the library itself (switching from C to Rust)
  • librsvg version on Buster is 2.44.10 and on Bullseye is 2.50.3; in other words, any SVG -> PNG renderings that look off may improve in the future
  • Although Thumbor has been serving us well, its development is somewhat stalled: last stable release was in Mar 2020, last commit was in April 2021, and it is unknown when version 7 (python3) will be released
  • Thumbor at WMF has no owner, an issue we constantly keep running into.

This goes for almost ALL media-related infrastructure, a situation that desperately needs fixing but that management and leadership of WMF keep ignoring.

As I see it, a sane plan would be (these are actual steps below, but it's a very high-level plan):

  1. Not mess much with mathoid, it's working quite fine as is right now

It needs to be updated at some point to the latest version. The new version has a mode to generate HTML in a way that it looks like math in most (maybe all?) browsers using legacy HTML.

  1. Decouple from RESTBase, storing the generated SVGs on swift (that would mean that the Math:Extension would need to obtain some of the smarts that RESTBase currently has as far as retrieving already rendered formulas) instead. The actual rendering is probably best to happen via the jobqueue asynchronously as it might be an arbitrarily slow operation (depending on the input) and it's best if it isn't on the hot path of the end-user request.

Yes, we are working on getting rid of RESTBase. The current target for storing the SVG images is ObjectCache. Since the images are not exactly content addressable storage (the address is the hash of the input and not of the image), a job queue for rendering the images is less problematic. Even today, there is a method to fetch non-existing images from a special case (if the input is stored in the DB). See for example

  1. Rely on Thumbor and the generic SVG to PNG renderer it has. Thumbor already fetches from Swift and generates thumbnails, so this is something we already do.

That is something that needs testing. The last time I tried it, the PNGs were all empty due to the special nature of the SVG images.

  1. Drop the svg to png functionality from mathoid and the unmaintained dependency along with it.

An alternative plan, which would be to make mathoid storage aware, either via adding some datastorage e.g. mysql/cassandra/redis support or via some abstracting service like Kask (which already powers some things like sessionstore/echostore) could also be a valid option.

I would rather like to avoid making mathoid itself storage aware.

Since Thumbor is being discussed here, I would like to point out a few things about Thumbor's situation and its infrastructure:

OK. However, I think it is better to fight with one SVG->PNG component instead of two.

  • Thumbor at WMF has no owner, an issue we constantly keep running into.

This goes for almost ALL media-related infrastructure, a situation that desperately needs fixing but that management and leadership of WMF keep ignoring.

I think the Wikimedia movement puts little emphasis on MediaWiki. I get the impression that it is easier to manage countries and regions than to establish a landscape of collaborating open-source projects with various contributors. However, this is a bit off topic, isn't it?

In https://github.com/f3lang/node-rsvg-prebuilt/issues/11#issuecomment-1044045881 https://github.com/yisibl/resvg-js was suggested. I am uncertain if that would be a suitable alternative?

From T40010#7031414 I can say that for all test-suites resvg is faster and has a higher correctness (compared to any librsvg-version), even for (featured) images that are currently on Commons, which often have librsvg-workarounds.

However, speed and correctness depend heavily on the examples. @Physikerwelt, could you provide a set of SVG examples so that I can test speed and check whether any problems (e.g. with fonts) occur?

Physikerwelt claimed this task.

Per T311620, we decided to discontinue PNG support.