rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur
Open, Needs Triage (Public)

Description

Gaussian blurs in librsvg 2.40.x are largely not optimized, causing larger thumbnails for files that make heavy use of Gaussian blur to time out.

  • 2.42.3 added minor optimizations (T193352)
  • 2.43.3 optimized further
  • 2.43.4 parallelized Gaussian blur
  • 2.44.0 introduced more speed improvements

Known affected files

At least the first three are rendered in 10-15 seconds with librsvg 2.48 in the requested size mentioned above.


Reported on IRC regarding https://commons.wikimedia.org/wiki/File:Jupiter_diagram.svg which has relatively large dimensions in the SVG base image.

This 2000px thumbnail has already rendered fine:
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Jupiter_diagram.svg/2000px-Jupiter_diagram.svg.png

But the file: page links offer a 5000px thumbnail which does not render:
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Jupiter_diagram.svg/5000px-Jupiter_diagram.svg.png

It takes some seconds (30-ish?), then times out with an HTTP 500 error. Too many attempts trigger rate limiting, which returns a 429.

A sample request failure id:

example-fail.png (211×1 px, 56 KB)

Event Timeline

Indeed, Thumbor rate-limits failed renders of a given original after a few tries: https://wikitech.wikimedia.org/wiki/Thumbor#Throttling
My guess would be that 5000px exceeds Thumbor's memory limit for rendering.

Using rsvg-convert version 2.48.4, it takes 8-10 seconds to generate the 5000px thumbnail on my laptop. With rsvg-convert version 2.40.16, which is running in production, it takes 50-55 seconds. That's dangerously close to the 60-second timeout for the command. This file makes fairly heavy use of Gaussian blur, which was optimized in 2.42 and 2.43. Waiting for T193352 is the best solution here.

AntiCompositeNumber renamed this task from Timeout and HTTP 500 error on 5000px thumbnail of large SVG image to rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur. May 15 2020, 12:39 AM
AntiCompositeNumber updated the task description.
AntiCompositeNumber moved this task from Backlog to Upstream on the Thumbor board.
AntiCompositeNumber moved this task from Backlog to Patch merged upstream on the Upstream board.
JoKalliauer renamed this task from rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur to rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur aswell with heavy use of gradients. Jul 18 2020, 1:55 PM
JoKalliauer updated the task description.

@JoKalliauer, you added https://commons.wikimedia.org/wiki/File:Hematopoiesis_(human)_diagram_en.svg to this task, apparently because gradients were causing it to time out. However, I'm not able to reproduce the issue locally with rsvg 2.40.16:

$ time rsvg-convert -w 1405 -f png -u -o Hematopoiesis_\(human\)_diagram_en.svg.png Hematopoiesis_\(human\)_diagram_en.svg 

real	0m0.397s
user	0m0.288s
sys	0m0.023s

Generating a new, large thumbnail for that file also works just fine. Unless you have more information to show that this file is actually timing out, I'd ask you to revert your changes and file a new task if there are issues with that file.

JoKalliauer renamed this task from rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur aswell with heavy use of gradients to rsvg-convert times out while generating large thumbnails with heavy use of Gaussian blur. Jul 18 2020, 7:17 PM
JoKalliauer updated the task description.

@AntiCompositeNumber: Thanks for your answer, I think I was wrong.

This image was reported (by Sarang) on my talk page: https://commons.wikimedia.org/wiki/User_talk:JoKalliauer#Strange_failure

If I open
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Hematopoiesis_%28human%29_diagram_en.svg/80px-Hematopoiesis_%28human%29_diagram_en.svg.png
I get

Our servers are currently under maintenance or experiencing a technical problem. Please try again in a few minutes.

See the error message at the bottom of this page for more information.
If you report this error to the Wikimedia System Administrators, please include the details below.

Request from 193.81.142.140 via cp3055 frontend, Varnish XID 231829017
Upstream caches: cp3055 int
Error: 429, Too Many Requests at Sat, 18 Jul 2020 19:17:49 GMT

Is there a rough guess for the time limit? @Ponor and I get completely different times for an SVG benchmark (T40010#7031600); there are several reasons for that, see User_talk:Ponor. The most important difference might be, e.g., a segmentation fault after 7 minutes of rendering time (on a single CPU).

So for a re-evaluation, SVG renderer times over the timeout limit should be counted as failed and assigned a time penalty equal to the timeout limit (in CPU time). A reasonable timeout limit is therefore essential for a fair comparison.
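The penalty rule proposed here can be sketched roughly as follows. This is only an illustration: the function name `score_renderer`, the use of `None` to mark a crashed run, and the sample timings are assumptions for the sketch, not part of any existing benchmark script.

```python
def score_renderer(times_s, limit_s):
    """Return (failures, total_penalized_seconds) for one renderer.

    times_s: per-file render times in seconds; None marks a run that
             crashed (hypothetical convention for this sketch).
    limit_s: the timeout limit in seconds.
    """
    failures = 0
    total = 0.0
    for t in times_s:
        if t is None or t > limit_s:
            # Over the limit (or crashed): count as failed and charge
            # exactly the timeout limit as the time penalty.
            failures += 1
            total += limit_s
        else:
            total += t
    return failures, total
```

For example, `score_renderer([0.4, 9.0, 420.0, None], limit_s=59.0)` would treat the 420-second run and the crashed run as failures and charge each of them 59 seconds.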

So why, how, and when do processes get timed out (CPU time, real time, memory, high CPU load, ...)?

The caching layer complains when requests take longer than 60s wall-clock time. To avoid tripping this log unnecessarily, Thumbor enforces a 59s wall-clock timeout for subprocesses (like shelling out to librsvg). If the limit is exceeded, the subprocess is killed and the request returns an HTTP 500.
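A minimal sketch of that behaviour, for anyone who wants to apply the same wall-clock limit in a local benchmark. This is not Thumbor's actual code: the helper name `run_with_wall_clock_limit` and the mapping of outcomes to status codes are assumptions made for illustration; only the 59-second figure comes from the explanation above.

```python
import subprocess
import time

SUBPROCESS_TIMEOUT_S = 59.0  # just under the 60 s limit of the caching layer


def run_with_wall_clock_limit(cmd, timeout_s=SUBPROCESS_TIMEOUT_S):
    """Run cmd and return (status, elapsed_seconds).

    Hypothetical helper: status is 200 on success and 500 if the
    command fails or exceeds the wall-clock limit.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired once
        # timeout_s seconds of wall-clock time have passed.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        status = 200 if proc.returncode == 0 else 500
    except subprocess.TimeoutExpired:
        status = 500
    return status, time.monotonic() - start
```

It could be called with a command such as `["rsvg-convert", "-w", "5000", "-f", "png", "-o", "out.png", "Jupiter_diagram.svg"]`. Note that the limit is on wall-clock time, so the result depends on the machine and its current load, not only on CPU time.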

Rendering time depends on the efficiency of the rendering code, the SVG complexity, the output size, the characteristics of the rendering computer, rendering system configuration, and on the load that the rendering computer is experiencing (this is why some files that originally time out render fine later).