SVGMetadataExtractor is taking too much memory/time on large SVGs, rendering certain pages inaccessible
Closed, Resolved, Public

Description

See the URL listed in this report's metadata. The error was reported on the French village pump at Commons:

http://commons.wikimedia.org/w/index.php?title=Commons:Bistro&oldid=49919318#Et_sous_Firefox_.3F

The page User:Sting is simply not served. After a very long time (about 4-5 minutes), one gets a Wikimedia error page saying:

Request: GET http://commons.wikimedia.org/wiki/User:Sting, from <MY IP OMITTED> via amssq43.esams.wikimedia.org (squid/2.7.STABLE7) to 91.198.174.35 (91.198.174.35)
Error: ERR_READ_TIMEOUT, errno [No Error] at Thu, 17 Feb 2011 20:09:23 GMT

The user page contains quite a few images; I don't know whether that might be the problem.

Behavior confirmed in Firefox 3.6.13 (Mac OS X), Safari (Mac OS X), Firefox 3.6.4 (Win XP), IE6, Opera 10.60 (Win XP); both logged in and logged out.

The page is also not served through the secure server:

https://secure.wikimedia.org/wikipedia/commons/wiki/User:Sting

It quickly returns a completely unstyled page saying:

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /wikipedia/commons/wiki/User:Sting.

Reason: Error reading from remote server

Apache/2.2.8 (Ubuntu) mod_fastcgi/2.4.6 PHP/5.2.4-2ubuntu5.12wm1 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g Server at secure.wikimedia.org Port 443

Marked as "major" because, although this concerns only one page so far, I think it's worth investigating before we find other affected pages. It's not clear to me whether this is a networking problem, a caching (squid) problem, a wikitext parsing problem, or some other problem in the MediaWiki code.


Version: unspecified
Severity: major
URL: http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

bzimport set Reference to bz27508.
Lupo created this task. Via Legacy, Feb 17 2011, 8:26 PM
Lupo added a comment. Via Conduit, Feb 18 2011, 7:49 AM

It appears that this is caused by the reference

[[:File:Puerto_Rico_ecosystems_map-fr.svg]]

in User:Sting

Indeed, http://commons.wikimedia.org/wiki/File:Puerto_Rico_ecosystems_map-fr.svg also does not load.

However, even removing this link still leaves problems. Try clicking on the thumbnail at
http://commons.wikimedia.org/wiki/User:Lupo/q

or on the two links "SVG version" or "in French - raster": none of them is served! That's because they
all reference http://commons.wikimedia.org/wiki/Template:Other_versions/Puerto_Rico_ecosystems_map which in turn references File:Puerto_Rico_ecosystems_map-fr.svg in a gallery.

Lupo added a comment. Via Conduit, Feb 18 2011, 3:21 PM

Note that this also makes

http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

inaccessible.

It is also not possible to save an edit if the wikitext contains an active (not commented-out) wikilink to that file.

Lupo added a comment. Via Conduit, Feb 18 2011, 4:06 PM

The file itself is there and loads and displays fine in Firefox 3.6.4:

http://upload.wikimedia.org/wikipedia/commons/4/43/Puerto_Rico_ecosystems_map-fr.svg

Lupo added a comment. Via Conduit, Mar 4 2011, 9:16 PM

The user page where the problem was originally noticed

http://commons.wikimedia.org/wiki/User:Sting

has been edited in the meantime to circumvent the problem.

However, links to this SVG file still cause problems, such as

http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

being inaccessible.

Bawolff added a comment. Via Conduit, Mar 4 2011, 10:01 PM

When I try to upload the file (a 13 MB SVG) to my local wiki, I get an error about the SVG metadata extractor exceeding the maximum execution time, so I think it's an issue with the new SVG metadata extractor.

Maybe we should not try to extract metadata if the file is beyond a certain size.
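A minimal sketch of such a size guard, written in Python rather than MediaWiki's PHP. The 5 MiB cutoff and the function names are hypothetical, since the comment only says "a certain size":

```python
import os

# Hypothetical cutoff: the comment only says "a certain size", so this value
# is an assumption chosen for illustration.
MAX_METADATA_FILE_BYTES = 5 * 1024 * 1024  # 5 MiB


def should_extract_metadata(size_bytes: int, limit: int = MAX_METADATA_FILE_BYTES) -> bool:
    """Return True only when the file is small enough to parse in full."""
    return size_bytes <= limit


def should_extract_metadata_for_path(path: str) -> bool:
    """Convenience wrapper that looks up the file size on disk."""
    return should_extract_metadata(os.path.getsize(path))


if __name__ == "__main__":
    # A 13 MB SVG like the Puerto Rico map would be skipped under this cutoff.
    print(should_extract_metadata(13 * 1024 * 1024))  # False
    print(should_extract_metadata(40 * 1024))         # True
```

The obvious cost is that oversized files lose their <title> and <desc> entirely, which is what the later proposals in this thread try to avoid.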

TheDJ added a comment. Via Conduit, Mar 4 2011, 10:05 PM

Created attachment 8241
patch to fix this

Attached: SVGreader.patch

Platonides added a comment. Via Conduit, Mar 4 2011, 10:57 PM

Stop after getting metadata

(ensure you're at least at r83254)

We could also avoid this if we stopped parsing once we got the metadata tag.
There may be files with several <metadata> tags, though, for which we would only fetch the first one.

Attached: end_metadata.patch
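The idea behind end_metadata.patch can be sketched with Python's `xml.etree.ElementTree.iterparse`, which reads its input in chunks and yields events as it goes. This is an illustrative stand-in, not the patch's PHP code; `read_first_metadata` and `local_name` are hypothetical names. Returning on the first `</metadata>` end event means nothing after it is parsed, with the caveat noted above that later <metadata> elements are ignored:

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, e.g. '{...svg}metadata' -> 'metadata'."""
    return tag.rsplit("}", 1)[-1]


def read_first_metadata(source):
    """Return the first <metadata> element of an SVG, or None if there is none.

    `source` is a filename or a binary file object. iterparse() reads the input
    in chunks and yields events incrementally, so returning at the first
    </metadata> end event stops reading and parsing the rest of the file.
    Later <metadata> elements, if any, are never seen.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "metadata":
            return elem
    return None


if __name__ == "__main__":
    svg = (
        b'<svg xmlns="http://www.w3.org/2000/svg">'
        b"<metadata><note>first</note></metadata>"
        b'<path d="M0 0"/>'
        b"<metadata><note>second</note></metadata>"
        b"</svg>"
    )
    meta = read_first_metadata(io.BytesIO(svg))
    print(meta[0].text)  # first
```

If a file has no <metadata> element at all, this still walks the whole document, so on its own it does not help with every large SVG.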

Bawolff added a comment. Via Conduit, Mar 5 2011, 2:02 AM

Created attachment 8245
Patch to only look at a limited portion of the SVG file for metadata.

How about we only look in the first 256 KB for metadata information:

*Most SVG files (ignoring the crazy maps) are nowhere near 256 KB in size
*The SVG metadata <title> and <desc> tags are almost always at the very beginning of the file
*256 KB of SVG (a figure I chose arbitrarily) can be parsed pretty much instantaneously by our SVGReader class (in my tests, anyway, using eval.php)

Patch attached that does that. With the patch applied, I can successfully upload Puerto_Rico_ecosystems_map-fr.svg to my wiki, where before I ran into a "maximum execution time exceeded" error in SVGMetadataExtractor. (It still took a long time, but I think that's mostly from convert, which eventually gets killed by ulimit.sh.) Parsing that SVG with SVGReader from eval.php is now pretty much instantaneous, where before it took something like 7 minutes.

Attached: svgmetadatalimit.diff
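The prefix approach from svgmetadatalimit.diff can be sketched in Python with `xml.etree.ElementTree.XMLPullParser`: feed at most the cutoff's worth of bytes, collect the first <title> and <desc>, and never call `close()`, since a truncated prefix is not a complete document. The 256 KB default comes from the comment above; `extract_title_desc` is a hypothetical name, and this is not the PHP code committed to SVGReader:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative cutoff taken from the 256 KB figure in the comment above.
METADATA_CUTOFF_BYTES = 256 * 1024


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_title_desc(stream, cutoff: int = METADATA_CUTOFF_BYTES) -> dict:
    """Return the first <title> and <desc> texts found in the first `cutoff` bytes.

    `stream` is a binary file object. Only the prefix is read, so the cost is
    bounded no matter how large the SVG is.
    """
    prefix = stream.read(cutoff)  # never read more than the cutoff
    parser = ET.XMLPullParser(events=("end",))
    found = {}
    try:
        parser.feed(prefix)
        for _event, elem in parser.read_events():
            name = local_name(elem.tag)
            if name in ("title", "desc") and name not in found:
                found[name] = (elem.text or "").strip()
                if len(found) == 2:
                    break  # both fields found; ignore remaining events
    except ET.ParseError:
        # Malformed markup inside the prefix: keep whatever was found before it.
        pass
    # parser.close() is deliberately not called: the prefix is normally a
    # truncated document, and close() would report it as unterminated.
    return found


if __name__ == "__main__":
    big_svg = (
        b'<svg xmlns="http://www.w3.org/2000/svg">'
        b"<title>Ecosystems map</title><desc>Puerto Rico</desc>"
        + b'<path d="M0 0 L1 1"/>' * 100_000  # about 2 MB of drawing data
        + b"</svg>"
    )
    print(extract_title_desc(io.BytesIO(big_svg)))
    # {'title': 'Ecosystems map', 'desc': 'Puerto Rico'}
```

Because only the prefix is read, the parsing cost stays bounded whether the rest of the file is 300 KB or 13 MB; the trade-off is that a <title> placed after the cutoff is simply missed.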

Bawolff added a comment. Via Conduit, Mar 6 2011, 8:24 AM

I committed that in r83374. Marking this as fixed, since that should resolve the issue (at least it does on my test wiki, using [[:File:Puerto_Rico_ecosystems_map-fr.svg]]).

Gilles added a project: Multimedia. Via Web, Dec 4 2014, 10:17 AM
Gilles raised the priority of this task from "High" to "Unbreak Now!". Via Web, Dec 4 2014, 10:21 AM
Gilles moved this task to Closed on the Multimedia workboard.
Gilles lowered the priority of this task from "Unbreak Now!" to "High". Via Conduit, Dec 4 2014, 11:22 AM
