
Uploading SVG file generated by matplotlib fails
Open, Needs Triage · Public · BUG REPORT

Description

Matplotlib is a popular Python plotting library. Its SVG rendering backend produces files that typically contain the following document type declaration near the top:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

For an example, see: https://github.com/matplotlib/matplotlib/blame/192b7c24d153ab36b779bcd51f4050134d1ab254/lib/matplotlib/mpl-data/images/move.svg#L2-L3

Uploading files containing this element to MediaWiki fails with this error:

The XML in the uploaded file could not be parsed.

When the <!DOCTYPE svg ...> declaration is removed, the upload works. Since matplotlib's SVG files render correctly in all common web browsers and image viewers, I think this is a bug in MediaWiki.
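
Until this is resolved, the observation above suggests a user-side stopgap: strip the declaration before uploading. A minimal PHP sketch follows, assuming the DOCTYPE has no internal subset ("[...]"), as in the matplotlib example; the function name stripDoctype and the sample file content are illustrative, not part of any existing tool.

```php
<?php
// Stopgap sketch (user side): drop a <!DOCTYPE ...> declaration before upload.
// Assumption: the declaration has no internal subset ("[...]"). If it does,
// the pattern does not match and the input is returned unchanged.
function stripDoctype(string $svg): string
{
    // "<!DOCTYPE", whitespace, then everything up to the first ">" as long as
    // no "[" (the start of an internal subset) occurs before it.
    $result = preg_replace('/<!DOCTYPE\s[^>\[]*>\s*/', '', $svg, 1);
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $svg;
}

// Hypothetical sample modeled on the matplotlib declaration quoted above.
$sampleSvg = <<<'SVG'
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>
SVG;

echo stripDoctype($sampleSvg), PHP_EOL;
```

A regex is used here only because the declaration is a single, predictable token in these files; it is not a general XML rewriter.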

Event Timeline

I can upload https://github.com/matplotlib/matplotlib/blame/192b7c24d153ab36b779bcd51f4050134d1ab254/lib/matplotlib/mpl-data/images/move.svg to my dev wiki (running HEAD of master) just fine...

Which wiki are you trying to upload this to? What are the relevant software versions?

We tried on the Arch Linux wiki (see its Special:Version). Its configuration includes the following SVG-related settings:

$wgFileExtensions[] = 'svg';
$wgSVGNativeRendering = true;

https://test.wikipedia.org/wiki/File:Move.svg worked fine too.

Even after deleting the file and setting $wgSVGNativeRendering = true;, it still uploads fine on my dev wiki...

Related: T60553 (Invalid xml accepted by svg upload) and T67724 (Chunked upload of SVGs triggers INVALIDXML exception, but file is valid). The relevant check in the upload code:

	/**
	 * @param string $filename
	 * @param bool $partial
	 * @return bool|array
	 */
	protected function detectScriptInSvg( $filename, $partial ) {
		$this->mSVGNSError = false;
		$check = new XmlTypeCheck(
			$filename,
			$this->checkSvgScriptCallback( ... ),
			true,
			[
				'processing_instruction_handler' => [ self::class, 'checkSvgPICallback' ],
				'external_dtd_handler' => [ self::class, 'checkSvgExternalDTD' ],
			]
		);
		if ( $check->wellFormed !== true ) {
			// Invalid xml (T60553)
			// But only when non-partial (T67724)
			return $partial ? false : [ 'uploadinvalidxml' ];
		}
		// ... (rest of the method not shown)
	}

So something in XmlTypeCheck must be leaving $check->wellFormed at a value other than true.

There are, however, a few code paths that can do that.

It could be related to the libxml version in use.

XmlTypeCheck is fairly standalone code, so it shouldn't be too difficult to find where it fails on a machine where this is reproducible.

Umherirrender subscribed.

Works for me with PHP 8.4 and libxml 2.11.9:

# php maintenance\run.php eval
> echo LIBXML_VERSION;
21109

@Reedy An LLM agent created these scripts for me to reproduce the issue in a Docker container: https://gist.github.com/lahwaacz/4c2aad65142c6619780870c8b3842f5a

It builds libxml2 from source, installs vanilla MediaWiki with SQLite, and tests uploading a matplotlib-style SVG with a <!DOCTYPE> declaration. The agent identified a breaking commit in libxml2: f6964781 "reader: Rework xmlTextReaderRead{Inner,Outer}Xml", released in libxml2 2.13.0. This commit introduced xmlTextReaderDumpCopy(), which silently skips XML_DTD_NODE nodes, breaking readOuterXml() for DOCTYPE nodes.
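
To check a given PHP/libxml2 build without a full MediaWiki install, the standalone PHP sketch below walks a matplotlib-style SVG with XMLReader and prints what readOuterXml() returns for the DOCTYPE node. It assumes (following the agent's analysis, not verified here) that XmlTypeCheck obtains the DOCTYPE text this way; an empty string on an affected build would be consistent with the regression described above. doctypeOuterXml and $sampleSvg are illustrative names.

```php
<?php
// Sketch: report what XMLReader::readOuterXml() returns for a DOCTYPE node
// with the libxml2 build this PHP is linked against. Standalone; this is
// not MediaWiki code. Assumption: XmlTypeCheck reads the DOCTYPE text this way.

$sampleSvg = <<<'SVG'
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>
SVG;

/**
 * Return readOuterXml() for the first DOCTYPE node, or null if the input
 * could not be opened or contains no DOCTYPE.
 */
function doctypeOuterXml(string $xml): ?string
{
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        return null;
    }
    try {
        // No LIBXML_DTDLOAD flag, so the external DTD is never fetched.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::DOC_TYPE) {
                return $reader->readOuterXml();
            }
        }
        return null;
    } finally {
        $reader->close();
    }
}

$outer = doctypeOuterXml($sampleSvg);
echo 'LIBXML_VERSION: ', LIBXML_VERSION, PHP_EOL;
echo 'readOuterXml() on DOCTYPE: ', var_export($outer, true), PHP_EOL;
```

Running this on two machines, one with libxml2 older than 2.13.0 and one with a newer build, should show directly whether the DOCTYPE text goes missing.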

Since it affects even PHP's own functions, I don't know what the proper fix is. The agent does have some suggestions for workarounds on the MediaWiki side, though.

Depending on how much we trust the LLM agent's analysis, and on the upstream project's policy on AI-generated reports, it may be worth reporting this to them.

If it affects PHP functions because they are built on libxml2, it could still be worth filing a report with PHP too, for tracking and potential workaround purposes, since it can take a long time for fixes to this sort of issue to become widely available.

Regarding workarounds on the MW side, we can certainly look at the suggestions/patches and see about incorporating and backporting them as appropriate. Obviously it depends on what is being suggested, and therefore whether it's considered reasonable!
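
One conceivable MW-side fallback, offered here as a hypothetical sketch and not as one of the agent's patches (which are not reproduced in this task), is to rebuild the DOCTYPE text from DOMDocument's doctype node when readOuterXml() comes back empty for the DOCTYPE. It assumes the downstream DTD check only needs the "<!DOCTYPE ...>" text; rebuildDoctype is an illustrative name.

```php
<?php
// Hypothetical fallback (illustrative, not MediaWiki code): rebuild the
// "<!DOCTYPE ...>" text from DOMDocument's doctype node, for use when
// XMLReader::readOuterXml() returns an empty string for the DOCTYPE.
// Assumes the caller only needs the declaration text and that parsing the
// file into a DOM a second time is acceptable.
function rebuildDoctype(string $xml): ?string
{
    if ($xml === '') {
        return null; // DOMDocument::loadXML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // LIBXML_NONET: never fetch the external DTD over the network.
    $loaded = $dom->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $dom->doctype === null) {
        return null;
    }
    $dt = $dom->doctype;
    $text = '<!DOCTYPE ' . $dt->name;
    // Pick a quote character that the system literal does not contain.
    $q = str_contains($dt->systemId, '"') ? "'" : '"';
    if ($dt->publicId !== '') {
        $text .= ' PUBLIC "' . $dt->publicId . '" ' . $q . $dt->systemId . $q;
    } elseif ($dt->systemId !== '') {
        $text .= ' SYSTEM ' . $q . $dt->systemId . $q;
    }
    if ($dt->internalSubset !== null && $dt->internalSubset !== '') {
        $text .= ' [' . $dt->internalSubset . ']';
    }
    return $text . '>';
}

$sampleSvg = <<<'SVG'
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>
SVG;

echo rebuildDoctype($sampleSvg), PHP_EOL;
```

The trade-off is a second full parse of the file, which may matter for very large SVGs; whether that is acceptable is exactly the kind of judgment call mentioned above.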

FWIW, the uploads that worked for me were on Ubuntu 24.04 (no big upgrades for a while, obviously), with various PHP versions...

php > echo LIBXML_VERSION;
20914
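
The LIBXML_VERSION values quoted in this thread encode major * 10000 + minor * 100 + patch, so 21109 is 2.11.9 and 20914 is 2.9.14, both below the 2.13.0 release the agent points to. A small PHP sketch for decoding the constant and flagging builds at or above 2.13.0 follows; the threshold comes from the agent's analysis and is an assumption, not a confirmed affected range.

```php
<?php
// Decode PHP's LIBXML_VERSION constant, which libxml2 encodes as
// major * 10000 + minor * 100 + patch (e.g. 21109 -> 2.11.9).
function libxmlVersionString(int $v): string
{
    return sprintf('%d.%d.%d', intdiv($v, 10000), intdiv($v % 10000, 100), $v % 100);
}

// Assumed threshold: 2.13.0 (21300), the release named in the agent's
// analysis. This flags builds worth testing, not confirmed-broken builds.
function atOrAbove2130(int $v): bool
{
    return $v >= 21300;
}

foreach ([21109, 20914, LIBXML_VERSION] as $v) {
    printf(
        "%d => %s%s\n",
        $v,
        libxmlVersionString($v),
        atOrAbove2130($v) ? ' (2.13.0 or later)' : ''
    );
}
```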