Page MenuHomePhabricator

XML-Parser fails on <!ATTLIST bar id ID #REQUIRED>
Open, LowPublicBUG REPORT

Description

Steps to Reproduce:
add

<!DOCTYPE svg [
<!ATTLIST bar id ID #REQUIRED>
]>

to https://commons.wikimedia.org/wiki/File:W3C_SVG_11_TestSuite_text-tref-02-b.svg

Actual Results:

API Error verification-error:
The XML in the uploaded file could not be parsed.

Expected Results:
It is a valid file from the official W3C-Test-suite: https://www.w3.org/Graphics/SVG/Test/20110816/ and should be uploadable

Related to T279239

Event Timeline

In T151735, ATTLIST could be used for script injection attacks by setting default attribute values.

SVG script injection
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE x [
<!ATTLIST image x:href CDATA "xx:x"
                onerror CDATA "alert('XSSED = '+document.domain)"
                onload CDATA "alert('XSSED = '+document.domain)">
]>
<svg xmlns:h="http://www.w3.org/1999/xhtml" xmlns:x="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg">
<image />
</svg>

The onerror and onload attributes will execute their values as JavaScript. The image element will either load successfully or err, so one of the on attributes will execute.

I think the ATTLIST evaded the PHP checking for the on attributes.

As a result, MW whitelisted the set of allowed DTDs to prevent someone using a malicious DTD and restricted what could be in the local subset of a DTD. The local subset is permitted to have some ENTITY defs (but not parameter entities). Only one ATTLIST (which installs the xlink namespace) is permitted.

The essential prohibition of ATTLIST meant complicated security scanning was not needed. it was a reasonable decision.

The ATTLIST filter could whitelist the attribute definition pattern attr { CDATA | ID | IDREF | IDREFS | NMTOKEN | NMTOKENS } #REQUIRED because it can inject no text.

Simple enumerated values could also be permitted: color (red, blue, maroon) #REQUIRED.

In addition to #REQUIRED, #IMPLIED may be acceptable because no default value is supposed to follow either of those keywords. If the #FIXED keyword is used, then the fixed value must follow.

But this also gets into how different XML parsers work. Some may assign a default value for an enumeration.

More significantly, DTDs are little used now, so I do not see much value in adding further support. I've used DTD local subsets in SVG, so I was sorry to see them blocked by MW, but their use was not essential.

Perhaps the MW now uses a better XML parser so the use of ATTLIST will no longer evade the existing script detection mechanisms.

Trying to upload https://commons.m.wikimedia.org/wiki/File:GnuPG-Logo.svg using API I am getting the following error:

{
  "error": {
    "code": "verification-error",
    "info": "The XML in the uploaded file could not be parsed.",
    "details": [
      "uploadinvalidxml"
    ],
    "*": "See [...] for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes."
  }
}

I do not have any other issues with file uploads using API.

Is it this bug or is that a separate bug?