I did verify that this actually works on test.wikipedia.org (In a page preview). The underlying issue is quite complex and a combination of a lot of factors
Steps to reproduce:
- Set your skin to a skin that is not vector-2022. The exploit relies on the table of contents being in the body of the article. legacy vector or timeless should work fine. (Note: in the context of an exploit, an attacker can force the choice of skin with the useskin url parameter)
- Create a page that has the page language set to 'sr'.
- The exploit will work for any language that supports language converter (You can do this by setting $wgLanguageCode = 'sr'; alternatively set $wgPageLanguageUseDB = true; (This is the case on some WMF wikis) you can use Special:PageLanguage to adjust the language of the page. Some extensions (Translate) might also adjust page language.
- Install Charts and JsonConfig extensions
- Create a chart where the text Date appears somewhere in it.
- e.g. Copy https://test-commons.wikimedia.org/wiki/Data:Chart_Example_Data.tab and https://test-commons.wikimedia.org/wiki/Data:Example_Line_Chart.chart into your wiki, assuming you have JsonConfig set up to use your local wiki as a repo
- Install TemplateStyles extension
- Create a template styles sanitized-css page named Template:test/xss.css containing the following text:
@media ( width <meta\ ) { [\/property="mw:PageProp/toc"] > f {} } @media ( width <xss\ ) { autofocus foo[\/onfocus="alert('xss on' + origin)"][\/tabindex="1"] > baz { } }
- In your page that has the page language set to sr, edit it to have the following text:
-{H|Date=>sr:<templatestyles src="Template:test/xss.css"/>;Date=>sr-ec:<templatestyles src="Template:test/xss.css"/>;Date=>sr-el:<templatestyles src="Template:test/xss.css"/>}-
{{#chart:Example Line Chart.chart}}
==foo==
==bar==
==baz==- View the page. You should see an alert box pop-up
How this works
Charts extension uses a parser function in isHtml mode. This uses a general strip item type (as opposed to a Nowiki strip marker). This means language conversion (As well as doBlockLevels and the less relavent final internal links replacement) still happens to it. Language conversion allows us to put markup into the chart. We are still limited to what would normally be valid in a normal wikipage.
TemplateStyles seems to assume that HTML escaping rules are in place. However if a <style> tag is in an SVG (or MathML) namespace, then the rules are slightly different. In normal html, contents of <style> tag is plain text until a </style> tag is encountered. In foreign content namespaces,tags inside the <style> tag become children.
So in principle, <style><img src=x onerror=alert(1)></style> is fine in HTML but an xss in an SVG context. Note that children of <style> tags in svg are generally not rendered (which in svg prevent events from being triggered on them). However per https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign there are certain foreign content breaking tags, which change the mode back to html (<img> is one of them, but so are things like <div>).
In TemplateStyles we can make a tag like thing using @media queries
@media ( width <img ) {}However, minification turns that into @media(width <img){}. Keep in mind, in HTML a tag name is only ended by (ascii) whitespace, all other characters are valid parts of a tag name. If minification was disabled, this would be a lot easier to exploit. However with minification, it seems currently its impossible in template styles to make the opening tag without a weird symbol attached. These still get turned into elements. However because they are children of a non-rendered SVG element, they themselves are outside of the render tree and don't get any events.
This is where @media ( width <meta\ ) { [\/property="mw:PageProp/toc"] > f {} } . In the parser, the TOC is set as a marker <meta property="mw:PageProp/toc" /> which is then replaced later.
However the regex to replace it is fairly permissive:
/** * Permissive regexp matching TOC_PLACEHOLDER. This allows for some * minor modifications to the placeholder to be made by extensions * without breaking the TOC (T317857); note also that Parsoid's version * of the placeholder might include additional attributes. * @var string */ private const TOC_PLACEHOLDER_REGEX = '/<meta\\b[^>]*\\bproperty\\s*=\\s*"mw:PageProp\\/toc"[^>]*>/';
In particular, this is quite a bit more permissive than the HTML spec, as it only checks that meta is followed by a word break (\b). Symbols are generally considered word break characters to PCRE, so @media(width<meta\ ){[\/property="mw:PageProp/toc"]>f{}} matches this regex even though HTML would not consider that a <meta> tag. Once its matches it is replaced with the TOC which contains a bunch of foreign content breaker elements.
Thus we have the <style> in the SVG, which has a TOC in it which switches the mode back to HTML, then we have our <xss\ > tag, which as an HTML tag, can now get focus and run js events.
Fix recommendation
For the immediate fix, I think Charts extension should armour the output of its parser function with $parser->getStripState()->addNoWiki. (This would also prevent any newlines in the svg from messing things up. Since the language converter only applies to the non-script fallback right now, i imagine you probably don't want it generally so disabling it would be a good thing all and all in terms of consistency)
However, it seems like there are a lot of foot-guns around here, so maybe some of MW core/TemplateStyles could also use some improvements. In many ways, this feels more like a vulnerability in other components that just happened to be triggered by Charts instead of one in Charts itself.
- TOC_PLACEHOLDER_REGEX should be more strict
- Potentially the check in TemplateStyles:
if ( preg_match( '!</style!i', $value ) ) {
$value = '';
$status->fatal( 'templatestyles-end-tag-injection' );
}should be improved to '!<[a-zA-Z_]' (DomPurify does something like this to prevent mXSS).
- css-sanitizer should consider a <-delimiter followed by an identifier (or maybe any token type) to be seperable tokens (in the sense of Token::separate()), so < will be followed by whitespace instead of being right up next to a letter (and thus not a valid tag)
- LanguageConverter could be maybe ported to RemexHTML, and then not be run on foreign namespaces
- As an aside, at a glance, it looks like RemexHTML doesn't implement the foreign content breaker rules from the HTML spec (i might be misunderstanding). However i don't think that matters in any way, just something i noticed.
- Maybe we should also be more careful with BlockLevelPasses. Could be dangerous in the context of svgs, not sure (maybe not)
- If MediaWiki had a CSP policy that prevented unsafe-inline, this type of attack would be prevented
- if the charts extension used iframe sandboxing, this attack would not work