Page MenuHomePhabricator

XSS via Graph extension
Closed, ResolvedPublicSecurity

Description

Steps to reproduce

  • Start creating a page on a wiki with the Graph extension installed, such as Wikipedia.
  • Enter the following:
<graph>{"signals":[{"name":"a","init":{"expr":"indexof(({indexOf:indexof({indexOf:[].flat.constructor},'c','(0,eval)(c)')}||0),'alert(1)')"}}]}</graph>
  • Preview the page.

Other

  • Tested with Chrome 111.0.5563.147
  • Bug seems to be entirely within vega-expression
  • Could probably trigger on viewing of a malicious page

Public task: T334940: All Graphs broken on Wikimedia wikis (due to security issue T336556)

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I pulled the plug now and disabled Graph everywhere: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/909407

Ah wait, this might explain part of T334911, as some maps on en.wp use the graph extension as well.

As parser cache expires and the uses of this extension generally require quite a bit of input, that is now rendered into the HTML as unrecognised wikitext it might be pretty disruptive to articles. Perhaps it is a good idea to have something like a hook, to simply consume the contents of the graph tag ?

As parser cache expires and the uses of this extension generally require quite a bit of input, that is now rendered into the HTML as unrecognised wikitext it might be pretty disruptive to articles. Perhaps it is a good idea to have something like a hook, to simply consume the contents of the graph tag ?

Do you mean something like this?

$wgHooks['ParserFirstCallInit'][] = 'wmfAddGraphTagToHideNowBrokenUses';

function wmfAddGraphTagToHideNowBrokenUses( Parser $parser ) {
	$parser->setHook( 'graph', 'wmfRenderEmptyGraphTag' );
}

function wmfRenderEmptyGraphTag( $input, array $args, Parser $parser, PPFrame $frame ) {
	$parser->addTrackingCategory( 'graph-tracking-category' );
	return '';
}

If so I agree deploying adding that is a good idea.

As parser cache expires and the uses of this extension generally require quite a bit of input, that is now rendered into the HTML as unrecognised wikitext it might be pretty disruptive to articles. Perhaps it is a good idea to have something like a hook, to simply consume the contents of the graph tag ?

Do you mean something like this?

$wgHooks['ParserFirstCallInit'][] = 'wmfAddGraphTagToHideNowBrokenUses';

function wmfAddGraphTagToHideNowBrokenUses( Parser $parser ) {
	$parser->setHook( 'graph', 'wmfRenderEmptyGraphTag' );
}

function wmfRenderEmptyGraphTag( $input, array $args, Parser $parser, PPFrame $frame ) {
	$parser->addTrackingCategory( 'graph-tracking-category' );
	return '';
}

If so I agree deploying adding that is a good idea.

Yup. Maybe for discoverability, it might be wise to have it return '<graph>' ?

Illustration of effect without consuming <graph> tag.

Screenshot 2023-04-18 at 11.50.17.png (2×2 px, 828 KB)

Is it possible to return an interface message or something? So that wikis either customise it "You are supposed to see a graph here, but..." or leave empty.

As a general solution as a bystander I would prefer to finally have an up to date version of Vega and perhaps server side rendering again, at least for non-interactive graphs :P

For the records (SAL, patch):

14:14:54 <+logmsgbot> !log taavi@deploy2002 Started scap: Backport for [[gerrit:909623|Hide raw Graph tags (T334895)]]

Given the massive impact this has (as opposed to Score that was only used in a few pages) I would say there has to be a communication right away, rather than just next monday in the Tech News. People would obviously compain a lot.

There is already a topic in ukwiki tech tavern

There should be a mass message to the technical village pumps, probably?

@Tgr, in the public task @Seddon has promised an announcement

Another thing to consider is: given there are known XSS vectors in vega 2, this might be the first time we get a report, but not the first time this has been found out. We should probably get someone to check all the past revisions of all pages containing graphs for suspiciously-looking patterns.

Vega ships an optional interpreter that can evaluate graph expressions by traversing an AST and performing each operation, rather than relying on runtime code generation. Per https://github.com/vega/vega/pull/3019#issuecomment-749107902, the interpreter mode is not the default because it is 10% slower. Seems like a negligible price to me. This seems like the only sensible option for keeping support for graph expressions but rooting out XSS vectors systematically.

For the records (SAL, patch):

14:14:54 <+logmsgbot> !log taavi@deploy2002 Started scap: Backport for [[gerrit:909623|Hide raw Graph tags (T334895)]]

I uploaded a potential WikimediaMessages change and config change to make this emit nonempty output instead of the current empty output; it could be customized locally by overriding the message used in the MediaWiki namespace and/or styling its class in common.css. Any and all changes to the message contents welcome, and feel free to deploy without me if you think it’s a good idea.

@Tgr also suggested that an additional tracking category could be added, whose description page (*-desc message in WikimediaMessages) could link to the public Phabricator task or otherwise explain the broken graph situation in some more detail. (This would also allow tracking how many pages weren’t re-parsed yet since the last time the extension was disabled or enabled.) How does that sound?

For the records (SAL, patch):

14:14:54 <+logmsgbot> !log taavi@deploy2002 Started scap: Backport for [[gerrit:909623|Hide raw Graph tags (T334895)]]

I uploaded a potential WikimediaMessages change and config change to make this emit nonempty output instead of the current empty output; it could be customized locally by overriding the message used in the MediaWiki namespace and/or styling its class in common.css. Any and all changes to the message contents welcome, and feel free to deploy without me if you think it’s a good idea.

Adding a message for projects to customize is fine to be, but I wonder if it should be empty by default? At least the uses I've stumbled upon (on enwiki, that is), they've been placed in a way where them suddently disappearing doesn't cause lot of confusion, so adding a message might cause more disruption than making it empty.

@Tgr also suggested that an additional tracking category could be added, whose description page (*-desc message in WikimediaMessages) could link to the public Phabricator task or otherwise explain the broken graph situation in some more detail. (This would also allow tracking how many pages weren’t re-parsed yet since the last time the extension was disabled or enabled.) How does that sound?

+1

Adding a message for projects to customize is fine to be, but I wonder if it should be empty by default? At least the uses I've stumbled upon (on enwiki, that is), they've been placed in a way where them suddently disappearing doesn't cause lot of confusion, so adding a message might cause more disruption than making it empty.

Also fine with that – I know very little about how Graph is used on wikis, I just found the empty default confusing when I was testing it locally (but probably in a way that’s not at all representative of real usage).

@Tgr also suggested that an additional tracking category could be added, whose description page (*-desc message in WikimediaMessages) could link to the public Phabricator task or otherwise explain the broken graph situation in some more detail. (This would also allow tracking how many pages weren’t re-parsed yet since the last time the extension was disabled or enabled.) How does that sound?

+1

WikimediaMessages change, config change.

Subscribing a bunch of Editing-team folks per the list of maintainers @taavi linked.

Thanks, we are the official maintainers for historical reasons, although we have only ever worked on the VE integration of this extension (and that was a while ago), so we have no particular insights, requirements or expertise on the matter. If at some point there is no one else able or willing to work on a fix, that's what "official maintainers" are for, so we will keep watching this space.

I also realize tracking categories need to be registered for them to show up at Special:TrackingCategories. Not sure if the best way to do that here is via $wgTrackingCategories (which is marked as deprecated) or via the ext.json attribute in WMMessages.

Happy to merge those once the parent patch adding a 'this is disabled' message defaults to an empty string (unless we have consensus here to add a message, of course).

+1 to have the default empty.

I also realize tracking categories need to be registered for them to show up at Special:TrackingCategories. Not sure if the best way to do that here is via $wgTrackingCategories (which is marked as deprecated) or via the ext.json attribute in WMMessages.

Good catch, thanks. Looking at TrackingCategories::getTrackingCategories(), I think $wgTrackingCategories is our best bet for now, even if it’s deprecated.

I think it should be fairly straightforward/easy to just use the latest Vega v5 with the disabled unsafe-eval, see https://github.com/vega/vega/blob/main/docs/usage/interpreter.md and let the community fix the broken graphs. It offers a clear and fast path forward, and solves a lot of bit rot issues.

Note that it also allows CSP to be set to disable unsafe eval.

+1 to have the default empty.

For the records (SAL, patch):

14:14:54 <+logmsgbot> !log taavi@deploy2002 Started scap: Backport for [[gerrit:909623|Hide raw Graph tags (T334895)]]

I uploaded a potential WikimediaMessages change and config change to make this emit nonempty output instead of the current empty output; it could be customized locally by overriding the message used in the MediaWiki namespace and/or styling its class in common.css. Any and all changes to the message contents welcome, and feel free to deploy without me if you think it’s a good idea.

Adding a message for projects to customize is fine to be, but I wonder if it should be empty by default? At least the uses I've stumbled upon (on enwiki, that is), they've been placed in a way where them suddently disappearing doesn't cause lot of confusion, so adding a message might cause more disruption than making it empty.

Why should it be empty by default? It's much better to show something instead of an empty rectangle, which is what most languages will get now.

Evidently, the English Wikipedia already put something there locally, so its specific case is moot. For other languages, some default text is better than nothing.

In addition to fixing the underlying issue, i think this bug shows that vega really should be rendered into an iframed sandbox.

In addition to fixing the underlying issue, i think this bug shows that vega really should be rendered into an iframed sandbox.

For anyone interested in that, I know Brion has looked into iframing content. And I once played around with attempting to move TMH's iframeoutput into core, as well as providing an iframe.php endpoint.

The difficult part is providing an endpoint that knows about your content fragment within the page. This issue is something that we have also seen with the Karthotherian map snapshot renderings (and with graphoid of course). We could really benefit if we designed a more encompassing and reusable architecture for referring to pieces of content within a page, at a certain time/revision, for a user with some context (language, variant etc [ParserOptions]).

This would be a pretty significant undertaking, but one I suspect is increasingly more required, looking towards the future (parsoid, discussion tools, etc) and it might unblock many more non-wikitext-content fragment problems especially in the area of multimedia content. Better to design it once than 5 times retro actively.

Additionally iframe'd content could be used to allow other websites to embed some of our content (wiki articles, commons videos, graphs, maps, 3d models, etc), and to serve up that content through dedicated servers and separate caching strategies and rate limits perhaps. If done well, it can open up interesting opportunities. </pitch>

No, that work has no plans to do "iframing" (user-land isolation) per se as no-one brought that up as a requirement in, indeed, the two years it's been talked about. Hopefully the work will be undertaken soon, but we'll see. (Off-topic for this task.)

For anyone interested in that, I know Brion has looked into iframing content.

That was T169027: Provide iframe sandboxing for rich-media extensions (defense in depth). There is also a more specific task, T222807: Sandbox Graph extension into an iframe.

The difficult part is providing an endpoint that knows about your content fragment within the page.

The easy way would be to use iframe srcdoc (support). Not sure about the performance implications (I guess that would be a question for iframe in general).

Additionally iframe'd content could be used to allow other websites to embed some of our content (wiki articles, commons videos, graphs, maps, 3d models, etc), and to serve up that content through dedicated servers and separate caching strategies and rate limits perhaps.

That's T31242: Extension to embed offsite media via oEmbed (video, photos, rich media). I don't think there is much overlap in terms of what it would take to implement it.

An iframe.php endpoint could be workable; given a revision ID it would fetch the page content from the parser cache. The original extension could stash whatever it needed in order to do the rendering (say, the raw contents of the extension tag) into ParserOutput::getExtensionData() and then iframe.php would use the revision ID to retrieve it from the ParserOutput's extension data and do the render.

Despite @Jdforrester-WMF's protestations, I think the asynchronous fragment proposals are actually a reasonably-good fit to incorporate the security aspect here. The fundamental idea (from the parsing side) is that a page is comprised of a DOM tree with "holes", and fragments are generated asynchronously to fill those holes. Implicit in that proposal is the need to name, cache, and invalidate those fragments -- and one you have a way to name them you have the basis for generating an <iframe> reference to a URL which fetches *just* the fragment contents, not the entire composed HTML.

But agree that this is probably off-topic for this bug, and perhaps more relevant to server-side rendering, aka T272530.

Vega ships an optional interpreter that can evaluate graph expressions by traversing an AST and performing each operation, rather than relying on runtime code generation. Per https://github.com/vega/vega/pull/3019#issuecomment-749107902, the interpreter mode is not the default because it is 10% slower. Seems like a negligible price to me. This seems like the only sensible option for keeping support for graph expressions but rooting out XSS vectors systematically.

I think it should be fairly straightforward/easy to just use the latest Vega v5 with the disabled unsafe-eval, see https://github.com/vega/vega/blob/main/docs/usage/interpreter.md and let the community fix the broken graphs. It offers a clear and fast path forward, and solves a lot of bit rot issues.

Note that it also allows CSP to be set to disable unsafe eval.

These both seem good to me. Both options seem to require 5+ so I've added that as a subtask, mostly on T223026.

Hey all - is there any reason to keep this particular task private? It's not currently tagged as PermanentlyPrivate and from what I can tell, it probably shouldn't be. The public task is at T334940 and the most likely paths forward are being actively discussed within T336595. The payload in the task description was mitigated by the migration of Vega to v5.25.0 (T165118), but obviously there are new issues regarding current versions of Vega, which have been reported upstream.

Making this one public sounds fine to me, but T336556 should stay private for now.

sbassett assigned this task to Jdlrobson.
sbassett moved this task from Untriaged to Needs Discussion / Investigation on the Editing-team board.
sbassett moved this task from In Progress to Our Part Is Done on the Security-Team board.
sbassett changed Author Affiliation from N/A to Wikimedia Communities.
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed the edit policy from "Custom Policy" to "All Users".
sbassett changed Risk Rating from N/A to High.