Page MenuHomePhabricator

Security review new version of the Vega lib
Closed, ResolvedPublic

Description

The new version of the Vega library (3.0) underwent a significant rewrite by University of Washington folks, and needs to have a security review in order to update the graph extension and the Graphoid service. Benefits include much more powerful visualization primitives, better map support, many minor cleanups.

Event Timeline

@Yurik I assume this library will have a similar wrapper to graph2.js that graph currently does? And it will be used both locally and in a node.js service?

@Bawolff correct, and it will be loaded side-by-side on nodejs, and hopefully - in the client-preview. Without side-by-side, preview will only show graphs of one version.

Bawolff set Security to Software security bug.Oct 18 2017, 9:39 AM
Bawolff added a project: acl*security.
Bawolff changed the visibility from "Public (No Login Required)" to "Custom Policy".

Some of the things I'm going to mention also apply to the existing vega 2 graphs

Applying to the current vega 2 wrapper

  • For CORS, in the interest of paranoia, could we set the origin parameter to be * instead of the actual origin. This wasn't available when the wrapper was originally written, but it extra ensures that cross-domain requests won't use cookies.
  • Currently the title parameter is checked to make sure it doesn't have a | (pipe) character in it. We also need to check for \x1f (That is ASCII control character UNIT SEPARATOR) as the api also uses that to separate titles. Alternatively we could validate with mw.Title
  • wikititle:/// is supposed to prevent query paramters from being used, however it could probably be bypassed if they are percent encoded due to T96274 (e.g. https://en.wikipedia.org/wiki/Main_Page%3faction=history%26curid=2120 is interpreted by our servers incorrectly )

For vega 3 library

So mostly this appears good. The same url restrictions that are used with vega2 would of course have to be used here. There are some concerning features though:

  • bind.element can be used to place text boxes in arbitrary parts of the page (with the data readable by vega). This could maybe be used to trick users
  • data.on.source - vega supports capturing various events. Including keyup and keydown, even for arbitrary dom elements outside of the vega drawing. This could potentially be used to make a keylogger in vega.
  • expressions are allowed to access the global screen js object. There's some privacy implications of this, revealing the users screen size. But its minor, so probably ok, not 100% sure.
  • This is a very large library. Well it looks well made, I certainly haven't manually audited all 35,593 lines

With that in mind, I would like to propose a sandboxing approach. How feasible would it be to place the vega output into an iframe like <iframe sandbox="allow-scripts allow-top-navigation" frameborder=0 width=500 height=500 srcdoc="vega graph here"> . I think that would address concerns about vega potentially spying on events for dom nodes unrelated to it, and also make the entire thing much more robust from a security prespective. (I think IE might need to do something different than srcdoc but that work arounds for it exists)

There's one other issue: fill and stroke - appear to support url(http://foo.com/bar.svg#pattern) which may allow loading external resources (in violation of the privacy policy). I haven't managed to fully reproduce this though so I'm unsure. If it was possible to load an external resource that way, that would be bad.

There's one other issue: fill and stroke - appear to support url(http://foo.com/bar.svg#pattern) which may allow loading external resources (in violation of the privacy policy). I haven't managed to fully reproduce this though so I'm unsure. If it was possible to load an external resource that way, that would be bad.

Ok, so I tracked down fill using external resources (Surprisingly difficult to find docs on this). Seems like firefox is the only one who really supports this, and only for resources from the same origin https://web.archive.org/web/20120318020837/https://developer.mozilla.org/web-tech/2008/10/10/svg-external-document-references/ (Yes, there's so little docs on this I had to go to archive.org to find the feature announcement :s)

But basically, if vega is in svg mode (canavas mode is not affected) you can make arbitrary GET requests to the same origin, in firefox. Since this is restricted to the same origin, its much less of a privacy issue. Could still be used as a data-exfiltration method via caching headers, but that's the same as all the other url access stuff.

Also the sandbox approach suggested above would make it not run in the same origin, which in theory should totally disable that feature of svgs (yay!).

Which brings me to another point - if we do the sandbox thing, it will mean all AJAX requests will be CORS, even to the local domain. I think that's acceptable (arguably perhaps even better than "Treat-as-untrusted" although we may still want that header hack in case any browsers don't support the sandbox).

@Bawolff I did a small standalone test with an iframe and strict CSP rules (no-eval). The sample actually sends Vega data from the main window into the frame via the window.postMessage(), but it could be done via the URL param as well. The postmessage passing would be useful for page save preview, whereas the url-param case is good when iframe can load the graph directly via an api call.

Code: https://github.com/nyurik/csp-vega-test
Try it at: http://sophox.org/csp/sandbox/index.html

P.S. Doing it via iframe means that the iframe src must point to some endpoint that will serve the needed html, javascript, and CSS. Which means the core MW or custom API may need to support this usecase. The iframe does not need to load the entire MW JS codebase. At the minimum, it should load:

  • some tiny HTML, which might actually be static
  • some loader that initailizes Vega with the graph data
  • the entire Vega library (vega, vegalite, ...) - minified or not depending on the debug= param

BTW, the same method can be used even for user-submitted JavaScript - e.g. d3 visualizations - iframe sandbox should prevent it from accessing other domains (custom iframe CSP?), and by virtue of sandboxing - perform any other undesirable actions. Moreover, it might even make sense to put the entire content portion into the iframe in the future.

P.S. Doing it via iframe means that the iframe src must point to some endpoint that will serve the needed html, javascript, and CSS. Which means the core MW or custom API may need to support this usecase. The iframe does not need to load the entire MW JS codebase. At the minimum, it should load:

  • some tiny HTML, which might actually be static
  • some loader that initailizes Vega with the graph data
  • the entire Vega library (vega, vegalite, ...) - minified or not depending on the debug= param

BTW, the same method can be used even for user-submitted JavaScript - e.g. d3 visualizations - iframe sandbox should prevent it from accessing other domains (custom iframe CSP?), and by virtue of sandboxing - perform any other undesirable actions. Moreover, it might even make sense to put the entire content portion into the iframe in the future.

Putting a doc in the src parameter is one method but an alternative that might be easier is the srcdoc attribute (some issues on older MSIE but i believe there are work arounds)

BTW, the same method can be used even for user-submitted JavaScript - e.g. d3 visualizations - iframe sandbox should prevent it from accessing other domains (custom iframe CSP?), and by virtue of sandboxing - perform any other undesirable actions. 

Indeed there are possibilities there. Ive sometimes heard people discuss using that approach to support scripted svgs (to add support to flash applet type support to wikimedia). However I wouldnt want this to be our only layer of defense when dealing with potentially malicious content, but more one part of the defense in case another layer breaks.

@Bawolff are there any CSP issues with srcdoc? In my example, parent can have a different CSP rules than iframe. This way you can enable unsafe-eval in the frame, without allowing it in the parent.

@Bawolff, yes, I just checked, the srcdoc attribute does not allow you to relax CSP rules. So if the site blocks unsafe-eval, <iframe srcdoc=" ... <meta content-security-policy=...>"> cannot relax it. Specifying the src= allows it because the CSP comes via the header for the different HTML page.

@Bawolff are there any CSP issues with srcdoc? In my example, parent can have a different CSP rules than iframe. This way you can enable unsafe-eval in the frame, without allowing it in the parent.

Well ideally we would not have unsafe-eval on either...but its not like we're removing unsafe-eval on WMF websites anytime soon.

Bawolff claimed this task.

so I think we've done everything on our end. See my comment above for my comments on the library - please address those things before using vega 3. Let me know if you have any questions.

@Bawolff is there anything in this ticket that is sensitive? There is a discussion about GSOC student tackling the Vega 3 upgrade, and they would need access to this discussion.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".May 8 2019, 3:45 PM

@Bawolff I was re-reading our notes in light of the resumed interest in this project. Assuming we are not yet rolling out the sandbox approach per our discussion, I have began looking closely at the issues you raised:

  • bind.element can be used to place text boxes in arbitrary parts of the page (with the data readable by vega). This could maybe be used to trick users

Essentially this allows any <div> to become a container for input fields. This is very useful for creating interactive graphs like job voyager -- that input field at the bottom could actually be placed somewhere else in the wiki page, with a nice styling around it, so that users can type in a job name and the graph would show just that. We could for now simply throw an error if the vega graph contains any signals.bind.element values, or we could even just delete them -- Vega will place those inputs next to the graph itself (like the example above).

  • data.on.source - vega supports capturing various events. Including keyup and keydown, even for arbitrary dom elements outside of the vega drawing. This could potentially be used to make a keylogger in vega.

How dangerous is this, considering that 1) AFAIK, login happens in a separate window, not while Vega is loaded, and 2) Vega has no access to the outside hosts, thus it cannot transmit that data anywhere. I might of course be missing some other case?

  • expressions are allowed to access the global screen js object. There's some privacy implications of this, revealing the users screen size. But its minor, so probably ok, not 100% sure.

true, but same question - if Vega cannot transmit this info anywhere, is this a concern?

  • This is a very large library. Well it looks well made, I certainly haven't manually audited all 35,593 lines

heh, and it just keeps growing :) Luckily Vega has now been in other high profile products, like Elastic's Kibana, so there are many more eyes on it now doing security auditing.