⚓ T93126 Perform a security review of graphoid

	Subject	Repo	Branch	Lines +/-
	Security fixes per review	mediawiki/services/graphoid	master	+4 -1

Status	Assigned	Task
Resolved	Yurik	T87406 Deploy Graph Extension to all Wikipedias
Resolved	Yurik	T73227 Graph extension does not have a fallback for non-JavaScript browsers
Resolved	akosiaris	T90487 Deploy graphoid service into production
Resolved	• csteipp	T93126 Perform a security review of graphoid

• GWicke created this task.Mar 18 2015, 8:02 PM

• GWicke assigned this task to • csteipp.

• GWicke raised the priority of this task from to High.

• GWicke updated the task description. (Show Details)

• GWicke added projects: Services, deprecated-security-team-reviews.

• GWicke added subscribers: • mobrovac, • GWicke, Yurik.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 18 2015, 8:02 PM

• mobrovac added projects: Blocked-on-Security, Scrum-of-Scrums.Mar 24 2015, 1:14 PM

• mobrovac set Security to None.

Liuxinyu970226 subscribed.Mar 25 2015, 8:45 AM

@mobrovac, where can I get all of the code for both mediawiki and the service itself for this? And is there a test instance setup yet (I think it's in beta)?

@csteipp The code itself is in the mediawiki/services/graphoid repo on gerrit. Graphoid is also present in MW-Vagrant, albeit its config there is outdated. It should work with the default provided one, though. There seems to be a running instance in labs - graphoid.wmflabs.org , but I couldn't deduce which project/instance it belongs to. @Yurik should know more.

@Yurik, could you please update the configs in MW-Vagrant and help @csteipp with setting it up and/or accessing a running instance (running the latest code)?

Yurik added a project: MediaWiki-extensions-Graph.Apr 6 2015, 6:21 AM

@csteipp, Per T94013, graphoid was updated to the latest tepmlate code. It is "semi-live" at graph3 wmflabs instance (part of the API proj), serving http://graphoid.wmflabs.org/wiki/Main_Page -- note that PHP code was hacked to produce two versions of each graph - the JavaScript one (first image), and corresponding PNG image from graphoid (second image).

The actual graphoid code is not set up as a service - it should be started from the shell with

$ cd /vagrant/graphoid
$ node server.js

@mobrovac, is there a doc/template on how to setup *oids in vagrant now?

Yurik mentioned this in T93881: Improve Graphoid code.Apr 6 2015, 6:29 AM

In T93126#1181423, @Yurik wrote:

@csteipp, Per T94013, graphoid was updated to the latest tepmlate code. It is "semi-live" at graph3 wmflabs instance (part of the API proj), serving http://graphoid.wmflabs.org/wiki/Main_Page -- note that PHP code was hacked to produce two versions of each graph - the JavaScript one (first image), and corresponding PNG image from graphoid (second image).

@Yurik, I'm seeing calls from http://graphoid.wmflabs.org/wiki/Main_Page to stuff like http://graphoid.wmflabs.org:19000/graph3.wmflabs.org/v1/Main_Page/50/342d2e0388da6f4b5db15e7571bd877386b05d66.png, which currently is failing.

Can you point me to the repo for that graphoid service running on 19000? And where in the extension (Ex:Graph, right?) where the calls to the service are being added? Or does the current extension not have calls to graphoid yet, in which case can you send be the diff of whatever you hacked in, so I can get a rough idea of how you're planning to set it up?

@csteipp - code is here, and you need to run the service by hand (its not enabled) on graph3 wmflabs instance (part of the API proj):

In T93126#1181423, @Yurik wrote:
The actual graphoid code is not set up as a service - it should be started from the shell with
$ cd /vagrant/graphoid
$ node server.js

Update:

I ran the nohup npm start as yurik on graph3 labs instance, so it works until the instance is rebooted.
settings contain this line:

$wgGraphImgServiceUrl = "//graphoid.wmflabs.org:19000/%1\$s/v1/%2\$s/%3\$s/%4\$s.png";

because of it, the PNG image replaces javascript canvas in case of <noscript>
In order to test it on graph3, I have hacked it to always include the PNG tag right after canvas tag - that's why it shows double graphs there.

In T93126#1184441, @Yurik wrote:

... on graph3 wmflabs instance (part of the API proj):

There's no such project in labs.

I ran the nohup npm start as yurik on graph3 labs instance, so it works until the instance is rebooted.

As you've updated the code, it now contains a script to generate init scripts. Simply run on the labs instance:

./scripts/gen-init-scripts.rb

It will place the generated scripts in dist/init-scripts. Note that it assumes the service resides in /srv/deployment/service-name/ with a configuration file in /etc/service-name/, so you might need to tweak it a little bit.

settings contain this line:
$wgGraphImgServiceUrl = "//graphoid.wmflabs.org:19000/%1\$s/v1/%2\$s/%3\$s/%4\$s.png";
because of it, the PNG image replaces javascript canvas in case of <noscript>

In order to test it on graph3, I have hacked it to always include the PNG tag right after canvas tag - that's why it shows double graphs there.

note that PHP code was hacked to produce two versions of each graph - the JavaScript one (first image), and corresponding PNG image from graphoid (second image).

These hacks and tweaks will need to disappear before we can declare the service ready for production.

is there a doc/template on how to setup *oids in vagrant now?

No, not yet. The code you've got there is ok, you just need to update it, e.g. puppet rules and templates related to the configuration file name, format and location.

In T93126#1185340, @mobrovac wrote:

In T93126#1184441, @Yurik wrote:

... on graph3 wmflabs instance (part of the API proj):

There's no such project in labs.

Mediawiki-api - graph3 instance, static IP 208.80.155.201 graphoid

I ran the nohup npm start as yurik on graph3 labs instance, so it works until the instance is rebooted.

As you've updated the code, it now contains a script to generate init scripts. Simply run on the labs instance:
./scripts/gen-init-scripts.rb
It will place the generated scripts in dist/init-scripts. Note that it assumes the service resides in /srv/deployment/service-name/ with a configuration file in /etc/service-name/, so you might need to tweak it a little bit.
settings contain this line:
$wgGraphImgServiceUrl = "//graphoid.wmflabs.org:19000/%1\$s/v1/%2\$s/%3\$s/%4\$s.png";
because of it, the PNG image replaces javascript canvas in case of <noscript>

In order to test it on graph3, I have hacked it to always include the PNG tag right after canvas tag - that's why it shows double graphs there.
note that PHP code was hacked to produce two versions of each graph - the JavaScript one (first image), and corresponding PNG image from graphoid (second image).

These hacks and tweaks will need to disappear before we can declare the service ready for production.

This is a hack inside Graph extension on betalabs to see how well the service is working, they are not part of the service. In production, wgGraphImgServiceUrl will be set to the same value, but with a different host/port.

is there a doc/template on how to setup *oids in vagrant now?

No, not yet. The code you've got there is ok, you just need to update it, e.g. puppet rules and templates related to the configuration file name, format and location.

This is a hack inside Graph extension on betalabs to see how well the service is working, they are not part of the service. In production, wgGraphImgServiceUrl will be set to the same value, but with a different host/port.

I see. That prompted me to see how the mw-vagrant role sets that parameter, and I see it doesn't at all. Please update the vagrant role to be up-to-date and functional (the graph extension should be included and its settings adjusted).

In T93126#1186953, @mobrovac wrote:

I see. That prompted me to see how the mw-vagrant role sets that parameter, and I see it doesn't at all. Please update the vagrant role to be up-to-date and functional (the graph extension should be included and its settings adjusted).

I don't think it would be possible - vagrant does not know the labs static IP's external DNS name. I can only set it up as "http://localhost:19000/...", and still keep that override in place to change the host.

Moving the discussion regarding Vagrant to T95399 so that we don't clutter the security review :)

@csteipp Need more info/help for doing the review?

Just a couple things so far,

A service like this should be setting the http headers,

X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff 
X-Frame-Options: DENY
Content-Security-Policy: default-src 'none' ;
X-Content-Security-Policy: default-src 'none' ;

(although might need to be "default-src 'self' ;", it would just need some testing.)

For the main route, it would also be a nice hardening to set

Content-Disposition: attachment; filename="whatever.png"

defaultProtocol should be https in all the examples, and in production. There should be a sanity check in the code that it's not "file:" too, just so there's no chance of opening a local file.

Possibly hard code a check that $wgGraphImgServiceUrl is either https or proto-relative?

Still looking into Vega's rendering code... but looks ok so far.

@csteipp, do you see any reasons for not setting those headers in all RB responses? We could basically set them unconditionally for anything that's proxied throug RB and/or service Varnishes.

Correct me if I got this wrong, but it looks like it's using https://github.com/Automattic/node-canvas for the actual rendering, which uses cairo for PNG generation. Most of the recent issues in cairo have been DoS'es, but we should assume it will have issues in the future.

I'm assuming this will run in production as a "graphoid" user, who will have limited rights on the system. And is this running on it's own hardware, or with other services?

Ops (I'm assuming) will keep the OS libraries like cairo up to date. Between Yuri and Services, who is going to be making sure we get security patches for vega and its dependencies, and making sure that those get tested and deployed in a reasonable amount of time?

Thank you for the review @csteipp! Very helpful.

Regarding CSP, probably something like this would work:

Content-Security-Policy: default-src 'self'; child-src 'none'; object-src 'none';

Also, adding X-WebKit-CSP with the same content might be a good idea.

In T93126#1190617, @GWicke wrote:

@csteipp, do you see any reasons for not setting those headers in all RB responses? We could basically set them unconditionally for anything that's proxied throug RB and/or service Varnishes.

@GWicke, is graphoid being proxied through restbase? If so, it wouldn't hurt to have restbase add most of those headers.

In general, I think the service template should set those headers (except content-disposition) by default, so when it's running in labs (like yurik has it setup right now), it's sending those headers. The service author can then disable individual ones if their service needs it, e.g., if they're actually producing html that rendered in an iframe that loads scripts from some other domain, and they can justify why they're doing that.

In T93126#1191076, @mobrovac wrote:
Thank you for the review @csteipp! Very helpful.

Regarding CSP, probably something like this would work:
Content-Security-Policy: default-src 'self'; child-src 'none'; object-src 'none';

I'm fine with that. It's a big step in the right direction, if we want to make that the default our services send.

Also, adding X-WebKit-CSP with the same content might be a good idea.

Do we have much old Chrome traffic? I'm not apposed to adding it, just didn't want to push my luck convincing us to add in more stuff.

In T93126#1191059, @csteipp wrote:

Correct me if I got this wrong, but it looks like it's using https://github.com/Automattic/node-canvas for the actual rendering, which uses cairo for PNG generation. Most of the recent issues in cairo have been DoS'es, but we should assume it will have issues in the future.

Yep, that's module it's using.

I'm assuming this will run in production as a "graphoid" user, who will have limited rights on the system. And is this running on it's own hardware, or with other services?

Almost no rights, to be precise. The service will run on the SCA cluster in eqiad, given that the predicted traffic is low. We already have Citoid, Mathoid and Zotero running there.

Ops (I'm assuming) will keep the OS libraries like cairo up to date. Between Yuri and Services, who is going to be making sure we get security patches for vega and its dependencies, and making sure that those get tested and deployed in a reasonable amount of time?

Security patches related to cairo will be delivered to us via the normal Debian security channels. As for the service and its node dependencies, @Yurik will keep that in check.

In T93126#1191087, @csteipp wrote:

In T93126#1190617, @GWicke wrote:

@csteipp, do you see any reasons for not setting those headers in all RB responses? We could basically set them unconditionally for anything that's proxied throug RB and/or service Varnishes.

@GWicke, is graphoid being proxied through restbase? If so, it wouldn't hurt to have restbase add most of those headers.

Not right now, but probably will, cf. T95402 . In any case, RESTBase should send them out, cf. T95443

In general, I think the service template should set those headers (except content-disposition) by default, so when it's running in labs (like yurik has it setup right now), it's sending those headers. The service author can then disable individual ones if their service needs it, e.g., if they're actually producing html that rendered in an iframe that loads scripts from some other domain, and they can justify why they're doing that.

Yup, +1 . Will change that in the template to enforce the headers.

Change 202917 had a related patch set uploaded (by Yurik):
Security fixes per review

https://gerrit.wikimedia.org/r/202917

gerritbot added a project: Patch-For-Review.Apr 8 2015, 9:00 PM

@csteipp, addressed the graphoid specific issues - https by default, extra check for protocol, added Content Disposition header. I will let template service update the other headers and will merge it.

As for $wgGraphImgServiceUrl (part of graph extension code, so technically a separate codebase) - do we really want to hardcode a check for the setting like that?

@csteipp, I have updated http://graphoid.wmflabs.org to the above security patch, but now it seems like Content Disposition gets in the way instead of helping. We do not set it on uploads, and this allows anyone to easily share an image link. With CD, the graph link starts downloading image automatically without confirmation - a very bad from user perspective.

@Yurik, if Content-Disposition gets in the way, then don't worry about setting it.

@csteipp, ok, removed

Change 202917 merged by Yurik:
Security fixes per review

https://gerrit.wikimedia.org/r/202917

Yurik mentioned this in rGGRA7d583cb65bb9: Security fixes per review.Apr 8 2015, 9:33 PM

@csteipp, I think it makes sense to proxy graphoid.

I was just looking into adding all of those headers by default in restbase, but noticed that Content-Security-Policy in particular will need a good amount of tweaking to allow loading of styles and media from our servers. Should we instead start with default-src 'none'; media-src '*'; img-src '*'; style-src '*'? Rationale is: start with locked-down default, then open up things we need. Of course, locking down the domain more would be nice (and very much possible), but it does add complexity and needs to be done right.

The spec also only describes CSP headers as applying to HTML and SVG documents, so it would seem that setting them for other explicit content types like application/json would not have any effect.

See https://github.com/wikimedia/restbase/pull/228 for the PR & T95443: Make sure we correctly set CSP & other security headers in RB for the corresponding task.

@GWicke, what functionality would graphoid get when proxied through restbase? Unless there is some useful functionality/stability gain, I feel that multiple layers could affect stability and complexity of the system, and increase latency. Plus, this builds extra component inter-dependencies which we were trying to remove with SOA.

In T93126#1193061, @GWicke wrote:

@csteipp, I think it makes sense to proxy graphoid.

I was just looking into adding all of those headers by default in restbase, but noticed that Content-Security-Policy in particular will need a good amount of tweaking to allow loading of styles and media from our servers. Should we instead start with default-src 'none'; media-src '*'; img-src '*'; style-src '*'? Rationale is: start with locked-down default, then open up things we need. Of course, locking down the domain more would be nice (and very much possible), but it does add complexity and needs to be done right.

How about setting default-src to the list of all of our domains right away ? Could avoid us annoyances while still keeping us safe.

The spec also only describes CSP headers as applying to HTML and SVG documents, so it would seem that setting them for other explicit content types like application/json would not have any effect.

From what I gather, the reason for that is that these headers are mostly meant to instruct browsers on first / principal page load. I am not sure whether these headers make any difference if they're received asynchronously (via XHR, e.g.)

In T93126#1193390, @mobrovac wrote:

In T93126#1193061, @GWicke wrote:

The spec also only describes CSP headers as applying to HTML and SVG documents, so it would seem that setting them for other explicit content types like application/json would not have any effect.

From what I gather, the reason for that is that these headers are mostly meant to instruct browsers on first / principal page load. I am not sure whether these headers make any difference if they're received asynchronously (via XHR, e.g.)

Right, they should only apply if the content returned from the service gets sniffed by the browser to html, which is why I think we really want to set them to 'none' by default, unless we're planning to return content that is supposed to be interpreted by the browser as html? As in, we iframe some content from restbase, or send the browser directly to a restbase url (obviously the frontend is going to parse the restbase return into html and add it to the page content, but CSP should have no effect in that case)

We return some content as HTML, and want it to be interpreted as HTML, including image / media loading and (safe) styles, but excluding scripts and objects.

In the patch, I return the CSP headers for anything that's not explicitly application/json.

In T93126#1193090, @Yurik wrote:

@GWicke, what functionality would graphoid get when proxied through restbase?

Some things it can provide:

discoverability in the REST API with docs etc
security headers
authentication for private wikis
metrics for backend requests, including error metrics
caching / storage

In T93126#1195967, @GWicke wrote:

discoverability in the REST API with docs etc

Agree, its good to have a documentation single go-to point, but we can do it via other means than to proxy all requests, for example register service with the restbase's help system and have a well known URL to get service capabilities info.

security headers

Shouldn't this be part of the service template? Each service could potentially change those, so it will be a nightmare to manage at a central location. Also, Varnish can do this just as well.

authentication for private wikis

In the future, Graphoid will need to call private wikis (getting page props via api), and would need to passthrough current user's credentials (@csteipp, is this secure?), so there is a potential use for this, but not for public wikis. Currently Vega does not support this, see T90526.

metrics for backend requests, including error metrics

Graphoid does its own metrics, including internal counters/timers. Not sure we want to duplicate that. Besides, this should be part of the service template, not proxy, as service might need to change/extend it.

caching / storage

Are you replacing Varnish? Because if Graphoid returns images, Varnish is perfect for caching those, and we have a well established way to invalidate cache, set caching time, and manually reset things. Is restbase replacing all that? Otherwise we are dealing with two caching systems, each needs to be invalidated, etc.

So unless you are replacing Varnish, restbase might provide value for private wikis implementation, but I don't understand the reason to complicate it otherwise.

The PR for setting up CSP headers in service-template-node is available here.

Can we thus consider this task resolved, @csteipp ?

Also FTR, @Yurik and I discussed the pros / cons of proxying through RB on IRC. We agreed that simple proxying (without caching / storage) would be useful & keep things simple.

Ricordisamoa subscribed.Apr 16 2015, 3:00 PM

Yurik moved this task from Backlog to Graphoid on the MediaWiki-extensions-Graph board.Apr 16 2015, 4:03 PM

@csteipp, @Yurik says that the service only exposes PNG images, so we won't have to deal with SVGs after all.

In T93126#1213485, @GWicke wrote:

@csteipp, @Yurik says that the service only exposes PNG images, so we won't have to deal with SVGs after all.

If SVGs are not supported, Graphoid won't be able to effectively replace manually-generated charts.
Using Vega+D3 to render PNGs is a crippled system.

@Ricordisamoa, afaik the service provides a server-side fall-back for clients that don't support client-side canvas drawing.

Ricordisamoa added a project: Graphoid.Apr 16 2015, 9:08 PM

Yurik removed a project: MediaWiki-extensions-Graph.Apr 16 2015, 9:19 PM

In T93126#1213485, @GWicke wrote:

@csteipp, @Yurik says that the service only exposes PNG images, so we won't have to deal with SVGs after all.

I hope you're wrong :)
T96309: Make Graphoid return SVGs

@Ricordisamoa, my immediate goal is to serve PNGs to non-JS browsers. Second goal - deliver PNGs to all anonymous users (they are much faster than sending all the data and rendering JS code to the client). And afterwards, we can switch PNGs to SVGs for js-supporting browsers (since vega already supports it on the server side). We would than have to implement SVG sanitization just to be on the safe side.

In T93126#1214321, @Yurik wrote:

@Ricordisamoa, my immediate goal is to serve PNGs to non-JS browsers. Second goal - deliver PNGs to all anonymous users (they are much faster than sending all the data and rendering JS code to the client). And afterwards, we can switch PNGs to SVGs for js-supporting browsers (since vega already supports it on the server side). We would than have to implement SVG sanitization just to be on the safe side.

Thank you. But why send SVGs to js-supporting browsers only?

@Ricordisamoa @Yurik, please move the conversation about SVGs to T96309, it's more pertinent there than here, given that in the first deployed version of Graphoid there will be no SVG support. Thanks :)