Page MenuHomePhabricator

Modern Event Platform: Stream Intake Service: EventGate security review
Closed, ResolvedPublic

Description

I've filled out the security review form (below) as it relates to the EventGate codebase and service(s), but what we are mostly looking for is a security review of our usage of AJV:

We are proceeding with a new nodejs based stream intake service (EventGate) in T206815. We are planning to use Ajv to do JSONSchema event validation. "Ajv generates code using doT templates to turn JSON Schemas into super-fast validation functions".

We have control over the schemas that will be used by this service, and there will be restrictions on what JSONSchema features can and cannot be used in our schemas. Even so, the fact that this service will look up JSONSchemas based on incoming (user supplied) URIs means that we need to be very careful about what we ask Ajv to do. We think that by restricting the URIs from which this service is allowed to download JSONSchemas we can avoid potential security issues with Ajv code generation, but we should do a security review of this plan with security engineers to see what they think.

Project Information

Description of the tool/project

EventGate takes in JSON events via HTTP POST, validates against a JSONSchema and then produces them to a pluggable destination (usually to Kafka).

Description of how the tool will be used at WMF

EventGate will replace our usages of EventLogging for Analytics and the production EventLogging EventBus service, allowing us to unify our event intake systems.

Dependencies

None.

Has this project been reviewed before?

No.

Working test environment

With nodejs 10 and npm installed:

git clone https://github.com/wikimedia/eventgate.git
cd eventgate
npm test

Post-deployment

Analytics Engineering & Core Platform - Andrew Otto

TODOs from Security Review Comment

In T208251#4936690, @sbassett wrote:

Vulnerable/Outdated JavaScript Packages

  • swagger-ui < 3.4.2 - WON'T FIX: needs fixed in service-template-node
  • lodash <= 4.17.5
  • cassandra-uuid
  • eslint-config-wikimedia
  • eslint-plugin-jsdoc
  • jquery 1.8.0.min - WON'T FIX: needs fixed in service-template-node via swagger-ui update
  • jquery 1.7.1 (clarinet/test/lib/jquery.js) - WON'T FIX: test depenency
  • jquery 1.9.1 (domino/test/fixture/jquery-1.9.1.js) - WON'T FIX: test depenency
  • jquery 2.2.0 (domino/test/fixture/jquery-2.2.0.js) - WON'T FIX: test depenency

DoS Vectors

Proxy Issues

  • there are potential issues around this app requesting external schema URLs. - Restricting to controlled domains.

Security Headers

  • media-src *; img-src *; style-src *; csp directives for the x-webkit-csp and x-content-security-policy` headers might be overly-permissive within the context of a ervice-based app with a single POST endpoint. Risk: Low - Using default-self by removing csp from configs.
  • * for the access-control-allow-origin CORS header might also be verly-permissive within the context of this application. Risk: Low - Set cors: false

Non-Security Quibbles

  • I'm not sure why dist/init-scripts/sysvinit.erb has an erb extension when it ooks to be a shell script. - WON'T FIX: needs fixed in service-template-node

Event Timeline

Ottomata created this task.Oct 29 2018, 5:59 PM
Ottomata triaged this task as Normal priority.

Adding the security folks.

I agree that code generation based on schemae can potentially be problematic, but we should fine with a simple setup of not declaring a proxy for the service. In production, if an entity wants to contact the outside world it needs to go through our proxies. If we don't configure the proxy list for the service, any requests that try to connect to the outside world would simply time out. that leaves us with the possibility of only retrieving schemae that are on our servers and assuming that we can trust these seems like a safe thing to do.

Ottomata moved this task from Backlog to Next Up on the EventBus board.Dec 5 2018, 10:02 PM
Ottomata removed Ottomata as the assignee of this task.Dec 5 2018, 10:10 PM
Nuria removed the point value for this task.Dec 5 2018, 10:19 PM
Nuria added a subscriber: chasemp.

Putting this on security's radar, @chasemp Please let us know the best way to drive this work

@charlotteportero do we have everything we need here to assign this at the next meeting? I know Analytics is hoping for movement this quarter

charlotteportero closed this task as Declined.Jan 8 2019, 12:37 PM

For Security work, we need all the information included in the Security Reviews form provided in the task or we will decline the task. If you have any questions, contact Charlotte Portero.

@charlotteportero I don't think any of us knew there was a security review form. Can you link to it please?

Hey @Ottomata -

Here's some current documentation for our security review process and here's the Phab request form. We're currently revamping the process a bit and updating the documentation with the goal of socializing it a bit better in the future.

Ottomata reopened this task as Open.Jan 8 2019, 3:56 PM
Ottomata updated the task description. (Show Details)
Ottomata renamed this task from T206785: Modern Event Platform: Stream Intake Service: AJV usage security review to Modern Event Platform: Stream Intake Service: AJV usage security review.

Ok thanks @sbasset. I've brought the form template over to this task and filled it out. @charlotteportero let me know if I'm missing anything.

Adding the security folks.

I agree that code generation based on schemae can potentially be problematic, but we should fine with a simple setup of not declaring a proxy for the service. In production, if an entity wants to contact the outside world it needs to go through our proxies. If we don't configure the proxy list for the service, any requests that try to connect to the outside world would simply time out. that leaves us with the possibility of only retrieving schemae that are on our servers and assuming that we can trust these seems like a safe thing to do.

Just to add my 2 cents (I just found out about this on IRC). This does not really address the insider threat, namely a previously trusted person either conducting a DoS by producing events with bogus URIs for schemas or malformed URIs (probably under a server they somehow got access to) trying to exploit a bug. The lockdown should be more than just "no external networks", aka some egress network policy allowing just kafka+schema server and maybe even a whitelist of allowed schema servers. Using HTTPS for the schema server can also help

sbassett added a comment.EditedJan 14 2019, 4:21 PM

@Ottomata -

The Security-Team should be able to get a review scheduled for this soon. Just a few initial questions/observations, in addition to some of the concerns already being brought up within other comments:

  1. We'd want to have a look at the eventgate code as well as any additional dependencies within package.json as opposed to just Ajv, at bare minimum scanning for existing vulnerabilities and any obviously-dangerous functionality.
  2. Given some of the recent unpleasantness with npm, has there been any plan for Analytics to host their own npm registry, only allowing vetted modules to be installed for various wikimedia Node applications? From various conversations I've had and doc I've read, it seems that having local repositories (e.g. apt) of vetted packages/modules are simultaneously 1) expected 2) not enforced in any way for things like npm, pip, etc. I'd imagine this is something the Security-Team would want to begin calling out within our reviews, as again, this seems a fairly lax policy within the context of wikimedia application development.
Ottomata added a comment.EditedJan 14 2019, 5:49 PM

We'd want to have a look at the eventgate code as well as any additional dependencies

Ok!

has their been any plan for Analytics to host their own npm registry

Analytics is not planning on doing anything like this. If we were to do this, I think it should be a larger effort between RelEng and the SRE/Service Operations team.

I think it should be a larger effort between RelEng and the SRE/Service Operations team.

That's a completely fair point. I just don't think this has been prioritized in any way from what I've personally seen/heard :/

@Ottomata et al - just fyi, this is officially in-progress and I hope to have a review completed either just before or just after all-hands.

Awesome, thank you!

Update: review will be posted here by Friday (2/8/2019) at the latest.

Security Review Summary - February 2019
Overall, this looks pretty good to me. Issues detailed below:

Vulnerable/Outdated JavaScript Packages
As found via npm audit, retire, npm-dview, https://nvd.nist.gov/, https://snyk.io/vuln, etc:

  • swagger-ui < 3.4.2
    • Outdated version (old: 2.2.8, new: 3.20.6 [current project, not wmf github fork])
    • Vulnerabilities: XSS
    • Risk: Medium
  • lodash <= 4.17.5
  • cassandra-uuid
    • Outdated version (old: 0.0.2, new: 0.1.0)
    • Risk: Low
  • eslint-config-wikimedia
    • Outdated version (old: 0.5.0, new: 0.10.1)
    • RIsk: Low
  • eslint-plugin-jsdoc
    • Outdate version (old: 3.15.1, new: 4.1.0)
    • Risk: Low
  • jquery 1.8.0.min (swagger-ui/lib/jquery-1.8.0.min.js, swagger-ui/dist/lib/jquery-1.8.0.min.js)
  • jquery 1.7.1 (clarinet/test/lib/jquery.js)
  • jquery 1.9.1 (domino/test/fixture/jquery-1.9.1.js)
  • jquery 2.2.0 (domino/test/fixture/jquery-2.2.0.js)

DoS Vectors

  • Ajv
    • ReDoS - open issue
    • Risk: Medium
    • Mentioning this since compileAsync is used within lib/EventValidator.js and seems to suffer from the same issue. For example, building off the exploitable code on the linked github issues page with something like:
const genstr = (len, chr) => {
    var result = ""; 
    for (i=0; i<=len; i++) {
        result += chr;
    }   
    return result;
}

const loadSchema = () => { };

var Ajv = require('ajv');
var ajv = new Ajv({ loadSchema: loadSchema }); 

var validate = ajv.compileAsync({
    "type": "object",
    "properties": {
        "foo": { "type": 'string',
            "oneOf": [
                {"pattern": genstr(60000, "if(") +"x" +  genstr(60000, ")")  }
            ]}  
    }   
}).then((val) => {
    console.log(val);
});

I can hang the node process for several minutes. However, there is a sweet spot, where if enough garbage JSON is passed as the schema to compileAsync, it just fails with Error: schema is invalid: data.properties['foo'].oneOf[0].pattern should match format "regex".

Proxy Issues

  • As originally noted by @mobrovac, there are potential issues around this app requesting external schema URLs. I suppose this would come down to the actual use-cases of the app: if all JSON schemas will be retrieved locally, then this wouldn't be all that risky IMO, provided these schemas were being vetted in some (minimal) way. From the comment blocks within lib/EventValidator.js, this doesn't seem like it will be the case. Even if the app is only interested in http://json-schema.org/ for now as an external repository for schema documents (note: my.schemas.org as used as an example within the comment blocks in lib/EventValidator.js appears to redirect to a questionable ad site) this is still risky. As noted within the Ajv README:
Ajv treats JSON schemas as trusted as your application code. This security model is based on the most common use case, when the schemas are static and bundled together with the application.

The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/refs/json-schema-secure.json - as a means of additional security for untrusted schemas. This sounds like a good recommendation, though this schema seems oddly absent via an install with npm i.

Finally, I'm a bit leery of Ajv's addKeyword and addFormat functions. Although these don't currently appear to be used within EventGate, we should probably plan to keep it that way, since uses like this seem potentially problematic.

Risk: Unknown (though probably at least Medium)

Security Headers

  • media-src *; img-src *; style-src *; csp directives for the x-webkit-csp and x-content-security-policy headers might be overly-permissive within the context of a service-based app with a single POST endpoint. Risk: Low
  • * for the access-control-allow-origin CORS header might also be overly-permissive within the context of this application. Risk: Low

npm Deployment

  • Not really an item for consideration here, as it's a much larger issue, but from what I can tell, the Wikimedia Node app deployment process typically involves running npm i, pulling in all dependent packages and then deploying everything as a bundled app. If this is completely wrong, feel free to let me know. Regardless, with at least a few high-visibility Wikimedia Node apps, creating a vetted repo/registry (like we do for apt and to be compliant, at least in spirit, with the Installing software portion of L3) should probably happen. Risk: Unknown

Non-Security Quibbles

  • I'm not sure why dist/init-scripts/sysvinit.erb has an erb extension when it looks to be a shell script. Risk: None

Thanks so much for the review! I'll respond to the AJV/schema stuff first.

if all JSON schemas will be retrieved locally, then this wouldn't be all that risky IMO, provided these schemas were being vetted in some (minimal) way. From the comment blocks within lib/EventValidator.js, this doesn't seem like it will be the case.

Schemas won't necessarily be retrieved locally (all many will, especially for the non-analytics use case), but all will be retrieved from domains that we control. I just added a config for our EventGate implementation that disallows non-relative URIs, which means that events won't be allowed to provide an absolute URI (one without a protocol scheme). Relative URIs (ones with the path portion only, e.g. /mediawiki/revision/create/0.0.3) will be prefixed with our configured schema_base_uris, which will be something like file:///path/to/event-schemas or http:///schemas.eqiad.wmflabs.

EventGate is meant to be a generic tool, useable outside of WMF, so it allows for users to implement their own validate functions that can look up schemas however they like. Our implementation is more restrictive.

ReDoS - open issue

I think this is not relevant for us. Our schemas will never allow uses of oneOf like this. We may want to allow people to validate fields against certain regexes though. But since we have control of the schemas, this should be ok, yes?

The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/refs/json-schema-secure.json

Oh, cool. I'll try to build that in.

Finally, I'm a bit leery of Ajv's addKeyword and addFormat functions. Although these don't currently appear to be used within EventGate, we should probably plan to keep it that way

Ya, sounds good. I think we don't need this. There has been some discussion of adding a special schema annotation to differentiate dimension from metric fields, but we haven't flushed that out at all

As for all the other comments, I'll see what I can do! Many of them are comments about mediawiki service-template-node, but I'll see if I can resolve them here.

media-src *; img-src *; style-src *; csp directives for the x-webkit-csp and x-content-security-policy headers might be overly-permissive within the context of a service-based app with a single POST endpoint. Risk: Low

Done, set to csp: false.

for the access-control-allow-origin CORS header might also be overly-permissive within the context of this application. Risk: Low

I'm not sure what to set this to yet, and it might need to remain *. Eventually a public deployment of EventGate will be POSTed to by browsers and by mobile apps.

lodash <= 4.17.5

EventGate uses ^4.17.11

eslint-config-wikimedia
Outdated version (old: 0.5.0, new: 0.10.1)
eslint-plugin-jsdoc
Outdate version (old: 3.15.1, new: 4.1.0)

I fixed these by extending fully from eslint-config-wikimedia instead of the deprecated eslint-config-node-services.

Interesting!

The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/refs/json-schema-secure.json

@Pchelolo, @mobrovac this schema seems to require that all strings with either format or pattern specified must also specify a maxLength. Our meta field uses format for dt and pattern for id (uuid). I can add "maxLength": 26 for dt and "maxLength": 36 for id and the schema will validate as 'secure'.

Since we will (probably? for now?) only be producing new schemas to EventGate, I think it would be fine to add this to our meta field. Thoughts?

Since we will (probably? for now?) only be producing new schemas to EventGate, I think it would be fine to add this to our meta field. Thoughts?

No objection, but mediawiki.job schema request_id has a pattern too.

The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/refs/json-schema-secure.json

Done: https://github.com/wikimedia/eventgate/commit/74a3c0d249c5b803ea553a71cb17aae5223180c2

lodash <= 4.17.5

EventGate uses ^4.17.11

It does, but npm is creating a hard dependency within package-lock.json for lodash 3.10.1, for kad. If you run an npm i, you can see the affected version installed at node_modules/kad/node_modules/lodash. Updating kad's package.json should fix the issue, but I have no idea what regressions that might introduce.

sbassett added a comment.EditedFeb 8 2019, 10:23 PM

Some additional follow-up:

Relative URIs (ones with the path portion only, e.g. /mediawiki/revision/create/0.0.3) will be prefixed with our configured schema_base_uris

Sounds good. I suppose there's still the risk of the insider threat here that @akosiaris mentioned, but that seems low IMO, especially with additional validation and safety measures being put in place, as discussed on this task.

ReDoS - open issue

I think this is not relevant for us. Our schemas will never allow uses of oneOf like this. We may want to allow people to validate fields against certain regexes though. But since we have control of the schemas, this should be ok, yes?

Probably should be ok, though if I were an attacker, this would certainly be a vector I'd explore if I were able to compromise a schema or inject a malicious schema somehow. Though json-schema-secure.json might help mitigate this; I'm not entirely sure.

Done, set to csp: false.

Er, we could probably keep csp in place, but just set those directives to 'self' instead of '*' within app.js, or just remove them since default-src is 'self'. csp's not all that relevant given what EventGate does, but it shouldn't hurt anything and could provide protection from some unaccounted-for attack vector.

The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/refs/json-schema-secure.json

Done: https://github.com/wikimedia/eventgate/commit/74a3c0d249c5b803ea553a71cb17aae5223180c2

Cool, hopefully that provides a decent layer of protection around DoS-ish attack vectors.

Hey @Ottomata - just wanted to check in on the status of all this and if you needed anything else from the Security-Team. Thanks.

Heya! I still need to resolve a couple of things here: CSP and package-lock.json stuff. Its on my TODO list.

I might have a few more questions when I get to it, but for now it's all on my plate.

Sounds good, thanks.

lodash <= 4.17.5

It looks like the lodash is fixed by updating service-runner to 2.6.9. Done.

Ottomata updated the task description. (Show Details)Feb 20 2019, 9:14 PM

Change 491857 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Set cors to false for eventgate-analytics node service chart

https://gerrit.wikimedia.org/r/491857

Ottomata updated the task description. (Show Details)Feb 20 2019, 9:19 PM

Change 491857 merged by Ottomata:
[operations/deployment-charts@master] Set cors to false for eventgate-analytics node service chart

https://gerrit.wikimedia.org/r/491857

Ottomata renamed this task from Modern Event Platform: Stream Intake Service: AJV usage security review to Modern Event Platform: Stream Intake Service: EventGate security review.Feb 20 2019, 9:20 PM

@sbassett, I've moved your TODO list into the task description. I believe I've resolved all of the relevant things.

Is there anything else here? If not, should we close this?

Ottomata moved this task from Next Up to In Code Review on the Analytics-Kanban board.

@Ottomata - LGTM. Ideally, we'd get some low-priority tasks filed for the service-template-node dependency issues, but that's not an emergency. And probably something on which I can follow up.

Ottomata raised the priority of this task from Normal to High.Feb 21 2019, 5:58 PM
Nuria closed this task as Resolved.Feb 25 2019, 10:39 PM
sbassett moved this task from Backlog to Done on the Security-Team board.Tue, Jun 11, 6:27 PM