Page MenuHomePhabricator

Security Review For Kask
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

Kask is a multi-master replicated opaque key-value data store.

Description of how the tool will be used at WMF

Kask will be deployed for multi-datacenter storage/access of MediaWiki user sessions.

Dependencies

Kask is implemented in Go, and sources its code dependencies entirely from what is packaged in Debian Stretch. It's direct dependencies are (as Debian package name/versions):

  • golang 2:1.7~5
  • golang-github-gocql-gocql-dev 0.0~git20171009.0.2416cf3-3~bpo9+1
  • golang-gopkg-yaml.v2-dev 0.0+git20160928.0.a5b47d3-2
  • golang-github-prometheus-client-golang-dev 0.8.0-1

Additionally, Kask uses Apache Cassandra (v3.11.2 or higher) for underlying storage.

Has this project been reviewed before?

No.

Working test environment

Since Kask relies on Debian for dependency management, it is easiest to build it in a Debian Stretch environment (host, container, chroot, etc). Build instructions can be found in the project README.

Built versions of Kask can be found as Docker images in the WMF registry.

Finally, a running installation exists on deployment-sessionstore01.deployment-prep.eqiad.wmflabs, (http://deployment-sessionstore01.deployment-prep.eqiad.wmflabs:8080/sessions/v1).

Post-deployment

Core Platform Team / @Eevans

Event Timeline

Eevans created this task.Apr 1 2019, 7:48 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 1 2019, 7:48 PM
sbassett triaged this task as Normal priority.Apr 2 2019, 7:20 PM

@Eevans, @Clarakosi - Thanks for the review request. Given that golang is a little outside of our typical mw core/extensions and web application reviews, the Security-Team will need to think about what will be the best solution for this request. It may end up being a combination of a more focused internal review (centered around static analysis) with the potential for an additional external/vendor review. We'll keep you posted.

@Eevans @Clarakosi This one is a bit out of our wheelhouse and something we can provide only a cursory review of. I'd like to propose the following: The Security team can perform a basic security review of this but I would recommend a secondary review by our 3rd party partners at BishopFox. That said, the 2nd review would incur some cost but I'm not sure how much. Do you have any budget available for something like that??

@Eevans @Clarakosi This one is a bit out of our wheelhouse and something we can provide only a cursory review of. I'd like to propose the following: The Security team can perform a basic security review of this but I would recommend a secondary review by our 3rd party partners at BishopFox. That said, the 2nd review would incur some cost but I'm not sure how much. Do you have any budget available for something like that??

I imagine the answer to that will depend on how much the 3rd party review costs; How would we go about getting a better idea?

/cc @Fjalapeno, @WDoranWMF

Eevans moved this task from Backlog to In-Progress on the User-Eevans board.Apr 25 2019, 3:48 PM
mobrovac added a subscriber: mobrovac.

@JBennett @sbassett would it be a fair to have you start the cursory review while we figure out the budgeting for a thorough review?

Also, since you mention golang not being your forte and I imagine that with the move to k8s this will not be the last time a golang service is in need of a security review, is there something we can do to help in that regard?

@mobrovac et al - we should be able to get some initial analysis scheduled during our security review scrum tomorrow (April 30th) and follow up here. Again, this will most likely be a surface-level analysis of whatever is in master right now (which appears to be pretty stable at this point in time) and we should continue to explore supplemental review options from external vendors.

@JBennett How do you want to go about getting a quote from BishopFox? I'm happy to work with them to figure that out if you want to introduce me and that works for you.

Security Review Summary - May 2019
Overall, the Kask application looks good, though I did not perform what I would consider an exhaustive analysis - see my findings below. If we were to solicit additional security analysis, I would recommend that the vendors focus specifically upon the http write (post, delete) methods of the service in addition to its authentication layers and perhaps performance.

Vulnerable Packages
I'm not seeing any CVEs or other reported vulnerabilities for any of the following dependencies:

  • golang-github-gocql-gocql-dev
  • golang-gopkg-yaml.v2-dev
  • golang-github-prometheus-client-golang-dev
  • golang-golang-x-tools

There is an XSS vulnerability for prometheus in versions <2.7.1, but that does not appear related to the go client. I also assume the go dependencies will be installed via apt when deployed (as described within the README) as opposed to being imported within various files. Go's standard library should be fine, but given go's usage of github.com and other external sites as package repositories, we'd need to proxy out of our production servers and ensure that the following are trusted supply chains:

  • github.com/prometheus: kask.go, http.go
  • github.com/gocql: http.go, storage.go (and related tests)
  • gopkg.in/yaml.v2: config.go

Security Headers
Most of these headers aren't relevant for a private, restful http service where I'm assuming access will be tightly controlled and limited to trusted clients (and why CSRF protections most likely aren't an issue here.) Though I would recommend adding support for a Strict-Transport-Security header, especially since TLS seems to be optional for the application.

TLS/SSL
According to documentation and related Phabricator posts, TLS will be enabled for both Cassandra and the Kask service and was a key consideration early on in the development of the service. TLS is configured within config.yaml.

Performance and DoS Vectors
I did not get a chance to test this in depth, nor am I extremely familiar with go's http server or prometheus, but it appears that @akosiaris has been testing production performance scenarios fairly thorughly in T220401#5128786 and that so far there appear to be no issues.

Policy Compliance
I am assuming data will live in Cassandra and within relevant logs in accordance with Wikimedia's existing data retention guidelines.

Static Analysis Findings
I ran gosec against the source and it came back with what appear to be mostly false positives, but I thought I'd post them here for completeness' sake:

  • config.go:62 - G304: Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
    • ioutil.ReadFile(filename)
  • storage.go:89 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('INSERT INTO "%s"."%s" (key, value) VALUES (?,?) USING TTL ?', s.Keyspace, s.Table)
  • storage.go:97 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('SELECT value, TTL(value) as ttl FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)
  • storage.go:104 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('DELETE FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)
  • http.go:159 - G104: Errors unhandled. (Confidence: HIGH, Severity: LOW)
    • w.Write(value.Value)

DAST Findings
It might be worthwhile to run ZAP (which has OpenAPI support) against a development instance of this service at some point, though I did not have time to do this prior to my leave. It should be fairly trivial for someone to run a ZAP baseline scan via ZAP's latest, stable docker build.

This is awesome @sbassett; Thanks for looking it over! I left some comments inline:

Security Review Summary - May 2019
Overall, the Kask application looks good, though I did not perform what I would consider an exhaustive analysis - see my findings below. If we were to solicit additional security analysis, I would recommend that the vendors focus specifically upon the http write (post, delete) methods of the service in addition to its authentication layers and perhaps performance.
Vulnerable Packages
I'm not seeing any CVEs or other reported vulnerabilities for any of the following dependencies:

  • golang-github-gocql-gocql-dev
  • golang-gopkg-yaml.v2-dev
  • golang-github-prometheus-client-golang-dev
  • golang-golang-x-tools

There is an XSS vulnerability for prometheus in versions <2.7.1, but that does not appear related to the go client. I also assume the go dependencies will be installed via apt when deployed (as described within the README) as opposed to being imported within various files. Go's standard library should be fine, but given go's usage of github.com and other external sites as package repositories, we'd need to proxy out of our production servers and ensure that the following are trusted supply chains:

  • github.com/prometheus: kask.go, http.go
  • github.com/gocql: http.go, storage.go (and related tests)
  • gopkg.in/yaml.v2: config.go

Yes, we've made a point of sourcing dependencies entirely from Debian (Stretch, for now); We're relying on Debian for release management and security support for our dependency graph. If we reach a point where we need something outside of what is in Debian, we'll backport or package as necessary, and push that work upstream (we have lots of Debian Developers at the Foundation :)).

Security Headers
Most of these headers aren't relevant for a private, restful http service where I'm assuming access will be tightly controlled and limited to trusted clients (and why CSRF protections most likely aren't an issue here.) Though I would recommend adding support for a Strict-Transport-Security header, especially since TLS seems to be optional for the application.

TLS is optional here in the sense that it can be enabled or disabled entirely. If enabled, it is strictly TLS encrypted, HTTP is not supported. Is HSTS still applicable in this situation?

TLS/SSL
According to documentation and related Phabricator posts, TLS will be enabled for both Cassandra and the Kask service and was a key consideration early on in the development of the service. TLS is configured within config.yaml.
Performance and DoS Vectors
I did not get a chance to test this in depth, nor am I extremely familiar with go's http server or prometheus, but it appears that @akosiaris has been testing production performance scenarios fairly thorughly in T220401#5128786 and that so far there appear to be no issues.
Policy Compliance
I am assuming data will live in Cassandra and within relevant logs in accordance with Wikimedia's existing data retention guidelines.

No PII should ever by logged, but session data is persisted in Cassandra. Sessions expire well before the max retention period, but that data is not immediately removed from storage, it's GC'd by compaction (as dictated by the compaction algorithm in use) after a moratorium (defaults to 10 days). I would be really surprised if that process were to drag out past 90 days (or anything even close to 90 days), but it might be worth putting a pin in this, and conduct an audit at some point to verify just how long they do hang around.

Static Analysis Findings
I ran gosec against the source and it came back with what appear to be mostly false positives, but I thought I'd post them here for completeness' sake:

  • config.go:62 - G304: Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
    • ioutil.ReadFile(filename)
  • storage.go:89 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('INSERT INTO "%s"."%s" (key, value) VALUES (?,?) USING TTL ?', s.Keyspace, s.Table)
  • storage.go:97 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('SELECT value, TTL(value) as ttl FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)
  • storage.go:104 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('DELETE FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)
  • http.go:159 - G104: Errors unhandled. (Confidence: HIGH, Severity: LOW)
    • w.Write(value.Value)

Yes, these look like false positives at first glance, but I'll have a closer look and comment on each of them specifically.

DAST Findings
It might be worthwhile to run ZAP (which has OpenAPI support) against a development instance of this service at some point, though I did not have time to do this prior to my leave. It should be fairly trivial for someone to run a ZAP baseline scan via ZAP's latest, stable docker build.

I'll look into this.

[ ... ]
Policy Compliance
I am assuming data will live in Cassandra and within relevant logs in accordance with Wikimedia's existing data retention guidelines.

No PII should ever by logged, but session data is persisted in Cassandra. Sessions expire well before the max retention period, but that data is not immediately removed from storage, it's GC'd by compaction (as dictated by the compaction algorithm in use) after a moratorium (defaults to 10 days). I would be really surprised if that process were to drag out past 90 days (or anything even close to 90 days), but it might be worth putting a pin in this, and conduct an audit at some point to verify just how long they do hang around.

See: T222990: Audit session storage to determine max age of un-GC'd sessions

Hey @Eevans - I'm officially on leave until June 10th, though I wanted to quickly follow up on a few things:

TLS is optional here in the sense that it can be enabled or disabled entirely. If enabled, it is strictly TLS encrypted, HTTP is not supported. Is HSTS still applicable in this situation?

If we know that TLS will always be enabled for any wmf-deployed instance of kask, then the additional header isn't necessary. Though security headers like this can serve as additional insurance policies against misconfigurations, zero-days, etc.

DAST Findings
It might be worthwhile to run ZAP (which has OpenAPI support) against a development instance of this service at some point, though I did not have time to do this prior to my leave. It should be fairly trivial for someone to run a ZAP baseline scan via ZAP's latest, stable docker build.
I'll look into this.

Sounds good. This would be fairly low-priority IMO and something I could most likely have a go at once I return from my leave. I just ran out of time attempting to stand up the kask dockers under docker-registry.w.o.

See: T222990: Audit session storage to determine max age of un-GC'd sessions

Thanks for filing!

[ ... ]

[ ... ]
Static Analysis Findings
I ran gosec against the source and it came back with what appear to be mostly false positives, but I thought I'd post them here for completeness' sake:

  • config.go:62 - G304: Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
    • ioutil.ReadFile(filename)

I'm not sure what to say about this one. It's part of a function meant to read the YAML configuration file and return a Config struct. I think it may be doing exactly what G304 is warning of, but it's kind of the point. :)

  • storage.go:89 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('INSERT INTO "%s"."%s" (key, value) VALUES (?,?) USING TTL ?', s.Keyspace, s.Table)
  • storage.go:97 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('SELECT value, TTL(value) as ttl FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)
  • storage.go:104 - G201: SQL string formatting (Confidence: HIGH, Severity: MEDIUM)
    • fmt.Sprintf('DELETE FROM "%s"."%s" WHERE key = ?', s.Keyspace, s.Table)

Presumably the concern here is an injection, but what is being string formatted are the keyspace and table names (nothing else), both parsed from the configuration file on startup.

  • http.go:159 - G104: Errors unhandled. (Confidence: HIGH, Severity: LOW)
    • w.Write(value.Value)

This one looks legit, and I'll follow up with a gerrit.

Yes, these look like false positives at first glance, but I'll have a closer look and comment on each of them specifically.

Change 513516 had a related patch set uploaded (by Eevans; owner: Eevans):
[mediawiki/services/kask@master] Log ResponseWriter.Write errors

https://gerrit.wikimedia.org/r/513516

Change 513516 merged by jenkins-bot:
[mediawiki/services/kask@master] Log ResponseWriter.Write errors

https://gerrit.wikimedia.org/r/513516

@sbassett is there anything remaining here before we close/resolve this?

@Eevans - I'm not seeing anything for this particular review, though I might dig a little deeper into the code and attempt some dynamic-scanning this week, as mentioned in T219831#5173498. But none of this should block resolving the task or deployment IMO.

@Eevans - I'm not seeing anything for this particular review, though I might dig a little deeper into the code and attempt some dynamic-scanning this week, as mentioned in T219831#5173498. But none of this should block resolving the task or deployment IMO.

OK, I'll go ahead and close it then; Thanks for you help!

Eevans closed this task as Resolved.Jun 12 2019, 2:24 PM
sbassett added a project: Restricted Project.Jun 26 2019, 7:32 PM
sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 15 2019, 4:40 PM