Page MenuHomePhabricator

Security Review of RESTBagOStuff
Closed, ResolvedPublic

Description

This task is a placeholder to remember that we need to deal with security issues around KaskBagOStuff, probably separate from Kask itself.

Event Timeline

EvanProdromou renamed this task from Security Review of KaskBagOStuff to Security Review of RESTBagOStuff.Jun 10 2019, 2:18 PM

@Eevans brought up the valid point that if this code is already in MediaWiki, and has only been altered somewhat to support Kask, that it may not need a security review.

I don't know if it ever has been reviewed, since I don't believe that the RESTBagOStuff class was used in production anywhere (?) before adapting it for Kask.

I'll admit that I feel somewhat queasy about deploying login-related code to production without some review. But I'm not sure what the requirements are for us to go forward.

WDoranWMF triaged this task as Normal priority.Jun 11 2019, 4:17 PM

We had an email conversation with @sbassett about security review of this component. He reviewed @BPirkle 's patches on gerrit, and thinks that since the RESTBagOStuff code has been in master for a few years, we don't have to do a review.

I think the next step would be for us to do an internal security review, and if we feel like we need it, we can ask for a full security review from the Security Team.

I'll add my thoughts here in a separate comment, and we can go from there.

This is an uninformed and inexpert analysis of potential security concerns with our upcoming session stack. I haven't looked at the code to see if any of these potential problems are mitigated. I hope that most or all of them are!

I'd love to have @Eevans and @BPirkle look this over and comment. If most or all of these things make you scoff and say, "This is ridiculous, it's completely obvious, rookie mistake, couldn't happen, of course we mitigate that," great! If anything makes you get a little queasy and think, "Umm, I'm not sure, we're not testing for that," let's check those possibilities, and if they make us really queasy, we can go to a formal Security Review to shake out more subtle attacks that professionals way smarter than I am could think of.

The stack involved in this project is roughly the following (please correct!):

Client ---> MediaWiki ---> RESTBagOStuff ---> Guzzle ---> Kask ---> Cassandra

Roughly:

  • The client sends MediaWiki a session id cookie as part of its HTTP Request, like "XXXX"
  • MediaWiki transforms that session id into a key, like "keyprefix:XXXX" and passes it to RESTBagOStuff
  • RESTBagOStuff transforms that key into an URL like "https://kaskserver/pathprefix/keyprefix:XXXX" and passes that to Guzzle
  • Guzzle makes an HTTP request to Kask for "/pathprefix/keyprefix:XXXX"
  • Kask makes a CQL request to Cassandra for "keyprefix:XXXX"

If we just consider the session ID, I can see these potential things that a villain could do by carefully crafting a session ID value:

  • Kask to Cassandra (CQL, mostly?)
    • retrieve data that's not authorized
    • delete or overwrite data
    • overload the servers, making them inaccessible
  • Guzzle to Kask
    • Force a fatal error that kills the Kask process (e.g. an unhandled exception from a badly-encoded character in an URL)
    • Overload Kask so it's unresponsive (e.g. very very very long URL)
    • Force an error message that reveals configuration or authorization data
  • RESTBagOStuff to Guzzle
    • Make Guzzle request restricted data from another internal server instead of Kask
    • Make Guzzle request data from an external server (using WMF servers to attack another site)
    • Make Guzzle go into an infinite loop, shutting down this app server

I think mitigations would be:

  1. Kask builds its query strings in a smart way that doesn't inject extra code. I don't know CQL, but I'd assume there's a similar mechanism to pass parameters with "?" as in SQL. "SELECT value from session where key = ?".
  2. Kask, and the http server libs it uses, handle "bad" URLs (crazy long, badly encoded, etc.) in robust ways
  3. Guzzle doesn't allow things like requesting multiple URLs at the same time with string programming, like "https://kaskserver/pathprefix/keyprefix:XXXX AND ALSO https://internalserver/secrets". (Not many HTTP client libs do things like this.)
  4. RESTBagOStuff builds the URLs it passes to Guzzle in a sane way, that doesn't allow overwriting or ignoring the URL prefix. I honestly don't know how to make this happen, but I have a superstitious fear of PHP string handling, and for all I know throwing 25 U+08 (BACKSPACE) characters into a string following by the URL you really want to request would do the trick.
  5. RESTBagOStuff checks keys it receives to make sure they're acceptable for Kask (character set and length and ...?) before passing to Guzzle. I don't think RESTBagOStuff should be scanning keys for, like 'DROP TABLES' or something, but it should probably be able to check for egregious strings that are likely to make trouble.
  6. RESTBagOStuff doesn't dump errors (which might have configuration or authorization info) verbatim back to MediaWiki.
  7. MediaWiki validates input session IDs to make sure they match the format that it sets. This would probably prevent a lot of these problems.
  8. MediaWiki doesn't dump errors verbatim from RESTBagOStuff back to the client

The other way I could see a villain doing bad things is by executing behaviour on the site that gets encoded into the session object itself ('{"lastSearch": "DROP ALL TABLES"}') which could cause problems in the stack. A couple of things I could see happening:

  • Destructive commands in the session body get injected into the CQL
  • Unauthorized queries in the session body get injected into the CQL
  • Bad characters or long strings in the session body cause the encoding process in RESTBagOStuff (JSON encoding, I think?) to make a fatal error or spin forever
  • Bad characters or long strings in the session body cause the decoding process in RESTBagOStuff (JSON encoding, I think?) to make a fatal error or spin forever (i.e. the session goes in but not out)
  • Bad characters or long strings mess up the HTTP body handling in Guzzle (make it die, make it loop forever)
  • Bad characters or long strings mess up the HTTP body handling in Kask or its HTTP server libs (make it die, make it loop forever)

Responding (in-line) to those items I'm familiar with...

[ ... ]
If we just consider the session ID, I can see these potential things that a villain could do by carefully crafting a session ID value:

  • Kask to Cassandra (CQL, mostly?)
    • retrieve data that's not authorized
    • delete or overwrite data
    • overload the servers, making them inaccessible

If an attacker could utilize the session ID of another, then they'd be able to to do anything the owner of the session would, there is no defense against this other than keeping other user's session a secret.

I think mitigations would be:

  • Kask builds its query strings in a smart way that doesn't inject extra code. I don't know CQL, but I'd assume there's a similar mechanism to pass parameters with "?" as in SQL. "SELECT value from session where key = ?".

There is.

  • Kask, and the http server libs it uses, handle "bad" URLs (crazy long, badly encoded, etc.) in robust ways

Golang puts an upper bound on all of the headers (including the request line), the default is 1MB. I suspect client libraries probably implement limits as well, and are probably more conservative than this.

Cassandra though is going to reject any request for a key longer than 65535 bytes (the max size of a partition). This will be bubbled up as a generic 500 to clients, but logged with the details by the server.

[ ... ]

  • RESTBagOStuff checks keys it receives to make sure they're acceptable for Kask (character set and length and ...?) before passing to Guzzle. I don't think RESTBagOStuff should be scanning keys for, like 'DROP TABLES' or something, but it should probably be able to check for egregious strings that are likely to make trouble.

An injection attack like this wouldn't work. Cassandra doesn't even support statements embedded within statements.

The other way I could see a villain doing bad things is by executing behaviour on the site that gets encoded into the session object itself ('{"lastSearch": "DROP ALL TABLES"}') which could cause problems in the stack. A couple of things I could see happening:

  • Destructive commands in the session body get injected into the CQL
  • Unauthorized queries in the session body get injected into the CQL

Again, the driver sanitizes input, but Cassandra doesn't even support this type of query anyway; You couldn't craft such a query even if you wanted to.

[ ... ]

  • Bad characters or long strings mess up the HTTP body handling in Kask or its HTTP server libs (make it die, make it loop forever)

See above, but TL;DR: this would represent a pretty serious bug in Go's HTTP handling. I wouldn't rule that out entirely, but given the number of folks using Go for webapps, it seems unlikely. I have tested this (lightly) though and it seems to Do The Right Thing.

Thank you, @EvanProdromou for this list! Responses follow to items I'm familiar with.

tl;dr: I think we're good.

The stack involved in this project is roughly the following (please correct!):
Client ---> MediaWiki ---> RESTBagOStuff ---> Guzzle ---> Kask ---> Cassandra

I would add "SessionManager" between "MediaWiki" and "RESTBagOStuff". (https://www.mediawiki.org/wiki/Manual:SessionManager_and_AuthManager) This is relevant because we made no changes to SessionManager (or associated classes, such as SessionBackend). Most things related to session management occur in this layer (key generation, populating/interpreting session value data, etc.). This mitigates an entire class of concerns, because the same code that is currently handling these things on production will continue to handle them. We should, of course, look for any new vulnerabilities that we may have introduced, or that expose previously unrecognized weaknesses in SessionManager. But much code is completely unchanged across the transition to using Kask. In particular, SessionManager already validates session ids passed in by the client.

  • Guzzle to Kask
    • Force a fatal error that kills the Kask process (e.g. an unhandled exception from a badly-encoded character in an URL)

The base url is taken from MediaWiki configuration, which we can have confidence in. If MediaWiki configuration were subject to attack, we'd have larger problems than sessions. Session-specific data (key, including prefix, and value data) is encoded using standard PHP functions.

  • Overload Kask so it's unresponsive (e.g. very very very long URL)

All data involved in the transmission is provided by either configuration or SessionManager, which are trusted sources.

  • Force an error message that reveals configuration or authorization data

Forcing an error message between Guzzle and Kask seems unlikely. 404s are expected and not logged. It is worth considering what an attacker could learn if errors occurred for some unforced reason and an attacker happened upon the info. Error details are logged via our standard logging. These details consist of:

  • message (Failed to fetch XXXX, where XXXX is the fully prefixed key)
  • standard HTTP code (500, etc.)
  • standard HTTP response ("Internal Server Error", etc.)
  • "type", "title", "detail", and "instance" values, if provided by Kask

The standard HTTP info is not of concern. Including the key in the message does potentially expose keys to anyone able to view logs. This is mitigated in (at least) two ways:

  • various controls over who can view logs
  • simply knowing the session key is insufficient to impersonate someone. You must also know the token, which is not logged

I will let @Eevans speak to the data potentially contained in the detail fields provided by Kask. I suspect that any attacks based on revealed configuration are sufficiently mitigated by Kask being accessible only from within our network. From the MediaWiki side, T220401 and T224993 are publicly viewable and contain our full configuration information. So there's no real secret to be kept there.

For clarity, none of the information logged is directly visible to the end user. These is no way for an end user to see an error page with this information. It is only logged and never bubbled back up to code that could include it in the UI.

  • RESTBagOStuff to Guzzle
    • Make Guzzle request restricted data from another internal server instead of Kask
    • Make Guzzle request data from an external server (using WMF servers to attack another site)

This would require manipulating the MediaWiki configuration, which we can consider safe.

  • Make Guzzle go into an infinite loop, shutting down this app server

Guzzle itself has already undergone security review (T202143). RESTBagOStuff does not add any additional code (retries, etc.) that could introduce an infinite loop. Even if this somehow occurred, our stack is protected by timeouts, (verified via fatalerror.php, T210567). I have personally caused infinite loops on production via fatalerror.php with no ill effects to the app server or overall site.

I think mitigations would be:

Guzzle doesn't allow things like requesting multiple URLs at the same time with string programming, like "https://kaskserver/pathprefix/keyprefix:XXXX AND ALSO https://internalserver/secrets". (Not many HTTP client libs do things like this.)

I don't believe that Guzzle allows this. But even if we switched to a different HTTP library that did allow it, the way in which RESTBagOStuff builds urls and encode parameters precludes it.

RESTBagOStuff builds the URLs it passes to Guzzle in a sane way, that doesn't allow overwriting or ignoring the URL prefix. I honestly don't know how to make this happen, but I have a superstitious fear of PHP string handling, and for all I know throwing 25 U+08 (BACKSPACE) characters into a string following by the URL you really want to request would do the trick.

That is a cleverly evil idea, but already mitigated.

RESTBagOStuff checks keys it receives to make sure they're acceptable for Kask (character set and length and ...?) before passing to Guzzle. I don't think RESTBagOStuff should be scanning keys for, like 'DROP TABLES' or something, but it should probably be able to check for egregious strings that are likely to make trouble.

This responsibility lies with SessionManager, which is unchanged. And Eric mentioned in a previous reply that Cassandra is not attackable in this way.

RESTBagOStuff doesn't dump errors (which might have configuration or authorization info) verbatim back to MediaWiki.

Already discussed above.

MediaWiki validates input session IDs to make sure they match the format that it sets. This would probably prevent a lot of these problems.

As discussed, this is the responsibility of SessionManager, which we have not changed.

MediaWiki doesn't dump errors verbatim from RESTBagOStuff back to the client

This already does not occur. Error details are not passed upward from RESTBagOStuff.

The other way I could see a villain doing bad things is by executing behaviour on the site that gets encoded into the session object itself ('{"lastSearch": "DROP ALL TABLES"}') which could cause problems in the stack. A couple of things I could see happening:

  • Destructive commands in the session body get injected into the CQL
  • Unauthorized queries in the session body get injected into the CQL

Addressed by Eric.

  • Bad characters or long strings in the session body cause the encoding process in RESTBagOStuff (JSON encoding, I think?) to make a fatal error or spin forever
  • Bad characters or long strings in the session body cause the decoding process in RESTBagOStuff (JSON encoding, I think?) to make a fatal error or spin forever (i.e. the session goes in but not out)
  • Bad characters or long strings mess up the HTTP body handling in Guzzle (make it die, make it loop forever)

Protecting against incoming bad/long data from the client is the responsibility of SessionManager, which has not changed, and which already does validation. From the other direction, let's pretend someone managed to introduce bad/long data directly into Cassanda then tried to attack MediaWiki by requesting that data via session storage. In that case, our stack is protected by timeouts and can withstand fatal errors, including out-of-memory (verified via fatalerror.php, T210567). Doing this at a level sufficient to attempt a denial of service attack would be a creative attack, but our logging is sufficient to identify the situation. One quick emergency triage would be to push a configuration change that pointed session storage at an EmptyBagOStuff. I suspect there are superior emergency triage methods, but this one would completely eliminate session storage as an attack vector, at the cost of disabling logins until the problem was resolved. Another would be to temporarily blackhole the kask hostname, with similar effects.

  • Bad characters or long strings mess up the HTTP body handling in Kask or its HTTP server libs (make it die, make it loop forever)

Addressed by Eric.

@Eevans and @BPirkle I feel pretty satisfied. I'm not a security professional, but I'm having a hard time thinking up other attacks than those listed. I have three questions left for you, and then I think we can close this ticket.

  • Are there attacks you can think of that I didn't list?
  • Are there mitigations that you expect the other component(s) to implement that you didn't hear about here? Like, "I thought that data would be sanitized by the time it got to my component, but you don't mention scrubbing it..."
  • Do we need any additional review from the security team?

My feeling is that, if the Security Team sees this thread and they don't freak out that we're ignoring some insanely obvious attack, we're probably good to go to production. But I'll defer to you two.

EvanProdromou added a comment.EditedJun 27 2019, 3:59 PM

If we just consider the session ID, I can see these potential things that a villain could do by carefully crafting a session ID value:

  • Kask to Cassandra (CQL, mostly?)
    • retrieve data that's not authorized

If an attacker could utilize the session ID of another, then they'd be able to to do anything the owner of the session would, there is no defense against this other than keeping other user's session a secret.

Again, I don't know CQL, but I was thinking about code that crafts SQL queries like

sprintf("SELECT * FROM session WHERE id = '%s'", id)

...so that an injected ID string like "IGNORED' OR 'constant' = 'constant" will generate "SELECT * FROM session WHERE id = 'IGNORED' OR 'constant' = 'constant'" which should get all rows in the table.

Using parameters in SQL queries prevents this, of course (as well as checking number of results if only one row is expected!).

An attacker could be more specific, and make the OR clause something like "value LIKE '%EvanProdromou%'", so they can go fishing around in the session body for EvanProdromou's session in particular.

Otherwise, thanks for the responses; everything else seems pretty reasonable. I know they're kind of obvious, but it's the obvious stuff that hurts the most when someone uses them against you.

@Eevans and @BPirkle I feel pretty satisfied. I'm not a security professional, but I'm having a hard time thinking up other attacks than those listed. I have three questions left for you, and then I think we can close this ticket.

  • Are there attacks you can think of that I didn't list?

Nothing comes to mind. I suppose we may be slightly changing the load characteristics of a denial of service attack, but I'm confident that @Eevans and team have that covered. I know I heard pieces of load/performance discussions in previous meetings, and I really doubt that anything we are doing here would be the critical bottleneck in that sort of situation.

  • Are there mitigations that you expect the other component(s) to implement that you didn't hear about here? Like, "I thought that data would be sanitized by the time it got to my component, but you don't mention scrubbing it..."

No. RESTBagOStuff is intended to be a general-purpose class, so I tried to not make assumptions.

  • Do we need any additional review from the security team?

My feeling is that, if the Security Team sees this thread and they don't freak out that we're ignoring some insanely obvious attack, we're probably good to go to production. But I'll defer to you two.

I concur. I think we're good to go.

If we just consider the session ID, I can see these potential things that a villain could do by carefully crafting a session ID value:

  • Kask to Cassandra (CQL, mostly?)
    • retrieve data that's not authorized

If an attacker could utilize the session ID of another, then they'd be able to to do anything the owner of the session would, there is no defense against this other than keeping other user's session a secret.

Again, I don't know CQL, but I was thinking about code that crafts SQL queries like

sprintf("SELECT * FROM session WHERE id = '%s'", id)

...so that an injected ID string like "IGNORED' OR 'constant' = 'constant" will generate "SELECT * FROM session WHERE id = 'IGNORED' OR 'constant' = 'constant'" which should get all rows in the table.

There are no queries that are crafted like this; The driver is doing parameter substitution and would sanitize accordingly (assuming any of the scenarios above were valid, which for Cassandra, they are not).

Using parameters in SQL queries prevents this, of course (as well as checking number of results if only one row is expected!).

Only one result can ever be returned; Iteration isn't possible.

sbassett added a comment.EditedJun 28 2019, 7:38 PM

My feeling is that, if the Security Team sees this thread and they don't freak out that we're ignoring some insanely obvious attack, we're probably good to go to production. But I'll defer to you two.

I've been following along :) I personally feel there's a lot of good analysis going on here, covering many common and dangerous attack surfaces. I definitely don't feel strongly that further analysis would be required on the part of the Security-Team, but if you would like for us to audit RESTBagOStuff in more depth (at least within the context of what we typically analyze during application security reviews) we can of course accommodate such a request.

EvanProdromou closed this task as Resolved.Jul 2 2019, 2:07 PM

@sbassett I'm going to close this ticket. Is there a good way to bring Security into the loop as we roll this code out? As in, "We are rolling login-related code out, so keep alert."

@Eevans Sweet. I'm going to use some 10% time to get smarter about Cassandra. Seems fascinating!

sbassett added a subscriber: Reedy.Jul 2 2019, 3:11 PM

@EvanProdromou - sounds good. I would say if you can add @Reedy and myself to any relevant Gerrit patch sets, that'd be good. And giving us a heads up on the deployment date (Phab task?) for this would also be helpful. Thanks.

Anomie removed a subscriber: Anomie.Jul 9 2019, 6:03 PM