
Proposal: fail explicitly and revoke relevant API keys over plain-text HTTP connection for all Wikimedia APIs
Open, Needs Triage, Public

Description

Rationale: Your API Shouldn't Redirect HTTP to HTTPS (HN thread)

Instead of redirecting API calls from HTTP to HTTPS, make the failure visible. Either disable the HTTP interface altogether, or return a clear HTTP error response and revoke API keys sent over the unencrypted connection.

Doing this has several benefits, such as letting API developers notice misuse of plain-HTTP connections earlier. The article also mentions that the Stack Exchange API used to revoke API keys sent over HTTP and return an error message.
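A minimal sketch of the proposed behavior, loosely modeled on the Stack Exchange approach mentioned above: fail loudly on plain HTTP, and revoke any valid key that arrived that way. The in-memory key store and the `handle_api_request` function are hypothetical illustrations, not part of any Wikimedia codebase, and the sketch assumes the application can see the request scheme directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeyStore:
    """Hypothetical in-memory registry of active API keys."""
    active: set[str] = field(default_factory=set)

    def revoke(self, key: str) -> None:
        # discard() is a no-op for unknown keys
        self.active.discard(key)


def handle_api_request(scheme: str, api_key: str | None, store: KeyStore) -> tuple[int, str]:
    """Return (status, body); plain-HTTP requests fail loudly instead of redirecting."""
    if scheme != "https":
        if api_key is not None and api_key in store.active:
            # The key crossed the network unencrypted, so treat it as compromised.
            store.revoke(api_key)
            return 403, "HTTPS is required. The API key sent over HTTP has been revoked."
        return 403, "HTTPS is required. Plain-HTTP API requests are not served."
    if api_key is not None and api_key not in store.active:
        return 401, "Invalid or revoked API key."
    return 200, "ok"
```

The revocation branch is where the practical costs come in: it writes state while handling what may be a GET request, and it trusts `scheme`, which an application sitting behind a TLS-terminating proxy cannot observe directly.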

In my humble opinion, the article raises a valid concern. Although this is not a big security hole, the change could be implemented as a defense-in-depth measure. I posted about this on wikitech-l some time ago, but it didn't receive much attention.

From my observation, all of our current API endpoints (Action API, REST API, RESTBase, Wikimedia Enterprise) automatically redirect HTTP traffic to HTTPS.

Event Timeline

I think this wouldn't be very useful as a security measure:

  • When authenticating with cookies, all cookies use the Secure flag already, so a reasonable client would not send them over HTTP.
  • OAuth 1 is designed to be secure over HTTP.
  • The OAuth 2 spec forbids serving HTTP requests, so if we allow them currently, we should stop doing that. There is some value in key revocation, but I'm not sure it's worth the effort: you'd need to avoid writes on GET requests and notify the user somehow; it's easy to cause mass breakage by revoking keys that don't need to be revoked (SSL is terminated well before the request reaches the appserver, so MediaWiki can't directly check which protocol is being used); and it's hard to do consistently (how would non-MediaWiki APIs revoke keys?).
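Because TLS is terminated before the request reaches the application server, the application can only learn the original scheme from information the proxy passes along. A common convention, assumed here and not a description of Wikimedia's actual stack, is an `X-Forwarded-Proto` header that is trusted only when the request comes from a known proxy address. The proxy range and function name below are hypothetical:

```python
from __future__ import annotations

import ipaddress

# Hypothetical range of TLS-terminating proxies whose headers are trusted.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def original_scheme(remote_addr: str, headers: dict[str, str]) -> str | None:
    """Best-effort guess at the client's original scheme; None if it can't be known."""
    addr = ipaddress.ip_address(remote_addr)
    if not any(addr in net for net in TRUSTED_PROXIES):
        # Not from a known proxy: the header could have been forged by the client.
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    # Assumes the proxy overwrites (rather than appends to) this header.
    proto = lowered.get("x-forwarded-proto")
    return proto.strip().lower() if proto else None
```

Returning `None` instead of guessing reflects the point above: if the proxy does not pass the information along, the application simply does not know which protocol the client used.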

Request for someone to dig into this specifically, before we make a decision on how to proceed:

The OAuth 2 spec forbids serving HTTP requests, so if we allow them currently, we should stop doing that.

Is that something you could confirm, @Tgr ?

As an additional question -- are there any concerns about redirecting endpoints that don't require auth? Like GETs, where we generally don't require any token (but one may be provided)? I assume not, but want to make sure I understand that the only concern here is leaking credentials through HTTP.

Per RFC 6749:

Access token credentials (as well as any confidential access token attributes) MUST be kept confidential in transit and storage, and only shared among the authorization server, the resource servers the access token is valid for, and the client to whom the access token is issued. Access token credentials MUST only be transmitted using TLS as described in Section 1.6 with server authentication as defined by [RFC2818].

This is more a requirement for clients than for servers, so I guess it is somewhat open to interpretation how the server should behave if a request is made via plain HTTP anyway. But allowing clients to violate the spec without even warning them seems like poor practice. And access tokens sent via HTTP are de facto compromised, so accepting them (even if only via an HTTP -> HTTPS redirect) is counterproductive to site security.
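On the client side, honoring this requirement can be as simple as refusing to attach a bearer token to any URL whose scheme is not `https`. A minimal sketch using only the Python standard library; `authorized_request` is a hypothetical helper name:

```python
from urllib.parse import urlsplit
from urllib.request import Request


def authorized_request(url: str, access_token: str) -> Request:
    """Build a request carrying an OAuth 2 bearer token, refusing non-TLS URLs."""
    if urlsplit(url).scheme.lower() != "https":
        # RFC 6749: access tokens MUST only be transmitted using TLS.
        raise ValueError(f"refusing to send access token over non-HTTPS URL: {url}")
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

With this check in place, a misconfigured `http://` base URL fails on the client before the token ever leaves the machine, regardless of how the server would have responded.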

Question about:

When authenticating with cookies, all cookies use the Secure flag already, so a reasonable client would not send them over HTTP.

Does that suggest that an unreasonable client might send them over HTTP? Is that happening? What happens if it does?

Even though there are no requirements for the server in the RFC, it seems like something we should still protect against at the server level. With that assumption, it seems like it should be implemented as part of the OAuth extension and MW Core for sessions. However, it's unclear where/when the redirect is actually happening.

Suggested next steps:

  • Determine where the redirect is happening. Is this a config setting? Be mindful that local dev environments are set up to run on HTTP.
  • If the redirect is happening as part of the API workflow explicitly, MWI can look into removing the redirect, so that requests go through as originally submitted.
  • If we remove the redirect, then we would assume that OAuth/MW Core handle HTTPS enforcement appropriately and reject requests as necessary.

MWI will take care of initial investigation to figure out where the redirect is actually happening, then we can go from there :) We should also get a sense of how frequently this type of redirect is happening to determine the urgency and impact of this change.

Moving to Bugs & Chores for initial investigation.

Question about:

When authenticating with cookies, all cookies use the Secure flag already, so a reasonable client would not send them over HTTP.

However, it's unclear where/when the redirect is actually happening.

For the public-facing production case where security matters, the redirect is currently happening way at the outer edge of traffic processing, before MW ever sees it (and before even our actual caching layers see it). If the request method is GET or HEAD, we issue a 301 (Permanent redirect) to HTTPS, otherwise for other methods (e.g. POST, PUT, etc) we issue a 403 rejection with an error page telling them to use HTTPS.
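The edge behavior described above amounts to a small decision function. This Python sketch covers the decision logic only; the actual rule is implemented at the outer edge of the traffic stack, not in application code like this, and `edge_response` is a hypothetical name:

```python
from urllib.parse import urlsplit

SAFE_METHODS = {"GET", "HEAD"}


def edge_response(method: str, url: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request that arrived over plain HTTP."""
    if method.upper() in SAFE_METHODS:
        # Read-only requests: permanent redirect to the same URL over HTTPS.
        https_url = urlsplit(url)._replace(scheme="https").geturl()
        return 301, {"Location": https_url}
    # Methods that may carry a body or change state are refused outright.
    return 403, {"Content-Type": "text/plain"}
```

Note that the 301 branch is the one that silently "fixes" a misconfigured API client: the credential has already crossed the wire by the time the redirect is issued.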

In the lowest-level technical sense, it is definitely possible for a client to send the cookie over an insecure transport, especially if they're implementing their own client code (perhaps even their own HTTP implementation, since for most cases a lot of this should already be handled by a standard HTTP library in their language). If they do so, they're basically:

  • Ignoring the well-defined Secure attribute of our auth token cookies
  • Ignoring that our canonical URIs are all HTTPS
  • Ignoring that any link they receive from us is already HTTPS (but: [1])
  • Ignoring the permanence of the 301->HTTPS they would've gotten from any previous plain-HTTP request
  • Ignoring the HSTS header intended to lock clients onto HTTPS with us permanently
  • Ignoring the public HSTS preload list that at least the major browsers get our domains from
  • Ignoring all general best practice for the past several years (HTTPS only)
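For contrast, here is roughly what a well-behaved client does with two of the mechanisms in that list: it upgrades URLs for hosts it knows to be HSTS-protected, and it only attaches cookies marked Secure to HTTPS requests. The two-entry preload set and the `Cookie` class are simplified, hypothetical stand-ins for what browsers and HTTP libraries actually maintain:

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for an HSTS preload list (includeSubDomains assumed).
HSTS_HOSTS = {"wikipedia.org", "wikimedia.org"}


@dataclass
class Cookie:
    name: str
    value: str
    secure: bool  # whether the cookie carried the Secure attribute


def hsts_upgrade(url: str) -> str:
    """Rewrite http:// to https:// for known HSTS hosts before anything is sent."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "http" and any(host == h or host.endswith("." + h) for h in HSTS_HOSTS):
        return parts._replace(scheme="https").geturl()
    return url


def cookie_header(url: str, jar: list[Cookie]) -> str:
    """Build a Cookie header value, withholding Secure cookies from plain HTTP."""
    is_https = urlsplit(url).scheme == "https"
    return "; ".join(f"{c.name}={c.value}" for c in jar if is_https or not c.secure)
```

A client that leaks an auth cookie over plain HTTP has to bypass both functions' equivalents in its HTTP stack, which is why such leaks point to a badly broken client rather than a server-side gap.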

If they manage to ignore all of those things that are trying to railroad them towards the normal secure way of doing things, then yes, their authentication token has then been exposed to anyone who can sniff traffic on the Internet between us and them. This set of entities is mostly going to be actual traffic-carrying ISPs at various tiers and/or governments, and anyone embedded in those places (by employment, or by compromising those entities) that wants to do something nefarious. This set of entities is not generally going to include legitimate commercial businesses just trying to scrape our content and such (they're not sniffing others' traffic on-path).

If they're willing to make egregious security mistakes on the client side, they could just as easily be exposing their stored auth token to compromise in numerous other ways (e.g. storing it in a publicly-accessible place, exposing it through their own APIs, sharing it randomly on the Internet, sending it insecurely to other places, saving it to public github gists in debugging pastes, etc).

Revocation in these cases sounds nice in theory, but there are a few major caveats to that approach:

  • If we revoke the exposed token, won't they probably just create another one and do the same thing again? We may as well just disable the whole account the token was for, and ask them to talk to us about security practices before we manually re-enable them again?
  • Other parties could spam plain HTTP requests with randomly-generated auth tokens in order to cause havoc or DoS. These requests would have to all cut through to MediaWiki uncached to be effective, which is a DoS vector of its own in terms of internal request load (even if the auth tokens weren't random). We could heavily globally ratelimit it to defend against that basic req-load DoS, but then we'll be failing to revoke some of the "compromised" cases we actually care about because the limiter tossed them out.
  • We currently operate on the assumption, at many layers (cache business logic, applications, etc) that all insecure traffic is rejected at the outermost edge (as 301 or 403), and that any traffic that makes it further inside was transported by encrypted HTTPS. If we have to punch a hole through for plain HTTP all the way down to the MediaWiki application layer just so it can revoke a token, we'd have to update logic and assumptions all along that path to special case our traffic security assumptions. I don't even know what the scope of this work would be, as we've been operating under these assumptions for about a decade now.
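The rate-limiting trade-off in the second point can be made concrete with a token bucket, a common limiter design chosen here purely for illustration (nothing in this discussion specifies which limiter would be used, and the limits below are hypothetical). Once a flood of random tokens drains the bucket, a genuinely leaked token arriving in the same window is dropped along with them:

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    rate: float          # tokens refilled per second
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits: bursts of 10, refilled at 1 revocation check per second.
bucket = TokenBucket(capacity=10, rate=1, tokens=10)
# 100 junk requests with random tokens arrive at t=0 and use up the burst...
accepted = sum(bucket.allow(0.0) for _ in range(100))
# ...so a genuinely leaked token arriving at t=0.5 is dropped, not revoked.
real_token_checked = bucket.allow(0.5)
```

Raising the limits restores coverage of real leaks but reopens the request-load DoS vector, which is the bind the caveat describes.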

Personally, I'd rather we put effort into eliminating port 80. It's already a very small minority of our traffic. We could start auditing that traffic to identify any important bot cases that rely on redirects, reach out to the community members behind them and help them upgrade, and then close off the port entirely.
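A first pass at that audit could be as simple as counting plain-HTTP requests per user agent. The log format below (tab-separated scheme, method, user agent, path) is purely hypothetical; real request logs will have a different schema and would need their own parsing:

```python
from collections import Counter


def top_http_clients(log_lines, n=10):
    """Count plain-HTTP requests per user agent from hypothetical TSV log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines rather than guessing at their layout
        scheme, _method, user_agent, _path = fields
        if scheme == "http":
            counts[user_agent] += 1
    return counts.most_common(n)
```

The resulting list of heaviest plain-HTTP user agents is a starting point for deciding whom to contact before the port is closed.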

[1] - T331356 - Wikidata still emits plain-HTTP URIs in outputs. We have an unresolved disagreement in this ticket about whether that's ok or not that's been outstanding for years now...

Other parties could spam plain HTTP requests with randomly-generated auth tokens in order to cause havoc or DoS. These requests would have to all cut through to MediaWiki uncached to be effective

Then again, requests with randomly generated auth tokens sent over HTTPS will do that anyway.

In any case, revocation would be a lot of effort for a very hypothetical minor threat. Not redirecting HTTP requests with an Authorization header seems like an easy change, though. It's pointless if we disallow HTTP traffic entirely, but I'd have thought we want to support HTTP GET requests forever because old web pages contain such links? Modern browsers will automatically upgrade links to HTTPS because of HSTS preload, but plenty of other clients that might follow links won't.
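Not redirecting credentialed requests could be a small change to the edge decision described earlier in this thread: if a plain-HTTP request carries an `Authorization` header or a cookie, refuse it instead of redirecting, while anonymous GETs from old links keep working. This sketch treats any `Cookie` header as a credential, which is a simplification (a real rule might match only session cookie names), and `plain_http_response` is a hypothetical name:

```python
from urllib.parse import urlsplit


def plain_http_response(method: str, url: str, headers: dict) -> tuple:
    """Return (status, redirect_target_or_None) for a plain-HTTP request."""
    names = {k.lower() for k in headers}
    if "authorization" in names or "cookie" in names:
        # Credentials already crossed the wire in clear text; don't let a
        # redirect make that mistake invisible to the client.
        return 403, None
    if method.upper() in ("GET", "HEAD"):
        # Anonymous reads (e.g. old links) keep being upgraded to HTTPS.
        return 301, urlsplit(url)._replace(scheme="https").geturl()
    return 403, None
```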

Could also implement a ban on HTTP requests in the OAuth extension, if we want third parties to benefit (for Wikimedia it wouldn't make a difference since HTTP requests won't reach MediaWiki anyway). Not sure how reliably MediaWiki can determine the protocol though.

I can see that the potential DoS vector is a valid concern.

So should we consider implementing it in a less aggressive way: instead of revoking API keys directly, tell the client in an explicit error message that the credential is de facto compromised?

So should we consider implementing it in a less aggressive way: instead of revoking API keys directly, tell the client in an explicit error message that the credential is de facto compromised?

Yeah, that seems like a reasonable compromise.
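The compromise could look like an error body that states plainly why the request failed and what the client should do about the credential, without the server revoking anything itself. The JSON shape and the `insecure-transport` error code below are hypothetical, not an existing Wikimedia API error format:

```python
import json


def insecure_transport_error(credential_kind: str) -> tuple[int, str]:
    """Build a 403 response body for a credential received over plain HTTP."""
    body = {
        "error": {
            "code": "insecure-transport",  # hypothetical error code
            "info": (
                f"This request was sent over unencrypted HTTP and included a {credential_kind}. "
                "Treat that credential as compromised: revoke it, issue a new one, "
                "and configure your client to use HTTPS only."
            ),
        }
    }
    return 403, json.dumps(body)
```

Leaving revocation to the client avoids both the write-on-GET problem and the random-token DoS vector, at the cost of relying on the client to act on the message.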

Similar older task: T247490: HTTP MediaWiki API GET requests to Wikimedia wikis should not be redirected to HTTPS when they have a session cookie or Authorization header

Personally, I'd rather we put effort into eliminating port 80.

Actually, this is my strong instinct as well: it would actually prevent keys from being sent, and it would be the strongest possible signal you could send clients to fix themselves. As for people visiting web pages over port 80: if their client ignores all of the protections listed above, because it is so old or misconfigured, then those users are already exposing on the wire which page they're visiting, and their user agent (which is probably pretty identifying, under the circumstances), before we have the chance to do anything about it. I don't necessarily want to downplay the impact on access, but 1) it's not entirely in the user's interest to let them go through with that visit, and 2) I would like to see what the actual numbers are today (filtered, at least roughly, to likely human browser visitors).

Personally, I don't think this mitigation is worth it given the low risk. After all, most API requests aren't even authenticated.

If we want to reduce the risk of people accidentally misconfiguring a client that automatically follows redirects, I think a middle-ground approach of just returning 403 (instead of a redirect) for plain-HTTP API requests would solve 90% of the problem without all the complication, and that can easily be done in Varnish.
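The middle-ground rule is narrow enough to express in a few lines. It is sketched here in Python rather than in Varnish's configuration language, and the API path prefixes (`/w/api.php`, `/w/rest.php/`, `/api/rest_v1/`) are illustrative assumptions; a real deployment would need to enumerate its own API paths:

```python
from urllib.parse import urlsplit

# Illustrative API path prefixes; a real list would come from the deployment.
API_PREFIXES = ("/w/api.php", "/w/rest.php/", "/api/rest_v1/")


def http_api_policy(method: str, url: str) -> int:
    """Status for a plain-HTTP request: 403 for API paths, 301 for other reads."""
    if method.upper() not in ("GET", "HEAD"):
        return 403  # unchanged: non-read methods are already refused at the edge
    path = urlsplit(url).path
    # API clients get an explicit failure instead of a silent redirect.
    return 403 if path.startswith(API_PREFIXES) else 301
```

Keying on the path rather than on headers keeps the rule cheap to evaluate at the edge, and it leaves ordinary page links working for old clients.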