
MediaWiki's anonymous edit token leaves wiki installations (incl. Wikipedia) open to mass anonymous spam we can't block
Open, Normal, Public

Description

Problem statement

The edit token for logged-out users (anons) is currently the hardcoded string +\. It provides no CSRF protection whatsoever.
That means any website on the internet that can trick a user into visiting its pages can also (as long as the user isn't logged in) make them edit Wikipedia on behalf of their IP address and browser.

Aside from the impact on the user's privacy, this is also open to potentially large-scale abuse in the form of spam. E.g. anyone visiting the attacking website would inadvertently be making edits to Wikipedia (or whichever other wiki is being targeted).

Our abuse prevention tools would likely be unable to defend against such an attack. We couldn't block the spammer's IP, given the edits would come from thousands of genuine and unpredictable user IPs. The only option we'd have is to disable anonymous editing entirely.

Demo

https://codepen.io/davidbarratt/pen/VwZPEqG?editors=0010

Requirements

Submitting an edit (via index.php or api.php) will only be possible from a web browser if the user is currently on the wiki's site.

Specifically, this means:

  • The token is not a predictable string.
  • The token cannot be seen or changed from a page on another origin.
  • The token is not obtainable cross-origin in web browsers via an API endpoint.

Restrictions

Between 2012 and 2018, several restrictions were uncovered while exploring different solutions.

  1. We cannot start a regular session upon viewing the edit page. While that would meet the requirements and solve the problem, it is unfeasible because the edit page is viewed over GET and receives significant traffic. Allocating session IDs and data for each of those views is infeasible with our current setup. Much of the GET traffic to the edit page comes from bots and other programmatic interaction that never results in the form being submitted, and thus does not start, and does not need, a session today.

Context

Logged-in users

For registered users this problem does not exist, because upon log-in the software starts a session. The session is represented by an identifier and contains various key/value pairs saved in the session storage backend (e.g. Redis or MySQL). The user's browser is sent a cookie that identifies the user and the session, and the session data contains information to verify the session's validity. Upon viewing the edit page, an *edit token* is generated as an HMAC hash that combines a random cryptographic secret (stored in the session on the server) and a timestamp. Upon submitting the edit, the server decodes the token, confirms the validity of the user's session, and validates the timestamp expiry.
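As a rough illustration, here is a minimal sketch of that scheme (not MediaWiki's actual Token implementation, which uses a different construction and appends the trailing +\ suffix):

<?php
// Sketch: derive the edit token from a per-session secret plus a timestamp,
// so the server can recompute and verify it on submit without storing it.
function generateEditToken( string $sessionSecret, int $timestamp ): string {
    $mac = hash_hmac( 'sha256', (string)$timestamp, $sessionSecret );
    // The timestamp rides along in the clear so expiry can be checked later.
    return $mac . '|' . dechex( $timestamp );
}

function matchEditToken( string $sessionSecret, string $token, int $maxAge ): bool {
    $parts = explode( '|', $token, 2 );
    if ( count( $parts ) !== 2 ) {
        return false;
    }
    [ $mac, $hexTime ] = $parts;
    $timestamp = (int)hexdec( $hexTime );
    if ( time() - $timestamp > $maxAge ) {
        return false; // token expired
    }
    // Constant-time comparison against the recomputed HMAC.
    return hash_equals( hash_hmac( 'sha256', (string)$timestamp, $sessionSecret ), $mac );
}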

Aside from registered/logged-in users, we also start this type of session for anonymous (logged-out) users after they submit their first edit. That means that once they submit the edit page form, all regular session features work, including talk page notifications, block status notices, and bypassing the static HTML cache for all page views in the session going forward.

HTTP cache

The viewing of the edit page currently requires freshness and is uncacheable by Varnish. This suggests that, despite the restriction on session storage, our web servers (and PHP) can handle the traffic without issue. In other words, while the edit page cannot vary based on data in a session, it can vary on anything we can compute in PHP from the current HTTP request headers.

Proposals

CSRF Cookie

Regular sessions work by generating edit tokens as an HMAC hash of a random token and some other data (salt, timestamp, ..). The random token is persisted in the session data on the backend, and the edit token is validated upon submission by confirming it matches what we'd re-compute at run-time from the data we then see in the session backend for the current request's session-cookie ID.

The proposal is to, for users without a session, store the token directly in a cookie. This cookie will be marked httpOnly and have similar expiry as we use for session duration.

This CSRF cookie will not relate to session storage, and will not trigger bypassing of the CDN cache. (To Varnish, it would look the same as a random JS-set cookie.)
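As a concrete sketch of this proposal (cookie name, lifetime, and attributes are illustrative; nothing here is decided):

<?php
// Sketch: issue an unpredictable per-browser secret in an httpOnly cookie,
// without touching session storage. The cookie name is hypothetical.
function getOrSetAnonCsrfSecret(): string {
    $name = 'anonCsrfSecret';
    if ( isset( $_COOKIE[$name] ) && preg_match( '/^[0-9a-f]{64}$/', $_COOKIE[$name] ) ) {
        return $_COOKIE[$name];
    }
    $secret = bin2hex( random_bytes( 32 ) ); // requirement 1: unpredictable
    setcookie( $name, $secret, [
        'expires' => time() + 3600,  // illustrative; "similar to session duration"
        'httponly' => true,          // requirement 2: invisible to JS on any origin
        'secure' => true,
        'samesite' => 'Lax',
    ] );
    return $secret;
}

The edit token placed in the form would then be an HMAC keyed on this secret (as in the session-based scheme above), so a page on another origin can neither read nor forge it, while Varnish can treat the cookie like any other JS-set cookie.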

Checking requirements:

  1. Predictable: No.
  2. Accessible to other origins: No. Only JS code running same-origin can get or set cookies. As extra protection, the httpOnly mark makes it inaccessible to both cross-origin and same-origin JS code.
  3. Obtainable from API: No, but to ensure this we must update the API to extend its current protections in JSONP mode to cover this new use case.

About the API, specifically I propose to:

  • Make the API's token handler ignore this new cookie (if it exists) when in JSONP mode. This is similar to what JSONP mode already does for sessions, in that it does not initialise the same-origin session that may co-exist in the same request.
  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Point 2 alone does not suffice, because the cookie can still exist due to genuine index.php activity. Point 1 alone would suffice, but it is confusing to users if the request only fails upon submission when the earlier action=tokens request works.
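As a sketch of the action=tokens side (ApiBase::lacksSameOriginSecurity() and dieWithError() are real ApiBase methods; the surrounding logic and the message key are illustrative only):

<?php
// Inside a token-issuing API module (sketch, not actual MediaWiki code):
if ( $this->lacksSameOriginSecurity() ) {
    // JSONP / cross-origin mode: refuse to issue any token at all, so the
    // client gets an early, descriptive error rather than a late failure
    // when it tries to submit.
    $this->dieWithError( 'apierror-tokens-need-same-origin' ); // hypothetical message key
}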

Details

Reference: bz38417

Event Timeline


We can potentially avoid session inflation by creating the session separately from creating the edit html (which would indeed allow session inflation if an attacker requests edit urls repeatedly without cookies enabled). For example, we could start the session from JavaScript on the edit page in a background request (AJAX).

I don't understand how that would help the session inflation problem. The problem (if it still is a problem) is too many users have cookies on the client side, bypassing varnish. Using too much space in redis/whatever to store sessions on the server side, is not the problem (as far as I am aware). Creating sessions with javascript should not make a difference in that context

Catrope added a subscriber: daniel.May 4 2016, 8:41 PM
Bawolff changed the visibility from "Custom Policy" to "Custom Policy".May 4 2016, 8:51 PM
tstarling changed the visibility from "Custom Policy" to "Custom Policy".May 4 2016, 8:51 PM
tstarling changed the edit policy from "Custom Policy" to "Custom Policy".
tstarling changed the visibility from "Custom Policy" to "Custom Policy".
Bawolff changed the visibility from "Custom Policy" to "Custom Policy".May 4 2016, 8:56 PM
tstarling changed the edit policy from "Custom Policy" to "Custom Policy".May 4 2016, 9:05 PM
tstarling changed the visibility from "Custom Policy" to "Custom Policy".May 4 2016, 10:35 PM
tstarling changed the edit policy from "Custom Policy" to "Custom Policy".
DanielFriesen added a comment.EditedMay 4 2016, 10:43 PM

We can potentially avoid session inflation by creating the session separately from creating the edit html (which would indeed allow session inflation if an attacker requests edit urls repeatedly without cookies enabled). For example, we could start the session from JavaScript on the edit page in a background request (AJAX).

I don't understand how that would help the session inflation problem. The problem (if it still is a problem) is too many users have cookies on the client side, bypassing varnish. Using too much space in redis/whatever to store sessions on the server side, is not the problem (as far as I am aware). Creating sessions with javascript should not make a difference in that context

It works, because I assume he's not talking about an implicit AJAX call when visiting the edit page, but rather an onsubmit doing an AJAX call. The post handler for sessionless users would do an AJAX call initializing the session and then insert the edit token for that session into the form before continuing the form submission.

That way the session is only created via JS if the user is submitting the form (not just visiting the edit page) and would implicitly have gotten a session after the form submit anyways.

The idea is not to initialize the session for sessionless users using AJAX. It's that we'd have a post handler for sessionless users that would do an AJAX call initializing the session and then insert the edit token for that session into the form before submitting it.

Oh, I understand now. That would address the performance concern (if it still exists).

We can potentially avoid session inflation by creating the session separately from creating the edit html (which would indeed allow session inflation if an attacker requests edit urls repeatedly without cookies enabled). For example, we could start the session from JavaScript on the edit page in a background request (AJAX).

My concern with this (and @brion's suggestion to just start a session on the edit page) is that the edit page isn't the only place where we want anonymous users using a real CSRF token. The only other one I can think of is MobileFrontend settings, but IIRC there were other places too. I think Bawolff's current patch would handle that gracefully as well, which would be nice.

brion added a comment.May 6 2016, 7:50 AM

Wouldn't the settings cookies kill caching anyway? Or is that rigged up to cache-vary on the specific cookie values without forcing things through to the backend? (Eg, if I'm an anon user with images disabled, beta on, and font size bumped up, are my pages still cached?) Or are we thinking of optimizing the case where someone clicks on settings and then never does anything with it?

brion added a comment.May 6 2016, 7:56 AM

Reviewing older comments on this thread... Can anyone remember why we don't enforce a Referer check for the anon case? If it's same-origin then JS can already fetch an edit token and submit a form via AJAX.

brion added a comment.May 6 2016, 8:01 AM

(Well, referer and/or origin, on account of HTTPS)

I feel like if we don't need a session for other purposes, and we don't need it on GET submissions which are vulnerable to redirect attacks, maybe we don't actually need the token to be fancy here.

brion added a comment.EditedMay 6 2016, 8:14 AM

To answer myself... HTTPS means no Referer and inconsistent support for Origin means it's hard to distinguish old browsers from CSRF submissions.

But it should be safe enough to treat Origin match as a pass on POST submit if the IP hash fails due to a network change. In that case I would withdraw my objection to using the IP hashes, as that should give a good experience for anons on modern browsers even in the mobile/net switch case.

@brion Also some privacy software / virus suites / intermediate proxies supposedly strip out the Referer header as well.

I thought I might be making things more complex than needed with the noscript/signed-IP/etc. idea, so I did a double check of Origin then.

Sadly, inclusion of Origin: within form posts is not consistent. Chrome supposedly does it consistently, but Firefox does not; see 446344. After that I didn't even bother checking what IE/Edge and other browsers do.
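For what it's worth, a minimal sketch of the Origin-match fallback brion describes above (helper name is made up; $wgCanonicalServer is MediaWiki's canonical server URL setting):

<?php
// Sketch: accept a POST whose Origin header matches the wiki, as a fallback
// when the IP-based token fails (e.g. the user's network changed).
function originMatchesWiki( array $server, string $canonicalServer ): bool {
    if ( !isset( $server['HTTP_ORIGIN'] ) ) {
        // Header absent (old browsers, stripping proxies): no signal either way.
        return false;
    }
    return hash_equals( $canonicalServer, $server['HTTP_ORIGIN'] );
}

// Possible use: if ( !$tokenOk && originMatchesWiki( $_SERVER, $wgCanonicalServer ) ) { /* pass */ }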

Wouldn't the settings cookies kill caching anyway? Or is that rigged up to cache-vary on the specific cookie values without forcing things through to the backend? (Eg, if I'm an anon user with images disabled, beta on, and font size bumped up, are my pages still cached?) Or are we thinking of optimizing the case where someone clicks on settings and then never does anything with it?

I believe they figured out a way to vary the cache based on the options selected, or to cache the default and break the cache for anyone without the defaults. If it's the latter, then we can make the same changes to that form as to the edit form: start a session via JavaScript, or handle changes without JavaScript with a session + confirmation.

But it seems like Bawolff's patch would Just Work for any form using $user->getToken() / $user->matchEditToken(), without any changes to the form presentation. Which seems like an advantage?

Edit page views are quite common because people click red links.

Do we have recent data on how starting sessions on edit view would impact cache hit rates?

IIRC, my first version of the Squid caching integration started sessions on edit view, and back then it didn't seem to be a performance issue.

Edit page views are quite common because people click red links.

Do we have recent data on how starting sessions on edit view would impact cache hit rates?
IIRC, my first version of the Squid caching integration started sessions on edit view, and back then it didn't seem to be a performance issue.

My impression of that issue is from @tstarling's comment: https://phabricator.wikimedia.org/T40417#479003. I'm not sure if we have a way to count/measure how often that actually happens?

Here's the status on this:

  • @Bawolff has a patch that he posted in February (T40417#2034118); see “Use IP based token for sessionless, per-session secret if anon has session [v3]”.
    • This maintains the status quo, only creating a server-side session on preview/save. If there is no cookie (with the session token), it verifies the edit using an HMAC combining the server secret, client IP, and timestamp as the token; otherwise it uses the session-based token that the server verifies.
  • @GWicke asks that we investigate if *always* creating a server-side session the moment one selects “edit” is truly a performance problem T40417#2277811
  • @Krinkle suggests using Javascript to start the server-side session on “save” T40417#2263011

@Bawolff isn’t particularly wedded to any one choice, but is hoping that we can make a decision. He’s convinced me that his alternative is a good one.

This problem may or may not be a secret to anyone; we’ve probably discussed this publicly in the past (e.g. Brian thinks that Tyler may have posted a public patch for this years ago). Because the non-logged-in CSRF protection token is hardcoded to “+\”, it’s currently trivial either to draw traffic to a website with a “baked in” CSRF exploit spoofing edits to MediaWiki, or worse, to exploit an XSS on another popular website.

In chatting with @Bawolff, the question we have: should we make this discussion public?

Tgr added a comment.Sep 6 2016, 9:39 PM

In chatting with @Bawolff, the question we have: should we make this discussion public?

+1. It is a lot easier for a determined spammer to figure out this vulnerability by looking at the HTML source of the edit page than by finding this discussion.

IIRC, my first version of the Squid caching integration started sessions on edit view, and back then it didn't seem to be a performance issue.

It's still a performance issue on the client side: anyone who clicks on a red link would get slower pageviews for the next half hour. That's not a huge deal, but if we can avoid it without adding too much complexity, we should.

In chatting with @Bawolff, the question we have: should we make this discussion public?

To be more explicit: in my opinion, it is time to have a public discussion about this, pick some solution, and then do it.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".
Bawolff changed the edit policy from "Custom Policy" to "All Users".
Bawolff changed Security from Software security bug to None.

In the interest of making this an RFC and actually getting it done, I've made this bug public(!)

RobLa-WMF moved this task from Backlog to RobLa-WMF on the TechCom-Has-shepherd board.

My relatively uninformed opinion: we should go with @Bawolff's original patch, then modify it as needed:

  1. @Bawolff should take his patch at T40417#2034118 and get it into Gerrit (with modifications completely at his discretion based on what he's learned)
  2. If someone is ready to +2 it, yay \o/ (skip to step 5)
  3. If no one is willing to +2 before ArchCom discusses it (first ArchCom Planning at least 2-3 days after @Bawolff completes step 1), ArchCom figures out who in ArchCom will review and +2 it
  4. Someone +2s it
  5. We allow for the possibility that someone will implement one of the competing proposals (which @Bawolff should review in Gerrit if it happens, and feel free to +2/+1/-1/-2 based on what he sees)
  6. We ship the next version of MediaWiki with whatever was last +2'd

Does that work for you @Bawolff? Everyone else good with that?

Ltrlg added a subscriber: Ltrlg.Oct 2 2016, 8:58 AM

What's the status of this task? Is Bawolff going to submit a new Gerrit changeset?

What's the status of this task? Is Bawolff going to submit a new Gerrit changeset?

Yes I am. I was away for a little bit, and then somewhat backlogged with other tasks, but I do plan on submitting said patch very soon.

Change 318917 had a related patch set uploaded (by Brian Wolff):
Prevent CSRF for anons using tokens based on IP.

https://gerrit.wikimedia.org/r/318917

Change 318917 had a related patch set uploaded (by Brian Wolff):
Prevent CSRF for anons using tokens based on IP.
https://gerrit.wikimedia.org/r/318917

This is per RobLa's plan, with the difference that this is a testing-mode patch where we just log failures instead of failing hard.

My proposal is that when we first deploy this, we run it in testing mode for, say, 2 weeks so we have some idea of any unintended consequences it might have. This also allows us to quickly switch back to testing mode if it causes problems during the final deploy.

daniel added a project: TechCom-RFC.
daniel moved this task from Under discussion to (unused) on the TechCom-RFC board.Nov 16 2016, 6:43 PM
jeremyb added a subscriber: jeremyb.Jan 2 2017, 5:56 AM
SamanthaNguyen lowered the priority of this task from Normal to Lowest.Jan 8 2017, 12:01 AM
SamanthaNguyen moved this task from Backlog / Other to Patch pending review on the Security board.
brion added a comment.Jan 9 2018, 8:28 PM

This seems to have fallen by the wayside, but there's a patch which can be brought up to date, and it would be nice to close this glaring hole. Is there interest in reviving this? Any foreseen problems?

From a quick glance, either my own recommendation (to start the session with JavaScript) doesn't make sense, or I've forgotten part of the problem. I'm gonna assume the former.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users. Also, said edit pages don't allow iframing (prevented via FrameOptions, which disallows framing by default; only plain page views opt in to allowing it).

So from that quick glance, I would assume it should be very trivial to start sending and requiring normal wpEditToken values in the edit page.

The only problem that leaves is the API. Regardless of disallowing CORS for third-party domains and not enabling authenticated sessions over JSON-P or CORS, people can still make opaque requests cross-domain, which is simple given the token is constant for anons.

I was concerned about how to address this, when I noticed we already don't support this use case. For example, API:Edit documents the following query as a way to get page information and an edit token:

https://en.wikipedia.org/w/api.php?action=query&prop=info%7Crevisions&intoken=edit&rvprop=timestamp&titles=Wikipedia:Sandbox

{ ..
    "query": {
        "pages": {
            "16283969": { ..
                "starttimestamp": "2018-01-11T21:07:18Z",
                "edittoken": "46adad863b6a4c81d12fd7ac040a5fd55a57d206+\\",
                "revisions": [ .. ]
} } } }

However, when setting format=json&callback=test to try to retrieve this cross-domain, no token is sent:

https://en.wikipedia.org/w/api.php?action=query&prop=info%7Crevisions&intoken=edit&rvprop=timestamp&titles=Wikipedia:Sandbox&format=json&callback=test

/**/test({ ..
    "warnings": { ..
        "info": {
            "*": "Unrecognized value for parameter \"intoken\": edit.\n .."
        }
    },
    "query": {
        "pages": {
            "16283969": { ..
                "starttimestamp": "2018-01-11T21:11:11Z",
                "revisions": [..  ]
} } } })

So it seems like we will still support anon editing in the API, and aside from opaque cross-domain requests within web browsers intentionally abusing this problem with a hardcoded edit token, it doesn't seem like this would break anything.

Again, maybe I'm overlooking something, but I hope this helps. I'd expect a patch for this to mostly consist of ensuring sessions are indeed started on the pages where I think they are (sign-up page, log-in page, edit page, etc.), and of removing the special casing for the constant edit token. Most of everything else seems to be in place already.

brion added a comment.Jan 11 2018, 9:26 PM

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users.

I believe it was envisioned that many hits to edit pages are stray clicks without actual edits following them: that is, that starting sessions on 'edit' link click would inflate the total number of sessions more than we could handle. I'm not sure whether we had stats to back that up or not though.

Tgr added a comment.Jan 11 2018, 10:08 PM

Also IIRC there was concern about blocking edits from users who have disabled cookies. The patch already contains a fallback mechanism for non-session-based tokens though so I'm not sure there is a problem to solve here.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts.

The session is not started there, as Brion pointed out. The session gets started in SubmitAction, i.e. after the user clicks "save"/"publish" (or "preview" or "show changes").

For example, API:Edit documents the following query as a way to get page information and an edit token:

That needs updating. prop=info&intoken=edit has been officially deprecated since 2014 (fdddf9457). The modern method is to use meta=tokens. But that returns a similar error:

https://en.wikipedia.org/w/api.php?action=query&meta=tokens&type=csrf&callback=test

/**/test({
    "batchcomplete": "",
    "warnings": {
        "tokens": {
            "*": "Tokens may not be obtained when the same-origin policy is not applied."
        }
    }
})

and aside from opaque cross-domain requests within web browsers intentionally abusing this problem with a hardcoded edit token, it doesn't seem like this would break anything.

And clients that just hard-code the anonymous token because it has been constant, so they can write simpler (but more fragile) code. If there's a stupid way to do something, some unmaintained API client somewhere is probably doing it.

I think that's an acceptable breakage as long as we announce it to mediawiki-api-announce when the time comes.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts.

The session is not started there, as Brion pointed out. The session gets started in SubmitAction, i.e. after the user clicks "save"/"publish" (or "preview" or "show changes").

Ah, okay. So those pages are currently cache-bypassing without being session-starting. I assume the patch would need to change this to do both. Or is there another way? I haven't checked the numbers, but I imagine we should be able to handle the increased load from that (basically, anon bounce rate of edit page; possibly inflated by crawlers).

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users.

I believe it was envisioned that many hits to edit pages are stray clicks without actual edits following them: that is, that starting sessions on 'edit' link click would inflate the total number of sessions more than we could handle. I'm not sure whether we had stats to back that up or not though.

This was and remains true.

Just crunched some data on this based on the current editing metrics. Of editor loads that are initiated over the past month, only just over 5% involve someone trying to submit (pressing publish, diff, or preview), and all but another 4% of editor loads are aborted without the user even entering anything into the edit window.

Ah, okay. So those pages are currently cache-bypassing without being session-starting. I assume the patch would need to change this to do both. Or is there another way? I haven't checked the numbers, but I imagine we should be able to handle the increased load from that (basically, anon bounce rate of edit page; possibly inflated by crawlers).

Use a token that can be statically generated, such as hmac(<ip>, <user-agent>, $timestamp, install-secret) . "|$timestamp+\\"

If they take more time submitting than whatever margin we decide to allow for $timestamp, or they have switched to a new IP address in the meantime, they will get a session-loss error (and since they submitted the form, a proper session will now be created).

Expecting that users don't change their IP while they perform their first edit seems reasonable. We had reports in the past about ISPs whose users connect from a different IP each time, although I don't know if that's still the case.
(If we dropped the IP from the above list, the evil webpage could just grab a valid token for the user.)
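A minimal sketch of such a statically generated token (names illustrative; as noted below, a real patch would go through MediaWiki's Token class rather than reinvent this):

<?php
// Sketch: same HMAC shape as a session token, but keyed on the request's
// IP and User-Agent plus the wiki's install secret instead of session state.
function anonEditToken( string $ip, string $ua, int $ts, string $installSecret ): string {
    return hash_hmac( 'sha256', "$ip\n$ua\n$ts", $installSecret ) . "|$ts+\\";
}
// Verification recomputes the HMAC from the *current* IP/UA and rejects stale
// timestamps; a changed IP therefore fails, triggering the session-loss path.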

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Use a token that can be statically generated, such as hmac(<ip>, <user-agent>, $timestamp, install-secret) . "|$timestamp+\\"

More usefully, pass whatever static data (IP, agent) to MediaWiki\Session\Token::__construct() via LoggedOutEditToken::__construct() instead of trying to reinvent token handling.

You could have LoggedOutEditToken::match() enforce a maximum value for $maxAge if you want these tokens to expire after a certain amount of time.
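Roughly (a sketch only: MediaWiki\Session\Token and its match( $userToken, $maxAge ) method are real, but this subclass body and the cap are illustrative):

<?php
use MediaWiki\Session\Token;

class LoggedOutEditToken extends Token {
    private const MAX_AGE = 3600; // assumed cap; no value is fixed in this task

    public function match( $userToken, $maxAge = null ) {
        // Clamp whatever the caller passes so anon tokens always expire.
        $maxAge = ( $maxAge === null ) ? self::MAX_AGE : min( $maxAge, self::MAX_AGE );
        return parent::match( $userToken, $maxAge );
    }
}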

Expecting that users don't change their IP while they perform their first edit seems reasonable. We had reports in the past about ISPs whose users connect from a different IP each time, although I don't know if that's still the case.

It was the case as recently as February 2016. See T40417#2024227. It would be worthwhile to check again rather than just assuming every ISP suddenly changed behavior in only 2 years.

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Note we'd then have to vary on that cookie, at least for views served via EditPage.

Tgr added a comment.Jan 12 2018, 8:56 PM

If a user has a constantly changing IP address, their edit will fail with a CSRF error, a session will be started due to the edit form submission attempt, and on their next submit the token will be session-based and will succeed. That seems acceptable.

In T40417#3897811, @Tgr wrote:

and on their next submit the token will be session-based

Assuming we code it to only use the non-session-based token if the anon doesn't have a session, as is the case in https://gerrit.wikimedia.org/r/#/c/318917/.

Per T172477, constantly changing IP addresses may be more common than previously thought. I'd expect a fair increase in CSRF token failures. Aside from keeping a very close eye on that metric (which we have, in Graphite), we should also ensure we have a metric for abandonment as a result of CSRF token failure. But assuming the worst, do we have alternatives worth considering?

I'm thinking of two separate scenarios:

  1. Secure token for anon edit with cookie support, without requiring a persistent backend session.
  2. Secure token for anon edit without cookies (e.g. cookies disabled in browser, or API client without cookiejar).

For case 1, I think we can do it without keying on (ip, ua). Instead of using (ip, ua) as the constant between edit-page view and edit-page submission, it seems we could solve it by giving the user their own constant in the form of a cookie.

The cookie would contain a token generated only from a private salt and randomness, not tied to their connection by IP or UA. Given that an attacker could presumably not force the same-origin cookie to exist, this should be secure. On submission, instead of validating the token based on age + IP/UA, we'd validate it based on age + cookie, and the cookie in turn can be validated as well. We could even keep the UA in the mix as extra entropy.

That leaves case 2 (no cookies). The secondary fallback @Tgr mentioned (re-try with a session) won't help no-cookie users. Whether we use (ip, ua) as primary or as fallback, both will fail for IP-changing no-cookie users. This applies to API users as well, for whom not supporting cookies is even more common. (But security is less of a concern there, so presumably IP-changing API users will have to enable cookies.)

It seems the only way to make IP-changing edits work securely is to use a generic token stored and sent back in a way the attacker can't replicate. Cookies seem to be the only mechanism for that, and they are exactly the one thing no-cookie users don't have. Even using JavaScript won't help, because anything JS could fetch and insert into the form, the attacker could as well.

So overall, I think using a generic token as the constant, instead of IP/UA, would reduce CSRF error rates a lot. But regardless, this change would mean no more edits from users who have variable IPs and cookies disabled. I suppose we could adapt the CSRF error message (if we haven't already) to say that users need to enable cookies if the token error persists for them.

Krinkle raised the priority of this task from Lowest to Normal.Jan 13 2018, 12:00 AM

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Note we'd then have to vary on that cookie, at least for views served via EditPage.

I don't think that would be needed, since it would only be used on paths that are currently uncached (as soon as they press submit, a "real session" would be provided).

Tgr added a comment.Jan 13 2018, 1:37 AM

It seems the only way to make IP-changing edits work securely is to use a generic token stored and sent back in a way the attacker can't replicate. Cookies seem to be the only mechanism for that, and they are exactly the one thing no-cookie users don't have. Even using JavaScript won't help, because anything JS could fetch and insert into the form, the attacker could as well.

localStorage, but it's very unlikely to be available when cookies aren't. Or one of the evercookie methods, but those are creepy.

...actually, ETags don't seem that bad: put the same random token in the HTML form and in the ETag, and hope that the browser sends back the ETag in the If-None-Match field (not writable by the attacker). But it's murky when exactly the browser is required to do that, especially if the edit form is generated via POST.

Krinkle updated the task description. Apr 7 2018, 12:30 AM

TechCom has scheduled an IRC meeting for this RFC on 2018-04-11 in the #wikimedia-office channel at 2pm PST (22:00 UTC, 23:00 CET).

Krinkle updated the task description. Apr 18 2018, 10:16 PM
  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Rather than "JSONP mode", it might be better to refer to ApiBase::lacksSameOriginSecurity() which checks for that, CORS with origin=*, and a few other things.

action=tokens has been deprecated since MediaWiki 1.24, although numerous clients still use it (sigh). The currently recommended method for fetching tokens from the API is via action=query&meta=tokens (ApiQueryTokens).

Currently all the token-fetching methods in the core API return no tokens when lacksSameOriginSecurity() is true, so in that sense it's already "disabled". None currently raise an error in this situation though, although some (including ApiQueryTokens) do report a warning.

I don't think that having either ApiTokens or ApiQueryTokens raise an error would be a major problem, although it's possible it might break some clients that always try to fetch a token and use the absence of a token in the response as a sort of permission check. The even-older token-fetching methods like action=query&prop=info&intoken=edit are more likely to run into the "permission check" problem, since those methods include actual permissions checks in addition to "JSONP mode" checks.

  • Make the API's token handler ignore this new cookie (if it exists) when in JSONP mode.

The API's "token handler" wouldn't know anything about the cookie in the first place. To actually fetch Token objects the API (mostly[1]) goes through $this->getUser()->getEditTokenObject().

Rather than trying to ignore the new cookie, it might make more sense to have ApiBase::validateToken() return false or even throw an error when lacksSameOriginSecurity() is true so it never gets to the point of attempting to validate a token against it. Failing that, you'd need to have ApiMain somehow turn off $this->getUser()->getEditTokenObject() using the cookie (or duplicate the logic from that method).

[1]: 'login' and 'createaccount' tokens persist a session and go directly to $this->getRequest()->getSession()->getToken(), specifically to bypass the anon non-session-based "+\" token. Extensions might do something similar, particularly old ones.
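In other words (a standalone paraphrase of that suggestion, not the actual ApiBase::validateToken() body):

<?php
// Sketch: bail out before comparing anything when same-origin security is
// absent, so no token (cookie-derived or otherwise) can ever validate on a
// cross-origin request.
function validateTokenSafely( ApiBase $module, string $token ): bool {
    if ( $module->lacksSameOriginSecurity() ) {
        return false; // JSONP, origin=*, etc.
    }
    return $module->getUser()->matchEditToken( $token );
}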

  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Rather than "JSONP mode", it might be better to refer to ApiBase::lacksSameOriginSecurity() which checks for that, CORS with origin=*, and a few other things.

Agreed. @Krinkle partly got that language from me, and when I was more closely involved with API development we didn't have those things yet. But as time passes my API knowledge gets increasingly out of date :)

Currently all the token-fetching methods in the core API return no tokens when lacksSameOriginSecurity() is true, so in that sense it's already "disabled". None currently raise an error in this situation though, although some (including ApiQueryTokens) do report a warning.

This is good, and is the first line of defense (see also below).

I don't think that having either ApiTokens or ApiQueryTokens raise an error would be a major problem, although it's possible it might break some clients that always try to fetch a token and use the absence of a token in the response as a sort of permission check. The even-older token-fetching methods like action=query&prop=info&intoken=edit are more likely to run into the "permission check" problem, since those methods include actual permissions checks in addition to "JSONP mode" checks.

I agree that it's generally better to return an empty token or drop token fields from the output (in addition to a warning that explains what's going on) than to throw an error, especially since the former is what we've been doing already.

  • Make the API's token handler ignore this new cookie (if it exists) when in JSONP mode.

The API's "token handler" wouldn't know anything about the cookie in the first place. To actually fetch Token objects the API (mostly[1]) goes through $this->getUser()->getEditTokenObject().
Rather than trying to ignore the new cookie, it might make more sense to have ApiBase::validateToken() return false or even throw an error when lacksSameOriginSecurity() is true so it never gets to the point of attempting to validate a token against it. Failing that, you'd need to have ApiMain somehow turn off $this->getUser()->getEditTokenObject() using the cookie (or duplicate the logic from that method).

Right now we already have code in ApiMain.php that sets $wgUser to an anonymous user if we are in lacksSameOriginSecurity mode. This is a backstop in case the first line of defense mentioned above fails: even if we don't notice that we're supposed to refuse to perform privileged actions and try to perform one anyway, it'll fail because we sabotaged ourselves at the start. We could do something analogous for anon tokens, like deleting the anon token cookie from $_COOKIE in this code block, or setting some variable that sabotages any future anon cookie generation.
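Sketched out (paraphrasing the existing ApiMain backstop; the cookie name is hypothetical):

<?php
// In ApiMain's existing same-origin check (sketch, not the actual code):
if ( $this->lacksSameOriginSecurity() ) {
    global $wgUser;
    $wgUser = new User(); // existing backstop: strip authentication entirely
    // Proposed addition: hide the anon CSRF cookie from all later code
    // paths, so a cookie-derived token can never validate.
    unset( $_COOKIE['anonCsrfSecret'] );
}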

dbarratt updated the task description. Aug 23 2019, 5:13 PM
dbarratt updated the task description.
dbarratt added a subscriber: dbarratt.EditedAug 23 2019, 5:59 PM

The reason this works is that the request is a simple request, which does not invoke a preflight request.

Since the user does not have a session, this is by definition not a CSRF attack. I think it would therefore be inappropriate to abuse the CSRF tokens for this purpose.

I see three ways to solve this problem:

  1. Change the API to make edit requests (or any other requests with side effects) non-simple requests (see the sketch after this list). Possible changes:
    1. Use a method other than GET or POST
    2. Require Custom Header for non-authenticated requests.
    3. Require JSON for non-authenticated requests (and the Content-Type header set to application/json)
  2. Force the client to read a "token" that is randomly generated (in as cryptographically secure a manner as possible) and not tied to a session, and send it back to the server. This is different from a CSRF token, as it does not require a session. These tokens would be stored on the server and allowed a single(?) use before being destroyed, or destroyed automatically after a small time period (5 minutes?).
  3. Require authentication (in the broadest sense possible) in order to edit (at least via the API). Effectively, starting a new session (even if that requires zero actions on the user's part).
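To illustrate why option 1 works at the browser level (a sketch of the mechanism, with a hypothetical header name, not specific MediaWiki code): custom request headers make a cross-origin request "non-simple", so the browser sends a CORS preflight that a third-party origin cannot pass unless the server explicitly allows it.

<?php
// Sketch: require a custom header on unauthenticated write requests.
if ( $_SERVER['REQUEST_METHOD'] === 'POST' && !isset( $_SERVER['HTTP_X_ANON_EDIT'] ) ) {
    http_response_code( 403 );
    header( 'Content-Type: application/json' );
    echo json_encode( [ 'error' => 'missing-x-anon-edit-header' ] );
    exit;
}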
  1. Change the API to make edit (or any other requests with side effects) requests a non-simple request. Possible Changes:
    1. Use a method other than GET or POST
    2. Require Custom Header for non-authenticated requests.
    3. Require JSON for non-authenticated requests (and the Content-Type header set to application/json)

Note this task isn't just about the API, it's also relevant to the Web UI. But, from an Action API perspective, any of these would be an extremely significant breaking change. Since less-breaking options are available, I'd oppose this idea.

  2. Force the client to read a "token" that is randomly generated (in as cryptographically secure a manner as possible) and not tied to a session, and send it back to the server. This is different from a CSRF token, as it does not require a session. These tokens would be stored on the server and allowed a single use before being destroyed, or destroyed automatically after a small time period (5 minutes?).

That's the main option that has been discussed above, with the refinement that the "not-really-CSRF token" is returned and accepted in exactly the same ways as the CSRF token, so from the client's perspective there's no difference between the two.

In other words, while you're technically correct that it's not a CSRF token, very little of the codebase actually needs to care about the difference. Which is a major win from a back-compat perspective.

  3. Require authentication (in the broadest sense possible) in order to edit (at least via the API). Effectively, starting a new session (even if that requires zero actions on the user's part).

This is a bit of a non sequitur. There's no need for any sort of authentication in order to create a session; all that's needed to create a session is to set a session cookie.

This is also already discussed above. It would work well enough for the Action API to create the session when action=query&meta=tokens is hit, but it would have significant negative effects on caching if the Web UI created a session when the edit page is loaded.