Page MenuHomePhabricator

MediaWiki's anonymous edit token leaves wiki installations (incl. Wikipedia) open to mass anonymous spam we can't block
Open, MediumPublic

Description

Problem statement

The edit token for logged-out users (anons) is currently a hardcoded string of +\. There is absolutely no CSRF protection for it.
That means any website on the internet that can trick a user into visiting their pages, can also (as long as the user isn't logged-in) make them edit Wikipedia on behalf of their IP and browser.

Aside from the user's privacy, this is also subject to potentially large-scale abuse with regards to spam. E.g. anyone visiting the affected website would inadvertently be making edits to Wikipedia (or another wiki being targeted)

Our abuse prevention tools may likely be unable to defend against such an attack. We couldn't block the spammer's IP given they'd be thousands of genuine and unpredictable user IPs. The only option we'd have is to disable disable anonymous editing entirely.

Demo

WARNING: Going to the demo below will reveal your IP address publicly at https://test.wikipedia.org/w/index.php?title=Test&action=history

https://codepen.io/davidbarratt/pen/VwZPEqG?editors=0010

Requirements

Submitting an edit (via index.php or api.php) will only be possible from a web browser if the user is currently on the wiki's site.

Specifically, this means:

  • The token is not a predictable string.
  • The token cannot be seen or changed from page on another origin.
  • The token is not obtainable cross-origin in web browsers via an API endpoint.

Restrictions

Between 2012 and 2018, several restrictions have been uncovered whilst exploring different solutions.

  1. We cannot start a regular session upon viewing the edit page. While that would meet the requirements and solve the problem, it is unfeasible because the edit page is viewed over GET and receives significant traffic. Allocating session IDs and data for each of those views is infeasible with our current set up. Much of the GET traffic to the edit page is bots and other programmatic interaction that does never results in the form being submitted and thus, does not start and does not need, a session currently.

Context

Logged-in users

For registered users this problem does not exist because upon log-in, the software starts a session. The session is represented by an identifier and contains various key/value pairs saved in the session storage backend (e.g. Redis or MySQL). The user's browser is sent a cookie that identifies the user ID and session. And the session data contains information to verify the validity of the session. Upon viewing the edit page, a *edit token* is generated as an HMAC hash that combines a random cryptographic secret (stored in the session on the server), and a timestamp. Upon submitting the edit, the server decodes the packet, confirms the user's session validity, and validates the timestamp expiry.

Aside from registered/logged-in users, we also start this type of session for anonymous (logged-out) users after a a user submits their first edit. That means after they submit the edit page form, all regular session features work. Including talk page notifications, block status notices, and bypassing the static HTML cache for all page views in the session going forward.

HTTP cache

The viewing of the edit page currently requires freshness and is uncachable by Varnish. This suggests that despite the restriction on session storage, our web servers (and PHP) can handle the traffic without issue. Which means while the edit page cannot vary based on data in a session, it can vary on anything we can do in PHP based on current HTTP request headers.

Proposals

CSRF Cookie

Regular sessions work by generating edit tokens with an HMAC hash of a random token and some other data (salt, timestamp, ..). The random token is persisted in the session data on the backend. And the edit token is validated upon submission by confirming it matches what we'd re-compute at run-time from data we then see in session backend for the current requests's session-cookie ID.

The proposal is to, for users without a session, store the token directly in a cookie. This cookie will be marked httpOnly and have similar expiry as we use for session duration.

This CRSF cookie will not relate to session storage, and will not trigger bypassing of CDN cache. (It'd be the same to Varnish as random JS-based cookies.)

Checking requirements:

  1. Predictable: No.
  2. Accessible to other origins: No. Only JS code running same-origin can get or set cookies. As extra protection, the httpOnly mark makes it inaccessible to both cross-origin and same-origin JS code.
  3. Obtainable from API: No, but to ensure this we must update the API to extend its current protections in JSONP mode to cover this new use case.

About the API, specifically I propose to:

  • Make the API's token handler ignore this new cookie (if exists) when in JSONP mode. This is similar to what JSONP mode already does, in that it does not initialise the same-origin session that co-exist at the same time.
  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Point 2 alone does not suffice, because the cookie can still exist due to genuine index.php activity. Point 1 alone would suffice, but it is confusing to users if the request only fails upon submission when the earlier action=tokens request works.

Origin header

Reject the edit if the Origin header is set and not on the list of trusted origins.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

In chatting with @Bawolff, the question we have: should we make this discussion public?

+1. It is a lot easier for a determined spammer to figure out this vulnerability by looking at the HTML source of the edit page than by finding this discussion.

IIRC, my first version of the Squid caching integration started sessions on edit view, and back then it didn't seem to be a performance issue.

It's still a performance issue on the client side: anyone who clicks on a red link would get slower pageviews for the next half hour. That's not a huge deal, but if we can avoid it without adding too much complexity, we should.

In chatting with @Bawolff, the question we have: should we make this discussion public?

To be more explicit, In my opinion, it is time to have a public discussion about this, pick some solution, and then do it.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".
Bawolff changed the edit policy from "Custom Policy" to "All Users".
Bawolff changed Security from Software security bug to None.

In the interest of making this an RFC and actually getting it done, I've made this bug public(!)

My relatively uninformed opinion: we should go with @Bawolff's original patch, then modify it

  1. @Bawolff should take his patch at T40417#2034118 and get it into Gerrit (with modifications completely at his discretion based on what he's learned)
  2. If someone is ready to +2 it, yay \o/ (skip to step 5)
  3. If no one is willing to +2 before ArchCom discusses it (first ArchCom Planning at least 2-3 days after @Bawolff completes step 1), ArchCom figures out who in ArchCom will review and +2 it
  4. Someone +2s it
  5. We allow for the possibility that someone will implement one of the competing proposals (which @Bawolff should review in Gerrit if it happens, and feel free to +2/+1/-1/-2 based on what he sees)
  6. We ship the next version of MediaWiki with whatever was last +2'd

Does that work for you @Bawolff? Everyone else good with that?

What's the status of this task? Is Bawolff going to submit a new Gerrit changeset?

What's the status of this task? Is Bawolff going to submit a new Gerrit changeset?

Yes I am. I was away for a little bit, and then somewhat backlogged with other tasks, but I do plan on submitting said patch very soon.

Change 318917 had a related patch set uploaded (by Brian Wolff):
Prevent CSRF for anons using tokens based on IP.

https://gerrit.wikimedia.org/r/318917

Change 318917 had a related patch set uploaded (by Brian Wolff):
Prevent CSRF for anons using tokens based on IP.

https://gerrit.wikimedia.org/r/318917

This is per Robla's plan, with the difference that this is a testing mode patch where we just log failures instead of hard failing.

My proposal is when we first deploy this, we run it in testing mode for say 2 weeks so we have some idea of any unintended consequences it might have. Also this allows us to quickly switch back to testing mode if it causes problems during the final deploy.

SamanthaNguyen lowered the priority of this task from Medium to Lowest.Jan 8 2017, 12:01 AM
SamanthaNguyen moved this task from Backlog / Other to Patch pending review on the acl*security board.

This seems to have fallen by the side, but there's a patch which can be brought up to date, and it would be nice to close this glaring hole. Is there interest in reviving this? Any forseen problems?

From a quick glance, either my own recommendation (to start session with javascript) doesn't make sense, or I've forgotten part of the problem. I'm gonna assume the former.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users. Also, said edit pages don't allow iframing (prevented via FrameOptions, which we disallow by default and only plain page views opt-in to allowing framing).

So from that quick glance, I would assume it should be very trivial to start sending and requiring normal wpEditToken values in the edit page.

The only problem that leaves is the API. Regardless of disallowing CORS for third-party domains and not enabling authenticated sessions over JSON-P or CORS, people can still make opaque requests cross-domain, which is simple given the token is constant for anons.

I as concerned about how to address, when I noticed we already don't support this use case. For example, API:Edit documents the following query as a way to get page information and an edit token:

https://en.wikipedia.org/w/api.php?action=query&prop=info%7Crevisions&intoken=edit&rvprop=timestamp&titles=Wikipedia:Sandbox

{ ..
    "query": {
        "pages": {
            "16283969": { ..
                "starttimestamp": "2018-01-11T21:07:18Z",
                "edittoken": "46adad863b6a4c81d12fd7ac040a5fd55a57d206+\\",
                "revisions": [ .. ]
} } } }

However, when setting format=json&callback=test to try and retrieve this cross-domain, no token is sent:

https://en.wikipedia.org/w/api.php?action=query&prop=info%7Crevisions&intoken=edit&rvprop=timestamp&titles=Wikipedia:Sandbox&format=json&callback=test

/**/test({ ..
    "warnings": { ..
        "info": {
            "*": "Unrecognized value for parameter \"intoken\": edit.\n .."
        }
    },
    "query": {
        "pages": {
            "16283969": { ..
                "starttimestamp": "2018-01-11T21:11:11Z",
                "revisions": [..  ]
} } } })

So it seems like we will still support anon editing in the API, and aside from opaque cross-domain requests within web browsers intentionally abusing this problem with a hardcoded edit token, doesn't seem like this would break anything.

Again, maybe I'm overlooking something, but I hope this helps. I'd expect a patch for this to mostly consist of ensuring sessions are indeed started on the pages where I think they are (sign-up page, log-in page, edit page, etc.), and removing special casing for the constant edit token. Much of everything else seems to be in place already.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users.

I believe it was envisioned that many hits to edit pages are stray clicks without actual edits following them: that is, that starting sessions on 'edit' link click would inflate the total number of sessions more than we could handle. I'm not sure whether we had stats to back that up or not though.

Also IIRC there was concern about blocking edits from users who have disabled cookies. The patch already contains a fallback mechanism for non-session-based tokens though so I'm not sure there is a problem to solve here.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts.

The session is not started there, as Brion pointed out. The session gets started in SubmitAction, i.e. after the user clicks "save" "publish" (or "preview" or "show changes").

For example, API:Edit documents the following query as a way to get page information and an edit token:

That needs updating. prop=info&intoken=edit has been officially deprecated since 2014 (fdddf9457). The modern method is to use meta=tokens. But that returns a similar error

https://en.wikipedia.org/w/api.php?action=query&meta=tokens&type=csrf&callback=test

/**/test({
    "batchcomplete": "",
    "warnings": {
        "tokens": {
            "*": "Tokens may not be obtained when the same-origin policy is not applied."
        }
    }
})

and aside from opaque cross-domain requests within web browsers intentionally abusing this problem with a hardcoded edit token, doesn't seem like this would break anything.

And clients that just hard-code the anonymous token because it has been constant so they can write simpler (but more fragile) code. If there's a stupid way to do something, some unmaintained API client somewhere is probably doing it.

I think that's an acceptable breakage as long as we announce it to mediawiki-api-announce when the time comes.

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts.

The session is not started there, as Brion pointed out. The session gets started in SubmitAction, i.e. after the user clicks "save" "publish" (or "preview" or "show changes").

Ah, okay. So those pages are currently cache-bypassing without being session-starting. I assume the patch would need to change this to do both. Or is there another way? I haven't checked the numbers, but I imagine we should be able to handle the increased load from that (basically, anon bounce rate of edit page; possibly inflated by crawlers).

As far as I know, we already cache-miss red links and other views of the edit page. That's where the session starts. So we shouldn't have any problem using proper edit tokens for those anon sessions, similar to logged-in users.

I believe it was envisioned that many hits to edit pages are stray clicks without actual edits following them: that is, that starting sessions on 'edit' link click would inflate the total number of sessions more than we could handle. I'm not sure whether we had stats to back that up or not though.

This was and remains true.

Just crunched some data on this based on the current editing metrics. Of editor loads that are initiated over the past month, only just over 5% involve someone trying to submit (pressing publish, diff, or preview), and all but another 4% of editor loads are aborted without the user even entering anything into the edit window.

Ah, okay. So those pages are currently cache-bypassing without being session-starting. I assume the patch would need to change this to do both. Or is there another way? I haven't checked the numbers, but I imagine we should be able to handle the increased load from that (basically, anon bounce rate of edit page; possibly inflated by crawlers).

Use a token that can be statically generated, such as hmac(<ip>, <user-agent>, $timestamp, install-secret) . "|$timestamp+\\"

If they take more time in submitting than whatever margin we decide to allow for $timestamp, or they have switched to a new IP address in the meantime, they will get a session loss error (and as they sent it, a proper session will now be created).

Expecting that users didn't change their IP while they perform their first edit seems reasonable. We had reports in the past about ISPs that had users get connect from a different IP each time, although I don't know if that's still the case.
(if we dropped the ip from the above list, the evil webpage could just grab a valid one for the user)

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Use a token that can be statically generated, such as hmac(<ip>, <user-agent>, $timestamp, install-secret) . "|$timestamp+\\"

More usefully, pass whatever static data (IP, agent) to MediaWiki\Session\Token::__construct() via LoggedOutEditToken::__construct() instead of trying to reinvent token handling.

You could have LoggedOutEditToken::match() enforce a maximum value for $maxAge if you want these tokens to expire after a certain amount of time.

Expecting that users didn't change their IP while they perform their first edit seems reasonable. We had reports in the past about ISPs that had users get connect from a different IP each time, although I don't know if that's still the case.

It was the case as recently as February 2016. See T40417#2024227. It would be worthwhile to check again rather than just assuming every ISP suddenly changed behavior in only 2 years.

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Note we'd then have to vary on that cookie, at least for views served via EditPage.

If a user has a constantly changing IP address, their edit will fail with a CSRF error, a session will be started due to the edit form submission attempt, and on their next submit the token will be session-based and will succeed. That seems acceptable.

In T40417#3897811, @Tgr wrote:

and on their next submit the token will be session-based

Assuming we code it to only use the non-session-based token if the anon doesn't have a session, as is the case in https://gerrit.wikimedia.org/r/#/c/318917/.

Per T172477, constantly changing IP addresses may be more common than previously thought. I'd expect fair increase in CSRF token failures. Aside from keeping a very close eye on that metric (which we have, in Graphite), we should also ensure we have a metric for abandonment as result of CRSF token failure. But assuming the worst, do we have alternatives worth considering?

I'm thinking of two separate scenarios:

  1. Secure token for anon edit with cookie support, without requiring a persistent backend session.
  2. Secure token for anon edit without cookies (e.g. cookies disabled in browser, or API client without cookiejar).

For case 1, I think we can do it without keying on (ip, ua). Perhaps, instead using (ip, ua) as the constant between edit-page view and edit-page submission, it seems like we could solve it by giving the user their own constant in the form of a cookie.

The cookie would contain a token generated only based on a private salt and rand, not tied to their connection by IP or UA. Given an attacker could presumably not force the same-origin cookie to exist, this should be secure. On submission, instead of validating the token based on age + ip/ua, we'd validate it based on the age + cookie, and the cookie in turn can be validated as well. We could even keep UA in the mix as extra entropy.

This would leave case 2 (no cookies). The secondary fallback @Tgr mentioned (re-try with a session) won't help no-cookie users. Whether we use (ip, ua) as primary or as fallback, both will fail for IP-changing no-cookie users. This would apply to API-users as well, for which not supporting cookies is even more common. (But security is less of a concern there, so presumably IP-changing API users will have to enable cookies)

It seems the only way to make IP-changing edits work securely is by using a generic token stored and sent back in a way the attacker can't do themselves. It seems cookies are the only way for that, which is also the one thing no-cookie users don't have. Even using JavaScript won't help because anything JS could fetch and insert in the form, the attacker could as well.

So overall, I think using a generic token as constant, instead of IP/UA, would reduce CSRF error rates a lot. But regardless, this change would mean no more edits from users that have variable IPs and cookies disabled. I suppose we could adapt the CSRF error message (if not already) to include that users need to enable cookies if the token error is persistent for them.

Krinkle raised the priority of this task from Lowest to Medium.Jan 13 2018, 12:00 AM

Alternatively, give them a real cookie containing a temporary token, which is what we will be using.

Note we'd then have to vary on that cookie, at least for views served via EditPage.

I don't think it would be needed, since it would only be used on paths that are currently uncached (as soon as they press submit, a "real session" would be provided).

It seems the only way to make IP-changing edits work securely is by using a generic token stored and sent back in a way the attacker can't do themselves. It seems cookies are the only way for that, which is also the one thing no-cookie users don't have. Even using JavaScript won't help because anything JS could fetch and insert in the form, the attacker could as well.

localStorage, but it's very unlikely to be available when cookies aren't. Or one of the evercookie methods, but those are creepy.

...actually ETags don't seem that bad: put the same random token in the HTTP form and the ETag and hope that the browser sends back the ETag in the If-None-Match field (not writable by the attacker). But it's murky when exactly the browser is required to do that, especially if the edit form is generated via POST.

TechCom has scheduled an IRC Meeting for this RFC on 2018-04-11 in the #wikimedia-office channel at 2pm PST(22:00 UTC, 23:00 CET)

  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Rather than "JSONP mode", it might be better to refer to ApiBase::lacksSameOriginSecurity() which checks for that, CORS with origin=*, and a few other things.

action=tokens has been deprecated since MediaWiki 1.24, although numerous clients still use it (sigh). The currently recommended method for fetching tokens from the API is via action=query&meta=tokens (ApiQueryTokens).

Currently all the token-fetching methods in the core API return no tokens when lacksSameOriginSecurity() is true, so in that sense it's already "disabled". None currently raise an error in this situation though, although some (including ApiQueryTokens) do report a warning.

I don't think that having either ApiTokens or ApiQueryTokens raise an error would be a major problem, although it's possible it might break some clients that always try to fetch a token and use the absence of a token in the response as a sort of permission check. The even-older token-fetching methods like action=query&prop=info&intoken=edit are more likely to run into the "permission check" problem, since those methods include actual permissions checks in addition to "JSONP mode" checks.

  • Make the API's token handler ignore this new cookie (if exists) when in JSONP mode.

The API's "token handler" wouldn't know anything about the cookie in the first place. To actually fetch Token objects the API (mostly[1]) goes through $this->getUser()->getEditTokenObject().

Rather than trying to ignore the new cookie, it might make more sense to have ApiBase::validateToken() return false or even throw an error when lacksSameOriginSecurity() is true so it never gets to the point of attempting to validate a token against it. Failing that, you'd need to have ApiMain somehow turn off $this->getUser()->getEditTokenObject() using the cookie (or duplicate the logic from that method).

[1]: 'login' and 'createaccount' tokens persist a session and go directly to $this->getRequest()->getSession()->getToken(), specifically to bypass the anon non-session-based "+\" token. Extensions might do something similar, particularly old ones.

  • Disable action=tokens in JSONP mode. This ensures an early and descriptive error.

Rather than "JSONP mode", it might be better to refer to ApiBase::lacksSameOriginSecurity() which checks for that, CORS with origin=*, and a few other things.

Agreed. @Krinkle partly got that language from me, and when I was more closely involved with API development we didn't have those things yet. But as time passes my API knowledge gets increasingly out of date :)

Currently all the token-fetching methods in the core API return no tokens when lacksSameOriginSecurity() is true, so in that sense it's already "disabled". None currently raise an error in this situation though, although some (including ApiQueryTokens) do report a warning.

This is good, and is the first line of defense (see also below).

I don't think that having either ApiTokens or ApiQueryTokens raise an error would be a major problem, although it's possible it might break some clients that always try to fetch a token and use the absence of a token in the response as a
sort of permission check. The even-older token-fetching methods like action=query&prop=info&intoken=edit are more likely to run into the "permission check" problem, since those methods include actual permissions checks in addition > to "JSONP mode" checks.

I agree that it's generally better to return an empty token or drop token fields from the output (in addition to a warning that explains what's going on), than to throw an error. Especially since the former is what we've been doing already.

  • Make the API's token handler ignore this new cookie (if exists) when in JSONP mode.

The API's "token handler" wouldn't know anything about the cookie in the first place. To actually fetch Token objects the API (mostly[1]) goes through $this->getUser()->getEditTokenObject().

Rather than trying to ignore the new cookie, it might make more sense to have ApiBase::validateToken() return false or even throw an error when lacksSameOriginSecurity() is true so it never gets to the point of attempting to validate a token against it. Failing that, you'd need to have ApiMain somehow turn off $this->getUser()->getEditTokenObject() using the cookie (or duplicate the logic from that method).

Right now we already have code in ApiMain.php that sets $wgUser to an anonymous user if we are in lacksSameOriginSecurity mode. This is a backstop for if the first line of defense mentioned above fails: even if we don't notice that we're supposed to refuse to perform privileged actions and try to perform one anyway, it'll fail because we sabotaged ourselves at the start. We could do something analogous for anon tokens, like deleting the anon token cookie from $_COOKIES in this code block, or setting some variable or something that sabotages any future anon cookie generation.

dbarratt updated the task description. (Show Details)

The reason this works, is because the request is a simple request which does not invoke a preflight request.

Since the user does not have a session, then this is by definition not a CSRF attack. I think it would therefore be inappropriate to abuse the CSRF tokens for this purpose.

I see three ways to solve this problem:

  1. Change the API to make edit (or any other requests with side effects) requests a non-simple request. Possible Changes:
    1. Use a method other than GET or POST
    2. Require Custom Header for non-authenticated requests.
    3. Require JSON for non-authenticated requests (and the Content-Type header set to application/json)
  2. Force the client to read a "token" that is randomly generated (in as cryptographically secure as possible) and not tied to a session, and send it back to the server. This is different from a CSRF token as it does not require a session. These tokens would be stored on the server and allowed a single(?) use before being destroyed or destroyed automatically after a small time period (5 minutes?).
  3. Require authentication (in the broadest sense possible) in order to edit (at least via the API). Effectively, starting a new session (even if that requires zero actions on the user's part).
  1. Change the API to make edit (or any other requests with side effects) requests a non-simple request. Possible Changes:
    1. Use a method other than GET or POST
    2. Require Custom Header for non-authenticated requests.
    3. Require JSON for non-authenticated requests (and the Content-Type header set to application/json)

Note this task isn't just about the API, it's also relevant to the Web UI. But, from an Action API perspective, any of these would be an extremely significant breaking change. Since less-breaking options are available, I'd oppose this idea.

  1. Force the client to read a "token" that is randomly generated (in as cryptographically secure as possible) and not tied to a session, and send it back to the server. This is different from a CSRF token as it does not require a session. These tokens would be stored on the server and allowed a single use before being destroyed or destroyed automatically after a small time period (5 minutes?).

That's the main option that has been discussed above, with the refinement that the "not-really-CSRF token" is being returned and accepted in exactly the same ways as the CSRF token is returned and accepted so from the client perspective there's no difference between the two.

In other words, while you're technically correct that it's not a CSRF token, very little of the codebase actually needs to care about the difference. Which is a major win from a back-compat perspective.

  1. Require authentication (in the broadest sense possible) in order to edit (at least via the API). Effectively, starting a new session (even if that requires zero actions on the user's part).

This is a bit of a non sequitur. There's no need for any sort of authentication in order to create a session, all that's needed to create a session is to set a session cookie.

This is also already discussed above. It would work well enough for the Action API to create the session when action=query&meta=tokens is hit, but it would have significant negative effects on caching if the Web UI created a session when the edit page is loaded.

As part of T232176 I realized that a simple way to fix this problem is by rejecting write (POST) requests from users who 1) do not have a matching Origin to the allowlist or the current wiki Origin and 2) are not logged in.

This problem, is only a problem for anon edits from origins that are not trusted (i.e. origins not on the allowlist). Untrusted origins already cannot make logged-in edits since they wont have a valid CSRF token, so why should we allow them to make logged-out edits? If someone really wants to do that, they can setup a server to make the edit (which will fix the IP address).

If that seems acceptable, I will create a patch that implements something similar to what I did for the REST API.

Not all browsers set Origin reliably on non-CORS requests.

In T40417#6423655, @Tgr wrote:

Not all browsers set Origin reliably on non-CORS requests.

It's still better than what we have now, is it not?

Is there another way to determine that it came from a web browser rather than some other type of client?

In T40417#6423655, @Tgr wrote:

Not all browsers set Origin reliably on non-CORS requests.

It looks like it should be included for all requests that are not GET or HEAD https://stackoverflow.com/a/42242802/864374
There was a bug in Firefox that wouldn't send it when it was the same-origin, but it looks like that was fixed several years ago. Worst case scenario, someone is allowed to edit from the same-origin, which we would allow anyways.

Currently everyone can edit Wikipedia; with the change, people with certain browsers (possibly including the majority of certain countries, e.g. old IE versions are still very popular in China I believe) could not. That does not sound like an acceptable tradeoff for an (AIUI) mostly theoretical security hardening.

Trying to differentiate between browsers and other clients seems somewhat pointless. I'm sure there are a number of heuristics, but they are easily faked, and non-browser clients could trivially send a fake Origin header anyway. This task is about distributed attacks using the browsers of unwitting internet users; trying to block spam from arbitrary clients would be a much more difficult task.

In T40417#6423925, @Tgr wrote:

Currently everyone can edit Wikipedia; with the change, people with certain browsers (possibly including the majority of certain countries, e.g. old IE versions are still very popular in China I believe) could not. That does not sound like an acceptable tradeoff for an (AIUI) mostly theoretical security hardening.

That is not the case, nor what I'm proposing. They wouldn't be allowed to edit from an untrusted origin (which is already the case for logged in users). Arguably we do not support this anyways and opens a huge privacy concern (see the demo in the task description which will publicly reveal your IP address just by clicking on it).

@Tgr to clarify, if there is no Origin header on the request, we can assume that it's the same-origin (it might not be, but that's no different than what we are doing now). Effectively modern browsers benefit from the added privacy/security, older browsers do not.

Hmm. I think @dbarratt 's idea (checking origin and failing open if not present) could work. Of course should not do it during cors requests. Probably not the safest idea presented, but the other ideas seem locked in a stalemate. Given attacker cannot choose the victims browser in a csrf attack, even if it only works on most browsers that's still a significant reduction of risk.

Yeah, that sounds like a good plan.

Change 65418 had a related patch set uploaded (by Aklapper; owner: Parent5446):
[mediawiki/core@master] Add referer and origin check to edit token checking

https://gerrit.wikimedia.org/r/65418