## Problem statement
The edit token for logged-out users (anons) is currently a hardcoded string of `+\`. There is absolutely no CSRF protection for it.
That means any website that can trick a user into visiting one of its pages can also, as long as the user is not logged in, make them edit Wikipedia under their own IP address and browser.
Aside from the privacy impact on the user, this is open to potentially large-scale abuse for spam. For example, anyone visiting the affected website would inadvertently be making edits to Wikipedia (or another wiki being targeted).
Our abuse prevention tools would likely be unable to defend against such an attack. We couldn't block the spammer's IP because the edits would come from thousands of genuine and unpredictable user IPs. The only option we'd have is to disable anonymous editing entirely.
## Requirements
Submitting an edit (via index.php or api.php) will only be possible from a web browser if the user is currently on the wiki's site.
Specifically, this means:
* The token is not a predictable string.
* The token cannot be seen or changed from a page on another origin.
* The token is not obtainable cross-origin in web browsers via an API endpoint.
## Restrictions
Between 2012 and 2018, several restrictions were uncovered whilst exploring different solutions.
1. We cannot start a regular session upon viewing the edit page. While that would meet the requirements and solve the problem, it is infeasible with our current setup: the edit page is viewed over GET and receives significant traffic, and each of those views would need a session ID and session data allocated. Much of the GET traffic to the edit page comes from bots and other programmatic interaction that never results in the form being submitted, and thus does not currently start, or need, a session.
## Context
##### Logged-in users
For registered users this problem does not exist because the software starts a session upon log-in. The session is represented by an identifier and contains various key/value pairs saved in the session storage backend (e.g. Redis or MySQL). The user's browser is sent a cookie that identifies the user ID and session, and the session data contains information to verify the validity of the session. Upon viewing the edit page, an ***edit token*** is generated as an HMAC hash that combines a random cryptographic secret (stored in the session on the server) and a timestamp. Upon submitting the edit, the server decodes the token, confirms the validity of the user's session, and checks that the timestamp has not expired.
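The token scheme described above can be sketched roughly as follows. This is a minimal illustration and not the actual MediaWiki implementation: the token layout (`mac.timestamp`), the choice of SHA-256, the function names, and the `MAX_AGE_SECONDS` value are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

MAX_AGE_SECONDS = 3600  # hypothetical expiry window, not taken from the source


def new_session_secret() -> str:
    """Random secret that would be stored server-side in the session data."""
    return secrets.token_hex(32)


def _mac(session_secret: str, ts_hex: str) -> str:
    return hmac.new(session_secret.encode(), ts_hex.encode(), hashlib.sha256).hexdigest()


def make_edit_token(session_secret: str, now: Optional[int] = None) -> str:
    """Build a token of the (assumed) form '<hmac>.<hex timestamp>'."""
    ts = int(time.time()) if now is None else now
    ts_hex = format(ts, "x")
    return f"{_mac(session_secret, ts_hex)}.{ts_hex}"


def check_edit_token(token: str, session_secret: str, now: Optional[int] = None) -> bool:
    """Recompute the HMAC from the session secret and check the timestamp expiry."""
    now = int(time.time()) if now is None else now
    mac, sep, ts_hex = token.partition(".")
    if not sep:
        return False
    try:
        ts = int(ts_hex, 16)
    except ValueError:
        return False
    # Constant-time comparison of the submitted and recomputed HMAC.
    if not hmac.compare_digest(mac.encode(), _mac(session_secret, ts_hex).encode()):
        return False
    return 0 <= now - ts <= MAX_AGE_SECONDS
```

Because the secret never leaves the server, a third-party page cannot compute a valid token, which is what the hardcoded anonymous token lacks.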
Aside from registered/logged-in users, we also start this type of session for anonymous (logged-out) users after they submit their first edit. That means that once they have submitted the edit page form, all regular session features work, including talk page notifications, block status notices, and bypassing the static HTML cache for all page views in the session going forward.
##### HTTP cache
Viewing the edit page currently requires freshness and is uncacheable by Varnish. This suggests that, despite the restriction on session storage, our web servers (and PHP) can handle the traffic without issue. That means that while the edit page cannot vary based on data in a session, it can vary on anything we can compute in PHP from the current HTTP request headers.
## Proposals
### CSRF Cookie
Regular sessions work by generating edit tokens with an HMAC hash of a random token and some other data (salt, timestamp, ..). The random token is persisted in the session data on the backend. The edit token is validated upon submission by confirming that it matches what we re-compute at run-time from the data found in the session backend for the session-cookie ID of the current request.
The proposal is, for users without a session, to store the token directly in a cookie. This cookie will be marked `httpOnly` and have an expiry similar to the one we use for session duration.
This CSRF cookie will not relate to session storage, and will not trigger bypassing of the CDN cache. (To Varnish, it'd be the same as random JS-based cookies.)
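A rough sketch of how such a cookie could be issued and later checked is shown here. The cookie name `csrfSecret`, the `COOKIE_MAX_AGE` value, the helper names, and the choice to derive the form token as an HMAC over a fixed salt are all hypothetical. The timestamp component is left out for brevity.

```python
import hashlib
import hmac
import secrets
from http.cookies import CookieError, SimpleCookie
from typing import Tuple

COOKIE_NAME = "csrfSecret"  # hypothetical cookie name
COOKIE_MAX_AGE = 3600       # hypothetical; "similar to session duration"


def issue_csrf_cookie() -> Tuple[str, str]:
    """Return (secret, Set-Cookie header value) for a user without a session."""
    secret = secrets.token_hex(32)
    jar = SimpleCookie()
    jar[COOKIE_NAME] = secret
    jar[COOKIE_NAME]["httponly"] = True  # not readable from JS, same-origin or not
    jar[COOKIE_NAME]["path"] = "/"
    jar[COOKIE_NAME]["max-age"] = COOKIE_MAX_AGE
    return secret, jar[COOKIE_NAME].OutputString()


def form_token(secret: str, salt: str = "edit") -> str:
    """Token embedded in the edit form, derived from the cookie's random value."""
    return hmac.new(secret.encode(), salt.encode(), hashlib.sha256).hexdigest()


def is_valid_submission(cookie_header: str, submitted_token: str) -> bool:
    """Recompute the token from the request's Cookie header; no session storage involved."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        return False
    morsel = jar.get(COOKIE_NAME)
    if morsel is None:
        return False
    expected = form_token(morsel.value)
    return hmac.compare_digest(submitted_token.encode(), expected.encode())
```

In this sketch the server keeps no per-visitor state: everything needed for validation arrives with the request itself, which is why viewing the edit page does not need to allocate a session.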
Checking requirements:
1. Predictable: No.
2. Accessible to other origins: No. Only JS code running same-origin can get or set cookies. As extra protection, the `httpOnly` mark makes it inaccessible to both cross-origin and same-origin JS code.
3. Obtainable from API: No, but to ensure this we must update the API to extend its current protections in JSONP mode to cover this new use case.
About the API, specifically I propose to:
* Make the API's token handler ignore this new cookie (if it exists) when in JSONP mode. This is similar to what JSONP mode already does, in that it does not initialise a same-origin session that may exist at the same time.
* Disable action=tokens in JSONP mode. This ensures an early and descriptive error.
Point 2 alone does not suffice, because the cookie can still exist due to genuine index.php activity. Point 1 alone would suffice, but it would be confusing to users if a request only failed upon submission when the earlier action=tokens request had worked.
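The two API changes could look roughly like the following sketch. It assumes that JSONP mode is signalled by a non-empty `callback` request parameter. The cookie name, `ApiError`, and both function names are hypothetical and only illustrate the control flow.

```python
from typing import Mapping, Optional

CSRF_COOKIE = "csrfSecret"  # hypothetical cookie name


class ApiError(Exception):
    """Raised to return an early, descriptive error to the client."""


def is_jsonp(params: Mapping[str, str]) -> bool:
    # Assumption: JSONP mode is requested via a non-empty `callback` parameter.
    return bool(params.get("callback"))


def csrf_secret_for_request(
    params: Mapping[str, str], cookies: Mapping[str, str]
) -> Optional[str]:
    """Point 1: in JSONP mode, behave as if the CSRF cookie were absent."""
    if is_jsonp(params):
        return None
    return cookies.get(CSRF_COOKIE)


def check_action_allowed(params: Mapping[str, str]) -> None:
    """Point 2: refuse action=tokens outright in JSONP mode."""
    if is_jsonp(params) and params.get("action") == "tokens":
        raise ApiError("action=tokens is not available in JSONP mode")
```

With both checks in place, a cross-origin JSONP caller gets a clear error when asking for a token, and even a guessed or leaked token would not validate because the cookie is never consulted in that mode.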
### ..
...