Page MenuHomePhabricator

Add block cookie for browser-based API edits (including VisualEditor & MobileFrontend)
Closed, ResolvedPublic5 Story Points

Description

Block cookies are only added on page load. As many communities use the VisualEditor as the default editor for logged-out users, the block cookie should be added on all users, regardless of editor they use.


Current behavior

When users edit with the desktop source editor, the block cookie drops:

However, they are not dropped with tools that make edits via API, such as the VisualEditor or mobile web editor:

No block cookieStill no block cookie when VE opens and "you are blocked" message appearsNo block cookie when edit is rejected
No cookie when mobile web "you are blocked" message displays

Acceptance criteria

  • Users should receive a block cookie when they attempt to edit with the VisualEditor — when the editor loads and displays the "you are blocked" message
  • Users should receive a block cookie when they attempt to edit with the mobile web editor — when the editor loads
  • The block cookie logic should behave the same it does in T152462 and T5233

Notes

  • We can break these up into as many investigation + implementation tasks as needed.
  • Can we piggy-back off the "you are blocked" notice tracking we implemented in T189724 ?

Patch

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/536315

Details

Related Gerrit Patches:

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I tested this again today. The description is still correct. If an IP is hardblocked and if a user attempts to edit via VisualEditor (e.g. the wiki is configured to allow for IP editors to use VisualEditor or the user's preference is set to VE) the block cookie is not set:

No block cookieStill no block cookie when VE opens and "you are blocked" message appearsNo block cookie when edit is rejected

The block cookie does appear on the source editor:

TBolliger updated the task description. (Show Details)Feb 20 2019, 10:03 PM
TBolliger updated the task description. (Show Details)
TBolliger updated the task description. (Show Details)

@dmaza & @Tchanders we should probably just set the cookie when the API request is made from the browser... do you see any problems with doing it that way?

Also doesn't drop a cookie on mobile web:

TBolliger updated the task description. (Show Details)Feb 20 2019, 10:09 PM
TBolliger set the point value for this task to 5.Feb 21 2019, 7:51 PM
TBolliger updated the task description. (Show Details)

Good to know! We should double-check, but it appears this may be a duplicate task.

Niharika triaged this task as Medium priority.Jul 16 2019, 5:14 PM
dbarratt moved this task from Ready to In Progress on the Anti-Harassment (The Letter Song) board.

Change 528596 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/core@master] Add block cookie for API edits

https://gerrit.wikimedia.org/r/528596

Core Platform Team Would be nice to get a code review of this from your team. :)

Change 528893 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/extensions/VisualEditor@master] Set block cookie when blocked user makes an API request

https://gerrit.wikimedia.org/r/528893

FYI We discussed a possible different direction for this, I'll submit a patch for discussion.

Change 529824 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/core@master] Track all requests that load the block with a block cookie

https://gerrit.wikimedia.org/r/529824

Change 529824 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/core@master] Track all requests that load the block with a block cookie
https://gerrit.wikimedia.org/r/529824

This is an alternative way to go about this, that is more aggressive, but prevents extensions from being responsible for cookie blocking.

mobrovac added subscribers: Anomie, mobrovac.

Change 529824 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/core@master] Track all requests that load the block with a block cookie
https://gerrit.wikimedia.org/r/529824

This is an alternative way to go about this, that is more aggressive, but prevents extensions from being responsible for cookie blocking.

This is indeed more aggressive, but given @Anomie's fix in Gerrit 491300 for T216245: VisualEditor, MobileFrontend, and other tools using action=edit do not auto-block IP addresses (which covers ApiEdit) all is left to do here really is to make VE aware of it, if I'm not mistaken. To that end, Gerrit 528893 LGTM.

This is indeed more aggressive, but given @Anomie's fix in Gerrit 491300 for T216245: VisualEditor, MobileFrontend, and other tools using action=edit do not auto-block IP addresses (which covers ApiEdit) all is left to do here really is to make VE aware of it, if I'm not mistaken. To that end, Gerrit 528893 LGTM.

I'm confused, how do autoblocks relate to this task? I think maybe they suffer from the same "spreading" problem as cookie blocks (and perhaps should also have a more aggressive solution)

mobrovac added subscribers: tstarling, daniel.

This is indeed more aggressive, but given @Anomie's fix in Gerrit 491300 for T216245: VisualEditor, MobileFrontend, and other tools using action=edit do not auto-block IP addresses (which covers ApiEdit) all is left to do here really is to make VE aware of it, if I'm not mistaken. To that end, Gerrit 528893 LGTM.

I'm confused, how do autoblocks relate to this task? I think maybe they suffer from the same "spreading" problem as cookie blocks (and perhaps should also have a more aggressive solution)

You are correct, I confused the two.

We need more eyes on this. @tstarling @daniel could you take a look at Gerrit 529824 ?

@Niharika We came up with an alternative way to implement this feature in T196575#5410862. Basically, instead of adding the cookie blocking code to a whitelist of pages (some of which are in extensions), we'll instead automatically add it when a block is checked on the user. This is more aggressive than before. While this simplifies the code base and makes cookie blocking more straightforward, it does cary a risk of collateral damage. If a user is on a wiki with wgCookieSetOnIpBlock enabled and attempts to perform an action (any action that checks authorization) on a blocked IP, that user will get the cookie and remain blocked for 24 hours (even after changing IPs). This is the intention of the software, so I'm not sure this is a huge deal, just wanted to make you aware of it. Previously, this would only happen if you attempted to edit, with the increase in scope of this task, it will happen on any non-cached action (i.e. anything but read).

Krinkle added a subscriber: Krinkle.EditedAug 27 2019, 7:42 PM

@Niharika We came up with an alternative way to implement this feature in T196575#5410862. Basically, instead of adding the cookie blocking code to a whitelist of pages …, we'll instead automatically add it when a block is checked on the user. … it does cary a risk of collateral damage. If a user is on a wiki with wgCookieSetOnIpBlock enabled and attempts to perform an action (any action that checks authorization) on a blocked IP, that user will get the cookie and remain blocked for 24 hours (even after changing IPs). This is the intention of the software, so I'm not sure this is a huge deal, just wanted to make you aware of it. Previously, this would only happen if you attempted to edit, …

The main reason this feature was fragmented the way it was, wasn't so much about which actions it applied to (though prevention of "createaccount" and "edit" are primarily what justify the feature's existence). Rather, it was about where it couldn't apply: Page views by unregistered users.

It's not safe to inspect the user state during web requests that aren't from a session-persisted user (e.g. logged-in users, and anons who've recently edited). Doing so would lead to cached page views being associated with a random IP-address. For example, a talk page notification for an anon could then show up to all readers. Or, for trackBlockWithCookie, it could cause the autoblock cookie to be received by thousands of unrelated readers of the same article the blocked IP visited.

A few options:

  1. Consistently apply it only to the subset of actions we know require a session and/or start from an uncached URL (create account, edit, etc.).
  2. Consistently apply it to any URL with a persisted session attached. This will include the page views within the same browsing session of an anonymous user that made an edit. But, it would not apply if another device on the same IP were used because they'd not have a session and thus hit the Varnish caches.
  3. Apply it whenever possible, including random cacheable page views by anons without a session. "Random" because it would be limited to page views of which the cached copy in Varnish happens to have expired, and the IP-blocked user is the first to visit it. We'd then set the cookie and disable the cache. This would likely not be acceptable by SRE because it would mean Varnish is unable to protect agains stampedes due to request coalescing for page views no longer being safe for non-session requests to /wiki/%.

Using a call to User::getBlock as proxy for 1 or 2 seems dangerous. In theory, that should be safe. But it's a lot less damaging if a 1000 readers are wrongly told "You can't edit this article" (but it works fine if they try), than to actually block a 1000 random readers with a cookie block.

I'd recommend going for number 2 by moving the trackBlockWithCookie call to the general response handler in MediaWiki.php around the same place where we take care of ChronologyProtector and other high-level stuff. It would need to be guarded by a check for whether there is already a session active (you don't want this code to be the deciding factor on whether the response is cacheable to prevent number 3 scenario from above).

It's not safe to inspect the user state during web requests that aren't from a session-persisted user (e.g. logged-in users, and anons who've recently edited). Doing so would lead to cached page views being associated with a random IP-address. For example, a talk page notification for an anon could then show up to all readers. Or, for trackBlockWithCookie, it could cause the autoblock cookie to be received by thousands of unrelated readers of the same article the blocked IP visited.

Absolutely. This patch doesn't change that. If a web request is inspecting the user state, then that is not the concern of this code. This change assumes that if you are inspecting the user state, then the request ought to be varied by that state.

I'm not sure it's as risky as you are saying it is. The "risk" is that there is code that is inspecting user state that shouldn't be. If that's the case then it should stop. But even if that is the case (which I doubt it is) then what's the worst that happens? Either an IP address that is unblocked hits origin and an unblocked action occurs (as it should) or a blocked user hits it and a Set-Cookie header is added which prevents the request from being cached. In other words, before this change, inspecting the user state on a blocked user would result in the action being blocked for all subsequent users, after the change, the request will remain unblocked (assuming the user doesn't already have the cookie, but if they already do, you would again get an uncachable request, unless this cookie is being ignored by the http cache).

Therefore, if this is a risk at all, this change fixes the a problem (but I've never heard of this happening, so I really really doubt it)

  1. Apply it whenever possible, including random cacheable page views by anons without a session. "Random" because it would be limited to page views of which the cached copy in Varnish happens to have expired, and the IP-blocked user is the first to visit it. We'd then set the cookie and disable the cache. This would likely not be acceptable by SRE because it would mean Varnish is unable to protect agains stampedes due to request coalescing for page views no longer being safe for non-session requests to /wiki/%.

I'm not sure why this wouldn't be acceptable. Even if this were the case (which I doubt it is), the next unblocked IP to access the resource would result in the request entering the cache. Since the majority of IP address are not blocked, then the majority of requests would result in the response entering the cache (even if this is an issue).

I'd recommend going for number 2 by moving the trackBlockWithCookie call to the general response handler in MediaWiki.php around the same place where we take care of ChronologyProtector and other high-level stuff. It would need to be guarded by a check for whether there is already a session active (you don't want this code to be the deciding factor on whether the response is cacheable to prevent number 3 scenario from above).

This would not resolve the acceptance criteria of the ticket. Both of the requests that are mentioned are anonymous before the edit takes place. Also, this wouldn't cover the standard action=edit on desktop because the user doesn't have a session yet (afaik). However, blocks are being checked on those pages and therefore, are not cacheable.

It's not safe to inspect the user state during web requests that aren't from a session-persisted user [..]

Absolutely. This patch doesn't change that. [..] This change assumes that if you are inspecting the user state, then the request ought to be varied by that state.

I think we should run under the assumption, there may be bugs in our code until we find otherwise. As I mentioned, in the current state such bug would be hard to notice and would have minimal in impact. It certainly could not result in blocking all readers of a trending article.

We can find out. Perhaps deploy a patch that asks the SessionManager from within getUserBlock whether there is currently a persisted session and/or whether the cache is disabled for other reasons. If we find 0 entries after a week, we'll have proven there is no (common) path that violates this expectation.

Even then, I think that it would be rather surprising to have a getter method result in the side-effect of a cookie being sent to the web response. This makes the code hard to reason about. it also assumes BlockManager is exclusively used within web requests and pre-send. (Post-send, in Jobs, and in CLI; setCookie should throw). I don't think that is a guarantee we want to fulfil?

I'm not sure it's as risky as you are saying it is. [..] a blocked user hits it and a Set-Cookie header is added which prevents the request from being cached.

That assumes all MW code paths are allowed to set cookies and that something else will take responsibility for disabling Cache-Control. I suspect neither is the case, but I haven't yet tried to confirm this.

  1. [..] set the cookie and disable the cache. This would likely not be acceptable by SRE because it would mean Varnish is unable to protect agains stampedes due to request coalescing for page views no longer being safe for non-session requests to /wiki/%.

[..]. Even if this were the case (which I doubt it is), the next unblocked IP to access the resource would result in the request entering the cache. Since the majority of IP address are not blocked, then the majority of requests would result in the response entering the cache (even if this is an issue)

"Request coalescing" means multiple incoming requests result in only one request to the backend, on the speculative assumption that others waiting for it will likely be satisfied by the same response. For this to be effective, Varnish needs to be able to predict whether a request will have a Vary header and what it looks like - before it gets it. This mechanism is manually disabled for session cookies.

I don't remember how that works exactly, but I imagine it's not too far fetched to assume that if some non-session page views where something calls getUserBlock start emitting cookies and disabling their cache, it might make Varnish think these urls no longer qualify for coalescing. Which could perhaps result in recurring burst of cache-stampedes. It could also mean that users with a cookie block (but no session) would now bypass the cache by default if Varnish learns from that. We can also assess the impact of regular page views by non-session users with a cookie block no longer being cacheable due to such heuristic. I'd guess this is acceptable, but it's something neither of is in charge of deciding about. We should involve SRE and review/test such change carefully.

There's a lot of "ifs" and "maybes" here. I agree that it'd be nice if it worked. But, right now I only see risks, and not much assurance.

I'd recommend going for number 2 by moving the trackBlockWithCookie call to the general response handler in MediaWiki.php around the same place where we take care of ChronologyProtector and other high-level stuff. It would need to be guarded by a check for whether there is already a session active (you don't want this code to be the deciding factor on whether the response is cacheable to prevent number 3 scenario from above).

This would not resolve the acceptance criteria of the ticket. Both of the requests that are mentioned are anonymous before the edit takes place. Also, this wouldn't cover the standard action=edit on desktop because the user doesn't have a session yet [..]. However, blocks are being checked on those pages and therefore, are not cacheable.

I believe it would resolve the criteria. As you say, the edit page is not cacheable, as such, the central mechanism in charge of making the page view cacheable will have already been disabled. This would apply to non-session use cases as well.

Also, I'm not sure how likely it is that someone has a cookie block without a session. They will have gotten blocked because they made an edit, which means they have a session. Anyway, that is an aside.

Calling trackBlockWithCookie() as standard part of pre-shutdown on web requests will:

  • Make it possible for cookie-blocks to spread while viewing the edit page without a session.
  • Make it possible for cookie-blocks to consistently spread while browsing the wiki as anonymous editor (within their session).
  • Let getUserBlock remain predictable (no side-effect), and keep BlockManager separate from implicitly depending on WebResponse or otherwise making it unsafe to use post-send or outside web requests.

Modifying getUserBlock, on the other hand, would spread the cookie only when something unrelated is inspecting block information on the page. That is enough to resolve this task given the limited acceptance criteria, but only because EditPage somewhere calls getUserBlock. That seems hard to reason about long-term, and has the aforementioned down sides and risks.

dbarratt added a comment.EditedSep 9 2019, 4:14 PM

Calling trackBlockWithCookie() as standard part of pre-shutdown on web requests

I'm not opposed to this solution, but how do you know if it is "safe" to do so? Is there a way to determine if the request is "catchable" or not? Or is the best we can do is based on user session (which does not resolve the acceptance criteria because users on the edit page, do not yet have a session, but the user's block is being checked).

It would be ideal, imho, if there way a way to "safely" determine if the request was "cacheable" or not. If it is not (for whatever reason) the block cookie can be safely added (if the request has a block).

Anomie added a comment.Sep 9 2019, 5:38 PM

and that something else will take responsibility for disabling Cache-Control.

See OutputPage::sendCacheHeaders and HeaderCallback.

That assumes all MW code paths are allowed to set cookies and that something else will take responsibility for disabling Cache-Control. I suspect neither is the case, but I haven't yet tried to confirm this.

technically, It doesn't matter because the Cache-Control will be ignored (by a caching proxy) if there is a Set-Cookie header (at least by default). That's what I was trying to say at least. Technically all code paths are allowed to set cookies, and it does not make a difference if Cache-Control is disabled or not.

Change 536315 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/core@master] Check and add block cookie to every uncached API request

https://gerrit.wikimedia.org/r/536315

Change 529824 abandoned by Dbarratt:
Track all requests that load the block with a block cookie

https://gerrit.wikimedia.org/r/529824

Change 534933 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] block: Allow cookie-block tracking from any uncached web request

https://gerrit.wikimedia.org/r/534933

Change 537099 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/core@master] Clear block cookie when tracking block, not when checking block

https://gerrit.wikimedia.org/r/537099

Calling trackBlockWithCookie() as standard part of pre-shutdown on web requests will:

  • [...] Let getUserBlock remain predictable (no side-effect), and keep BlockManager separate from implicitly depending on WebResponse or otherwise making it unsafe to use post-send or outside web requests.

getUserBlock can still have the side-effect of clearing the block cookie. Have suggested a fix in: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537099/

daniel added a comment.EditedSep 16 2019, 1:18 PM

Calling trackBlockWithCookie() as standard part of pre-shutdown on web requests will:

  • [...] Let getUserBlock remain predictable (no side-effect), and keep BlockManager separate from implicitly depending on WebResponse or otherwise making it unsafe to use post-send or outside web requests.

getUserBlock can still have the side-effect of clearing the block cookie. Have suggested a fix in: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537099/

The way this would currently work is via global state (RequestContext::getMain or User::getRequest) I assume. Together with @Krinkle I'm currently trying to figure out how to make this kind of thing work without global state. So far I was hoping that read access to information derived from the request would be sufficient. This seems to be a counter-example. Could you describe the use case on T231930: Introduce ActingUser to represent the user performing the current request so we can consider it in the design?

Together with @Krinkle I'm currently trying to figure out how to make this kind of thing work without global state.

I kind of wish there was some sort of way to add "middleware" to MediaWiki like PSR-15 (or literally PSR-15, see T194393). Doing something like that would allow code (and extensions?) to hook into the request/response pipeline (which, in this case, is all we care about because we don't care about non web-requests). It would also allow abstracting other code (like caching, sessions, etc.).

Change 528596 abandoned by Dbarratt:
Add block cookie for API edits

https://gerrit.wikimedia.org/r/528596

dbarratt updated the task description. (Show Details)Oct 9 2019, 1:22 AM

Change 536315 merged by jenkins-bot:
[mediawiki/core@master] Check and add block cookie to every uncached API request

https://gerrit.wikimedia.org/r/536315

Change 528893 abandoned by Dbarratt:
Set block cookie when blocked user makes an API request

https://gerrit.wikimedia.org/r/528893

dom_walden added a subscriber: dom_walden.

In the following I am comparing when the block cookie gets set on https://test.wikipedia.org (without this change) and https://en.wikipedia.beta.wmflabs.org (with this change).

VisualEditor

While logged out with a block against your IP, editing with VisualEditor, block cookies now get set in a couple of different places:

  • Launching the VisualEditor, which launches either:
    • w/api.php?action=visualeditor&paction=metadata&page=$page
    • w/api.php?action=visualeditor&paction=wikitext&page=$page (if you have the new wikitext editor enabled in beta preferences)
    • You also call these when you switch between "Visual editing" and "Source editing"
  • And when you actually submit the edit, which uses w/api.php?action=visualeditoredit
  • w/api.php?action=cspreport (which gets called when you first open VisualEditor. Might be a beta specific thing.)

As with T233594, this is regardless of whether you are blocked from that specific page.

This covers what I believe are the two API endpoints VisualEditor introduces, action=visualeditor and action=visualeditoredit.

While logged in (with an auto block against the user) we don't appear to have changed the behaviour of block cookies, which get set the same places listed above, plus:

  • w/api.php?action=options, which stores the user's preference when switching between "Visual editing" and "Source editing"

MobileFrontend

While logged out with a block against your IP, editing a preexisting page in Mobile view, block cookies now get sets:

  • w/api.php?action=query&prop=revisions%7Cinfo&rvprop=content%7Ctimestamp&titles=$page (not sure what this is for)
  • When you submit the edit: w/api.php?action=edit
  • w/api.php?action=cspreport (same as in VisualEditor)

Creating a new page, it will only get set when you finally submit the edit.

Again, regardless of whether you are blocked from the specific page you are editing.

While logged in, as with VisualEditor, we don't appear to have changed behaviour. Block cookies get set in all the places above, plus:

  • w/api.php?action=parse

In terms of coverage of the MobileFrontend API, we have:

  • action=parse, covered above
  • action=mobileview, I have not seen a block cookie set here (possibly because it is a cached response)

Supposedly this API is to be deprecated (https://www.mediawiki.org/wiki/Extension:MobileFrontend#API).

It should be noted that when I was testing logged out I disabled cookies in the browser, so I did not have to keep deleting the block cookie. I believe having cookies in a request does change the response from the server. For example, forcing responses that might be cached to be uncached (and therefore setting cookies where they might not have been). I haven't explored this further.

Other API actions

Taking all the example API requests in the MediaWiki API documentation (https://en.wikipedia.beta.wmflabs.org/w/api.php?action=help&recursivesubmodules=1), which I hope covers all the API requests in core.

Submitting them on test and beta and comparing responses.

While logged out, block against IP:

  • No caching differences that are significant (I think)
  • Block cookies now set on error messages
  • Block cookies now set on a number of actions, including different query module. See:

While logged in with an auto block against the user:

  • No significant caching differences
  • No differences in when block cookies are set

Not all the example API requests in the documentation were successful. Some returned errors due to invalid parameters, etc. I tried to get as many of them to succeed as possible in the time constraints.

dbarratt closed this task as Resolved.Oct 15 2019, 2:26 PM

@dom_walden thank you for your incredibly thorough testing. :)