
Create HTTP verb and sticky cookie DC routing in VCL
Open, Normal, Public

Description

Per https://www.mediawiki.org/wiki/Requests_for_comment/Master_%26_slave_datacenter_strategy_for_MediaWiki#Request_routing

CDN routing logic is:

a) Cookie "useDC: master" present => master DC
b) is POST => master DC
c) is GET/HEAD/OPTIONS => local DC

Also, for faster AJAX POSTs that only fetch data:
d) is POST but has Promise-Non-Write-API-Action: true HTTP header => local DC (exception to rule b)
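
A minimal VCL sketch of rules (a)-(d) might look like the following; the backend names and hostnames are placeholders rather than actual configuration, and the cookie is assumed to appear as useDC=master in the Cookie header:

    vcl 4.0;

    # Placeholder backends; the real definitions would point at the two DCs'
    # application layers.
    backend be_master_dc { .host = "master-dc.example.internal"; .port = "80"; }
    backend be_local_dc  { .host = "local-dc.example.internal"; .port = "80"; }

    sub vcl_recv {
        if (req.http.Cookie ~ "useDC=master") {
            # (a) sticky cookie forces the master DC
            set req.backend_hint = be_master_dc;
        } elsif (req.method ~ "^(GET|HEAD|OPTIONS)$") {
            # (c) safe methods can be served by the local DC
            set req.backend_hint = be_local_dc;
        } elsif (req.method == "POST" && req.http.Promise-Non-Write-API-Action) {
            # (d) POSTs that promise not to write may also stay local
            set req.backend_hint = be_local_dc;
        } else {
            # (b) anything else that may write goes to the master DC
            set req.backend_hint = be_master_dc;
        }
    }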


Event Timeline

aaron created this task. Mar 6 2015, 9:45 PM
aaron claimed this task.
aaron raised the priority of this task from to Normal.
aaron updated the task description.
aaron added subscribers: PleaseStand, gerritbot, bd808 and 2 others.
aaron set Security to None.
mark added a subscriber: BBlack.
mark added a subscriber: mark.
Gilles added a subscriber: Gilles. Apr 2 2015, 12:25 PM
Restricted Application added a subscriber: Matanya. Oct 2 2015, 7:04 AM
faidon edited projects, added Traffic; removed Varnish. Oct 2 2015, 7:20 AM
Krenair added a subscriber: Krenair. Oct 2 2015, 2:09 PM
aaron updated the task description. Oct 22 2015, 3:52 AM
aaron updated the task description.

Change 247970 had a related patch set uploaded (by Ori.livneh):
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

aaron updated the task description. Oct 25 2015, 6:41 AM

Change 248668 had a related patch set uploaded (by Aaron Schulz):
Add ?idempotent=1 flag for API modules

https://gerrit.wikimedia.org/r/248668

aaron updated the task description. Nov 3 2015, 1:01 PM

Change 248668 merged by jenkins-bot:
Add header to flag API POST requests with no write intentions

https://gerrit.wikimedia.org/r/248668

aaron removed aaron as the assignee of this task. Dec 23 2015, 8:51 PM
BBlack moved this task from Triage to Caching on the Traffic board. Oct 4 2016, 12:48 PM

Change 247970 abandoned by Ori.livneh:
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

I explained ChronologyProtector to @Joe and @BBlack just now. They seemed happy with the idea of not sending a useDC cookie for now, and just relying on ChronologyProtector instead to ensure that the user is presented with an up-to-date view of the site. So the current idea would be to have routing based on HTTP verb, with two exceptions:

  • URLs containing action=rollback would be routed to the master DC, to work around T88044.
  • Requests with the Promise-Non-Write-API-Action header would be routed to the nearest DC regardless of HTTP verb.
BBlack added a comment (edited). May 19 2018, 3:38 PM

Right. Just to re-state for clarity, the sort of logic we should be implementing in VCL (in the cache layers) will look like this pseudo-code:

if ($local_dc != $master_dc) {
    if ((req.method !~ "^(GET|HEAD|OPTIONS)$" && !req.http.Promise-Non-Write-API-Action) || req.url ~ "[?&]action=rollback") {
        use_master_dc();
    } else {
        use_local_dc();
    }
}
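
In real VCL, use_master_dc() and use_local_dc() don't exist as such; a hedged sketch of how they could be realized, assuming backends named be_master_dc and be_local_dc are declared elsewhere, is a pair of custom subroutines invoked with "call" from vcl_recv:

    # The $local_dc != $master_dc guard would normally be handled at
    # config-generation time, i.e. this routing logic is only emitted in the
    # non-master DC.
    sub use_master_dc {
        set req.backend_hint = be_master_dc;
    }

    sub use_local_dc {
        set req.backend_hint = be_local_dc;
    }
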
Joe added a comment. May 19 2018, 3:45 PM

About ChronologyProtector:

  • If ChronologyProtector kicks in, it should send a specific header back to Varnish.
  • In that case, if the request was not sent to the master DC, it should be retried against the master DC.
BBlack added a comment (edited). May 19 2018, 3:57 PM

Right, I forgot; that was discussed as an optimization (vs. having ChronologyProtector just time out and serve stale data invisibly in a case where some operational issue is causing persistent, large replication lag).

If possible, ChronologyProtector could basically return a stale result after some short timeout, along with some kind of "X-MW-Chronology: Fail" response header or whatever, and then Varnish could use this hint to retry the request against the master DC and save the user some pain.

We might need followup discussion about whether it's best to do this always/automatically, vs having the fallback behavior controlled by some "db lag emergency mode" runtime switch, as always-on fallback could exacerbate a master-load-induced lag situation (but there are other causal situations where it doesn't hurt).
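
Assuming MediaWiki did send such a hint (the header name X-MW-Chronology: Fail below is just the placeholder from the comment above, and be_master_dc is again a placeholder backend), a rough sketch of the Varnish side of that retry:

    sub vcl_backend_response {
        # MediaWiki signalled that ChronologyProtector gave up waiting for
        # replication; retry the fetch once.
        if (beresp.http.X-MW-Chronology == "Fail" && bereq.retries == 0) {
            return (retry);
        }
    }

    sub vcl_backend_fetch {
        # On the retry, go to the master DC instead of the local one. A real
        # implementation would probably mark the retry explicitly rather than
        # keying on bereq.retries alone.
        if (bereq.retries > 0) {
            set bereq.backend = be_master_dc;
        }
    }

Whether this is always on or gated behind the "db lag emergency mode" switch mentioned above is orthogonal to the VCL mechanics.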

tstarling added a comment (edited). May 19 2018, 4:20 PM

Currently, ChronologyProtector times out after 10 seconds; this is configurable. A timeout causes "lagged replica mode" to be set, and Skin::lastModified() responds to this mode by putting a warning in the page footer. Maybe OutputPage::output() would be a good place to send a lagged replica mode response header. Note that lagged replica mode will also be set by LoadBalancer::getReaderIndex() if the lag exceeds the "max lag" parameter, currently 6 seconds in db-eqiad.php. Is it acceptable to send a retry hint to Varnish in either case?

Change 434034 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] MW-Replica-Lag response header

https://gerrit.wikimedia.org/r/434034

Joe added a comment. May 20 2018, 5:06 AM

ChronologyProtector uses MySQLMasterPos, which can work both with a GTID-based master position or with the old binlog-based master position.

I'm not sure, from a very quick skim of the code, which of the two we use, but we need to use the MariaDB GTID everywhere or all requests to the non-master DC from a user with ChronologyProtector data in their session will succeed.

Also, our sessions are replicated by Redis; there is no guarantee that the session will have less replication lag than MySQL at the moment.

I am starting to wonder if the sticky cookie isn't a better idea.

Change 434034 abandoned by Tim Starling:
MW-Replica-Lag response header

Reason:
Probably not necessary given the other things that already exist

https://gerrit.wikimedia.org/r/434034

There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

But ignoring that for the moment, I would agree that it seems simpler to:

  • On DB write in a request:
    • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).
    • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
  • On GET request in MediaWiki:
    • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Instead of:

  • On DB write in a request:
    • write DB position to replicated ChronologyProtector store (redis).
    • set cookies cpPosIndex=..
  • On GET request in MediaWiki:
    • If cpPosIndex is not known in local ChronologyProtector store, first wait/poll for the key to be replicated (if ever), with a timeout.
    • Then, once we have that, wait again, for the actual DB position on the local slaves, with a timeout.

The main part that seems problematic here is the polling of the ChronologyProtector store. It's hard to distinguish an error scenario from a lagged scenario, and the impact is stalling user requests, which is very visible, e.g. in cases where the key is never going to be replicated (already expired, got lost, corrupted client-side, or some other issue). Each of these factors is fine on its own, but they become somewhat fragile when combined: expiring data, replicated data, and client-side storage of the key.
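
On the Varnish side, the simpler of the two flows above would only need the UseDC cookie check on top of the verb-based routing (cpPosIndex is read by MediaWiki, not by the caches). A rough sketch, again with placeholder backend names:

    sub vcl_recv {
        if (req.http.Cookie ~ "UseDC=master" ||
            req.method !~ "^(GET|HEAD|OPTIONS)$") {
            # Recent writer (sticky cookie) or a write-like method:
            # route to the master DC.
            set req.backend_hint = be_master_dc;
        } else {
            # Everything else reads from the local DC.
            set req.backend_hint = be_local_dc;
        }
    }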

Joe added a comment. Jun 4 2018, 4:18 PM

There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

Right. So we need a header that makes MediaWiki set a cookie I guess?

But ignoring that for the moment, I would agree that it seems simpler to:

  • On DB write in a request:
    • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).

Why local? We don't really have anything like an "unreplicated redis".

    • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
  • On GET request in MediaWiki:
    • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Well, the whole point of what we're trying to do is to avoid giving users potentially stale content at all, not merely content that is not stale relative to the local DC.

Which also makes me wonder: do we need to let MediaWiki write to the sessions from the read-only datacenter? If so, that can't be done with our current session storage system (which has one-way replication only).

BBlack added a comment. Jun 4 2018, 4:21 PM

Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.
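
A hedged sketch of that lesser goal, assuming the same placeholder backends as above plus vmod_directors; the session/token cookie pattern is only a guess at what "no session token" would mean in practice:

    import directors;

    sub vcl_init {
        # Spread non-sessioned read-only traffic across both DCs.
        new either_dc = directors.round_robin();
        either_dc.add_backend(be_local_dc);
        either_dc.add_backend(be_master_dc);
    }

    sub vcl_recv {
        if (req.method ~ "^(GET|HEAD)$" &&
            req.http.Cookie !~ "([sS]ession|Token)=") {
            # Read-only and no session cookie: replication-level staleness
            # is acceptable, so cache misses may be fetched from either DC.
            set req.backend_hint = either_dc.backend();
        } else {
            # Sessioned or write-like traffic stays on the master DC.
            set req.backend_hint = be_master_dc;
        }
    }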

Joe added a comment. Jun 4 2018, 4:28 PM

Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.

That's surely a less risky first step, and I think it's the best choice for now if we want to go on and turn this functionality on.

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

aaron added a comment. Jun 6 2018, 6:46 AM

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

Session writes are fairly rare (initialization and periodic preemptive refreshes). I suppose write access could be tunneled, with tricks to deal with replication delay. Tunneling all access would only be OK for some early iteration of deployment.

Ideally, the session store could take writes from either DC and wait (without a sanity timeout) on the remote one. That was at least the idea I had in mind. I was hoping something like dynomite (redis based) could do the trick. The wmf package for that and something like https://gerrit.wikimedia.org/r/#/c/415789/ (except for sessions, not wancache, and with the ssl features enabled) could be used. The consistency models only support DC-local "quorum" at most (https://github.com/Netflix/dynomite/wiki/Consistency). I think something like ChronologyProtector's cpPosIndex could be used to avoid replication being raced out when the client makes requests heterogeneously among the DCs. Since a cookie will be sent when the session is created, some cookie could hold an index and be used to make MW wait a bit for a value with that index to appear in the session store.

An older idea was to explore using a small Cassandra cluster. That engine already has "global quorum" style writes as an option. See T134811.

Anomie added a subscriber: Anomie. Jun 6 2018, 3:53 PM
On IRC, @tstarling wrote:

<TimStarling> pity the SessionManager refactor did not add replication awareness

The SessionManager refactor was designed to allow any storage backend to be used, as long as someone implements a BagOStuff interface for it. If someone makes a replication-aware BagOStuff (or a wrapper like MultiWriteBagOStuff), SessionManager can use it.

At a glance, it seems WANObjectCache (which is not a BagOStuff) won't work here because its ->set() doesn't replicate either. WANObjectCache is described as being intended as a cache on top of some other storage, primarily populated by getWithSetCallback(), rather than itself being the storage.

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit.

In general, to find things starting a session, look for calls to a MediaWiki\Session\Session object's ->persist() method. At a glance:

  • Special:CreateAccount does the same as Special:UserLogin.
  • The action API when used to fetch a login or createaccount token.
  • CentralAuth's auto-login.
  • CentralAuth (or OAuth or other extensions) triggering auto-creation of a local account, or if the auto-creation fails because the IP-user lacks the 'createaccount' and 'autocreateaccount' rights.

Also, if session data for an existing session changes, that will also trigger a write to the storage backend. And, as Aaron mentioned, so will the periodic keep-alive refresh on a request if the existing session is more than halfway to its expiration.