
Create HTTP verb and sticky cookie DC routing in VCL
Closed, Resolved · Public

Description

CDN routing logic should be as follows (a rough VCL sketch appears after the list):

  1. If HTTP POST => master DC.
    • Reason: This will perform writes to the primary database which should be done locally.
  2. If cookie "UseDC=master" is present => master DC.
    • Reason: The user has recently made writes (database, session). To ensure the user sees their own actions reflected, and to minimise chances of needing to do synchronous waits, the user is "stickied" to the primary DC for a few seconds until we're confident cross-dc DB and session replication has completed.
    • This also ensures we don't need a multi-dc aware ChronologyProtector, per T254634.
  3. If URL param "cpPosIndex=" is present => master DC.
    • Reason: The user has recently made writes (database, session) and is now being redirected to a cross-wiki domain. To ensure the user sees their own actions reflected, and to minimise chances of needing to do synchronous waits, the user is "stickied" to the primary DC for a few seconds until we're confident cross-dc DB and session replication has completed.
    • This also ensures we don't need a multi-dc aware ChronologyProtector, per T254634.
  4. Requests that perform database writes over GET (like HTTP POST)
    • GET index.php action=rollback => master DC (per T88044).
      • Reason: This is a user action that for legacy reasons cannot yet use form submissions.
    • GET login.wikimedia.org Special:CentralAutoLogin => master DC.
      • Reason: This auto-creates local accounts and cross-domain login sessions through a chain of redirects and hence can't use POST. In addition to sometimes performing DB writes, it also needs access to the latest user sessions and ChronologyProtector, per T254634#6211514. Note that the Special:CentralAutoLogin URL is never localised, to make this easy.
  5. Cache/stash write optimisations => master DC
    • GET/POST api.php action=centralauthtoken or centralauthtoken=, or an Authorization header which starts with CentralAuthToken (T267270)
      • Reason: Foreign API tokens need to be set and then immediately consumed. Latency will probably be reduced by routing these requests at the CDN layer rather than using mcrouter to do cross-DC memcached requests.
  6. Anything else, e.g. HTTP GET/HEAD/OPTIONS => local DC.
  7. HTTP POST with Promise-Non-Write-API-Action: true header => local DC (exception to rule 1).
    • Reason: These are AJAX POST requests that only fetch data. They use POST due to the limited payload size allowed by GET requests (URL length limits).
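A rough sketch of how these rules might translate into VCL, for illustration only (not the production configuration). The backends be_primary and be_local are placeholders for backends/directors defined elsewhere, and rule 1 is applied to any non-read HTTP method rather than only POST, matching the pseudo-code discussed later in this task:

sub vcl_recv {
    # Illustrative sketch; assumed to be active only when the serving DC is not
    # the primary DC (in the primary DC everything is local anyway).
    if (req.method !~ "^(GET|HEAD|OPTIONS)$" && !req.http.Promise-Non-Write-API-Action) {
        # Rules 1 and 7: writes go to the primary DC, except requests that
        # promise not to perform any write API action.
        set req.backend_hint = be_primary;
    } else if (req.http.Cookie ~ "UseDC=master") {
        # Rule 2: a recent writer is stickied to the primary DC for a few seconds.
        set req.backend_hint = be_primary;
    } else if (req.url ~ "[?&]cpPosIndex=") {
        # Rule 3: cross-domain redirect carrying a ChronologyProtector position.
        set req.backend_hint = be_primary;
    } else if (req.url ~ "[?&]action=rollback") {
        # Rule 4: rollback still performs writes over GET (T88044).
        set req.backend_hint = be_primary;
    } else if (req.http.Host == "login.wikimedia.org" && req.url ~ "Special:CentralAutoLogin") {
        # Rule 4: CentralAutoLogin writes and needs fresh sessions (T254634#6211514).
        set req.backend_hint = be_primary;
    } else if (req.url ~ "[?&](action=centralauthtoken|centralauthtoken=)"
            || req.http.Authorization ~ "^CentralAuthToken") {
        # Rule 5: foreign API tokens are set and then immediately consumed (T267270).
        # Only the URL and headers can be inspected here, not POST bodies.
        set req.backend_hint = be_primary;
    } else {
        # Rule 6: everything else (plain GET/HEAD/OPTIONS reads) stays in the local DC.
        set req.backend_hint = be_local;
    }
}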

See also:


Event Timeline

aaron updated the task description.

Change 247970 had a related patch set uploaded (by Ori.livneh):
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

Change 248668 had a related patch set uploaded (by Aaron Schulz):
Add ?idempotent=1 flag for API modules

https://gerrit.wikimedia.org/r/248668

Change 248668 merged by jenkins-bot:
Add header to flag API POST requests with no write intentions

https://gerrit.wikimedia.org/r/248668

aaron removed aaron as the assignee of this task. Dec 23 2015, 8:51 PM

Change 247970 abandoned by Ori.livneh:
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

I explained ChronologyProtector to @Joe and @BBlack just now. They seemed happy with the idea of not sending a useDC cookie for now, and just relying on ChronologyProtector instead to ensure that the user is presented with an up-to-date view of the site. So the current idea would be to have routing based on HTTP verb, with two exceptions:

  • URLs containing action=rollback would be routed to the master DC, to work around T88044.
  • Requests with the Promise-Non-Write-API-Action header would be routed to the nearest DC regardless of HTTP verb.

Right. Just to re-state for clarity, the sort of logic we should be implementing in VCL (in the cache layers) will look like this pseudo-code:

if ($local_dc != $master_dc) {
    if ((req.method !~ "^(GET|HEAD|OPTIONS)$" && !req.http.Promise-Non-Write-API-Action) || req.url ~ "[?&]action=rollback") {
        use_master_dc();
    } else {
        use_local_dc();
    }
}

About ChronologyProtector:

  • If ChronologyProtector kicks in, it should send back a specific header to Varnish.
  • In case the request was not sent to the master DC, it should be retried against the master DC.

Right, I forgot: that was discussed as an optimization (vs. having ChronologyProtector just time out and invisibly serve stale data in a case where some operational issue is causing persistent, large replication lag).

If possible, ChronologyProtector could basically return a stale result after some short timeout, along with some kind of "X-MW-Chronology: Fail" response header or whatever, and then Varnish could use this hint to retry the request against the master DC and save the user some pain.

We might need followup discussion about whether it's best to do this always/automatically, vs having the fallback behavior controlled by some "db lag emergency mode" runtime switch, as always-on fallback could exacerbate a master-load-induced lag situation (but there are other causal situations where it doesn't hurt).

Currently, ChronologyProtector times out after 10 seconds; this is configurable. A timeout causes "lagged replica mode" to be set, and Skin::lastModified() responds to this mode by putting a warning in the page footer. Maybe OutputPage::output() would be a good place to send a lagged replica mode response header. Note that lagged replica mode will also be set by LoadBalancer::getReaderIndex() if the lag exceeds the "max lag" parameter, currently 6 seconds in db-eqiad.php. Is it acceptable to send a retry hint to Varnish in either case?
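On the CDN side, consuming such a retry hint could be a one-shot restart, roughly as sketched below. This is illustrative only: the X-MW-Chronology: Fail header name is the placeholder from the discussion above, X-Force-Primary-DC is a made-up marker, and be_primary is a placeholder backend.

sub vcl_deliver {
    # If the local-DC backend signals that ChronologyProtector gave up waiting for
    # replication, restart the request once, forced to the primary DC.
    if (resp.http.X-MW-Chronology == "Fail" && req.restarts == 0) {
        set req.http.X-Force-Primary-DC = "1";
        return (restart);
    }
}

sub vcl_recv {
    # Checked before the normal verb/cookie routing; pass so the retried request
    # actually reaches the primary backend instead of any just-cached stale object.
    if (req.http.X-Force-Primary-DC) {
        set req.backend_hint = be_primary;
        return (pass);
    }
}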

Change 434034 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] MW-Replica-Lag response header

https://gerrit.wikimedia.org/r/434034

ChronologyProtector uses MySQLMasterPos, which can work with either a GTID-based master position or the old binlog-based master position.

I'm not sure, from a very quick skim of the code, which of the two we use, but we need to use the MariaDB GTID everywhere, or ChronologyProtector will not work correctly for requests to the non-master DC from a user with ChronologyProtector data in their session.

Also, our sessions are replicated by redis; there is no guarantee that the session will have less replication lag than MySQL at the moment.

I am starting to wonder if the sticky cookie isn't a better idea.

Change 434034 abandoned by Tim Starling:
MW-Replica-Lag response header

Reason:
Probably not necessary given the other things that already exist

https://gerrit.wikimedia.org/r/434034

There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

But ignoring that for the moment, I would agree that it seems simpler to:

  • On DB write in a request:
    • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).
    • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
  • On GET request in MediaWiki:
    • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Instead of:

  • On DB write in a request:
    • write DB position to replicated ChronologyProtector store (redis).
    • set cookies cpPosIndex=..
  • On GET request in MediaWiki:
    • If cpPosIndex is not known in local ChronologyProtector store, first wait/poll for the key to be replicated (if ever), with a timeout.
    • Then, once we have that, wait again, for the actual DB position on the local slaves, with a timeout.

The main part that seems problematic here is the polling of the ChronologyProtector store. It's hard to distinguish an error scenario from a lagged scenario, and the impact is stalling user requests, which is very visible, e.g. in cases where the key isn't going to be replicated (already expired, got lost, corrupted client-side, or some other issue). Each of these factors is fine on its own, but they become somewhat fragile when combined: expiring data, replicated data, and client-side storage of the key.

> There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

Right. So we need a header that makes MediaWiki set a cookie I guess?

> But ignoring that for the moment, I would agree that it seems simpler to:

>   • On DB write in a request:
>     • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).

Why local? We don't really have anything like an "unreplicated redis"

>     • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
>   • On GET request in MediaWiki:
>     • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Well, the whole point of what we're trying to do is to avoid giving users potentially stale content at all, not just content that is non-stale relative to the local DC.

Which also makes me wonder: do we need to let MediaWiki write to the sessions from the read-only datacenter? If so, that can't be done with our current session storage system (which has one-way replication only).

Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.

> Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.

That's surely a less risky first step, and I think it's the best choice for now if we want to go on and turn this functionality on.
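For what it's worth, that "non-sessioned read-only" split could look roughly like the sketch below. The cookie name pattern, the director name multi_dc, and the backends be_local/be_primary are illustrative assumptions, not the real names; write-like methods and sessioned traffic stay primary-only as proposed.

import directors;

sub vcl_init {
    # Hypothetical director spreading plain reads across both DCs' application backends.
    new multi_dc = directors.random();
    multi_dc.add_backend(be_local, 1.0);
    multi_dc.add_backend(be_primary, 1.0);
}

sub vcl_recv {
    if (req.method ~ "^(GET|HEAD)$" && req.http.Cookie !~ "(Session|Token)=") {
        # No session cookie: a read-only request that may be served from either DC,
        # accepting replication-level staleness.
        set req.backend_hint = multi_dc.backend();
    } else {
        # Sessioned or non-read traffic stays on the primary DC only.
        set req.backend_hint = be_primary;
    }
}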

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

> Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

Session writes are fairly rare (initialization and periodic preemptive refreshes). I suppose write access could be tunneled, with tricks to deal with replication delay. Tunneling all access would only be OK for some early iteration of deployment.

Ideally, the session store could take writes from either DC and wait (without a sanity timeout) on the remote one. That was at least the idea I had in mind. I was hoping something like dynomite (redis-based) could do the trick. The wmf package for that, plus something like https://gerrit.wikimedia.org/r/#/c/415789/ (except for sessions rather than wancache, and with the SSL features enabled), could be used. The consistency models only support DC-local "quorum" at most (https://github.com/Netflix/dynomite/wiki/Consistency), so something like ChronologyProtector's cpPosIndex could be used to avoid replication being raced out by the client making requests across the DCs. Since a cookie is sent when the session is created, some cookie could hold an index and be used to make MW wait a bit for a value with that index to appear in the session store.

An older idea was to explore using a small Cassandra cluster. That engine already has "global quorum" style writes as an option. See T134811.

On IRC, @tstarling wrote:

<TimStarling> pity the SessionManager refactor did not add replication awareness

The SessionManager refactor was designed to allow any storage backend to be used, as long as someone implements a BagOStuff interface for it. If someone makes a replication-aware BagOStuff (or a wrapper like MultiWriteBagOStuff), SessionManager can use it.

At a glance it seems WANObjectCache (which is not a BagOStuff) won't work here because its ->set() doesn't replicate either. WANObjectCache is documented as being intended as a cache on top of some other storage, primarily populated by getWithSetCallback(), rather than as the storage itself.

> Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit.

In general, to find things starting a session look for calls to a MediaWiki\Session\Session object's ->persist() method. At a glance,

  • Special:CreateAccount does the same as Special:UserLogin.
  • The action API when used to fetch a login or createaccount token.
  • CentralAuth's auto-login.
  • CentralAuth (or OAuth or other extensions) triggering auto-creation of a local account, or if the auto-creation fails because the IP-user lacks the 'createaccount' and 'autocreateaccount' rights.

Also, if session data for an existing session changes, that will also trigger a write to the storage backend. And, as Aaron mentioned, so will the periodic keep-alive refresh on a request if the existing session is more than halfway to its expiration.

I've updated the task description to include:

  • the grandfathering of action=rollback (as agreed a few years ago),
  • the exemption for CentralAuth autologin (recently discussed in the context of ChronologyProtector at T254634#6211514).

Change 661763 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@master] Rename magic header to be consistent with WMF CDN infrastructure

https://gerrit.wikimedia.org/r/661763

Change 661760 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] MWLBFactory: rename magic HTTP header for opting out of sqlite write lock

https://gerrit.wikimedia.org/r/661760

Change 661760 merged by jenkins-bot:
[mediawiki/core@master] MWLBFactory: rename magic HTTP header for opting out of sqlite write lock

https://gerrit.wikimedia.org/r/661760

Change 661763 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Rename magic header to be consistent with WMF CDN infrastructure

https://gerrit.wikimedia.org/r/661763

Change 661934 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@REL1_35] MWLBFactory: rename magic HTTP header for opting out of sqlite write lock

https://gerrit.wikimedia.org/r/661934

Change 661936 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/VisualEditor@REL1_35] Rename magic header to be consistent with WMF CDN infrastructure

https://gerrit.wikimedia.org/r/661936

Change 661934 merged by jenkins-bot:
[mediawiki/core@REL1_35] MWLBFactory: rename magic HTTP header for opting out of sqlite write lock

https://gerrit.wikimedia.org/r/661934

Change 661936 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@REL1_35] Rename magic header to be consistent with WMF CDN infrastructure

https://gerrit.wikimedia.org/r/661936

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

Change 801621 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[operations/puppet@production] [WIP] Implement MediaWiki multi-DC traffic component

https://gerrit.wikimedia.org/r/801621

Change 801621 merged by Tim Starling:

[operations/puppet@production] Implement MediaWiki multi-DC traffic component

https://gerrit.wikimedia.org/r/801621

Krinkle assigned this task to tstarling.