
Create HTTP verb and sticky cookie DC routing in VCL
Open, Normal, Public

Description

Per https://www.mediawiki.org/wiki/Requests_for_comment/Master_%26_slave_datacenter_strategy_for_MediaWiki#Request_routing

CDN routing logic is:

a) Cookie "useDC: master" present => master DC
b) is POST => master DC
c) is GET/HEAD/OPTIONS => local DC

Also, for faster AJAX POSTs that only fetch data:
d) is POST but has Promise-Non-Write-API-Action: true HTTP header => local DC (exception to rule b)
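
A minimal VCL sketch of rules (a)-(d) might look like the following; the backend names and hostnames are placeholders rather than actual configuration, and the cookie is assumed to appear as useDC=master in the Cookie header:

    vcl 4.0;

    # Placeholder backends; the real definitions would point at the two DCs'
    # application layers.
    backend be_master_dc { .host = "master-dc.example.internal"; .port = "80"; }
    backend be_local_dc  { .host = "local-dc.example.internal"; .port = "80"; }

    sub vcl_recv {
        if (req.http.Cookie ~ "useDC=master") {
            # (a) sticky cookie forces the master DC
            set req.backend_hint = be_master_dc;
        } elsif (req.method ~ "^(GET|HEAD|OPTIONS)$") {
            # (c) safe methods can be served by the local DC
            set req.backend_hint = be_local_dc;
        } elsif (req.method == "POST" && req.http.Promise-Non-Write-API-Action) {
            # (d) POSTs that promise not to write may also stay local
            set req.backend_hint = be_local_dc;
        } else {
            # (b) anything else that may write goes to the master DC
            set req.backend_hint = be_master_dc;
        }
    }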


Event Timeline

aaron created this task. Mar 6 2015, 9:45 PM
aaron claimed this task.
aaron raised the priority of this task from to Normal.
aaron updated the task description.
aaron added subscribers: PleaseStand, gerritbot, bd808 and 2 others.
aaron set Security to None.
mark added a subscriber: BBlack.
mark added a subscriber: mark.
Gilles added a subscriber: Gilles. Apr 2 2015, 12:25 PM
Restricted Application added a subscriber: Matanya. Oct 2 2015, 7:04 AM
faidon edited projects, added Traffic; removed Varnish. Oct 2 2015, 7:20 AM
Krenair added a subscriber: Krenair. Oct 2 2015, 2:09 PM
aaron updated the task description. Oct 22 2015, 3:52 AM
aaron updated the task description.

Change 247970 had a related patch set uploaded (by Ori.livneh):
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

aaron updated the task description. Oct 25 2015, 6:41 AM

Change 248668 had a related patch set uploaded (by Aaron Schulz):
Add ?idempotent=1 flag for API modules

https://gerrit.wikimedia.org/r/248668

aaron updated the task description. Nov 3 2015, 1:01 PM

Change 248668 merged by jenkins-bot:
Add header to flag API POST requests with no write intentions

https://gerrit.wikimedia.org/r/248668

aaron removed aaron as the assignee of this task. Dec 23 2015, 8:51 PM
BBlack moved this task from Triage to Caching on the Traffic board. Oct 4 2016, 12:48 PM

Change 247970 abandoned by Ori.livneh:
varnish: add prototype cookie-based backend selection

https://gerrit.wikimedia.org/r/247970

I explained ChronologyProtector to @Joe and @BBlack just now. They seemed happy with the idea of not sending a useDC cookie for now, and just relying on ChronologyProtector instead to ensure that the user is presented with an up-to-date view of the site. So the current idea would be to have routing based on HTTP verb, with two exceptions:

  • URLs containing action=rollback would be routed to the master DC, to work around T88044.
  • Requests with the Promise-Non-Write-API-Action header would be routed to the nearest DC regardless of HTTP verb.
BBlack added a comment (edited). May 19 2018, 3:38 PM

Right. Just to re-state for clarity, the sort of logic we should be implementing in VCL (in the cache layers) will look like this pseudo-code:

if ($local_dc != $master_dc) {
    if ((req.method !~ "^(GET|HEAD|OPTIONS)$" && !req.http.Promise-Non-Write-API-Action) || req.url ~ "[?&]action=rollback") {
        use_master_dc();
    } else {
        use_local_dc();
    }
}
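
In real VCL, use_master_dc() and use_local_dc() don't exist as such; a hedged sketch of how they could be realized, assuming backends named be_master_dc and be_local_dc are declared elsewhere, is a pair of custom subroutines invoked with "call" from vcl_recv:

    # The $local_dc != $master_dc guard would normally be handled at
    # config-generation time, i.e. this routing logic is only emitted in the
    # non-master DC.
    sub use_master_dc {
        set req.backend_hint = be_master_dc;
    }

    sub use_local_dc {
        set req.backend_hint = be_local_dc;
    }
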
Joe added a comment. May 19 2018, 3:45 PM

About ChronologyProtector:

  • If ChronologyProtector kicks in, it should send a specific header back to Varnish.
  • In that case, if the request was not sent to the master DC, it should be retried against the master DC.
BBlack added a comment (edited). May 19 2018, 3:57 PM

Right, I forgot; that was discussed as an optimization (vs. having ChronologyProtector just time out and serve stale data invisibly in a case where some operational issue is causing persistent, large replication lag).

If possible, ChronologyProtector could basically return a stale result after some short timeout, along with some kind of "X-MW-Chronology: Fail" response header or whatever, and then Varnish could use this hint to retry the request against the master DC and save the user some pain.

We might need followup discussion about whether it's best to do this always/automatically, vs having the fallback behavior controlled by some "db lag emergency mode" runtime switch, as always-on fallback could exacerbate a master-load-induced lag situation (but there are other causal situations where it doesn't hurt).
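
Assuming MediaWiki did send such a hint (the header name X-MW-Chronology: Fail below is just the placeholder from the comment above, and be_master_dc is again a placeholder backend), a rough sketch of the Varnish side of that retry:

    sub vcl_backend_response {
        # MediaWiki signalled that ChronologyProtector gave up waiting for
        # replication; retry the fetch once.
        if (beresp.http.X-MW-Chronology == "Fail" && bereq.retries == 0) {
            return (retry);
        }
    }

    sub vcl_backend_fetch {
        # On the retry, go to the master DC instead of the local one. A real
        # implementation would probably mark the retry explicitly rather than
        # keying on bereq.retries alone.
        if (bereq.retries > 0) {
            set bereq.backend = be_master_dc;
        }
    }

Whether this is always on or gated behind the "db lag emergency mode" switch mentioned above is orthogonal to the VCL mechanics.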

tstarling added a comment (edited). May 19 2018, 4:20 PM

Currently, ChronologyProtector times out after 10 seconds; this is configurable. A timeout causes "lagged replica mode" to be set, and Skin::lastModified() responds to this mode by putting a warning in the page footer. Maybe OutputPage::output() would be a good place to send a lagged replica mode response header. Note that lagged replica mode will also be set by LoadBalancer::getReaderIndex() if the lag exceeds the "max lag" parameter, currently 6 seconds in db-eqiad.php. Is it acceptable to send a retry hint to Varnish in either case?

Change 434034 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] MW-Replica-Lag response header

https://gerrit.wikimedia.org/r/434034

Joe added a comment. May 20 2018, 5:06 AM

ChronologyProtector uses MySQLMasterPos, which can work both with a GTID-based master position or with the old binlog-based master position.

I'm not sure, from a very quick skim of the code, which of the two we use, but we need to use the MariaDB GTID everywhere or all requests to the non-master DC from a user with ChronologyProtector data in their session will succeed.

Also, our sessions are replicated by Redis; there is no guarantee that the session will have less replication lag than MySQL at the moment.

I am starting to wonder if the sticky cookie isn't a better idea.

Change 434034 abandoned by Tim Starling:
MW-Replica-Lag response header

Reason:
Probably not necessary given the other things that already exist

https://gerrit.wikimedia.org/r/434034

There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

But ignoring that for the moment, I would agree that it seems simpler to:

  • On DB write in a request:
    • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).
    • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
  • On GET request in MediaWiki:
    • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Instead of:

  • On DB write in a request:
    • write DB position to replicated ChronologyProtector store (redis).
    • set cookies cpPosIndex=..
  • On GET request in MediaWiki:
    • If cpPosIndex is not known in local ChronologyProtector store, first wait/poll for the key to be replicated (if ever), with a timeout.
    • Then, once we have that, wait again, for the actual DB position on the local slaves, with a timeout.

The main part that seems problematic here is the polling of the ChronologyProtector store. It's hard to distinguish an error scenario from a lagged scenario, and the impact is stalling user requests, which is very visible, e.g. in cases where the key is never going to be replicated (already expired, got lost, corrupted client-side, or some other issue). Each of these factors is fine on its own, but they become somewhat fragile when combined: expiring data, replicated data, and client-side storage of the key.
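
On the Varnish side, the simpler of the two flows above would only need the UseDC cookie check on top of the verb-based routing (cpPosIndex is read by MediaWiki, not by the caches). A rough sketch, again with placeholder backend names:

    sub vcl_recv {
        if (req.http.Cookie ~ "UseDC=master" ||
            req.method !~ "^(GET|HEAD|OPTIONS)$") {
            # Recent writer (sticky cookie) or a write-like method:
            # route to the master DC.
            set req.backend_hint = be_master_dc;
        } else {
            # Everything else reads from the local DC.
            set req.backend_hint = be_local_dc;
        }
    }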

Joe added a comment. Jun 4 2018, 4:18 PM

There are cases where a cookie doesn't work (specifically, for the log-in use case where there are cross-origin redirects involved from wikipedia.org to login.wikimedia.org).

Right. So we need a header that makes MediaWiki set a cookie I guess?

But ignoring that for the moment, I would agree that it seems simpler to:

  • On DB write in a request:
    • write DB position to local ChronologyProtector store (e.g. memcached, or unreplicated redis).

Why local? We don't really have anything like an "unreplicated redis".

    • set cookies cpPosIndex=.. and UseDC=master for the same amount of time.
  • On GET request in MediaWiki:
    • Use ChronologyProtector to wait for any DB position from cpPosIndex (like now). Except that we'd use a dc-local CP store, only wait for dc-local DB lag, and don't wait at all if the CP store doesn't have the given key, because we know in that case that it is either already expired or otherwise permanently lost.

Well, the whole point of what we're trying to do is to avoid giving users potentially stale content at all, not merely content that is not stale relative to the local DC.

Which also makes me wonder: do we need to let MediaWiki write to the sessions from the read-only datacenter? If so, that can't be done with our current session storage system (which has one-way replication only).

BBlack added a comment. Jun 4 2018, 4:21 PM

Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.
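
A hedged sketch of that lesser goal, assuming the same placeholder backends as above plus vmod_directors; the session/token cookie pattern is only a guess at what "no session token" would mean in practice:

    import directors;

    sub vcl_init {
        # Spread non-sessioned read-only traffic across both DCs.
        new either_dc = directors.round_robin();
        either_dc.add_backend(be_local_dc);
        either_dc.add_backend(be_master_dc);
    }

    sub vcl_recv {
        if (req.method ~ "^(GET|HEAD)$" &&
            req.http.Cookie !~ "([sS]ession|Token)=") {
            # Read-only and no session cookie: replication-level staleness
            # is acceptable, so cache misses may be fetched from either DC.
            set req.backend_hint = either_dc.backend();
        } else {
            # Sessioned or write-like traffic stays on the master DC.
            set req.backend_hint = be_master_dc;
        }
    }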

Joe added a comment. Jun 4 2018, 4:28 PM

Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-sessioned readonly requests (basically cache misses for GET/HEADs with no session token) across the sides, where we basically don't care if content is stale in the replication sense, and leave POST-like methods and all sessioned traffic master-only.

That's surely a less risky first step, and I think it's the best choice for now if we want to go on and turn this functionality on.

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

aaron added a comment. Jun 6 2018, 6:46 AM

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

Session writes are fairly rare (initialization and periodic preemptive refreshes). I suppose write access could be tunneled, with tricks to deal with replication delay. Tunneling all access would only be OK for some early iteration of deployment.

Ideally, the session store could take writes from either DC and wait (without a sanity timeout) on the remote one. That was at least the idea I had in mind. I was hoping something like dynomite (redis based) could do the trick. The wmf package for that and something like https://gerrit.wikimedia.org/r/#/c/415789/ (except for sessions, not wancache, and with the ssl features enabled) could be used. The consistency models only support DC-local "quorum" at most (https://github.com/Netflix/dynomite/wiki/Consistency). I think something like ChronologyProtector's cpPosIndex could be used to avoid replication being raced out when the client makes requests heterogeneously among the DCs. Since a cookie will be sent when the session is created, some cookie could hold an index and be used to make MW wait a bit for a value with that index to appear in the session store.

An older idea was to explore using a small Cassandra cluster. That engine already has "global quorum" style writes as an option. See T134811.

Anomie added a subscriber: Anomie. Jun 6 2018, 3:53 PM
On IRC, @tstarling wrote:

<TimStarling> pity the SessionManager refactor did not add replication awareness

The SessionManager refactor was designed to allow any storage backend to be used, as long as someone implements a BagOStuff interface for it. If someone makes a replication-aware BagOStuff (or a wrapper like MultiWriteBagOStuff), SessionManager can use it.

At a glance, it seems WANObjectCache (which is not a BagOStuff) won't work here because its ->set() doesn't replicate either. WANObjectCache is described as being intended as a cache on top of some other storage, primarily populated by getWithSetCallback(), rather than itself being the storage.

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit.

In general, to find things starting a session, look for calls to a MediaWiki\Session\Session object's ->persist() method. At a glance:

  • Special:CreateAccount does the same as Special:UserLogin.
  • The action API when used to fetch a login or createaccount token.
  • CentralAuth's auto-login.
  • CentralAuth (or OAuth or other extensions) triggering auto-creation of a local account, or if the auto-creation fails because the IP-user lacks the 'createaccount' and 'autocreateaccount' rights.

Also, if session data for an existing session changes, that will also trigger a write to the storage backend. And, as Aaron mentioned, so will the periodic keep-alive refresh on a request if the existing session is more than halfway to its expiration.