
Varnish support for active:active backend services
Closed, Resolved · Public

Description

Currently our cache<->cache and cache<->app routing tables in hieradata do not support active:active configurations. In theory this is technologically painless right now for services that are ready, but there are some complex bits to sort out with puppetization, VCL templating, confd, etc.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

So, this is the current state of affairs and the complex bits, using text as an example cluster:

  • hieradata cache::route_table for role::cache::text determines the cache->(cache->...)->direct paths for the whole cluster, like this:
cache::route_table:
  eqiad: 'direct'
  codfw: 'eqiad'
  ulsfo: 'codfw'
  esams: 'eqiad'
  • hieradata cache::text::apps has per-service definitions of codfw and/or eqiad service hostnames, and a single 'route' key that determines which one to use, like this:
apps:
  appservers:
    route: 'eqiad'
    backends:
      eqiad: 'appservers.svc.eqiad.wmnet'
      codfw: 'appservers.svc.codfw.wmnet'
  restbase:
    route: 'eqiad'
    backends:
      eqiad: 'restbase.svc.eqiad.wmnet'
      codfw: 'restbase.svc.codfw.wmnet'
  • This allows for multiple DCs' backend caches (eqiad and codfw) to be set to direct in the route table, but then they'd both use the same singular applayer backend (eqiad or codfw) for each application service, meaning half the traffic would be cross-DC (suboptimal, and also unprotected until we sort out HTTPS).
  • If we want to be able to turn on active:active in a meaningful way on a per-service basis, the choice of when and where to route direct has to be per-service, not per-cluster.
  • There are looping concerns with how the cache routing works. A single wrong change, or the wrong sequence of overlapping changes (given async rollout), can cause requests to loop between eqiad and codfw on transitions. This is true both in the current setup and in just about any future one for active:active support.

My basic mental model for where we go from here is something like this, at least as a debatable starting point:

  • cache::route_table should probably still be per-cluster (and likely fairly static and identical for all clusters for the foreseeable future, but never mind that), but it never has 'direct' on the right-hand side; instead it's configured with an eqiad<->codfw loop, as in:
cache::route_table:
  esams: 'eqiad'
  ulsfo: 'codfw'
  eqiad: 'codfw'
  codfw: 'eqiad'
  • The per-application-service definitions indicate (a) which cache DCs get direct access and also (b) what hostname each cache DC contacts for direct access. Given MW configured for eqiad-only and RB configured for eqiad+codfw active:active, those entries would look like this:
apps:
  appservers:
    backends:
      eqiad: 'appservers.svc.eqiad.wmnet'
  restbase:
    backends:
      eqiad: 'restbase.svc.eqiad.wmnet'
      codfw: 'restbase.svc.codfw.wmnet'
  • In a logical sense, the way requests work is that we determine the application service they belong to early (frontend recv), and route according to a function over the route_table and the apps entry above. At ulsfo, there is no applayer backend defined for either service, so varnish knows it must use cache::route_table and talk to codfw. At codfw, for RB requests it sees there's an applayer backend defined, so the traffic goes direct. At codfw for MW, it doesn't, so it moves on to eqiad and then drops to the application layer.
  • To depool one side of RB (maintenance in eqiad), you'd just comment out the disabled backend, then put it back later, and that would work correctly and not leak PII even when we don't have HTTPS, and also not create loops. While eqiad: restbase.svc.eqiad.wmnet is missing, eqiad will forward to codfw caches. In a post-HTTPS world, we could instead just modify that to eqiad: restbase.svc.codfw.wmnet and have both sides talk to codfw applayer directly while eqiad applayer is out for maint.
  • To handle MediaWiki active:passive switchovers similarly to how we did in the last codfw-rollout test (to minimize the PII-leak window), you'd have to move carefully through these steps:

Initial State:

# all reqs go through eqiad caches to eqiad appservers
eqiad: appservers.svc.eqiad.wmnet

Step 1 - switch to codfw applayer - PII leak begins in pre-HTTPS-world

# all reqs go through eqiad caches to codfw appservers
eqiad: appservers.svc.codfw.wmnet

Step 2 - Let codfw caches talk to that directly - PII leak cut in half

# reqs go through both eqiad + codfw caches directly to codfw appservers
eqiad: appservers.svc.codfw.wmnet
codfw: appservers.svc.codfw.wmnet

Step 3 - Stop eqiad from talking to codfw directly - PII leak gone

# all reqs go through codfw caches to codfw appservers
codfw: appservers.svc.codfw.wmnet

The potential looping hazard (which exists in another form in our current setup, too) is that if someone jumped straight from the Initial State or Step 1 config to the Step 3 config, without fully deploying the Step 2 config and ensuring it's complete at all caches in between, some requests would get stuck in eqiad<->codfw loops.

Similarly, in the RB active:active maintenance scenario pre-HTTPS (commenting out entries), if you planned to do serial maintenance (down one DC for maintenance, then the other right afterwards), it would be important that you fully deploy the normal active:active config in between the two, as sketched below. If you jumped straight from a config with only an eqiad key to only a codfw key (or vice-versa), looped requests would ensue during async config application to the nodes.
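For concreteness, the safe serial-maintenance sequence for RB would look like the three states below, reusing the example hostnames from above (the point is that State B must be fully deployed at all caches before moving on to State C):

State A - eqiad applayer down for maintenance

# eqiad caches forward RB reqs to codfw caches, which go direct to codfw restbase
restbase:
  backends:
    codfw: 'restbase.svc.codfw.wmnet'

State B - normal active:active - must be fully deployed before State C

restbase:
  backends:
    eqiad: 'restbase.svc.eqiad.wmnet'
    codfw: 'restbase.svc.codfw.wmnet'

State C - codfw applayer down for maintenance

# codfw caches forward RB reqs to eqiad caches, which go direct to eqiad restbase
restbase:
  backends:
    eqiad: 'restbase.svc.eqiad.wmnet'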

We should probably go forward with something along these lines, document the switching procedures well, and also build loop-prevention into the VCL (by checking X-Cache in backend varnishes for an entry from any backend varnish in the same DC, or by checking the total length of X-Cache as a recursion limit) and emit a 503 on loops, to minimize the problems caused by mistakes.
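As a rough illustration of that idea (not the actual VCL from the patches merged below), loop detection in the backend varnishes might look something like this. The X-DCPath header name matches the later patch title, but its comma-separated format, the hardcoded DC name, and the placement in vcl_recv are all assumptions here:

vcl 4.0;

# Placeholder backend so the sketch compiles standalone.
backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    if (req.http.X-DCPath) {
        # If this DC already appears in the path, the request has looped
        # back to a cache layer it already visited: fail fast with a 503.
        if (req.http.X-DCPath ~ "(^|,)eqiad(,|$)") {
            return (synth(503, "Inter-cache routing loop detected"));
        }
        set req.http.X-DCPath = req.http.X-DCPath + ",eqiad";
    } else {
        # First cache DC on the path; puppet would template the real
        # local DC name in place of the hardcoded "eqiad".
        set req.http.X-DCPath = "eqiad";
    }
}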

chasemp triaged this task as Medium priority. · May 5 2016, 8:46 PM

Change 294478 had a related patch set uploaded (by BBlack):
VCL: prevent inter-cache loop bugs

https://gerrit.wikimedia.org/r/294478

Some further thinking: without changing the cache-level stuff discussed above, this would also support a config like:

restbase:
  backends:
    eqiad: 'restbase.svc.wmnet'
    codfw: 'restbase.svc.wmnet'

Where restbase.svc.wmnet is defined in gdnsd and resolves to the closest underlying service endpoint that hasn't been marked as down there (gdnsd already has info on our network ranges, so it would know to normally choose eqiad for eqiad and codfw for codfw).


T125069: Create a service location / discovery system for locating local/master resources easily across all WMF applications is aiming to address service discovery as well, with DNS being one of the possible interfaces.

Change 294478 merged by BBlack:
VCL: prevent inter-cache loop bugs with X-DCPath

https://gerrit.wikimedia.org/r/294478

Change 294530 had a related patch set uploaded (by BBlack):
VCL: protect loop protection from restarts

https://gerrit.wikimedia.org/r/294530

Change 294530 merged by BBlack:
VCL: protect loop protection from restarts

https://gerrit.wikimedia.org/r/294530

I've merged the "declarative config" ticket into this one; it's worth perusing the older comments/commits there at T110717. The rationale for the merge is that the primary functionality/complexity driver behind the declarative config, which we can't ignore, is the work in this ticket: getting per-backend/service active:active support working right in all cases.

@ema and I discussed at length the tradeoffs with the existing set of 5 patches starting at https://gerrit.wikimedia.org/r/#/c/300574/ . While those accomplish most of the goals around declarative config, the pragmatic tradeoff is pretty horrible: we get to declare declarative victory, but on net we're deleting a handful of easy-to-understand lines of code and adding a huge amount of very messy code, in the form of large chunks of Ruby embedded in VCL templates that generate VCL code fragments as output.

We're trying to step back a bit and find a more pragmatic path towards the big picture goal, which I think requires looking at both the original declarative-backends task as well as the active:active end-game.

The primary structural code change we're trying to make here for active:active looks something like this:

Existing pseudocode, where cache::route_table is the primary determinant: once it has determined we are in the singular "direct" DC, we switch to using the singular app service hostname defined at that DC:

if (this DC is "direct" in cache::route_table) {
   if (HTTP host and/or path matches appservice X) {
      set backend = <singular backend hostname for service X in eqiad or codfw>
      # (as determined by cache::text::apps::appservers::route, etc)
   }
   elsif (... other services similarly ...) {
   }
}
else {
   set backend = <varnish in the next DC according to cache::route_table>;
}

Future structure, where the decision to contact the applayer (or not) happens on a per-service basis, and we fall back to inter-cache routing via cache::route_table only when there's no local appservice backend definition in this DC:

if (HTTP host and/or path matches appservice X) {
    if (this DC has an entry in cache::text::apps::X) {
        set backend = <backend hostname defined for this service in this DC>;
    }
    else {
        set backend = <varnish in the next DC according to cache::route_table>;
    }
}
elsif (... other services similarly ...) {
}
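To make that concrete, here's a hypothetical fragment of the VCL this might generate on a codfw text backend under the earlier example config (restbase active in codfw, appservers not). The backend names, hosts, ports, and the /api/rest_v1/ match rule are illustrative assumptions, not actual template output, and the real service classification happens earlier (frontend recv) per the model above; this just shows the shape of the final choice:

vcl 4.0;

# Hypothetical backend definitions; hosts and ports are assumptions.
backend be_restbase_codfw { .host = "restbase.svc.codfw.wmnet"; .port = "7231"; }
backend be_cache_eqiad { .host = "cp1000.eqiad.wmnet"; .port = "3128"; }  # hypothetical eqiad cache node

sub vcl_backend_fetch {
    # Request was classified as appservice "restbase" (match rule assumed):
    if (bereq.url ~ "^/api/rest_v1/") {
        # restbase has a codfw entry in its backends map: go direct.
        set bereq.backend = be_restbase_codfw;
    } else {
        # appservers has no codfw entry: fall back to cache::route_table,
        # which routes codfw -> eqiad.
        set bereq.backend = be_cache_eqiad;
    }
}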

Change 339464 had a related patch set uploaded (by BBlack):
varnish: move applayer be_opts defaulting into template

https://gerrit.wikimedia.org/r/339464

Change 339464 merged by BBlack:
varnish: move applayer be_opts defaulting into template

https://gerrit.wikimedia.org/r/339464

Change 339667 had a related patch set uploaded (by BBlack):
varnish: move "apps" data back into manifests [WIP, 1/4]

https://gerrit.wikimedia.org/r/339667

Change 339668 had a related patch set uploaded (by BBlack):
varnish: switch all clusters to req_handling [WIP, 2/4]

https://gerrit.wikimedia.org/r/339668

Change 339669 had a related patch set uploaded (by BBlack):
varnish: per-app routing [WIP, 3/4]

https://gerrit.wikimedia.org/r/339669

Change 339671 had a related patch set uploaded (by BBlack):
varnish: move applayer info back to hiera [WIP, 4/4]

https://gerrit.wikimedia.org/r/339671

Change 339668 abandoned by BBlack:
varnish: switch all clusters to req_handling [WIP, 2/4]

Reason:
Squashed into Ia29f6a41914bdc6b7e998cd8b5f5b073ff8eb461

https://gerrit.wikimedia.org/r/339668

Change 339669 abandoned by BBlack:
varnish: per-app routing [WIP, 3/4]

Reason:
Squashed into Ia29f6a41914bdc6b7e998cd8b5f5b073ff8eb461

https://gerrit.wikimedia.org/r/339669

Change 339671 abandoned by BBlack:
varnish: move applayer info back to hiera [WIP, 4/4]

Reason:
Squashed into Ia29f6a41914bdc6b7e998cd8b5f5b073ff8eb461

https://gerrit.wikimedia.org/r/339671

Change 339667 merged by BBlack:
[operations/puppet@production] varnish: refactor all clusters for active/active

https://gerrit.wikimedia.org/r/339667

BBlack claimed this task.