
Outbound HTTPS for varnish backend instances
Open, Normal, Public

Description

We need the ability for varnish backends to make outbound HTTPS connections in at least two scenarios:

  1. For tier-1 cross-DC applayer traffic (e.g. cp2001.codfw.wmnet -> appservers.svc.eqiad.wmnet) - we have no other known way to secure this traffic on inter-DC links at this time.
  2. For inter-tier varnish-be->varnish-be traffic: IPSec currently protects this, but HTTPS has the potential to be operationally better and to make the current IPSec deployment less critical, or to let us ditch the current host-based IPSec and wait for tunnels of some kind.

Known options for making this happen:

  1. Deploy an stunnel configuration locally on the varnish machines.
    • The idea here would be to deploy stunnel with a separately-configured tunnel instance for each defined varnish backend.
    • Instead of backending directly to appservers.svc.eqiad.wmnet:80 as it does today, varnish would backend to localhost:12345 (a unique port per backend), where a local stunnel instance is configured to connect to appservers.svc.eqiad.wmnet:443.
    • There could be other alternatives similar to stunnel, but stunnel looks like a legit/default option here.
    • Significant con: yet another piece of software in the request flow, adding to reliability/debugging woes.
  2. Patch varnish3 ourselves for outbound HTTPS
    • Possibly using Amazon's s2n library, as it could be much simpler than using OpenSSL directly.
    • Would increase the amount of source-level customization we're doing with Varnish3 today, which is already a long-term problem for maintainability and tech debt.
    • The patch would probably be quite difficult; this is not lightweight patchwork. There are risks that we make varnish less stable, make a security-affecting mistake in the code, and/or make it much more difficult to continue merging in upstream 3.0.x fixes.
  3. Upgrade to Varnish4, and then create our own custom director module for outbound HTTPS
    • Varnish4 does this already in the commercial Plus variant, but not open source. We could do the same, as open source.
    • It's sad to redundantly re-do the upstream closed-source work here, but as a module in varnish4 it would be significantly cleaner than hacking it into varnish3.
    • Depends on the Varnish4 upgrade, which is itself quite difficult and still some way off, and which may never happen if we find an alternative first.
  4. Upgrade to something non-Varnish that supports outbound TLS out of the box:
    • e.g. ATS: T96853
    • As with the above, this is neither near-term nor easy
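
As a rough illustration of option 1, the per-backend tunnel layout could be generated along the following lines. This is only a sketch under assumptions: the backend map, the base port, and the CA bundle path are hypothetical, and the `verifyChain` / `checkHost` options assume a stunnel 5.x build (older 4.x releases use `verify = 2` instead). The `client`, `accept`, `connect` and `CAfile` options are standard stunnel service options.

```python
from typing import Dict

BASE_PORT = 12345  # hypothetical first local port; one port per backend
CA_FILE = "/etc/ssl/certs/ca-certificates.crt"  # assumed CA bundle path


def assign_ports(backends: Dict[str, str], base_port: int = BASE_PORT) -> Dict[str, int]:
    """Give every backend name a unique, stable localhost port."""
    return {name: base_port + i for i, name in enumerate(sorted(backends))}


def render_stunnel_conf(backends: Dict[str, str], ca_file: str = CA_FILE) -> str:
    """Render one client-mode stunnel service section per varnish backend."""
    ports = assign_ports(backends)
    lines = []
    for name in sorted(backends):
        host = backends[name]
        lines += [
            f"[{name}]",
            "client = yes",                        # stunnel initiates TLS outbound
            f"accept = 127.0.0.1:{ports[name]}",   # varnish backends to this port
            f"connect = {host}:443",               # real HTTPS endpoint
            "verifyChain = yes",                   # assumption: stunnel 5.x option
            f"CAfile = {ca_file}",
            f"checkHost = {host}",                 # assumption: stunnel 5.x option
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical backend map: service name -> remote host
    print(render_stunnel_conf({"appservers": "appservers.svc.eqiad.wmnet"}))
```

Varnish would then define its backend as 127.0.0.1 on the assigned port rather than the remote host on port 80, which is also where the extra hop in the request flow comes from.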

Event Timeline

BBlack created this task. Aug 17 2015, 4:10 PM
BBlack raised the priority of this task from to Normal.
BBlack updated the task description.
BBlack added projects: Traffic, HTTPS, acl*sre-team.
BBlack added subscribers: faidon, Matanya, gerritbot and 2 others.
BBlack updated the task description. Aug 17 2015, 4:13 PM
BBlack set Security to None.
BBlack added a comment. Sep 8 2015, 4:35 PM

Updates on some exploration of option (2) above (actually looking at the varnish code and the APIs we'd be using in detail):

  • s2n - nice API, would be the simplest/cleanest option - not ready for us to use yet (client-side, client cert, cert-checking in general, configurability...)
  • GnuTLS - better than OpenSSL as an API choice, but like OpenSSL would be a significant chunk of work and code to review
  • OpenSSL - nice in that we have to maintain/care about fewer crypto libs, since we already use this for nginx. Slightly worse than the above for clean implementation...

At this point, I'm leaning towards doing a trial (perhaps incomplete, but proof-of-concept) patch for GnuTLS and seeing how that goes. I don't think we can wait on Varnish4/ATS to fix these problems, so options 3/4 aren't realistic in the near term. Option 1 (stunnel) is still on the table, depending on how well the patching work goes.
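
For reference, the client-side requirements listed above (outbound connections, certificate checking, optional client certificate, configurability) can be illustrated with Python's standard `ssl` module. This is not the varnish patch and says nothing about the GnuTLS or OpenSSL C APIs; it is only a sketch of the behaviour any chosen library would need to provide, and the function name and parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_backend_tls_context(
    ca_file: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the backend's certificate."""
    # SERVER_AUTH means "we are a client authenticating a server".
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # The default context already enables both checks; set them explicitly
    # so that the requirement is visible in the sketch.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    if client_cert is not None:
        # Optional client certificate, for backends that require one.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

A caller would wrap an outbound TCP socket with `ctx.wrap_socket(sock, server_hostname=...)`, so that the handshake fails if the backend's certificate chain or hostname does not verify.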

Dzahn moved this task from Backlog to Big Picture on the HTTPS board. Dec 4 2015, 8:44 PM

Updates from the passage of time:

Varnish4 is happening and is a realistic blocker (for lots of things) these days, so we're almost certainly looking at something like option 3 (vmod for varnish4) at this point, whenever we can get back around to this.

ema added a subscriber: ema. Feb 26 2016, 12:05 PM
Southparkfan added a subscriber: Southparkfan.
BBlack moved this task from Triage to TLS on the Traffic board. Sep 30 2016, 1:44 PM