
Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination
Closed, Resolved (Public)

Description

Before we can enable HSTS includeSub/preload for wikimedia.org, we need to thoroughly audit all the services there. Those that are mapped to our standard cache termination clusters (text, upload, maps, misc) are dealt with separately in other tickets and are relatively easy to quantify. This is about auditing all of the ones that aren't on standard termination: those that are one-off public-facing hosts with their own custom configuration for HTTP[S] service termination to the public.

I've begun an audit process that starts with the DNS zonefile for wikimedia.org and scans for all hostnames which serve HTTP or HTTPS at all; I will post more updates here. Note also the data from @Chmarkine's survey of services here: https://wikitech.wikimedia.org/wiki/HTTPS/domains (we can ignore those on the 4 standard clusters for this ticket's purposes).

Will update with audit data as I get it processed into shape....

Related Objects

| Status | Assigned |
| --- | --- |
| Open | BBlack |
| Resolved | BBlack |
| Resolved | ArielGlenn |
| Resolved | Chmarkine |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | CCogdill_WMF |
| Declined | BBlack |
| Duplicate | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | Krenair |
| Resolved | Jgreen |
| Resolved | RobH |
| Duplicate | None |
| Resolved | BBlack |
| Invalid | None |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |
| Resolved | BBlack |

Event Timeline

BBlack created this task. Apr 12 2016, 10:21 PM
Restricted Application added a project: Operations. Apr 12 2016, 10:21 PM
Restricted Application added a subscriber: Aklapper.
Poyekhali triaged this task as Normal priority. Apr 13 2016, 5:12 AM

Audit Data

Methodology:

The starting point is our raw DNS zone data for wikimedia.org on our authdns servers. I filtered down from the raw data for:

  • only hostnames which have A or CNAME records (and specifically not geoip!foo, which is for our cache clusters which are covered in other tickets)
  • removed obvious LVS servers
  • removed obvious network gear (e.g. cr1-*, etc)
  • removed all entries in subdomains eqiad, codfw, esams, ulsfo, and frdev (clearly not public service hostnames and/or can be worked around)
  • the remaining hostnames were filtered for those that actually respond to public traffic on ports 80 and/or 443
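
The filtering steps above can be sketched roughly as follows. This is only an illustration, assuming the zone data has already been parsed into (hostname, record type) pairs relative to wikimedia.org. The regexes for LVS and network-gear names, the function names, and the timeout are hypothetical and not taken from the actual audit tooling, which is not shown in this ticket.

```python
import re
import socket

# Subdomains excluded by the methodology above.
EXCLUDED_SUBDOMAINS = ("eqiad", "codfw", "esams", "ulsfo", "frdev")

# Hypothetical patterns for "obvious" LVS servers and network gear.
NON_SERVICE_PATTERNS = (re.compile(r"^lvs\d+$"), re.compile(r"^cr\d+-"))


def candidate_hostnames(records):
    """Yield hostnames that survive the zone-level filters.

    `records` is an iterable of (name, record_type) pairs, with names
    relative to wikimedia.org (an assumed input shape).
    """
    seen = set()
    for name, rtype in records:
        # Keep only A/CNAME records; this assumes cache-cluster entries
        # (geoip!foo) show up under a different record type.
        if rtype not in ("A", "CNAME"):
            continue
        labels = name.split(".")
        # Drop anything inside an excluded subdomain, e.g. foo.eqiad.
        if len(labels) > 1 and labels[-1] in EXCLUDED_SUBDOMAINS:
            continue
        if any(p.search(labels[0]) for p in NON_SERVICE_PATTERNS):
            continue
        if name not in seen:
            seen.add(name)
            yield name


def listens(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The last step of the list would then keep only the candidates for which `listens(fqdn, 80)` or `listens(fqdn, 443)` is true when probed from the public internet.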

Then the data is split into two sets for clarity. The first group I'm calling "servers" - they're public hostnames which respond on port 80, but the HTTP[S] domain names they serve are not their own hostnames - they're just the hosts on which other service hostnames reside. The second group is actual service hostnames which are accessible over HTTP[S] legitimately.

Servers:

Not important to audit directly (in hostname terms), they're only really listed for reference as we audit the actual service hostnames below.

Of particular note are the following issues/oddities not directly blocking this ticket:

  1. gallium and californium seem to only be webservers for services behind cache_misc, and thus could probably be moved to our internal networks if they don't need direct outside access for some other reason.
  2. ms1001 seems to duplicate dataset1001 functionality, but dataset1001 is the one actually in use?
  3. fundraising-eqiad seems to alias barium, and may just be a virtual name for the same
  4. the labtestweb2001 and labtestcontrol2001 hosts serve some broken-ish things, but probably don't matter at this time

| hostname | Notes |
| --- | --- |
| barium | server for civicrm? (fundraising!) |
| californium | server for horizon (cache_misc) |
| carbon | server for mirrors, apt, and ubuntu |
| dataset1001 | server for download (cache_text), dumps (direct) |
| fermium | server for lists |
| frdata-eqiad | server for frdata |
| fundraising-eqiad | server for civicrm? (fundraising!) |
| gallium | server for doc (cache_misc), integration (cache_misc) |
| labtestweb2001 | server for labtesthorizon (nxdomain) and labtestwikitech |
| labtestcontrol2001 | server for puppetmaster.wikimedia.org (nxdomain)? |
| magnesium | server for rt |
| ms1001 | claims server for dumps/download like dataset1001? |
| neon | server for icinga and tendril |
| netmon1001 | server for librenms, servermon, smokeping, and torrus |
| radium | server for tor-eqiad-1 |
| silver | server for wikitech |
| titanium | server for archiva |
| uranium | server for ganglia |
| ytterbium | server for gerrit |

Services:

| hostname | HTTPS | HTTP-redirected | HSTS | Notes / Minor Issues |
| --- | --- | --- | --- | --- |
| ripe-atlas-eqiad | Yes | No | No | HTTPS cert is for XXX.anchors.atlas.ripe.net |
| ripe-atlas-codfw | Yes | No | No | HTTPS cert is for XXX.anchors.atlas.ripe.net |
| ripe-atlas-ulsfo | Yes | No | No | HTTPS cert is for XXX.anchors.atlas.ripe.net |
| dumps | Yes | Yes | None | |
| lists | Yes | Yes | 1y | |
| apt | No | No | None | server:carbon |
| archiva | Yes | Yes | None | |
| pinkunicorn | Yes | Yes | 1y | |
| eventdonations | Yes | Yes | None | 3rd party, redir=302 |
| benefactorevents | Yes | Yes | None | 3rd party, redir=302, and http redir is to https on another domain... |
| frdata | Yes | No | None | |
| labtest-puppetmaster-codfw | No | No | No | Port 443 listens, but "Unknown SSL protocol error..." |
| gerrit | Yes | Yes | 1y | redir=302 |
| icinga | Yes | Yes | 1y | redir=302 |
| librenms | Yes | Yes | None | |
| payments | Yes | Yes* | 180d | No HTTP port 80 at all |
| payments-listener | Yes | Yes* | 180d | No HTTP port 80 at all |
| policy | Yes | Yes | 1y/sub/preload | HSTS jumps the gun here (do not submit), 3rd party |
| rt | Yes | Yes | 1y | |
| civicrm | Yes | Yes | None | |
| stream | Yes | No | None | |
| blog | Yes | No | None | 3rd party |
| fundraising | Yes | No | None | both protos do proto-rel redirects to //wikimediafoundation.org/wiki/Donate |
| ganglia | Yes | No | None | |
| tendril | Yes | Yes | None | redir=302 |
| store | Yes | No | None | 3rd party |
| status | No | No | None | 3rd party |
| tor-eqiad-1 | Yes | No* | None | uses some tor-specific stuff for cert, no real chain to root, doesn't match hostname, etc |
| ubuntu | No | No | None | server:carbon |
| mirrors | No | No | None | server:carbon |
| wikitech | Yes | Yes | 1y | redir=302 |
| wikitech-static | Yes | Yes | 1y | redir=302 |
| labtestwikitech | Yes | Yes | 1y | invalid self-signed cert for CN="Andrew Bogott" |
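
For reference, the HSTS column uses a shorthand such as "1y", "180d", or "1y/sub/preload". A minimal sketch of how a raw Strict-Transport-Security header value could be reduced to that shorthand is shown below. The function names are hypothetical, and the mapping assumes "1y" means a max-age of exactly 365 days; the audit itself may have used a different threshold.

```python
SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: "1y" means 365 days


def parse_hsts(header_value):
    """Parse a Strict-Transport-Security value into (max_age, include_sub, preload)."""
    max_age = None
    include_sub = False
    preload = False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            max_age = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            include_sub = True
        elif name == "preload":
            preload = True
    return max_age, include_sub, preload


def hsts_shorthand(header_value):
    """Render a header value in the table's shorthand, or "None" if absent."""
    if not header_value:
        return "None"
    max_age, include_sub, preload = parse_hsts(header_value)
    if not max_age:  # missing or zero max-age means no effective HSTS
        return "None"
    if max_age % SECONDS_PER_YEAR == 0:
        label = f"{max_age // SECONDS_PER_YEAR}y"
    elif max_age % SECONDS_PER_DAY == 0:
        label = f"{max_age // SECONDS_PER_DAY}d"
    else:
        label = f"{max_age}s"
    if include_sub:
        label += "/sub"
    if preload:
        label += "/preload"
    return label
```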

While we should fix all of these issues in the long term (they should all be 301->https on the same domain, should all emit HSTS, etc), we should focus on the most-basic issue first: So long as there are entries on this list which do not support a working HTTPS listener at all, and which users might visit with HSTS-enabled browsers and care about, we can't turn on HSTS-preload for all of wikimedia.org, as it would black out those sites.

From the above, these are the sites with basic HTTPS issues:

| hostname | notes |
| --- | --- |
| ripe-atlas-eqiad | bad cert, probably ignorable |
| ripe-atlas-codfw | bad cert, probably ignorable |
| ripe-atlas-ulsfo | bad cert, probably ignorable |
| tor-eqiad-1 | bad cert, probably ignorable |
| labtestwikitech | bad cert, probably not important |
| labtest-puppetmaster-codfw | SSL listener broken, probably not important |
| apt | host is carbon, no HTTPS at all |
| ubuntu | host is carbon, no HTTPS at all |
| mirrors | host is carbon, no HTTPS at all |
| status | 3rd-party (watchmouse), no HTTPS at all |
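
The selection rule behind this list is simple: a hostname is a potential preload blocker if it has no working HTTPS listener, or if the listener presents a certificate that browsers would reject. A small sketch of that rule follows; the `AuditRow` shape and function names are hypothetical, and the sample rows in the usage note are taken from the table above.

```python
from typing import NamedTuple, Optional


class AuditRow(NamedTuple):
    hostname: str
    https_listener: bool  # something answers TLS on port 443
    cert_valid: bool      # cert chains to a trusted root and matches the hostname


def basic_https_issue(row: AuditRow) -> Optional[str]:
    """Return a short description of the basic HTTPS problem, or None."""
    if not row.https_listener:
        return "no HTTPS at all"
    if not row.cert_valid:
        return "bad cert"
    return None


def preload_blockers(rows):
    """Map each problematic hostname to its basic HTTPS issue."""
    blockers = {}
    for row in rows:
        issue = basic_https_issue(row)
        if issue is not None:
            blockers[row.hostname] = issue
    return blockers
```

For example, feeding in rows for lists (fine), apt (no listener), and tor-eqiad-1 (bad cert) would report only apt and tor-eqiad-1. Whether a given blocker actually matters for users is still a human judgment, as the "probably ignorable" notes indicate.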

As for the rest of the work, IMHO we should re-purpose the wiki tracking page at https://wikitech.wikimedia.org/wiki/HTTPS/domains to cover longer-term progress on the rest of these issues and leave this ticket open until we get them all resolved. We can remove the ones with standard cache termination (text, upload, misc, maps) because they're easily enumerated and dealt with elsewhere and will be enforced consistently by the code for the cache clusters, and expand that table with the new entries from the audit above, etc.

In the long run we're going to have to do better about documenting every new HTTP[S] service hostname that's running independently of the cache clusters (and its justification for doing so...) so we can maintain monitoring/auditing of them all going forward. I worry about new services with bad configs slipping through the cracks before we even finish cleaning up the existing ones at this rate.

> As for the rest of the work, IMHO we should re-purpose the wiki tracking page at https://wikitech.wikimedia.org/wiki/HTTPS/domains to cover longer-term progress on the rest of these issues and leave this ticket open until we get them all resolved. We can remove the ones with standard cache termination (text, upload, misc, maps) because they're easily enumerated and dealt with elsewhere and will be enforced consistently by the code for the cache clusters, and expand that table with the new entries from the audit above, etc.

Agreed! Please feel free to edit that table.

I've started refactoring https://wikitech.wikimedia.org/wiki/HTTPS/domains - it now has the merged data from the above, columns changed, etc. I'm not done with auditing (much less fixing) everything. In the long run, we probably want to move away from having to re-audit these all the time, and instead build a process around approving new HTTP[S] services in production domains before they get added to DNS.

Change 284502 had a related patch set uploaded (by BBlack):
Add HSTS=1y to several one-off public services

https://gerrit.wikimedia.org/r/284502

Change 284502 merged by BBlack:
Add HSTS=1y to several one-off public services

https://gerrit.wikimedia.org/r/284502

I've missed some meta-tracking (putting bug refs on patches, etc), but status update:

I've fixed all the trivial low-hanging fruit, adding HSTS to several sites, fixing a few minor config/cert issues along the way, etc, and completed a basic audit of them all with the data in https://wikitech.wikimedia.org/wiki/HTTPS/domains

Lots of minor issues remain, but the next easy-ish priorities are:

  1. ganglia.wm.o doesn't redirect
  2. wikitech-static doesn't include a necessary chain cert
  3. several sites redirect with 302 instead of 301
  4. several sites lack "robust" forward secrecy (the ones marked FS:WMB) - for many of them that are on our servers and hosted with apache, T133217 will probably lead to fixing them all en-masse.
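
The redirect-related items above (no redirect, 302 instead of 301, redirecting to another domain) can be checked mechanically once the plain-HTTP response is in hand. The following is a rough sketch, not the tooling used for this audit; the function name and issue strings are hypothetical, and the caller is assumed to have fetched `http://host/` without following redirects.

```python
from urllib.parse import urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def redirect_issues(host, status, location):
    """Return a list of problems with the plain-HTTP response for http://host/.

    `status` is the HTTP status code and `location` the Location header
    value (or None if absent).
    """
    if status not in REDIRECT_STATUSES or not location:
        return ["no redirect to HTTPS"]
    issues = []
    target = urlsplit(location)
    # A relative or protocol-relative Location stays on plain HTTP.
    if target.scheme != "https":
        issues.append("redirect target is not https")
    if target.hostname is not None and target.hostname != host:
        issues.append("redirect goes to another domain")
    if status != 301:
        issues.append(f"redir={status}, should be 301")
    return issues
```

An empty list means the host does a 301 to HTTPS on the same domain, which is the target state described earlier in this ticket.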

And then of course there's status.wikimedia.org, which is the lone real deal-breaker left for STS-preload of wikimedia.org :(

One more thing I didn't note above:

stream.wm.o lacks HSTS on fetch of /, because that's a 404 and nginx doesn't apply add_header to [45]xx...
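
To illustrate why the 404 loses the header: per the nginx ngx_http_headers_module documentation, add_header only applies to a fixed set of success and redirect status codes unless the `always` parameter (nginx 1.7.5+) is given. The small model below just encodes that rule; the function name is hypothetical.

```python
# Status codes for which nginx's add_header applies without "always",
# per the ngx_http_headers_module documentation (308 was added in 1.13.0).
ADD_HEADER_STATUSES = frozenset({200, 201, 204, 206, 301, 302, 303, 304, 307, 308})


def emits_added_header(status, always=False):
    """Model whether an add_header directive takes effect for this status."""
    return always or status in ADD_HEADER_STATUSES
```

So a 404 on / carries no Strict-Transport-Security header unless `always` is used or the request is made to return a non-error response instead.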

Change 284760 had a related patch set uploaded (by BBlack):
stream.wm.o: rewrite / => /rcstream_status

https://gerrit.wikimedia.org/r/284760

Change 284803 had a related patch set uploaded (by BBlack):
ganglia web: HTTP->HTTPS redir

https://gerrit.wikimedia.org/r/284803

Change 284811 had a related patch set uploaded (by BBlack):
gerrit web: use 301 for https redir

https://gerrit.wikimedia.org/r/284811

Change 284812 had a related patch set uploaded (by BBlack):
icinga web: use 301 for https redir

https://gerrit.wikimedia.org/r/284812

Change 284813 had a related patch set uploaded (by BBlack):
wikitech: use 301 for https redir

https://gerrit.wikimedia.org/r/284813

Change 284814 had a related patch set uploaded (by BBlack):
tendril: use 301 for https redir

https://gerrit.wikimedia.org/r/284814

Change 284811 merged by BBlack:
gerrit web: use 301 for https redir

https://gerrit.wikimedia.org/r/284811

Change 284812 merged by BBlack:
icinga web: use 301 for https redir

https://gerrit.wikimedia.org/r/284812

Change 284813 merged by BBlack:
wikitech: use 301 for https redir

https://gerrit.wikimedia.org/r/284813

Change 284814 merged by BBlack:
tendril: use 301 for https redir

https://gerrit.wikimedia.org/r/284814

Change 284817 had a related patch set uploaded (by BBlack):
librenms: use chain cert correctly

https://gerrit.wikimedia.org/r/284817

Change 284803 merged by BBlack:
ganglia web: HTTP->HTTPS redir

https://gerrit.wikimedia.org/r/284803

Change 284817 merged by BBlack:
librenms: use chain cert correctly

https://gerrit.wikimedia.org/r/284817

Dzahn added a comment. Apr 26 2016, 7:35 PM

> As for the rest of the work, IMHO we should re-purpose the wiki tracking page at https://wikitech.wikimedia.org/wiki/HTTPS/domains to

> Agreed! Please feel free to edit that table.

There is also https://wikitech.wikimedia.org/wiki/Httpsless_domains let's somehow merge that?

Change 284760 merged by Ori.livneh:
stream.wm.o: rewrite / => /rcstream_status

https://gerrit.wikimedia.org/r/284760

> As for the rest of the work, IMHO we should re-purpose the wiki tracking page at https://wikitech.wikimedia.org/wiki/HTTPS/domains to

> Agreed! Please feel free to edit that table.

> There is also https://wikitech.wikimedia.org/wiki/Httpsless_domains let's somehow merge that?

Yeah, I'd like to expand HTTPS/domains further and add back the data we've had in the past (including past versions of itself), but in separate tables since they're distinct problems to solve (e.g. wmflabs.org issues and such, and maybe the server-only-hostname issues too).

When this work is done, protocol-relative URLs should be declared as deprecated for wikitext.
As of today there are no advantages for this syntax in wikitext, but it makes a lot of workflows, scripts and tools more complicated than necessary.

It will take years until all "//bla.foo" links get fixed, so it should be declared as soon as possible.
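
As a rough illustration of how such links could be located, the sketch below scans wikitext for protocol-relative URLs. The regex is a simplified assumption and not MediaWiki's actual link grammar, and it deliberately only reports matches rather than rewriting them, since whether a given target is safe to force to https depends on the host.

```python
import re

# Simplified pattern: "//" + hostname + optional path, not preceded by a
# scheme (e.g. the ":" in "https://") or by another word character.
PROTO_RELATIVE = re.compile(r"(?<![:/\w])//[A-Za-z0-9.-]+\.[A-Za-z]{2,}[^\s\]|<>]*")


def find_protocol_relative_links(wikitext):
    """Return all protocol-relative URLs found in the given wikitext."""
    return PROTO_RELATIVE.findall(wikitext)
```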

demon added a subscriber: demon. Jul 12 2016, 6:24 PM

> When this work is done, protocol-relative URLs should be declared as deprecated for wikitext.
> As of today there are no advantages for this syntax in wikitext, but it makes a lot of workflows, scripts and tools more complicated than necessary.
> It will take years until all "//bla.foo" links get fixed, so it should be declared as soon as possible.

Did we ever make that an official recommendation anywhere?

Short of breaking some of the httpsless domains (linked above on wikitech), there's no reason anyone should still be using them. Notably, links to the wikis definitely don't need them; they have been useless there since we went HTTPS-only a while ago.

Where would such a deprecation be announced? I would think it's more of a matter of updating style guides on the issue, if they exist.

> Where would such a deprecation be announced? I would think it's more of a matter of updating style guides on the issue, if they exist.

You can do this in https://meta.wikimedia.org/wiki/Tech/News together with one of the commits of this task.

If protocol-relative links are declared deprecated, it would help editors remove them.

> Did we ever make that an official recommendation anywhere?

I think we pushed the interface messages in that direction when the HTTP vs Secure experiences were widely separate. After we started pushing people to HTTPS, it wasn't as widely needed.

We probably have to be a little bit careful about broad "https all the things everywhere" sorts of efforts at this point in time. The bulk of our internal, private service endpoints are still HTTP-only. Internal, inter-service traffic may follow links as well, and may be broken if they're all explicitly https when the actual internal connection they're using is not https-capable. There are tasks around implementing HTTPS for all internal service traffic as well, but we haven't even finished tackling the basics on these, such as setting up processes around internal certificates signed by our internal CA, etc.

Sub-tasks updated to be in sync with latest audit data. Primary issues here are the open tasks for store, blog, and the FR hosts.

BBlack moved this task from Triage to TLS on the Traffic board. Sep 30 2016, 1:48 PM
Jgreen added a parent task: Restricted Task. Jul 10 2017, 1:15 PM
Jgreen removed a parent task: Restricted Task.
Jgreen added a subtask: Restricted Task.
BBlack removed a subtask: Restricted Task. Jul 11 2017, 4:26 PM
BBlack closed this task as Resolved. Jul 11 2017, 5:10 PM
BBlack claimed this task.

Resolving this and moving the last remaining ticket up the tree as a direct child of the tracker. There's no point having a sub-category for one thing.