status.wikimedia.org has no (valid) HTTPS
Closed, Resolved (Public)

Assigned To

Authored By

	duplicatebug
	Dec 4 2011, 7:24 PM

Description

Currently, status.wikimedia.org has no HTTPS at all. I suspect this was the "workaround" for it having an incorrect certificate in the past.

Previous description: status.wikimedia.org is using a security certificate from *.io.watchmouse.com, which gives a warning in IE and Chrome.

Is it possible to install a wikimedia certificate on that domain? Thanks.

Details

Reference: bz32796

Related Objects

Status	Assigned	Task
Resolved	BBlack	T104681 HTTPS Plans (tracking / high-level info)
Resolved	BBlack	T104244 Preload HSTS
Resolved	BBlack	T40516 Enable HSTS on Wikimedia sites
Resolved	None	T37313 SSL cert invalid for bugzilla.wikipedia.org redirect
Declined	Krinkle	T38126 *.mobile.wikipedia.org domains are using invalid SSL certificate
Resolved	Jgreen	T88199 Enable HSTS on https://payments.wikimedia.org
Resolved	Chmarkine	T90527 Enable HSTS and point rel=canonical to HTTPS for all Russian Wikimedia projects
Resolved	BBlack	T132521 Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination
Resolved	BBlack	T103919 let all services on misc-web enforce http->https redirects
Resolved	Dzahn	T103773 check if services behind misc-web enforce http->https redirect or not
Resolved	• ezachte	T93702 Fix the mixed content issue on Wikimedia Statistics
Resolved	BBlack	T132459 HTTPS redirects for config-master.wikimedia.org
Resolved	BBlack	T132460 HTTPS redirects for git.wikimedia.org
Resolved	BBlack	T132461 HTTPS redirects for graphite.wikimedia.org
Resolved	BBlack	T132462 HTTPS redirects for parsoid-tests.wikimedia.org
Resolved	BBlack	T132463 HTTPS redirects for datasets.wikimedia.org
Resolved	BBlack	T132464 HTTPS redirects for transparency.wikimedia.org
Resolved	BBlack	T132465 HTTPS redirects for stats.wikimedia.org
Resolved	Dzahn	T132543 enable HSTS on *.planet.wikimedia.org
Resolved	BBlack	T132685 Preload STS for wikimedia.org
Duplicate	None	T123135 Invalid web certificate on status.wikimedia.org
Resolved	BBlack	T34796 status.wikimedia.org has no (valid) HTTPS

Event Timeline

• bzimport raised the priority of this task to Medium. Nov 22 2014, 12:00 AM

• bzimport added projects: HTTPS, acl*sre-team.

• bzimport set Reference to bz32796.

• bzimport added a subscriber: Unknown Object (MLST).

duplicatebug created this task. Dec 4 2011, 7:24 PM

Because it's offsite.

It'll need its own specific cert buying and assigning.

I doubt this is feasible - it is hosted on Amazon AWS, so they'd have to fire up a separate watchmouse AWS LB instance just to serve Wikimedia status? ;-)

If we can't get it on a correct cert, we might want to redirect to its canonical domain instead so it at least loads properly. Main obvious downside is if our redirector or iframe wrapper goes down, you don't see it on our pretty domain anymore. ;)

Removing dependency to bug 27946 which is secure.wikimedia.org.

Bug 44760 has been marked as a duplicate of this bug.

The ops ticket is RT #1849

So this certificate is served by Nimsoft, and we have no control over it. I'll paste the reasoning from RT:

it is just a CNAME for status.watchmouse.com

status.wikimedia.org is an alias for status.watchmouse.com.
status.watchmouse.com is an alias for dualstack.lb-1710199131.us-east-1.elb.amazonaws.com.

In the watchmouse UI, in "Public folders" setup you can change the CNAME but nothing about SSL or certificates.

And the failure is on their side already anyways, because status.watchmouse.com itself does not show the correct cert

status.watchmouse.com uses an invalid security certificate.

The certificate is only valid for *.io.watchmouse.com
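The mismatch described above can be illustrated with a minimal Python sketch. It assumes the simplified rule that a leading `*.` in a certificate name stands for exactly one DNS label; the function name `cert_name_matches` is hypothetical and this is not a full implementation of browser hostname verification. The key point is that the browser compares the certificate against the name the user requested, not against the CNAME target.

```python
def cert_name_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname.

    Simplified rule: a leading "*." stands for exactly one DNS label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard replaces only the left-most label; the rest must be identical.
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


CERT_NAME = "*.io.watchmouse.com"  # the name the thread says the certificate carries

for host in ("status.wikimedia.org", "status.watchmouse.com", "status.io.watchmouse.com"):
    print(host, cert_name_matches(CERT_NAME, host))
```

Under this rule only status.io.watchmouse.com is covered, which is consistent with the warnings reported for both status.wikimedia.org and status.watchmouse.com.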

Then folks ask if we can redirect, the answer is no. It is a status page for when the cluster is down, therefore redirecting via the cluster is non-ideal.

So this is a wontfix, because we cantfix.

Re-opening this for further consideration. A fair bit has changed since 2013, including a strong push for HTTPS/TLS/SSL support across both Wikimedia and the rest of the Internet.

https://status.wikimedia.org/ failing isn't really okay. We should figure out some way to make this work or we should kill the service entirely, in my opinion.

Restricted Application added a subscriber: Matanya. Oct 19 2015, 11:18 PM

• MZMcBride reopened this task as Open. Oct 19 2015, 11:19 PM

• MZMcBride set Security to None.

• MZMcBride added subscribers: Dzahn, BBlack.

Restricted Application added a subscriber: Aklapper. Oct 19 2015, 11:19 PM

Agreed, let's consider buying that. Adding "traffic" for opinions.

Dzahn added a project: Traffic. Oct 19 2015, 11:21 PM

Over 2 years later, and we still have pages like status.watchmouse.com giving:

This server could not prove that it is status.watchmouse.com; its security certificate is from status.io.watchmouse.com.

Dzahn claimed this task. Oct 20 2015, 4:19 AM

Do we really care about having status.wikimedia.org served over TLS? I am not sure it is worth it (and the price of a host cert), so I would rather disable HTTPS and just use HTTP.

In T34796#1737605, @hashar wrote:

Do we really care about having status.wikimedia.org served over TLS?

yes

but that doesn't mean i have the solution how to fix it since it's on watchmouse's servers

Dzahn removed Dzahn as the assignee of this task. Oct 21 2015, 12:23 AM

Dzahn moved this task from Backlog to Blocked on External on the HTTPS board. Dec 3 2015, 7:32 PM

Josve05a added a parent task: T123135: Invalid web certificate on status.wikimedia.org. Jan 8 2016, 8:18 PM

Legoktm merged a task: T123135: Invalid web certificate on status.wikimedia.org. Jan 8 2016, 8:18 PM

Legoktm added subscribers: StudiesWorld, Josve05a.

Sjoerddebruin subscribed. Jan 8 2016, 8:21 PM

Josve05a updated the task description. Jan 8 2016, 8:21 PM

Josve05a removed a subscriber: Sjoerddebruin.

Sjoerddebruin awarded a token. Jan 8 2016, 8:21 PM

Josve05a added a subscriber: Sjoerddebruin. Jan 8 2016, 8:22 PM

hashar unsubscribed. Jan 8 2016, 8:26 PM

Peachey88 merged a task: T131017: HTTPS error on status.wikimedia.org (watchmouse certificate mismatch). Mar 27 2016, 8:52 AM

Peachey88 added subscribers: Peachey88, Poyekhali.

Dzahn mentioned this in T111967: Preload HSTS for select hostnames within wikimedia.org. Apr 12 2016, 2:32 PM

BBlack renamed this task from "status.wikimedia.org is using SSL cert from other domain" to "status.wikimedia.org has no (valid) HTTPS". Apr 14 2016, 1:30 PM

BBlack updated the task description.

BBlack added a parent task: T132521: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination.

BBlack added a parent task: T132685: Preload STS for wikimedia.org. Apr 14 2016, 1:44 PM

Considering that watchmouse's own status pages, e.g. http://status.cloudmonitor.ca.com/ and http://stations.status.cloudmonitor.ca.com/ don't offer HTTPS at all (connection refused on 443), I doubt we'll get far with asking for it for our status site.

The current importance of this is that it's likely to be the very very last thing (there's one other pending, but it can be solved relatively easily) preventing us from doing a blanket STS-preload for all of wikimedia.org, which is a big deal.
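Why a single HTTP-only hostname blocks a blanket preload can be sketched in a few lines of Python. This is a simplified model rather than browser code: `forced_to_https` and the `PRELOAD` mapping are hypothetical names, and the list contents are an assumption made for illustration (a blanket entry for wikimedia.org that includes subdomains).

```python
from typing import Mapping


def forced_to_https(hostname: str, preloaded: Mapping[str, bool]) -> bool:
    """Return True if a browser with this preload list would refuse plain HTTP.

    `preloaded` maps a domain to its include-subdomains flag.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in preloaded:
            # An exact match always applies; a parent entry applies only
            # when it was preloaded with include-subdomains.
            if i == 0 or preloaded[candidate]:
                return True
    return False


# Hypothetical list: a blanket entry for wikimedia.org with subdomains included.
PRELOAD = {"wikimedia.org": True}

print(forced_to_https("status.wikimedia.org", PRELOAD))
```

In this model every name under wikimedia.org is forced onto HTTPS, so a status page that only answers over plain HTTP (or serves a mismatched certificate) would become unreachable for browsers carrying the preload entry.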

Basic options that come to mind in a few minutes:

1. Talk to watchmouse, see if there's any way they can HTTPS this with a legit cert, like we've done with other 3rd party vendors (where we purchase the cert and hand them the key securely). Seems unlikely, but worth a shot!
2. Cancel/Replace this service? I don't know that we have an equivalent replacement anywhere, but this option was mentioned once before! If replacing it means spending a long time looking for a new replacement service and setting that up first, that sucks timeline-wise.
3. Move it to another domain. We could move the CNAME to some other domain we own that we're not trying to STS-preload, like say status.wmftest.net or something. We could support the old name (transitionally, but not as the advertised name, because it might not work when our infra is down!) by having status.wikimedia.org map to one of the prod varnish clusters securely and generate a 301 -> the new hostname.

They might also have the option to simply use a hostname within their domains rather than bothering with another name of ours. e.g. configuring it as wikimedia.status.asm.ca.com and calling that the official name. After all this is supposed to be independent of our infrastructure. Ideally that would include our authdns, too.

The actual blocker for 2. was that Catchpoint was able to replace almost all features of Watchmouse, _except_ that it doesn't have that kind of status page. So maybe another option is to keep poking them about that, arguing that we pay them quite a bit already.

BBlack mentioned this in T132685: Preload STS for wikimedia.org. Apr 15 2016, 7:04 PM

In the settings, we can see that http://status.wikimedia.org/ is also available at http://status.asm.ca.com/8777 . There don't appear to be any TLS-related settings :(

We could do a few things trivially, which have un-ideal tradeoffs, but might be acceptable:

1. We could set up a revproxy for it internally, on perhaps a ganeti misc web node? Then it would be TLS'd as part of the misc cluster, but it won't be available if any of several parts of our infra are in trouble, which is probably when we want it the most.
2. We could set up a static page for it on the misc cluster (via varnish synthesis), which simply links to (or does a slow HTML refresh to?) the http://status.asm.ca.com/8777 . At least then people could bookmark the real thing independent of our infra, and it would rely on less of our infra (just authdns + LVS + cache_misc, and be DC-independent).
3. We could host a revproxy as in (1) above, but externally like we do for https://wikitech-static.wikimedia.org/ (perhaps even on the same host?).

Chmarkine subscribed. May 5 2016, 11:43 AM

Option (3) sounds like the easiest way forward to me and an acceptable option. My only concern would be whether it could handle a surge of traffic (the kind of traffic it'd see when we're down at some point). I don't think we advertise the status page much, so I wouldn't expect it to. I don't think we have or could get any access statistics for it right now, but having it be fronted by infrastructure we control could allow us to, which is another plus. If we do this move, let's at least set up some logging and/or monitoring for it and check it during the next outage :)

Longer-term I think we should overhaul that whole status page. This is currently backed by Watchmouse which isn't very accurate (or pretty). We could either use some external status page service (statuspage.io etc.) or (my preference) build something ourselves using e.g. wikitech-static or some other externally-hosted infrastructure. Something like Cachet could be reused for this to save us from all the frontend trouble.

But all of that can wait; for the purposes of this task (HTTPS support), option (3) is a good compromise, IMHO.

Yeah I tend to agree too. I think if we're concerned at all about status.wm.o perf during outages, we could probably also tack on a secondary task to extend the apache config there to use mod_cache and cache the status with a 1 or 5 minute TTL and cut down on the revproxying load.
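The effect of a short TTL on revproxy load can be sketched in Python. This is not mod_cache, only an illustration of the same idea; the class `TTLCache`, the fake upstream, and the one-request-per-second traffic pattern are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Tuple


class TTLCache:
    """Cache a single upstream response for `ttl` seconds."""

    def __init__(self, fetch: Callable[[], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entry: Optional[Tuple[float, str]] = None  # (stored_at, body)

    def get(self) -> str:
        now = self._clock()
        if self._entry is not None and now - self._entry[0] < self._ttl:
            return self._entry[1]  # still fresh: no upstream request
        body = self._fetch()
        self._entry = (now, body)
        return body


hits = []  # one element per request that actually reached the upstream


def fake_upstream() -> str:
    hits.append(1)
    return "status page body"


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(fake_upstream, ttl=60, clock=lambda: now[0])
for second in range(300):  # one client request per second for five minutes
    now[0] = float(second)
    cache.get()
print(len(hits), "upstream fetches for 300 client requests")
```

With a 1 minute TTL the upstream is contacted once per minute regardless of how many clients arrive, which is the load reduction the mod_cache suggestion is after.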

There's an upside, too, in that our revproxy will help anonymize clients of status.wm.o against watchmouse/CA privacy invasion :)

Also note: while in there, should convert wikitech-static to cron'd letsencrypt (using our prod script!), and then use that for the status.wm.o cert as well.

Yes, we should. Unfortunately wikitech-static might be a pain since it does not use puppet (and for obvious reasons cannot reach the production puppetmaster). :/

it's ok, we can just copy down the acme-setup script as it exists today (well, and acme-tiny). for a 1-2 cert setup like this, it's not hard to use it puppet-free from cron I think.

Change 292482 had a related patch set uploaded (by BBlack):
status -> wikitech-static hosting T34796

https://gerrit.wikimedia.org/r/292482

Change 292482 merged by BBlack:
status -> wikitech-static hosting T34796

https://gerrit.wikimedia.org/r/292482

I've moved the status.wm.o DNS to wikitech-static, and set up an apache reverse proxy there with a LetsEncrypt cert that auto-renews. It seems to work now, after much experimenting and mucking around!

For the record, since we have no puppet, in case we have to muck with this again, the basic things I did were:

Created a local acme user and group that can't log in
Copied acme-setup, acme_tiny.py, and x509-bundle from our puppet repo to /usr/local/sbin/
Commented out the self-verification portion of acme_tiny.py (this always seems to fail on challenge over redirect to self-signed for me).
Installed the letsencrypt X3 and X4 intermediates in /usr/local/share/ca-certificates and ran update-ca-certificates.
Enabled the following new apache2 modules: proxy, proxy_http, proxy_html
Set up the following as the sites-available/enabled file status.wikimedia.org.conf (note especially the crazy HTML translation hacks for re-mapping link URLs, particularly the mongocache one, which is for AJAX data loaded from a separate HTTP-only URL belonging to CA...):

# vim: filetype=apache

<VirtualHost *:80>
	ServerAdmin noc@wikimedia.org
	ServerName status.wikimedia.org

	SSLEngine off
	
	RewriteEngine on
	RewriteCond %{SERVER_PORT} !^443$
	RewriteRule ^/(.*)$ https://status.wikimedia.org/$1 [L,R=301]

	ErrorLog /var/log/apache2/error.log

	# Possible values include: debug, info, notice, warn, error, crit,
	# alert, emerg.
	LogLevel warn

	CustomLog /var/log/apache2/access.log combined
	ServerSignature Off

</VirtualHost>
<VirtualHost *:443>
	ServerAdmin noc@wikimedia.org 
	ServerName status.wikimedia.org

	SSLEngine on
	SSLCertificateFile /etc/acme/cert/status.chained.crt
	SSLCertificateKeyFile /etc/acme/key/status.key
	SSLProtocol all -SSLv2 -SSLv3
	SSLCipherSuite -ALL:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA
	SSLHonorCipherOrder On
	Header always set Strict-Transport-Security "max-age=31536000"

	<Location />
		ProxyPass "http://status.asm.ca.com/8777/"
		ProxyPassReverse "http://status.asm.ca.com/8777/"
		RequestHeader unset Accept-Encoding
		Header always set Content-Security-Policy upgrade-insecure-requests
		ProxyHTMLEnable On
		ProxyHTMLExtended On
		ProxyHTMLLinks	a		href
		ProxyHTMLLinks	area		href
		ProxyHTMLLinks	link		href
		ProxyHTMLLinks	img		src longdesc usemap
		ProxyHTMLLinks	object		classid codebase data usemap
		ProxyHTMLLinks	q		cite
		ProxyHTMLLinks	blockquote	cite
		ProxyHTMLLinks	ins		cite
		ProxyHTMLLinks	del		cite
		ProxyHTMLLinks	form		action
		ProxyHTMLLinks	input		src usemap
		ProxyHTMLLinks	head		profile
		ProxyHTMLLinks	base		href
		ProxyHTMLLinks	script		src for
		ProxyHTMLEvents	onclick ondblclick onmousedown onmouseup onmouseover onmousemove onmouseout onkeypress onkeydown onkeyup onfocus onblur onload onunload onsubmit onreset onselect onchange
		ProxyHTMLURLMap //status\.asm\.ca\.com/8777(/|$) //status.wikimedia.org/ [Ri]
		ProxyHTMLURLMap //mongocache.asm.ca.com/ //status.wikimedia.org/.mongocache/
		ProxyHTMLURLMap http:// https:// [i]
		SetOutputFilter proxy-html
	</Location>
        <Location /.mongocache>
		ProxyPass "http://mongocache.asm.ca.com/"
		ProxyPassReverse "http://mongocache.asm.ca.com/"
        </Location>

	<Location /.well-known/acme-challenge>
		ProxyPass "!"
	</Location>

	Alias "/.well-known/acme-challenge" "/var/acme/challenge"
	<IfVersion >= 2.4>
    	<Directory "/var/acme/challenge">
       		Require all granted
    	</Directory>
	</IfVersion>

	ErrorLog /var/log/apache2/error.log

	# Possible values include: debug, info, notice, warn, error, crit,
	# alert, emerg.
	LogLevel debug

	CustomLog /var/log/apache2/access.log combined
	ServerSignature Off

</VirtualHost>
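The combined effect the three ProxyHTMLURLMap lines in the vhost above aim for can be illustrated on individual URLs with a short Python sketch. It applies the maps as plain sequential substitutions; mod_proxy_html's own matching semantics (per-link handling versus inline scripts, rule ordering and stopping) differ in detail and are not modeled here. The function name `rewrite_url` and the sample paths are hypothetical.

```python
import re


def rewrite_url(url: str) -> str:
    """Apply rough equivalents of the three URL maps, in order."""
    # Map 1 ([Ri]): regex, case-insensitive. Fold the upstream /8777 prefix
    # into the root of status.wikimedia.org.
    url = re.sub(r"//status\.asm\.ca\.com/8777(/|$)", "//status.wikimedia.org/",
                 url, flags=re.IGNORECASE)
    # Map 2: plain string. Route the separate data host through /.mongocache/.
    url = url.replace("//mongocache.asm.ca.com/", "//status.wikimedia.org/.mongocache/")
    # Map 3 ([i]): plain string, case-insensitive. Upgrade the scheme.
    url = re.sub(re.escape("http://"), "https://", url, flags=re.IGNORECASE)
    return url


for sample in (
    "http://status.asm.ca.com/8777/history",   # hypothetical page path
    "http://mongocache.asm.ca.com/data.json",  # hypothetical data path
    "/relative/path",
):
    print(sample, "->", rewrite_url(sample))
```

The second map only works because the vhost also proxies `/.mongocache` back to the upstream data host; without that Location block the rewritten URLs would point at nothing.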

Ran the initial acme-setup for self-signed:

/usr/local/sbin/acme-setup -i status -s status.wikimedia.org -m self -u acme

Reloaded apache2
Re-ran to get a real cert:

/usr/local/sbin/acme-setup -i status -s status.wikimedia.org -u acme -m acme -w apache2

Created a cronjob running exactly the above once a day at 17:17, which will auto-renew when necessary.

(note, above has been edited a few times to correct missing stuff, will keep doing that so this task serves as a good reference)
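Running the same command daily is safe only if each run is a no-op until a renewal is actually due. How acme-setup decides that is not shown in this task, so the following Python sketch assumes a simple days-remaining threshold; `needs_renewal`, the 30 day value, the example issue date, and treating 17:17 as UTC are all assumptions made for illustration. The 90 day lifetime is the standard validity of Let's Encrypt certificates.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(days=30)  # hypothetical threshold, not taken from acme-setup


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before: timedelta = RENEW_BEFORE) -> bool:
    """Return True once the certificate is within `renew_before` of expiring."""
    return not_after - now <= renew_before


issued = datetime(2016, 6, 2, tzinfo=timezone.utc)  # example issue date
expires = issued + timedelta(days=90)  # Let's Encrypt certificates are valid for 90 days

# Simulate the daily 17:17 run and find the first day a renewal would happen.
run = issued.replace(hour=17, minute=17)
skipped_runs = 0
while not needs_renewal(expires, run):
    run += timedelta(days=1)
    skipped_runs += 1
print("first renewing run:", run.isoformat(), "after", skipped_runs, "no-op runs")
```

Under these assumptions most daily runs do nothing, and a missed or failed run is retried the next day with weeks of validity still left.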

Dzahn awarded a token. Jun 2 2016, 11:32 PM
