
HTTPS-only for stream.wikimedia.org
Closed, Declined · Public

Description

stream.wikimedia.org was moved behind cache_misc in T134871. It was noted during this transition that most of the current clients are using unencrypted HTTP, and that our default/sample websocket client implementations tend to break on a 301 redirect rather than follow it, so this transition may be painful.
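
Because those clients won't follow a 301, the practical client-side fix is to configure the secure URL directly. Here is a minimal sketch of that rewrite. It uses a hypothetical helper, `to_secure_url`, which is not part of any Wikimedia or pywikibot library. The helper maps http:// to https:// and ws:// to wss://, and drops an explicit port 80:

```python
from urllib.parse import urlsplit, urlunsplit

# Map each plaintext scheme to its TLS-protected counterpart.
SECURE_SCHEMES = {"http": "https", "ws": "wss"}


def to_secure_url(url: str) -> str:
    """Return url with http:// or ws:// replaced by https:// or wss://.

    URLs that already use a secure (or unrecognised) scheme are returned
    unchanged. An explicit port 80 is dropped so the TLS default applies.
    """
    parts = urlsplit(url)
    scheme = SECURE_SCHEMES.get(parts.scheme.lower())
    if scheme is None:
        return url
    netloc = parts.netloc
    if parts.port == 80:
        # Strip the plaintext port; keeping it would point TLS at port 80.
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `to_secure_url("http://stream.wikimedia.org:80/rc")` returns `"https://stream.wikimedia.org/rc"`. The /rc path is only illustrative.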

We should start by making sure our own documentation uses https:// and/or wss:// URLs for the stream service as appropriate. Next, we should announce the problem to the community with a plea to update their URLs for secure access. Finally, we should set a future date on which this service will 301-redirect to HTTPS, like all of our other public, cache-terminated hostnames in wikimedia.org.
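
Conceptually, the redirect described above only needs the scheme the client originally used. A TLS-terminating cache passes that scheme along in X-Forwarded-Proto. The sketch below is illustrative only and is not the actual cache_misc configuration. `https_redirect` is a hypothetical name, and treating a missing header as plaintext is an assumption:

```python
from typing import Mapping, Optional, Tuple


def https_redirect(
    headers: Mapping[str, str], host: str, path: str
) -> Optional[Tuple[int, str]]:
    """Return (301, https_url) for plaintext requests, or None to serve normally.

    Assumes TLS is terminated by a cache in front of this code, which reports
    the client's original scheme in X-Forwarded-Proto. Header names are
    matched case-insensitively; a missing header is treated as plain HTTP.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    proto = lowered.get("x-forwarded-proto", "http").strip().lower()
    if proto == "https":
        return None
    return 301, f"https://{host}{path}"
```

A plaintext request for /rc would get `(301, "https://stream.wikimedia.org/rc")`. That is exactly the kind of response the websocket clients described above fail on, which is why the plan calls for an announcement and a lead time.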

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

Change 299892 had a related patch set uploaded (by Ori.livneh):
rcstream: log X-Forwarded-Proto

https://gerrit.wikimedia.org/r/299892

Change 299892 merged by Ori.livneh:
rcstream: log X-Forwarded-Proto

https://gerrit.wikimedia.org/r/299892

Have we sent any announcement about this to the community? We might have already, just not tracked in here.

Looking at the past couple of days of access logs from Ori's nginx patch above, the current split is still 88% insecure, 12% secure :/
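
A split like this can be tallied from the X-Forwarded-Proto values that are now being logged. The log format isn't shown here, so this rough sketch assumes (hypothetically) that the proto is the last whitespace-separated field of each line:

```python
from collections import Counter
from typing import Dict, Iterable


def proto_split(lines: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of requests for each X-Forwarded-Proto value.

    Hypothetical format: the proto is the last whitespace-separated field.
    Blank lines are ignored; a '-' (header absent) is counted as its own value.
    """
    counts = Counter(line.split()[-1] for line in lines if line.strip())
    total = sum(counts.values())
    if not total:
        return {}
    return {proto: 100.0 * n / total for proto, n in counts.items()}
```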

I did some digging and found that /shared/pywikipedia/core/pywikibot/comms/rcstream.py in tools suggests stream.wikimedia.org on port 80.

That's at least one bot in tools fixed. Can you filter those access logs down to labs entries only (208.80.155.128 - 208.80.155.255), and write the result to my home directory on some production server?
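
The requested range, 208.80.155.128 - 208.80.155.255, is exactly the CIDR block 208.80.155.128/25. That means the filter is a single membership test with Python's `ipaddress` module. This minimal sketch assumes the real client IP is the first field of each log line. As the replies below point out, the current log only has the cache servers' IPs, so the client IP would need to be logged first:

```python
import ipaddress
from typing import Iterable, Iterator

# 208.80.155.128 - 208.80.155.255 is exactly this /25 block.
LABS_NET = ipaddress.ip_network("208.80.155.128/25")


def labs_only(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines whose first field is a client IP inside LABS_NET.

    Assumes the client IP is the first whitespace-separated field. Lines
    that are empty or whose first field isn't an IP address are skipped.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue
        # IPv6 addresses are simply reported as not in the IPv4 network.
        if addr in LABS_NET:
            yield line
```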

The current access log is only about 9% HTTPS, and a chunk of that is just Catchpoint monitoring.

About 32% of requests come from the python-requests UA, and under 1% of those use HTTPS :/
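
Breaking the same log down by user agent is a small extension. This sketch assumes each record has already been parsed into a `(user_agent, x_forwarded_proto)` pair. It also assumes python-requests clients send the library's default `python-requests/<version>` UA string:

```python
from typing import Iterable, Tuple


def ua_https_share(
    records: Iterable[Tuple[str, str]], ua_prefix: str = "python-requests/"
) -> Tuple[float, float]:
    """Return (% of all requests from ua_prefix, % of those that used HTTPS).

    Each record is a (user_agent, x_forwarded_proto) pair already parsed out
    of the access log; the parsing step itself is not shown.
    """
    total = matched = secure = 0
    for ua, proto in records:
        total += 1
        if ua.startswith(ua_prefix):
            matched += 1
            if proto == "https":
                secure += 1
    share = 100.0 * matched / total if total else 0.0
    secure_share = 100.0 * secure / matched if matched else 0.0
    return share, secure_share
```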

I only see the 10.64.x.x IPs of the cp servers in the access log. It has http_x_forwarded_proto, but not the remote IP.

We could perhaps enable Apache logging of the X-Client-IP header to see through the caches for this.
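
The key point with X-Client-IP is that it should only be trusted when the request actually came through a cache; otherwise any client could forge it. In this minimal sketch, treating 10.64.0.0/16 as the cache tier is an assumption based on the 10.64.x.x addresses mentioned above. `effective_client_ip` is a hypothetical helper:

```python
import ipaddress
from typing import Mapping

# Hypothetical: treat the 10.64.x.x addresses seen in the access log as the cache tier.
CACHE_NET = ipaddress.ip_network("10.64.0.0/16")


def effective_client_ip(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Return the IP address to log as the real client.

    X-Client-IP is honoured only when the direct peer is inside CACHE_NET;
    a request arriving from anywhere else keeps its own remote address, so
    an external client cannot spoof its IP by sending the header itself.
    """
    if ipaddress.ip_address(remote_addr) in CACHE_NET:
        for name, value in headers.items():
            if name.lower() == "x-client-ip" and value.strip():
                return value.strip()
    return remote_addr
```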

Done in https://gerrit.wikimedia.org/r/#/c/318296/ (forgot to put Bug: link in there)

At a glance, it seems like the bulk of the query traffic comes from GCE and AWS, and most of it still isn't HTTPS.

We're going to leave this as-is and assume the eventstream replacement (which will be HTTPS-only from the get-go) will handle this for us.