| Repository : branch | Change |
|---|---|
| operations/puppet : production | rcstream: remove internal TLS listener |
| operations/dns : master | Remove stream-lb.eqiad hostname |
| operations/puppet : production | remove rcstream lvs::realserver config |
| operations/puppet : production | Remove old rcstream public LVS config in conftool-data |
| operations/puppet : production | Remove old rcstream public LVS config |
| operations/dns : master | stream.wm.o: move to cache_misc in DNS |
| operations/puppet : production | frontend VCL: stream.wm.o TLS exception |
| operations/puppet : production | stream: use hash(X-Client-IP) for backend selection |
| operations/puppet : production | cache_misc: pass all stream.wm.o |
| operations/puppet : production | cache_misc: add stream.wm.o |
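The `stream: use hash(X-Client-IP) for backend selection` change pins each client to a single backend. A rough Python model of that idea follows. It assumes a stable checksum (CRC32 here, not whatever hash the real Varnish director uses) and a hypothetical two-host pool. The point it illustrates is that repeated requests from the same client IP land on the same rcstream host, which presumably matters because socket.io's polling and upgrade requests for one session need to reach the same server.

```python
import zlib
from typing import Sequence

# Hypothetical pool; the real backends were the rcs100x hosts.
BACKENDS = ("rcs1001", "rcs1002")


def pick_backend(client_ip: str, backends: Sequence[str] = BACKENDS) -> str:
    """Pin a client IP to one backend via a stable hash of the IP.

    Python's built-in hash() is randomized per process for str, so a
    deterministic checksum (CRC32) stands in for the real director's hash.
    """
    if not backends:
        raise ValueError("no backends configured")
    digest = zlib.crc32(client_ip.encode("ascii"))
    return backends[digest % len(backends)]


if __name__ == "__main__":
    # The same IP always maps to the same backend.
    for ip in ("198.51.100.7", "198.51.100.7", "203.0.113.42"):
        print(ip, "->", pick_backend(ip))
```

Plain modulo hashing remaps many clients whenever the pool size changes. A production director may use consistent hashing instead, but the thread does not cover that detail.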
Mentioned In:
- rOPUP2fb9285fc2d6: rcstream: remove internal TLS listener
- rOPUP46512b0a48ab: rcstream: remove internal TLS listener
- rOPUP1a0a640eafc0: remove rcstream lvs::realserver config
- rOPUPfa409666ff9f: Remove old rcstream public LVS config in conftool-data
- rOPUP4b8287241484: Remove old rcstream public LVS config
- rOPUPc95d68dfd82b: remove rcstream lvs::realserver config
- rOPUP11fb00455615: Remove old rcstream public LVS config in conftool-data
- rOPUPf57ddb7d4586: Remove old rcstream public LVS config
- rOPUPdefa226365d5: Remove old rcstream public LVS config in hieradata
- rOPUP3f22ea9f74ff: Remove old rcstream public LVS config in conftool-data
- rOPUP61c6d142aee1: Remove old rcstream public LVS config in conftool-data
- rOPUP77c18dd4c86a: Remove old rcstream public LVS config
- rOPUP1f8d35fba473: Remove old rcstream public LVS config
- T140128: HTTPS-only for stream.wikimedia.org
- rOPUP24dd863ff9c2: cache_misc: add stream.wm.o
- T137915: stream.wikimedia.org doesn't redirect to HTTPS
- rOPUP121a39619ab3: frontend VCL: stream.wm.o TLS exception
- rOPUP9f616b769b3d: stream: use hash(X-Client-IP) for backend selection
- rOPUP6d4745d0469a: cache_misc: pass all stream.wm.o
- rOPUP6ff2290d1fa4: cache_misc: add stream.wm.o
This seems to be working now. It's fully configured on cache_misc except for switching the DNS resolution of stream.wm.o to cache_misc, and it can be tested by hacking local DNS resolution.
I've been testing it with a minimal Python client as shown in https://wikitech.wikimedia.org/wiki/RCStream#Python , but with the constructor changed from socketIO_client.SocketIO('stream.wikimedia.org', 80) to socketIO_client.SocketIO('https://stream.wikimedia.org').
If I change the constructor back to using port 80, the client fails with: websocket._exceptions.WebSocketBadStatusException: Handshake status 301, indicating it can't handle the HTTP->HTTPS redirect before upgrading HTTP to websockets.
The current stream.wm.o (which uses LVS to talk directly to rcs100) doesn't redirect on port 80, allowing cleartext rcstream connections or encrypted ones (client's choice, but I guess they have to be explicit in their config).
Is 301->HTTPS not legal for websockets for some reason? Are lots of clients going to break if we do this regardless?
We've got ~13 days until we need to renew the existing SSL cert (or not, if we can switch to cache_misc).
Looking at the WebSocket RFC ( https://tools.ietf.org/html/rfc6455 ), section 4.1 says:
Once the client's opening handshake has been sent, the client MUST wait for a response from the server before sending any further data. The client MUST validate the server's response as follows:
1. If the status code received from the server is not 101, the client handles the response per HTTP [RFC2616] procedures. In particular, the client might perform authentication if it receives a 401 status code; the server might redirect the client using a 3xx status code (but clients are not required to follow them), etc. Otherwise, proceed as follows.
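The rule in that passage can be modelled as a small decision function. This is a minimal Python sketch, not any real client's code: `handle_handshake_response`, `HandshakeError`, and the `follow_redirects` flag are hypothetical. It shows why the 301 from cache_misc is legal yet not something every client must handle. A client that does not opt into following redirects simply gives up, which matches the Python client's `Handshake status 301` failure.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urljoin


class HandshakeError(Exception):
    """The opening handshake did not lead to an open connection."""


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def handle_handshake_response(
    request_url: str,
    status: int,
    headers: Mapping[str, str],
    follow_redirects: bool = False,
) -> Tuple[str, str]:
    """Next step after the server answers a WebSocket opening handshake.

    Returns ("open", url) on 101, or ("redirect", new_url) if the client
    opted in to following a 3xx. Anything else raises HandshakeError.
    """
    if status == 101:
        # Switching Protocols; a real client still validates Upgrade,
        # Connection and Sec-WebSocket-Accept before using the socket.
        return ("open", request_url)
    if 300 <= status < 400 and follow_redirects:
        location = _header(headers, "Location")
        if location:
            # Location may be relative, so resolve it against the request URL.
            return ("redirect", urljoin(request_url, location))
    # Following a 3xx is optional per RFC 6455 section 4.1; without the
    # opt-in (or for any other non-101 status) this client just fails.
    raise HandshakeError(f"Handshake status {status}")
```

With `follow_redirects=True`, a 301 from `http://stream.wikimedia.org/` pointing at `https://stream.wikimedia.org/` yields `("redirect", "https://stream.wikimedia.org/")`, after which the client would redo the handshake over TLS.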
Another data point: in the nginx logs on rcs1001, most successful operations seem to be non-SSL:
    root@rcs1001:/var/log/nginx# grep -v rcstream_status rcstream_access.log|grep -c socket.io
    1254
    root@rcs1001:/var/log/nginx# grep -v rcstream_status rcstream_ssl_access.log|grep -c socket.io
    77
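The same counts can be reproduced in Python, which also makes the share explicit: 77 of 1,331 counted requests, roughly 5.8%, came in over TLS. This is a sketch that assumes the log paths from the grep commands above. `count_socketio` mirrors `grep -v rcstream_status FILE | grep -c socket.io`, except that it uses fixed-string matching, so the `.` is literal rather than grep's any-character regex.

```python
from pathlib import Path
from typing import Iterable


def count_socketio(lines: Iterable[str]) -> int:
    """Roughly: grep -v rcstream_status FILE | grep -c socket.io"""
    return sum(
        1
        for line in lines
        if "rcstream_status" not in line and "socket.io" in line
    )


def tls_share(plain: int, tls: int) -> float:
    """Fraction of the counted requests that arrived over TLS."""
    total = plain + tls
    return tls / total if total else 0.0


if __name__ == "__main__":
    log_dir = Path("/var/log/nginx")
    counts = {}
    for name in ("rcstream_access.log", "rcstream_ssl_access.log"):
        with open(log_dir / name, encoding="utf-8", errors="replace") as f:
            counts[name] = count_socketio(f)
    plain = counts["rcstream_access.log"]
    tls = counts["rcstream_ssl_access.log"]
    print(f"plain={plain} tls={tls} tls_share={tls_share(plain, tls):.1%}")
```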
Obviously, if we can't fix the existing non-SSL clients in a timely fashion (or can't assume they handle redirects), our other option is to punch a temporary hole in cache_misc so that it doesn't force HTTPS redirects for stream.wm.o (and to set a timeline for removing the hole).
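The temporary hole amounts to an exemption list in the frontend's HTTPS-redirect logic. Below is a hypothetical Python model of that policy. The real change lives in the frontend VCL, which this does not reproduce, and `redirect_target` and `TLS_REDIRECT_EXEMPT` are illustrative names. Plain-HTTP requests get a 301 to the same path on https:// unless their Host is exempt.

```python
from typing import AbstractSet, Optional

# Hypothetical exemption set; removing the entry is what closing the hole means.
TLS_REDIRECT_EXEMPT = frozenset({"stream.wikimedia.org"})


def redirect_target(
    scheme: str,
    host: str,
    path: str,
    exempt: AbstractSet[str] = TLS_REDIRECT_EXEMPT,
) -> Optional[str]:
    """Return the https:// URL to 301 to, or None to serve the request as-is.

    Assumes `host` is a bare hostname (no port) and `path` includes any query.
    """
    host = host.lower()
    if scheme == "https" or host in exempt:
        return None
    return f"https://{host}{path}"
```

Keeping the exemption as data rather than a special case makes the "timeline for removing the hole" a one-line change.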
With the TLS hole punched as above, clients work correctly over plain HTTP. This preserves the status quo, since the current service also allows non-TLS connections and most clients use non-TLS today. I think at this point it's worth taking this route: it will get rcstream moved to standard TLS termination, at which point we can:
- remove its public LVS service
- remove the HTTPS listener on rcs100x
- not renew the cert that's expiring in 13 days.
- open a separate ticket about switching off unencrypted HTTP for rcstream (reverting https://gerrit.wikimedia.org/r/#/c/294346/ ) at a later date after announcements and validation, etc.
What's needed now is confirmation beyond my manual testing with the sample Python and JS client code from wikitech. Can someone confirm that real clients work (with local DNS hacks to remap stream.wm.o -> misc-web-lb)?
We just tested this via an /etc/hosts entry plus http://codepen.io/Krinkle/pen/laucI/?editors=0010, and it worked. The implementations we are familiar with are all based on one of these three clients (Node.js, Python, frontend JS), so we should be good. The plan of record (per IRC discussion) is for me to announce the change on wikitech-l tomorrow (Thursday, June 16), giving people two days' advance notice, and to use the opportunity to remind people to use HTTPS. @BBlack will do the actual switch at his convenience.