
Make tools-login / bastion hosts redundant and move them to trusty
Closed, Resolved, Public


Time to move on. We'll have tools-precise made available for people who still want to use precise.

This wouldn't affect anything else. tools-trusty will still point to a resolvable address.

We will stagger this so that there is always a bastion or two for people to connect from.

This should also provide two hosts, so that when one is down due to a hardware failure, we can just switch the floating IP to the other host.
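The failover rule described above (two bastions behind one floating IP, move the IP when its host fails) can be sketched as a small decision function. This is a minimal illustration, not the procedure actually used: the function name `pick_floating_ip_target` and the `is_healthy` health-check callback are hypothetical, and the step that actually reassigns the floating IP in OpenStack is deliberately left out.

```python
from typing import Callable, Optional, Sequence


def pick_floating_ip_target(
    bastions: Sequence[str],
    current: Optional[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the bastion the floating IP should point at.

    Hypothetical helper: keeps the current target while it is healthy,
    so the IP does not flap between hosts; otherwise fails over to the
    first healthy bastion in the list.
    """
    if current is not None and is_healthy(current):
        return current
    for name in bastions:
        if is_healthy(name):
            return name
    # No healthy bastion: there is nothing to switch the IP to.
    return None
```

For example, with `["tools-bastion-01", "tools-bastion-02"]` and `tools-bastion-01` down, the function returns `tools-bastion-02`. Preferring the current target keeps users' sessions on one host as long as it stays up.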

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added a project: Toolforge.
yuvipanda added subscribers: yuvipanda, scfc, coren, Legoktm.

Actually, since they are just DNS entries, I can just point them to a different instance, and make backport ones for the older hosts. So people would just get routed to the new address at some point...

This might have people freaking out over the SSH host key mismatch. Perhaps an MOTD for the occasion?

tools-login is now tools-bastion-01. Maybe I can create tools-bastion-02 and make *that* dev; they would share SSH certificates, and we could switch them around if needed (for redundancy!).

After that we can get rid of tools-login and tools-dev and tools-trusty. And I guess we can actually not provide a precise instance at all.

Hmm, actually we probably need to provide a precise instance as long as we have tools running on precise.

I think since we moved the crontabs from tools-login, there is no real compelling reason to differentiate between different bastions; users should be nice to each other everywhere :-).

I still have a Bad Feeling™ about instances sharing host keys. The possibility of one host name being used for multiple instances (yes, you didn't propose that, but …) makes me even more uncomfortable as it makes debugging impossible. ("Did you encounter the problems on tools-dev or tools-dev?")

(Caution: Before tools-login is removed, the toolwatcher role must be enabled on another instance.)

Ah, right. I wonder if toolwatcher should actually be on tools-master or something? I guess we can put it on both tools-master and tools-shadow, so it has redundancy...

@scfc: So the benefits of having bastion-01 and bastion-02 are that when a virt node goes down, we can just switch the floating IP over (hmm, need to see if you can have two IPs refer to the same instance). Also, on -dev, if you're compiling something with -j8 you can lock up the CPU for a fair bit... And I'd like to still provide users that ability :D

Re-using host keys also makes my skin crawl, but I still have no idea why....

yuvipanda renamed this task from Move tools-login and tools-dev to trusty to Make tools-login / bastion hosts redundant and move them to trusty. Mar 25 2015, 9:52 PM
yuvipanda updated the task description.
yuvipanda added a project: ToolLabs-Goals-Q4.
yuvipanda set Security to None.
yuvipanda moved this task from Backlog to Redundancy on the ToolLabs-Goals-Q4 board.

I think toolwatcher could run on two instances with low risk of race conditions.
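One way to see why two watchers can race safely is if each unit of work is idempotent and relies on an atomic filesystem operation. The sketch below assumes, without confirmation from this task, that the job boils down to creating a per-tool home directory on a shared filesystem where `mkdir` is atomic; the function name `ensure_tool_home` and its parameters are hypothetical.

```python
import os


def ensure_tool_home(base: str, tool: str) -> bool:
    """Create base/tool if it is missing; return True if this call created it.

    Hypothetical helper. os.mkdir either creates the directory or raises
    FileExistsError, so if two watcher instances race on the same tool,
    exactly one succeeds and the other treats the error as "already done".
    """
    path = os.path.join(base, tool)
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False
```

Run concurrently from two workers for the same tool, one call returns `True` and the other `False`, and the directory exists exactly once. Non-idempotent steps (e.g. appending to a shared file) would need a lock instead.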

Regarding redundancy for bastions, sure, but I do remember vividly on Sunday wondering why ssh tools-login.eqiad.wmflabs connected me to tools-bastion-01.eqiad.wmflabs before I noticed my corresponding setting in ~/.ssh/config :-). Having to use another bastion instance is only relevant for active developers who want to log in during an outage of one virtual node but who aren't reading labs-l or #wikimedia-labs. So to me the benefit feels rather small (compared to redundancy for web services & Co.).

scfc triaged this task as Low priority. Apr 6 2015, 7:38 AM
scfc moved this task from Backlog to Ready to be worked on on the Toolforge board.

Alright, tools-dev should move in, say, a week: 22nd April.

Writing announcement email now.

yuvipanda claimed this task.

Done :)