Maniphest T205195

Please fix my screw-up - unbreak SSH access to deployment-maps03 VM
Closed, Resolved, Public

Authored By: Krenair, Sep 22 2018, 6:27 PM

Description

I was looking at some T153468 problems and noticed that Puppet no longer appeared to be managing ferm on this machine. I went to remove ferm, so I ran iptables-save > iptables.bak (in /root) and then iptables -F, and promptly realised that I had locked myself out of the instance (I hadn't noticed that the INPUT chain's default policy was DROP). In the past I think I would have been able to use Salt to fix it, and in prod there are the serial consoles, but I don't know what the current workaround for this is in labs. Any chance someone can fix this?
Here's a copy of the iptables.bak file:

# Generated by iptables-save v1.4.21 on Sat Sep 22 18:18:15 2018
*raw
:PREROUTING ACCEPT [89369:45802432]
:OUTPUT ACCEPT [79604:17226958]
-A PREROUTING -p tcp -m tcp --dport 6379 -j NOTRACK
-A OUTPUT -p tcp -m tcp --sport 6379 -j NOTRACK
COMMIT
# Completed on Sat Sep 22 18:18:15 2018
# Generated by iptables-save v1.4.21 on Sat Sep 22 18:18:15 2018
*filter
:INPUT DROP [21:3714]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [79604:17226958]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m pkttype --pkt-type multicast -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -j DROP
-A INPUT -p icmp -j ACCEPT
-A INPUT -s 10.68.17.232/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.18.65/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.18.66/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.18.68/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 9042 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 9160 -j ACCEPT
-A INPUT -s 10.68.21.205/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.20.135/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 6533 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 7000 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 7199 -j ACCEPT
-A INPUT -s 10.68.16.210/32 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 5432 -j ACCEPT
-A INPUT -s 10.196.0.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.196.16.0/21 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.196.32.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.196.48.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.68.0.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.68.16.0/21 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.68.32.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.68.48.0/24 -p tcp -m tcp --dport 9100 -j ACCEPT
-A INPUT -s 10.68.18.91/32 -p tcp -m tcp --dport 6379 -j ACCEPT
-A INPUT -s 10.68.18.66/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.18.68/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.68.21.105/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 6534 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 6535 -j ACCEPT
COMMIT
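To make the lockout concrete, here is a minimal Python sketch of first-match evaluation on the INPUT chain. It is a simplified model, not iptables itself: it only matches source CIDR, protocol, destination port and connection state, ignores the tcp-flags, pkttype and interface matches, and transcribes just a few of the rules above. The names Rule and evaluate, and the unlisted address 10.68.99.1, are hypothetical and chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Rule:
    """Simplified INPUT rule; a field left as None matches any packet."""
    target: str                        # "ACCEPT" or "DROP" (the -j target)
    source: Optional[str] = None       # -s CIDR, e.g. "10.68.17.232/32"
    proto: Optional[str] = None        # -p, e.g. "tcp"
    dport: Optional[int] = None        # --dport
    states: Optional[Set[str]] = None  # --state, e.g. {"RELATED", "ESTABLISHED"}

    def matches(self, src: str, proto: str, dport: int, state: str) -> bool:
        if self.source and ipaddress.ip_address(src) not in ipaddress.ip_network(self.source):
            return False
        if self.proto and self.proto != proto:
            return False
        if self.dport is not None and self.dport != dport:
            return False
        if self.states and state not in self.states:
            return False
        return True


def evaluate(rules, policy, src, proto, dport, state):
    """First matching rule decides; if none matches, the chain policy applies."""
    for rule in rules:
        if rule.matches(src, proto, dport, state):
            return rule.target
    return policy


# A handful of INPUT rules from iptables.bak, kept in their original order.
INPUT_RULES = [
    Rule("ACCEPT", states={"RELATED", "ESTABLISHED"}),
    Rule("ACCEPT", source="10.68.17.232/32", proto="tcp", dport=22),
    Rule("ACCEPT", source="10.68.18.65/32", proto="tcp", dport=22),
    Rule("ACCEPT", proto="tcp", dport=6533),
]
POLICY = "DROP"  # from ":INPUT DROP"

print(evaluate(INPUT_RULES, POLICY, "10.68.17.232", "tcp", 22, "NEW"))   # ACCEPT
print(evaluate(INPUT_RULES, POLICY, "10.68.99.1", "tcp", 22, "NEW"))     # DROP (unlisted host)
# After `iptables -F`: no rules left, policy still DROP, so even the
# already-open SSH session's packets are dropped.
print(evaluate([], POLICY, "10.68.17.232", "tcp", 22, "ESTABLISHED"))    # DROP
# Setting the policy to ACCEPT before flushing avoids the lockout.
print(evaluate([], "ACCEPT", "10.68.17.232", "tcp", 22, "ESTABLISHED"))  # ACCEPT
```

The key point is that iptables -F removes the rules but leaves the chain policy untouched, so with :INPUT DROP nothing, not even the RELATED,ESTABLISHED traffic of the open session, is accepted any more.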

Related Objects

Mentioned Here: T153468: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs

Event Timeline

Krenair created this task. Sep 22 2018, 6:27 PM

Restricted Application added a subscriber: Aklapper. Sep 22 2018, 6:27 PM

Krenair updated the task description. Sep 22 2018, 6:34 PM

That VM was OOM and killing processes right and left, so it's possible we were locked out by sshd dying or something else unrelated to the iptables change. In any case, rebooting has restored ssh access.

Mentioned in SAL (#wikimedia-cloud) [2018-09-23T01:05:02Z] <andrewbogott> rebooted deployment-maps03; OOM and also T205195

Oh, and to answer your main question -- there isn't a great workaround for accessing VMs when ssh stops working. Salt was good for that but was also a lot of trouble to maintain and I almost never miss it. We also for a while had a remote-console system set up but it was /also/ more trouble than it was worth (it broke a lot, and was very hard to do securely). So now we just fall back on mounting the drive and tinkering with it when we get desperate.

In T205195#4608607, @Andrew wrote:

That VM was OOM and killing processes right and left, so it's possible we were locked out by sshd dying or something else unrelated to the iptables change. In any case, rebooting has restored ssh access.

It happened pretty much the instant that iptables -F returned, so that would be a big coincidence. Thank you!

I've now run iptables -P INPUT ACCEPT, iptables -F and apt-get remove ferm. Running Puppet again shows that ferm has not been reinstalled, confirming that ferm was no longer managed by Puppet. (Stuff still works there, btw :))
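The order used in that comment (open the INPUT policy first, then flush) can be wrapped in a small helper so it is hard to get backwards. This is a hypothetical Python sketch: safe_flush and FLUSH_STEPS are made-up names, it defaults to a dry run that only returns the planned commands, and actually executing them would need root on the instance. It assumes, as in this task's dump, that the FORWARD and OUTPUT policies are already ACCEPT, so only INPUT needs opening before the flush.

```python
import subprocess

# Hypothetical helper mirroring the order used in this task: open the INPUT
# policy first, then flush, so an empty chain falls through to ACCEPT.
FLUSH_STEPS = [
    ["iptables", "-P", "INPUT", "ACCEPT"],
    ["iptables", "-F"],  # with no chain name, flushes every chain in the filter table
]


def safe_flush(backup_path="/root/iptables.bak", dry_run=True):
    """Back up the rules, then flush them in a lockout-safe order.

    With dry_run=True (the default) nothing is executed and the planned
    commands are returned as strings. A real run needs root.
    """
    planned = [f"iptables-save > {backup_path}"] + [" ".join(c) for c in FLUSH_STEPS]
    if dry_run:
        return planned
    # Same effect as the shell redirect `iptables-save > backup_path`.
    with open(backup_path, "w") as out:
        subprocess.run(["iptables-save"], stdout=out, check=True)
    for cmd in FLUSH_STEPS:
        subprocess.run(cmd, check=True)
    return planned


for step in safe_flush():
    print(step)
```

If the flush turns out to be a mistake, iptables-restore < /root/iptables.bak reloads the saved rules from the backup.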

I've been thinking about how rebooting fixed this. I think that because ferm was still installed, the reboot caused ferm to reapply its iptables rules. (Still, I don't want ferm installed somewhere that Puppet is not keeping it up to date; it strikes me as a liability the next time we do something like replace a bastion.)
