https://wikitech.wikimedia.org/ is currently inaccessible.
Version: wmf-deployment
Severity: normal
URL: https://wikitech.wikimedia.org/
A few folks worked on this issue, and I cannot speak for what they saw or did, only for what I saw and worked on.
Virt0 reported a bunch of Icinga alerts and was unresponsive. Supposedly it was reporting a ton of packet loss before the reboot. Then the system was rebooted.
It failed to come back online properly, and the alerts persisted. I was NOT logged into the serial console. (Daniel would have to speak to that.)
On the bonded ports (2 of them on the switch), I saw traffic flowing on eth0 but nothing on eth1. Checking more than that on the switch is beyond me ;] (Setting VLANs, setting up basic stuff, checking very basic non-bonded stuff: sure. This: nope.)
Once folks reported virt0 was back, I could see traffic on both ports.
It just had about 95% packet loss, and then the packet loss disappeared. That initially made it look frozen, which it actually wasn't. It had uptime and opendj was running; only puppetmaster wasn't running. The reboot apparently didn't change anything. The problem disappeared well after that.