
puppetserver1001.eqiad.wmnet is unresponsive
Open, Medium, Public

Description

puppetserver1001.eqiad.wmnet is unresponsive to SSH.

7:02 PM <+icinga-wm> PROBLEM - SSH on puppetserver1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
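The alert text appears to come from a Nagios/Icinga-style SSH check, which opens a TCP connection to port 22 and waits for the server's `SSH-` banner, reporting CRITICAL if nothing arrives within 10 seconds. The following is a minimal Python sketch of such a probe, useful for reproducing the symptom by hand. It is not the check Wikimedia actually runs; the function name `check_ssh` is hypothetical, and the 0 = OK / 2 = CRITICAL return codes assume the usual monitoring-plugin convention.

```python
import socket


def check_ssh(host: str, port: int = 22, timeout: float = 10.0) -> tuple[int, str]:
    """Hypothetical SSH banner probe (sketch only).

    Returns (exit_code, message), assuming the Nagios/Icinga plugin
    convention: 0 = OK, 2 = CRITICAL.
    """
    try:
        # TCP connect, bounded by `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An SSH server sends its identification string first, e.g.
            # "SSH-2.0-OpenSSH_...". A hung host may accept the TCP
            # connection but never send it.
            banner = sock.recv(256).decode("ascii", errors="replace").strip()
    except socket.timeout:
        # Caught before OSError because socket.timeout is a subclass of it.
        return 2, f"CRITICAL - Socket timeout after {timeout:g} seconds"
    except OSError as exc:
        # Connection refused, no route to host, DNS failure, etc.
        return 2, f"CRITICAL - {exc}"
    if not banner.startswith("SSH-"):
        return 2, f"CRITICAL - unexpected banner: {banner!r}"
    return 0, f"SSH OK - {banner}"
```

For example, `check_ssh("puppetserver1001.eqiad.wmnet")` run from a bastion during the outage would be expected to return a CRITICAL result, matching the alert.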

Grafana graphs stop at about the same time (and were unremarkable before they stopped):

[Screenshot: image.png, Grafana graphs for puppetserver1001]

Event Timeline

Also unable to log in via the serial console.

Restarted the host via the DRAC, and everything seems OK now. I skimmed the logs and didn't see anything unusual before the event.

Volans triaged this task as Medium priority. Mon, Apr 29, 2:33 PM