wtp1028 unresponsive
Closed, ResolvedPublic

Description

Today wtp1028 went down. Nothing shows on the console, and powercycle / poweroff / poweron issued from the console don't seem to work (the console stays blank).

14:10 -icinga-wm:#wikimedia-operations- PROBLEM - Host wtp1028 is DOWN: PING CRITICAL - Packet loss = 100%
14:18  <godog> !log powercycle wtp1028 - nothing on console
14:18 -stashbot:#wikimedia-operations- Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
14:26 -logmsgbot:#wikimedia-operations- !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=wtp1028.eqiad.wmnet
14:26 -stashbot:#wikimedia-operations- Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log

Event Timeline

Interestingly, this host logged a "system firmware progress post error 0Fh" error on the 27th:

ID  | Date        | Time     | Name             | Type                     | Event
2   | Dec-27-2018 | 07:38:15 | POST Err         | System Firmware Progress | Event Offset = 0Fh
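
For anyone who wants to scan event log rows like the one above programmatically, here is a minimal Python sketch. It assumes the six pipe-delimited columns shown in the header (ID, Date, Time, Name, Type, Event) and an English locale for the month abbreviation; `parse_sel_row` is a hypothetical helper, not part of any existing tool.

```python
from datetime import datetime


def parse_sel_row(row: str) -> dict:
    """Split one pipe-delimited event log row into named fields.

    Assumes the six-column layout from the table header:
    ID | Date | Time | Name | Type | Event
    """
    fields = [field.strip() for field in row.split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    sel_id, date, time, name, sensor_type, event = fields
    # Dates look like "Dec-27-2018"; %b relies on an English locale.
    timestamp = datetime.strptime(f"{date} {time}", "%b-%d-%Y %H:%M:%S")
    return {
        "id": int(sel_id),
        "timestamp": timestamp,
        "name": name,
        "type": sensor_type,
        "event": event,
    }


row = "2   | Dec-27-2018 | 07:38:15 | POST Err         | System Firmware Progress | Event Offset = 0Fh"
entry = parse_sel_row(row)
print(entry["timestamp"].isoformat(), entry["name"], entry["event"])
```

The sketch only splits and labels the fields; it does not try to interpret what the event offset means.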

Host is back up now, but still depooled. I'll schedule some downtime and try rebooting to watch for error output during POST.

@herron @fgiunchedi I went to the data center on the 27th and powercycled the server. I thought I had updated the task, but I don't see my update.

Mentioned in SAL (#wikimedia-operations) [2019-01-02T20:52:29Z] <herron> rebooting wtp1028 — looking for POST errors T212624

Not seeing any errors on the console. Host boots up without issues. Repooling.

herron claimed this task.
<logmsgbot> !log herron@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet
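
To audit depool/repool messages like the ones logged in this task, a small Python sketch can split a logged line into its action and selector. It assumes the message shape seen here ("conftool action : ACTION; selector: key=value,..."); `parse_conftool_action` is a hypothetical helper and does not call conftool itself.

```python
def parse_conftool_action(message: str) -> tuple[str, dict[str, str]]:
    """Extract the action and selector from a logged conftool message.

    Assumes the shape seen in this task:
    "... conftool action : <action>; selector: k1=v1,k2=v2"
    """
    _, found, rest = message.partition("conftool action :")
    if not found:
        raise ValueError("not a conftool action message")
    action_part, found, selector_part = rest.partition("; selector:")
    if not found:
        raise ValueError("message has no selector")
    # Each selector term is key=value; terms are comma-separated.
    selector = dict(
        term.strip().split("=", 1) for term in selector_part.split(",")
    )
    return action_part.strip(), selector


message = (
    "!log herron@puppetmaster1001 conftool action : set/pooled=yes; "
    "selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet"
)
action, selector = parse_conftool_action(message)
print(action, selector["name"])
```

The same function handles the earlier depool message, where the selector contains only the `name` term.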