This task will track the hardware troubleshooting and repair of server elastic1029.eqiad.wmnet
The first half of the steps should be completed by the person filing the hardware repair task. Some of these steps require access to Icinga to put a host into maintenance mode.
All Other Failures (memory, battery, controller, cpu, etc.)
- - Follow directions on https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#All_Other_Failures
- - Place system in fully offline state for hardware troubleshooting by the on-site engineer. If it cannot be placed offline, coordinate with the on-site engineer to schedule a maintenance window.
- - Set system and mgmt interface to maint mode (no checks/alarms on services) for 5 business days (excluding weekends).
- - Attach a detailed hardware failure log to this task via comment; see the hardware troubleshooting runbook for how to do this.
- - Assign task to proper assignee & onsite project, and place in 'hardware troubleshooting' column on workboard.
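For the maintenance-mode step above, the end of a 5-business-day window can be worked out with a small script. This is only a rough sketch: the function name `add_business_days` is made up for illustration, it skips Saturdays and Sundays only (public holidays are not considered), and it does not talk to Icinga at all. The resulting date would still need to be entered by hand or via whatever downtime tooling is normally used.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Hypothetical helper: only Saturdays and Sundays are skipped;
    holidays are NOT taken into account.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current


if __name__ == "__main__":
    start = date.today()
    end = add_business_days(start, 5)
    calendar_days = (end - start).days
    print(f"Downtime window: {start} to {end} ({calendar_days} calendar days)")
```

A window that starts on any weekday always spans 7 calendar days, so a downtime entered in calendar days would need to be at least that long to cover 5 business days.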
I don't currently have management interface access to run most of the commands.
I would like to coordinate with a DCops person to schedule downtime for this node.