This task will track the hardware troubleshooting and repair of server <enter FQDN of server here>.
The first X steps should be completed by the person filing the hardware repair task. Some of these steps require access to [[ https://icinga.wikimedia.org/icinga/ | Icinga ]] to put a host into maintenance mode.
Steps for the person filing the task. Complete all of these immediately when filing the task via this template:
[] - pull up [[ https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook | hardware troubleshooting runbook ]] for step-by-step directions. The checkboxes below are summary-level items.
[] - update all fields marked with <> on this task with the required info.
[] - look up the device in Netbox or use the naming conventions guide to determine where the server is located.
[] - look up the device in Netbox to determine warranty status, APPEND THAT TO THE NEXT STEP:
[] - System warranty expires on: <enter warranty expiry info from netbox>
[] - add the correct project for the server's location. E.g., if the server is in EQIAD, add #ops-eqiad to this task's projects.
[] - attach a detailed hardware failure log to this task via a comment; see [[ https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook | hardware troubleshooting runbook ]] for how to do this.
See the runbook, as the next steps depend on the type of hardware failure. Each step below is prefixed with the failure type it applies to:
[] - HDD/SSD/POWERSUPPLY FAILURE: System doesn't need to be offlined. Simply reassign this task to the proper onsite project with no assignee and move it into the 'hardware troubleshooting' column of that project's workboard, where it will be triaged.
[] - ALL OTHER FAILURES: System will need to be placed in a fully offline state for hardware troubleshooting by the onsite.
[] - ALL OTHER FAILURES: set the system and its mgmt interface to maintenance mode (no checks/alarms on services) for 5 business days (this will often result in 7 calendar days due to weekends).
[] - ALL OTHER FAILURES: disable puppet, using this task's number as the comment (so it doesn't accidentally run and cause issues during hardware troubleshooting).
[] - ALL OTHER FAILURES: ensure task has no assignee, is set to the proper onsite project, and placed in the 'hardware troubleshooting' column on that workboard.
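For the 5-business-day maintenance window mentioned above, the end date can be worked out with a short calculation. The sketch below is illustrative only: the function name `add_business_days` and the example start date are assumptions, it counts Monday through Friday as business days, and it ignores holidays.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date reached after stepping `days` weekdays past `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4; weekends are skipped
            remaining -= 1
    return current


# Hypothetical example: maintenance mode set on Wednesday 2024-01-03.
start = date(2024, 1, 3)
end = add_business_days(start, 5)
print(end, (end - start).days)  # 2024-01-10 7
```

Any weekday start crosses exactly one weekend, which is why the window usually comes to 7 calendar days.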