This project covers all hardware failure and troubleshooting tasks requested of the #dc-ops team via any of the on-site projects: #ops-codfw, #ops-eqdfw, #ops-eqiad, #ops-eqord, #ops-eqsin, #ops-ulsfo.
When you file a request, pick the site where the server is located and add that site's project to the task. Example: if a server is located in eqiad, the project is #ops-eqiad. Then assign the task to the user who handles that site.
The project/site to user assignment is as follows:
| site/project | user
| #ops-codfw | @Papaul
| #ops-eqdfw | @Papaul
| #ops-eqiad | @Cmjohnson
| #ops-eqord | @RobH
| #ops-eqsin | @RobH
| #ops-ulsfo | @RobH
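
For anyone scripting task creation, the table above can be sketched as a simple lookup. This is only an illustration: the names `SITE_ASSIGNEES` and `routing_for` are hypothetical and not part of any existing tooling, and the mapping is copied from the table, so it will go stale if the assignments change.

```python
# Hypothetical helper: the mapping mirrors the site/project table above.
SITE_ASSIGNEES = {
    "codfw": "Papaul",
    "eqdfw": "Papaul",
    "eqiad": "Cmjohnson",
    "eqord": "RobH",
    "eqsin": "RobH",
    "ulsfo": "RobH",
}


def routing_for(site: str) -> tuple[str, str]:
    """Return (project tag, assignee) for a site name such as 'eqiad'."""
    key = site.strip().lower()
    if key not in SITE_ASSIGNEES:
        raise ValueError(f"unknown site: {site!r}")
    return f"#ops-{key}", f"@{SITE_ASSIGNEES[key]}"


if __name__ == "__main__":
    project, assignee = routing_for("eqiad")
    print(project, assignee)
```

Raising on an unknown site is a deliberate choice in this sketch, so a mistyped site name does not silently produce a task without an on-site project or assignee.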
All of these tasks have a set checklist of items that must be completed by the person filing the task, **at the time of filing the task!**
Failure to follow the full checklist will delay the repair beyond the normal timeline. All hardware failures are initially considered high priority, but may be re-triaged to normal after #dc-ops reviews the task and the individual server's timeline for repair.
ALL tasks should have the [[ https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook | hardware troubleshooting runbook ]] applied to the task/steps!