- - Provide FQDN of system: an-worker1088.eqiad.wmnet
- - If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide time frame for us to take the machine down.
- - Put system into a failed state in Netbox.
- - Provide urgency of request, along with justification (redundancy, dependencies, etc)
- - Describe issue and/or attach hardware failure log. (Refer to https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook if you need help)
- - Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.
Hello. Please could you replace the RAID controller battery in an-worker1088, when it's convenient for you to do so?
I can shut down the machine for you ahead of time. I haven't marked the machine as failed in Netbox because it's still running, just a bit more slowly than it should.
It's not super-urgent, but the host will keep operating with reduced performance until the battery is replaced. I've done some troubleshooting in T336077: MegaRAID error on an-worker1088, and tried upgrading the firmware of the RAID controller. However, the server is one of those in the batch identified in T318659: Multiple RAID battery failures on hadoop worker hosts, so it is not unexpected that the battery should fail around this time.