cp2029 has been depooled after the kernel reported an uncorrected hardware memory error during a run-puppet-agent run:
May 12 21:04:43 cp2029 kernel: Disabling lock debugging due to kernel taint
May 12 21:04:43 cp2029 kernel: mce: [Hardware Error]: Machine check events logged
May 12 21:04:43 cp2029 kernel: mce: Uncorrected hardware memory error in user-access at 3a7d753880
May 12 21:04:43 cp2029 kernel: Memory failure: 0x3a7d753: Sending SIGBUS to puppet:2533255 due to hardware memory corruption
May 12 21:04:43 cp2029 kernel: Memory failure: 0x3a7d753: recovery action for dirty LRU page: Recovered
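For anyone cross-checking the log, the "Memory failure" lines appear to refer to the same physical page as the mce address: shifting 3a7d753880 right by 12 bits gives 0x3a7d753. The sketch below pulls both values out of the lines quoted above and compares them. It assumes 4 KiB pages (the usual x86-64 default, not confirmed for this host), and the names `PAGE_SHIFT`, `MCE_RE`, `PFN_RE` and `summarize` are illustrative only, not part of any existing tooling.

```python
import re

# Assumption: 4 KiB pages, so the page frame number is the address >> 12.
PAGE_SHIFT = 12

# Log lines copied from the ticket above.
LOG_LINES = [
    "May 12 21:04:43 cp2029 kernel: mce: Uncorrected hardware memory error in user-access at 3a7d753880",
    "May 12 21:04:43 cp2029 kernel: Memory failure: 0x3a7d753: Sending SIGBUS to puppet:2533255 due to hardware memory corruption",
    "May 12 21:04:43 cp2029 kernel: Memory failure: 0x3a7d753: recovery action for dirty LRU page: Recovered",
]

# Hypothetical patterns matching only the two message shapes seen in this ticket.
MCE_RE = re.compile(r"mce: Uncorrected hardware memory error in \S+ at ([0-9a-f]+)")
PFN_RE = re.compile(r"Memory failure: (0x[0-9a-f]+):")


def summarize(lines):
    """Return (list of failing physical addresses, set of reported page frame numbers)."""
    addresses = []
    pfns = set()
    for line in lines:
        mce = MCE_RE.search(line)
        if mce:
            addresses.append(int(mce.group(1), 16))
        failure = PFN_RE.search(line)
        if failure:
            pfns.add(int(failure.group(1), 16))
    return addresses, pfns


addresses, pfns = summarize(LOG_LINES)
for addr in addresses:
    pfn = addr >> PAGE_SHIFT
    offset = addr & ((1 << PAGE_SHIFT) - 1)
    status = "matches" if pfn in pfns else "does not match"
    print(f"address {addr:#x}: page frame {pfn:#x}, offset {offset:#x}, {status} the Memory failure lines")
```

A single failing address that maps to a single page frame, as here, suggests one bad location rather than a spread of errors, which may help when deciding which DIMM to test or replace.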
The host has been depooled and can safely be serviced. Thanks!
- Provide FQDN of system.
- If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide a time frame for us to take the machine down.
- Put system into a failed state in Netbox.
- Provide urgency of request, along with justification (redundancy, dependencies, etc.).
- Describe issue and/or attach hardware failure log. (Refer to https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook if you need help.)
- Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.