/admin1-> racadm serveraction powerstatus
Server power status: OFF
/admin1-> racadm serveraction powerup
ERROR: Timeout while waiting for server to perform requested power action.
/admin1-> racadm serveraction hardreset
ERROR: Timeout while waiting for server to perform requested power action.
/admin1-> racadm serveraction powerstatus
Server power status: OFF
/admin1-> racadm serveraction powerup
ERROR: Timeout while waiting for server to perform requested power action.
Resetting the interface does nothing, and neither does trying to power it up from the web interface.
There is no console output after power-on.
db1058 is most likely cooked. The server was almost too hot to touch. One of the power supplies has failed. I attempted to drain flea power but the server will not power on. I am letting it cool down to see if that helps.
The server is out of warranty now. In the past a main board replacement was the fix.
Thank you. This should be one of the replaced ones from the new batch. Feel free to unrack it if you need the space.
I will keep this ticket open for decommission purposes.
- Confirm out of cluster/service group
- Remove from puppet stored configuration files.
- Remove from site.pp (puppet:///manifests/site.pp)
- Remove from netboot.cfg
- Remove from DHCPD lease file
- Disable puppet
- Remove from Icinga monitoring
- Revoke keys from puppet/salt
- Remove DNS entries for the production and management.
- Remove from Rack
- Update Racktables
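For anyone tracking this by hand, here is a minimal sketch of the checklist above as data, so the remaining steps can be listed in order. The step names are copied from the list; `DECOMMISSION_STEPS` and `remaining_steps` are hypothetical names made up for illustration, not part of any existing tooling.

```python
# Decommission steps copied from the checklist above, in the same order.
DECOMMISSION_STEPS = [
    "Confirm out of cluster/service group",
    "Remove from puppet stored configuration files",
    "Remove from site.pp",
    "Remove from netboot.cfg",
    "Remove from DHCPD lease file",
    "Disable puppet",
    "Remove from Icinga monitoring",
    "Revoke keys from puppet/salt",
    "Remove DNS entries for production and management",
    "Remove from Rack",
    "Update Racktables",
]


def remaining_steps(done):
    """Return the steps not yet marked done, preserving checklist order.

    `done` is any collection of step names; an unknown name raises
    ValueError so that a typo is not silently ignored.
    """
    unknown = set(done) - set(DECOMMISSION_STEPS)
    if unknown:
        raise ValueError(f"not a checklist step: {sorted(unknown)}")
    return [step for step in DECOMMISSION_STEPS if step not in done]


if __name__ == "__main__":
    # Example: only the service-group step has been completed so far.
    for step in remaining_steps({"Confirm out of cluster/service group"}):
        print("TODO:", step)
```

Keeping the order matters here because the early steps (depooling, disabling puppet) are what prevent alerts when the later ones are carried out.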
Change 287145 had a related patch set uploaded (by Southparkfan):
Remove DNS entries of db1058
Just to learn how the process works, I've submitted a patch for the DNS adjustments. I noticed db1058 is referenced in the dhcpd and manifests/role/coredb.pp files in puppet but I have no idea how the latter one works, so I'll leave the puppet work to someone else.
Change 287183 had a related patch set uploaded (by Jcrespo):
Depool db1070 for maintenance
@Southparkfan We have a pretty strict way of removing servers. It is all documented here: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_or_Decommission
We do this so we do not break anything else in the process or cause unnecessary alerts.
@Cmjohnson yeah, perhaps I was a bit too fast in already doing the DNS part (even though that seems to be the only thing I can do) :-)
Anyway, ops know more than me, so they can do whatever is necessary.
Change 287224 had a related patch set uploaded (by Jcrespo):
Retire db1058 from the service group
Change 287591 had a related patch set uploaded (by Jcrespo):
Remove (almost) all references to db1058 on puppet
@Cmjohnson I have removed it from "mediawiki" and "puppet", dhcp, salt, puppet certs, neon. I have not removed it from netboot/preseed, as a range is used there and the name should not be reused, but feel free to disagree.
I've left DNS unmerged, in case you want to do something with the management interface still: https://gerrit.wikimedia.org/r/287593
DNS removed. @jcrespo I do see some entries in puppet:
manifests/role/coredb.pp: 'hosts' => { 'eqiad' => [ 'db1021', 'db1026', 'db1037', 'db1045', 'db1049', 'db1058' ] },
manifests/role/coredb.pp: 'masters' => { 'eqiad' => 'db1058' },
That is a deprecated script, and I am waiting for this week's failover to nuke it completely (coredb otherwise is not in use).
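A check for leftover references like the ones quoted above can be sketched as follows. This is only an illustration, assuming a local checkout of the repository; `find_host_references` is a hypothetical helper, not an existing tool. It matches the hostname as a whole token, so searching for db1058 would not also report a host such as db10580.

```python
import re
from pathlib import Path


def find_host_references(root, hostname):
    """Yield (path, line_number, line) for every line mentioning hostname.

    The hostname must not be directly preceded or followed by a letter or
    digit, so 'db1058' matches 'db1058.eqiad.wmnet' but not 'db10580'.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(hostname) + r"(?![A-Za-z0-9])"
    )
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    # "." is a placeholder for the checkout directory to search.
    for path, number, line in find_host_references(".", "db1058"):
        print(f"{path}:{number}: {line}")
```

Hits in deprecated files, such as the coredb.pp case above, still need a human decision; the sketch only lists them.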
Change 287145 abandoned by Dzahn:
Remove DNS entries of db1058
Reason:
already done by chris in commit 2016979ded611256e5f4b321
I have abandoned 2 pending changes in the DNS repo for this that were already duplicated by Chris' change. Just cleaning up.