Decommission broken db1058
Closed, Resolved, Public

Authored By

	jcrespo
	May 4 2016, 10:33 AM

Description

/admin1-> racadm serveraction powerstatus
Server power status: OFF
/admin1-> racadm serveraction powerup
ERROR: Timeout while waiting for server to perform requested power action.
/admin1-> racadm serveraction hardreset
ERROR: Timeout while waiting for server to perform requested power action.
/admin1-> racadm serveraction powerstatus
Server power status: OFF
/admin1-> racadm serveraction powerup
ERROR: Timeout while waiting for server to perform requested power action.

Details

Subject	Repo	Branch	Lines +/-
Remove db1058 entries	operations/dns	master	+0 -0
Remove DNS entries of db1058	operations/dns	master	+0 -4
Remove (almost) all references to db1058 on puppet	operations/puppet	production	+3 -8
Retire db1058 from the service group	operations/mediawiki-config	master	+0 -2
Depool db1070 for maintenance	operations/mediawiki-config	master	+1 -2

Related Objects

Mentioned In: rOPUP420404887f9b: Remove (almost) all references to db1058 on puppet
Mentioned Here: rODNS2016979ded61: Removing dns entries for db1058

Event Timeline

jcrespo created this task. May 4 2016, 10:33 AM

Restricted Application added a project: SRE. May 4 2016, 10:33 AM

Restricted Application added subscribers: Zppix, Southparkfan, Aklapper.

Resetting the interface does not do anything; neither does trying to power it up from the web interface.

There is no console output after power-on.

jcrespo added a subscriber: Volans. May 4 2016, 10:58 AM

MoritzMuehlenhoff assigned this task to Cmjohnson. May 4 2016, 11:12 AM

db1058 is most likely cooked. The server was almost too hot to touch. One of the power supplies has failed. I attempted to drain flea power but the server will not power on. I am letting it cool down to see if that helps.

The server is out of warranty now. In the past a main board replacement was the fix.

Thank you. This should be one of the servers replaced by the new batch. Feel free to unrack it if you need the space.

I will keep this ticket open for decommission purposes.

Cmjohnson renamed this task from "db1058 does not come up after restart" to "Decommission broken db1058". May 5 2016, 8:47 PM

Change 287145 had a related patch set uploaded (by Southparkfan):
Remove DNS entries of db1058

https://gerrit.wikimedia.org/r/287145

gerritbot added a project: Patch-For-Review. May 5 2016, 9:13 PM

Just to learn how the process works, I've submitted a patch for the DNS adjustments. I noticed db1058 is referenced in the dhcpd and manifests/role/coredb.pp files in puppet but I have no idea how the latter one works, so I'll leave the puppet work to someone else.

Change 287183 had a related patch set uploaded (by Jcrespo):
Depool db1070 for maintenance

https://gerrit.wikimedia.org/r/287183

@Southparkfan We have a pretty strict way of removing servers. It is all documented here: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_or_Decommission

We do this so we do not break anything else in the process or cause unnecessary alerts.

Change 287183 merged by Jcrespo:
Depool db1070 for maintenance

https://gerrit.wikimedia.org/r/287183

@Cmjohnson yeah, perhaps I was a bit too fast in already doing the DNS part (even though that seems to be the only thing I can do) :-)

Anyway, ops know more than me, so they can do whatever is necessary.

Change 287224 had a related patch set uploaded (by Jcrespo):
Retire db1058 from the service group

https://gerrit.wikimedia.org/r/287224

Change 287224 merged by Jcrespo:
Retire db1058 from the service group

https://gerrit.wikimedia.org/r/287224

Confirm out of cluster/service group

Change 287591 had a related patch set uploaded (by Jcrespo):
Remove (almost) all references to db1058 on puppet

https://gerrit.wikimedia.org/r/287591

Change 287591 merged by Jcrespo:
Remove (almost) all references to db1058 on puppet

https://gerrit.wikimedia.org/r/287591

Change 287593 had a related patch set uploaded (by Jcrespo):
Remove db1058 entries

https://gerrit.wikimedia.org/r/287593

@Cmjohnson I have removed it from "mediawiki" and "puppet", dhcp, salt, puppet certs, and neon. I have not removed it from netboot/preseed, as a range is used there and the name should not be reused, but feel free to disagree.

I've left DNS unmerged, in case you want to do something with the management interface still: https://gerrit.wikimedia.org/r/287593

Cmjohnson triaged this task as Low priority. May 13 2016, 6:07 PM

jcrespo mentioned this in rOPUP420404887f9b: Remove (almost) all references to db1058 on puppet. Jun 17 2016, 6:10 PM

DNS removed. @jcrespo I do still see some entries in puppet:

manifests/role/coredb.pp: 'hosts' => { 'eqiad' => [ 'db1021', 'db1026', 'db1037', 'db1045', 'db1049', 'db1058' ] },
manifests/role/coredb.pp: 'masters' => { 'eqiad' => 'db1058' },
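
A check like the one above (looking for leftover mentions of a decommissioned host) can be sketched as follows. This is only an illustration under stated assumptions: it assumes a local checkout of the repository on disk, and `find_references` is a hypothetical helper, not a tool used in this task.

```python
import os
import re


def find_references(root, hostname):
    """Return (path, line number, line) for every whole-word mention of hostname."""
    # Word boundaries keep "db1058" from matching e.g. "db10580".
    pattern = re.compile(r"\b" + re.escape(hostname) + r"\b")
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            hits.append((path, number, line.strip()))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return hits
```

Calling `find_references("path/to/checkout", "db1058")` (the path is a placeholder) would list any remaining lines, such as the two `manifests/role/coredb.pp` entries quoted above.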

That is a deprecated script, and I am waiting for this week's failover to nuke it completely (coredb otherwise is not in use).

db1058 has been removed from rack

Change 287145 abandoned by Dzahn:
Remove DNS entries of db1058

Reason:
already done by chris in commit 2016979ded611256e5f4b321

https://gerrit.wikimedia.org/r/287145

Change 287593 abandoned by Dzahn:
Remove db1058 entries

Reason:
rebased to nothing

https://gerrit.wikimedia.org/r/287593

I have abandoned 2 pending changes in the DNS repo for this that were already duplicated by Chris' change. Just cleaning up.

Dzahn removed a project: Patch-For-Review. Aug 24 2016, 10:45 PM
