Decommission labsdb1002
Open, Low · Public

Description

Due to disk failure, and with replacements about to be installed, the usual decommission steps have to be done properly (I thought they had already been done):

  • Check the ownership of the disks
  • Wipe ALL disks on labsdb1002.eqiad.wmnet
  • Unrack (?); this is one of the pending Cisco machines
  • Remove mgmt DNS entries
  • Update racktables
  • Add decommissioned servers to the decom tracking tab
  • Assign the task back to @RobH to remove the switch configuration description/VLAN assignments

Ok to wait for labsdb1001 and labsdb1003, which will be done in some months' time, but please state so explicitly.

Event Timeline

jcrespo created this task. Sep 23 2016, 9:12 AM
Restricted Application added a project: Operations. Sep 23 2016, 9:12 AM
Restricted Application added subscribers: Southparkfan, Aklapper.
jcrespo triaged this task as Low priority. Sep 23 2016, 3:02 PM
jcrespo edited projects, added ops-eqiad; removed ops-codfw.

Change 312528 had a related patch set uploaded (by Jcrespo):
labsdb1002: remove from dhcp install server config

https://gerrit.wikimedia.org/r/312528

Change 312528 merged by Jcrespo:
labsdb1002: remove from dhcp install server config

https://gerrit.wikimedia.org/r/312528

Change 313894 had a related patch set uploaded (by Jcrespo):
wmnet: remove labsdb1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/313894

Change 313896 had a related patch set uploaded (by Jcrespo):
toolschecker: remove all references to labsdb1002

https://gerrit.wikimedia.org/r/313896

I cannot access the serial interface. @Cmjohnson, please make sure labsdb1002 is down the next time you go to the datacenter; I can no longer connect to the host, but it still responds to pings.

Actual decommission can wait, but we must make sure the host is fully switched off (it is no longer receiving security patches).
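For anyone wanting to recheck the "still responds to pings but cannot be logged into" state remotely, here is a minimal sketch. It assumes a Linux `ping` binary (the `-c` and `-W` flags) and that SSH listens on port 22; the function names `ping_responds`, `tcp_port_open`, and `classify` are made up for illustration and are not part of any existing ops tooling. A silent host is only consistent with being powered off, so physical confirmation is still needed.

```python
import socket
import subprocess


def ping_responds(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo using the system ping command (Linux flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_port_open(host: str, port: int = 22, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def classify(pings: bool, ssh_open: bool) -> str:
    """Summarize the two probe results as a human-readable state."""
    if not pings and not ssh_open:
        return "no response (consistent with powered off, not proof)"
    if pings and not ssh_open:
        return "still powered: answers pings but SSH is unreachable"
    return "still up and reachable"
```

Usage would look like `classify(ping_responds("labsdb1002.eqiad.wmnet"), tcp_port_open("labsdb1002.eqiad.wmnet"))`; the second outcome matches the situation described above.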

RobH added a comment. Oct 4 2016, 3:45 PM (edited)

Please note that the process for decommissioning hosts was clarified and updated following the recent ops offsite meeting.

Please review https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_to_Spares_OR_Decommission

The process has been simplified, and the steps needed from general ops (versus DC Ops) have been reduced.

(So the steps listed in the task description aren't actually in the proper order, and many of them are now handled by DC Ops.) This task should also have hardware-requests added once it's ready for DC Ops involvement.

Change 313896 merged by Rush:
toolschecker: remove all references to labsdb1002

https://gerrit.wikimedia.org/r/313896

Change 313894 merged by Jcrespo:
wmnet: remove labsdb1002.eqiad.wmnet

https://gerrit.wikimedia.org/r/313894

labsdb1002 is one of the remaining Cisco UCS servers.

Cmjohnson closed this task as Resolved. Mar 3 2017, 4:45 PM

Removed from rack

Change 548257 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] wmnet: cleanup unused labsdb1002 entries

https://gerrit.wikimedia.org/r/548257

aborrero reopened this task as Open. Nov 5 2019, 3:37 PM
aborrero added subscribers: faidon, aborrero.

Reopening task per @faidon's suggestion, since there are a few leftover bits to be resolved:

Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Nov 13 2019, 3:56 PM

@Jclark-ctr @wiki_willy what's the status here? It sounds like the decom was only partially done and perhaps needs just a few more steps to finalize?

I chatted with John about this earlier; he was also talking to Rob about it last week. I think we're all good now, and this should be progressing soon. Thanks, Willy