Remove labnodepool1001.eqiad.wmnet
Open, Normal, Public

Description

The Nodepool service is being phased out. It is running on labnodepool1001.eqiad.wmnet which is in the WMCS support LAN. There are firewall rules between production (contint1001 / contint2001) and the WMCS network.

The service can be dropped at any time; it is no longer in use.

START NON-INTERRUPTIBLE STEPS

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox status with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

hashar created this task. Nov 15 2018, 8:42 PM

Change 473838 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] Make labnodepool1001.eqiad.wmnet a spare system

https://gerrit.wikimedia.org/r/473838

Mentioned in SAL (#wikimedia-operations) [2018-11-15T21:05:38Z] <hashar> Stopped nodepool on labnodepool1001.eqiad.wmnet . Service is no more used. T209361 T209642

hashar updated the task description. Nov 15 2018, 9:28 PM
hashar updated the task description.
hashar removed hashar as the assignee of this task. Nov 15 2018, 9:31 PM

CI no longer relies on the service hosted on labnodepool1001.eqiad.wmnet.

I have manually stopped the nodepool service on the host and disabled the two Icinga checks related to it.

The system can be wiped entirely; Release-Engineering-Team has no need for backups (unless ops needs them for security, forensic, or other reasons).

cloud-services-team might be interested in the machine since it is in the labs support network (to provide a service to WMCS instances).

aborrero added a subscriber: aborrero.

cloud-services-team might be interested in the machine since it is in the labs support network (to provide a service to WMCS instances).

We could discuss in our team meeting. Do we have info on specs and expiration dates for the HW?

ArielGlenn triaged this task as Normal priority. Nov 16 2018, 11:45 AM
Dzahn added a subscriber: Dzahn. Nov 16 2018, 2:33 PM

Do we have info on specs and expiration dates for the HW?

https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=1206

HW warranty expiration: 2014-01-27

HW type: Dell PowerEdge R610

This specific HW is /very/ old and is already overdue for decommissioning (by 3 years, no less).

But more generally, (re)allocation of hardware does not work like that. If there are any needs (budgeted or unbudgeted) feel free to submit a hardware-requests task, or in case of an odd ask, to reach out to me or @mark :)
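As a rough illustration of the age of this hardware, the sketch below computes how long the warranty had been expired when this comment was posted. The warranty date comes from the racktables entry quoted above and the comment date from the timeline; the helper name `years_between` is made up for this example. Note that it measures time since warranty expiry only. The "3 years" overdue figure is presumably measured against a refresh schedule that is not spelled out in this thread, so the two numbers are not expected to match.

```python
from datetime import date


def years_between(start: date, end: date) -> float:
    """Approximate elapsed years, using a 365.25-day year."""
    return (end - start).days / 365.25


# Warranty expiration as listed in racktables (quoted in the comment above).
warranty_expiry = date(2014, 1, 27)
# Date this comment was posted, per the event timeline.
comment_date = date(2018, 11, 16)

elapsed = years_between(warranty_expiry, comment_date)
print(f"{elapsed:.1f} years past warranty expiry")
```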

Change 473838 merged by Andrew Bogott:
[operations/puppet@production] Make labnodepool1001.eqiad.wmnet a spare system

https://gerrit.wikimedia.org/r/473838

Andrew claimed this task. Nov 27 2018, 4:41 PM
Andrew moved this task from Needs discussion to Doing on the cloud-services-team (Kanban) board.
Dzahn removed a subscriber: Dzahn. Nov 27 2018, 4:49 PM
RobH moved this task from Backlog to Decommission on the ops-eqiad board. Dec 12 2018, 11:34 PM
RobH added a comment. Dec 14 2018, 9:59 PM

labnodepool1001 asw2-b-eqiad ge-3/0/18

wmf-decommission-host was executed by robh for labnodepool1001.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
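The overall decommission state at this point can be summarised as a simple checklist. This is a minimal sketch, not part of wmf-decommission-host: the `Step` class and `pending` helper are hypothetical names, the automated entries are the actions reported above, and the manual entries are copied from the task description. Marking all manual steps as still pending is an assumption, since the thread does not record their individual status at this point.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    done: bool = False


# Actions reported by wmf-decommission-host in the comment above.
automated = [
    Step("Revoked Puppet certificate", done=True),
    Step("Removed from PuppetDB", done=True),
    Step("Downtimed host on Icinga", done=True),
    Step("Downtimed mgmt interface on Icinga", done=True),
    Step("Removed from DebMonitor", done=True),
]

# Manual steps from the task description (assumed still pending here).
manual = [
    Step("system disks wiped (by onsite)"),
    Step("system unracked and decommissioned (by onsite), netbox updated"),
    Step("switch port configuration removed from switch"),
    Step("system added to decommission tracking sheet"),
    Step("mgmt dns entries removed"),
]


def pending(steps: list[Step]) -> list[str]:
    """Return the descriptions of steps that are not yet done."""
    return [s.description for s in steps if not s.done]


for description in pending(automated + manual):
    print("TODO:", description)
```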
RobH updated the task description.
RobH updated the task description.

Change 479855 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Clean up remaining nodepool/labnodepool1001 refs

https://gerrit.wikimedia.org/r/479855

Change 479855 merged by Andrew Bogott:
[operations/puppet@production] Clean up remaining nodepool/labnodepool1001 refs

https://gerrit.wikimedia.org/r/479855

Change 479856 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom labnodepool1001 prod dns

https://gerrit.wikimedia.org/r/479856

RobH updated the task description. Dec 14 2018, 10:20 PM

Change 479856 merged by RobH:
[operations/dns@master] decom labnodepool1001 prod dns

https://gerrit.wikimedia.org/r/479856

RobH updated the task description. Dec 14 2018, 10:21 PM
RobH removed a project: Patch-For-Review.
RobH reassigned this task from RobH to Cmjohnson.
RobH added a subscriber: RobH.

This is ready for disk wipe and remainder of steps to decom the system.

Change 480713 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove Hiera file obsoleted by nodepool removal

https://gerrit.wikimedia.org/r/480713

Change 480713 merged by Muehlenhoff:
[operations/puppet@production] Remove Hiera file obsoleted by nodepool removal

https://gerrit.wikimedia.org/r/480713

There is a netbox entry for this host: https://netbox.wikimedia.org/dcim/devices/1638/ CC T214499

We may want to delete it? I'm not sure what the policy is.