Remove labnodepool1001.eqiad.wmnet
Open, Normal, Public

Description

The Nodepool service is being phased out. It is running on labnodepool1001.eqiad.wmnet which is in the WMCS support LAN. There are firewall rules between production (contint1001 / contint2001) and the WMCS network.

The service can be dropped at any time; it is no longer in use.

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/labnodepool1001.eqiad.wmnet --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
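
The puppet and debmonitor cleanup steps above are the ones that take the hostname as an argument, so they can be sketched as a small dry-run helper. This is only an illustrative sketch: the function names `debmonitor_url` and `decom_commands` are hypothetical, the URL and certificate paths are copied from the checklist, and the commands are printed rather than executed. Which host each command must run on, and whether `sudo` is needed for the puppet commands, is not stated in this task and is left to the operator.

```shell
#!/bin/sh
# Dry-run sketch: print the host-specific cleanup commands instead of
# running them. Function names are hypothetical, not an existing tool.

# Build the debmonitor URL for a host (URL pattern taken from the checklist).
debmonitor_url() {
    printf 'https://debmonitor.discovery.wmnet/hosts/%s\n' "$1"
}

# Print the puppet and debmonitor cleanup commands for one host, one per line.
decom_commands() {
    host="$1"
    printf 'puppet node clean %s\n' "$host"
    printf 'puppet node deactivate %s\n' "$host"
    printf 'sudo curl -X DELETE %s --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key\n' \
        "$(debmonitor_url "$host")"
}

decom_commands labnodepool1001.eqiad.wmnet
```

Printing the commands first lets the operator review them against the checklist before running anything on a production host.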

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)
hashar created this task.Thu, Nov 15, 8:42 PM

Change 473838 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] Make labnodepool1001.eqiad.wmnet a spare system

https://gerrit.wikimedia.org/r/473838

Mentioned in SAL (#wikimedia-operations) [2018-11-15T21:05:38Z] <hashar> Stopped nodepool on labnodepool1001.eqiad.wmnet . Service is no more used. T209361 T209642

hashar updated the task description.Thu, Nov 15, 9:28 PM
hashar updated the task description.
hashar removed hashar as the assignee of this task.Thu, Nov 15, 9:31 PM

CI no longer relies on the service hosted on labnodepool1001.eqiad.wmnet.

I have manually stopped the nodepool service on the host and disabled the two Icinga checks related to it.

The system can be wiped entirely; Release-Engineering-Team has no need for backups (unless ops needs them for security, forensic, or other reasons).

cloud-services-team might be interested in the machine since it is in the labs support network (to provide a service to WMCS instances).

aborrero added a subscriber: aborrero.

cloud-services-team might be interested in the machine since it is in the labs support network (to provide a service to WMCS instances).

We could discuss in our team meeting. Do we have info on specs and expiration dates for the HW?

ArielGlenn triaged this task as Normal priority.Fri, Nov 16, 11:45 AM
Dzahn added a subscriber: Dzahn.Fri, Nov 16, 2:33 PM
Do we have info on specs and expiration dates for the HW?

https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=1206

HW warranty expiration: 2014-01-27

HW type: Dell PowerEdge R610

This specific HW is /very/ old and is already overdue for decommissioning (by 3 years no less).

But more generally, (re)allocation of hardware does not work like that. If there are any needs (budgeted or unbudgeted), feel free to submit a hardware-requests task or, in case of an odd ask, reach out to me or @mark :)

Change 473838 merged by Andrew Bogott:
[operations/puppet@production] Make labnodepool1001.eqiad.wmnet a spare system

https://gerrit.wikimedia.org/r/473838

Andrew claimed this task.Tue, Nov 27, 4:41 PM
Andrew moved this task from Needs discussion to Doing on the cloud-services-team (Kanban) board.
Dzahn removed a subscriber: Dzahn.Tue, Nov 27, 4:49 PM
RobH moved this task from Backlog to Decommission on the ops-eqiad board.Wed, Dec 12, 11:34 PM