
Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet
Closed, Resolved, Public

Description

The leases for pc1004, pc1005 and pc1006 expire on 31 Dec 2018 and the hosts need to be returned (T204556), so don't just stack this with the other decoms: it is a high priority for return this month (December 2018!)

The new hosts are online (T208383)

pc1004

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS
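
For reference, the debmonitor removal step above can be sketched per host as follows. This is only an illustration that prints the checklist's curl command for each of the three hosts in this task. The helper name `debmonitor_delete_command` is hypothetical, and in practice the step is handled by wmf-decommission-host.

```python
import shlex

# Hosts named in this task.
HOSTS = ["pc1004.eqiad.wmnet", "pc1005.eqiad.wmnet", "pc1006.eqiad.wmnet"]

# Certificate and key paths are copied from the checklist step.
CERT = "/etc/debmonitor/ssl/cert.pem"
KEY = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_command(host_fqdn):
    """Return the checklist's curl invocation for one host as an argument list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "sudo", "curl", "-X", "DELETE",
        f"https://debmonitor.discovery.wmnet/hosts/{host_fqdn}",
        "--cert", CERT,
        "--key", KEY,
    ]


if __name__ == "__main__":
    # Print only; nothing is sent to DebMonitor.
    for host in HOSTS:
        print(shlex.join(debmonitor_delete_command(host)))
```

Building the command as an argument list keeps the host name from being re-interpreted by a shell, which matters if the list is ever passed to a process runner rather than printed.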

  • - system disks wiped (by onsite); use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

pc1005

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite); use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

pc1006

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite); use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
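
The disk-wipe step in the checklists says to use hdparm for SSDs and wipe for HDDs. A minimal sketch of that choice is shown below. It only builds the commands and never runs them. The function names, the placeholder password `p`, and the exact flags (`hdparm --security-set-pass` / `--security-erase`, `wipe -k`) are assumptions for illustration and are not taken from this task; the authoritative procedure is the onsite one.

```python
from pathlib import Path


def is_ssd_from_sysfs(block_name, sysfs_root="/sys/block"):
    """Return True if Linux reports the block device as non-rotational.

    Reads <sysfs_root>/<block_name>/queue/rotational, where "0" means SSD.
    """
    flag = Path(sysfs_root) / block_name / "queue" / "rotational"
    return flag.read_text().strip() == "0"


def wipe_commands(device, is_ssd, password="p"):
    """Return the commands (as argument lists) that would erase one disk.

    Hypothetical helper: the commands are returned, never executed.
    """
    if is_ssd:
        # ATA Secure Erase: set a temporary password, then issue the erase.
        return [
            ["hdparm", "--user-master", "u", "--security-set-pass", password, device],
            ["hdparm", "--user-master", "u", "--security-erase", password, device],
        ]
    # Spinning disk: overwrite with wipe; -k keeps the device node afterwards.
    return [["wipe", "-k", device]]


if __name__ == "__main__":
    # Illustrative device names only.
    for dev, ssd in [("/dev/sda", True), ("/dev/sdb", False)]:
        for cmd in wipe_commands(dev, ssd):
            print(" ".join(cmd))
```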

Event Timeline

Restricted Application added a subscriber: Aklapper. Dec 3 2018, 6:23 AM
Marostegui moved this task from Triage to In progress on the DBA board. Dec 3 2018, 6:23 AM
Marostegui triaged this task as High priority.
Marostegui updated the task description. Dec 3 2018, 6:26 AM

Change 477190 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc1004,1005,1006: Disable notifications

https://gerrit.wikimedia.org/r/477190

Change 477190 merged by Marostegui:
[operations/puppet@production] pc1004,1005,1006: Disable notifications

https://gerrit.wikimedia.org/r/477190

Marostegui updated the task description. Dec 3 2018, 6:36 AM

Change 477192 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Decommission pc1004,pc1005 and pc1006

https://gerrit.wikimedia.org/r/477192

Change 477192 merged by Marostegui:
[operations/puppet@production] mariadb: Decommission pc1004,pc1005 and pc1006

https://gerrit.wikimedia.org/r/477192

Marostegui updated the task description. Dec 3 2018, 6:49 AM

Mentioned in SAL (#wikimedia-operations) [2018-12-03T06:52:50Z] <marostegui> Remove pc1004, pc1005 and pc1006 from tendril and zarcillo - T210969

Mentioned in SAL (#wikimedia-operations) [2018-12-03T07:09:13Z] <marostegui> Stop MySQL on pc1004, pc1005 and pc1006 as they will be decommissioned - T210969

Marostegui updated the task description. Dec 3 2018, 7:11 AM
Marostegui reassigned this task from Marostegui to RobH.
Marostegui moved this task from In progress to Done on the DBA board.
Marostegui added a subscriber: Cmjohnson.

pc1004, pc1005 and pc1006 are now fully ready for DC-Ops to take over and finish their decommission.

Restricted Application added a project: Operations. Dec 3 2018, 7:12 AM

Priority is high, as with T209858: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return), because these hosts have a hard deadline at lease expiration.

Marostegui added a parent task: Unknown Object (Task). Dec 3 2018, 7:14 AM
Marostegui mentioned this in Unknown Object (Task). Dec 3 2018, 7:16 AM
RobH moved this task from pending onsite steps (eqiad) to Backlog on the decommission board.

wmf-decommission-host was executed by robh for pc1004.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for pc1005.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for pc1006.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

RobH added a comment. Dec 3 2018, 9:34 PM

Switch ports noted for @Cmjohnson to clear their descriptions once they are unracked:

pc1004 asw-a-eqiad:ge-3/0/18
pc1005 asw2-c-eqiad:ge-7/0/17
pc1006 asw2-d-eqiad:ge-3/0/30

RobH updated the task description.
RobH updated the task description.
RobH reassigned this task from RobH to Cmjohnson.
RobH moved this task from Backlog to pending onsite steps (eqiad) on the decommission board.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH updated the task description. Mar 19 2019, 12:40 AM
RobH added a comment. Mar 26 2019, 6:43 PM

Please note these systems still need their SSDs securely erased per https://wikitech.wikimedia.org/wiki/Dc-operations/Securely_Erasing_Media

Cmjohnson closed this task as Resolved. Wed, Apr 24, 3:01 PM

All the disks were securely wiped and the servers were reset to their defaults.

Marostegui reopened this task as Open. Thu, Apr 25, 8:51 AM

@RobH @Cmjohnson there are still DNS entries for all these hosts:

templates/wmnet:pc1004            1H  IN A    10.64.0.12
templates/wmnet:pc1005            1H  IN A    10.64.32.72
templates/wmnet:pc1006            1H  IN A    10.64.48.128
templates/wmnet:pc1004            1H  IN A    10.65.2.33
templates/wmnet:pc1005            1H  IN A    10.65.2.34
templates/wmnet:pc1006            1H  IN A    10.65.2.35
templates/10.in-addr.arpa:12      1H  IN PTR  pc1004.eqiad.wmnet.
templates/10.in-addr.arpa:72      1H  IN PTR  pc1005.eqiad.wmnet.
templates/10.in-addr.arpa:128     1H  IN PTR  pc1006.eqiad.wmnet.
templates/10.in-addr.arpa:33      1H  IN PTR  pc1004.mgmt.eqiad.wmnet.
templates/10.in-addr.arpa:34      1H  IN PTR  pc1005.mgmt.eqiad.wmnet.
templates/10.in-addr.arpa:35      1H  IN PTR  pc1006.mgmt.eqiad.wmnet.
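
A check like the one above could be scripted so that leftover records are grouped per host before the task is closed. The sketch below parses grep-style lines of the shape shown in this comment (`file:name TTL IN TYPE value`). The `Record` type and both function names are hypothetical, and the sample input is a subset of the lines above.

```python
from collections import namedtuple

# One DNS record as it appears in the grep output above.
Record = namedtuple("Record", "zone_file name ttl rtype value")


def parse_grep_line(line):
    """Split 'file:name TTL IN TYPE value' into a Record."""
    zone_file, _, rest = line.partition(":")
    name, ttl, _klass, rtype, value = rest.split()
    return Record(zone_file, name, ttl, rtype, value)


def leftover_records(lines, hosts):
    """Group records by short host name.

    A records carry the host in the name column; PTR records carry the
    FQDN (production or mgmt) in the value column.
    """
    found = {host: [] for host in hosts}
    for line in lines:
        if not line.strip():
            continue
        rec = parse_grep_line(line)
        for host in hosts:
            if rec.name == host or rec.value.startswith(host + "."):
                found[host].append(rec)
    return found


SAMPLE = """\
templates/wmnet:pc1004          1H  IN A        10.64.0.12
templates/wmnet:pc1004          1H      IN A            10.65.2.33
templates/10.in-addr.arpa:12  1H IN PTR   pc1004.eqiad.wmnet.
templates/10.in-addr.arpa:33  1H  IN PTR      pc1004.mgmt.eqiad.wmnet.
templates/wmnet:pc1005          1H  IN A        10.64.32.72
"""

if __name__ == "__main__":
    result = leftover_records(SAMPLE.splitlines(), ["pc1004", "pc1005", "pc1006"])
    for host, records in result.items():
        print(host, len(records), "record(s) still present")
```

A host is clean only when its list is empty, which covers both the production and the mgmt entries in one pass.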

Change 506415 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Remove pc1004-pc1006

https://gerrit.wikimedia.org/r/506415

@RobH @Cmjohnson there are also entries in site.pp; I have sent a patch for that: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506415/
I would prefer that you handle the DNS ones yourselves.

Change 506415 merged by Marostegui:
[operations/puppet@production] site.pp: Remove pc1004-pc1006

https://gerrit.wikimedia.org/r/506415

@RobH @Cmjohnson I have removed the spare role entries, so the only thing still pending is the DNS entries described at: T210969#5136891

Marostegui reassigned this task from Cmjohnson to RobH. Fri, Apr 26, 5:00 AM

Change 506674 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removind dns entries for decom hosts pc1004-6

https://gerrit.wikimedia.org/r/506674

Change 506674 merged by Marostegui:
[operations/dns@master] Removind dns entries for decom hosts pc1004-6

https://gerrit.wikimedia.org/r/506674

So DNS is now clean.
Are the switch ports also cleaned up? T210969#4795686

Marostegui updated the task description. Fri, Apr 26, 2:08 PM
Marostegui updated the task description. Fri, Apr 26, 2:21 PM

I tried to set these servers as unracked on Netbox, but failed to do so (I guess I don't have the rights).

Marostegui closed this task as Resolved. Fri, Apr 26, 2:43 PM

I misread https://wikitech.wikimedia.org/wiki/Server_Lifecycle#States: the servers have to be Offline, and I have just set them to that state. So we are all done!