
Decommission parsercache hosts: pc1004.eqiad.wmnet pc1005.eqiad.wmnet pc1006.eqiad.wmnet
Closed, Resolved · Public

Description

The leases for pc1004, pc1005 and pc1006 expire on 31 Dec 2018 and the hosts need to be returned (T204556), so don't just stack this with the other decoms: it is a high priority for return this month (December 2018!)

The new hosts are online (T208383)

pc1004

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite): use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

pc1005

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite): use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

pc1006

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite): use hdparm for SSDs and wipe for HDDs
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
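
The DebMonitor removal step in the checklists above takes one host FQDN at a time. As a minimal sketch, assuming the curl invocation quoted in the checklist is still current, the command could be generated for all three hosts as follows. The helper name `debmonitor_delete_cmd` is hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

# Hosts named in this task.
HOSTS = ["pc1004.eqiad.wmnet", "pc1005.eqiad.wmnet", "pc1006.eqiad.wmnet"]

# URL prefix and certificate paths copied from the checklist step.
DEBMONITOR_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT = "/etc/debmonitor/ssl/cert.pem"
KEY = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_cmd(fqdn):
    """Return the argv list for the DELETE call (hypothetical helper)."""
    return [
        "sudo", "curl", "-X", "DELETE", DEBMONITOR_URL + fqdn,
        "--cert", CERT, "--key", KEY,
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per host for review; nothing is executed.
    for host in HOSTS:
        print(shlex.join(debmonitor_delete_cmd(host)))
```

According to the checklist, wmf-decommission-host already handles this step, so a script like this would only matter if that tool were not used.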

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 3 2018, 6:23 AM
Marostegui triaged this task as High priority. · Dec 3 2018, 6:23 AM
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description. · Dec 3 2018, 6:26 AM

Change 477190 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc1004,1005,1006: Disable notifications

https://gerrit.wikimedia.org/r/477190

Change 477190 merged by Marostegui:
[operations/puppet@production] pc1004,1005,1006: Disable notifications

https://gerrit.wikimedia.org/r/477190

Marostegui updated the task description. · Dec 3 2018, 6:36 AM

Change 477192 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Decommission pc1004,pc1005 and pc1006

https://gerrit.wikimedia.org/r/477192

Change 477192 merged by Marostegui:
[operations/puppet@production] mariadb: Decommission pc1004,pc1005 and pc1006

https://gerrit.wikimedia.org/r/477192

Marostegui updated the task description. · Dec 3 2018, 6:49 AM

Mentioned in SAL (#wikimedia-operations) [2018-12-03T06:52:50Z] <marostegui> Remove pc1004, pc1005 and pc1006 from tendril and zarcillo - T210969

Mentioned in SAL (#wikimedia-operations) [2018-12-03T07:09:13Z] <marostegui> Stop MySQL on pc1004, pc1005 and pc1006 as they will be decommissioned - T210969

Marostegui reassigned this task from Marostegui to RobH. · Dec 3 2018, 7:11 AM
Marostegui updated the task description.
Marostegui moved this task from In progress to Done on the DBA board.
Marostegui added a subscriber: Cmjohnson.

pc1004, pc1005 and pc1006 are now fully ready for DC-Ops to take over and finish their decommission.

Restricted Application added a project: Operations. · Dec 3 2018, 7:12 AM

Priority is high, like T209858: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return), because these hosts have a hard deadline: the lease expiration.

Marostegui added a parent task: Unknown Object (Task). · Dec 3 2018, 7:14 AM
Marostegui mentioned this in Unknown Object (Task). · Dec 3 2018, 7:16 AM

wmf-decommission-host was executed by robh for pc1004.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for pc1005.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for pc1006.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
RobH added a comment. · Dec 3 2018, 9:34 PM

Switch ports noted for @Cmjohnson to clear their descriptions once they are unracked:

pc1004 asw-a-eqiad:ge-3/0/18
pc1005 asw2-c-eqiad:ge-7/0/17
pc1006 asw2-d-eqiad:ge-3/0/30
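
Since the port descriptions have to be cleared switch by switch, it can help to group the noted assignments by switch first. This is a small sketch, not an existing tool: the function name `ports_by_switch` is hypothetical, and the pc1006 switch name is written with the datacenter spelled `eqiad`, which is an assumption about the intended name.

```python
from collections import defaultdict

# Port assignments noted in this comment: host, then switch:interface.
# The pc1006 switch name assumes the datacenter is spelled "eqiad".
NOTED_PORTS = """\
pc1004 asw-a-eqiad:ge-3/0/18
pc1005 asw2-c-eqiad:ge-7/0/17
pc1006 asw2-d-eqiad:ge-3/0/30
"""


def ports_by_switch(text):
    """Group (interface, host) pairs under the switch they belong to."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        host, location = line.split()
        switch, interface = location.split(":", 1)
        grouped[switch].append((interface, host))
    return dict(grouped)


if __name__ == "__main__":
    for switch, ports in sorted(ports_by_switch(NOTED_PORTS).items()):
        for interface, host in ports:
            print(f"{switch}: clear description on {interface} ({host})")
```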

RobH reassigned this task from RobH to Cmjohnson. · Dec 3 2018, 9:36 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (eqiad) on the decommission-hardware board.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH updated the task description. · Mar 19 2019, 12:40 AM
RobH added a comment. · Mar 26 2019, 6:43 PM

Please note these systems still need their SSDs securely erased per https://wikitech.wikimedia.org/wiki/Dc-operations/Securely_Erasing_Media

Cmjohnson closed this task as Resolved. · Apr 24 2019, 3:01 PM

All the disks were securely wiped and the servers were reset to their defaults.

Marostegui reopened this task as Open. · Apr 25 2019, 8:51 AM

@RobH @Cmjohnson there are still DNS entries for all these hosts:

templates/wmnet:pc1004            1H  IN A    10.64.0.12
templates/wmnet:pc1005            1H  IN A    10.64.32.72
templates/wmnet:pc1006            1H  IN A    10.64.48.128
templates/wmnet:pc1004            1H  IN A    10.65.2.33
templates/wmnet:pc1005            1H  IN A    10.65.2.34
templates/wmnet:pc1006            1H  IN A    10.65.2.35
templates/10.in-addr.arpa:12      1H  IN PTR  pc1004.eqiad.wmnet.
templates/10.in-addr.arpa:72      1H  IN PTR  pc1005.eqiad.wmnet.
templates/10.in-addr.arpa:128     1H  IN PTR  pc1006.eqiad.wmnet.
templates/10.in-addr.arpa:33      1H  IN PTR  pc1004.mgmt.eqiad.wmnet.
templates/10.in-addr.arpa:34      1H  IN PTR  pc1005.mgmt.eqiad.wmnet.
templates/10.in-addr.arpa:35      1H  IN PTR  pc1006.mgmt.eqiad.wmnet.
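
A check like the one that produced this listing can be sketched in a few lines. This is an illustration only, not the command that was actually run: the function name `leftover_records` is hypothetical, and the `pc1007` line in the sample is made up to show a record that should not match.

```python
import re


def leftover_records(zone_text, hosts):
    """Return the lines that mention one of the hosts as a whole label.

    The lookarounds keep "pc1004" from matching inside a longer name
    such as "pc10045" or "xpc1004".
    """
    names = "|".join(re.escape(h) for h in hosts)
    pattern = re.compile(r"(?<![\w-])(?:" + names + r")(?![\w-])")
    return [line for line in zone_text.splitlines() if pattern.search(line)]


if __name__ == "__main__":
    # Two records taken from the listing in this comment, plus one
    # hypothetical pc1007 record that must be ignored.
    sample = (
        "pc1004  1H  IN A    10.64.0.12\n"
        "pc1007  1H  IN A    10.64.0.13\n"
        "33      1H  IN PTR  pc1004.mgmt.eqiad.wmnet.\n"
    )
    for line in leftover_records(sample, ["pc1004", "pc1005", "pc1006"]):
        print(line)
```

Matching on whole labels catches both the forward (A) and reverse (PTR) records, including the mgmt ones, which is where the remaining entries in this task were found.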

Change 506415 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Remove pc1004-pc1006

https://gerrit.wikimedia.org/r/506415

@RobH @Cmjohnson there are also entries in site.pp; I have sent a patch for that: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506415/
I would prefer that you handle the DNS ones yourselves.

Change 506415 merged by Marostegui:
[operations/puppet@production] site.pp: Remove pc1004-pc1006

https://gerrit.wikimedia.org/r/506415

@RobH @Cmjohnson I have removed the spare role entries, so the only thing still pending is the DNS entries described at: T210969#5136891

Marostegui reassigned this task from Cmjohnson to RobH. · Apr 26 2019, 5:00 AM

Change 506674 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removind dns entries for decom hosts pc1004-6

https://gerrit.wikimedia.org/r/506674

Change 506674 merged by Marostegui:
[operations/dns@master] Removind dns entries for decom hosts pc1004-6

https://gerrit.wikimedia.org/r/506674

So DNS is now clean.
Are the switch ports also cleaned up? T210969#4795686

Marostegui updated the task description. · Apr 26 2019, 2:08 PM
Marostegui updated the task description. · Apr 26 2019, 2:21 PM

I tried to set these servers as unracked in Netbox, but could not (I guess I don't have the rights).

Marostegui closed this task as Resolved. · Apr 26 2019, 2:43 PM

I misread https://wikitech.wikimedia.org/wiki/Server_Lifecycle#States: the servers have to be Offline, and I have just set them to that state. So we are all done!