decommission: labtestservices2001.wikimedia.org
Closed, Resolved · Public

Description

This task will track the decommission-hardware of server labtestservices2001.wikimedia.org.

The first 5 steps should be completed by the service owner who is returning the server to DC-ops (for reclaim to spare or decommissioning, depending on server configuration and age).

labtestservices2001.wikimedia.org

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:

The following steps must not be interrupted, as doing so would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) - asw-b-codfw:ge-8/0/12
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
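
As a rough illustration of the debmonitor cleanup step above, the sketch below only assembles the DELETE command from the task description for a given host and prints it for review; it does not run anything. The function name `debmonitor_delete_cmd` and the basic FQDN sanity check are assumptions made for this example and are not part of wmf-decommission-host.

```python
import shlex

# URL and certificate paths are copied from the task description.
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_cmd(host_fqdn):
    """Return the argv list for the DELETE call (hypothetical helper).

    Nothing is executed here; the caller decides whether to run it.
    """
    # Assumed sanity check: refuse empty names or ones that would alter the URL path.
    if not host_fqdn or "/" in host_fqdn or any(c.isspace() for c in host_fqdn):
        raise ValueError(f"suspicious host FQDN: {host_fqdn!r}")
    return [
        "sudo", "curl", "-X", "DELETE",
        f"{DEBMONITOR_HOSTS_URL}/{host_fqdn}",
        "--cert", CERT_PATH,
        "--key", KEY_PATH,
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(debmonitor_delete_cmd("labtestservices2001.wikimedia.org")))
```

Building the command as a list rather than a formatted string keeps the host name a single argument, so it can be handed to a process runner without shell quoting concerns.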

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

Event Timeline

Restricted Application added a project: Operations. · Mar 11 2019, 12:17 PM
aborrero updated the task description. · Mar 11 2019, 1:07 PM

right now labtestservices2001 is the only host for the labtest ldap db. So we should move that someplace before we decom, unless we want to start with a fresh db entirely.

Change 497293 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] wmcs: decommision several codfw servers

https://gerrit.wikimedia.org/r/497293

Mentioned in SAL (#wikimedia-operations) [2019-03-18T13:03:58Z] <arturo> T218022 disable icinga checks for labtestservices2001.wikimedia.org

aborrero renamed this task from Hardware decommission: labtestservices2001.wikimedia.org to decommission: labtestservices2001.wikimedia.org. · Mar 18 2019, 1:04 PM
aborrero updated the task description.
aborrero added a subscriber: RobH.
aborrero changed the task status from Open to Stalled. · Mar 18 2019, 1:38 PM

right now labtestservices2001 is the only host for the labtest ldap db. So we should move that someplace before we decom, unless we want to start with a fresh db entirely.

I don't really know what that database is about. But perhaps we want to do it at the same time as T218569: Openstack codfw DBs: move to m5-master.eqiad.wmnet. Would you mind updating that ticket so we have all the DB-reallocating info in a single place?

I will block this task on that so we don't accidentally wipe the server :-)

aborrero updated the task description. · Mar 18 2019, 1:39 PM

I don't really know what that database is about. But perhaps we want to do it at the same time as T218569: Openstack codfw DBs: move to m5-master.eqiad.wmnet. Would you mind updating that ticket so we have all the DB-reallocating info in a single place?

It's not a mysql database. Labtest has its own testing ldap -- that ldap is stored on labtestservices2001 so we'd lose all that state unless we sync this to a different ldap server.

OK, then we should probably create an LDAP server in codfw if we want to keep both environments as close as possible? I'm also fine if we just copy the LDAP DB to another server.

Mentioned in SAL (#wikimedia-operations) [2019-03-25T07:58:21Z] <vgutierrez> disable puppet and downtime host in icinga for labtestservices2001 - T218022

Dzahn moved this task from Backlog to Decommission on the ops-codfw board. · Apr 12 2019, 12:07 AM
aborrero updated the task description. · Apr 22 2019, 10:59 AM

Change 505629 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] labtestservices2001: use spare role

https://gerrit.wikimedia.org/r/505629

Change 505629 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] labtestservices2001: use spare role

https://gerrit.wikimedia.org/r/505629

aborrero changed the task status from Stalled to Open. · Apr 22 2019, 11:35 AM
aborrero reassigned this task from aborrero to RobH.
aborrero triaged this task as Medium priority.
aborrero updated the task description.
aborrero removed a subscriber: aborrero.
RobH updated the task description.

cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: labtestservices2001.wikimedia.org

  • labtestservices2001.wikimedia.org
    • Removed from Puppet master and PuppetDB
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Removed from DebMonitor
RobH updated the task description. · Apr 23 2019, 4:13 PM

Change 505810 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission labtestservices2001 production dns

https://gerrit.wikimedia.org/r/505810

Change 505812 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom labtestservices2001

https://gerrit.wikimedia.org/r/505812

Change 505810 merged by RobH:
[operations/dns@master] decommission labtestservices2001 production dns

https://gerrit.wikimedia.org/r/505810

Change 505812 merged by RobH:
[operations/puppet@production] decom labtestservices2001

https://gerrit.wikimedia.org/r/505812

RobH reassigned this task from RobH to Papaul. · Apr 23 2019, 4:21 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (codfw) on the decommission-hardware board.

Ready for the remainder of decom steps, then removal from racks, thanks!

RobH updated the task description. · Apr 25 2019, 11:53 PM

Mentioned in SAL (#wikimedia-operations) [2019-04-26T09:48:56Z] <marostegui> Remove labtestservices2001 from tendril - T218022

Papaul added a comment. · May 2 2019, 4:51 PM

@RobH this server is still showing up on the switch side

papaul@asw-b-codfw> show interfaces ge-8/0/12 descriptions 
Interface       Admin Link Description
ge-8/0/12       up    up   labtestservices2001-eth0
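
The check in the paste above (is the port for a decommissioned host still enabled?) could be scripted roughly as follows. This is only a sketch: it assumes the four-column layout shown in this paste (Interface, Admin, Link, Description), and the names `PortState` and `parse_descriptions` are made up for the example. Other Junos output variants are not handled.

```python
from typing import List, NamedTuple


class PortState(NamedTuple):
    interface: str
    admin: str
    link: str
    description: str


def parse_descriptions(output: str) -> List[PortState]:
    """Parse pasted 'show interfaces ... descriptions' output (hypothetical helper)."""
    ports = []
    for line in output.splitlines():
        # Split into at most four fields so the description keeps any spaces.
        fields = line.split(None, 3)
        if len(fields) < 3 or fields[0] == "Interface":
            continue  # blank line, short line, or the column header
        if fields[1] not in ("up", "down") or fields[2] not in ("up", "down"):
            continue  # prompt / command echo, not a data row
        description = fields[3].strip() if len(fields) == 4 else ""
        ports.append(PortState(fields[0], fields[1], fields[2], description))
    return ports


if __name__ == "__main__":
    pasted = """papaul@asw-b-codfw> show interfaces ge-8/0/12 descriptions
Interface       Admin Link Description
ge-8/0/12       up    up   labtestservices2001-eth0
"""
    for port in parse_descriptions(pasted):
        if port.admin == "up":
            print(f"{port.interface} ({port.description}) is still admin-up")
```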
Papaul reassigned this task from Papaul to RobH. · May 2 2019, 4:51 PM
Papaul added a subscriber: Papaul.
RobH reassigned this task from RobH to Papaul. · May 2 2019, 4:54 PM

Done, port disabled, back to you.

Papaul updated the task description. · May 7 2019, 2:44 PM
Papaul updated the task description. · May 8 2019, 3:05 PM

Change 510567 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for labtestservices2001

https://gerrit.wikimedia.org/r/510567

Change 510567 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for labtestservices2001

https://gerrit.wikimedia.org/r/510567

Papaul closed this task as Resolved. · May 15 2019, 4:32 PM
Papaul updated the task description.

Complete