decommission phab1001.eqiad.wmnet
Closed, Resolved | Public | Request

Description

This task will track the decommission-hardware of server phab1001.eqiad.wmnet.

https://netbox.wikimedia.org/dcim/devices/1557/

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

phab1001.eqiad.wmnet

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all Icinga checks to maint mode/disabled while reclaim/decommission takes place (likely done by script)
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the host's site.pp entry and replace it with role(spare::system). This is recommended to ensure services are offline, but not strictly required as long as the decom script below is run IMMEDIATELY.
  • - log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
  • - remove all remaining puppet references and all host entries in the puppet repo
  • - reassign task from service owner to DC ops team member and site project (ops-sitename) depending on site of server
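
The cookbook invocation in the steps above can be sketched as a small helper that builds the command line before it is run on a cumin host. This is only an illustration: the function name and the validation patterns are hypothetical, and only the command shape (cookbook sre.hosts.decommission <host fqdn> -t <phab task>) comes from the checklist. T323418 is the task ID quoted in the SAL entries for this host and is assumed here to be this task.

```python
import re
import shlex

# Hypothetical validation patterns; they are not part of the cookbook itself.
FQDN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")
TASK_RE = re.compile(r"^T\d+$")


def decommission_command(host_fqdn: str, phab_task: str) -> list[str]:
    """Return argv for: cookbook sre.hosts.decommission <host fqdn> -t <phab task>."""
    if not FQDN_RE.match(host_fqdn):
        raise ValueError(f"not a fully qualified host name: {host_fqdn!r}")
    if not TASK_RE.match(phab_task):
        raise ValueError(f"not a Phabricator task ID: {phab_task!r}")
    return ["cookbook", "sre.hosts.decommission", host_fqdn, "-t", phab_task]


if __name__ == "__main__":
    # Prints the command for review; it does not execute anything.
    print(shlex.join(decommission_command("phab1001.eqiad.wmnet", "T323418")))
```

Rejecting a short host name such as phab1001 up front matches the checklist's request for the host FQDN.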

End service owner steps / Begin DC-Ops team steps:

  • - system disks removed (by onsite)
  • - determine system age: systems under 5 years old are reclaimed to spare, systems over 5 years old are decommissioned.
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • - IF DECOM: mgmt dns entries removed.
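
The age rule in the DC-Ops steps above can be sketched as follows. The function names are hypothetical, and the purchase date has to come from elsewhere (this task does not record one for phab1001). The checklist does not say what happens at exactly 5 years, so treating that case as "decommission" is an assumption.

```python
from datetime import date

RECLAIM_AGE_LIMIT_YEARS = 5  # threshold stated in the DC-Ops steps


def age_in_years(purchased: date, today: date) -> int:
    """Whole years elapsed between the purchase date and today."""
    years = today.year - purchased.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (purchased.month, purchased.day):
        years -= 1
    return years


def disposition(purchased: date, today: date) -> str:
    """Return 'reclaim to spare' or 'decommission' for a system.

    Assumption: a system that is exactly 5 years old is decommissioned;
    the checklist only covers 'under' and 'over' 5 years.
    """
    if age_in_years(purchased, today) < RECLAIM_AGE_LIMIT_YEARS:
        return "reclaim to spare"
    return "decommission"
```

For example, with a hypothetical purchase date of 2015-06-01 and a check on 2022-12-07, the system is 7 years old and the sketch returns "decommission".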

Event Timeline

Change 824804 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: remove production role from phab1001

https://gerrit.wikimedia.org/r/824804

Change 858662 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] dumps: remove phab1001 from rsync clients

https://gerrit.wikimedia.org/r/858662

Change 858421 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: remove phab1001

https://gerrit.wikimedia.org/r/858421

Change 858419 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mariadb: remove phab1001 from production-m3 grants

https://gerrit.wikimedia.org/r/858419

Change 858420 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: remove phab1001 as src_host from migration class

https://gerrit.wikimedia.org/r/858420

Change 824412 had a related patch set uploaded (by Dzahn; author: jbond):

[operations/puppet@production] O:phabricator: move common settings to role hiera

https://gerrit.wikimedia.org/r/824412

Change 858662 merged by Dzahn:

[operations/puppet@production] dumps: remove phab1001 from rsync clients

https://gerrit.wikimedia.org/r/858662

Change 858420 merged by Dzahn:

[operations/puppet@production] phabricator: remove phab1001 as src_host from migration class

https://gerrit.wikimedia.org/r/858420

LSobanski changed the task status from Stalled to Open. Nov 28 2022, 4:14 PM
LSobanski moved this task from Backlog to Work in Progress on the collaboration-services board.

Change 861498 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: disable phd running on phab1001

https://gerrit.wikimedia.org/r/861498

Change 861498 merged by Dzahn:

[operations/puppet@production] phabricator: disable phd running on phab1001

https://gerrit.wikimedia.org/r/861498

Mentioned in SAL (#wikimedia-operations) [2022-12-05T19:24:19Z] <mutante> phab1001, previous long time phabricator host, is about to be shut down, made a final copy of /srv/deployment, /root, /home, /etc and synced it to phab1004 - T323418

Change 824804 merged by Dzahn:

[operations/puppet@production] phabricator: remove production role from phab1001

https://gerrit.wikimedia.org/r/824804

Mentioned in SAL (#wikimedia-operations) [2022-12-05T19:57:54Z] <mutante> phab1004 (prod) - removing phab1001 from firewall rules, rsync config | phab1001 (formerly prod) - removing prod role T323418 T280597

Change 864840 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/phabricator: fix insetup role name which is now team specific

https://gerrit.wikimedia.org/r/864840

Change 864840 merged by Dzahn:

[operations/puppet@production] site/phabricator: fix insetup role name which is now team specific

https://gerrit.wikimedia.org/r/864840

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: phab1001.eqiad.wmnet

  • phab1001.eqiad.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 858421 merged by Dzahn:

[operations/puppet@production] site: remove phab1001

https://gerrit.wikimedia.org/r/858421

Change 824412 merged by Dzahn:

[operations/puppet@production] O:phabricator: move host based settings to role hiera

https://gerrit.wikimedia.org/r/824412

Dzahn updated the task description.
Dzahn edited projects, added ops-eqiad; removed Patch-For-Review.
Dzahn updated the task description.
Dzahn added a subscriber: Jclark-ctr.

@Marostegui, as part of "decom of host phab1001" we can remove any mysql GRANTS for users coming from its former IP 10.64.16.8.

I made a change that shows the details at https://gerrit.wikimedia.org/r/c/operations/puppet/+/858419/3/modules/profile/templates/mariadb/grants/production-m3.sql.erb#b10

If this should be a separate task or isn't needed, let me know; I'm happy to do that too.

That's ok Daniel, I will take care of it on this task.

I will merge that change and then proceed to remove the grants live.

Change 865241 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: remove phab1001 from production-m3 grants

https://gerrit.wikimedia.org/r/865241

Change 865241 merged by Marostegui:

[operations/puppet@production] mariadb: remove phab1001 from production-m3 grants

https://gerrit.wikimedia.org/r/865241

root@db1159.eqiad.wmnet[(none)]> select user,host from mysql.user where host like '10.64.16.8';
+----------------+------------+
| User           | Host       |
+----------------+------------+
| phabricatorphd | 10.64.16.8 |
| phadmin        | 10.64.16.8 |
| phmanifest     | 10.64.16.8 |
| phstats        | 10.64.16.8 |
| phuser         | 10.64.16.8 |
+----------------+------------+
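
As a rough sketch, the client output above could be turned into a list of accounts to clean up. The task does not record which statements were actually run on the m3 databases, so the DROP USER statements generated here are an assumption about one possible cleanup, and both function names are hypothetical. The script only builds strings; it never connects to a database.

```python
SAMPLE = """\
+----------------+------------+
| User           | Host       |
+----------------+------------+
| phabricatorphd | 10.64.16.8 |
| phadmin        | 10.64.16.8 |
| phmanifest     | 10.64.16.8 |
| phstats        | 10.64.16.8 |
| phuser         | 10.64.16.8 |
+----------------+------------+
"""


def parse_mysql_table(text: str) -> list[tuple[str, ...]]:
    """Parse boxed mysql client output into data rows (header row excluded)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, prompts and blank lines
        rows.append(tuple(cell.strip() for cell in line.strip("|").split("|")))
    return rows[1:]  # the first "|" line is the column header


def drop_user_statements(rows: list[tuple[str, ...]]) -> list[str]:
    """Build one DROP USER statement per (user, host) row, for review only."""
    return [f"DROP USER IF EXISTS '{user}'@'{host}';" for user, host in rows]


if __name__ == "__main__":
    for statement in drop_user_statements(parse_mysql_table(SAMPLE)):
        print(statement)
```

Generating the statements from the query result rather than typing them by hand keeps the cleanup limited to accounts that exist for the old IP.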

Mentioned in SAL (#wikimedia-operations) [2022-12-07T05:58:04Z] <marostegui> Drop phab1001 grants from m3 databases T323418

All done from the DBA side.

Change 858419 abandoned by Dzahn:

[operations/puppet@production] mariadb: remove phab1001 from production-m3 grants

Reason:

duplicate of https://gerrit.wikimedia.org/r/c/operations/puppet/+/865241/

https://gerrit.wikimedia.org/r/858419

Jclark-ctr updated the task description.