decommission clouddb1021
Closed, Resolved | Public | Request

Description

This task will track the decommission-hardware of server clouddb1021.eqiad.wmnet

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

clouddb1021.eqiad.wmnet

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (likely done by script)
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the host's entry from site.pp; replacing it with role(spare::system) is recommended to ensure services are offline, but is not strictly required as long as the decom script below is run IMMEDIATELY.
  • - log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
  • - remove all remaining puppet references and all host entries in the puppet repo
  • - reassign task from service owner to no owner and ensure the site project (ops-sitename depending on site of server) is assigned.
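
As a minimal sketch of the cookbook step above, the helper below only assembles and prints the documented command so it can be reviewed before being pasted on a cumin host. The function name decom_command is hypothetical and not part of the cookbook tooling; the command form is the one quoted in the checklist, and the task ID in the usage note is the one referenced in this task's timeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the decommission command from the checklist.
# It prints the command instead of running it, so nothing is changed
# on any host until an operator executes the printed line.
decom_command() {
    local host_fqdn="$1"   # e.g. clouddb1021.eqiad.wmnet
    local phab_task="$2"   # e.g. T368518
    if [ -z "$host_fqdn" ] || [ -z "$phab_task" ]; then
        echo "usage: decom_command <host fqdn> <phab task>" >&2
        return 1
    fi
    printf 'cookbook sre.hosts.decommission %s -t %s\n' "$host_fqdn" "$phab_task"
}
```

For this task, decom_command clouddb1021.eqiad.wmnet T368518 prints: cookbook sre.hosts.decommission clouddb1021.eqiad.wmnet -t T368518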

End service owner steps / Begin DC-Ops team steps:

  • - system disks removed (by onsite)
  • - determine system age: systems under 5 years old are reclaimed to spare; systems over 5 years old are decommissioned.
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • - IF DECOM: mgmt dns entries removed.
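
The age rule in the DC-Ops steps can be sketched as a small shell function. This is an illustration only: the function name hardware_disposition is hypothetical, and because the checklist does not say what happens at exactly 5 years, the sketch assumes such systems are decommissioned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a system's age in whole years to the outcome
# described in the checklist.
# ASSUMPTION: exactly 5 years is treated as "decommission"; the checklist
# only covers "under 5" and "over 5".
hardware_disposition() {
    local age_years="$1"
    case "$age_years" in
        ''|*[!0-9]*)
            echo "age must be a whole number of years" >&2
            return 1
            ;;
    esac
    if [ "$age_years" -lt 5 ]; then
        echo "reclaim to spare"
    else
        echo "decommission"
    fi
}
```

For example, hardware_disposition 3 prints "reclaim to spare" and hardware_disposition 7 prints "decommission".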

Event Timeline

@BTullis I'd suggest you just leave all mariadb instances stopped for a week or so before proceeding with the full decommissioning.

Change #1052696 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Disable monitoring on clouddb1021 prior to decommissioning

https://gerrit.wikimedia.org/r/1052696

Change #1052696 merged by Btullis:

[operations/puppet@production] Disable monitoring on clouddb1021 prior to decommissioning

https://gerrit.wikimedia.org/r/1052696

Change #1054516 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Switch the rols of clouddb1021 to insetup::data_engineering

https://gerrit.wikimedia.org/r/1054516

Change #1054516 merged by Btullis:

[operations/puppet@production] Switch the rols of clouddb1021 to insetup::data_engineering

https://gerrit.wikimedia.org/r/1054516

Mentioned in SAL (#wikimedia-analytics) [2024-07-17T08:45:07Z] <btullis> stopping mariadb section 1-8 on clouddb1021 for T368518

I have stopped all of the sections with the following command and confirmed a clean shutdown.

btullis@clouddb1021:~$ for i in $(seq 1 8); do sudo systemctl stop mariadb@s$i ; done
btullis@clouddb1021:~$ for i in $(seq 1 8); do systemctl status mariadb@s$i ; done
● mariadb@s1.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:44:33 clouddb1021 mysqld[1572]: 2024-07-17  8:44:33 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:44:33 clouddb1021 mysqld[1572]: 2024-07-17  8:44:33 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s1/ib_buffer_pool
Jul 17 08:44:33 clouddb1021 mysqld[1572]: 2024-07-17  8:44:33 0 [Note] InnoDB: Restricted to 1135680 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:44:34 clouddb1021 mysqld[1572]: 2024-07-17  8:44:34 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:44:34
Jul 17 08:44:47 clouddb1021 mysqld[1572]: 2024-07-17  8:44:47 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:44:47 clouddb1021 mysqld[1572]: 2024-07-17  8:44:47 0 [Note] InnoDB: Shutdown completed; log sequence number 113589266738333; transaction id 102207308293
Jul 17 08:44:47 clouddb1021 mysqld[1572]: 2024-07-17  8:44:47 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:44:47 clouddb1021 systemd[1]: mariadb@s1.service: Succeeded.
Jul 17 08:44:47 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:44:47 clouddb1021 systemd[1]: mariadb@s1.service: Consumed 2month 1w 3d 14h 29min 37.568s CPU time.
● mariadb@s2.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:44:47 clouddb1021 mysqld[2253]: 2024-07-17  8:44:47 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:44:47 clouddb1021 mysqld[2253]: 2024-07-17  8:44:47 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s2/ib_buffer_pool
Jul 17 08:44:47 clouddb1021 mysqld[2253]: 2024-07-17  8:44:47 0 [Note] InnoDB: Restricted to 648960 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:44:48 clouddb1021 mysqld[2253]: 2024-07-17  8:44:48 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:44:48
Jul 17 08:44:55 clouddb1021 mysqld[2253]: 2024-07-17  8:44:55 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:44:55 clouddb1021 mysqld[2253]: 2024-07-17  8:44:55 0 [Note] InnoDB: Shutdown completed; log sequence number 55854840187894; transaction id 19878151482
Jul 17 08:44:55 clouddb1021 mysqld[2253]: 2024-07-17  8:44:55 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:44:55 clouddb1021 systemd[1]: mariadb@s2.service: Succeeded.
Jul 17 08:44:55 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:44:55 clouddb1021 systemd[1]: mariadb@s2.service: Consumed 1month 2w 6d 5h 31min 24.600s CPU time.
● mariadb@s3.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:44:56 clouddb1021 mysqld[2340]: 2024-07-17  8:44:56 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:44:56 clouddb1021 mysqld[2340]: 2024-07-17  8:44:56 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s3/ib_buffer_pool
Jul 17 08:44:56 clouddb1021 mysqld[2340]: 2024-07-17  8:44:56 0 [Note] InnoDB: Restricted to 648960 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:44:56 clouddb1021 mysqld[2340]: 2024-07-17  8:44:56 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:44:56
Jul 17 08:45:11 clouddb1021 mysqld[2340]: 2024-07-17  8:45:11 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:45:11 clouddb1021 mysqld[2340]: 2024-07-17  8:45:11 0 [Note] InnoDB: Shutdown completed; log sequence number 56966249959673; transaction id 157879306074
Jul 17 08:45:11 clouddb1021 mysqld[2340]: 2024-07-17  8:45:11 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:45:12 clouddb1021 systemd[1]: mariadb@s3.service: Succeeded.
Jul 17 08:45:12 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:45:12 clouddb1021 systemd[1]: mariadb@s3.service: Consumed 3w 2d 14h 33min 31.696s CPU time.
● mariadb@s4.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:45:12 clouddb1021 mysqld[2630]: 2024-07-17  8:45:12 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:45:12 clouddb1021 mysqld[2630]: 2024-07-17  8:45:12 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s4/ib_buffer_pool
Jul 17 08:45:12 clouddb1021 mysqld[2630]: 2024-07-17  8:45:12 0 [Note] InnoDB: Restricted to 1135680 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:45:12 clouddb1021 mysqld[2630]: 2024-07-17  8:45:12 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:45:12
Jul 17 08:45:26 clouddb1021 mysqld[2630]: 2024-07-17  8:45:26 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:45:26 clouddb1021 mysqld[2630]: 2024-07-17  8:45:26 0 [Note] InnoDB: Shutdown completed; log sequence number 108140171632214; transaction id 148638089131
Jul 17 08:45:26 clouddb1021 mysqld[2630]: 2024-07-17  8:45:26 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:45:26 clouddb1021 systemd[1]: mariadb@s4.service: Succeeded.
Jul 17 08:45:26 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:45:26 clouddb1021 systemd[1]: mariadb@s4.service: Consumed 2month 4d 17h 13min 24.908s CPU time.
● mariadb@s5.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:45:26 clouddb1021 mysqld[2706]: 2024-07-17  8:45:26 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:45:26 clouddb1021 mysqld[2706]: 2024-07-17  8:45:26 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s5/ib_buffer_pool
Jul 17 08:45:26 clouddb1021 mysqld[2706]: 2024-07-17  8:45:26 0 [Note] InnoDB: Restricted to 648960 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:45:27 clouddb1021 mysqld[2706]: 2024-07-17  8:45:27 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:45:27
Jul 17 08:45:36 clouddb1021 mysqld[2706]: 2024-07-17  8:45:36 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:45:36 clouddb1021 mysqld[2706]: 2024-07-17  8:45:36 0 [Note] InnoDB: Shutdown completed; log sequence number 46120329261157; transaction id 55105376582
Jul 17 08:45:36 clouddb1021 mysqld[2706]: 2024-07-17  8:45:36 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:45:36 clouddb1021 systemd[1]: mariadb@s5.service: Succeeded.
Jul 17 08:45:36 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:45:36 clouddb1021 systemd[1]: mariadb@s5.service: Consumed 2w 2d 14h 2min 58.033s CPU time.
● mariadb@s6.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:45:36 clouddb1021 mysqld[2768]: 2024-07-17  8:45:36 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:45:36 clouddb1021 mysqld[2768]: 2024-07-17  8:45:36 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s6/ib_buffer_pool
Jul 17 08:45:36 clouddb1021 mysqld[2768]: 2024-07-17  8:45:36 0 [Note] InnoDB: Restricted to 486720 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:45:36 clouddb1021 mysqld[2768]: 2024-07-17  8:45:36 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:45:36
Jul 17 08:45:49 clouddb1021 mysqld[2768]: 2024-07-17  8:45:49 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:45:49 clouddb1021 mysqld[2768]: 2024-07-17  8:45:49 0 [Note] InnoDB: Shutdown completed; log sequence number 50393850712529; transaction id 18949402029
Jul 17 08:45:49 clouddb1021 mysqld[2768]: 2024-07-17  8:45:49 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:45:49 clouddb1021 systemd[1]: mariadb@s6.service: Succeeded.
Jul 17 08:45:49 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:45:49 clouddb1021 systemd[1]: mariadb@s6.service: Consumed 1month 3d 12h 7min 28.374s CPU time.
● mariadb@s7.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:45:49 clouddb1021 mysqld[2823]: 2024-07-17  8:45:49 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:45:49 clouddb1021 mysqld[2823]: 2024-07-17  8:45:49 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s7/ib_buffer_pool
Jul 17 08:45:49 clouddb1021 mysqld[2823]: 2024-07-17  8:45:49 0 [Note] InnoDB: Restricted to 811200 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:45:50 clouddb1021 mysqld[2823]: 2024-07-17  8:45:50 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:45:50
Jul 17 08:46:05 clouddb1021 mysqld[2823]: 2024-07-17  8:46:05 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:46:05 clouddb1021 mysqld[2823]: 2024-07-17  8:46:05 0 [Note] InnoDB: Shutdown completed; log sequence number 45213780952973; transaction id 183602638615
Jul 17 08:46:05 clouddb1021 mysqld[2823]: 2024-07-17  8:46:05 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:46:05 clouddb1021 systemd[1]: mariadb@s7.service: Succeeded.
Jul 17 08:46:05 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:46:05 clouddb1021 systemd[1]: mariadb@s7.service: Consumed 3w 5d 8h 14min 47.946s CPU time.
● mariadb@s8.service - mariadb database server
     Loaded: loaded (/lib/systemd/system/mariadb@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 17 08:46:05 clouddb1021 mysqld[2925]: 2024-07-17  8:46:05 0 [Note] InnoDB: Starting shutdown...
Jul 17 08:46:05 clouddb1021 mysqld[2925]: 2024-07-17  8:46:05 0 [Note] InnoDB: Dumping buffer pool(s) to /srv/sqldata.s8/ib_buffer_pool
Jul 17 08:46:05 clouddb1021 mysqld[2925]: 2024-07-17  8:46:05 0 [Note] InnoDB: Restricted to 1135680 pages due to innodb_buf_pool_dump_pct=25
Jul 17 08:46:05 clouddb1021 mysqld[2925]: 2024-07-17  8:46:05 0 [Note] InnoDB: Buffer pool(s) dump completed at 240717  8:46:05
Jul 17 08:46:28 clouddb1021 mysqld[2925]: 2024-07-17  8:46:28 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Jul 17 08:46:28 clouddb1021 mysqld[2925]: 2024-07-17  8:46:28 0 [Note] InnoDB: Shutdown completed; log sequence number 61138035850405; transaction id 151936343679
Jul 17 08:46:28 clouddb1021 mysqld[2925]: 2024-07-17  8:46:28 0 [Note] /opt/wmf-mariadb106/bin/mysqld: Shutdown complete
Jul 17 08:46:28 clouddb1021 systemd[1]: mariadb@s8.service: Succeeded.
Jul 17 08:46:28 clouddb1021 systemd[1]: Stopped mariadb database server.
Jul 17 08:46:28 clouddb1021 systemd[1]: mariadb@s8.service: Consumed 1month 1w 1d 10h 28min 39.323s CPU time.
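
Reading eight status blocks by eye is error-prone, so the same confirmation could be reduced to a single exit code. The following is a sketch, not the procedure that was actually used here: the function name all_sections_stopped is hypothetical, and it relies only on systemctl is-active --quiet, which exits 0 when a unit is active. Note that it treats any non-active state (including failed) as stopped, so it complements rather than replaces checking the shutdown log lines above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 only when none of the mariadb@s1..s8
# units is active; report each unit that is still running on stderr.
all_sections_stopped() {
    local i still_running=0
    for i in $(seq 1 8); do
        # `systemctl is-active --quiet` exits 0 when the unit is active.
        if systemctl is-active --quiet "mariadb@s$i"; then
            echo "mariadb@s$i is still active" >&2
            still_running=1
        fi
    done
    return "$still_running"
}
```

Usage on the host would be: all_sections_stopped && echo "all sections stopped"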

Now waiting for a while to make sure that it doesn't cause any issues.

I am going to remove this host from the zarcillo database. Even if it is used for reimage tests, it will eventually be decommissioned.

Mentioned in SAL (#wikimedia-operations) [2024-07-30T13:58:18Z] <marostegui> Remove clouddb1021 from zarcillo database T368518

cookbooks.sre.hosts.decommission executed by btullis@cumin1002 for hosts: clouddb1021.eqiad.wmnet

  • clouddb1021.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change #1059854 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove remaining references to cloudb1021

https://gerrit.wikimedia.org/r/1059854

Change #1059854 merged by Btullis:

[operations/puppet@production] Remove remaining references to cloudb1021

https://gerrit.wikimedia.org/r/1059854

BTullis updated the task description.
BTullis added a project: ops-eqiad.
VRiley-WMF updated the task description.

This has been decommissioned.