Re-arrange core multi-instance hosts
Closed, Resolved, Public

Description

  • db1170 (s2,s7) -> s7 + reimage to bookworm
  • db1244 (s4, s5) -> s4 + reimage to bookworm
  • db1246 (s2, s4) -> s2 + reimage to bookworm
  • db1213 (s5, s6) -> s5 + reimage to bookworm
  • db2168 (s7,s8) -> s7 + reimage to bookworm
  • db2167 (s1, s8) -> s8 + reimage to bookworm
  • db2169 (s6, s7) -> s6 + reimage to bookworm
  • db2171 (s5, s6) -> s5 + reimage to bookworm
  • db2170 (s1, s2) -> s1 + reimage to bookworm
  • db2138 (s2, s4) -> s2 + reimage to bookworm
  • db2137 (s4, s5) -> s4 + reimage to bookworm - reimage doesn't work (T357951) - Reminder: drop the root@cumin1001 user
  • db2194 (s6,s7) -> s3 + reimage to bookworm
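
As an illustration only, the plan above can be read as a mapping from each host to the sections it currently serves and the single section it should end up in. The sketch below parses lines in that shape; the names `Move`, `parse_move` and `leaves_current_sections` are hypothetical helpers, not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Move(NamedTuple):
    host: str
    current: Tuple[str, ...]  # sections the multi-instance host serves today
    target: str               # the single section it should serve afterwards


# Matches lines such as "db1170 (s2,s7) -> s7 + reimage to bookworm".
LINE_RE = re.compile(r"(db\d+)\s*\(([^)]*)\)\s*->\s*(s\d+)")


def parse_move(line: str) -> Optional[Move]:
    """Parse one description line; return None if it is not a host entry."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    current = tuple(s.strip() for s in m.group(2).split(","))
    return Move(m.group(1), current, m.group(3))


def leaves_current_sections(move: Move) -> bool:
    """True when the target is a section the host does not serve yet."""
    return move.target not in move.current
```

With this reading, db2194 is the only entry whose target (s3) is not one of its current sections, which is consistent with the separate "Move it to s3" change later in the timeline.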

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:20:43Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57233 and previous config saved to /var/cache/conftool/dbconfig/20240220-082043-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:22:13Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2171 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57234 and previous config saved to /var/cache/conftool/dbconfig/20240220-082213-root.json

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2168.codfw.wmnet with OS bookworm completed:

  • db2168 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402200803_marostegui_994542_db2168.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:25:15Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57235 and previous config saved to /var/cache/conftool/dbconfig/20240220-082515-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:35:48Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57237 and previous config saved to /var/cache/conftool/dbconfig/20240220-083547-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:37:18Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2171 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57238 and previous config saved to /var/cache/conftool/dbconfig/20240220-083718-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2167.codfw.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:50:53Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57243 and previous config saved to /var/cache/conftool/dbconfig/20240220-085052-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T08:52:23Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2171 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57244 and previous config saved to /var/cache/conftool/dbconfig/20240220-085222-root.json

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2138.codfw.wmnet with OS bookworm completed:

  • db2138 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402200840_marostegui_1002127_db2138.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-20T09:05:58Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57247 and previous config saved to /var/cache/conftool/dbconfig/20240220-090557-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T09:21:05Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57251 and previous config saved to /var/cache/conftool/dbconfig/20240220-092102-root.json

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2167.codfw.wmnet with OS bookworm completed:

  • db2167 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402200905_marostegui_1005689_db2167.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-20T09:36:08Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57256 and previous config saved to /var/cache/conftool/dbconfig/20240220-093607-root.json
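
The db2168 entries above step through 5%, 10%, 25%, 50%, 75% and 100% at roughly 15-minute intervals. A minimal sketch for pulling that ramp out of SAL lines follows; it only parses text in the format shown in this timeline and does not call dbctl. The function names and the shortened `SAMPLE` lines are assumptions made for illustration.

```python
import re
from datetime import datetime

# Captures the timestamp, host and percentage from a SAL repool entry.
SAL_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\].*?'(db\d+) \(re\)pooling @ (\d+)%"
)


def parse_repool(line):
    """Return (timestamp, host, percent) or None for non-repool lines."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    return ts, m.group(2), int(m.group(3))


def ramp(lines, host):
    """Return [(percent, minutes since the previous step)] for one host."""
    steps = sorted(p for p in map(parse_repool, lines) if p and p[1] == host)
    result, previous = [], None
    for ts, _, percent in steps:
        gap = None if previous is None else round((ts - previous).total_seconds() / 60)
        result.append((percent, gap))
        previous = ts
    return result


# Entries copied from the timeline, truncated after the commit message.
SAMPLE = [
    "[2024-02-20T08:20:43Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 5%: After rearraging sections T354826'",
    "[2024-02-20T08:22:13Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2171 (re)pooling @ 50%: After rearraging sections T354826'",
    "[2024-02-20T08:35:48Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 10%: After rearraging sections T354826'",
    "[2024-02-20T08:50:53Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2168 (re)pooling @ 25%: After rearraging sections T354826'",
]

print(ramp(SAMPLE, "db2168"))
```

Filtering by host matters because the ramps of several hosts interleave in the log, as the db2171 entry in the sample shows.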

Change 1005038 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Place db2169 in s6

https://gerrit.wikimedia.org/r/1005038

Mentioned in SAL (#wikimedia-operations) [2024-02-20T09:53:27Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57261 and previous config saved to /var/cache/conftool/dbconfig/20240220-095327-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2169.codfw.wmnet with OS bookworm

Change 1005038 merged by Marostegui:

[operations/puppet@production] mariadb: Place db2169 in s6

https://gerrit.wikimedia.org/r/1005038

Mentioned in SAL (#wikimedia-operations) [2024-02-20T10:08:32Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57269 and previous config saved to /var/cache/conftool/dbconfig/20240220-100832-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T10:23:37Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57272 and previous config saved to /var/cache/conftool/dbconfig/20240220-102337-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T10:38:42Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57276 and previous config saved to /var/cache/conftool/dbconfig/20240220-103842-root.json

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2169.codfw.wmnet with OS bookworm completed:

  • db2169 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402201018_marostegui_1017878_db2169.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-20T10:42:32Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57278 and previous config saved to /var/cache/conftool/dbconfig/20240220-104231-root.json

Change 1005047 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2194: Move it to s3

https://gerrit.wikimedia.org/r/1005047

Change 1005047 merged by Marostegui:

[operations/puppet@production] db2194: Move it to s3

https://gerrit.wikimedia.org/r/1005047

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2194.codfw.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:00:33Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57283 and previous config saved to /var/cache/conftool/dbconfig/20240220-110008-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:00:54Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57284 and previous config saved to /var/cache/conftool/dbconfig/20240220-110011-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:04:45Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Place db2194 in s3 depooled T354826', diff saved to https://phabricator.wikimedia.org/P57287 and previous config saved to /var/cache/conftool/dbconfig/20240220-110444-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:15:21Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57289 and previous config saved to /var/cache/conftool/dbconfig/20240220-111516-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:15:29Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2167 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57290 and previous config saved to /var/cache/conftool/dbconfig/20240220-111525-root.json

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2194.codfw.wmnet with OS bookworm completed:

  • db2194 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402201108_marostegui_1027000_db2194.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
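
The reimage reports above differ only in their header status and in whether the Icinga downtime was removed at the end: the PASS reports include "Icinga downtime removed", while the WARN reports say the status is not optimal and leave the downtime in place. A small sketch for summarising a report in that shape is below; `summarize` is a hypothetical helper and assumes the exact wording seen in this timeline.

```python
import re

# Header line of a report once the bullet is stripped, e.g. "db2167 (WARN)".
HEADER_RE = re.compile(r"(db\d+) \((\w+)\)")


def summarize(report_lines):
    """Return (host, status, downtime_removed) for one cookbook report."""
    host = status = None
    downtime_removed = False
    for raw in report_lines:
        line = raw.strip().lstrip("•").strip()
        m = HEADER_RE.fullmatch(line)
        if m:
            host, status = m.groups()
        elif line == "Icinga downtime removed":
            downtime_removed = True
    return host, status, downtime_removed
```

A host reported as WARN with the downtime still in place may need a manual check before it is repooled; the sketch only flags that condition and does not decide anything.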

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:30:21Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57295 and previous config saved to /var/cache/conftool/dbconfig/20240220-113021-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T11:45:26Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57298 and previous config saved to /var/cache/conftool/dbconfig/20240220-114526-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T12:00:43Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57301 and previous config saved to /var/cache/conftool/dbconfig/20240220-120031-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T13:47:43Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2190 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57321 and previous config saved to /var/cache/conftool/dbconfig/20240220-134734-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T13:51:05Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2194 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57324 and previous config saved to /var/cache/conftool/dbconfig/20240220-135104-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:02:40Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2190 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57326 and previous config saved to /var/cache/conftool/dbconfig/20240220-140239-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:06:10Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57328 and previous config saved to /var/cache/conftool/dbconfig/20240220-140609-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:17:45Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2190 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57331 and previous config saved to /var/cache/conftool/dbconfig/20240220-141744-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:21:15Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57333 and previous config saved to /var/cache/conftool/dbconfig/20240220-142114-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:32:49Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57337 and previous config saved to /var/cache/conftool/dbconfig/20240220-143249-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:36:20Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2194 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57339 and previous config saved to /var/cache/conftool/dbconfig/20240220-143619-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:47:54Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2190 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57344 and previous config saved to /var/cache/conftool/dbconfig/20240220-144753-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:51:25Z] <marostegui@cumin1002> dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57346 and previous config saved to /var/cache/conftool/dbconfig/20240220-145124-root.json

Change 1005217 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2167: Remove package declaration

https://gerrit.wikimedia.org/r/1005217

Change 1005217 merged by Marostegui:

[operations/puppet@production] db2167: Remove package declaration

https://gerrit.wikimedia.org/r/1005217

Marostegui updated the task description.

db2137 was reimaged and is now catching up, but it has almost two days of backlog to process, so it remains depooled.