According to the parent task, we should not have anything running cinder-backups, but these Ganeti VMs are. It seems like they were just forgotten and need to be decom'd.
Description
Details
Status | Assigned | Task
---|---|---
Resolved | Bstorm | T216208 ToolsDB overload and cleanup
Resolved | Bstorm | T216441 Evaluate transferring the non-replicated tables to the new toolsdb server
Resolved | fnegri | T236101 Find a way to remove non-replicated tables from ToolsDB
Resolved | dcaro | T301951 toolsdb: full disk on clouddb1001 broke clouddb1002 (secondary) replication
Open | None | T301967 toolsdb: evaluate storage usage by some tools
Open | fnegri | T291782 Migrate largest ToolsDB users to Trove
Open | None | T272395 Cloud: reduce NAT exceptions from cloud to production
Resolved | Andrew | T291405 [NFS] Reduce or eliminate bare-metal NFS servers
Resolved | Andrew | T292546 cloud NFS: figure out backups for cinder volumes
Resolved | Andrew | T344065 Replace cinder-backup process with backy2
Resolved | Andrew | T358855 Use cloudbackup100[12]-dev for cinder backup test/dev
Event Timeline
I think the right thing here is to update these to replicate the behavior of cloudbackup200[12]. I'll have a look at that.
Change #1016447 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Make cloudbackup200[12]-dev into codfw1dev cinder backup hosts
Change #1016446 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cinder backups: move schedule config from a template into hiera
Change #1016446 merged by Andrew Bogott:
[operations/puppet@production] cinder backups: move schedule config from a template into hiera
Change #1016447 merged by Andrew Bogott:
[operations/puppet@production] Make cloudbackup200[12]-dev into codfw1dev cinder backup hosts
Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm completed:
- cloudbackup1001-dev (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404041654_andrew_690147_cloudbackup1001-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm completed:
- cloudbackup1002-dev (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404041804_andrew_701926_cloudbackup1002-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm completed:
- cloudbackup1002-dev (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404051519_andrew_863461_cloudbackup1002-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change #1017319 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] role:cinder_backups: include full env scripts in codfw1dev
Change #1017319 merged by Andrew Bogott:
[operations/puppet@production] role:cinder_backups: include full env scripts in codfw1dev
Change #1017332 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cloudbackup100[12]-dev: include ceph admin creds
Change #1017332 merged by Andrew Bogott:
[operations/puppet@production] cloudbackup100[12]-dev: include ceph admin creds
Change #1017342 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cloudbackup: some host-specific enable_v2_messenger overrides
Change #1017342 merged by Andrew Bogott:
[operations/puppet@production] cloudbackup: remove some host-specific enable_v2_messenger overrides
Change #1017347 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cinder_backups: add some real work to codfw1dev backups
Change #1017347 merged by Andrew Bogott:
[operations/puppet@production] cinder_backups: add some real work to codfw1dev backups
Change #1017351 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cinder_backups: apply codfw1dev env files for eqiad backup hosts
Change #1017351 merged by Andrew Bogott:
[operations/puppet@production] cinder_backups: apply codfw1dev env files for eqiad backup hosts
Change #1017354 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] designate: move codfw1dev settings to 'common' for cross-site access
Change #1017356 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[labs/private@master] openstack: move a bunch of codfw1dev passwords from 'codfw' to 'common'
Change #1017356 merged by Andrew Bogott:
[labs/private@master] openstack: move a bunch of codfw1dev passwords from 'codfw' to 'common'
Change #1017354 merged by Andrew Bogott:
[operations/puppet@production] designate: move codfw1dev settings to 'common' for cross-site access