# Overview
This task tracks the hardware migration from the `alert1001` and `alert2001` hosts to their replacements, `alert1002` and `alert2002`.
All four hosts run Debian Bookworm, so the Puppet role is expected to work as is.
# Proposed solution
## Stage 1: Prepare hosts
1. Add the `alert1002` and `alert2002` hosts to the Acme Chief list of authorized domains.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064107 | Gerrit patch #1064107 - alert: Add the alertx002 hosts to acme chief ]]
1. Apply the `alerting_host` role to the `alert1002` and `alert2002` hosts.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062444 | Gerrit patch #1062444 - alert: Ensure the alert*002 hosts use the alerting_host role ]]
1. Run Puppet on the alert[12]002 hosts: `sudo cumin 'alert*[02]*' 'run-puppet-agent'`
1. Arm the keyholder agent on the `alert1002` and `alert2002` hosts.
1. Use the `metamonitor` passphrase (`pwstore/pw.git/metamonitor-key-passphrase`).
1. Ensure the keyholder agent is armed: `sudo cumin 'alert*002.wikimedia.org' 'keyholder status'`
1. Verify the IP addresses are propagated across the infrastructure
1. Add the new hosts to the Prometheus Blackbox exporter list [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064097 | Gerrit patch #1064097 - alert: Add the alertx002 hosts to Prometheus blackbox exporter ]]
## Stage 2: Enable hosts as passive alertmanagers
1. Allow connections from the alert[12]002 addresses.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064818 | Gerrit patch #1064818 - alert: Allow connections from the alertx002 addresses ]]
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064821 | Gerrit patch #1064821 - alert: Allow Apache2 connections for the alertx002 hosts ]]
1. Run Puppet on the alert hosts: `sudo cumin 'alert*' 'run-puppet-agent'`
1. Add the `alert1002` and `alert2002` hosts as Icinga and Alertmanager partners so that they work as passive hosts.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064820 | Gerrit patch #1064820 - alert: Add the alertx002 hosts as Icinga and AM partners ]]
1. Run Puppet on the alert hosts: `sudo cumin 'alert*' 'run-puppet-agent'`
1. Verify that the hosts work as intended as standby hosts (e.g. no Puppet or unit failures).
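
The unit-failure check in the last step could be done by listing failed systemd units on each host and counting them. The following is a minimal sketch rather than the procedure used for this task: `count_failed_units` is a hypothetical helper, and the cumin invocation in the comments only mirrors the style of the commands above.

```shell
# Hypothetical helper: count the failed units listed on stdin.
# Expects the output of `systemctl --failed --no-legend --plain`,
# which prints one failed unit per line.
count_failed_units() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Example (assumed invocation, run on a single alert host):
#   systemctl --failed --no-legend --plain | count_failed_units
# Or list the failed units on all alert hosts at once:
#   sudo cumin 'alert*' 'systemctl --failed --no-legend --plain'
```

A count of `0` on both new hosts suggests there are no unit failures; Puppet failures still need to be checked separately.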
## Stage 3: Make alert2002 the active alertmanager host
1. Disable [[ https://wikitech.wikimedia.org/wiki/Wikitech-static#Meta-monitoring | meta-monitoring ]] for the alert hosts.
1. SSH as `root` into `wikitech-static.wikimedia.org` with the `metamonitor-key-passphrase`.
1. Comment out the following crontab entries to stop meta-monitoring against both alert hosts:
- `*/2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2001.wikimedia.org`
- `*/2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org`
1. Stop services on the `alert1001` host.
1. Make `alert2002` the active host.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064826 | Gerrit patch #1064826 - alert: Failover from alert1001 to alert2002 ]]
1. Run Puppet on the alert hosts: `sudo cumin 'alert*' 'run-puppet-agent'`
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/dns/+/1065258 | Gerrit patch #1065258 - alert: Resolve alerts DNS queries to alert2002 ]]
1. Update DNS records: `sudo cumin 'dns1004.wikimedia.org' 'sudo -i authdns-update'`
1. Ensure services work as expected.
1. Enable meta-monitoring for the `alert1001` and `alert2002` hosts.
1. SSH as `root` into `wikitech-static.wikimedia.org` with the `metamonitor-key-passphrase`.
1. Uncomment the following crontab entries to enable meta-monitoring for the `alert1001` host, and add meta-monitoring for the `alert2002` host.
- `# */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org`
- `# */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2002.wikimedia.org`
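
The comment and uncomment steps in this stage (and again in Stage 4) can be sketched as a pair of filters over the crontab text. This is a minimal sketch, assuming the entries have the exact shape shown above; `comment_check` and `uncomment_check` are hypothetical helper names, not existing tooling on `wikitech-static`.

```shell
# Hypothetical helpers: read a crontab on stdin, write the edited copy to stdout.

# Comment out the active check_icinga entry for the host given as $1.
comment_check() {
  awk -v host="$1" '
    # Skip lines that are already commented; match the host literally.
    $0 !~ /^ *#/ && index($0, "check_icinga " host) { print "# " $0; next }
    { print }
  '
}

# Re-enable a commented-out check_icinga entry for the host given as $1.
uncomment_check() {
  awk -v host="$1" '
    /^ *#/ && index($0, "check_icinga " host) { sub(/^ *# */, "") }
    { print }
  '
}

# Example (preview only, nothing is installed):
#   crontab -l | comment_check icinga1001.wikimedia.org
# To apply, assuming `crontab -` reads the new crontab from stdin on that host:
#   crontab -l | comment_check icinga1001.wikimedia.org | crontab -
```

Previewing the output before installing it keeps the change reviewable. An entry for a host that has no line yet (such as the new `icinga2002` entry) still has to be added by hand.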
## Stage 4: Make alert1002 the active alertmanager host
1. Disable [[ https://wikitech.wikimedia.org/wiki/Wikitech-static#Meta-monitoring | meta-monitoring ]] for the alert hosts.
1. SSH as `root` into `wikitech-static.wikimedia.org` with the `metamonitor-key-passphrase`.
1. Comment out the following crontab entries to stop meta-monitoring against both alert hosts:
- `*/2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org`
- `*/2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2002.wikimedia.org`
1. Stop services on the `alert2002` host.
1. Make `alert1002` the active host.
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064828 | Gerrit patch #1064828 - alert: Ensure alert1002 is the active alert host ]]
1. Run Puppet on the alert hosts: `sudo cumin 'alert*' 'run-puppet-agent'`
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/dns/+/1063078 | Gerrit patch #1063078 - alert: Resolve alerts DNS queries to alert1002 ]]
1. Update DNS records: `sudo cumin 'dns1004.wikimedia.org' 'sudo -i authdns-update'`
1. Ensure services work as expected.
1. Enable meta-monitoring for the `alert1002` and `alert2002` hosts.
1. SSH as `root` into `wikitech-static.wikimedia.org` with the `metamonitor-key-passphrase`.
1. Uncomment the following crontab entries to enable meta-monitoring for the `alert2002` host, and add meta-monitoring for the `alert1002` host.
- `# */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1002.wikimedia.org`
- `# */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2002.wikimedia.org`
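
After the `authdns-update` steps in Stages 3 and 4, it may help to confirm that the alerts service name resolves to the newly active host. The sketch below uses `dig` and compares the final address returned for two names. `last_addr` and `resolves_to` are hypothetical helpers, and the service name is left as a parameter because this task does not state it.

```shell
# Hypothetical helper: print the last line of `dig +short`, which is the
# address at the end of any CNAME chain.
last_addr() {
  dig +short "$1" A | tail -n 1
}

# Hypothetical helper: succeed if the name in $1 resolves to the same
# address as the host in $2.
resolves_to() {
  want=$(last_addr "$2")
  got=$(last_addr "$1")
  [ -n "$want" ] && [ "$got" = "$want" ]
}

# Example (replace SERVICE_NAME with the alerts service name):
#   resolves_to SERVICE_NAME alert1002.wikimedia.org && echo "points at alert1002"
```

Cached records may keep returning the old address until their TTL expires, so a failed check right after the update is not necessarily an error.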
## Stage 5: Cleanup
1. Update hostnames for alertmanager tests
1. Merge [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1063235 | Gerrit patch #1063235 - alert: Update alertmanager tests hostnames ]]