At the request of service ops, all racking tasks for new hardware also have a sub-task for service ops implementation tracking.
Once parent task T326366 is resolved, the service ops team can proceed with this one.
Topic branch with related patches: https://gerrit.wikimedia.org/r/q/topic:gerrit-bullseye
Step | Action |
---|---|
1 | # schedule and announce downtime |
2 | # shortly before the scheduled downtime: pre-sync data from gerrit1001 to gerrit1003 (steps 3-7) |
3 | # on gerrit1001, as root, in a screen: rsync -avp --delete --bwlimit=100m /var/lib/gerrit2/review_site/ rsync://gerrit1003.wikimedia.org/gerrit-var-lib/ |
4 | # on gerrit1001, as root, in a screen: rsync -avp --delete --bwlimit=100m /srv/gerrit/ rsync://gerrit1003.wikimedia.org/gerrit-data/ |
5 | # on gerrit1003: rsync -avp /srv/gerrit/plugins/lfs/ /srv/gerrit/data/lfs/ |
6 | # on gerrit1003: chown -R gerrit2:gerrit2 /var/lib/gerrit2 |
7 | # on gerrit1003: chown -R gerrit2:gerrit2 /srv/gerrit |
8 | # scheduled downtime begins / IRC announcement |
9 | # on cumin1001: sudo cookbook sre.hosts.downtime -r 'maintenance' -D 30 gerrit1001.wikimedia.org |
10 | # on cumin1001: sudo cookbook sre.hosts.downtime -r 'maintenance' -H 1 gerrit1003.wikimedia.org |
11 | # on icinga.wikimedia.org - manually schedule downtime for the checks connected to virtual server "gerrit.wikimedia.org". The cookbook does not find this virtual host. |
12 | # on gerrit1003: disable puppet; stop gerrit? (sudo disable-puppet 'gerrit maintenance'; systemctl stop gerrit) |
13 | # merge DNS change that removes gerrit-new and switches IP of gerrit.wikimedia.org - in web UI of gerrit(-old) |
14 | # run authdns-update on ns0.wikimedia.org, see the diff but do NOT commit yet |
15 | # on gerrit1001: disable puppet; stop gerrit! (sudo disable-puppet 'gerrit maintenance'; systemctl stop gerrit) |
16 | # on gerrit1001, as root, in a screen: rsync -avp --delete --bwlimit=100m /var/lib/gerrit2/review_site/ rsync://gerrit1003.wikimedia.org/gerrit-var-lib/ |
17 | # on gerrit1001, as root, in a screen: rsync -avp --delete --bwlimit=100m /srv/gerrit/ rsync://gerrit1003.wikimedia.org/gerrit-data/ |
18 | # on gerrit1003: rsync -avp /srv/gerrit/plugins/lfs/ /srv/gerrit/data/lfs/ |
19 | # on gerrit1003: chown -R gerrit2:gerrit2 /var/lib/gerrit2 |
20 | # on gerrit1003: chown -R gerrit2:gerrit2 /srv/gerrit |
21 | # on gerrit1003: start gerrit |
22 | # say "yes" to authdns-update and actually merge DNS change that removes gerrit-new and switches IP of gerrit.wikimedia.org |
23 | # wait 5 minutes |
24 | # ..test https (https://gerrit.wikimedia.org in browser) |
25 | # ..test ssh (e.g. ssh dzahn@gerrit.wikimedia.org -p 29418) |
26 | # announce downtime is over |
27 | # ensure gerrit1001 has puppet disabled and/or services are masked |
28 | # grace period (how long?) |
29 | # decom old host -> https://phabricator.wikimedia.org/T336427 |
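
Steps 3-7 and steps 16-20 are the same sync pass run twice: once as a pre-sync while Gerrit is still up, and once after Gerrit has been stopped on gerrit1001. They could be wrapped in small helpers so both passes run identical commands. The following is a minimal sketch under stated assumptions, not tooling that exists for this migration: the names `run`, `sync_from_old`, `fixup_on_new` and the `DRY_RUN` variable are hypothetical, and by default the helpers only print the commands from the checklist instead of executing them.

```shell
#!/bin/sh
# Sketch only: wraps the repeated sync pass (steps 3-7 and 16-20).
# DRY_RUN, run, sync_from_old and fixup_on_new are hypothetical names,
# not part of any existing tooling.

DRY_RUN="${DRY_RUN:-1}"              # default: print commands, do not run them
NEW_HOST="gerrit1003.wikimedia.org"  # target host, taken from the checklist

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Meant to run on gerrit1001, as root, in a screen (steps 3-4 and 16-17).
sync_from_old() {
    run rsync -avp --delete --bwlimit=100m /var/lib/gerrit2/review_site/ "rsync://${NEW_HOST}/gerrit-var-lib/"
    run rsync -avp --delete --bwlimit=100m /srv/gerrit/ "rsync://${NEW_HOST}/gerrit-data/"
}

# Meant to run on gerrit1003 (steps 5-7 and 18-20).
fixup_on_new() {
    run rsync -avp /srv/gerrit/plugins/lfs/ /srv/gerrit/data/lfs/
    run chown -R gerrit2:gerrit2 /var/lib/gerrit2
    run chown -R gerrit2:gerrit2 /srv/gerrit
}
```

Sourcing this file and calling `sync_from_old` on gerrit1001 (or `fixup_on_new` on gerrit1003) prints the commands for review; setting `DRY_RUN=0` before the call would execute them instead.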