Authored by Kormat, Apr 29 2020, 9:36 AM
Buster + 10.4 epic: https://phabricator.wikimedia.org/T250666
* Log reimage: `!log reimaging HOST to buster T250666`
* Disable notifications for host (e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/592876)
** Revert at the end, once Icinga is fully green
* Allow the host to PXE install, but pause at the partitioning step. (e.g. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/592884)
** Run puppet agent on apt1001 and apt2001
** Revert once the host is back in tendril
* ''Q: could these 2 steps be combined, so it's only a single revert?''
* Set host to install as buster (e.g. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/592887)
* Depool host (potentially from multiple sections)
* `systemctl stop mariadb && umount /srv`
* Take a copy of the `/srv` entry from `/etc/fstab`, and keep it off the host, since `/` gets wiped during the reimage
* Connect to the host's mgmt interface
* Attach to the serial console
** On Dells (`/admin1->` prompt), use `console com2`. The escape sequence is `^\`.
* From a cumin host, inside `screen`: `sudo -E wmf-auto-reimage --no-verify -p TICKET FQDN`
* When the install reaches the partitioning step, select "manual", format the 40G partition as ext4, and set its mountpoint to `/`.
** The partitioner should only wipe `/` and `swap`. If it would wipe anything else, you have made a mistake.
* Tendril doesn't like this in-place upgrade: it requires a disable + drop + add + enable after the upgrade, otherwise the Act. (last contact) field doesn't get updated. See [[known issues|https://wikitech.wikimedia.org/wiki/MariaDB#Stretch_+_10.1_-%3E_Buster_+_10.4_known_issues]].
** Check out the [[tendril repo|https://gerrit.wikimedia.org/r/#/admin/projects/operations/software/tendril]] on a cumin host. (Clone over HTTPS, as your SSH key isn't available there.)
** Remove host from tendril:<div>
```
./tendril-host-drop.sh HOST PORT | sudo -i mysql -h db1115.eqiad.wmnet tendril
```
</div>
** After the upgrade, re-add the host to tendril:<div>
```
./tendril-host-add.sh HOST PORT ~/.my.cnf.tendril tendril | sudo -i mysql -h db1115.eqiad.wmnet tendril
./tendril-host-enable.sh HOST PORT | sudo -i mysql -h db1115.eqiad.wmnet tendril
```
</div>
* Wait for host to finish reimaging
* Check that wmf-mariadb104 is installed.
* Re-add the saved `/srv` entry to `/etc/fstab`
* Mount `/srv`
* Check whether the contents of `/srv` are owned by the `mysql` user; if not, fix the ownership.
* Prevent replication from starting automatically while `mysql_upgrade` runs: `systemctl set-environment MYSQLD_OPTS="--skip-slave-start"`
** Does not need to be reverted.
* Start mariadb: `systemctl start mariadb`
* Check the service logs with `journalctl -xe -u mariadb`. The only errors should be about internal tables, which `mysql_upgrade` will fix.
* Run `mysql_upgrade`
* Start slave: `mysql -e "start slave"`
* Check slave status: `mysql -e "show slave status\G"`
* Restart the prometheus mysql exporter (see [[https://phabricator.wikimedia.org/T247290#5956794]]).
* Re-add host to tendril (see above)
* Once it's back in tendril, revert the partman change.
* Wait for Icinga to be fully green, then revert the notifications change.
* Wait until replication lag is fully gone, then repool the server gradually. (If it's in codfw, it can go straight to full repooling.)
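
For the step that copies the `/srv` entry from `/etc/fstab`, here is a minimal sketch of extracting it, assuming a standard whitespace-separated fstab with the mountpoint in the second field. `save_srv_fstab_entry` is a hypothetical name, not part of any existing tooling.

```shell
#!/bin/bash
# Hypothetical helper (not part of any WMF tooling): print the fstab entry
# that mounts /srv, so it can be kept off the host before / is wiped.
# Assumes the usual whitespace-separated fstab layout, mountpoint in field 2.
save_srv_fstab_entry() {
    local fstab="${1:-/etc/fstab}"
    # Skip comment lines; print lines whose mountpoint is exactly /srv
    # (entries such as /srv/backup are not matched).
    awk '!/^[[:space:]]*#/ && $2 == "/srv"' "$fstab"
}
```

Run it after `systemctl stop mariadb && umount /srv` and keep the output somewhere that survives the reimage (for example on the cumin host).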
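
For the step that checks that wmf-mariadb104 is installed, a small sketch using `dpkg-query`, assuming a Debian host. `pkg_installed` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named package is fully installed,
# i.e. dpkg's status for it reads "install ok installed".
pkg_installed() {
    local status
    status=$(dpkg-query --show --showformat='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}
```

Usage: `pkg_installed wmf-mariadb104 || echo "wmf-mariadb104 missing"`. Comparing the full status string, rather than relying only on the exit code of `dpkg-query`, also rejects packages that were removed but left their config files behind.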
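
For the steps that re-add `/srv` to `/etc/fstab` and check ownership of its contents, a sketch with two hypothetical helpers. It assumes the entry saved before the reimage is available as a single line, and that everything under `/srv` should belong to `mysql`.

```shell
#!/bin/bash
# Hypothetical helpers for the post-reimage /srv steps.

# Append the saved /srv entry to fstab unless a /srv line is already present.
restore_srv_fstab_entry() {
    local entry="$1" fstab="${2:-/etc/fstab}"
    if awk '!/^[[:space:]]*#/ && $2 == "/srv" { found = 1 } END { exit !found }' "$fstab"; then
        echo "/srv already present in $fstab, not adding" >&2
        return 0
    fi
    # If the file does not end with a newline, add one so the entry
    # starts on its own line.
    if [ -n "$(tail -c 1 "$fstab")" ]; then
        echo >> "$fstab"
    fi
    printf '%s\n' "$entry" >> "$fstab"
}

# Return 0 if everything under DIR is owned by USER, 1 if something is not
# (printing the first offender), 2 if find itself failed (e.g. unknown user).
check_owner() {
    local dir="${1:-/srv}" user="${2:-mysql}" bad
    bad=$(find "$dir" ! -user "$user" -print -quit) || return 2
    if [ -n "$bad" ]; then
        echo "not owned by $user: $bad" >&2
        return 1
    fi
}
```

Usage: `restore_srv_fstab_entry "$saved_entry"` (where `$saved_entry` is a hypothetical variable holding the saved line), then `mount /srv`, then `check_owner /srv mysql`. If the check reports a file owned by someone else, one possible fix, under the same assumption, is `chown -R mysql:mysql /srv`.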
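
For the step that checks slave status, a sketch that turns the `show slave status\G` output into a pass/fail result, assuming MariaDB's usual right-aligned `Field: value` layout. `replication_running` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read `show slave status\G` output on stdin and succeed
# only if both the IO and SQL replication threads report "Yes".
replication_running() {
    awk -F': ' '
        # Field names carry leading spaces in \G output; match them exactly
        # so Slave_SQL_Running_State is not mistaken for Slave_SQL_Running.
        $1 ~ /^[[:space:]]*Slave_IO_Running$/  { io = $2 }
        $1 ~ /^[[:space:]]*Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

Usage: `mysql -e "show slave status\G" | replication_running && echo "replication OK"`. Empty output (host not configured as a replica) counts as a failure.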
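
For the final step, a polling sketch that waits until `Seconds_Behind_Master` reads `0`, treating that as "lag fully gone". The function name, interval and retry count are hypothetical, and the status command is passed as arguments so the loop can be tried against a stub.

```shell
#!/bin/bash
# Hypothetical helper: run the given status command every INTERVAL seconds,
# up to TRIES times, until its Seconds_Behind_Master field is 0.
# Returns 0 once lag is 0, 1 if it never got there.
wait_for_zero_lag() {
    local interval="$1" tries="$2" lag i
    shift 2
    for ((i = 0; i < tries; i++)); do
        lag=$("$@" | awk -F': ' '$1 ~ /^[[:space:]]*Seconds_Behind_Master$/ { print $2 }')
        [ "$lag" = "0" ] && return 0
        echo "Seconds_Behind_Master=${lag:-unknown}; retrying in ${interval}s" >&2
        sleep "$interval"
    done
    return 1
}
```

Usage: `wait_for_zero_lag 30 120 mysql -e "show slave status\G"` polls every 30 seconds for up to an hour. A value of `NULL` means replication is not running, so the loop keeps waiting until it gives up.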
