In T190568 both [[ https://wikitech.wikimedia.org/wiki/Phab1001 | phab1001 ]] and [[ https://wikitech.wikimedia.org/wiki/Phab2001 | phab2001 ]] have been reimaged with [[ https://wiki.debian.org/DebianBuster | buster ]].
Currently the temporary server phab1003 is the production Phabricator server and still runs stretch.
Set a maintenance window and switch over from [[ https://wikitech.wikimedia.org/wiki/Phab1003 | phab1003 ]] to phab1001.
Some time after the switch, shut down phab1003 and decommission it (return it to dcops).
---
https://etherpad.wikimedia.org/p/Phabricator-migration-20191203
---
https://phabricator.wikimedia.org/T238956
switch prod Phabricator from phab1003 to phab1001
2019-12-03
branches:
phab-buster (the actual switch)
https://gerrit.wikimedia.org/r/q/topic:%22phab-buster%22+(status:open%20OR%20status:merged)
phab1003-decom: (later)
https://gerrit.wikimedia.org/r/q/topic:%22phab1003-decom%22+(status:open%20OR%20status:merged)
phab-buster:
PREPARE:
site/phabricator: apply phab role on phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/536712 (MERGED)
phabricator: support buster with PHP 7.3 packages - https://gerrit.wikimedia.org/r/c/operations/puppet/+/541666 (MERGED)
phabricator::httpd: support stretch/buster with/without php-fpm - https://gerrit.wikimedia.org/r/c/operations/puppet/+/541930 (MERGED)
phabricator: install s-nail instead of heirloom-mailx on buster - https://gerrit.wikimedia.org/r/c/operations/puppet/+/541967 (MERGED)
phabricator: install s-nail instead of heirloom-mailx on any distro - https://gerrit.wikimedia.org/r/c/operations/puppet/+/542191 (MERGED)
log downtime in icinga - DONE - scheduled downtime for host and all services on phab1003 for the next 2 days. MORE IS NEEDED than the host downtime: the PAGING service https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=phabricator.wikimedia.org has also been downtimed
stop phd: run puppet agent --disable "https://phabricator.wikimedia.org/T238956" and then "systemctl stop phd" on both phab1001 and phab1003
change "phabricator_failover_server" in Hiera common.yaml to the new server so that it is allowed to rsync repo data from the active server - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554390
rsync /srv/repos from phab1003 to phab1001 IN PROGRESS
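A minimal sketch of how the rsync invocation for the repo copy could be assembled, so the same command can be previewed with --dry-run before the real run and reused later with --delete. Assumptions, not taken from the notes: the helper name `build_rsync_command` is hypothetical, the short hostname is used as-is, and the `host:/path` remote-shell syntax is used for illustration. The real setup goes through rsyncd, whose module name is not recorded here.

```python
import shlex


def build_rsync_command(source_host, path="/srv/repos", delete=False, dry_run=False):
    """Return the argv list for pulling `path` from source_host onto the local host.

    Hypothetical helper: the transport (host:/path over a remote shell) is an
    assumption; an rsyncd module would use host::module/ instead.
    """
    base = path.rstrip("/")
    # Trailing slashes make rsync copy the directory contents instead of
    # creating a nested directory on the destination.
    src = f"{source_host}:{base}/"
    dst = f"{base}/"
    cmd = ["rsync", "--archive"]
    if delete:
        # Remove files on the destination that no longer exist on the source.
        cmd.append("--delete")
    if dry_run:
        # Show what would be transferred without changing anything.
        cmd.append("--dry-run")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Run on phab1001: preview the final sync from phab1003.
    print(shlex.join(build_rsync_command("phab1003", delete=True, dry_run=True)))
```

Previewing with --dry-run first is a cheap guard against a --delete run pointed in the wrong direction.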
SWITCH:
[x] rsync /srv/repos from phab1003 to phab1001 again for good measure, run it with --delete as well and ensure both sides have the same size
[x] verify code in /srv/phab is up to date and both servers are on the same git tag
[x] dumps/phabricator: switch dumps host from phab1003 to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552593 (ON DUMPS SERVERS)
[x] put phab on phab1003 in maintenance mode
[x] phabricator: switch "phabricator_server" from phab1003 to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/554397 (ON PHAB HOST), influences rsync ferm rules
[x] phabricator: switch "active server" from phab1003 to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552591 (on PHAB HOST), influences icinga monitoring, dumps enabled, aphlict unless otherwise disabled, rsyncd setup
[x] switch discovery record for phabricator to 1001 for ATS - https://gerrit.wikimedia.org/r/c/operations/dns/+/552598 (DNS, ATS, the switch of https backend)
[x] varnish: switch phabricator backend to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552595 (varnish, probably not used, now ATS)
[x] phabricator: switch mail destination to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552597 (ON MAIL SERVER)
[x] depool phab1003-vcs in confctl
[x] phabricator/conftool: switch phab-vcs (git-ssh) service to phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552589 (in CONFTOOL, git-ssh requires conftool change, which changes pybal config, may cause Icinga alerts)
[x] Phabricator: install python3-pygments instead of python-pygments - https://gerrit.wikimedia.org/r/#/c/554401/
[x] remove Icinga downtime for "phabricator.wikimedia.org" meta service, keep phab1003 downtimes extended where needed
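For the "ensure both sides have the same size" check in the SWITCH list, one possible approach is to count files and sum their byte sizes on each host and compare the two results. This is a sketch only: `tree_summary` is a hypothetical helper, not something used during the migration, and it compares content sizes rather than on-disk block usage.

```python
import os


def tree_summary(root):
    """Return (file_count, total_bytes) for regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so a link is not counted with its target's size.
            if os.path.islink(full):
                continue
            file_count += 1
            total_bytes += os.path.getsize(full)
    return file_count, total_bytes
```

Running `tree_summary("/srv/repos")` on phab1003 and on phab1001 after the final rsync should give identical tuples. Summing file sizes avoids the differences that block-level tools can report when the two hosts have different filesystems or allocation patterns.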
phab1003-decom:
DECOM:
phabricator: remove phab1003 from list of phab servers - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552592
remove service IPs and IPv6 for phab1003 - https://gerrit.wikimedia.org/r/c/operations/dns/+/552599
remove production IPs for phab1003 - https://gerrit.wikimedia.org/r/c/operations/dns/+/552601
mariadb: remove grants for users on phab1003 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552607
site: turn phab1003 into a spare::system - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552603
mtail: stop using phab1003 for tests, use phab1001 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552604 (part of SWITCH? No, but superseded by https://gerrit.wikimedia.org/r/c/operations/puppet/+/554403)
install_server: remove phab1003 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/552609
ABANDON?!
phabricator-new: https://gerrit.wikimedia.org/r/c/operations/puppet/+/551286 (abandoned)
phabricator-new: https://gerrit.wikimedia.org/r/c/operations/dns/+/551284 (abandoned)
DAY 2 (We have to switch back again to phab1003, change a BIOS setting on phab1001, reimage phab1001 and finally switch over again :/)
SIMPLIFY SETUP FOR MIGRATION (rsync, hiera, puppet)
https://gerrit.wikimedia.org/r/c/operations/puppet/+/554610 (merged)
https://gerrit.wikimedia.org/r/c/operations/puppet/+/554628 (merged)
https://gerrit.wikimedia.org/r/c/operations/puppet/+/554643 (merged)
SWITCH BACK to phab1003:
[x] schedule icinga downtimes
[x] switch "phabricator_server" for firewall rules for http/smtp and enable dumps https://gerrit.wikimedia.org/r/c/operations/puppet/+/554644
[x] revert: discovery record (DNS, ATS) https://gerrit.wikimedia.org/r/c/operations/dns/+/554589
[x] revert: dumps sync source https://gerrit.wikimedia.org/r/c/operations/puppet/+/554592
[x] revert: varnish https://gerrit.wikimedia.org/r/c/operations/puppet/+/554590
[x] revert: mail routing https://gerrit.wikimedia.org/r/c/operations/puppet/+/554591
[x] change the BIOS disk controller setting from ATA mode to AHCI on phab1001
[x] reimage phab1001
[x] rsync /srv/repos: pull from phab1003 onto phab1001 with --delete
[x] reboot phab1001 to clear "microcode vulns not fixed" Icinga alert
[x] rsync again
[x] verify code in /srv/phab is up to date and both servers are on the same git tag
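The "same git tag" check above could be done by printing the tag at HEAD on each server and comparing the results. A rough sketch, with assumptions: both function names are hypothetical, and the exact checkout directory beneath /srv/phab is not recorded in these notes, so the path passed in is a placeholder.

```python
def git_tag_command(checkout):
    """Command that prints the tag pointing at HEAD; git exits non-zero if HEAD is untagged."""
    return ["git", "-C", checkout, "describe", "--tags", "--exact-match"]


def tags_match(tags_by_host):
    """Return True when every host reported the same non-empty tag."""
    cleaned = {host: tag.strip() for host, tag in tags_by_host.items()}
    # An empty mapping or an empty tag means the check could not be completed.
    if not cleaned or any(tag == "" for tag in cleaned.values()):
        return False
    return len(set(cleaned.values())) == 1
```

The command is run once per host (for example over ssh) and the captured output is passed to `tags_match` keyed by hostname. Treating a missing tag as a failure keeps an untagged checkout from passing the comparison by accident.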
FINAL SWITCH BACK TO PHAB1001
[x] schedule downtimes in Icinga
[] stop puppet and phd on both servers
[x] rsync /srv/repos one more time
[] put phab1003 into readonly mode
[x] revert: switch "phabricator_server"
[x] revert:revert: discovery record
[x] revert:revert: dumps sync source
[x] revert:revert: varnish
[x] revert:revert: mail routing
[] re-enable puppet on both servers
[x] restart ssh-phab service to make it listen on IPv6 (it did not, because puppet starts the service before adding the v6 IP to the interface); this cleared Icinga alerts
[x] delete stale confd files on puppetmaster to clear more Icinga alerts about confd template compilation failing (the reimage script crashed before it could delete them)
[x] check status of git-ssh (which was not switched twice) https://gerrit.wikimedia.org/r/c/operations/puppet/+/554957 (!), remove the lo:LVS IPs (v4 and v6!) from the interface on phab1003, restart ssh-phab
[x] make phd run on correct server https://gerrit.wikimedia.org/r/c/operations/puppet/+/554960 to avoid breakage of repos
[] remove downtimes in icinga