Blockers:
- T144255 Migrate tools and misc data to labstore1004
- T144633 Setup monitoring for secondary HA cluster
- T127567 revise/fix labstore replicate backup jobs
- T146153 Performance test new secondary labstore HA cluster
MIGRATION PLAN
Prior to the maintenance window:
[x] Set up NFS kernel server on labstore1004/5 (DRBD Primary) [DONE]
[x] Define tools as a labstore-secondary mount [DONE]
[] Mount tools from labstore1001 and labstore-secondary simultaneously (need to add tools as a new mount name in nfs-mounts.yaml)
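One way to confirm that a host really has both shares mounted before the window is to inspect its mount table. The following is only a minimal sketch: the helper names are made up for illustration, and the sample server names, export paths, and the assumption that the labstore1001 share is the one mounted at /data/project are hypothetical rather than taken from the actual setup.

```python
def nfs_mount_points(mounts_text):
    """Return {mount_point: source} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            result[fields[1]] = fields[0]
    return result


def missing_mounts(mounts_text, required):
    """Return the required mount points that are not currently NFS-mounted."""
    mounted = nfs_mount_points(mounts_text)
    return [path for path in required if path not in mounted]


# Hypothetical sample; on a real host, read the text from /proc/mounts instead.
SAMPLE = """\
labstore1001:/exports/tools /data/project nfs4 rw,noatime 0 0
labstore-secondary:/exports/tools /mnt/nfs/labstore-secondary-tools/project nfs4 rw,noatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

# Assumed mount points for the old and new shares during the overlap period.
REQUIRED = ["/data/project", "/mnt/nfs/labstore-secondary-tools/project"]

if __name__ == "__main__":
    print(missing_mounts(SAMPLE, REQUIRED))
```

An empty list means both shares are visible; any path returned is one the host still needs to mount.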
During migration:
[] Announce the start of the migration on the mailing lists and IRC channel
[] Silence shinken alarms
[] Disable puppet across tools
[] K8S Master - Unmount NFS, stop processes that depend on NFS
[] Grid Master/Shadow - Stop the grid master process
[] Deploy webservice Debian package that sends logs to /dev/null (T149946)
[] Restart all tools-webgrid* nodes to apply new webservice package
[] Make tools share on labstore1001 read-only
[] Take a snapshot and run a final sync of the tools share
[] Run nfs-exportd on labstore-secondary to make sure the mount is exported to all tools hosts (this comes with nfs-manage up)
[] Merge Gerrit patch that removes the mount from labstore1001 (by removing its definition from nfs-mounts.yaml) and, on tools hosts, symlinks /data/project to /mnt/nfs/labstore-secondary-tools/project and /home to /mnt/nfs/labstore-secondary-tools/home
[] Enable puppet on a few tools hosts at a time and roll out the above patch to each node by running puppet. Reboot each exec node after running puppet.
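The "few hosts at a time" rollout in the last step could be planned with something like the sketch below. It only computes the order of work and which hosts need a reboot; it does not run puppet or touch any host. The function names, the batch size, the sample host names, and the "tools-exec" prefix used to recognise exec nodes are all assumptions made for illustration.

```python
def batches(hosts, size):
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rollout_plan(hosts, size, exec_prefix="tools-exec"):
    """Return (batch_number, host, needs_reboot) steps in rollout order.

    A host is flagged for reboot when its name starts with `exec_prefix`
    (an assumed naming convention for exec nodes).
    """
    plan = []
    for number, group in enumerate(batches(hosts, size), start=1):
        for host in group:
            plan.append((number, host, host.startswith(exec_prefix)))
    return plan


if __name__ == "__main__":
    # Hypothetical host names.
    sample_hosts = ["tools-exec-01", "tools-webgrid-01", "tools-exec-02"]
    for step in rollout_plan(sample_hosts, size=2):
        print(step)
```

Keeping the batches small limits how many hosts are affected if the patch misbehaves on the first nodes.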
Post migration:
[] Follow up on the mailing lists and IRC
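Once the patch has been rolled out, the /data/project and /home symlinks named in the migration steps could be spot-checked on a host. This is a rough sketch, not part of the plan itself: the helper name and the `root` parameter (used only to try the check against a scratch directory) are hypothetical, and it assumes the links point directly at the labstore-secondary mount paths.

```python
import os
import tempfile

# Link path -> expected target, taken from the migration steps.
EXPECTED_LINKS = {
    "/data/project": "/mnt/nfs/labstore-secondary-tools/project",
    "/home": "/mnt/nfs/labstore-secondary-tools/home",
}


def wrong_links(expected, root=""):
    """Return link paths that are not symlinks to their expected target.

    `root` is prepended to each link path so the check can be exercised
    in a scratch directory; leave it empty to check the real filesystem.
    """
    bad = []
    for link, target in expected.items():
        path = root + link
        if not os.path.islink(path) or os.readlink(path) != target:
            bad.append(link)
    return sorted(bad)


if __name__ == "__main__":
    # Demonstration in a temporary directory with only one link in place.
    scratch = tempfile.mkdtemp()
    os.makedirs(scratch + "/data")
    os.symlink(EXPECTED_LINKS["/data/project"], scratch + "/data/project")
    print(wrong_links(EXPECTED_LINKS, scratch))
```

The check compares the link target as text, so it works even where the NFS mount itself is not available.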