VRTS Switchover Process
Open, Medium, Public

Description

What will the VRTS failover process look like?

These are rough, unrefined ideas that came up in a meeting:

  • Announce Downtime: IRC etc.
  • Swap Configs
  • Database??
  • Update DNS

https://wikitech.wikimedia.org/wiki/VRT_System/Failover
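
As a rough illustration only, the bullet points above could be modelled as an ordered checklist that runs in dry-run mode by default. Everything here is an assumption made for the sketch: the `Step` class, `run_switchover`, and the placeholder `noop` actions are hypothetical names, and none of them correspond to an existing cookbook or script. The database step is kept as a placeholder because the list above still marks it as an open question.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One switchover step: a human-readable name and the action to run."""
    name: str
    action: Callable[[], None]


def run_switchover(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Process steps in order and return the names of the steps processed.

    In dry-run mode nothing is executed; each step is only printed.
    Otherwise an exception raised by a step propagates and stops the run,
    so later steps are never attempted after a failure.
    """
    processed = []
    for step in steps:
        if dry_run:
            print(f"[dry-run] would run: {step.name}")
        else:
            step.action()
        processed.append(step.name)
    return processed


def noop() -> None:
    """Placeholder action; the real work for each step is not defined yet."""


# Step order follows the bullet list in the task description.
SWITCHOVER_STEPS = [
    Step("Announce downtime (IRC etc.)", noop),
    Step("Swap configs", noop),
    Step("Database", noop),  # marked "??" in the description: still undecided
    Step("Update DNS", noop),
]

if __name__ == "__main__":
    run_switchover(SWITCHOVER_STEPS)
```

Keeping the steps in a single ordered list makes it easy to review the sequence before anything is executed, and to reorder it once the database question is settled.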

Event Timeline

LSobanski triaged this task as Medium priority. Mar 29 2023, 11:40 AM

Cookbook cookbooks.sre.ganeti.reimage was started by aokoth@cumin1001 for host vrts1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by aokoth@cumin1001 for host vrts1001.eqiad.wmnet with OS bullseye completed:

  • vrts1001 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202305082010_aokoth_3063840_vrts1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed

Change 927749 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] vrts: separate install & ugprade vrts scripts

https://gerrit.wikimedia.org/r/927749

Change 927749 merged by AOkoth:

[operations/puppet@production] vrts: separate install & ugprade vrts scripts

https://gerrit.wikimedia.org/r/927749

Change 928084 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] vrts: post script cleanup & export variables

https://gerrit.wikimedia.org/r/928084

Change 928084 merged by AOkoth:

[operations/puppet@production] vrts: post script cleanup & export variables

https://gerrit.wikimedia.org/r/928084

Change 928133 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] vrts: Fix issue in install script

https://gerrit.wikimedia.org/r/928133

Change 928133 merged by AOkoth:

[operations/puppet@production] vrts: Fix issue in install script

https://gerrit.wikimedia.org/r/928133

Change 928136 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] vrts: use variables in rsyncquickdatacopy

https://gerrit.wikimedia.org/r/928136