
Switchover es5 master (es1024 -> es1023)
Closed, Resolved, Public

Description

When: Anytime; writes will be disabled during the switchover

Checklist:

NEW primary: es1023
OLD primary: es1024

  • Check configuration differences between new and old primary:
sudo pt-config-diff --defaults-file /root/.my.cnf h=es1024.eqiad.wmnet h=es1023.eqiad.wmnet
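pt-config-diff reports the variables whose values differ between the two servers. The sketch below shows that comparison in Python. It assumes both servers' variables have already been fetched into dictionaries (for example from `SHOW GLOBAL VARIABLES`). The function name and sample values are hypothetical, and the sketch does not reproduce pt-config-diff's own output format.

```python
from __future__ import annotations


def config_differences(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {variable: (old_value, new_value)} for every variable that
    differs, including variables present on only one server (None = absent)."""
    diffs: dict[str, tuple[str | None, str | None]] = {}
    # Walk the union of variable names so one-sided variables are reported too.
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs


# Hypothetical values, not taken from es1023/es1024.
old_vars = {"max_connections": "5000", "read_only": "OFF"}
new_vars = {"max_connections": "5000", "read_only": "ON"}
print(config_differences(old_vars, new_vars))  # {'read_only': ('OFF', 'ON')}
```

An empty result means the two configurations match for every variable that was loaded.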

Failover prep:

  • Silence alerts on all hosts:
sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover es5 T317739" 'A:db-section-es5'
  • Set the NEW primary to weight 0 (and depool it from the API or vslow/dump groups if it is a member of them).
sudo dbctl instance es1023 set-weight 0
sudo dbctl config commit -m "Set es1023 with weight 0 T317739"
  • Topology changes: move all replicas under the NEW primary:
sudo db-switchover --timeout=25 --only-slave-move es1024 es1023
  • Disable puppet on both hosts:
sudo cumin 'es1023* or es1024*' 'disable-puppet "primary switchover T317739"'
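The prep commands depend only on the section, the OLD and NEW hosts and the task ID, so they can be rendered for another switchover. The sketch below assumes the same command shapes apply to other sections. `prep_commands` is a hypothetical helper that only builds strings and runs nothing.

```python
def prep_commands(section: str, old: str, new: str, task: str) -> list[str]:
    """Render the 'Failover prep' commands of this checklist (strings only)."""
    return [
        # Downtime every host in the section for one hour.
        f'sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover {section} {task}" \'A:db-section-{section}\'',
        # Take the NEW primary out of read traffic.
        f"sudo dbctl instance {new} set-weight 0",
        f'sudo dbctl config commit -m "Set {new} with weight 0 {task}"',
        # Move all replicas under the NEW primary.
        f"sudo db-switchover --timeout=25 --only-slave-move {old} {new}",
        # Keep puppet from undoing manual changes during the switch.
        f"sudo cumin '{new}* or {old}*' 'disable-puppet \"primary switchover {task}\"'",
    ]


for cmd in prep_commands("es5", "es1024", "es1023", "T317739"):
    print(cmd)
```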

Failover:

  • Log the failover:
!log Starting es5 eqiad failover from es1024 to es1023 T317739
  • Switch primaries:
sudo db-switchover --skip-slave-move es1024 es1023
echo "===== es1024 (OLD)"; sudo db-mysql es1024 -e 'show slave status\G'
echo "===== es1023 (NEW)"; sudo db-mysql es1023 -e 'show slave status\G'
  • Promote the NEW primary in dbctl and remove read-only:
sudo dbctl --scope eqiad section es5 set-master es1023
sudo dbctl config commit -m "Promote es1023 to es5 primary T317739"
  • Restart puppet on both hosts:
sudo cumin 'es1023* or es1024*' 'run-puppet-agent -e "primary switchover T317739"'
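After `db-switchover`, the `show slave status\G` output for the OLD primary should show it replicating from the NEW one, with both replication threads running. The sketch below is a hypothetical checker for that vertical-format output. It assumes `Master_Host` holds a host name or FQDN whose first label is the NEW primary. It makes no assertion about the NEW primary's own replication state.

```python
def parse_slave_status(output: str) -> dict[str, str]:
    """Parse vertical-format SHOW SLAVE STATUS output into {field: value}."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" banner.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only; values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replica_problems(old_primary_status: str, new_primary: str) -> list[str]:
    """List reasons the OLD primary does not look like a healthy replica of
    the NEW primary (an empty list means it looks fine)."""
    status = parse_slave_status(old_primary_status)
    if not status:
        return ["no replication configured"]
    problems = []
    master = status.get("Master_Host", "").split(".")[0]
    if master != new_primary:
        problems.append(f"replicating from {master!r}, expected {new_primary!r}")
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread)!r}")
    return problems


# Hypothetical sample output, not captured from es1024.
sample = """*************************** 1. row ***************************
                  Master_Host: es1023.eqiad.wmnet
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""
print(replica_problems(sample, "es1023"))  # []
```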

Clean up tasks:

  • Clean up heartbeat table(s).
sudo db-mysql es1023 heartbeat -e "delete from heartbeat where file like 'es1024%';"
  • Change events for the query killer:
events_coredb_master.sql on the new primary es1023
events_coredb_slave.sql on the new replica es1024
sudo dbctl instance es1024 set-candidate-master --section es5 true
sudo dbctl instance es1023 set-candidate-master --section es5 false
(dborch1001): sudo orchestrator-client -c untag -i es1023 --tag name=candidate
(dborch1001): sudo orchestrator-client -c tag -i es1024 --tag name=candidate
sudo db-mysql db1115 zarcillo -e "select * from masters where section = 'es5';"
  • (If needed): Depool es1024 for maintenance.
sudo dbctl instance es1024 depool
sudo dbctl config commit -m "Depool es1024 T317739"
  • Change es1024's weight to match es1023's previous weight:
sudo dbctl instance es1024 edit
  • Update/resolve this ticket.
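The clean-up steps swap the roles of the two hosts. es1023 gets the master events, and es1024 gets the replica events and becomes the candidate primary. The sketch below shows that mapping using hypothetical helper names and an assumed host-name pattern. It rejects unexpected host names before interpolating them into the heartbeat `delete`, because the statement is built by string formatting.

```python
import re

# Assumed host naming pattern (letters followed by digits, e.g. es1023).
HOST_RE = re.compile(r"[a-z]+[0-9]+")


def cleanup_plan(section: str, old: str, new: str) -> dict[str, object]:
    """Build the clean-up actions for a switchover from `old` to `new`."""
    for host in (old, new):
        if not HOST_RE.fullmatch(host):
            raise ValueError(f"unexpected host name: {host!r}")
    return {
        # Remove heartbeat rows whose `file` starts with the OLD primary's name.
        "heartbeat_sql": f"delete from heartbeat where file like '{old}%';",
        # Query-killer events: master events on NEW, replica events on OLD.
        "events": {new: "events_coredb_master.sql", old: "events_coredb_slave.sql"},
        # OLD primary becomes the candidate; NEW primary stops being one.
        "dbctl": [
            f"sudo dbctl instance {old} set-candidate-master --section {section} true",
            f"sudo dbctl instance {new} set-candidate-master --section {section} false",
        ],
        # These run on dborch1001.
        "orchestrator": [
            f"sudo orchestrator-client -c untag -i {new} --tag name=candidate",
            f"sudo orchestrator-client -c tag -i {old} --tag name=candidate",
        ],
    }


plan = cleanup_plan("es5", "es1024", "es1023")
print(plan["heartbeat_sql"])
```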

Event Timeline

Marostegui updated the task description.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui renamed this task from Switchover es4 master (es1024 -> es1023) to Switchover es5 master (es1024 -> es1023). Sep 14 2022, 7:43 AM

Change 832152 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/mediawiki-config@master] db-production.php: Disable writes in es5

https://gerrit.wikimedia.org/r/832152

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:44:17Z] <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:44:33Z] <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739

Change 832152 merged by jenkins-bot:

[operations/mediawiki-config@master] db-production.php: Disable writes in es5

https://gerrit.wikimedia.org/r/832152

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:46:18Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Set es1023 with weight 0 T317739', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json

Change 832153 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Promote es1023 to es5 master

https://gerrit.wikimedia.org/r/832153

Change 832154 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Update es5-master CNAME

https://gerrit.wikimedia.org/r/832154

Change 832153 merged by Marostegui:

[operations/puppet@production] mariadb: Promote es1023 to es5 master

https://gerrit.wikimedia.org/r/832153

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:50:02Z] <marostegui@deploy1002> Synchronized wmf-config/db-production.php: Disable writes on es5 T317739 (duration: 04m 13s)

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:55:01Z] <marostegui> Starting es5 eqiad failover from es1024 to es1023 T317739

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:55:50Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Promote es1023 to es5 primary T317739', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json

Change 832154 merged by Marostegui:

[operations/dns@master] wmnet: Update es5-master CNAME

https://gerrit.wikimedia.org/r/832154

Mentioned in SAL (#wikimedia-operations) [2022-09-14T07:57:23Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1024 T317739', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json

Mentioned in SAL (#wikimedia-operations) [2022-09-14T08:02:53Z] <marostegui@deploy1002> Synchronized wmf-config/db-production.php: Enable writes on es5 T317739 (duration: 03m 38s)
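The two `Synchronized wmf-config/db-production.php` SAL entries give a rough figure for how long es5 writes were disabled. Each timestamp marks the end of a multi-minute sync, so the difference is only approximate. The sketch below simply computes it from the logged times.

```python
from datetime import datetime

# SAL timestamps are UTC in ISO form with a literal trailing "Z".
SAL_FMT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two SAL timestamps."""
    delta = datetime.strptime(end, SAL_FMT) - datetime.strptime(start, SAL_FMT)
    return int(delta.total_seconds())


# "Disable writes on es5" sync finished at 07:50:02; "Enable writes" at 08:02:53.
secs = elapsed_seconds("2022-09-14T07:50:02Z", "2022-09-14T08:02:53Z")
print(divmod(secs, 60))  # (12, 51)
```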

Old master (es1024) upgraded