
Switchover es5 master
Closed, Resolved, Public

Description

Part of T300006: Upgrade es5 to Bullseye

When: FIXME

Checklist:

  • Create a task to communicate the chosen date and send an announcement to the community: FIXME
  • Create a calendar entry for the maintenance, invite sre-data-persistence@
  • Add to deployments calendar. E.g.:
{{Deployment calendar event card
    |when=2021-08-24 23:00 SF
    |length=0.5
    |window=Database primary switchover for s7
    |who={{ircnick|kormat|Stevie Beth Mhaol}}, {{ircnick|marostegui|Manuel 'Early Bird' Arostegui}}, {{ircnick|Amir1|Amir}}
    |what=https://phabricator.wikimedia.org/T300006
}}

NEW primary: es1023
OLD primary: es1024

  • Check configuration differences between new and old primary:
sudo pt-config-diff --defaults-file /root/.my.cnf h=es1024.eqiad.wmnet h=es1023.eqiad.wmnet
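
If this check is scripted, the exit status can be inspected instead of reading the output by eye. A minimal sketch, assuming pt-config-diff follows its documented behaviour of exiting 0 when the configurations match and 1 when they differ. The helper name interpret_config_diff is hypothetical, and treating any other status as a tool or connection failure is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: turn a pt-config-diff exit status into a message.
# Assumes the documented exit codes: 0 = no differences, 1 = differences.
interpret_config_diff() {
    case "$1" in
        0) echo "no differences" ;;
        1) echo "differences found: review before switching" ;;
        *) echo "pt-config-diff failed (status $1)" ;;
    esac
}

# Usage (not executed here; needs access to both hosts):
#   sudo pt-config-diff --defaults-file /root/.my.cnf \
#       h=es1024.eqiad.wmnet h=es1023.eqiad.wmnet
#   interpret_config_diff "$?"
```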

Failover prep:

  • Silence alerts on all hosts:
sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover es5 T300006" 'A:db-section-es5'
  • Set NEW primary with weight 0
sudo dbctl instance es1023 set-weight 0
sudo dbctl config commit -m "Set es1023 with weight 0 T300006"
  • Topology changes, move all replicas under NEW primary
sudo db-switchover --timeout=15 --only-slave-move es1024 es1023
  • Disable puppet on both nodes
sudo cumin 'es1024* or es1023*' 'disable-puppet "primary switchover T300006"'
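
Because the old and new hostnames appear in several commands, it can help to generate the prep commands from one set of variables and review them before pasting. The sketch below only prints the commands listed above and never runs them. The function name print_failover_prep and its argument order are illustrative and not part of any existing tooling.

```shell
#!/bin/sh
# Print (do not run) the failover prep commands for a given host pair.
# Arguments: OLD primary, NEW primary, section, task ID.
# The function name and argument order are illustrative assumptions.
print_failover_prep() {
    old="$1"; new="$2"; section="$3"; task="$4"
    cat <<EOF
sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover ${section} ${task}" 'A:db-section-${section}'
sudo dbctl instance ${new} set-weight 0
sudo dbctl config commit -m "Set ${new} with weight 0 ${task}"
sudo db-switchover --timeout=15 --only-slave-move ${old} ${new}
sudo cumin '${old}* or ${new}*' 'disable-puppet "primary switchover ${task}"'
EOF
}

print_failover_prep es1024 es1023 es5 T300006
```

Printing rather than executing keeps a human in the loop for each step, which matches how the checklist is meant to be followed.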

Failover:

  • Log the failover:
!log Starting es5 eqiad failover from es1024 to es1023 - T300006
  • Set section read-only:
sudo dbctl --scope eqiad section es5 ro "Maintenance until 05:15 UTC - T300006"
sudo dbctl config commit -m "Set es5 eqiad as read-only for maintenance - T300006"
  • Check es5 is indeed read-only
  • Switch primaries:
sudo db-switchover --skip-slave-move es1024 es1023
echo "===== es1024 (OLD)"; sudo db-mysql es1024 -e 'show slave status\G'
echo "===== es1023 (NEW)"; sudo db-mysql es1023 -e 'show slave status\G'
  • Promote NEW primary in dbctl, and remove read-only
sudo dbctl --scope eqiad section es5 set-master es1023
sudo dbctl --scope eqiad section es5 rw
sudo dbctl config commit -m "Promote es1023 to es5 primary and set section read-write T300006"
  • Restart puppet on both hosts:
sudo cumin 'es1024* or es1023*' 'run-puppet-agent -e "primary switchover T300006"'
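
The two show slave status checks in the switch step can be turned into a pass/fail test. A minimal sketch, assuming the usual \G output format with Slave_IO_Running and Slave_SQL_Running fields. The helper name replication_running is hypothetical. After the switch the old primary (es1024) would be expected to pass this check; the new primary (es1023) is expected to fail it if db-switchover has cleared its replication settings, which is an assumption here.

```shell
#!/bin/sh
# Hypothetical helper: read `show slave status\G` output on stdin and
# succeed only if both replication threads report "Yes".
# Patterns are anchored so Slave_SQL_Running_State is not matched.
replication_running() {
    status="$(cat)"
    printf '%s\n' "$status" | grep -Eq '^[[:space:]]*Slave_IO_Running:[[:space:]]*Yes[[:space:]]*$' &&
    printf '%s\n' "$status" | grep -Eq '^[[:space:]]*Slave_SQL_Running:[[:space:]]*Yes[[:space:]]*$'
}

# Usage (not executed here):
#   sudo db-mysql es1024 -e 'show slave status\G' | replication_running \
#       && echo "es1024 is replicating" || echo "es1024 replication NOT running"
```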

Clean up tasks:

  • Clean up heartbeat table(s).
  • Change events for query killer:
events_coredb_master.sql on the new primary es1023
events_coredb_slave.sql on the new replica es1024
  • Swap the candidate-master flags in dbctl:
sudo dbctl instance es1024 set-candidate-master --section es5 true
sudo dbctl instance es1023 set-candidate-master --section es5 false
  • Depool the old primary es1024:
sudo dbctl instance es1024 depool
sudo dbctl config commit -m "Depool es1024 T300006"
  • Apply outstanding schema changes to es1024 (if any)
  • Update/resolve this ticket.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ladsgroup triaged this task as Medium priority. Feb 4 2022, 1:48 PM
Ladsgroup updated the task description.
Ladsgroup moved this task from Triage to In progress on the DBA board.

Change 762557 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] db-production: Stop writes to es5

https://gerrit.wikimedia.org/r/762557

Change 762558 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mariadb: Promote es1023 to es5 master

https://gerrit.wikimedia.org/r/762558

Change 762557 merged by jenkins-bot:

[operations/mediawiki-config@master] db-production: Stop writes to es5

https://gerrit.wikimedia.org/r/762557

Mentioned in SAL (#wikimedia-operations) [2022-02-15T10:01:25Z] <ladsgroup@deploy1002> Synchronized wmf-config/db-production.php: Config: [[gerrit:762557|db-production: Stop writes to es5 (T300976)]] (duration: 00m 49s)

Change 762558 merged by Ladsgroup:

[operations/puppet@production] mariadb: Promote es1023 to es5 master

https://gerrit.wikimedia.org/r/762558

Change 762751 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/mediawiki-config@master] Revert "db-production: Stop writes to es5"

https://gerrit.wikimedia.org/r/762751

Change 762751 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "db-production: Stop writes to es5"

https://gerrit.wikimedia.org/r/762751

Mentioned in SAL (#wikimedia-operations) [2022-02-15T10:23:26Z] <ladsgroup@deploy1002> Synchronized wmf-config/db-production.php: Config: [[gerrit:762751|Revert "db-production: Stop writes to es5" (T300976)]] (duration: 00m 55s)

Change 762779 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/dns@master] Update es5 master

https://gerrit.wikimedia.org/r/762779

Change 762779 merged by Ladsgroup:

[operations/dns@master] Update es5 master

https://gerrit.wikimedia.org/r/762779

Ladsgroup updated the task description.
Ladsgroup moved this task from In progress to Done on the DBA board.

Congrats on your first switchover!