
Switchover s8 master (db1109 -> db1126)
Closed, Invalid
Public


When: During a pre-defined DBA maintenance window


  • Team calendar invite

Affected wikis:


NEW primary: db1126
OLD primary: db1109

  • Check configuration differences between new and old primary:
sudo pt-config-diff --defaults-file /root/.my.cnf h=db1109.eqiad.wmnet h=db1126.eqiad.wmnet

Failover prep:

  • Silence alerts on all hosts:
sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover s8 T330988" 'A:db-section-s8'
  • Set NEW primary with weight 0 (and depool it from API or vslow/dump groups if it is present).
sudo dbctl instance db1126 set-weight 0
sudo dbctl config commit -m "Set db1126 with weight 0 T330988"
  • Topology changes, move all replicas under NEW primary
sudo db-switchover --timeout=25 --only-slave-move db1109 db1126
  • Disable puppet on both nodes
sudo cumin 'db1109* or db1126*' 'disable-puppet "primary switchover T330988"'
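The prep commands above differ between switchovers only in the host names, the section, and the task ID. A minimal sketch of a dry-run helper that prints them for review before anything is executed; the function name print_prep_commands and its argument order are hypothetical and not part of any existing tooling:

```shell
#!/bin/sh
# Hypothetical helper: prints the failover-prep commands from the checklist
# for review. It does not execute any of them.
# Usage: print_prep_commands OLD_PRIMARY NEW_PRIMARY SECTION TASK_ID
print_prep_commands() {
    old=$1; new=$2; section=$3; task=$4
    cat <<EOF
sudo cookbook sre.hosts.downtime --hours 1 -r "Primary switchover ${section} ${task}" 'A:db-section-${section}'
sudo dbctl instance ${new} set-weight 0
sudo dbctl config commit -m "Set ${new} with weight 0 ${task}"
sudo db-switchover --timeout=25 --only-slave-move ${old} ${new}
sudo cumin '${old}* or ${new}*' 'disable-puppet "primary switchover ${task}"'
EOF
}
```

Calling print_prep_commands db1109 db1126 s8 T330988 should reproduce the five commands listed above, which makes it easy to diff a generated checklist against a previous one.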


  • Log the failover:
!log Starting s8 eqiad failover from db1109 to db1126 - T330988
  • Set section read-only:
sudo dbctl --scope eqiad section s8 ro "Maintenance until 06:15 UTC - T330988"
sudo dbctl config commit -m "Set s8 eqiad as read-only for maintenance - T330988"
  • Check s8 is indeed read-only
  • Switch primaries:
sudo db-switchover --skip-slave-move db1109 db1126
echo "===== db1109 (OLD)"; sudo db-mysql db1109 -e 'show slave status\G'
echo "===== db1126 (NEW)"; sudo db-mysql db1126 -e 'show slave status\G'
  • Promote NEW primary in dbctl, and remove read-only
sudo dbctl --scope eqiad section s8 set-master db1126
sudo dbctl --scope eqiad section s8 rw
sudo dbctl config commit -m "Promote db1126 to s8 primary and set section read-write T330988"
  • Restart puppet on both hosts:
sudo cumin 'db1109* or db1126*' 'run-puppet-agent -e "primary switchover T330988"'
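The two show slave status\G checks above are read by eye. As a rough sketch, the output can also be summarised by a small filter; the function name replication_state and its three result words are assumptions made for illustration, and it relies only on the standard Slave_IO_Running and Slave_SQL_Running fields:

```shell
#!/bin/sh
# Hypothetical filter: reads `show slave status\G` output on stdin and prints
#   running  - both replication threads report Yes
#   stopped  - the fields are present but at least one is not Yes
#   none     - neither field is present (no replication configured)
replication_state() {
    awk -F': ' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+/, "", key)      # field names are right-aligned with spaces
            sub(/[ \t\r]+$/, "", val)     # drop trailing whitespace from the value
        }
        key == "Slave_IO_Running"  { io = val }
        key == "Slave_SQL_Running" { sql = val }
        END {
            if (io == "" && sql == "") print "none"
            else if (io == "Yes" && sql == "Yes") print "running"
            else print "stopped"
        }
    '
}
```

For example: sudo db-mysql db1109 -e 'show slave status\G' | replication_state. Which result to expect on each host after the switch depends on how db-switchover leaves the topology, so treat the output as a summary to compare against the full status rather than as a pass/fail check.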

Clean up tasks:

  • Clean up heartbeat table(s).
sudo db-mysql db1126 heartbeat -e "delete from heartbeat where file like 'db1109%';"
  • Change events for query killer:
events_coredb_master.sql on the new primary db1126
events_coredb_slave.sql on the new slave db1109
  • Update candidate primary dbctl and orchestrator notes
sudo dbctl instance db1109 set-candidate-master --section s8 true
sudo dbctl instance db1126 set-candidate-master --section s8 false
(dborch1001): sudo orchestrator-client -c untag -i db1126 --tag name=candidate
(dborch1001): sudo orchestrator-client -c tag -i db1109 --tag name=candidate
sudo db-mysql db1115 zarcillo -e "select * from masters where section = 's8';"
  • (If needed): Depool db1109 for maintenance.
sudo dbctl instance db1109 depool
sudo dbctl config commit -m "Depool db1109 T330988"
  • Change db1109 weight to mimic the previous weight of db1126:
sudo dbctl instance db1109 edit
  • Apply outstanding schema changes to db1109 (if any)
  • Update/resolve this ticket.
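The heartbeat clean-up step deletes rows by a LIKE prefix on the old primary's name, so it can help to count the matching rows first. A minimal sketch that builds both statements from a host name; the function heartbeat_cleanup_sql and its host-name check are hypothetical, and the only table and column it uses are the ones in the delete statement above:

```shell
#!/bin/sh
# Hypothetical helper: prints a count query and the delete statement for
# heartbeat rows whose file value starts with the given host name.
# It only prints SQL; it does not connect to any database.
heartbeat_cleanup_sql() {
    old=$1
    # Accept only lowercase letters and digits (e.g. db1109), so quotes or
    # LIKE wildcards cannot slip into the generated statements.
    case "$old" in
        ''|*[!a-z0-9]*)
            echo "refusing unexpected host name: $old" >&2
            return 1
            ;;
    esac
    printf "select count(*) from heartbeat where file like '%s%%';\n" "$old"
    printf "delete from heartbeat where file like '%s%%';\n" "$old"
}
```

The first line of output can be passed to sudo db-mysql db1126 heartbeat -e to see how many rows would be removed before running the second.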

Event Timeline

Change 893430 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] mariadb: Promote db1126 to s8 master

Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to Blocked on the DBA board.
Marostegui subscribed.

To be done after the eqiad row A switch maintenance (T329073) is finished, as the candidate master is on row A.

The script needed to be updated now that the DC switchover is done (it included changes for DNS and such that are not needed). I fixed it, so you can either fix this ticket or just abandon the patch and call switchmaster again.

Yeah, but it has the check option for it and the handling for --read-only-master in switchover: the usual difference between the primary and non-primary DC.

Going to regenerate it

Change 893430 abandoned by Marostegui:

[operations/puppet@production] mariadb: Promote db1126 to s8 master