Paste P12757

(An Untitled Masterwork)

Authored by Kormat on Sep 23 2020, 12:19 PM.
Another reason to remove load groups where possible is that they make it very difficult to predict what effect depooling a database server will have on the query load across the rest of the section. Say a host has some weight in one or more groups as well as in the main traffic group, and is receiving 15K qps. From the DBA side, we can't tell what proportion of that incoming traffic corresponds to which group. Depooling the host might spread its 15K qps over all hosts in the section, or it might dump all 15K qps onto the other host in the same group.
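
To make that uncertainty concrete, here is a minimal Python sketch of the two extremes. It assumes a simplified model in which traffic leaving a depooled host is shared out in proportion to the remaining hosts' weights within each group. The host names, weights, and the `redistribute` and `depool_extra_load` helpers are all hypothetical; none of this is taken from dbctl or from the real load balancer.

```python
# Simplified model (an assumption, not the real load balancer): traffic that
# leaves a depooled host is shared out in proportion to the remaining weights.

def redistribute(qps, weights, depooled):
    """Spread `qps` over the hosts in `weights` other than `depooled`."""
    remaining = {h: w for h, w in weights.items() if h != depooled and w > 0}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no pooled hosts left to absorb the traffic")
    return {h: qps * w / total for h, w in remaining.items()}


def depool_extra_load(qps, group_fraction, main_weights, group_weights, depooled):
    """Extra qps per remaining host, for a guessed share of group traffic."""
    group_qps = qps * group_fraction
    main_qps = qps - group_qps
    extra = {}
    for part_qps, weights in ((main_qps, main_weights), (group_qps, group_weights)):
        if part_qps == 0:
            continue  # nothing to hand over for this traffic class
        for host, share in redistribute(part_qps, weights, depooled).items():
            extra[host] = extra.get(host, 0.0) + share
    return extra


# Hypothetical section: four hosts in the main group, two of them in a load group.
MAIN = {"host-a": 100, "host-b": 100, "host-c": 100, "host-d": 100}
GROUP = {"host-a": 1, "host-b": 1}

# host-a serves 15K qps; the group share is the unknown, so try both extremes.
all_main = depool_extra_load(15_000, 0.0, MAIN, GROUP, "host-a")
all_group = depool_extra_load(15_000, 1.0, MAIN, GROUP, "host-a")
print(all_main)
print(all_group)
```

Under these assumptions, the first case adds 5K qps to each of the three remaining hosts, while the second puts the full 15K qps on host-b alone. Without knowing the real group share, the DBA cannot tell where between those two outcomes a depool will land.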

Steps and checklist:


NEW master: db1104
OLD master: db1109

  • Check current topology: db-replication-tree db1109
  • Check configuration differences between new and old master: pt-config-diff h=db1109.eqiad.wmnet,F=/root/.my.cnf h=db1104.eqiad.wmnet,F=/root/.my.cnf
  • Silence alerts on all hosts: cookbook sre.hosts.downtime --hours 1 --reason "switchover to db1104 T239238" '(A:db-section-s8 and A:eqiad) or A:db-labsdb'
  • Set NEW master with weight 0: dbctl instance db1104 set-weight 0 && dbctl config commit -m "Set db1104 with weight 0 T239238"
  • Topology changes, connect everything to db1104: db-switchover --timeout=15 --only-slave-move db1109.eqiad.wmnet db1104.eqiad.wmnet
  • Disable puppet on db1104 and db1109: cumin 'db110[4,9].eqiad.wmnet' 'disable-puppet "switchover to db1104 T239238"'
  • Merge gerrit puppet change to promote db1104: TBD


  • Start the failover: !log Starting s8 eqiad failover from db1109 to db1104 - T239238
  • Topology changes, move old master beneath new master: db-switchover --replicating-master --read-only-master db1109 db1104
  • Give weight to db1109 (old master): dbctl instance db1109 set-weight 300
  • Promote db1104 as new master: dbctl --scope eqiad section s8 set-master db1104 && dbctl config commit -m "Promote db1104 on s8 eqiad master T239238"
  • Restart puppet on old and new masters (for heartbeat): cumin 'db110[4,9].eqiad.wmnet' 'run-puppet-agent -e "switchover to db1104 T239238"'
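
The host-specific commands in the steps above follow a fixed pattern, so they could be generated rather than hand-edited for each switchover. The following is a minimal Python sketch that only substitutes values into the command strings exactly as they appear in this paste. The `switchover_commands` function and its parameter names are hypothetical, the `<host>.<dc>.wmnet` naming is inferred from the hostnames used here, and the cookbook, cumin, gerrit and !log steps are deliberately left out.

```python
def switchover_commands(old, new, section, dc, task, old_weight=300):
    """Return the templated checklist commands, in order.

    Only string substitution happens here; nothing is executed.
    """
    # Assumption: fully qualified names look like <host>.<dc>.wmnet,
    # as with db1109.eqiad.wmnet in the checklist.
    old_fqdn = f"{old}.{dc}.wmnet"
    new_fqdn = f"{new}.{dc}.wmnet"
    return [
        # Preparation
        f"db-replication-tree {old}",
        f"pt-config-diff h={old_fqdn},F=/root/.my.cnf h={new_fqdn},F=/root/.my.cnf",
        f"dbctl instance {new} set-weight 0 && "
        f'dbctl config commit -m "Set {new} with weight 0 {task}"',
        f"db-switchover --timeout=15 --only-slave-move {old_fqdn} {new_fqdn}",
        # Failover
        f"db-switchover --replicating-master --read-only-master {old} {new}",
        f"dbctl instance {old} set-weight {old_weight}",
        f"dbctl --scope {dc} section {section} set-master {new} && "
        f'dbctl config commit -m "Promote {new} on {section} {dc} master {task}"',
    ]


# Values from this paste: s8 in eqiad, db1109 -> db1104, task T239238.
for command in switchover_commands("db1109", "db1104", "s8", "eqiad", "T239238"):
    print(command)
```

Printing the commands instead of running them keeps the operator in the loop: each line can still be reviewed and pasted at the right point in the checklist.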

Clean up tasks:

  • Change events for query killer:
      • events_coredb_master.sql on the new master db1104
      • events_coredb_slave.sql on the new slave db1109
  • Update DNS: TBD
  • Update/resolve Phabricator ticket about failover: TBD
  • Update candidate master dbctl notes and pick new candidate master: db1109