
Implement (or refactor) a script to move slaves when the master is not available
Open, Medium, Public

Description

Right now we use repl.pl to move slaves around, i.e. when a master failover is needed, we use it to move all the slaves under the new master.

However, this script doesn't work when the master is unavailable.

It would be a good start to either refactor repl.pl or create a new script that could move slaves under a different host when the master is unavailable, e.g. the master has crashed and we have to move all the slaves to replicate from the candidate master during an emergency.
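
To make the emergency case concrete, the operation that would need to be batched across all the slaves is a plain replica repoint. A minimal sketch of what would be run on each slave, assuming direct MySQL access via pymysql and that the matching binlog coordinates on the candidate master have already been worked out by hand (hostnames, credentials and coordinates below are placeholders):

import pymysql

# Placeholders: in a real emergency these come from the DBA driving the failover.
NEW_MASTER_HOST = 'db1229.eqiad.wmnet'
NEW_MASTER_PORT = 3306
NEW_MASTER_LOG_FILE = 'db1229-bin.001234'  # hypothetical binlog file on the candidate
NEW_MASTER_LOG_POS = 4


def repoint_slave(host, port=3306):
    """Point one slave at the candidate master (the old master is unreachable)."""
    conn = pymysql.connect(host=host, port=port, user='admin', password='***')
    try:
        with conn.cursor() as cur:
            cur.execute('STOP SLAVE')
            cur.execute(
                'CHANGE MASTER TO MASTER_HOST=%s, MASTER_PORT=%s, '
                'MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s',
                (NEW_MASTER_HOST, NEW_MASTER_PORT,
                 NEW_MASTER_LOG_FILE, NEW_MASTER_LOG_POS))
            cur.execute('START SLAVE')
    finally:
        conn.close()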

Details

Title: [WIP] Introduce emergency switchover
Reference: repos/sre/wmfmariadbpy!3
Author: ladsgroup
Source branch: emergency_switchover
Dest branch: main

Event Timeline

Marostegui triaged this task as Medium priority. Jun 4 2018, 1:12 PM
Marostegui created this task.
Marostegui moved this task from Triage to Backlog on the DBA board.
Vvjjkkii renamed this task from Implement (or refactor) a script to move slaves when the master is not available to pobaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Marostegui renamed this task from pobaaaaaaa to Implement (or refactor) a script to move slaves when the master is not available. Jul 2 2018, 5:15 AM
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description. (Show Details)

@jcrespo - I have been thinking about this ticket lately.
Given that switchover.py works so well already, do you think it would be doable to add a --emergency-slave-switch $new_master option (or whatever we want to call it) to move the slaves under a given host without checking the master?
This would allow us to do emergency failovers if a master isn't reachable. Obviously this needs to be executed carefully, but during an emergency it can simplify the process of having to run the CHANGE MASTER commands to point everything at the preferred host.
A human should still check:

  1. Which host is the most advanced in terms of replication, so that one can be promoted (in case not all the hosts stopped at the same position).
  2. That the preferred host is running with binlog_format=STATEMENT.
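
For what it's worth, both checks could be scripted against the replicas themselves. A rough sketch, assuming direct access with pymysql (the host list and credentials are placeholders), that prints how far each replica got in the dead master's binlog and which binlog format it runs:

import pymysql

REPLICAS = ['db1229.eqiad.wmnet', 'db1233.eqiad.wmnet']  # placeholder host list


def replica_state(host, port=3306):
    conn = pymysql.connect(host=host, port=port, user='check', password='***',
                           cursorclass=pymysql.cursors.DictCursor)
    with conn.cursor() as cur:
        cur.execute('SHOW SLAVE STATUS')
        status = cur.fetchone()
        cur.execute('SELECT @@binlog_format AS fmt')
        fmt = cur.fetchone()['fmt']
    conn.close()
    # Relay_Master_Log_File / Exec_Master_Log_Pos = how far the SQL thread
    # got in the (now dead) master's binlog.
    return status['Relay_Master_Log_File'], int(status['Exec_Master_Log_Pos']), fmt


states = {host: replica_state(host) for host in REPLICAS}
# Most advanced replica first; the preferred candidate should also
# show binlog_format=STATEMENT.
for host, (log_file, pos, fmt) in sorted(states.items(),
                                         key=lambda kv: kv[1][:2], reverse=True):
    print('%s: %s:%d binlog_format=%s' % (host, log_file, pos, fmt))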

Sadly switchover.py wouldn't be reusable or helpful for an emergency (the replication and other libraries may be); it has to start from 0. switchover.py assumes all hosts are reachable, have very low lag, replication is working, etc., which won't be the case in a failover. A failover is a much harder case where every possibility of breakage has to be contemplated separately and some safe compromises have to be taken (e.g. what to do if we detect that X amount of data has been lost).

Ah, I see!
Yeah, I was thinking about a very primitive way to do it (for now), which would require human intervention to decide which is the most suitable host to be the new master, and then the script would actually execute the batch of CHANGE MASTER ... MASTER_HOST statements.

Yeah, I understood you meant that (not a fully automated and autonomous script), but even that is not easy and still not reusable, as it would have to work without using the master, and that requires arbitrary master changes that neither GTID nor WMFReplication.move() allow yet. We would need to implement binlog position matching first, and a way to detect the replicas of a master that is down (tendril replacement "zarcillo" database?). All doable, but not immediate or reusable from existing code.

and a way to detect the replicas of a master that is down (tendril replacement "zarcillo" database?)

Good point - with the master down there is no canonical place, apart from tendril/zarcillo, to detect which hosts are hanging off it indeed.

With the great work done by @Ladsgroup at T281249: Create or modify an existing tool that quickly shows the db replication status in case of master failure, I think we are a step closer to getting this done.
Once we have that script, we could implement another one based on it (rather than refactoring db-switchover) which, once passed the right candidate master, would simply configure replication on all the other replicas.

As a safety measure, the script should disallow hosts that have any of the following (a rough sketch of these checks follows the list):

  • Multi-instance
  • Other slaves hanging
  • binlog format not STATEMENT
  • Not in the active DC
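
A rough sketch of those checks, reading the list as constraints on the host proposed as the new master. The port and hostname heuristics (a non-3306 port meaning multi-instance, .eqiad.wmnet meaning active DC) and the credentials are assumptions, and SHOW SLAVE HOSTS only sees replicas that registered themselves, so this is best-effort:

import pymysql

ACTIVE_DC_SUFFIX = '.eqiad.wmnet'  # assumption: eqiad is the active DC


def candidate_problems(host, port):
    """Return the reasons why host:port should NOT be promoted (empty list = OK)."""
    problems = []
    if port != 3306:
        # heuristic: a non-standard port means a multi-instance host
        problems.append('multi-instance (port %d)' % port)
    if not host.endswith(ACTIVE_DC_SUFFIX):
        problems.append('not in the active DC')
    conn = pymysql.connect(host=host, port=port, user='check', password='***',
                           cursorclass=pymysql.cursors.DictCursor)
    with conn.cursor() as cur:
        cur.execute('SELECT @@binlog_format AS fmt')
        if cur.fetchone()['fmt'] != 'STATEMENT':
            problems.append('binlog format is not STATEMENT')
        cur.execute('SHOW SLAVE HOSTS')
        if cur.fetchall():
            # best-effort: only replicas that registered with report_host show up
            problems.append('has other slaves hanging off it')
    conn.close()
    return problems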

@Ladsgroup would you be ok working on this task?

Definitely. I can start next week.

@Ladsgroup: Removing task assignee as this open task has been assigned for more than two years - see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!

@Marostegui To get the list of direct replicas, something like this would work in cumin:

import argparse
import json

import requests

parser = argparse.ArgumentParser()
parser.add_argument('section', help='Must be the section name in orchestrator')
args = parser.parse_args()
# Fetch all instances of the section (cluster) from orchestrator's API.
data_ = requests.get(
    'https://orchestrator.wikimedia.org/api/cluster/alias/' +
    args.section).json()
db_data = []
for db in data_:
    if db['MasterKey']['Hostname'] + ':' + \
            str(db['MasterKey']['Port']) != db['ClusterName']:
        # not a direct replica of the cluster's master
        continue
    db_data.append(db['Key']['Hostname'] + ':' + str(db['Key']['Port']))

print('direct replicas')
for db in db_data:
    print(json.dumps(db))

Which outputs something like this:

ladsgroup@cumin1002:~/ladsgroup/software2/dbtools$ python3 direct_replicas.py s2
direct replicas
"db1156.eqiad.wmnet:3306"
"db1182.eqiad.wmnet:3306"
"db1188.eqiad.wmnet:3306"
"db1197.eqiad.wmnet:3306"
"db1222.eqiad.wmnet:3306"
"db1225.eqiad.wmnet:3312"
"db1229.eqiad.wmnet:3306"
"db1233.eqiad.wmnet:3306"
"db1239.eqiad.wmnet:3312"
"db1246.eqiad.wmnet:3306"
"db2207.codfw.wmnet:3306"
"dbstore1007.eqiad.wmnet:3312"

(it only works from cumin)

We can make it find direct replicas for the secondary DC too. The hardest part is refactoring db-switchover to take that list (in itself it's not hard; it's that it assumes in many places that the old master is reachable, which is a good assumption for the main use case). Maybe I should copy-paste it into a new file and see what happens.
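
For the secondary DC, the same orchestrator payload can be filtered by the intermediate master instead of by ClusterName. A small sketch reusing data_ from the snippet above (db2207 is just the current codfw master from the s2 output, used here as a placeholder):

# Replicas hanging off a given intermediate master (e.g. the codfw master),
# reusing `data_` from the snippet above.
secondary_master = 'db2207.codfw.wmnet:3306'  # placeholder
secondary_replicas = [
    db['Key']['Hostname'] + ':' + str(db['Key']['Port'])
    for db in data_
    if db['MasterKey']['Hostname'] + ':' +
    str(db['MasterKey']['Port']) == secondary_master
]
print('replicas of ' + secondary_master)
for db in secondary_replicas:
    print(json.dumps(db))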

Keep in mind that you don't really need the list of replicas for the secondary DC; if you have its master, that is all you need. You don't need to touch the secondary DC's replicas. That is, all you would need is to reconfigure db2207 to replicate under the new primary master; its replicas don't need anything.
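
For completeness, the batch itself could then be generated from the chosen candidate plus the direct-replica list from the snippet above, skipping the candidate itself. A dry-run sketch (hostnames and binlog coordinates are placeholders, not real values):

# Dry-run sketch: print the statements that would be run on each direct replica.
candidate = 'db1229.eqiad.wmnet:3306'                    # placeholder choice
candidate_file, candidate_pos = 'db1229-bin.001234', 4   # hypothetical coordinates
direct_replicas = [                                      # e.g. output of direct_replicas.py
    'db1233.eqiad.wmnet:3306',
    'db2207.codfw.wmnet:3306',
]

cand_host, cand_port = candidate.rsplit(':', 1)
for replica in direct_replicas:
    if replica == candidate:
        continue
    print('-- on %s' % replica)
    print('STOP SLAVE;')
    print("CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, "
          "MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
          % (cand_host, cand_port, candidate_file, candidate_pos))
    print('START SLAVE;')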

I agree, let's create db-emergency-switchover and work there without touching the current db-switchover for now.