
Add a safe failover for analytics1003
Closed, ResolvedPublic

Description

The analytics1003 host runs a MariaDB database and various Hadoop daemons: Hive (server/metastore), Camus (executed periodically via cron), Oozie, etc. The database is also used by other hosts/clusters:

  • Druid analytics (druid100[1-3])
  • Druid public (druid100[4-6])
  • Thorium (Hue)

We currently back up the database via an LVM snapshot copied to analytics1002, without stopping MariaDB first (so there might be a chance that the snapshot used in an emergency restore leads to a corrupted database).
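For reference, the snapshot flow is roughly the following. This is a hedged dry-run sketch (it only echoes the commands it would run); the volume group, LV names, mount point, and destination path are assumptions, not the real analytics1003 layout:

```shell
#!/bin/sh
# Dry-run sketch of the LVM snapshot backup flow (echoes instead of executing).
# VG/LV names, mount point, and destination path are hypothetical.
run() { echo "+ $*"; }

# mylvmbackup holds an open session with the tables locked while snapshotting;
# InnoDB may still have dirty pages in memory, so the snapshot is only
# crash-consistent, which is where the corruption concern comes from.
run mysql -e "FLUSH TABLES WITH READ LOCK"
run lvcreate --snapshot --size 2G --name mysql-snap /dev/vg0/mysql
run mysql -e "UNLOCK TABLES"                 # lock is held only for seconds
run mount /dev/vg0/mysql-snap /mnt/mysql-snap
run rsync -a /mnt/mysql-snap/ analytics1002:/srv/backup/mysql/
run umount /mnt/mysql-snap
run lvremove -f /dev/vg0/mysql-snap
```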

The database is ~13G in /var/lib/mysql, and its volume/traffic is relatively low. The main issues are:

  1. If analytics1003 goes down temporarily, then Druid (and Hue) might also be momentarily impacted.
  2. If analytics1003 goes down permanently (hw failure), then all the Hadoop-related scheduled and recurrent jobs will stop too.

Ideally we should:

  1. Have a backup host somewhere, maybe in Ganeti.
  2. Have automatic MariaDB failover in case the database on analytics1003 is not reachable.
  3. Use the new host to back up the database, maybe via LVM and periodically via Bacula.

In case of a complete failure of analytics1003, we could temporarily apply the analytics_cluster coordinator role to the backup host and keep Hadoop jobs going until the original host is fixed.

Event Timeline

elukey triaged this task as Medium priority. Jun 25 2018, 1:00 PM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Jun 25 2018, 1:00 PM

there might be a chance that the snapshot used in an emergency restore leads to a corrupted database

Is there? We don't stop MariaDB, but mylvmbackup locks the tables (and flushes writes?) before taking the snapshot.

I had a chat with @Marostegui in Prague and IIUC, without stopping MariaDB there might be a chance, since some pages are kept in memory and not flushed to disk until a stop is issued.

Hm! interesting.

That is correct. It might or might not work. MariaDB will go through a normal InnoDB recovery process (as if it had crashed). So there is a chance that it will work, but it can also end up with corruption.
If the database is small enough, you guys might want to try mydumper, which takes a logical backup: https://wikitech.wikimedia.org/wiki/MariaDB#Dumping_tables_with_mydumper
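A minimal mydumper invocation might look like the following; the host, user, thread count, and output path are placeholders rather than the actual analytics1003 settings, so the sketch only builds and prints the command:

```shell
#!/bin/sh
# Sketch of a logical backup with mydumper; all connection details are
# placeholders. The command is printed instead of executed.
OUT_DIR="/srv/backups/analytics1003-$(date +%F)"
CMD="mydumper --host=analytics1003.eqiad.wmnet --user=backup \
--outputdir=$OUT_DIR --threads=4 --compress --triggers --routines --events"
echo "$CMD"
```

mydumper writes one (compressed) file per table into the output directory, so a restore with myloader can be parallelized.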

elukey added a subscriber: jcrespo. Jun 27 2018, 1:08 PM

@jcrespo, @Marostegui: I'd really like to get your opinion on this if you have time. What I am currently trying to do is get rid of the single points of failure in the Analytics infrastructure, and at the moment most of them are of course databases :) My intent is not to offload the work to you, just to get some guidelines about what's best.

In this case, we'd like to set up automatic failover if possible, but we'd need support from the dbproxy layer that you guys maintain. Is that possible, or out of the question for this use case? If not, we can surely go forward with a simple MySQL replica and manual failover when needed.

In this case, we'd like to set up an automatic failover if possible

The dbproxy roles should be generic enough, for writes and reads respectively (one is supposed to point to a master and fail over to a replica, but not fail back when the master recovers, implementing a soft STONITH; the other points to a set of replicas in read-only mode, with load balancing too).

You can see them on "profile/manifests/mariadb/proxy/{master,replicas}.pp"

An example configuration is:

jynus@sangai:/mnt/jynus/puppet/hieradata/hosts$ cat dbproxy1010.yaml 
profile::mariadb::proxy::replicas::servers:
  labsdb1010:
    address: '10.64.37.23:3306'
  labsdb1011:
    address: '10.64.37.24:3306'
profile::mariadb::proxy::firewall: 'disabled'
jynus@sangai:/mnt/jynus/puppet/hieradata/hosts$ cat dbproxy1011.yaml 
profile::mariadb::proxy::master::primary_name: 'labsdb1009'
profile::mariadb::proxy::master::primary_addr: '10.64.4.14:3306'
profile::mariadb::proxy::master::secondary_name: 'labsdb1010'
profile::mariadb::proxy::master::secondary_addr: '10.64.37.23:3306'
profile::mariadb::proxy::firewall: 'disabled'

The only part that may need tuning for you is implementing new firewall profiles.
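For concreteness, a hypothetical hiera fragment for an analytics write proxy, following the pattern above, might look like this (the proxy assignment, the replica host, and the addresses are placeholders; no such replica exists yet, so this is only a sketch):

```yaml
profile::mariadb::proxy::master::primary_name: 'analytics1003'
profile::mariadb::proxy::master::primary_addr: '<analytics1003-ip>:3306'
profile::mariadb::proxy::master::secondary_name: '<replica-host>'
profile::mariadb::proxy::master::secondary_addr: '<replica-ip>:3306'
profile::mariadb::proxy::firewall: 'disabled'
```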

What we don't have is spare new hosts (we have spares, but they are due to be decommissioned and only fit for temporary testing), e.g. dbproxy1004 and dbproxy1009 are the old proxies for eventlogging and are idle, but have to be refreshed soon.

If you need more advanced proxying, we also have ProxySQL, but only the module and Debian package, without many actual deploys (it needs more work, but it is supposed to be in production soon):

modules/profile/manifests/proxysql.pp
modules/proxysql/manifests/init.pp

Unlike other services, I actually wouldn't have any issue with just moving this to the db misc cluster, basically owning everything at the db layer, from high availability to backups. I would see, however, some things we would need to solve for that to happen:

  • We need to check available hw resources and the separation between services (as I said above, we are a bit low on proxies at the moment until we purchase more). I would also need you to have configurable port addresses for the applications (e.g. hiera keys).
  • All misc hosts are in the production network, so I am not sure that is compatible with your current setup (does it need to be on the analytics network?)

Vvjjkkii renamed this task from Add a safe failover for analytics1003 to qcaaaaaaaa. Jul 1 2018, 1:02 AM
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
elukey renamed this task from qcaaaaaaaa to Add a safe failover for analytics1003. Jul 2 2018, 6:02 AM
elukey claimed this task.
elukey updated the task description.

Sorry for the delay!

Unlike other services, I actually wouldn't have any issue with just moving this to the db misc cluster, basically owning everything at the db layer, from high availability to backups.

I think that this would be awesome! Before starting though, I have to warn you that this MariaDB instance hosts multiple (low-volume) databases:

show databases

| druid              |
| druid_public_eqiad |
| hive_metastore     |
| hue                |
| oozie              |
| superset           |

These are basically all our analytics tools that need a db to store their settings. I'm not sure whether we'd need to move all of them to separate instances; I just want to verify this with you first. The amount of maintenance we did in the past is relatively small, mostly due to software upgrades.

I would see, however, some things we would need to solve for that to happen:

  • We need to check available hw resources and the separation between services (as I said above, we are a bit low on proxies at the moment until we purchase more). I would also need you to have configurable port addresses for the applications (e.g. hiera keys).

This shouldn't be a big issue.

  • All misc hosts are in the production network, so I am not sure that is compatible with your current setup (does it need to be on the analytics network?)

This should be ok since we already have some dbproxies whitelisted in the analytics vlan's firewall, so it should be a matter of adding another one.

Thanks for this discussion!

CommunityTechBot lowered the priority of this task from High to Medium. Jul 5 2018, 7:06 PM
elukey added a comment. Jul 9 2018, 7:37 AM

Any comment about this? (No rush, I am just checking my open tasks :)

This should be ok since we already have some dbproxies whitelisted in the analytics vlan's firewall, so it should be a matter of adding another one.

I don't think that is ok: we need at least 2 proxies and 2 dedicated databases on the analytics network (4 of each if we go multi-DC).

Regarding a possible migration:

  • Can you provide some details about the size of the databases? (something like a du -sh . on the data directory would be enough for an estimate)
  • How easy would it be to schedule a read-only window for all those services while we do the migration to the selected misc cluster?
elukey added a comment. Jul 9 2018, 7:48 AM

This should be ok since we already have some dbproxies whitelisted in the analytics vlan's firewall, so it should be a matter of adding another one.

I don't think that is ok: we need at least 2 proxies and 2 dedicated databases on the analytics network (4 of each if we go multi-DC).

Sorry, my understanding when I read about the db misc cluster was that we'd check for room on existing db hosts in the production network, not that Analytics would need new hosts (the latter is not possible for us: too much overhead, plus we don't have the budget for such an infrastructure this fiscal year).

Regarding a possible migration:

  • Can you provide some details about the size of the databases? (something like a du -sh . on the data directory would be enough for an estimate)
elukey@analytics1003:~$ sudo du -hs /var/lib/mysql/
14G	/var/lib/mysql/
  • How easy would it be to schedule a read-only window for all those services while we do the migration to the selected misc cluster?

It is something we would need to plan, since it would stop a big chunk of the regular Hadoop jobs, but I don't think it would be a massive problem (it would just need a bit of time to alert people, etc.).

Thanks!

Any comment about my last entry? Sorry to ping you guys; maybe a quick meeting between the three of us would be better?

elukey moved this task from Backlog to In Progress on the User-Elukey board. Jul 18 2018, 1:57 PM

This should be ok since we already have some dbproxies whitelisted in the analytics vlan's firewall, so it should be a matter of adding another one.

I don't think that is ok: we need at least 2 proxies and 2 dedicated databases on the analytics network (4 of each if we go multi-DC).

Sorry, my understanding when I read about the db misc cluster was that we'd check for room on existing db hosts in the production network, not that Analytics would need new hosts (the latter is not possible for us: too much overhead, plus we don't have the budget for such an infrastructure this fiscal year).

I think this still needs a bit of clarification, because I am not sure whether Jaime meant existing hosts or buying new ones.

@jcrespo any feedback on this task? I am really sorry to keep pinging about an unplanned request; I'd like to get a sense of whether it is possible and, if not, think about alternative solutions. I can say for sure that the Analytics team would not have any budget/resources to maintain new proxy/db hosts, so in that case I'll decline the task :)

I think there is some confusion here, I apologize. What I am trying to say is that I don't think it is ok for an internal analytics database service to have a proxy outside of the network that points back to a database inside the network. I hope you agree with me that is quite a strange setup, considering we can put a proxy inside instead with no issue (we can later see how, but that is another discussion).

Sharing resources is also a bad idea that leads to outages; for example, m5 has lately been quite unstable, but at least that is caused and "eaten" by the cloud team :-)

My suggestion is to set up a, let's call it, "m4" with 2 proxies and 2 dedicated databases, and put it inside the analytics network. We have the proxies (the m4 ones you stopped using); for the databases, can you spare 2 separate hosts, even if not fully dedicated?

However, if they are not dedicated, normally we will not support them (DBAs will not manage them): half of the services there will cause the database to interact/fail, there is maintenance we cannot do (restarts) because we depend on it, and shared resources make it impossible for us to manage them properly because we get blocked on the special attention they need. For example, like we do with cloud, we would ping you that we need to do a rolling restart, and you would have to take care of it yourselves (that is why, from our point of view, we recommend not sharing resources with other services). I hope it is clear now.

Thanks for the answer! I agree that having proxies outside the analytics network that route data back into it is not a great idea; my question was about moving analytics1003's db away from the analytics network to a shared database misc cluster. That idea is not great either, as far as I can read. But we (as Analytics) can't afford a 2-proxy/2-database infrastructure only for analytics1003's database; it would be way too much for the work it needs to do. So I'd say the only possibility left could be to use another host/VM as a MySQL replica, without automatic failover of course. I'll see with Andrew what's best!
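Bootstrapping such a plain replica would roughly mean importing a consistent dump on the new host and then pointing it at analytics1003. A hedged SQL sketch, with the replica host, replication user, password, and binlog coordinates as placeholders:

```sql
-- On the hypothetical replica host, after importing a consistent dump
-- of analytics1003 (the binlog coordinates come from that dump).
CHANGE MASTER TO
  MASTER_HOST='analytics1003.eqiad.wmnet',
  MASTER_USER='repl',
  MASTER_PASSWORD='<secret>',
  MASTER_LOG_FILE='<binlog-file>',
  MASTER_LOG_POS=<binlog-pos>;
START SLAVE;
-- Verify replication is healthy:
SHOW SLAVE STATUS\G
```

Failover would then be manual: repoint the applications at the replica and make it writable (SET GLOBAL read_only=0).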

Please consider asking for those extra hw resources; you shouldn't assume they cannot be made available (although of course, you shouldn't assume they will be either). But they surely won't be if you don't ask for them.

elukey closed this task as Resolved. Aug 27 2018, 8:56 AM