Repurpose db1108 as generic Analytics db replica
Open, Medium, Public

Description

In T231208 some issues were highlighted:

  • the analytics dbs (matomo, superset, oozie, etc.) are all running on single db hosts without any replication, and taking backups following SRE best practices leads to issues like excessive lock contention between applications and the backup software.
  • the current backups of the Analytics databases have diverged a lot from the best practices that SRE follows and might lead to inconsistent snapshots in some cases.

In T231858 some issues were highlighted:

  • the log database on db1107 is very different from the one on db1108; they can't really be exchanged without users noticing it in their query results.
  • the log database contains historical data that would be nice to keep available (read-only) for some more time before relying completely on HDFS data. We already have the past 1.5 years of EventLogging data on HDFS, and we are sunsetting the MySQL support.
  • moving the log database to one of the dbstore nodes would require a lot of engineering time and would probably not be the best solution in terms of availability and resource usage of the dbstore cluster.

Given the above points, I have a proposal for db1108:

  • after the deprecation of mysql-eventlogging, remove all eventlogging-related replication code.
  • repurpose it as a generic analytics database replica: keep the log database as it is, and replicate the matomo, superset, etc. databases from the Analytics db hosts (an-coord1001, matomo1001)
  • add the mariadb-bacula backup configuration for db1108

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available read-only, with the caveat that any maintenance or host hardware event will require downtime. The Analytics team is going to maintain the log database on a best-effort basis, and this will be made clear to users.

How does the proposal sound?

Event Timeline

elukey created this task. Oct 7 2019, 1:37 PM
Restricted Application added a subscriber: Aklapper. Oct 7 2019, 1:37 PM

I am fine with this plan. I assume this service will still be owned and maintained by Analytics, right? (Of course we can help with the setup and all that as we normally do.)
What I want to make sure is that we are all on the same page about who owns and is the contact point for this new "service". I believe the way it has been split between db1107/db1108 has worked well all these years, no?

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available read-only, with the caveat that any maintenance or host hardware event will require downtime. The Analytics team is going to maintain the log database on a best-effort basis, and this will be made clear to users.

Personally, I believe that it also needs to be clear that this database on this particular host can go down anytime and might not be back (ie: catastrophic storage failure).

On a side note, what would happen with the current db1107?

elukey added a comment. Oct 7 2019, 1:50 PM

I am fine with this plan. I assume this service will still be owned and maintained by Analytics, right? (Of course we can help with the setup and all that as we normally do.)
What I want to make sure is that we are all on the same page about who owns and is the contact point for this new "service". I believe the way it has been split between db1107/db1108 has worked well all these years, no?

+1, if possible it would be great to keep going with what we currently have for db1107/8. I'll probably need some help from you to set up the slave settings :)

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available read-only, with the caveat that any maintenance or host hardware event will require downtime. The Analytics team is going to maintain the log database on a best-effort basis, and this will be made clear to users.

Personally, I believe that it also needs to be clear that this database on this particular host can go down anytime and might not be back (ie: catastrophic storage failure).

This is understood, but given the fact that db1108 is still under warranty we should get a new replacement if a catastrophic failure happens right? The log db will have its own archived version stored, so we'll be able to re-create it if needed. The downtime is the part that might be very long, and that should be clear to users. Is my understanding correct or am I missing something?

On a side note, what would happen with the current db1107?

A gift for your team! :D

I am fine with this plan. I assume this service will still be owned and maintained by Analytics, right? (Of course we can help with the setup and all that as we normally do.)
What I want to make sure is that we are all on the same page about who owns and is the contact point for this new "service". I believe the way it has been split between db1107/db1108 has worked well all these years, no?

+1, if possible it would be great to keep going with what we currently have for db1107/8. I'll probably need some help from you to set up the slave settings :)

Absolutely

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available read-only, with the caveat that any maintenance or host hardware event will require downtime. The Analytics team is going to maintain the log database on a best-effort basis, and this will be made clear to users.

Personally, I believe that it also needs to be clear that this database on this particular host can go down anytime and might not be back (ie: catastrophic storage failure).

This is understood, but given the fact that db1108 is still under warranty we should get a new replacement if a catastrophic failure happens right? The log db will have its own archived version stored, so we'll be able to re-create it if needed. The downtime is the part that might be very long, and that should be clear to users. Is my understanding correct or am I missing something?

The host is under warranty for 1 more year, but the data could be lost (ie: broken raid controller).

On a side note, what would happen with the current db1107?

A gift for your team! :D

<3 <3 <3

elukey added a comment. Oct 7 2019, 1:54 PM

Important note about the log database: the plan is to take a full snapshot of the db and archive it in HDFS before starting any procedure. It will be made clear that the log database will be available read-only, with the caveat that any maintenance or host hardware event will require downtime. The Analytics team is going to maintain the log database on a best-effort basis, and this will be made clear to users.

Personally, I believe that it also needs to be clear that this database on this particular host can go down anytime and might not be back (ie: catastrophic storage failure).

This is understood, but given the fact that db1108 is still under warranty we should get a new replacement if a catastrophic failure happens right? The log db will have its own archived version stored, so we'll be able to re-create it if needed. The downtime is the part that might be very long, and that should be clear to users. Is my understanding correct or am I missing something?

The host is under warranty for 1 more year, but the data could be lost (ie: broken raid controller).

Sure, but the bacula backups for the analytics databases and the snapshot for the log database should be enough for this use case. It is not mission critical to capture every transaction on those databases; we can tolerate some loss, the important part is having regular backups.

Sure, I just wanted to make sure expectations for the users will be handled beforehand :)

Milimetric triaged this task as High priority. Oct 7 2019, 3:27 PM
Milimetric lowered the priority of this task from High to Medium.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

bacula backups for the analytics databases and the snapshot for the log database should be enough for this use case

Q, will the bacula backups also include the log database? It might be nice if they could; then if we had to restore this MySQL instance after a hw failure, the log database would continue to exist as is for users.

bacula backups for the analytics databases and the snapshot for the log database should be enough for this use case

Q, will the bacula backups also include the log database? It might be nice if they could; then if we had to restore this MySQL instance after a hw failure, the log database would continue to exist as is for users.

Probably @jcrespo can give more insights on that specific bit about the log database and bacula :)

Oh, I think it was a Q for Luca about how we intended to set that up. I assume we can do it either way. We wouldn't have to back up log to Bacula if it is too large, since we'd have a snapshot in Hadoop, but it might be nice for convenience if restoring a Bacula backup restored the whole thing, without the extra step of restoring log from the snapshot in Hadoop.

elukey added a comment. Oct 8 2019, 1:19 PM

If the log db can be stored in Bacula it would be great! Otherwise HDFS is fine in my opinion.

Marostegui changed the status of subtask T238113: Repurpose db1107 as a generic database from Open to Stalled. Tue, Nov 12, 4:48 PM

Removing the DBA tag from here, as there are no actionables for us (I will stay subscribed to the task just in case).
Once this task gets resolved, we should be able to repurpose db1107, and for that I have created T238113.

@elukey can we move this to Analytics-Kanban?

Marostegui changed the status of subtask T238113: Repurpose db1107 as a generic database from Stalled to Open. Thu, Nov 21, 6:12 AM
elukey added a comment. Edited Fri, Nov 22, 9:44 AM

High level plan:

  1. Upgrade db1108, if possible, to buster + the latest version of mariadb. Since we'd like to keep the current log database, I'd prefer not to wipe the srv partition. Currently:
Debian GNU/Linux 9.11 (stretch)

elukey@db1108:~$ dpkg -l wmf-mariadb101
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version              Architecture         Description
+++-===============================-====================-====================-===================================================================
ii  wmf-mariadb101                  10.1.39-1            amd64                MariaDB 10.1 with Wikimedia-specific patches.
  2. Change the current role name + mariadb config to something more generic
node 'db1108.eqiad.wmnet' {
    role(mariadb::misc::eventlogging::replica)
}

class role::mariadb::misc::eventlogging::replica {

    include ::profile::standard
    include ::profile::base::firewall
    ::profile::mariadb::ferm { 'eventlogging_replica': }
    include ::profile::mariadb::monitor

    include ::profile::mariadb::misc::eventlogging::database

    system::role { 'role::mariadb::misc::eventlogging::replica':
        description => 'Eventlogging Datastore Custom Replica',
    }
}

class profile::mariadb::misc::eventlogging::database (
    $shard  = hiera('profile::mariadb::misc::eventlogging::database::shard'),
    $master = hiera('profile::mariadb::misc::eventlogging::database::master'),
) {
[..]
    # History context: there used to be a distinction between
    # EL master and slaves, namely that only the master was not
    # in read only mode. The Analytics team removed this constraint
    # before deploying the eventlogging_cleaner script (T156933),
    # that needed to DELETE/UPDATE rows on the job database without
    # running as root for obvious reasons.
    class { 'mariadb::config':
        basedir       => $mariadb_basedir,
        config        => 'profile/mariadb/misc/eventlogging/eventlogging.my.cnf.erb',
        datadir       => '/srv/sqldata',
        tmpdir        => '/srv/tmp',
        socket        => $mariadb_socket,
        port          => 3306,
        read_only     => 0,
        ssl           => 'puppet-cert',
        p_s           => 'off',
        binlog_format => 'MIXED',
    }

Probably something like role::mariadb::misc::analytics::backup and profile::mariadb::misc::analytics::backup::database ?

  3. Add config to enable replication from matomo/analytics-meta (matomo1001/an-coord1001) to db1108. I have never done it before but IIUC it is all manual config.
  4. Add alarms to verify that the slaves are in sync.
  5. Add the Bacula backup configuration and remove the current ones on the various hosts.

@Marostegui does it sound good?

buster + the latest version of mariadb

Buster will install 10.3. We have 10.3, 10.4 and percona-server 8.0 available. Those (and buster) are in theory supported, but please note that there are likely to be rough edges. However, we shouldn't normally be reimaging any host to stretch right now.

something more generic

mariadb::misc should work well, or multi-instance misc if you plan to replicate 2 services there.

buster + the latest version of mariadb

Buster will install 10.3. We have 10.3, 10.4 and percona-server 8.0 available. Those (and buster) are in theory supported, but please note that there are likely to be rough edges. However, we shouldn't normally be reimaging any host to stretch right now.

something more generic

mariadb::misc should work well, or multi-instance misc if you plan to replicate 2 services there.

Thanks for the feedback! Since the dbs are very small I was planning to have only one mariadb instance acting as multi-source; I didn't think about the multi-instance option. Do you think that multisource would be ok in this case?

High level plan:

  1. Upgrade db1108, if possible, to buster + the latest version of mariadb. Since we'd like to keep the current log database, I'd prefer not to wipe the srv partition. Currently:

Keep in mind that buster ships 10.3; there are no packages for 10.1 on buster. I tried a few days ago and failed miserably :-). So keep in mind that if you go to buster, you'd be using 10.3.
We do package 10.3 on our own (I need to check if the latest version is on our repo), but a working version is there anyway.
Keep in mind that we are still not sure if we are going to go to 10.3, 10.4 or even vanilla MySQL in production. You can of course choose whatever is easier/more convenient. 10.3 is a perfectly working version and should not give you many issues.

  3. Add config to enable replication from matomo/analytics-meta (matomo1001/an-coord1001) to db1108. I have never done it before but IIUC it is all manual config.

Yes, replication threads are set up manually.
You would need to add grants for that - we can help with it. It is not a big deal.
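
For reference, a minimal sketch of what the manual setup could look like (the user name, password and binlog coordinates below are illustrative placeholders, not the actual production values):

-- On the source host (e.g. an-coord1001 or matomo1001): create a replication
-- account for db1108.
CREATE USER 'repl_db1108'@'db1108.eqiad.wmnet' IDENTIFIED BY '********';
GRANT REPLICATION SLAVE ON *.* TO 'repl_db1108'@'db1108.eqiad.wmnet';

-- On db1108 (inside the relevant mariadb instance): point the replication
-- thread at the source, using binlog coordinates from the initial data copy.
CHANGE MASTER TO
  MASTER_HOST='an-coord1001.eqiad.wmnet',
  MASTER_USER='repl_db1108',
  MASTER_PASSWORD='********',
  MASTER_LOG_FILE='an-coord1001-bin.000001',
  MASTER_LOG_POS=4,
  MASTER_SSL=1;
START SLAVE;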

  4. Add alarms to verify that the slaves are in sync.

+1 (you'd need to evaluate whether you want it to page or just IRC). If the slaves aren't critical, maybe IRC should be enough.
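
Whatever the notification channel, the underlying check would essentially look at the replication status on db1108, along these lines:

-- Fields typically alerted on: Slave_IO_Running, Slave_SQL_Running and
-- Seconds_Behind_Master (replication lag).
SHOW SLAVE STATUS\G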

  5. Add the Bacula backup configuration and remove the current ones on the various hosts.

Keep in mind that @jcrespo is actively working on the new bacula setup, so make sure to be in sync with him to avoid stepping on each other's toes.

High level plan:

  1. Upgrade db1108, if possible, to buster + the latest version of mariadb. Since we'd like to keep the current log database, I'd prefer not to wipe the srv partition. Currently:

Keep in mind that buster ships 10.3; there are no packages for 10.1 on buster. I tried a few days ago and failed miserably :-). So keep in mind that if you go to buster, you'd be using 10.3.
We do package 10.3 on our own (I need to check if the latest version is on our repo), but a working version is there anyway.
Keep in mind that we are still not sure if we are going to go to 10.3, 10.4 or even vanilla MySQL in production. You can of course choose whatever is easier/more convenient. 10.3 is a perfectly working version and should not give you many issues.

Thanks! On this point, I can even wait some months before upgrading, there is no rush on my side (I just thought it would have been a good moment to do so). If it is ok to keep using Stretch for the moment I can skip the reimage/upgrade part :)

We are still using stretch on all our production hosts. But we are soon going to start exploring what to do with the Buster upgrade, so we can keep in touch for that :)

jcrespo added a comment. Edited Fri, Nov 22, 12:30 PM

I was planning to have only one mariadb instance acting as multi-source

I strongly suggest using several instances - it works well, precisely when they are small. While it has some performance penalty vs only one instance, we discovered that multisource has lots of administration and maintenance drawbacks, while with multiple instances you can e.g. transfer and recover full instances at once, and they are more secure (no interference between instances). Multisource also doesn't work with GTID (T159423). The only multisource hosts we have right now are the labs db wikireplicas, and that architecture has an expiration date.
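
For context, multisource here means a single mariadb process with several named replication connections, roughly like the sketch below (connection names and credentials are only illustrative); with multi-instance, each source gets its own mysqld process instead.

-- Multisource: one mysqld, several named replication connections.
CHANGE MASTER 'matomo' TO
  MASTER_HOST='matomo1001.eqiad.wmnet',
  MASTER_USER='repl',
  MASTER_PASSWORD='********';
CHANGE MASTER 'analytics_meta' TO
  MASTER_HOST='an-coord1001.eqiad.wmnet',
  MASTER_USER='repl',
  MASTER_PASSWORD='********';
START ALL SLAVES;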

You won't have to maintain any special puppet code, that is fully maintained already, and we use it for both misc and production mediawiki hosts: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mariadb/misc/multiinstance.pp

I can even wait some months before upgrading

You can upgrade already, there are no blockers - the warning was that you may have to do some testing for us.

I was planning to have only one mariadb instance acting as multi-source

I strongly suggest using several instances - it works well, precisely when they are small. While it has some performance penalty vs only one instance, we discovered that multisource has lots of administration and maintenance drawbacks, while with multiple instances you can e.g. transfer and recover full instances at once, and they are more secure (no interference between instances). Multisource also doesn't work with GTID (T159423). The only multisource hosts we have right now are the labs db wikireplicas, and that architecture has an expiration date.

+1 not to use multi-source, we have not had great experiences with it and, as Jaime said, one of the items on our roadmap is to get rid of multi-source on the wikireplicas.
If this service and server will be fully owned by Analytics, you can of course use multisource if that best fits your needs, but be aware of its limitations (remember dbstore1002!) and risks, as Jaime has described.

Makes sense, I'll go for multi-instance then, I have no intention to resurrect another dbstore1002 :)

Regarding the ownership of the service, I was hoping that we could go for something like the (new) dbstore hosts: Analytics and SRE cooperate on the maintenance. This of course means following the best practices highlighted by your team!

Makes sense, I'll go for multi-instance then, I have no intention to resurrect another dbstore1002 :)

Yaay!! <3

Regarding the ownership of the service, I was hoping that we could go for something like the (new) dbstore hosts: Analytics and SRE cooperate on the maintenance. This of course means following the best practices highlighted by your team!

Fine by me, as long as we don't use multisource! :-)
In all seriousness, multisource has proven to be a pain for us, so we don't want to maintain new services with it if it is avoidable.

Change 552776 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: set db to read only

https://gerrit.wikimedia.org/r/552776

Note: this is only a proposal after a first read of the puppet code, only meant to gather feedback, nothing set in stone. Comments and suggestions are really welcome!

  • rename role::mariadb::misc::eventlogging::replica to role::mariadb::misc::database::replica
  • keep profile::mariadb::misc::eventlogging::database (possibly reducing its memory requirements, which seem oversized for its current usage)
  • add a new profile to the role aimed to add two mariadb instances:
    • one to replicate the piwik database from matomo1001
    • one to replicate 5 databases (small size) from an-coord1001

The last point could be translated into more than one instance, I have no idea what's best. Does this sound acceptable, or should I somehow try to translate the current log database instance in puppet into a proper mariadb::instance?

Note: this is only a proposal after a first read of the puppet code, only meant to gather feedback, nothing set in stone. Comments and suggestions are really welcome!

  • rename role::mariadb::misc::eventlogging::replica to role::mariadb::misc::database::replica

What's your idea? Make it more generic, or decouple it from what it currently has? As far as I can see it includes some specific eventlogging things (as expected). Will the new replica role keep including them?

  • keep profile::mariadb::misc::eventlogging::database (possibly reducing its memory requirements, which seem oversized for its current usage)

If you are going to have multi-instance, you'll need to decide how much memory each instance will have assigned on its innodb_buffer_pool_size as that is configured per instance (per mysql process).
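
For reference, the buffer pool size currently assigned to a running instance can be checked from the client:

-- Shows the buffer pool size (in bytes) configured for this mysqld process.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';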

  • add a new profile to the role aimed to add two mariadb instances:
    • one to replicate the piwik database from matomo1001
    • one to replicate 5 databases (small size) from an-coord1001

The last point could be translated into more than one instance, I have no idea what's best. Does this sound acceptable, or should I somehow try to translate the current log database instance in puppet into a proper mariadb::instance?

If you take a look at the current code, we have a generic modules/profile/manifests/mariadb/core/multiinstance.pp: just one profile, and we use hiera to decide which databases are included on each specific instance.

# db1101
# Buffer pool sizes/instance enabled
profile::mariadb::core::multiinstance::num_instances: 2
profile::mariadb::core::multiinstance::s8: '185G'
profile::mariadb::core::multiinstance::s7: '185G'

Check db1101's (core multi-instance) example. I think you can probably try to do the same but for your use case.

Note: this is only a proposal after a first read of the puppet code, only meant to gather feedback, nothing set in stone. Comments and suggestions are really welcome!

  • rename role::mariadb::misc::eventlogging::replica to role::mariadb::misc::database::replica

What's your idea? Make it more generic, or decouple it from what it currently has? As far as I can see it includes some specific eventlogging things (as expected). Will the new replica role keep including them?

The idea would be to attempt to make it more generic, adding one profile for multi-instance (like you mentioned with the db1101 example) and keeping the current eventlogging mariadb config as it is. The alternative is to move it to the multi-instance scheme/hiera config as well, if that makes things clearer.

Change 552776 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: set db to read only

https://gerrit.wikimedia.org/r/552776

Note: this is only a proposal after a first read of the puppet code, only meant to gather feedback, nothing set in stone. Comments and suggestions are really welcome!

  • rename role::mariadb::misc::eventlogging::replica to role::mariadb::misc::database::replica

What's your idea? Make it more generic, or decouple it from what it currently has? As far as I can see it includes some specific eventlogging things (as expected). Will the new replica role keep including them?

The idea would be to attempt to make it more generic, adding one profile for multi-instance (like you mentioned with the db1101 example) and keeping the current eventlogging mariadb config as it is. The alternative is to move it to the multi-instance scheme/hiera config as well, if that makes things clearer.

I think the latter makes it more consistent as well with what we have in core, meaning that we'd have a similar way rather than multiple ways to handle multi-instance. However, I don't know if that requires a lot more changes from your side, does it?

Note: this is only a proposal after a first read of the puppet code, only meant to gather feedback, nothing set in stone. Comments and suggestions are really welcome!

  • rename role::mariadb::misc::eventlogging::replica to role::mariadb::misc::database::replica

What's your idea? Make it more generic, or decouple it from what it currently has? As far as I can see it includes some specific eventlogging things (as expected). Will the new replica role keep including them?

The idea would be to attempt to make it more generic, adding one profile for multi-instance (like you mentioned with the db1101 example) and keeping the current eventlogging mariadb config as it is. The alternative is to move it to the multi-instance scheme/hiera config as well, if that makes things clearer.

I think the latter makes it more consistent as well with what we have in core, meaning that we'd have a similar way rather than multiple ways to handle multi-instance. However, I don't know if that requires a lot more changes from your side, does it?

Will check, no issue on my side, I just needed some direction about what would be best :) Will report back after checking puppet again with a solution.

mforns moved this task from Next Up to In Progress on the Analytics-Kanban board. Mon, Nov 25, 5:38 PM

I quickly came up with a new profile for multiinstance, but I have a question about the following:

$basedir = '/opt/wmf-mariadb101'
class { 'mariadb::config':
    datadir       => false,
    basedir       => $basedir,
    read_only     => 'ON',
    config        => 'profile/mariadb/mysqld_config/misc_multiinstance.my.cnf.erb',
    p_s           => 'on',
    ssl           => 'puppet-cert',
    binlog_format => 'ROW',
}

class { 'mariadb::config':
    basedir       => $mariadb_basedir,
    config        => 'profile/mariadb/misc/eventlogging/eventlogging.my.cnf.erb',
    datadir       => '/srv/sqldata',
    tmpdir        => '/srv/tmp',
    socket        => $mariadb_socket,
    port          => 3306,
    read_only     => 1,
    ssl           => 'puppet-cert',
    p_s           => 'off',
    binlog_format => 'MIXED',
}

What could be the best approach to make the current mariadb config be part of an instance? Keep misc_multiinstance.my.cnf.erb as my.cnf and then override it with a more specific eventlogging.my.cnf.erb?

What could be the best approach to make the current mariadb config be part of an instance? Keep misc_multiinstance.my.cnf.erb as my.cnf and then override it with a more specific eventlogging.my.cnf.erb?

I believe so, yeah. We still have a refactoring of the whole misc config pending, but that's part of a longer conversation that probably shouldn't block you, so I believe keeping misc_multiinstance.my.cnf.erb as the default if not defined, and then specifying eventlogging.my.cnf.erb on top of it, is the way to go.

Change 553142 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] systemd::slice::all_users: add Debian Buster support

https://gerrit.wikimedia.org/r/553142

Change 553742 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] WIP - Introduce profile::mariadb::misc::analytics

https://gerrit.wikimedia.org/r/553742