Page MenuHomePhabricator

Puppetize Piwik's Database and set up periodical backups
Closed, ResolvedPublic

Description

Up to now we had only performed one backup of Piwik's database (mysql on bohrium) before the recent upgrade to 2.17.1.

It would be great to apply sane and good standards to the Piwik's puppet configuration to:

  1. periodically backup the database and store it somewhere outside bohrium
  2. leave puppet to enforce a basic mysql set up with root user and piwik user retrieved from the private puppet repo.

Current status:

  1. There is no LVM configured on bohrium, so lvm snapshot are not available.
  2. Bohrium is a Ganeti VM with a root partition of ~100GB, and /var/lib/mysql now is ~66GB.
  3. Piwik is almost entirely write oriented, for every POST sent to bohrium there is a correspondent SQL INSERT/UPDATE. There is a constant I/O wait and some tweaks like innodb_flush_log_at_trx_commit=2 and innodb_buffer_pool_size=2048GB have been applied as attempt to improve its performances.
  4. Piwik currently uses Debian Jessie's mysql-server, it might be good to think about mariadb?

(Added DBA to keep them in the loop and ask for suggestions in case they have time. If not, feel free to remove the tag).

Event Timeline

elukey added a subscriber: akosiaris.

@akosiaris: After a chat with Jaime I'd like to explore the possibility of using bacula, but I was told to double check with you requirements. Do you think that Piwik's backup could be hosted in the current bacula infra?

Depends on how often you want it backed up and the rate of growth. So mysql needs to be dumped in some way before it is backed up as backing up the raw files is a recipe for tear shedding. Which means normal Incremental (the changes in the dataset only) backups don't work and hence we need Full (the entire dataset, that is the entire 66GB). Which is impossible to do very often.

So maybe Weekly ?

We have most of the infrastructure for that in place already. That would be backup::mysqlset in the puppet repo. For a database of that size, a mysqldump is probably OK, so you probably want $xtrabackup => false. A good example how to use would be https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/mariadb/backup.pp. But I don't see bohrium having neither the disk space nor the IOPS available for method => predump, so remove that too (it will default to bpipe. slave => true is probably not needed either. I am not sure about the passsword_file cause I lack the specifics of that mysql installation. IIRC correctly it has never been handled by the DBA team (and for good reason), so maybe /etc/mysql/debian.cnf will be enough ? The mysql/mysqldump binaries I would leave it to defaults.

I am a bit concerned about the I/O wait (grafana) issue though. Keep in mind that dumping the database is only gonna exacerbate that. It is bound to make serving requests slower and of course it's gonna be for a pretty long time. If it coincides with high request rates it is bound to cause issues with dropped requests at the very least.

I agree with most of things said, and I actually mentioned some of those to luka on IRC.

BTW, for the record- the best way to move forward regarding authentication (independently of what happens in this particular case) is to create a dump user with enough permissions and enable the unix socket authentication plugin to allow passwordless access for backups- that is what we are going to implement for the mediawiki dumps and for most monitoring services (prometheus, icinga) to get rid of unnecessary password handling.

About those 66GB: Piwik uses ~18.5 GB of data, but /var/lib/mysql/ibdata1 is 66GB (no innodb_file_per_table set).

Change 351584 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set adapter=MYSQLI to Piwik's config.ini

https://gerrit.wikimedia.org/r/351584

Change 351584 merged by Elukey:
[operations/puppet@production] Set adapter=MYSQLI to Piwik's config.ini

https://gerrit.wikimedia.org/r/351584

A lot of things changed from my last post, most of them due to the fact that now the apps are not sending anymore data to Piwik:

  • no more constant IO wait
  • mysqldump takes 5 minutes to generate a 10G uncompressed file (was > 20G before). It doesn't seem to generate any heavy slowdown of bohrium.
  • space available is now 30GB, plenty for dumping the smaller Piwik DB.

So I'd like to set up bacula if possible, now all the blockers seems gone.

Change 364195 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::piwik::server: add regular bacula backups

https://gerrit.wikimedia.org/r/364195

Change 364195 merged by Elukey:
[operations/puppet@production] role::piwik::server: add regular bacula backups

https://gerrit.wikimedia.org/r/364195

I want to observe how the patch that I merged behaves during the next days before closing.

Change 365590 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::piwik::backup: use quotes for the backup's password field

https://gerrit.wikimedia.org/r/365590

Change 365590 merged by Elukey:
[operations/puppet@production] profile::piwik::backup: use quotes for the backup's password field

https://gerrit.wikimedia.org/r/365590

Change 365595 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::piwik::backup: remove duplicate old backup clean cron

https://gerrit.wikimedia.org/r/365595

Change 365595 merged by Elukey:
[operations/puppet@production] profile::piwik::backup: remove duplicate old backup clean cron

https://gerrit.wikimedia.org/r/365595

Fixed the predump script that wasn't able to backup to /srv/backup on bohrium, now everything should be ok. Since the predump script also compresses the output files it will not be a problem to store more than one backup on bohrium. They will get deleted by the predump script after 15 days automatically.

FYI, we uniformized not a long time ago local dumps going to /srv/backups on database hosts, in case you want to do it on the same named directory. Not a huge deal in any case.

@jcrespo sorry it is /srv/backups, missed a 's' in my last post :(

Mentioned in SAL (#wikimedia-analytics) [2017-07-17T14:31:13Z] <elukey> set innodb_flush_log_at_trx_commit on bohrium to 1 (default value)- T164073

Change 365611 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::piwik::database: puppetize Piwik's database config

https://gerrit.wikimedia.org/r/365611

Change 365611 merged by Elukey:
[operations/puppet@production] profile::piwik::database: puppetize Piwik's database config

https://gerrit.wikimedia.org/r/365611

Change 365632 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::piwik::database: relax disk flush policy to reduce iowait

https://gerrit.wikimedia.org/r/365632

Change 365632 merged by Elukey:
[operations/puppet@production] profile::piwik::database: relax disk flush policy to reduce iowait

https://gerrit.wikimedia.org/r/365632

Mentioned in SAL (#wikimedia-analytics) [2017-07-17T16:27:49Z] <elukey> set innodb_flush_log_at_trx_commit on bohrium to 2 and sync_binlog=300 to reduce iowait - T164073