
Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization
Closed, Resolved, Public

Description

For https://phabricator.wikimedia.org/T108850 we need to apply the eventlogging_cleaner.py purging strategy to db1107 (m4-master) to be fully compliant with data retention policies. We don't expect any major issues, since we triple-checked data consistency on db1108 (which is already running the purging strategy), but a bit of paranoia whispered in my ear that taking a backup before proceeding would be safer. For example, if we discover a weird purging bug two weeks from now, it would be nice to have a way to recover the data.

Two options are available:

  1. binary backup: 1) stop mysql, 2) copy the data from /srv/sqldata to /srv/backup, 3) start mysql. This is surely faster, but it has one caveat: once mysql on db1107 goes down, m4-master will fail over to db1108. We could stop the mysql inserts from eventlog1001 during the copy, so it shouldn't be a big problem.
  2. logical backup: use mydumper to dump the data to /srv/backup. This would not require stopping mysql, but it may take hours to complete.
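As a rough illustration, the two options can be written down as dry-run command plans that print the steps instead of executing them. This is a minimal sketch, not what was actually run: the paths /srv/sqldata and /srv/backup come from this task, while the systemd unit name `mariadb`, the schema name `log`, the output directory `/srv/backup/log-dump` and the thread count are hypothetical placeholders. The mydumper flags used (`--database`, `--outputdir`, `--threads`, `--compress`) are standard mydumper options.

```python
import shlex
from typing import List

# Paths come from the task description; the service name, output
# directory and thread count below are hypothetical placeholders.
SERVICE = "mariadb"            # hypothetical systemd unit name
DATADIR = "/srv/sqldata"
BACKUP_ROOT = "/srv/backup"


def binary_backup_plan(service: str = SERVICE,
                       datadir: str = DATADIR,
                       backup_root: str = BACKUP_ROOT) -> List[List[str]]:
    """Option 1: stop mysql, copy the data directory, start mysql again."""
    return [
        ["systemctl", "stop", service],
        # cp -a keeps ownership, permissions and timestamps of the data files
        ["cp", "-a", datadir, backup_root + "/"],
        ["systemctl", "start", service],
    ]


def logical_backup_plan(database: str = "log",
                        outputdir: str = BACKUP_ROOT + "/log-dump",
                        threads: int = 4) -> List[List[str]]:
    """Option 2: dump one database with mydumper while mysql keeps running."""
    return [[
        "mydumper",
        "--database", database,
        "--outputdir", outputdir,
        "--threads", str(threads),
        "--compress",
    ]]


def show(plan: List[List[str]]) -> None:
    """Dry run: print each step as a shell command line without executing it."""
    for argv in plan:
        print(shlex.join(argv))


if __name__ == "__main__":
    show(binary_backup_plan())
    show(logical_backup_plan())
```

Keeping each plan as data makes it easy to review before anything touches db1107. The m4-master failover to db1108 and pausing the inserts from eventlog1001 are not modeled here.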

I am fine with either option; option 1 seems the quickest and least painful for everybody. We'd need to complete this work early this week if possible, so that we can start purging data and permanently enable the eventlogging_cleaner.py cron script.

Thanks in advance!

Event Timeline

Change 398836 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Add mydumper to misc:s4 databases

https://gerrit.wikimedia.org/r/398836

Change 398836 merged by Jcrespo:
[operations/puppet@production] mariadb: Add mydumper to misc:m4 databases

https://gerrit.wikimedia.org/r/398836

Mentioned in SAL (#wikimedia-operations) [2017-12-18T13:58:12Z] <jynus> starting one-time backup of eventlogging database on db1107:/srv/backups T183123

Backup is ongoing on db1107:/srv/backups/export-20171218-135659
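The directory name appears to follow mydumper's default naming when no explicit output directory is given, `export-YYYYMMDD-HHMMSS`, stamped with the time the dump started. Assuming that format, a small helper can map between the name and the start time; the function names here are hypothetical and only illustrate the convention.

```python
from datetime import datetime

# Assumed format of mydumper's default output directory name.
EXPORT_FORMAT = "export-%Y%m%d-%H%M%S"


def mydumper_default_outdir(started: datetime) -> str:
    """Build the export-YYYYMMDD-HHMMSS name for a dump started at `started`."""
    return started.strftime(EXPORT_FORMAT)


def parse_export_dir(name: str) -> datetime:
    """Recover the dump start time from an export-YYYYMMDD-HHMMSS directory name."""
    return datetime.strptime(name, EXPORT_FORMAT)


if __name__ == "__main__":
    started = parse_export_dir("export-20171218-135659")
    print(started.isoformat())  # 2017-12-18T13:56:59
```

Read this way, the dump on db1107 started at 13:56:59, shortly before the 13:58:12 SAL entry, assuming the server clock and the SAL timestamps use the same time zone.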

Kill the mydumper process if it slows down regular operations too much: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=db1107&var-network=eth0&from=1513594906656&to=1513605706656
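The advice above relies on a human watching the Grafana dashboard. As a hypothetical sketch of automating the same check, the host's 1-minute load average could stand in for "slows down regular operations", with SIGTERM sent to any process named `mydumper` once the load per CPU crosses a threshold. The threshold of 1.5 and the choice of load average as the signal are assumptions, not something this task specifies; an aborted run would likely leave an incomplete dump that should not be relied upon.

```python
import os
import signal
import subprocess
from typing import List


def should_abort(load1: float, cpu_count: int, max_load_per_cpu: float = 1.5) -> bool:
    """Return True when the 1-minute load average per CPU exceeds the threshold."""
    return load1 / max(cpu_count, 1) > max_load_per_cpu


def mydumper_pids() -> List[int]:
    """List PIDs of processes named exactly 'mydumper' (uses pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", "mydumper"],
                            capture_output=True, text=True)
    # pgrep exits with status 1 when nothing matches; stdout is then empty.
    return [int(pid) for pid in result.stdout.split()]


def check_once(max_load_per_cpu: float = 1.5) -> bool:
    """Send SIGTERM to mydumper if the host looks overloaded; return True if any was signalled."""
    load1, _, _ = os.getloadavg()  # Unix only
    if not should_abort(load1, os.cpu_count() or 1, max_load_per_cpu):
        return False
    pids = mydumper_pids()
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return bool(pids)
```

Keeping the decision in the pure `should_abort` function makes the threshold easy to test and tune separately from the part that actually signals processes.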

Mentioned in SAL (#wikimedia-operations) [2017-12-18T14:13:10Z] <elukey> temporarily stopped mysql consumers on eventlog1001 to ease a mysql backup on db1107 - T183123

Ready to close when @elukey is ready

Everything looks good, thanks a lot!