Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T138562 Improve regular production database backups handling
Resolved | | jcrespo | T162789 Create less overhead on bacula jobs when dumping production databases
Resolved | | jcrespo | T169658 Improve database backups' coverage, monitoring and data recovery time (part 1) (tracking)
Resolved | | jcrespo | T169516 Implement cron-based mydumper backups on the dbstore role
Event Timeline
Change 371925 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Install mydumper on dbstore_multiinstance hosts, drop tls
Change 371925 merged by Jcrespo:
[operations/puppet@production] mariadb: Install mydumper on dbstore_multiinstance hosts, drop tls
Trying:

    # $backupdir must point at an existing directory
    shard=s1
    numthreads=8
    mydumper --compress --host=localhost --threads=$numthreads \
        --user=$(whoami) --socket=/run/mysqld/mysqld.$shard.sock \
        --triggers --routines --events --rows=100000000 \
        --logfile=$backupdir/dump.log \
        --outputdir=$backupdir/$shard.$(date +%Y%m%d%H%M%S)
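For the recovery side, restoring such a dump would go through myloader; a minimal sketch reusing the variables above (the timestamped directory name is an assumed example, not an actual run):

    # Hedged sketch: load a mydumper output directory back with myloader.
    dumpdir=$backupdir/$shard.20170814090216   # example: the directory the dump created
    myloader --host=localhost --socket=/run/mysqld/mysqld.$shard.sock \
        --user=$(whoami) --threads=$numthreads \
        --overwrite-tables --directory=$dumpdir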
Change 371935 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Disable buffer pool loading and dumping on new dbstores
Change 371935 merged by Jcrespo:
[operations/puppet@production] mariadb: Disable buffer pool loading and dumping on new dbstores
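For context, buffer pool dumping and loading are controlled by two standard InnoDB server variables; a hedged sketch of the kind of fragment such a change would manage (the file path is an assumption, not the actual puppet layout):

    # Assumed my.cnf fragment (path illustrative:
    # /etc/mysql/mariadb.conf.d/no-buffer-pool-dump.cnf)
    [mysqld]
    innodb_buffer_pool_dump_at_shutdown = 0
    innodb_buffer_pool_load_at_startup  = 0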
It took 3 hours to do the dump:
Started dump at: 2017-08-14 09:02:16 ... Finished dump at: 2017-08-14 12:04:49
That is much worse than T162789#3238231, but the differences are:
- The current instance was using only 15 GB of memory instead of 512 GB
- The host is much older (probably the CPU, too)
- No SSDs
- Replication was running for the other instances, reducing available IOPS
- Replication was running for the dumped database (enwiki)
- It has compressed tables, which probably makes queries much slower
- In the last half hour, some threads were idle, waiting for all threads to complete (watchlist and templatelinks were still finishing)
Maybe we can stop replication on all hosts and measure how that changes the export time.
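A minimal sketch of what that would look like on one multi-instance host, assuming one socket per section as above (the section list is illustrative, not the actual layout):

    # Stop the replication threads on every local instance before timing
    # a dump run; START SLAVE again on each once the dump finishes.
    for shard in s1 s2 s3 s4 s5 s6 s7 s8; do
        mysql --socket=/run/mysqld/mysqld.$shard.sock -e "STOP SLAVE;"
    done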
Mentioned in SAL (#wikimedia-operations) [2017-08-14T12:21:22Z] <jynus> stopping replication on all instances of dbstore2001 T169516
s2 took less time, an hour and a half, but its tables are much smaller, we used 16 threads, and all replication threads were stopped:
Started dump at: 2017-08-14 12:24:42 Finished dump at: 2017-08-14 13:57:25
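mydumper records these start/finish lines in the metadata file it writes into the output directory; a hedged sketch of deriving the wall-clock duration from it (the directory name is an assumed example):

    # Hypothetical helper: compute the dump duration from the metadata file.
    meta=$backupdir/s2.20170814122442/metadata   # example path
    start=$(date -d "$(sed -n 's/^Started dump at: //p' "$meta")" +%s)
    end=$(date -d "$(sed -n 's/^Finished dump at: //p' "$meta")" +%s)
    echo "dump took $(( (end - start) / 60 )) minutes"   # here: 92 minutes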
Change 371944 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] [WIP]mariadb: First attempt at a mydumper-based dump script
Change 374560 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Implement regular logical backups using mydumper
Change 371944 abandoned by Jcrespo:
[WIP]mariadb: First attempt at a mydumper-based dump script
Reason:
in favour of https://gerrit.wikimedia.org/r/374560
We need this ASAP: dbstore1001 crashed and it is not in a good state; plus it can no longer catch up with replication reasonably well.
Change 374560 merged by Jcrespo:
[operations/puppet@production] mariadb: Implement regular logical backups using mydumper
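With that merged, the recurring job boils down to looping mydumper over the local sections; a hedged sketch of the general shape (paths, section list and options are assumptions, not the actual puppetized script):

    #!/bin/bash
    # Illustrative per-section backup loop; every name here is an example.
    backupdir=/srv/backups
    for shard in s1 s2 s3 s4 s5 s6 s7 s8; do
        mydumper --compress --triggers --routines --events \
            --rows=100000000 --threads=16 \
            --socket=/run/mysqld/mysqld.$shard.sock \
            --logfile=$backupdir/dump.$shard.log \
            --outputdir=$backupdir/$shard.$(date +%Y%m%d%H%M%S)
    done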
Change 381472 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Fix require on cronjob
Change 381472 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Fix require on cronjob
Change 381491 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: backup user must be dump to match already in use mysql account
Change 381491 merged by Jcrespo:
[operations/puppet@production] mariadb: backup user must be dump to match already in use mysql account
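For reference, the privileges such a dump account needs for the flags used here; a hedged sketch, not the exact production grants:

    # Assumed minimal grants: SELECT to read data, RELOAD for FLUSH TABLES
    # WITH READ LOCK, REPLICATION CLIENT for SHOW MASTER STATUS, and
    # TRIGGER/EVENT for the --triggers/--events definitions.
    mysql -e "GRANT SELECT, RELOAD, REPLICATION CLIENT, TRIGGER, EVENT ON *.* TO 'dump'@'localhost';"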
The basic script works and the users are set up on dbstore2001:s5; a lot of follow-up is pending, both in hardware and scripting.