Ceph VM image backups
Closed, Resolved | Public

Description

The proof of concept in T259192 is looking pretty good; let's move ahead with making a real backup setup.

  • Identify hardware for this. 20 TB would be a nice amount of space to start with. Going to try to use cloudvirt1024. As a Ceph node, its drives are idle; we'll see if running backup jobs interferes with VM performance.

Details

Repo | Branch | Lines +/-
operations/puppet | production | +28 -6
operations/puppet | production | +0 -1
operations/puppet | production | +9 -0
operations/puppet | production | +8 -0
operations/puppet | production | +4 -0
operations/puppet | production | +1 -0
operations/puppet | production | +11 -0
operations/puppet | production | +3 -0
operations/puppet | production | +4 -3
operations/puppet | production | +1 -5
operations/puppet | production | +5 -5
operations/puppet | production | +59 -1
operations/puppet | production | +7 -4
operations/puppet | production | +1 -1
operations/puppet | production | +5 -5
operations/puppet | production | +27 -3
operations/puppet | production | +6 -1
operations/puppet | production | +2 -0
operations/puppet | production | +16 -1
operations/puppet | production | +9 -2
operations/puppet | production | +27 -21
operations/puppet | production | +858 -0
operations/puppet | production | +0 -15
operations/puppet | production | +1 -2

Event Timeline

Change 621022 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Revert "backy2: temporarily hack data dir to /var/lib/nova/instances"

https://gerrit.wikimedia.org/r/621022

Change 621023 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs/ceph/backy: move backup engine to cloudstore1009

https://gerrit.wikimedia.org/r/621023

Change 621024 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: remove some unused hiera settings

https://gerrit.wikimedia.org/r/621024

Change 621025 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: hack in a fix to an upstream bug in 'backy2 du'

https://gerrit.wikimedia.org/r/621025

Change 621022 merged by Andrew Bogott:
[operations/puppet@production] Revert "backy2: temporarily hack data dir to /var/lib/nova/instances"

https://gerrit.wikimedia.org/r/621022

Change 621024 merged by Andrew Bogott:
[operations/puppet@production] backy2: remove some unused hiera settings

https://gerrit.wikimedia.org/r/621024

Change 621025 merged by Andrew Bogott:
[operations/puppet@production] backy2: hack in a fix to an upstream bug in 'backy2 du'

https://gerrit.wikimedia.org/r/621025

Change 621023 merged by Andrew Bogott:
[operations/puppet@production] wmcs/ceph/backy: move backup engine to cloudstore1009

https://gerrit.wikimedia.org/r/621023

Change 621058 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1024: move to Buster and make a ceph cloudvirt

https://gerrit.wikimedia.org/r/621058

Change 621058 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1024: move to Buster and make a ceph cloudvirt

https://gerrit.wikimedia.org/r/621058

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1024.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202008182043_andrew_31694.log.

Completed auto-reimage of hosts:

['cloudvirt1024.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1024.eqiad.wmnet']

Change 621077 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1024: move to new role, 'virt_ceph_and_backy'

https://gerrit.wikimedia.org/r/621077

Change 621077 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1024: move to new role, 'virt_ceph_and_backy'

https://gerrit.wikimedia.org/r/621077

Change 621538 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] ceph backups: exclude integration agents

https://gerrit.wikimedia.org/r/621538

Change 621538 merged by Andrew Bogott:
[operations/puppet@production] ceph backups: exclude integration agents

https://gerrit.wikimedia.org/r/621538

Andrew updated the task description.

I just ran some performance tests on a VM while backup jobs were running. I didn't notice any change in behavior.

Once we're closer to full network capacity all bets are off, but there's no clear downside to the backups at the moment.

Andrew updated the task description.

Change 632960 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs server backups: Add a way to assign projects to backup hosts

https://gerrit.wikimedia.org/r/632960

Re-opening because cloudvirt1024 isn't big enough for all our backups.

Change 632961 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs backups: remove the 'special_projects' logic

https://gerrit.wikimedia.org/r/632961

Change 632976 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs backy2: allow hiera config of when the backup runs

https://gerrit.wikimedia.org/r/632976

Change 632976 merged by Andrew Bogott:
[operations/puppet@production] wmcs backy2: allow hiera config of when the backup runs

https://gerrit.wikimedia.org/r/632976

Change 632960 merged by Andrew Bogott:
[operations/puppet@production] wmcs server backups: Add a way to assign projects to backup hosts

https://gerrit.wikimedia.org/r/632960

Change 632961 merged by Andrew Bogott:
[operations/puppet@production] wmcs backups: remove the 'special_projects' logic

https://gerrit.wikimedia.org/r/632961

Change 633049 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs-backup-instances: add missing argument

https://gerrit.wikimedia.org/r/633049

Change 633049 merged by Andrew Bogott:
[operations/puppet@production] wmcs-backup-instances: add missing argument

https://gerrit.wikimedia.org/r/633049

Change 633306 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: throttle bandwidth for reading and writing

https://gerrit.wikimedia.org/r/633306

Change 633306 merged by Andrew Bogott:
[operations/puppet@production] backy2: throttle bandwidth for reading and writing

https://gerrit.wikimedia.org/r/633306

Change 633741 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] clouddvirt102[1-9]: apply libvirt-backy-ssd partman recipe

https://gerrit.wikimedia.org/r/633741

Change 633741 merged by Andrew Bogott:
[operations/puppet@production] clouddvirt102[1-9]: apply libvirt-backy-ssd partman recipe

https://gerrit.wikimedia.org/r/633741

Change 633744 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1022: move to virt_ceph_and_backy

https://gerrit.wikimedia.org/r/633744

Change 633744 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1022: move to virt_ceph_and_backy

https://gerrit.wikimedia.org/r/633744

Change 633771 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1021: add backy support

https://gerrit.wikimedia.org/r/633771

Change 633771 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1021: add backy support

https://gerrit.wikimedia.org/r/633771

Change 633829 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs VM backups: add two more backup hosts, increase days to 7

https://gerrit.wikimedia.org/r/633829

Change 633829 merged by Andrew Bogott:
[operations/puppet@production] wmcs VM backups: add two more backup hosts, increase days to 7

https://gerrit.wikimedia.org/r/633829

Change 635014 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs instance backup: move a few more projects to cloudvirt1021

https://gerrit.wikimedia.org/r/635014

Change 635014 merged by Andrew Bogott:
[operations/puppet@production] wmcs instance backup: move a few more projects to cloudvirt1021

https://gerrit.wikimedia.org/r/635014

Andrew triaged this task as Medium priority (Oct 20 2020, 4:21 PM).

Change 636020 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Define backy2::backup_time for cloudvirt102[5-8]

https://gerrit.wikimedia.org/r/636020

Change 636020 merged by Andrew Bogott:
[operations/puppet@production] Define backy2::backup_time for cloudvirt102[5-8]

https://gerrit.wikimedia.org/r/636020

Change 637704 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps instance backups: ignore clouddb-services project

https://gerrit.wikimedia.org/r/637704

Change 637704 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps instance backups: ignore clouddb-services project

https://gerrit.wikimedia.org/r/637704

Change 637713 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs instance backups: move more projects from cloudvirt1024 to cloudvirt1021

https://gerrit.wikimedia.org/r/637713

Change 637713 merged by Andrew Bogott:
[operations/puppet@production] wmcs instance backups: move more projects from cloudvirt1024 to cloudvirt1021

https://gerrit.wikimedia.org/r/637713

Change 646797 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move some project backups to cloudvirt1025

https://gerrit.wikimedia.org/r/646797

We aren't going to have space to back up everything in two places. Right now I'm working on spreading the backups onto cloudvirt1025-1028; after that I'll probably declare this finished.

Change 646797 merged by Andrew Bogott:
[operations/puppet@production] Move some project backups to cloudvirt1025

https://gerrit.wikimedia.org/r/646797

Change 647003 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup

https://gerrit.wikimedia.org/r/647003

Change 647003 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup

https://gerrit.wikimedia.org/r/647003

Mentioned in SAL (#wikimedia-cloud) [2020-12-28T12:23:04Z] <arturo> icinga downtime cloudvirt1026 disk space check until january 5 (T260692)

Change 652182 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud: drop dumps project backups

https://gerrit.wikimedia.org/r/652182

Change 652182 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud: drop dumps project backups

https://gerrit.wikimedia.org/r/652182

Mentioned in SAL (#wikimedia-cloud) [2021-02-03T09:59:20Z] <dcaro> Doing a full vm backup on cloudvirt1024 with the new script (T260692)

Change 661348 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] wmcs.backups: Use the wmcs-backup script for vms

https://gerrit.wikimedia.org/r/661348

Change 661348 merged by David Caro:
[operations/puppet@production] wmcs.backups: Use the wmcs-backup script for vms

https://gerrit.wikimedia.org/r/661348

This can be considered done. There are some improvements left, but those can be tracked individually.