
Expand media backup storage available space to 960 TB per datacenter
Closed, Resolved (Public)

Authored By
jcrespo
Oct 10 2024, 12:12 PM

Description

We recently ran into low space issues. Expand the dedicated storage for media backups to 960 TB (6 hosts) in both eqiad and codfw.

Current utilization (as of 2024-10-10) is 645 708 607 840 309 bytes.
Current utilization (as of 2024-11-21) is 656 178 463 691 108 bytes.
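
For a rough sense of scale, the two utilization figures above can be turned into a growth rate and a headroom estimate. This is only a sketch under stated assumptions: it treats "960 TB" as decimal terabytes (960 × 10^12 bytes), assumes the reported utilization is comparable to the per-datacenter target, assumes growth stays linear, and assumes capacity is split evenly across the 6 hosts. None of these is confirmed in this task, and the function names are illustrative.

```python
from datetime import date

# Assumption: "960 TB" means decimal terabytes; the task does not say TB vs TiB.
TARGET_BYTES = 960 * 10**12
HOSTS = 6  # from the task description: "960 TB (6 hosts)"

# Utilization figures quoted in the task description.
SAMPLES = [
    (date(2024, 10, 10), 645_708_607_840_309),
    (date(2024, 11, 21), 656_178_463_691_108),
]


def growth_bytes_per_day(samples):
    """Average daily growth between the first and last sample."""
    (d0, b0), (d1, b1) = samples[0], samples[-1]
    return (b1 - b0) / (d1 - d0).days


def used_fraction(samples, target=TARGET_BYTES):
    """Latest utilization as a fraction of the target capacity."""
    return samples[-1][1] / target


def days_until_full(samples, target=TARGET_BYTES):
    """Days of headroom left, assuming growth stays linear (an assumption)."""
    return (target - samples[-1][1]) / growth_bytes_per_day(samples)


print(f"per-host share: {TARGET_BYTES / HOSTS / 10**12:.0f} TB")
print(f"growth: {growth_bytes_per_day(SAMPLES) / 10**9:.1f} GB/day")
print(f"used: {used_fraction(SAMPLES):.1%} of target")
print(f"headroom: {days_until_full(SAMPLES):.0f} days at the current rate")
```

Under these assumptions the growth works out to roughly a quarter of a terabyte per day, which is why the expanded capacity is expected to last well beyond the next few months.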

Event Timeline

Change #1082172 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Setup new host for mediabackups backup[12]012

https://gerrit.wikimedia.org/r/1082172

Change #1082172 merged by Jcrespo:

[operations/puppet@production] mediabackups: Setup new host for mediabackups backup1012

https://gerrit.wikimedia.org/r/1082172

Change #1091731 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups

https://gerrit.wikimedia.org/r/1091731

Change #1091731 merged by Jcrespo:

[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups

https://gerrit.wikimedia.org/r/1091731

Mentioned in SAL (#wikimedia-operations) [2024-11-20T14:23:26Z] <jynus> starting resharding of commons backup files into new host backup1010 T376892

Mentioned in SAL (#wikimedia-operations) [2024-11-20T15:31:53Z] <jynus> starting resharding of commons backup files into new host backup2010 T376892

Change #1093377 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Setup backup1010 as the 6th media backup host in eqiad

https://gerrit.wikimedia.org/r/1093377

Change #1093379 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Setup backup2010 as the 6th media backup host in codfw

https://gerrit.wikimedia.org/r/1093379

14 more hours for the transfers to complete.

Change #1093377 merged by Jcrespo:

[operations/puppet@production] mediabackup: Setup backup1010 as the 6th media backup host in eqiad

https://gerrit.wikimedia.org/r/1093377

Change #1093379 merged by Jcrespo:

[operations/puppet@production] mediabackup: Setup backup2010 as the 6th media backup host in codfw

https://gerrit.wikimedia.org/r/1093379

Disk usage peaked at 94.2% and is finally on a downward trend: 93.7% 🎉

Timestamp is in CET:

[12:26:36] <jinxer-wm> RESOLVED: DiskSpace: Disk space backup2011:9100:/srv/objectstorage 5.998% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=backup2011 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
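
The alert reports free space, while the comment above it quotes used space, so the two are complements of each other. A minimal sketch of extracting the fields from such an alert line follows; the regular expression and the `parse_disk_alert` helper are illustrative assumptions based only on the single line quoted here, not on the actual alerting configuration.

```python
import re

# Pattern inferred from the one alert line quoted in this task (an assumption):
# "Disk space <host>:<port>:<mountpoint> <percent>% free"
ALERT_RE = re.compile(
    r"Disk space (?P<host>[^:\s]+):(?P<port>\d+):(?P<mount>\S+) "
    r"(?P<free>\d+(?:\.\d+)?)% free"
)


def parse_disk_alert(line):
    """Return host, mount point, and free/used percentages, or None if no match."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    free = float(m.group("free"))
    return {
        "host": m.group("host"),
        "mount": m.group("mount"),
        "free_pct": free,
        "used_pct": 100.0 - free,
    }


line = (
    "RESOLVED: DiskSpace: Disk space backup2011:9100:/srv/objectstorage "
    "5.998% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space"
)
info = parse_disk_alert(line)
print(info)
```

For the line above, 5.998% free corresponds to about 94% used on backup2011, which is in line with the usage figures mentioned in the previous comment.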

This is now done. Catch-up and purging are ongoing, but once they finish, we should be able to store almost 1 PB of media backups in each datacenter. Backups will continue without problems or manual intervention over the coming months.

Data is not 100% balanced, so some tuning may be needed later, but the initial goal, which was to expand the existing space and make sure hosts were not full, has already been accomplished.