WMF media storage must be adequately backed up
Open, High, Public

Assigned To

None

Authored By

	jcrespo
	Sep 11 2020, 1:00 PM

Description

There is a desire to have 100% backup coverage of all data hosted at the Wikimedia Foundation in a centralized solution. After wiki content database backups were finally set up (T79922), multimedia (specifically, data stored on Swift to serve wiki non-text content) was the highest priority in terms of impact if lost, overall size, and the desire of several WMF stakeholders to have it backed up.

While there is redundancy in place for media, high availability, although a must to protect against service loss, is not a substitute for proper backups: software bugs, operator mistakes, employee sabotage, hardware issues and malicious attacks are all vectors that online redundancy would not necessarily protect against effectively. Geographically remote offline copies are needed, in addition to service HA, to recover effectively in the event of data loss.

Details

Subject	Repo	Branch	Lines +/-
mediabackups: Add test units for the Util helper unit	operations/software/mediabackups	master	+154 -2
mediabackups: Backup s8 media files at codfw	operations/puppet	production	+3 -3
mediabackups: Backup s7 media files at codfw	operations/puppet	production	+3 -3
mediabackups: Backup s6 media files at codfw	operations/puppet	production	+3 -3
mediabackups: Backup s5 media files at codfw	operations/puppet	production	+3 -3
mediabackups: Backup s3 media files at codfw	operations/puppet	production	+3 -3
mediabackups: Backup s2 media files at codfw	operations/puppet	production	+3 -3
mediabackup: Update mediawiki replica for s1 backup on codfw	operations/puppet	production	+2 -2
mediabackup: Backup s1 (enwiki) media files on codfw	operations/puppet	production	+1 -1
mediabackup: Backup testcommonswiki on codfw	operations/puppet	production	+1 -1
mediabackups: Add minio port to ipv6 connections	operations/puppet	production	+6 -4
mediabackup: Add an encryption key to store private files securely	operations/puppet	production	+34 -4
puppetmaster: Install 'age' on puppetmaster frontends	operations/puppet	production	+4 -1
mediabackup: Add dummy age private key for mediabackups	labs/private	master	+4 -0
mediabackups: Make wiki optional, add another optional parameter: dblist	operations/puppet	production	+11 -4
mediabackups: Backup testcommonswiki	operations/puppet	production	+3 -3
mediabackups: Backup s2 wikis, starting with bgwiki	operations/puppet	production	+3 -3
mediabackups: Backup frwiki media on eqiad	operations/puppet	production	+3 -3
mediabackups: Backup srwiki media on eqiad	operations/puppet	production	+1 -1
mediabackups: Backup shwiki media on eqiad	operations/puppet	production	+1 -1
mediabackups: Backup mgwiktionary media on eqiad	operations/puppet	production	+1 -1
mediabackups: Backup jvwikisource media on eqiad	operations/puppet	production	+1 -1
mediabackups: Backup enwikivoyage media on eqiad	operations/puppet	production	+1 -1
mediabackups: Start backup of dewiki files on eqiad	operations/puppet	production	+3 -3
mediabackups: Backup enwiki local originals	operations/puppet	production	+6 -6
mediabackup: Enable prometheus monitoring of minio	operations/puppet	production	+33 -5

Related Objects

Status	Assigned	Task
Open	None	T262668 WMF media storage must be adequately backed up
Resolved	jcrespo	T262669 Plan logical and physical design for media backups
Resolved	jcrespo	T264189 Prepare a proof of concept of the minimum setup capable of backup and recover testwiki media files
Resolved	jcrespo	T264190 Research storage solutions for media backups
Resolved	jcrespo	T160229 Back up of Commons files
Open	None	T267365 Develop maintenance script for enumerating Swift media files from MediaWiki (for backup processing)
Resolved	fgiunchedi	T267338 Depool codfw swift cluster
Resolved	jcrespo	T276442 Puppetize media backups infrastructure
Resolved	jcrespo	T288195 Update media backup TLS certificates
Resolved	jcrespo	T276445 Create a first release of the media backups automation tools
		Unknown Object (Task)
		Unknown Object (Task)
Resolved	Papaul	T274202 (Need By: 2021-03-31) rack/setup/install ms-backup200[12]
Resolved	jcrespo	T277323 (Need By: 2021-04-30) rack/setup/install backup200[4-7]
		Unknown Object (Task)
		Unknown Object (Task)
Resolved	jcrespo	T274206 (Need By: 2021-03-31) rack/setup/install ms-backup100[12]
Resolved	RobH	T277327 (Need By: 2021-04-30) rack/setup/install backup100[4-7]
Resolved	jcrespo	T299764 Document media recovery use case proposals and decide their priority
Resolved	jcrespo	T300020 Develop, package, deploy and document a single file recovery utility
Resolved	jcrespo	T311215 Create a script to easily query and remove backups from the media storage backups (primarily to attend T&S deletion requests)
Resolved	jcrespo	T327157 Create and deploy the logic to generate incremental backups of MediaWiki media files, to keep its file storage backup up to date, automatically
Resolved	MatthewVernon	T269108 Create a read-only swift identity for backup taking

Event Timeline

Backup of commonswiki started, around 70K files backed up (slowly) so far:

root@db1176.eqiad.wmnet[mediabackups]> START TRANSACTION; select count(*), status_name, backup_status_name FROM files JOIN wikis ON wikis.id = files.wiki JOIN backup_status ON backup_status.id = files.backup_status JOIN file_status ON files.status = file_status.id WHERE wiki=392 GROUP BY status, backup_status; select count(*), TIMESTAMPDIFF(MINUTE, min(backup_time), max(backup_time)) from backups where wiki=392; COMMIT;
Query OK, 0 rows affected (0.001 sec)

+----------+-------------+--------------------+
| count(*) | status_name | backup_status_name |
+----------+-------------+--------------------+
| 76194185 | public      | pending            |
|     8000 | public      | processing         |
|    71989 | public      | backedup           |
|        9 | public      | error              |
|        2 | public      | duplicate          |
|  5910460 | archived    | pending            |
|  6551674 | deleted     | pending            |
+----------+-------------+--------------------+
7 rows in set (2 min 25.138 sec)

+----------+-----------------------------------------------------------+
| count(*) | TIMESTAMPDIFF(MINUTE, min(backup_time), max(backup_time)) |
+----------+-----------------------------------------------------------+
|    71989 |                                                        55 |
+----------+-----------------------------------------------------------+
1 row in set (1.027 sec)

Query OK, 0 rows affected (0.001 sec)

This is with low concurrency (8 threads), but it is backing up things at ~1300 files/minute, ~~which will mean 1 day for a full backup?~~

I made a mistake by an order of magnitude: we have backed up approximately 2.5TB, or half a million files, in less than 6 hours. At the current low concurrency and the current speed of 25 files per second, this puts the end time 45 days out. Let's see if we can increase the pace once maintenance finishes on codfw and becomes passive again.
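For readers following the arithmetic, here is a minimal sketch of the estimate, using only the counts from the query output above and the rounded figures in this comment ("half a million files", "less than 6 hours"). The function names are illustrative and not part of the mediabackups tooling; because the inputs are rounded, the result is only a ballpark.

```python
SECONDS_PER_DAY = 86_400

# Counts copied from the commonswiki status query output above (wiki=392).
status_counts = {
    ("public", "pending"): 76_194_185,
    ("public", "processing"): 8_000,
    ("public", "backedup"): 71_989,
    ("public", "error"): 9,
    ("public", "duplicate"): 2,
    ("archived", "pending"): 5_910_460,
    ("deleted", "pending"): 6_551_674,
}


def files_per_second(files_done: int, elapsed_seconds: float) -> float:
    """Mean backup rate over an observation window."""
    return files_done / elapsed_seconds


def eta_days(files_remaining: int, rate: float) -> float:
    """Days left if the rate stayed constant (it will not, in practice)."""
    return files_remaining / rate / SECONDS_PER_DAY


total_files = sum(status_counts.values())

# First measurement: 71,989 files in 55 minutes.
first_rate_per_minute = 71_989 / 55

# Second measurement (rounded figures from the comment): ~500,000 files in ~6 hours.
files_done = 500_000
observed_rate = files_per_second(files_done, 6 * 3600)
estimate = eta_days(total_files - files_done, observed_rate)

print(f"total files known to the backup: {total_files:,}")
print(f"first rate: {first_rate_per_minute:.0f} files/minute")
print(f"observed rate: {observed_rate:.1f} files/s")
print(f"estimated days remaining: {estimate:.1f}")
```

With these inputs the estimate lands in the mid 40s of days, the same range as the 45 days quoted above.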

Legoktm awarded a token.Aug 31 2021, 6:51 PM

In T262668#7322172, @jcrespo wrote:

I made a mistake by an order of magnitude: we have backed up approximately 2.5TB, or half a million files, in less than 6 hours. At the current low concurrency and the current speed of 25 files per second, this puts the end time 45 days out. Let's see if we can increase the pace once maintenance finishes on codfw and becomes passive again.

I think we should crank concurrency up and see how much read throughput we can get. Maintenance/rebalance is ongoing but I'm not expecting it to affect throughput very much, and if it does we should find out IMHO

I think we should crank concurrency up and see how much read throughput we can get. Maintenance/rebalance is ongoing but I'm not expecting it to affect throughput very much, and if it does we should find out IMHO

Sorry, when I said:

once maintenance finishes on codfw and becomes passive again

what I really meant is:

"once maintenance finishes on codfw and eqiad becomes passive again". I am running the backup on eqiad right now (not on codfw). Can I increase eqiad load?

Hey, @Ottomata I believe you organized or helped organize the watch party for "Turning the database inside-out". This may be off-topic here, but I wanted to give you my comments about it, and I think this work is a good example of redesigning a workflow in that way.

As we discussed on this ticket before, aside from historical issues and technical debt, most of the complexity of generating file backups is that files are not stored in a simple append-only/event-based format. My work on file backups, among other things, implies "turning the database upside down" by migrating the non-trivial file storage model for backups to a simpler model. I hope to have the Data Engineering team's support in showing the engineering behind what was done for file storage metadata to, e.g., Core and Product teams, to convince them of the advantages of this workflow (for backups and recoveries, analytics, dumps, database management, consistency checks, etc.) and possibly work towards similar models in production, too. I don't necessarily think it will be easy to apply in all scopes, but it will probably be faster on more specific ones.

The context of files, for example, would be the architecture work at T28741.

TL;DR: I support 100% working towards that model, please let's ally to convince other people it is good and we will all profit from it! :-).

+1 <3

In T262668#7323887, @jcrespo wrote:

I think we should crank concurrency up and see how much read throughput we can get. Maintenance/rebalance is ongoing but I'm not expecting it to affect throughput very much, and if it does we should find out IMHO

Sorry, when I said:

once maintenance finishes on codfw and becomes passive again

what I really meant is:

"once maintenance finishes on codfw and eqiad becomes passive again". I am running the backup on eqiad right now (not on codfw). Can I increase eqiad load?

I haven't seen a higher-than-expected increase in latency so yeah IMHO good to bump concurrency a little

Thank you @godog, will do, slowly.

At the extreme, running 4x-8x the current number of threads would move the bottleneck to minio first anyway (writes > reads in load, and we are way less sharded!).

Increased threads to 14 (7 on each worker). We now get a backup speed of over 44 files/s, which would imply a pending Commons backup time of 21 days (although the backup speed will probably not be constant: it will vary depending on the production load, plus I expect to find more duplicates in the deleted and archived files).

I love to see it. Thanks for doing this. I was just dreaming of having such a backup off-cluster... Let me know when there is a half-PB of files that I and others can host. (Maybe get a physical copy to IArchive, they can stand up a second copy on ipfs+torrent, and a mirror network can start mirroring shards that way)

Hopefully the next time around only new + changed files will need to be backed up, in "commons-tarball-incremental-YYYY-NN"?

Ladsgroup awarded a token.Sep 3 2021, 11:23 PM

AntiCompositeNumber subscribed.Sep 3 2021, 11:44 PM

jeremyb-phone subscribed.Sep 4 2021, 1:08 AM

@Sj Sadly, we think we solved (private) backups, but we decided, for the scope of this task, not to solve dumps, because that is a harder issue due to the dynamicity and granularity of MediaWiki permissions (files are deleted, undeleted and renamed all the time).

I discussed and even proposed working together with @ArielGlenn on exports, but because of the specific needs of backups (mostly the ability to quickly recover and delete just a few files), that fell out of scope of this more immediate work. An eventual dumps design, however, could benefit from this work and even be built on top of it.

The average backup speed is now around 50 files/s, with a 3% overhead over normal traffic. We have backed up almost 20 million Commons files, close to 85TB in size, which is around 21% of the total. At this speed the full backup should take around 16 more days.

jcrespo changed the task status from Open to In Progress.Sep 16 2021, 2:40 PM

jcrespo changed the status of subtask T160229: Back up of Commons files from Open to In Progress.

jcrespo mentioned this in T53001: Image tarball dumps on your.org are not being generated.Sep 16 2021, 4:57 PM

At some points (with the DC depooled, during the night) we reached peaks of 150 files/s, but it got as low as 6 files/s for the 20TB of TIFF images from the Library of Congress (lots of files of over 100MB each). Progress is now at over 80% completion. It may finish by Tuesday.
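The swing between 150 and 6 files/s is easier to read once file rates are converted into byte throughput. The sketch below does that conversion under two loud assumptions: the mean Commons file size is taken from the earlier progress update (close to 85TB over almost 20 million files), and the TIFF files are treated as exactly 100MB each, which the comment gives only as a lower bound for many of them. The helper name is illustrative.

```python
def throughput_mb_per_s(files_per_s: float, mean_file_mb: float) -> float:
    """Byte throughput implied by a file rate and a mean file size (decimal MB)."""
    return files_per_s * mean_file_mb


# Assumption: mean size derived from "close to 85TB" over "almost 20 million" files.
mean_commons_mb = 85e12 / 20e6 / 1e6

# Typical phase: ~50 files/s of average-sized files.
typical_mb_s = throughput_mb_per_s(50, mean_commons_mb)

# Slow phase: 6 files/s, assuming 100MB per TIFF (only a lower bound in the comment).
tiff_mb_s = throughput_mb_per_s(6, 100)

print(f"mean Commons file size: {mean_commons_mb:.2f} MB")
print(f"typical phase: {typical_mb_s:.1f} MB/s")
print(f"large-TIFF phase: at least {tiff_mb_s:.0f} MB/s")
```

Under these assumptions the low file rate on the large TIFFs does not mean lower byte throughput; files per second is simply a poor progress metric when file sizes vary by two orders of magnitude.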

First pass of Commons full originals completed after 19 days (eqiad), with 99.94% success.

Most misses are expected, mostly due to files moved after metadata acquisition but before the physical copy.

Stats (eqiad):

+-------------+----------+-----------------+
| wiki_name   | count(*) | sum(size)       |
+-------------+----------+-----------------+
| commonswiki | 88661107 | 363866620200562 |
| enwiki      |  7867722 |   2051528672500 |
| testwiki    |    11613 |     62868866309 |
+-------------+----------+-----------------+
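As a reading aid, the raw byte counts in the table can be turned into human-readable sizes and per-file averages. This is a small sketch using only the numbers in the table and the 19-day duration quoted above; the variable and function names are illustrative.

```python
# (file count, total bytes) per wiki, copied from the eqiad stats table above.
stats = {
    "commonswiki": (88_661_107, 363_866_620_200_562),
    "enwiki": (7_867_722, 2_051_528_672_500),
    "testwiki": (11_613, 62_868_866_309),
}


def to_tb(num_bytes: int) -> float:
    """Decimal terabytes (10**12 bytes)."""
    return num_bytes / 10**12


def to_tib(num_bytes: int) -> float:
    """Binary tebibytes (2**40 bytes)."""
    return num_bytes / 2**40


for wiki, (count, size) in stats.items():
    print(f"{wiki:12} {to_tb(size):9.2f} TB {to_tib(size):9.2f} TiB "
          f"mean file size {size / count / 1e6:6.2f} MB")

# Mean rate for the commonswiki pass, assuming it ran for the full 19 days.
commons_files, commons_bytes = stats["commonswiki"]
mean_rate = commons_files / (19 * 86_400)
print(f"commonswiki mean rate: {mean_rate:.1f} files/s")
```

The mean rate over 19 days comes out at roughly 54 files/s, in line with the "around 50 files/s" reported earlier in the thread.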

Next steps: fix misses, finish all other smaller wikis, and run the same backup on codfw (waiting on Swift maintenance).

Change 728341 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Start backup of dewiki files on eqiad

https://gerrit.wikimedia.org/r/728341

gerritbot added a project: Patch-For-Review.Oct 8 2021, 10:31 AM

Change 728341 merged by Jcrespo:

[operations/puppet@production] mediabackups: Start backup of dewiki files on eqiad

https://gerrit.wikimedia.org/r/728341

Change 730483 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup enwikivoyage media on eqiad

https://gerrit.wikimedia.org/r/730483

Change 730483 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup enwikivoyage media on eqiad

https://gerrit.wikimedia.org/r/730483

Change 730533 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup jvwikisource media on eqiad

https://gerrit.wikimedia.org/r/730533

Change 730533 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup jvwikisource media on eqiad

https://gerrit.wikimedia.org/r/730533

Change 730534 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup mgwiktionary media on eqiad

https://gerrit.wikimedia.org/r/730534

Change 730534 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup mgwiktionary media on eqiad

https://gerrit.wikimedia.org/r/730534

Change 730550 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup shwiki media on eqiad

https://gerrit.wikimedia.org/r/730550

Change 730550 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup shwiki media on eqiad

https://gerrit.wikimedia.org/r/730550

Change 730560 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup srwiki media on eqiad

https://gerrit.wikimedia.org/r/730560

Change 730560 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup srwiki media on eqiad

https://gerrit.wikimedia.org/r/730560

Change 730723 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup frwiki media on eqiad

https://gerrit.wikimedia.org/r/730723

Change 730723 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup frwiki media on eqiad

https://gerrit.wikimedia.org/r/730723

Change 740124 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s2 wikis, starting with bgwiki

https://gerrit.wikimedia.org/r/740124

Change 740124 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s2 wikis, starting with bgwiki

https://gerrit.wikimedia.org/r/740124

Change 740862 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup testcommonswiki

https://gerrit.wikimedia.org/r/740862

Change 740862 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup testcommonswiki

https://gerrit.wikimedia.org/r/740862

Nemo_bis awarded a token.Dec 13 2021, 8:10 PM

Nemo_bis subscribed.

Change 747064 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Make wiki optional, add another optional parameter: dblist

https://gerrit.wikimedia.org/r/747064

Change 747064 merged by Jcrespo:

[operations/puppet@production] mediabackups: Make wiki optional, add another optional parameter: dblist

https://gerrit.wikimedia.org/r/747064

Change 747113 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Add an encryption key to store private file securely

https://gerrit.wikimedia.org/r/747113

Change 747160 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[labs/private@master] mediabackup: Add dummy age private key for mediabackups

https://gerrit.wikimedia.org/r/747160

Change 747160 merged by Jcrespo:

[labs/private@master] mediabackup: Add dummy age private key for mediabackups

https://gerrit.wikimedia.org/r/747160

jcrespo mentioned this in rLPRIc67b997429c6: mediabackup: Add dummy age private key for mediabackups.Dec 14 2021, 4:41 PM

Change 747170 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] puppetmaster: Install 'age' on puppetmaster frontends

https://gerrit.wikimedia.org/r/747170

Proof it is working:

Screenshot_20211214_192706.png (918×2 px, 138 KB)

Screenshot_20211214_192631.png (918×2 px, 65 KB)

This is the list of media backup errors (making it NDA-only, as I haven't yet checked that everything there is non-private):
{P18232}

The worst wikis are easily explainable, or easily solvable, or both:

[mediabackups]> WITH  errors AS ( select count(*) as count, wiki_name FROM files FORCE INDEX(backup_status) JOIN wikis ON files.wiki = wikis.id JOIN backup_status ON backup_status.id = files.backup_status WHERE backup_status in (4) GROUP BY backup_status, wiki), total AS ( select count(*) as count, wiki_name FROM files JOIN wikis ON files.wiki = wikis.id GROUP BY wiki) SELECT wiki_name, errors.count * 100.0 / total.count AS percentage_errors FROM errors JOIN total USING(wiki_name) ORDER by percentage_errors DESC;
+----------------+-------------------+
| wiki_name      | percentage_errors |
+----------------+-------------------+
| gewikimedia    |         100.00000 | <- only has 1 public file + a very small number of deleted ones (<10)
| nlwikivoyage   |          99.36709 | <- some wikivoyage wikis have a large number of missing pre-WMF import, pre-2012 file references, all deleted/unused
| eswikinews     |          20.00000 | <- no public files + a very small number of deleted ones (<10)
| enwikivoyage   |          17.77054 | <- wikivoyage 
| arywiki        |          16.00000 | <- only has 21 public files (all backed up) + a very small number of deleted ones (<10)
| gdwiktionary   |          12.50000 | <- only has 7 public files (all backed up) + a very small number of deleted ones (<10)
| frwikivoyage   |           5.12821 | <- wikivoyage
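To make the logic of the query above easier to follow, here is a self-contained sketch that runs the same two-CTE pattern against a toy SQLite database. The schema is reconstructed from the column names used in the query and is an assumption, as are the wiki names, the row counts, and every status id other than 4 (which the query filters on for errors). `FORCE INDEX` is a MariaDB/MySQL hint and is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema, inferred from the query in the comment.
    CREATE TABLE wikis (id INTEGER PRIMARY KEY, wiki_name TEXT);
    CREATE TABLE backup_status (id INTEGER PRIMARY KEY, backup_status_name TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, wiki INTEGER, backup_status INTEGER);
    INSERT INTO wikis VALUES
        (1, 'wiki_all_errors'), (2, 'wiki_some_errors'), (3, 'wiki_no_errors');
    -- Only id 4 = error is implied by the original query; the rest are assumed.
    INSERT INTO backup_status VALUES
        (1, 'pending'), (2, 'processing'), (3, 'backedup'), (4, 'error'), (5, 'duplicate');
""")

# Toy data: (wiki id, backup_status id) per file.
rows = [(1, 4)]                       # one file, failed
rows += [(2, 3)] * 21 + [(2, 4)] * 4  # 21 backed up, 4 failed
rows += [(3, 3)] * 10                 # all backed up
conn.executemany("INSERT INTO files (wiki, backup_status) VALUES (?, ?)", rows)

query = """
WITH errors AS (
    SELECT count(*) AS n, wiki_name
    FROM files JOIN wikis ON files.wiki = wikis.id
    WHERE files.backup_status = 4
    GROUP BY wiki_name
), total AS (
    SELECT count(*) AS n, wiki_name
    FROM files JOIN wikis ON files.wiki = wikis.id
    GROUP BY wiki_name
)
SELECT wiki_name, errors.n * 100.0 / total.n AS percentage_errors
FROM errors JOIN total USING (wiki_name)
ORDER BY percentage_errors DESC
"""
result = conn.execute(query).fetchall()
for wiki_name, pct in result:
    print(f"{wiki_name:18} {pct:9.5f}")
```

Because `errors` and `total` are combined with an inner join, wikis with zero errors do not appear in the output at all. The toy data also shows why very small wikis dominate the top of the list: a single failed file out of one is a 100% error rate.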

Change 747170 merged by Jcrespo:

[operations/puppet@production] puppetmaster: Install 'age' on puppetmaster frontends

https://gerrit.wikimedia.org/r/747170

Change 747113 merged by Jcrespo:

[operations/puppet@production] mediabackup: Add an encryption key to store private files securely

https://gerrit.wikimedia.org/r/747113

A first pass on eqiad finished successfully: 101,970,844 files backed up successfully, with a total size of 373,335,321,603,376 bytes and an error rate (by size) of 0.035%.

Codfw is ongoing, with 59,510,150 backed up so far, and 245,128,583,098,537 bytes by size.

This is a prototype version of the (trivial/non-massive) recovery script, interactive version:

I was inspired by Bacula's interactive text UI, but I am not good at designing UIs, so I will need feedback because I am not sure it is easy to understand :-/.

Codfw commonswiki backups are at 75% completion (68854627 files/301887395014767 bytes backed up), and will likely finish by next week.

Change 749561 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Add minio port to ipv6 connections

https://gerrit.wikimedia.org/r/749561

jcrespo closed subtask T160229: Back up of Commons files as Resolved.Dec 23 2021, 5:09 PM

The commonswiki codfw backup copy finished, with 91823709 files backed up and a 0.04% error rate.

The number of duplicates and errors stayed constant, which looks good: most likely we are only getting errors from the files that were changed while the snapshot was running.

To finish the codfw snapshot, only around 10 million files are pending from the other wikis; that will be done in early 2022.

Sj updated the task description. (Show Details)Dec 30 2021, 4:29 PM

Sj mentioned this in T298394: Produce regular public dumps of Commons media files.Dec 30 2021, 4:49 PM

Dinoguy1000 mentioned this in T298416: Support for rendering to HTML of pages as stored in Wikipedia dumps.Jan 1 2022, 6:50 AM

Change 749561 abandoned by Jcrespo:

[operations/puppet@production] mediabackups: Add minio port to ipv6 connections

Reason:

Adding the missing/lost ipv6 dns records fixed the issue with current puppet code.

https://gerrit.wikimedia.org/r/749561

Change 752996 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Backup testcommonswiki on codfw

https://gerrit.wikimedia.org/r/752996

Change 752996 merged by Jcrespo:

[operations/puppet@production] mediabackup: Backup testcommonswiki on codfw

https://gerrit.wikimedia.org/r/752996

Change 753095 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Backup s1 (enwiki) media files on codfw

https://gerrit.wikimedia.org/r/753095

Change 753095 merged by Jcrespo:

[operations/puppet@production] mediabackup: Backup s1 (enwiki) media files on codfw

https://gerrit.wikimedia.org/r/753095

Change 753099 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Update mediawiki replica for s1 backup on codfw

https://gerrit.wikimedia.org/r/753099

Change 753099 merged by Jcrespo:

[operations/puppet@production] mediabackup: Update mediawiki replica for s1 backup on codfw

https://gerrit.wikimedia.org/r/753099

Change 754013 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s2 media files at codfw

https://gerrit.wikimedia.org/r/754013

Change 754013 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s2 media files at codfw

https://gerrit.wikimedia.org/r/754013

Change 754022 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s3 media files at codfw

https://gerrit.wikimedia.org/r/754022

Change 754023 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s5 media files at codfw

https://gerrit.wikimedia.org/r/754023

Change 754024 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s6 media files at codfw

https://gerrit.wikimedia.org/r/754024

Change 754025 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s7 media files at codfw

https://gerrit.wikimedia.org/r/754025

Change 754026 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Backup s8 media files at codfw

https://gerrit.wikimedia.org/r/754026

Change 754022 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s3 media files at codfw

https://gerrit.wikimedia.org/r/754022

Change 754023 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s5 media files at codfw

https://gerrit.wikimedia.org/r/754023

Change 754024 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s6 media files at codfw

https://gerrit.wikimedia.org/r/754024

Change 754025 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s7 media files at codfw

https://gerrit.wikimedia.org/r/754025

Change 754026 merged by Jcrespo:

[operations/puppet@production] mediabackups: Backup s8 media files at codfw

https://gerrit.wikimedia.org/r/754026

Codfw first pass finished for all wikis, this is the percentage of errors:

{P18787}

The ones with high number of errors are known issues:

[mediabackups]> WITH  errors AS ( select count(*) as count, wiki_name FROM files FORCE INDEX(backup_status) JOIN wikis ON files.wiki = wikis.id JOIN backup_status ON backup_status.id = files.backup_status WHERE backup_status in (4) GROUP BY backup_status, wiki), total AS ( select count(*) as count, wiki_name FROM files JOIN wikis ON files.wiki = wikis.id GROUP BY wiki) SELECT wiki_name, errors.count * 100.0 / total.count AS percentage_errors FROM errors JOIN total USING(wiki_name) ORDER by percentage_errors DESC;
+----------------+-------------------+
| wiki_name      | percentage_errors |
+----------------+-------------------+
| jvwikisource   |         100.00000 | <--- "new wiki", not yet properly configured to be backed up
| gewikimedia    |         100.00000 | <--- "new wiki", not yet properly configured to be backed up
| nlwikivoyage   |          99.36709 | <--- some wikivoyage wikis have a large number of missing pre-WMF import, pre-2012 file references, all deleted/unused
| eswikinews     |          20.00000 | <--- no public files + a very small number of deleted ones (<10)
| arywiki        |          20.00000 | <--- only has 23 public files (all backed up) + a very small number of deleted ones (<10)
| enwikivoyage   |          17.57966 | <--- wikivoyage
| gdwiktionary   |          12.50000 | <--- no public files
| frwikivoyage   |           5.09554 | <--- wikivoyage

jcrespo changed the task status from In Progress to Open.Jan 25 2022, 11:35 AM

jcrespo changed the status of subtask T300020: Develop, package, deploy and document a single file recovery utility from Open to In Progress.

jcrespo closed subtask T276445: Create a first release of the media backups automation tools as Resolved.Mar 22 2022, 6:27 PM

jcrespo closed subtask T300020: Develop, package, deploy and document a single file recovery utility as Resolved.Mar 31 2022, 8:25 AM

Change 802501 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/mediabackups@master] mediabackups: Add test units for the Util helper unit

https://gerrit.wikimedia.org/r/802501

Change 802501 merged by Jcrespo:

[operations/software/mediabackups@master] mediabackups: Add test units for the Util helper unit

https://gerrit.wikimedia.org/r/802501

jcrespo mentioned this in rOSMBb19df1e5adf1: mediabackups: Add test units for the Util helper unit.Jun 2 2022, 2:25 PM

jcrespo closed subtask T311215: Create a script to easily query and remove backups from the media storage backups (primarily to attend T&S deletion requests) as Resolved.Jun 30 2022, 5:05 PM

jcrespo closed subtask T299764: Document media recovery use case proposals and decide their priority as Resolved.Jun 30 2022, 5:08 PM

Pppery removed a project: Patch-For-Review.Nov 25 2023, 10:23 PM

jcrespo closed subtask T327157: Create and deploy the logic to generate incremental backups of MediaWiki media files, to keep its file storage backup up to date, automatically as Resolved.Jan 11 2024, 11:36 AM

Cloning speed for 133 GiB / 28K objects:

# rclone copy -P backup2007:mediabackups/commonswiki/fff backup2011:mediabackups/commonswiki/
Transferred:      133.243 GiB / 133.243 GiB, 100%, 125.044 MiB/s, ETA 0s
Transferred:        28850 / 28850, 100%
Elapsed time:     14m24.4s
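A quick cross-check of the figures in that output, as a sketch: dividing the totals by the elapsed time gives the overall mean rate for the copy. Only the three numbers printed by rclone are used; the variable names are illustrative.

```python
# Values copied from the rclone output above.
transferred_gib = 133.243
objects = 28_850
elapsed_s = 14 * 60 + 24.4  # "14m24.4s"

mean_mib_per_s = transferred_gib * 1024 / elapsed_s
mean_objects_per_s = objects / elapsed_s
mean_object_mib = transferred_gib * 1024 / objects

print(f"mean throughput:  {mean_mib_per_s:.1f} MiB/s")
print(f"mean object rate: {mean_objects_per_s:.1f} objects/s")
print(f"mean object size: {mean_object_mib:.2f} MiB")
```

The mean computed this way (roughly 158 MiB/s) is higher than the 125.044 MiB/s on the rclone progress line, so the printed figure is presumably not the plain total-over-elapsed quotient. That reading is an assumption and is worth checking against the rclone documentation before using either number for capacity planning.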

	F34883925: Screenshot_20211214_192631.png
	Dec 14 2021, 6:40 PM

	F34883926: Screenshot_20211214_192706.png
	Dec 14 2021, 6:40 PM

	F34598420: Screenshot from 2021-08-17 18-11-11.png
	Aug 17 2021, 4:12 PM

	F34598421: Screenshot from 2021-08-17 18-10-06.png
	Aug 17 2021, 4:12 PM

	F34598387: Screenshot from 2021-08-17 17-32-21.png
	Aug 17 2021, 3:34 PM

	F34598388: Screenshot from 2021-08-17 17-32-06.png
	Aug 17 2021, 3:34 PM

	F34886587: recovery.png
	Dec 16 2021, 3:40 PM
