
Address "image" table capacity problems by storing pdf/djvu text outside file metadata
Closed, Resolved (Public)

Description

The image table is getting really big (T222224#6738823), and it turns out that ~90% of the space it takes is the text tag in the metadata of pdf/djvu files (T28741#6750249). This needs addressing as it's causing infrastructure issues.

One way to address it is to compress the text; another is to move it to other storage, such as another table; a third is to delete it and read it directly from the file when needed. See T28741#6750249 onwards for more discussion.


Event Timeline


Change 699907 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] filerepo: Fix parsing metadata in ForeignAPIFile

https://gerrit.wikimedia.org/r/699907

Change 699907 merged by jenkins-bot:

[mediawiki/core@master] filerepo: Fix parsing metadata in ForeignAPIFile

https://gerrit.wikimedia.org/r/699907

Change 697935 merged by jenkins-bot:

[mediawiki/core@master] Optionally split out parts of file metadata to BlobStore

https://gerrit.wikimedia.org/r/697935

Change 701664 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/mediawiki-config@master] labs: Set json for metadata array and split metadata to ES when needed

https://gerrit.wikimedia.org/r/701664

Change 701664 merged by jenkins-bot:

[operations/mediawiki-config@master] labs: Set json for metadata array and split metadata to ES when needed

https://gerrit.wikimedia.org/r/701664

Enabled it in beta cluster.

Example of before metadata (a jpg file):

a:44:{s:5:"Model";s:11:"CORPORATION";s:11:"Orientation";i:1;s:11:"XResolution";s:20:"1330334030/893657166";s:11:"YResolution";s:11:"3158065/240";s:14:"ResolutionUnit";i:2;s:8:"DateTime";s:11:"(Macintosh)";s:12:"ExposureTime";s:5:"1/250";s:7:"FNumber";s:3:"8/1";s:15:"ExposureProgram";i:0;s:15:"ISOSpeedRatings";i:200;s:11:"ExifVersion";s:4:"0230";s:16:"DateTimeOriginal";s:19:"2016:04:03 20:03:11";s:17:"DateTimeDigitized";s:19:"2016:04:03 20:03:11";s:17:"ShutterSpeedValue";s:15:"7965784/1000000";s:13:"ApertureValue";s:3:"6/1";s:17:"ExposureBiasValue";s:3:"0/6";s:16:"MaxApertureValue";s:5:"36/10";s:12:"MeteringMode";i:5;s:11:"LightSource";i:0;s:5:"Flash";i:16;s:11:"FocalLength";s:6:"180/10";s:18:"SubSecTimeOriginal";s:1:"2";s:19:"SubSecTimeDigitized";s:1:"2";s:13:"SensingMethod";i:2;s:10:"FileSource";i:3;s:9:"SceneType";i:1;s:14:"CustomRendered";i:0;s:12:"ExposureMode";i:0;s:12:"WhiteBalance";i:0;s:16:"DigitalZoomRatio";s:3:"1/1";s:21:"FocalLengthIn35mmFilm";i:27;s:16:"SceneCaptureType";i:0;s:11:"GainControl";i:0;s:8:"Contrast";i:0;s:10:"Saturation";i:0;s:9:"Sharpness";i:0;s:20:"SubjectDistanceRange";i:0;s:12:"SerialNumber";s:7:"6226628";s:4:"Lens";s:22:"18.0-55.0 mm f/3.5-5.6";s:8:"Software";s:41:"Adobe Photoshop Lightroom 6.0 (Macintosh)";s:16:"DateTimeMetadata";s:19:"2016:04:04 13:17:35";s:18:"OriginalDocumentID";s:32:"F3132CC1FD62A3A3B145BBA083017426";s:10:"iimVersion";i:4;s:22:"MEDIAWIKI_EXIF_VERSION";i:2;}

Example of after (the same file):

{"data":{"Make":"NIKON CORPORATION","Model":"NIKON D5100","Orientation":1,"XResolution":"240/1","YResolution":"240/1","ResolutionUnit":2,"Software":"Adobe Photoshop Lightroom 6.0 (Macintosh)","DateTime":"2016:04:04 20:17:35","ExposureTime":"1/250","FNumber":"8/1","ExposureProgram":0,"ISOSpeedRatings":200,"ExifVersion":"0230","DateTimeOriginal":"2016:04:03 20:03:11","DateTimeDigitized":"2016:04:03 20:03:11","ShutterSpeedValue":"7965784/1000000","ApertureValue":"6/1","ExposureBiasValue":"0/6","MaxApertureValue":"36/10","MeteringMode":5,"LightSource":0,"Flash":16,"FocalLength":"180/10","SubSecTimeOriginal":"2","SubSecTimeDigitized":"2","SensingMethod":2,"FileSource":3,"SceneType":1,"CustomRendered":0,"ExposureMode":0,"WhiteBalance":0,"DigitalZoomRatio":"1/1","FocalLengthIn35mmFilm":27,"SceneCaptureType":0,"GainControl":0,"Contrast":0,"Saturation":0,"Sharpness":0,"SubjectDistanceRange":0,"SerialNumber":"6226628","Lens":"18.0-55.0 mm f/3.5-5.6","DateTimeMetadata":"2016:04:04 13:17:35","OriginalDocumentID":"F3132CC1FD62A3A3B145BBA083017426","iimVersion":4,"MEDIAWIKI_EXIF_VERSION":2}}

The storage in blobs works as well (this pdf file):

{"data":{"Title":"!!Abajo los solteros!! : fantasía cómico-lírica gubernamental en siete cuadros y un real decreto, en prosa","Keywords":"http://archive.org/details/abajolossolteros476riba","Author":"Ribas, Manuel","Creator":"Digitized by the Internet Archive","Producer":"Recoded by LuraDocument PDF v2.65","CreationDate":"Mon Oct 27 17:43:08 2014 UTC","ModDate":"Mon Oct 27 17:48:20 2014 UTC","Tagged":"no","UserProperties":"no","Suspects":"no","Form":"none","JavaScript":"no","Pages":"48","Encrypted":"no","File size":"2495026 bytes","Optimized":"yes","PDF version":"1.5","mergedMetadata":{"DateTimeMetadata":"2014:10:27 17:48:20","DateTimeDigitized":"2014:10:27 17:43:08","DateTime":"2014:10:27 17:48:20","Software":"Digitized by the Internet Archive","ObjectName":{"x-default":"!!Abajo los solteros!! : fantasía cómico-lírica gubernamental en siete cuadros y un real decreto, en prosa","_type":"lang"},"Artist":{"0":"Ribas, Manuel","_type":"ol"},"Keywords":["http://archive.org/details/abajolossolteros476riba"],"pdf-Producer":"Recoded by LuraDocument PDF v2.65","pdf-Encrypted":"no","pdf-PageSize":["339 x 537 pts","326 x 522 pts","294 x 511 pts","328 x 511 pts","317 x 538 pts"],"pdf-Version":"1.5"}},"blobs":{"pages":"tt:698633","text":"tt:698634"}}
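The two JSON examples above suggest an envelope with a "data" object for small items and a "blobs" object that maps large items (here "pages" and "text") to addresses in external storage. A minimal Python sketch of that idea follows. It is not the MediaWiki implementation: the size threshold, the `InMemoryBlobStore` class and the function names are all hypothetical, and the `tt:` prefix is only copied from the example for illustration.

```python
import json

# Hypothetical threshold: items whose JSON encoding is larger than this
# many bytes are moved out of the main envelope.
SPLIT_THRESHOLD = 1000


class InMemoryBlobStore:
    """Stand-in for an external blob store; hands out opaque addresses."""

    def __init__(self):
        self._blobs = {}
        self._next_id = 1

    def store(self, payload: str) -> str:
        address = f"tt:{self._next_id}"  # address format is illustrative only
        self._blobs[address] = payload
        self._next_id += 1
        return address

    def fetch(self, address: str) -> str:
        return self._blobs[address]


def serialize_metadata(metadata: dict, store, threshold: int = SPLIT_THRESHOLD) -> str:
    """Keep small items inline under "data", move large ones to the store."""
    envelope = {"data": {}, "blobs": {}}
    for name, value in metadata.items():
        encoded = json.dumps(value, ensure_ascii=False)
        if len(encoded.encode("utf-8")) > threshold:
            envelope["blobs"][name] = store.store(encoded)
        else:
            envelope["data"][name] = value
    if not envelope["blobs"]:
        del envelope["blobs"]  # small files carry only "data"
    return json.dumps(envelope, ensure_ascii=False)


def load_metadata(serialized: str, store) -> dict:
    """Rebuild the full metadata array, fetching split-out items on demand."""
    envelope = json.loads(serialized)
    metadata = dict(envelope.get("data", {}))
    for name, address in envelope.get("blobs", {}).items():
        metadata[name] = json.loads(store.fetch(address))
    return metadata


store = InMemoryBlobStore()
row_value = serialize_metadata({"Pages": "48", "text": ["x" * 5000]}, store)
print(row_value)
```

With this shape, the value kept in the table row stays small no matter how much text the file contains, and readers that only need the small items never touch the external store.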

The API request to retrieve the metadata returns the same result on production and the beta cluster, so retrieval works fine as well:
https://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&titles=File:!!Abajo_los_solteros!!_-_fantas%C3%ADa_c%C3%B3mico-l%C3%ADrica_gubernamental_en_siete_cuadros_y_un_real_decreto,_en_prosa_(IA_abajolossolteros476riba).pdf&prop=imageinfo&iiprop=metadata
https://en.wikipedia.org/w/api.php?action=query&titles=File:!!Abajo_los_solteros!!_-_fantas%C3%ADa_c%C3%B3mico-l%C3%ADrica_gubernamental_en_siete_cuadros_y_un_real_decreto,_en_prosa_(IA_abajolossolteros476riba).pdf&prop=imageinfo&iiprop=metadata
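The comparison between the two URLs above could be scripted roughly as follows. This is only a sketch: it assumes the JSON response shape of `format=json` (a `query.pages` map keyed by page id), and the helper names and the `User-Agent` string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical client identifier; replace with something that identifies you.
USER_AGENT = "metadata-compare-sketch/0.1"


def metadata_url(api_base: str, title: str) -> str:
    """Build the same imageinfo/metadata query as the URLs above, as JSON."""
    query = urllib.parse.urlencode({
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "metadata",
        "format": "json",
    })
    return f"{api_base}?{query}"


def extract_metadata(response: dict) -> list:
    """Pull the metadata list of the first (only) page out of a response."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))
    return page["imageinfo"][0]["metadata"]


def fetch_metadata(api_base: str, title: str) -> list:
    """Fetch and decode the metadata for one file title (needs network)."""
    request = urllib.request.Request(
        metadata_url(api_base, title),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_metadata(json.load(response))
```

Calling `fetch_metadata` once with the beta cluster `api.php` base and once with the production one, for the same file title, and comparing the two lists with `==` would reproduce the manual check.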

Change 698372 had a related patch set uploaded (by Krinkle; author: Tim Starling):

[mediawiki/core@master] Manual and automatic image metadata reserialization

https://gerrit.wikimedia.org/r/698372

Krinkle renamed this task from Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata to Address "image" table capacity problems by storing pdf/djvu text outside file metadata. Jun 26 2021, 11:05 PM
Krinkle assigned this task to Ladsgroup.
Krinkle triaged this task as High priority.

@Xover it seems likely that the column overflow issue will correct itself as part of this. After the new storage capability is rolled out, we'll refresh metadata for PDF and DjVu files, in which case there'll be a new chance for some of the files reported in T192866 to get loaded and store their data in the external store. For now I'll keep that open on the off-chance that there might be other issues with those files.

Change 698372 merged by jenkins-bot:

[mediawiki/core@master] Manual and automatic image metadata reserialization

https://gerrit.wikimedia.org/r/698372

Change 703950 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/mediawiki-config@master] Set testcommonswiki to use json image metadata

https://gerrit.wikimedia.org/r/703950

Change 703950 merged by jenkins-bot:

[operations/mediawiki-config@master] Set testcommonswiki to use json image metadata

https://gerrit.wikimedia.org/r/703950

Mentioned in SAL (#wikimedia-operations) [2021-07-12T04:08:38Z] <ladsgroup@deploy1002> Synchronized wmf-config/filebackend.php: Config: [[gerrit:703950|Set testcommonswiki to use json image metadata (T275268)]] (duration: 01m 10s)

Mentioned in SAL (#wikimedia-operations) [2021-07-12T04:10:38Z] <Amir1> mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force (T275268)

Change 703951 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/mediawiki-config@master] Enable json image metadata everywhere

https://gerrit.wikimedia.org/r/703951

Change 703951 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable json image metadata everywhere

https://gerrit.wikimedia.org/r/703951

Mentioned in SAL (#wikimedia-operations) [2021-07-12T05:06:11Z] <ladsgroup@deploy1002> Synchronized wmf-config/filebackend.php: Config: [[gerrit:703951|Enable json image metadata everywhere (T275268)]] (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-07-12T05:14:39Z] <Amir1> start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish (T275268)

Update: This will take ~fifty days to finish ^
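For a rough feel of how such an estimate depends on the script parameters, here is a back-of-the-envelope sketch. It assumes that `--sleep` is applied once per batch. The file count and per-batch processing time below are hypothetical placeholders, not figures from this task, so the result is not expected to match the ~fifty days quoted above.

```python
import math


def estimated_days(total_files: int, batch_size: int,
                   sleep_seconds: float, seconds_per_batch: float) -> float:
    """Wall-clock days for a batched run that sleeps after every batch."""
    batches = math.ceil(total_files / batch_size)
    return batches * (sleep_seconds + seconds_per_batch) / 86400


# Hypothetical inputs: 1,000,000 files and zero processing time per batch.
# Only batch_size=10 and sleep_seconds=5 come from the command above.
print(round(estimated_days(1_000_000, 10, 5, 0), 2))
```

Plugging in the real number of PDF files and a measured per-batch time would give the actual estimate; the sleep alone sets a lower bound on the run time.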

I offered on IRC to paste the "backup time" of s4 and the size of the image table. These are the latest executions (compare with the more typical 2-4 hours for other sections); the long times are due to the image table being large AND mydumper not being able to parallelize a non-int PK.

This is the info from the dumps:

root@db1159.eqiad.wmnet[dbbackups]> nopager; select start_date as time, TIMESTAMPDIFF(HOUR, start_date, end_date) as hours_backup, sum(size) as image_size FROM backups JOIN backup_files ON backup_id = id where file_path='' and file_name like 'commonswiki.image.%' and section='s4' and type='dump' GROUP BY id order by id desc;
PAGER set to stdout
+---------------------+--------------+--------------+
| time                | hours_backup | image_size   |
+---------------------+--------------+--------------+
| 2021-07-13 00:00:02 |           15 | 268495128477 |
| 2021-07-13 00:00:02 |           15 | 268495133733 |
| 2021-07-06 00:00:02 |           15 | 268851672569 |
| 2021-07-06 00:00:01 |           15 | 268879587237 |
| 2021-06-28 12:00:02 |           15 | 267575857832 |
| 2021-06-28 12:00:02 |           15 | 267575857832 |
| 2021-06-22 00:00:02 |           15 | 267281738653 |
| 2021-06-22 00:00:02 |           15 | 267281738653 |
| 2021-06-15 00:00:02 |           15 | 266825899971 |
| 2021-06-15 00:00:02 |           15 | 266825899971 |
| 2021-06-08 01:51:24 |           15 | 264967471625 |
| 2021-06-08 00:00:02 |           15 | 264945077858 |
| 2021-06-01 00:00:02 |           16 | 263482186455 |
| 2021-06-01 00:00:02 |           16 | 263482186455 |
| 2021-05-25 01:51:00 |           14 | 263263226547 |
| 2021-05-25 00:00:02 |           15 | 263259316570 |
| 2021-05-18 01:56:11 |           14 | 262979744655 |
| 2021-05-18 00:00:02 |           16 | 262976819333 |
| 2021-05-11 00:00:02 |           14 | 262878511343 |
| 2021-05-11 00:00:02 |           15 | 262878511343 |
| 2021-05-04 02:18:34 |           15 | 262157119436 |
| 2021-05-04 01:47:34 |           14 | 262156978365 |
| 2021-04-27 00:00:02 |           16 | 260466316915 |
| 2021-04-27 00:00:02 |           14 | 260466316915 |
| 2021-04-20 00:00:02 |           15 | 258545284318 |
| 2021-04-20 00:00:02 |           14 | 258545284318 |
| 2021-04-13 02:25:39 |           14 | 256637685035 |
| 2021-04-13 00:00:02 |           15 | 256608826778 |
| 2021-04-06 01:44:21 |           14 | 254821475874 |
| 2021-04-06 00:00:02 |           15 | 254819309244 |
| 2021-03-30 00:00:02 |           14 | 254459949605 |
| 2021-03-30 00:00:02 |           14 | 254459973383 |
| 2021-03-23 02:28:45 |           14 | 254189550463 |
| 2021-03-23 01:01:06 |           14 | 254187667777 |
| 2021-03-16 02:34:38 |           15 | 253694685532 |
| 2021-03-16 00:00:02 |           14 | 253691952312 |
| 2021-03-09 02:59:36 |           14 | 253521305780 |
| 2021-03-09 02:11:04 |           14 | 253520779050 |
| 2021-03-02 01:48:50 |           14 | 253278589546 |
| 2021-03-02 00:00:02 |           14 | 253276580216 |
| 2021-02-23 02:14:47 |           14 | 252899325381 |
| 2021-02-23 00:00:02 |           14 | 252893374032 |
| 2021-02-16 00:00:02 |           14 | 252128972827 |
| 2021-02-16 00:00:01 |           15 | 252128972827 |
| 2021-02-09 01:59:38 |           14 | 250291649634 |
| 2021-02-09 01:50:12 |           14 | 250291547713 |
| 2021-02-02 02:46:51 |           14 | 249100901680 |
| 2021-02-02 00:00:02 |           15 | 249060468232 |
| 2021-01-26 02:25:36 |           15 | 245948385158 |
| 2021-01-26 00:00:02 |           15 | 245922975141 |
| 2021-01-19 02:12:40 |           14 | 244063988986 |
| 2021-01-19 01:49:13 |           14 | 244061150960 |
| 2021-01-12 01:56:33 |           13 | 237667850949 |
| 2021-01-12 01:47:10 |           13 | 237657806475 |
| 2021-01-05 01:45:19 |           13 | 230475789796 |
| 2021-01-05 00:00:02 |           13 | 230428196333 |
| 2020-12-29 02:00:00 |           12 | 216118625426 |
| 2020-12-29 00:00:02 |           12 | 215977141415 |
| 2020-12-22 01:53:29 |           12 | 206109097054 |
| 2020-12-22 00:00:02 |           12 | 206017506857 |
| 2020-12-15 00:00:02 |           12 | 195875966056 |
| 2020-12-15 00:00:01 |           11 | 195875966056 |
| 2020-12-08 00:00:02 |           11 | 192753160825 |
| 2020-12-08 00:00:01 |           11 | 192753148236 |
| 2020-12-01 00:00:02 |           11 | 192226559487 |
| 2020-12-01 00:00:01 |           12 | 192226559487 |
| 2020-11-24 01:46:38 |           11 | 187274544169 |
| 2020-11-24 00:00:02 |           12 | 187228079457 |
| 2020-11-17 02:22:11 |           10 | 182575790456 |
| 2020-11-17 00:00:02 |           11 | 182498181628 |
| 2020-11-10 02:24:10 |           11 | 177313567045 |
| 2020-11-10 00:00:02 |           10 | 177313367533 |
| 2020-11-03 00:57:45 |           10 | 170880585402 |
| 2020-11-03 00:00:02 |           10 | 170855259860 |
| 2020-10-27 01:53:57 |           10 | 161754871778 |
| 2020-10-27 01:50:02 |           10 | 161746186572 |
| 2020-10-20 01:45:35 |            9 | 152171503782 |
| 2020-10-20 00:57:42 |            9 | 152128905069 |
| 2020-10-13 01:45:02 |            8 | 143122246730 |
| 2020-10-13 00:00:02 |            8 | 143033156101 |
| 2020-10-06 01:46:45 |            8 | 134454518104 |
| 2020-10-06 00:00:02 |            8 | 134349828550 |
| 2020-09-29 00:00:02 |            9 | 131897336315 |
| 2020-09-29 00:00:02 |            8 | 131897336315 |
| 2020-09-22 00:55:40 |            8 | 129380150930 |
| 2020-09-22 00:00:02 |            8 | 129367125149 |
| 2020-09-15 00:56:07 |            8 | 128743337329 |
| 2020-09-15 00:00:01 |            8 | 128743383383 |
| 2020-09-08 00:00:01 |            8 | 127308494550 |
| 2020-09-08 00:00:01 |            7 | 127308494550 |
| 2020-09-01 01:55:59 |            7 | 125749536660 |
| 2020-09-01 01:53:13 |            7 | 125749322728 |
| 2020-08-25 00:00:02 |            7 | 120218697028 |
| 2020-08-25 00:00:02 |            7 | 120218697028 |
| 2020-08-18 01:46:41 |            6 | 113499977883 |
| 2020-08-18 00:00:02 |            7 | 113431585541 |
| 2020-08-11 01:44:02 |            6 | 105546207761 |
| 2020-08-11 01:12:47 |            6 | 105521417627 |
| 2020-08-04 05:06:36 |            6 | 100996282624 |
| 2020-08-04 00:00:02 |            7 | 100886727495 |
| 2020-07-28 04:05:19 |            6 |  96185727893 |
| 2020-07-28 00:00:01 |            7 |  96156047628 |
| 2020-07-21 01:49:44 |            5 |  88480710873 |
| 2020-07-21 00:00:01 |            5 |  88367183001 |
| 2020-07-14 00:20:40 |            6 |  80106490695 |
| 2020-07-14 00:00:01 |            6 |  80084405542 |
| 2020-07-07 03:27:33 |            4 |  67212137097 |
| 2020-07-07 02:00:08 |            4 |  67152687054 |
| 2020-06-30 05:02:34 |            3 |  55988135475 |
| 2020-06-30 00:00:01 |            4 |  55419298191 |
| 2020-06-23 00:53:50 |            3 |  40675484578 |
| 2020-06-23 00:00:01 |            3 |  40615163282 |
| 2020-06-16 00:00:02 |            3 |  35306524026 |
| 2020-06-16 00:00:01 |            4 |  35306524026 |
| 2020-06-09 05:05:05 |            2 |  34368136441 |
| 2020-06-09 00:00:02 |            2 |  34365645094 |
| 2020-06-02 00:56:36 |            2 |  33949876740 |
| 2020-06-02 00:00:02 |            3 |  33949811408 |
| 2020-05-26 02:00:23 |            2 |  33804124784 |
| 2020-05-26 00:16:24 |            3 |  33803830623 |
| 2020-05-19 01:16:16 |            2 |  33675610116 |
| 2020-05-19 00:00:01 |            3 |  33675321002 |
| 2020-05-12 02:03:38 |            3 |  33609446229 |
| 2020-05-12 00:00:01 |            2 |  33609269609 |
| 2020-05-05 00:55:16 |            3 |  33560557386 |
| 2020-05-05 00:00:02 |            2 |  33560505783 |
| 2020-04-28 02:18:23 |            3 |  33435191078 |
| 2020-04-28 00:00:01 |            2 |  33435069665 |
| 2020-04-21 05:30:01 |            2 |  33308027872 |
| 2020-04-21 00:00:01 |            4 |  33307004263 |
| 2020-04-14 00:00:01 |            3 |  33036870319 |
| 2020-04-14 00:00:01 |            4 |  33036870319 |
| 2020-04-07 03:47:34 |            3 |  32972786680 |
| 2020-04-07 03:33:05 |            2 |  32972680768 |
| 2020-03-31 03:53:14 |            2 |  32934553630 |
| 2020-03-31 00:00:01 |            2 |  32933118956 |
| 2020-03-24 01:57:05 |            2 |  32848310171 |
| 2020-03-24 00:00:02 |            2 |  32848142049 |
| 2020-03-17 00:00:02 |            3 |  32762971506 |
| 2020-03-17 00:00:01 |            2 |  32762971506 |
| 2020-03-10 06:31:51 |            2 |  32713634097 |
| 2020-03-10 00:00:01 |            3 |  32712255520 |
| 2020-03-03 01:49:26 |            2 |  32614936174 |
| 2020-03-03 01:34:16 |            2 |  32615088510 |
| 2020-02-25 03:50:09 |            2 |  32549206640 |
| 2020-02-25 01:45:20 |            2 |  32548907323 |
| 2020-02-18 06:47:32 |            2 |  32487639675 |
| 2020-02-18 02:34:07 |            2 |  32487375100 |
| 2020-02-11 05:37:28 |            3 |  32425816667 |
| 2020-02-11 03:58:17 |            2 |  32425603187 |
| 2020-02-04 02:39:49 |            2 |  32340794653 |
| 2020-02-04 00:39:46 |            2 |  32340557237 |
| 2020-01-28 01:43:58 |            2 |  32243459339 |
| 2020-01-28 00:00:01 |            2 |  32242675439 |
| 2020-01-21 05:16:39 |            2 |  32160249147 |
| 2020-01-21 00:00:01 |            3 |  32159460412 |
| 2020-01-14 02:00:33 |            2 |  32106453877 |
| 2020-01-14 01:44:25 |            2 |  32106530644 |
| 2020-01-07 02:02:07 |            2 |  32050966633 |
| 2020-01-07 01:08:20 |            2 |  32050945221 |
| 2019-12-31 02:59:18 |            2 |  31576227731 |
| 2019-12-31 00:00:01 |            3 |  31575378901 |
| 2019-12-24 04:44:04 |            2 |  31435707485 |
| 2019-12-24 00:18:58 |            2 |  31435180750 |
| 2019-12-17 04:48:43 |            2 |  31388877854 |
| 2019-12-17 01:06:26 |            2 |  31386903691 |
| 2019-12-10 00:00:01 |            2 |  31653831036 |
| 2019-12-10 00:00:01 |            2 |  31653831036 |
| 2019-12-03 04:15:58 |            2 |  31608394029 |
| 2019-12-03 01:04:30 |            2 |  31607898503 |
| 2019-11-26 05:39:18 |            2 |  31380393305 |
| 2019-11-26 04:15:59 |            2 |  31386032164 |
| 2019-11-19 03:07:56 |            3 |  31324959358 |
| 2019-11-19 00:00:02 |            3 |  31324742360 |
| 2019-11-12 00:00:01 |            2 |  31196107728 |
| 2019-11-12 00:00:01 |            2 |  31196107728 |
| 2019-11-05 03:30:43 |            2 |  30968283404 |
| 2019-11-05 00:44:02 |            3 |  30968014415 |
| 2019-10-29 00:00:02 |            2 |  30919217829 |
| 2019-10-29 00:00:01 |            3 |  30919217829 |
| 2019-10-22 07:29:46 |            2 |  30858343572 |
| 2019-10-22 03:12:46 |            2 |  30857701475 |
| 2019-10-15 04:44:28 |            2 |  30823843688 |
| 2019-10-15 00:00:01 |            2 |  30822729572 |
| 2019-10-08 03:39:41 |            2 |  30787439316 |
| 2019-10-08 00:17:34 |            2 |  30787374558 |
| 2019-10-01 03:28:56 |            2 |  30729235910 |
| 2019-10-01 02:53:26 |            2 |  30729308661 |
| 2019-09-24 04:22:56 |            2 |  30651015020 |
| 2019-09-24 04:20:47 |            3 |  30650983062 |
| 2019-09-17 01:36:38 |            2 |  30597668547 |
| 2019-09-17 00:00:01 |            2 |  30597463238 |
| 2019-09-10 05:44:22 |            3 |  30554097825 |
| 2019-09-10 00:00:01 |            3 |  30552816386 |
| 2019-09-03 03:45:37 |            2 |  30510594019 |
| 2019-09-03 00:16:08 |            2 |  30510154725 |
| 2019-08-27 01:54:23 |            2 |  30464351629 |
| 2019-08-27 00:00:01 |            2 |  30464318829 |
| 2019-08-20 00:00:02 |            2 |  30420034432 |
| 2019-08-20 00:00:01 |            2 |  30420040266 |
| 2019-08-13 04:02:08 |            2 |  30386454049 |
| 2019-08-13 00:00:01 |            2 |  30386105374 |
| 2019-08-06 04:59:55 |            2 |  30328662677 |
| 2019-08-06 00:29:38 |            3 |  30328024413 |
| 2019-07-30 00:44:23 |            2 |  30256697879 |
| 2019-07-30 00:00:01 |            2 |  30256570609 |
| 2019-07-23 00:29:39 |            2 |  30084693222 |
| 2019-07-23 00:00:01 |            2 |  30084802049 |
| 2019-07-16 00:15:43 |            2 |  30036427856 |
| 2019-07-16 00:00:02 |            2 |  30036511351 |
| 2019-07-09 02:52:51 |            2 |  30009364901 |
| 2019-07-09 01:50:30 |            2 |  30009323808 |
| 2019-07-02 04:39:11 |            2 |  29957865796 |
| 2019-07-02 00:00:02 |            2 |  29956723290 |
| 2019-06-25 02:28:54 |            2 |  29900388301 |
| 2019-06-25 00:00:01 |            2 |  29900040908 |
| 2019-06-18 03:55:02 |            3 |  29868030581 |
| 2019-06-18 01:50:22 |            2 |  29867991570 |
| 2019-06-11 04:25:49 |            2 |  29818987896 |
| 2019-06-11 00:00:01 |            2 |  29818757958 |
| 2019-06-04 03:22:01 |            2 |  29777382633 |
| 2019-06-04 00:00:01 |            2 |  29776912745 |
| 2019-05-28 04:44:14 |            2 |  29736593767 |
| 2019-05-28 04:14:00 |            2 |  29736544952 |
| 2019-05-21 00:35:15 |            2 |  29686559045 |
| 2019-05-21 00:00:01 |            3 |  29686559280 |
| 2019-05-14 19:01:45 |            2 |  29642534203 |
| 2019-05-14 18:31:06 |            2 |  29642407875 |
| 2019-05-07 21:36:33 |            2 |  29607993579 |
| 2019-05-07 17:00:02 |            2 |  29606610097 |
| 2019-04-30 21:54:20 |            2 |  29556721347 |
| 2019-04-30 17:20:48 |            2 |  29554962194 |
| 2019-04-23 21:25:55 |            8 |  29518272985 |
| 2019-04-23 20:44:04 |            2 |  29518159420 |
| 2019-04-16 22:15:44 |            2 |  29464199627 |
| 2019-04-16 21:32:50 |            7 |  29463948347 |
| 2019-04-09 22:02:49 |            3 |  29414706787 |
| 2019-04-09 17:00:02 |            7 |  29413436015 |
| 2019-04-02 17:00:01 |            2 |  29381930026 |
| 2019-04-02 17:00:01 |            6 |  29381930026 |
| 2019-03-28 00:41:56 |            6 |  29357407869 |
| 2019-03-27 20:04:49 |            2 |  29356744920 |
| 2019-03-19 18:05:11 |            2 |  29282079413 |
| 2019-03-19 17:00:02 |            5 |  29281445418 |
| 2019-03-13 07:26:40 |            6 |  29234984589 |
| 2019-03-12 22:10:00 |            3 |  29234032721 |
| 2019-03-05 22:49:56 |            5 |  29188342154 |
| 2019-03-05 17:00:01 |            2 |  29186919962 |
| 2019-02-28 14:51:31 |            0 |  29090880854 |
| 2019-02-28 09:15:03 |            0 |  29086593494 |
| 2019-02-19 22:28:15 |            2 |  29005727340 |
| 2019-02-19 17:44:52 |            7 |  29005220516 |
| 2019-02-13 10:40:12 |            6 |  28898355772 |
| 2019-02-12 19:52:59 |            3 |  28885303190 |
| 2019-02-06 02:16:20 |            5 |  28414866767 |
| 2019-02-05 22:10:28 |            2 |  28414270875 |
| 2019-01-30 02:05:54 |            6 |  28304912629 |
| 2019-01-29 22:12:05 |            2 |  28303444101 |
| 2019-01-22 22:16:36 |            3 |  28257409551 |
| 2019-01-22 17:00:01 |            6 |  28254868584 |
| 2019-01-16 09:13:58 |            7 |  28149298898 |
| 2019-01-08 22:35:52 |            2 |  28066426954 |
| 2019-01-08 21:18:35 |            7 |  28067014876 |
| 2019-01-01 23:52:29 |            3 |  27956075594 |
| 2019-01-01 17:17:52 |            6 |  27950378989 |
| 2018-12-25 22:44:00 |            2 |  27894038363 |
| 2018-12-25 17:00:02 |            5 |  27893073950 |
| 2018-12-20 06:30:40 |            4 |  27861791336 |
| 2018-12-19 03:38:18 |            2 |  27855384713 |
| 2018-12-12 01:10:29 |            5 |  27677294042 |
| 2018-12-11 22:02:51 |            2 |  27676712299 |
| 2018-12-05 09:21:11 |            4 |  27624752141 |
| 2018-12-05 01:06:38 |            2 |  27623655338 |
| 2018-11-28 01:28:56 |            5 |  27585348424 |
| 2018-11-27 19:29:18 |            3 |  27584481437 |
| 2018-11-20 17:15:49 |            2 |  27512157480 |
| 2018-11-20 17:00:02 |            6 |  27512157669 |
| 2018-11-13 20:29:17 |            6 |  27450754874 |
| 2018-11-13 17:00:01 |            4 |  27449647580 |
| 2018-11-07 00:35:11 |            6 |  27397559904 |
| 2018-11-06 18:11:54 |            2 |  27396922052 |
| 2018-10-30 23:23:15 |            2 |  28217921899 |
| 2018-10-30 20:16:49 |            6 |  28217218969 |
| 2018-10-24 05:34:58 |            5 |  28103216430 |
| 2018-10-23 17:00:01 |            3 |  28094967410 |
| 2018-10-16 20:35:18 |            2 |  28053165694 |
| 2018-10-16 18:02:19 |            5 |  28051118377 |
| 2018-10-10 06:56:07 |            5 |  28005034108 |
| 2018-10-09 21:44:20 |            3 |  28004153977 |
| 2018-10-03 05:25:08 |            7 |  27906861382 |
| 2018-10-02 17:00:01 |            2 |  27901480482 |
| 2018-09-26 10:01:58 |            6 |  27827535644 |
| 2018-09-25 19:28:50 |            3 |  27803177487 |
| 2018-09-18 20:47:57 |            2 |  27701123230 |
| 2018-09-18 19:53:23 |            6 |  27700782883 |
| 2018-09-12 00:57:28 |            5 |  27611960260 |
| 2018-09-11 21:16:44 |            2 |  27611623115 |
| 2018-09-04 17:51:11 |            6 |  27545502630 |
| 2018-09-04 17:00:02 |            3 |  27545332864 |
| 2018-07-03 20:33:00 |         NULL |  26920831378 |
+---------------------+--------------+--------------+
300 rows in set (0.088 sec)
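To turn the pasted output above into numbers, a small parser is enough. The sketch below is illustrative only: the function name is made up, and the three sample rows are copied from the table above (the newest row, one from June 2020 and the oldest row).

```python
from datetime import datetime


def parse_rows(table_text: str) -> list:
    """Parse mysql client table output into (time, hours, size_bytes) tuples."""
    rows = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # separator lines such as +-----+-----+
        try:
            when = datetime.strptime(cells[0], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # header row
        hours = None if cells[1] == "NULL" else int(cells[1])
        rows.append((when, hours, int(cells[2])))
    return rows


SAMPLE = """\
| time                | hours_backup | image_size   |
+---------------------+--------------+--------------+
| 2021-07-13 00:00:02 |           15 | 268495128477 |
| 2020-06-09 00:00:02 |            2 |  34365645094 |
| 2018-07-03 20:33:00 |         NULL |  26920831378 |
"""

rows = parse_rows(SAMPLE)
newest, oldest = rows[0], rows[-1]
growth = newest[2] / oldest[2]
print(f"dump size grew {growth:.1f}x between {oldest[0].date()} and {newest[0].date()}")
```

Feeding the full table through `parse_rows` instead of the sample would allow plotting dump size against backup duration over time.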

Similar info from the snapshots:

root@db1159.eqiad.wmnet[dbbackups]> nopager; select start_date as time, TIMESTAMPDIFF(HOUR, start_date, end_date) as hours_backup, sum(size) as image_size FROM backups JOIN backup_files ON backup_id = id where file_path='commonswiki' and file_name = 'image.ibd' and section='s4' and type='snapshot' GROUP BY id order by id desc LIMIT 300;
PAGER set to stdout
+---------------------+--------------+--------------+
| time                | hours_backup | image_size   |
+---------------------+--------------+--------------+
| 2021-07-15 01:18:11 |            1 | 388136697856 |
| 2021-07-15 01:01:26 |            1 | 388052811776 |
| 2021-07-14 01:02:20 |            1 | 388136697856 |
| 2021-07-14 01:00:35 |            1 | 388052811776 |
| 2021-07-12 01:10:52 |            1 | 388124114944 |
| 2021-07-12 01:00:33 |            1 | 388036034560 |
| 2021-07-10 01:12:37 |            1 | 387734044672 |
| 2021-07-10 01:04:26 |            1 | 387822125056 |
| 2021-07-08 01:15:30 |            1 | 387633381376 |
| 2021-07-08 01:10:05 |            2 | 387545300992 |
| 2021-07-07 01:10:22 |            2 | 387457220608 |
| 2021-07-07 00:57:15 |            1 | 387247505408 |
| 2021-07-05 01:02:09 |            1 | 387176202240 |
| 2021-07-05 01:01:40 |            1 | 387083927552 |
| 2021-07-03 01:08:46 |            1 | 386530279424 |
| 2021-07-03 00:51:36 |            1 | 386438004736 |
| 2021-07-01 00:58:22 |            1 | 385783693312 |
| 2021-07-01 00:51:55 |            1 | 385687224320 |
| 2021-06-30 00:52:35 |            1 | 385435566080 |
| 2021-06-30 00:49:45 |            1 | 385343291392 |
| 2021-06-29 12:05:31 |            1 | 385360068608 |
| 2021-06-29 11:46:19 |            2 | 385267793920 |
| 2021-06-28 00:53:59 |            1 | 385225850880 |
| 2021-06-28 00:53:33 |            1 | 385133576192 |
| 2021-06-26 01:12:54 |            1 | 384957415424 |
| 2021-06-26 00:55:40 |            1 | 384860946432 |
| 2021-06-24 01:04:50 |            1 | 384596705280 |
| 2021-06-24 01:04:16 |            1 | 384697368576 |
| 2021-06-23 01:05:27 |            1 | 384491847680 |
| 2021-06-23 00:58:00 |            1 | 384588316672 |
| 2021-06-21 01:01:47 |            1 | 384353435648 |
| 2021-06-21 00:55:39 |            1 | 384256966656 |
| 2021-06-19 00:53:23 |            1 | 384080805888 |
| 2021-06-19 00:49:17 |            1 | 383984336896 |
| 2021-06-17 01:04:57 |            1 | 383787204608 |
| 2021-06-17 00:58:30 |            1 | 383686541312 |
| 2021-06-16 00:56:13 |            1 | 383455854592 |
| 2021-06-15 21:49:20 |            1 | 383514574848 |
| 2021-06-14 00:49:20 |            1 | 382809931776 |
| 2021-06-13 21:48:31 |            1 | 382856069120 |
| 2021-06-12 00:46:24 |            1 | 382113677312 |
| 2021-06-11 21:48:42 |            1 | 382159814656 |
| 2021-06-10 00:47:37 |            1 | 381404839936 |
| 2021-06-09 21:49:50 |            1 | 381455171584 |
| 2021-06-09 00:48:07 |            1 | 380846997504 |
| 2021-06-08 21:49:19 |            1 | 380838608896 |
| 2021-06-07 00:51:50 |            1 | 379911667712 |
| 2021-06-06 21:47:39 |            1 | 379970387968 |
| 2021-06-05 00:52:06 |            1 | 379068612608 |
| 2021-06-04 21:47:42 |            1 | 379165081600 |
| 2021-06-04 11:01:38 |            1 | 378976337920 |
| 2021-06-03 00:50:08 |            1 | 378259111936 |
| 2021-06-02 22:19:29 |            2 | 378342998016 |
| 2021-06-02 00:58:47 |            2 | 378204585984 |
| 2021-06-02 00:55:44 |            2 | 378087145472 |
| 2021-05-31 00:51:36 |            1 | 377709658112 |
| 2021-05-30 23:22:56 |            1 | 377822904320 |
| 2021-05-29 01:04:04 |            1 | 377558663168 |
| 2021-05-29 00:55:07 |            1 | 377428639744 |
| 2021-05-27 01:10:30 |            1 | 377307004928 |
| 2021-05-27 01:05:35 |            1 | 377164398592 |
| 2021-05-26 01:06:13 |            1 | 377126649856 |
| 2021-05-26 00:49:39 |            1 | 375876747264 |
| 2021-05-24 01:06:29 |            1 | 369102946304 |
| 2021-05-24 00:48:39 |            1 | 363226726400 |
| 2021-05-22 01:09:41 |            1 | 355475652608 |
| 2021-05-22 00:52:40 |            1 | 349490380800 |
| 2021-05-20 01:48:44 |            1 | 333744963584 |
| 2021-05-19 01:20:19 |            1 | 387062956032 |
| 2021-05-19 01:02:41 |            1 | 387109093376 |
| 2021-05-17 01:11:12 |            1 | 386949709824 |
| 2021-05-17 00:55:12 |            1 | 386949709824 |
| 2021-05-15 01:05:15 |            1 | 386811297792 |
| 2021-05-15 00:59:52 |            1 | 386811297792 |
| 2021-05-13 01:01:35 |            1 | 386672885760 |
| 2021-05-13 00:55:36 |            1 | 386672885760 |
| 2021-05-12 01:06:32 |            1 | 386618359808 |
| 2021-05-12 01:05:46 |            1 | 386618359808 |
| 2021-05-10 01:05:55 |            1 | 386484142080 |
| 2021-05-10 00:58:11 |            1 | 386484142080 |
| 2021-05-08 01:11:46 |            1 | 385691418624 |
| 2021-05-08 00:52:51 |            1 | 385687224320 |
| 2021-05-06 01:15:06 |            1 | 385435566080 |
| 2021-05-06 01:01:19 |            1 | 385435566080 |
| 2021-05-05 01:10:03 |            2 | 385381040128 |
| 2021-05-05 00:59:07 |            2 | 385381040128 |
| 2021-05-03 01:05:02 |            1 | 385028718592 |
| 2021-05-03 00:54:53 |            1 | 385024524288 |
| 2021-05-01 01:02:03 |            1 | 384298909696 |
| 2021-05-01 00:59:53 |            1 | 384298909696 |
| 2021-04-29 01:17:04 |            1 | 383481020416 |
| 2021-04-29 00:59:50 |            1 | 383472631808 |
| 2021-04-28 01:15:41 |            1 | 383078367232 |
| 2021-04-28 01:08:24 |            1 | 383074172928 |
| 2021-04-26 01:02:12 |            1 | 382247895040 |
| 2021-04-26 00:58:50 |            1 | 382243700736 |
| 2021-04-24 00:57:14 |            1 | 381430005760 |
| 2021-04-22 01:16:17 |            1 | 380574367744 |
| 2021-04-22 01:08:30 |            1 | 380565979136 |
| 2021-04-21 00:59:42 |            1 | 380192686080 |
| 2021-04-21 00:53:12 |            1 | 380188491776 |
| 2021-04-19 01:10:10 |            1 | 379387379712 |
| 2021-04-19 00:51:41 |            1 | 379383185408 |
| 2021-04-17 01:01:52 |            1 | 378510770176 |
| 2021-04-17 00:52:04 |            1 | 378506575872 |
| 2021-04-15 00:59:26 |            1 | 377755795456 |
| 2021-04-15 00:55:29 |            1 | 377751601152 |
| 2021-04-14 01:07:30 |            1 | 377260867584 |
| 2021-04-14 01:03:12 |            1 | 377260867584 |
| 2021-04-12 00:57:43 |            1 | 376577196032 |
| 2021-04-12 00:55:34 |            1 | 376573001728 |
| 2021-04-10 00:51:05 |            1 | 376061296640 |
| 2021-04-10 00:46:40 |            1 | 376057102336 |
| 2021-04-08 00:59:05 |            1 | 374626844672 |
| 2021-04-08 00:52:52 |            1 | 374622650368 |
| 2021-04-07 01:06:14 |            2 | 374337437696 |
| 2021-04-07 00:55:59 |            2 | 374337437696 |
| 2021-04-05 00:52:05 |            1 | 373972533248 |
| 2021-04-05 00:50:54 |            1 | 373976727552 |
| 2021-04-03 01:03:32 |            1 | 373561491456 |
| 2021-04-03 00:47:59 |            1 | 373557297152 |
| 2021-04-01 01:11:51 |            1 | 373167226880 |
| 2021-04-01 01:00:59 |            1 | 373167226880 |
| 2021-03-31 01:06:34 |            1 | 373016231936 |
| 2021-03-31 00:47:30 |            1 | 373012037632 |
| 2021-03-29 01:12:58 |            2 | 372659716096 |
| 2021-03-29 01:06:25 |            1 | 372659716096 |
| 2021-03-27 01:20:16 |            1 | 372424835072 |
| 2021-03-27 01:11:19 |            1 | 372420640768 |
| 2021-03-25 01:18:21 |            2 | 372080902144 |
| 2021-03-25 01:04:18 |            2 | 372076707840 |
| 2021-03-24 01:01:18 |            1 | 371871186944 |
| 2021-03-24 00:54:30 |            1 | 371866992640 |
| 2021-03-22 00:58:27 |            1 | 371565002752 |
| 2021-03-22 00:50:53 |            1 | 371565002752 |
| 2021-03-20 01:01:39 |            1 | 371116212224 |
| 2021-03-20 00:39:47 |            1 | 371107823616 |
| 2021-03-18 00:58:15 |            1 | 370608701440 |
| 2021-03-18 00:55:53 |            1 | 370612895744 |
| 2021-03-17 00:50:23 |            1 | 370382209024 |
| 2021-03-17 00:46:04 |            1 | 370382209024 |
| 2021-03-15 00:56:04 |            1 | 370189271040 |
| 2021-03-15 00:55:30 |            1 | 370189271040 |
| 2021-03-13 00:56:36 |            1 | 369958584320 |
| 2021-03-13 00:43:16 |            1 | 369954390016 |
| 2021-03-11 00:55:04 |            1 | 369815977984 |
| 2021-03-11 00:49:15 |            1 | 369820172288 |
| 2021-03-10 00:59:00 |            1 | 369744674816 |
| 2021-03-10 00:56:47 |            1 | 369740480512 |
| 2021-03-08 00:49:22 |            1 | 369602068480 |
| 2021-03-08 00:41:50 |            1 | 369597874176 |
| 2021-03-06 00:54:48 |            1 | 369396547584 |
| 2021-03-06 00:49:22 |            1 | 369396547584 |
| 2021-03-04 01:07:13 |            1 | 369140695040 |
| 2021-03-04 01:03:00 |            1 | 369140695040 |
| 2021-03-03 00:52:20 |            2 | 368972922880 |
| 2021-03-03 00:49:11 |            1 | 368972922880 |
| 2021-03-01 00:58:43 |            1 | 368662544384 |
| 2021-03-01 00:53:54 |            1 | 368658350080 |
| 2021-02-27 01:03:12 |            1 | 368373137408 |
| 2021-02-27 00:49:13 |            1 | 368373137408 |
| 2021-02-25 00:52:50 |            1 | 368108896256 |
| 2021-02-25 00:46:29 |            1 | 368108896256 |
| 2021-02-24 01:05:19 |            1 | 368025010176 |
| 2021-02-24 00:57:02 |            1 | 368025010176 |
| 2021-02-22 00:51:54 |            1 | 367739797504 |
| 2021-02-22 00:48:54 |            1 | 367739797504 |
| 2021-02-20 00:42:02 |            1 | 367530082304 |
| 2021-02-20 00:41:36 |            1 | 367534276608 |
| 2021-02-18 01:01:16 |            1 | 367194537984 |
| 2021-02-18 00:44:02 |            1 | 367190343680 |
| 2021-02-17 00:42:41 |            1 | 367047737344 |
| 2021-02-17 00:37:43 |            1 | 367047737344 |
| 2021-02-15 01:00:26 |            1 | 366179516416 |
| 2021-02-15 00:52:11 |            1 | 366179516416 |
| 2021-02-13 00:42:58 |            1 | 365386792960 |
| 2021-02-13 00:37:57 |            1 | 365386792960 |
| 2021-02-11 00:54:40 |            1 | 364568903680 |
| 2021-02-11 00:45:33 |            1 | 364568903680 |
| 2021-02-10 00:46:06 |            1 | 364342411264 |
| 2021-02-10 00:40:34 |            1 | 364338216960 |
| 2021-02-08 00:45:36 |            1 | 364065587200 |
| 2021-02-08 00:35:51 |            1 | 364065587200 |
| 2021-02-06 00:41:13 |            1 | 363767791616 |
| 2021-02-06 00:32:49 |            1 | 363767791616 |
| 2021-02-04 00:47:49 |            1 | 363256086528 |
| 2021-02-04 00:41:37 |            1 | 363251892224 |
| 2021-02-03 00:45:31 |            2 | 362861821952 |
| 2021-02-03 00:43:12 |            1 | 362861821952 |
| 2021-02-01 00:46:52 |            1 | 361901326336 |
| 2021-02-01 00:30:10 |            1 | 361897132032 |
| 2021-01-30 00:45:13 |            1 | 361041494016 |
| 2021-01-30 00:30:15 |            1 | 361041494016 |
| 2021-01-28 00:44:50 |            1 | 359846117376 |
| 2021-01-28 00:35:17 |            1 | 359841923072 |
| 2021-01-27 00:39:53 |            1 | 358852067328 |
| 2021-01-27 00:33:05 |            1 | 358847873024 |
| 2021-01-25 00:50:00 |            1 | 358080315392 |
| 2021-01-25 00:33:04 |            1 | 358080315392 |
| 2021-01-23 00:40:29 |            1 | 357686050816 |
| 2021-01-23 00:35:13 |            1 | 357686050816 |
| 2021-01-21 00:50:38 |            1 | 356561977344 |
| 2021-01-21 00:32:08 |            1 | 356549394432 |
| 2021-01-20 00:34:16 |            1 | 355974774784 |
| 2021-01-20 00:23:42 |            1 | 355970580480 |
| 2021-01-18 00:38:04 |            1 | 354548711424 |
| 2021-01-18 00:33:06 |            1 | 354548711424 |
| 2021-01-16 00:47:15 |            1 | 353328168960 |
| 2021-01-16 00:42:54 |            1 | 353323974656 |
| 2021-01-14 00:31:07 |            1 | 350333435904 |
| 2021-01-14 00:27:34 |            1 | 350329241600 |
| 2021-01-13 00:34:55 |            1 | 348857040896 |
| 2021-01-13 00:28:49 |            1 | 348852846592 |
| 2021-01-11 00:33:36 |            1 | 345354797056 |
| 2021-01-11 00:27:07 |            1 | 345367379968 |
| 2021-01-09 00:43:34 |            1 | 341965799424 |
| 2021-01-09 00:34:25 |            1 | 341974188032 |
| 2021-01-07 00:33:13 |            1 | 338715213824 |
| 2021-01-07 00:29:27 |            1 | 338731991040 |
| 2021-01-06 00:38:45 |            1 | 337096212480 |
| 2021-01-06 00:34:54 |            1 | 337112989696 |
| 2021-01-04 00:44:12 |            1 | 333933707264 |
| 2021-01-04 00:30:10 |            1 | 333929512960 |
| 2021-01-02 00:31:44 |            1 | 329621962752 |
| 2021-01-02 00:28:44 |            1 | 329626157056 |
| 2020-12-31 00:37:43 |            1 | 325683511296 |
| 2020-12-31 00:35:27 |            1 | 325696094208 |
| 2020-12-30 00:31:46 |            1 | 323108208640 |
| 2020-12-30 00:24:30 |            1 | 323108208640 |
| 2020-12-28 00:25:57 |            1 | 319035539456 |
| 2020-12-28 00:20:23 |            1 | 319027150848 |
| 2020-12-26 00:37:53 |            1 | 315285831680 |
| 2020-12-26 00:30:24 |            1 | 315277443072 |
| 2020-12-24 00:25:19 |            1 | 312454676480 |
| 2020-12-24 00:22:30 |            1 | 312450482176 |
| 2020-12-23 00:35:56 |            1 | 311003447296 |
| 2020-12-23 00:35:20 |            1 | 311003447296 |
| 2020-12-21 00:33:37 |            1 | 308272955392 |
| 2020-12-21 00:29:26 |            1 | 308268761088 |
| 2020-12-19 00:22:40 |            1 | 304825237504 |
| 2020-12-19 00:21:04 |            1 | 304825237504 |
| 2020-12-17 00:38:19 |            1 | 300324749312 |
| 2020-12-17 00:32:16 |            1 | 300320555008 |
| 2020-12-16 00:27:21 |            1 | 298521198592 |
| 2020-12-16 00:23:20 |            1 | 298517004288 |
| 2020-12-14 00:28:50 |            1 | 296436629504 |
| 2020-12-14 00:15:27 |            1 | 296424046592 |
| 2020-12-12 00:25:15 |            1 | 294406586368 |
| 2020-12-12 00:17:38 |            1 | 294402392064 |
| 2020-12-11 00:21:06 |            1 | 293949407232 |
| 2020-12-10 00:39:36 |            1 | 293836161024 |
| 2020-12-10 00:15:56 |            1 | 293836161024 |
| 2020-12-09 00:29:24 |            1 | 293664194560 |
| 2020-12-09 00:28:51 |            1 | 293664194560 |
| 2020-12-07 00:24:10 |            1 | 293471256576 |
| 2020-12-07 00:22:44 |            1 | 293467062272 |
| 2020-12-05 00:25:17 |            1 | 293232181248 |
| 2020-12-05 00:14:42 |            1 | 293227986944 |
| 2020-12-03 02:24:35 |            1 | 292942774272 |
| 2020-12-03 01:27:28 |            1 | 292938579968 |
| 2020-12-02 19:00:03 |            4 | 292913414144 |
| 2020-12-02 14:34:52 |            1 | 292888248320 |
| 2020-11-30 13:50:33 |            1 | 292493983744 |
| 2020-11-30 00:16:05 |            1 | 291898392576 |
| 2020-11-28 00:16:05 |            1 | 289872543744 |
| 2020-11-26 15:59:28 |            1 | 288656195584 |
| 2020-11-26 00:29:21 |            1 | 287968329728 |
| 2020-11-25 00:21:33 |            1 | 287087525888 |
| 2020-11-25 00:21:17 |            1 | 287087525888 |
| 2020-11-23 00:27:11 |            1 | 285275586560 |
| 2020-11-23 00:12:35 |            1 | 285267197952 |
| 2020-11-21 00:24:29 |            1 | 283472035840 |
| 2020-11-21 00:14:34 |            1 | 283467841536 |
| 2020-11-19 00:29:13 |            1 | 281710428160 |
| 2020-11-19 00:21:46 |            1 | 281706233856 |
| 2020-11-18 00:26:28 |            1 | 280913510400 |
| 2020-11-18 00:26:13 |            1 | 280913510400 |
| 2020-11-16 00:19:48 |            1 | 279009296384 |
| 2020-11-16 00:11:20 |            1 | 279005102080 |
| 2020-11-14 00:22:51 |            1 | 277176385536 |
| 2020-11-14 00:11:40 |            1 | 277167996928 |
| 2020-11-12 00:26:54 |            1 | 275502858240 |
| 2020-11-12 00:09:58 |            1 | 275486081024 |
| 2020-11-11 00:24:59 |            1 | 274122932224 |
| 2020-11-11 00:12:26 |            1 | 274114543616 |
| 2020-11-09 00:20:42 |            1 | 273107910656 |
| 2020-11-09 00:17:22 |            1 | 273107910656 |
| 2020-11-07 00:21:32 |            1 | 270582939648 |
| 2020-11-07 00:20:46 |            1 | 270582939648 |
| 2020-11-05 00:20:33 |            1 | 267546263552 |
| 2020-11-05 00:17:48 |            1 | 267542069248 |
| 2020-11-04 00:15:10 |            1 | 266615128064 |
| 2020-11-04 00:14:22 |            1 | 266615128064 |
| 2020-11-02 00:23:34 |            1 | 264610250752 |
| 2020-11-02 00:06:28 |            1 | 264576696320 |
| 2020-10-31 00:24:10 |            2 | 261128978432 |
| 2020-10-31 00:10:36 |            1 | 261112201216 |
| 2020-10-29 00:22:49 |            1 | 257677066240 |
| 2020-10-29 00:16:29 |            2 | 257672871936 |
| 2020-10-28 00:12:37 |            2 | 255949012992 |
+---------------------+--------------+--------------+
300 rows in set (0.015 sec)

As I mentioned on IRC, the snapshot size likely won't change until the table has been optimized. The main concern right now, which this will likely solve, is the dump time.

root@db1159.eqiad.wmnet[dbbackups]> nopager; select start_date as time, TIMESTAMPDIFF(HOUR, start_date, end_date) as hours_backup, sum(size) as image_size FROM backups JOIN backup_files ON backup_id = id where file_path='' and file_name like 'commonswiki.image.%' and section='s4' and type='dump' GROUP BY id order by id desc;
PAGER set to stdout
+---------------------+--------------+--------------+
| time                | hours_backup | image_size   |
+---------------------+--------------+--------------+
| 2021-07-27 01:47:08 |            8 | 130723495216 |
| 2021-07-27 00:00:02 |            8 | 130724419461 |
| 2021-07-20 00:00:02 |           13 | 229000803999 |
| 2021-07-20 00:00:02 |           13 | 229001243122 |
| 2021-07-13 00:00:02 |           15 | 268495128477 |
| 2021-07-13 00:00:02 |           15 | 268495133733 |
+---------------------+--------------+--------------+

root@db1159.eqiad.wmnet[dbbackups]> nopager; select start_date as time, TIMESTAMPDIFF(HOUR, start_date, end_date) as hours_backup, sum(size) as image_size FROM backups JOIN backup_files ON backup_id = id where file_path='' and file_name like 'commonswiki.image.%' and section='s4' and type='dump' GROUP BY id order by id desc;
PAGER set to stdout
+---------------------+--------------+--------------+
| time                | hours_backup | image_size   |
+---------------------+--------------+--------------+
| 2021-08-03 02:50:50 |            3 |  43416319449 |
| 2021-08-03 00:00:02 |            3 |  46623095944 |
| 2021-07-27 01:47:08 |            8 | 130723495216 |
| 2021-07-27 00:00:02 |            8 | 130724419461 |
| 2021-07-20 00:00:02 |           13 | 229000803999 |
| 2021-07-20 00:00:02 |           13 | 229001243122 |
| 2021-07-13 00:00:02 |           15 | 268495128477 |
| 2021-07-13 00:00:02 |           15 | 268495133733 |
| 2021-07-06 00:00:02 |           15 | 268851672569 |
| 2021-07-06 00:00:01 |           15 | 268879587237 |
| 2021-06-28 12:00:02 |           15 | 267575857832 |
| 2021-06-28 12:00:02 |           15 | 267575857832 |
| 2021-06-22 00:00:02 |           15 | 267281738653 |
| 2021-06-22 00:00:02 |           15 | 267281738653 |
| 2021-06-15 00:00:02 |           15 | 266825899971 |
| 2021-06-15 00:00:02 |           15 | 266825899971 |
| 2021-06-08 01:51:24 |           15 | 264967471625 |
| 2021-06-08 00:00:02 |           15 | 264945077858 |
| 2021-06-01 00:00:02 |           16 | 263482186455 |
+---------------------+--------------+--------------+
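
To put the numbers above in perspective, here is a quick back-of-the-envelope calculation using two rows copied from the query output (2021-07-13 and 2021-08-03). The variable names are mine, and this is only a sketch of the arithmetic, not something run against dbbackups. Note that hours_backup is the duration of the whole s4 dump, not of the image table alone.

```python
# Rows copied from the dump listing above: (hours_backup, image_size in bytes).
before_hours, before_bytes = 15, 268495128477  # 2021-07-13 dump
after_hours, after_bytes = 3, 43416319449      # 2021-08-03 dump

# Relative reduction in the size of the commonswiki.image dump files.
size_reduction_pct = 100 * (1 - after_bytes / before_bytes)
# Relative reduction in the duration of the whole s4 dump.
time_reduction_pct = 100 * (1 - after_hours / before_hours)

print(f"image dump size: {before_bytes / 1e9:.1f} GB -> {after_bytes / 1e9:.1f} GB "
      f"({size_reduction_pct:.1f}% smaller)")
print(f"dump duration: {before_hours} h -> {after_hours} h "
      f"({time_reduction_pct:.0f}% shorter)")
```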

An update: this cleanup will likely finish (for PDF, not DjVu) tomorrow evening.

Fixing DjVu is a whole other beast: it basically requires rewriting how we store DjVu metadata, and I couldn't find any reasonably maintained PHP library that reads DjVu metadata. (Some context: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/697935/10#message-c72ed88b1c2fd9e31470372325607fc112b4dee4)

Mentioned in SAL (#wikimedia-operations) [2021-08-05T17:25:08Z] <Amir1> end of pdf rebuild on commonswiki (T275268)

Snapshots (before compression) decreased from approximately 1769 GB to 1482 GB, comparing the optimized one from eqiad with the unoptimized one from codfw.

Change 720401 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Drop $wgDjvuToXML

https://gerrit.wikimedia.org/r/720401

Change 720401 merged by jenkins-bot:

[mediawiki/core@master] Drop $wgDjvuToXML

https://gerrit.wikimedia.org/r/720401

Change 738280 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] media: Build and use JSON for metadata of djvu instead of XML

https://gerrit.wikimedia.org/r/738280

Change 738280 merged by jenkins-bot:

[mediawiki/core@master] media: Build and use JSON for metadata of djvu instead of XML

https://gerrit.wikimedia.org/r/738280

Change 738428 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Temporarily increase memory limit for djvu metadata

https://gerrit.wikimedia.org/r/738428

Change 738428 merged by jenkins-bot:

[mediawiki/core@master] Increase memory limit for DjVu metadata

https://gerrit.wikimedia.org/r/738428

Change 738637 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.7] Increase memory limit for DjVu metadata

https://gerrit.wikimedia.org/r/738637

Change 738637 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.7] Increase memory limit for DjVu metadata

https://gerrit.wikimedia.org/r/738637

Change 738638 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.7] media: Build and use JSON for metadata of djvu instead of XML

https://gerrit.wikimedia.org/r/738638

Change 738638 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.7] media: Build and use JSON for metadata of djvu instead of XML

https://gerrit.wikimedia.org/r/738638

Mentioned in SAL (#wikimedia-operations) [2021-11-15T10:23:56Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.7/includes/media/: Backport: [[gerrit:738638|media: Build and use JSON for metadata of djvu instead of XML (T275268 T192866)]] (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2021-11-15T13:40:39Z] <Amir1> start of djvu clean up in commons in a screen. Gonna take a couple of days (T275268)

Finished refreshing file metadata for 280453 files. 0 needed to be refreshed, 280453 did not need to be but were refreshed anyways, and 0 refreshes were suspicious.

Mentioned in SAL (#wikimedia-operations) [2021-11-21T05:13:20Z] <Amir1> end of djvu metadata maint script run (T275268)

Mentioned in SAL (#wikimedia-operations) [2021-11-21T05:22:35Z] <Amir1> running clean up of djvu files in all wikis (T275268)

Change 740320 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] media: Drop XML metadata support from DjvuHandler

https://gerrit.wikimedia.org/r/740320

Change 740320 merged by jenkins-bot:

[mediawiki/core@master] media: Drop XML metadata support from DjvuHandler

https://gerrit.wikimedia.org/r/740320

Ladsgroup moved this task from In progress to Done on the DBA board.

I filed T301039 as a follow-up: now that metadata has moved to blobs, it is no longer possible to access it from the image table SQL dump alone.
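
For anyone consuming the dumps, here is a rough sketch of what this means in practice. It assumes the JSON form of img_metadata is an envelope with a `data` object for inline values and a `blobs` object mapping item names to blob addresses. That layout, the sample value and the function name are my assumptions for illustration, so check LocalFile in core before relying on it.

```python
import json


def classify_metadata(raw: str) -> dict:
    """Report which metadata items are inline and which live outside the row."""
    if raw.startswith("{"):
        # Assumed JSON envelope: {"data": {...inline...}, "blobs": {...addresses...}}
        envelope = json.loads(raw)
        return {
            "format": "json",
            "inline": sorted(envelope.get("data", {})),
            "external": sorted(envelope.get("blobs", {})),
        }
    if raw.startswith("a:"):
        # Old PHP-serialized array: everything is in the row; not parsed here.
        return {"format": "php-serialized", "inline": None, "external": []}
    return {"format": "unknown", "inline": None, "external": []}


# Hypothetical row value; the blob address is made up.
sample = '{"data":{"Pages":"12"},"blobs":{"text":"tt:12345"}}'
info = classify_metadata(sample)
print(info)
```

Anything listed under `external` would need a BlobStore lookup (External Storage on WMF wikis), which is exactly what a plain SQL dump of the image table cannot provide.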

Change 773943 had a related patch set uploaded (by Krinkle; author: Amir Sarabadani):

[mediawiki/core@master] filerepo: Enable JSON metadata serialization by default

https://gerrit.wikimedia.org/r/773943

Change 773943 merged by jenkins-bot:

[mediawiki/core@master] filerepo: Enable JSON meta-data serialization by default

https://gerrit.wikimedia.org/r/773943