
Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata
Open, Needs Triage, Public

Description

The image table is getting really big (T222224#6738823), and it turns out that ~90% of the space taken by the image table is the text tag in the metadata of pdf/djvu files (T28741#6750249). This needs addressing as it's causing infrastructure issues.

One way to do it would be to compress it; another would be to push it to other storage, such as another table; a third would be to delete it and read it directly from the file when needed. See T28741#6750249 onwards for more discussion.

Event Timeline

Jdforrester-WMF renamed this task from "image table holds text of pdf/djvu files as part of metadata" to "Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata". Feb 20 2021, 5:52 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Just to comment that compressing the table would give some benefit, but in the long run, it won't give us much.
I think the whole approach of the image table needs to be refactored, as having a single table isn't something we can support for much longer, for the reasons already laid out in the tickets you mentioned.

Yes agreed. To quote what I said there:

My ideal solution would be to first start compressing the values (or any sort of quick/easy fix), then do the alter tables (adding a PK, renaming the table, etc.), and once that's done, find a long-term solution for pdf/djvu, and at last the MCR work.

(I don't think we should compress the table altogether; we should just compress the text values, similar to the ES tables.)
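
For illustration, a minimal sketch of what compressing only the text values could look like, assuming a hypothetical pair of helpers around gzdeflate/gzinflate and a 'text' metadata key; none of this is existing MediaWiki code:

```
<?php
// Minimal sketch only: compress the large text value inside the pdf/djvu
// metadata array before it is serialized into img_metadata, using the same
// gzdeflate/gzinflate primitives the ES tables rely on. The 'text' key and
// the helper names are illustrative, not existing MediaWiki code.

function compressTextValue( array $metadata ): array {
	if ( isset( $metadata['text'] ) && strlen( $metadata['text'] ) > 1024 ) {
		$metadata['text'] = gzdeflate( $metadata['text'] );
		$metadata['text_compression'] = 'deflate'; // marker telling readers to inflate
	}
	return $metadata;
}

function expandTextValue( array $metadata ): array {
	if ( ( $metadata['text_compression'] ?? null ) === 'deflate' ) {
		$metadata['text'] = gzinflate( $metadata['text'] );
		unset( $metadata['text_compression'] );
	}
	return $metadata;
}
```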

Just to comment that compressing the table would give some benefit, but in the long run, it won't give us much.
I think the whole approach of the image table needs to be refactored, as having a single table isn't something we can support for much longer, for the reasons already laid out in the tickets you mentioned.

I'd like to better understand in what way a single table is problematic. We have much taller tables, right?
I mean, the image table could be made much narrower by normalizing (the low-hanging fruit being pdf/djvu metadata). Would that solve the problem? If not, why not? What else would be needed?

I agree that the image table as it is is far from ideal. But I'd like to better understand what we are optimizing for and what problem we are solving before we dive into the refactor.

Since this ticket is about pdf/djvu metadata, which is uncontroversial and straightforward, this discussion should probably live somewhere else, perhaps on T28741.

Just to comment that compressing the table would give some benefit, but in the long run, it won't give us much.
I think the whole approach of the image table needs to be refactored, as having a single table isn't something we can support for much longer, for the reasons already laid out in the tickets you mentioned.

I'd like to better understand in what way a single table is problematic. We have much taller tables, right?

Apart from what Jaime commented regarding backup times, having single massive tables is problematic for a number of reasons:

  • Operating on huge tables is almost (or simply) impossible for many reasons, especially with ALTER statements, because you normally need twice the disk space to alter them. We saw that with the revision table before MCR: it was simply impossible to alter it.
  • With big massive tables, the optimizer can become unpredictable and choose weird (and costly) query plans all of a sudden; we have multiple examples at: https://phabricator.wikimedia.org/project/view/4313/
    • Unexpected full scans can take ages
  • They cannot fit in memory, and hence they need to touch disk more often. The more we can fit in memory (innodb_buffer_pool_size), the better for performance.
  • If we were to shard horizontally (i.e. multiple tables based on user_id MOD 100), that cannot be done with a single table (see the sketch after this list).
  • Previously stated comments about generating backups and recovery times.
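
(Purely to illustrate the sharding point above: the physical table name would have to be derived from the key, which is exactly what a single monolithic table cannot accommodate. The function below is a hypothetical sketch, not anything that exists in MediaWiki.)

```
<?php
// Hypothetical sketch of the "user_id MOD 100" sharding example.
function getShardTableName( string $baseTable, int $key ): string {
	return sprintf( '%s_%02d', $baseTable, $key % 100 ); // e.g. image_00 .. image_99
}

// Every read and write would then have to go through the router:
$table = getShardTableName( 'image', 4217 ); // "image_17"
```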

Let's discuss further steps on T28741, and not hijack this task :)

Let's discuss further steps on T28741, and not hijack this task :)

I commented there, see T28741#6848127.

This is apparently a duplicate of T32906.

Has anyone got any thoughts on the internal API? Ideally PdfHandler should not have to know about BlobStore. People who write MediaHandler subclasses should be media experts, not DB experts.

Option 1

One solution would be to make FileRepo aware of the structure of metadata arrays and to use a size threshold to put individual metadata items into ES as necessary. Currently, MediaHandler::getMetadata() is supposed to return a PHP-serialized string, which is an odd convention: originally it was an opaque string, but PHP serialization became the recommended format starting from about 2013. We could deprecate MediaHandler::getMetadata() and instead have MediaHandler::getMetadataArray(), which would return an array of JSON-serializable values from a file on disk. Then FileRepo would decide where to put each of the values and how to serialize them.

File::getMetadata() would be deprecated in favour of File::getMetadataItem(), which would retrieve and unserialize a specified metadata item. For performance, there could also be a batch retrieval function.

Considering that the use case is OCR digitisation of entire books, and that ProofreadPage only wants the text for a given page, and that after T271493 CirrusSearch only wants a 50KB extract, it might be reasonable to have one item per page rather than putting the whole text layer in a single item. That way, ProofreadPage only needs to load the text it really wants, rather than loading the whole book and throwing away the other pages.
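
A rough sketch of how the pieces of Option 1 could fit together; the method names follow the proposal above but do not exist yet, and the per-page keys are only an illustration:

```
<?php
// Sketch of the Option 1 interface (proposed, not existing code).
// The handler returns a plain, JSON-serializable array read from the file on disk:
class SketchPdfHandler /* would extend MediaHandler */ {
	public function getMetadataArray( string $path ): array {
		// Real code would shell out to pdftotext/pdfinfo; inlined for brevity.
		return [
			'pageCount' => 2,
			'text_page_1' => '...text layer of page 1...',
			'text_page_2' => '...text layer of page 2...',
		];
	}
}

// FileRepo (not the handler) then decides where each value lives: small items
// stay in img_metadata, items over a size threshold go to ES via BlobStore,
// and only a blob address is kept in the row.
//
// A caller such as ProofreadPage asks for exactly the item it needs, e.g.:
//   $text = $file->getMetadataItem( 'text_page_' . $pageNumber );
```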

Option 2

A narrower solution is to add MediaHandler::getTextLayerFromPath(). The text layer would be a separate key in the array returned by MWFileProps, and LocalFile::recordUpload3() would be responsible for deciding whether to write it to ES. LocalFile::maybeUpgradeRow() could split up existing metadata. The many calls to File::getMetadata() in the MediaHandler subclasses would remain as is, except for the ones from getPageText() which would be replaced by a call to File::getTextLayer().

One difficulty with this is confusion between the MediaHandler methods that retrieve information from a file on disk (getMetadata() and getImageSize()) and those that retrieve information from a supplied File object.
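
A correspondingly narrow sketch of Option 2; again these are proposed rather than existing methods, and the threshold and key names are made up:

```
<?php
// Sketch of Option 2 (proposed, not existing code).
// The handler only knows how to extract the text layer from a file on disk:
class SketchDjVuHandler /* would extend MediaHandler */ {
	public function getTextLayerFromPath( string $path ): ?string {
		// Real code would shell out to djvutxt or similar; elided here.
		return '...full text layer...';
	}
}

// MWFileProps would carry the text layer under its own key, and
// LocalFile::recordUpload3() would decide whether it is large enough to be
// written to ES instead of img_metadata, roughly:
//
//   if ( strlen( $props['textLayer'] ?? '' ) > $threshold ) {
//       $address = $blobStore->storeBlob( $props['textLayer'] );
//       // keep only $address in the image row
//   }
```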

Option 3

The narrowest of all solutions is to just have PdfHandler write to BlobStore directly, and to read from BlobStore in PdfHandler::getPageText(). The page text in the metadata array would be replaced by a blob reference. But note that we don't actually know when MediaHandler::getMetadata() is called whether the upload will be successful. We would be taking unverified temporary files uploaded by users, and putting their text layers into permanent archival storage, with no way to track down unused text short of scanning the whole of ES for orphans. And as I said previously, it lacks separation of concerns.
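
For comparison, the metadata stored under Option 3 would end up looking roughly like this (illustrative only; the key name and the blob address format are assumptions):

```
<?php
// Illustration of Option 3 (not implemented): PdfHandler writes the text layer
// to BlobStore itself and keeps only a blob reference in the metadata array.
$metadata = [
	'pageCount' => 120,
	// instead of the full text layer, a reference such as:
	'text_blob' => 'tt:123456', // hypothetical address returned by BlobStore::storeBlob()
];
// PdfHandler::getPageText() would then have to resolve that reference back
// through BlobStore, and nothing cleans the blob up if the upload never completes.
```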

Regarding ApiQueryImageInfo and ForeignAPIRepo:

  • ApiQueryImageInfo already depends on the metadata being JSON-serializable, since it unserializes the return value of File::getMetadata() and then adds it to the structured output.
  • In option 1, there is a backwards-compatible concept of a metadata array, so the API would work as before. Some metadata formats would change.
  • In option 2, the text layer would become a separate iiprop, and clients would have to opt in to receive it. If desired, ForeignAPIFile could retrieve the text layer from the remote API on demand in a separate request (see the example request after this list).
  • In option 3, the concept of metadata serialization and transfer is broken; it's hard to see how this would work.
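
As a concrete illustration of the Option 2 opt-in, a client wanting the text layer would have to request it explicitly, along the lines of api.php?action=query&prop=imageinfo&titles=File:Example.pdf&iiprop=metadata|textlayer (the "textlayer" prop name is purely hypothetical here; only the opt-in shape is the point).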

Change 687522 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] filerepo: Store and retrieve file metadata from blob store if it's large

https://gerrit.wikimedia.org/r/687522

I really like the first option and made a patch that does the first step. I don't think this ticket is a duplicate of T32906, but I think it's part of it. Basically, the patch I just made (the first part of option one) addresses this ticket but not the ticket you mentioned; we can slowly move towards that as part of the bigger picture.

Funnily enough, the patch is extremely small, and I think we can merge it and start cleaning up Commons soon. I hope.

Change 687544 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] Add maintenance/rebuildFileMetadata.php

https://gerrit.wikimedia.org/r/687544

Just to note, treating the OCR text layer as metadata is conceptually a bit awkward: it is a separate representation of the file, one that happens to be automatically generated. It's more akin to a MIME email message that contains both a text/html representation and a text/plain representation. Still speaking conceptually, metadata about the text layer would be things like "Does this file have a text layer?", "What format/text encoding does the text layer use?", "Is the text layer for this file in a structured format?", and "What is the size in bytes of the text layer for this file?".

As a contributor on Wikisource (one who spends a lot of time mucking around inside DjVu files and their text layers), I have always thought of the text layers as a separate slot of the same page, and as an obvious candidate for something like Multi-Content Revisions for storage.

On Wikisource, the text layer embedded in the DjVu/PDF file is a starting point to save time transcribing from scratch; but once used, it is replaced with a wikitext representation of that page's text. Before the text layer is used, it is sometimes useful to update it (because the original was broken, or because it is old and we have better OCR engines now, or...), which is currently done by regenerating the file manually and reuploading, or sometimes by generating OCR on the fly and discarding the text from the file. After the wikipage has been created, there are various use cases for either writing back the corrected text or generating new representations from it (for example, one could imagine writing back wikitext to that slot so the work Wikisource does can be reused by other projects by way of the Commons integration, or exposing it to external consumers like web search engines, with full structured metadata through SDC).

And going that route would also make the solution easier to generalise, such that JPEG/PNG/TIFF/etc. images of text (of which we have a boatload) could have a text layer, created after upload either ad hoc or in a batch. This would also lay important groundwork (necessary but not sufficient) for being able to use scan images directly as the source data for Wikisource, instead of having to go by way of a PDF or DjVu container (usually generated from the scan images, with generational loss and deliberate cropping and compression).

There is opportunity here to take a nice solid step forward while solving the pressing "img_metadata is melting the DB" issue, is what I'm saying.

What you're saying is basically T96384#5234876 and should be done, but that's much further down the road. For example, we can't do ALTER TABLE on the image table anymore, meaning more work on that table is blocked until it reaches a good size first. We can revisit once we are there.

Change 687522 abandoned by Ladsgroup:

[mediawiki/core@master] filerepo: Store and retrieve file metadata from blob store if it's large

Reason:

https://gerrit.wikimedia.org/r/687522

Change 687544 abandoned by Ladsgroup:

[mediawiki/core@master] Add maintenance/rebuildFileMetadata.php

Reason:

https://gerrit.wikimedia.org/r/687544

Change 693298 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] [WIP] Use the unserialized form of image metadata internally

https://gerrit.wikimedia.org/r/693298

Change 697935 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] [WIP] Optionally split out parts of file metadata to BlobStore

https://gerrit.wikimedia.org/r/697935

Change 693298 merged by jenkins-bot:

[mediawiki/core@master] Use the unserialized form of image metadata internally

https://gerrit.wikimedia.org/r/693298