Replication broken on db1110
Closed, Resolved (Public)

Description

I have depooled db1110 after its replication broke with the following error:

Last_SQL_Error: Error 'Index for table 'flaggedimages' is corrupt; try to repair it' on query. Default database: 'dewiki'. Query: 'INSERT /* FlaggedRevision::insert  */ IGNORE INTO `flaggedimages`

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui added a subscriber: Kormat.

The errors started at 08:38:55:

Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace dewiki/flaggedimages page [page id: space=80, page number=619577]. You may have to recover from a backup.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
Aug 26 08:38:55 db1110 mysqld[4559]:  len 16384; hex 6943c56f00097439000974380009743a0000002b7a8481b645bf00000000000000000000005000273b9e809b000000003b2f0002009800990000000000000000000000000000000001110000000000000000000000000000000000000000010002001e696e66696d756d0006000
b000073757072656d756d1f0e2100000010006884e25495556e7465726e65686d656e5f416c62696f6e5f31372e31302e313931372e6a706700000001409ed8000001a407b032303039303631353232313834373469393537346a736c716b7930633436647563347663303732666e636e78781f0e2300000018006a84e254955761725f456e73696
76e5f6f665f4765726d616e795f313930332d313931382e73766700000001409ed8000001a407df3230303930353331313934333432316265616b7165387278647a623768723065336474767a7671376e6a7577781f0e0e00000020005584e2549b416b6b61507269736f6e2e6a706700000001409ed8000001a4081032303037303830353135333
331326f7379773165667477613164686466336e6a7830636e676d75697a766462341f0e0a00040028005184e2549b4261676461642e6a706700000001409ed8000001a4082c323030383032303231373537343536393175696a6d3578666a743873323339336c6e30636e36656d7a616575611f0e1800000030005f84e2549b4261686169735f696
e5f41647269616e6f706c652e6a706700000001409ed8000001a408443230303830373237303634303531396e666b307577617361766d6c3578367535766330326f75766c356d72376f1f0e1200000038005984e2549b42616861756c6c61682d626f726e2e6a706700000001409ed8000001a4086a32303038313231323136343632326533626a7
963386a3737773772776f6c6b62633836646471647036306970631f0e1600000040005d84e2549b42616861756c6c61682d70617373706f72742e6a706700000001409ed8000001a4088a32303035303231333136333434373464646b397273347733753832347038306674766e6a6538333931646e316c1f0e0e00040048005584e2549b4261686
1756c6c6168332e6a706700000001409ed8000001a408ae32303035303231333136333034343930706f6e727278787230697269386c6e64366269397935306f6571786f6c1f0e1c00000050006384e2549b42616861756c6c61685f56657262616e6e756e67737765672e706e6700000001409ed8000001a408ca323030383034323831373137313733767264676b766c7363707a6a786b6f706c7a63796f797439326e786e6d661f0e1900000058006084e2549b42616861756c6c61685f66726f6d5f6d696c6c65722e6a706700000001409ed8000001a408f43230303530393031323033393531316265327339327035747a33367068657a786f7475646d787178696e3974391f0e0900000060005084e2549b4261686a692e6a706700000001409ed8000001a4091b323030373038303531353338343363747a73676e6d676b7865777665796b6469376670663777356d786c6a6d791f0e1000040068005784e2549b436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a409323230303730393136303134333239397a683372766a3562737a62656c3933656935366b66636f377136766336751f0e1200000070005984e2549b4561726c795f4261686169732d312e6a706700000001409ed8000001a4095032303037303631353136323231326d7130626b366e34616f7530696a32736861733267317a74783278776d7a361f0e1500000078005c84e2549b4d617a72616968486f757365526f7365732e6a706700000001409ed8000001a409703230303730383035313534353539316379646b336d38747237713668357977716e616e6f776b3131396e3862371f0e2f00000080007684e2549b5072657a696f73695f2d5f5374726164c4835f64696e5f436f6e7374616e74696e6f706f6c2c5f313836382e6a706700000001409ed8000001a4099332303036303932313233303233376a7063366d666565753469706e336d6d78357474786866623478796f7372341f0e1900040088006084e2549b52696476616e2d67617264656e2d626167686461642e6a706700000001409ed8000001a409d0323030373131313431333439343431657a7266757a68657435736531716d6d66663674317a3970346c746c68301f0e1700000090005e84e2549b536872696e652d6f662d42616861756c6c61682e6a706700000001409ed8000001a409f7323030373032323730323034313571707a79326d71357872737a6564617733656c67786a3131736b7a666c70631f0e1400000098005b84e2549b54656872616e2d62616861756c6c61682e6a706700000001409ed8000001a40a1c323030383132313231363438353839727132383162303664686361626533346676617
07a6f7a3377796d7375321f0e12000000a0005984e2549b57696b6971756f74652d6c6f676f2e73766700000001409ed8000001a40a3e323030363131323431373338343866713862676f3669746f3077666e756377683564777a706f3479337431776c1f0e11000400a8005884e2549e427261756e66c3a4756c6532332e4a504700000001409ed8000001a40a5e32303130303931363134333431363973727a69303537343963663465346d38317a3131686a77336a72626c32371f0e10000000b0005784e2549e436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a40a7d3230303730393136303134333239397a683372766a3562737a62656c3933656935366b66636f377136766336751f0e24000000b8006b84e2549e48657465726f6261736964696f6e2e616e6e6f73756d2e2d2e6c696e647365792e6a706700000001409ed8000001a40a9b323030383032313431353431353830616b3667796961366737676538627a72757035347a6364386f726a3937691f0e25000000c0006c84e2549e48657465726f6261736964696f6e2e616e6e6f73756d322e2d2e6c696e647365792e6a706700000001409ed8000001a40acd32303038303231343135343230396169356a6a6d316c6577313638686e6d72396763666b64327231706a796d321f0e0d000400c8005484e2549e526f74666165756c652e6a706700000001409ed8000001a40b003230303730343133303731373439323869366a3438623635706465736a6764786c6c776471756c3439786c6d351f0e11000000d0005884e2549e5775727a656c73636877616d6d2e6a706700000001409ed8000001a40b1b32303037303431333038353935366a7331733070723978717a663870377930636362723173676b7836646231691f0e1c000000d8006384e254a4466169727974616c655f54726173685f5175657374696f6e2e73766700000001409ed8000001a40b3a3230303830333138303234343330663035636a7270696778753671683561396e6533716e73336c7637367277361f0e10000000e0005784e254a452617069645f64656c6574652e73766700000001409ed8000001a40b6432303130303133303139313430313031696665666276716a3767366f3274706c76386465696a38386b783067341f0e10000400e8005784e254a6436f6d6d6f6e732d6c6f676f2e73766700000001409ed8000001a40b8232303037303931363031343332393:

                                                                                                                                                                      p8 7!5 3 2 0 /%- ,A* );' & $ " !d   T   c   R   -             V
Aug 26 08:38:55 db1110 mysqld[4559]:    j   v     c  U z   ;
Aug 26 08:38:55 db1110 mysqld[4559]: InnoDB: End of page dump
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Uncompressed page, stored checksum in field1 1766049135, calculated checksums for field1: crc32 49239442, innodb 1766049135,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 49239442, calculated checksums for field2: crc32 49239442, innodb 49239442, none 3735928559,  page LSN 43 2055504310, low 4 bytes of LSN at page end 2055504310, page number (if stored to page already) 619577, space id (if created with >= MySQL-4.1.1 and stored already) 80
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page may be an index page where index id is 273
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Index 273 is `PRIMARY` in table `dewiki`.`flaggedimages`
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace dewiki/flaggedimages page [page id: space=80, page number=619577]. You may have to recover from a backup.
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
Aug 26 08:38:55 db1110 mysqld[4559]: 2020-08-26  8:38:55 495344457 [Note] InnoDB: Uncompressed page, stored checksum in field1 1766049135, calculated checksums for field1: crc32 49239442, innodb 1766049135,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 49239442, calculated checksums for field2: crc32 49239442, innodb 49239442, none 3735928559,  page LSN 43 2055504310, low 4 bytes of LSN at page end 2055504310, page number (if stored to page already) 619577, space id (if created with >= MySQL-4.1.1 and stored already) 80
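
For anyone cross-checking the dump against the summary line: the first 38 bytes of the hex dump can be decoded by hand. The following is a minimal sketch, assuming the standard 38-byte InnoDB FIL page header layout (checksum, page number, previous page, next page, LSN, page type, flush LSN, space id, all big-endian). The function name `parse_fil_header` and the dictionary keys are illustrative and not part of any MariaDB tool.

```python
import struct

# First 38 bytes of the page dump above, split per header field.
FIL_HEADER_HEX = (
    "6943c56f"          # stored checksum (field1)
    "00097439"          # page number
    "00097438"          # previous page in the index level
    "0009743a"          # next page in the index level
    "0000002b7a8481b6"  # page LSN (64 bit)
    "45bf"              # page type
    "0000000000000000"  # flush LSN
    "00000050"          # space id
)


def parse_fil_header(hex_str):
    """Decode the 38-byte FIL header from a hex string (assumed layout)."""
    raw = bytes.fromhex(hex_str)
    (checksum, page_no, prev_page, next_page,
     lsn, page_type, flush_lsn, space_id) = struct.unpack(">IIIIQHQI", raw[:38])
    return {
        "checksum": checksum,
        "page_no": page_no,
        "prev_page": prev_page,
        "next_page": next_page,
        "lsn_high": lsn >> 32,          # logged as the first number of "page LSN"
        "lsn_low": lsn & 0xFFFFFFFF,    # logged as the second number
        "page_type": page_type,
        "flush_lsn": flush_lsn,
        "space_id": space_id,
    }


header = parse_fil_header(FIL_HEADER_HEX)
for name, value in header.items():
    print(f"{name}: {value}")
```

The decoded values agree with what the server reported: stored checksum 1766049135, page number 619577, page LSN 43 2055504310, page type 17855 and space id 80. So the header itself is readable, and the mismatch comes from the checksum calculated over the page contents.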

These errors don't look related to the labsdb crashes.
Nothing in the hardware logs.

@Kormat can you recover this host from backups, run an apt-get upgrade (so the new version gets installed) and reboot it?

Recovery done, MariaDB upgraded (with mysql_upgrade run both before and after the upgrade), and the host rebooted. It's now up and catching up on replication.

Mentioned in SAL (#wikimedia-operations) [2020-08-26T11:58:51Z] <kormat@cumin1001> dbctl commit (dc=all): 'Start repooling db1110 T261276', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json

Mentioned in SAL (#wikimedia-operations) [2020-08-26T12:21:00Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repooling db1110 @ 20% T261276', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json

Mentioned in SAL (#wikimedia-operations) [2020-08-26T12:47:01Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repooling db1110 @ 30% T261276', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json

Mentioned in SAL (#wikimedia-operations) [2020-08-26T13:07:36Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repooling db1110 @ 50% T261276', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json

Mentioned in SAL (#wikimedia-operations) [2020-08-26T13:17:33Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repooling db1110 @ 75% T261276', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json

Mentioned in SAL (#wikimedia-operations) [2020-08-26T13:37:53Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repooling db1110 @ 100% T261276', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json
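
To summarize the repool ramp, here is a small sketch that computes how long after the first dbctl commit each step happened. The timestamps and messages are copied from the SAL entries above. The helper name `ramp_timeline` is hypothetical, and nothing here calls dbctl itself.

```python
from datetime import datetime

# (timestamp, commit message) pairs copied from the SAL entries above.
SAL_ENTRIES = [
    ("2020-08-26T11:58:51Z", "Start repooling db1110 T261276"),
    ("2020-08-26T12:21:00Z", "Repooling db1110 @ 20% T261276"),
    ("2020-08-26T12:47:01Z", "Repooling db1110 @ 30% T261276"),
    ("2020-08-26T13:07:36Z", "Repooling db1110 @ 50% T261276"),
    ("2020-08-26T13:17:33Z", "Repooling db1110 @ 75% T261276"),
    ("2020-08-26T13:37:53Z", "Repooling db1110 @ 100% T261276"),
]


def ramp_timeline(entries):
    """Return (message, seconds elapsed since the first entry) per entry."""
    times = [datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") for ts, _ in entries]
    start = times[0]
    return [
        (msg, int((t - start).total_seconds()))
        for t, (_, msg) in zip(times, entries)
    ]


timeline = ramp_timeline(SAL_ENTRIES)
for msg, elapsed in timeline:
    print(f"+{elapsed // 60:3d} min  {msg}")
```

By these timestamps, the ramp from the first commit to 100% took about 99 minutes.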

Host is fully repooled, icinga is all green.