deployment-db05 needs replacing following disk corruption
Closed, ResolvedPublic

Description

deployment-db05, the current Beta cluster database master, has disk issues:

[ 2886.337845] EXT4-fs error (device vda3): ext4_validate_block_bitmap:384: comm kworker/u16:0: bg 1: bad block bitmap checksum
[12341.153660] EXT4-fs error (device vda3): ext4_validate_block_bitmap:384: comm kworker/u16:2: bg 113: bad block bitmap checksum

Event Timeline

Majavah triaged this task as Unbreak Now! priority.Mar 9 2021, 5:50 PM
Majavah created this task.

Mentioned in SAL (#wikimedia-operations) [2021-03-09T18:02:39Z] <marxarelli> deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:04:01Z] <marxarelli> deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:09:07Z] <Majavah> set deployment-db05 to read-only to avoid issues with T276968

db06 slave status looks worrying:

root@127.0.01[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: deployment-db05
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: deployment-db05-bin.000062
          Read_Master_Log_Pos: 527654541
               Relay_Log_File: deployment-db06-relay-bin.000180
                Relay_Log_Pos: 40571
        Relay_Master_Log_File: deployment-db05-bin.000062
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 527654541
              Relay_Log_Space: 527655445
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'deployment-db05-bin.000062' at 527654541, the last event read from 'deployment-db05-bin.000062' at 4, the last byte read from 'deployment-db05-bin.000062' at 4.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2886731178
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:20:53Z] <marxarelli> disabled puppet on deployment-db06 and started mysqldump (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:21:03Z] <Majavah> create deployment-db07 as g2.cores8.ram16.disk160 Buster T276968

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:25:27Z] <marxarelli> "View 'labswiki.tag_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them" when using LOCK TABLES" during mysqldump on db06 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:38:21Z] <Majavah> installing mariadb 10.4 via role::mariadb::beta to db07 T276968

Mentioned in SAL (#wikimedia-releng) [2021-03-09T18:49:16Z] <marxarelli> restarting db dump on db06 mysqldump -h 127.0.0.1 --events --routines --triggers --all-databases -f --single-transaction (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T19:50:26Z] <marxarelli> restoring database dump on deployment-db07 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T19:56:39Z] <marxarelli> deleting deployment-db05 to free up quota for new replica (T276968)

Change 670273 had a related patch set uploaded (by Majavah; owner: Majavah):
[operations/mediawiki-config@master] betacluster: replace db05 with db07

https://gerrit.wikimedia.org/r/670273

Mentioned in SAL (#wikimedia-releng) [2021-03-09T19:59:55Z] <marxarelli> creating new instance deployment-db08 to use as new beta replica db (T276968)

Change 670277 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] beta: Switch beta to read only on mediawiki level

https://gerrit.wikimedia.org/r/670277

Change 670277 merged by jenkins-bot:
[operations/mediawiki-config@master] beta: Switch beta to read only on mediawiki level

https://gerrit.wikimedia.org/r/670277

Mentioned in SAL (#wikimedia-releng) [2021-03-09T20:33:50Z] <Majavah> install mariadb on deployment-db08 T276968

Mentioned in SAL (#wikimedia-releng) [2021-03-09T20:39:42Z] <marxarelli> doing --skip-grant-tables on deployment-db08 and creating a new root@127.0.0.1 user (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T20:53:32Z] <marxarelli> restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-09T21:54:40Z] <marxarelli> restoring from db06 dump on db07 and db08 following DROP VIEW IF EXISTS user workaround (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-10T00:10:15Z] <marxarelli> restore of db06 failed yet again. trying mariabackup db06 -> db07 instead of mysqldump (after fixing docs/usage of the former) (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-10T00:28:26Z] <marxarelli> mariadb successfully started on db07 following transfer/extraction using mariabackup and following mysql_upgrade (T276968)

dduvall renamed this task from deployment-db05 disk issues to deployment-db05 needs replacing following disk corruption.Mar 10 2021, 12:54 AM

The current status of this is:

  • New Buster based instances deployment-db07 and deployment-db08 were launched with role::mariadb::beta to serve as replacements (the idea being to replace both db05 and db06 and upgrade to Buster in the process of replacement)
  • Data has been copied using mariabackup --innobackupex --stream=xbstream /srv/sqldata --host=127.0.0.1 --slave-info from deployment-db06 to both the new instances.
  • Following data transfer, innodb log files were applied using mariabackup --innobackupex --apply-log --use-memory=10G /srv/sqldata
  • MariaDB was started and schema upgrade (for upgrading 10.1 to 10.4) was performed using /opt/wmf-mariadb104/bin/mysql_upgrade --host=127.0.0.1
  • Following restart, systemctl status mariadb reports active; however, the logs report InnoDB corruption:
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Page [page id: space=0, page number=2424] log sequence number 590532326359 is in the future! Current system log sequence number 589662283134.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Page [page id: space=0, page number=2425] log sequence number 590532326359 is in the future! Current system log sequence number 589662283134.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Page [page id: space=0, page number=2426] log sequence number 590532597538 is in the future! Current system log sequence number 589662283134.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Page [page id: space=0, page number=2428] log sequence number 590532729103 is in the future! Current system log sequence number 589662283134.
Mar 10 00:48:38 deployment-db07 mysqld[30569]: 2021-03-10  0:48:38 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
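To get a feel for how far ahead the affected pages are, the "is in the future" lines can be summarised with a small script. This is a sketch only: the regular expression is written against the log format shown above, and the function name is hypothetical.

```python
import re

# Matches the page number, the page's LSN and the current system LSN
# in lines like the ones logged on deployment-db07 above.
LSN_PATTERN = re.compile(
    r"page number=(\d+)\] log sequence number (\d+) is in the future! "
    r"Current system log sequence number (\d+)"
)


def future_lsn_pages(log_text):
    """Return (page_number, lsn_gap) for each 'is in the future' log line."""
    results = []
    for match in LSN_PATTERN.finditer(log_text):
        page, page_lsn, system_lsn = (int(group) for group in match.groups())
        results.append((page, page_lsn - system_lsn))
    return results


# Two of the lines from the log above, shortened to the relevant part.
SAMPLE_LOG = """\
[ERROR] InnoDB: Page [page id: space=0, page number=2424] log sequence number 590532326359 is in the future! Current system log sequence number 589662283134.
[ERROR] InnoDB: Page [page id: space=0, page number=2426] log sequence number 590532597538 is in the future! Current system log sequence number 589662283134.
"""

if __name__ == "__main__":
    for page, gap in future_lsn_pages(SAMPLE_LOG):
        print("page %d is %d ahead of the system LSN" % (page, gap))
```

For page 2424 the gap is 870043225. A gap that large suggests the data files are considerably newer than the redo log that was applied, which is consistent with the "copied the InnoDB tablespace but not the InnoDB log files" hint in the error message.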

I'm not totally sure how to proceed from here. Should I abandon the attempt to upgrade to buster and mariadb 10.4? Did I perform the backup incorrectly? DBA could use your help please.

greg added a subscriber: greg.

Adding in the DBA tag explicitly so it's seen...

Those errors aren't a good sign, unfortunately; it looks like InnoDB is badly corrupted.

Do you still have a logical dump (done via mysqldump) available?

If you don't, another possibility would be to set innodb_force_recovery = 1 in my.cnf before starting that server, take a mysqldump of all the tables (avoiding the mysql system tables), and then load that dump into a new host.

Regarding the following comment:

Mentioned in SAL (#wikimedia-releng) [2021-03-09T20:53:32Z] <marxarelli> restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127 (T276968)

This is unfortunately a known issue that also affects roles and other system grants. There's a fix implemented, but as far as I know not for 10.1, only for 10.2 and newer: https://jira.mariadb.org/browse/MDEV-23630
So we should either hack the mysqldump file to exclude those, or not include them in the first place (do not specify --all-databases), and then get the grants via pt-show-grants, or simply by querying the mysql.user table, saving the result to a text file, and then manually importing it on the new host.
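A rough sketch of the "do not specify --all-databases" variant: build the mysqldump argument list from an explicit list of user databases, reusing the flags from the earlier dump command. The function name, the set of schemas to skip, and the database name examplewiki are assumptions made for illustration; grants would still have to be exported separately, for example with pt-show-grants as suggested above.

```python
# Schemas left out of the dump (assumed list; adjust to the server in question).
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def mysqldump_command(all_databases, host="127.0.0.1"):
    """Build a mysqldump argument list that names only user databases."""
    user_databases = sorted(db for db in all_databases if db not in SYSTEM_SCHEMAS)
    if not user_databases:
        raise ValueError("no user databases to dump")
    return [
        "mysqldump", "-h", host,
        "--events", "--routines", "--triggers", "--single-transaction",
        "--databases", *user_databases,
    ]


if __name__ == "__main__":
    # "labswiki" appears earlier in this task; "examplewiki" is hypothetical.
    databases = ["mysql", "labswiki", "information_schema", "examplewiki"]
    print(" ".join(mysqldump_command(databases)))
```

The list of databases could come from SHOW DATABASES on db06. Returning a list rather than a single string keeps the command safe to pass to subprocess.run without shell quoting.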

If the db06 slave is still up, why not take a mysqldump from that one?

The mysqldump result is located (at least) on deployment-db07:/srv/backup. db06 is still up, yes.

Another option would be to:

  • Assume db06 would be the new master. Switch mysql on that host off, copy its datadir to another new host and connect that new host to db06 as a slave.
  • Assume that maybe db06 might not have the exact same data as db05 if everything wasn't entirely replicated at the moment of the crash (depending on the innodb options on the master)

FYI: The root cause for the corruption could be a force-reboot force-migration that I had to perform on this host while operating the underlying hypervisor for T275753.

  • Assume that maybe db06 might not have the exact same data as db05 if everything wasn't entirely replicated at the moment of the crash (depending on the innodb options on the master)

It's the beta cluster, I don't think we'd care about data loss as long as the database ends up in a consistent state.

In that case, that'd be my approach (would both hosts have the same version?).

  • Stop mysql on the slave
  • Copy the datadir to a new host
  • Start mysql on both hosts
  • Connect the slave to db06

would both hosts have the same version

How important is this? The old hosts db05 (now-deleted with disk corruption) and db06 have MariaDB 10.1 on Stretch, new nodes were created on Buster / MariaDB 10.4.

That should be fine, as long as the master has the older version.
So 10.1 should be the master and 10.4 the slave.

Once the data is copied to the new host (10.4), mysql_upgrade needs to be executed as soon as the mysql daemon is up.

Mentioned in SAL (#wikimedia-releng) [2021-03-10T14:52:52Z] <Majavah> delete deployment-db08 /srv/sqldata to attempt procedure in https://phabricator.wikimedia.org/T276968#6900199

Mentioned in SAL (#wikimedia-releng) [2021-03-10T15:22:51Z] <Urbanecm> rsync deployment-db06:/srv/sqldata to deployment-db08:/srv/sqldata in a tmux session on deploymdeployment-db08 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-10T15:54:17Z] <Urbanecm> Start mariadb on db08 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-10T15:54:30Z] <Urbanecm> Start root@deployment-db08:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1 (T276968)

Mentioned in SAL (#wikimedia-releng) [2021-03-10T15:57:58Z] <Majavah> set deployment-db06 as readonly from mysql side T276968

Mentioned in SAL (#wikimedia-releng) [2021-03-10T16:06:29Z] <Urbanecm> start root@deployment-db07:/srv/sqldata.db06# rsync --progress -r deployment-db06:/srv/sqldata/ . (T276968)

FYI: The root cause for the corruption could be a force-reboot force-migration that I had to perform on this host while operating the underlying hypervisor for T275753.

I suspect this too. The timing of the migration and the errors was quite close.

Mentioned in SAL (#wikimedia-releng) [2021-03-10T16:12:31Z] <Majavah> deployment-db08 CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='redacted', MASTER_PORT=3306, MASTER_HOST='deployment-db06.deployment-prep.eqiad1.wikimedia.cloud', MASTER_LOG_FILE='deployment-db06-bin.000059', MASTER_LOG_POS=522469730; (T276968)

Thanks for the help, everyone. I would still like to get off of db06 if possible at the end of this process since we have to finish the buster upgrade at some point anyhow. If we can get both db07 and db08 to reach the same point in the binlog from db06, can we simply:

  • STOP SLAVE on db07 and db08
  • FLUSH TABLES WITH READ LOCK on db07. double check SHOW SLAVE STATUS again to verify same position as db08
  • CHANGE MASTER on db08 to replicate from db07
  • START SLAVE on db08
  • configure mediawiki-config to use db07 as master (read load 0) and db08 as replica

Does that sound right?

Forgot the UNLOCK TABLES on db07 :)
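The proposed switchover, including the UNLOCK TABLES step, could be written down as an ordered plan before running anything. This is only a sketch: the function name is hypothetical, the SHOW MASTER STATUS step is an added assumption (the new replica needs binlog coordinates from the new master, which requires binary logging to be enabled there), and replication credentials and port are left out of the CHANGE MASTER statement.

```python
def switchover_plan(new_master, new_replica, master_log_file, master_log_pos):
    """Return ordered (host, SQL) steps for repointing new_replica at new_master."""
    change_master = (
        "CHANGE MASTER TO MASTER_HOST='{host}', "
        "MASTER_LOG_FILE='{file}', MASTER_LOG_POS={pos};"
    ).format(host=new_master, file=master_log_file, pos=int(master_log_pos))
    return [
        (new_master, "STOP SLAVE;"),
        (new_replica, "STOP SLAVE;"),
        (new_master, "FLUSH TABLES WITH READ LOCK;"),
        # Assumed extra step: read the coordinates used in CHANGE MASTER below.
        (new_master, "SHOW MASTER STATUS;"),
        (new_replica, change_master),
        (new_replica, "START SLAVE;"),
        (new_master, "UNLOCK TABLES;"),
    ]


if __name__ == "__main__":
    # The binlog file name and position are placeholders, not real coordinates.
    plan = switchover_plan(
        "deployment-db07", "deployment-db08", "deployment-db07-bin.000001", 4
    )
    for host, statement in plan:
        print("%s: %s" % (host, statement))
```

One thing the plan makes visible: FLUSH TABLES WITH READ LOCK only holds while the session that issued it stays open, so the lock and the later UNLOCK TABLES on the new master need to happen in the same connection.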

@dduvall if possible I would also set read_only=ON on the current master (I guess db06) to be fully sure no writes are happening.
If no writes are happening the output of show master status\G on the master and the output for show slave status\G on both slaves should be the same. If that is the case, then you know that they are all stopped at the same point.
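The comparison described here can be sketched as a small check over the parsed outputs. The function name and the dict layout are assumptions; the keys are the field names printed by show master status\G and show slave status\G, and the coordinates in the example are illustrative only.

```python
def lagging_replicas(master_status, replica_statuses):
    """Return replicas whose executed position differs from the master's.

    master_status: dict with "File" and "Position" (from SHOW MASTER STATUS).
    replica_statuses: mapping of replica name to a dict with
    "Relay_Master_Log_File" and "Exec_Master_Log_Pos" (from SHOW SLAVE STATUS).
    """
    target = (master_status["File"], int(master_status["Position"]))
    lagging = {}
    for name, status in replica_statuses.items():
        executed = (
            status["Relay_Master_Log_File"],
            int(status["Exec_Master_Log_Pos"]),
        )
        if executed != target:
            lagging[name] = executed
    return lagging


if __name__ == "__main__":
    # Illustrative values; real ones come from the servers at switchover time.
    master = {"File": "deployment-db06-bin.000059", "Position": "522469730"}
    replicas = {
        "deployment-db07": {
            "Relay_Master_Log_File": "deployment-db06-bin.000059",
            "Exec_Master_Log_Pos": "522469730",
        },
        "deployment-db08": {
            "Relay_Master_Log_File": "deployment-db06-bin.000059",
            "Exec_Master_Log_Pos": "522469000",
        },
    }
    print(lagging_replicas(master, replicas))
```

An empty result means every replica has executed up to the master's current binlog position, which only stays true while read_only=ON keeps new writes off the master.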

Mentioned in SAL (#wikimedia-releng) [2021-03-10T16:45:53Z] <Urbanecm> root@deployment-db07:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1 # T276968

From @Marostegui in IRC: "To be honest, I would do it in different steps, set db06, make sure all is fine and the slave replicates just fine. And once that is fully ok, then proceed with marxarelli's plan"

Let's do this, i.e.

  • Following successful setup of db07 and db08 as replicas of db06
  • Set db06 to read/write
  • Change mw-config to use db06 as master, with db07 and db08 as replicas
  • Once we verify beta is working again, schedule a time later in the day or week to switch to db07 master, db08 replica

Mentioned in SAL (#wikimedia-releng) [2021-03-10T16:49:37Z] <Majavah> add deployment-db07 as a replica of db06 for T276968

Mentioned in SAL (#wikimedia-releng) [2021-03-10T16:50:02Z] <Majavah> reset slave; on new master deployment-db06 T276968

Change 670273 merged by jenkins-bot:
[operations/mediawiki-config@master] betacluster: add db[07-08], promote db06, remove db05

https://gerrit.wikimedia.org/r/670273

Mentioned in SAL (#wikimedia-releng) [2021-03-10T17:03:14Z] <Majavah> make deployment-db06 read-write T276968

Change 670364 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "beta: Switch beta to read only on mediawiki level"

https://gerrit.wikimedia.org/r/670364

Change 670364 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "beta: Switch beta to read only on mediawiki level"

https://gerrit.wikimedia.org/r/670364

Mentioned in SAL (#wikimedia-releng) [2021-03-10T17:09:46Z] <Majavah> set beta cluster mediawiki as read write on mw config (T276968)

Urbanecm removed dduvall as the assignee of this task.

This was done. Thanks everyone, especially @Majavah, who de facto led this change!

Mentioned in SAL (#wikimedia-releng) [2021-03-11T13:41:06Z] <Majavah> stop slave on deployment-db06 T276968