Page MenuHomePhabricator

Recreate replica of matomo database on db1108
Closed, ResolvedPublic

Description

The replica of the matomo database is out of sync and has been for a little over a month.

image.png (82×1 px, 35 KB)

icinga

Incident details

When db1108 was rebooted on February 6th to pick up an updated kernel, (T304492#8589186) the SQL Slave threads for the matomo database were inadvertently left in a stopped state. The alerts were acknowledged, but the threads were not started within the window of time during which the binlogs were available on matomo1002 for the slave to catch up to the position of the master.

Impact

Since db1108 is the host from which backups of the matomo database are taken, the impact of this incident is that we do not currently have a good backup of the matomo database and the replica will need to be recreated.

Action required

We are looking to replace db1108 with new hardware running on bullseye, as mentioned in T304492#8696860
However, we should bring this replica of the matomo database back online and ensure that we have a good backup of matomo before we do that.

The process to follow will be roughly that described here:
https://wikitech.wikimedia.org/wiki/Setting_up_a_MySQL_replica

Related Objects

Event Timeline

Tagging DBA for awareness, but I'm happy to carry out the work myself.

Configured 6 hours of downtime for the matomo database checks on db1108 and the check for the correct number of mysqld processes.

image.png (197×600 px, 24 KB)

Size of /var/log/mysql on matomo1002 is 2.5 GB

Size of the InnoDB tablespace in /var/lib/mysql/piwik on matomo1002 is 1.4 GB

Size of /srv/sqldata.matomo on db1108 is 7.0 GB

I will use the innobackupex method outlined here: https://wikitech.wikimedia.org/wiki/Setting_up_a_MySQL_replica#Transferring_Data

Disabling puppet

btullis@db1108:/srv$ sudo disable-puppet btullis-T332128

Stopping the database service

btullis@db1108:/srv$ sudo systemctl stop mariadb@matomo.service

Backing up the existing replica

btullis@db1108:/srv$ sudo mv sqldata.matomo/ sqldata.matomo.bak

Creating a new directory for the replica.

btullis@db1108:/srv$ sudo mkdir sqldata.matomo
btullis@db1108:/srv$ sudo chown mysql.mysql sqldata.matomo

I suspect that I may need to use Transfer.py to copy the backup. The docs on https://wikitech.wikimedia.org/wiki/Setting_up_a_MySQL_replica#Transferring_Data indicate that we should just be able to use nc on a port, but I think it will be closed.

Let's keep coordinating on IRC like we've been doing an hour ago.
There's probably no need to use xtrabackup for those sizes.

Marostegui claimed this task.

What I am going to do is basically use mysqldump from the master and place it on db1108. The database is tiny, so this is more straightforward.
To do so:

root@db1108:/srv# rmdir sqldata.matomo
root@db1108:/srv# cp -r sqldata.matomo.bak sqldata.matomo
root@db1108:/srv#
root@db1108:/srv# chown -R mysql. sqldata.matomo
root@db1108:/srv# systemctl start mariadb@matomo
root@db1108:/srv# mysql -S /run/mysqld/mysqld.matomo.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.4.22-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:root@localhost [(none)]> set session sql_log_bin=0;
Query OK, 0 rows affected (0.000 sec)

mysql:root@localhost [(none)]> drop database piwik;
Query OK, 287 rows affected (0.306 sec)

Now on the master let's get a backup:

root@matomo1002:~# mysqldump --single-transaction -e --master-data=2 piwik > piwik.sql
root@matomo1002:~# ls -lh piwik.sql
-rw-r--r-- 1 root root 498M Mar 15 11:41 piwik.sql
root@matomo1002:~# cat piwik.sql  | grep -w CHANGE
-- CHANGE MASTER TO MASTER_LOG_FILE='matomo1002-bin.001472', MASTER_LOG_POS=36173193;

Now let's transfer that to db1108 and import it.

root@db1108:/srv# mysql -S /run/mysqld/mysqld.matomo.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 44
Server version: 10.4.22-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:root@localhost [(none)]> create database piwik;
Query OK, 1 row affected (0.001 sec)

mysql:root@localhost [(none)]>


root@db1108:~# ls -lh piwik.sql
-rw-r--r-- 1 root root 498M Mar 15 11:44 piwik.sql
root@db1108:~# cat piwik.sql | mysql -S /run/mysqld/mysqld.matomo.sock piwik
root@db1108:~#
root@db1108:~# mysql -S /run/mysqld/mysqld.matomo.sock piwik
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 75
Server version: 10.4.22-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:root@localhost [piwik]> stop slave;
Query OK, 0 rows affected, 1 warning (0.000 sec)

mysql:root@localhost [piwik]> reset slave all;
Query OK, 0 rows affected (0.002 sec)

mysql:root@localhost [piwik]> change master to master_host='matomo1002.eqiad.wmnet', master_user='repl', master_password='XX', master_log_pos=36173193, master_log_file='matomo1002-bin.001472';
Query OK, 0 rows affected (0.004 sec)

mysql:root@localhost [piwik]> start slave;
Query OK, 0 rows affected (0.001 sec)

mysql:root@localhost [piwik]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: matomo1002.eqiad.wmnet
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: matomo1002-bin.001472
           Read_Master_Log_Pos: 36382450
                Relay_Log_File: matomo-relay-bin.000002
                 Relay_Log_Pos: 209817
         Relay_Master_Log_File: matomo1002-bin.001472
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table: superset\_staging.%
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 36382450
               Relay_Log_Space: 210127
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: Yes
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 171974793
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: No
                   Gtid_IO_Pos:
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 190
1 row in set (0.000 sec)

mysql:root@localhost [piwik]> STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=Slave_pos; START SLAVE;
Query OK, 0 rows affected (0.001 sec)

Query OK, 0 rows affected (0.002 sec)

Query OK, 0 rows affected (0.002 sec)

mysql:root@localhost [piwik]>

Replication is flowing now.

Nice work. Thanks, you're right, that's simpler than using xtrabackup.