It seems that the replication lag between the tooldb master and the replica is too high.
Looking at the replica, the slave process is running:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: tools-db-1.tools.eqiad1.wikimedia.cloud
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: log.014138
Read_Master_Log_Pos: 42579366
Relay_Log_File: tools-db-2-relay-bin.021825
Relay_Log_Pos: 16572178
Relay_Master_Log_File: log.014015
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
But the master says that it has already sent everything:
root@tools-db-1:~# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 221909594
Server version: 10.4.28-MariaDB-log MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW PROCESSLIST;
+-----------+-----------------+--------------------+---------------------------------+-------------+---------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----------+-----------------+--------------------+---------------------------------+-------------+---------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
...
| 84920738 | repl | 172.16.4.121:34792 | NULL | Binlog Dump | 3378008 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0.000 |
...
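For a rough sense of how far behind the SQL thread is, the two binlog file names in the replica status above can be compared: the I/O thread has read up to log.014138 (Master_Log_File), while the SQL thread is still executing events from log.014015 (Relay_Master_Log_File). A minimal sketch, assuming binlog names end in a numeric sequence as they do here; binlog_file_gap is a hypothetical helper name, not an existing tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `SHOW SLAVE STATUS\G` output on stdin and prints
# how many binlog files the SQL thread is behind the I/O thread.
binlog_file_gap() {
  awk -F': *' '
    # Field names are indented in \G output, so strip leading whitespace.
    { sub(/^[ \t]+/, "", $1) }
    # Binlog file the I/O thread is currently reading from the master.
    $1 == "Master_Log_File"       { n = split($2, a, /\./); io = a[n] + 0 }
    # Master binlog file holding the last event applied by the SQL thread.
    $1 == "Relay_Master_Log_File" { n = split($2, b, /\./); sql = b[n] + 0 }
    END { print io - sql }
  '
}

# Usage on the replica (assumes the local mysql client can log in as root):
#   mysql -e 'SHOW SLAVE STATUS\G' | binlog_file_gap
```

With the values above this gives 14138 - 14015 = 123 binlog files fetched but not yet applied.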
I tried to start the slave (START SLAVE;), but that did nothing, so I restarted the mariadb process on the replica, which seemed to re-bootstrap the slave:
root@tools-db-2:~# systemctl restart mariadb
root@tools-db-2:~# journalctl -u mariadb -n 10000 -f
...
Jun 02 13:15:56 tools-db-2 mysqld[411785]: 2023-06-02 13:15:56 11 [Note] Slave I/O thread: connected to master 'repl@tools-db-1.tools.eqiad1.wikimedia.cloud:3306',replication starts at GTID position '0-2886731673-33522724637,2886731673-2886731673-4887243158,2886731301-2886731301-759137700'
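The GTID position in that log line is a comma-separated list with one entry per replication domain, each in MariaDB's domain_id-server_id-sequence format. A small sketch for splitting it up when comparing positions by eye; split_gtid_pos is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: takes a MariaDB GTID position list as its first
# argument and prints one line per GTID with its three components labelled.
split_gtid_pos() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | awk -F'-' 'NF == 3 { print "domain=" $1, "server=" $2, "seq=" $3 }'
}

# Usage, with the position taken from the journal line above:
#   split_gtid_pos '0-2886731673-33522724637,2886731673-2886731673-4887243158,2886731301-2886731301-759137700'
```

The fields are printed as strings, so the large sequence numbers are not reformatted by awk.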
On the primary side:
MariaDB [(none)]> show processlist;
| 221945597 | repl | 172.16.4.121:34794 | NULL | Binlog Dump | 373 | Writing to net | NULL | 0.000 |
Let's see where that goes.
