This happened twice in the last few weeks:
Oct 28 11:31:41 tools-db-2 mysqld[648831]: 2023-10-28 11:31:41 11 [ERROR] Read invalid event from master: 'Found invalid event in binary log', master could be corrupt but a more likely cause of this is a bug
Oct 28 11:31:41 tools-db-2 mysqld[648831]: 2023-10-28 11:31:41 11 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
Oct 28 11:31:41 tools-db-2 mysqld[648831]: 2023-10-28 11:31:41 11 [Note] Slave I/O thread exiting, read up to log 'log.043518', position 4; GTID position 0-2886731673-33522724637,2886731673-2886731673-4887243158,2886731301-2886731301-2985635060
Oct 28 11:31:41 tools-db-2 mysqld[648831]: 2023-10-28 11:31:41 11 [Note] master was tools-db-1.tools.eqiad1.wikimedia.cloud:3306

Nov 16 09:44:20 tools-db-2 mysqld[832013]: 2023-11-16 9:44:20 11 [ERROR] Read invalid event from master: 'Found invalid event in binary log', master could be corrupt but a more likely cause of this is a bug
Nov 16 09:44:20 tools-db-2 mysqld[832013]: 2023-11-16 9:44:20 11 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Internal MariaDB error code: 1595
Nov 16 09:44:20 tools-db-2 mysqld[832013]: 2023-11-16 9:44:20 11 [Note] Slave I/O thread exiting, read up to log 'log.046905', position 4; GTID position 0-2886731673-33522724637,2886731673-2886731673-4887243158,2886731301-2886731301-3282445549
Nov 16 09:44:20 tools-db-2 mysqld[832013]: 2023-11-16 9:44:20 11 [Note] master was tools-db-1.tools.eqiad1.wikimedia.cloud:3306
The first time, I thought it was linked to T349695: [toolsdb] MariaDB process is killed by OOM killer (October 2023), but that may not be the case: the timestamps do not coincide with an OOM crash of the primary.
In both cases, running START SLAVE; was enough to resume replication.
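To compare the two incidents, the "Slave I/O thread exiting" notes can be parsed into their binlog file, offset, and GTID position. The sketch below does this using only the log lines quoted above. The function name `parse_exit_note` is hypothetical, and the regex assumes the exact wording of these notes. MariaDB GTIDs have the form `domain_id-server_id-sequence_no`. Both notes report `position 4`, which is the offset of the first event in a binary log, right after the 4-byte magic header. This suggests, though it does not prove, that both failures happened just after the primary rotated to a new binlog file.

```python
import re

# Matches the tail of a MariaDB "Slave I/O thread exiting" note, e.g.
# "... read up to log 'log.043518', position 4; GTID position 0-1-2,3-4-5"
EXIT_RE = re.compile(
    r"read up to log '(?P<log>[^']+)', position (?P<pos>\d+); "
    r"GTID position (?P<gtid>[\d,-]+)"
)


def parse_exit_note(line):
    """Hypothetical helper: return (binlog file, offset, {domain: (server, seq)})."""
    m = EXIT_RE.search(line)
    if m is None:
        raise ValueError("not a 'Slave I/O thread exiting' note")
    gtids = {}
    for gtid in m.group("gtid").split(","):
        # Each GTID is domain_id-server_id-sequence_no
        domain, server, seq = (int(x) for x in gtid.split("-"))
        gtids[domain] = (server, seq)
    return m.group("log"), int(m.group("pos")), gtids


# GTID/offset parts of the two notes quoted in this task
oct28 = ("read up to log 'log.043518', position 4; GTID position "
         "0-2886731673-33522724637,2886731673-2886731673-4887243158,"
         "2886731301-2886731301-2985635060")
nov16 = ("read up to log 'log.046905', position 4; GTID position "
         "0-2886731673-33522724637,2886731673-2886731673-4887243158,"
         "2886731301-2886731301-3282445549")

log1, pos1, g1 = parse_exit_note(oct28)
log2, pos2, g2 = parse_exit_note(nov16)

# Print how far each replication domain advanced between the two failures
for domain in sorted(g1):
    print(domain, g2[domain][1] - g1[domain][1])
```

With these inputs, only domain 2886731301 advances between the two failures, by 296810489 transactions. Domains 0 and 2886731673 are unchanged.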