Page MenuHomePhabricator

db1118 mysql process crashed (mysql 8.0 test host)
Closed, DeclinedPublic

Description

Not a huge issue, but important to notice (MySQL 8.0.12):

1Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.083670Z 41 [System] [MY-010558] [Server] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <59f31a9e-4a10-11e8-86b4-801844eb9118> : Error
2Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.083859Z 41 [Warning] [MY-010897] [Server] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
3Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.133869Z 41 [Warning] [MY-010584] [Repl] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.5.5-10.1.34-MariaDB), even though it is on the slave (version: 8.0.12), Error_code: MY-001193
4Sep 17 08:48:16 db1118 systemd[1]: mysql.service: Main process exited, code=killed, status=11/SEGV
5Sep 17 08:48:16 db1118 systemd[1]: mysql.service: Unit entered failed state.
6Sep 17 08:48:16 db1118 systemd[1]: mysql.service: Failed with result 'signal'.
7Sep 17 08:48:21 db1118 systemd[1]: mysql.service: Service hold-off time over, scheduling restart.
8Sep 17 08:48:21 db1118 systemd[1]: Stopped mysql database server.
9Sep 17 08:48:21 db1118 systemd[1]: Starting mysql database server...
10Sep 17 08:48:21 db1118 mysqld[18383]: 2018-09-17T08:48:21.882284Z 0 [System] [MY-010116] [Server] /opt/wmf-mysql80/bin/mysqld (mysqld 8.0.12) starting as process 18383
11Sep 17 08:49:10 db1118 mysqld[18383]: 2018-09-17T08:49:10.092558Z 0 [ERROR] [MY-011071] [Server] Function 'auth_socket' already exists
12Sep 17 08:49:10 db1118 mysqld[18383]: 2018-09-17T08:49:10.092615Z 0 [Warning] [MY-010736] [Server] Couldn't load plugin named 'auth_socket' with soname 'auth_socket.so'.
13Sep 17 08:49:14 db1118 mysqld[18383]: 2018-09-17T08:49:14.149320Z 0 [System] [MY-010229] [Server] Starting crash recovery...
14Sep 17 08:49:14 db1118 mysqld[18383]: 2018-09-17T08:49:14.149394Z 0 [System] [MY-010232] [Server] Crash recovery finished.
15Sep 17 08:49:17 db1118 mysqld[18383]: 2018-09-17T08:49:17.872484Z 0 [Warning] [MY-010068] [Server] CA certificate /etc/ssl/certs/Puppet_Internal_CA.pem is self signed.
16Sep 17 08:49:17 db1118 mysqld[18383]: 2018-09-17T08:49:17.950486Z 0 [Warning] [MY-010604] [Server] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=db1118-relay-bin' to avoid this problem.
17Sep 17 08:49:21 db1118 mysqld[18383]: 2018-09-17T08:49:21.218893Z 0 [System] [MY-010931] [Server] /opt/wmf-mysql80/bin/mysqld: ready for connections. Version: '8.0.12' socket: '/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL).
18Sep 17 08:49:21 db1118 systemd[1]: Started mysql database server.

Also replication issues? It was restarted on the 14th.

It may be a good time to reimage into mariadb 10.3.

Related Objects

Event Timeline

I started an alter table early in the morning on eqiad master for externallinks (130G) and recentchanges, but that went thru at around lunch time, so I don't think it could be the cause of the crash, but maybe put some more pressure on the host or something....or it could just be coincidence.

jcrespo renamed this task from db1118 crashed (mysql 8.0 test host) to db1118 mysql process crashed (mysql 8.0 test host).Sep 18 2018, 9:18 AM

journalctl -u mysql shows errors:

Sep 14 15:13:50 db1118 mysqld[15915]: 2018-09-14T15:13:50.377830Z 41 [Warning] [MY-010897] [Server] Storing MySQL user name or password information in the master info repository
Sep 14 15:13:50 db1118 mysqld[15915]: 2018-09-14T15:13:50.412765Z 41 [System] [MY-010562] [Server] Slave I/O thread for channel '': connected to master 'repl@db1067.eqiad.wmnet:
Sep 14 15:13:50 db1118 mysqld[15915]: 2018-09-14T15:13:50.425074Z 41 [Warning] [MY-010584] [Repl] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A pr
Sep 15 12:47:37 db1118 mysqld[15915]: 2018-09-15T12:47:37.183646Z 41 [ERROR] [MY-010557] [Server] Error reading packet from server for channel '': Lost connection to MySQL serve
Sep 15 12:47:37 db1118 mysqld[15915]: 2018-09-15T12:47:37.183688Z 41 [Warning] [MY-010897] [Server] Storing MySQL user name or password information in the master info repository
Sep 15 12:47:37 db1118 mysqld[15915]: 2018-09-15T12:47:37.215511Z 41 [Warning] [MY-010584] [Repl] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A pr
Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.083670Z 41 [System] [MY-010558] [Server] Slave for channel '': received end packet from server due to dump thread being
Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.083859Z 41 [Warning] [MY-010897] [Server] Storing MySQL user name or password information in the master info repository
Sep 17 08:47:37 db1118 mysqld[15915]: 2018-09-17T08:47:37.133869Z 41 [Warning] [MY-010584] [Repl] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A pr

These errors look like the replication in not compatible:

Sep 18 09:29:37 db1118 mysqld[18383]: 2018-09-18T09:29:37.803219Z 23452 [Warning] [MY-010584] [Repl] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.5.5-10.1.34-MariaDB), even though it is on the slave (version: 8.0.12), Error_code: MY-001193
Sep 18 09:29:37 db1118 mysqld[18383]: 2018-09-18T09:29:37.807484Z 23452 [Warning] [MY-010445] [Server] QUERY(COMMIT or ROLLBACK) or XID_LOG_EVENT is not expected in an event stream outside a transaction.
Sep 18 09:29:37 db1118 mysqld[18383]: 2018-09-18T09:29:37.807525Z 23452 [Warning] [MY-010590] [Server] An unexpected event sequence was detected by the IO thread while queuing the event received from master 'db1067-bin.001584' binary log file, at position 850855330.

3 minutes later it crashed with sig11

Sep 18 09:32:59 db1118 systemd[1]: mysql.service: Main process exited, code=killed, status=11/SEGV

My suggestion for root cause is that mysql works well as long as gtid is not enabled on its master (it was enabled around some days of the crash). Replicating from a non-gtid enabled master worked well for a long time. I have not proof of this, it should be checked.

There is also the fact that I executed an ALTER table on the DC master for this host (db1067) with replication. It was done early that day:

17th Sept
05:28 marostegui: Deploy schema change on s1 eqiad master (db1067) - T67448 T114117 T51191

That alter probably took like 3-4h to finish on the master.

There is also the fact that I executed an ALTER table

It would fit in my theory because it would be (maybe) the first event with GTID that the host received in a long time.

There is also the fact that I executed an ALTER table

It would fit in my theory because it would be (maybe) the first event with GTID that the host received in a long time.

Also the times of the crashes. As maybe it crashed when the first one got replicated (there were 3 different ones).
@Banyek maybe worth checking the binlogs right before the crash as discussed yesterday on IRC

I have needed to re-image this host to work on an important external request, so unfortunately we won't be able to debug this crash any further.
Will close this task for now (as won't fix), at least we have the error logged for the next time we have a host running MySQL 8.0

Sorry for the inconveniences

To be fair, it was clear this was due to enabling gtid on its master, something it is clearly not supported- so no more work was possible.