
db1170 mysql process crashed
Closed, Resolved, Public

Description

Jul 06 06:43:58 db1170 mysqld[2100]: 2021-07-06  6:43:58 36 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db11
Jul 06 06:43:58 db1170 mysqld[2100]: 2021-07-06  6:43:58 36 [ERROR] Slave I/O: error reconnecting to master 'repl@db1122.eqiad.wmnet:3306' - re
Jul 18 11:22:14 db1170 mysqld[2100]: 210718 11:22:14 [ERROR] mysqld got signal 7 ;
Jul 18 11:22:14 db1170 mysqld[2100]: This could be because you hit a bug. It is also possible that this binary
Jul 18 11:22:14 db1170 mysqld[2100]: or one of the libraries it was linked against is corrupt, improperly built,
Jul 18 11:22:14 db1170 mysqld[2100]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jul 18 11:22:14 db1170 mysqld[2100]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jul 18 11:22:14 db1170 mysqld[2100]: We will try our best to scrape up some info that will hopefully help
Jul 18 11:22:14 db1170 mysqld[2100]: diagnose the problem, but since we have already crashed,
Jul 18 11:22:14 db1170 mysqld[2100]: something is definitely wrong and this may fail.
Jul 18 11:22:14 db1170 mysqld[2100]: Server version: 10.4.19-MariaDB-log
Jul 18 11:22:14 db1170 mysqld[2100]: key_buffer_size=1048576
Jul 18 11:22:14 db1170 mysqld[2100]: read_buffer_size=131072
Jul 18 11:22:14 db1170 mysqld[2100]: max_used_connections=70
Jul 18 11:22:14 db1170 mysqld[2100]: max_threads=2010
Jul 18 11:22:14 db1170 mysqld[2100]: thread_count=23
Jul 18 11:22:14 db1170 mysqld[2100]: It is possible that mysqld could use up to
Jul 18 11:22:14 db1170 mysqld[2100]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4622922 K  bytes of memory
Jul 18 11:22:14 db1170 mysqld[2100]: Hope that's ok; if not, decrease some variables in the equation.
Jul 18 11:22:14 db1170 mysqld[2100]: Thread pointer: 0x7eeb5c0014f8
Jul 18 11:22:14 db1170 mysqld[2100]: Attempting backtrace. You can use the following information to find out
Jul 18 11:22:14 db1170 mysqld[2100]: where mysqld died. If you see no messages after this, something went
Jul 18 11:22:14 db1170 mysqld[2100]: terribly wrong...
Jul 18 11:22:14 db1170 mysqld[2100]: stack_bottom = 0x7eed143ab698 thread_stack 0x30000
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x55819bacad9e]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x54d)[0x55819b5bae4d]
Jul 18 11:22:15 db1170 mysqld[2100]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f1c506a2730]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xae4cdc)[0x55819b7ddcdc]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xba4948)[0x55819b89d948]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xba51c1)[0x55819b89e1c1]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xb4310a)[0x55819b83c10a]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xb483e1)[0x55819b8413e1]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xb49e3b)[0x55819b842e3b]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xb2418c)[0x55819b81d18c]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xa6a900)[0x55819b763900]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0xbb)[0x55819b5c704b]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(_Z21mysql_execute_commandP3THD+0x31fc)[0x55819b3ae0dc]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1e9)[0x55819b3b29e9]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x750)[0x55819b6
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0x60defb)[0x55819b306efb]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x16a2)[0x55819b30f9d2]
Jul 18 11:22:15 db1170 mysqld[2100]: /opt/wmf-mariadb104/bin/mysqld(+0xd8289b)[0x55819ba7b89b]
Jul 18 11:22:16 db1170 mysqld[2100]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f1c50697fa3]
Jul 18 11:22:16 db1170 mysqld[2100]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f1c502bc4cf]
Jul 18 11:22:16 db1170 mysqld[2100]: Trying to get some variables.
Jul 18 11:22:16 db1170 mysqld[2100]: Some pointers may be invalid and cause the dump to abort.
Jul 18 11:22:16 db1170 mysqld[2100]: Query (0x7eeb5e01d1bb): UPDATE /* HTMLCacheUpdateJob::invalidateTitles  */  `page` SET page_touched = '202
Jul 18 11:22:16 db1170 mysqld[2100]: Connection ID (thread ID): 37
Jul 18 11:22:16 db1170 mysqld[2100]: Status: NOT_KILLED
Jul 18 11:22:16 db1170 mysqld[2100]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=o
Jul 18 11:22:16 db1170 mysqld[2100]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Jul 18 11:22:16 db1170 mysqld[2100]: information that should help you find out what is causing the crash.
Jul 18 11:22:16 db1170 mysqld[2100]: Writing a core file...
Jul 18 11:22:16 db1170 mysqld[2100]: Working directory at /srv/sqldata.s2
Jul 18 11:22:16 db1170 mysqld[2100]: Resource Limits:
Jul 18 11:22:16 db1170 mysqld[2100]: Limit                     Soft Limit           Hard Limit           Units
Jul 18 11:22:16 db1170 mysqld[2100]: Max cpu time              unlimited            unlimited            seconds
Jul 18 11:22:16 db1170 mysqld[2100]: Max file size             unlimited            unlimited            bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max data size             unlimited            unlimited            bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max stack size            8388608              unlimited            bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max core file size        0                    0                    bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max resident set          unlimited            unlimited            bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max processes             2058337              2058337              processes
Jul 18 11:22:16 db1170 mysqld[2100]: Max open files            200001               200001               files
Jul 18 11:22:16 db1170 mysqld[2100]: Max locked memory         65536                65536                bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max address space         unlimited            unlimited            bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max file locks            unlimited            unlimited            locks
Jul 18 11:22:16 db1170 mysqld[2100]: Max pending signals       2058337              2058337              signals
Jul 18 11:22:16 db1170 mysqld[2100]: Max msgqueue size         819200               819200               bytes
Jul 18 11:22:16 db1170 mysqld[2100]: Max nice priority         0                    0
Jul 18 11:22:16 db1170 mysqld[2100]: Max realtime priority     0                    0
Jul 18 11:22:16 db1170 mysqld[2100]: Max realtime timeout      unlimited            unlimited            us
Jul 18 11:22:16 db1170 mysqld[2100]: Core pattern: /var/tmp/core/core.%h.%e.%p.%t
Jul 18 11:22:17 db1170 systemd[1]: mariadb@s2.service: Main process exited, code=killed, status=7/BUS
Jul 18 11:22:17 db1170 systemd[1]: mariadb@s2.service: Failed with result 'signal'.
Jul 18 11:22:22 db1170 systemd[1]: mariadb@s2.service: Service RestartSec=5s expired, scheduling restart.
Jul 18 11:22:22 db1170 systemd[1]: mariadb@s2.service: Scheduled restart job, restart counter is at 1.
Jul 18 11:22:22 db1170 systemd[1]: Stopped mariadb database server.
Jul 18 11:22:22 db1170 systemd[1]: Starting mariadb database server...
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] /opt/wmf-mariadb104/bin/mysqld (mysqld 10.4.19-MariaDB-log) starting as proce
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [ERROR] mysqld: Plugin 'unix_socket' already installed
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [ERROR] Couldn't load plugin 'unix_socket' from 'auth_socket.so'.
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] mysqld: Aria engine: starting recovery
Jul 18 11:22:22 db1170 mysqld[9916]: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 3 2 1 0
Jul 18 11:22:22 db1170 mysqld[9916]:  (0.0 seconds);
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] mysqld: Aria engine: recovery done
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Warning] InnoDB: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option ma
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Using Linux native AIO
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Uses event mutexes
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Number of pools: 1
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Using SSE2 crc32 instructions
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] mysqld: O_TMPFILE is not supported on /srv/tmp.s2 (disabling future attempts)
Jul 18 11:22:22 db1170 mysqld[9916]: 2021-07-18 11:22:22 0 [Note] InnoDB: Initializing buffer pool, total size = 185G, instances = 8, chunk siz
Jul 18 11:22:27 db1170 mysqld[9916]: 2021-07-18 11:22:27 0 [Note] InnoDB: Completed initialization of buffer pool
Jul 18 11:22:27 db1170 mysqld[9916]: 2021-07-18 11:22:27 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread prior
Jul 18 11:22:27 db1170 mysqld[9916]: 2021-07-18 11:22:27 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=52506897252705
Jul 18 11:22:31 db1170 mysqld[9916]: 2021-07-18 11:22:31 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 45 
Jul 18 11:22:31 db1170 mysqld[9916]: 2021-07-18 11:22:31 0 [Note] InnoDB: Trx id counter is 93067001386
Jul 18 11:22:31 db1170 mysqld[9916]: 2021-07-18 11:22:31 0 [Note] InnoDB: Starting final batch to recover 126115 pages from redo log.
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Last binlog file './db1170-bin.000465', position 932337077
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file fu
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Waiting for purge to start
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: 10.4.19 started; log sequence number 52506899727377; transaction id 9
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Loading buffer pool(s) from /srv/sqldata.s2/ib_buffer_pool
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [ERROR] mysqld: Can't open shared library '/opt/wmf-mariadb104/lib/plugin/semisync_m
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [ERROR] mysqld: Can't open shared library '/opt/wmf-mariadb104/lib/plugin/semisync_s
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [ERROR] mysqld: Plugin 'unix_socket' already installed
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] Recovering after a crash using db1170-bin
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Rolled back recovered transaction 93067001385
Jul 18 11:22:38 db1170 mysqld[9916]: 2021-07-18 11:22:38 0 [Note] InnoDB: Rollback of non-prepared transactions completed
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 0 [Note] Starting crash recovery...
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 0 [Note] Crash recovery finished.
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 0 [Note] Server socket created on IP: '::'.
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 7 [Note] Event Scheduler: scheduler thread started with id 7
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may br
Jul 18 11:22:40 db1170 mysqld[9916]: 2021-07-18 11:22:40 0 [Note] /opt/wmf-mariadb104/bin/mysqld: ready for connections.
Jul 18 11:22:40 db1170 mysqld[9916]: Version: '10.4.19-MariaDB-log'  socket: '/run/mysqld/mysqld.s2.sock'  port: 3312  MariaDB Server
Jul 18 11:22:40 db1170 systemd[1]: Started mariadb database server.
Jul 18 11:25:59 db1170 mysqld[9916]: 2021-07-18 11:25:59 0 [Note] InnoDB: Buffer pool(s) load completed at 210718 11:25:59
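The crash log reports `mysqld got signal 7` and systemd reports `status=7/BUS`. On x86-64 Linux, signal 7 is SIGBUS, which is also the signal the kernel's memory-failure handling sends to a process that touches a hardware-poisoned page. The Resource Limits table shows a soft and hard `Max core file size` of 0, so the `Writing a core file...` line most likely produced no core. Below is a minimal Python 3 sketch for checking both on a Linux host. `core_limit` is a hypothetical helper that assumes the `/proc/<pid>/limits` text format, which the logged table appears to mirror.

```python
import os
import signal
from typing import Tuple


def signal_name(signum: int) -> str:
    """Return the symbolic name of a signal number on the current platform."""
    return signal.Signals(signum).name


def core_limit(limits_text: str) -> Tuple[str, str]:
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits-style text."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            # Columns: 'Max core file size', soft limit, hard limit, units
            fields = line.split()
            return fields[4], fields[5]
    raise ValueError("no 'Max core file size' row found")


if __name__ == "__main__":
    # Prints "SIGBUS 7" on x86-64 Linux; signal numbers differ on some other platforms.
    print(signal_name(signal.SIGBUS), int(signal.SIGBUS))
    # Reads this script's own limits; substitute mysqld's PID to inspect the server.
    if os.path.exists("/proc/self/limits"):
        with open("/proc/self/limits") as f:
            print(core_limit(f.read()))
```

A soft limit of `0` means the kernel will not write a core file to a path like the `Core pattern` shown above, regardless of what mysqld attempts.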

Event Timeline

LSobanski triaged this task as High priority.
LSobanski moved this task from Triage to Ready on the DBA board.

The issue is bad RAM (DIMM_B6):

[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]: event severity: corrected
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:  Error 0, type: corrected
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:  fru_text: B6
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   section_type: memory error
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   error_status: 0x0000000000000400
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   physical_address: 0x00000056b1c00980
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   node: 3 card: 2 module: 0 rank: 0 bank: 3 device: 6 row: 19061 column: 592 
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   error_type: 2, single-bit ECC
[Sun Jul 18 10:23:58 2021] {1}[Hardware Error]:   DIMM location: not present. DMI handle: 0x0000 
[Sun Jul 18 10:23:58 2021] mce: [Hardware Error]: Machine check events logged
[Sun Jul 18 11:21:47 2021] Disabling lock debugging due to kernel taint
[Sun Jul 18 11:21:47 2021] mce: Uncorrected hardware memory error in user-access at 6e6eb71980
[Sun Jul 18 11:21:47 2021] mce: [Hardware Error]: Machine check events logged
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]: event severity: recoverable
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:  Error 0, type: recoverable
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:  fru_text: B6
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:   section_type: memory error
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:   error_status: 0x0000000000000400
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:   physical_address: 0x0000006e6eb71980
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:   node: 3 card: 2 module: 0 rank: 0 bank: 3 device: 0 row: 17769 column: 224 
[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:   DIMM location: not present. DMI handle: 0x0000 
[Sun Jul 18 11:21:47 2021] mce: [Hardware Error]: Machine check events logged
[Sun Jul 18 11:21:47 2021] Memory failure: 0x6e6eb71: already hardware poisoned
[Sun Jul 18 11:21:47 2021] Memory failure: 0x6e6eb71: Killing mysqld:2100 due to hardware memory corruption
[Sun Jul 18 11:21:47 2021] Memory failure: 0x6e6eb71: recovery action for dirty LRU page: Recovered
[Sun Jul 18 11:21:49 2021] MCE: Killing mysqld:3542 due to hardware memory corruption fault at 7f0ebad71987
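Both APEI records above point at the same FRU (B6): record {1} is a corrected single-bit ECC error at 10:23:58, and record {2} is the recoverable error at 11:21:47 immediately followed by the kernel killing mysqld. The `Memory failure: 0x6e6eb71` lines refer to a page frame number, which for 4 KiB pages is the physical address shifted right by 12 bits (0x6e6eb71980 >> 12 = 0x6e6eb71). A minimal Python sketch for pulling these fields out of output like the above (which looks like `dmesg -T`); the regex and the selected field names are assumptions based only on the lines shown here, not on a documented kernel log format.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches e.g. "[Sun Jul 18 11:21:47 2021] {2}[Hardware Error]:  fru_text: B6"
APEI_LINE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(.*)$")
WANTED = ("event severity", "fru_text", "physical_address", "error_type")


def parse_apei_records(dmesg: str) -> Dict[int, Dict[str, str]]:
    """Group '{N}[Hardware Error]' lines by record number N, keeping selected fields."""
    records: Dict[int, Dict[str, str]] = defaultdict(dict)
    for line in dmesg.splitlines():
        match = APEI_LINE.search(line)
        if not match:
            continue  # e.g. 'mce: [Hardware Error]: ...' carries no {N} record number
        record_id, body = int(match.group(1)), match.group(2)
        key, sep, value = body.partition(":")
        if sep and key.strip() in WANTED:
            records[record_id][key.strip()] = value.strip()
    return dict(records)


def page_frame(physical_address: str, page_shift: int = 12) -> int:
    """Page frame number for a hex physical address, assuming 4 KiB pages by default."""
    return int(physical_address, 16) >> page_shift
```

For example, `hex(page_frame("0x0000006e6eb71980"))` returns `'0x6e6eb71'`, the same frame named in the `Memory failure` lines.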
racadm>>racadm getsel
Record:      1
Date/Time:   11/16/2020 14:04:56
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------
Record:      2
Date/Time:   06/14/2021 13:27:49
Source:      system
Severity:    Ok
Description: A problem was detected during Power-On Self-Test (POST).
-------------------------------------------------------------------------------
Record:      3
Date/Time:   06/14/2021 13:27:49
Source:      system
Severity:    Ok
Description: The self-heal operation successfully completed at DIMM DIMM_B1.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   07/18/2021 10:24:32
Source:      system
Severity:    Non-Critical
Description: The memory health monitor feature has detected a degradation in the DIMM installed in DIMM_B6. Reboot system to initiate self-heal process.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   07/18/2021 11:22:21
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_B6.
-------------------------------------------------------------------------------
Kormat added projects: ops-eqiad, DC-Ops.
Kormat moved this task from Ready to Blocked on the DBA board.
Kormat added subscribers: Cmjohnson, wiki_willy.

Hi @Cmjohnson, can you get us a new DIMM please? Cheers :)

Change 705629 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] db1170: Disable notifications

https://gerrit.wikimedia.org/r/705629

Change 705629 merged by Kormat:

[operations/puppet@production] db1170: Disable notifications

https://gerrit.wikimedia.org/r/705629

wiki_willy added a subscriber: Jclark-ctr.

Hi @Kormat, Chris is out this week, so I'm moving this over to @Jclark-ctr to check out the machine (it is under warranty through Nov 2023). Thanks, Willy

Mentioned in SAL (#wikimedia-operations) [2021-07-22T07:01:15Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump T286888', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json

This host was pooled for dumps, which have not been moved to codfw, so it could cause issues if dumps were about to start.
I have depooled it and pooled other hosts in s2 and s7 (db1105 and db1098) to serve dumps. This needs to be reverted once db1170 is back: https://phabricator.wikimedia.org/P16844

A Dell ticket for a new DIMM has been submitted (request SR1066678833).

The DIMM arrived, is it safe to turn the server off and swap the DIMM?

@Cmjohnson I can power the server off now if that works for you. If not, let me know when would work and I will take it offline then.

@Cmjohnson I just realised that this host is unreachable, so you can proceed with it anytime and power it back on when you are done. Thanks

@Marostegui the DIMM was replaced, the log cleared and the server powered on. This should resolve your issue.

@Cmjohnson it seems that the host isn't reachable. Could you take a look to see if there's any error preventing it from booting up?
Thanks!

The host is now up and the memory is ok - thanks!
This host needs recloning - will do it tomorrow and then close the task

Thanks for your help Chris

Mentioned in SAL (#wikimedia-operations) [2021-08-04T04:45:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1105:3312 to clone db1170:3312 T286888', diff saved to https://phabricator.wikimedia.org/P16950 and previous config saved to /var/cache/conftool/dbconfig/20210804-044507-marostegui.json

I am cloning db1170:3312 from db1105:3312.
The s7 part needs to wait because:

root@cumin1001:~# dbctl instance db1101:3317 depool
Execution FAILED
Reported errors:
Section s7 is supposed to have minimum 4 replicas, found 3
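dbctl refused the depool because of a minimum-replica check on s7 (db1170:3317 was still depooled at this point). The guard can be sketched in Python roughly as follows. This is a hypothetical illustration, not dbctl's actual implementation: it assumes the check counts the replicas that would remain pooled after the depool, and the `Instance` type, `depool` function and any host names other than db1101:3317 and db1170:3317 are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str      # e.g. "db1101:3317"
    section: str   # e.g. "s7"
    pooled: bool


class DepoolError(Exception):
    pass


def depool(instances: List[Instance], name: str, min_replicas: int) -> None:
    """Hypothetical guard: refuse to depool if too few pooled replicas would remain."""
    target = next(i for i in instances if i.name == name)
    # Count the other pooled replicas in the same section, i.e. what would be left.
    remaining = sum(
        1 for i in instances
        if i.section == target.section and i.pooled and i.name != name
    )
    if remaining < min_replicas:
        raise DepoolError(
            f"Section {target.section} is supposed to have minimum "
            f"{min_replicas} replicas, found {remaining}"
        )
    target.pooled = False
```

Raising before touching `pooled` means a refused depool leaves the configuration unchanged; once another s7 replica is pooled again, the same call succeeds.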

Mentioned in SAL (#wikimedia-operations) [2021-08-04T06:03:48Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1170:3312, db1105:3312, db1105:3311 T286888', diff saved to https://phabricator.wikimedia.org/P16953 and previous config saved to /var/cache/conftool/dbconfig/20210804-060347-marostegui.json

db1170:3312 is now up and running with GTID enabled and repooled
Will start db1170:3317 as soon as the other s7 transfer is done.

Mentioned in SAL (#wikimedia-operations) [2021-08-04T10:17:20Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1101:3317 T286888', diff saved to https://phabricator.wikimedia.org/P16955 and previous config saved to /var/cache/conftool/dbconfig/20210804-101719-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-08-04T11:36:23Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1170:3317 and db1101:3317 T286888!', diff saved to https://phabricator.wikimedia.org/P16957 and previous config saved to /var/cache/conftool/dbconfig/20210804-113623-marostegui.json

db1170:3317 recloned, gtid enabled, notifications enabled, host pooled.