Page MenuHomePhabricator

db2106 and db2147 crashed
Closed, ResolvedPublic

Description

db2106 and db2147 crashed at the same time with the same issue:

Apr 06 01:25:08 db2106 mysqld[27097]: 2021-04-06 01:25:08 0x7f975e686700  InnoDB: Assertion failure in file /root/mariadb-10.4.15/storage/innobase/row/row0ins.cc line 218
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: Failing assertion: !cursor->index->is_committed()
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: We intentionally generate a memory trap.
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: If you get repeated assertion failures or crashes, even
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: immediately after the mysqld startup, there may be
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Apr 06 01:25:08 db2106 mysqld[27097]: InnoDB: about forcing recovery.
Apr 06 01:25:08 db2106 mysqld[27097]: 210406  1:25:08 [ERROR] mysqld got signal 6 ;
Apr 06 01:25:08 db2106 mysqld[27097]: This could be because you hit a bug. It is also possible that this binary
Apr 06 01:25:08 db2106 mysqld[27097]: or one of the libraries it was linked against is corrupt, improperly built,
Apr 06 01:25:08 db2106 mysqld[27097]: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 06 01:25:08 db2106 mysqld[27097]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Apr 06 01:25:08 db2106 mysqld[27097]: We will try our best to scrape up some info that will hopefully help
Apr 06 01:25:08 db2106 mysqld[27097]: diagnose the problem, but since we have already crashed,
Apr 06 01:25:08 db2106 mysqld[27097]: something is definitely wrong and this may fail.
Apr 06 01:25:08 db2106 mysqld[27097]: Server version: 10.4.15-MariaDB-log
Apr 06 01:25:08 db2106 mysqld[27097]: key_buffer_size=134217728
Apr 06 01:25:08 db2106 mysqld[27097]: read_buffer_size=131072
Apr 06 01:25:08 db2106 mysqld[27097]: max_used_connections=22
Apr 06 01:25:08 db2106 mysqld[27097]: max_threads=2010
Apr 06 01:25:08 db2106 mysqld[27097]: thread_count=20
Apr 06 01:25:08 db2106 mysqld[27097]: It is possible that mysqld could use up to
Apr 06 01:25:08 db2106 mysqld[27097]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4752266 K  bytes of memory
Apr 06 01:25:08 db2106 mysqld[27097]: Hope that's ok; if not, decrease some variables in the equation.
Apr 06 01:25:08 db2106 mysqld[27097]: Thread pointer: 0x7f948b158af8
Apr 06 01:25:08 db2106 mysqld[27097]: Attempting backtrace. You can use the following information to find out
Apr 06 01:25:08 db2106 mysqld[27097]: where mysqld died. If you see no messages after this, something went
Apr 06 01:25:08 db2106 mysqld[27097]: terribly wrong...
Apr 06 01:25:08 db2106 mysqld[27097]: stack_bottom = 0x7f975e685698 thread_stack 0x30000
Apr 06 01:25:16 db2106 mysqld[27097]: 2021-04-06  1:25:16 968179 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Apr 06 01:25:16 db2106 mysqld[27097]: 2021-04-06  1:25:16 968179 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'db2090-bin.006718' at position 516814095; GTID position '0-180359175-3375835514,171978876-171978876-2496072471,171966557-171966557-1948492266,180363436-180363436-1169124439,171978775-
Apr 06 01:25:16 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x55e74f97a6fe]
Apr 06 01:25:16 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x54d)[0x55e74f47357d]
Apr 06 01:25:32 db2106 mysqld[27097]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7ff7b4f54730]
Apr 06 01:25:48 db2106 mysqld[27097]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7ff7b46307bb]
Apr 06 01:25:48 db2106 mysqld[27097]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7ff7b461b535]
Apr 06 01:26:04 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0x5ad568)[0x55e74f16b568]
Apr 06 01:26:04 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0x59b83e)[0x55e74f15983e]
Apr 06 01:26:04 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xafffce)[0x55e74f6bdfce]
Apr 06 01:26:04 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xb377f8)[0x55e74f6f57f8]
Apr 06 01:26:04 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xb37e2f)[0x55e74f6f5e2f]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xb1297c)[0x55e74f6d097c]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xa5bfcf)[0x55e74f619fcf]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0xbb)[0x55e74f47f76b]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x1b27)[0x55e74f318cb7]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(_Z21mysql_execute_commandP3THD+0x2695)[0x55e74f26b145]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1c9)[0x55e74f270f89]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x750)[0x55e74f56b000]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0x607a52)[0x55e74f1c5a52]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x1322)[0x55e74f1cea22]
Apr 06 01:26:05 db2106 mysqld[27097]: /opt/wmf-mariadb104/bin/mysqld(+0xd6d19b)[0x55e74f92b19b]
Apr 06 01:26:20 db2106 mysqld[27097]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7ff7b4f49fa3]
Apr 06 01:26:36 db2106 mysqld[27097]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ff7b46f24cf]
Apr 06 01:26:36 db2106 mysqld[27097]: Trying to get some variables.
Apr 06 01:26:36 db2106 mysqld[27097]: Some pointers may be invalid and cause the dump to abort.
Apr 06 01:26:36 db2106 mysqld[27097]: Query (0x7f9261fae06a): UPDATE /* WikiPage::updateCategoryCounts  */  `category` SET cat_pages = cat_pages - 1,cat_files = cat_files - 1 WHERE cat_title = 'Images_from_Wiki_Loves_Monuments_missing_SDC_participant_in'
Apr 06 01:26:36 db2106 mysqld[27097]: Connection ID (thread ID): 968180
Apr 06 01:26:36 db2106 mysqld[27097]: Status: NOT_KILLED
Apr 06 01:26:36 db2106 mysqld[27097]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=off,condition_pushdown_from_having=on

The both crashed after the same query:

Query (0x7f9261fae06a): UPDATE /* WikiPage::updateCategoryCounts  */  `category` SET cat_pages = cat_pages - 1,cat_files = cat_files - 1 WHERE cat_title = 'Images_from_Wiki_Loves_Monuments_missing_SDC_participant_in'

We should probably reclone both of them.
I am also going to file a mariadb bug.

Event Timeline

The table look ok by the way:

+----------------------+-------+----------+----------+
| Table                | Op    | Msg_type | Msg_text |
+----------------------+-------+----------+----------+
| commonswiki.category | check | status   | OK       |
+----------------------+-------+----------+----------+
1 row in set (33.649 sec)
Marostegui reopened this task as Open.
Marostegui claimed this task.
Marostegui removed Marostegui as the assignee of this task.
Marostegui triaged this task as Medium priority.
Kormat subscribed.

I'll reclone both servers.

Mentioned in SAL (#wikimedia-operations) [2021-04-07T10:01:47Z] <kormat@cumin1001> dbctl commit (dc=all): 'Repool db2106 and db2147 T279406', diff saved to https://phabricator.wikimedia.org/P15215 and previous config saved to /var/cache/conftool/dbconfig/20210407-100147-kormat.json

Both machines are recloned, and been running fine for the last day, so i've repooled them.