Page MenuHomePhabricator

mysql crash on db2123
Closed, ResolvedPublic

Description

We got paged today because mysql crashed on db2123 (s5 master), and then restarted read-only. We switched master away (T382743).

The host stayed up, nothing obvious in dmesg.

Per journalctl, the crash log:

Dec 24 07:08:21 db2123 mysqld[3666]: 2024-12-24  7:08:21 53614645 [Note] Semi-sync replication switched ON with slave (server_id: 180367474) at (db2123-bin.005151, 74849323)
Dec 24 10:10:16 db2123 mysqld[3666]: 2024-12-24 10:10:16 412319852 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all 
InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing r
ecovery.
Dec 24 10:11:16 db2123 mysqld[3666]: 2024-12-24 10:11:16 0x7f94d077b6c0  InnoDB: Assertion failure in file /root/mariadb-10.6.17/storage/innobase/page/page0zip.cc line 717
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: Failing assertion: info_bits == min_mark
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: We intentionally generate a memory trap.
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: If you get repeated assertion failures or crashes, even
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: immediately after the mariadbd startup, there may be
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Dec 24 10:11:16 db2123 mysqld[3666]: InnoDB: about forcing recovery.
Dec 24 10:11:16 db2123 mysqld[3666]: 241224 10:11:16 [ERROR] mysqld got signal 6 ;
Dec 24 10:11:16 db2123 mysqld[3666]: Sorry, we probably made a mistake, and this is a bug.
Dec 24 10:11:16 db2123 mysqld[3666]: Your assistance in bug reporting will enable us to fix this for the next release.
Dec 24 10:11:16 db2123 mysqld[3666]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Dec 24 10:11:16 db2123 mysqld[3666]: We will try our best to scrape up some info that will hopefully help
Dec 24 10:11:16 db2123 mysqld[3666]: diagnose the problem, but since we have already crashed,
Dec 24 10:11:16 db2123 mysqld[3666]: something is definitely wrong and this may fail.
Dec 24 10:11:16 db2123 mysqld[3666]: Server version: 10.6.17-MariaDB-log source revision: 15c75ad083a55e198ae78324f22970694b72f22b
Dec 24 10:11:16 db2123 mysqld[3666]: key_buffer_size=134217728
Dec 24 10:11:16 db2123 mysqld[3666]: read_buffer_size=131072
Dec 24 10:11:16 db2123 mysqld[3666]: max_used_connections=276
Dec 24 10:11:16 db2123 mysqld[3666]: max_threads=2010
Dec 24 10:11:16 db2123 mysqld[3666]: thread_count=52
Dec 24 10:11:16 db2123 mysqld[3666]: It is possible that mysqld could use up to
Dec 24 10:11:16 db2123 mysqld[3666]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4768767 K  bytes of memory
Dec 24 10:11:16 db2123 mysqld[3666]: Hope that's ok; if not, decrease some variables in the equation.
Dec 24 10:11:16 db2123 mysqld[3666]: Thread pointer: 0x7f92e445b4f8
Dec 24 10:11:16 db2123 mysqld[3666]: Attempting backtrace. You can use the following information to find out
Dec 24 10:11:16 db2123 mysqld[3666]: where mysqld died. If you see no messages after this, something went
Dec 24 10:11:16 db2123 mysqld[3666]: terribly wrong...
Dec 24 10:11:16 db2123 mysqld[3666]: stack_bottom = 0x7f94d077ab98 thread_stack 0x30000
Dec 24 10:11:38 db2123 mysqld[3666]: /opt/wmf-mariadb106/bin/mysqld(my_print_stacktrace+0x2e)[0x5641967735be]
Dec 24 10:11:41 db2123 mysqld[3666]: /opt/wmf-mariadb106/bin/mysqld(handle_fatal_signal+0x2b3)[0x56419620a4c3]
Dec 24 10:11:56 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7ff4fea5b050]
Dec 24 10:12:01 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c)[0x7ff4feaa9e2c]
Dec 24 10:12:03 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x12)[0x7ff4fea5afb2]
Dec 24 10:12:03 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7ff4fea45472]
Dec 24 10:12:23 db2123 mysqld[3666]: /opt/wmf-mariadb106/bin/mysqld(+0x67965e)[0x564195e7e65e]
Dec 24 10:12:26 db2123 mysqld[3666]: /opt/wmf-mariadb106/bin/mysqld(+0x66324a)[0x564195e6824a]
Dec 24 10:12:26 db2123 mysqld[3666]: ut/ut0rbt.cc:460(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.0])[0x5641965b6e1a]
Dec 24 10:12:26 db2123 mysqld[3666]: page/page0zip.cc:702(page_zip_compress(buf_block_t*, dict_index_t*, unsigned long, mtr_t*) [clone .cold])[0x5641965a47aa]
Dec 24 10:12:26 db2123 mysqld[3666]: page/page0zip.cc:4455(page_zip_reorganize(buf_block_t*, dict_index_t*, unsigned long, mtr_t*, bool))[0x56419665c6cb]
Dec 24 10:12:26 db2123 mysqld[3666]: page/page0cur.cc:1865(page_cur_insert_rec_zip(page_cur_t*, unsigned char const*, unsigned short*, mtr_t*))[0x564196644ebd]
Dec 24 10:12:26 db2123 mysqld[3666]: include/page0cur.inl:195(btr_cur_optimistic_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t*, unsigned char**, big_rec_t**, unsigned long, que_thr_t*, mtr_t*))[0x564196645d3e]
Dec 24 10:12:26 db2123 mysqld[3666]: btr/btr0btr.cc:2477(btr_insert_on_non_leaf_level(unsigned long, dict_index_t*, unsigned long, dtuple_t*, mtr_t*))[0x56419665cc0f]
Dec 24 10:12:26 db2123 mysqld[3666]: include/mem0mem.inl:409(btr_page_split_and_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t const*, unsigned long, mtr_t*, dberr_t*))[0x5641965d559f]
Dec 24 10:12:26 db2123 mysqld[3666]: row/row0ins.cc:3155(row_ins_sec_index_entry_low(unsigned long, btr_latch_mode, dict_index_t*, mem_block_info_t*, mem_block_info_t*, dtuple_t*, unsigned long, que_thr_t*))[0x5641965d8383]
Dec 24 10:12:26 db2123 mysqld[3666]: row/row0ins.cc:3379(row_ins_step(que_thr_t*))[0x5641965d86a2]
Dec 24 10:12:26 db2123 mysqld[3666]: row/row0mysql.cc:1319(row_insert_for_mysql(unsigned char const*, row_prebuilt_t*, ins_mode_t))[0x5641965e881c]
Dec 24 10:12:26 db2123 mysqld[3666]: handler/ha_innodb.cc:7902(ha_innobase::write_row(unsigned char const*))[0x56419653e004]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/handler.cc:7659(handler::ha_write_row(unsigned char const*))[0x56419621990f]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_insert.cc:2187(write_record(THD*, TABLE*, st_copy_info*, select_result*))[0x564195f9270d]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_insert.cc:1133(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool, select_result*))[0x564195f9bd3a]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_parse.cc:4601(mysql_execute_command(THD*, bool))[0x564195fcee87]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_class.h:2854(THD::enter_stage(PSI_stage_info_v1 const*, char const*, char const*, unsigned int))[0x564195fd3406]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_parse.cc:1898(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x564195fd6115]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/sql_parse.cc:1409(do_command(THD*, bool))[0x564195fd7a44]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/threadpool_common.cc:436(tp_callback(TP_connection*))[0x56419619b4cc]
Dec 24 10:12:26 db2123 mysqld[3666]: sql/threadpool_generic.cc:1551(worker_main(void*))[0x5641963b59fd]
Dec 24 10:12:26 db2123 mysqld[3666]: perfschema/pfs.cc:2204(pfs_spawn_thread)[0x5641964837a0]
Dec 24 10:12:40 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(+0x89134)[0x7ff4feaa8134]
Dec 24 10:12:45 db2123 mysqld[3666]: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc)[0x7ff4feb287dc]
Dec 24 10:12:45 db2123 mysqld[3666]: Trying to get some variables.
Dec 24 10:12:45 db2123 mysqld[3666]: Some pointers may be invalid and cause the dump to abort.
Dec 24 10:12:45 db2123 mysqld[3666]: Query (0x7f92e4559030): INSERT /* RecentChange::save  */ INTO `recentchanges` (rc_type,rc_minor,rc_bot,rc_patrolled,rc_params,rc_timestamp,rc_logid,rc_log_type,rc_log_action,rc_source,rc_deleted,rc_new,rc_namespace,rc_title,rc_old_len,rc_new_len,rc_this_oldid,rc_last_oldid,rc_cur_id,rc_comment_id,rc_actor) VALUES (5,1,0,2,'a:1:{s:20:\"wikibase-repo-change\";a:14:{s:2:\"id\";i:1785540406;s:4:\"time\";s:14:\"20241224101012\";s:7:\"user_id\";s:6:\"197855\";s:11:\"revision_id\";s:10:\"2290980019\";s:9:\"object_id\";s:6:\"Q49658\";s:4:\"type\";s:20:\"wikibase-item~update\";s:11:\"entity_type\";s:4:\"item\";s:7:\"page_id\";i:51693;s:6:\"rev_id\";i:2290980019;s:9:\"parent_id\";i:2286054581;s:7:\"comment\";s:111:\"/* wbcreateclaim-create:1| */ [[Property:P1889]]: [[Q2094195]], using [[User:Frettie/consistency check add.js]]\";s:9:\"user_text\";s:13:\"Sjoerddebruin\";s:15:\"central_user_id\";i:8133267;s:3:\"bot\";i:0;}}','20241224101012',0,NULL,'','wb',0,0,0,'Чудовиште_из_Лох_Неса',23753,23753,28932345,28932345,75449,15599880,6389)
Dec 24 10:12:45 db2123 mysqld[3666]: Connection ID (thread ID): 412322033
Dec 24 10:12:45 db2123 mysqld[3666]: Status: NOT_KILLED
Dec 24 10:12:45 db2123 mysqld[3666]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=off,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off
Dec 24 10:12:45 db2123 mysqld[3666]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
Dec 24 10:12:45 db2123 mysqld[3666]: information that should help you find out what is causing the crash.
Dec 24 10:12:45 db2123 mysqld[3666]: Writing a core file...
Dec 24 10:12:45 db2123 mysqld[3666]: Working directory at /srv/sqldata
Dec 24 10:12:45 db2123 mysqld[3666]: Resource Limits:
Dec 24 10:12:45 db2123 mysqld[3666]: Limit                     Soft Limit           Hard Limit           Units
Dec 24 10:12:45 db2123 mysqld[3666]: Max cpu time              unlimited            unlimited            seconds
Dec 24 10:12:45 db2123 mysqld[3666]: Max file size             unlimited            unlimited            bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max data size             unlimited            unlimited            bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max stack size            8388608              unlimited            bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max core file size        0                    0                    bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max resident set          unlimited            unlimited            bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max processes             2057542              2057542              processes
Dec 24 10:12:45 db2123 mysqld[3666]: Max open files            200001               200001               files
Dec 24 10:12:45 db2123 mysqld[3666]: Max locked memory         8388608              8388608              bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max address space         unlimited            unlimited            bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max file locks            unlimited            unlimited            locks
Dec 24 10:12:45 db2123 mysqld[3666]: Max pending signals       2057542              2057542              signals
Dec 24 10:12:45 db2123 mysqld[3666]: Max msgqueue size         819200               819200               bytes
Dec 24 10:12:45 db2123 mysqld[3666]: Max nice priority         0                    0
Dec 24 10:12:45 db2123 mysqld[3666]: Max realtime priority     0                    0
Dec 24 10:12:45 db2123 mysqld[3666]: Max realtime timeout      unlimited            unlimited            us
Dec 24 10:12:45 db2123 mysqld[3666]: Core pattern: /var/tmp/core/core.%h.%e.%p.%t
Dec 24 10:12:45 db2123 mysqld[3666]: Kernel version: Linux version 6.1.0-20-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11)
Dec 24 10:13:40 db2123 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT

Details

Related Changes in Gerrit:

Event Timeline

This host needs fully recloning (and upgrading too)

Marostegui triaged this task as High priority.
Marostegui moved this task from Triage to Ready on the DBA board.

From the logs posted on the task, this is a case of index corruption (as we've seen before). I've upgraded the host and rebuilt the table so replication can flow. However, given this is a candidate master, let's:

  • Reboot to pick up the latest kernel
  • Reclone
  • Rebuild all recentchanges tables (as we are now under 10.6.20)

Do you want me to do it? @Marostegui

This can wait until we are back

Change #1107882 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/1107882

Change #1107882 merged by Marostegui:

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/1107882

Mentioned in SAL (#wikimedia-operations) [2025-01-02T06:32:19Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2171 to clone db2123 T382744', diff saved to https://phabricator.wikimedia.org/P71750 and previous config saved to /var/cache/conftool/dbconfig/20250102-063218-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2025-01-02T06:32:19Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2171 to clone db2123 T382744', diff saved to https://phabricator.wikimedia.org/P71750 and previous config saved to /var/cache/conftool/dbconfig/20250102-063218-marostegui.json

Also upgraded db2171 to 10.6.20

Host recloned. I am going to rebuild all the pagelinks and recentchanges tables on db2123

For the record:

for i in `db-mysql db2123 information_schema -e "select table_schema from tables where table_name='pagelinks';"`; do echo $i; db-mysql db2123 $i -e "set session sql_log_bin=0; alter table pagelinks engine=innodb,force; alter table recentchanges engine=innodb,force;";done

For the record:

for i in `db-mysql db2123 information_schema -e "select table_schema from tables where table_name='pagelinks';"`; do echo $i; db-mysql db2123 $i -e "set session sql_log_bin=0; alter table pagelinks engine=innodb,force; alter table recentchanges engine=innodb,force;";done

This finished.

Downtime removed, notifications enabled and the host being slowly automatically repooled.
Thanks everyone who responded to this incident! Much appreciated.