m2 codfw master crashed
Closed, Resolved, Public

Description

m2 codfw master db2133 crashed during this operation:

Query: 'CREATE INDEX `ix_user_user_text` ON `sockpuppet`.`user` (user_text)'

There's not much in the log apart from a mysqld segfault.
Still investigating.

@WDoranWMF @gmodena @hnowlan Please do not do any more index creations on the primary master for now.

Event Timeline

LSobanski moved this task from Triage to In progress on the DBA board.

@Marostegui Just acknowledging that we've seen this and will wait for your input. Let us know if we can help directly with debugging.

Jan 21 16:09:57 db2133 mysqld[24583]: 210121 16:09:57 [ERROR] mysqld got signal 11 ;
Jan 21 16:09:57 db2133 mysqld[24583]: This could be because you hit a bug. It is also possible that this binary
Jan 21 16:09:57 db2133 mysqld[24583]: or one of the libraries it was linked against is corrupt, improperly built,
Jan 21 16:09:57 db2133 mysqld[24583]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jan 21 16:09:57 db2133 mysqld[24583]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jan 21 16:09:57 db2133 mysqld[24583]: We will try our best to scrape up some info that will hopefully help
Jan 21 16:09:57 db2133 mysqld[24583]: diagnose the problem, but since we have already crashed,
Jan 21 16:09:57 db2133 mysqld[24583]: something is definitely wrong and this may fail.
Jan 21 16:09:57 db2133 mysqld[24583]: Server version: 10.4.13-MariaDB-log
Jan 21 16:09:57 db2133 mysqld[24583]: key_buffer_size=134217728
Jan 21 16:09:57 db2133 mysqld[24583]: read_buffer_size=131072
Jan 21 16:09:57 db2133 mysqld[24583]: max_used_connections=23
Jan 21 16:09:57 db2133 mysqld[24583]: max_threads=511
Jan 21 16:09:57 db2133 mysqld[24583]: thread_count=30
Jan 21 16:09:57 db2133 mysqld[24583]: It is possible that mysqld could use up to
Jan 21 16:09:57 db2133 mysqld[24583]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1255610 K  bytes of memory
Jan 21 16:09:57 db2133 mysqld[24583]: Hope that's ok; if not, decrease some variables in the equation.
Jan 21 16:09:57 db2133 mysqld[24583]: Thread pointer: 0x7ee8540014f8
Jan 21 16:09:57 db2133 mysqld[24583]: Attempting backtrace. You can use the following information to find out
Jan 21 16:09:57 db2133 mysqld[24583]: where mysqld died. If you see no messages after this, something went
Jan 21 16:09:57 db2133 mysqld[24583]: terribly wrong...
Jan 21 16:09:57 db2133 mysqld[24583]: stack_bottom = 0x7f488ded1698 thread_stack 0x30000
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x55dd76ebe7de]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x54d)[0x55dd769b6b4d]
Jan 21 16:09:57 db2133 mysqld[24583]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f48c83a4730]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(_ZN12Item_func_in7cleanupEv+0x36)[0x55dd76adaec6]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(_ZN11Query_arena10free_itemsEv+0x2d)[0x55dd76766fbd]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(_ZN3THD19cleanup_after_queryEv+0x108)[0x55dd76768d88]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x23a)[0x55dd767b71ea]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x682)[0x55dd76aade52]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(+0x5faf02)[0x55dd7670bf02]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x12e2)[0x55dd76714ef2]
Jan 21 16:09:57 db2133 mysqld[24583]: /opt/wmf-mariadb104/bin/mysqld(+0xd5e28b)[0x55dd76e6f28b]
Jan 21 16:09:57 db2133 mysqld[24583]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f48c8399fa3]
Jan 21 16:09:57 db2133 mysqld[24583]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f48c7ace4cf]
Jan 21 16:09:57 db2133 mysqld[24583]: Trying to get some variables.
Jan 21 16:09:57 db2133 mysqld[24583]: Some pointers may be invalid and cause the dump to abort.
Jan 21 16:09:57 db2133 mysqld[24583]: Query (0x7ee854587056): CREATE INDEX `ix_user_user_text` ON `sockpuppet`.`user` (user_text)
Jan 21 16:09:57 db2133 mysqld[24583]: Connection ID (thread ID): 11
Jan 21 16:09:57 db2133 mysqld[24583]: Status: NOT_KILLED
Jan 21 16:09:57 db2133 mysqld[24583]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
Jan 21 16:09:57 db2133 mysqld[24583]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jan 21 16:09:57 db2133 mysqld[24583]: information that should help you find out what is causing the crash.
Jan 21 16:09:57 db2133 mysqld[24583]: Writing a core file...
Jan 21 16:09:57 db2133 mysqld[24583]: Working directory at /srv/sqld...
Jan 21 16:09:57 db2133 mysqld[24583]: Resource Limits:
Jan 21 16:09:57 db2133 mysqld[24583]: Limit                     Soft Limit           Hard Limit           Units
Jan 21 16:09:57 db2133 mysqld[24583]: Max cpu time              unlimited            unlimited            seconds
Jan 21 16:09:57 db2133 mysqld[24583]: Max file size             unlimited            unlimited            bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max data size             unlimited            unlimited            bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max stack size            8388608              unlimited            bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max core file size        0                    0                    bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max resident set          unlimited            unlimited            bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max processes             2057703              2057703              processes
Jan 21 16:09:57 db2133 mysqld[24583]: Max open files            200001               200001               files
Jan 21 16:09:57 db2133 mysqld[24583]: Max locked memory         65536                65536                bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max address space         unlimited            unlimited            bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max file locks            unlimited            unlimited            locks
Jan 21 16:09:57 db2133 mysqld[24583]: Max pending signals       2057703              2057703              signals
Jan 21 16:09:57 db2133 mysqld[24583]: Max msgqueue size         819200               819200               bytes
Jan 21 16:09:57 db2133 mysqld[24583]: Max nice priority         0                    0
Jan 21 16:09:57 db2133 mysqld[24583]: Max realtime priority     0                    0
Jan 21 16:09:57 db2133 mysqld[24583]: Max realtime timeout      unlimited            unlimited            us
Jan 21 16:09:57 db2133 mysqld[24583]: Core pattern: /var/tmp/core/core.%h.%e.%p....
[18346237.601951] mysqld[24681]: segfault at 8 ip 000055dd76adaec6 sp 00007f488ded11e0 error 4 in mysqld[55dd7667c000+8e4000]
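
For anyone wanting to map the crashing address to a source line, here is a rough sketch, in Python, of the arithmetic on the frames above. It assumes that frames printed without a symbol name, such as (+0x5faf02)[0x55dd7670bf02], give the offset from the binary's load base, and that the error field of the kernel line follows the usual x86 page-fault bit layout. The names load_bases and decode_pf_error are made up for this example.

```python
import re

# Frame lines copied from the db2133 backtrace above.
FRAMES = """\
/opt/wmf-mariadb104/bin/mysqld(_ZN12Item_func_in7cleanupEv+0x36)[0x55dd76adaec6]
/opt/wmf-mariadb104/bin/mysqld(+0x5faf02)[0x55dd7670bf02]
/opt/wmf-mariadb104/bin/mysqld(+0xd5e28b)[0x55dd76e6f28b]
"""

FRAME_RE = re.compile(
    r"\((?P<sym>[^()+]*)\+0x(?P<off>[0-9a-f]+)\)\[0x(?P<addr>[0-9a-f]+)\]"
)


def load_bases(frames):
    """Return the set of load bases implied by frames that carry no symbol name.

    Assumption: for such frames the printed offset is relative to the
    load base of the binary, so base = address - offset.
    """
    bases = set()
    for m in FRAME_RE.finditer(frames):
        if not m.group("sym"):
            bases.add(int(m.group("addr"), 16) - int(m.group("off"), 16))
    return bases


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0: page not mapped
        "write": bool(code & 2),         # 0: read access
        "user_mode": bool(code & 4),     # 1: fault raised in user space
    }


bases = load_bases(FRAMES)
assert len(bases) == 1  # both symbol-less frames must agree on the base
base = bases.pop()

crash_ip = 0x55DD76ADAEC6  # "ip" from the kernel segfault line
crash_offset = crash_ip - base

print(hex(base), hex(crash_offset))
print(decode_pf_error(4))
```

Under those assumptions the load base works out to 0x55dd76111000 and the crashing instruction sits at offset 0x9c9ec6 in mysqld, which is the value a symbolizer such as addr2line would need, together with a matching binary that has debug symbols. Error 4 with fault address 8 would be consistent with a user-mode read through a null pointer plus a small field offset, but that is an interpretation, not something the log states.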

There is not much logged about what caused the crash, apart from the fact that it happened during the index creation. The host came back thinking the index creation hadn't happened, while it actually had (the table isn't corrupted):

Jan 21 16:35:42 db2133 mysqld[9391]: 2021-01-21 16:35:42 1210 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@db1107.eqiad.wmnet:3306' in log 'db1107-bin.000255' at position 4
Jan 21 16:35:42 db2133 mysqld[9391]: 2021-01-21 16:35:42 1210 [Note] Slave I/O thread: connected to master 'repl@db1107.eqiad.wmnet:3306',replication starts at GTID position '171970595-171970595-143176442,171970636-171970636-23122305,171970569-171970569-156638323,171966678-171966678-113832737,171978772-171978772-139568865,0-171970569-1006906062'
Jan 21 16:35:43 db2133 mysqld[9391]: 2021-01-21 16:35:43 1211 [ERROR] Slave SQL: Error 'Duplicate key name 'ix_user_user_text'' on query. Default database: ''. Query: 'CREATE INDEX `ix_user_user_text` ON `sockpuppet`.`user` (user_text)', Gtid 171966678-171966678-113832738, Internal MariaDB error code: 1061
Jan 21 16:35:43 db2133 mysqld[9391]: 2021-01-21 16:35:43 1211 [Warning] Slave: Duplicate key name 'ix_user_user_text' Error_code: 1061
Jan 21 16:35:43 db2133 mysqld[9391]: 2021-01-21 16:35:43 1211 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db1107-bin.000255' position 585940160; GTID position '0-171970569-1006906062,171966678-171966678-113832737,171970569-171970569-156638323,171970595-171970595-143176442,171970636-171970636-23122305,171978772-171978772-139568865'
Jan 21 16:35:43 db2133 mysqld[9391]: 2021-01-21 16:35:43 1211 [Note] Slave SQL thread exiting, replication stopped in log 'db1107-bin.000255' at position 585940160; GTID position '0-171970569-1006906062,171966678-171966678-113832737,171970569-171970569-156638323,171970595-171970595-143176442,171970636-171970636-23122305,171978772-171978772-139568865'

Master domain_id is 171966678
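
To make the comparison of positions less error-prone, here is a small sketch in Python that splits a MariaDB GTID position (comma-separated domain-server-sequence triples) into a per-domain map, and compares the position replication stopped at with the GTID of the event that failed. Both strings are copied from the log above; the helper name parse_gtid_pos is hypothetical.

```python
def parse_gtid_pos(pos):
    """Map domain_id -> (server_id, seq_no) for a GTID position string."""
    result = {}
    for triple in pos.split(","):
        domain, server, seq = (int(part) for part in triple.strip().split("-"))
        result[domain] = (server, seq)
    return result


# "GTID position" reported when the slave SQL thread stopped.
STOPPED_AT = (
    "0-171970569-1006906062,171966678-171966678-113832737,"
    "171970569-171970569-156638323,171970595-171970595-143176442,"
    "171970636-171970636-23122305,171978772-171978772-139568865"
)
# GTID of the event that failed with error 1061.
FAILED_EVENT = "171966678-171966678-113832738"
MASTER_DOMAIN = 171966678

_, applied_seq = parse_gtid_pos(STOPPED_AT)[MASTER_DOMAIN]
_, failed_seq = parse_gtid_pos(FAILED_EVENT)[MASTER_DOMAIN]

print(applied_seq, failed_seq, failed_seq - applied_seq)
```

The difference is 1, i.e. the failing event is the very next one in the master's domain after the recorded position. In other words, the recorded position does not include the index creation, even though the index exists on disk.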

171966678-171966678-113832737 is where replication started, and it is where it is stopped now. The event right after that GTID position is:

#210121 16:09:36 server id 171966678  end_log_pos 585940202 CRC32 0xb4ee29e5    GTID 171966678-171966678-113832738 ddl
/*!100001 SET @@session.gtid_seq_no=113832738*//*!*/;
# at 585940202
#210121 16:09:36 server id 171966678  end_log_pos 585940347 CRC32 0x6600718e    Query   thread_id=46862513      exec_time=10    error_code=0
SET TIMESTAMP=1611245376.94704/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=63/*!*/;
SET @@session.collation_database=63/*!*/;
CREATE INDEX `ix_user_user_text` ON `sockpuppet`.`user` (user_text)

That is the index creation. However, the index is already there, hence the duplicate key error:

root@db2133.codfw.wmnet[sockpuppet]> show create table user\G
*************************** 1. row ***************************
       Table: user
Create Table: CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_text` varbinary(255) NOT NULL,
  `is_anon` tinyint(1) NOT NULL,
  `num_edits` int(11) DEFAULT NULL,
  `num_pages` int(11) DEFAULT NULL,
  `most_recent_edit` datetime DEFAULT NULL,
  `oldest_edit` datetime DEFAULT NULL,
  `insertion_time` datetime DEFAULT current_timestamp(),
  `dataset_id` varbinary(36) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_user_user_text` (`user_text`),
  CONSTRAINT `CONSTRAINT_1` CHECK (`is_anon` in (0,1))
) ENGINE=InnoDB AUTO_INCREMENT=7187535 DEFAULT CHARSET=binary

I have dropped the index and let replication re-create it. However, the fact that it came back with a duplicate key error is strange (GTID is enabled), so I am not very confident about the data on this host anymore. It is probably better to re-clone it (and its replica) from the eqiad slave, either tomorrow or next week.

@WDoranWMF @gmodena @hnowlan there are some pretty big tables there already, like coedit (13GB). If you are planning on creating more indexes, that needs to be coordinated with us: creating indexes on big tables on the master is likely to cause lag and fire some alerts. Let's coordinate on tasks before any more index creation happens.

@Marostegui ack. I'm deeply sorry about this.

Did we hit a memory/resource limit in mysql? This morning we indexed a sister table, coedit, which is an order of magnitude bigger, with no issue. We don't expect to touch these indexes any further, but we'll coordinate should we need to.

From the client side of things (process list & mysql cli), I did realise the query caused a crash:
mysql:sockpuppet_import@m2-master.eqiad.wmnet [(none)]> CREATE INDEX ix_user_user_text ON sockpuppet.user (user_text);
Query OK, 0 rows affected (10.843 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql:sockpuppet_import@m2-master.eqiad.wmnet [(none)]> CREATE INDEX IF NOT EXISTS service_ix_temporal_user_text ON sockpuppet.temporal (user_text);

Query OK, 0 rows affected (25.165 sec)
Records: 0 Duplicates: 0 Warnings: 0

@Marostegui Thanks for your help, we'll make sure to coordinate any other changes.

From the client side of things (process list & mysql cli), I did realise the query caused a crash:

What did you observe?

I'm going to guess there is an "n't" missing from the sentence. @gmodena Please don't get stressed about this, crashes are (normally) due to software bugs, not people. :-) Let's follow @Marostegui's advice and be conservative for the moment, while the situation stabilizes and a reason is found for the crash.

^ typo, sorry. I did *not* realise the query caused a crash. The output pasted in the comment below suggests a Query OK status. FWIW the client session was not terminated (as far as I could tell).

Thanks @jcrespo. Let me know if there's anything I can do to help with troubleshooting this issue.

So it also made the slave in codfw crash:

Jan 21 17:15:50 db2078 mysqld[2936]: 210121 17:15:50 [ERROR] mysqld got signal 11 ;
Jan 21 17:15:50 db2078 mysqld[2936]: This could be because you hit a bug. It is also possible that this binary
Jan 21 17:15:50 db2078 mysqld[2936]: or one of the libraries it was linked against is corrupt, improperly built,
Jan 21 17:15:50 db2078 mysqld[2936]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jan 21 17:15:50 db2078 mysqld[2936]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jan 21 17:15:50 db2078 mysqld[2936]: We will try our best to scrape up some info that will hopefully help
Jan 21 17:15:50 db2078 mysqld[2936]: diagnose the problem, but since we have already crashed,
Jan 21 17:15:50 db2078 mysqld[2936]: something is definitely wrong and this may fail.
Jan 21 17:15:50 db2078 mysqld[2936]: Server version: 10.4.13-MariaDB-log
Jan 21 17:15:50 db2078 mysqld[2936]: key_buffer_size=134217728
Jan 21 17:15:50 db2078 mysqld[2936]: read_buffer_size=131072
Jan 21 17:15:50 db2078 mysqld[2936]: max_used_connections=21
Jan 21 17:15:50 db2078 mysqld[2936]: max_threads=511
Jan 21 17:15:50 db2078 mysqld[2936]: thread_count=31
Jan 21 17:15:50 db2078 mysqld[2936]: It is possible that mysqld could use up to
Jan 21 17:15:50 db2078 mysqld[2936]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1255610 K  bytes of memory
Jan 21 17:15:50 db2078 mysqld[2936]: Hope that's ok; if not, decrease some variables in the equation.
Jan 21 17:15:50 db2078 mysqld[2936]: Thread pointer: 0x7f77600014f8
Jan 21 17:15:50 db2078 mysqld[2936]: Attempting backtrace. You can use the following information to find out
Jan 21 17:15:50 db2078 mysqld[2936]: where mysqld died. If you see no messages after this, something went
Jan 21 17:15:50 db2078 mysqld[2936]: terribly wrong...
Jan 21 17:15:50 db2078 mysqld[2936]: stack_bottom = 0x7f913c60d698 thread_stack 0x30000
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x55976a09c7de]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x54d)[0x559769b94b4d]
Jan 21 17:15:50 db2078 mysqld[2936]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f9174a7f730]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(_ZN12Item_func_in7cleanupEv+0x76)[0x559769cb8f06]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(_ZN11Query_arena10free_itemsEv+0x2d)[0x559769944fbd]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(_ZN3THD19cleanup_after_queryEv+0x108)[0x559769946d88]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x23a)[0x5597699951ea]
Jan 21 17:15:50 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x682)[0x559769c8be52]
Jan 21 17:15:51 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(+0x5faf02)[0x5597698e9f02]
Jan 21 17:15:51 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x12e2)[0x5597698f2ef2]
Jan 21 17:15:51 db2078 mysqld[2936]: /opt/wmf-mariadb104/bin/mysqld(+0xd5e28b)[0x55976a04d28b]
Jan 21 17:15:51 db2078 mysqld[2936]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f9174a74fa3]
Jan 21 17:15:51 db2078 mysqld[2936]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f91741a94cf]
Jan 21 17:15:51 db2078 mysqld[2936]: Trying to get some variables.
Jan 21 17:15:51 db2078 mysqld[2936]: Some pointers may be invalid and cause the dump to abort.
Jan 21 17:15:51 db2078 mysqld[2936]: Query (0x7f7760608b60): alter table user drop index if exists ix_user_user_text
Jan 21 17:15:51 db2078 mysqld[2936]: Connection ID (thread ID): 14
Jan 21 17:15:51 db2078 mysqld[2936]: Status: NOT_KILLED
Jan 21 17:15:51 db2078 mysqld[2936]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
Jan 21 17:15:51 db2078 mysqld[2936]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jan 21 17:15:51 db2078 mysqld[2936]: information that should help you find out what is causing the crash.
Jan 21 17:15:51 db2078 mysqld[2936]: Writing a core file...
Jan 21 17:15:51 db2078 mysqld[2936]: Working directory at /srv/sqldata.m2
Jan 21 17:15:51 db2078 mysqld[2936]: Resource Limits:
Jan 21 17:15:51 db2078 mysqld[2936]: Limit                     Soft Limit           Hard Limit           Units
Jan 21 17:15:51 db2078 mysqld[2936]: Max cpu time              unlimited            unlimited            seconds
Jan 21 17:15:51 db2078 mysqld[2936]: Max file size             unlimited            unlimited            bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max data size             unlimited            unlimited            bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max stack size            8388608              unlimited            bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max core file size        0                    0                    bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max resident set          unlimited            unlimited            bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max processes             2063523              2063523              processes
Jan 21 17:15:51 db2078 mysqld[2936]: Max open files            200001               200001               files
Jan 21 17:15:51 db2078 mysqld[2936]: Max locked memory         65536                65536                bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max address space         unlimited            unlimited            bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max file locks            unlimited            unlimited            locks
Jan 21 17:15:51 db2078 mysqld[2936]: Max pending signals       2063523              2063523              signals
Jan 21 17:15:51 db2078 mysqld[2936]: Max msgqueue size         819200               819200               bytes
Jan 21 17:15:51 db2078 mysqld[2936]: Max nice priority         0                    0
Jan 21 17:15:51 db2078 mysqld[2936]: Max realtime priority     0                    0
Jan 21 17:15:51 db2078 mysqld[2936]: Max realtime timeout      unlimited            unlimited            us
Jan 21 17:15:51 db2078 mysqld[2936]: Core pattern: /var/tmp/core/core.%h.%e.%p....

There is not much data with which to investigate why this crashed, as there are few traces there. Both hosts definitely need to be rebuilt.
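
What the two backtraces do allow is a side-by-side comparison. Below is a sketch in Python that decodes just the qualified names of the mangled frames and checks whether db2133 and db2078 died in the same call chain. The demangle_name helper is a deliberately minimal, hypothetical decoder for simple `_Z<len><name>` and `_ZN<len><name>...E` symbols only; it is not a full C++ demangler and ignores parameter types.

```python
def demangle_name(sym):
    """Decode only the qualified name of a simple mangled C++ symbol."""
    if not sym.startswith("_Z"):
        return sym  # plain C symbol such as handle_slave_sql
    i = 2
    nested = sym[i] == "N"
    if nested:
        i += 1
    parts = []
    while i < len(sym) and sym[i].isdigit():
        j = i
        while j < len(sym) and sym[j].isdigit():
            j += 1
        length = int(sym[i:j])
        parts.append(sym[j:j + length])
        i = j + length
        if not nested:
            break  # a non-nested name has a single component
    return "::".join(parts)


# Named mysqld frames below the signal handler, copied from both logs.
DB2133 = [
    "_ZN12Item_func_in7cleanupEv+0x36",
    "_ZN11Query_arena10free_itemsEv+0x2d",
    "_ZN3THD19cleanup_after_queryEv+0x108",
    "_Z11mysql_parseP3THDPcjP12Parser_statebb+0x23a",
    "_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x682",
    "handle_slave_sql+0x12e2",
]
DB2078 = ["_ZN12Item_func_in7cleanupEv+0x76"] + DB2133[1:]


def chain(frames):
    """Drop the +0x... offsets and decode each frame's function name."""
    return [demangle_name(frame.split("+")[0]) for frame in frames]


same_chain = chain(DB2133) == chain(DB2078)
differing = [(a, b) for a, b in zip(DB2133, DB2078) if a != b]

print(" <- ".join(chain(DB2133)))
print(same_chain, differing)
```

Both hosts crashed in Item_func_in::cleanup, reached from the replication SQL thread via THD::cleanup_after_query, and only the offset inside that first frame differs (0x36 vs 0x76). The statements differ too (CREATE INDEX on db2133, ALTER TABLE ... DROP INDEX on db2078), but both touch the user table. As far as I know, Item_func_in is the server's item class for IN (...) expressions, and the only IN visible in this task is the CHECK constraint on is_anon; whether that is related is not established here.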

Change 657725 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2133,db2078: Disable notifications

https://gerrit.wikimedia.org/r/657725

Mentioned in SAL (#wikimedia-operations) [2021-01-22T06:16:25Z] <marostegui> Stop MySQL on db1117 db2133 db2078 T272614

Change 657725 merged by Marostegui:
[operations/puppet@production] db2133,db2078: Disable notifications

https://gerrit.wikimedia.org/r/657725

I have rebuilt the hosts and also upgraded their MariaDB version.
There's very little information on what actually caused the crash. I think the index creation played a role here, but it is hard to know the details of what exactly happened. There is also not much point in filing a bug with MariaDB based on this generic crash message.
@gmodena please coordinate any future index creation with us, especially on big tables.
Going to close this task as resolved, as there's not much else we can do for now.

It could be that bug indeed. That would explain why both codfw hosts crashed, as they both run 10.4.13.
However, the eqiad master also runs 10.4.13, and that one didn't crash.